github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

Simplify bench_latency.py

merrymercy opened this pull request 4 months ago

Update test_srt_backend.py

merrymercy opened this pull request 4 months ago

[Bug] radixcache stack_overflow

luzengxiangcn opened this issue 4 months ago

[CI] Move AMD test to a separate file

merrymercy opened this pull request 4 months ago

debug radixcache stack_overflow

luzengxiangcn opened this pull request 4 months ago

Speculative decoding with EAGLE2

yukavio opened this pull request 4 months ago

MoE torch compile

ispobock opened this pull request 4 months ago

Fix the overhead due to penalizer in bench_latency

merrymercy opened this pull request 4 months ago

Fix RuntimeEndpoint.select method

jeffrey-fong opened this pull request 4 months ago

minor: add mla fp8 test

zhyncs opened this pull request 4 months ago

[Community] Add open collective sponsor link to README

Ying1123 opened this pull request 4 months ago

Update dockerfile to include datamodel_code_generator

merrymercy opened this pull request 4 months ago

Add AMD tests to CI

Ying1123 opened this pull request 4 months ago

[API, Feature] Support response prefill for openai API

Ying1123 opened this pull request 4 months ago

Add a unit test for data parallelism

merrymercy opened this pull request 4 months ago

Better unit tests for adding a new model

merrymercy opened this pull request 4 months ago

Development Roadmap (2024 Q4)

Ying1123 opened this issue 4 months ago

doc: update backend

zhyncs opened this pull request 4 months ago

[Bug] tp-4 start timeout

siddhatiwari opened this issue 4 months ago

Add MLA gsm8k eval

ispobock opened this pull request 4 months ago

chore: bump v0.3.1.post3

zhyncs opened this pull request 4 months ago

Fix triton head num

ispobock opened this pull request 4 months ago

fix incorrect links in documentation

rchen19 opened this pull request 4 months ago

[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch

liangan1 opened this pull request 4 months ago

[Bug] Deepseek-V2.5 capture cuda graph failed

halexan opened this issue 4 months ago

[Bug] The sglang cannot reach the preset concurrency level.

rangehow opened this issue 4 months ago

Add OLMoE

Muennighoff opened this pull request 4 months ago

minor: add quant eval compared with base

zhyncs opened this pull request 4 months ago

[Bug] The engine hangs after requesting health_generate 190 times.

unix1986 opened this issue 4 months ago

Fix env vars in bench_latency

merrymercy opened this pull request 4 months ago

[Performance] Add triton kernels for LoRA

Ying1123 opened this pull request 4 months ago

Release v0.3.1.post2

merrymercy opened this pull request 4 months ago

Fix padding in the cuda graph

merrymercy opened this pull request 4 months ago

[Bug] illegal memory access encountered

wonderisland opened this issue 4 months ago

[Bug] enable-mixed-chunk may cause the regex request get wrong result and output_token_logprobs

liuteng opened this issue 4 months ago

Debug schedule optimization

hnyls2002 opened this pull request 4 months ago

fix: create new dict every time for putting new frame

Luodian opened this pull request 4 months ago

[Bug] OOM, torch.OutOfMemoryError: seems to only use one GPU on A800-80G, available 40G on each card

chuangzhidan opened this issue 4 months ago

[WIP] Prometheus Metrics

binarycrayon opened this pull request 4 months ago

[Question] Why is the default value of max_prefill_tokens 16384?

wjj19950828 opened this issue 4 months ago

Support double sparsity

andy-yang-1 opened this pull request 4 months ago

[Event] Add public meeting invite to README

Ying1123 opened this pull request 4 months ago

Fuse top_k and top_k in the sampler

merrymercy opened this pull request 4 months ago

Pr fix max workers

wellhowtosay opened this pull request 4 months ago

[Bug] OOM when running `bench_serving` with DeepSeekCoder-V2-Lite.

zh-zheng opened this issue 4 months ago

Fix oom issues with fp8 for llama

merrymercy opened this pull request 4 months ago

[Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419)

HaiShaw opened this pull request 4 months ago

Add bench_server_latency.py

merrymercy opened this pull request 4 months ago

Fix schedule bug

hnyls2002 opened this pull request 4 months ago

fix schedule bug

hnyls2002 opened this pull request 4 months ago

Fixed n>1 causing list index out of range with VLM

jasonyux opened this pull request 4 months ago

Fix attention backend

ispobock opened this pull request 4 months ago

Enable MLA by default

ispobock opened this pull request 4 months ago

[Bug] Performance issue on MoE with torch.compile

ispobock opened this issue 4 months ago

Release 0.3.1.post1

merrymercy opened this pull request 4 months ago

Add OLMoE model

janimo opened this pull request 4 months ago

[Bug] The latest Sglang docker image cannot start online services

CedricHwong opened this issue 4 months ago

Fix torch compile for deepseek-v2

ispobock opened this pull request 4 months ago

Simplify sampler and its error handling

merrymercy opened this pull request 4 months ago

Clean up model loader

merrymercy opened this pull request 4 months ago

[Bug] Llama 405B FP8 causes OOM on 16xA40

sumukshashidhar opened this issue 4 months ago

Add constrained_json_whitespace_pattern to ServerArgs

zifeitong opened this pull request 4 months ago

[Feature] Add initial support for sequence parallelism

Ying1123 opened this pull request 4 months ago

[Feature] Expert parallelism support

chongli-uw opened this issue 4 months ago

[Bug] Nonsense and slow output under high concurrency

tongyx361 opened this issue 4 months ago

[Feature] Support LoRA path renaming and add LoRA serving benchmarks

Ying1123 opened this pull request 4 months ago

Revert "[Minor] Raise exception for wrong import (#1409)"

Ying1123 opened this pull request 4 months ago

Remove deprecated configs

merrymercy opened this pull request 4 months ago

Release v0.3.1

merrymercy opened this pull request 4 months ago

Update backend.md

merrymercy opened this pull request 4 months ago

[Fix] Fix logprob and normalized_logprob

merrymercy opened this pull request 4 months ago

Add libibverbs-dev to Dockerfile

Aphoh opened this pull request 4 months ago

fix: resolve nightly eval

zhyncs opened this pull request 4 months ago

Add pytorch sampling backend ut

ispobock opened this pull request 4 months ago

[Bug] missing max_workers param when initiate ProcessPoolExecutor

wellhowtosay opened this issue 4 months ago

[Bug] MLA models can't use enable-torch-compile. Can be fixed by suppressing errors.

Achazwl opened this issue 4 months ago

Enable torch.compile for triton backend

merrymercy opened this pull request 4 months ago

[Bug] deepseek-v2 fp8 cuda graph error

fengyang95 opened this issue 4 months ago

[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm

HaiShaw opened this pull request 4 months ago

[Feature] Support AMD GPU via PyTorch for ROCm

HaiShaw opened this issue 4 months ago

Add torchao quant for mixtral and qwen_moe

jerryzh168 opened this pull request 4 months ago

fallback to round robin scheduler

qeternity opened this pull request 4 months ago

[Bug] AttributeError: 'MiniCPM3ForCausalLM' object has no attribute 'get_module_name'

Lixtt opened this issue 4 months ago

[Bug] OpenAI batch API gets stuck

dmakhervaks opened this issue 4 months ago

ci: fix finish

zhyncs opened this pull request 4 months ago

[Bug] triton attention-backend bug

81549361 opened this issue 4 months ago

Update pr-test.yml

merrymercy opened this pull request 4 months ago

Balance test in CI

merrymercy opened this pull request 4 months ago

Update pr-test.yml

merrymercy opened this pull request 4 months ago

[Minor] Raise exception for wrong import

Ying1123 opened this pull request 4 months ago

[CI] Include triton backend and online serving benchmark into CI

merrymercy opened this pull request 4 months ago

Make stop reason a dict instead of str

merrymercy opened this pull request 4 months ago

[Minor, CI] remove lora test from minimal suite

Ying1123 opened this pull request 4 months ago

[Bug] RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 458752000 and alignment 16 in AlignedAllocator

josephydu opened this issue 4 months ago

[Bug] ImportError : cannot import name 'gemma_fused_add_rmsnorm' from 'flashinfer.norm'

luo647 opened this issue 4 months ago

kernel: use tensor cores for flashinfer gqa kernels

yzh119 opened this pull request 4 months ago

[Minor Fix] Fix llava modalities issue for single-image

kcz358 opened this pull request 4 months ago

Support cuda graph in the triton attention backend

merrymercy opened this pull request 4 months ago

[Bug] LLaVA performance inconsistent with the result

kcz358 opened this issue 4 months ago

Fix README format

Achazwl opened this pull request 4 months ago