Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang
[Bug] Performance issue on MoE with torch.compile
ispobock opened this issue 4 months ago
Release 0.3.1.post1
merrymercy opened this pull request 4 months ago
Add OLMoE model
janimo opened this pull request 4 months ago
[Bug] The latest Sglang docker image cannot start online services
CedricHwong opened this issue 4 months ago
Fix torch compile for deepseek-v2
ispobock opened this pull request 4 months ago
Simplify sampler and its error handling
merrymercy opened this pull request 4 months ago
Clean up model loader
merrymercy opened this pull request 4 months ago
[Bug] Llama 405B FP8 causes OOM on 16xA40
sumukshashidhar opened this issue 4 months ago
Add constrained_json_whitespace_pattern to ServerArgs
zifeitong opened this pull request 4 months ago
[Feature] Add initial support for sequence parallelism
Ying1123 opened this pull request 4 months ago
[Feature] Expert parallelism support
chongli-uw opened this issue 4 months ago
[Bug] Nonsense and slow output under high concurrency
tongyx361 opened this issue 4 months ago
[Feature] Support LoRA path renaming and add LoRA serving benchmarks
Ying1123 opened this pull request 4 months ago
Revert "[Minor] Raise exception for wrong import (#1409)"
Ying1123 opened this pull request 4 months ago
Remove deprecated configs
merrymercy opened this pull request 4 months ago
Release v0.3.1
merrymercy opened this pull request 4 months ago
Update backend.md
merrymercy opened this pull request 4 months ago
[Fix] Fix logprob and normalized_logprob
merrymercy opened this pull request 4 months ago
Add libibverbs-dev to Dockerfile
Aphoh opened this pull request 4 months ago
fix: resolve nightly eval
zhyncs opened this pull request 4 months ago
Add pytorch sampling backend ut
ispobock opened this pull request 4 months ago
[Bug] missing max_workers param when initiate ProcessPoolExecutor
wellhowtosay opened this issue 4 months ago
[Bug] MLA models can't use enable-torch-compile. Can be fix by suppressing errors.
Achazwl opened this issue 4 months ago
Enable torch.compile for triton backend
merrymercy opened this pull request 4 months ago
[Bug] deepseek-v2 fp8 cuda graph errror
fengyang95 opened this issue 4 months ago
[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm
HaiShaw opened this pull request 5 months ago
[Feature] Support AMD GPU via PyTorch for ROCm
HaiShaw opened this issue 5 months ago
Add torchao quant for mixtral and qwen_moe
jerryzh168 opened this pull request 5 months ago
fallback to round robin scheduler
qeternity opened this pull request 5 months ago
[Bug] AttributeError: 'MiniCPM3ForCausalLM' object has no attribute 'get_module_name'
Lixtt opened this issue 5 months ago
[Bug] OpenAI batch API gets stuck
dmakhervaks opened this issue 5 months ago
ci: fix finish
zhyncs opened this pull request 5 months ago
[Bug] triton attention-backend bug
81549361 opened this issue 5 months ago
Update pr-test.yml
merrymercy opened this pull request 5 months ago
Balance test in CI
merrymercy opened this pull request 5 months ago
Update pr-test.yml
merrymercy opened this pull request 5 months ago
[Minor] Raise exception for wrong import
Ying1123 opened this pull request 5 months ago
[CI] Include triton backend and online serving benchmark into CI
merrymercy opened this pull request 5 months ago
Make stop reason a dict instead of str
merrymercy opened this pull request 5 months ago
[Minor, CI] remove lora test from minimal suite
Ying1123 opened this pull request 5 months ago
[Bug] RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 458752000 and alignment 16 in AlignedAllocator
josephydu opened this issue 5 months ago
[Bug] ImportError : cannot import name 'gemma_fused_add_rmsnorm' from 'flashinfer.norm'
luo647 opened this issue 5 months ago
kernel: use tensor cores for flashinfer gqa kernels
yzh119 opened this pull request 5 months ago
[Minor Fix] Fix llava modalities issue for single-image
kcz358 opened this pull request 5 months ago
Support cuda graph in the triton attention backend
merrymercy opened this pull request 5 months ago
[Bug] LLaVA performance inconsistent with the result
kcz358 opened this issue 5 months ago
Fix README format
Achazwl opened this pull request 5 months ago
Add Support for XVERSE Models (Dense and MoE) to sglang
hxer7963 opened this pull request 5 months ago
[Feature] support awq of deepseek-v2 or deepseek-v2.5
tutu329 opened this issue 5 months ago
[Feature] need DeepSeek-v2 or deepseek-v2.5 awq support
tutu329 opened this issue 5 months ago
Remove synchronization in cuda graph replay
hnyls2002 opened this pull request 5 months ago
Add no commit to main rule
hnyls2002 opened this pull request 5 months ago
Optimize conflicts between CUDA graph and vocab mask tensors
hnyls2002 opened this pull request 5 months ago
[Bug] 'LlamaTokenizerFast' object has no attribute 'tokenizer'
zwc163 opened this issue 5 months ago
Improve error reporting during server launch
merrymercy opened this pull request 5 months ago
[Fix] Fix --disable-flashinfer
merrymercy opened this pull request 5 months ago
[Feature] Support torch profiler
danielhua23 opened this issue 5 months ago
[Feature] Can centos7 use this project?
luo647 opened this issue 5 months ago
[Bug] requests.exceptions.JSONDecodeError:
eyuansu62 opened this issue 5 months ago
remove assertion in triton attention and add an unit test
ByronHsu opened this pull request 5 months ago
[Feature] Support RM API
UbeCc opened this issue 5 months ago
Rewrite mixed chunked prefill
hnyls2002 opened this pull request 5 months ago
[Bug] too many processes
wellhowtosay opened this issue 5 months ago
Refactor attention backend
merrymercy opened this pull request 5 months ago
Deprecate --disable-flashinfer and introduce --attention-backend
merrymercy opened this pull request 5 months ago
[Minor] move triton attention kernels into a separate folder
merrymercy opened this pull request 5 months ago
Organize flashinfer indices update
hnyls2002 opened this pull request 5 months ago
[Do not merge] Test torchao
jerryzh168 opened this pull request 5 months ago
Fix vocab mask update bug
hnyls2002 opened this pull request 5 months ago
[Minor] improve kill scripts and torchao import
merrymercy opened this pull request 5 months ago
[Feature] 4-bit quantized prefix cache
josephrocca opened this issue 5 months ago
Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy
josephrocca opened this pull request 5 months ago
deepseek-v2 torch.compile error
cdj0311 opened this issue 5 months ago
Support MiniCPM3
Achazwl opened this pull request 5 months ago
fix bug of `undefined is_single` in meth `create_abort_task`
wcsjtu opened this pull request 5 months ago
deepseek-v2 enable-mla 4x slower
cdj0311 opened this issue 5 months ago
[Docs] Improve documentations
merrymercy opened this pull request 5 months ago
BaiChuan2 Model
blacker521 opened this pull request 5 months ago
SGLang Discussion WeChat Group
qingkelab opened this issue 5 months ago
[Bug] Unable to see logprobs for prompt/input
dmakhervaks opened this issue 5 months ago
[Bug] Mixed chunked prefill is not compatible with vocab tensor mask
hnyls2002 opened this issue 5 months ago
Support OpenAI API json_schema response format
zifeitong opened this pull request 5 months ago
[Bug] sgLang v0.3 breaks TP8 Llama 3.1 405B FP8 on 8xH100
jischein opened this issue 5 months ago
[CI] Return output logprobs in unit test
Ying1123 opened this pull request 5 months ago
Unify forward mode
hnyls2002 opened this pull request 5 months ago
[Feature] Follow up on non power of 2 triton kernel
ByronHsu opened this issue 5 months ago
[Bug] it seems memory leak in sglang when longtime serving
CSEEduanyu opened this issue 5 months ago
[Minor] Many cleanup
merrymercy opened this pull request 5 months ago
[Feature] support LLaVA-NeXT-Video-32B-Qwen
HarperGG opened this issue 5 months ago
[Feature] smooth quant or other quant method
MichoChan opened this issue 5 months ago
[Feature] support qwen2 vl
zhyncs opened this issue 5 months ago
[Feature] KV Cache Quantization
ghost opened this issue 5 months ago
[Feature] DRY repetition penalty
vnkc1 opened this issue 5 months ago
[Bug] `served_model_name` argument in the server_arg.py is not checked
zhaochenyang20 opened this issue 5 months ago
[Feature] KV Cache Compression
ghost opened this issue 5 months ago
[Feat] Add modalities for vision server when handling pixel values for llava
kcz358 opened this pull request 5 months ago
Fix some online scheduling delay
hnyls2002 opened this pull request 5 months ago
[Bug] it didn't work when using tp on RTX 3090
milktea888 opened this issue 5 months ago
jinja2.exceptions.TemplateError: System role not supported
sdecoder opened this issue 5 months ago