Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang
Add torchao quant (int4/int8/fp8) to llama models
jerryzh168 opened this pull request 5 months ago
docs: add conclusion
zhyncs opened this pull request 5 months ago
Optimize schedule
hnyls2002 opened this pull request 5 months ago
[Bug] Multi machine, multi card, slow speed
guleng opened this issue 5 months ago
docs: highlight ttft itl and throughput
zhyncs opened this pull request 5 months ago
docs: update README
zhyncs opened this pull request 5 months ago
[Feature] Per-request random seed
laoconeth opened this issue 5 months ago
[Bug] ConnectionResetError: [Errno 104] Connection reset by peer
oliver-li opened this issue 5 months ago
[Bug] Unsupported architectures: ChatGLMForConditionalGeneration.
maxin9966 opened this issue 5 months ago
[Bug] Using 8 H20 GPUs, the deepseek-coder-v2-fp8 starts up normally, but there is no response to client requests.
fengyang95 opened this issue 5 months ago
Remove useless fields in global_config.py
merrymercy opened this pull request 5 months ago
docs: update news
zhyncs opened this pull request 5 months ago
Fix the flaky test test_moe_eval_accuracy_large.py
merrymercy opened this pull request 5 months ago
[Bug] T4 Crash
Abdulhanan535 opened this issue 5 months ago
[Bug] RuntimeError in ModelTpServer
Lzhang-hub opened this issue 5 months ago
[Feature] support smooth-quant?
Lzhang-hub opened this issue 5 months ago
[Bug] Facing Error When starting.
Abdulhanan535 opened this issue 5 months ago
chore: bump v0.3.0
zhyncs opened this pull request 5 months ago
misc: speedup load safetensors
zhyncs opened this pull request 5 months ago
Fix select by ensuring each request has at least one token
merrymercy opened this pull request 5 months ago
Fix llama2 weight loader
merrymercy opened this pull request 5 months ago
[Bug] Unable to fix model output
cherishhh opened this issue 5 months ago
The CPU is also occupied at 100% when there are no requests.
luhairong11 opened this issue 5 months ago
Update README.md for llava-onevision instructions
merrymercy opened this pull request 5 months ago
[Bug] gen with regex: Token fusion between input and output, try to avoid this by removing the space at the end of the input.
alanxmay opened this issue 5 months ago
Removed unused methods
janimo opened this pull request 5 months ago
[Bug] Update to 0.2.15 and torch compile leads to error
zhaochenyang20 opened this issue 5 months ago
Adding document for backend
zhaochenyang20 opened this pull request 5 months ago
[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping
merrymercy opened this pull request 5 months ago
[Feature] Initial support for multi-LoRA serving
Ying1123 opened this pull request 5 months ago
Fix bugs in sampler with CUDA graph / torch.compile
hnyls2002 opened this pull request 5 months ago
feat: update linear deps 1/N
zhyncs opened this pull request 5 months ago
feat: update nightly gsm8k eval
zhyncs opened this pull request 5 months ago
Do you support frontend-language inference for Llava-OneVision ?
ehayeshaiper opened this issue 5 months ago
[Bug] A100 PCIE torch compile error
zhyncs opened this issue 5 months ago
Adding Documentation for installation
zhaochenyang20 opened this pull request 5 months ago
Support Phi3 mini and medium
janimo opened this pull request 5 months ago
[server] Passing `model_override_args` to `launch_server` via the CLI.
kevin85421 opened this pull request 5 months ago
Fix hang when doing s += None.
max99x opened this pull request 5 months ago
Fix regex mask
hnyls2002 opened this pull request 5 months ago
Release v0.2.15
merrymercy opened this pull request 5 months ago
[doc] Fix more broken links
ByronHsu opened this pull request 5 months ago
Fix the flaky tests in test_moe_eval_accuracy_large.py
merrymercy opened this pull request 5 months ago
[Feature] Correctness test for Triton kernels
ByronHsu opened this issue 5 months ago
ci: add nightly eval
zhyncs opened this pull request 5 months ago
fix: resolve fp8 for mixtral
zhyncs opened this pull request 5 months ago
[CI] merge all ci tests into one file
merrymercy opened this pull request 5 months ago
[triton] Remove the zero initialization of qk_acc by directly writing the result
ByronHsu opened this pull request 5 months ago
Separated control and compute loop, shorten the critical path, and enable more complicated policies
Ying1123 opened this pull request 5 months ago
Support Triton fp8 e5m2 kv cache
ispobock opened this pull request 5 months ago
feat: fix fp8 for MLA and support bmm fp8 for DeepSeek V2
zhyncs opened this pull request 5 months ago
[Chore] Rename model_overide_args to model_override_args
kevin85421 opened this pull request 5 months ago
[Feature] Support phi-3 model
ByronHsu opened this issue 5 months ago
[doc] fix quick start link
ByronHsu opened this pull request 5 months ago
[triton] Support head_dim not 2^n in triton extend and decode attention
ByronHsu opened this pull request 5 months ago
[CI] Add more multi-gpu tests
merrymercy opened this pull request 5 months ago
[Bug] device-side assert triggered when using run_batch
stikkireddy opened this issue 5 months ago
Optimize new token calculation
hnyls2002 opened this pull request 5 months ago
Allow new lines during JSON generation
qeternity opened this pull request 5 months ago
fix: resolve the fp8 bug introduced by vLLM 0.5.5
zhyncs opened this pull request 5 months ago
[Bug] sglang.launch_server error
andyluo7 opened this issue 5 months ago
[Bug] Device-side assert triggered in logits processor when running Llama 3.1 70B
hrukalive opened this issue 5 months ago
[Feature] support long context eval and benchmark
zhyncs opened this issue 5 months ago
[Feature] support nightly eval
zhyncs opened this issue 5 months ago
[Feature] support ultravox
zhyncs opened this issue 5 months ago
[Bug] sglang run for few hours, it will stop returning valid response
liho00 opened this issue 5 months ago
Report median instead of mean in bench_latency.py
merrymercy opened this pull request 5 months ago
[Bug] Why sglang is slower than vllm on ShareGPT datasets?
lullabies777 opened this issue 5 months ago
Update README Support Exaone 3.0
Deepfocused opened this pull request 5 months ago
[Bug] OpenAI Compatible Prompt Template Error
BabyChouSr opened this issue 5 months ago
[Bug] Lower single request speed with mla enabled
halexan opened this issue 5 months ago
Optimize the update flashinfer indices
xiaobochen123 opened this pull request 5 months ago
Transpose mla weight offline
ispobock opened this pull request 5 months ago
fix: multimodal_config in monkey_patch_vllm_dummy_weight_loader
lxww302 opened this pull request 5 months ago
[Bug] cannot set --load-format=dummy with vllm 0.5.5
lxww302 opened this issue 5 months ago
EXAONE 3.0 Model Support
Deepfocused opened this pull request 5 months ago
[Bug] incorrect input_tokens_logprob slicing in RuntimeEndpoint.select method
jeffrey-fong opened this issue 5 months ago
Allow more flexible assistant and system response
BabyChouSr opened this pull request 5 months ago
fix data racing due to mutable reference using deepcopy
xiezhq-hermann opened this pull request 5 months ago
make json_schema usable from gen
qeternity opened this pull request 5 months ago
Sampler cudagraph
hnyls2002 opened this pull request 5 months ago
fix: resolve qwen2 moe weight loader
zhyncs opened this pull request 5 months ago
[Bug] Error in loading Qwen2-57B-A14B-Instruct
LucienShui opened this issue 5 months ago
chore: bump v0.2.14.post2
zhyncs opened this pull request 5 months ago
[Fix] Fix OOM in llava base class
merrymercy opened this pull request 5 months ago
[Feature] Context Caching
RonanKMcGovern opened this issue 5 months ago
Fix llava on multi images
merrymercy opened this pull request 5 months ago
[Bug] 0.2.14 version. ValueError: malformed node or string: None
lss15151161 opened this issue 5 months ago
fix: increase max_new_tokens when testing generation models
zhyncs opened this pull request 5 months ago
Add sglang.bench_latency to CI
merrymercy opened this pull request 5 months ago
hotfix: revert sampler CUDA Graph
zhyncs opened this pull request 5 months ago
[Bug] AttributeError: 'ScheduleBatch' object has no attribute 'sample' WHEN I DO Benchmarking
ArtificialZeng opened this issue 5 months ago
[Bug] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpx4yubctp/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpx4yubctp/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/lib'
ArtificialZeng opened this issue 5 months ago
Update README.md
merrymercy opened this pull request 5 months ago
[Minor] Add more type annotations
merrymercy opened this pull request 5 months ago
Fix readme
ArtificialZeng opened this pull request 5 months ago
feat: replace GeluAndMul
zhyncs opened this pull request 5 months ago
feat: support sm75 with FlashInfer v0.1.6
zhyncs opened this pull request 5 months ago
feat: update GemmaRMSNorm
zhyncs opened this pull request 5 months ago