github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

Add torchao quant (int4/int8/fp8) to llama models

jerryzh168 opened this pull request 5 months ago

docs: add conclusion

zhyncs opened this pull request 5 months ago

Optimize schedule

hnyls2002 opened this pull request 5 months ago

[Bug] Multi machine, multi card, slow speed

guleng opened this issue 5 months ago

docs: highlight ttft itl and throughput

zhyncs opened this pull request 5 months ago

docs: update README

zhyncs opened this pull request 5 months ago

[Feature] Per-request random seed

laoconeth opened this issue 5 months ago

[Bug] ConnectionResetError: [Errno 104] Connection reset by peer

oliver-li opened this issue 5 months ago

[Bug] Unsupported architectures: ChatGLMForConditionalGeneration.

maxin9966 opened this issue 5 months ago

[Bug] Using 8 H20 GPUs, the deepseek-coder-v2-fp8 starts up normally, but there is no response to client requests.

fengyang95 opened this issue 5 months ago

Remove useless fields in global_config.py

merrymercy opened this pull request 5 months ago

docs: update news

zhyncs opened this pull request 5 months ago

Fix the flaky test test_moe_eval_accuracy_large.py

merrymercy opened this pull request 5 months ago

[Bug] T4 Crash

Abdulhanan535 opened this issue 5 months ago

[Bug] RuntimeError in ModelTpServer

Lzhang-hub opened this issue 5 months ago

[Feature] support smooth-quant?

Lzhang-hub opened this issue 5 months ago

[Bug] Facing Error When starting.

Abdulhanan535 opened this issue 5 months ago

chore: bump v0.3.0

zhyncs opened this pull request 5 months ago

misc: speedup load safetensors

zhyncs opened this pull request 5 months ago

Fix select by ensuring each request has at least one token

merrymercy opened this pull request 5 months ago

Fix llama2 weight loader

merrymercy opened this pull request 5 months ago

[Bug] Unable to fix model output

cherishhh opened this issue 5 months ago

The CPU is also occupied at 100% when there are no requests.

luhairong11 opened this issue 5 months ago

Update README.md for llava-onevision instructions

merrymercy opened this pull request 5 months ago

[Bug] gen with regex: Token fusion between input and output, try to avoid this by removing the space at the end of the input.

alanxmay opened this issue 5 months ago

Removed unused methods

janimo opened this pull request 5 months ago

[Bug] Update to 0.2.15 and torch compile leads to error

zhaochenyang20 opened this issue 5 months ago

Adding document for backend

zhaochenyang20 opened this pull request 5 months ago

[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping

merrymercy opened this pull request 5 months ago

[Feature] Initial support for multi-LoRA serving

Ying1123 opened this pull request 5 months ago

Fix bugs in sampler with CUDA graph / torch.compile

hnyls2002 opened this pull request 5 months ago

feat: update linear deps 1/N

zhyncs opened this pull request 5 months ago

feat: update nightly gsm8k eval

zhyncs opened this pull request 5 months ago

Do you support frontend-language inference for Llava-OneVision?

ehayeshaiper opened this issue 5 months ago

[Bug] A100 PCIE torch compile error

zhyncs opened this issue 5 months ago

Adding Documentation for installation

zhaochenyang20 opened this pull request 5 months ago

Support Phi3 mini and medium

janimo opened this pull request 5 months ago

[server] Passing `model_override_args` to `launch_server` via the CLI.

kevin85421 opened this pull request 5 months ago

Fix hang when doing s += None.

max99x opened this pull request 5 months ago

Fix regex mask

hnyls2002 opened this pull request 5 months ago

Release v0.2.15

merrymercy opened this pull request 5 months ago

[doc] Fix more broken links

ByronHsu opened this pull request 5 months ago

Fix the flaky tests in test_moe_eval_accuracy_large.py

merrymercy opened this pull request 5 months ago

[Feature] Correctness test for Triton kernels

ByronHsu opened this issue 5 months ago

ci: add nightly eval

zhyncs opened this pull request 5 months ago

fix: resolve fp8 for mixtral

zhyncs opened this pull request 5 months ago

[CI] merge all ci tests into one file

merrymercy opened this pull request 5 months ago

[triton] Remove the zero initialization of qk_acc by directly writing the result

ByronHsu opened this pull request 5 months ago

Separated control and compute loop, shorten the critical path, and enable more complicated policies

Ying1123 opened this pull request 5 months ago

Support Triton fp8 e5m2 kv cache

ispobock opened this pull request 5 months ago

feat: fix fp8 for MLA and support bmm fp8 for DeepSeek V2

zhyncs opened this pull request 5 months ago

[Chore] Rename model_overide_args to model_override_args

kevin85421 opened this pull request 5 months ago

[Feature] Support phi-3 model

ByronHsu opened this issue 5 months ago

[doc] fix quick start link

ByronHsu opened this pull request 5 months ago

[triton] Support head_dim not 2^n in triton extend and decode attention

ByronHsu opened this pull request 5 months ago

[CI] Add more multi-gpu tests

merrymercy opened this pull request 5 months ago

[Bug] device-side assert triggered when using run_batch

stikkireddy opened this issue 5 months ago

Optimize new token calculation

hnyls2002 opened this pull request 5 months ago

Allow new lines during JSON generation

qeternity opened this pull request 5 months ago

fix: resolve the fp8 bug introduced by vLLM 0.5.5

zhyncs opened this pull request 5 months ago

[Bug] sglang.launch_server error

andyluo7 opened this issue 5 months ago

[Bug] Device-side assert triggered in logits processor when running Llama 3.1 70B

hrukalive opened this issue 5 months ago

[Feature] support long context eval and benchmark

zhyncs opened this issue 5 months ago

[Feature] support nightly eval

zhyncs opened this issue 5 months ago

[Feature] support ultravox

zhyncs opened this issue 5 months ago

[Bug] sglang run for few hours, it will stop returning valid response

liho00 opened this issue 5 months ago

Report median instead of mean in bench_latency.py

merrymercy opened this pull request 5 months ago

[Bug] Why sglang is slower than vllm on ShareGPT datasets?

lullabies777 opened this issue 5 months ago

Update README Support Exaone 3.0

Deepfocused opened this pull request 5 months ago

[Bug] OpenAI Compatible Prompt Template Error

BabyChouSr opened this issue 5 months ago

[Bug] Lower single request speed with mla enabled

halexan opened this issue 5 months ago

Optimize the update flashinfer indices

xiaobochen123 opened this pull request 5 months ago

Transpose mla weight offline

ispobock opened this pull request 5 months ago

fix: multimodal_config in monkey_patch_vllm_dummy_weight_loader

lxww302 opened this pull request 5 months ago

[Bug] cannot set --load-format=dummy with vllm 0.5.5

lxww302 opened this issue 5 months ago

EXAONE 3.0 Model Support

Deepfocused opened this pull request 5 months ago

[Bug] incorrect input_tokens_logprob slicing in RuntimeEndpoint.select method

jeffrey-fong opened this issue 5 months ago

Allow more flexible assistant and system response

BabyChouSr opened this pull request 5 months ago

fix data racing due to mutable reference using deepcopy

xiezhq-hermann opened this pull request 5 months ago

make json_schema usable from gen

qeternity opened this pull request 5 months ago

Sampler cudagraph

hnyls2002 opened this pull request 5 months ago

fix: resolve qwen2 moe weight loader

zhyncs opened this pull request 5 months ago

[Bug] Error in loading Qwen2-57B-A14B-Instruct

LucienShui opened this issue 5 months ago

chore: bump v0.2.14.post2

zhyncs opened this pull request 5 months ago

[Fix] Fix OOM in llava base class

merrymercy opened this pull request 5 months ago

[Feature] Context Caching

RonanKMcGovern opened this issue 5 months ago

Fix llava on multi images

merrymercy opened this pull request 5 months ago

[Bug] 0.2.14 version. ValueError: malformed node or string: None

lss15151161 opened this issue 5 months ago

fix: increase max_new_tokens when testing generation models

zhyncs opened this pull request 5 months ago

Add sglang.bench_latency to CI

merrymercy opened this pull request 5 months ago

hotfix: revert sampler CUDA Graph

zhyncs opened this pull request 5 months ago

[Bug] AttributeError: 'ScheduleBatch' object has no attribute 'sample' WHEN I DO Benchmarking

ArtificialZeng opened this issue 5 months ago

[Bug] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpx4yubctp/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpx4yubctp/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/lib'

ArtificialZeng opened this issue 5 months ago

Update README.md

merrymercy opened this pull request 5 months ago

[Bug] get jammed when deploy Qwen2-72b: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '

ArtificialZeng opened this issue 5 months ago

[Minor] Add more type annotations

merrymercy opened this pull request 5 months ago

Fix readme

ArtificialZeng opened this pull request 5 months ago

feat: replace GeluAndMul

zhyncs opened this pull request 5 months ago

feat: support sm75 with FlashInfer v0.1.6

zhyncs opened this pull request 5 months ago

feat: update GemmaRMSNorm

zhyncs opened this pull request 5 months ago