github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

[Bug] Performance issue on MoE with torch.compile

ispobock opened this issue 4 months ago

Release 0.3.1.post1

merrymercy opened this pull request 4 months ago

Add OLMoE model

janimo opened this pull request 4 months ago

[Bug] The latest Sglang docker image cannot start online services

CedricHwong opened this issue 4 months ago

Fix torch compile for deepseek-v2

ispobock opened this pull request 4 months ago

Simplify sampler and its error handling

merrymercy opened this pull request 4 months ago

Clean up model loader

merrymercy opened this pull request 4 months ago

[Bug] Llama 405B FP8 causes OOM on 16xA40

sumukshashidhar opened this issue 4 months ago

Add constrained_json_whitespace_pattern to ServerArgs

zifeitong opened this pull request 4 months ago

[Feature] Add initial support for sequence parallelism

Ying1123 opened this pull request 4 months ago

[Feature] Expert parallelism support

chongli-uw opened this issue 4 months ago

[Bug] Nonsense and slow output under high concurrency

tongyx361 opened this issue 4 months ago

[Feature] Support LoRA path renaming and add LoRA serving benchmarks

Ying1123 opened this pull request 4 months ago

Revert "[Minor] Raise exception for wrong import (#1409)"

Ying1123 opened this pull request 4 months ago

Remove deprecated configs

merrymercy opened this pull request 4 months ago

Release v0.3.1

merrymercy opened this pull request 4 months ago

Update backend.md

merrymercy opened this pull request 4 months ago

[Fix] Fix logprob and normalized_logprob

merrymercy opened this pull request 4 months ago

Add libibverbs-dev to Dockerfile

Aphoh opened this pull request 4 months ago

fix: resolve nightly eval

zhyncs opened this pull request 4 months ago

Add pytorch sampling backend ut

ispobock opened this pull request 4 months ago

[Bug] missing max_workers param when initiate ProcessPoolExecutor

wellhowtosay opened this issue 4 months ago

[Bug] MLA models can't use enable-torch-compile. Can be fix by suppressing errors.

Achazwl opened this issue 4 months ago

Enable torch.compile for triton backend

merrymercy opened this pull request 4 months ago

[Bug] deepseek-v2 fp8 cuda graph error

fengyang95 opened this issue 4 months ago

[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm

HaiShaw opened this pull request 5 months ago

[Feature] Support AMD GPU via PyTorch for ROCm

HaiShaw opened this issue 5 months ago

Add torchao quant for mixtral and qwen_moe

jerryzh168 opened this pull request 5 months ago

fallback to round robin scheduler

qeternity opened this pull request 5 months ago

[Bug] AttributeError: 'MiniCPM3ForCausalLM' object has no attribute 'get_module_name'

Lixtt opened this issue 5 months ago

[Bug] OpenAI batch API gets stuck

dmakhervaks opened this issue 5 months ago

ci: fix finish

zhyncs opened this pull request 5 months ago

[Bug] triton attention-backend bug

81549361 opened this issue 5 months ago

Update pr-test.yml

merrymercy opened this pull request 5 months ago

Balance test in CI

merrymercy opened this pull request 5 months ago

Update pr-test.yml

merrymercy opened this pull request 5 months ago

[Minor] Raise exception for wrong import

Ying1123 opened this pull request 5 months ago

[CI] Include triton backend and online serving benchmark into CI

merrymercy opened this pull request 5 months ago

Make stop reason a dict instead of str

merrymercy opened this pull request 5 months ago

[Minor, CI] remove lora test from minimal suite

Ying1123 opened this pull request 5 months ago

[Bug] RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 458752000 and alignment 16 in AlignedAllocator

josephydu opened this issue 5 months ago

[Bug] ImportError : cannot import name 'gemma_fused_add_rmsnorm' from 'flashinfer.norm'

luo647 opened this issue 5 months ago

kernel: use tensor cores for flashinfer gqa kernels

yzh119 opened this pull request 5 months ago

[Minor Fix] Fix llava modalities issue for single-image

kcz358 opened this pull request 5 months ago

Support cuda graph in the triton attention backend

merrymercy opened this pull request 5 months ago

[Bug] LLaVA performance inconsistent with the result

kcz358 opened this issue 5 months ago

Fix README format

Achazwl opened this pull request 5 months ago

[Bug] This modeling file requires the following packages that were not found in your environment: datamodel_code_generator. Run `pip install datamodel_code_generator`

cicicji opened this issue 5 months ago

Add Support for XVERSE Models (Dense and MoE) to sglang

hxer7963 opened this pull request 5 months ago

[Feature] support awq of deepseek-v2 or deepseek-v2.5

tutu329 opened this issue 5 months ago

[Feature] need DeepSeek-v2 or deepseek-v2.5 awq support

tutu329 opened this issue 5 months ago

Remove synchronization in cuda graph replay

hnyls2002 opened this pull request 5 months ago

Add no commit to main rule

hnyls2002 opened this pull request 5 months ago

Optimize conflicts between CUDA graph and vocab mask tensors

hnyls2002 opened this pull request 5 months ago

[Bug] 'LlamaTokenizerFast' object has no attribute 'tokenizer'

zwc163 opened this issue 5 months ago

Improve error reporting during server launch

merrymercy opened this pull request 5 months ago

[Fix] Fix --disable-flashinfer

merrymercy opened this pull request 5 months ago

[Feature] Support torch profiler

danielhua23 opened this issue 5 months ago

[Feature] Can centos7 use this project?

luo647 opened this issue 5 months ago

[Bug] requests.exceptions.JSONDecodeError:

eyuansu62 opened this issue 5 months ago

remove assertion in triton attention and add an unit test

ByronHsu opened this pull request 5 months ago

[Feature] Support RM API

UbeCc opened this issue 5 months ago

Rewrite mixed chunked prefill

hnyls2002 opened this pull request 5 months ago

[Bug] too many processes

wellhowtosay opened this issue 5 months ago

Refactor attention backend

merrymercy opened this pull request 5 months ago

Deprecate --disable-flashinfer and introduce --attention-backend

merrymercy opened this pull request 5 months ago

[Minor] move triton attention kernels into a separate folder

merrymercy opened this pull request 5 months ago

Organize flashinfer indices update

hnyls2002 opened this pull request 5 months ago

[Do not merge] Test torchao

jerryzh168 opened this pull request 5 months ago

Fix vocab mask update bug

hnyls2002 opened this pull request 5 months ago

[Minor] improve kill scripts and torchao import

merrymercy opened this pull request 5 months ago

[Feature] 4-bit quantized prefix cache

josephrocca opened this issue 5 months ago

Fix CORS compatibility with OpenAI, vLLM, TGI, LMDeploy

josephrocca opened this pull request 5 months ago

deepseek-v2 torch.compile error

cdj0311 opened this issue 5 months ago

Support MiniCPM3

Achazwl opened this pull request 5 months ago

fix bug of `undefined is_single` in meth `create_abort_task`

wcsjtu opened this pull request 5 months ago

deepseek-v2 enable-mla 4x slower

cdj0311 opened this issue 5 months ago

[Docs] Improve documentations

merrymercy opened this pull request 5 months ago

BaiChuan2 Model

blacker521 opened this pull request 5 months ago

SGLang Discussion WeChat Group

qingkelab opened this issue 5 months ago

[Bug] Unable to see logprobs for prompt/input

dmakhervaks opened this issue 5 months ago

[Bug] Mixed chunked prefill is not compatible with vocab tensor mask

hnyls2002 opened this issue 5 months ago

Support OpenAI API json_schema response format

zifeitong opened this pull request 5 months ago

[Bug] sgLang v0.3 breaks TP8 Llama 3.1 405B FP8 on 8xH100

jischein opened this issue 5 months ago

[CI] Return output logprobs in unit test

Ying1123 opened this pull request 5 months ago

Unify forward mode

hnyls2002 opened this pull request 5 months ago

[Feature] Follow up on non power of 2 triton kernel

ByronHsu opened this issue 5 months ago

[Bug] it seems memory leak in sglang when longtime serving

CSEEduanyu opened this issue 5 months ago

[Minor] Many cleanup

merrymercy opened this pull request 5 months ago

[Feature] support LLaVA-NeXT-Video-32B-Qwen

HarperGG opened this issue 5 months ago

[Feature] smooth quant or other quant method

MichoChan opened this issue 5 months ago

[Feature] support qwen2 vl

zhyncs opened this issue 5 months ago

[Feature] KV Cache Quantization

ghost opened this issue 5 months ago

[Feature] DRY repetition penalty

vnkc1 opened this issue 5 months ago

[Bug] `served_model_name` argument in the server_arg.py is not checked

zhaochenyang20 opened this issue 5 months ago

[Feature] KV Cache Compression

ghost opened this issue 5 months ago

[Feat] Add modalities for vision server when handling pixel values for llava

kcz358 opened this pull request 5 months ago

Fix some online scheduling delay

hnyls2002 opened this pull request 5 months ago

[Bug] it didn't work when using tp on RTX 3090

milktea888 opened this issue 5 months ago

jinja2.exceptions.TemplateError: System role not supported

sdecoder opened this issue 5 months ago