github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

Dispatch flashinfer wrappers

hnyls2002 opened this pull request 4 months ago

[Refactor] Simplify io_struct and tokenizer_manager

Ying1123 opened this pull request 4 months ago

Fix bugs of `logprobs_nums`

hnyls2002 opened this pull request 4 months ago

Organize Attention Backends

hnyls2002 opened this pull request 4 months ago

Support qwen2 vl model

yizhang2077 opened this pull request 4 months ago

[Fix, LoRA] fix LoRA with updates in main

Ying1123 opened this pull request 4 months ago

Clean up batch data structures: Introducing ModelWorkerBatch

merrymercy opened this pull request 4 months ago

Rename InputMetadata -> ForwardBatch

merrymercy opened this pull request 4 months ago

Add support for Molmo-D-7B Model

BabyChouSr opened this pull request 4 months ago

Let ModelRunner take InputMetadata as input, instead of ScheduleBatch

merrymercy opened this pull request 4 months ago

[Refactor] Simplify io_struct and tokenizer_manager

Ying1123 opened this pull request 4 months ago

Process image in parallel

hnyls2002 opened this pull request 4 months ago

Move scheduler code from tp_worker.py to scheduler.py

merrymercy opened this pull request 4 months ago

fix ipv6 url when warm up model

cauyxy opened this pull request 4 months ago

[Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim'

mssongit opened this pull request 4 months ago

[Fix] Fix AttributeError in Qwen2.5(huggingface model) LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_module_name'

mssongit opened this pull request 4 months ago

Improve process creation

merrymercy opened this pull request 4 months ago

[Bug] ValueError: The memory capacity is unbalanced

chuangzhidan opened this issue 4 months ago

Make detokenizer_manager.py not asyncio

merrymercy opened this pull request 4 months ago

Organize image inputs

hnyls2002 opened this pull request 4 months ago

Multiple minor fixes

merrymercy opened this pull request 4 months ago

[Event] Update meeting link

Ying1123 opened this pull request 4 months ago

Add float8 dynamic quant to torchao_utils

jerryzh168 opened this pull request 4 months ago

[Feature] VLLM 6.0 support

arunpatala opened this issue 4 months ago

[Bug] IndexError: list index out of range

lvxianfeng-git opened this issue 4 months ago

[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B

Ying1123 opened this pull request 4 months ago

minor: fix config

hnyls2002 opened this pull request 4 months ago

[Feature] add support for llama 3.2

Stealthwriter opened this issue 4 months ago

[Bug] Unable to use gptq or awq with torch.compile (8*A40)

smallstepman opened this issue 4 months ago

[FIX] Catch syntax error of Regex Guide to avoid crash

du00cs opened this pull request 4 months ago

[bugfix] Add modelscope package to avoid docker image without modelscope

KylinMountain opened this pull request 4 months ago

Accuracy reduction of Lora

yileld opened this issue 4 months ago

Update Dockerfile

KylinMountain opened this pull request 4 months ago

[Bug] no module modelscope using docker compose to start sglang

KylinMountain opened this issue 4 months ago

How to study the code?

TJ949 opened this issue 4 months ago

[Feature] _get_pixel_values needs to return tgt_sizes

huangzl18883 opened this issue 4 months ago

[Fix] Ignore model import error

merrymercy opened this pull request 4 months ago

Release v0.3.2

Ying1123 opened this pull request 4 months ago

Revert "kernel: use tensor cores for flashinfer gqa kernels"

Ying1123 opened this pull request 4 months ago

[Fix] Fix clean_up_tokenization_spaces in tokenizer

merrymercy opened this pull request 4 months ago

[Bug] tensor parallel run error

jerryzh168 opened this issue 4 months ago

Add support for tie_word_embeddings when loading weights + support for SmolLM

TianyiQ opened this pull request 4 months ago

[CI] Update nightly eval

Ying1123 opened this pull request 4 months ago

[Bug] LLaVa-next does not work for single image processing

ThomasBenzshawel opened this issue 4 months ago

AWQ performance tracking

zhyncs opened this issue 4 months ago

Possible timing side-channels caused by shared prefix

Unik-lif opened this issue 4 months ago

Simplify bench_latency.py

merrymercy opened this pull request 4 months ago

Update test_srt_backend.py

merrymercy opened this pull request 4 months ago

[Bug] radixcache stack_overflow

luzengxiangcn opened this issue 4 months ago

[CI] Move AMD test to a separate file

merrymercy opened this pull request 4 months ago

debug radixcache stack_overflow

luzengxiangcn opened this pull request 4 months ago

Speculative decoding with EAGLE2

yukavio opened this pull request 4 months ago

MoE torch compile

ispobock opened this pull request 4 months ago

Fix the overhead due to penalizer in bench_latency

merrymercy opened this pull request 4 months ago

Fix RuntimeEndpoint.select method

jeffrey-fong opened this pull request 4 months ago

minor: add mla fp8 test

zhyncs opened this pull request 4 months ago

[Community] Add open collective sponsor link to README

Ying1123 opened this pull request 4 months ago

Update dockerfile to include datamodel_code_generator

merrymercy opened this pull request 4 months ago

Add AMD tests to CI

Ying1123 opened this pull request 4 months ago

[API, Feature] Support response prefill for openai API

Ying1123 opened this pull request 4 months ago

Add a unit test for data parallelism

merrymercy opened this pull request 4 months ago

Better unit tests for adding a new model

merrymercy opened this pull request 4 months ago

Development Roadmap (2024 Q4)

Ying1123 opened this issue 4 months ago

doc: update backend

zhyncs opened this pull request 4 months ago

[Bug] tp-4 start timeout

siddhatiwari opened this issue 4 months ago

Add MLA gsm8k eval

ispobock opened this pull request 4 months ago

chore: bump v0.3.1.post3

zhyncs opened this pull request 4 months ago

Fix triton head num

ispobock opened this pull request 4 months ago

fix incorrect links in documentation

rchen19 opened this pull request 4 months ago

[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch

liangan1 opened this pull request 4 months ago

[Bug] Deepseek-V2.5 capture cuda graph failed

halexan opened this issue 4 months ago

[Bug] SGLang cannot reach the preset concurrency level.

rangehow opened this issue 4 months ago

Add OLMoE

Muennighoff opened this pull request 4 months ago

minor: add quant eval compared with base

zhyncs opened this pull request 4 months ago

[Bug] The engine hangs after requesting health_generate 190 times.

unix1986 opened this issue 4 months ago

Fix env vars in bench_latency

merrymercy opened this pull request 4 months ago

[Performance] Add triton kernels for LoRA

Ying1123 opened this pull request 4 months ago

Release v0.3.1.post2

merrymercy opened this pull request 4 months ago

Fix padding in the cuda graph

merrymercy opened this pull request 4 months ago

[Bug] illegal memory access encountered

wonderisland opened this issue 4 months ago

[Bug] enable-mixed-chunk may cause the regex request get wrong result and output_token_logprobs

liuteng opened this issue 4 months ago

Debug schedule optimization

hnyls2002 opened this pull request 4 months ago

fix: create new dict every time for putting new frame

Luodian opened this pull request 4 months ago

[Bug] OOM, torch.OutOfMemoryError: seems to only use one GPU on A800-80G, 40 GB available on each card

chuangzhidan opened this issue 4 months ago

[WIP] Prometheus Metrics

binarycrayon opened this pull request 4 months ago

[Question] Why is the default value of max_prefill_tokens 16384?

wjj19950828 opened this issue 4 months ago

Support double sparsity

andy-yang-1 opened this pull request 4 months ago

[Event] Add public meeting invite to README

Ying1123 opened this pull request 4 months ago

Fuse top_k and top_k in the sampler

merrymercy opened this pull request 4 months ago

Pr fix max workers

wellhowtosay opened this pull request 4 months ago

[Bug] OOM when running `bench_serving` with DeepSeekCoder-V2-Lite.

zh-zheng opened this issue 4 months ago

Fix oom issues with fp8 for llama

merrymercy opened this pull request 4 months ago

[Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419)

HaiShaw opened this pull request 4 months ago

Add bench_server_latency.py

merrymercy opened this pull request 4 months ago

Fix schedule bug

hnyls2002 opened this pull request 4 months ago

fix schedule bug

hnyls2002 opened this pull request 4 months ago

Fixed n>1 causing list index out of range with VLM

jasonyux opened this pull request 4 months ago

Fix attention backend

ispobock opened this pull request 4 months ago

Enable MLA by default

ispobock opened this pull request 4 months ago

[Bug] Performance issue on MoE with torch.compile

ispobock opened this issue 4 months ago