github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

[rust] refactor server and router

ByronHsu opened this pull request 2 months ago

[Bug] Make multi-lora serving compatible with cuda graph and radix cache

LIUKAI0815 opened this issue 2 months ago

Change judge to classify & Modify make file

zhaochenyang20 opened this pull request 2 months ago

Add get latest commit

Ying1123 opened this pull request 2 months ago

[Feature] add model LlavaOnevisionForConditionalGeneration

zhangucan opened this issue 2 months ago

[Release, ROCm] release ROCm docker build for AMD MI GPUs

HaiShaw opened this pull request 2 months ago

Update CODEOWNERS

ByronHsu opened this pull request 2 months ago

[Docs, ROCm] update install to cover ROCm with MI GPUs

HaiShaw opened this pull request 2 months ago

Scheduler methods

josephydu opened this pull request 2 months ago

[Feature]Support Qwen2_5...etc tools calling by OpenAI API

CedricHwong opened this issue 2 months ago

[Feature] sglang.rocm support flashinfer?

linqingxu opened this issue 2 months ago

Add Reward API Docs etc

zhaochenyang20 opened this pull request 2 months ago

Fix regex docs

merrymercy opened this pull request 2 months ago

Release v0.3.5

merrymercy opened this pull request 2 months ago

Let reward model take text inputs instead of message lists

merrymercy opened this pull request 2 months ago

feat: support truss endpoint for benchmark serving

zhyncs opened this pull request 2 months ago

Unify the model type checking

merrymercy opened this pull request 2 months ago

Simplify tokenizer manager

merrymercy opened this pull request 2 months ago

Allow passing dtype and max_new_tokens to HF reference script

janimo opened this pull request 2 months ago

Escape backwards slash

inakineitor opened this pull request 2 months ago

Fix issue with `stop_token_ids` not being iterable.

inakineitor opened this pull request 2 months ago

Expose max_total_num_tokens for Token Limit Calculation in Request Handling

hahmad2008 opened this issue 2 months ago

Simplify tokenizer manager

merrymercy opened this pull request 2 months ago

Difference between TokenizerManager and Runtime class

NrKhader opened this issue 2 months ago

Do not use longest prefix matching when #queue-req is large

merrymercy opened this pull request 2 months ago

QWen VL Follow-up Fixes

merrymercy opened this issue 2 months ago

turn off log for the offline engine

zhaochenyang20 opened this pull request 2 months ago

Add engine api

zhaochenyang20 opened this pull request 2 months ago

[router] Impl radix tree and set up CI

ByronHsu opened this pull request 2 months ago

Fix ci and link error

zhaochenyang20 opened this pull request 2 months ago

Add Rust Router Python Binding

austin362667 opened this pull request 2 months ago

Fix docs

merrymercy opened this pull request 2 months ago

Fix docs

merrymercy opened this pull request 2 months ago

Fix docs ci

zhaochenyang20 opened this pull request 2 months ago

Fix docs

zhaochenyang20 opened this pull request 2 months ago

Native api

zhaochenyang20 opened this pull request 2 months ago

Update index.rst to improve the order of docs

merrymercy opened this pull request 2 months ago

Add requests with curl

zhaochenyang20 opened this pull request 2 months ago

add native api docs

zhaochenyang20 opened this pull request 2 months ago

Fix doc links

merrymercy opened this pull request 2 months ago

Update docs and workflow

merrymercy opened this pull request 2 months ago

Native api documents

zhaochenyang20 opened this pull request 2 months ago

Update docs title

merrymercy opened this pull request 2 months ago

Fix links in the docs

merrymercy opened this pull request 2 months ago

Add a FAQ documentation

merrymercy opened this pull request 2 months ago

Add Tensor Parallel to torch_native_llama

kwen2501 opened this pull request 2 months ago

Improve docs and fix the broken links

merrymercy opened this pull request 2 months ago

Benchmark torchao and torch.compile (need torch 2.5)

jerryzh168 opened this issue 2 months ago

Fix incorrect context length for llama3.2-11b

rchen19 opened this pull request 2 months ago

[Bug] Offline engine performance is not better than local server when running batch

jischein opened this issue 2 months ago

[3rdparty, document] Updated Documentation that covers performance tuning techniques for AMD Instinct GPUs.

yichiche opened this pull request 2 months ago

Question: Does sglang support prefix cache for multimodal models?

htrekker opened this issue 2 months ago

Unable to Load Gemma2 Model with SGLANG

hahmad2008 opened this issue 2 months ago

Update spec infer

yukavio opened this pull request 2 months ago

minor: update nightly eval

zhyncs opened this pull request 2 months ago

Add vlm document

zhaochenyang20 opened this pull request 2 months ago

[Feature] Create a benchmark script for offline inference

ByronHsu opened this issue 2 months ago

Add vlm tutorial

zhaochenyang20 opened this pull request 2 months ago

[Bug] Exception output when Cuda Graph is enabled for Qwen2.5-Coder

TechxGenus opened this issue 2 months ago

Update vocab embedding deps and add TP switch

ispobock opened this pull request 2 months ago

[Build, ROCm] Dockerfile.rocm for Instinct GPUs, with package updates

HaiShaw opened this pull request 2 months ago

Fix retraction + overlap

hnyls2002 opened this pull request 2 months ago

change file tree

zhaochenyang20 opened this pull request 2 months ago

Fix memory leak for chunked prefill 2

merrymercy opened this pull request 2 months ago

TP8 scheduling overhead is very high for small model, Llama 3 8B on AMD

hliuca opened this issue 2 months ago

Update vocab embedding deps and add TP switch

ispobock opened this pull request 2 months ago

delete unused character

geeker-smallwhite opened this pull request 2 months ago

delete unused characters

geeker-smallwhite opened this pull request 2 months ago

support prometheus metrics

Lzhang-hub opened this pull request 2 months ago

Fix warnings in doc build

merrymercy opened this pull request 2 months ago

Simplify documentation

merrymercy opened this pull request 2 months ago

Fix mixed chunked prefill

merrymercy opened this pull request 2 months ago

chore: update torch v2.5.1

zhyncs opened this pull request 2 months ago

Make decode log interval configurable

ByronHsu opened this pull request 2 months ago

Refactor tokenizer manager

ByronHsu opened this pull request 2 months ago

[Performance, Triton Kernel Args] _decode_grouped_softmax_reducev_fwd…

HaiShaw opened this pull request 2 months ago

Byhsu/router 1

ByronHsu opened this pull request 2 months ago

Byhsu/router 1

ByronHsu opened this pull request 2 months ago

Fix suggest edit

zhaochenyang20 opened this pull request 2 months ago

[Bug] sglang template import issue

multimodalpragmatic opened this issue 2 months ago

Update README.md

merrymercy opened this pull request 2 months ago

Update docs

merrymercy opened this pull request 2 months ago

[Production] Drain requests before exit when receive SIGTERM

Ying1123 opened this pull request 2 months ago

Fix memory leak caused by chunked prefill

hnyls2002 opened this pull request 2 months ago

[Performance, Hardware] MoE weights padding to AMD MI300x GPUs

HaiShaw opened this pull request 2 months ago

Gpt2

DanielC12321 opened this pull request 2 months ago

[Bug] stop_str of qwen2-vl template should be a tuple not a str

wellhowtosay opened this issue 2 months ago

Debug studio

zolinthecow opened this pull request 2 months ago

fix get_memory_pool_size deadlock for DP

ByronHsu opened this pull request 2 months ago

Remove delay after cancelled streaming requests are aborted

matthew-hippocratic opened this pull request 2 months ago

Questions Regarding sglang vs vllm and Memory Management

hahmad2008 opened this issue 2 months ago

Improve openai api documents

zhaochenyang20 opened this pull request 2 months ago

[Feature] Support QLoRA weights

zzh-www opened this issue 2 months ago

Fix update_weights deadlock for DP

ByronHsu opened this pull request 2 months ago

Support setting `use_thread` in the `run_program` for easier debugging.

liuyanyi opened this pull request 2 months ago

[3rdparty, document] Add 3rdparty/amd, with profiling and tuning instructions to be added

HaiShaw opened this pull request 2 months ago

Fix docs deploy ci

zhaochenyang20 opened this pull request 2 months ago

support token ids in `engine.generate`

ByronHsu opened this pull request 2 months ago

Fix Triton decode kernel & ut

ispobock opened this pull request 2 months ago

Granite and GraniteMoE models.

janimo opened this pull request 3 months ago