github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

[CI] Split test cases in CI for better load balancing

merrymercy opened this pull request 28 days ago

feat: add should_use_tensor_core

zhyncs opened this pull request 28 days ago

[Feature] Get the real logprobs to analyze decoding

Snowdar opened this issue 28 days ago

[Bug] frequency penalty

vivian0429 opened this issue 28 days ago

Update XGrammar to the latest API

Ubospica opened this pull request 28 days ago

[Fix] Avoid calling fill_vocab_mask for terminated requests

Ubospica opened this pull request 28 days ago

feat: fused_moe fp8 monkey patch

zhyncs opened this pull request 28 days ago

[feat] Refactor session control interface and add CI

Ying1123 opened this pull request 28 days ago

Question about ragged wrapper

ZhongYingMatrix opened this issue 28 days ago

[Performance]: Process affinity to CPU cores with multiple sockets support

HaiShaw opened this pull request 28 days ago

Replace prob based with threshold based load balancing

ByronHsu opened this pull request 28 days ago

Allow overwrite flashinfer use_tensorcore

merrymercy opened this pull request 28 days ago

[Feature] How to accelerate constrained decoding when regex needs to change with input?

GrittyChen opened this issue 28 days ago

[Fused moe] add tuning fused configs for qwen2 57b and mixtral 8x7b

BBuf opened this pull request 28 days ago

[Bug] cannot import name 'CachedGrammarCompiler' from 'xgrammar' (version 0.3.6)

Quang-elec44 opened this issue 28 days ago

test select concurrency

qeternity opened this pull request 29 days ago

Fix docs

merrymercy opened this pull request 29 days ago

Rename triton_fused_moe -> fused_moe_triton

merrymercy opened this pull request 29 days ago

Balance CI tests

merrymercy opened this pull request 29 days ago

fix: use torch.sum for compatibility

zhyncs opened this pull request 29 days ago

[Bug] FusedMoE compatible with vllm 0.6.3.post1

zhyncs opened this issue 29 days ago

Update CI threshold & Improve code style

merrymercy opened this pull request 29 days ago

Fix mixed chunked prefill in overlap mode

merrymercy opened this pull request 29 days ago

fix: resolve end-of-file-fixer

zhyncs opened this pull request 29 days ago

feat: update other MoE models deps

zhyncs opened this pull request 29 days ago

feat: update gitignore and add tuning config for FusedMoE

zhyncs opened this pull request 29 days ago

Simplify `Scheduler.update_running_batch`

merrymercy opened this pull request 29 days ago

feat: remove the dependency on FusedMoE

zhyncs opened this pull request 29 days ago

Merged three native APIs into one: get_server_info

henryhmko opened this pull request 29 days ago

[Bug] llava uses image hash as token, leading to cache bug

zwc163 opened this issue 29 days ago

Speculative EAGLE2

yukavio opened this pull request 29 days ago

Byhsu/fairness router

ByronHsu opened this pull request 29 days ago

Improve sglang router

ByronHsu opened this pull request 29 days ago

add prefix match for certain tenant

ByronHsu opened this pull request 29 days ago

Add more api routes (completion, health, etc) to the router

ByronHsu opened this pull request 29 days ago

[Draft] Resolving integration differences after XGrammar launch refactoring

gittb opened this pull request 29 days ago

fix dp_rank env

ByronHsu opened this pull request 29 days ago

update router doc

ByronHsu opened this pull request 29 days ago

Bump sglang-router to 0.0.5

ByronHsu opened this pull request 30 days ago

[Bug] Error when using LLAVA 1.5 for llava bench

pspdada opened this issue 30 days ago

fix: resolve bench_serving args

zhyncs opened this pull request 30 days ago

Fix dp print message

merrymercy opened this pull request 30 days ago

[CI] Fix test cases

merrymercy opened this pull request 30 days ago

Add concurrency option for benchmark

cermeng opened this pull request 30 days ago

Add concurrency option in benchmark

cermeng opened this pull request 30 days ago

Fix grid size in Triton decoding kernel

ispobock opened this pull request 30 days ago

[Bug] Error when launching llava1.5

pspdada opened this issue 30 days ago

deps(flashinfer): fix `is_flashinfer_available()` and make `flashinfer` optional dependency

XuehaiPan opened this pull request about 1 month ago

[Feature] Support LLaMA-3.2 finetuned with Sentence Transformers !

thusinh1969 opened this issue about 1 month ago

Revert "Only stream output on tp rank 0"

merrymercy opened this pull request about 1 month ago

EAGLE2: general part [2]

yukavio opened this pull request about 1 month ago

EAGLE2: Eagle related part [1]

yukavio opened this pull request about 1 month ago

feat(pre-commit): trim unnecessary notebook metadata from git history

XuehaiPan opened this pull request about 1 month ago

fix: add xgrammar dependency

zhyncs opened this pull request about 1 month ago

minor: update gsm8k threshold

zhyncs opened this pull request about 1 month ago

Only stream output on tp rank 0

merrymercy opened this pull request about 1 month ago

add profile in offline benchmark & update doc

bjmsong opened this pull request about 1 month ago

[minor] Clean up unused imports

merrymercy opened this pull request about 1 month ago

Add initial support for Intel Gaudi accelerators

ankurneog opened this pull request about 1 month ago

chore: bump v0.3.6

zhyncs opened this pull request about 1 month ago

Online weight update [WIP]

zhaochenyang20 opened this pull request about 1 month ago

Rename sglang.bench_latency to sglang.bench_one_batch

merrymercy opened this pull request about 1 month ago

[Bug] Unable to load GPTQ Mixtral 8x7 v0.1 with SGLang

DhruvaBansal00 opened this issue about 1 month ago

Turn off autotune for scaled mm for fp8 dynamic quant in torchao

jerryzh168 opened this pull request about 1 month ago

[router] add base_gpu_id server args & merged radix tree python reference

ByronHsu opened this pull request about 1 month ago

[router] cache-aware load-balancing router v1

ByronHsu opened this pull request about 1 month ago

[Feature] Inference example code for Qwen2-VL

YuanLiuuuuuu opened this issue about 1 month ago

[Bug] Qwen2-VL-7B with sglang Performance Degradation on MME benchmark

Mr-Loevan opened this issue about 1 month ago

ROCm: Fix MoE padding for non-FP8 cases

HaiShaw opened this pull request about 1 month ago

Benchmark with PyTorch Profiler easily

bjmsong opened this pull request about 1 month ago

[Feature] Support for rerank models

dinhanhx opened this issue about 1 month ago

[Feature] Is Yarn supported in sglang?

klykq111 opened this issue about 1 month ago

Error out when torchao-config option is not recognized

jerryzh168 opened this pull request about 1 month ago

Fix #2037 - Context length check does not take into account pad tokens for visual models

jakep-allenai opened this pull request about 1 month ago

Enable overlap scheduler by default for the triton attention backend

merrymercy opened this pull request about 1 month ago

Move test_session_id.py to playground

merrymercy opened this pull request about 1 month ago

Allow skipping warmup in bench_offline_throughput.py

merrymercy opened this pull request about 1 month ago

[Bug] RuntimeError: Failed to allocate memory for batch_prefill_tmp_v with size 435814400 and alignment 16 in AlignedAllocator

yuki252111 opened this issue about 1 month ago

feat: use cascade attention kernel (single level)

james-p-xu opened this pull request about 1 month ago

Update nightly-eval.yml

merrymercy opened this pull request about 1 month ago

[Bug] cannot load Gemma2 awq

Foreist opened this issue about 1 month ago

[Bug] big TPOT and ITL when running the offline benchmark

TraceIvan opened this issue about 1 month ago

Use native fp8 format on MI300X

HaiShaw opened this pull request about 1 month ago

minor: add dataset dump and questions shuffle

zhyncs opened this pull request about 1 month ago

Expose max total num tokens from Runtime & Engine API

henryhmko opened this pull request about 1 month ago

minor: update gsm8k eval

zhyncs opened this pull request about 1 month ago

[Bug] disk cache io error when simultaneously loading lots of sglang offline engine

LeeSureman opened this issue about 1 month ago

Use cuda event wait and synchronization instead of busy waiting

merrymercy opened this pull request about 1 month ago

Fix: incorrect top_logprobs in chat completion

ajwaitz opened this pull request about 1 month ago

[Feature, Performance] kv cache performance improvement

HaiShaw opened this issue about 1 month ago

Simplify logits penalizer

merrymercy opened this pull request about 1 month ago

Allow passing extra request body to bench_offline_throughput.py

merrymercy opened this pull request about 1 month ago

[Bug] Qwen-2.5-Math-7B-Instruct and Llama-3.1-8B-Instruct Produce Nonsensical Results

Broyojo opened this issue about 1 month ago

Fix chunked prefill with output logprob

merrymercy opened this pull request about 1 month ago

feat(srt): support prefill and generate with `input_embeds`

XuehaiPan opened this pull request about 1 month ago

Add simple CPU offloading support.

janimo opened this pull request about 1 month ago

[Feature] TorchAO support for Qwen 32B

grahama1970 opened this issue about 1 month ago

Rename layer_idx to layer_id for consistency

janimo opened this pull request about 1 month ago

docs: fix module docstrings and copyright headers

XuehaiPan opened this pull request about 1 month ago

[Performance] why so many bubbles between steps when running llava-one-vision?

sleepwalker2017 opened this issue about 1 month ago