github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

chore: bump v0.0.2.post17 for sgl-kernel

zhyncs opened this pull request about 22 hours ago

mirror fix for custom allreduce

yizhang2077 opened this pull request 1 day ago

update 3rdparty for sgl-kernel

zhyncs opened this pull request 1 day ago

fix: Fix deprecated max_tokens param in openai ChatCompletionRequest

mickqian opened this pull request 1 day ago

support fp32 in sampling_scaling_penalties kernel

BBuf opened this pull request 1 day ago

Add EngineFragment

fzyzcjy opened this pull request 1 day ago

[DO NOT MERGE] Bump CI to check

fzyzcjy opened this pull request 1 day ago

Split communication logic from computation logic into orchestrator

fzyzcjy opened this pull request 1 day ago

Let DetokenizerManager use TypeBasedDispatcher

fzyzcjy opened this pull request 1 day ago

Rename TokenizerManager to StdOrchestrator

fzyzcjy opened this pull request 1 day ago

Extract generation_manager from tokenizer_manager

fzyzcjy opened this pull request 1 day ago

First draft on GSM8K benchmark.

simveit opened this pull request 1 day ago

enable kv_scale for Gemma2

hliuca opened this pull request 1 day ago

[Bug] Service crashed with 4 H100s and QPS=25

yh-yao opened this issue 1 day ago

Set USE_VLLM_CUSTOM_ALLREDUCE to empty string

tot0 opened this pull request 1 day ago

Add step to update sgl-kernel whl index

ispobock opened this pull request 1 day ago

Add workflow for sgl-kernel cu118 release

ispobock opened this pull request 2 days ago

[Bug] Crash special token xgrammar

maximegmd opened this issue 2 days ago

minor: update sgl-kernel setup

zhyncs opened this pull request 2 days ago

[Bug] Qwen2-VL-7B with sglang has significant numerical calculation errors compared to HF Transformers

kritohyh opened this issue 2 days ago

minor: sync flashinfer and add turbomind as 3rdparty

zhyncs opened this pull request 2 days ago

[Bug] constrained decoding performance is worse when qps>2

qibaoyuan opened this issue 2 days ago

Batch inference over multiple nodes

boyang-nlp opened this issue 2 days ago

Question About Model Integration and Parameter Updates (update_weight) in Sglang

davidlvxin opened this issue 2 days ago

[Bug] The batch decoding speed of DeepSeek V3 is too slow.

SonChoulJun opened this issue 2 days ago

[Bug] Multi-node BUG

sitabulaixizawaluduo opened this issue 2 days ago

[Bug] Qwen2-VL Online Serving Issue

ywang96 opened this issue 2 days ago

Fix cu118 group gemm compile issue

ispobock opened this pull request 2 days ago

[Docs] minor update for phi-3 and phi-4

adarshxs opened this pull request 2 days ago

[router] Fix twine uploading

ByronHsu opened this pull request 2 days ago

bump router to 0.1.4

ByronHsu opened this pull request 2 days ago

Add shapes for int8 gemm benchmark

ispobock opened this pull request 2 days ago

[Feature] Support InterVL

zhaochenyang20 opened this issue 2 days ago

create All2All MoE module && place holder for EP group and token disp…

shawlleyw opened this pull request 2 days ago

[Feature] Add support for Phi4

Stealthwriter opened this issue 2 days ago

[Benchmarks] Can't run examples benchmark. Flashinfer error:

dsantiago opened this issue 2 days ago

feat: use sgl-kernel by default

zhyncs opened this pull request 3 days ago

chore: bump sgl-kernel 0.0.2.post16

zhyncs opened this pull request 3 days ago

feat: integrate sampling kernels into sgl-kernel

zhyncs opened this pull request 3 days ago

Add CPU affinity setting to latency benchmark

hubertlu-tw opened this pull request 3 days ago

[hotfix] fix test_sampling_scaling_penalties.py ci test

BBuf opened this pull request 3 days ago

Use flashinfer vec_dtypes in sgl_kernel

BBuf opened this pull request 3 days ago

sync flashinfer and update sgl-kernel tests

zhyncs opened this pull request 3 days ago

use env variable to control the build conf on the CPU build node

zhyncs opened this pull request 3 days ago

update version setup for sgl-kernel

zhyncs opened this pull request 3 days ago

fix build error for sgl-kernel

zhyncs opened this pull request 3 days ago

[Feature] docs: Improve documentation on how to use EAGLE speculative decoding

daviddl9 opened this issue 3 days ago

[Bug] DeepSeek-V3 load weights failed with --enable-ep-moe

MtFitzRoy opened this issue 3 days ago

Remove torch dependency in sgl-kernel

merrymercy opened this pull request 3 days ago

[Feature] Support service discovery on Kubernetes in router

gaocegege opened this issue 3 days ago

Some questions about layernorm in MLA code

hcyz33 opened this issue 3 days ago

use v0.6.4.post1 for sgl-kernel ci

zhyncs opened this pull request 3 days ago

[router] Forward all request headers from router to workers

ByronHsu opened this pull request 3 days ago

docs: update developer guide for sgl-kernel

zhyncs opened this pull request 3 days ago

docs: add developer guide for sgl-kernel

zhyncs opened this pull request 3 days ago

Revert "disable custom allreduce on HIP"

merrymercy opened this pull request 3 days ago

[Feature] Beam Search

laixinn opened this pull request 3 days ago

[Bug] ImportError: undefined symbol: cuModuleGetFunction when using lmsysorg/sglang:v0.4.1.post7-cu124

aooxin opened this issue 3 days ago

Indexing.cu:1255: indexSelectSmallIndex: block: [3,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

CallmeZhangChenchen opened this issue 3 days ago

[router] make error actionable

ByronHsu opened this pull request 3 days ago

Fix tp token sync for dp attention

merrymercy opened this pull request 3 days ago

Support loading of larger models with on-the-fly quantization

kwen2501 opened this pull request 3 days ago

Add some flags to allow sync token ids across TP ranks

merrymercy opened this pull request 4 days ago

[Bug] Problems with logit_bias.

cinjon opened this issue 4 days ago

disable custom allreduce on HIP

hliuca opened this pull request 4 days ago

add notice about flashinfer in sgl-kernel

zhyncs opened this pull request 4 days ago

feat: integrate bmm_fp8 kernel into sgl-kernel

zhyncs opened this pull request 4 days ago

fix rotary_embedding rope_scaling for phi

sudo-root-ns opened this pull request 4 days ago

minor: update header and use pytest

zhyncs opened this pull request 4 days ago

feat: integrate activation kernels into sgl-kernel

zhyncs opened this pull request 4 days ago

feat: integrate norm kernels into sgl-kernel

zhyncs opened this pull request 4 days ago

sync the upstream updates of flashinfer

zhyncs opened this pull request 4 days ago

[Bug] Decode Throughput Inconsistency Between bench_serving and Engine Logs

leepoly opened this issue 4 days ago

[Help wanted] Can't capture GPU activities using `nsight system`

sleepwalker2017 opened this issue 4 days ago

update norm cu

zhyncs opened this pull request 4 days ago

support w8a8 fp8 kernel with CUTLASS

HandH1998 opened this pull request 4 days ago

Fix sgl-kernel compile for sm80

ispobock opened this pull request 4 days ago

Fix the FP8 E4M3 parsing offline scales failure bug

sleepcoo opened this pull request 4 days ago

Modify the kernel test path & add it to the CI process.

sleepcoo opened this pull request 4 days ago

[Feature] Reasoning model API support

lambert0312 opened this issue 4 days ago

[Bug] Qwen2-VL-7B with sglang Performance Degradation

yileld opened this issue 4 days ago

[Feature] batch concurrent requests while streaming responses

moxiegushi opened this issue 4 days ago

Use int64 as indices for set_kv_buffer

merrymercy opened this pull request 4 days ago

[Doc] Update doc of profiling with PyTorch Profiler

Fridge003 opened this pull request 4 days ago

Allow local cutlass directory to be used in sgl-kernel build

trevor-m opened this pull request 4 days ago

fix pr-test-sgl-kernel

zhyncs opened this pull request 5 days ago

Support sm90 Int8 gemm

ispobock opened this pull request 5 days ago

Support int8 kvcache

sleepcoo opened this pull request 5 days ago

feat: add flashinfer as 3rdparty and use rmsnorm as example

zhyncs opened this pull request 5 days ago

[Feature] Support Beam Search

laixinn opened this issue 5 days ago

Can router support --api-key parameter

lambert0312 opened this issue 5 days ago

support lightning_attention_decode in sgl-kernel for MiniMax-Text-01

BBuf opened this pull request 5 days ago

Debug radixcache: refactor recursive helper methods

luzengxiangcn opened this pull request 5 days ago

refactor radix cache

luzengxiangcn opened this pull request 5 days ago

Add accuracy and latency tests of eagle into CI

merrymercy opened this pull request 5 days ago

upgrade torch version for sgl-kernel

zhyncs opened this pull request 5 days ago

minor: update Makefile for sgl-kernel

zhyncs opened this pull request 5 days ago

[Bug] [EAGLE2] CUDA errors occur under high concurrency.

Xu-Chen opened this issue 5 days ago

Minicpmo

mickqian opened this pull request 5 days ago

Fix flaky tests in test_programs.py

merrymercy opened this pull request 5 days ago