github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

Fix packet loss when deploying a small model

sdli1995 opened this pull request 17 days ago

A better aio rwlock that guarantees the order

merrymercy opened this pull request 17 days ago

feat: support 2 kernels for mixed chunked prefill

chosen-ox opened this pull request 18 days ago

Updated documentation for Grammar Backend

shuaills opened this pull request 18 days ago

[Feature] Function Calling

Tushar-ml opened this pull request 18 days ago

[Misc] Fix metrics, weight update lock, request logging

merrymercy opened this pull request 18 days ago

[Feature] (Willing to PR) Avoid KV cache occupying GPU memory when not used

fzyzcjy opened this issue 18 days ago

fix #2528

zhyncs opened this pull request 19 days ago

formatted

yixin-huang1 opened this pull request 19 days ago

[Bug] `Qwen/QwQ-32B-Preview` undefined symbol error

HuanzhiMao opened this issue 19 days ago

[Feature] Why does qwen2-vl not support radix cache?

vchzls opened this issue 19 days ago

[Bug] Eagle2 has an unstable sampling rate during multi concurrency.

coolhok opened this issue 19 days ago

[Bug] Failed to launch engine when working with Ray Serve: signal only works in main thread of the main interpreter

pengye91 opened this issue 19 days ago

Enable Nvidia's ModelOpt fp8 quantized models

Edwardf0t1 opened this pull request 19 days ago

[Cache Offload] Improve radix cache offload benchmark

Edenzzzz opened this pull request 19 days ago

[Cache Offload] Remove device sync overhead

Edenzzzz opened this pull request 20 days ago

[Bug] Transformers doesn't recognize LLaVA variant architectures

amosyou opened this issue 20 days ago

[Feature] Add Docs For Quantization

binhtranmcs opened this issue 20 days ago

[Bug] SGLang v0.4.0 with AMD MI300X

BruceXcluding opened this issue 20 days ago

Add lora_paths to v1_chat_generate_request

ccchow opened this pull request 21 days ago

Add integration with gemlite weight only quant

jerryzh168 opened this pull request 21 days ago

[Bug] install from source cannot start

fansongfs opened this issue 21 days ago

[Feature] Support new parameter - EBNF in xgrammar

adarshxs opened this pull request 21 days ago

chore: bump v0.4.0.post2

zhyncs opened this pull request 21 days ago

fix followup #2517

zhyncs opened this pull request 21 days ago

docs: update sponsorship (DataCrunch)

zhyncs opened this pull request 21 days ago

Update pyproject.toml: add dependency "ninja"

adarshxs opened this pull request 21 days ago

fix: package data missing

yudian0504 opened this pull request 21 days ago

sglang for Qwen2.5-14b deploy Error

wuxianyess opened this issue 21 days ago

Is video inference supported?

wuxianyess opened this issue 21 days ago

[Bug] using xgrammar with json schema, performance is worse than no xgrammar and json schema

fansongfs opened this issue 21 days ago

fix: continue to use flashinfer 0.1.6 temporarily

zhyncs opened this pull request 21 days ago

docs: update README

zhyncs opened this pull request 21 days ago

feat: add llama3 eval

zhyncs opened this pull request 21 days ago

[Bug] RuntimeError: Ninja is required to load c++ extensions

Flynn-Zh opened this issue 21 days ago

Add generator-style run_batch function

xingyaoww opened this pull request 22 days ago

adapt custom allreduce for tensorrt llm

yizhang2077 opened this pull request 22 days ago

[Feature] Support for Evicting Specific KV Cache to Save GPU Memory

ChenlongDeng opened this issue 22 days ago

[kernel optimize] benchmark write_req_to_token_pool_triton and optimize kernel

BBuf opened this pull request 22 days ago

[Feature] do sample = False?

boqiny opened this issue 22 days ago

[Feature] Faster torch.compile

MichoChan opened this issue 22 days ago

[Feature] Integrate SGLang into OpenRLHF

zhaochenyang20 opened this issue 22 days ago

[Feature] Add Tutorial for Constrained Decoding

zhaochenyang20 opened this issue 22 days ago

[Feature] Add Math in our CI

zhaochenyang20 opened this issue 22 days ago

Print progress bar during cuda graph capture

merrymercy opened this pull request 23 days ago

fix: add ninja as dependency for flashinfer v0.2

zhyncs opened this pull request 23 days ago

Update readme

merrymercy opened this pull request 23 days ago

Fix openai protocols and pass top_k, min_p

merrymercy opened this pull request 23 days ago

torchao gemlite integration

HDCharles opened this pull request 23 days ago

[Bug] asyncio.exceptions.InvalidStateError: invalid state when concurrently requesting /get_server_info

Lzhang-hub opened this issue 23 days ago

improve performance by removing use_tensor_core dependency

bjmsong opened this pull request 23 days ago

Small fix for the order of apply_torchao_config

merrymercy opened this pull request 23 days ago

Add a benchmark script for in-batch prefix caching

merrymercy opened this pull request 23 days ago

Revert "Small fixes for torchao quant"

merrymercy opened this pull request 23 days ago

Temporarily disable unit test of torch native attention backend

merrymercy opened this pull request 23 days ago

Simplify pytorch sampling kernel and logit processor

merrymercy opened this pull request 23 days ago

minor: update flashinfer nightly

zhyncs opened this pull request 24 days ago

fix moe-ep accuracy issue for fp8

xiaobochen123 opened this pull request 24 days ago

[Feature] Benchmarking Performance on General Devices

zhaochenyang20 opened this issue 24 days ago

fix typo

zhyncs opened this pull request 25 days ago

[Benchmark] add a benchmark for hf/vllm/sglang rmsnorm

BBuf opened this pull request 25 days ago

hotfix: checking for HIP

zhyncs opened this pull request 26 days ago

Remove cuda graph batch size adjustment for dp attention

ispobock opened this pull request 26 days ago

format: add clang-format for sgl-kernel

zhyncs opened this pull request 26 days ago

[Bug] Accuracy is abnormal when EP MoE is enabled

ispobock opened this issue 26 days ago

sgl-kernel adapt tensorrt llm custom allreduce

yizhang2077 opened this pull request 26 days ago

Fix correctness issue for triton decoding kernel

ispobock opened this pull request 26 days ago

[Experimental] Add a gRPC server for completion request

MrAta opened this pull request 26 days ago

How to debug sglang using pdb?

sleepwalker2017 opened this issue 27 days ago

Small fixes for torchao quant

jerryzh168 opened this pull request 27 days ago

[FIX] Update EOS from config

zhengy001 opened this pull request 27 days ago

[Feature] request smoothquant (int8, W8A8) quantization on 40G A100

Hao-YunDeng opened this issue 27 days ago

[Minor] Fix grok model loader

merrymercy opened this pull request 27 days ago

[Feature] Integrate CUTLASS FP8 GEMM into sgl-kernel

zhyncs opened this issue 28 days ago

[Feature] FusedMoE H200 tuning

zhyncs opened this issue 28 days ago

[Bug] Different behavior benchmarking w/ request-rate-range vs. separate request-rates

Mutinifni opened this issue 28 days ago

feat: support dev image

zhyncs opened this pull request 28 days ago

"GET / HTTP/1.1" 404 Not Found

LordEdison opened this issue 28 days ago

benchmark decoding attention kernel with cudnn

bjmsong opened this pull request 28 days ago

fix: set runtime path

zhyncs opened this pull request 28 days ago

[Bug] potential correctness with triton-attention-num-kv-splits > 1

HaiShaw opened this issue 28 days ago

Rename rust folder to sgl-router

MrAta opened this pull request 28 days ago

minor: update pypi tag

zhyncs opened this pull request 28 days ago

chore: bump v0.0.2 for sgl-kernel

zhyncs opened this pull request 28 days ago

[Feature] Do we have any plan for supporting MiniCPM-V 2.6?

Xeladoes opened this issue 28 days ago

[Bug] CUDA Graph Build Failure

dangxingyu opened this issue 28 days ago

Bump sglang-router to 0.1.1

MrAta opened this pull request 28 days ago

[Feature] MoE Expert Parallel with awq

Xu-Chen opened this issue 28 days ago

Clean up GPU memory after killing sglang processes

MrAta opened this pull request 28 days ago

Include version info into the router package

MrAta opened this pull request 28 days ago

[router] Release router 0.1.0 with dynamic scaling and fault tolerance

ByronHsu opened this pull request 29 days ago

[router] Update doc for dynamic scaling and fault tolerance

ByronHsu opened this pull request 29 days ago

[router] remove main.rs because only lib.rs is used for py binding

ByronHsu opened this pull request 29 days ago

[router] Add retries based fault tolerance

ByronHsu opened this pull request 29 days ago

[Bug] Gemma 2 GGUF

slivka83 opened this issue 29 days ago

[Feature]: Benchmarking H200

antferdom opened this issue 29 days ago

Fix warmup in bench_offline_throughput.py

merrymercy opened this pull request 29 days ago

Fix model loader for more quantization formats

merrymercy opened this pull request 29 days ago

chore: update ao v0.7.0

zhyncs opened this pull request 29 days ago

It's hard to install it

ToSev7en opened this issue 29 days ago