github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

Fix unit tests

merrymercy opened this pull request 3 months ago

Add a watch dog thread

merrymercy opened this pull request 3 months ago

Offline serving final

hnyls2002 opened this pull request 3 months ago

[Bug] Parameter Update API update_weights Fails in DP=2, TP=1 Configuration

rbao2018 opened this issue 3 months ago

Update hyperparameter_tuning.md

merrymercy opened this pull request 3 months ago

profile of M sizes for Torch native and TE (ignore)

Zhuohao-Li opened this pull request 3 months ago

Improve the user control of new_token_ratio

merrymercy opened this pull request 3 months ago

Add openAI compatible API

zhaochenyang20 opened this pull request 3 months ago

Provide an argument to set the maximum batch size for cuda graph

merrymercy opened this pull request 3 months ago

Fix docs ci

zhaochenyang20 opened this pull request 3 months ago

Simplify our docs with complicated functions into utils

zhaochenyang20 opened this pull request 3 months ago

detach two CI for documentation

zhaochenyang20 opened this pull request 3 months ago

Update links

merrymercy opened this pull request 3 months ago

Update ci workflows

merrymercy opened this pull request 3 months ago

fix int conversion for `SGLANG_CPU_COUNT`

ByronHsu opened this pull request 3 months ago

Allow consecutive ports when launching multiple sglang servers.

hnyls2002 opened this pull request 3 months ago

Set `ZMQ` buffer size heuristic

hnyls2002 opened this pull request 3 months ago

Fix possible ZMQ hanging

hnyls2002 opened this pull request 3 months ago

move max_position_embeddings to the last

hliuca opened this pull request 3 months ago

[Fix] Fix --skip-tokenizer-init

merrymercy opened this pull request 3 months ago

Revert "Fix memory leak when doing chunked prefill"

merrymercy opened this pull request 3 months ago

Release v0.3.4.post2

merrymercy opened this pull request 3 months ago

Fix logprob in the overlapped mode

merrymercy opened this pull request 3 months ago

[Fix] Fix the log parsing in chunked prefill unit tests

merrymercy opened this pull request 3 months ago

Fix log parsing in the chunked prefill unit tests

merrymercy opened this pull request 3 months ago

[Bug] Got error with awq_marlin quantization args.

liangzelang opened this issue 3 months ago

[Bug] Parameter max_workers expects an int, but a string value from os.environ.get("SGLANG_CPU_COUNT") is provided

wellhowtosay opened this issue 3 months ago

[router] rust-based router

ByronHsu opened this pull request 3 months ago

Fix seq_lens_sum for cuda graph runner in padded cases

merrymercy opened this pull request 3 months ago

[Bug] cutlass group_gemm.initialize failed

senlice opened this issue 3 months ago

Fix memory leak when doing chunked prefill

hnyls2002 opened this pull request 3 months ago

add support for ipynb

zhaochenyang20 opened this pull request 3 months ago

Enhance the test case for chunked prefill and check memory leak

merrymercy opened this pull request 3 months ago

Create deploy-docs.yml

zhaochenyang20 opened this pull request 3 months ago

Re-introduce `get_cuda_graph_seq_len_fill_value`

merrymercy opened this pull request 3 months ago

[Fix] Fix cuda graph padding for triton attention backend

merrymercy opened this pull request 3 months ago

Shortfin Backend

stbaione opened this pull request 3 months ago

Qwen2vl support cuda graph and disable radix cache

yizhang2077 opened this pull request 3 months ago

[Fix] Fix NaN issues by fixing the cuda graph padding values for flashinfer

merrymercy opened this pull request 3 months ago

check user-specified model_max_len with hf derived max_model_len

BBuf opened this pull request 3 months ago

[Bug] Catch any errors caused by parsing json schema

zolinthecow opened this pull request 3 months ago

Fix MockTokenizer in the unit tests

merrymercy opened this pull request 3 months ago

Fix the perf regression due to additional_stop_token_ids

merrymercy opened this pull request 3 months ago

Crash the server on warnings in CI

merrymercy opened this pull request 3 months ago

Fix out of memory message.

hnyls2002 opened this pull request 3 months ago

Fix missing additional_stop_token_ids

merrymercy opened this pull request 3 months ago

Update docs

merrymercy opened this pull request 3 months ago

[Fix] Fix abort in data parallelism

merrymercy opened this pull request 3 months ago

Fix stop condition for <|eom_id|>

merrymercy opened this pull request 3 months ago

Fix perf regression for set_kv_buffer

merrymercy opened this pull request 3 months ago

Detected errors during sampling! NaN in the probability with Qwen2.5-7b-instruct on two A30s

zzh-www opened this issue 3 months ago

[Feature] Request for 8-bit Quantization of Attention with SageAttention

Snowdar opened this issue 3 months ago

[Feature] Multi options

QinghanLai opened this issue 3 months ago

[API] add get memory pool size

Ying1123 opened this pull request 3 months ago

[Bug] Unable to run Qwen2-VL with OpenAI server

Quang-elec44 opened this issue 3 months ago

Fuse more ops & Simplify token mapping

merrymercy opened this pull request 3 months ago

Add send request ipynb

zhaochenyang20 opened this pull request 3 months ago

Add Send request.ipynb

zhaochenyang20 opened this pull request 3 months ago

Why StreamingResponse 3s Delay to Abort Requests?

matthew-hippocratic opened this issue 3 months ago

minor: add human eval

zhyncs opened this pull request 3 months ago

[Performance] Support both xgrammar and outlines for constrained decoding

DarkSharpness opened this pull request 3 months ago

[Bug] Cannot run `microsoft/Phi-3.5-mini-instruct`; Capture cuda graph failed

HuanzhiMao opened this issue 3 months ago

[Bug] Llama 3.1/3.2 model in FC mode output continue past where it should stop

HuanzhiMao opened this issue 3 months ago

Release v0.3.4.post1

merrymercy opened this pull request 3 months ago

Update `max_req_len` and `max_req_input_len`

hnyls2002 opened this pull request 3 months ago

Fix edge case for truncated

ByronHsu opened this pull request 3 months ago

Fix sliding window attention and gemma-2 unit tests in CI

merrymercy opened this pull request 3 months ago

Introducing SGLang Guru on Gurubase.io

kursataktas opened this pull request 3 months ago

[Bug] Issue in latest sglang docker image

shubhamgajbhiye1994 opened this issue 3 months ago

Fix prefill oom

hnyls2002 opened this pull request 3 months ago

[Bug] When sglang receives more than 16 concurrent requests (i.e., 16 threads calling the service continuously), it returns abnormal results and NaN appears in the log

GGBond8488 opened this issue 3 months ago

Maintain seq_lens_sum to make more FlashInfer operations non-blocking

merrymercy opened this pull request 3 months ago

Make token mapping non-blocking in the overlapped mode

merrymercy opened this pull request 3 months ago

[Bug] Prefill OOM!

yichuan520030910320 opened this issue 3 months ago

Faster overlap mode scheduler

merrymercy opened this pull request 3 months ago

misc: add CODEOWNERS

zhyncs opened this pull request 3 months ago

Add GLM-4 TextGeneration Model support for SGLang

sixsixcoder opened this pull request 3 months ago

Simplify batch result resolution

merrymercy opened this pull request 3 months ago

Simplify the usage of device

merrymercy opened this pull request 3 months ago

Add documentation for Installation

zhaochenyang20 opened this pull request 3 months ago

[Feature] Cache-aware Data Parallel Router

ByronHsu opened this issue 3 months ago

Optimize ZMQ receive operations to reduce idle CPU usage

zyearw1024 opened this pull request 3 months ago

[Bug] 100% CPU Usage When Idle in sglang

zyearw1024 opened this issue 3 months ago

[Bug][minimal reproducible demo] High variability across batch inference runs

FredericOdermatt opened this issue 3 months ago

[LoRA, Performance] Add gemm expand triton kernel for multi-LoRA

Ying1123 opened this pull request 3 months ago

[Bugfix] qwen2vl forward_extend

yizhang2077 opened this pull request 3 months ago

Split the overlapped version of TpModelWorkerClient into a separate file

merrymercy opened this pull request 3 months ago

Temporarily skip the test_mixed_batch for QWen2VL

merrymercy opened this pull request 3 months ago

Unify the memory pool api and tp worker API

merrymercy opened this pull request 3 months ago

docs: fix README

zhyncs opened this pull request 3 months ago

Update README.md

Ying1123 opened this pull request 3 months ago

Support qwen2 vl model

zhyncs opened this pull request 3 months ago

Update vllm to 0.6.3 (#1711)

zhyncs opened this pull request 3 months ago

CPU Inference

JocelynPanPan opened this issue 3 months ago

Simplify the interface of tp_worker

merrymercy opened this pull request 3 months ago

Created SECURITY.md

NishantRana07 opened this pull request 3 months ago

Update readme and workflow

merrymercy opened this pull request 3 months ago

[Feature] Cascade attention kernels

merrymercy opened this issue 3 months ago

Release v0.3.4

merrymercy opened this pull request 3 months ago

Update README.md

merrymercy opened this pull request 3 months ago