Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang
Fix unit tests
merrymercy opened this pull request 3 months ago
Add a watch dog thread
merrymercy opened this pull request 3 months ago
Offline serving final
hnyls2002 opened this pull request 3 months ago
[Bug] Parameter Update API update_weights Fails in DP=2, TP=1 Configuration
rbao2018 opened this issue 3 months ago
Update hyperparameter_tuning.md
merrymercy opened this pull request 3 months ago
profile of M sizes for Torch native and TE (ignore)
Zhuohao-Li opened this pull request 3 months ago
Improve the user control of new_token_ratio
merrymercy opened this pull request 3 months ago
Add openAI compatible API
zhaochenyang20 opened this pull request 3 months ago
Provide an argument to set the maximum batch size for cuda graph
merrymercy opened this pull request 3 months ago
Fix docs ci
zhaochenyang20 opened this pull request 3 months ago
Simplify our docs with complicated functions into utils
zhaochenyang20 opened this pull request 3 months ago
detach two CI for documentation
zhaochenyang20 opened this pull request 3 months ago
Update links
merrymercy opened this pull request 3 months ago
Update ci workflows
merrymercy opened this pull request 3 months ago
fix int conversion for `SGLANG_CPU_COUNT`
ByronHsu opened this pull request 3 months ago
Allow consecutive ports when launching multiple sglang servers.
hnyls2002 opened this pull request 3 months ago
Set `ZMQ` buffer size heuristic
hnyls2002 opened this pull request 3 months ago
Fix possible ZMQ hanging
hnyls2002 opened this pull request 3 months ago
move max_position_embeddings to the last
hliuca opened this pull request 3 months ago
[Fix] Fix --skip-tokenizer-init
merrymercy opened this pull request 3 months ago
Revert "Fix memory leak when doing chunked prefill"
merrymercy opened this pull request 3 months ago
Release v0.3.4.post2
merrymercy opened this pull request 3 months ago
Fix logprob in the overlapped mode
merrymercy opened this pull request 3 months ago
[Fix] Fix the log parsing in chunked prefill uni tests
merrymercy opened this pull request 3 months ago
Fix log parsing in the chunked prefill unit tests
merrymercy opened this pull request 3 months ago
[Bug] Got error with awq_marlin quantization args.
liangzelang opened this issue 3 months ago
[Bug] param of max_workers is int type while a string type value os.environ.get("SGLANG_CPU_COUNT") provided
wellhowtosay opened this issue 3 months ago
[router] rust-based router
ByronHsu opened this pull request 3 months ago
Fix seq_lens_sum for cuda graph runner in padded cases
merrymercy opened this pull request 3 months ago
[Bug] cutlass group_gemm.initialize failed
senlice opened this issue 3 months ago
Fix memory leak when doing chunked prefill
hnyls2002 opened this pull request 3 months ago
add support for ipynb
zhaochenyang20 opened this pull request 3 months ago
Enhance the test case for chunked prefill and check memory leak
merrymercy opened this pull request 3 months ago
Create deploy-docs.yml
zhaochenyang20 opened this pull request 3 months ago
Re-introduce `get_cuda_graph_seq_len_fill_value`
merrymercy opened this pull request 3 months ago
[Fix] Fix cuda graph padding for triton attention backend
merrymercy opened this pull request 3 months ago
Shortfin Backend
stbaione opened this pull request 3 months ago
Qwen2vl support cuda graph and disable radix cache
yizhang2077 opened this pull request 3 months ago
[Fix] Fix NaN issues by fixing the cuda graph padding values for flashinfer
merrymercy opened this pull request 3 months ago
check user-specified model_max_len with hf derived max_model_len
BBuf opened this pull request 3 months ago
[Bug] Catch any errors caused by parsing json schema
zolinthecow opened this pull request 3 months ago
Fix MockTokenizer in the unit tests
merrymercy opened this pull request 3 months ago
Fix the perf regression due to additional_stop_token_ids
merrymercy opened this pull request 3 months ago
Crash the server on warnings in CI
merrymercy opened this pull request 3 months ago
Fix out of memory message.
hnyls2002 opened this pull request 3 months ago
Fix missing additional_stop_token_ids
merrymercy opened this pull request 3 months ago
Update docs
merrymercy opened this pull request 3 months ago
[Fix] Fix abort in data parallelism
merrymercy opened this pull request 3 months ago
Fix stop condition for <|eom_id|>
merrymercy opened this pull request 3 months ago
Fix perf regression for set_kv_buffer
merrymercy opened this pull request 3 months ago
Detected errors during sampling! NaN in the probability error in Qwen2.5-7b-instruct with two a30
zzh-www opened this issue 3 months ago
[Feature] Request to 8-bit Quantization of Attention with SageAttention
Snowdar opened this issue 3 months ago
[Feature] Multi options
QinghanLai opened this issue 3 months ago
[API] add get memory pool size
Ying1123 opened this pull request 3 months ago
[Bug] Unable to run Qwen2-VL with OpenAI server
Quang-elec44 opened this issue 3 months ago
Fuse more ops & Simplify token mapping
merrymercy opened this pull request 3 months ago
Add send request ipynb
zhaochenyang20 opened this pull request 3 months ago
Add Send request.ipynb
zhaochenyang20 opened this pull request 3 months ago
Why StreamingResponse 3s Delay to Abort Requests?
matthew-hippocratic opened this issue 3 months ago
minor: add human eval
zhyncs opened this pull request 3 months ago
[Performance] Support both xgrammar and outlines for constrained decoding
DarkSharpness opened this pull request 3 months ago
[Bug] Cannot run `microsoft/Phi-3.5-mini-instruct`; Capture cuda graph failed
HuanzhiMao opened this issue 3 months ago
[Bug] Llama 3.1/3.2 model in FC mode output continue past where it should stop
HuanzhiMao opened this issue 3 months ago
Release v0.3.4.post1
merrymercy opened this pull request 3 months ago
Update `max_req_len` and `max_req_input_len`
hnyls2002 opened this pull request 3 months ago
Fix edge case for truncated
ByronHsu opened this pull request 3 months ago
Fix sliding window attention and gemma-2 unit tests in CI
merrymercy opened this pull request 3 months ago
Introducing SGLang Guru on Gurubase.io
kursataktas opened this pull request 3 months ago
[Bug] Issue in latest sglang docker image
shubhamgajbhiye1994 opened this issue 3 months ago
Fix prefill oom
hnyls2002 opened this pull request 3 months ago
Maintain seq_lens_sum to make more FlashInfer operations non-blocking
merrymercy opened this pull request 3 months ago
Make token mapping non-blocking in the overlapped mode
merrymercy opened this pull request 3 months ago
[Bug] Prefill OOM!
yichuan520030910320 opened this issue 3 months ago
Faster overlap mode scheduler
merrymercy opened this pull request 3 months ago
misc: add CODEOWNERS
zhyncs opened this pull request 3 months ago
Add GLM-4 TextGeneration Model support for SGLang
sixsixcoder opened this pull request 3 months ago
Simplify batch result resolution
merrymercy opened this pull request 3 months ago
Simplify the usage of device
merrymercy opened this pull request 3 months ago
Add documentations for Installation
zhaochenyang20 opened this pull request 3 months ago
[Feature] Cache-aware Data Parallel Router
ByronHsu opened this issue 3 months ago
Optimize ZMQ receive operations to reduce idle CPU usage
zyearw1024 opened this pull request 3 months ago
[Bug] 100% CPU Usage When Idle in sglang
zyearw1024 opened this issue 3 months ago
[Bug][minimal reproducible demo] High variability across batch inference runs
FredericOdermatt opened this issue 3 months ago
[LoRA, Performance] Add gemm expand triton kernel for multi-LoRA
Ying1123 opened this pull request 3 months ago
[Bugfix] qwen2vl forward_extend
yizhang2077 opened this pull request 3 months ago
Split the overlapped version of TpModelWorkerClient into a separate file
merrymercy opened this pull request 3 months ago
Temporarily skip the test_mixed_batch for QWen2VL
merrymercy opened this pull request 3 months ago
Unify the memory pool api and tp worker API
merrymercy opened this pull request 3 months ago
docs: fix README
zhyncs opened this pull request 3 months ago
Update README.md
Ying1123 opened this pull request 3 months ago
Support qwen2 vl model
zhyncs opened this pull request 3 months ago
Update vllm to 0.6.3 (#1711)
zhyncs opened this pull request 3 months ago
CPU Inference
JocelynPanPan opened this issue 3 months ago
Simplify the interface of tp_worker
merrymercy opened this pull request 3 months ago
Created SECURITY.md
NishantRana07 opened this pull request 3 months ago
Update readme and workflow
merrymercy opened this pull request 3 months ago
[Feature] Cascade attention kernels
merrymercy opened this issue 3 months ago
Release v0.3.4
merrymercy opened this pull request 3 months ago
Update README.md
merrymercy opened this pull request 3 months ago