Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang

[CI] Split test cases in CI for better load balancing

merrymercy opened this pull request 28 days ago
feat: add should_use_tensor_core

zhyncs opened this pull request 28 days ago
[Feature] Get the real logprobs to analyze decoding

Snowdar opened this issue 28 days ago
[Bug] frequency penalty

vivian0429 opened this issue 28 days ago
Update XGrammar to the latest API

Ubospica opened this pull request 28 days ago
[Fix] Avoid calling fill_vocab_mask for terminated requests

Ubospica opened this pull request 28 days ago
feat: fused_moe fp8 monkey patch

zhyncs opened this pull request 28 days ago
[feat] Refactor session control interface and add CI

Ying1123 opened this pull request 28 days ago
Question about ragged wrapper

ZhongYingMatrix opened this issue 28 days ago
Replace prob based with threshold based load balancing

ByronHsu opened this pull request 28 days ago
Allow overwrite flashinfer use_tensorcore

merrymercy opened this pull request 28 days ago
test select concurrency

qeternity opened this pull request 29 days ago
Fix docs

merrymercy opened this pull request 29 days ago
Rename triton_fused_moe -> fused_moe_triton

merrymercy opened this pull request 29 days ago
Balance CI tests

merrymercy opened this pull request 29 days ago
fix: use torch.sum for compatible

zhyncs opened this pull request 29 days ago
[Bug] FusedMoE compatible with vllm 0.6.3.post1

zhyncs opened this issue 29 days ago
Update CI threshold & Improve code style

merrymercy opened this pull request 29 days ago
Fix mixed chunked prefill in overlap mode

merrymercy opened this pull request 29 days ago
fix: resolve end-of-file-fixer

zhyncs opened this pull request 29 days ago
feat: update other MoE models deps

zhyncs opened this pull request 29 days ago
feat: update gitignore and add tuning config for FusedMoE

zhyncs opened this pull request 29 days ago
Simplify `Scheduler.update_running_batch`

merrymercy opened this pull request 29 days ago
feat: remove the dependency on FusedMoE

zhyncs opened this pull request 29 days ago
Merged three native APIs into one: get_server_info

henryhmko opened this pull request 29 days ago
[Bug] llava use image hash as token, leading to cache bug

zwc163 opened this issue 29 days ago
Speculative EAGLE2

yukavio opened this pull request 29 days ago
Byhsu/fairness router

ByronHsu opened this pull request 29 days ago
Improve sglang router

ByronHsu opened this pull request 29 days ago
add prefix match for certain tenant

ByronHsu opened this pull request 29 days ago
Add more api routes (completion, health, etc) to the router

ByronHsu opened this pull request 29 days ago
fix dp_rank env

ByronHsu opened this pull request 29 days ago
update router doc

ByronHsu opened this pull request 29 days ago
Bump sglang-router to 0.0.5

ByronHsu opened this pull request 30 days ago
[Bug] Error when using LLAVA 1.5 for llava bench

pspdada opened this issue 30 days ago
fix: resolve bench_serving args

zhyncs opened this pull request 30 days ago
Fix dp print message

merrymercy opened this pull request 30 days ago
[CI] Fix test cases

merrymercy opened this pull request 30 days ago
Add concurrency option for benchmark

cermeng opened this pull request 30 days ago
Add concurrency option in benchmark

cermeng opened this pull request 30 days ago
Fix grid size in Triton decoding kernel

ispobock opened this pull request 30 days ago
[Bug] Error when launching llava1.5

pspdada opened this issue 30 days ago
[Feature] Support LLaMA-3.2 finetuned with Sentence Transformers!

thusinh1969 opened this issue about 1 month ago
Revert "Only stream output on tp rank 0"

merrymercy opened this pull request about 1 month ago
EAGLE2: general part [2]

yukavio opened this pull request about 1 month ago
EAGLE2: Eagle related part [1]

yukavio opened this pull request about 1 month ago
feat(pre-commit): trim unnecessary notebook metadata from git history

XuehaiPan opened this pull request about 1 month ago
fix: add xgrammar dependency

zhyncs opened this pull request about 1 month ago
minor: update gsm8k threshold

zhyncs opened this pull request about 1 month ago
Only stream output on tp rank 0

merrymercy opened this pull request about 1 month ago
add profile in offline benchmark & update doc

bjmsong opened this pull request about 1 month ago
[minor] Clean up unused imports

merrymercy opened this pull request about 1 month ago
Add initial support for intel Gaudi accelerators

ankurneog opened this pull request about 1 month ago
chore: bump v0.3.6

zhyncs opened this pull request about 1 month ago
Online weight update [WIP]

zhaochenyang20 opened this pull request about 1 month ago
Rename sglang.bench_latency to sglang.bench_one_batch

merrymercy opened this pull request about 1 month ago
[Bug] Unable to load GPTQ Mixtral 8x7 v0.1 with SGLang

DhruvaBansal00 opened this issue about 1 month ago
Turn off autotune for scaled mm for fp8 dynamic quant in torchao

jerryzh168 opened this pull request about 1 month ago
[router] add base_gpu_id server args & merged radix tree python reference

ByronHsu opened this pull request about 1 month ago
[router] cache-aware load-balancing router v1

ByronHsu opened this pull request about 1 month ago
[Feature] Inference example code for Qwen2-VL

YuanLiuuuuuu opened this issue about 1 month ago
[Bug] Qwen2-VL-7B with sglang Performance Degradation on MME benchmark

Mr-Loevan opened this issue about 1 month ago
ROCm: Fix MoE padding for none FP8 cases

HaiShaw opened this pull request about 1 month ago
Benchmark with Pytorch Profiler easily

bjmsong opened this pull request about 1 month ago
[Feature] Support for rerank models

dinhanhx opened this issue about 1 month ago
[Feature] Is Yarn supported in sglang?

klykq111 opened this issue about 1 month ago
Error out when torchao-config option is not recognized

jerryzh168 opened this pull request about 1 month ago
Fix #2037 - Context length check does not take into out pad tokens for visual models

jakep-allenai opened this pull request about 1 month ago
Enable overlap scheduler by default for the triton attention backend

merrymercy opened this pull request about 1 month ago
Move test_session_id.py to playground

merrymercy opened this pull request about 1 month ago
Allow skipping warmup in bench_offline_throughput.py

merrymercy opened this pull request about 1 month ago
feat: use cascade attention kernel (single level)

james-p-xu opened this pull request about 1 month ago
Update nightly-eval.yml

merrymercy opened this pull request about 1 month ago
[Bug] cannot load Gemma2 awq

Foreist opened this issue about 1 month ago
[Bug] big TPOT and ITL when running the offline benchmark

TraceIvan opened this issue about 1 month ago
Use native fp8 format on MI300X

HaiShaw opened this pull request about 1 month ago
minor: add dataset dump and questions shuffle

zhyncs opened this pull request about 1 month ago
Expose max total num tokens from Runtime & Engine API

henryhmko opened this pull request about 1 month ago
minor: update gsm8k eval

zhyncs opened this pull request about 1 month ago
Use cuda event wait and synchronization instead of busy waiting

merrymercy opened this pull request about 1 month ago
Fix: incorrect top_logprobs in chat completion

ajwaitz opened this pull request about 1 month ago
[Feature, Performance] kv cache performance improvement

HaiShaw opened this issue about 1 month ago
Simplify logits penalizer

merrymercy opened this pull request about 1 month ago
Allow passing extra request body to bench_offline_throughput.py

merrymercy opened this pull request about 1 month ago
Fix chunked prefill with output logprob

merrymercy opened this pull request about 1 month ago
feat(srt): support prefill and generate with `input_embeds`

XuehaiPan opened this pull request about 1 month ago
Add simple CPU offloading support.

janimo opened this pull request about 1 month ago
[Feature] TorchAO support for Qwen 32B

grahama1970 opened this issue about 1 month ago
Rename layer_idx to layer_id for consistency

janimo opened this pull request about 1 month ago
docs: fix module docstrings and copyright headers

XuehaiPan opened this pull request about 1 month ago
[Performance] why so many bubbles between steps when running llava-one-vision?

sleepwalker2017 opened this issue about 1 month ago