Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang
[Bug] when llama-3.1-70b-instruct batch inference, CUDA memory usage is unusually large
yak9meat opened this issue 5 months ago
[Feature] Support TRI-ML/prismatic-vlms
Depetrol opened this issue 5 months ago
[RFC] Add an LLM engine
JianyuZhan opened this pull request 5 months ago
[FEAT] JSON constrained support
havetc opened this pull request 5 months ago
[Bug] I set `--host 0.0.0.0`, but it can't be called on another server
YinSonglin1997 opened this issue 5 months ago
[Feature] add disable_custom_all_reduce
Xu-Chen opened this issue 5 months ago
[Bug] After service, `torch.distributed.DistBackendError`
YinSonglin1997 opened this issue 5 months ago
[Bug] Failure to Dispatch Head Dimension 80 in sglang with Specific Configurations
hxer7963 opened this issue 5 months ago
[Feature] Do we have any plan for supporting Phi3V?
boqiny opened this issue 5 months ago
[Develop] Performance Improving Feature
yukavio opened this issue 5 months ago
[Bug] Low QPS for 1.2b model
lxww302 opened this issue 5 months ago
[Bug] Can't run Qwen2-57B-A14B-Instruct-GPTQ-Int4
xcxjack opened this issue 5 months ago
will triton kernels support cuda graph?
AlvL1225 opened this issue 5 months ago
[Bug] Always Watch Dog TimeOut
Rookie-Kai opened this issue 5 months ago
[Bug] cuda out of memory when using MQA and input_len=output_len=1024
lxww302 opened this issue 5 months ago
[Feature] Are there plans to implement a prefill-decode split inference architecture?
CSEEduanyu opened this issue 5 months ago
[Bug] nsys profile failed
zhangjun opened this issue 5 months ago
[Bug] T4 not work
zhyncs opened this issue 5 months ago
[Feature] Support InternVL 2
luohao123 opened this issue 5 months ago
Sequence Parallel
ZYHowell opened this pull request 5 months ago
[Feature] Allow arbitrary logit processors
iiLaurens opened this issue 5 months ago
[Bug] OOM for concurrent long requests
hahmad2008 opened this issue 5 months ago
[Bug] Multinode Llama 3.1 405B fp8
matthew-hippocratic opened this issue 5 months ago
Torch.compile Performance Tracking
merrymercy opened this issue 5 months ago
[Bug] backend stuck at Prefill batch
sophiapeng90 opened this issue 5 months ago
[Feature] DeepSeek-Coder-V2-Instruct-FP8 on 8xA100
halexan opened this issue 5 months ago
[Feature] Add runtime/process cache to avoid booting sever each time.
hnyls2002 opened this issue 5 months ago
feat: frequency, min_new_tokens, presence, and repetition penalties
vhain opened this pull request 5 months ago
Add skip_tokenizer_init args.
gryffindor-rr opened this pull request 5 months ago
[Bug] Multinode cannot be started on runpod
Desmond819 opened this issue 5 months ago
[Bug] pt_main_thread uses 100% cpu all the time
wizd opened this issue 5 months ago
[Bug] FlashInfer support for <=sm_75
horiacristescu opened this issue 5 months ago
Inference Llama3-70b has an AssertionError
Ikkyu321 opened this issue 5 months ago
[Feature] tokenizer_manager accept external tokenizer or skip tokenizer init
gryffindor-rr opened this issue 5 months ago
TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)
gkiri opened this issue 5 months ago
[Feature] Google TPU Support
RonanKMcGovern opened this issue 5 months ago
[Feature] Does sglang now support beam search
StevenZHB opened this issue 5 months ago
[Feature] Add a flag for computing the prompt's logprobs or not.
hnyls2002 opened this issue 5 months ago
[Bug] Error when running sglang.launch_server: cannot import name 'default_dump_dir' from 'triton.runtime.cache'
NoobPythoner opened this issue 5 months ago
run llama 3.1 405B with multi node has tp server error [Bug]
kinglion811 opened this issue 5 months ago
[Bug] AWQ Marlin not work with Torch Compile
zhyncs opened this issue 5 months ago
RuntimeError: TopKTopPSamplingFromProbs failed with error code no kernel image is available for execution on the device (Killed) [Bug]
mayu123mayu opened this issue 5 months ago
[Feature] plan to support medusa?
CSEEduanyu opened this issue 5 months ago
[Bug] Multi-Node communication issue
dmakhervaks opened this issue 5 months ago
[Feature] RadixCache: remove recursive logic
hnyls2002 opened this issue 5 months ago
OPTIONS method is not supported when using sglang with the nextchat client
jjiwei opened this issue 6 months ago
[Feature] Frontend: be able to run generate super long text
xianbaoqian opened this issue 6 months ago
ROCM
BasDiaz opened this issue 6 months ago
[Feature] Generation Inputs: input_embeds
AlekseyKorshuk opened this issue 6 months ago
Initialization failed. warmup error:
bravelll opened this issue 6 months ago
Support for WebAssembly models
jaanli opened this issue 6 months ago
Development Roadmap (2024 Q3)
Ying1123 opened this issue 6 months ago
select() on first assistant token broken (in different ways in Mistral and Llama). Likely tokenization issue.
max99x opened this issue 6 months ago
`model_override_args` with server
ValeKnappich opened this issue 6 months ago
Add a HuggingFace backend
cloneofsimo opened this issue 6 months ago
Function calling for OpenAI backend
Yiyun-Liang opened this pull request 6 months ago
Add Support to Florence-2
KaifAhmad1 opened this issue 6 months ago
Will speculative decoding be supported?
arunpatala opened this issue 7 months ago
Llava CUDA error: device-side assert triggered
dmilcevski opened this issue 7 months ago
[Bug]: Random model output using sglang backend server
PanJason opened this issue 7 months ago
SG-Lang Runtime Stuck Launching in Docker Container
schopra8 opened this issue 7 months ago
Qwen 2 7B not working
sudarshan-kamath opened this issue 7 months ago
Does llava-next-video deploy only focus on first frames?
LetheRiver0 opened this issue 7 months ago
Unable to load 72b llava qwen on 8*A100 40GB
jeffhernandez1995 opened this issue 7 months ago
remove redundant pad_input_ids function
amosyou opened this pull request 7 months ago
llava-next-video inference result is empty
AmazDeng opened this issue 7 months ago
no longer can load 72b llava qwen on 4*H100 80GB
pseudotensor opened this issue 8 months ago
Invalid API key
pseudotensor opened this issue 8 months ago
Trace OpenAI backend usage
Ying1123 opened this issue 8 months ago
Regex generation causes 37x lower performance
Gintasz opened this issue 8 months ago
OOM CUDA error on 8 * L4 machine when launching sglang server
mounamokaddem opened this issue 8 months ago
Llama-3 regex generation can get stuck in infinite generation beyond max_tokens and crash server (reproduction example)
Gintasz opened this issue 8 months ago
Please add Phi3 support
Curiosity007 opened this issue 8 months ago
no batch run when using openai's format for calling.
xjw00654 opened this issue 8 months ago
ImportError: cannot import name 'function' from partially initialized module 'sglang'
lambda7xx opened this issue 9 months ago
ImportError: cannot import name 'get_cuda_stream' from 'triton.runtime.jit' In triton-nightly(V100)
nenomigami opened this issue 9 months ago
Add Default Timeout to urllib.request.urlopen Calls to Prevent Potential Hanging
alessiodallapiazza opened this issue 10 months ago
Allow OPTIONS Method on Http Server and add Cors headers.
kseyhan opened this issue 10 months ago
Supports the InternVL multimodal large model
exceedzhang opened this issue 10 months ago
Openrouter usage example
janimo opened this pull request 10 months ago
Update dependencies
janimo opened this pull request 10 months ago
Setting Data Type from the CLI interface
Reichenbachian opened this issue 10 months ago
Add OpenRouter backend
janimo opened this pull request 10 months ago
Use Anthropic messages API
janimo opened this pull request 10 months ago
Add StableLM model.
janimo opened this pull request 10 months ago
Add logo
merrymercy opened this pull request 10 months ago
RuntimeError: CUDA error: device-side assert triggered when running
aliencaocao opened this issue 10 months ago
Error: Connection Refused by SGLANG Backend
deadpipe opened this issue 10 months ago
Will qwen-vl be supported in the future?
zhangqingwu opened this issue 11 months ago
Does sglang support multi-node backend model?
Luodian opened this issue 11 months ago
Add SGLang usage examples
Ying1123 opened this issue 11 months ago
Development Roadmap (Deprecated)
Ying1123 opened this issue 11 months ago
`RecursionError: maximum recursion depth exceeded while calling a Python object` when inferencing with long input
Ja1Zhou opened this issue 11 months ago
[Bug] liuhaotian/llava-v1.6-mistral-7b doesn't load
fozziethebeat opened this issue 11 months ago
Tutorial for Batch Decoding and Obtaining Log Probs
aflah02 opened this issue 12 months ago
Can SGL generate list of json?
CSWellesSun opened this issue 12 months ago
Triton support
TheodoreGalanos opened this issue 12 months ago
Colab?
srush opened this issue 12 months ago