Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sglang
feat: replace get_act_fn for gpt_bigcode
zhyncs opened this pull request 5 months ago
[FIX] Wrong logger
havetc opened this pull request 5 months ago
Some questions about TTFT and TPOT benchmarks
sitabulaixizawaluduo opened this issue 5 months ago
[Minor] add delete test and delete tmp file on ci server
yichuan520030910320 opened this pull request 5 months ago
No such file or directory: '/sbin/ldconfig'
zwc163 opened this issue 5 months ago
Fix bench latency benchmark
hnyls2002 opened this pull request 5 months ago
Openvla
hnyls2002 opened this pull request 5 months ago
Torch compile CI throughput test
hnyls2002 opened this pull request 5 months ago
[FEAT] Support batches cancel
caiyueliang opened this pull request 5 months ago
Safety test
hnyls2002 opened this pull request 5 months ago
Launching Qwen1.5 14B on A6000 has a problem: multi-GPU launch works only with 1 or 2 GPUs; setting 3, 4, 5, or 6 raises an error
yawzhe opened this issue 5 months ago
[CI] Parallelize unit tests in CI
wisclmy0611 opened this pull request 5 months ago
[Fix] Multi-images loading error
kcz358 opened this pull request 5 months ago
[CI] Fix CI
wisclmy0611 opened this pull request 5 months ago
[Feature] add option to use liger triton kernel
binarycrayon opened this issue 5 months ago
improve the threshold and ports in tests
wisclmy0611 opened this pull request 5 months ago
Update workflow files
merrymercy opened this pull request 5 months ago
Update CI runner docs
merrymercy opened this pull request 5 months ago
[Minor] improve CI and dependencies
hnyls2002 opened this pull request 5 months ago
Update CI workflows
merrymercy opened this pull request 5 months ago
[Feature] Support fp8 e5m2 kv cache with flashinfer
ispobock opened this pull request 5 months ago
Accuracy degrading in concurrent scenario
frankxyy opened this issue 5 months ago
Move sampler into CUDA graph
hnyls2002 opened this pull request 5 months ago
[Feature] Use Embedding/Generation Model to get its Generation/Embedding
zhaochenyang20 opened this issue 5 months ago
[Bug] enable-torch-compile error
siddhatiwari opened this issue 5 months ago
[Bug] Bad outputs with fp8 quantization at high RPS
siddhatiwari opened this issue 5 months ago
[Bug] Server crashes after loading (Mixtral 8x7b) on L4
nivibilla opened this issue 5 months ago
[Feature] Jamba 1.5 Support PLS
nivibilla opened this issue 5 months ago
[Bug] schedule_batch.py: IndexError: list index out of range
Quang-elec44 opened this issue 5 months ago
Dry sample
81549361 opened this pull request 5 months ago
Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model
zhaochenyang20 opened this pull request 5 months ago
[Bug] vllm updated its get_model function
zhaochenyang20 opened this issue 5 months ago
Separated control and compute loop, shorten the critical path, and enable more complicated policies
xiezhq-hermann opened this pull request 5 months ago
[Minor] Improve logging and rename the health check endpoint name
merrymercy opened this pull request 5 months ago
[Bug] Dynamic FP8 quantization fails due to incorrect tensor shape
qeternity opened this issue 5 months ago
[Bug] Empty `top_logprobs` in LogProbs Output for Meta-Llama-3.1-8B-Instruct Model when Using OpenAI Compatible API
GuanghaoYe opened this issue 5 months ago
[Feature] Repeated generation expression
laurens-gs opened this issue 5 months ago
[Bug] head_dim 96 not supported
ZX-ModelCloud opened this issue 5 months ago
[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2
zhyncs opened this issue 5 months ago
chore: bump v0.2.14
zhyncs opened this pull request 5 months ago
[Tracker] OpenRouter LLM rankings tracking
zhyncs opened this issue 5 months ago
Save memory from interleaved attention
Ying1123 opened this pull request 5 months ago
[Feature] add disable-custom-all-reduce
Xu-Chen opened this pull request 5 months ago
Flex scheduler
yukavio opened this pull request 5 months ago
Optimize MLA/GQA/MQA Triton decoding
ispobock opened this pull request 5 months ago
[Bug] Llama3 70B A100 PCIE TP4 slow speed
zhyncs opened this issue 5 months ago
[Bug] Wrong tokens with mistral model
StevenZHB opened this issue 5 months ago
[Bug] when llama-3.1-70b-instruct batch inference, CUDA memory usage is unusually large
yak9meat opened this issue 5 months ago
[Feature] Support TRI-ML/prismatic-vlms
Depetrol opened this issue 5 months ago
[RFC] Add an LLM engine
JianyuZhan opened this pull request 5 months ago
[FEAT] JSON constrained support
havetc opened this pull request 5 months ago
[Bug] I set `--host 0.0.0.0`, but it can't be called on another server
YinSonglin1997 opened this issue 5 months ago
[Feature] add disable_custom_all_reduce
Xu-Chen opened this issue 5 months ago
[Bug] After service, `torch.distributed.DistBackendError`
YinSonglin1997 opened this issue 5 months ago
[Bug] Failure to Dispatch Head Dimension 80 in sglang with Specific Configurations
hxer7963 opened this issue 5 months ago
[Feature] Do we have any plan for supporting Phi3V?
boqiny opened this issue 5 months ago
[Develop] Performance Improving Feature
yukavio opened this issue 5 months ago
[Bug] Low QPS for 1.2b model
lxww302 opened this issue 5 months ago
[Bug] Can't run Qwen2-57B-A14B-Instruct-GPTQ-Int4
xcxjack opened this issue 5 months ago
will triton kernels support cuda graph?
AlvL1225 opened this issue 5 months ago
[Bug] Always Watch Dog TimeOut
Rookie-Kai opened this issue 6 months ago
[Bug] cuda out of memory when using MQA and input_len=output_len=1024
lxww302 opened this issue 6 months ago
[Feature] Are there plans to implement a prefill-decode split inference architecture?
CSEEduanyu opened this issue 6 months ago
[Bug] nsys profile failed
zhangjun opened this issue 6 months ago
[Bug] T4 not work
zhyncs opened this issue 6 months ago
[Feature] Support InternVL 2
luohao123 opened this issue 6 months ago
Sequence Parallel
ZYHowell opened this pull request 6 months ago
[Feature] Allow arbitrary logit processors
iiLaurens opened this issue 6 months ago
[Bug] OOM for concurrent long requests
hahmad2008 opened this issue 6 months ago
[Bug] Multinode Llama 3.1 405B fp8
matthew-hippocratic opened this issue 6 months ago
Torch.compile Performance Tracking
merrymercy opened this issue 6 months ago
[Bug] backend stuck at Prefill batch
sophiapeng90 opened this issue 6 months ago
[Feature] DeepSeek-Coder-V2-Instruct-FP8 on 8xA100
halexan opened this issue 6 months ago
[Feature] Add runtime/process cache to avoid booting server each time.
hnyls2002 opened this issue 6 months ago
feat: frequency, min_new_tokens, presence, and repetition penalties
vhain opened this pull request 6 months ago
Add skip_tokenizer_init args.
gryffindor-rr opened this pull request 6 months ago
[Bug] Multinode cannot be started on runpod
Desmond819 opened this issue 6 months ago
[Bug] pt_main_thread uses 100% cpu all the time
wizd opened this issue 6 months ago
[Bug] FlashInfer support for <=sm_75
horiacristescu opened this issue 6 months ago
Inference Llama3-70b has an AssertionError
Ikkyu321 opened this issue 6 months ago
[Feature] tokenizer_manager accept external tokenizer or skip tokenizer init
gryffindor-rr opened this issue 6 months ago
TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)
gkiri opened this issue 6 months ago
[Feature] Google TPU Support
RonanKMcGovern opened this issue 6 months ago
[Feature] Does sglang now support beam search
StevenZHB opened this issue 6 months ago
[Feature] Add a flag for computing the prompt's logprobs or not.
hnyls2002 opened this issue 6 months ago
[Bug] Running sglang.launch_server fails with: cannot import name 'default_dump_dir' from 'triton.runtime.cache'
NoobPythoner opened this issue 6 months ago
run llama 3.1 405B with multi node has tp server error [Bug]
kinglion811 opened this issue 6 months ago
[Bug] AWQ Marlin not work with Torch Compile
zhyncs opened this issue 6 months ago
RuntimeError: TopKTopPSamplingFromProbs failed with error code no kernel image is available for execution on the device (process killed) [Bug]
mayu123mayu opened this issue 6 months ago
[Feature] plan to support medusa?
CSEEduanyu opened this issue 6 months ago
[Bug] Multi-Node communication issue
dmakhervaks opened this issue 6 months ago
[Feature] RadixCache: remove recursive logic
hnyls2002 opened this issue 6 months ago
OPTIONS method is not supported when using sglang with the nextchat client
jjiwei opened this issue 6 months ago
[Feature] Frontend: be able to run generate super long text
xianbaoqian opened this issue 6 months ago
[Bug] Unable to install on mac
xianbaoqian opened this issue 6 months ago
ROCM
BasDiaz opened this issue 6 months ago
[Feature] Generation Inputs: input_embeds
AlekseyKorshuk opened this issue 6 months ago
Initialization failed. warmup error:
bravelll opened this issue 6 months ago
Support for WebAssembly models
jaanli opened this issue 6 months ago
Development Roadmap (2024 Q3)
Ying1123 opened this issue 6 months ago