github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

feat: replace get_act_fn for gpt_bigcode

zhyncs opened this pull request 5 months ago

[FIX] Wrong logger

havetc opened this pull request 5 months ago

Some questions about TTFT and TPOT benchmarks

sitabulaixizawaluduo opened this issue 5 months ago

[Minor] add delete test and delete tmp file on ci server

yichuan520030910320 opened this pull request 5 months ago

No such file or directory: '/sbin/ldconfig'

zwc163 opened this issue 5 months ago

Fix bench latency benchmark

hnyls2002 opened this pull request 5 months ago

Openvla

hnyls2002 opened this pull request 5 months ago

Torch compile CI throughput test

hnyls2002 opened this pull request 5 months ago

[FEAT] Support batches cancel

caiyueliang opened this pull request 5 months ago

Safety test

hnyls2002 opened this pull request 5 months ago

Launching Qwen1.5 14B on A6000 GPUs: multi-GPU launch only works with 1 or 2 cards; setting 3, 4, 5, or 6 raises an error

yawzhe opened this issue 5 months ago

[CI] Parallelize unit tests in CI

wisclmy0611 opened this pull request 5 months ago

[Fix] Multi-images loading error

kcz358 opened this pull request 5 months ago

[CI] Fix CI

wisclmy0611 opened this pull request 5 months ago

[Feature] add option to use liger triton kernel

binarycrayon opened this issue 5 months ago

improve the threshold and ports in tests

wisclmy0611 opened this pull request 5 months ago

Update workflow files

merrymercy opened this pull request 5 months ago

Update CI runner docs

merrymercy opened this pull request 5 months ago

[Minor] improve CI and dependencies

hnyls2002 opened this pull request 5 months ago

Update CI workflows

merrymercy opened this pull request 5 months ago

[Feature] Support fp8 e5m2 kv cache with flashinfer

ispobock opened this pull request 5 months ago

Accuracy degrading in concurrent scenario

frankxyy opened this issue 5 months ago

Move sampler into CUDA graph

hnyls2002 opened this pull request 5 months ago

[Feature] Use Embedding/Generation Model to get its Generation/Embedding

zhaochenyang20 opened this issue 5 months ago

[Bug] enable-torch-compile error

siddhatiwari opened this issue 5 months ago

[Bug] Bad outputs with fp8 quantization at high RPS

siddhatiwari opened this issue 5 months ago

[Bug] Server crashes after loading (Mixtral 8x7b) on L4

nivibilla opened this issue 5 months ago

[Feature] Jamba 1.5 Support PLS

nivibilla opened this issue 5 months ago

[Bug] schedule_batch.py: IndexError: list index out of range

Quang-elec44 opened this issue 5 months ago

Dry sample

81549361 opened this pull request 5 months ago

Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model

zhaochenyang20 opened this pull request 5 months ago

[Bug] vllm updated its get_model function

zhaochenyang20 opened this issue 5 months ago

Separated control and compute loop, shorten the critical path, and enable more complicated policies

xiezhq-hermann opened this pull request 5 months ago

[Minor] Improve logging and rename the health check endpoint name

merrymercy opened this pull request 5 months ago

[Bug] Dynamic FP8 quantization fails due to incorrect tensor shape

qeternity opened this issue 5 months ago

[Bug] Empty `top_logprobs` in LogProbs Output for Meta-Llama-3.1-8B-Instruct Model when Using OpenAI Compatible API

GuanghaoYe opened this issue 5 months ago

[Feature] Repeated generation expression

laurens-gs opened this issue 5 months ago

[Bug] head_dim 96 not supported

ZX-ModelCloud opened this issue 5 months ago

[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2

zhyncs opened this issue 5 months ago

chore: bump v0.2.14

zhyncs opened this pull request 5 months ago

[Tracker] OpenRouter LLM rankings tracking

zhyncs opened this issue 5 months ago

Save memory from interleaved attention

Ying1123 opened this pull request 5 months ago

[Feature] add disable-custom-all-reduce

Xu-Chen opened this pull request 5 months ago

Flex scheduler

yukavio opened this pull request 5 months ago

Optimize MLA/GQA/MQA Triton decoding

ispobock opened this pull request 5 months ago

[Bug] Llama3 70B A100 PCIE TP4 slow speed

zhyncs opened this issue 5 months ago

[Bug] Wrong tokens with mistral model

StevenZHB opened this issue 5 months ago

[Bug] when llama-3.1-70b-instruct batch inference, CUDA memory usage is unusually large

yak9meat opened this issue 5 months ago

[Feature] Support TRI-ML/prismatic-vlms

Depetrol opened this issue 5 months ago

[RFC] Add an LLM engine

JianyuZhan opened this pull request 5 months ago

[FEAT] JSON constrained support

havetc opened this pull request 5 months ago

[Bug] I set `--host 0.0.0.0`, but it can't be called on another server

YinSonglin1997 opened this issue 5 months ago

[Feature] add disable_custom_all_reduce

Xu-Chen opened this issue 5 months ago

[Bug] After service, `torch.distributed.DistBackendError`

YinSonglin1997 opened this issue 5 months ago

[Bug] Failure to Dispatch Head Dimension 80 in sglang with Specific Configurations

hxer7963 opened this issue 5 months ago

[Feature] Do we have any plan for supporting Phi3V?

boqiny opened this issue 5 months ago

[Develop] Performance Improving Feature

yukavio opened this issue 5 months ago

[Bug] Low QPS for 1.2b model

lxww302 opened this issue 5 months ago

[Bug] Can't run Qwen2-57B-A14B-Instruct-GPTQ-Int4

xcxjack opened this issue 5 months ago

will triton kernels support cuda graph?

AlvL1225 opened this issue 5 months ago

[Bug] Always Watch Dog TimeOut

Rookie-Kai opened this issue 6 months ago

[Bug] cuda out of memory when using MQA and input_len=output_len=1024

lxww302 opened this issue 6 months ago

[Feature] Are there plans to implement a prefill-decode split inference architecture?

CSEEduanyu opened this issue 6 months ago

[Bug] nsys profile failed

zhangjun opened this issue 6 months ago

[Bug] T4 not work

zhyncs opened this issue 6 months ago

[Feature] Support InternVL 2

luohao123 opened this issue 6 months ago

Sequence Parallel

ZYHowell opened this pull request 6 months ago

[Feature] Allow arbitrary logit processors

iiLaurens opened this issue 6 months ago

[Bug] OOM for concurrent long requests

hahmad2008 opened this issue 6 months ago

[Bug] Multinode Llama 3.1 405B fp8

matthew-hippocratic opened this issue 6 months ago

Torch.compile Performance Tracking

merrymercy opened this issue 6 months ago

[Bug] backend stuck at Prefill batch

sophiapeng90 opened this issue 6 months ago

[Feature] DeepSeek-Coder-V2-Instruct-FP8 on 8xA100

halexan opened this issue 6 months ago

[Feature] Add runtime/process cache to avoid booting server each time.

hnyls2002 opened this issue 6 months ago

feat: frequency, min_new_tokens, presence, and repetition penalties

vhain opened this pull request 6 months ago

Add skip_tokenizer_init args.

gryffindor-rr opened this pull request 6 months ago

[Bug] Multinode cannot be started on runpod

Desmond819 opened this issue 6 months ago

[Bug] pt_main_thread uses 100% cpu all the time

wizd opened this issue 6 months ago

[Bug] FlashInfer support for <=sm_75

horiacristescu opened this issue 6 months ago

Inference Llama3-70b has an AssertionError

Ikkyu321 opened this issue 6 months ago

[Feature] tokenizer_manager accept external tokenizer or skip tokenizer init

gryffindor-rr opened this issue 6 months ago

TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)

gkiri opened this issue 6 months ago

[Feature] Google TPU Support

RonanKMcGovern opened this issue 6 months ago

[Feature] Does sglang now support beam search

StevenZHB opened this issue 6 months ago

[Feature] Add a flag for computing the prompt's logprobs or not.

hnyls2002 opened this issue 6 months ago

[Bug] Error when running sglang.launch_server: cannot import name 'default_dump_dir' from 'triton.runtime.cache'

NoobPythoner opened this issue 6 months ago

run llama 3.1 405B with multi node has tp server error [Bug]

kinglion811 opened this issue 6 months ago

[Bug] AWQ Marlin not work with Torch Compile

zhyncs opened this issue 6 months ago

RuntimeError: TopKTopPSamplingFromProbs failed with error code no kernel image is available for execution on the device (Killed) [Bug]

mayu123mayu opened this issue 6 months ago

[Feature] plan to support medusa?

CSEEduanyu opened this issue 6 months ago

[Bug] Multi-Node communication issue

dmakhervaks opened this issue 6 months ago

[Feature] RadixCache: remove recursive logic

hnyls2002 opened this issue 6 months ago

OPTIONS method is not supported when using sglang with the nextchat client

jjiwei opened this issue 6 months ago

[Feature] Frontend: be able to run generate super long text

xianbaoqian opened this issue 6 months ago

[Bug] Unable to install on mac

xianbaoqian opened this issue 6 months ago

ROCM

BasDiaz opened this issue 6 months ago

[Feature] Generation Inputs: input_embeds

AlekseyKorshuk opened this issue 6 months ago

Initialization failed. warmup error:

bravelll opened this issue 6 months ago

Support for WebAssembly models

jaanli opened this issue 6 months ago

Development Roadmap (2024 Q3)

Ying1123 opened this issue 6 months ago