SGLang issues | Ecosyste.ms: OpenCollective

[Develop] Performance Improving Feature

github.com/sgl-project/sglang - yukavio opened this issue 6 months ago

[Bug] Low QPS for 1.2b model

github.com/sgl-project/sglang - lxww302 opened this issue 6 months ago

[Bug] Can't run Qwen2-57B-A14B-Instruct-GPTQ-Int4

github.com/sgl-project/sglang - xcxjack opened this issue 6 months ago

will triton kernels support cuda graph?

github.com/sgl-project/sglang - AlvL1225 opened this issue 6 months ago

[Bug] Always Watch Dog TimeOut

github.com/sgl-project/sglang - Rookie-Kai opened this issue 6 months ago

[Bug] cuda out of memory when using MQA and input_len=output_len=1024

github.com/sgl-project/sglang - lxww302 opened this issue 6 months ago

[Feature] Are there plans to implement a prefill-decode split inference architecture?

github.com/sgl-project/sglang - CSEEduanyu opened this issue 6 months ago

[Bug] nsys profile failed

github.com/sgl-project/sglang - zhangjun opened this issue 6 months ago

[Bug] T4 does not work

github.com/sgl-project/sglang - zhyncs opened this issue 6 months ago

[Feature] Support InternVL 2

github.com/sgl-project/sglang - luohao123 opened this issue 6 months ago

Sequence Parallel

github.com/sgl-project/sglang - ZYHowell opened this pull request 6 months ago

[Feature] Allow arbitrary logit processors

github.com/sgl-project/sglang - iiLaurens opened this issue 6 months ago

[Bug] OOM for concurrent long requests

github.com/sgl-project/sglang - hahmad2008 opened this issue 6 months ago

[Bug] Multinode Llama 3.1 405B fp8

github.com/sgl-project/sglang - matthew-hippocratic opened this issue 6 months ago

Torch.compile Performance Tracking

github.com/sgl-project/sglang - merrymercy opened this issue 6 months ago

[Bug] backend stuck at Prefill batch

github.com/sgl-project/sglang - sophiapeng90 opened this issue 6 months ago

[Feature] DeepSeek-Coder-V2-Instruct-FP8 on 8xA100

github.com/sgl-project/sglang - halexan opened this issue 6 months ago

[Feature] Add runtime/process cache to avoid booting server each time.

github.com/sgl-project/sglang - hnyls2002 opened this issue 6 months ago

feat: frequency, min_new_tokens, presence, and repetition penalties

github.com/sgl-project/sglang - vhain opened this pull request 6 months ago

Add skip_tokenizer_init args.

github.com/sgl-project/sglang - gryffindor-rr opened this pull request 6 months ago

[Bug] Multinode cannot be started on runpod

github.com/sgl-project/sglang - Desmond819 opened this issue 6 months ago

[Bug] pt_main_thread uses 100% cpu all the time

github.com/sgl-project/sglang - wizd opened this issue 6 months ago

[Bug] FlashInfer support for <=sm_75

github.com/sgl-project/sglang - horiacristescu opened this issue 6 months ago

Inference Llama3-70b has an AssertionError

github.com/sgl-project/sglang - Ikkyu321 opened this issue 6 months ago

[Feature] tokenizer_manager accept external tokenizer or skip tokenizer init

github.com/sgl-project/sglang - gryffindor-rr opened this issue 6 months ago

TTFT latency for long context (16K) is very high around 15 seconds for llama3.1 70b model. (same or worse than vLLM)

github.com/sgl-project/sglang - gkiri opened this issue 6 months ago

[Feature] Google TPU Support

github.com/sgl-project/sglang - RonanKMcGovern opened this issue 6 months ago

[Feature] Does sglang now support beam search

github.com/sgl-project/sglang - StevenZHB opened this issue 6 months ago

[Feature] Add a flag for computing the prompt's logprobs or not.

github.com/sgl-project/sglang - hnyls2002 opened this issue 6 months ago

[Bug] Error when running sglang.launch_server: cannot import name 'default_dump_dir' from 'triton.runtime.cache'

github.com/sgl-project/sglang - NoobPythoner opened this issue 6 months ago

run llama 3.1 405B with multi node has tp server error [Bug]

github.com/sgl-project/sglang - kinglion811 opened this issue 6 months ago

[Bug] AWQ Marlin does not work with Torch Compile

github.com/sgl-project/sglang - zhyncs opened this issue 6 months ago

RuntimeError: TopKTopPSamplingFromProbs failed with error code no kernel image is available for execution on the device, Killed [Bug]

github.com/sgl-project/sglang - mayu123mayu opened this issue 6 months ago

[Feature] plan to support medusa?

github.com/sgl-project/sglang - CSEEduanyu opened this issue 6 months ago

[Bug] Multi-Node communication issue

github.com/sgl-project/sglang - dmakhervaks opened this issue 6 months ago

[Feature] RadixCache: remove recursive logic

github.com/sgl-project/sglang - hnyls2002 opened this issue 6 months ago

OPTIONS method is not supported when using sglang with the nextchat client

github.com/sgl-project/sglang - jjiwei opened this issue 7 months ago

[Feature] Frontend: be able to run generate super long text

github.com/sgl-project/sglang - xianbaoqian opened this issue 7 months ago

[Bug] Unable to install on mac

github.com/sgl-project/sglang - xianbaoqian opened this issue 7 months ago

ROCM

github.com/sgl-project/sglang - BasDiaz opened this issue 7 months ago

[Feature] Generation Inputs: input_embeds

github.com/sgl-project/sglang - AlekseyKorshuk opened this issue 7 months ago

Initialization failed. warmup error:

github.com/sgl-project/sglang - bravelll opened this issue 7 months ago

Support for WebAssembly models

github.com/sgl-project/sglang - jaanli opened this issue 7 months ago

Development Roadmap (2024 Q3)

github.com/sgl-project/sglang - Ying1123 opened this issue 7 months ago

select() on first assistant token broken (in different ways in Mistral and Llama). Likely tokenization issue.

github.com/sgl-project/sglang - max99x opened this issue 7 months ago

`model_override_args` with server

github.com/sgl-project/sglang - ValeKnappich opened this issue 7 months ago

Add a HuggingFace backend

github.com/sgl-project/sglang - cloneofsimo opened this issue 7 months ago

Function calling for OpenAI backend

github.com/sgl-project/sglang - Yiyun-Liang opened this pull request 7 months ago

Add Support to Florence-2

github.com/sgl-project/sglang - KaifAhmad1 opened this issue 7 months ago

Will speculative decoding be supported?

github.com/sgl-project/sglang - arunpatala opened this issue 8 months ago

Llava CUDA error: device-side assert triggered

github.com/sgl-project/sglang - dmilcevski opened this issue 8 months ago

[Bug]: Random model output using sglang backend server

github.com/sgl-project/sglang - PanJason opened this issue 8 months ago

SG-Lang Runtime Stuck Launching in Docker Container

github.com/sgl-project/sglang - schopra8 opened this issue 8 months ago

Qwen 2 7B not working

github.com/sgl-project/sglang - sudarshan-kamath opened this issue 8 months ago

Does llava-next-video deploy only focus on first frames?

github.com/sgl-project/sglang - LetheRiver0 opened this issue 8 months ago

Unable to load 72b llava qwen on 8*A100 40GB

github.com/sgl-project/sglang - jeffhernandez1995 opened this issue 8 months ago

remove redundant pad_input_ids function

github.com/sgl-project/sglang - amosyou opened this pull request 8 months ago

llava-next-video inference result is empty

github.com/sgl-project/sglang - AmazDeng opened this issue 8 months ago

no longer can load 72b llava qwen on 4*H100 80GB

github.com/sgl-project/sglang - pseudotensor opened this issue 9 months ago

Invalid API key

github.com/sgl-project/sglang - pseudotensor opened this issue 9 months ago

Trace OpenAI backend usage

github.com/sgl-project/sglang - Ying1123 opened this issue 9 months ago

Regex generation causes 37x lower performance

github.com/sgl-project/sglang - Gintasz opened this issue 9 months ago

OOM CUDA error on 8 * L4 machine when launching sglang server

github.com/sgl-project/sglang - mounamokaddem opened this issue 9 months ago

Llama-3 regex generation can get stuck in infinite generation beyond max_tokens and crash server (reproduction example)

github.com/sgl-project/sglang - Gintasz opened this issue 9 months ago

Please add Phi3 support

github.com/sgl-project/sglang - Curiosity007 opened this issue 9 months ago

no batch run when using openai's format for calling.

github.com/sgl-project/sglang - xjw00654 opened this issue 9 months ago

ImportError: cannot import name 'function' from partially initialized module 'sglang'

github.com/sgl-project/sglang - lambda7xx opened this issue 10 months ago

ImportError: cannot import name 'get_cuda_stream' from 'triton.runtime.jit' In triton-nightly(V100)

github.com/sgl-project/sglang - nenomigami opened this issue 10 months ago

Add Default Timeout to urllib.request.urlopen Calls to Prevent Potential Hanging

github.com/sgl-project/sglang - alessiodallapiazza opened this issue 11 months ago

Allow OPTIONS Method on Http Server and add Cors headers.

github.com/sgl-project/sglang - kseyhan opened this issue 11 months ago

Supports the InternVL multimodal large model

github.com/sgl-project/sglang - exceedzhang opened this issue 11 months ago

Openrouter usage example