github.com/sgl-project/sglang issues | Ecosyste.ms: OpenCollective

Release v0.3.3

merrymercy opened this pull request 3 months ago

[Profile] Add pytorch profiler

Ying1123 opened this pull request 3 months ago

Remove references to squeezellm

janimo opened this pull request 3 months ago

[WIP] Support NVLM-D

amosyou opened this pull request 3 months ago

Update README.md

kushal34712 opened this pull request 3 months ago

Returning a per request metric for number of cached_tokens read

havetc opened this pull request 3 months ago

Optimize broadcast & Reorg code

merrymercy opened this pull request 3 months ago

Fix the port_args in bench_latency

merrymercy opened this pull request 3 months ago

Use is_flashinfer_available to replace is_hip for flashinfer check

merrymercy opened this pull request 3 months ago

Use `atexit` hook to implicitly shutdown `Runtime`

ByronHsu opened this pull request 3 months ago

Fix chunked prefill condition

ispobock opened this pull request 3 months ago

[Fix] Fix the case where prompt_len = 0

merrymercy opened this pull request 3 months ago

Fix modality for image inputs

merrymercy opened this pull request 3 months ago

Update README.md

merrymercy opened this pull request 3 months ago

Test consistency for single and batch separately

ByronHsu opened this pull request 3 months ago

[Minor, Performance] Use torch.argmax for greedy sampling

Ying1123 opened this pull request 3 months ago

fix(docs): Improve grammar and readability in README

amantyagiprojects opened this pull request 3 months ago

[LoRA, Performance] Speedup multi-LoRA serving - Step 1

Ying1123 opened this pull request 3 months ago

Clean up event loop

merrymercy opened this pull request 3 months ago

[Bug] Fix decode stats error on output_len 1

HaiShaw opened this pull request 3 months ago

[Minor] Improve the style and fix flaky tests

merrymercy opened this pull request 3 months ago

Fix styling

ByronHsu opened this pull request 3 months ago

Fix runtime.generate when sampling param is not passed

ByronHsu opened this pull request 3 months ago

default sampling param should be deepcopied

ByronHsu opened this pull request 3 months ago

chore: update README.md

eltociear opened this pull request 3 months ago

[Bug] Fix the Image Input of Batch Generation

OBJECT907 opened this pull request 3 months ago

Update io_struct.py

OBJECT907 opened this pull request 3 months ago

[Easy] use .text() instead of .text

ByronHsu opened this pull request 3 months ago

Backend method not found when SRT Runtime is used

ByronHsu opened this pull request 3 months ago

[Bug] Inconsistent results when executing independent sglang functions in different orders

ByronHsu opened this issue 3 months ago

Refine the add request reasons to avoid corner cases.

hnyls2002 opened this pull request 3 months ago

Support min_tokens in sgl.gen

ByronHsu opened this pull request 3 months ago

[Event] Update README.md

Ying1123 opened this pull request 3 months ago

[Bug] `Meta-Llama-3.1-8B-Instruct` triggers "Detected errors during sampling! NaN in the probability." under high concurrency/RPS.

tongyx361 opened this issue 3 months ago

[LoRA, Performance] Speedup multi-LoRA serving - Step 1

Ying1123 opened this pull request 3 months ago

[Minifix] Remove extra space in cot example

FredericOdermatt opened this pull request 3 months ago

Make input_ids a torch.Tensor

merrymercy opened this pull request 3 months ago

Provide an offline engine API

ByronHsu opened this pull request 3 months ago

Use ipc instead of tcp in zmq

merrymercy opened this pull request 3 months ago

[doc] Chinese Documentation Translation Available for sglang

khum08 opened this issue 3 months ago

[Feature] Add `choices` in `/generate` endpoint and add `min_new_tokens` in `sgl.gen()`

TING2938 opened this issue 3 months ago

[Fix] Fix major performance bug in certain cases

Ying1123 opened this pull request 3 months ago

Organize sampling batch info better

merrymercy opened this pull request 3 months ago

Add llama implementation with no tensor parallel linears

jerryzh168 opened this pull request 3 months ago

Print out what the model saw?

cinjon opened this issue 3 months ago

[FP8 KV Cache] Avoid KeyError at loading pre-quantized FP8 model with kv_scale

HaiShaw opened this pull request 3 months ago

[Bug] Exception: Capture cuda graph failed: Triton Error [CUDA]: device kernel image is invalid

a136214808 opened this issue 3 months ago

Move status check in the memory pool to CPU

merrymercy opened this pull request 3 months ago

[Fix] Move ScheduleBatch out of SamplingInfo

merrymercy opened this pull request 3 months ago

[Fix] do not maintain regex_fsm in SamplingBatchInfo

merrymercy opened this pull request 3 months ago

[Performance, Hardware] MoE tuning on AMD MI300x GPUs

kkHuang-amd opened this pull request 3 months ago

[Fix] Fix all the Huggingface paths

tbarton16 opened this pull request 3 months ago

Simplify flashinfer dispatch

hnyls2002 opened this pull request 3 months ago

Llama3.2 vision model support

hnyls2002 opened this pull request 3 months ago

Dispatch flashinfer wrappers

hnyls2002 opened this pull request 3 months ago

[Refactor] Simplify io_struct and tokenizer_manager

Ying1123 opened this pull request 3 months ago

Fix bugs of `logprobs_nums`

hnyls2002 opened this pull request 3 months ago

Organize Attention Backends

hnyls2002 opened this pull request 3 months ago

Support qwen2 vl model

yizhang2077 opened this pull request 3 months ago

[Fix, LoRA] fix LoRA with updates in main

Ying1123 opened this pull request 3 months ago

Clean up batch data structures: Introducing ModelWorkerBatch

merrymercy opened this pull request 3 months ago

Rename InputMetadata -> ForwardBatch

merrymercy opened this pull request 3 months ago

Add support for Molmo-D-7B Model

BabyChouSr opened this pull request 3 months ago

Let ModelRunner take InputMetadata as input, instead of ScheduleBatch

merrymercy opened this pull request 3 months ago

[Refactor] Simplify io_struct and tokenizer_manager

Ying1123 opened this pull request 3 months ago

Process image in parallel

hnyls2002 opened this pull request 3 months ago

Move scheduler code from tp_worker.py to scheduler.py

merrymercy opened this pull request 3 months ago

fix ipv6 url when warm up model

cauyxy opened this pull request 3 months ago

[Fix] Fix AttributeError in Qwen2.5 LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_hidden_dim'

mssongit opened this pull request 3 months ago

[Fix] Fix AttributeError in Qwen2.5(huggingface model) LoRA: 'Qwen2ForCausalLM' object has no attribute 'get_module_name'

mssongit opened this pull request 3 months ago

Improve process creation

merrymercy opened this pull request 3 months ago

[Bug] ValueError: The memory capacity is unbalanced

chuangzhidan opened this issue 3 months ago

Make detokenizer_manager.py not asyncio

merrymercy opened this pull request 3 months ago

Organize image inputs

hnyls2002 opened this pull request 3 months ago

Multiple minor fixes

merrymercy opened this pull request 3 months ago

[Event] Update meeting link

Ying1123 opened this pull request 3 months ago

Add float8 dynamic quant to torchao_utils

jerryzh168 opened this pull request 3 months ago

[Feature] VLLM 6.0 support

arunpatala opened this issue 3 months ago

[Bug] IndexError: list index out of range

lvxianfeng-git opened this issue 3 months ago

[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B

Ying1123 opened this pull request 3 months ago

minor: fix config

hnyls2002 opened this pull request 4 months ago

[Feature] add support for llama 3.2

Stealthwriter opened this issue 4 months ago

[Bug] Unable to use gptq or awq with torch.compile (8*A40)

smallstepman opened this issue 4 months ago

[FIX] Catch syntax error of Regex Guide to avoid crash

du00cs opened this pull request 4 months ago

[bugfix] Add modelscope package to avoid docker image without modelscope

KylinMountain opened this pull request 4 months ago

Accuracy reduction of Lora

yileld opened this issue 4 months ago

Update Dockerfile

KylinMountain opened this pull request 4 months ago

[Bug] no module modelscope using docker compose to start sglang

KylinMountain opened this issue 4 months ago

How to study the code?

TJ949 opened this issue 4 months ago

[Feature] _get_pixel_values needs to return tgt_sizes

huangzl18883 opened this issue 4 months ago

[Fix] Ignore model import error

merrymercy opened this pull request 4 months ago

Release v0.3.2

Ying1123 opened this pull request 4 months ago

Revert "kernel: use tensor cores for flashinfer gqa kernels"

Ying1123 opened this pull request 4 months ago

[Fix] Fix clean_up_tokenization_spaces in tokenizer

merrymercy opened this pull request 4 months ago

[Bug] tensor parallel run error

jerryzh168 opened this issue 4 months ago

Add support for tie_word_embeddings when loading weights + support for SmolLM

TianyiQ opened this pull request 4 months ago

[CI] Update nightly eval

Ying1123 opened this pull request 4 months ago

[Bug] LLaVa-next does not work for single image processing

ThomasBenzshawel opened this issue 4 months ago

AWQ performance tracking

zhyncs opened this issue 4 months ago

Possible timing side-channels caused by shared prefix

Unik-lif opened this issue 4 months ago