github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

Support softcap in ROCm Flash Attention

hliuca opened this pull request about 1 month ago

[CI/Build] Dockerfile build for ARM64 / GH200

drikster80 opened this pull request about 1 month ago

[Bugfix] GPU memory profiling should be per LLM instance

tjohnson31415 opened this pull request about 1 month ago

[Frontend] Add Command-R and Llama-3 chat template

ccs96307 opened this pull request about 1 month ago

[Misc] Increase default video fetch timeout

DarkLight1337 opened this pull request about 1 month ago

[Bugfix] Embedding model pooling_type equals ALL and multi input's bug

BBuf opened this pull request about 1 month ago

[Bug]: Error when calling vLLM with audio input using Qwen/Qwen2-Audio-7B-Instruct model

jiahansu opened this issue about 1 month ago

[V1] Replace traversal search with lookup table

Abatom opened this pull request about 1 month ago

[Bugfix] Handle transformers v4.47 and fix placeholder matching in merged multi-modal processors

DarkLight1337 opened this pull request about 1 month ago

Add support for reporting metrics in completion response headers in o…

coolkp opened this pull request about 1 month ago

[torch.compile] limit inductor threads and lazy import quant

youkaichao opened this pull request about 1 month ago

[Usage]: VSCode debugger is hanging

jeejeelee opened this issue about 1 month ago

[Bug]: vLLM CPU mode broken Unable to get JIT kernel for brgemm

samos123 opened this issue about 1 month ago

[Usage]: Can't use vllm on a multi-GPU node

4k1s opened this issue about 1 month ago

[Misc] Add multistep chunked-prefill support for FlashInfer

elfiegg opened this pull request about 1 month ago

[Bugfix]: allow extra fields in requests to openai compatible server

gcalmettes opened this pull request about 1 month ago

[Core] Add Sliding Window Support with Flashinfer

pavanimajety opened this pull request about 1 month ago

[Bugfix] Fix the LoRA weight sharding in ColumnParallelLinearWithLoRA

jeejeelee opened this pull request about 1 month ago

[Pixtral-Large] Pixtral actually has no bias in vision-lang adapter

patrickvonplaten opened this pull request about 1 month ago

[misc][plugin] improve plugin loading

youkaichao opened this pull request about 1 month ago

[Bug]: Speculative decoding + guided decoding not working

arunpatala opened this issue about 1 month ago

[CI][CPU] adding numa node number as container name suffix

zhouyuan opened this pull request about 1 month ago

[Bug]: Input prompt (35247 tokens) is too long and exceeds limit of 1000

Crista23 opened this issue about 1 month ago

[Bug]: Unable to run Qwen2.5-0.5B-Instruct model in v0.6.4.post1 version, Error: No available memory for the cache blocks

Valdanitooooo opened this issue about 1 month ago

[Misc] Avoid misleading warning messages

jeejeelee opened this pull request about 1 month ago

[6/N] torch.compile rollout to users

youkaichao opened this pull request about 1 month ago

[ci/build] Have dependabot ignore all patch update

khluu opened this pull request about 1 month ago

Compressed tensors w8a8 tpu

robertgshaw2-neuralmagic opened this pull request about 1 month ago

[CI/Build] Update Dockerfile.rocm

Alexei-V-Ivanov-AMD opened this pull request about 1 month ago

Add openai.beta.chat.completions.parse example to structured_outputs.rst

mgoin opened this pull request about 1 month ago

[Bug]: vllm server crash when num-scheduler-steps > 1 and max_tokens=0

atanikan opened this issue about 1 month ago

[ci][bugfix] fix kernel tests

youkaichao opened this pull request about 1 month ago

[Bugfix] Guard for negative counter metrics to prevent crash

tjohnson31415 opened this pull request about 1 month ago

[Bug]: rocm issue

YYXLN opened this issue about 1 month ago

[Doc]: Pages were moved without a redirect

shannonxtreme opened this issue about 1 month ago

[Doc]: Migrate to Markdown

rafvasq opened this issue about 1 month ago

Fix open_collective value in FUNDING.yml

andrew opened this pull request about 1 month ago

[Doc] Update doc for LoRA support in GLM-4V

B-201 opened this pull request about 1 month ago

[CI/Build] Support compilation with local cutlass path (#10423)

wchen61 opened this pull request about 1 month ago

[Feature]: Add Support for Specifying Local CUTLASS Source Directory via Environment Variable

wchen61 opened this issue about 1 month ago

[Misc] Reduce medusa weight

skylee-01 opened this pull request about 1 month ago

Fix: Build error seen on Power Architecture

mikejuliet13 opened this pull request about 1 month ago

[Model][LoRA]LoRA support added for glm-4v

B-201 opened this pull request about 1 month ago

[Bugfix]Fix Phi-3 BNB online quantization

jeejeelee opened this pull request about 1 month ago

[Bug]: Encountered issues when deploying Llama-3.2-11B-Vision-Instruct for online inference.

CapitalLiu opened this issue about 1 month ago

[Model] Remove transformers attention porting in VITs

Isotr0py opened this pull request about 1 month ago

Bump the patch-update group with 2 updates

dependabot[bot] opened this pull request about 1 month ago

[core] Bump ray to use _overlap_gpu_communication in compiled graph tests

ruisearch42 opened this pull request about 1 month ago

[Bug]: (Program crashes after increasing --tensor-parallel-size) with error pynvml.NVMLError_InvalidArgument: Invalid Argument

JohnConnor123 opened this issue about 1 month ago

[Bug]: Deploying Qwen2vl with vllm and transformer gives inconsistent outputs for the same image

Apricot1225 opened this issue about 1 month ago

[5/N][torch.compile] torch.jit.script --> torch.compile

youkaichao opened this pull request about 1 month ago

[Model][Bugfix] Support TP for PixtralHF ViT

mgoin opened this pull request about 1 month ago

[platforms] refactor cpu code

youkaichao opened this pull request about 1 month ago

[4/N][torch.compile] clean up set_torch_compile_backend

youkaichao opened this pull request about 1 month ago

Support Cross encoder models

maxdebayser opened this pull request about 1 month ago

[3/N][torch.compile] consolidate custom op logging

youkaichao opened this pull request about 1 month ago

[BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395)

xiyuan-lee opened this pull request about 1 month ago

[V1] Add code owners for V1

WoosukKwon opened this pull request about 1 month ago

[BugFix] Fix hermes tool parser output error stream arguments in some cases

xiyuan-lee opened this pull request about 1 month ago

[Bug]: Hermes tool parser output error stream arguments in some cases.

xiyuan-lee opened this issue about 1 month ago

[Bugfix][Hardware][CPU] Fix CPU embedding runner with tensor parallel

Isotr0py opened this pull request about 1 month ago

[Misc] Enhance offline_inference to support user-configurable paramet…

wchen61 opened this pull request about 1 month ago

[Feature]: Enhance offline_inference.py with Configurable Parameters for Greater Flexibility

wchen61 opened this issue about 1 month ago

Add ngram speculation to API

flozi00 opened this pull request about 1 month ago

[Bug]: v0.6.4.post1 crashed: Error in model execution: CUDA error: an illegal memory access was encountered

wciq1208 opened this issue about 1 month ago

[Bugfix] Fix M-RoPE position calculation when chunked prefill is enabled

imkero opened this pull request about 1 month ago

[Misc]: Ask for the roadmap of async output processing support for speculative decoding

Lin-Qingyang-Alec opened this issue about 1 month ago

[misc][plugin] improve log messages

youkaichao opened this pull request about 1 month ago

[BugFix] [Kernel] Fix GPU SEGV occurring in fused_moe kernel

rasmith opened this pull request about 1 month ago

[CI/Build] Fix IDC hpu [Device not found] issue

xuechendi opened this pull request about 1 month ago

[2/N][torch.compile] make compilation cfg part of vllm cfg

youkaichao opened this pull request about 1 month ago

[v1] V1EngineArgs for better config handling

rickyyx opened this pull request about 1 month ago

[BugFix] [Kernel] Fix GPU SEGV occurring in fused_moe kernel

rasmith opened this pull request about 1 month ago

Fix integer overflow causing gpu segfault

rasmith opened this pull request about 1 month ago

[Bug]: Granite 3.0 disconnect between parser and example template

wilbry opened this issue about 1 month ago

Test k8s agent

dhonnappa-amd opened this pull request about 1 month ago

[Feature]: NVIDIA Triton GenAI Perf Benchmark

simon-mo opened this issue about 1 month ago

[Bug]: Guided Decoding Broken in Streaming mode

JC1DA opened this issue about 1 month ago

[Bugfix] Ignore ray reinit error when current platform is ROCm or XPU

HollowMan6 opened this pull request about 1 month ago

[V1] Refactor model executable interface for all text-only language models

ywang96 opened this pull request about 1 month ago

[Bug]: VLM benchmark_serving request not working

gracehonv opened this issue about 1 month ago

[doc] add doc for the plugin system

youkaichao opened this pull request about 1 month ago

[Doc] Add the start of an arch overview page

russellb opened this pull request about 1 month ago

[CI/Build] Add sphinx/rst linter for docs

rafvasq opened this pull request about 1 month ago

[Misc] Medusa supports custom bias

skylee-01 opened this pull request about 1 month ago

[Bug]: continue generation but do not return the output

siyuyuan opened this issue about 1 month ago

[Platform][Refactor] Extract func `get_default_attn_backend` to `Platform`

MengqingCao opened this pull request about 1 month ago

[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU

bigPYJ1151 opened this pull request about 1 month ago

Add KV-Cache int8 quant support

YanyunDuanIEI opened this pull request about 1 month ago

[Core] Interface for accessing model from engine

DarkLight1337 opened this pull request about 1 month ago

[Bugfix] Fix fully sharded LoRA bug

jeejeelee opened this pull request about 1 month ago

[Misc] Consolidate pooler config overrides

DarkLight1337 opened this pull request about 1 month ago

[Bugfix] Qwen-vl output is inconsistent in speculative decoding

skylee-01 opened this pull request about 1 month ago

[Misc] Fix import error in tensorizer tests and cleanup some code

DarkLight1337 opened this pull request about 1 month ago

[Doc] Remove float32 choice from --lora-dtype

xyang16 opened this pull request about 1 month ago

Add default value to avoid Falcon crash (#5363)

wchen61 opened this pull request about 1 month ago

[DRAFT] Cutlass 2:4

robertgshaw2-neuralmagic opened this pull request about 1 month ago

[Usage]: cuda oom when serving multi task on same server

reneix opened this issue about 1 month ago

[Misc]: Snowflake Arctic out of memory error with TP-8

rajagond opened this issue about 1 month ago

[Feature]: Allow head_size smaller than 128 on TPU with Pallas backend

manninglucas opened this issue about 1 month ago