github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[help wanted]: rename vllm/logging module to avoid shadowing builtin logging module

youkaichao opened this issue about 1 month ago

[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill

NickLucche opened this pull request about 1 month ago

[Kernel]Enable HPU for Speculative Decoding

xuechendi opened this pull request about 1 month ago

[Mistral] FP8 format

patrickvonplaten opened this pull request about 1 month ago

[Bug]: can not serve microsoft/llava-med-v1.5-mistral-7b

cubense opened this issue about 1 month ago

Prefix Cache Aware Scheduling [1/n]

rickyyx opened this pull request about 1 month ago

[V1][Bugfix] Propagate V1 LLMEngine properly

comaniac opened this pull request about 1 month ago

[Usage]: VLLM failing to stream response after 512+ prompt tokens.

aghbd opened this issue about 1 month ago

[Core] Add padding-aware scheduling for 2D prefills

kzawora-intel opened this pull request about 1 month ago

[Usage]: Engine iteration timed out. (during using qwen2-vl-7b)

HuiyuanYan opened this issue about 1 month ago

[CI/Build] Always run mypy

russellb opened this pull request about 1 month ago

[V1] Allow piecewise cuda graphs to run with custom allreduce

SageMoore opened this pull request about 1 month ago

Fix quantization config of vl model

jinzhen-lin opened this pull request about 1 month ago

[New Model]: dunzhang/stella_en_1.5B_v5

cavities opened this issue about 1 month ago

[Bug]: vllm0.6.3.post1 7B model can not use cmd vllm.entrypoints.openai.api_server on wsl

xiezhipeng-git opened this issue about 2 months ago

[Doc]: follow the doc but got error

husheng-liu opened this issue about 2 months ago

[RFC]: Merge input processor and input mapper for multi-modal models

DarkLight1337 opened this issue about 2 months ago

[Hardware][CPU][torch.compile] integrate torch compile

bigPYJ1151 opened this pull request about 2 months ago

[Bugfix] Make image processor respect `mm_processor_kwargs` for Qwen2-VL

li-plus opened this pull request about 2 months ago

[Bug]: When apply continue_final_message for OpenAI server, the `"echo":false` is ignored.

DIYer22 opened this issue about 2 months ago

[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target

bigPYJ1151 opened this pull request about 2 months ago

[Hardware][XPU] AWQ/GPTQ support for xpu backend

yma11 opened this pull request about 2 months ago

[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark.

spliii opened this pull request about 2 months ago

[Bug]: Engine loop has died for Meta-Llama-3.1-8B-Instruct TP=2

HaoyuWang4188 opened this issue about 2 months ago

[V1][BugFix] Fix Generator construction in greedy + seed case

njhill opened this pull request about 2 months ago

Add hf_transfer to testing image

mgoin opened this pull request about 2 months ago

[Kernel]Generalize Speculative decode from Cuda

xuechendi opened this pull request about 2 months ago

[Usage]: disable pydantic request validation

matbee-eth opened this issue about 2 months ago

Splitting attention kernel file

maleksan85 opened this pull request about 2 months ago

[Misc] Improve Web UI

rafvasq opened this pull request about 2 months ago

[Feature]: Enhance integration with advanced LB/gateways with better load/cost reporting and LoRA management

liu-cong opened this issue about 2 months ago

[CI/Build] Automate PR body text cleanup

russellb opened this pull request about 2 months ago

[Bug]:Structured outputs inference often took a very long time,and eventually causing a timeout and vLLM engine crushing.

hpx502766238 opened this issue about 2 months ago

[Feature]: Add Gamma Distribution Request Support for Serving Benchmark.

spliii opened this issue about 2 months ago

[Performance]: Throughput and Latency degradation with a single LoRA adapter on A100 40 GB

kaushikmitr opened this issue about 2 months ago

[Core] Add dynamic chunk size calculation

prashantgupta24 opened this pull request about 2 months ago

[Build] Fix for the Wswitch-bool clang warning

gshtras opened this pull request about 2 months ago

[Doc] Updated TPU install instructions

mikegre-google opened this pull request about 2 months ago

[Kernel] Refactor Cutlass c3x

varun-sundar-rabindranath opened this pull request about 2 months ago

[Benchmark] guided decoding

aarnphm opened this pull request about 2 months ago

[0/N] Rename `MultiModalInputs` to `MultiModalKwargs`

DarkLight1337 opened this pull request about 2 months ago

[Bug]: PyTorch 2.5.x vLLM 1.0.0 dev issue with tensor parallel size > 1

CortexEdgeUser opened this issue about 2 months ago

Online video support for VLMs

litianjian opened this pull request about 2 months ago

Adding cascade inference to vLLM

raywanb opened this pull request about 2 months ago

[WIP] Ray Backend V1

rkooo567 opened this pull request about 2 months ago

[Bugfix] Upgrade to pytorch 2.5.1

bnellnm opened this pull request about 2 months ago

[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined.

gcalmettes opened this pull request about 2 months ago

[Doc] Update VLM doc about loading from local files

ywang96 opened this pull request about 2 months ago

[Bug]: last_token_time is equal to arrival_time

wolfgangsmdt opened this issue about 2 months ago

[Misc] Modify BNB parameter name

jeejeelee opened this pull request about 2 months ago

[Core] Enhance memory profiling in determine_num_available_blocks with error handling and fallback

Ahmed14z opened this pull request about 2 months ago

[Bug]: For speculative decoding with a draft model, the "determine_num_available_blocks" only considers the memory usage of the target model

hustxiayang opened this issue about 2 months ago

[Core] Use os.sched_yield in ShmRingBuffer instead of time.sleep

tlrmchlsmth opened this pull request about 2 months ago

[Bug]: Segment fault when import decord before import vllm

litianjian opened this issue about 2 months ago

[Performance]: FP8 performance worse than FP16 for Qwen2-VL-2B-Instruct

LinJianping opened this issue about 2 months ago

[Bug]: Llama3.2 tool calling OpenAI API not working

SinanAkkoyun opened this issue about 2 months ago

[Bug]: I cannot able to load the model on TESLA T4 GPU in Full precision

VpkPrasanna opened this issue about 2 months ago

[Bug]: internvl “max_dynamic_patch” not work, and add_special_tokens bug

wangpeng138375 opened this issue about 2 months ago

[Bug]: [Regression Issue] The output from Qwen2 VL are different between vLLM v0.6.3-post1 and vLLM v0.6.1-post2

tjtanaa opened this issue about 2 months ago

[Misc]Reduce BNB static variable

jeejeelee opened this pull request about 2 months ago

[Bug]: Deploying glm4 reported an error："auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set

shnyyds opened this issue about 2 months ago

[Usage]: Are there any batch size requirements for offline batch inference? For example, is 10,000 okay?

joyyyhuang opened this issue about 2 months ago

[Bugfix] Fix E2EL mean and median stats

daitran2k1 opened this pull request about 2 months ago

[5/N] pass the whole config to model

youkaichao opened this pull request about 2 months ago

[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers

sroy745 opened this pull request about 2 months ago

[Installation]: Model Architectures FalconMambaForCasualLM are not supported for now.

RohithDAces opened this issue about 2 months ago

[4/N] make quant config first-class citizen

youkaichao opened this pull request about 2 months ago

[Feature]: do you plan to support "suffix" of "v1/completions"

qiao-wei opened this issue about 2 months ago

[Bugfix][OpenVINO] Fix circular reference #9939

MengqingCao opened this pull request about 2 months ago

[Bugfix] Fix `MQLLMEngine` hanging

robertgshaw2-neuralmagic opened this pull request about 2 months ago

[V1] Prefix caching (take 2)

comaniac opened this pull request about 2 months ago

[Doc] correct schema in example batch jsonl file: max_completion_tokens -> max_tokens

staeiou opened this pull request about 2 months ago

[CI] Basic Integration Test For TPU

robertgshaw2-neuralmagic opened this pull request about 2 months ago

[Usage]: How to use `llava-hf/llava-1.5-7b-hf` with bitsandbytes quantization in `vllm serve`?

asadfgglie opened this issue about 2 months ago

../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.

Wiselnn570 opened this issue about 2 months ago

[Bug]: ValueError:Could not broadcast input array from shape (542,) into shape (512,)

sherlockma11 opened this issue about 2 months ago

[help wanted]: fix broken xverse model

youkaichao opened this issue about 2 months ago

[Hardware][CPU] Add ARM CPU backend

ShawnD200 opened this pull request about 2 months ago

[BugFix]: properly deserialize `tool_calls` iterator before processing by mistral-common when MistralTokenizer is used

gcalmettes opened this pull request about 2 months ago

[V1][VLM] Enable proper chunked prefill for multimodal models

ywang96 opened this pull request about 2 months ago

[Bugfix] Fix Phi-3 BNB quantization with tensor parallel

Isotr0py opened this pull request about 2 months ago

[V1] Support per-request seed

njhill opened this pull request about 2 months ago

[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1

FurtherAI opened this pull request about 2 months ago

[Doc] Add documentation for Structured Outputs

ismael-dm opened this pull request about 2 months ago

Bump the patch-update group with 3 updates

dependabot[bot] opened this pull request about 2 months ago

[Core]Add New Run:ai Streamer Load format.

pandyamarut opened this pull request about 2 months ago

[CI] Prune tests/models/decoder_only/language/* tests

mgoin opened this pull request about 2 months ago

[Bug]: from vllm.platforms import current_platform infinite loop error with OpenVino Build.

CalebXDonoho opened this issue about 2 months ago

[Bug]: Phi-3 cannot be used with bitsandbytes

yananchen1989 opened this issue about 2 months ago

[CI] Prune down LM Eval test time

mgoin opened this pull request about 2 months ago

[ci/build] Have dependabot ignore pinned dependencies

khluu opened this pull request about 2 months ago

[CI] Prune back the number of tests in tests/kernels/*

mgoin opened this pull request about 2 months ago

[Bugfix] Fix pickle of input when async output processing is on

wallashss opened this pull request about 2 months ago

[misc] Allow partial prefix benchmarking & random input generation for prefix benchmarking

rickyyx opened this pull request about 2 months ago

Doc: Improve benchmark documentation

rafvasq opened this pull request about 2 months ago

[RFC] Propose a vulnerability management team

russellb opened this pull request about 2 months ago

[Doc] Move CONTRIBUTING to docs site

russellb opened this pull request about 2 months ago

[Frontend] Automatic detection of chat content format from AST

DarkLight1337 opened this pull request about 2 months ago

[Bug]: illegal memory access error when using prefix caching

StevenTang1998 opened this issue about 2 months ago

[Bugfix] Fix MiniCPMV and Mllama BNB bug

jeejeelee opened this pull request about 2 months ago