github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Bug]: Error when deploying MiniCPM-V_2_6_awq_int4 with vllm 0.6.4

fengqiliang93 opened this issue 10 days ago

[Bugfix] Fix none seed sampling in rejection_sampler

TopIdiot opened this pull request 10 days ago

[Bug]: N-gram speculative decoding gives wrong output when some of the seeds in a batch are None

TopIdiot opened this issue 10 days ago

[Bug]: InternVL2-Llama3-76B-AWQ RUN ERROR KeyError: 'layers.39.mlp.gate_up_proj.qweight'

Oldpan opened this issue 10 days ago

[Misc][LoRA] Ensure Lora Adapter requests return adapter name

Jeffwan opened this pull request 11 days ago

[Bug]: lora adapter request still return the base model name

Jeffwan opened this issue 11 days ago

[Misc]: Potential division by zero in csrc/cpu/attention.cpp

Xaenalt opened this issue 12 days ago

[Bug]: Actively generated `request` is starved when new requests arrive (tensor parallel)

llmwiz opened this issue 12 days ago

[Misc]: Brand guidelines around vLLM logo; is there a media kit that can be downloaded with brand assets?

jessicachitas opened this issue 12 days ago

[RFC]: Adding support for Geospatial models

christian-pinto opened this issue 12 days ago

[Bug]: When I use llmcompressor to quantize the llama3 70b model to int8-a8w8, it shows ValueError: Failed to invert hessian due to numerical instability.

rexmxw02 opened this issue 12 days ago

[Hardware][CPU] support cpu in v1 engine

yma11 opened this pull request 12 days ago

[Performance]: It takes over 20 hours to quantize llama3-70B with w8a8; does this meet expectations?

moonlightian opened this issue 12 days ago

[V1][Bugfix] Always set enable_chunked_prefill = True for V1

WoosukKwon opened this pull request 12 days ago

Don't try to add special tokens to the matcher in XGrammar.

sjuxax opened this pull request 12 days ago

[torch.compile] add a flag to track batchsize statistics

youkaichao opened this pull request 12 days ago

[Bugfix] Fix value unpack error of simple connector for KVCache transfer.

ShangmingCai opened this pull request 12 days ago

make `fused_moe_kernel`'s `EM` and `num_valid_tokens` arguments `do_not_specialize`

JiayiFeng opened this pull request 12 days ago

[Performance]: Arguments `EM` and `num_valid_tokens` of `fused_moe_kernel` should be set to `do_not_specialize`

JiayiFeng opened this issue 12 days ago

[Bug, V1]: Service launch failed with v1 code and custom models

PYNing opened this issue 12 days ago

[Bug]: bug when using 8*GPU, Error while creating shared memory segment

Tan-Hexiang opened this issue 12 days ago

[CI/Build] Increase VLLM_MAX_SIZE_MB to 300M

tolak opened this pull request 12 days ago

[Performance]: TGI processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config

paulcx opened this issue 12 days ago

[Feature]: logging request_id instead of random uuid

cynial opened this issue 12 days ago

[Usage]: How to specify the local storage path for models downloaded by vllm?

MiDonkey opened this issue 12 days ago

[CI] Expand OpenAI guided decoding tests

mgoin opened this pull request 12 days ago

[Bugfix] cuda error running llama 3.2

GeneDer opened this pull request 12 days ago

[Bugfix] Fix guided decoding with tokenizer mode mistral

wallashss opened this pull request 12 days ago

[Bug]: Guided decoding crashing when tokenizer_mode is set to mistral

wallashss opened this issue 12 days ago

[Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF.

sjuxax opened this pull request 12 days ago

[Pixtral] Improve loading

patrickvonplaten opened this pull request 12 days ago

[Bugfix] Handle <|tool_call|> token in granite tool parser

tjohnson31415 opened this pull request 13 days ago

[Bugfix] Backport request id validation to v0

joerunde opened this pull request 13 days ago

Update README.md

dmoliveira opened this pull request 13 days ago

[Kernel] Triton Paged Attn Decode Kernel

rahulbatra85 opened this pull request 13 days ago

[V1] Use input_ids as input for text-only models

WoosukKwon opened this pull request 13 days ago

monitor metrics of tokens per step using cudagraph batchsizes

youkaichao opened this pull request 13 days ago

[Hardware][Gaudi] Add multiprocessing HPU executor

kzawora-intel opened this pull request 13 days ago

[Frontend] Add OpenAI API support for input_audio

kylehh opened this pull request 13 days ago

[Bugfix] Fix usage of `deprecated` decorator

DarkLight1337 opened this pull request 13 days ago

[Model] Add Llama-SwiftKV model

aurickq opened this pull request 13 days ago

[BUG] Remove token param #10921

flaviabeo opened this pull request 13 days ago

[V1] VLM preprocessor hashing

alexm-neuralmagic opened this pull request 13 days ago

Avoid mistakenly picking Gaudi/HPU if XPU is requested.

janimo opened this pull request 13 days ago

[Misc]: Has anyone tried to run Microsoft Graphrag with vllm?

SushmitaSingh96 opened this issue 13 days ago

[Neuron] Upgrade neuron to 2.20.2

xendo opened this pull request 13 days ago

[Performance]: Is it normal for sampling to take up most of the time during the execution of one iteration?

oldcpple opened this issue 13 days ago

[Usage]: Multiple rounds of image dialogue support?

qingchen177 opened this issue 13 days ago

[torch.compile] add dynamo time tracking

youkaichao opened this pull request 13 days ago

[Performance]: Why is pipeline-parallel performance severely degraded when using offline batching?

zhaocaibei123 opened this issue 13 days ago

[Misc][LoRA] Add PEFTHelper for LoRA

jeejeelee opened this pull request 13 days ago

[Feature]: Support for Qwen2-VL on AWS Neuron

Chin-Vic opened this issue 13 days ago

[Feature]: Use Block Group for KV cache allocation in FastSwitch to support better I/O usage

aoshen524 opened this issue 13 days ago

[v1] fix use compile sizes

youkaichao opened this pull request 13 days ago

[misc] clean up and unify logging

youkaichao opened this pull request 13 days ago

[Doc][V1] Add V1 support column for multimodal models

ywang96 opened this pull request 13 days ago

[V1] Fix Detokenizer loading in `AsyncLLM`

ywang96 opened this pull request 13 days ago

[core] clean up cudagraph batchsize padding logic

youkaichao opened this pull request 13 days ago

[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support

dsikka opened this pull request 14 days ago

[Usage]: Qwen/Qwen2-VL-7B-Instruct

mahmoudelnazer opened this issue 14 days ago

[torch.compile][misc] fix comments

youkaichao opened this pull request 14 days ago

[Model] PP support for Mamba-like models

mzusman opened this pull request 14 days ago

[CI/Build] Check transformers v4.47

DarkLight1337 opened this pull request 14 days ago

[Doc]: ValueError: Model architectures ['Qwen2ForSequenceClassification'] are not supported for no

skywindy opened this issue 14 days ago

[V1] Further reduce CPU overheads in flash-attn

WoosukKwon opened this pull request 14 days ago

[V1][VLM] Add V1-rearch image inference support for Qwen2-VL

ywang96 opened this pull request 14 days ago

[Bug]: Qwen2VL doesn't work with TPU backend

carlesoctav opened this issue 14 days ago

[core][distributed] initialization from StatelessProcessGroup

youkaichao opened this pull request 14 days ago

[Doc] Update README.md

habaohaba opened this pull request 14 days ago

[torch.compile] allow candidate compile sizes

youkaichao opened this pull request 14 days ago

[Bug]: LLama 3.2 vision focuses only on first image

hrodruck opened this issue 15 days ago

Update benchmarking code

Faraz9877 opened this pull request 15 days ago

[Bug]: Why is structured output in 0.6.4.post1 overflowing my RAM but 0.6.3.post1 has a workaround?

Leon-Sander opened this issue 15 days ago

[V1][WIP] V1 sampler implements parallel sampling (PR 1/N for parallel sampling support)

afeldman-nm opened this pull request 15 days ago

[Bugfix] Multiple fixes to tool streaming with hermes and mistral

cedonley opened this pull request 15 days ago

[Doc] Explicitly state that InternVL 2.5 is supported

DarkLight1337 opened this pull request 15 days ago

[Model] Implement merged input processor for Phi-3-Vision models

Isotr0py opened this pull request 15 days ago

[core][executor] simplify instance id

youkaichao opened this pull request 15 days ago

[Doc] Explicitly state that PP isn't compatible with speculative decoding yet

DarkLight1337 opened this pull request 15 days ago

[Usage]: How to run local model in docker with cpu

yuzifu opened this issue 15 days ago

[Bugfix] Fix test-pipeline.yaml

jeejeelee opened this pull request 15 days ago

[torch.compile] use depyf to dump torch.compile internals

youkaichao opened this pull request 15 days ago

[Bug]: vLLM CPU mode only uses a single core on a multi-core CPU

fzyzcjy opened this issue 15 days ago

[Bug]: embedding model not supported

cosmic-chichu opened this issue 15 days ago

[Bug]: ngram Speculation for LlamaForCausalLM Models Fails due to Sampler

avnukala opened this issue 15 days ago

[Frontend] Use request id from header

joerunde opened this pull request 15 days ago

[Misc]: Saved sharded state should also include GPU P2P access cache

k4rth33k opened this issue 15 days ago

[Usage]: Unable to serve embedding model e5-mistral-7b-instruct

SushmitaSingh96 opened this issue 16 days ago

[Core] Add support for loading weight that has already done TP sharding

HollowMan6 opened this pull request 16 days ago

[New Model]: Add support for Llama3.3

jorgeantonio21 opened this issue 16 days ago

[Bug]: Can't load/compile Mixtral-8x7B-Instruct-v0.1 on TPU

hosseinsarshar opened this issue 16 days ago

[V1] Input Batch Relocation

varun-sundar-rabindranath opened this pull request 16 days ago

[Core] Cleanup startup logging a bit

russellb opened this pull request 16 days ago

[misc] fix typo

youkaichao opened this pull request 16 days ago

[V1] Run mypy on

WoosukKwon opened this pull request 16 days ago

[Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation

Isotr0py opened this pull request 16 days ago

[V1] LoRA Support

varun-sundar-rabindranath opened this pull request 16 days ago

[ci] fix broken tests

youkaichao opened this pull request 16 days ago

[Misc][LoRA] Abstract PunicaWrapper

jeejeelee opened this pull request 16 days ago

[Bug]: Function calling not working properly for Qwen2.5-Coder models

wizche opened this issue 16 days ago