github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Core]: (2/N) Support prefill only models by Workflow Defined Engine - Prefill only scheduler

noooop opened this pull request 2 months ago

[Installation]: Pytorch nightly version 2.6 meets error: error: can't copy '/tmp/tmpv5hlsgcm.build-lib/vllm/_core_C.abi3.so': doesn't exist or not a regular file

shaoyuyoung opened this issue 3 months ago

[Bugfix] Fix lora loading for Compressed Tensors in #9120

fahadh4ilyas opened this pull request 3 months ago

[TPU] Fix memory profiling

WoosukKwon opened this pull request 3 months ago

[Bug]: quantization does not work with dummy weight format

youkaichao opened this issue 3 months ago

[Bug]: Extremely low throughput when using pipeline parallelism when batch size (running requests) is small

AlvL1225 opened this issue 3 months ago

[Bug]: Error Running Qwen2.5-7B-Instruct on CPU

xiayouran opened this issue 3 months ago

[Model] Remap FP8 kv_scale in CommandR and DBRX

hliuca opened this pull request 3 months ago

Update link to KServe deployment guide

terrytangyuan opened this pull request 3 months ago

[Bug]: Port binding keeps failing due to unnecessary code

James4Ever0 opened this issue 3 months ago

Add classifiers in setup.py

terrytangyuan opened this pull request 3 months ago

[Doc] Fix VLM prompt placeholder sample bug

ycool opened this pull request 3 months ago

[Usage]: due to large max_mm_tokens, number of images that multimodal models can support is underestimated

SepehrV opened this issue 3 months ago

[Bug]: vLLM OpenAI-api server `/docs` endpoint fails to load

mgoin opened this issue 3 months ago

[Misc] Improve validation errors around best_of and n

tjohnson31415 opened this pull request 3 months ago

[WIP] Prototyping re-arch

WoosukKwon opened this pull request 3 months ago

[ci][test] use load dummy for testing

youkaichao opened this pull request 3 months ago

[Feature]: Enabling MSS for larger number of sequences (>256)

kushanam opened this issue 3 months ago

[Performance]: Llama-3.2-11B-Vision-Instruct taking up a lot of memory

pbarker opened this issue 3 months ago

mypy: check additional directories

russellb opened this pull request 3 months ago

Add `lm-eval` directly to requirements-test.txt

mgoin opened this pull request 3 months ago

[Bugfix] Optimize composite weight loading and fix EAGLE weight loading

DarkLight1337 opened this pull request 3 months ago

[Bugfix][Doc] Report neuron error in output

joerowell opened this pull request 3 months ago

[Misc]: How to set num-scheduler-steps

o1iv3r opened this issue 3 months ago

[Usage]: Multi-gpu inference takes too much memory + how to make uneven load

Ouna-the-Dataweaver opened this issue 3 months ago

[Misc]: Segmentation Fault in vLLM API Server during Model Initialization (NCCL Error: Unhandled System Error)

shreyasp-07 opened this issue 3 months ago

[Doc] Update vlm.rst to include an example on videos

sayakpaul opened this pull request 3 months ago

[Frontend][Feature] Add jamba tool parser

tomeras91 opened this pull request 3 months ago

[Bug]: InternVL bounding box prediction does not work

MoritzLaurer opened this issue 3 months ago

[Bug]: Can not pip install vllm inside docker

fahadh4ilyas opened this issue 3 months ago

[Frontend] Add Early Validation For Chat Template / Tool Call Parser

alex-jw-brooks opened this pull request 3 months ago

[Misc]: Nobody reviews my PR

CharlesRiggins opened this issue 3 months ago

[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1

sroy745 opened this pull request 3 months ago

support bitsandbytes quantization with more models

chenqianfzh opened this pull request 3 months ago

[Neuron] Introduce paged attention support for neuron backend

liangfu opened this pull request 3 months ago

[Bug]: vllm much slower on long context inputs when using --enable-lora even when lora is not used

badrjd opened this issue 3 months ago

[Bugfix] Fix crashing for multimodal when image passed with height == 1

Pernekhan opened this pull request 3 months ago

[torch.compile] Fuse RMSNorm with quant

ProExpertProg opened this pull request 3 months ago

[Bug]: Unable to use --enable-lora on latest vllm docker container (v0.6.2)

noelo opened this issue 3 months ago

[Doc] Improve contributing and installation documentation

rafvasq opened this pull request 3 months ago

[Core][Frontend] Add Support for Inference Time mm_processor_kwargs

alex-jw-brooks opened this pull request 3 months ago

[CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04

mgoin opened this pull request 3 months ago

[Bug]: assert len(indices) == len(inputs) with `Qwen/Qwen2-VL-2B-Instruct`

sayakpaul opened this issue 3 months ago

[Bug]: Error Encountered in vLLM Benchmarking with Input Length greater than 8192 in Llama 3.1 405B Model

Bihan opened this issue 3 months ago

[Usage]: Not getting the inference metrics in the API response

vverma01232 opened this issue 3 months ago

[New Model]: silma-ai/SILMA-9B-Instruct-v1.0

hassanraha opened this issue 3 months ago

[Core]: (1/N) Support prefill only models by Workflow Defined Engine - Prefill only attention

noooop opened this pull request 3 months ago

[Bugfix][Core] Handle empty ids_list in BlockSpaceManagerV1.get_common_computed_block_ids to prevent msgspec serialization errors

amberOoO opened this pull request 3 months ago

[Bug] BlockSpaceManagerV1.get_common_computed_block_ids returns empty string, causing msgspec decode failure

amberOoO opened this issue 3 months ago

[OpenVINO] Use torch 2.4.0 and newer optimum version

ilya-lavrenov opened this pull request 3 months ago

[Bug]: Unsupported base layer: QKVParallelLinear when loading lora to a quantized model

fahadh4ilyas opened this issue 3 months ago

[Bug]: Installation from last commit (version wrong)

johnnynunez opened this issue 3 months ago

[Bug]: Issue Running VLLM Open AI using nonroot user in K8s

luhurfth opened this issue 3 months ago

[Frontend] API support for beam search for MQLLMEngine

LunrEclipse opened this pull request 3 months ago

[Bugfix][Hardware] Fix model input for decode

yma11 opened this pull request 3 months ago

[Usage]: How to run llama 3.2 with CPU only version

chanandrew96 opened this issue 3 months ago

[Bug] In v0.6.2, when tp=1, TPOT becomes very slow for batch sizes of 10 or so (did not happen in v0.5.5)

ashgold opened this issue 3 months ago

[Feature]: Does vLLM support ONNX models?

LetianLee opened this issue 3 months ago

[Bug]: AMD MultiStep Feature Issue. Missing argument: 'turn_prefills_into_decodes' in `advance_step()`

tjtanaa opened this issue 3 months ago

[Feature]: LLMEngine and ModelConfig explicitly require path or HF model id, but no InferenceClient class for locally running VLLM server

DanielViglione opened this issue 3 months ago

support jetson AGX Orin

johnnynunez opened this pull request 3 months ago

[Model] Explicit interface for vLLM models and support OOT embedding models

DarkLight1337 opened this pull request 3 months ago

[Usage]: The chat endpoint has a problem; the completion endpoint works normally

cdhx opened this issue 3 months ago

[core] remove beam search from the core

youkaichao opened this pull request 3 months ago

[Misc] Remove user-facing error for removed VLM args

DarkLight1337 opened this pull request 3 months ago

[BugFix][Core] Fix BlockManagerV2 when Encoder Input is None

sroy745 opened this pull request 3 months ago

[torch.compile] register blocksparse attention

youkaichao opened this pull request 3 months ago

[Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model

tjtanaa opened this pull request 3 months ago

[Bug]: Try-catch conditions are incorrect to import correct ROCm Flash Attention Backend in Draft Model

tjtanaa opened this issue 3 months ago

[Bug]: Llama-3.2-11B-Vision-Instruct which is an encoder-decoder model fails with BlockManager V2

sroy745 opened this issue 3 months ago

[RFC]: hide continuous batching complexity through forward context

youkaichao opened this issue 3 months ago

[core] use forward context for flash infer

youkaichao opened this pull request 3 months ago

[Bug]: vllm serve Exception in ASGI application

SpaceHunterInf opened this issue 3 months ago

[Model] Make llama3.2 support multiple and interleaved images

xiangxu-google opened this pull request 3 months ago

[Bug]: VLLM Model Fails on Kubernetes with "CUDA error: operation not permitted when stream is capturing"

CREESTL opened this issue 3 months ago

[Bugfix] limit lora init id greater than 0

Ssunbell opened this pull request 3 months ago

[Installation]: cannot install vllm with openvino backend

guanxiang opened this issue 3 months ago

[Bug]: Qwen2-VL model support

kulievvitaly opened this issue 3 months ago

[Model] PP support for embedding models and update docs

DarkLight1337 opened this pull request 3 months ago

[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend

Isotr0py opened this pull request 3 months ago

[Doc] Update README.md with Ray summit slides

zhuohan123 opened this pull request 3 months ago

[Frontend] API support for beam search

LunrEclipse opened this pull request 3 months ago

[Bugfix] Try to handle older versions of pytorch

bnellnm opened this pull request 3 months ago

[Misc] Fix CI lint

comaniac opened this pull request 3 months ago

[Bugfix] use blockmanagerv1 for encoder-decoder

heheda12345 opened this pull request 3 months ago

[Bugfix] Deprecate registration of custom configs to huggingface

heheda12345 opened this pull request 3 months ago

[Bug]: vLLM MQLLMEngine Timeout - Json Schema

wrisigo opened this issue 3 months ago

[Misc] Add random seed for prefix cache benchmark

Imss27 opened this pull request 3 months ago

[Bug]: Lack of reproducibility across multiple runs of prefix cache benchmark

Imss27 opened this issue 3 months ago

Yet another Prefill-Decode separation in vllm

chenqianfzh opened this pull request 3 months ago

Developed a PoC of dAttention support. It uses an idea similar to vAttention but introduces a new memory layout that avoids vAttention's memory waste.

tongping opened this pull request 3 months ago

[Misc] Improved prefix cache example

Imss27 opened this pull request 3 months ago

[Bug]: vllm overrides transformer's Autoconfig for mllama

lyuqin-scale opened this issue 3 months ago

Remove AMD Ray Summit Banner

simon-mo opened this pull request 3 months ago

[Doc]: Clear documentation about function / tool calling with examples

greg2705 opened this issue 3 months ago

[Installation]: Build failed with error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher

ReeceResearch opened this issue 3 months ago

[Misc]: Need to understand support for torch.compile in Q4 roadmap

amd-abhikulk opened this issue 3 months ago

[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL

whyiug opened this pull request 3 months ago

[Usage]: Benchmarking Issues: Low Success Rate and Tensor Parallel Size Constraints on 8x AMD MI300x GPUs

Bihan opened this issue 3 months ago

[Bug]: Issue with Pixtral Model: Unsupported Vision Configuration in vLLM (AMD 7900 XTX)

matrix1233 opened this issue 3 months ago