github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Doc] Specify async engine args in docs

DarkLight1337 opened this pull request about 2 months ago

[V1] Prototype Fully Async Detokenizer

robertgshaw2-neuralmagic opened this pull request about 2 months ago

[core] cudagraph output with tensor weak reference

youkaichao opened this pull request about 2 months ago

[Bug]: Incoherent Offline Inference Single Video with Qwen2-VL

hector-gr opened this issue about 2 months ago

[Performance]: How to Improve Performance Under Concurrency

ljwps opened this issue about 2 months ago

[Bugfix] Use temporary directory in registry

DarkLight1337 opened this pull request about 2 months ago

[Model] Add BNB quantization support for Mllama

Isotr0py opened this pull request about 2 months ago

[Misc] SpecDecodeWorker supports profiling

Abatom opened this pull request about 2 months ago

[torch.compile] rework compile control with piecewise cudagraph

youkaichao opened this pull request about 2 months ago

[Usage]: ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now.

Mjiegu opened this issue about 2 months ago

[Bug]: Inconsistent evaluations when enabling / disabling chunked_prefill?

Jingyu6 opened this issue about 2 months ago

[Model] Add classification Task with Qwen2ForSequenceClassification

kakao-kevin-us opened this pull request about 2 months ago

[Usage]: Using a model for inference and embedding

micuentadecasa opened this issue about 2 months ago

[Installation] pip install vllm (0.6.3) forces a reinstallation of the CPU version of torch, replacing CUDA torch on Windows

xiezhipeng-git opened this issue about 2 months ago

CI TEST

maxdebayser opened this pull request about 2 months ago

[Model] Support math-shepherd-mistral-7b-prm model

Went-Liang opened this pull request about 2 months ago

[Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled

ankush13r opened this issue about 2 months ago

[Model] Support GGUF models newly added in `transformers` 4.46.0

Isotr0py opened this pull request about 2 months ago

[Core] Support offloading KV cache to CPU

KuntaiDu opened this pull request about 2 months ago

[Build] skip renaming files for release wheels pipeline

simon-mo opened this pull request about 2 months ago

[Bug]: Input length greater than 32K in nvidia/Llama-3.1-Nemotron-70B-Instruct-HF generates garbage on v0.6.3 (issue not seen in v0.6.2)

source-ram opened this issue about 2 months ago

[Doc] Update FAQ links in spec_decode.rst

whyiug opened this pull request about 2 months ago

[Misc]: huggingface_hub.errors.HFValidationError using LLama3.1-405b

unrue opened this issue about 2 months ago

[V1] Move mm_input_mapper to a separate process

WoosukKwon opened this pull request about 2 months ago

[torch.compile] Adding torch compile annotations to some models

CRZbulabula opened this pull request about 2 months ago

[Bugfix] Fix edge cases for MistralTokenizer

tjohnson31415 opened this pull request 2 months ago

[Model][LoRA] LoRA support added for Qwen

jeejeelee opened this pull request 2 months ago

[CI/Build] improve python-only dev setup

dtrifiro opened this pull request 2 months ago

[Bug]: crash: RecursionError: maximum recursion depth exceeded

wciq1208 opened this issue 2 months ago

[Core] Make encoder-decoder inputs a nested structure to be more composable

DarkLight1337 opened this pull request 2 months ago

Linter test

maxdebayser opened this pull request 2 months ago

[Misc] Upgrade to pytorch 2.5

bnellnm opened this pull request 2 months ago

[Feature]: LoRA support for Qwen model

zhangfan-algo opened this issue 2 months ago

[Bugfix] use AF_INET6 instead of AF_INET for OpenAI Compatible Server

jxpxxzj opened this pull request 2 months ago

[Performance]: vllm Eagle performance is worse than expected

LiuXiaoxuanPKU opened this issue 2 months ago

[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models

sroy745 opened this pull request 2 months ago

[Bug]: GGUF Llama-3.1-Nemotron-70B-Instruct-HF ValueError: cannot reshape array of size into shape

paolovic opened this issue 2 months ago

[Bug]: MistralTokenizer Detokenization Issue

prashantgupta24 opened this issue 2 months ago

[Bugfix][Misc]: fix graph capture for decoder

yudian0504 opened this pull request 2 months ago

[Feature]: Support for Controlled Decoding

simonucl opened this issue 2 months ago

[Bugfix] Fix load config when using bools

madt2709 opened this pull request 2 months ago

[Bugfix] Fix `illegal memory access` error with chunked prefill, prefix caching, block manager v2 and xformers enabled together

sasha0552 opened this pull request 2 months ago

[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger

heheda12345 opened this pull request 2 months ago

[Frontend] Support suffix in completions API (fill-in-the-middle)

njhill opened this pull request 2 months ago

Adds method to read the pooling types from model's files

flaviabeo opened this pull request 2 months ago

[Model] Update MPT model with GLU and rope and add low precision layer norm

kazuki opened this pull request 2 months ago

[Bug]: When reading the content from the configuration file specified by the --config parameter, the parameter type was not considered.

SakigamiYang opened this issue 2 months ago

[Bug]: [Performance] 100% performance drop using multiple lora vs no lora(qwen-chat model)

askcs517 opened this issue 2 months ago

[Feature]: LoRA support for InternVLChatModel

AkshataABhat opened this issue 2 months ago

[Misc] Fix ImportError caused by triton

MengqingCao opened this pull request 2 months ago

[Frontend] Add sampler_priority and repetition_penalty_range

ZeroYuJie opened this pull request 2 months ago

[Performance]: InternVL multi-image speed is not improved compared to the original

luohao123 opened this issue 2 months ago

[Feature]: Consider parallel_tool_calls parameter at the API level

lucasalvarezlacasa opened this issue 2 months ago

[Misc] Compute query_start_loc/seq_start_loc on CPU

zhengy001 opened this pull request 2 months ago

[Frontend] re-enable multi-modality input in the new beam search implementation

FerdinandZhong opened this pull request 2 months ago

[Bug]: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241016-170451.pkl): view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

double-vin opened this issue 2 months ago

[Performance]: Inference with Qwen2.5 on vLLM 0.6.3 feels slower

Jimmy-L99 opened this issue 2 months ago

[Usage]: Which branch should I use to test speculative decoding

v-lmn opened this issue 2 months ago

Begin refactoring executor_base ABC

jberkhahn opened this pull request 2 months ago

Support Roberta embedding models

maxdebayser opened this pull request 2 months ago

[Performance][Kernel] Fused_moe Performance Improvement

charlifu opened this pull request 2 months ago

[New Model]: Support Zyphra/Zamba2-7B

mgoin opened this issue 2 months ago

[Bug]: KeyError: 'layers.60.mlp.gate_up_proj.weight' mistral large bitsandbytes

copasseron opened this issue 2 months ago

[CI/Build] remove .github from .dockerignore

dtrifiro opened this pull request 2 months ago

[Neuron] [Bugfix] Fix neuron startup

xendo opened this pull request 2 months ago

[Bug]: Tensor Parallelism performs poorly

DanielViglione opened this issue 2 months ago

[CI/Build] VLM Test Consolidation

alex-jw-brooks opened this pull request 2 months ago

[Bug]: process killed when I set tp>1 for running benchmark_throughput.py

zeyang12-jpg opened this issue 2 months ago

[CI][Misc] Add tests for python-only development

cermeng opened this pull request 2 months ago

[Bug]: cannot run model when TP>1 (already run debug file)

jli943 opened this issue 2 months ago

[Feature]: support for prompt cache

wiluen opened this issue 2 months ago

[Bug]: --cpu-offload-gb flag not honored in vllm/vllm-openai container on amazon g5.2xlarge

DanielViglione opened this issue 2 months ago

[Usage]: When using vLLM to run model inference with a loaded LoRA adapter, the results are not as expected

PeaceAndJoyAaron opened this issue 2 months ago

[Bug]: In function calls, when outputting Chinese, a backslash character "\" appears before Chinese characters.

yhhit opened this issue 2 months ago

[Bug]: 400 Bad Request

ErykCh opened this issue 2 months ago

[Bug]: Qwen2-VL-72B Inference on Multiple-GPUs

bhupendra1324 opened this issue 2 months ago

[Misc]: I'm trying to host my fine-tuned Llama-3-8B Instruct in vLLM

preethiisenthil opened this issue 2 months ago

[Bug]: Error running Molmo on API in v0.6.3

Inforeon opened this issue 2 months ago

[Bug]: guided_json fails on pixtral when using OpenAI API

ktrapeznikov opened this issue 2 months ago

[Bugfix]: Make chat content text allow type content

vrdn-23 opened this pull request 2 months ago

[BugFix] Fix chat API continuous usage stats

njhill opened this pull request 2 months ago

[Bug]: llama3.2-11B-Vision-Instruct not working

warlockedward opened this issue 2 months ago

bugfix on draft_tp value

qibaoyuan opened this pull request 2 months ago

[Installation]: v0.6.3 pip install -e . error

tolry418 opened this issue 2 months ago

[Installation]: When will the cu118 wheel (WHL) package for v0.6.3 be released?

controlRun opened this issue 2 months ago

[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage

joerunde opened this pull request 2 months ago

[Bugfix] Update InternVL input mapper to support image embeds

hhzhang16 opened this pull request 2 months ago

[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel

WoosukKwon opened this pull request 2 months ago

pass ignore_eos parameter to all benchmark_serving calls

gracehonv opened this pull request 2 months ago

[Doc] Fix code formatting in spec_decode.rst

mgoin opened this pull request 2 months ago

[Docs] Remove PDF build from Read the Docs

simon-mo opened this pull request 2 months ago

[Usage]: Obtaining success / error rate % metrics

yqlu opened this issue 2 months ago

[Frontend] Clarify model_type error messages

stevegrubb opened this pull request 2 months ago

[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support

bigPYJ1151 opened this pull request 2 months ago

[Bugfix] Clean up some cruft in mamba.py

tlrmchlsmth opened this pull request 2 months ago

[Bug]: vllm crashes when preemption of priority scheduling is triggered on vllm-0.6.3.dev173+g36ea7907.d20241011

tonyaw opened this issue 2 months ago

[Bug]: LLAMA 3.2 11B Vision Instruct Model not Running in VLLM 0.6.2

saikatscalers opened this issue 2 months ago

[Installation]: Adding opentelemetry packages in container image

sanketsudake opened this issue 2 months ago

[Usage]: --cpu-offload-gb has no effect

Rane2021 opened this issue 2 months ago

[Hardware] [Intel GPU] Add multistep scheduler for xpu device

jikunshang opened this pull request 2 months ago