github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Bug]: Tensor Parallelism performs poorly

DanielViglione opened this issue 4 months ago

[CI/Build] VLM Test Consolidation

alex-jw-brooks opened this pull request 4 months ago

[Bug]: process killed when I set tp>1 for running benchmark_throughput.py

zeyang12-jpg opened this issue 4 months ago

[CI][Misc] Add tests for python-only development

cermeng opened this pull request 4 months ago

[Bug]: cannot run model when TP>1 (already run debug file)

jli943 opened this issue 4 months ago

[Feature]: support for prompt cache

wiluen opened this issue 4 months ago

[Bug]: --cpu-offload-gb flag not honored in vllm/vllm-openai container on amazon g5.2xlarge

DanielViglione opened this issue 4 months ago

[Usage]: When using vLLM to run model inference with a loaded LoRA adapter, the results do not match expectations

PeaceAndJoyAaron opened this issue 4 months ago

[Bug]: In function calls, when outputting Chinese, a backslash character "\" appears before Chinese characters.

yhhit opened this issue 4 months ago

[Bug]: 400 Bad Request

ErykCh opened this issue 4 months ago

[Bug]: Qwen2-VL-72B Inference on Multiple-GPUs

bhupendra1324 opened this issue 4 months ago

[Misc]: I'm trying to host my fine-tuned Llama-3-8B Instruct in vLLM

preethiisenthil opened this issue 4 months ago

[Bug]: Error running Molmo on API in v0.6.3

Inforeon opened this issue 4 months ago

[Bug]: guided_json fails on pixtral when using OpenAI API

ktrapeznikov opened this issue 4 months ago

[Bugfix]: Make chat content text allow type content

vrdn-23 opened this pull request 4 months ago

[BugFix] Fix chat API continuous usage stats

njhill opened this pull request 4 months ago

[Bug]: llama3.2-11B-Vision-Instruct not working

warlockedward opened this issue 4 months ago

bugfix on draft_tp value

qibaoyuan opened this pull request 4 months ago

[Installation]: v0.6.3 install -cutlass failed (Fetchcontent.cmake build step)

tolry418 opened this issue 4 months ago

[Installation]: When will the cu118 WHL package for v0.6.3 be released?

controlRun opened this issue 4 months ago

[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage

joerunde opened this pull request 4 months ago

[Bugfix] Update InternVL input mapper to support image embeds

hhzhang16 opened this pull request 4 months ago

[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel

WoosukKwon opened this pull request 4 months ago

pass ignore_eos parameter to all benchmark_serving calls

gracehonv opened this pull request 4 months ago

[Doc] Fix code formatting in spec_decode.rst

mgoin opened this pull request 4 months ago

[Docs] Remove PDF build from Read the Docs

simon-mo opened this pull request 4 months ago

[Usage]: Obtaining success / error rate % metrics

yqlu opened this issue 4 months ago

[Frontend] Clarify model_type error messages

stevegrubb opened this pull request 4 months ago

[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support

bigPYJ1151 opened this pull request 4 months ago

[Bugfix] Clean up some cruft in mamba.py

tlrmchlsmth opened this pull request 4 months ago

[Bug]: vllm crashes when preemption of priority scheduling is triggered on vllm-0.6.3.dev173+g36ea7907.d20241011

tonyaw opened this issue 4 months ago

[Bug]: LLAMA 3.2 11B Vision Instruct Model not Running in VLLM 0.6.2

saikatscalers opened this issue 4 months ago

[Installation]: Adding opentelemetry packages in container image

sanketsudake opened this issue 4 months ago

[Usage]: --cpu-offload-gb has no effect

Rane2021 opened this issue 4 months ago

[Hardware] [Intel GPU] Add multistep scheduler for xpu device

jikunshang opened this pull request 4 months ago

[Feature]: Allow max_tokens = 0

fgebhart opened this issue 4 months ago

[Bug]: missing 'Finished request xxxx' log

jinzhen-lin opened this issue 4 months ago

[Bug]: TPU single-host v5e-8 HBM OOM with Llama 3.1 70B and tpu_int8 quantization

samos123 opened this issue 4 months ago

[Bug]: Exception in worker VllmWorkerProcess while processing method init_device: NCCL error: unhandled cuda error

wangyao123456a opened this issue 4 months ago

[Bug]: Gemma 27B Produces no Outputs (2B and 9B work fine)

RonanKMcGovern opened this issue 4 months ago

[Feature]: Support for rhymes-ai/Aria

engchina opened this issue 4 months ago

[Misc]: remove dropout related stuff from triton flash attention kernel

HaiShaw opened this issue 4 months ago

[Bug]: vLLM was installed and used without issues, but recently, during more frequent usage, it suddenly throws an error on a particular request and stops working entirely. Even nvidia-smi cannot return any output. The log is as follows:

alexchenyu opened this issue 4 months ago

[Bug]: When vLLM is deployed behind the OpenAI-compatible API and Llama 3 8B Instruct is used as the generation model for a RAG task, the model never stops generating

asilverlight opened this issue 4 months ago

[Bug]: Installed vllm successfully for AMD MI60 but inference is failing

Said-Akbar opened this issue 4 months ago

[Usage]: [rank0]: AttributeError: 'LLMEngine' object has no attribute 'driver_worker'

xuyuemei opened this issue 4 months ago

[CI] Fix merge conflict

LiuXiaoxuanPKU opened this pull request 4 months ago

[Bug]: KeyError during loading of Mixtral 8x22B in FP8

IowaSovereign opened this issue 4 months ago

[help wanted]: write tests for python-only development

youkaichao opened this issue 4 months ago

[RFC]: Let every model be a reward model/embedding model for PRMs

zhuzilin opened this issue 4 months ago

[Bug]: different generation result when changing parameters using `copy_` and `=` method

hxdtest opened this issue 4 months ago

[Bug]: api_server.py: error: argument --tool-call-parser: invalid choice: 'llama3_json' (choose from 'mistral', 'hermes')

joestein-ssc opened this issue 4 months ago

[Bugfix] Update grafana dashboard

zhan9san opened this pull request 4 months ago

[Bug]: vllm mistralai--Codestral-22B-v0.1 response is truncated

Fly-Pluche opened this issue 4 months ago

[Misc][Installation] Improve source installation script and related documentation

cermeng opened this pull request 4 months ago

[Bug]: Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered

eyuansu62 opened this issue 4 months ago

[Bug]: latest docker build (0.6.2) got error due to VLLM_MAX_SIZE_MB

ZJLi2013 opened this issue 4 months ago

[Bug]: Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered

Clint-chan opened this issue 4 months ago

[Installation]: vllm installation error

leoneyar opened this issue 4 months ago

[Bug]: Hermes 2 Pro Tool parser could not locate tool call start/end tokens in the tokenizer!

LuckLittleBoy opened this issue 4 months ago

[Model] VLM2Vec, the first multimodal embedding model in vLLM

DarkLight1337 opened this pull request 4 months ago

[core] move parallel sampling out from vllm core

youkaichao opened this pull request 4 months ago

[Quantization][TPU] `compressed-tensors` integration for TPU

robertgshaw2-neuralmagic opened this pull request 4 months ago

[misc] Fine-grained CustomOp enabling mechanism

ProExpertProg opened this pull request 4 months ago

[Bugfix] Fix support for dimension like integers and ScalarType

bnellnm opened this pull request 4 months ago

[SpecDec] Remove Batch Expansion (2/3)

LiuXiaoxuanPKU opened this pull request 4 months ago

[CI/Build] Adds a test for multi step with TPUs

allenwang28 opened this pull request 4 months ago

[Frontend] merge beam search implementations

LunrEclipse opened this pull request 4 months ago

[bugfix] fix f-string for error

prashantgupta24 opened this pull request 4 months ago

[New Model]: meta-llama/Llama-Guard-3-1B

ayeganov opened this issue 4 months ago

[Misc] Add environment variables collection in collect_env.py tool

ycool opened this pull request 4 months ago

[Model] Support Mamba2 (Codestral Mamba)

tlrmchlsmth opened this pull request 4 months ago

[Feature] [Spec decode]: Combine chunked prefill with speculative decoding

NickLucche opened this pull request 4 months ago

[Bug]: Out of memory with large multi-step and large gpu-memory-utilization values - `--num-scheduler-steps 16 --gpu-memory-utilization 0.941`

varun-sundar-rabindranath opened this issue 4 months ago

Add `vllm_v1`

WoosukKwon opened this pull request 4 months ago

[Doc] Remove outdated comment to avoid misunderstanding

homeffjy opened this pull request 4 months ago

[Bugfix] Fix MiniCPM's LoRA bug

jeejeelee opened this pull request 4 months ago

Fixes a typo about 'max_decode_seq_len' which causes crashes with cuda graph.

sighingnow opened this pull request 4 months ago

What is the image parameter of vLLM's API for BLIP-2? I get this error: INFO: 127.0.0.1:40608 - "POST /v1/completions HTTP/1.1" 400 Bad Request

zhaoxueqi6666 opened this issue 4 months ago

[Bug]: Simultaneous mm calls lead to permanently degraded performance.

SeanIsYoung opened this issue 4 months ago

[Bug]: Does MiniCPM3-4B support LoRA via --enable-lora?

ML-GCN opened this issue 4 months ago

`seed_everything` doesn't handle HPU

SanjuCSudhakaran opened this pull request 4 months ago

[Bug]: VLLM doesn't support LoRa with config `modules_to_save`

fahadh4ilyas opened this issue 4 months ago

[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs

hissu-hyvarinen opened this pull request 4 months ago

[CI] add `ignore_eos` for `benchmark_serving.py`

jikunshang opened this pull request 4 months ago

[Bugfix] Fix priority in multiprocessing engine

schoennenbeck opened this pull request 4 months ago

[Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps`

junstar92 opened this pull request 4 months ago

[Misc][LoRA] Support loading LoRA weights for target_modules in reg format

jeejeelee opened this pull request 4 months ago

[Usage]: Manually Increasing inference time

Playerrrrr opened this issue 4 months ago

[Usage]: VLLM 0.6.2 includes vllm-flash-attn, is it no longer necessary to install flash-attn separately?

Rssevenyu opened this issue 4 months ago

[Bug]: priority scheduling doesn't work on vllm-0.6.3.dev152+gde895f16.d20241010

tonyaw opened this issue 4 months ago

Max num seqs"

seungrokj opened this pull request 4 months ago

[Usage]: blip2 inference code

zhaoxueqi6666 opened this issue 4 months ago

[Bug]: ptxas /tmp/tmpxft_002385ca_00000000-11_attention_kernels.compute_50.ptx, line 4986061; error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher

zhangfan-algo opened this issue 4 months ago

[RFC]: Make device agnostic for diverse hardware support

wangshuai09 opened this issue 4 months ago

[CI/Build] mypy: Resolve some errors from checking vllm/engine

russellb opened this pull request 4 months ago

Pytorch hete spec

jiqing-feng opened this pull request 4 months ago

[Feature]: Improve Logging For Embedding Models

robertgshaw2-neuralmagic opened this issue 4 months ago

[Frontend, Core] Adding stop and stop_token_ids for beam search.

nFunctor opened this pull request 4 months ago

[Bug]: AsyncLLMEngine stuck on a single too long request

rickyyx opened this issue 4 months ago