github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Feature]: Allow max_tokens = 0

fgebhart opened this issue 2 months ago

[Bug]: Exception in worker VllmWorkerProcess while processing method init_device: NCCL error: unhandled cuda error

wangyao123456a opened this issue 2 months ago

[Feature]: Support for rhymes-ai/Aria

engchina opened this issue 2 months ago

[Misc]: remove dropout related stuff from triton flash attention kernel

HaiShaw opened this issue 2 months ago

[Bug]: vLLM was installed and used without issues, but recently, during more frequent usage, it suddenly throws an error on a particular request and stops working entirely. Even nvidia-smi cannot return any output. The log is as follows:

alexchenyu opened this issue 2 months ago

[Bug]: When vLLM is deployed to serve the OpenAI API and the generation model is llama 3 8b instruct on a RAG task, the model's generation does not stop

asilverlight opened this issue 2 months ago

[Bug]: Installed vllm successfully for AMD MI60 but inference is failing

Said-Akbar opened this issue 2 months ago

[Usage]: [rank0]: AttributeError: 'LLMEngine' object has no attribute 'driver_worker'

xuyuemei opened this issue 2 months ago

[CI] Fix merge conflict

LiuXiaoxuanPKU opened this pull request 2 months ago

[Bug]: KeyError during loading of Mixtral 8x22B in FP8

IowaSovereign opened this issue 2 months ago

[help wanted]: write tests for python-only development

youkaichao opened this issue 2 months ago

[RFC]: Let every model be a reward model/embedding model for PRMs

zhuzilin opened this issue 2 months ago

[Bug]: different generation result when changing parameters using `copy_` and `=` method

hxdtest opened this issue 2 months ago

[Bug]: api_server.py: error: argument --tool-call-parser: invalid choice: 'llama3_json' (choose from 'mistral', 'hermes')

joestein-ssc opened this issue 2 months ago

[Bugfix] Update grafana dashboard

zhan9san opened this pull request 2 months ago

[Bug]: vllm mistralai--Codestral-22B-v0.1 response is truncated

Fly-Pluche opened this issue 2 months ago

[Misc][Installation] Improve source installation script and related documentation

cermeng opened this pull request 2 months ago

[Bug]: Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered

eyuansu62 opened this issue 2 months ago

[Bug]: latest docker build (0.6.2) got error due to VLLM_MAX_SIZE_MB

ZJLi2013 opened this issue 2 months ago

[Bug]: Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered

Clint-chan opened this issue 2 months ago

[Installation]: vllm installation error

leoneyar opened this issue 2 months ago

[Bug]: Hermes 2 Pro Tool parser could not locate tool call start/end tokens in the tokenizer!

LuckLittleBoy opened this issue 2 months ago

[Model] VLM2Vec, the first multimodal embedding model in vLLM

DarkLight1337 opened this pull request 2 months ago

[core] move parallel sampling out from vllm core

youkaichao opened this pull request 2 months ago

[Quantization][TPU] `compressed-tensors` integration for TPU

robertgshaw2-neuralmagic opened this pull request 2 months ago

[misc] Fine-grained CustomOp enabling mechanism

ProExpertProg opened this pull request 2 months ago

[Bugfix] Fix support for dimension like integers and ScalarType

bnellnm opened this pull request 2 months ago

[SpecDec] Remove Batch Expansion (2/3)

LiuXiaoxuanPKU opened this pull request 2 months ago

[CI/Build] Adds a test for multi step with TPUs

allenwang28 opened this pull request 2 months ago

[Frontend] merge beam search implementations

LunrEclipse opened this pull request 2 months ago

[bugfix] fix f-string for error

prashantgupta24 opened this pull request 2 months ago

[New Model]: meta-llama/Llama-Guard-3-1B

ayeganov opened this issue 2 months ago

[Misc] Add environment variables collection in collect_env.py tool

ycool opened this pull request 2 months ago

[Model] Support Mamba2 (Codestral Mamba)

tlrmchlsmth opened this pull request 2 months ago

[Feature] [Spec decode]: Combine chunked prefill with speculative decoding

NickLucche opened this pull request 2 months ago

[Bug]: Out of memory with large multi-step and large gpu-memory-utilization values - `--num-scheduler-steps 16 --gpu-memory-utilization 0.941`

varun-sundar-rabindranath opened this issue 2 months ago

Add `vllm_v1`

WoosukKwon opened this pull request 2 months ago

[Doc] Remove outdated comment to avoid misunderstanding

homeffjy opened this pull request 2 months ago

[Bugfix]Fix MiniCPM's LoRA bug

jeejeelee opened this pull request 2 months ago

Fixes a typo about 'max_decode_seq_len' which causes crashes with cuda graph.

sighingnow opened this pull request 2 months ago

May I ask what is the image parameter of vllm's api about blip2, and I have an error here: INFO: 127.0.0.1:40608 - "POST /v1/completions HTTP/1.1" 400 Bad Request

zhaoxueqi6666 opened this issue 2 months ago

[Bug]: Simultaneous mm calls lead to permanently degraded performance.

SeanIsYoung opened this issue 2 months ago

[Bug]: MiniCPM3-4B is support lora by --enable-lora ?

ML-GCN opened this issue 2 months ago

`seed_everything` doesn't handle HPU

SanjuCSudhakaran opened this pull request 2 months ago

[Bug]: VLLM doesn't support LoRa with config `modules_to_save`

fahadh4ilyas opened this issue 2 months ago

[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs

hissu-hyvarinen opened this pull request 2 months ago

[CI] add `ignore_eos` for `benchmark_serving.py`

jikunshang opened this pull request 2 months ago

[Bugfix] Fix priority in multiprocessing engine

schoennenbeck opened this pull request 2 months ago

[Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps`

junstar92 opened this pull request 2 months ago

[Misc][LoRA] Support loading LoRA weights for target_modules in reg format

jeejeelee opened this pull request 2 months ago

[Usage]: Manually Increasing inference time

Playerrrrr opened this issue 2 months ago

[Usage]: VLLM 0.6.2 includes vllm-flash-attn, is it no longer necessary to install flash-attn separately?

Rssevenyu opened this issue 2 months ago

[Bug]: priority scheduling doesn't work on vllm-0.6.3.dev152+gde895f16.d20241010

tonyaw opened this issue 2 months ago

Max num seqs

seungrokj opened this pull request 2 months ago

[Usage]: blip2 inference code

zhaoxueqi6666 opened this issue 2 months ago

[Bug]: ptxas /tmp/tmpxft_002385ca_00000000-11_attention_kernels.compute_50.ptx, line 4986061; error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher

zhangfan-algo opened this issue 2 months ago

[RFC]: Make device agnostic for diverse hardware support

wangshuai09 opened this issue 2 months ago

[CI/Build] mypy: Resolve some errors from checking vllm/engine

russellb opened this pull request 2 months ago

Pytorch hete spec

jiqing-feng opened this pull request 2 months ago

[Feature]: Improve Logging For Embedding Models

robertgshaw2-neuralmagic opened this issue 2 months ago

[Frontend, Core] Adding stop and stop_token_ids for beam search.

nFunctor opened this pull request 2 months ago

[Bug]: AsyncLLMEngine stuck on a single too long request

rickyyx opened this issue 2 months ago

[ci/build] Add placeholder command for custom models test and add comments

khluu opened this pull request 2 months ago

[misc] hide best_of from engine

youkaichao opened this pull request 2 months ago

[Bug]: Streaming response fails after one token (0.5.3.post1)

NeonDaniel opened this issue 2 months ago

[CI/Build] Adopt Mergify for auto-labeling PRs

russellb opened this pull request 2 months ago

[torch.compile] generic decorators

youkaichao opened this pull request 2 months ago

[Doc][Neuron] add note to neuron documentation about resolving triton issue

omrishiv opened this pull request 2 months ago

[Doc] Improve quickstart documentation

rafvasq opened this pull request 2 months ago

[Usage]: running gated models offline

SamuelBG13 opened this issue 2 months ago

[Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected

LucasWilkinson opened this pull request 2 months ago

[Bug]: new beam search implementation ignores stop conditions

nFunctor opened this issue 2 months ago

[CI/Build] Make the `Dockerfile.cpu` file's `PIP_EXTRA_INDEX_URL` Configurable as a Build Argument

jyono opened this pull request 2 months ago

[Misc] Standardize RoPE handling for Qwen2-VL

DarkLight1337 opened this pull request 2 months ago

[Model] Add Qwen2-Audio model support

faychu opened this pull request 2 months ago

[Doc]: The relationship between FlashAttentionBackend and paged_attention_kernel

zhaotyer opened this issue 2 months ago

[Kernel] adding fused moe kernel config for L40S TP4

bringlein opened this pull request 2 months ago

[Bug]: vllm0.6.2 Using FLASHINFER to start VLLM reported an error, enabling -- quantification gptq -- kv cache dtype fp8_e5m2

Rssevenyu opened this issue 2 months ago

[Model] Add GLM-4v support and meet vllm==0.6.2

sixsixcoder opened this pull request 2 months ago

Questions about the inference performance of the GPTQ model

Rssevenyu opened this issue 2 months ago

[Model] support input image embedding for minicpmv

whyiug opened this pull request 2 months ago

[Bug]: AssertionError When deploy API serve of Qwen2-VL-72B in Docker

FBR65 opened this issue 2 months ago

[Misc] Fix sampling from sonnet for long context case

Imss27 opened this pull request 2 months ago

[issue tracker] make quantization compatible with dynamo dynamic shape

youkaichao opened this issue 2 months ago

[Misc] Collect model support info in a single process per model

DarkLight1337 opened this pull request 2 months ago

[Bug]: AssertionError: Error in memory profiling. Initial free memory 85470478336, current free memory 85470478336. This happens when the GPU memory was not properly cleaned up before initializing the vLLM instance. [rank0]:[W1010 16:28:18.581149478 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] ERROR 10-10 16:28:20 api_server.py:188] RPCServer process died before responding to readiness probe

imrankh46 opened this issue 2 months ago

[Bug]: Stress testing Qwen2.5-72B-Instruct results in AsyncLLMEngine has failed, terminating server process

WangJianQ-0118 opened this issue 2 months ago

[Feature] vLLM ARM Enablement for AARCH64 CPUs

sanketkaleoss opened this pull request 2 months ago

[Bug]: Could not `pip install vllm` inside dockerfile after certain commit in `main` branch

fahadh4ilyas opened this issue 2 months ago

[VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args

DarkLight1337 opened this pull request 2 months ago

[BugFix] Fix tool call finish reason in streaming case

maxdebayser opened this pull request 2 months ago

[Bugfix] Sets `is_first_step_output` for TPUModelRunner

allenwang28 opened this pull request 2 months ago

Add example of helm chart for vllm deployment on k8s

mfournioux opened this pull request 2 months ago

Bump actions/setup-python from 3 to 5

dependabot[bot] opened this pull request 2 months ago

[RFC]: Adopt mergify for auto-labeling PRs

russellb opened this issue 2 months ago

[Performance]: phi 3.5 vision model consuming high CPU RAM and the process getting killed

kuladeephx opened this issue 2 months ago

[Kernel][Model] Improve continuous batching for Jamba and Mamba

mzusman opened this pull request 2 months ago

[Misc]: Repeat the sample sonnet.txt contents to accommodate large seq lengths in benchmarking

Bihan opened this issue 2 months ago

[Bug]: Qwen2.5-Math-7B-Instruct vllm output garbled code, but huggingface not

ziyuwan opened this issue 2 months ago

[Installation]: pip install vllm-0.6.2.zip err:setuptools-scm was unable to detect version for /tmp/pip-req-build-7ptioibj

uRENu opened this issue 2 months ago