github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Bug]: Very slow execution of from_lora_tensors() when using mp instead of ray as --distributed-executor-backend.

ashgold opened this issue 7 months ago

[Bug]: In vLLM v0.4.3 and later, calling list_loras() in a tensor parallelism situation causes the system to hang.

ashgold opened this issue 7 months ago

[ci] Diff check step

khluu opened this pull request 7 months ago

[CI/Build] Disable LLaVA-NeXT CPU test

DarkLight1337 opened this pull request 7 months ago

[Core][Distributed] improve p2p cache generation

youkaichao opened this pull request 7 months ago

[Bug]: MoE model, 2-GPU inference, raises AssertionError("Invalid device id")

Elissa0723 opened this issue 7 months ago

[CI/Build] [1/3] Reorganize entrypoints tests

DarkLight1337 opened this pull request 7 months ago

[Core] Remove duplicate processing in async engine

DarkLight1337 opened this pull request 7 months ago

[Misc] Fix arg names

AllenDou opened this pull request 7 months ago

[Bug]: The speed of loading the qwen2 72b model, glm-4-9b-chat-1m model in v0.5.0 is much lower than that in v0.4.2.

majestichou opened this issue 7 months ago

bump version to v0.5.0.post1

simon-mo opened this pull request 7 months ago

[Bug]: Shutdown error when using multiproc_gpu_executor

wooyeonlee0 opened this issue 7 months ago

[RFC]: Usage Data Enhancement for v0.5.*

simon-mo opened this issue 7 months ago

Limit visible devices for 2gpu tests

khluu opened this pull request 7 months ago

Add basic correctness 2 GPU tests to 4 GPU pipeline

Yard1 opened this pull request 7 months ago

[Bug]: Excessive Memory Consumption of Cudagraph on A10G/L4 GPUs

ymwangg opened this issue 7 months ago

[Kernel] Fix CUTLASS 3.x custom broadcast load epilogue

tlrmchlsmth opened this pull request 7 months ago

[Misc] Log cudagraph memory usage

ymwangg opened this pull request 7 months ago

[Kernel] Update Cutlass int8 kernel configs for SM90

varun-sundar-rabindranath opened this pull request 7 months ago

[Bug]: Error loading FP8 weights for `gpt_bigcode` model

tdoublep opened this issue 7 months ago

[misc][distributed] fix benign error in `is_in_the_same_node`

youkaichao opened this pull request 7 months ago

[misc] fix format.sh

youkaichao opened this pull request 7 months ago

[CI/Build] Disable test_fp8.py

tlrmchlsmth opened this pull request 7 months ago

[Bugfix]typofix

AllenDou opened this pull request 7 months ago

[Bug]: Illegal memory access in CUTLASS FP8 kernels

tlrmchlsmth opened this issue 7 months ago

[Kernel] Disable CUTLASS kernels for fp8

tlrmchlsmth opened this pull request 7 months ago

[Bug]: ModuleNotFoundError: No module named 'bitsandbytes'

emillykkejensen opened this issue 7 months ago

[Bug]: Failed to import from vllm._C with ImportError('/usr/local/lib/python3.8/dist-packages/vllm/_C.abi3.so: undefined symbol: _ZN5torch7LibraryC1ENS0_4KindESsSt8optionalIN3c1011DispatchKeyEEPKcj')

MonolithFoundation opened this issue 7 months ago

[Bug]: RuntimeError: out must have shape (total_q, num_heads, head_size_og)

zhihui96 opened this issue 7 months ago

support load qwen2-72b-instruct lora

NiuBlibing opened this pull request 7 months ago

[Bug]: Qwen/Qwen2-72B-Instruct 128k server down

junior-zsy opened this issue 7 months ago

[Bug]: ray not work when tp>=2

Jimmy-Lu opened this issue 7 months ago

[Usage]: How do I get the FP8 scaling factors for KV cache?

CharlesRiggins opened this issue 7 months ago

[Hardware][Intel] fp8 kv cache support for CPU

jikunshang opened this pull request 7 months ago

[Feature]: load/unload API to run multiple LLMs in a single GPU instance

lizzzcai opened this issue 7 months ago

When calling the API without passing a system message, the output gets stuck and consists entirely of "!!!!!"

shujun1992 opened this issue 7 months ago

[Feature]: Can we make vllm support engines compiled with tensorrt?

huai-ying opened this issue 7 months ago

Enable random seed option to make latency benchmarking more configurable

qingquansong opened this pull request 7 months ago

[Bug]: ImportError: cannot import name 'boolean_dispatched' from partially initialized module 'torch._jit_internal'

morestart opened this issue 7 months ago

[Bug]: NCCL hangs and causes timeout

wjj19950828 opened this issue 7 months ago

[Misc] add code to get git hash info for vllm

dhuangnm opened this pull request 7 months ago

[CI/Build] Update CPU tests to include all "standard" tests

DarkLight1337 opened this pull request 7 months ago

[Usage]: Can I use vllm.LLM(quantization="bitsandbytes"...) when bitsandbytes is supported in the v0.5.0 version

cywuuuu opened this issue 7 months ago

[Bug]: Loading Mixtral-8x22B-Instruct-v0.1-FP8 on 8xL40S causes a SIGSEGV

nickandbro opened this issue 7 months ago

[Usage]: OpenRLHF: How can I create a second NCCL Group in a vLLM v0.4.3+ Ray worker?

hijkzzz opened this issue 7 months ago

Add `cuda_device_count_stateless`

Yard1 opened this pull request 7 months ago

[Doc] Update documentation on Tensorizer

sangstar opened this pull request 7 months ago

[ci] Upload wheels

khluu opened this pull request 7 months ago

[Bug][v0.5.0]: Benign error reported by Python multiprocessing resource_tracker

mgoin opened this issue 7 months ago

[Feature]: Allow user defined extra request args to be logged in OpenAI compatible server

davidgxue opened this issue 7 months ago

[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations

mgoin opened this pull request 7 months ago

[Bug]: Runtime Error: GET was unable to find an engine to execute this computation for LLaVa-NEXT

XkunW opened this issue 7 months ago

[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests

khluu opened this pull request 7 months ago

Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations"

simon-mo opened this pull request 7 months ago

[misc] add hint for AttributeError

youkaichao opened this pull request 7 months ago

[Bug]: Torch2.3 run fail

lucasjinreal opened this issue 7 months ago

[Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models

tdoublep opened this pull request 7 months ago

[Feature]: PagedAttention multiple of 8

barschiiii opened this issue 7 months ago

[Bug]: Error when --tensor-parallel-size > 1

javi111717 opened this issue 7 months ago

[Installation]: M2 Mac Dependency Torch 2.1.2 (Incompatible)

velocity33 opened this issue 7 months ago

[Bug]: Outdated binaries when re-building vLLM from source

DarkLight1337 opened this issue 7 months ago

[Bugfix] Skip test temporarily; failing quantization test

dsikka opened this pull request 7 months ago

[Bug]: 0.5.0 AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'

WangErXiao opened this issue 7 months ago

[Usage] Clarify and Update Argument for Specifying Model Revisions

Etelis opened this pull request 7 months ago

[Hardware][Intel] Support CPU inference with AVX2 ISA

DamonFool opened this pull request 7 months ago

[Bugfix] Fix wrong multi_modal_input format for CPU runner

Isotr0py opened this pull request 7 months ago

[Bug]: vllm v0.5.0 internal assert failed

changshivek opened this issue 7 months ago

[Usage]: How to serve embedding model and LLM at the same time

weiyunfei opened this issue 7 months ago

[Bug]: AttributeError: '_OpNamespace' '_C_cache_ops' object has no attribute 'reshape_and_cache_flash'

syuoni opened this issue 7 months ago

[Model] Bert Embedding Model

laishzh opened this pull request 7 months ago

[Hardware][Intel] Generate custom activation ops using torch.compile for CPU backend.

bigPYJ1151 opened this pull request 7 months ago

multilora_inference raises an error when calling qwen2-1.5b

zigangzhao-ai opened this issue 7 months ago

[Bugfix] TYPE_CHECKING for MultiModalData

kimdwkimdw opened this pull request 7 months ago

[Bug]: v0.4.3 AsyncEngineDeadError

changshivek opened this issue 7 months ago

[Bugfix] Avoid to warmup when world size is 1

kerthcet opened this pull request 7 months ago

[Kernel] Add punica dimension for Qwen2 LoRA

jinzhen-lin opened this pull request 7 months ago

[Bug]: TypeError: a bytes-like object is required, not 'str'

yaoyasong opened this issue 7 months ago

[Bug]: resource_tracker unregister error with 2*3090

xuhao916 opened this issue 7 months ago

[Doc] Update debug docs

DarkLight1337 opened this pull request 7 months ago

[Doc] Update LLaVA docs

DarkLight1337 opened this pull request 7 months ago

[Bug]: get the degree of the `outlines FSM` compilation progress from vllm0.5.0 engine (via a route)

syGOAT opened this issue 7 months ago

`compressed-tensors` marlin 24 support

dsikka opened this pull request 7 months ago

[Feature]: PagedAttention for CPU-memory constrained environments?

peeteeman opened this issue 7 months ago

[Feature]: Add guided-* Parameters to Sampling Parameters

zhanghx0905 opened this issue 7 months ago

[ Misc ] Rs/compressed tensors cleanup

robertgshaw2-neuralmagic opened this pull request 7 months ago

[Feature]: Support [RecurrentGemmaForCausalLM]

sung-ho-moon opened this issue 7 months ago

[Bugfix] fix lora_dtype value type in arg_utils.py - part 2

c3-ali opened this pull request 7 months ago

[Docs] [Spec decode] Fix docs error in code example

cadedaniel opened this pull request 7 months ago

[Feature]: ci test with vGPU

youkaichao opened this issue 7 months ago

[Frontend] Add "input speed" to tqdm postfix alongside output speed

mgoin opened this pull request 7 months ago

[Bug]: CUDA out of memory when setting prompt_logprobs with larger batch_size

qaz-wsx-1 opened this issue 7 months ago

[RFC]: Improve guided decoding (logit_processor) APIs and performance.

rkooo567 opened this issue 7 months ago

[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes

mawong-amd opened this pull request 7 months ago

[Bug]: Automatic Prefix caching not working while hitting same request multiple times

Abhinay2323 opened this issue 7 months ago

cache image build

khluu opened this pull request 7 months ago

[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'

zhaobu opened this issue 7 months ago

[Bug]: Small context lengths consume more memory than large context lengths

majestichou opened this issue 7 months ago

[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?

fake-name opened this issue 7 months ago

[Speculative Decoding] Support draft model on different tensor-parallel size than target model

wooyeonlee0 opened this pull request 7 months ago

[Doc]: Urgent MoE question

ymmm-4 opened this issue 7 months ago