Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
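As a serving engine, vLLM exposes an OpenAI-compatible HTTP API. The sketch below only builds the JSON body for a completions request; the base URL and model name are assumptions (a locally started server typically listens on port 8000), and no request is actually sent.

```python
import json

# Assumption: a vLLM server (e.g. started via its OpenAI-compatible
# entrypoint) is reachable at this hypothetical local address.
BASE_URL = "http://localhost:8000/v1"

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for a /v1/completions request."""
    return {
        "model": model,          # model name as served by vLLM
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,      # greedy decoding for reproducibility
    }

payload = build_completion_request("facebook/opt-125m", "Hello, my name is")
print(json.dumps(payload))
```

The payload could then be POSTed to `BASE_URL + "/completions"` with any HTTP client; the model name here is purely illustrative.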
[Bug]: Embedding doesn't work with `device="cpu"`
github.com/vllm-project/vllm - TheRoadQaQ opened this issue 4 months ago
[Model] Port over CLIPVisionModel for VLMs
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Usage]: 'InternVLChatConfig' object has no attribute 'num_attention_heads'
github.com/vllm-project/vllm - wangdong1992 opened this issue 4 months ago
[Optimization] use a pool to reuse LogicalTokenBlock.token_ids
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Hardware][Intel] Add AWQ support for CPU backend
github.com/vllm-project/vllm - zhouyuan opened this pull request 4 months ago
[Frontend] Add model peak memory usage to loading weights log
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug]: chunked prefill scheduler uses up swap on many n>=2 requests
github.com/vllm-project/vllm - toslunar opened this issue 4 months ago
[CI/BUILD] Support non-AVX512 vLLM building and testing
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[Bug]: BitsandBytes quantization is not working as expected
github.com/vllm-project/vllm - QwertyJack opened this issue 4 months ago
[Bug]: Regression in LoRA Adapter loading speed between vllm 0.4.3 and 0.5.0
github.com/vllm-project/vllm - sampritipanda opened this issue 4 months ago
[Bug]: Speculative decoding server: `ValueError: could not broadcast input array from shape (513,) into shape (512,)`
github.com/vllm-project/vllm - jeffreyling opened this issue 4 months ago
[Performance] [Speculative decoding] Speed up autoregressive proposal methods by making sampler CPU serialization optional
github.com/vllm-project/vllm - cadedaniel opened this issue 4 months ago
[Kernel] Adding bias epilogue support for `cutlass_scaled_mm`
github.com/vllm-project/vllm - ProExpertProg opened this pull request 4 months ago
[Kernel] Add punica dimensions for Granite 13b
github.com/vllm-project/vllm - joerunde opened this pull request 4 months ago
[RFC]: Implement disaggregated prefilling via KV cache transfer
github.com/vllm-project/vllm - KuntaiDu opened this issue 4 months ago
[Bug]: RuntimeError: CUDA error: no kernel image is available for execution on the device
github.com/vllm-project/vllm - seungyoonee opened this issue 4 months ago
[Bug]: prefix-caching: inconsistent completions
github.com/vllm-project/vllm - hibukipanim opened this issue 4 months ago
[Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Feature]: LoRA support for Mixtral GPTQ and AWQ
github.com/vllm-project/vllm - StrikerRUS opened this issue 4 months ago
[Bug]: CUDA illegal memory access error when `enable_prefix_caching=True`
github.com/vllm-project/vllm - mpoemsl opened this issue 4 months ago
[Usage]: how to use enable-chunked-prefill?
github.com/vllm-project/vllm - chenchunhui97 opened this issue 4 months ago
[Bug]: Very slow execution of from_lora_tensors() when using mp instead of ray as --distributed-executor-backend.
github.com/vllm-project/vllm - ashgold opened this issue 4 months ago
[Bug]: In vLLM v0.4.3 and later, calling list_loras() in a tensor parallelism situation causes the system to hang.
github.com/vllm-project/vllm - ashgold opened this issue 4 months ago
[CI/Build] Disable LLaVA-NeXT CPU test
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Core][Distributed] improve p2p cache generation
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bug]: MoE model with 2-GPU inference fails with AssertionError("Invalid device id")
github.com/vllm-project/vllm - Elissa0723 opened this issue 4 months ago
[CI/Build] [1/3] Reorganize entrypoints tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Core] Remove duplicate processing in async engine
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bug]: The speed of loading the qwen2 72b model, glm-4-9b-chat-1m model in v0.5.0 is much lower than that in v0.4.2.
github.com/vllm-project/vllm - majestichou opened this issue 4 months ago
bump version to v0.5.0.post1
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Bug]: Shutdown error when using multiproc_gpu_executor
github.com/vllm-project/vllm - wooyeonlee0 opened this issue 4 months ago
[RFC]: Usage Data Enhancement for v0.5.*
github.com/vllm-project/vllm - simon-mo opened this issue 4 months ago
Limit visible devices for 2gpu tests
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
Add basic correctness 2 GPU tests to 4 GPU pipeline
github.com/vllm-project/vllm - Yard1 opened this pull request 4 months ago
[Bug]: Excessive Memory Consumption of Cudagraph on A10G/L4 GPUs
github.com/vllm-project/vllm - ymwangg opened this issue 4 months ago
[Kernel] Fix CUTLASS 3.x custom broadcast load epilogue
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Misc] Log cudagraph memory usage
github.com/vllm-project/vllm - ymwangg opened this pull request 4 months ago
[Kernel] Update Cutlass int8 kernel configs for SM90
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 4 months ago
[Bug]: Error loading FP8 weights for `gpt_bigcode` model
github.com/vllm-project/vllm - tdoublep opened this issue 4 months ago
[misc][distributed] fix benign error in `is_in_the_same_node`
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[misc] fix format.sh
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[CI/Build] Disable test_fp8.py
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Bug]: Illegal memory access in CUTLASS FP8 kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this issue 4 months ago
[Kernel] Disable CUTLASS kernels for fp8
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Bug]: ModuleNotFoundError: No module named 'bitsandbytes'
github.com/vllm-project/vllm - emillykkejensen opened this issue 4 months ago
[Bug]: RuntimeError: out must have shape (total_q, num_heads, head_size_og)
github.com/vllm-project/vllm - zhihui96 opened this issue 4 months ago
Support loading qwen2-72b-instruct LoRA
github.com/vllm-project/vllm - NiuBlibing opened this pull request 4 months ago
[Bug]: Qwen/Qwen2-72B-Instruct 128k server down
github.com/vllm-project/vllm - junior-zsy opened this issue 4 months ago
[Bug]: ray not work when tp>=2
github.com/vllm-project/vllm - Jimmy-Lu opened this issue 4 months ago
[Usage]: How do I get the FP8 scaling factors for KV cache?
github.com/vllm-project/vllm - CharlesRiggins opened this issue 4 months ago
[Hardware][Intel] fp8 kv cache support for CPU
github.com/vllm-project/vllm - jikunshang opened this pull request 4 months ago
[Feature]: load/unload API to run multiple LLMs in a single GPU instance
github.com/vllm-project/vllm - lizzzcai opened this issue 4 months ago
When calling the API without a system prompt, the output gets stuck and consists entirely of "!!!!!"
github.com/vllm-project/vllm - shujun1992 opened this issue 4 months ago
[Feature]: Can vLLM support engines compiled by TensorRT?
github.com/vllm-project/vllm - huai-ying opened this issue 4 months ago
Enable random seed option to make latency benchmarking more configurable
github.com/vllm-project/vllm - qingquansong opened this pull request 4 months ago
[Bug]: ImportError: cannot import name 'boolean_dispatched' from partially initialized module 'torch._jit_internal'
github.com/vllm-project/vllm - morestart opened this issue 4 months ago
[Bug]: NCCL hangs and causes timeout
github.com/vllm-project/vllm - wjj19950828 opened this issue 4 months ago
[Misc] add code to get git hash info for vllm
github.com/vllm-project/vllm - dhuangnm opened this pull request 4 months ago
[CI/Build] Enable CPU test for VLMs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Usage]: Can I use vllm.LLM(quantization="bitsandbytes"...) when bitsandbytes is supported in the v0.5.0 version
github.com/vllm-project/vllm - cywuuuu opened this issue 4 months ago
[Bug]: Loading Mixtral-8x22B-Instruct-v0.1-FP8 on 8xL40S causes a SIGSEGV
github.com/vllm-project/vllm - nickandbro opened this issue 4 months ago
[Usage]: OpenRLHF: How can I create a second NCCL Group in a vLLM v0.4.3+ Ray worker?
github.com/vllm-project/vllm - hijkzzz opened this issue 4 months ago
Add `cuda_device_count_stateless`
github.com/vllm-project/vllm - Yard1 opened this pull request 4 months ago
[Doc] Update documentation on Tensorizer
github.com/vllm-project/vllm - sangstar opened this pull request 4 months ago
[Bug][v0.5.0]: Benign error reported by Python multiprocessing resource_tracker
github.com/vllm-project/vllm - mgoin opened this issue 4 months ago
[Feature]: Allow user defined extra request args to be logged in OpenAI compatible server
github.com/vllm-project/vllm - davidgxue opened this issue 4 months ago
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug]: Runtime Error: GET was unable to find an engine to execute this computation for LLaVa-NEXT
github.com/vllm-project/vllm - XkunW opened this issue 4 months ago
[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations"
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[misc] add hint for AttributeError
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models
github.com/vllm-project/vllm - tdoublep opened this pull request 4 months ago
[Feature]: PagedAttention multiple of 8
github.com/vllm-project/vllm - barschiiii opened this issue 4 months ago
[Bug]: Error when --tensor-parallel-size > 1
github.com/vllm-project/vllm - javi111717 opened this issue 4 months ago
[Installation]: M2 Mac Dependency Torch 2.1.2 (Incompatible)
github.com/vllm-project/vllm - velocity33 opened this issue 4 months ago
[Bug]: Outdated binaries when re-building vLLM from source
github.com/vllm-project/vllm - DarkLight1337 opened this issue 4 months ago
[Bugfix] Skip test temporarily; failing quantization test
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bug]: 0.5.0 AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
github.com/vllm-project/vllm - WangErXiao opened this issue 4 months ago
[Usage] Clarify and Update Argument for Specifying Model Revisions
github.com/vllm-project/vllm - Etelis opened this pull request 4 months ago
[Hardware][Intel] Support CPU inference with AVX2 ISA
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[Bugfix] Fix wrong multi_modal_input format for CPU runner
github.com/vllm-project/vllm - Isotr0py opened this pull request 4 months ago
[Bug]: vllm v0.5.0 internal assert failed
github.com/vllm-project/vllm - changshivek opened this issue 4 months ago
[Usage]: How to serve embedding model and LLM at the same time
github.com/vllm-project/vllm - weiyunfei opened this issue 4 months ago
[Bug]: AttributeError: '_OpNamespace' '_C_cache_ops' object has no attribute 'reshape_and_cache_flash'
github.com/vllm-project/vllm - syuoni opened this issue 4 months ago
[Model] Bert Embedding Model
github.com/vllm-project/vllm - laishzh opened this pull request 4 months ago
[Hardware][Intel] Generate custom activation ops using torch.compile for CPU backend.
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 4 months ago
multilora_inference errors out when calling qwen2-1.5b
github.com/vllm-project/vllm - zigangzhao-ai opened this issue 4 months ago
[Bugfix] TYPE_CHECKING for MultiModalData
github.com/vllm-project/vllm - kimdwkimdw opened this pull request 4 months ago
[Bug]: v0.4.3 AsyncEngineDeadError
github.com/vllm-project/vllm - changshivek opened this issue 4 months ago
[Bugfix] Avoid to warmup when world size is 1
github.com/vllm-project/vllm - kerthcet opened this pull request 4 months ago
[Kernel] Add punica dimension for Qwen2 LoRA
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 4 months ago
[Bug]: TypeError: a bytes-like object is required, not 'str'
github.com/vllm-project/vllm - yaoyasong opened this issue 4 months ago