Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[bitsandbytes]: support read bnb pre-quantized model
github.com/vllm-project/vllm - thesues opened this pull request 4 months ago
[Core] Support sparse KV cache framework
github.com/vllm-project/vllm - chizhang118 opened this pull request 4 months ago
[RFC]: Support sparse KV cache framework
github.com/vllm-project/vllm - chizhang118 opened this issue 4 months ago
compressed-tensors accuracy testing
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bug]: Detokenizer stage is causing a significant delay
github.com/vllm-project/vllm - hbikki opened this issue 4 months ago
[Core] Add fault tolerance for `RayTokenizerGroupPool`
github.com/vllm-project/vllm - Yard1 opened this pull request 4 months ago
[Bug]: 'int' object has no attribute 'expansion'
github.com/vllm-project/vllm - RobertFischer opened this issue 4 months ago
[ci][test] fix ca test in main
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Doc] Documentation on supported hardware for quantization methods
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[BugFix] [Kernel] Add Cutlass2x fallback kernels
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 4 months ago
[ROCm] add some utility apis and fix some unit test based on torch version
github.com/vllm-project/vllm - hongxiayang opened this pull request 4 months ago
[Frontend] Continuous usage stats in OpenAI completion API
github.com/vllm-project/vllm - jvlunteren opened this pull request 4 months ago
[Feature]: Need CPU inferencing support for non-x86 architectures
github.com/vllm-project/vllm - ChipKerchner opened this issue 4 months ago
[Bug]: KeyError: '/psm_ed65b7e3'
github.com/vllm-project/vllm - randydl opened this issue 4 months ago
[Bugfix] fix the bug for lora request
github.com/vllm-project/vllm - InkdyeHuang opened this pull request 4 months ago
[Bug]: VLLM usage on AWS Inferentia instances
github.com/vllm-project/vllm - ashutoshsaboo opened this issue 4 months ago
[Usage]: has vllm supported encoder-only model such as bge-m3?
github.com/vllm-project/vllm - chenchunhui97 opened this issue 4 months ago
[Bug]: which torchvision version required
github.com/vllm-project/vllm - tusharraskar opened this issue 4 months ago
[Draft] Tensor parallel for CPU
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 4 months ago
[Feature]: Support for OpenAIEmbeddings with Langchain
github.com/vllm-project/vllm - yuhon0528 opened this issue 4 months ago
[LoRA] Adds support for bias in LoRA
github.com/vllm-project/vllm - followumesh opened this pull request 4 months ago
[Bug]: asyncio.exceptions.CancelledError asyncio.exceptions.TimeoutError
github.com/vllm-project/vllm - ZZhangxian opened this issue 4 months ago
api_server.py: error: unrecognized arguments: --tool-use-prompt-template --enable-api-tools --enable-auto-tool-choice
github.com/vllm-project/vllm - lk1983823 opened this issue 4 months ago
[Misc]: how to understand: NUM_ELEMS_PER_THREAD = HEAD_SIZE / THREAD_GROUP_SIZE
github.com/vllm-project/vllm - ZJLi2013 opened this issue 4 months ago
[Bug]: Two V100 server with a total of 16GPU running Distributed Inference and Serving Vllm with error
github.com/vllm-project/vllm - warlockedward opened this issue 4 months ago
[Bugfix] support `tie_word_embeddings` for all models
github.com/vllm-project/vllm - zijian-hu opened this pull request 4 months ago
[RFC]: Add runtime weight update API
github.com/vllm-project/vllm - lyuqin-scale opened this issue 4 months ago
[New Model]: Support Nemotron-4-340B
github.com/vllm-project/vllm - dskhudia opened this issue 4 months ago
[New Model]: Chameleon support
github.com/vllm-project/vllm - nopperl opened this issue 4 months ago
[Bug fix]: enumerate's seq is not equal to quant_states'key
github.com/vllm-project/vllm - thesues opened this pull request 4 months ago
[Distributed] Add send and recv helpers
github.com/vllm-project/vllm - andoorve opened this pull request 4 months ago
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Kernel][CPU] Add Quick `gelu` to CPU
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bug]: "Triton Error [CUDA]: device kernel image is invalid" when loading Mixtral-8x7B-Instruct-v0.1 in fused_moe.py
github.com/vllm-project/vllm - xiangcao opened this issue 4 months ago
[Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[Feature]: Continuous streaming of `UsageInfo`
github.com/vllm-project/vllm - tdoublep opened this issue 4 months ago
[Misc]: I ran into this situation during chat conversations with the OpenAI API served by vLLM
github.com/vllm-project/vllm - ArboterJams opened this issue 4 months ago
max_tokens must be at least 1, got -160
github.com/vllm-project/vllm - njhouse365 opened this issue 4 months ago
[Misc] optimize sampler with top_p=1 and top_k>0
github.com/vllm-project/vllm - gx16377 opened this pull request 4 months ago
[Bugfix] Add verbose error if scipy is missing for blocksparse attention
github.com/vllm-project/vllm - JGSweets opened this pull request 4 months ago
[Bug]: vision chat completion output with odd Instruction/Output prompting.
github.com/vllm-project/vllm - pseudotensor opened this issue 4 months ago
[Bug]: Qwen2-57B-A14B inference fails with an error on two GPUs
github.com/vllm-project/vllm - CXLiang123 opened this issue 4 months ago
[WIP] [Speculative Decoding] Use MQA kernel for target model verification
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 4 months ago
[Installation]: poetry add vllm not working on my Mac -- xformers (0.0.26.post1) not supporting PEP 517 builds.
github.com/vllm-project/vllm - srushti98 opened this issue 4 months ago
[Hardware][Intel GPU] Refactor distributed Executor for xpu device
github.com/vllm-project/vllm - jikunshang opened this pull request 4 months ago
[Bug]: `flash_attn_cuda.varlen_fwd` may output a bad result when enabling prefix caching
github.com/vllm-project/vllm - syGOAT opened this issue 4 months ago
[Misc] Making launch_tgi_server.sh script parameterizable
github.com/vllm-project/vllm - AllenDou opened this pull request 4 months ago
[Bug]: ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
github.com/vllm-project/vllm - QuanhuiGuan opened this issue 4 months ago
[Installation]: pip install -e failed
github.com/vllm-project/vllm - chunniunai220ml opened this issue 4 months ago
[WIP][Misc] Create setup_files dir for cleanup
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Installation]: Build from source: Could NOT find Python. Could not build wheels for vllm.
github.com/vllm-project/vllm - Brennanzuz opened this issue 4 months ago
[BugFix] exclude version 1.15.0 for modelscope
github.com/vllm-project/vllm - zhyncs opened this pull request 4 months ago
[Bug]: Enabling Prefix-Caching doesn't speed up inference
github.com/vllm-project/vllm - yangelaboy opened this issue 4 months ago
[Usage]: Does class LLM support inference quantization on CPU?
github.com/vllm-project/vllm - rsong0606 opened this issue 4 months ago
[Bug]: Qwen2-72B-Instruct-gptq-int4 Repetitive issues
github.com/vllm-project/vllm - Storm0921 opened this issue 4 months ago
[Bug]: Ray distributed backend does not support out-of-tree models via ModelRegistry APIs
github.com/vllm-project/vllm - SamKG opened this issue 4 months ago
IFEval metrics not consistent across different vLLM versions
github.com/vllm-project/vllm - akjindal53244 opened this issue 4 months ago
Support CPU inference with VSX PowerPC ISA
github.com/vllm-project/vllm - ChipKerchner opened this pull request 4 months ago
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models
github.com/vllm-project/vllm - K-Mistele opened this pull request 4 months ago
[build][misc] remove nvidia runtime docker base image
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bugfix] [Core] don't schedule prefill if freeing kv cache
github.com/vllm-project/vllm - toslunar opened this pull request 4 months ago
[Installation]: Is installing vLLM on the Windows operating system supported?
github.com/vllm-project/vllm - hiahia121 opened this issue 4 months ago
[Misc] Add param max-model-len in benchmark_latency.py
github.com/vllm-project/vllm - DearPlanet opened this pull request 4 months ago
[Bugfix] Fix Phi-3 Long RoPE scaling implementation
github.com/vllm-project/vllm - ShukantPal opened this pull request 4 months ago
Why is the GPU KV cache usage very low?
github.com/vllm-project/vllm - tammypi opened this issue 4 months ago
[Misc] Remove import from transformers logging
github.com/vllm-project/vllm - CatherineSue opened this pull request 4 months ago
[ci] Deprecate original CI template
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
[CI/Build][Misc] Update Pytest Marker for VLMs
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bugfix][Model] Fix inability to run inference on vLLM after MiniCPM training or AWQ quantization
github.com/vllm-project/vllm - LDLINGLINGLING opened this pull request 4 months ago
[Bug]: many models may not load the weights correctly if `tie_word_embeddings` is enabled
github.com/vllm-project/vllm - zijian-hu opened this issue 4 months ago
[misc][typo] fix typo
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[misc][distributed] use localhost for single-node
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Usage]: Qwen2-7B-Instruct got stuck in infinite loop using vllm==0.5.0 with tp = 2
github.com/vllm-project/vllm - YanXingyu1998 opened this issue 4 months ago
[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline
github.com/vllm-project/vllm - jikunshang opened this pull request 4 months ago
[CI] Avoid naming different metrics with the same name in performance benchmark
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[Doc] Update docker references
github.com/vllm-project/vllm - rafvasq opened this pull request 4 months ago
[Bug]: Failed: /home/runner/work/vllm/vllm/csrc/custom_all_reduce.cuh:310 'invalid argument'
github.com/vllm-project/vllm - XianmingJin08 opened this issue 4 months ago
[bugfix][distributed] do not error if two processes do not agree on p2p capability
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Model] Add support for Qwen2 for embeddings
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[ci] Setup Release pipeline and build release wheels with cache
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
[Feature]: Initial LLM token
github.com/vllm-project/vllm - CHesketh76 opened this issue 4 months ago
feat: adds user information to the input of the scheduler
github.com/vllm-project/vllm - FerranAgulloLopez opened this pull request 4 months ago
[Bug]: Concurrent requests messing up GREEDY responses
github.com/vllm-project/vllm - prashantgupta24 opened this issue 4 months ago
[Fix] Use utf-8 encoding in entrypoints/openai/run_batch.py
github.com/vllm-project/vllm - zifeitong opened this pull request 4 months ago
[Feature]: Access to user information in scheduler
github.com/vllm-project/vllm - FerranAgulloLopez opened this issue 4 months ago
[bugfix][distributed] fix 16 gpus local rank arrangement
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[LoRA] Add support for pinning lora adapters in the LRU cache
github.com/vllm-project/vllm - rohithkrn opened this pull request 4 months ago
[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default)
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 4 months ago
RuntimeError: No suitable kernel. h_in=16 h_out=4096 dtype=Float out_dtype=Half
github.com/vllm-project/vllm - yangelaboy opened this issue 4 months ago
[Feature]: support Qwen2 embedding
github.com/vllm-project/vllm - DavidPeleg6 opened this issue 4 months ago
[Core] Add use_dummy_driver to parallel config
github.com/vllm-project/vllm - DriverSong opened this pull request 4 months ago
[Feature]: Add config of use_dummy_driver rather than default 'False'
github.com/vllm-project/vllm - DriverSong opened this issue 4 months ago
[Bug]: GPTQ-Marlin kernel illegal memory access with `group_size=32`, `desc_act=True`, `tp=4`
github.com/vllm-project/vllm - danieldk opened this issue 4 months ago
[Model] Rename Phi3 rope scaling type
github.com/vllm-project/vllm - garg-amit opened this pull request 4 months ago