github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[New Model]: Mistral-Nemo

Hambaobao opened this issue 6 months ago

[Bug]: Failed to import from vllm._C with ImportError("/lib64/libc.so.6: version `GLIBC_2.32' not found

balcklive opened this issue 6 months ago

[Usage]: Can't utilize all VRAM for context

vlsav opened this issue 6 months ago

[Performance]: GPU utilization is low when running large batches on H100

sleepwalker2017 opened this issue 6 months ago

[ Misc ] `fbgemm` checkpoints

robertgshaw2-neuralmagic opened this pull request 6 months ago

[Bug]: Cannot load fp8 model of internlm2-chat-7b offline

EstellaXinyuZhang opened this issue 6 months ago

[Core] Allow specifying custom Executor

Yard1 opened this pull request 6 months ago

[RFC]: Single Program Multiple Data (SPMD) Worker Control Plane

ruisearch42 opened this issue 6 months ago

[Bug]: vllm doesn't support multi-instance GPU

cfhammill opened this issue 6 months ago

[ci][test] add correctness test for cpu offloading

youkaichao opened this pull request 6 months ago

[Model] Support Mistral-Nemo

mgoin opened this pull request 6 months ago

[ Kernel ] Enable Dynamic Per Token `fp8`

robertgshaw2-neuralmagic opened this pull request 6 months ago

[CI/Build] bump ruff version, fix linting issues

dtrifiro opened this pull request 6 months ago

[Feature]: mistralai/Mistral-Nemo-Instruct-2407 support

bjoernpl opened this issue 6 months ago

[Usage]: How to release GPU of vLLM model in python code

quanshr opened this issue 6 months ago

[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes

mawong-amd opened this pull request 6 months ago

[CI/Build] replace yapf with ruff

dtrifiro opened this pull request 6 months ago

[Misc] Consolidate and optimize logic for building padded tensors

DarkLight1337 opened this pull request 6 months ago

[Feature]: return Usage info for streaming request for each chunk in ChatCompletion

yecohn opened this issue 6 months ago

[Bug]: vllm turned off my pc (loading mixtral8x7b)

juanluis17 opened this issue 6 months ago

[Bug]: vllm not support fp8 kv cache when use flashinfer

kuangdao opened this issue 6 months ago

[Bugfix] Corrected Typographical Errors from "indicies" to "indices"

JHLEE17 opened this pull request 6 months ago

[Core] Reduce unnecessary compute when logprobs=None

peng1999 opened this pull request 6 months ago

[Bug]: inter-token latency is lower than TPOT in serving benchmark result

Jeffwan opened this issue 6 months ago

[doc][distributed] add more doc for setting up multi-node environment

youkaichao opened this pull request 6 months ago

[Misc] Support FP8 kv cache scales from compressed-tensors

mgoin opened this pull request 6 months ago

added bitsandbytes dependency in common requirement.txt file

dipatidar opened this pull request 6 months ago

[Misc] Small perf improvements

Yard1 opened this pull request 6 months ago

[Model] Pipeline Parallel Support for DeepSeek v2

tjohnson31415 opened this pull request 6 months ago

[Model] Initialize support for InternVL2 series models

Isotr0py opened this pull request 6 months ago

FP8 Dynamic-Per-Token Quant

varun-sundar-rabindranath opened this pull request 6 months ago

[DOC] - Add docker image to Cerebrium Integration

milo157 opened this pull request 6 months ago

[Usage]: No chat template provided. Chat API will not work. How do I get vllm to support Codellama-34B in openai format?

x0w3n opened this issue 6 months ago

[Feature]: Add OpenAI server `prompt_logprobs` support

Theodotus1243 opened this issue 6 months ago

[Bug]: The _get_stats() are called multiple times which cause incorrect metrics collecting in do_log_stats()

yejingfu opened this issue 6 months ago

[TPU] Refactor TPU worker & model runner

WoosukKwon opened this pull request 6 months ago

[Misc] Use `torch.Tensor` for type annotation

WoosukKwon opened this pull request 6 months ago

[TPU] Remove multi-modal args in TPU backend

WoosukKwon opened this pull request 6 months ago

[New Model]: Support for Telechat

hzhaoy opened this issue 6 months ago

[Model] Add Support for GPTQ Fused MOE

izhuhaoran opened this pull request 6 months ago

[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash

noamgat opened this pull request 6 months ago

[Bug]: When I use gemma2 27b, the openai.api returns content "" as none ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=[])

Minami-su opened this issue 6 months ago

deploying embedding model in same way as LLM

riyajatar37003 opened this issue 6 months ago

[Installation]: ERROR: Ignored the following versions that require a different python version: 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10 ERROR: Could not find a version that satisfies the requirement numba (from outlines) (from versions: none) ERROR: No matching distribution found for numba

XyLove0223 opened this issue 6 months ago

[core][model] yet another cpu offload implementation

youkaichao opened this pull request 6 months ago

[Bugfix] Fix for multinode crash on 4 PP

andoorve opened this pull request 6 months ago

[Bug]: The metrics have not improved.

zjjznw123 opened this issue 6 months ago

Sequence parallel

wbdr opened this pull request 6 months ago

[Not for review]test gemma lora

jeejeelee opened this pull request 6 months ago

[misc][distributed] add seed to dummy weights

youkaichao opened this pull request 6 months ago

[CI/Build] Update flashinfer to v0.0.9 (#6489)

170928 opened this pull request 6 months ago

[Misc] Updated flashinfer to v0.0.9 in the following test scripts:

170928 opened this issue 6 months ago

[misc][distributed] improve tests

youkaichao opened this pull request 6 months ago

[ Kernel ] Fp8 Channelwise Weight Support

robertgshaw2-neuralmagic opened this pull request 6 months ago

[Bug]: No module named `jsonschema.protocols`.

eff-kay opened this issue 6 months ago

[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models.

sroy745 opened this pull request 6 months ago

[Model] Support Mamba

tlrmchlsmth opened this pull request 6 months ago

[Not for review] Spmd tp rebase

ruisearch42 opened this pull request 6 months ago

[ROCm] Cleanup Dockerfile and remove outdated patch

hongxiayang opened this pull request 6 months ago

[New Model]: Codestral Mamba

K-Mistele opened this issue 6 months ago

[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2

choco9966 opened this issue 6 months ago

[Bug]: Gemma 27B crashes on GCP A100

noamgat opened this issue 6 months ago

[Bug]: [vllm-openvino]: ValueError: `use_cache` was set to `True` but the loaded model only supports `use_cache=False`.

HPUedCSLearner opened this issue 6 months ago

[Feature]: Pipeline parallelism support for qwen model

hiyforever opened this issue 6 months ago

[Usage]: PeftModelForCausalLM is not JSON serializable

jazzisfuture opened this issue 6 months ago

[Performance]: [Speculative Decoding] Measurement of Cost Coefficient through vLLM

bong-furiosa opened this issue 7 months ago

[Misc][Speculative decoding] Typos and typing fixes

ShangmingCai opened this pull request 7 months ago

[Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE)

weiminw opened this issue 7 months ago

unable to run vllm model deployment

riyajatar37003 opened this issue 7 months ago

[Bugfix][Frontend] Fix missing `/metrics` endpoint

DarkLight1337 opened this pull request 7 months ago

[Bug]: Can't load gemma-2-9b-it with vllm 0.5.2

vlsav opened this issue 7 months ago

[Bug]: No metrics exposed at /metrics with 0.5.2 (0.5.1 is fine), possible regression?

frittentheke opened this issue 7 months ago

[CI/Build] Remove "boardwalk" image asset

DarkLight1337 opened this pull request 7 months ago

[Bugfix] enable prefix caching for AsyncLLMEngine when requesting prompt_logprobs

KrishnaM251 opened this pull request 7 months ago

[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization

wushidonguc opened this pull request 7 months ago

[Misc] Log spec decode metrics

comaniac opened this pull request 7 months ago

[Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron

servient-ashwin opened this issue 7 months ago

[Model] H2O Danube3-4b

g-eoj opened this pull request 7 months ago

[Bug]: Seed issue with Pipeline Parallel

andoorve opened this issue 7 months ago

[Not for review] PP ADAG

ruisearch42 opened this pull request 7 months ago

[Bug]: TypeError: 'NoneType' object is not callable when start Gemma2-27b-it

candowu opened this issue 7 months ago

[Core] Use numpy to speed up padded token processing

peng1999 opened this pull request 7 months ago

[Draft] proposal for ipex quant support

jikunshang opened this pull request 7 months ago

[doc][misc] doc update

youkaichao opened this pull request 7 months ago

[Bug]: Severe computation errors when batching request for microsoft/Phi-3-mini-128k-instruct

lance0108 opened this issue 7 months ago

[Doc] add env docs for flashinfer backend

DefTruth opened this pull request 7 months ago

[VLM] Minor space optimization for `ClipVisionModel`

ywang96 opened this pull request 7 months ago

Add FUNDING.yml

simon-mo opened this pull request 7 months ago

v0.5.2, v0.5.3, v0.6.0 Release Tracker

simon-mo opened this issue 7 months ago

bump version to v0.5.2

simon-mo opened this pull request 7 months ago

[Bug]: autogen can't work with vllm v0.5.1

tonyaw opened this issue 7 months ago

[Doc][CI/Build] Update docs and tests to use `vllm serve`

DarkLight1337 opened this pull request 7 months ago

[Bugfix] Convert image to RGB by default

DarkLight1337 opened this pull request 7 months ago

[Bug]: illegal memory access when increase max_model_length on FP8 models

IEI-mjx opened this issue 7 months ago

[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests'

lxline opened this pull request 7 months ago

[Bug]: Paligemma support for PNG files

BabyChouSr opened this issue 7 months ago

[ CI ] 0.4.3.post1 Hotfix

robertgshaw2-neuralmagic opened this pull request 7 months ago

[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug

mzusman opened this pull request 7 months ago

[Feature]: Return softmax of attention layer.

DouHappy opened this issue 7 months ago

[ Misc ] Enable Quantizing All Layers of DeekSeekv2

robertgshaw2-neuralmagic opened this pull request 7 months ago