Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
TCPStore is not available
Z-Diviner opened this issue 10 months ago
What's difference between the seed in LLMEngine and seed in SamplingParams?
tomdzh opened this issue 10 months ago
Is it possible to use vllm-0.3.3 with CUDA 11.8
HSLUCKY opened this issue 10 months ago
Implement structured engine for parsing json grammar by token with `response_format: {type: json_object}`
pathorn opened this pull request 10 months ago
add aya-101 model
ahkarami opened this issue 11 months ago
What's up with Pipeline Parallelism?
duanzhaol opened this issue 11 months ago
how to run gemma-7b model with vllm 0.3.3 under cuda 118??
adogwangwang opened this issue 11 months ago
When chat-ui and vllm are used together, the dialogue output of Llama-2-70b-chat-hf(safetensor file) is abnormal.
majestichou opened this issue 11 months ago
AsyncEngineDeadError when LoRA loading fails
lifuhuang opened this issue 11 months ago
Multi-LoRA - Support for providing /load and /unload API
gauravkr2108 opened this issue 11 months ago
[feature on nm-vllm] Sparse Inference with weight only int8 quant
shiqingzhangCSU opened this issue 11 months ago
Question regarding GPU memory allocation
wx971025 opened this issue 11 months ago
Error compiling kernels
declark1 opened this issue 11 months ago
lm-evaluation-harness broken on master
pcmoritz opened this issue 11 months ago
v0.3.3 api server can't startup with neuron sdk
qingyuan18 opened this issue 11 months ago
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU)
AdrianAbeyta opened this pull request 11 months ago
[FIX] Fix prefix test error on main
zhuohan123 opened this pull request 11 months ago
Mixtral 4x 4090 OOM
SinanAkkoyun opened this issue 11 months ago
Order of keys for guided JSON
ccdv-ai opened this issue 11 months ago
Regression in llama model inference due to #3005
Qubitium opened this issue 11 months ago
unload the model
osafaimal opened this issue 11 months ago
install from source failed using the latest code
sleepwalker2017 opened this issue 11 months ago
[FIX] Make `flash_attn` optional
WoosukKwon opened this pull request 11 months ago
[Minor fix] Include flash_attn in docker image
tdoublep opened this pull request 11 months ago
Error when prompt_logprobs + enable_prefix_caching
bgyoon opened this issue 11 months ago
Can vLLM handle concurrent request with FastAPI?
Strongorange opened this issue 11 months ago
OpenAI Tools / function calling v2
FlorianJoncour opened this pull request 11 months ago
Prefix Caching with FP8 KV cache support
chenxu2048 opened this pull request 11 months ago
When running pytest tests/, undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
Imss27 opened this issue 11 months ago
vllm load SqueezeLLM quantization model failed
zuosong-peng opened this issue 11 months ago
[WIP] Build FlashInfer
WoosukKwon opened this pull request 11 months ago
ExLlamaV2: exl2 support
pabl-o-ce opened this issue 11 months ago
got completely wrong answer for openchat model with vllm
v-yunbin opened this issue 11 months ago
[Feature request] Output attention scores in vLLM
ChenxinAn-fdu opened this issue 11 months ago
Unable to run distributed inference on ray with tensor parallel size > 1
pravingadakh opened this issue 11 months ago
Supporting embedding models
jc9123 opened this pull request 11 months ago
Support `response_format: json_object` in OpenAI server
simon-mo opened this issue 11 months ago
[ROCm] Add support for Punica kernels on AMD GPUs
kliuae opened this pull request 11 months ago
Does vLLM support the 4-bit quantized version of the Mixtral-8x7B-Instruct-v0.1 model downloaded from Hugging Face
leockl opened this issue 11 months ago
Benchmarking script does not limit the maximum concurrency
wangchen615 opened this issue 11 months ago
RuntimeError while running any model with embeddedllminfo/vllm-rocm:vllm-v0.2.4 image and rocm5.7 (rhel 8.7)
AjayKadoula opened this issue 11 months ago
Should one use tokenizer templates during offline inference?
vmkhlv opened this issue 11 months ago
Loading models from an S3 location instead of local path
simon-mo opened this issue 11 months ago
add doc about serving option on dstack
deep-diver opened this pull request 11 months ago
OpenAI Server issue when running on Apptainer (HPC)
vishruth-v opened this issue 11 months ago
Failed to build from source on ROCm (with pytorch and xformers working correctly)
nayn99 opened this issue 11 months ago
Building VLLM from source and running inference: No module named 'vllm._C'
Lena-Jurkschat opened this issue 11 months ago
Is there a mechanism of priorities when sending a new request
brunorigal opened this issue 11 months ago
TypeError: 'NoneType' object is not callable
lixiaolx opened this issue 11 months ago
Fatal Python error: Segmentation fault
lmx760581375 opened this issue 11 months ago
run qwen1.5-14b-chat with vllm container error.
James-Dao opened this issue 11 months ago
how to shut off the unnecessary log that is printed every 10s
sxk000 opened this issue 11 months ago
Merge Gemma into Llama
WoosukKwon opened this pull request 11 months ago
[Feature] Add vision language model support.
xwjiang2010 opened this pull request 11 months ago
Support of AMD consumer GPUs
arno4000 opened this issue 11 months ago
When deploying qwen1.5-7B-Chat, the API response is missing 10 characters
gaijigoumeiren opened this issue 11 months ago
Qwen 14B AWQ deploy: AttributeError: 'ndarray' object has no attribute '_torch_dtype'
testTech92 opened this issue 11 months ago
[BUG] Prompt logprobs causing tensor broadcast issue in `sampler.py`
AetherPrior opened this issue 11 months ago
lots of blank before each running step
Eutenacity opened this issue 11 months ago
AWQ: Implement new kernels (64% faster decoding)
casper-hansen opened this issue 11 months ago
Large length variance of sampled sequences from llama2 70b model compared to HuggingFace .generate()
uralik opened this issue 11 months ago
Unable to specify GPU usage in VLLM code
humza-sami opened this issue 11 months ago
Separate attention backends
WoosukKwon opened this pull request 11 months ago
some error happened when installing vllm
finylink opened this issue 11 months ago
How can I use the Lora Adapter for a model with Vocab size 40960?
hrson-1203 opened this issue 11 months ago
Failed to find C compiler. Please specify via CC environment variable
gangooteli opened this issue 11 months ago
Fix: Echo without asking for new tokens or logprobs in OpenAI Completions API
matheper opened this pull request 11 months ago
Limited Request Handling for AMD Instinct MI300 X GPUs with Tensor Parallelism > 1
Spurthi-Bhat-ScalersAI opened this issue 11 months ago
Question: a fine-tuned qwen-14b model returns empty results for all inferences with vllm
lalalabobobo opened this issue 11 months ago
The answer accuracy of the QWen series model is lost
zhochengbiao opened this issue 11 months ago
The service results based on vllm qwen7B are inconsistent with the original qwen results, and the accuracy will drop significantly
chenshukai1015 opened this issue 11 months ago
AWQ Quantization Memory Usage
vcivan opened this issue 11 months ago
Multi-GPU Support Failures with AMD MI210
tom-papatheodore opened this issue 11 months ago
Fix empty output when temp is too low
CatherineSue opened this pull request 11 months ago
E5-mistral-7b-instruct embedding support
DavidPeleg6 opened this issue 11 months ago
Runtime exception [step must be nonzero]
DreamGenX opened this issue 11 months ago
The results of vllm deployment of qwen-14B are inconsistent with the results of the original qwen-14B
qingjiaozyn opened this issue 11 months ago
vllm keeps hanging when using djl-deepspeed
ali-firstparty opened this issue 11 months ago
api_server.py: error: unrecognized arguments: --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
xueyongfu11 opened this issue 11 months ago
--tensor-parallel-size 2 fails to load on GCP
noamgat opened this issue 11 months ago
Duplicate Token `<s>` in Tokenizer Encoded Token ids
zxybazh opened this issue 11 months ago
Add docker-compose.yml and corresponding .env
WolframRavenwolf opened this pull request 11 months ago
Allow model to be served under multiple names
hmellor opened this pull request 11 months ago
HQQ quantization support
max-wittig opened this issue 11 months ago
Missing prometheus metrics in `0.3.0`
SamComber opened this issue 11 months ago
Please add lora support for higher ranks and alpha values
parikshitsaikia1619 opened this issue 11 months ago
Add LoRA support for Mixtral
tterrysun opened this pull request 12 months ago
vLLM running on a Ray Cluster Hanging on Initializing
Kaotic3 opened this issue 12 months ago
Add guided decoding for OpenAI API server
felixzhu555 opened this pull request 12 months ago
Adds support for gunicorn multiprocess process
jalotra opened this pull request 12 months ago
Incorrect completions with tensor parallel size of 8 on MI300X GPUs
seungduk-yanolja opened this issue 12 months ago
VLLM Multi-Lora with embed_tokens and lm_head in adapter weights
germanjke opened this issue 12 months ago
openai completions api <echo=True> raises Error
seoyunYang opened this issue 12 months ago
Add Splitwise implementation to vLLM
aashaka opened this pull request 12 months ago
Nvidia-H20 with nvcr.io/nvidia/pytorch:23.12-py3,CUBLAS Error!
tohneecao opened this issue 12 months ago
Multi GPU ROCm6 issues, and workarounds
BKitor opened this issue 12 months ago
model continue conversation
andrey-genpracc opened this issue 12 months ago
[Bug] `v0.3.0` produces garbage output when serving CodeLlama-70B on 4xA6000
ganler opened this issue 12 months ago
ERROR: Fail to install in editable mode. "UserWarning: There are no .../x86_64-conda-linux-gnu-c++ version bounds defined for CUDA version 12.1"
KartikYZ opened this issue 12 months ago