Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[FIX] Fix prefix test error on main
github.com/vllm-project/vllm - zhuohan123 opened this pull request 10 months ago
Order of keys for guided JSON
github.com/vllm-project/vllm - ccdv-ai opened this issue 10 months ago
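For context, this issue concerns whether guided-JSON decoding emits keys in the order the schema declares them. A minimal sketch of the kind of request involved, assuming a local vLLM OpenAI-compatible server that supports the `guided_json` extra-body parameter (added by the guided-decoding PR further down this list); the model name and schema are illustrative:

```python
# Hypothetical client-side sketch: ask vLLM's OpenAI-compatible server for
# JSON constrained by a schema. Whether the generated keys follow the
# schema's declared order is exactly what this issue discusses.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # illustrative model name
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    extra_body={"guided_json": schema},  # vLLM-specific extension field
)
print(resp.choices[0].message.content)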
Regression in llama model inference due to #3005
github.com/vllm-project/vllm - Qubitium opened this issue 10 months ago
Install from source fails using the latest code
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 10 months ago
[FIX] Make `flash_attn` optional
github.com/vllm-project/vllm - WoosukKwon opened this pull request 10 months ago
[Minor fix] Include flash_attn in docker image
github.com/vllm-project/vllm - tdoublep opened this pull request 10 months ago
Error when prompt_logprobs + enable_prefix_caching
github.com/vllm-project/vllm - bgyoon opened this issue 10 months ago
Can vLLM handle concurrent requests with FastAPI?
github.com/vllm-project/vllm - Strongorange opened this issue 10 months ago
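One common approach at the time was to serve through FastAPI backed by vLLM's async engine, which continuously batches concurrent requests rather than serializing them. A minimal sketch, assuming the `AsyncLLMEngine`/`AsyncEngineArgs` API of this release line; the model name is illustrative:

```python
# Sketch: FastAPI endpoint backed by vLLM's async engine. Concurrent requests
# are interleaved by the engine's scheduler instead of blocking each other.
from fastapi import FastAPI
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="facebook/opt-125m")  # illustrative model
)

@app.post("/generate")
async def generate(prompt: str):
    params = SamplingParams(max_tokens=128)
    final = None
    # engine.generate yields an async stream of partial outputs;
    # keep the last one as the finished result.
    async for output in engine.generate(prompt, params, random_uuid()):
        final = output
    return {"text": [o.text for o in final.outputs]}
```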
OpenAI Tools / function calling v2
github.com/vllm-project/vllm - FlorianJoncour opened this pull request 10 months ago
Prefix Caching with FP8 KV cache support
github.com/vllm-project/vllm - chenxu2048 opened this pull request 10 months ago
When running pytest tests/, undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
github.com/vllm-project/vllm - Imss27 opened this issue 10 months ago
vLLM fails to load a SqueezeLLM-quantized model
github.com/vllm-project/vllm - zuosong-peng opened this issue 10 months ago
[WIP] Build FlashInfer
github.com/vllm-project/vllm - WoosukKwon opened this pull request 10 months ago
Got completely wrong answers for the OpenChat model with vLLM
github.com/vllm-project/vllm - v-yunbin opened this issue 10 months ago
[Feature request] Output attention scores in vLLM
github.com/vllm-project/vllm - ChenxinAn-fdu opened this issue 10 months ago
Unable to run distributed inference on ray with tensor parallel size > 1
github.com/vllm-project/vllm - pravingadakh opened this issue 10 months ago
Supporting embedding models
github.com/vllm-project/vllm - jc9123 opened this pull request 10 months ago
Support `response_format: json_object` in OpenAI server
github.com/vllm-project/vllm - simon-mo opened this issue 10 months ago
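For reference, this is the standard OpenAI request shape the issue asks vLLM to honor; a sketch against a local vLLM server (model name illustrative):

```python
# Sketch of the OpenAI `response_format` field this issue requests support for.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # illustrative
    messages=[{"role": "user", "content": "List three colors as JSON."}],
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```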
[ROCm] Add support for Punica kernels on AMD GPUs
github.com/vllm-project/vllm - kliuae opened this pull request 10 months ago
Does vLLM support the 4-bit quantized version of the Mixtral-8x7B-Instruct-v0.1 model downloaded from Hugging Face
github.com/vllm-project/vllm - leockl opened this issue 10 months ago
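vLLM in this era did load pre-quantized 4-bit AWQ checkpoints via the `quantization` engine argument. A minimal offline sketch; the checkpoint name is an illustrative AWQ export of Mixtral from the Hugging Face Hub, not a confirmed answer from the thread:

```python
# Sketch: load a pre-quantized AWQ checkpoint. A GPTQ checkpoint would use
# quantization="gptq" instead; the repo name here is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",  # illustrative AWQ export
    quantization="awq",
    dtype="half",  # AWQ kernels at the time ran in fp16
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```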
Benchmarking script does not limit the maximum concurrency
github.com/vllm-project/vllm - wangchen615 opened this issue 10 months ago
RuntimeError while running any model with embeddedllminfo/vllm-rocm:vllm-v0.2.4 image and rocm5.7 (rhel 8.7)
github.com/vllm-project/vllm - AjayKadoula opened this issue 10 months ago
Should one use tokenizer templates during offline inference?
github.com/vllm-project/vllm - vmkhlv opened this issue 10 months ago
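For chat-tuned models the usual answer is yes: render the prompt with the tokenizer's chat template before calling `generate`. A sketch using transformers' `apply_chat_template` (model name illustrative):

```python
# Sketch: apply the model's chat template offline, then hand the rendered
# string to vLLM. Skipping the template on a chat-tuned model often degrades
# output quality, which is the kind of surprise behind questions like this one.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative
tok = AutoTokenizer.from_pretrained(model_id)
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is vLLM?"}],
    tokenize=False,
    add_generation_prompt=True,
)
llm = LLM(model=model_id)
print(llm.generate([prompt], SamplingParams(max_tokens=64))[0].outputs[0].text)
```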
Loading models from an S3 location instead of local path
github.com/vllm-project/vllm - simon-mo opened this issue 10 months ago
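vLLM at the time expected a local path or a Hub ID, so the common workaround was to sync the S3 prefix to local disk first and point vLLM at the copy. A sketch with boto3; the bucket, prefix, and paths are illustrative:

```python
# Sketch: pull a model directory out of S3, then load it from the local copy.
import os
import boto3
from vllm import LLM

bucket, prefix, local_dir = "my-bucket", "models/llama-7b/", "/tmp/llama-7b"  # illustrative
s3 = boto3.client("s3")
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("/"):  # skip directory markers
            continue
        dest = os.path.join(local_dir, os.path.relpath(obj["Key"], prefix))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(bucket, obj["Key"], dest)

llm = LLM(model=local_dir)
```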
add doc about serving option on dstack
github.com/vllm-project/vllm - deep-diver opened this pull request 10 months ago
OpenAI Server issue when running on Apptainer (HPC)
github.com/vllm-project/vllm - vishruth-v opened this issue 10 months ago
Failed to build from source on ROCm (with pytorch and xformers working correctly)
github.com/vllm-project/vllm - nayn99 opened this issue 10 months ago
Building VLLM from source and running inference: No module named 'vllm._C'
github.com/vllm-project/vllm - Lena-Jurkschat opened this issue 10 months ago
Is there a priority mechanism when sending a new request?
github.com/vllm-project/vllm - brunorigal opened this issue 10 months ago
TypeError: 'NoneType' object is not callable
github.com/vllm-project/vllm - lixiaolx opened this issue 10 months ago
Fatal Python error: Segmentation fault
github.com/vllm-project/vllm - lmx760581375 opened this issue 10 months ago
Error running Qwen1.5-14B-Chat with the vLLM container
github.com/vllm-project/vllm - James-Dao opened this issue 10 months ago
How to shut off the unnecessary log that prints every 10s
github.com/vllm-project/vllm - sxk000 opened this issue 10 months ago
Merge Gemma into Llama
github.com/vllm-project/vllm - WoosukKwon opened this pull request 10 months ago
[Feature] Add vision language model support.
github.com/vllm-project/vllm - xwjiang2010 opened this pull request 10 months ago
Support of AMD consumer GPUs
github.com/vllm-project/vllm - arno4000 opened this issue 10 months ago
API responses are missing 10 characters when deploying Qwen1.5-7B-Chat
github.com/vllm-project/vllm - gaijigoumeiren opened this issue 10 months ago
Qwen 14B AWQ deploy: AttributeError: 'ndarray' object has no attribute '_torch_dtype'
github.com/vllm-project/vllm - testTech92 opened this issue 10 months ago
[BUG] Prompt logprobs causing tensor broadcast issue in `sampler.py`
github.com/vllm-project/vllm - AetherPrior opened this issue 10 months ago
Lots of blank time before each running step
github.com/vllm-project/vllm - Eutenacity opened this issue 10 months ago
AWQ: Implement new kernels (64% faster decoding)
github.com/vllm-project/vllm - casper-hansen opened this issue 11 months ago
Large length variance of sampled sequences from llama2 70b model compared to HuggingFace .generate()
github.com/vllm-project/vllm - uralik opened this issue 11 months ago
Unable to specify GPU usage in VLLM code
github.com/vllm-project/vllm - humza-sami opened this issue 11 months ago
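Device selection happens outside vLLM itself: `CUDA_VISIBLE_DEVICES` must be set before the engine initializes CUDA, while memory and parallelism are engine arguments. A minimal sketch; the values are illustrative:

```python
# Sketch: pin vLLM to GPU 1 and cap its memory pool. CUDA_VISIBLE_DEVICES must
# be set before torch/vLLM touch CUDA, so it goes at the top of the script.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # illustrative: use the second GPU only

from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",     # illustrative
    gpu_memory_utilization=0.8,    # fraction of GPU memory vLLM may reserve
    tensor_parallel_size=1,        # number of GPUs to shard across
)
```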
Separate attention backends
github.com/vllm-project/vllm - WoosukKwon opened this pull request 11 months ago
Some errors happened when installing vLLM
github.com/vllm-project/vllm - finylink opened this issue 11 months ago
How can I use a LoRA adapter for a model with vocab size 40960?
github.com/vllm-project/vllm - hrson-1203 opened this issue 11 months ago
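For context, the multi-LoRA API this question builds on; a sketch assuming vLLM's `LoRARequest` interface of this release line (base model, adapter name, and path are illustrative). The `max_lora_rank` knob is also what the "higher ranks and alpha values" request further down asks to raise:

```python
# Sketch: serve a base model with a LoRA adapter attached per request.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # illustrative base model
    enable_lora=True,
    max_lora_rank=64,                  # raise if the adapter's rank is higher
)
out = llm.generate(
    ["Write a SQL query listing all users."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("sql-lora", 1, "/path/to/adapter"),  # illustrative path
)
print(out[0].outputs[0].text)
```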
Failed to find C compiler. Please specify via CC environment variable
github.com/vllm-project/vllm - gangooteli opened this issue 11 months ago
Fix: Echo without asking for new tokens or logprobs in OpenAI Completions API
github.com/vllm-project/vllm - matheper opened this pull request 11 months ago
Limited Request Handling for AMD Instinct MI300X GPUs with Tensor Parallelism > 1
github.com/vllm-project/vllm - Spurthi-Bhat-ScalersAI opened this issue 11 months ago
Question: a fine-tuned Qwen-14B model returns only empty results with vLLM inference
github.com/vllm-project/vllm - lalalabobobo opened this issue 11 months ago
Answer accuracy of the Qwen series models is degraded
github.com/vllm-project/vllm - zhochengbiao opened this issue 11 months ago
Serving results from vLLM Qwen-7B are inconsistent with the original Qwen results, and accuracy drops significantly
github.com/vllm-project/vllm - chenshukai1015 opened this issue 11 months ago
Multi-GPU Support Failures with AMD MI210
github.com/vllm-project/vllm - tom-papatheodore opened this issue 11 months ago
Fix empty output when temp is too low
github.com/vllm-project/vllm - CatherineSue opened this pull request 11 months ago
E5-mistral-7b-instruct embedding support
github.com/vllm-project/vllm - DavidPeleg6 opened this issue 11 months ago
Runtime exception [step must be nonzero]
github.com/vllm-project/vllm - DreamGenX opened this issue 11 months ago
Results of a vLLM deployment of Qwen-14B are inconsistent with the original Qwen-14B
github.com/vllm-project/vllm - qingjiaozyn opened this issue 11 months ago
vllm keeps hanging when using djl-deepspeed
github.com/vllm-project/vllm - ali-firstparty opened this issue 11 months ago
api_server.py: error: unrecognized arguments: --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
github.com/vllm-project/vllm - xueyongfu11 opened this issue 11 months ago
--tensor-parallel-size 2 fails to load on GCP
github.com/vllm-project/vllm - noamgat opened this issue 11 months ago
Duplicate Token `<s>` in Tokenizer Encoded Token ids
github.com/vllm-project/vllm - zxybazh opened this issue 11 months ago
Add docker-compose.yml and corresponding .env
github.com/vllm-project/vllm - WolframRavenwolf opened this pull request 11 months ago
Allow model to be served under multiple names
github.com/vllm-project/vllm - hmellor opened this pull request 11 months ago
Missing prometheus metrics in `0.3.0`
github.com/vllm-project/vllm - SamComber opened this issue 11 months ago
Please add LoRA support for higher ranks and alpha values
github.com/vllm-project/vllm - parikshitsaikia1619 opened this issue 11 months ago
Add LoRA support for Mixtral
github.com/vllm-project/vllm - tterrysun opened this pull request 11 months ago
vLLM running on a Ray Cluster Hanging on Initializing
github.com/vllm-project/vllm - Kaotic3 opened this issue 11 months ago
Add guided decoding for OpenAI API server
github.com/vllm-project/vllm - felixzhu555 opened this pull request 11 months ago
Adds support for gunicorn multiprocess mode
github.com/vllm-project/vllm - jalotra opened this pull request 11 months ago
Incorrect completions with tensor parallel size of 8 on MI300X GPUs
github.com/vllm-project/vllm - seungduk-yanolja opened this issue 11 months ago
vLLM Multi-LoRA with embed_tokens and lm_head in adapter weights
github.com/vllm-project/vllm - germanjke opened this issue 11 months ago
OpenAI Completions API with `echo=True` raises an error
github.com/vllm-project/vllm - seoyunYang opened this issue 11 months ago
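For reference, the request shape involved here, and the corner case the "Fix: Echo without asking for new tokens or logprobs" PR above addresses; a sketch against a local vLLM server (model name illustrative):

```python
# Sketch: echo the prompt back through the Completions API. The corner case
# in this issue is echo=True combined with no new tokens and/or no logprobs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="facebook/opt-125m",  # illustrative
    prompt="The capital of France is",
    echo=True,
    max_tokens=0,  # ask for no new tokens: only the prompt comes back
    logprobs=1,
)
print(resp.choices[0].text)
```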
Add Splitwise implementation to vLLM
github.com/vllm-project/vllm - aashaka opened this pull request 11 months ago
NVIDIA H20 with nvcr.io/nvidia/pytorch:23.12-py3, cuBLAS error!
github.com/vllm-project/vllm - tohneecao opened this issue 11 months ago
Multi-GPU ROCm6 issues and workarounds
github.com/vllm-project/vllm - BKitor opened this issue 11 months ago
How to make the model continue a conversation
github.com/vllm-project/vllm - andrey-genpracc opened this issue 11 months ago
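Multi-turn context is not stored server-side; the client resends the full history each turn. A sketch of continuing a conversation against a vLLM OpenAI-compatible server (model name illustrative):

```python
# Sketch: continue a conversation by appending the assistant's reply to the
# message list and resending the whole history on the next turn.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative

messages = [{"role": "user", "content": "Recommend a sci-fi novel."}]
first = client.chat.completions.create(model=model, messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Why that one?"})
second = client.chat.completions.create(model=model, messages=messages)
print(second.choices[0].message.content)
```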
[Bug] `v0.3.0` produces garbage output when serving CodeLlama-70B on 4xA6000
github.com/vllm-project/vllm - ganler opened this issue 11 months ago
ERROR: Fails to install in editable mode. "UserWarning: There are no .../x86_64-conda-linux-gnu-c++ version bounds defined for CUDA version 12.1"
github.com/vllm-project/vllm - KartikYZ opened this issue 11 months ago
Add fused top-K softmax kernel for MoE
github.com/vllm-project/vllm - WoosukKwon opened this pull request 11 months ago
GPTQ & AWQ Fused MOE
github.com/vllm-project/vllm - chu-tianxiang opened this pull request 11 months ago
github.com/vllm-project/vllm - chu-tianxiang opened this pull request 11 months ago
Llama Guard inconsistent output between HuggingFace's Transformers and vLLM
github.com/vllm-project/vllm - AmenRa opened this issue 11 months ago
vLLM ignores my requests when I increase the number of concurrent requests
github.com/vllm-project/vllm - savannahfung opened this issue 11 months ago
[Minor] More fixes for the test_cache.py CI test failure
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 11 months ago
ImportError: /ramyapra/vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol:
github.com/vllm-project/vllm - ramyaprabhu-alt opened this issue 11 months ago
How to increase the vLLM scheduler prompt limit?
github.com/vllm-project/vllm - hanswang1 opened this issue 11 months ago
Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
github.com/vllm-project/vllm - gty111 opened this issue 11 months ago
Fix/async chat serving
github.com/vllm-project/vllm - schoennenbeck opened this pull request 11 months ago
KV Cache usage is 0% for mistral model
github.com/vllm-project/vllm - nikhilshandilya opened this issue 11 months ago
Seeking help: `Qwen-14B-Chat-Int4` ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large a tensor parallel size.
github.com/vllm-project/vllm - huangyunxin opened this issue 11 months ago
OpenAIServingChat cannot be instantiated within a running event loop
github.com/vllm-project/vllm - schoennenbeck opened this issue 11 months ago
IndexError when using Beam Search in Chat Completions
github.com/vllm-project/vllm - jamestwhedbee opened this issue 11 months ago
ValueError: Total number of attention heads (52) must be divisible by tensor parallel size (8).
github.com/vllm-project/vllm - PolinaBokova opened this issue 11 months ago
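The arithmetic behind this error: vLLM shards attention heads evenly across tensor-parallel ranks, so the head count must divide cleanly. With 52 heads, 8 ranks cannot work; a quick check:

```python
# 52 = 2 * 2 * 13, so the only small tensor-parallel sizes that divide the
# head count are 1, 2, and 4 (13, 26, 52 exceed typical single-node GPU counts).
num_heads = 52
for tp in (1, 2, 4, 8):
    print(f"tp={tp}: {'ok' if num_heads % tp == 0 else 'invalid'}")
```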
Question: Would a PR integrating ExLlamaV2 kernels with AWQ be accepted?
github.com/vllm-project/vllm - casper-hansen opened this issue 11 months ago