Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
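Many of the issues listed below touch vLLM's Python API. As a quick orientation, here is a minimal offline-generation sketch using vLLM's public LLM and SamplingParams classes; the model name (facebook/opt-125m), prompts, and sampling values are illustrative placeholders, not recommendations.

from vllm import LLM, SamplingParams

# Illustrative prompts; any text works.
prompts = [
    "The capital of France is",
    "The future of AI is",
]

# Sampling values here are placeholders, not tuned recommendations.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# facebook/opt-125m is an assumption: a small model that fits on one GPU.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")

The serving-related issues below typically go through vllm.entrypoints.api_server or the OpenAI-compatible server, which wrap this same engine.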
When starting a second vllm.entrypoints.api_server using tensor parallelism on a single node, the second api_server gets stuck at "Started a local Ray instance." or fails with "Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory"
github.com/vllm-project/vllm - durant1999 opened this issue 11 months ago
Is 'all-reduce kernels are temporarily disabled' the cause of the higher memory requirement?
github.com/vllm-project/vllm - SafeyahShemali opened this issue 11 months ago
inference with AWQ quantization
github.com/vllm-project/vllm - Kev1ntan opened this issue 11 months ago
Bug when top_k is passed as a float outside the valid range
github.com/vllm-project/vllm - Drzhivago264 opened this issue 11 months ago
[Feature Request] Add GPTQ quantization kernels for 4-bit NormalFloat (NF4) use cases.
github.com/vllm-project/vllm - duchengyao opened this issue 11 months ago
What's the difference between the seed in LLMEngine and the seed in SamplingParams?
github.com/vllm-project/vllm - tomdzh opened this issue 11 months ago
Is it possible to use vllm-0.3.3 with CUDA 11.8?
github.com/vllm-project/vllm - HSLUCKY opened this issue 11 months ago
Implement structured engine for parsing json grammar by token with `response_format: {type: json_object}`
github.com/vllm-project/vllm - pathorn opened this pull request 11 months ago
What's up with Pipeline Parallelism?
github.com/vllm-project/vllm - duanzhaol opened this issue 11 months ago
How to run the gemma-7b model with vLLM 0.3.3 under CUDA 11.8?
github.com/vllm-project/vllm - adogwangwang opened this issue 11 months ago
When chat-ui and vLLM are used together, the dialogue output of Llama-2-70b-chat-hf (safetensors file) is abnormal.
github.com/vllm-project/vllm - majestichou opened this issue 11 months ago
AsyncEngineDeadError when LoRA loading fails
github.com/vllm-project/vllm - lifuhuang opened this issue 11 months ago
Multi-LoRA - Support for providing /load and /unload API
github.com/vllm-project/vllm - gauravkr2108 opened this issue 11 months ago
[feature on nm-vllm] Sparse inference with weight-only int8 quantization
github.com/vllm-project/vllm - shiqingzhangCSU opened this issue 11 months ago
Question regarding GPU memory allocation
github.com/vllm-project/vllm - wx971025 opened this issue 11 months ago
lm-evaluation-harness broken on master
github.com/vllm-project/vllm - pcmoritz opened this issue 11 months ago
v0.3.3 API server can't start up with the Neuron SDK
github.com/vllm-project/vllm - qingyuan18 opened this issue 11 months ago
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU)
github.com/vllm-project/vllm - AdrianAbeyta opened this pull request 11 months ago
[FIX] Fix prefix test error on main
github.com/vllm-project/vllm - zhuohan123 opened this pull request 11 months ago
Order of keys for guided JSON
github.com/vllm-project/vllm - ccdv-ai opened this issue 11 months ago
Regression in llama model inference due to #3005
github.com/vllm-project/vllm - Qubitium opened this issue 11 months ago
Installing from source failed using the latest code
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 11 months ago
[FIX] Make `flash_attn` optional
github.com/vllm-project/vllm - WoosukKwon opened this pull request 11 months ago
[Minor fix] Include flash_attn in docker image
github.com/vllm-project/vllm - tdoublep opened this pull request 11 months ago
Error when prompt_logprobs + enable_prefix_caching
github.com/vllm-project/vllm - bgyoon opened this issue 11 months ago
Can vLLM handle concurrent requests with FastAPI?
github.com/vllm-project/vllm - Strongorange opened this issue 11 months ago
OpenAI Tools / function calling v2
github.com/vllm-project/vllm - FlorianJoncour opened this pull request 11 months ago
Prefix Caching with FP8 KV cache support
github.com/vllm-project/vllm - chenxu2048 opened this pull request 11 months ago
When running pytest tests/, undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
github.com/vllm-project/vllm - Imss27 opened this issue 11 months ago
vLLM fails to load a SqueezeLLM-quantized model
github.com/vllm-project/vllm - zuosong-peng opened this issue 11 months ago
[WIP] Build FlashInfer
github.com/vllm-project/vllm - WoosukKwon opened this pull request 11 months ago
Got completely wrong answers for the openchat model with vLLM
github.com/vllm-project/vllm - v-yunbin opened this issue 11 months ago
[Feature request] Output attention scores in vLLM
github.com/vllm-project/vllm - ChenxinAn-fdu opened this issue 11 months ago
Unable to run distributed inference on ray with tensor parallel size > 1
github.com/vllm-project/vllm - pravingadakh opened this issue 11 months ago
Supporting embedding models
github.com/vllm-project/vllm - jc9123 opened this pull request 11 months ago
Support `response_format: json_object` in OpenAI server
github.com/vllm-project/vllm - simon-mo opened this issue 11 months ago
[ROCm] Add support for Punica kernels on AMD GPUs
github.com/vllm-project/vllm - kliuae opened this pull request 11 months ago
Does vLLM support the 4-bit quantized version of the Mixtral-8x7B-Instruct-v0.1 model downloaded from Hugging Face?
github.com/vllm-project/vllm - leockl opened this issue 11 months ago
Benchmarking script does not limit the maximum concurrency
github.com/vllm-project/vllm - wangchen615 opened this issue 11 months ago
RuntimeError while running any model with the embeddedllminfo/vllm-rocm:vllm-v0.2.4 image and ROCm 5.7 (RHEL 8.7)
github.com/vllm-project/vllm - AjayKadoula opened this issue 11 months ago
Should one use tokenizer templates during offline inference?
github.com/vllm-project/vllm - vmkhlv opened this issue 11 months ago
Loading models from an S3 location instead of local path
github.com/vllm-project/vllm - simon-mo opened this issue 11 months ago
Add doc about the serving option on dstack
github.com/vllm-project/vllm - deep-diver opened this pull request 11 months ago
OpenAI Server issue when running on Apptainer (HPC)
github.com/vllm-project/vllm - vishruth-v opened this issue 11 months ago
Failed to build from source on ROCm (with pytorch and xformers working correctly)
github.com/vllm-project/vllm - nayn99 opened this issue 11 months ago
Building VLLM from source and running inference: No module named 'vllm._C'
github.com/vllm-project/vllm - Lena-Jurkschat opened this issue 11 months ago
Is there a mechanism of priorities when sending a new request?
github.com/vllm-project/vllm - brunorigal opened this issue 11 months ago
TypeError: 'NoneType' object is not callable
github.com/vllm-project/vllm - lixiaolx opened this issue 11 months ago
Fatal Python error: Segmentation fault
github.com/vllm-project/vllm - lmx760581375 opened this issue 11 months ago
Error running qwen1.5-14b-chat with the vLLM container
github.com/vllm-project/vllm - James-Dao opened this issue 11 months ago
How to shut off the unnecessary log that prints every 10s
github.com/vllm-project/vllm - sxk000 opened this issue 11 months ago
Merge Gemma into Llama
github.com/vllm-project/vllm - WoosukKwon opened this pull request 11 months ago
[Feature] Add vision language model support.
github.com/vllm-project/vllm - xwjiang2010 opened this pull request 11 months ago
Support for AMD consumer GPUs
github.com/vllm-project/vllm - arno4000 opened this issue 11 months ago
When deploying qwen1.5-7B-Chat, the API endpoint's responses are missing 10 characters
github.com/vllm-project/vllm - gaijigoumeiren opened this issue 11 months ago
Qwen 14B AWQ deploy: AttributeError: 'ndarray' object has no attribute '_torch_dtype'
github.com/vllm-project/vllm - testTech92 opened this issue 11 months ago
[BUG] Prompt logprobs causing tensor broadcast issue in `sampler.py`
github.com/vllm-project/vllm - AetherPrior opened this issue 11 months ago
Lots of blank output before each running step
github.com/vllm-project/vllm - Eutenacity opened this issue 11 months ago
AWQ: Implement new kernels (64% faster decoding)
github.com/vllm-project/vllm - casper-hansen opened this issue 11 months ago
Large length variance of sampled sequences from llama2 70b model compared to HuggingFace .generate()
github.com/vllm-project/vllm - uralik opened this issue 11 months ago
Unable to specify GPU usage in VLLM code
github.com/vllm-project/vllm - humza-sami opened this issue 11 months ago
Separate attention backends
github.com/vllm-project/vllm - WoosukKwon opened this pull request 11 months ago
Some errors happened when installing vLLM
github.com/vllm-project/vllm - finylink opened this issue 11 months ago
How can I use a LoRA adapter for a model with vocab size 40960?
github.com/vllm-project/vllm - hrson-1203 opened this issue 11 months ago
Failed to find C compiler. Please specify via CC environment variable
github.com/vllm-project/vllm - gangooteli opened this issue 11 months ago
Fix: Echo without asking for new tokens or logprobs in OpenAI Completions API
github.com/vllm-project/vllm - matheper opened this pull request 11 months ago
Limited Request Handling for AMD Instinct MI300X GPUs with Tensor Parallelism > 1
github.com/vllm-project/vllm - Spurthi-Bhat-ScalersAI opened this issue 11 months ago
Question: inference results are all empty when running a fine-tuned qwen-14b model with vLLM
github.com/vllm-project/vllm - lalalabobobo opened this issue 12 months ago
The answer accuracy of the Qwen series models is degraded
github.com/vllm-project/vllm - zhochengbiao opened this issue 12 months ago
Results from serving qwen7B with vLLM are inconsistent with the original qwen results, and accuracy drops significantly
github.com/vllm-project/vllm - chenshukai1015 opened this issue 12 months ago
Multi-GPU Support Failures with AMD MI210
github.com/vllm-project/vllm - tom-papatheodore opened this issue 12 months ago
Fix empty output when temp is too low
github.com/vllm-project/vllm - CatherineSue opened this pull request 12 months ago
E5-mistral-7b-instruct embedding support
github.com/vllm-project/vllm - DavidPeleg6 opened this issue 12 months ago
Runtime exception [step must be nonzero]
github.com/vllm-project/vllm - DreamGenX opened this issue 12 months ago
The results of a vLLM deployment of qwen-14B are inconsistent with the results of the original qwen-14B
github.com/vllm-project/vllm - qingjiaozyn opened this issue 12 months ago
vllm keeps hanging when using djl-deepspeed
github.com/vllm-project/vllm - ali-firstparty opened this issue 12 months ago
api_server.py: error: unrecognized arguments: --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
github.com/vllm-project/vllm - xueyongfu11 opened this issue 12 months ago
--tensor-parallel-size 2 fails to load on GCP
github.com/vllm-project/vllm - noamgat opened this issue 12 months ago
Duplicate Token `<s>` in Tokenizer Encoded Token ids
github.com/vllm-project/vllm - zxybazh opened this issue 12 months ago
Add docker-compose.yml and corresponding .env
github.com/vllm-project/vllm - WolframRavenwolf opened this pull request 12 months ago
Allow model to be served under multiple names
github.com/vllm-project/vllm - hmellor opened this pull request 12 months ago
Missing prometheus metrics in `0.3.0`
github.com/vllm-project/vllm - SamComber opened this issue 12 months ago
Please add LoRA support for higher ranks and alpha values
github.com/vllm-project/vllm - parikshitsaikia1619 opened this issue 12 months ago
Add LoRA support for Mixtral
github.com/vllm-project/vllm - tterrysun opened this pull request 12 months ago
vLLM running on a Ray cluster hangs on initialization
github.com/vllm-project/vllm - Kaotic3 opened this issue 12 months ago
Add guided decoding for OpenAI API server
github.com/vllm-project/vllm - felixzhu555 opened this pull request 12 months ago
Adds support for gunicorn multiprocess mode
github.com/vllm-project/vllm - jalotra opened this pull request 12 months ago
Incorrect completions with tensor parallel size of 8 on MI300X GPUs
github.com/vllm-project/vllm - seungduk-yanolja opened this issue 12 months ago