Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Bug]: Engine timeout error due to request step residual
pushan01 opened this issue 7 months ago
[Bug]: segfault when using google/gemma-2-27b-it on vLLM
federicotorrielli opened this issue 7 months ago
[Bug]: Load LoRA adaptor for Llama3 seems not working
ANYMS-A opened this issue 7 months ago
add benchmark test for fixed input and output length
haichuan1221 opened this pull request 7 months ago
[Installation]: Installation with OpenVINO get dependency conflict Error !!!
HPUedCSLearner opened this issue 7 months ago
[Usage]: Gemma2-9b not working on A10G 24gb gpu
Abhinay2323 opened this issue 7 months ago
[Bug]: Performance : slow inference for FP8 on L20 with 0.5.1(v0.5.0.post1 was fine)
garycaokai opened this issue 7 months ago
[Installation]: Gemma2 Installing Flash Infer `[rank0]: TypeError: 'NoneType' object is not callable`
robertgshaw2-neuralmagic opened this issue 7 months ago
[Core] Support dynamically loading Lora adapter from HuggingFace
Jeffwan opened this pull request 7 months ago
[Feature]: Support loading lora adapters from HuggingFace in runtime
Jeffwan opened this issue 7 months ago
[Bug]: relative path doesn't work for Lora adapter model
Jeffwan opened this issue 7 months ago
[Doc] Fix the lora adapter path in server startup script
Jeffwan opened this pull request 7 months ago
[RFC] Drop beam search support
WoosukKwon opened this issue 7 months ago
[Bug]: benchmark_throughput gets TypeError: XFormersMetadata.__init__() got an unexpected keyword argument 'is_prompt' with CPU
LGLG42 opened this issue 7 months ago
[ BugFix ] Prompt Logprobs Detokenization
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bug]: Gemma2 supports 8192 context with sliding window, but vllm only does 4096 or fails if trying 8192
pseudotensor opened this issue 7 months ago
[Bug]: issue with Phi3 mini GPTQ 4Bit/8Bit
gm3000 opened this issue 7 months ago
[Hardware][Intel CPU][DOC] Update docs for CPU backend
zhouyuan opened this pull request 7 months ago
[Bug]: AsyncEngineDeadError: Task finished unexpectedly with qwen2 72b
thomZ1 opened this issue 7 months ago
[Installation]: pip install -e .
Kev1ntan opened this issue 7 months ago
[Usage]: Is there a way to make the results of two different calls to VLLM with temperature > 0 consistent?
Some-random opened this issue 7 months ago
do not exclude `object` field in CompletionStreamResponse
kczimm opened this pull request 7 months ago
[misc][frontend] log all available endpoints
youkaichao opened this pull request 7 months ago
[Bug]: No end point available after model is fully loaded
hassanzadeh opened this issue 7 months ago
[Bug]: Guided decoding with Phi-3-small crashes
crosiumreborn opened this issue 7 months ago
[Bug]: gemma-2-27b error loading with vllm.LLM
jl3676 opened this issue 7 months ago
[Usage]: OpenAI-like API in offline inference
1ncludeSteven opened this issue 7 months ago
[Bug]: AsyncEngineDeadError: Task finished unexpectedly with Gemma2 9B
nelyajizi opened this issue 7 months ago
[Feature]: Precise model device placement
vwxyzjn opened this issue 7 months ago
[Feature]: lazy import for VLM
zhyncs opened this issue 7 months ago
[Usage]: BNB Gemma2 9b loading problems
orellavie1212 opened this issue 7 months ago
[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion
sighingnow opened this pull request 7 months ago
[Usage]: solve problem like "Found no NVIDIA driver on your system." in WSL2
HelloCard opened this issue 7 months ago
[core][distributed] add zmq fallback for broadcasting large objects
youkaichao opened this pull request 7 months ago
Add test test (this is a test pr)
llmpros opened this pull request 7 months ago
[Bug]: Multiprocessing FileNotFound error in triton cache
jl3676 opened this issue 7 months ago
[Usage]: Struggling to get fp8 inference working correctly on 8xL40s
williambarberjr opened this issue 7 months ago
[Feature]: Support AVX2 for CPU (drop AVX-512 requirement)
kozuch opened this issue 7 months ago
[Bug]: Empty strings as output using gemma-2-27B with 4 A10s
lucafirefox opened this issue 7 months ago
[Bug]: LLaVA 1.6 in 0.5.1: Exceptions after some bigger image request, stuck in faulty mode
andrePankraz opened this issue 7 months ago
[Feature]: ROPE scaling supported by vLLM gemma2
kkk935208447 opened this issue 7 months ago
[Doc]: Code Shared for OpenAI Embedding Client gives base64 encode error
palash-fin opened this issue 7 months ago
[Bug]: As V100 does not support FlashAttention, it is not possible to run the gemma model, hopefully it can support the xformers way to run it
warlockedward opened this issue 7 months ago
Add FlashInfer to default Dockerfile
simon-mo opened this pull request 7 months ago
[Bug]: New bug in 0.5.1 (v0.5.0.post1 was fine)
andrePankraz opened this issue 7 months ago
[Core] implement disaggregated prefilling via KV cache transfer
KuntaiDu opened this pull request 7 months ago
[Bug]: TypeError: 'NoneType' object is not callable when loading Gemma 2 9B with new 0.5.1 version
DanielusG opened this issue 7 months ago
[Doc] Move guide for multimodal model and other improvements
DarkLight1337 opened this pull request 7 months ago
[Doc] Reorganize Supported Models by Type
ywang96 opened this pull request 7 months ago
[Feature]: Return hidden states (in progress?)
Elanmarkowitz opened this issue 7 months ago
[Core] Refactor _prepare_model_input_tensors - take 2
comaniac opened this pull request 7 months ago
Move release wheel env var to Dockerfile instead
simon-mo opened this pull request 7 months ago
Fix release wheel build env var
simon-mo opened this pull request 7 months ago
Update wheel builds to strip debug
simon-mo opened this pull request 7 months ago
[Bug]: Batch expansion doesn't work with lora
Adhyyan1252 opened this issue 7 months ago
[Docs] Fix readthedocs for tag build
simon-mo opened this pull request 7 months ago
bump version to v0.5.1
simon-mo opened this pull request 7 months ago
[Bug]: When starting deepseek-coder-v2-lite-instruct with vllm on 4 GPUs, one of them is at 0%.
fengyang95 opened this issue 7 months ago
[Usage]: How to use Multi-instance in Vllm? (Model replication on multiple GPUs)
KimMinSang96 opened this issue 7 months ago
[Feature]: expose the tqdm progress bar to enable logging the progress
hugolytics opened this issue 7 months ago
Exception when loading Baichuan2-13B-Chat: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU
czhcc opened this issue 7 months ago
[Bug]: When tensor_parallel_size>1, RuntimeError: Cannot re-initialize CUDA in forked subprocess.
excelsimon opened this issue 7 months ago
[Feature]: Integrate new backend
XDaoHong opened this issue 7 months ago
[Performance]: the performance with chunked-prefill-enabled is lower than default
BestKuan opened this issue 7 months ago
[VLM] Cleanup validation and update docs
DarkLight1337 opened this pull request 7 months ago
[Feature]: Model ChatGLMForCausalLM does not support LoRA, but LoRA is enabled.
wangbhan opened this issue 7 months ago
[Bug]: CUDA error when using multiple GPUs
ndao600 opened this issue 7 months ago
[VLM] Improve consistency between feature size calculation and dummy data for profiling
ywang96 opened this pull request 7 months ago
[Bug]: When using tp for inference, an error occurs: Worker VllmWorkerProcess pid 3283517 died, exit code: -15.
B-201 opened this issue 7 months ago
[Bugfix] Enable chunked-prefill and prefix cache with flash-attn backend
sighingnow opened this pull request 7 months ago
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend
kzawora-intel opened this pull request 7 months ago
[Feature]: deepseek-v2 awq support
fengyang95 opened this issue 7 months ago
[Usage]: Internal server error when serving LoRA adapters with Open-AI compatible vLLM server
ebi64 opened this issue 7 months ago
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue
tdoublep opened this pull request 7 months ago
[Model] Implement DualChunkAttention for Qwen2 Models
hzhwcmhf opened this pull request 7 months ago
[Bugfix] Handle `best_of>1` case by disabling speculation.
tdoublep opened this pull request 7 months ago
[Bug]: Spec. decode fails for requests with n>1 or best_of>1
tdoublep opened this issue 7 months ago
[Bugfix] Use templated datasource in grafana.json to allow automatic imports
frittentheke opened this pull request 7 months ago
[Bug]: Phi-3 long context (longrope) doesn't work with fp8 kv cache
jphme opened this issue 7 months ago
[Installation]: Couldn't find CUDA library root.
CodexDive opened this issue 7 months ago
[Feature]: Multi lora on multi gpus
jiuzhangsy opened this issue 7 months ago
[Usage]: vllm server mode, gpu util
UbeCc opened this issue 7 months ago
[Bug]: Disable log requests and disable log stats do not work
wufxgtihub123 opened this issue 7 months ago
[Usage]: Does vllm currently support embedding inputs? I could not find a relevant interface
zhanghang-official opened this issue 7 months ago
[core][distributed] accelerate distributed weight loading
youkaichao opened this pull request 7 months ago
[Bug]: RuntimeError: No suitable kernel. h_in=16 h_out=7392 dtype=Float out_dtype=BFloat16
JJJJerry opened this issue 7 months ago
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation
bigPYJ1151 opened this pull request 7 months ago
[Bug]: Number of available GPU blocks drop significantly for Phi3-vision
CatherineSue opened this issue 7 months ago
[Feature]: multi-lora support older nvidia gpus.
wuisawesome opened this issue 7 months ago
[VLM] Calculate maximum number of multi-modal tokens by model
DarkLight1337 opened this pull request 7 months ago
[Distributed][Core] Support Py39 and Py38 for PP
andoorve opened this pull request 7 months ago
[doc][misc] bump up py version in installation doc
youkaichao opened this pull request 7 months ago
[Installation]: ImportError: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12
laithsakka opened this issue 7 months ago
[core][distributed] allow custom allreduce when pipeline parallel size > 1
youkaichao opened this pull request 7 months ago
[Bug]: Mixtral 8x7b FP8 encounters illegal memory access in custom_all_reduce.cuh
ferdiko opened this issue 7 months ago
[core][distributed] support layer size undividable by pp size in pipeline parallel inference
youkaichao opened this pull request 7 months ago
[Feature]: support layer size undividable by pp size in pipeline parallel inference
youkaichao opened this issue 7 months ago
[ Misc ] Clean Up `CompressedTensorsW8A8`
robertgshaw2-neuralmagic opened this pull request 7 months ago