Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Bug]: Engine timeout error due to request step residual

pushan01 opened this issue 7 months ago
[Bug]: segfault when using google/gemma-2-27b-it on vLLM

federicotorrielli opened this issue 7 months ago
[Bug]: Load LoRA adaptor for Llama3 seems not working

ANYMS-A opened this issue 7 months ago
add benchmark test for fixed input and output length

haichuan1221 opened this pull request 7 months ago
[Usage]: Gemma2-9b not working on A10G 24gb gpu

Abhinay2323 opened this issue 7 months ago
[Core] Support dynamically loading Lora adapter from HuggingFace

Jeffwan opened this pull request 7 months ago
[Bug]: relative path doesn't work for Lora adapter model

Jeffwan opened this issue 7 months ago
[Doc] Fix the lora adapter path in server startup script

Jeffwan opened this pull request 7 months ago
[RFC] Drop beam search support

WoosukKwon opened this issue 7 months ago
[ BugFix ] Prompt Logprobs Detokenization

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bug]: issue with Phi3 mini GPTQ 4Bit/8Bit

gm3000 opened this issue 7 months ago
[Hardware][Intel CPU][DOC] Update docs for CPU backend

zhouyuan opened this pull request 7 months ago
[Installation]: pip install -e .

Kev1ntan opened this issue 7 months ago
do not exclude `object` field in CompletionStreamResponse

kczimm opened this pull request 7 months ago
[misc][frontend] log all available endpoints

youkaichao opened this pull request 7 months ago
[Bug]: No end point available after model is fully loaded

hassanzadeh opened this issue 7 months ago
[Bug]: Guided decoding with Phi-3-small crashes

crosiumreborn opened this issue 7 months ago
[Bug]: gemma-2-27b error loading with vllm.LLM

jl3676 opened this issue 7 months ago
[Usage]: OpenAI-like API in offline inference

1ncludeSteven opened this issue 7 months ago
[Feature]: Precise model device placement

vwxyzjn opened this issue 7 months ago
[Feature]: lazy import for VLM

zhyncs opened this issue 7 months ago
[Usage]: BNB Gemma2 9b loading problems

orellavie1212 opened this issue 7 months ago
[core][distributed] add zmq fallback for broadcasting large objects

youkaichao opened this pull request 7 months ago
Add test test (this is a test pr)

llmpros opened this pull request 7 months ago
[Bug]: Multiprocessing FileNotFound error in triton cache

jl3676 opened this issue 7 months ago
[Usage]: Struggling to get fp8 inference working correctly on 8xL40s

williambarberjr opened this issue 7 months ago
[Feature]: Support AVX2 for CPU (drop AVX-512 requirement)

kozuch opened this issue 7 months ago
[Bug]: Empty strings as output using gemma-2-27B with 4 A10s

lucafirefox opened this issue 7 months ago
[Feature]: ROPE scaling supported by vLLM gemma2

kkk935208447 opened this issue 7 months ago
Add FlashInfer to default Dockerfile

simon-mo opened this pull request 7 months ago
[Bug]: New bug in 0.5.1 (v0.5.0.post1 was fine)

andrePankraz opened this issue 7 months ago
[Core] implement disaggregated prefilling via KV cache transfer

KuntaiDu opened this pull request 7 months ago
[Doc] Move guide for multimodal model and other improvements

DarkLight1337 opened this pull request 7 months ago
[Doc] Reorganize Supported Models by Type

ywang96 opened this pull request 7 months ago
[Feature]: Return hidden states (in progress?)

Elanmarkowitz opened this issue 7 months ago
[Core] Refactor _prepare_model_input_tensors - take 2

comaniac opened this pull request 7 months ago
Move release wheel env var to Dockerfile instead

simon-mo opened this pull request 7 months ago
Fix release wheel build env var

simon-mo opened this pull request 7 months ago
Update wheel builds to strip debug

simon-mo opened this pull request 7 months ago
[Bug]: Batch expansion doesn't work with lora

Adhyyan1252 opened this issue 7 months ago
[Docs] Fix readthedocs for tag build

simon-mo opened this pull request 7 months ago
bump version to v0.5.1

simon-mo opened this pull request 7 months ago
[Feature]: Integrate new backend

XDaoHong opened this issue 7 months ago
[VLM] Cleanup validation and update docs

DarkLight1337 opened this pull request 7 months ago
[Bug]: CUDA error when using multiple GPUs

ndao600 opened this issue 7 months ago
[Bugfix] Enable chunked-prefill and prefix cache with flash-attn backend

sighingnow opened this pull request 7 months ago
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend

kzawora-intel opened this pull request 7 months ago
[Feature]: deepseek-v2 awq support

fengyang95 opened this issue 7 months ago
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue

tdoublep opened this pull request 7 months ago
[Model] Implement DualChunkAttention for Qwen2 Models

hzhwcmhf opened this pull request 7 months ago
[Bugfix] Handle `best_of>1` case by disabling speculation.

tdoublep opened this pull request 7 months ago
[Bug]: Spec. decode fails for requests with n>1 or best_of>1

tdoublep opened this issue 7 months ago
[Bugfix] Use templated datasource in grafana.json to allow automatic imports

frittentheke opened this pull request 7 months ago
[Installation]: Couldn't find CUDA library root.

CodexDive opened this issue 7 months ago
[Feature]: Multi lora on multi gpus

jiuzhangsy opened this issue 7 months ago
[Usage]: vllm server mode, gpu util

UbeCc opened this issue 7 months ago
[Bug]: Disable log requests and disable log stats do not work

wufxgtihub123 opened this issue 7 months ago
[Usage]: Does vllm currently support embedding input? I couldn't find a relevant interface

zhanghang-official opened this issue 7 months ago
[core][distributed] accelerate distributed weight loading

youkaichao opened this pull request 7 months ago
[Feature]: multi-lora support older nvidia gpus.

wuisawesome opened this issue 7 months ago
[VLM] Calculate maximum number of multi-modal tokens by model

DarkLight1337 opened this pull request 7 months ago
[Distributed][Core] Support Py39 and Py38 for PP

andoorve opened this pull request 7 months ago
[doc][misc] bump up py version in installation doc

youkaichao opened this pull request 7 months ago
[core][distributed] allow custom allreduce when pipeline parallel size > 1

youkaichao opened this pull request 7 months ago
[ Misc ] Clean Up `CompressedTensorsW8A8`

robertgshaw2-neuralmagic opened this pull request 7 months ago