Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Core]: (2/N) Support prefill only models by Workflow Defined Engine - Prefill only scheduler
noooop opened this pull request 2 months ago
[Installation]: Pytorch nightly version 2.6 meets error: error: can't copy '/tmp/tmpv5hlsgcm.build-lib/vllm/_core_C.abi3.so': doesn't exist or not a regular file
shaoyuyoung opened this issue 3 months ago
[Bugfix] Fix lora loading for Compressed Tensors in #9120
fahadh4ilyas opened this pull request 3 months ago
[TPU] Fix memory profiling
WoosukKwon opened this pull request 3 months ago
[Bug]: quantization does not work with dummy weight format
youkaichao opened this issue 3 months ago
[Bug]: Extreme low throughput when using pipeline parallelism when Batch Size(running req) is small
AlvL1225 opened this issue 3 months ago
[Bug]: Error Running Qwen2.5-7B-Instruct on CPU
xiayouran opened this issue 3 months ago
[Model] Remap FP8 kv_scale in CommandR and DBRX
hliuca opened this pull request 3 months ago
Update link to KServe deployment guide
terrytangyuan opened this pull request 3 months ago
[Bug]: Port binding keep failing due to unnecessary code
James4Ever0 opened this issue 3 months ago
Add classifiers in setup.py
terrytangyuan opened this pull request 3 months ago
[Doc] Fix VLM prompt placeholder sample bug
ycool opened this pull request 3 months ago
[Usage]: due to large max_mm_tokens, number of images that multimodal models can support is underestimated
SepehrV opened this issue 3 months ago
[Bug]: vLLM OpenAI-api server `/docs` endpoint fails to load
mgoin opened this issue 3 months ago
[Misc] Improve validation errors around best_of and n
tjohnson31415 opened this pull request 3 months ago
[WIP] Prototyping re-arch
WoosukKwon opened this pull request 3 months ago
[ci][test] use load dummy for testing
youkaichao opened this pull request 3 months ago
[Feature]: Enabling MSS for larger number of sequences (>256)
kushanam opened this issue 3 months ago
[Performance]: Llama-3.2-11B-Vision-Instruct taking up a lot of memory
pbarker opened this issue 3 months ago
mypy: check additional directories
russellb opened this pull request 3 months ago
Add `lm-eval` directly to requirements-test.txt
mgoin opened this pull request 3 months ago
[Bugfix] Optimize composite weight loading and fix EAGLE weight loading
DarkLight1337 opened this pull request 3 months ago
[Bugfix][Doc] Report neuron error in output
joerowell opened this pull request 3 months ago
[Misc]: How to set num-scheduler-steps
o1iv3r opened this issue 3 months ago
[Usage]: Multi-gpu inference takes too much memory + how to make uneven load
Ouna-the-Dataweaver opened this issue 3 months ago
[Misc]: Segmentation Fault in vLLM API Server during Model Initialization (NCCL Error: Unhandled System Error)
shreyasp-07 opened this issue 3 months ago
[Doc] Update vlm.rst to include an example on videos
sayakpaul opened this pull request 3 months ago
[Frontend][Feature] Add jamba tool parser
tomeras91 opened this pull request 3 months ago
[Bug]: InternVL bounding box prediction does not work
MoritzLaurer opened this issue 3 months ago
[Bug]: Can not pip install vllm inside docker
fahadh4ilyas opened this issue 3 months ago
[Frontend] Add Early Validation For Chat Template / Tool Call Parser
alex-jw-brooks opened this pull request 3 months ago
[Misc]: Nobody reviews my PR
CharlesRiggins opened this issue 3 months ago
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1
sroy745 opened this pull request 3 months ago
support bitsandbytes quantization with more models
chenqianfzh opened this pull request 3 months ago
[Neuron] Introduce paged attention support for neuron backend
liangfu opened this pull request 3 months ago
[Bug]: vllm much slower on long context inputs when using --enable-lora even when lora is not used
badrjd opened this issue 3 months ago
[Bugfix] Fix crashing for multimodal when image passed with height == 1
Pernekhan opened this pull request 3 months ago
[torch.compile] Fuse RMSNorm with quant
ProExpertProg opened this pull request 3 months ago
[Bug]: Unable to use --enable-lora on latest vllm docker container (v0.6.2)
noelo opened this issue 3 months ago
[Doc] Improve contributing and installation documentation
rafvasq opened this pull request 3 months ago
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs
alex-jw-brooks opened this pull request 3 months ago
[CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04
mgoin opened this pull request 3 months ago
[Bug]: assert len(indices) == len(inputs) with `Qwen/Qwen2-VL-2B-Instruct`
sayakpaul opened this issue 3 months ago
[Bug]: Error Encountered in vLLM Benchmarking with Input Length greater than 8192 in Llama 3.1 405B Model
Bihan opened this issue 3 months ago
[Usage]: Not getting the inference metrics in the api response
vverma01232 opened this issue 3 months ago
[New Model]: silma-ai/SILMA-9B-Instruct-v1.0
hassanraha opened this issue 3 months ago
[Core]: (1/N) Support prefill only models by Workflow Defined Engine - Prefill only attention
noooop opened this pull request 3 months ago
[Bugfix][Core] Handle empty ids_list in BlockSpaceManagerV1.get_common_computed_block_ids to prevent msgspec serialization errors
amberOoO opened this pull request 3 months ago
[Bug] BlockSpaceManagerV1.get_common_computed_block_ids returns empty string, causing msgspec decode failure
amberOoO opened this issue 3 months ago
[OpenVINO] Use torch 2.4.0 and newer optimum version
ilya-lavrenov opened this pull request 3 months ago
[Bug]: Unsupported base layer: QKVParallelLinear when loading lora to a quantized model
fahadh4ilyas opened this issue 3 months ago
[Bug]: Installation from last commit (version wrong)
johnnynunez opened this issue 3 months ago
[Bug]: Issue Running VLLM Open AI using nonroot user in K8s
luhurfth opened this issue 3 months ago
[Frontend] API support for beam search for MQLLMEngine
LunrEclipse opened this pull request 3 months ago
[Bugfix][Hardware] Fix model input for decode
yma11 opened this pull request 3 months ago
[Usage]: How to run llama 3.2 with CPU only version
chanandrew96 opened this issue 3 months ago
[Bug] In v0.6.2, when tp=1, TPOT becomes very slow for batch sizes of 10 or so. (not happened in v0.5.5)
ashgold opened this issue 3 months ago
[Feature]: Does vLLM support ONNX models?
LetianLee opened this issue 3 months ago
[Bug]: AMD MultiStep Feature Issue. Missing argument: 'turn_prefills_into_decodes' in `advance_step()`
tjtanaa opened this issue 3 months ago
[Feature]: LLMEngine and ModelConfig explicitly require path or HF model id, but no InferenceClient class for locally running VLLM server
DanielViglione opened this issue 3 months ago
support jetson AGX Orin
johnnynunez opened this pull request 3 months ago
[Model] Explicit interface for vLLM models and support OOT embedding models
DarkLight1337 opened this pull request 3 months ago
[Usage]: The chat endpoint has problems, while the completion endpoint works normally
cdhx opened this issue 3 months ago
[core] remove beam search from the core
youkaichao opened this pull request 3 months ago
[Misc] Remove user-facing error for removed VLM args
DarkLight1337 opened this pull request 3 months ago
[BugFix][Core] Fix BlockManagerV2 when Encoder Input is None
sroy745 opened this pull request 3 months ago
[torch.compile] register blocksparse attention
youkaichao opened this pull request 3 months ago
[Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model
tjtanaa opened this pull request 3 months ago
[Bug]: Try-catch conditions are incorrect to import correct ROCm Flash Attention Backend in Draft Model
tjtanaa opened this issue 3 months ago
[Bug]: Llama-3.2-11B-Vision-Instruct which is an encoder-decoder model fails with BlockManager V2
sroy745 opened this issue 3 months ago
[RFC]: hide continuous batching complexity through forward context
youkaichao opened this issue 3 months ago
[core] use forward context for flash infer
youkaichao opened this pull request 3 months ago
[Bug]: vllm serve Exception in ASGI application
SpaceHunterInf opened this issue 3 months ago
[Model] Make llama3.2 support multiple and interleaved images
xiangxu-google opened this pull request 3 months ago
[Bug]: VLLM Model Fails on Kubernetes with "CUDA error: operation not permitted when stream is capturing"
CREESTL opened this issue 3 months ago
[Bugfix] limit lora init id greater than 0
Ssunbell opened this pull request 3 months ago
[Installation]: cannot install vllm with openvino backend
guanxiang opened this issue 3 months ago
[Bug]: Qwen2-VL model support
kulievvitaly opened this issue 3 months ago
[Model] PP support for embedding models and update docs
DarkLight1337 opened this pull request 3 months ago
[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend
Isotr0py opened this pull request 3 months ago
[Doc] Update README.md with Ray summit slides
zhuohan123 opened this pull request 3 months ago
[Frontend] API support for beam search
LunrEclipse opened this pull request 3 months ago
[Bugfix] Try to handle older versions of pytorch
bnellnm opened this pull request 3 months ago
[Misc] Fix CI lint
comaniac opened this pull request 3 months ago
[Bugfix] use blockmanagerv1 for encoder-decoder
heheda12345 opened this pull request 3 months ago
[Bugfix] Deprecate registration of custom configs to huggingface
heheda12345 opened this pull request 3 months ago
[Bug]: vLLM MQLLMEngine Timeout - Json Schema
wrisigo opened this issue 3 months ago
[Misc] Add random seed for prefix cache benchmark
Imss27 opened this pull request 3 months ago
[Bug]: Lack of reproducibility across multiple runs of prefix cache benchmark
Imss27 opened this issue 3 months ago
Yet another Prefill-Decode separation in vllm
chenqianfzh opened this pull request 3 months ago
Developed the PoC of dAttention support. It will utilize the similar idea of vAttention, but it introduces a new memory layout that overcomes the waste of memory of vAttention.
tongping opened this pull request 3 months ago
[Misc] Improved prefix cache example
Imss27 opened this pull request 3 months ago
[Bug]: vllm overrides transformer's Autoconfig for mllama
lyuqin-scale opened this issue 3 months ago
Remove AMD Ray Summit Banner
simon-mo opened this pull request 3 months ago
[Doc]: Clear documentation about function / tool calling with examples
greg2705 opened this issue 3 months ago
[Installation]: Build failed with error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
ReeceResearch opened this issue 3 months ago
[Misc]: Need to understand support for torch.compile in Q4 roadmap
amd-abhikulk opened this issue 3 months ago
[Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL
whyiug opened this pull request 3 months ago
[Usage]: Benchmarking Issues: Low Success Rate and Tensor Parallel Size Constraints on 8x AMD MI300x GPUs
Bihan opened this issue 3 months ago
[Bug]: Issue with Pixtral Model: Unsupported Vision Configuration in vLLM ( AMD amd 7900 xtx)
matrix1233 opened this issue 3 months ago