Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[help wanted]: rename vllm/logging module to avoid shadowing builtin logging module
youkaichao opened this issue about 1 month ago
[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill
NickLucche opened this pull request about 1 month ago
[Kernel]Enable HPU for Speculative Decoding
xuechendi opened this pull request about 1 month ago
[Mistral] FP8 format
patrickvonplaten opened this pull request about 1 month ago
[Bug]: can not serve microsoft/llava-med-v1.5-mistral-7b
cubense opened this issue about 1 month ago
Prefix Cache Aware Scheduling [1/n]
rickyyx opened this pull request about 1 month ago
[V1][Bugfix] Propagate V1 LLMEngine properly
comaniac opened this pull request about 1 month ago
[Usage]: VLLM failing to stream response after 512+ prompt tokens.
aghbd opened this issue about 1 month ago
[Core] Add padding-aware scheduling for 2D prefills
kzawora-intel opened this pull request about 1 month ago
[Usage]: Engine iteration timed out. (during using qwen2-vl-7b)
HuiyuanYan opened this issue about 1 month ago
[CI/Build] Always run mypy
russellb opened this pull request about 1 month ago
[V1] Allow piecewise cuda graphs to run with custom allreduce
SageMoore opened this pull request about 1 month ago
Fix quantization config of vl model
jinzhen-lin opened this pull request about 1 month ago
[New Model]: dunzhang/stella_en_1.5B_v5
cavities opened this issue about 1 month ago
[Bug]: vllm0.6.3.post1 7B model can not use cmd vllm.entrypoints.openai.api_server on wsl
xiezhipeng-git opened this issue about 2 months ago
[Doc]: follow the doc but got error
husheng-liu opened this issue about 2 months ago
[RFC]: Merge input processor and input mapper for multi-modal models
DarkLight1337 opened this issue about 2 months ago
[Hardware][CPU][torch.compile] integrate torch compile
bigPYJ1151 opened this pull request about 2 months ago
[Bugfix] Make image processor respect `mm_processor_kwargs` for Qwen2-VL
li-plus opened this pull request about 2 months ago
[Bug]: When apply continue_final_message for OpenAI server, the `"echo":false` is ignored.
DIYer22 opened this issue about 2 months ago
[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target
bigPYJ1151 opened this pull request about 2 months ago
[Hardware][XPU] AWQ/GPTQ support for xpu backend
yma11 opened this pull request about 2 months ago
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark.
spliii opened this pull request about 2 months ago
[Bug]: Engine loop has died for Meta-Llama-3.1-8B-Instruct TP=2
HaoyuWang4188 opened this issue about 2 months ago
[V1][BugFix] Fix Generator construction in greedy + seed case
njhill opened this pull request about 2 months ago
Add hf_transfer to testing image
mgoin opened this pull request about 2 months ago
[Kernel]Generalize Speculative decode from Cuda
xuechendi opened this pull request about 2 months ago
[Usage]: disable pydantic request validation
matbee-eth opened this issue about 2 months ago
Splitting attention kernel file
maleksan85 opened this pull request about 2 months ago
[Misc] Improve Web UI
rafvasq opened this pull request about 2 months ago
[Feature]: Enhance integration with advanced LB/gateways with better load/cost reporting and LoRA management
liu-cong opened this issue about 2 months ago
[CI/Build] Automate PR body text cleanup
russellb opened this pull request about 2 months ago
[Bug]:Structured outputs inference often took a very long time,and eventually causing a timeout and vLLM engine crushing.
hpx502766238 opened this issue about 2 months ago
[Feature]: Add Gamma Distribution Request Support for Serving Benchmark.
spliii opened this issue about 2 months ago
[Performance]: Throughput and Latency degradation with a single LoRA adapter on A100 40 GB
kaushikmitr opened this issue about 2 months ago
[Core] Add dynamic chunk size calculation
prashantgupta24 opened this pull request about 2 months ago
[Build] Fix for the Wswitch-bool clang warning
gshtras opened this pull request about 2 months ago
[Doc] Updated TPU install instructions
mikegre-google opened this pull request about 2 months ago
[Kernel] Refactor Cutlass c3x
varun-sundar-rabindranath opened this pull request about 2 months ago
[Benchmark] guided decoding
aarnphm opened this pull request about 2 months ago
[0/N] Rename `MultiModalInputs` to `MultiModalKwargs`
DarkLight1337 opened this pull request about 2 months ago
[Bug]: PyTorch 2.5.x vLLM 1.0.0 dev issue with tensor parallel size > 1
CortexEdgeUser opened this issue about 2 months ago
Online video support for VLMs
litianjian opened this pull request about 2 months ago
Adding cascade inference to vLLM
raywanb opened this pull request about 2 months ago
[WIP] Ray Backend V1
rkooo567 opened this pull request about 2 months ago
[Bugfix] Upgrade to pytorch 2.5.1
bnellnm opened this pull request about 2 months ago
[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined.
gcalmettes opened this pull request about 2 months ago
[Doc] Update VLM doc about loading from local files
ywang96 opened this pull request about 2 months ago
[Bug]: last_token_time is equal to arrival_time
wolfgangsmdt opened this issue about 2 months ago
[Misc] Modify BNB parameter name
jeejeelee opened this pull request about 2 months ago
[Core] Enhance memory profiling in determine_num_available_blocks with error handling and fallback
Ahmed14z opened this pull request about 2 months ago
[Bug]: For speculative decoding with a draft model, the "determine_num_available_blocks" only considers the memory usage of the target model
hustxiayang opened this issue about 2 months ago
[Core] Use os.sched_yield in ShmRingBuffer instead of time.sleep
tlrmchlsmth opened this pull request about 2 months ago
[Bug]: Segment fault when import decord before import vllm
litianjian opened this issue about 2 months ago
[Performance]: FP8 performance worse than FP16 for Qwen2-VL-2B-Instruct
LinJianping opened this issue about 2 months ago
[Bug]: Llama3.2 tool calling OpenAI API not working
SinanAkkoyun opened this issue about 2 months ago
[Bug]: I cannot able to load the model on TESLA T4 GPU in Full precision
VpkPrasanna opened this issue about 2 months ago
[Bug]: internvl “max_dynamic_patch” not work, and add_special_tokens bug
wangpeng138375 opened this issue about 2 months ago
[Bug]: [Regression Issue] The output from Qwen2 VL are different between vLLM v0.6.3-post1 and vLLM v0.6.1-post2
tjtanaa opened this issue about 2 months ago
[Misc]Reduce BNB static variable
jeejeelee opened this pull request about 2 months ago
[Bug]: Deploying glm4 reported an error:"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set
shnyyds opened this issue about 2 months ago
[Usage]: Are there any batch size requirements for offline batch inference? For example, is 10,000 okay?
joyyyhuang opened this issue about 2 months ago
[Bugfix] Fix E2EL mean and median stats
daitran2k1 opened this pull request about 2 months ago
[5/N] pass the whole config to model
youkaichao opened this pull request about 2 months ago
[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers
sroy745 opened this pull request about 2 months ago
[Installation]: Model Architectures FalconMambaForCasualLM are not supported for now.
RohithDAces opened this issue about 2 months ago
[4/N] make quant config first-class citizen
youkaichao opened this pull request about 2 months ago
[Feature]: do you plan to support "suffix" of "v1/completions"
qiao-wei opened this issue about 2 months ago
[Bugfix][OpenVINO] Fix circular reference #9939
MengqingCao opened this pull request about 2 months ago
[Bugfix] Fix `MQLLMEngine` hanging
robertgshaw2-neuralmagic opened this pull request about 2 months ago
[V1] Prefix caching (take 2)
comaniac opened this pull request about 2 months ago
[Doc] correct schema in example batch jsonl file: max_completion_tokens -> max_tokens
staeiou opened this pull request about 2 months ago
[CI] Basic Integration Test For TPU
robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Usage]: How to use `llava-hf/llava-1.5-7b-hf` with bitsandbytes quantization in `vllm serve`?
asadfgglie opened this issue about 2 months ago
../aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [8320,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Wiselnn570 opened this issue about 2 months ago
[Bug]: ValueError:Could not broadcast input array from shape (542,) into shape (512,)
sherlockma11 opened this issue about 2 months ago
[help wanted]: fix broken xverse model
youkaichao opened this issue about 2 months ago
[Hardware][CPU] Add ARM CPU backend
ShawnD200 opened this pull request about 2 months ago
[BugFix]: properly deserialize `tool_calls` iterator before processing by mistral-common when MistralTokenizer is used
gcalmettes opened this pull request about 2 months ago
[V1][VLM] Enable proper chunked prefill for multimodal models
ywang96 opened this pull request about 2 months ago
[Bugfix] Fix Phi-3 BNB quantization with tensor parallel
Isotr0py opened this pull request about 2 months ago
[V1] Support per-request seed
njhill opened this pull request about 2 months ago
[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1
FurtherAI opened this pull request about 2 months ago
[Doc] Add documentation for Structured Outputs
ismael-dm opened this pull request about 2 months ago
Bump the patch-update group with 3 updates
dependabot[bot] opened this pull request about 2 months ago
[Core]Add New Run:ai Streamer Load format.
pandyamarut opened this pull request about 2 months ago
[CI] Prune tests/models/decoder_only/language/* tests
mgoin opened this pull request about 2 months ago
[Bug]: from vllm.platforms import current_platform infinite loop error with OpenVino Build.
CalebXDonoho opened this issue about 2 months ago
[Bug]: Phi-3 cannot be used with bitsandbytes
yananchen1989 opened this issue about 2 months ago
[CI] Prune down LM Eval test time
mgoin opened this pull request about 2 months ago
[ci/build] Have dependabot ignore pinned dependencies
khluu opened this pull request about 2 months ago
[CI] Prune back the number of tests in tests/kernels/*
mgoin opened this pull request about 2 months ago
[Bugfix] Fix pickle of input when async output processing is on
wallashss opened this pull request about 2 months ago
[misc] Allow partial prefix benchmarking & random input generation for prefix benchmarking
rickyyx opened this pull request about 2 months ago
Doc: Improve benchmark documentation
rafvasq opened this pull request about 2 months ago
[RFC] Propose a vulnerability management team
russellb opened this pull request about 2 months ago
[Doc] Move CONTRIBUTING to docs site
russellb opened this pull request about 2 months ago
[Frontend] Automatic detection of chat content format from AST
DarkLight1337 opened this pull request about 2 months ago
[Bug]: illegal memory access error when using prefix caching
StevenTang1998 opened this issue about 2 months ago
[Bugfix] Fix MiniCPMV and Mllama BNB bug
jeejeelee opened this pull request about 2 months ago