Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Bug]: vllm 0.6.4 reports an error when deploying MiniCPM-V_2_6_awq_int4
fengqiliang93 opened this issue 10 days ago
[Bugfix] Fix none seed sampling in rejection_sampler
TopIdiot opened this pull request 10 days ago
[Bug]: N-gram speculative decoding got wrong output when some of seeds is None in a batch
TopIdiot opened this issue 10 days ago
[Bug]: InternVL2-Llama3-76B-AWQ RUN ERROR KeyError: 'layers.39.mlp.gate_up_proj.qweight'
Oldpan opened this issue 10 days ago
[Misc][LoRA] Ensure Lora Adapter requests return adapter name
Jeffwan opened this pull request 11 days ago
[Bug]: lora adapter request still return the base model name
Jeffwan opened this issue 11 days ago
[Misc]: Potential division by zero in csrc/cpu/attention.cpp
Xaenalt opened this issue 12 days ago
[Bug]: Actively generated `request` is starved when new requests arrive (tensor parallel)
llmwiz opened this issue 12 days ago
[Misc]: Brand guidelines around vLLM logo; is there a media kit that can be downloaded with brand assets?
jessicachitas opened this issue 12 days ago
[RFC]: Adding support for Geospatial models
christian-pinto opened this issue 12 days ago
[Bug]: When I use llmcompressor to quantify the llama3 70b model to int8-a8w8,it shows ValueError: Failed to invert hessian due to numerical instability.
rexmxw02 opened this issue 12 days ago
[Hardware][CPU] support cpu in v1 engine
yma11 opened this pull request 12 days ago
[Performance]: It takes over 20 hours to quantize llama3-70B with w8a8 and I wonder does it meet expectations?
moonlightian opened this issue 12 days ago
[V1][Bugfix] Always set enable_chunked_prefill = True for V1
WoosukKwon opened this pull request 12 days ago
Don't try to add special tokens to the matcher in XGrammar.
sjuxax opened this pull request 12 days ago
[torch.compile] add a flag to track batchsize statistics
youkaichao opened this pull request 12 days ago
[Bugfix] Fix value unpack error of simple connector for KVCache transfer.
ShangmingCai opened this pull request 12 days ago
make `fused_moe_kernel`'s `EM` and `num_valid_tokens` arguments `do_not_specialize`
JiayiFeng opened this pull request 12 days ago
[Performance]: Arguments `EM` and `num_valid_tokens` of `fused_moe_kernel` should be set to `do_not_specialize`
JiayiFeng opened this issue 12 days ago
[Bug, V1]: Service launch failed with v1 code and custom models
PYNing opened this issue 12 days ago
[Bug]: bug when using 8*GPU, Error while creating shared memory segment
Tan-Hexiang opened this issue 12 days ago
[CI/Build] Increase VLLM_MAX_SIZE_MB to 300M
tolak opened this pull request 12 days ago
[Performance]: TGI processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config
paulcx opened this issue 12 days ago
[Feature]: logging request_id instead of random uuid
cynial opened this issue 12 days ago
[Usage]: How to specify the local storage path for vllm download models?
MiDonkey opened this issue 12 days ago
[CI] Expand OpenAI guided decoding tests
mgoin opened this pull request 12 days ago
[Bugfix] cuda error running llama 3.2
GeneDer opened this pull request 12 days ago
[Bugfix] Fix guided decoding with tokenizer mode mistral
wallashss opened this pull request 12 days ago
[Bug]: Guided decoding crashing when tokenizer_mode is set to mistral
wallashss opened this issue 12 days ago
[Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF.
sjuxax opened this pull request 12 days ago
[Pixtral] Improve loading
patrickvonplaten opened this pull request 12 days ago
[Bugfix] Handle <|tool_call|> token in granite tool parser
tjohnson31415 opened this pull request 13 days ago
[Bugfix] Backport request id validation to v0
joerunde opened this pull request 13 days ago
Update README.md
dmoliveira opened this pull request 13 days ago
[Kernel] Triton Paged Attn Decode Kernel
rahulbatra85 opened this pull request 13 days ago
[V1] Use input_ids as input for text-only models
WoosukKwon opened this pull request 13 days ago
monitor metrics of tokens per step using cudagraph batchsizes
youkaichao opened this pull request 13 days ago
[Hardware][Gaudi] Add multiprocessing HPU executor
kzawora-intel opened this pull request 13 days ago
[Frontend] Add OpenAI API support for input_audio
kylehh opened this pull request 13 days ago
[Bugfix] Fix usage of `deprecated` decorator
DarkLight1337 opened this pull request 13 days ago
[Model] Add Llama-SwiftKV model
aurickq opened this pull request 13 days ago
[BUG] Remove token param #10921
flaviabeo opened this pull request 13 days ago
[V1] VLM preprocessor hashing
alexm-neuralmagic opened this pull request 13 days ago
Avoid mistakenly picking Gaudi/HPU if XPU is requested.
janimo opened this pull request 13 days ago
[Misc]: Has anyone tried to run Microsoft Graphrag with vllm?
SushmitaSingh96 opened this issue 13 days ago
[Neuron] Upgrade neuron to 2.20.2
xendo opened this pull request 13 days ago
[Performance]: Is it a normal case that sampling will take up most of time during the execution of one iteration?
oldcpple opened this issue 13 days ago
[Usage]: Multiple rounds of image dialogue support?
qingchen177 opened this issue 13 days ago
[torch.compile] add dynamo time tracking
youkaichao opened this pull request 13 days ago
[Performance]: why pipeline parallel performance will be severely degraded when using offline batching?
zhaocaibei123 opened this issue 13 days ago
[Misc][LoRA] Add PEFTHelper for LoRA
jeejeelee opened this pull request 13 days ago
[Feature]: Support for Qwen2-VL on AWS Neuron
Chin-Vic opened this issue 13 days ago
[Feature]: Use Block Group for KV cache allocation in FastSwitch to support better I/O usage
aoshen524 opened this issue 13 days ago
[v1] fix use compile sizes
youkaichao opened this pull request 13 days ago
[misc] clean up and unify logging
youkaichao opened this pull request 13 days ago
[Doc][V1] Add V1 support column for multimodal models
ywang96 opened this pull request 13 days ago
[V1] Fix Detokenizer loading in `AsyncLLM`
ywang96 opened this pull request 13 days ago
[core] clean up cudagraph batchsize padding logic
youkaichao opened this pull request 13 days ago
[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support
dsikka opened this pull request 14 days ago
[Usage]: Qwen/Qwen2-VL-7B-Instruct
mahmoudelnazer opened this issue 14 days ago
[torch.compile][misc] fix comments
youkaichao opened this pull request 14 days ago
[Model] PP support for Mamba-like models
mzusman opened this pull request 14 days ago
[CI/Build] Check transformers v4.47
DarkLight1337 opened this pull request 14 days ago
[Doc]: ValueError: Model architectures ['Qwen2ForSequenceClassification'] are not supported for no
skywindy opened this issue 14 days ago
[V1] Further reduce CPU overheads in flash-attn
WoosukKwon opened this pull request 14 days ago
[V1][VLM] Add V1-rearch image inference support for Qwen2-VL
ywang96 opened this pull request 14 days ago
[Bug]: Qwen2VL doesn't work with TPU backend
carlesoctav opened this issue 14 days ago
[core][distributed] initialization from StatelessProcessGroup
youkaichao opened this pull request 14 days ago
[Doc] Update README.md
habaohaba opened this pull request 14 days ago
[torch.compile] allow candidate compile sizes
youkaichao opened this pull request 14 days ago
[Bug]: LLama 3.2 vision focuses only on first image
hrodruck opened this issue 15 days ago
Update benchmarking code
Faraz9877 opened this pull request 15 days ago
[Bug]: Why is structured output in 0.6.4.post1 overflowing my RAM but 0.6.3.post1 has a workaround?
Leon-Sander opened this issue 15 days ago
[V1][WIP] V1 sampler implements parallel sampling (PR 1/N for parallel sampling support)
afeldman-nm opened this pull request 15 days ago
[Bugfix] Multiple fixes to tool streaming with hermes and mistral
cedonley opened this pull request 15 days ago
[Doc] Explicitly state that InternVL 2.5 is supported
DarkLight1337 opened this pull request 15 days ago
[Model] Implement merged input processor for Phi-3-Vision models
Isotr0py opened this pull request 15 days ago
[core][executor] simplify instance id
youkaichao opened this pull request 15 days ago
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet
DarkLight1337 opened this pull request 15 days ago
[Usage]: How to run local model in docker with cpu
yuzifu opened this issue 15 days ago
[Bugfix] Fix test-pipeline.yaml
jeejeelee opened this pull request 15 days ago
[torch.compile] use depyf to dump torch.compile internals
youkaichao opened this pull request 15 days ago
[Bug]: Vllm CPU mode only takes 1 single core for multi-core cpu
fzyzcjy opened this issue 15 days ago
[Bug]: embedding model not supported
cosmic-chichu opened this issue 15 days ago
[Bug]: ngram Speculation for LlamaForCausalLM Models Fails due to Sampler
avnukala opened this issue 15 days ago
[Frontend] Use request id from header
joerunde opened this pull request 15 days ago
[Misc]: Saved sharded state should also include GPU P2P access cache
k4rth33k opened this issue 15 days ago
[Usage]: Unable to serve embedding model e5-mistral-7b-instruct
SushmitaSingh96 opened this issue 16 days ago
[Core] Add support for loading weight that has already done TP sharding
HollowMan6 opened this pull request 16 days ago
[New Model]: Add support for Llama3.3
jorgeantonio21 opened this issue 16 days ago
[Bug]: Can't load/compile Mixtral-8x7B-Instruct-v0.1 on TPU
hosseinsarshar opened this issue 16 days ago
[V1] Input Batch Relocation
varun-sundar-rabindranath opened this pull request 16 days ago
[Core] Cleanup startup logging a bit
russellb opened this pull request 16 days ago
[misc] fix typo
youkaichao opened this pull request 16 days ago
[V1] Run mypy on
WoosukKwon opened this pull request 16 days ago
[Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation
Isotr0py opened this pull request 16 days ago
[V1] LoRA Support
varun-sundar-rabindranath opened this pull request 16 days ago
[ci] fix broken tests
youkaichao opened this pull request 16 days ago
[Misc][LoRA] Abstract PunicaWrapper
jeejeelee opened this pull request 16 days ago
[Bug]: Function calling not working properly for Qwen2.5-Coder models
wizche opened this issue 16 days ago