Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[TPU] Update TPU CI to use torchxla nightly on 20250122
lsy323 opened this pull request 4 days ago
[V1] Add `uncache_blocks`
comaniac opened this pull request 4 days ago
[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mistral`
rafvasq opened this pull request 4 days ago
add interleave sliding window by us FusedSDPA
libinta opened this pull request 4 days ago
[Usage]: trying to use generation_tokens_total and prompt_tokens_total to get total tokens in the current batch
annapendleton opened this issue 4 days ago
Fixing the LoRA CI test.
Alexei-V-Ivanov-AMD opened this pull request 4 days ago
[Misc]: RoPE vs Sliding Windows
ccruttjr opened this issue 4 days ago
[Core] Fix an isort error from pre-commit
russellb opened this pull request 4 days ago
[Docs] Document vulnerability disclosure process
russellb opened this pull request 4 days ago
[Core] Optimizing cross-attention `QKVParallelLinear` computation
NickLucche opened this pull request 4 days ago
[Feature]: Use `uv` in pre-commit
NickLucche opened this issue 4 days ago
[Bug]: Speculative decoding does not work
JohnConnor123 opened this issue 4 days ago
[Usage]: Is it possible to speed up the generation speed by adding another video card?
JohnConnor123 opened this issue 4 days ago
[Usage]: The problems about the communication synchronization in disaggregated prefilling
midway2019 opened this issue 4 days ago
[Misc] Improve the readability of BNB error messages
jeejeelee opened this pull request 4 days ago
[Misc] Fix the error in the tip for the --lora-modules parameter
WangErXiao opened this pull request 4 days ago
[Doc] Add docs for prompt replacement
DarkLight1337 opened this pull request 4 days ago
[do-not-merge][perf-benchmark] cleanup unused docker images/containers
khluu opened this pull request 4 days ago
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode
ShangmingCai opened this pull request 4 days ago
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral
zhenwei-intel opened this pull request 4 days ago
[V1][Frontend] Coalesce bunched `RequestOutput`s
njhill opened this pull request 4 days ago
[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels
fenghuizhang opened this pull request 5 days ago
[Benchmark] More accurate TPOT calc in `benchmark_serving.py`
njhill opened this pull request 5 days ago
[Frontend][V1] Online serving performance improvements
njhill opened this pull request 5 days ago
[Core] tokens in queue metric
annapendleton opened this pull request 5 days ago
[Core] Support `reset_prefix_cache`
comaniac opened this pull request 5 days ago
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD
rasmith opened this pull request 5 days ago
[Feature]: Support pass in user-specified backend to torch dynamo piecewise compilation
maxyanghu opened this issue 5 days ago
[Usage]: deepseek v3 can not set tensor_parallel_size=16 and pipeline-parallel-size=2 on L20 #12256 Open
xwz-ol opened this issue 5 days ago
[torch.compile] decouple compile sizes and cudagraph sizes
youkaichao opened this pull request 5 days ago
[Frontend] Set server's maximum number of generated tokens using generation_config.json
mhendrey opened this pull request 5 days ago
[Docs] Update FP8 KV Cache documentation
mgoin opened this pull request 5 days ago
[Bug]: ValueError: Model architectures ['LlamaForCausalLM'] failed to be inspected. Please check the logs for more details.
walker-ai opened this issue 6 days ago
[Model] Add Qwen2 PRM model support
Isotr0py opened this pull request 6 days ago
[Bug]: `minItems` and `maxItems` json schema constraint fails on `xgrammar` and did not fallback to `outlines`
Jason-CKY opened this issue 6 days ago
[Usage]: Does vLLM support deploying the speculative model on a second device?
CharlesRiggins opened this issue 6 days ago
[Bug]: Dynamically load lora got wrong output
cxz91493 opened this issue 6 days ago
[New Model]: Qwen2.5-Math-PRM-7B, Qwen2.5-Math-PRM-72B
HaitaoWuTJU opened this issue 7 days ago
[Bug]: Inconsistent data received and sent using PyNcclPipe
fanfanaaaa opened this issue 7 days ago
[Bugfix] Fix incorrect types in LayerwiseProfileResults
terrytangyuan opened this pull request 7 days ago
[DOC] Add missing docstring for additional args in LLMEngine.add_request()
terrytangyuan opened this pull request 7 days ago
[DOC] Fix typo in SingleStepOutputProcessor docstring and assert message
terrytangyuan opened this pull request 7 days ago
[V1][Spec Decode] Ngram Spec Decode
LiuXiaoxuanPKU opened this pull request 7 days ago
[Bugfix] fix race condition that leads to wrong order of token returned
joennlae opened this pull request 7 days ago
[torch.compile] fix sym_tensor_indices
youkaichao opened this pull request 7 days ago
[misc] add cuda runtime version to usage data
youkaichao opened this pull request 7 days ago
[Bug]: CUDA initialization error with vLLM 0.5.4 and PyTorch 2.4.0+cu121
TaoShuchang opened this issue 7 days ago
[Bugfix] Fix multi-modal processors for transformers 4.48
DarkLight1337 opened this pull request 8 days ago
[Misc] Add Gemma2 GGUF support
Isotr0py opened this pull request 8 days ago
[Kernel] add triton fused moe kernel for gptq/awq
jinzhen-lin opened this pull request 8 days ago
[Misc] Add BNB support to GLM4-V model
Isotr0py opened this pull request 8 days ago
[Bug]: Fail to use beamsearch with llm.chat
gystar opened this issue 8 days ago
[torch.compile] store inductor compiled Python file
youkaichao opened this pull request 8 days ago
[Feature]: Multi-Token Prediction (MTP)
casper-hansen opened this issue 8 days ago
[Bug]: Vllm can't load models from unsloth-bnb-4bit
kaiguy23 opened this issue 9 days ago
[Bug]: Multi-Node Online Inference on TPUs Failing
BabyChouSr opened this issue 9 days ago
[Bug]: AMD GPU docker image build No matching distribution found for torch==2.6.0.dev20241113+rocm6.2
samos123 opened this issue 9 days ago
[Bug]: Slow huggingface weights download. Sequential download
NikolaBorisov opened this issue 9 days ago
[Docs] Fix broken link in SECURITY.md
russellb opened this pull request 9 days ago
[RFC]: Distribute LoRA adapters across deployment
joerunde opened this issue 9 days ago
[AMD][CI/Build][Bugfix] updated pytorch stale wheel path by using stable wheel
hongxiayang opened this pull request 9 days ago
[core] clean up executor class hierarchy between v1 and v0
youkaichao opened this pull request 9 days ago
[Model] Port deepseek-vl2 processor and remove `deepseek_vl2` dependency
Isotr0py opened this pull request 9 days ago
[Bug]: Unable to serve Qwen2-audio in V1
superfan89 opened this issue 9 days ago
[Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor
kzawora-intel opened this pull request 9 days ago
[misc] fix cross-node TP
youkaichao opened this pull request 9 days ago
[Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution
cennn opened this pull request 9 days ago
[Performance]: Very low generation throughput on CPU
SLIBM opened this issue 9 days ago
[BUGFIX] Move scores to float32 in case of running xgrammar on cpu
madamczykhabana opened this pull request 9 days ago
[New Model]: NV-Embed-v2
Hypothesis-Z opened this issue 10 days ago
[WIP] Multimodal model support for V1 TPU
mgoin opened this pull request 10 days ago
[Bug]: Multi-Node Tensor-Parallel in #11256 forces TP > cuda_device_count per node
drikster80 opened this issue 10 days ago
[Bug]: Close feature gaps when using xgrammar for structured output
russellb opened this issue 10 days ago
[V1] Add V1 support of Qwen2-VL
ywang96 opened this pull request 10 days ago
[core] further polish memory profiling
youkaichao opened this pull request 10 days ago
[Bug]: XGrammar-based CFG decoding degraded after 0.6.5
AlbertoCastelo opened this issue 10 days ago
[Misc] Update to Transformers 4.48
tlrmchlsmth opened this pull request 10 days ago
[BUILD] Add VLLM_BUILD_EXT to control custom op build
MengqingCao opened this pull request 10 days ago
[V1] Collect env var for usage stats
simon-mo opened this pull request 10 days ago
[Bugfix] Fix test_long_context.py and activation kernels
jeejeelee opened this pull request 10 days ago
benchmark_serving support --served-model-name param
gujingit opened this pull request 10 days ago
[Misc]add modules_to_not_convert attribute to gptq series
1096125073 opened this pull request 10 days ago
[Misc][LoRA] Improve the readability of LoRA error messages during loading
jeejeelee opened this pull request 11 days ago
[Performance]: Question about TTFT for ngram speculative decoding
ynwang007 opened this issue 11 days ago
[New Model]: internlm3-8b-instruct
engchina opened this issue 11 days ago
[Bug]: Discrepancies in the llama layer forward function between meta-llama, transformers and vLLM.
mcubuktepe opened this issue 11 days ago
Use CUDA 12.4 as default for release and nightly wheels
mgoin opened this pull request 11 days ago
Add: Support for Sparse24Bitmask Compressed Models
rahul-tuli opened this pull request 11 days ago
[Bug]: Corrupted responses for Llama-3.2-3B-Instruct with v0.6.6.post1
bsatzger opened this issue 11 days ago
[Bug]: whisper example issue?
silvacarl2 opened this issue 11 days ago
[V1][Perf] Reduce scheduling overhead in model runner after cuda sync
youngkent opened this pull request 11 days ago
[Kernel] Flash Attention 3 Support
LucasWilkinson opened this pull request 11 days ago
[Bug]: config format not found in llama family model
angerhang opened this issue 11 days ago
[Bugfix] Fix _get_lora_device for HQQ marlin
varun-sundar-rabindranath opened this pull request 11 days ago
Various cosmetic/comment fixes
mgoin opened this pull request 11 days ago
[delete]
Aktsvigun opened this pull request 11 days ago
Allow hip sources to be directly included when compiling for rocm.
tvirolai-amd opened this pull request 11 days ago
[V1][WIP] Add KV cache group dimension to block table
heheda12345 opened this pull request 11 days ago
[Usage]: Token Embeddings from LLMs/VLMs
conceptofmind opened this issue 11 days ago