Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[TPU] Update TPU CI to use torchxla nightly on 20250122

lsy323 opened this pull request 4 days ago

[V1] Add `uncache_blocks`

comaniac opened this pull request 4 days ago

add interleave sliding window by us FusedSDPA

libinta opened this pull request 4 days ago

Fixing the LoRA CI test.

Alexei-V-Ivanov-AMD opened this pull request 4 days ago

[Misc]: RoPE vs Sliding Windows

ccruttjr opened this issue 4 days ago

[Core] Fix an isort error from pre-commit

russellb opened this pull request 4 days ago

[Docs] Document vulnerability disclosure process

russellb opened this pull request 4 days ago

[Core] Optimizing cross-attention `QKVParallelLinear` computation

NickLucche opened this pull request 4 days ago

[Feature]: Use `uv` in pre-commit

NickLucche opened this issue 4 days ago

[Bug]: Speculative decoding does not work

JohnConnor123 opened this issue 4 days ago

[Misc] Improve the readability of BNB error messages

jeejeelee opened this pull request 4 days ago

[Misc] Fix the error in the tip for the --lora-modules parameter

WangErXiao opened this pull request 4 days ago

[Doc] Add docs for prompt replacement

DarkLight1337 opened this pull request 4 days ago

[Feature][Spec Decode] Simplify the use of Eagle Spec Decode

ShangmingCai opened this pull request 4 days ago

[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral

zhenwei-intel opened this pull request 4 days ago

[V1][Frontend] Coalesce bunched `RequestOutput`s

njhill opened this pull request 4 days ago

[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels

fenghuizhang opened this pull request 5 days ago

[Benchmark] More accurate TPOT calc in `benchmark_serving.py`

njhill opened this pull request 5 days ago

[Frontend][V1] Online serving performance improvements

njhill opened this pull request 5 days ago

[Core] tokens in queue metric

annapendleton opened this pull request 5 days ago

[Core] Support `reset_prefix_cache`

comaniac opened this pull request 5 days ago

[torch.compile] decouple compile sizes and cudagraph sizes

youkaichao opened this pull request 5 days ago

[Docs] Update FP8 KV Cache documentation

mgoin opened this pull request 5 days ago

[Model] Add Qwen2 PRM model support

Isotr0py opened this pull request 6 days ago

[Bug]: Dynamically load lora got wrong output

cxz91493 opened this issue 6 days ago

[New Model]: Qwen2.5-Math-PRM-7B, Qwen2.5-Math-PRM-72B

HaitaoWuTJU opened this issue 7 days ago

[Bug]: Inconsistent data received and sent using PyNcclPipe

fanfanaaaa opened this issue 7 days ago

[Bugfix] Fix incorrect types in LayerwiseProfileResults

terrytangyuan opened this pull request 7 days ago

[DOC] Add missing docstring for additional args in LLMEngine.add_request()

terrytangyuan opened this pull request 7 days ago

[DOC] Fix typo in SingleStepOutputProcessor docstring and assert message

terrytangyuan opened this pull request 7 days ago

[V1][Spec Decode] Ngram Spec Decode

LiuXiaoxuanPKU opened this pull request 7 days ago

[torch.compile] fix sym_tensor_indices

youkaichao opened this pull request 7 days ago

[misc] add cuda runtime version to usage data

youkaichao opened this pull request 7 days ago

[Bugfix] Fix multi-modal processors for transformers 4.48

DarkLight1337 opened this pull request 8 days ago

[Misc] Add Gemma2 GGUF support

Isotr0py opened this pull request 8 days ago

[Kernel] add triton fused moe kernel for gptq/awq

jinzhen-lin opened this pull request 8 days ago

[Misc] Add BNB support to GLM4-V model

Isotr0py opened this pull request 8 days ago

[Bug]: Fail to use beamsearch with llm.chat

gystar opened this issue 8 days ago

[torch.compile] store inductor compiled Python file

youkaichao opened this pull request 8 days ago

[Feature]: Multi-Token Prediction (MTP)

casper-hansen opened this issue 8 days ago

[Bug]: Vllm can't load models from unsloth-bnb-4bit

kaiguy23 opened this issue 9 days ago

[Bug]: Multi-Node Online Inference on TPUs Failing

BabyChouSr opened this issue 9 days ago

[Bug]: Slow huggingface weights download. Sequential download

NikolaBorisov opened this issue 9 days ago

[Docs] Fix broken link in SECURITY.md

russellb opened this pull request 9 days ago

[RFC]: Distribute LoRA adapters across deployment

joerunde opened this issue 9 days ago

[core] clean up executor class hierarchy between v1 and v0

youkaichao opened this pull request 9 days ago

[Bug]: Unable to serve Qwen2-audio in V1

superfan89 opened this issue 9 days ago

[misc] fix cross-node TP

youkaichao opened this pull request 9 days ago

[Performance]: Very low generation throughput on CPU

SLIBM opened this issue 9 days ago

[BUGFIX] Move scores to float32 in case of running xgrammar on cpu

madamczykhabana opened this pull request 9 days ago

[New Model]: NV-Embed-v2

Hypothesis-Z opened this issue 10 days ago

[WIP] Multimodal model support for V1 TPU

mgoin opened this pull request 10 days ago

[V1] Add V1 support of Qwen2-VL

ywang96 opened this pull request 10 days ago

[core] further polish memory profiling

youkaichao opened this pull request 10 days ago

[Bug]: XGrammar-based CFG decoding degraded after 0.6.5

AlbertoCastelo opened this issue 10 days ago

[Misc] Update to Transformers 4.48

tlrmchlsmth opened this pull request 10 days ago

[BUILD] Add VLLM_BUILD_EXT to control custom op build

MengqingCao opened this pull request 10 days ago

[V1] Collect env var for usage stats

simon-mo opened this pull request 10 days ago

[Bugfix] Fix test_long_context.py and activation kernels

jeejeelee opened this pull request 10 days ago

benchmark_serving support --served-model-name param

gujingit opened this pull request 10 days ago

[Misc]add modules_to_not_convert attribute to gptq series

1096125073 opened this pull request 10 days ago

[New Model]: internlm3-8b-instruct

engchina opened this issue 11 days ago

Use CUDA 12.4 as default for release and nightly wheels

mgoin opened this pull request 11 days ago

Add: Support for Sparse24Bitmask Compressed Models

rahul-tuli opened this pull request 11 days ago

[Bug]: whisper example issue?

silvacarl2 opened this issue 11 days ago

[V1][Perf] Reduce scheduling overhead in model runner after cuda sync

youngkent opened this pull request 11 days ago

[Kernel] Flash Attention 3 Support

LucasWilkinson opened this pull request 11 days ago

[Bug]: config format not found in llama family model

angerhang opened this issue 11 days ago

[Bugfix] Fix _get_lora_device for HQQ marlin

varun-sundar-rabindranath opened this pull request 11 days ago

Various cosmetic/comment fixes

mgoin opened this pull request 11 days ago

[delete]

Aktsvigun opened this pull request 11 days ago

Allow hip sources to be directly included when compiling for rocm.

tvirolai-amd opened this pull request 11 days ago

[V1][WIP] Add KV cache group dimension to block table

heheda12345 opened this pull request 11 days ago

[Usage]: Token Embeddings from LLMs/VLMs

conceptofmind opened this issue 11 days ago