Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[TPU] Update TPU CI to use torchxla nightly on 20250122
github.com/vllm-project/vllm - lsy323 opened this pull request 9 days ago
[V1] Add `uncache_blocks`
github.com/vllm-project/vllm - comaniac opened this pull request 9 days ago
[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mistral`
github.com/vllm-project/vllm - rafvasq opened this pull request 9 days ago
add interleave sliding window by using FusedSDPA
github.com/vllm-project/vllm - libinta opened this pull request 10 days ago
[Usage]: trying to use generation_tokens_total and prompt_tokens_total to get total tokens in the current batch
github.com/vllm-project/vllm - annapendleton opened this issue 10 days ago
Fixing the LoRA CI test.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 10 days ago
[Misc]: RoPE vs Sliding Windows
github.com/vllm-project/vllm - ccruttjr opened this issue 10 days ago
[Core] Fix an isort error from pre-commit
github.com/vllm-project/vllm - russellb opened this pull request 10 days ago
[Docs] Document vulnerability disclosure process
github.com/vllm-project/vllm - russellb opened this pull request 10 days ago
[Core] Optimizing cross-attention `QKVParallelLinear` computation
github.com/vllm-project/vllm - NickLucche opened this pull request 10 days ago
[Feature]: Use `uv` in pre-commit
github.com/vllm-project/vllm - NickLucche opened this issue 10 days ago
[Bug]: Speculative decoding does not work
github.com/vllm-project/vllm - JohnConnor123 opened this issue 10 days ago
[Usage]: Is it possible to speed up the generation speed by adding another video card?
github.com/vllm-project/vllm - JohnConnor123 opened this issue 10 days ago
[Usage]: The problems about the communication synchronization in disaggregated prefilling
github.com/vllm-project/vllm - midway2019 opened this issue 10 days ago
[Misc] Improve the readability of BNB error messages
github.com/vllm-project/vllm - jeejeelee opened this pull request 10 days ago
[Misc] Fix the error in the tip for the --lora-modules parameter
github.com/vllm-project/vllm - WangErXiao opened this pull request 10 days ago
[Doc] Add docs for prompt replacement
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 10 days ago
[do-not-merge][perf-benchmark] cleanup unused docker images/containers
github.com/vllm-project/vllm - khluu opened this pull request 10 days ago
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode
github.com/vllm-project/vllm - ShangmingCai opened this pull request 10 days ago
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral
github.com/vllm-project/vllm - zhenwei-intel opened this pull request 10 days ago
[V1][Frontend] Coalesce bunched `RequestOutput`s
github.com/vllm-project/vllm - njhill opened this pull request 10 days ago
[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels
github.com/vllm-project/vllm - fenghuizhang opened this pull request 10 days ago
[Benchmark] More accurate TPOT calc in `benchmark_serving.py`
github.com/vllm-project/vllm - njhill opened this pull request 10 days ago
[Frontend][V1] Online serving performance improvements
github.com/vllm-project/vllm - njhill opened this pull request 10 days ago
[Core] tokens in queue metric
github.com/vllm-project/vllm - annapendleton opened this pull request 10 days ago
[Core] Support `reset_prefix_cache`
github.com/vllm-project/vllm - comaniac opened this pull request 10 days ago
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD
github.com/vllm-project/vllm - rasmith opened this pull request 11 days ago
[Feature]: Support pass in user-specified backend to torch dynamo piecewise compilation
github.com/vllm-project/vllm - maxyanghu opened this issue 11 days ago
[Usage]: deepseek v3 can not set tensor_parallel_size=16 and pipeline-parallel-size=2 on L20
github.com/vllm-project/vllm - xwz-ol opened this issue 11 days ago
[torch.compile] decouple compile sizes and cudagraph sizes
github.com/vllm-project/vllm - youkaichao opened this pull request 11 days ago
[Frontend] Set server's maximum number of generated tokens using generation_config.json
github.com/vllm-project/vllm - mhendrey opened this pull request 11 days ago
[Docs] Update FP8 KV Cache documentation
github.com/vllm-project/vllm - mgoin opened this pull request 11 days ago
[Bug]: ValueError: Model architectures ['LlamaForCausalLM'] failed to be inspected. Please check the logs for more details.
github.com/vllm-project/vllm - walker-ai opened this issue 12 days ago
[Model] Add Qwen2 PRM model support
github.com/vllm-project/vllm - Isotr0py opened this pull request 12 days ago
[Bug]: `minItems` and `maxItems` json schema constraint fails on `xgrammar` and did not fallback to `outlines`
github.com/vllm-project/vllm - Jason-CKY opened this issue 12 days ago
[Usage]: Does vLLM support deploying the speculative model on a second device?
github.com/vllm-project/vllm - CharlesRiggins opened this issue 12 days ago
[Bug]: Dynamically load lora got wrong output
github.com/vllm-project/vllm - cxz91493 opened this issue 12 days ago
[New Model]: Qwen2.5-Math-PRM-7B, Qwen2.5-Math-PRM-72B
github.com/vllm-project/vllm - HaitaoWuTJU opened this issue 12 days ago
[Bug]: Inconsistent data received and sent using PyNcclPipe
github.com/vllm-project/vllm - fanfanaaaa opened this issue 12 days ago
[Bugfix] Fix incorrect types in LayerwiseProfileResults
github.com/vllm-project/vllm - terrytangyuan opened this pull request 12 days ago
[DOC] Add missing docstring for additional args in LLMEngine.add_request()
github.com/vllm-project/vllm - terrytangyuan opened this pull request 12 days ago
[DOC] Fix typo in SingleStepOutputProcessor docstring and assert message
github.com/vllm-project/vllm - terrytangyuan opened this pull request 12 days ago
[V1][Spec Decode] Ngram Spec Decode
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 12 days ago
[Bugfix] fix race condition that leads to wrong order of token returned
github.com/vllm-project/vllm - joennlae opened this pull request 13 days ago
[torch.compile] fix sym_tensor_indices
github.com/vllm-project/vllm - youkaichao opened this pull request 13 days ago
[misc] add cuda runtime version to usage data
github.com/vllm-project/vllm - youkaichao opened this pull request 13 days ago
[Bug]: CUDA initialization error with vLLM 0.5.4 and PyTorch 2.4.0+cu121
github.com/vllm-project/vllm - TaoShuchang opened this issue 13 days ago
[Bugfix] Fix multi-modal processors for transformers 4.48
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 14 days ago
[Misc] Add Gemma2 GGUF support
github.com/vllm-project/vllm - Isotr0py opened this pull request 14 days ago
[Kernel] add triton fused moe kernel for gptq/awq
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 14 days ago
[Misc] Add BNB support to GLM4-V model
github.com/vllm-project/vllm - Isotr0py opened this pull request 14 days ago
[Bug]: Fail to use beamsearch with llm.chat
github.com/vllm-project/vllm - gystar opened this issue 14 days ago
[torch.compile] store inductor compiled Python file
github.com/vllm-project/vllm - youkaichao opened this pull request 14 days ago
[Feature]: Multi-Token Prediction (MTP)
github.com/vllm-project/vllm - casper-hansen opened this issue 14 days ago
[Bug]: Vllm can't load models from unsloth-bnb-4bit
github.com/vllm-project/vllm - kaiguy23 opened this issue 14 days ago
[Bug]: Multi-Node Online Inference on TPUs Failing
github.com/vllm-project/vllm - BabyChouSr opened this issue 14 days ago
[Bug]: AMD GPU docker image build No matching distribution found for torch==2.6.0.dev20241113+rocm6.2
github.com/vllm-project/vllm - samos123 opened this issue 14 days ago
[Bug]: Slow huggingface weights download. Sequential download
github.com/vllm-project/vllm - NikolaBorisov opened this issue 15 days ago
[Docs] Fix broken link in SECURITY.md
github.com/vllm-project/vllm - russellb opened this pull request 15 days ago
[RFC]: Distribute LoRA adapters across deployment
github.com/vllm-project/vllm - joerunde opened this issue 15 days ago
[AMD][CI/Build][Bugfix] updated pytorch stale wheel path by using stable wheel
github.com/vllm-project/vllm - hongxiayang opened this pull request 15 days ago
[core] clean up executor class hierarchy between v1 and v0
github.com/vllm-project/vllm - youkaichao opened this pull request 15 days ago
[Model] Port deepseek-vl2 processor and remove `deepseek_vl2` dependency
github.com/vllm-project/vllm - Isotr0py opened this pull request 15 days ago
[Bug]: Unable to serve Qwen2-audio in V1
github.com/vllm-project/vllm - superfan89 opened this issue 15 days ago
[Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor
github.com/vllm-project/vllm - kzawora-intel opened this pull request 15 days ago
[misc] fix cross-node TP
github.com/vllm-project/vllm - youkaichao opened this pull request 15 days ago
[Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution
github.com/vllm-project/vllm - cennn opened this pull request 15 days ago
[Performance]: Very low generation throughput on CPU
github.com/vllm-project/vllm - SLIBM opened this issue 15 days ago
[BUGFIX] Move scores to float32 in case of running xgrammar on cpu
github.com/vllm-project/vllm - madamczykhabana opened this pull request 15 days ago
[WIP] Multimodal model support for V1 TPU
github.com/vllm-project/vllm - mgoin opened this pull request 15 days ago
[Bug]: Multi-Node Tensor-Parallel in #11256 forces TP > cuda_device_count per node
github.com/vllm-project/vllm - drikster80 opened this issue 15 days ago
[Bug]: Close feature gaps when using xgrammar for structured output
github.com/vllm-project/vllm - russellb opened this issue 16 days ago
[V1] Add V1 support of Qwen2-VL
github.com/vllm-project/vllm - ywang96 opened this pull request 16 days ago
[core] further polish memory profiling
github.com/vllm-project/vllm - youkaichao opened this pull request 16 days ago
[Bug]: XGrammar-based CFG decoding degraded after 0.6.5
github.com/vllm-project/vllm - AlbertoCastelo opened this issue 16 days ago
[Misc] Update to Transformers 4.48
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 16 days ago
[BUILD] Add VLLM_BUILD_EXT to control custom op build
github.com/vllm-project/vllm - MengqingCao opened this pull request 16 days ago
[V1] Collect env var for usage stats
github.com/vllm-project/vllm - simon-mo opened this pull request 16 days ago
[Bugfix] Fix test_long_context.py and activation kernels
github.com/vllm-project/vllm - jeejeelee opened this pull request 16 days ago
benchmark_serving support --served-model-name param
github.com/vllm-project/vllm - gujingit opened this pull request 16 days ago
[Misc]add modules_to_not_convert attribute to gptq series
github.com/vllm-project/vllm - 1096125073 opened this pull request 16 days ago
[Misc][LoRA] Improve the readability of LoRA error messages during loading
github.com/vllm-project/vllm - jeejeelee opened this pull request 16 days ago
[Performance]: Question about TTFT for ngram speculative decoding
github.com/vllm-project/vllm - ynwang007 opened this issue 16 days ago
[New Model]: internlm3-8b-instruct
github.com/vllm-project/vllm - engchina opened this issue 16 days ago
[Bug]: Discrepancies in the llama layer forward function between meta-llama, transformers and vLLM.
github.com/vllm-project/vllm - mcubuktepe opened this issue 16 days ago
Use CUDA 12.4 as default for release and nightly wheels
github.com/vllm-project/vllm - mgoin opened this pull request 16 days ago
Add: Support for Sparse24Bitmask Compressed Models
github.com/vllm-project/vllm - rahul-tuli opened this pull request 17 days ago
[Bug]: Corrupted responses for Llama-3.2-3B-Instruct with v0.6.6.post1
github.com/vllm-project/vllm - bsatzger opened this issue 17 days ago
[Bug]: whisper example issue?
github.com/vllm-project/vllm - silvacarl2 opened this issue 17 days ago
[V1][Perf] Reduce scheduling overhead in model runner after cuda sync
github.com/vllm-project/vllm - youngkent opened this pull request 17 days ago
[Kernel] Flash Attention 3 Support
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 17 days ago
[Bug]: config format not found in llama family model
github.com/vllm-project/vllm - angerhang opened this issue 17 days ago
[Bugfix] Fix _get_lora_device for HQQ marlin
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 17 days ago
Various cosmetic/comment fixes
github.com/vllm-project/vllm - mgoin opened this pull request 17 days ago
Allow hip sources to be directly included when compiling for rocm.
github.com/vllm-project/vllm - tvirolai-amd opened this pull request 17 days ago
[V1][WIP] Add KV cache group dimension to block table
github.com/vllm-project/vllm - heheda12345 opened this pull request 17 days ago
[Usage]: Token Embeddings from LLMs/VLMs
github.com/vllm-project/vllm - conceptofmind opened this issue 17 days ago