Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[TPU] Update TPU CI to use torchxla nightly on 20250122
github.com/vllm-project/vllm - lsy323 opened this pull request 9 days ago
[V1] Add `uncache_blocks`
github.com/vllm-project/vllm - comaniac opened this pull request 9 days ago
[Frontend] Generate valid tool call IDs when using `tokenizer-mode=mistral`
github.com/vllm-project/vllm - rafvasq opened this pull request 9 days ago
add interleave sliding window by using FusedSDPA
github.com/vllm-project/vllm - libinta opened this pull request 10 days ago
[Usage]: trying to use generation_tokens_total and prompt_tokens_total to get total tokens in the current batch
github.com/vllm-project/vllm - annapendleton opened this issue 10 days ago
Fixing the LoRA CI test.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 10 days ago
[Misc]: RoPE vs Sliding Windows
github.com/vllm-project/vllm - ccruttjr opened this issue 10 days ago
[Core] Fix an isort error from pre-commit
github.com/vllm-project/vllm - russellb opened this pull request 10 days ago
[Docs] Document vulnerability disclosure process
github.com/vllm-project/vllm - russellb opened this pull request 10 days ago
[Core] Optimizing cross-attention `QKVParallelLinear` computation
github.com/vllm-project/vllm - NickLucche opened this pull request 10 days ago
[Feature]: Use `uv` in pre-commit
github.com/vllm-project/vllm - NickLucche opened this issue 10 days ago
[Bug]: Speculative decoding does not work
github.com/vllm-project/vllm - JohnConnor123 opened this issue 10 days ago
[Usage]: Is it possible to speed up the generation speed by adding another video card?
github.com/vllm-project/vllm - JohnConnor123 opened this issue 10 days ago
[Usage]: The problems about the communication synchronization in disaggregated prefilling
github.com/vllm-project/vllm - midway2019 opened this issue 10 days ago
[Misc] Improve the readability of BNB error messages
github.com/vllm-project/vllm - jeejeelee opened this pull request 10 days ago
[Misc] Fix the error in the tip for the --lora-modules parameter
github.com/vllm-project/vllm - WangErXiao opened this pull request 10 days ago
[Doc] Add docs for prompt replacement
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 10 days ago
[do-not-merge][perf-benchmark] cleanup unused docker images/containers
github.com/vllm-project/vllm - khluu opened this pull request 10 days ago
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode
github.com/vllm-project/vllm - ShangmingCai opened this pull request 10 days ago
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral
github.com/vllm-project/vllm - zhenwei-intel opened this pull request 10 days ago
[V1][Frontend] Coalesce bunched `RequestOutput`s
github.com/vllm-project/vllm - njhill opened this pull request 10 days ago
[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels
github.com/vllm-project/vllm - fenghuizhang opened this pull request 10 days ago
[Benchmark] More accurate TPOT calc in `benchmark_serving.py`
github.com/vllm-project/vllm - njhill opened this pull request 10 days ago
[Frontend][V1] Online serving performance improvements
github.com/vllm-project/vllm - njhill opened this pull request 10 days ago
[Core] tokens in queue metric
github.com/vllm-project/vllm - annapendleton opened this pull request 10 days ago
[Core] Support `reset_prefix_cache`
github.com/vllm-project/vllm - comaniac opened this pull request 10 days ago
[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD
github.com/vllm-project/vllm - rasmith opened this pull request 11 days ago
[Feature]: Support pass in user-specified backend to torch dynamo piecewise compilation
github.com/vllm-project/vllm - maxyanghu opened this issue 11 days ago
[Usage]: deepseek v3 can not set tensor_parallel_size=16 and pipeline-parallel-size=2 on L20
github.com/vllm-project/vllm - xwz-ol opened this issue 11 days ago
[torch.compile] decouple compile sizes and cudagraph sizes
github.com/vllm-project/vllm - youkaichao opened this pull request 11 days ago
[Frontend] Set server's maximum number of generated tokens using generation_config.json
github.com/vllm-project/vllm - mhendrey opened this pull request 11 days ago
[Docs] Update FP8 KV Cache documentation
github.com/vllm-project/vllm - mgoin opened this pull request 11 days ago
[Bug]: ValueError: Model architectures ['LlamaForCausalLM'] failed to be inspected. Please check the logs for more details.
github.com/vllm-project/vllm - walker-ai opened this issue 12 days ago
[Model] Add Qwen2 PRM model support
github.com/vllm-project/vllm - Isotr0py opened this pull request 12 days ago
[Bug]: `minItems` and `maxItems` json schema constraint fails on `xgrammar` and did not fallback to `outlines`
github.com/vllm-project/vllm - Jason-CKY opened this issue 12 days ago
[Usage]: Does vLLM support deploying the speculative model on a second device?
github.com/vllm-project/vllm - CharlesRiggins opened this issue 12 days ago
[Bug]: Dynamically load lora got wrong output
github.com/vllm-project/vllm - cxz91493 opened this issue 12 days ago
[New Model]: Qwen2.5-Math-PRM-7B, Qwen2.5-Math-PRM-72B
github.com/vllm-project/vllm - HaitaoWuTJU opened this issue 12 days ago
[Bug]: Inconsistent data received and sent using PyNcclPipe
github.com/vllm-project/vllm - fanfanaaaa opened this issue 12 days ago
[Bugfix] Fix incorrect types in LayerwiseProfileResults
github.com/vllm-project/vllm - terrytangyuan opened this pull request 12 days ago
[DOC] Add missing docstring for additional args in LLMEngine.add_request()
github.com/vllm-project/vllm - terrytangyuan opened this pull request 12 days ago
[DOC] Fix typo in SingleStepOutputProcessor docstring and assert message
github.com/vllm-project/vllm - terrytangyuan opened this pull request 12 days ago
[V1][Spec Decode] Ngram Spec Decode
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 12 days ago
[Bugfix] fix race condition that leads to wrong order of token returned
github.com/vllm-project/vllm - joennlae opened this pull request 13 days ago
[torch.compile] fix sym_tensor_indices
github.com/vllm-project/vllm - youkaichao opened this pull request 13 days ago
[misc] add cuda runtime version to usage data
github.com/vllm-project/vllm - youkaichao opened this pull request 13 days ago
[Bug]: CUDA initialization error with vLLM 0.5.4 and PyTorch 2.4.0+cu121
github.com/vllm-project/vllm - TaoShuchang opened this issue 13 days ago
[Bugfix] Fix multi-modal processors for transformers 4.48
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 14 days ago
[Misc] Add Gemma2 GGUF support
github.com/vllm-project/vllm - Isotr0py opened this pull request 14 days ago
[Kernel] add triton fused moe kernel for gptq/awq
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 14 days ago
[Misc] Add BNB support to GLM4-V model
github.com/vllm-project/vllm - Isotr0py opened this pull request 14 days ago
[Bug]: Fail to use beamsearch with llm.chat
github.com/vllm-project/vllm - gystar opened this issue 14 days ago
[torch.compile] store inductor compiled Python file
github.com/vllm-project/vllm - youkaichao opened this pull request 14 days ago
[Feature]: Multi-Token Prediction (MTP)
github.com/vllm-project/vllm - casper-hansen opened this issue 14 days ago
[Bug]: Vllm can't load models from unsloth-bnb-4bit
github.com/vllm-project/vllm - kaiguy23 opened this issue 14 days ago
[Bug]: Multi-Node Online Inference on TPUs Failing
github.com/vllm-project/vllm - BabyChouSr opened this issue 14 days ago
[Bug]: AMD GPU docker image build No matching distribution found for torch==2.6.0.dev20241113+rocm6.2
github.com/vllm-project/vllm - samos123 opened this issue 14 days ago
[Bug]: Slow huggingface weights download. Sequential download
github.com/vllm-project/vllm - NikolaBorisov opened this issue 15 days ago
[Docs] Fix broken link in SECURITY.md
github.com/vllm-project/vllm - russellb opened this pull request 15 days ago
[RFC]: Distribute LoRA adapters across deployment
github.com/vllm-project/vllm - joerunde opened this issue 15 days ago
[AMD][CI/Build][Bugfix] updated pytorch stale wheel path by using stable wheel
github.com/vllm-project/vllm - hongxiayang opened this pull request 15 days ago
[core] clean up executor class hierarchy between v1 and v0
github.com/vllm-project/vllm - youkaichao opened this pull request 15 days ago
[Model] Port deepseek-vl2 processor and remove `deepseek_vl2` dependency
github.com/vllm-project/vllm - Isotr0py opened this pull request 15 days ago
[Bug]: Unable to serve Qwen2-audio in V1
github.com/vllm-project/vllm - superfan89 opened this issue 15 days ago
[Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor
github.com/vllm-project/vllm - kzawora-intel opened this pull request 15 days ago
[misc] fix cross-node TP
github.com/vllm-project/vllm - youkaichao opened this pull request 15 days ago
[Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution
github.com/vllm-project/vllm - cennn opened this pull request 15 days ago
[Performance]: Very low generation throughput on CPU
github.com/vllm-project/vllm - SLIBM opened this issue 15 days ago
[BUGFIX] Move scores to float32 in case of running xgrammar on cpu
github.com/vllm-project/vllm - madamczykhabana opened this pull request 15 days ago
[WIP] Multimodal model support for V1 TPU
github.com/vllm-project/vllm - mgoin opened this pull request 15 days ago
[Bug]: Multi-Node Tensor-Parallel in #11256 forces TP > cuda_device_count per node
github.com/vllm-project/vllm - drikster80 opened this issue 15 days ago
[Bug]: Close feature gaps when using xgrammar for structured output
github.com/vllm-project/vllm - russellb opened this issue 16 days ago
[V1] Add V1 support of Qwen2-VL
github.com/vllm-project/vllm - ywang96 opened this pull request 16 days ago
[core] further polish memory profiling
github.com/vllm-project/vllm - youkaichao opened this pull request 16 days ago
[Bug]: XGrammar-based CFG decoding degraded after 0.6.5
github.com/vllm-project/vllm - AlbertoCastelo opened this issue 16 days ago
[Misc] Update to Transformers 4.48
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 16 days ago
[BUILD] Add VLLM_BUILD_EXT to control custom op build
github.com/vllm-project/vllm - MengqingCao opened this pull request 16 days ago
[V1] Collect env var for usage stats
github.com/vllm-project/vllm - simon-mo opened this pull request 16 days ago
[Bugfix] Fix test_long_context.py and activation kernels
github.com/vllm-project/vllm - jeejeelee opened this pull request 16 days ago
benchmark_serving support --served-model-name param
github.com/vllm-project/vllm - gujingit opened this pull request 16 days ago
[Misc]add modules_to_not_convert attribute to gptq series
github.com/vllm-project/vllm - 1096125073 opened this pull request 16 days ago
[Misc][LoRA] Improve the readability of LoRA error messages during loading
github.com/vllm-project/vllm - jeejeelee opened this pull request 16 days ago
[Performance]: Question about TTFT for ngram speculative decoding
github.com/vllm-project/vllm - ynwang007 opened this issue 16 days ago
[New Model]: internlm3-8b-instruct
github.com/vllm-project/vllm - engchina opened this issue 16 days ago
[Bug]: Discrepancies in the llama layer forward function between meta-llama, transformers and vLLM.
github.com/vllm-project/vllm - mcubuktepe opened this issue 16 days ago
Use CUDA 12.4 as default for release and nightly wheels
github.com/vllm-project/vllm - mgoin opened this pull request 16 days ago
Add: Support for Sparse24Bitmask Compressed Models
github.com/vllm-project/vllm - rahul-tuli opened this pull request 17 days ago
[Bug]: Corrupted responses for Llama-3.2-3B-Instruct with v0.6.6.post1
github.com/vllm-project/vllm - bsatzger opened this issue 17 days ago
[Bug]: whisper example issue?
github.com/vllm-project/vllm - silvacarl2 opened this issue 17 days ago
[V1][Perf] Reduce scheduling overhead in model runner after cuda sync
github.com/vllm-project/vllm - youngkent opened this pull request 17 days ago
[Kernel] Flash Attention 3 Support
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 17 days ago
[Bug]: config format not found in llama family model
github.com/vllm-project/vllm - angerhang opened this issue 17 days ago
[Bugfix] Fix _get_lora_device for HQQ marlin
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 17 days ago
Various cosmetic/comment fixes
github.com/vllm-project/vllm - mgoin opened this pull request 17 days ago
Allow hip sources to be directly included when compiling for rocm.
github.com/vllm-project/vllm - tvirolai-amd opened this pull request 17 days ago
[V1][WIP] Add KV cache group dimension to block table
github.com/vllm-project/vllm - heheda12345 opened this pull request 17 days ago
[Usage]: Token Embeddings from LLMs/VLMs
github.com/vllm-project/vllm - conceptofmind opened this issue 17 days ago