Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).

Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: Concurrent requests are skipped when --enable-chunked-prefill is enabled
github.com/vllm-project/vllm - xiangxu-google opened this issue 3 months ago
[Bug]: 8-way tensor parallelism w/ Punica broken on Ubuntu 20.04 (effectively Azure) since v0.5
github.com/vllm-project/vllm - nightflight-dk opened this issue 3 months ago
[Bugfix] Fix token padding for chameleon
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
[Bug]: python3: /project/lib/Analysis/Allocation.cpp:43: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed. Aborted (core dumped)
github.com/vllm-project/vllm - linzm1007 opened this issue 3 months ago
[Feature]: Is GLM4 function calling supported?
github.com/vllm-project/vllm - RyanOvO opened this issue 3 months ago
[Bug]: flash_attn prefix-enabled attention forward code may be incorrect?
github.com/vllm-project/vllm - yangchengtest opened this issue 3 months ago
[Bug]: temperature=0 does not lead to Greedy Sampling
github.com/vllm-project/vllm - UbeCc opened this issue 3 months ago
[Usage]: Can spec_decode and repetition_penalty be used together?
github.com/vllm-project/vllm - Time-Limit opened this issue 3 months ago
[Bug]: llama-3.1-70b model shard_memory objects to clean
github.com/vllm-project/vllm - warlockedward opened this issue 3 months ago
Update logits processor with tensor caching
github.com/vllm-project/vllm - lynkz-matt-psaltis opened this pull request 3 months ago
[SpecDecoding] Update MLPSpeculator CI tests to use smaller model
github.com/vllm-project/vllm - njhill opened this pull request 3 months ago
[Bug]: RuntimeError: GET was unable to find an engine to execute this computation for llava-next model
github.com/vllm-project/vllm - fdas3213 opened this issue 3 months ago
[Core] Tweaks to model runner/input builder developer APIs
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 3 months ago
[CI/Build] Build wheel in release mode when sccache is not enabled
github.com/vllm-project/vllm - zifeitong opened this pull request 3 months ago
[CORE] support for *.pt type prompt adapters
github.com/vllm-project/vllm - prashantgupta24 opened this pull request 3 months ago
[Bugfix] fix flashinfer cudagraph capture for PP
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 3 months ago
[Bug]: Engine crashes when max_tokens undefined
github.com/vllm-project/vllm - w013nad opened this issue 3 months ago
[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists
github.com/vllm-project/vllm - cadedaniel opened this pull request 3 months ago
[Bug]: VLM Streaming does not output CompletionUsage
github.com/vllm-project/vllm - epark001 opened this issue 3 months ago
[Bug]: Flash-attn on-GPU advance step optimization bug with spec decode on LLama 405B
github.com/vllm-project/vllm - alugowski opened this issue 3 months ago
[CI] Add smoke test for non-uniform AutoFP8 quantization
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Bug]: multi-GPU inference (tensor_parallel_size=2) fails on Intel GPUs
github.com/vllm-project/vllm - raffenet opened this issue 3 months ago
[Bug]: vLLM 0.5.3 is getting stuck at LLAMA 3.1 405B FP8 model loading
github.com/vllm-project/vllm - lanking520 opened this issue 3 months ago
[Model] Meta Llama 3.1 vllm use
github.com/vllm-project/vllm - go-noah opened this issue 3 months ago
[Model] Meta Llama 3.1 Known Issues & FAQ
github.com/vllm-project/vllm - simon-mo opened this issue 3 months ago
[Bugfix] Miscalculated latency leads to inaccurate time_to_first_token_seconds.
github.com/vllm-project/vllm - AllenDou opened this pull request 3 months ago
[Misc]: Source code compilation of vLLM failed. Environment: A40-48G, python=3.11, cuda-12.1
github.com/vllm-project/vllm - Micla-SHL opened this issue 3 months ago
[Bug]: CUDA OOM error when loading another model after exiting the first one.
github.com/vllm-project/vllm - R-C101 opened this issue 3 months ago
[Docs][ROCm] Detailed instructions to build from source
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Usage]: Does Prefix Caching currently support offloading to the CPU?
github.com/vllm-project/vllm - wjj19950828 opened this issue 3 months ago
Merge branch 'main' of https://github.com/vllm-project/vllm
github.com/vllm-project/vllm - balcklive opened this pull request 3 months ago
Bump version to v0.5.3
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
support ignore patterns in model loader
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
[misc] only tqdm for first rank
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[misc] add start loading models for users information
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bug]: The num_generation_tokens in stats is incorrect with speculative decoding
github.com/vllm-project/vllm - sighingnow opened this issue 3 months ago
[Bug] model:DeepSeek-V2-Chat-0628 bug: ChildProcessError: worker died
github.com/vllm-project/vllm - zhangfan-algo opened this issue 3 months ago
[Kernels] Add fp8 support to `reshape_and_cache_flash`
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Misc] Enable chunked prefill by default for long context models
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Bugfix] Fix null `modules_to_not_convert` in FBGEMM Fp8 quantization
github.com/vllm-project/vllm - cli99 opened this pull request 3 months ago
[Draft] [Speculative decoding] Use SPMD worker to reduce control plane communication
github.com/vllm-project/vllm - cadedaniel opened this pull request 3 months ago
[Misc] Fix attribute error when accessing compiled_dag
github.com/vllm-project/vllm - ruisearch42 opened this pull request 3 months ago
[Misc] Increase default chunk size to 2048
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[ci][build] add back vim in docker
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Usage]: How to disable logging
github.com/vllm-project/vllm - brucewlee opened this issue 3 months ago
[Misc] Remove deprecation warning for beam search
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
Llama3 Inference Optimization
github.com/vllm-project/vllm - sfc-gh-reyazda opened this pull request 3 months ago
[Misc] Add ignored layers for `fp8` quantization
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[ci] Use different sccache bucket for CUDA 11.8 wheel build
github.com/vllm-project/vllm - khluu opened this pull request 3 months ago
[Bug]: DeepSeek-Coder-V2-Lite-Instruct with CPU : Torch not compiled with CUDA enabled
github.com/vllm-project/vllm - papipsycho opened this issue 3 months ago
[DOC] Correct warning about performance
github.com/vllm-project/vllm - casper-hansen opened this pull request 3 months ago
[Doc]: Outdated docs on AutoAWQ
github.com/vllm-project/vllm - casper-hansen opened this issue 3 months ago
[Frontend] Add Usage data in each chunk for chat_serving. #6540
github.com/vllm-project/vllm - yecohn opened this pull request 3 months ago
[Kernel] Add dynamic asymmetric quantization kernel
github.com/vllm-project/vllm - ProExpertProg opened this pull request 3 months ago
[Usage]: vLLM + Ray encountered an error while executing the integrity check script.
github.com/vllm-project/vllm - jueming0312 opened this issue 3 months ago
[Bugfix][Kernel] Use int64_t for indices in fp8 quant kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago
[Usage]: How to fix the batch size while decoding?
github.com/vllm-project/vllm - allan0703 opened this issue 3 months ago
[Bug]: wheel size exceeds 200MB
github.com/vllm-project/vllm - nopepper opened this issue 3 months ago
[Bug]: In SamplingParams, setting n to a large value (e.g., 512) never finishes
github.com/vllm-project/vllm - RylanSchaeffer opened this issue 3 months ago
[Bugfix] StatLoggers: cache spec decode metrics when they get collected.
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Bug]: PaliGemma serving
github.com/vllm-project/vllm - arseniybelkov opened this issue 3 months ago
[Bug] [SpecDecode] Speculative metrics very unlikely to get logged
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
[Usage]: What do max_num_seqs and max_model_len do?
github.com/vllm-project/vllm - highheart opened this issue 3 months ago
[Bug]: vLLM failing on AWS Inferentia (inf2)
github.com/vllm-project/vllm - EshamAaqib opened this issue 3 months ago
[Installation]: Cannot install vllm-0.5.2 on cuda-11.8
github.com/vllm-project/vllm - WrRan opened this issue 3 months ago
[Frontend] Add `add_special_tokens` parameter to `CompletionRequest`
github.com/vllm-project/vllm - jsato8094 opened this pull request 3 months ago
[Bug]: vllm serve hang at `Using model weights format ['*.safetensors']` when using tp
github.com/vllm-project/vllm - coye01 opened this issue 3 months ago
[VLM][Model] Support image input for Chameleon
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
[Bug]: Does vLLM support function call mode?
github.com/vllm-project/vllm - FanZhang91 opened this issue 3 months ago
[Frontend] Represent tokens with identifiable strings
github.com/vllm-project/vllm - ezliu opened this pull request 3 months ago
[Bugfix] Fix `vocab_size` field access in `llava_next.py`
github.com/vllm-project/vllm - jaywonchung opened this pull request 3 months ago
[Performance]: Llava runs with a small batch size and number of GPU blocks
github.com/vllm-project/vllm - jaywonchung opened this issue 3 months ago
[Installation]: ERROR: No matching distribution found for torch==2.3.1
github.com/vllm-project/vllm - asifali22 opened this issue 3 months ago
[Model] Refactor and decouple phi3v image embedding
github.com/vllm-project/vllm - Isotr0py opened this pull request 3 months ago
[Feature]: support reward model API
github.com/vllm-project/vllm - catqaq opened this issue 3 months ago
[Misc] Remove abused noqa
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Misc] Add a wrapper for torch.inference_mode
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Feature]: Support DeepSeek-V2 - MI300x
github.com/vllm-project/vllm - ferrybaltimore opened this issue 3 months ago
[Feature]: 4D Attention Mask
github.com/vllm-project/vllm - littletomatodonkey opened this issue 3 months ago
[Bug]: No available block found in 60 seconds in shm
github.com/vllm-project/vllm - wjj19950828 opened this issue 3 months ago
[Core][VLM] Support image embeddings as input
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
[Kernel][Core] Add AWQ support to the Marlin kernel
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago
[Performance]: Multi-node Pipeline Parallel double bandwidth, no change in performance
github.com/vllm-project/vllm - drikster80 opened this issue 3 months ago
[ Bugfix ] Fix AutoFP8 fp8 marlin
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: Phi-3-mini does not work when using Ray
github.com/vllm-project/vllm - baughmann opened this issue 3 months ago
[ Kernel ] Enable `fp8-marlin` for `fbgemm-fp8` models
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
Increase supported token window for using LoRA Adapter with mistralai/Mistral-Nemo-Instruct-2407
github.com/vllm-project/vllm - tensimixt opened this issue 3 months ago
[Feature]: MultiModal LLM with vector API
github.com/vllm-project/vllm - qZhang88 opened this issue 3 months ago
[Bug]: Error when loading Mistral and Gemma models using the vLLM Docker image
github.com/vllm-project/vllm - Adevils opened this issue 3 months ago
[Frontend] Move chat utils
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Usage]: Can vLLM use a method similar to device_map in transformers?
github.com/vllm-project/vllm - orderer0001 opened this issue 3 months ago
[Misc] Manage HTTP connections in one place
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[build] add ib so that multi-node support with infiniband can be supported out-of-the-box
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago