Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).

Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: Concurrent requests are skipped when --enable-chunked-prefill is enabled
github.com/vllm-project/vllm - xiangxu-google opened this issue 3 months ago
[Bug]: 8-way tensor parallelism w/ Punica broken on Ubuntu 20.04 (effectively Azure) since v0.5
github.com/vllm-project/vllm - nightflight-dk opened this issue 3 months ago
[Bugfix] Fix token padding for chameleon
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
[Bug]: python3: /project/lib/Analysis/Allocation.cpp:43: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed. Aborted (core dumped)
github.com/vllm-project/vllm - linzm1007 opened this issue 3 months ago
[Feature]: Is GLM4 function calling supported?
github.com/vllm-project/vllm - RyanOvO opened this issue 3 months ago
[Bug]: flash_attn prefix-enabled attention forward code may be incorrect?
github.com/vllm-project/vllm - yangchengtest opened this issue 3 months ago
[Bug]: temperature=0 does not lead to Greedy Sampling
github.com/vllm-project/vllm - UbeCc opened this issue 3 months ago
[Usage]: Can spec_decode and repetition_penalty be used together?
github.com/vllm-project/vllm - Time-Limit opened this issue 3 months ago
[Bug]: llama-3.1-70b model shard_memory objects to clean
github.com/vllm-project/vllm - warlockedward opened this issue 3 months ago
Update logits processor with tensor caching
github.com/vllm-project/vllm - lynkz-matt-psaltis opened this pull request 3 months ago
[SpecDecoding] Update MLPSpeculator CI tests to use smaller model
github.com/vllm-project/vllm - njhill opened this pull request 3 months ago
[Bug]: RuntimeError: GET was unable to find an engine to execute this computation for llava-next model
github.com/vllm-project/vllm - fdas3213 opened this issue 3 months ago
[Core] Tweaks to model runner/input builder developer APIs
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 3 months ago
[CI/Build] Build wheel in release mode when sccache is not enabled
github.com/vllm-project/vllm - zifeitong opened this pull request 3 months ago
[CORE] support for *.pt type prompt adapters
github.com/vllm-project/vllm - prashantgupta24 opened this pull request 3 months ago
[Bugfix] fix flashinfer cudagraph capture for PP
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 3 months ago
[Bug]: Engine crashes when max_tokens undefined
github.com/vllm-project/vllm - w013nad opened this issue 3 months ago
[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists
github.com/vllm-project/vllm - cadedaniel opened this pull request 3 months ago
[Bug]: VLM Streaming does not output CompletionUsage
github.com/vllm-project/vllm - epark001 opened this issue 3 months ago
[Bug]: Flash-attn on-GPU advance step optimization bug with spec decode on LLama 405B
github.com/vllm-project/vllm - alugowski opened this issue 3 months ago
[CI] Add smoke test for non-uniform AutoFP8 quantization
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Bug]: multi-GPU inference (tensor_parallel_size=2) fails on Intel GPUs
github.com/vllm-project/vllm - raffenet opened this issue 3 months ago
[Bug]: vLLM 0.5.3 is getting stuck at LLAMA 3.1 405B FP8 model loading
github.com/vllm-project/vllm - lanking520 opened this issue 3 months ago
[Model] Meta Llama 3.1 vllm use
github.com/vllm-project/vllm - go-noah opened this issue 3 months ago
[Model] Meta Llama 3.1 Known Issues & FAQ
github.com/vllm-project/vllm - simon-mo opened this issue 3 months ago
[Bugfix] Miscalculated latency leads to inaccurate time_to_first_token_seconds.
github.com/vllm-project/vllm - AllenDou opened this pull request 3 months ago
[Misc]: Source code compilation of vLLM failed. Environment: A40-48G, python=3.11, cuda-12.1
github.com/vllm-project/vllm - Micla-SHL opened this issue 3 months ago
[Bug]: CUDA OOM error when loading another model after exiting the first one.
github.com/vllm-project/vllm - R-C101 opened this issue 3 months ago
[Docs][ROCm] Detailed instructions to build from source
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Usage]: Does Prefix Caching currently support offloading to the CPU?
github.com/vllm-project/vllm - wjj19950828 opened this issue 3 months ago
Merge branch 'main' of https://github.com/vllm-project/vllm
github.com/vllm-project/vllm - balcklive opened this pull request 3 months ago
Bump version to v0.5.3
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
support ignore patterns in model loader
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
[misc] only tqdm for first rank
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[misc] add start loading models for users information
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bug]: The num_generation_tokens in stats is incorrect with speculative decoding
github.com/vllm-project/vllm - sighingnow opened this issue 3 months ago
[Bug] model:DeepSeek-V2-Chat-0628 bug: ChildProcessError: worker died
github.com/vllm-project/vllm - zhangfan-algo opened this issue 3 months ago
[Kernels] Add fp8 support to `reshape_and_cache_flash`
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Misc] Enable chunked prefill by default for long context models
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Bugfix] Fix null `modules_to_not_convert` in FBGEMM Fp8 quantization
github.com/vllm-project/vllm - cli99 opened this pull request 3 months ago
[Draft] [Speculative decoding] Use SPMD worker to reduce control plane communication
github.com/vllm-project/vllm - cadedaniel opened this pull request 3 months ago
[Misc] Fix attribute error when accessing compiled_dag
github.com/vllm-project/vllm - ruisearch42 opened this pull request 3 months ago
[Misc] Increase default chunk size to 2048
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[ci][build] add back vim in docker
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Usage]: How to disable logging
github.com/vllm-project/vllm - brucewlee opened this issue 3 months ago
[Misc] Remove deprecation warning for beam search
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
Llama3 Inference Optimization
github.com/vllm-project/vllm - sfc-gh-reyazda opened this pull request 3 months ago
[Misc] Add ignored layers for `fp8` quantization
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[ci] Use different sccache bucket for CUDA 11.8 wheel build
github.com/vllm-project/vllm - khluu opened this pull request 3 months ago
[Bug]: DeepSeek-Coder-V2-Lite-Instruct with CPU : Torch not compiled with CUDA enabled
github.com/vllm-project/vllm - papipsycho opened this issue 3 months ago
[DOC] Correct warning about performance
github.com/vllm-project/vllm - casper-hansen opened this pull request 3 months ago
[Doc]: Outdated docs on AutoAWQ
github.com/vllm-project/vllm - casper-hansen opened this issue 3 months ago
[Frontend] Add Usage data in each chunk for chat_serving. #6540
github.com/vllm-project/vllm - yecohn opened this pull request 3 months ago
[Kernel] Add dynamic asymmetric quantization kernel
github.com/vllm-project/vllm - ProExpertProg opened this pull request 3 months ago
[Usage]: vLLM + Ray encountered an error while executing the integrity check script.
github.com/vllm-project/vllm - jueming0312 opened this issue 3 months ago
[Bugfix][Kernel] Use int64_t for indices in fp8 quant kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago
[Usage]: How to fix the batch size while decoding?
github.com/vllm-project/vllm - allan0703 opened this issue 3 months ago
[Bug]: wheel size exceeds 200MB
github.com/vllm-project/vllm - nopepper opened this issue 3 months ago
[Bug]: In SamplingParams, setting n to a large value (e.g., 512) never finishes
github.com/vllm-project/vllm - RylanSchaeffer opened this issue 3 months ago
[Bugfix] StatLoggers: cache spec decode metrics when they get collected.
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Bug]: PaliGemma serving
github.com/vllm-project/vllm - arseniybelkov opened this issue 3 months ago
[Bug] [SpecDecode] Speculative metrics very unlikely to get logged
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
[Usage]: What do max_num_seqs and max_model_len do?
github.com/vllm-project/vllm - highheart opened this issue 3 months ago
[Bug]: vLLM failing on AWS Inferentia (inf2)
github.com/vllm-project/vllm - EshamAaqib opened this issue 3 months ago
[Installation]: Cannot install vllm-0.5.2 on cuda-11.8
github.com/vllm-project/vllm - WrRan opened this issue 3 months ago
[Frontend] Add `add_special_tokens` parameter to `CompletionRequest`
github.com/vllm-project/vllm - jsato8094 opened this pull request 3 months ago
[Bug]: vllm serve hang at `Using model weights format ['*.safetensors']` when using tp
github.com/vllm-project/vllm - coye01 opened this issue 3 months ago
[VLM][Model] Support image input for Chameleon
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
[Bug]: Does vLLM support function call mode?
github.com/vllm-project/vllm - FanZhang91 opened this issue 3 months ago
[Frontend] Represent tokens with identifiable strings
github.com/vllm-project/vllm - ezliu opened this pull request 3 months ago
[Bugfix] Fix `vocab_size` field access in `llava_next.py`
github.com/vllm-project/vllm - jaywonchung opened this pull request 3 months ago
[Performance]: Llava runs with a small batch size and number of GPU blocks
github.com/vllm-project/vllm - jaywonchung opened this issue 3 months ago
[Installation]: ERROR: No matching distribution found for torch==2.3.1
github.com/vllm-project/vllm - asifali22 opened this issue 3 months ago
[Model] Refactor and decouple phi3v image embedding
github.com/vllm-project/vllm - Isotr0py opened this pull request 3 months ago
[Feature]: support reward model API
github.com/vllm-project/vllm - catqaq opened this issue 3 months ago
[Misc] Remove abused noqa
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Misc] Add a wrapper for torch.inference_mode
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Feature]: Support DeepSeek-V2 - MI300x
github.com/vllm-project/vllm - ferrybaltimore opened this issue 3 months ago
[Feature]: 4D Attention Mask
github.com/vllm-project/vllm - littletomatodonkey opened this issue 3 months ago
[Bug]: No available block found in 60 seconds in shm
github.com/vllm-project/vllm - wjj19950828 opened this issue 3 months ago
[Core][VLM] Support image embeddings as input
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
[Kernel][Core] Add AWQ support to the Marlin kernel
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago
[Performance]: Multi-node Pipeline Parallel double bandwidth, no change in performance
github.com/vllm-project/vllm - drikster80 opened this issue 3 months ago
[ Bugfix ] Fix AutoFP8 fp8 marlin
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: Phi-3-mini does not work when using Ray
github.com/vllm-project/vllm - baughmann opened this issue 3 months ago
[ Kernel ] Enable `fp8-marlin` for `fbgemm-fp8` models
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
Increase supported token window for using LoRA Adapter with mistralai/Mistral-Nemo-Instruct-2407
github.com/vllm-project/vllm - tensimixt opened this issue 3 months ago
[Feature]: MultiModal LLM with vector API
github.com/vllm-project/vllm - qZhang88 opened this issue 3 months ago
[Bug]: Error when loading Mistral and Gemma models using the vLLM Docker image
github.com/vllm-project/vllm - Adevils opened this issue 3 months ago
[Frontend] Move chat utils
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Usage]: Can vLLM use a method similar to device_map in transformers?
github.com/vllm-project/vllm - orderer0001 opened this issue 3 months ago
[Misc] Manage HTTP connections in one place
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[build] add ib so that multi-node support with infiniband can be supported out-of-the-box
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago