Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
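For orientation, a minimal sketch of vLLM's offline inference API; the model name is an illustrative assumption, not one prescribed by this listing:

```python
# Minimal offline inference with vLLM (assumes `pip install vllm`;
# model name is illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=64)

# generate() takes a list of prompts and returns one RequestOutput per prompt
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```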
[Bug]: Not able to run LLama3 LoRA with --fully-sharded-loras
github.com/vllm-project/vllm - xyang16 opened this issue about 1 month ago
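For context on the flag named in this report, a hedged sketch of serving a LoRA adapter with fully sharded LoRA layers; the model name and adapter path are illustrative assumptions:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# `fully_sharded_loras` mirrors the --fully-sharded-loras CLI flag and shards
# the LoRA computation across tensor-parallel ranks (names are illustrative).
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    enable_lora=True,
    fully_sharded_loras=True,
)
outputs = llm.generate(
    ["Summarize: vLLM is a fast LLM serving engine."],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```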
Rahul quant merged
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request about 1 month ago
[Kernel] Add CUTLASS sparse support, heuristics, and torch operators
github.com/vllm-project/vllm - Faraz9877 opened this pull request about 1 month ago
[Perf] Reduce peak memory usage of llama
github.com/vllm-project/vllm - andoorve opened this pull request about 1 month ago
[Bug]: KV Cache Error with KV_cache_dtype=FP8 and Large Sequence Length: Losing Context Length of Model
github.com/vllm-project/vllm - amakaido28 opened this issue about 1 month ago
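For readers unfamiliar with the option named in this report: `kv_cache_dtype` selects the storage precision of the KV cache. A hedged sketch, with an illustrative model name:

```python
from vllm import LLM

# Store the KV cache in FP8, roughly halving its memory footprint versus
# FP16; this mirrors the --kv-cache-dtype fp8 CLI flag from the report above.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", kv_cache_dtype="fp8")
```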
[Bug]: Different garbage output for the same prompt when inferred as a single sequence vs. concurrent requests on the vLLM OpenAI server, temp=0 (mixed batching in longrope)
github.com/vllm-project/vllm - bhupendrathore opened this issue about 1 month ago
[Kernel] Add CUTLASS sparse support with argument sweep, heuristics, and torch operators
github.com/vllm-project/vllm - Faraz9877 opened this pull request about 1 month ago
[bugfix] Fix static asymmetric quantization case
github.com/vllm-project/vllm - ProExpertProg opened this pull request about 1 month ago
[Tool parsing] Improve / correct mistral tool parsing
github.com/vllm-project/vllm - patrickvonplaten opened this pull request about 1 month ago
[Docs] Publish meetup slides
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
[Feature] enable host memory for kv cache
github.com/vllm-project/vllm - YZP17121579 opened this pull request about 1 month ago
Rs 24 sparse
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request about 1 month ago
[Misc] Add uninitialized params tracking for `AutoWeightsLoader`
github.com/vllm-project/vllm - Isotr0py opened this pull request about 1 month ago
[Bug]: Out of Memory (OOM) Issues During MMLU Evaluation with lm_eval
github.com/vllm-project/vllm - wchen61 opened this issue about 1 month ago
[Bug]: custom chat template sends [{'type': 'text', 'text': '...'}] to the model
github.com/vllm-project/vllm - victorserbu2709 opened this issue about 1 month ago
[Feature]: To adapt to the TTS task, I need to directly pass in the embedding. How should I modify it?
github.com/vllm-project/vllm - 1nlplearner opened this issue about 1 month ago
[Usage]: using open-webui with vLLM inference engine instead of ollama
github.com/vllm-project/vllm - wolfgangsmdt opened this issue about 1 month ago
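Background for this entry: vLLM exposes an OpenAI-compatible HTTP API (started with `vllm serve <model>`), so frontends such as open-webui can point at it instead of ollama. A hedged sketch of a raw request; host, port, and model name are illustrative assumptions:

```python
import requests

# Query a locally running `vllm serve` instance via the OpenAI-compatible
# chat completions endpoint (address and model are illustrative).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.0,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```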
[Installation]: Request to include vllm==0.6.2 for CUDA 11.8
github.com/vllm-project/vllm - amew0 opened this issue about 1 month ago
[Performance]: Results from the vLLM Blog article "How Speculative Decoding Boosts vLLM Performance by up to 2.8x" are unreproducible
github.com/vllm-project/vllm - yeonjoon-jung01 opened this issue about 1 month ago
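For context, the blog post in question benchmarks speculative decoding, where a small draft model proposes tokens that the target model then verifies. A hedged configuration sketch; the model names and token count are illustrative assumptions:

```python
from vllm import LLM

# The draft model proposes num_speculative_tokens tokens per step; the
# target model verifies them, trading extra compute for lower latency.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",             # target (illustrative)
    speculative_model="meta-llama/Llama-3.1-8B-Instruct",  # draft (illustrative)
    num_speculative_tokens=5,
)
```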
[Hardware][Cambricon MLU] Add Cambricon MLU inference backend (#9649)
github.com/vllm-project/vllm - zonghuaxiansheng opened this pull request about 1 month ago
[Bug]: FusedMoE kernel performance depends on input prompt length while decoding
github.com/vllm-project/vllm - taegeonum opened this issue about 1 month ago
[Bugfix] Fix unable to load some models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Model] Support telechat2
github.com/vllm-project/vllm - shunxing12345 opened this pull request about 1 month ago
[Misc] Change RedundantReshapesPass and FusionPass logging from info to debug
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request about 1 month ago
[TPU] Implement prefix caching for TPUs
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
[Bug]: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
github.com/vllm-project/vllm - yananchen1989 opened this issue about 1 month ago
[Bug]: Get meaningless output when running long-context inference of the Qwen2.5 model with vllm>=0.6.3
github.com/vllm-project/vllm - piamo opened this issue about 1 month ago
[Feature]: Quark quantization format upstream to vLLM
github.com/vllm-project/vllm - kewang-xlnx opened this issue about 1 month ago
[Bug]: Can't use yarn rope config for long context in Qwen2 model
github.com/vllm-project/vllm - FlyCarrot opened this issue about 1 month ago
[Bugfix] return zero point in static quantization in scaled_int8_quant
github.com/vllm-project/vllm - danieldk opened this pull request about 1 month ago
[Model] Add Support for Multimodal Granite Models
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request about 1 month ago
[Misc] Update benchmark to support image_url file or http
github.com/vllm-project/vllm - kakao-steve-ai opened this pull request about 1 month ago
[Bug]: vllm serve works incorrectly for (some) Vision LM models
github.com/vllm-project/vllm - Aktsvigun opened this issue about 1 month ago
[CI/Build] Make shellcheck happy
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Bug]: A qwen2.5 service returns different results across vLLM versions for the same input: SSE output is correct on 0.6.1.post2 but wrong on 0.6.3.post1. Why?
github.com/vllm-project/vllm - mawenju203 opened this issue about 1 month ago
Bump to compressed-tensors v0.8.0
github.com/vllm-project/vllm - dsikka opened this pull request about 1 month ago
Bump to `compressed-tensors` v0.8.0
github.com/vllm-project/vllm - dsikka opened this pull request about 1 month ago
[Core][Frontend] Add faster-outlines as guided decoding backend
github.com/vllm-project/vllm - unaidedelf8777 opened this pull request about 1 month ago
[Bug]: Speculative Decoding + TP on Spec Worker + Chunked Prefill does not work.
github.com/vllm-project/vllm - andoorve opened this issue about 1 month ago
[core][distributed] use tcp store directly
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[help wanted]: add QwenModel to CI tests
github.com/vllm-project/vllm - youkaichao opened this issue about 1 month ago
[torch.compile] PostGradPassManager, Inductor code caching fix, fix_functionalization pass refactor + tests
github.com/vllm-project/vllm - ProExpertProg opened this pull request about 1 month ago
[V1] Fix CI tests on V1 engine
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
Revert "[ci][build] limit cmake version"
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[doc] improve debugging doc
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[Usage]: Adaptive Batching and number of concurrent requests
github.com/vllm-project/vllm - Leon-Sander opened this issue about 1 month ago
[V1] Enable Inductor when using piecewise CUDA graphs
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
[Feature]: Support for NVIDIA Unified memory
github.com/vllm-project/vllm - khayamgondal opened this issue about 1 month ago
[doc] fix location of runllm widget
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[TPU] Use numpy to compute slot mapping
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
[Doc] Fix typo in arg_utils.py
github.com/vllm-project/vllm - xyang16 opened this pull request about 1 month ago
[Bug]: Qwen cannot be quantized in vLLM
github.com/vllm-project/vllm - yananchen1989 opened this issue about 1 month ago
[Bugfix] Fix QwenModel argument
github.com/vllm-project/vllm - DamonFool opened this pull request about 1 month ago
[Bug]: The throughput computation in metric.py seems wrong
github.com/vllm-project/vllm - Achazwl opened this issue about 1 month ago
[Feature]: 2:4 sparsity + w4a16 support
github.com/vllm-project/vllm - arunpatala opened this issue about 1 month ago
[Feature]: Is it possible for vLLM to support inference with dynamic activation sparsity?
github.com/vllm-project/vllm - jiangjiadi opened this issue about 1 month ago
[Usage]: Qwen2-VL does not support LoRA
github.com/vllm-project/vllm - menglrskr opened this issue about 1 month ago
[Usage]: How to Use a Public URL for Remote Access to a Deployed vLLM Model?
github.com/vllm-project/vllm - Nothern-ai opened this issue about 1 month ago
[Misc] Fix Idefics3Model argument
github.com/vllm-project/vllm - jeejeelee opened this pull request about 1 month ago
[Kernel][Hardware][AMD] Add support for GGUF quantization on ROCm
github.com/vllm-project/vllm - kliuae opened this pull request about 1 month ago
[V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
[Installation]: Installing GPU vLLM gives 'No module named triton'
github.com/vllm-project/vllm - Serenagirl opened this issue about 1 month ago
[Bug]: DeepSeek V2 Coder 236B AWQ error!
github.com/vllm-project/vllm - tohnee opened this issue about 1 month ago
[misc] Layerwise profile updates
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request about 1 month ago
[V1] TPU Prototype
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request about 1 month ago
[Hardware][HPU] Add `mark_step` for HPU
github.com/vllm-project/vllm - jikunshang opened this pull request about 1 month ago
[New Model]: Registration info for a new model registered via out-of-tree model integration is lost when multi-GPU Ray mode is enabled
github.com/vllm-project/vllm - llery opened this issue about 1 month ago
[Core] Reduce TTFT with concurrent partial prefills
github.com/vllm-project/vllm - joerunde opened this pull request about 1 month ago
[Bugfix] Fix for Spec model TP + Chunked Prefill
github.com/vllm-project/vllm - andoorve opened this pull request about 1 month ago
Making vLLM compatible with Mistral fp8 weights.
github.com/vllm-project/vllm - akllm opened this pull request about 1 month ago
[V1] Enable custom ops with piecewise CUDA graphs
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions
github.com/vllm-project/vllm - imkero opened this pull request about 1 month ago
[Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner
github.com/vllm-project/vllm - Isotr0py opened this pull request about 1 month ago
[Frontend][Core] Add Guidance backend for guided decoding
github.com/vllm-project/vllm - JC1DA opened this pull request about 1 month ago
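This entry and the earlier faster-outlines one both concern guided-decoding backends, which constrain sampling to a schema or grammar. A hedged sketch using the `guided_json` extension that vLLM's OpenAI-compatible server accepts; the address, model, and schema are illustrative assumptions:

```python
import requests

# Ask the server to constrain output to a JSON schema (illustrative schema).
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative
        "messages": [{"role": "user", "content": "Invent a person as JSON."}],
        "guided_json": schema,  # vLLM-specific extension parameter
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```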
[6/N] pass whole config to inner model
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[Bugfix] bitsandbytes models fail to run with pipeline parallelism
github.com/vllm-project/vllm - HoangCongDuc opened this pull request about 1 month ago
[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling.
github.com/vllm-project/vllm - jeongin601 opened this pull request about 1 month ago
[Core] Loading model from S3 using RunAI Model Streamer as optional loader
github.com/vllm-project/vllm - omer-dayan opened this pull request about 1 month ago
[help wanted]: why cmake 3.31 breaks vllm and how to fix it
github.com/vllm-project/vllm - youkaichao opened this issue about 1 month ago
[Bug]: 500 Internal Server Error when calling v1/completions and v1/chat/completions with vllm/vllm-openai:v0.6.3.post1 on OpenShift
github.com/vllm-project/vllm - JohnWestlund opened this issue about 1 month ago
[Model] Support Qwen2 embeddings and use tags to select model tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Bug]: When applying continue_final_message on the OpenAI server, "echo": false is ignored
github.com/vllm-project/vllm - chaunceyjiang opened this pull request about 1 month ago
[Installation]: error: identifier "__builtin_dynamic_object_size" is undefined
github.com/vllm-project/vllm - xiaoxiaosuaxuan opened this issue about 1 month ago
[Frontend] Add per-request number of cached token stats
github.com/vllm-project/vllm - zifeitong opened this pull request about 1 month ago
[Feature]: BASE_URL environment variable
github.com/vllm-project/vllm - bjb19 opened this issue about 1 month ago
[Docs] Misc updates to TPU installation instructions
github.com/vllm-project/vllm - mikegre-google opened this pull request about 1 month ago
[Bugfix][Frontend] Update Llama 3.2 Chat Template to support Vision and Non-Tool use
github.com/vllm-project/vllm - tjohnson31415 opened this pull request about 1 month ago
[Doc] Move PR template content to docs
github.com/vllm-project/vllm - russellb opened this pull request about 1 month ago
[Bug]: Error when benchmarking a model with the vLLM backend against endpoint /v1/chat/completions
github.com/vllm-project/vllm - rabaja opened this issue about 1 month ago
[Bug]: Unable to load Llama-3.1-70B-Instruct using either `vllm serve` or `vllm-openai` docker
github.com/vllm-project/vllm - SMAntony opened this issue about 1 month ago
[Bug]: FlashInfer throws error in nightly: Please set `use_tensor_cores=True` in BatchDecodeWithPagedKVCacheWrapper for group size 3
github.com/vllm-project/vllm - nathan-az opened this issue about 1 month ago
[Usage]: How can I get all logits of a token?
github.com/vllm-project/vllm - joyyyhuang opened this issue about 1 month ago
[Bug]: H100 - Your GPU does not have native support for FP8 computation
github.com/vllm-project/vllm - ScOut3R opened this issue about 1 month ago
[Bug]: Outlines w/ Mistral
github.com/vllm-project/vllm - matbee-eth opened this issue about 1 month ago
[Feature]: Support for predicted outputs
github.com/vllm-project/vllm - flozi00 opened this issue about 1 month ago
[WIP] Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1
github.com/vllm-project/vllm - sroy745 opened this pull request about 1 month ago
[V1] Add all_token_ids attribute to Request
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 month ago
Rename vllm.logging to vllm.logging_utils
github.com/vllm-project/vllm - flozi00 opened this pull request about 1 month ago