Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm
- Host: opensource
- Code: https://github.com/vllm-project/vllm
[Tracking issue] [Help wanted]: Deprecate BlockManagerV1
github.com/vllm-project/vllm - cadedaniel opened this issue 6 months ago
[Performance]: Profile & optimize the BlockManagerV2
github.com/vllm-project/vllm - cadedaniel opened this issue 6 months ago
[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support
github.com/vllm-project/vllm - comaniac opened this pull request 6 months ago
Disable cuda version check in vllm-openai image
github.com/vllm-project/vllm - zhaoyang-star opened this pull request 6 months ago
[Doc] change installation version to 0.4.2
github.com/vllm-project/vllm - dimaioksha opened this pull request 6 months ago
[Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations)
github.com/vllm-project/vllm - mgoin opened this pull request 6 months ago
[Doc]Replace deprecated flag in readme
github.com/vllm-project/vllm - ronensc opened this pull request 6 months ago
[Kernel] Initial Activation Quantization Support
github.com/vllm-project/vllm - dsikka opened this pull request 6 months ago
[Doc]: Documentation disappeared
github.com/vllm-project/vllm - markovalexander opened this issue 6 months ago
--kv_cache_dtype fp8 should not check for nvcc
github.com/vllm-project/vllm - wizche opened this issue 6 months ago
[Bug]: Failing to find LoRA adapter for MultiLoRA Inference
github.com/vllm-project/vllm - RonanKMcGovern opened this issue 6 months ago
[Core][Model runner refactoring 1/N] Refactor attn metadata term
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Bug]: For RDNA3 (navi31; gfx1100) VLLM_USE_TRITON_FLASH_ATTN=0 currently must be forced
github.com/vllm-project/vllm - lhl opened this issue 6 months ago
[Core] Add retention policy code for processing requests
github.com/vllm-project/vllm - James4Ever0 opened this pull request 6 months ago
[Bug]: `truncate_prompt_tokens` in SamplingParams only available for openai entrypoints, not for offline vLLM engine
github.com/vllm-project/vllm - YuWang916 opened this issue 6 months ago
[Doc] Add a note about MAX_THREADS and NVCC_THREADS
github.com/vllm-project/vllm - Tabrizian opened this pull request 6 months ago
[Doc] update(example model): for OpenAI compatible serving
github.com/vllm-project/vllm - fpaupier opened this pull request 6 months ago
[New Model]: how to debug the values of tensor while adding a new model
github.com/vllm-project/vllm - cillinzhang opened this issue 6 months ago
[Feature]: Load new LoRA adapters on request
github.com/vllm-project/vllm - markovalexander opened this issue 6 months ago
[Installation]: Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
github.com/vllm-project/vllm - fxavier-maf opened this issue 6 months ago
[Bug]: Token accuracy not as expected ad also got Junk values for starcoderbase-15b model with continuous batching
github.com/vllm-project/vllm - sreka opened this issue 6 months ago
[Usage]: It seems that vllm doesn't perform well under high concurrency
github.com/vllm-project/vllm - syGOAT opened this issue 6 months ago
[Usage]: How to modify the default system prompt?
github.com/vllm-project/vllm - lee0v0 opened this issue 6 months ago
[Bug]: The first token generated after prefill occupies the GPU memory but virtual block
github.com/vllm-project/vllm - March-H opened this issue 6 months ago
[Misc][Typo] type annotation fix
github.com/vllm-project/vllm - HarryWu99 opened this pull request 6 months ago
Unable to find Punica extension issue during source code installation
github.com/vllm-project/vllm - kingljl opened this pull request 6 months ago
fix_tokenizer_snapshot_download_bug
github.com/vllm-project/vllm - kingljl opened this pull request 6 months ago
[Usage]: Can decoding phase benefits from prefix-caching?
github.com/vllm-project/vllm - Juelianqvq opened this issue 6 months ago
[Misc] allow user to specify where to write gpu_p2p_access_cache through VLLM_CACHE_DIR env var
github.com/vllm-project/vllm - sfc-gh-zhwang opened this pull request 6 months ago
[Usage]: Do I need to specify chat-template for Qwen model?
github.com/vllm-project/vllm - xudong2019 opened this issue 6 months ago
[Bugfix][Kernel] allow non-power-of-2 head sizes for prefix prefill with alibi
github.com/vllm-project/vllm - DefTruth opened this pull request 6 months ago
[Core][Dsitributed] fix del exception
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Misc] remove debug log for chunk detected
github.com/vllm-project/vllm - DefTruth opened this pull request 6 months ago
[Bug]: with_pynccl_for_all_reduce causes GPU OOM
github.com/vllm-project/vllm - sfc-gh-zhwang opened this issue 6 months ago
[Core][Distributed] add cleanup code for model parallel
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Feature]: Sliding window attention in a fraction of layers
github.com/vllm-project/vllm - zhuzilin opened this issue 6 months ago
[WIP] Add IPEX Paged Att.
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 6 months ago
[Bugfix][Minor] Make ignore_eos effective
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 6 months ago
[Frontend] [Core] Tensorizer: support dynamic `num_readers`, update version
github.com/vllm-project/vllm - alpayariyak opened this pull request 6 months ago
[Core]Refactor gptq_marlin ops
github.com/vllm-project/vllm - jikunshang opened this pull request 6 months ago
is there a way we can add Hugging face PEFT model for VLLM to load?
github.com/vllm-project/vllm - rsong0606 opened this issue 6 months ago
[Metrics] add more metrics
github.com/vllm-project/vllm - HarryWu99 opened this pull request 6 months ago
[Bugfix][Kernel] Fix compute_type for MoE kernel
github.com/vllm-project/vllm - WoosukKwon opened this pull request 6 months ago
[Usage]: How do you setup vllm to work in k8s/openshift cluster
github.com/vllm-project/vllm - jayteaftw opened this issue 6 months ago
[Distributed] refactor pynccl to support multilpe TP groups
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bug]: JSON Schema for multiple function choices
github.com/vllm-project/vllm - jamestwhedbee opened this issue 6 months ago
[Kernel] Update fused_moe tuning script for FP8
github.com/vllm-project/vllm - pcmoritz opened this pull request 6 months ago
[Doc] add visualization for multi-stage dockerfile
github.com/vllm-project/vllm - prashantgupta24 opened this pull request 6 months ago
[Usage] [Bug]: run inference on mistralai/Mixtral-8x7B-Instruct-v0.1 with tensor parallel > 1 (currently not working)
github.com/vllm-project/vllm - jayteaftw opened this issue 6 months ago
[Misc] Upgrade to `torch==2.3.0`
github.com/vllm-project/vllm - mgoin opened this pull request 6 months ago
[Misc] fix typo in block manager
github.com/vllm-project/vllm - Juelianqvq opened this pull request 6 months ago
[Usage]: How to start vllm with llava using docker compose
github.com/vllm-project/vllm - athenawisdoms opened this issue 6 months ago
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[mypy][6/N] Fix all the core subdirectory typing
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Usage]: why vllm takes as much ram as possible ?
github.com/vllm-project/vllm - xudong2019 opened this issue 6 months ago
[Bug]: 1-card deployment and 2-card deployment yield inconsistent output logits.
github.com/vllm-project/vllm - thisissum opened this issue 6 months ago
[CORE] Allow loading of quantized lm_head (ParallelLMHead)
github.com/vllm-project/vllm - Qubitium opened this pull request 6 months ago
[Performance]: Empirical Measurement of how to broadcast python object in vLLM
github.com/vllm-project/vllm - youkaichao opened this issue 6 months ago
[Bug]: OpenAI API request doesn't go through with 'guided_json'
github.com/vllm-project/vllm - Tejaswgupta opened this issue 6 months ago
[Bug]: Prefix caching does not work on Pascal GPUs
github.com/vllm-project/vllm - sasha0552 opened this issue 6 months ago
[Misc]: need "first good issue"
github.com/vllm-project/vllm - HarryWu99 opened this issue 6 months ago
[Feature]: option to return hidden states
github.com/vllm-project/vllm - zhenlan0426 opened this issue 6 months ago
[Usage]: How to disable multi lora to avoid using punica ? Or is the punica being the only choice?
github.com/vllm-project/vllm - laoda513 opened this issue 6 months ago
[Bug]: Initialising LLM on multiple GPUs stuck at "Started a local Ray instance"
github.com/vllm-project/vllm - timbmg opened this issue 6 months ago
[Bug]: Engine iteration timed out. This should never happen!
github.com/vllm-project/vllm - itechbear opened this issue 6 months ago
[Usage]: Not enough memory when run a 33b model float16 on 2 x L40 GPU (48G)
github.com/vllm-project/vllm - garyyang85 opened this issue 6 months ago
[CI/Build] Move `test_utils.py` to `tests/utils.py`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Usage]: If I use Offline way to launch the model, how can I get the metrics?
github.com/vllm-project/vllm - amumu96 opened this issue 6 months ago
[Core] Centralize GPU Worker construction
github.com/vllm-project/vllm - njhill opened this pull request 6 months ago
[Bug]: cannot load model back due to [does not appear to have a file named config.json]
github.com/vllm-project/vllm - yananchen1989 opened this issue 6 months ago
[WIP][Hardware][Intel] support intel builds with intel c++
github.com/vllm-project/vllm - kannon92 opened this pull request 6 months ago
[Core] Pipeline Parallel Support
github.com/vllm-project/vllm - andoorve opened this pull request 6 months ago
[Doc]: Offline Inference Distributed Broken for TP
github.com/vllm-project/vllm - sam-h-bean opened this issue 6 months ago
[Hardware][Nvidia] Enable support for Pascal GPUs
github.com/vllm-project/vllm - jasonacox opened this pull request 6 months ago
[RFC]: environment variable management in vllm
github.com/vllm-project/vllm - youkaichao opened this issue 6 months ago
[kernel] fix sliding window in prefix prefill Triton kernel
github.com/vllm-project/vllm - mmoskal opened this pull request 6 months ago
[Bug]: Can not run openapi server with cpu backend
github.com/vllm-project/vllm - kannon92 opened this issue 6 months ago
[Frontend] add tok/s speed metric to llm class when using tqdm
github.com/vllm-project/vllm - MahmoudAshraf97 opened this pull request 6 months ago
[Bug]: TypeError in XFormersMetadata
github.com/vllm-project/vllm - skonto opened this issue 6 months ago
[Model]: Support for InternVL-Chat-V1-5
github.com/vllm-project/vllm - Iven2132 opened this issue 6 months ago
[Bug]: Running llama2-7b on H20, Floating point exception (core dumped) appears on float16
github.com/vllm-project/vllm - yk1012664593 opened this issue 6 months ago
[Usage]: I doubt about the meaning of --enable-prefix-caching
github.com/vllm-project/vllm - chenchunhui97 opened this issue 6 months ago
[Bug]: vllm 0.4.1 and transformers 4.40.1 have conflicting dependencies on pydantic
github.com/vllm-project/vllm - AbbottKilig opened this issue 6 months ago
[Bug]: Chunked prefill doesn't seem to work when --kv-cache-dtype fp8
github.com/vllm-project/vllm - rkooo567 opened this issue 6 months ago
[Model] Phi-3 4k sliding window temp. fix
github.com/vllm-project/vllm - caiom opened this pull request 6 months ago
[Speculative decoding] Support target-model logprobs
github.com/vllm-project/vllm - cadedaniel opened this pull request 6 months ago
[Bug]: Phi3 still not supported
github.com/vllm-project/vllm - andrew-vold opened this issue 6 months ago
✨ support local cache for models
github.com/vllm-project/vllm - prashantgupta24 opened this pull request 6 months ago
[Installation]: GitHub access required during install for vllm >=0.4.1 (for cu12-libnccl.so.2.18.1)
github.com/vllm-project/vllm - mattmalcher opened this issue 6 months ago
[Feature]: GPTQ/AWQ quantization is not fully optimized yet. The speed can be slower than non-quantized models.
github.com/vllm-project/vllm - ShubhamVerma16 opened this issue 6 months ago
[Feature]: AssertionError: Speculative decoding not yet supported for RayGPU backend.
github.com/vllm-project/vllm - cocoza4 opened this issue 6 months ago
[Core] Add `multiproc_worker_utils` for multiprocessing-based workers
github.com/vllm-project/vllm - njhill opened this pull request 6 months ago
[Frontend] Add APIs for dynamic LoRA models load/unload
github.com/vllm-project/vllm - graceleeis opened this pull request 6 months ago