Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Doc]: Marlin does not support weight_bits = uint4b8
github.com/vllm-project/vllm - xiaotukuaipao12318 opened this issue about 2 months ago
[Performance]: The impact of CPU on vLLM performance is significant.
github.com/vllm-project/vllm - skylee-01 opened this issue about 2 months ago
[Bugfix]disable cuda graph when max_decode_seq_len is close to max_seq_len_to_capture
github.com/vllm-project/vllm - Ximingwang-09 opened this pull request about 2 months ago
[Bug]: Using the same startup command, deepseek-v2-lite succeeds while deepseek-v2 236b encounters an error.
github.com/vllm-project/vllm - fengyang95 opened this issue about 2 months ago
[Bug]: Met a error when deploying an AWQ model on H20.
github.com/vllm-project/vllm - medwang1 opened this issue about 2 months ago
[Misc] add iteration_tokens metric
github.com/vllm-project/vllm - LucasWilkinson opened this pull request about 2 months ago
[Usage]: 125m parameter model is also showing CUDA: Out of memory error in a Nvidia16GB 4060
github.com/vllm-project/vllm - shubh9m opened this issue about 2 months ago
[Misc] GPTQ Activation Ordering
github.com/vllm-project/vllm - kylesayrs opened this pull request about 2 months ago
[CI/Build] Use python 3.12 in cuda image
github.com/vllm-project/vllm - joerunde opened this pull request about 2 months ago
Adding Cascade Infer to FlashInfer
github.com/vllm-project/vllm - raywanb opened this pull request about 2 months ago
[MISC] Consolidate FP8 kv-cache tests
github.com/vllm-project/vllm - comaniac opened this pull request about 2 months ago
[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail
github.com/vllm-project/vllm - alexeykondrat opened this pull request about 2 months ago
[Core][WIP] MPLLMEngine with async streaming (depends on 8090)
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
➕ add peft to common requirements
github.com/vllm-project/vllm - prashantgupta24 opened this pull request about 2 months ago
[Usage]: Do not understand why Smaller max_num_batched_tokens achieves better ITL
github.com/vllm-project/vllm - Kausal-Lei opened this issue about 2 months ago
[Performance] Enable chunked prefill and prefix caching together
github.com/vllm-project/vllm - comaniac opened this pull request about 2 months ago
[Bugfix] Fix weight loading for the unfused pathway
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
fix: Add the +empty tag to the version only when the VLLM_TARGET_DEVICE envvar was explicitly set to "empty"
github.com/vllm-project/vllm - tomeras91 opened this pull request about 2 months ago
[Doc]: `LLM.chat()` docstring incorrectly suggests multiple chats can be generated in one call
github.com/vllm-project/vllm - capnrefsmmat opened this issue about 2 months ago
[Bug]: Loading GPTQ-quantized GPTBigCode fails in weight_loader_v2 of qptq_marlin
github.com/vllm-project/vllm - maxdebayser opened this issue about 2 months ago
[Bug]: Mismatch in TTFT count and number of successful requests completed
github.com/vllm-project/vllm - jerin-scalers-ai opened this issue about 2 months ago
[Bug]: JAMBA 1.5 - Beam Search Returns a few characters then stops early
github.com/vllm-project/vllm - evannorstrand-mp opened this issue about 2 months ago
[Bugfix] Fix bug in detokenizer.py
github.com/vllm-project/vllm - cafeii opened this pull request about 2 months ago
[Usage]: How to output logprob for each possiable token about classification or determin task?
github.com/vllm-project/vllm - HuXiLiFeng opened this issue about 2 months ago
[Bug]: The error encountered when deploying the MiniCPM-2B model in a CPU environment using the VLLM framework
github.com/vllm-project/vllm - liuzhipengchd opened this issue about 2 months ago
Iboiko/flatpa blocksnumber
github.com/vllm-project/vllm - iboiko-habana opened this pull request about 2 months ago
[Bug]: Docker image for 0.5.4 does not include package timm==0.9.10 to run MiniCPMV
github.com/vllm-project/vllm - bjornjee opened this issue about 2 months ago
[Bugfix] remove post_layernorm in siglip
github.com/vllm-project/vllm - wnma3mz opened this pull request about 2 months ago
[Usage]: Can vLLM handle multi-turn and multi-instance at the same time?
github.com/vllm-project/vllm - devjun7 opened this issue about 2 months ago
[Performance]: Too slow when serving for large number of prompts.
github.com/vllm-project/vllm - VincentXWD opened this issue about 2 months ago
chore: Update check-wheel-size.py to read MAX_SIZE_MB from env
github.com/vllm-project/vllm - haitwang-cloud opened this pull request about 2 months ago
[New Model]: ValueError: Model architectures ['UltravoxModel'] are not supported for now.
github.com/vllm-project/vllm - Nishant-kirito opened this issue about 2 months ago
[Bug]: ValueError: Queue <multiprocessing.queues.Queue object at 0x7f5703d2d0f0> is closed;zipfile.BadZipFile: Bad magic number for file header
github.com/vllm-project/vllm - Jiangchenglin521 opened this issue about 2 months ago
[Bug]: When use `guided choice` feature, vllm.engine.async_llm_engine.AsyncEngineDeadError
github.com/vllm-project/vllm - TangJiakai opened this issue about 2 months ago
[Installation]: Using Image to build from source get error
github.com/vllm-project/vllm - Mingbo-Lee opened this issue about 2 months ago
[Frontend] Multimodal support in offline chat
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Bug]: InternVL2-26B tensor_parallel_size=4, AssertionError: 25 is not divisible by 4
github.com/vllm-project/vllm - SovereignRemedy opened this issue about 2 months ago
[Bug]: Unable to serve minicpm-v2.6 with GGUF quantization
github.com/vllm-project/vllm - Sakura4036 opened this issue about 2 months ago
[Bug]: vllm cpu installation build from source error
github.com/vllm-project/vllm - park12sj opened this issue about 2 months ago
[Bug]: dtype float16 Failure to use enable-chunked-prefill
github.com/vllm-project/vllm - warlockedward opened this issue about 2 months ago
[Core][Bugfix][Perf] Refactor Server to Avoid `AsyncLLMEngine`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Bug]: vLLM 0.5.5 and FlashInfer0.1.6
github.com/vllm-project/vllm - wlwqq opened this issue about 2 months ago
[Performance]: Llama 3 70B; vLLM does not scale beyond TP=4
github.com/vllm-project/vllm - DreamGenX opened this issue about 2 months ago
[Feature]: Chat Completion with Parallel Function Calling
github.com/vllm-project/vllm - KevinZeng08 opened this issue about 2 months ago
[Bug]: when tensor-parallel-size>1,Stuck
github.com/vllm-project/vllm - wiluen opened this issue about 2 months ago
[Performance]: TTFT increases linearly with the number of batched tokens
github.com/vllm-project/vllm - captify-sivakhno opened this issue about 2 months ago
[Usage]: How to stop vllm serving properly?
github.com/vllm-project/vllm - phisinger opened this issue about 2 months ago
[Installation]: building CPU docker image crashes my machine
github.com/vllm-project/vllm - khaerensml6 opened this issue about 2 months ago
[Model] LoRA with lm_head fully trained
github.com/vllm-project/vllm - sergeykochetkov opened this pull request about 2 months ago
[Bug]: RuntimeError: CUDA error: invalid argument
github.com/vllm-project/vllm - fengyang95 opened this issue about 2 months ago
[New Model]: quantized Qwen2 MoE models
github.com/vllm-project/vllm - BrenchCC opened this issue about 2 months ago
[Misc] Clean up RoPE forward_native
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Feature]: Faster guided decoding for pre-defined output
github.com/vllm-project/vllm - captify-sivakhno opened this issue about 2 months ago
[Feature]: Support multi-node serving on Kubernetes
github.com/vllm-project/vllm - linnlh opened this issue about 2 months ago
[Bug]: Persistent OutOfMemoryError error when using speculative decoding
github.com/vllm-project/vllm - captify-sivakhno opened this issue about 2 months ago
Add smoothquant support
github.com/vllm-project/vllm - ehartford opened this issue about 2 months ago
[Bug]: SpeculativeDecoding is outputting nonsense words
github.com/vllm-project/vllm - kerthcet opened this issue about 2 months ago
[New Model]: FM9GForCausalLM
github.com/vllm-project/vllm - Aiwenqiuyu opened this issue about 2 months ago
[Usage]: Does VLLM support starting multiple cards using mpirun? Want to bind different CPUs to each card.
github.com/vllm-project/vllm - xiabo123 opened this issue about 2 months ago
[Bug]: ValueError: could not broadcast input array from shape (513,) into shape (512,)
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Feature]: Beam Search with Temperature > 0
github.com/vllm-project/vllm - ekurtulus opened this issue about 2 months ago
[Bug]: TPU InternVL2 Model Error Graph break due to unsupported builtin _XLAC.PyCapsule._xla_get_replication_devices_count
github.com/vllm-project/vllm - radna0 opened this issue about 2 months ago
[Usage]: Using TPU example with InternVL2 Model
github.com/vllm-project/vllm - radna0 opened this issue about 2 months ago
[Misc] Optional installation of audio related packages
github.com/vllm-project/vllm - ywang96 opened this pull request about 2 months ago
[Neuron] Adding support for adding/ overriding neuron configuration a…
github.com/vllm-project/vllm - hbikki opened this pull request about 2 months ago
[Bugfix][VLM] Add fallback to SDPA for ViT model running on CPU backend
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Frontend] Add progress reporting to run_batch.py
github.com/vllm-project/vllm - alugowski opened this pull request about 2 months ago
[BugFix][Core] Multistep Fix Crash on Request Cancellation
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Bug]: vLLM hang at nccl step when trying to use multiple GPUs
github.com/vllm-project/vllm - BiboyQG opened this issue about 2 months ago
[Bug]: vllm0.4.3 guided decoding
github.com/vllm-project/vllm - TangJiakai opened this issue about 2 months ago
[Core][Bugfix] Accept GGUF model without .gguf extension
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Bugfix] Fix internlm2 tensor parallel inference
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Hardware][Ascend] Add Ascend NPU backend
github.com/vllm-project/vllm - wangshuai09 opened this pull request about 2 months ago
[Usage]: Bad Request with multiple multimodal inputs when using vision LLM.
github.com/vllm-project/vllm - mru4913 opened this issue about 2 months ago
[Bugfix] Fix import error in Phi-3.5-MoE
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Bug]: flakey test found in #7874
github.com/vllm-project/vllm - noooop opened this issue about 2 months ago
[Core] Optimize Async + Multi-step
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Frontend][VLM] Add support for multiple multi-modal items in the OpenAI frontend
github.com/vllm-project/vllm - ywang96 opened this pull request about 2 months ago
[not-for-review] test PR
github.com/vllm-project/vllm - khluu opened this pull request about 2 months ago
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding
github.com/vllm-project/vllm - tjohnson31415 opened this pull request about 2 months ago
[WIP, Kernel] (3/N) Machete W4A8
github.com/vllm-project/vllm - LucasWilkinson opened this pull request about 2 months ago
[Bug]: Inconsistent generation with guided_json, speculative decoding and temp > 0.0
github.com/vllm-project/vllm - ccdv-ai opened this issue about 2 months ago
[CI/Build] Use uv in the Dockerfile
github.com/vllm-project/vllm - mgoin opened this pull request about 2 months ago
[CI/Build][Kernel] Update CUTLASS to 3.5.1 tag
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request about 2 months ago
[Installation]: Issues with installing vLLM on ROCM without sudo access
github.com/vllm-project/vllm - prajwal1210 opened this issue about 2 months ago
[cleanup] remove engine-use-ray
github.com/vllm-project/vllm - simon-mo opened this pull request about 2 months ago
[Performance]: Sampler is too slow?
github.com/vllm-project/vllm - niuzheng168 opened this issue about 2 months ago
[Kernel] Change interface to Mamba selective_state_update for continuous batching
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request about 2 months ago
RayServe TPU example
github.com/vllm-project/vllm - richardsliu opened this pull request about 2 months ago
[Bugfix] Fix ModelScope models in v0.5.5
github.com/vllm-project/vllm - NickLucche opened this pull request about 2 months ago
[Feature]: Contribute T5 model to vLLM
github.com/vllm-project/vllm - shivance opened this issue about 2 months ago
[TPU][Bugfix] Fix tpu type api
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Bugfix] Fix import error in Exaone model
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Misc]: Question about Serving with Server API
github.com/vllm-project/vllm - JoanFM opened this issue about 2 months ago
[Kernel] Enable 8-bit weights in Fused Marlin MoE
github.com/vllm-project/vllm - ElizaWszola opened this pull request about 2 months ago
[Usage]: How can I determine the maximum number of concurrent requests?
github.com/vllm-project/vllm - zhangyan1986 opened this issue about 2 months ago