Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective - Host: opensource - https://opencollective.com/vllm - Code: https://github.com/vllm-project/vllm
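
For context on the project itself, vLLM is typically used through a simple offline-inference API built around an `LLM` class. The following is a minimal sketch, assuming the standard `LLM`/`SamplingParams` interface; the model name, prompts, and sampling settings are illustrative assumptions, not values taken from this page.

    # Minimal sketch of offline batch inference with vLLM.
    # Model name, prompts, and sampling settings are illustrative assumptions.
    from vllm import LLM, SamplingParams

    prompts = [
        "The capital of France is",
        "Summarize paged attention in one sentence:",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Load the model once; vLLM manages the KV cache and batching internally.
    llm = LLM(model="facebook/opt-125m")

    # generate() runs batched inference over all prompts in a single call.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.prompt, "->", output.outputs[0].text)

The same engine can also be run as an OpenAI-compatible HTTP server (e.g. `python -m vllm.entrypoints.openai.api_server --model <model>`), which is the serving mode several of the issues and pull requests listed below refer to.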

[FIX] Fix prefix test error on main

github.com/vllm-project/vllm - zhuohan123 opened this pull request 7 months ago

Mixtral 4x 4090 OOM

github.com/vllm-project/vllm - SinanAkkoyun opened this issue 7 months ago

Order of keys for guided JSON

github.com/vllm-project/vllm - ccdv-ai opened this issue 7 months ago

unload the model

github.com/vllm-project/vllm - osafaimal opened this issue 7 months ago

install from source failed using the latest code

github.com/vllm-project/vllm - sleepwalker2017 opened this issue 7 months ago

[FIX] Make `flash_attn` optional

github.com/vllm-project/vllm - WoosukKwon opened this pull request 7 months ago

[Minor fix] Include flash_attn in docker image

github.com/vllm-project/vllm - tdoublep opened this pull request 8 months ago

OpenAI Tools / function calling v2

github.com/vllm-project/vllm - FlorianJoncour opened this pull request 8 months ago

Prefix Caching with FP8 KV cache support

github.com/vllm-project/vllm - chenxu2048 opened this pull request 8 months ago

[WIP] Build FlashInfer

github.com/vllm-project/vllm - WoosukKwon opened this pull request 8 months ago

ExLlamaV2: exl2 support

github.com/vllm-project/vllm - pabl-o-ce opened this issue 8 months ago

Supporting embedding models

github.com/vllm-project/vllm - jc9123 opened this pull request 8 months ago

add doc about serving option on dstack

github.com/vllm-project/vllm - deep-diver opened this pull request 8 months ago

Merge Gemma into Llama

github.com/vllm-project/vllm - WoosukKwon opened this pull request 8 months ago

[Feature] Add vision language model support.

github.com/vllm-project/vllm - xwjiang2010 opened this pull request 8 months ago

Support of AMD consumer GPUs

github.com/vllm-project/vllm - arno4000 opened this issue 8 months ago

Unable to specify GPU usage in VLLM code

github.com/vllm-project/vllm - humza-sami opened this issue 8 months ago

Separate attention backends

github.com/vllm-project/vllm - WoosukKwon opened this pull request 8 months ago

AWQ Quantization Memory Usage

github.com/vllm-project/vllm - vcivan opened this issue 8 months ago

Multi-GPU Support Failures with AMD MI210

github.com/vllm-project/vllm - tom-papatheodore opened this issue 8 months ago

Fix empty output when temp is too low

github.com/vllm-project/vllm - CatherineSue opened this pull request 8 months ago

E5-mistral-7b-instruct embedding support

github.com/vllm-project/vllm - DavidPeleg6 opened this issue 8 months ago

Runtime exception [step must be nonzero]

github.com/vllm-project/vllm - DreamGenX opened this issue 8 months ago

vllm keeps hanging when using djl-deepspeed

github.com/vllm-project/vllm - ali-firstparty opened this issue 8 months ago

Allow model to be served under multiple names

github.com/vllm-project/vllm - hmellor opened this pull request 8 months ago

HQQ quantization support

github.com/vllm-project/vllm - max-wittig opened this issue 8 months ago

Missing prometheus metrics in `0.3.0`

github.com/vllm-project/vllm - SamComber opened this issue 8 months ago

Add LoRA support for Mixtral

github.com/vllm-project/vllm - tterrysun opened this pull request 8 months ago

Add guided decoding for OpenAI API server

github.com/vllm-project/vllm - felixzhu555 opened this pull request 8 months ago

Adds support for gunicorn multiprocess process

github.com/vllm-project/vllm - jalotra opened this pull request 8 months ago

Add Splitwise implementation to vLLM

github.com/vllm-project/vllm - aashaka opened this pull request 8 months ago

model continue conversation

github.com/vllm-project/vllm - andrey-genpracc opened this issue 9 months ago

Add fused top-K softmax kernel for MoE

github.com/vllm-project/vllm - WoosukKwon opened this pull request 9 months ago

GPTQ & AWQ Fused MOE

github.com/vllm-project/vllm - chu-tianxiang opened this pull request 9 months ago

[Minor] More fix of test_cache.py CI test failure

github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 9 months ago

Fix/async chat serving

github.com/vllm-project/vllm - schoennenbeck opened this pull request 9 months ago

KV Cache usage is 0% for mistral model

github.com/vllm-project/vllm - nikhilshandilya opened this issue 9 months ago

Ray worker out of memory

github.com/vllm-project/vllm - tristan279 opened this issue 9 months ago

Dockerfile: build-arg to punica kernel

github.com/vllm-project/vllm - AguirreNicolas opened this pull request 9 months ago

[RFC] Automatic Prefix Caching

github.com/vllm-project/vllm - zhuohan123 opened this issue 9 months ago

Speculative Decoding

github.com/vllm-project/vllm - ymwangg opened this pull request 9 months ago

RuntimeError on ROCm

github.com/vllm-project/vllm - rlrs opened this issue 9 months ago

Allow passing hf config args with openai server

github.com/vllm-project/vllm - Aakash-kaushik opened this issue 9 months ago