Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective - Host: opensource - https://opencollective.com/vllm - Code: https://github.com/vllm-project/vllm
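For context on the project the items below come from, a minimal offline-inference sketch using vLLM's Python API is shown here; the model name and sampling settings are illustrative assumptions, not taken from any of the issues listed.

```python
# Minimal vLLM usage sketch (assumed example model and parameters).
from vllm import LLM, SamplingParams

# Load any Hugging Face-hosted causal LM; "facebook/opt-125m" is just a small placeholder.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["What is vLLM?"], params)
for out in outputs:
    print(out.outputs[0].text)
```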

openai completions api <echo=True> raises Error

github.com/vllm-project/vllm - seoyunYang opened this issue almost 1 year ago
Add Splitwise implementation to vLLM

github.com/vllm-project/vllm - aashaka opened this pull request about 1 year ago
Multi GPU ROCm6 issues, and workarounds

github.com/vllm-project/vllm - BKitor opened this issue about 1 year ago
model continue conversation

github.com/vllm-project/vllm - andrey-genpracc opened this issue about 1 year ago
Add fused top-K softmax kernel for MoE

github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 year ago
GPTQ & AWQ Fused MOE

github.com/vllm-project/vllm - chu-tianxiang opened this pull request about 1 year ago
[Minor] More fix of test_cache.py CI test failure

github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request about 1 year ago
How to increase the vLLM scheduler prompt limit?

github.com/vllm-project/vllm - hanswang1 opened this issue about 1 year ago
Fix/async chat serving

github.com/vllm-project/vllm - schoennenbeck opened this pull request about 1 year ago
KV Cache usage is 0% for mistral model

github.com/vllm-project/vllm - nikhilshandilya opened this issue about 1 year ago
Ray worker out of memory

github.com/vllm-project/vllm - tristan279 opened this issue about 1 year ago
IndexError when using Beam Search in Chat Completions

github.com/vllm-project/vllm - jamestwhedbee opened this issue about 1 year ago
Dockerfile: build-arg to punica kernel

github.com/vllm-project/vllm - AguirreNicolas opened this pull request about 1 year ago
[RFC] Automatic Prefix Caching

github.com/vllm-project/vllm - zhuohan123 opened this issue about 1 year ago
Speculative Decoding

github.com/vllm-project/vllm - ymwangg opened this pull request about 1 year ago
Combine multi-LoRA and quantization

github.com/vllm-project/vllm - Yard1 opened this issue about 1 year ago
RuntimeError on ROCm

github.com/vllm-project/vllm - rlrs opened this issue about 1 year ago
Longer stop sequence not working in streaming mode

github.com/vllm-project/vllm - andrePankraz opened this issue about 1 year ago
Allow passing hf config args with openai server

github.com/vllm-project/vllm - Aakash-kaushik opened this issue about 1 year ago
Aborted request without reason

github.com/vllm-project/vllm - erjieyong opened this issue about 1 year ago
Support JSON mode.

github.com/vllm-project/vllm - MiyazonoKaori opened this issue about 1 year ago
Add JSON format logging support

github.com/vllm-project/vllm - CatherineSue opened this pull request about 1 year ago
Can anyone get Qwen-14B-Chat-AWQ to work with vLLM/TP?

github.com/vllm-project/vllm - s-natsubori opened this issue about 1 year ago
Compute perplexity/logits for the prompt

github.com/vllm-project/vllm - dsmilkov opened this issue about 1 year ago
OutOfMemoryError

github.com/vllm-project/vllm - Hobrus opened this issue about 1 year ago
vLLM on OpenShift/Kubernetes Manifests

github.com/vllm-project/vllm - WinsonSou opened this issue about 1 year ago
out of memory with mixtral AWQ

github.com/vllm-project/vllm - m0wer opened this issue about 1 year ago
Docs: Add Haystack integration details

github.com/vllm-project/vllm - bilgeyucel opened this pull request about 1 year ago
Could we support Fuyu-8B, a multimodal LLM?

github.com/vllm-project/vllm - leiwen83 opened this issue about 1 year ago
[WIP] Speculative decoding using a draft model

github.com/vllm-project/vllm - cadedaniel opened this pull request about 1 year ago
Use LRU cache for CUDA Graphs

github.com/vllm-project/vllm - WoosukKwon opened this issue about 1 year ago
torch.cuda.OutOfMemoryError: CUDA out of memory

github.com/vllm-project/vllm - DenisStefanAndrei opened this issue about 1 year ago
How to specify a particular GPU for vLLM inference

github.com/vllm-project/vllm - SiqinLv opened this issue about 1 year ago
Inquiry Regarding vLLM Support for Mac Metal API

github.com/vllm-project/vllm - yihong1120 opened this issue about 1 year ago
Implement Triton-based AWQ kernel

github.com/vllm-project/vllm - WoosukKwon opened this pull request about 1 year ago
Support VLM model and GPT4V API

github.com/vllm-project/vllm - xunfeng1980 opened this issue about 1 year ago
vLLM RayWorker process hangs when using the LLM engine

github.com/vllm-project/vllm - SuoSiFire opened this issue about 1 year ago
[FEATURE REQUEST] SparQ Attention

github.com/vllm-project/vllm - AlpinDale opened this issue about 1 year ago
Why is online serving slower than offline serving?

github.com/vllm-project/vllm - BangDaeng opened this issue about 1 year ago
I want to add mamba_chat (2.8b) model

github.com/vllm-project/vllm - SafeyahShemali opened this issue about 1 year ago
How to fix incomplete answers?

github.com/vllm-project/vllm - LuciAkirami opened this issue about 1 year ago
Can it support macOS (M2 chip)?

github.com/vllm-project/vllm - znsoftm opened this issue about 1 year ago
01-ai/Yi-34B-Chat never stops

github.com/vllm-project/vllm - pseudotensor opened this issue about 1 year ago
ModuleNotFoundError: No module named "vllm._C"

github.com/vllm-project/vllm - Kawai1Ace opened this issue about 1 year ago
Please help me solve the problem. thanks

github.com/vllm-project/vllm - CP3666 opened this issue about 1 year ago
Proposal: force type hint check with mypy

github.com/vllm-project/vllm - wangkuiyi opened this issue about 1 year ago
pip install -e . failed

github.com/vllm-project/vllm - dachengai opened this issue about 1 year ago
How to use logits_processors

github.com/vllm-project/vllm - shuaiwang2022 opened this issue about 1 year ago
NCCL error

github.com/vllm-project/vllm - maxmelichov opened this issue about 1 year ago
ImportError: libcudart.so.12

github.com/vllm-project/vllm - tranhoangnguyen03 opened this issue about 1 year ago
Avoid re-initialize parallel groups

github.com/vllm-project/vllm - wangruohui opened this pull request about 1 year ago
[Feature] SYCL kernel support for Intel GPU

github.com/vllm-project/vllm - abhilash1910 opened this pull request about 1 year ago
API Server Performance

github.com/vllm-project/vllm - simon-mo opened this issue about 1 year ago
Usage of vLLM for extracting embeddings

github.com/vllm-project/vllm - ra-MANUJ-an opened this issue about 1 year ago
Revert 1 docker build

github.com/vllm-project/vllm - wasertech opened this pull request about 1 year ago
Prompt caching

github.com/vllm-project/vllm - AIApprentice101 opened this issue about 1 year ago