Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

Add support for guided decoding in the offline interface

kevinbu233 opened this pull request 8 months ago
[Feature]: Support HuggingFaceM4/idefics2-8b as vision model

pseudotensor opened this issue 8 months ago
[Misc] [CI]: AMD test flaky on main CI

cadedaniel opened this issue 8 months ago
[Model] Jamba support

mzusman opened this pull request 8 months ago
[CI/BUILD] enable intel queue for longer CPU tests

zhouyuan opened this pull request 8 months ago
[Bug]: VLLM's output is unstable when handling requests CONCURRENTLY.

zhengwei-gao opened this issue 8 months ago
[Frontend][Core] Update Outlines Integration from `FSM` to `Guide`

br3no opened this pull request 8 months ago
[Bug]: --engine-use-ray is broken. #4100

jdinalt opened this pull request 8 months ago
[Bugfix] Fix naive attention typos and make it run on navi3x

maleksan85 opened this pull request 8 months ago
[Bug]: guided_json bad output for llama2-13b

pseudotensor opened this issue 8 months ago
[Model] Adding support for MiniCPM-V

HwwwwwwwH opened this pull request 8 months ago
[FacebookAI/roberta-large]: vllm support for FacebookAI/roberta-large

pradeepdev-1995 opened this issue 8 months ago
[Bug]: vllm_C is missing.

Calvinnncy97 opened this issue 8 months ago
[Model] Add support for 360zhinao

garycaokai opened this pull request 8 months ago
[Bug]: RuntimeError: Unknown layout

zzlgreat opened this issue 8 months ago
[Usage]: Unable to load mistralai/Mixtral-8x7B-Instruct-v0.1

rohitnanda1443 opened this issue 8 months ago
Does vLLM support both CUDA 11.3 and PyTorch 1.12?

iclgg opened this issue 8 months ago
[Usage]: Problem when loading my trained model.

hummingbird2030 opened this issue 8 months ago
[Feature][Chunked prefill]: Make sliding window work

rkooo567 opened this issue 8 months ago
[Feature]: bitsandbytes support

orellavie1212 opened this issue 8 months ago
[Frontend] Refactor prompt processing

DarkLight1337 opened this pull request 8 months ago
[Bug]: start api server stuck

QianguoS opened this issue 8 months ago
[Model] [Kernel] Add 16, 32 kernel sizes in compilation

nbardy opened this pull request 8 months ago
[Installation]: Any plans on providing vLLM pre-compiled for ROCm?

satyamk7054 opened this issue 8 months ago
[Core] Support LoRA on quantized models

jeejeelee opened this pull request 9 months ago
[Installation]: VLLM is impossible to install.

GPaolo opened this issue 9 months ago
[Kernel] Fused MoE Config for Mixtral 8x22

ywang96 opened this pull request 9 months ago
[Usage]: flash_attn vs xformers

VeryVery opened this issue 9 months ago
[Bug]: Command R+ GPTQ bad output on ROCm

TNT3530 opened this issue 9 months ago
[Feature]: Tree attention about Speculative Decoding

yukavio opened this issue 9 months ago
[CI/Build] Reduce race condition in docker build

youkaichao opened this pull request 9 months ago
[Misc]: Does prefix caching work together with multi-LoRA?

sleepwalker2017 opened this issue 9 months ago
[Bug]: StableLM 12b head size incorrect

bjoernpl opened this issue 9 months ago
[Model] LoRA gptbigcode implementation

raywanb opened this pull request 9 months ago
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics

DearPlanet opened this pull request 9 months ago
[Bug]: leading space within content via OpenAI Compatible Server

bufferoverflow opened this issue 9 months ago
[Usage]: How to offload some layers to CPU?

cheney369 opened this issue 9 months ago
Is there a stable release of the Docker image?

huyang19881115 opened this issue 9 months ago
[Model] Initialize Fuyu-8B support

Isotr0py opened this pull request 9 months ago
[Kernel] PyTorch Labs Fused MoE Kernel Integration

robertgshaw2-neuralmagic opened this pull request 9 months ago
[Bug]: killed due to high memory usage

xiewf1990 opened this issue 9 months ago
[Bug]: Cannot load lora adapters in WSL 2

invokeinnovation opened this issue 9 months ago
[Doc/Feature]: Llava 1.5 in OpenAI compatible server

stikkireddy opened this issue 9 months ago
[Roadmap] vLLM Roadmap Q2 2024

simon-mo opened this issue 9 months ago
[Misc]: Can we remove `vllm/entrypoints/api_server.py`?

hmellor opened this issue 9 months ago
[Frontend] openAI entrypoint dynamic adapter load

DavidPeleg6 opened this pull request 9 months ago
[Bug]: Error happen in async_llm_engine when use multiple GPUs

for-just-we opened this issue 9 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2

Kaiyang-Chen opened this pull request 9 months ago
[Core] :loud_sound: Improve request logging truncation

joerunde opened this pull request 9 months ago
[Model] Cohere CommandR+

saurabhdash2512 opened this pull request 9 months ago
[Hardware][Intel GPU] Add initial Intel GPU (XPU) inference backend

jikunshang opened this pull request 9 months ago
[Bug]: Yi-34B does not stop generating during use.

cat2353050774 opened this issue 9 months ago
[Feature]: Make `outlines` dependency optional

saattrupdan opened this issue 9 months ago
[Feature]: Add OpenTelemetry distributed tracing

ronensc opened this issue 9 months ago
[Feature]: cuda12.2 support

s-natsubori opened this issue 9 months ago
Best server cmd for mistralai/Mistral-7B-v0.1

sshleifer opened this issue 9 months ago
[RFC] How do we test and support third-party models

youkaichao opened this issue 9 months ago
[Bug]: Qwen-14B-Chat-Int4 with guided_json error

xunfeng1980 opened this issue 9 months ago
[Bug]: n_inner must be divisible by the number of GPUs

aliozts opened this issue 9 months ago
[Core] Eliminate parallel worker per-step task scheduling overhead

njhill opened this pull request 9 months ago
[Usage]: Expected output when prompt_logprobs=1

thefirebanks opened this issue 9 months ago
[Bug]: CUDA error: invalid argument

qingjiaozyn opened this issue 9 months ago
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API

CatherineSue opened this pull request 9 months ago
[Model Loading] Speed up model loading with distributed loading

chestnut-Q opened this pull request 9 months ago
[RFC] Initial support for Intel GPU

jikunshang opened this issue 9 months ago
[Usage]: Can vLLM run offline, or does it require an internet connection?

juud79 opened this issue 9 months ago
[Feature]: An instruction/chat method for the offline LLM class.

simon-mo opened this issue 9 months ago
[Bug]: VLLM OOMing unpredictably on prediction

hillarysanders opened this issue 9 months ago
[Bug]: Custom all-reduce does not work.

esmeetu opened this issue 9 months ago
Enable mypy type checking

simon-mo opened this issue 9 months ago