Ecosyste.ms: Open Collective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Misc]: Implement CPU/GPU swapping in BlockManagerV2

cadedaniel opened this issue 9 months ago
[Hardware][AMD][Kernel] Adding custom kernel for vector query on ROCm

charlifu opened this pull request 9 months ago
[Bug]: ChatCompletion prompt_logprobs does not work

noamgat opened this issue 9 months ago
[RFC] Initial Support for CPUs

bigPYJ1151 opened this issue 9 months ago
[Kernel] Use flash-attn for decoding

skrider opened this pull request 9 months ago
[Misc] add the "download-dir" option to the latency/throughput benchmarks

AmadeusChan opened this pull request 9 months ago
[RFC] Initial Support for Cloud TPUs

WoosukKwon opened this issue 9 months ago
[BugFix] Fix Falcon tied embeddings

WoosukKwon opened this pull request 9 months ago
[Feature]: Offload Model Weights to CPU

chenqianfzh opened this issue 9 months ago
[New Model]: Phi-2 support for LoRA

andykhanna opened this issue 9 months ago
[CI] Create nightly images/wheels

simon-mo opened this issue 9 months ago
[Usage]: Model does not support LoRA but is listed in supported models

xiaobo-Chen opened this issue 9 months ago
[Feature]: Support Guided Decoding in `LLM` entrypoint

simon-mo opened this issue 9 months ago
[Bug]: When installing vllm via pip, some errors happen.

finylink opened this issue 9 months ago
[Usage]: How to run inference with multiple GPUs

ckj18 opened this issue 9 months ago
[Kernel] Full Tensor Parallelism for LoRA Layers

FurtherAI opened this pull request 9 months ago
[Feature]: Support distributing serving with KubeRay's autoscaler

TrafalgarZZZ opened this issue 9 months ago
[Bug]: vllm slows down after a long run

momomobinx opened this issue 9 months ago
[New Model]: Please support CogVLM

kietna1809 opened this issue 9 months ago
[Misc] Add attention sinks

felixzhu555 opened this pull request 9 months ago
[Bug]: Use of LoRARequest

meiru-cam opened this issue 9 months ago
[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server

njhill opened this pull request 9 months ago
[Core] Add generic typing to `LRUCache`

njhill opened this pull request 9 months ago
[Usage]: Set dtype for VLLM using YAML

telekoteko opened this issue 9 months ago
Dynamic Multi-LoRA Load / Delete Support

gauravkr2108 opened this pull request 9 months ago
[Feature]: Compute and log the serving FLOPs

zhuohan123 opened this issue 9 months ago
[Usage]: Why does increasing max-num-seqs use less memory?

TaChao opened this issue 9 months ago
[Frontend] [Core] feat: Add model loading using `tensorizer`

sangstar opened this pull request 9 months ago
[Frontend] Support complex message content for chat completions endpoint

fgreinacher opened this pull request 9 months ago
[Core] Multiprocessing executor for single-node multi-GPU deployment

njhill opened this pull request 9 months ago
baichuan/qwen/chatglm with LoRA adaptation [feature]

kexuedaishu opened this issue 9 months ago
[Bugfix] Fix beam search logits processor

maximzubkov opened this pull request 9 months ago
[Feature]: Control vectors

generalsvr opened this issue 9 months ago
[Core] Support thread-based async tokenizer pools

njhill opened this pull request 9 months ago
[Bug]: Bug in Guided Generation Logits Processor with `n>1`

maximzubkov opened this issue 9 months ago
[Frontend] Support adding a new LoRA module to a live server in OpenAI Entrypoints

AlphaINF opened this pull request 9 months ago
[Test] Add a randomized test for OpenAI API

dylanwhawk opened this issue 9 months ago
[Bug]: Incompatible version between torch and triton

mzz12 opened this issue 9 months ago
Does vllm support pytorch/xla?

dinghaodhd opened this issue 9 months ago
[Misc] add HOST_IP env var

youkaichao opened this pull request 9 months ago
Incremental output for LLM entrypoint

yhu422 opened this pull request 9 months ago
Unable to load LoRA fine-tuned LLM from HF (AssertionError)

oscar-martin opened this issue 9 months ago
[Feature] Implement FastV's Token Pruning

chenllliang opened this issue 9 months ago
Sampling is very slow, causing a CPU bottleneck

m-harmonic opened this issue 9 months ago
[TEST] Add a distributed test for async LLM engine.

zhuohan123 opened this issue 9 months ago
DeepSeek VL support

SinanAkkoyun opened this issue 9 months ago
inference with AWQ quantization

Kev1ntan opened this issue 9 months ago
Fixes #1556 double free

br3no opened this pull request 10 months ago
Bug when top_k is passed as a float outside the valid range

Drzhivago264 opened this issue 10 months ago
TCPStore is not available

Z-Diviner opened this issue 10 months ago
Is it possible to use vllm-0.3.3 with CUDA 11.8?

HSLUCKY opened this issue 10 months ago
add aya-101 model

ahkarami opened this issue 10 months ago
What's up with Pipeline Parallelism?

duanzhaol opened this issue 10 months ago
How to run the gemma-7b model with vllm 0.3.3 under CUDA 11.8?

adogwangwang opened this issue 10 months ago
AsyncEngineDeadError when LoRA loading fails

lifuhuang opened this issue 10 months ago
Multi-LoRA - Support for providing /load and /unload API

gauravkr2108 opened this issue 10 months ago
[feature on nm-vllm] Sparse Inference with weight only int8 quant

shiqingzhangCSU opened this issue 10 months ago
Question regarding GPU memory allocation

wx971025 opened this issue 10 months ago
Error compiling kernels

declark1 opened this issue 10 months ago
lm-evaluation-harness broken on master

pcmoritz opened this issue 10 months ago
v0.3.3 API server can't start up with Neuron SDK

qingyuan18 opened this issue 10 months ago
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU)

AdrianAbeyta opened this pull request 10 months ago
[FIX] Fix prefix test error on main

zhuohan123 opened this pull request 10 months ago
Mixtral 4x 4090 OOM

SinanAkkoyun opened this issue 10 months ago
Order of keys for guided JSON

ccdv-ai opened this issue 10 months ago
Regression in llama model inference due to #3005

Qubitium opened this issue 10 months ago
unload the model

osafaimal opened this issue 10 months ago
Install from source fails using the latest code

sleepwalker2017 opened this issue 10 months ago
[FIX] Make `flash_attn` optional

WoosukKwon opened this pull request 10 months ago
[Minor fix] Include flash_attn in docker image

tdoublep opened this pull request 10 months ago
Error when prompt_logprobs + enable_prefix_caching

bgyoon opened this issue 10 months ago
Can vLLM handle concurrent requests with FastAPI?

Strongorange opened this issue 10 months ago
OpenAI Tools / function calling v2

FlorianJoncour opened this pull request 10 months ago
Prefix Caching with FP8 KV cache support

chenxu2048 opened this pull request 10 months ago