Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

Support W8A8 inference in vllm

AniZpZ opened this pull request about 1 year ago
Support int8 KVCache Quant in vLLM

AniZpZ opened this pull request about 1 year ago
Added logits processor API to sampling params

noamgat opened this pull request about 1 year ago
ImportError: cannot import name 'MistralConfig' from 'transformers'

peter-ch opened this issue about 1 year ago
Does vLLM support Mac/Metal/MPS?

Phil-U-U opened this issue over 1 year ago
Using VLLM with a Tesla T4 on SageMaker Studio (ml.g4dn.xlarge instance)

paulovasconcellos-hotmart opened this issue over 1 year ago
[question] Does vllm support macos M1 or M2 chip?

acekingke opened this issue over 1 year ago
Support multiple replicas behind a load balancer.

linkedlist771 opened this issue over 1 year ago
How to deploy vllm model across multiple nodes in kubernetes?

Ryojikn opened this issue over 1 year ago
[Error] 400 Bad Request

Tostino opened this issue over 1 year ago
Low VRAM batch processing mode

viktor-ferenczi opened this issue over 1 year ago
feat: demonstrate using regex for suffix matching

wsxiaoys opened this pull request over 1 year ago
Memory leak

SatoshiReport opened this issue over 1 year ago
StreamingLLM support?

nivibilla opened this issue over 1 year ago
workaround of AWQ for Turing GPUs

twaka opened this pull request over 1 year ago
Jetson AGX Orin

MrBrabus75 opened this issue over 1 year ago
Data parallel inference

kevinhu opened this issue over 1 year ago
Support Python 3.12

EwoutH opened this issue over 1 year ago
3 GPUs not supported?

ye7love7 opened this issue over 1 year ago
Generate nothing from VLLM output

FocusLiwen opened this issue over 1 year ago
vLLM to add a locally trained model

atanikan opened this issue over 1 year ago
vLLM Discord Server

zhuohan123 opened this issue over 1 year ago
Waiting sequence group should have only one prompt sequence.

Link-Li opened this issue over 1 year ago
Inconsistent results between HuggingFace Transformers and vllm

normster opened this issue over 1 year ago
How to deploy api server as https

yilihtien opened this issue over 1 year ago
vllm hangs when reinitializing ray

nelson-liu opened this issue over 1 year ago
How to use vllm to compute ppl score for input text?

yinochaos opened this issue over 1 year ago
GGUF support

viktor-ferenczi opened this issue over 1 year ago
Can the model Qwen/Qwen-VL-Chat work well?

wangschang opened this issue over 1 year ago
Is there authentication supported?

mluogh opened this issue over 1 year ago
Loading Model through Multi-Node Ray Cluster Fails

VarunSreenivasan16 opened this issue over 1 year ago
SageMaker support for inference

Tarun3679 opened this issue over 1 year ago
Support for RLHF (ILQL)-trained Models

ojus1 opened this issue over 1 year ago
vLLM full name

designInno opened this issue over 1 year ago
pip installation error - ERROR: Failed building wheel for vllm

dxlong2000 opened this issue over 1 year ago
Stuck in Initializing an LLM engine

EvilCalf opened this issue over 1 year ago
Feature request: Support for embedding models

mantrakp04 opened this issue over 1 year ago
Tested qwen-7b-chat model and the output is incorrect

dachengai opened this issue over 1 year ago
What fast tokenizer can be used for Baichuan-13b?

FURYFOR opened this issue over 1 year ago
How to deploy vLLM with quantization?

xxm1668 opened this issue over 1 year ago
Issue with raylet error

ZihanWang314 opened this issue over 1 year ago
Memory leak while using tensor_parallel_size>1

haiasd opened this issue over 1 year ago
Installing with ROCM

baderex opened this issue over 1 year ago
Best effort support for all Hugging Face transformers models

dwyatte opened this issue over 1 year ago
Cannot get a simple example working with multi-GPU

brevity2021 opened this issue over 1 year ago
How to use multiple GPUs?

xxm1668 opened this issue over 1 year ago
Flash Attention V2

nivibilla opened this issue over 1 year ago
Faster model loading

imoneoi opened this issue over 1 year ago
+34% higher throughput?

naed90 opened this issue over 1 year ago
[Feature Request] Support input embedding in `LLM.generate()`

KimmiShi opened this issue over 1 year ago
Decode error while inferencing a batch of prompts

SiriusNEO opened this issue over 1 year ago
Support Multiple Models

aldrinc opened this issue over 1 year ago
Feature request: support ExLlama

alanxmay opened this issue over 1 year ago
8bit support

mymusise opened this issue over 1 year ago
Require a "Wrapper" feature

jeffchy opened this issue over 1 year ago
CTranslate2

Matthieu-Tinycoaching opened this issue over 1 year ago
Remove Ray as a dependency

lanking520 opened this issue over 1 year ago
CUDA error: out of memory

SunixLiu opened this issue over 1 year ago
Adding support for encoder-decoder models, like T5 or BART

shermansiu opened this issue over 1 year ago
Can I directly obtain the logits here?

SparkJiao opened this issue over 1 year ago
Whisper support

gottlike opened this issue over 1 year ago
Build failure due to CUDA version mismatch

WoosukKwon opened this issue over 1 year ago
Support custom models

WoosukKwon opened this issue over 1 year ago
Add docstrings to some modules and classes

WoosukKwon opened this pull request over 1 year ago
Minor code cleaning for SamplingParams

WoosukKwon opened this pull request over 1 year ago
Add performance comparison figures on A100, V100, T4

WoosukKwon opened this issue over 1 year ago
Add CD to PyPI

WoosukKwon opened this issue over 1 year ago
Enhance SamplingParams

WoosukKwon opened this pull request over 1 year ago
Implement presence and frequency penalties

WoosukKwon opened this pull request over 1 year ago
Support top-k sampling

WoosukKwon opened this pull request over 1 year ago
Avoid sorting waiting queue & Minor code cleaning

WoosukKwon opened this pull request over 1 year ago
Support string-based stopping conditions

WoosukKwon opened this issue over 1 year ago
Rename variables and methods

WoosukKwon opened this pull request over 1 year ago
Log system stats

WoosukKwon opened this pull request over 1 year ago
Update example prompts in `simple_server.py`

WoosukKwon opened this pull request over 1 year ago
Support various sampling parameters

WoosukKwon opened this issue over 1 year ago
Make sure the system can run on T4 and V100

WoosukKwon opened this issue over 1 year ago
Clean up the scheduler code

WoosukKwon opened this issue over 1 year ago
Add a system logger

WoosukKwon opened this pull request over 1 year ago
Use slow tokenizer for LLaMA

WoosukKwon opened this pull request over 1 year ago
Enhance model loader

WoosukKwon opened this pull request over 1 year ago