Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective - Host: opensource - https://opencollective.com/vllm - Code: https://github.com/vllm-project/vllm

GGUF support

github.com/vllm-project/vllm - viktor-ferenczi opened this issue about 1 year ago
Can the Qwen/Qwen-VL-Chat model work well?

github.com/vllm-project/vllm - wangschang opened this issue about 1 year ago
Is authentication supported?

github.com/vllm-project/vllm - mluogh opened this issue about 1 year ago
Loading Model through Multi-Node Ray Cluster Fails

github.com/vllm-project/vllm - VarunSreenivasan16 opened this issue about 1 year ago
Sagemaker support for inference

github.com/vllm-project/vllm - Tarun3679 opened this issue about 1 year ago
Support for RLHF (ILQL)-trained Models

github.com/vllm-project/vllm - ojus1 opened this issue about 1 year ago
vLLM full name

github.com/vllm-project/vllm - designInno opened this issue about 1 year ago
Stuck in Initializing an LLM engine

github.com/vllm-project/vllm - EvilCalf opened this issue about 1 year ago
Feature request: Support for embedding models

github.com/vllm-project/vllm - mantrakp04 opened this issue about 1 year ago
Testing the qwen-7b-chat model produces incorrect output

github.com/vllm-project/vllm - dachengai opened this issue about 1 year ago
How to deploy vLLM with quantization

github.com/vllm-project/vllm - xxm1668 opened this issue about 1 year ago
Issue with raylet error

github.com/vllm-project/vllm - ZihanWang314 opened this issue about 1 year ago
Installing with ROCM

github.com/vllm-project/vllm - baderex opened this issue about 1 year ago
Cannot get a simple example working with multi-GPU

github.com/vllm-project/vllm - brevity2021 opened this issue about 1 year ago
How to use multiple GPUs?

github.com/vllm-project/vllm - xxm1668 opened this issue about 1 year ago
Flash Attention V2

github.com/vllm-project/vllm - nivibilla opened this issue over 1 year ago
Faster model loading

github.com/vllm-project/vllm - imoneoi opened this issue over 1 year ago
+34% higher throughput?

github.com/vllm-project/vllm - naed90 opened this issue over 1 year ago
Support Multiple Models

github.com/vllm-project/vllm - aldrinc opened this issue over 1 year ago
Feature request: support ExLlama

github.com/vllm-project/vllm - alanxmay opened this issue over 1 year ago
8bit support

github.com/vllm-project/vllm - mymusise opened this issue over 1 year ago
Require a "Wrapper" feature

github.com/vllm-project/vllm - jeffchy opened this issue over 1 year ago
CTranslate2

github.com/vllm-project/vllm - Matthieu-Tinycoaching opened this issue over 1 year ago
Remove Ray from the dependencies

github.com/vllm-project/vllm - lanking520 opened this issue over 1 year ago
CUDA error: out of memory

github.com/vllm-project/vllm - SunixLiu opened this issue over 1 year ago
Can I directly obtain the logits here?

github.com/vllm-project/vllm - SparkJiao opened this issue over 1 year ago
Whisper support

github.com/vllm-project/vllm - gottlike opened this issue over 1 year ago
Build failure due to CUDA version mismatch

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Support custom models

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add docstrings to some modules and classes

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Minor code cleaning for SamplingParams

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Add CD to PyPI

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Enhance SamplingParams

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Implement presence and frequency penalties

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support top-k sampling

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Avoid sorting waiting queue & Minor code cleaning

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support string-based stopping conditions

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Rename variables and methods

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Log system stats

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Update example prompts in `simple_server.py`

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support various sampling parameters

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Make sure the system can run on T4 and V100

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Clean up the scheduler code

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add a system logger

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Use slow tokenizer for LLaMA

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Enhance model loader

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Refactor system architecture

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Use runtime profiling to replace manual memory analyzers

github.com/vllm-project/vllm - zhuohan123 opened this pull request over 1 year ago
Bug in LLaMA fast tokenizer

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
[Minor] Fix a dtype bug

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Specify python package dependencies in requirements.txt

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Clean up Megatron-LM code

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add license

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Implement client API

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add docstring

github.com/vllm-project/vllm - zhuohan123 opened this issue over 1 year ago
Use mypy

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Support FP32

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Dangerous floating point comparison

github.com/vllm-project/vllm - merrymercy opened this issue over 1 year ago
Replace FlashAttention with xformers

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Decrease the default size of swap space

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Fix a bug in attention kernel

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Add documents on how to add new models

github.com/vllm-project/vllm - zhuohan123 opened this issue over 1 year ago
Enhance model mapper

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Use dtype from model config & Add Dolly V2

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support BLOOM

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add support for GPT-2

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Profile memory usage

github.com/vllm-project/vllm - zhuohan123 opened this issue over 1 year ago
Use pytest for unit tests

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add dependencies in setup.py

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Support GPT-2

github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Support bfloat16 data type

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Refactor attention kernels

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
New weight loader without np copy

github.com/vllm-project/vllm - zhuohan123 opened this pull request over 1 year ago
Add an option to launch cacheflow without ray

github.com/vllm-project/vllm - zhuohan123 opened this pull request over 1 year ago
Add support for GPT-NeoX (Pythia)

github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Add plot scripts

github.com/vllm-project/vllm - Ying1123 opened this pull request over 1 year ago
Improve Weight Loading

github.com/vllm-project/vllm - zhuohan123 opened this issue over 1 year ago
Frontend Improvements

github.com/vllm-project/vllm - zhuohan123 opened this issue over 1 year ago