Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
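vLLM is driven primarily through its Python API. A minimal offline-inference sketch, assuming vllm is installed (the model name is only an example; substitute any checkpoint vLLM supports):

from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # example model, not a recommendation
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput carries the prompt and one or more generated completions.
    print(output.prompt, "->", output.outputs[0].text)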
No CUDA GPUs are available Error with vLLM in JupyterLab
github.com/vllm-project/vllm - SafeyahShemali opened this issue about 1 year ago
how to use chat function
github.com/vllm-project/vllm - zhangzai666 opened this issue about 1 year ago
chatglm3 vllm/vllm/model_executor/models/chatglm.py", line 53, in __init__ assert self.total_num_kv_heads % tp_size == 0 AssertionError
github.com/vllm-project/vllm - Changjy1997nb opened this issue about 1 year ago
Tensor parallelism on ray cluster
github.com/vllm-project/vllm - baojunliu opened this issue about 1 year ago
Adding support for switch-transformer / NLLB-MoE
github.com/vllm-project/vllm - yl3469 opened this issue about 1 year ago
[Bug] prompt_logprobs = 1 OOM problem
github.com/vllm-project/vllm - shunxing1234 opened this issue over 1 year ago
Error: When using OpenAI-Compatible Server, the server is available but cannot be accessed from the same terminal.
github.com/vllm-project/vllm - LuristheSun opened this issue over 1 year ago
Support W8A8 inference in vllm
github.com/vllm-project/vllm - AniZpZ opened this pull request over 1 year ago
Support int8 KVCache Quant in Vllm
github.com/vllm-project/vllm - AniZpZ opened this pull request over 1 year ago
Added logits processor API to sampling params
github.com/vllm-project/vllm - noamgat opened this pull request over 1 year ago
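The pull request above adds a logits-processor hook to SamplingParams. A rough sketch of how such a hook might be used, assuming each processor is a callable that receives the previously generated token ids plus the next-token logits and returns the modified logits (the banned token id below is purely illustrative):

from vllm import LLM, SamplingParams

BANNED_TOKEN_ID = 1234  # illustrative token id to suppress

def ban_token(token_ids, logits):
    # Force the banned token's logit to -inf so it can never be sampled.
    logits[BANNED_TOKEN_ID] = float("-inf")
    return logits

llm = LLM(model="facebook/opt-125m")  # example model
params = SamplingParams(temperature=0.7, logits_processors=[ban_token])
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)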
ImportError: cannot import name 'MistralConfig' from 'transformers'
github.com/vllm-project/vllm - peter-ch opened this issue over 1 year ago
Adding Locally Typical Sampling (i.e. typical_p in transformers and TGI)
github.com/vllm-project/vllm - seongminp opened this issue over 1 year ago
Does vllm support the Mac/Metal/MPS?
github.com/vllm-project/vllm - Phil-U-U opened this issue over 1 year ago
Using VLLM with a Tesla T4 on SageMaker Studio (ml.g4dn.xlarge instance)
github.com/vllm-project/vllm - paulovasconcellos-hotmart opened this issue over 1 year ago
[question] Does vllm support macos M1 or M2 chip?
github.com/vllm-project/vllm - acekingke opened this issue over 1 year ago
Could not build wheels for vllm, which is required to install pyproject.toml-based projects
github.com/vllm-project/vllm - ABooth01 opened this issue over 1 year ago
Make multi replicas to make a balancer.
github.com/vllm-project/vllm - linkedlist771 opened this issue over 1 year ago
How to deploy vllm model across multiple nodes in kubernetes?
github.com/vllm-project/vllm - Ryojikn opened this issue over 1 year ago
What is the max number prompts that the generate() method can take
github.com/vllm-project/vllm - hxue3 opened this issue over 1 year ago
Low VRAM batch processing mode
github.com/vllm-project/vllm - viktor-ferenczi opened this issue over 1 year ago
Feature request. Allow a few model instances in one GPU if they can feet in VRAM.
github.com/vllm-project/vllm - agrogov opened this issue over 1 year ago
feat: demonstrate using regex for suffix matching
github.com/vllm-project/vllm - wsxiaoys opened this pull request over 1 year ago
workaround of AWQ for Turing GPUs
github.com/vllm-project/vllm - twaka opened this pull request over 1 year ago
Generate nothing from VLLM output
github.com/vllm-project/vllm - FocusLiwen opened this issue over 1 year ago
[Discussion] Will vLLM consider using Speculative Sampling to accelerating LLM decoding?
github.com/vllm-project/vllm - gesanqiu opened this issue over 1 year ago
vLLM to add a locally trained model
github.com/vllm-project/vllm - atanikan opened this issue over 1 year ago
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly.
github.com/vllm-project/vllm - MUZAMMILPERVAIZ opened this issue over 1 year ago
AWQ: bfloat16 not supported? And `--dtype` arg doesn't allow specifying float16
github.com/vllm-project/vllm - TheBloke opened this issue over 1 year ago
Waiting sequence group should have only one prompt sequence.
github.com/vllm-project/vllm - Link-Li opened this issue over 1 year ago
Inconsistent results between HuggingFace Transformers and vllm
github.com/vllm-project/vllm - normster opened this issue over 1 year ago
How to deploy api server as https
github.com/vllm-project/vllm - yilihtien opened this issue over 1 year ago
vllm hangs when reinitializing ray
github.com/vllm-project/vllm - nelson-liu opened this issue over 1 year ago
How to use vllm to compute ppl score for input text?
github.com/vllm-project/vllm - yinochaos opened this issue over 1 year ago
AsyncEngineDeadError / RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - xingyaoww opened this issue over 1 year ago
It seems that SamplingParams doesnt support the bad_words_ids parameter when generating
github.com/vllm-project/vllm - mengban opened this issue over 1 year ago
can model Qwen/Qwen-VL-Chat work well?
github.com/vllm-project/vllm - wangschang opened this issue over 1 year ago
vllm reducing quality when loading local fine tuned Llama-2-13b-hf model
github.com/vllm-project/vllm - BerndHuber opened this issue over 1 year ago
I want to close kv cache. if i set gpu_memory_utilization is 0. Does it means that i close the kv cache?
github.com/vllm-project/vllm - amulil opened this issue over 1 year ago
Is there authentication supported?
github.com/vllm-project/vllm - mluogh opened this issue over 1 year ago
Loading Model through Multi-Node Ray Cluster Fails
github.com/vllm-project/vllm - VarunSreenivasan16 opened this issue over 1 year ago
SIGABRT - Fatal Python error: Aborted when running vllm on llama2-7b with --tensor-parallel-size 2
github.com/vllm-project/vllm - dhritiman opened this issue over 1 year ago
Sagemaker support for inference
github.com/vllm-project/vllm - Tarun3679 opened this issue over 1 year ago
start vllm.entrypoints.api_server model vicuna-13b-v1.3 error: Fatal Python error: Bus error
github.com/vllm-project/vllm - luefei opened this issue over 1 year ago
Support for RLHF (ILQL)-trained Models
github.com/vllm-project/vllm - ojus1 opened this issue over 1 year ago
Stream Tokens operation integration into LLM class (which uses LLMEngine behind the scenes)
github.com/vllm-project/vllm - orellavie1212 opened this issue over 1 year ago
pip installation error - ERROR: Failed building wheel for vllm
github.com/vllm-project/vllm - dxlong2000 opened this issue over 1 year ago
Stuck in Initializing an LLM engine
github.com/vllm-project/vllm - EvilCalf opened this issue over 1 year ago
Feature request: Support for embedding models
github.com/vllm-project/vllm - mantrakp04 opened this issue over 1 year ago
test qwen-7b-chat model and output incorrect
github.com/vllm-project/vllm - dachengai opened this issue over 1 year ago
What a fast tokenizer can be used for Baichuan-13b?
github.com/vllm-project/vllm - FURYFOR opened this issue over 1 year ago
Issue with raylet error
github.com/vllm-project/vllm - ZihanWang314 opened this issue over 1 year ago
Memory leak while using tensor_parallel_size>1
github.com/vllm-project/vllm - haiasd opened this issue over 1 year ago
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
github.com/vllm-project/vllm - jinfengfeng opened this issue over 1 year ago
Best effort support for all Hugging Face transformers models
github.com/vllm-project/vllm - dwyatte opened this issue over 1 year ago
ValueError: The number of GPUs per node is not divisible by the number of tensor parallelism.
github.com/vllm-project/vllm - beratcmn opened this issue over 1 year ago
Cannot get a simple example working with multi-GPU
github.com/vllm-project/vllm - brevity2021 opened this issue over 1 year ago
ModuleNotFoundError: No module named 'transformers_modules' with API serving using baichuan-7b
github.com/vllm-project/vllm - McCarrtney opened this issue over 1 year ago
vLLM stops all processing when CPU KV cache is used, has to be shut down and restarted.
github.com/vllm-project/vllm - TheBloke opened this issue over 1 year ago
LlaMA 2: Input prompt (2664 tokens) is too long and exceeds limit of 2048/2560
github.com/vllm-project/vllm - foamliu opened this issue over 1 year ago
[Feature Request] Support input embedding in `LLM.generate()`
github.com/vllm-project/vllm - KimmiShi opened this issue over 1 year ago
Decode error while inferencing a batch of prompts
github.com/vllm-project/vllm - SiriusNEO opened this issue over 1 year ago
Feature request:support ExLlama
github.com/vllm-project/vllm - alanxmay opened this issue over 1 year ago
Require a "Wrapper" feature
github.com/vllm-project/vllm - jeffchy opened this issue over 1 year ago
Remove Ray for the dependency
github.com/vllm-project/vllm - lanking520 opened this issue over 1 year ago
Adding support for encoder-decoder models, like T5 or BART
github.com/vllm-project/vllm - shermansiu opened this issue over 1 year ago
Can I directly obtain the logits here?
github.com/vllm-project/vllm - SparkJiao opened this issue over 1 year ago
Build failure due to CUDA version mismatch
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add docstrings to some modules and classes
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Minor code cleaning for SamplingParams
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Add performance comparison figures on A100, V100, T4
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Enhance SamplingParams
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Implement presence and frequency penalties
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support top-k sampling
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Avoid sorting waiting queue & Minor code cleaning
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support string-based stopping conditions
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Rename variables and methods
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago