github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[Usage]: how should I do data parallelism using vLLM?

YuWang916 opened this issue 7 months ago

[Bugfix] Fix KV head calculation for MPT models when using GQA

bfontain opened this pull request 7 months ago

[CI/Build] Test buildkite monorepo plugin

dgoupil opened this pull request 7 months ago

[Frontend] token_ids are useless param sent to the logit_bias_logits_processor.

Etelis opened this pull request 7 months ago

[Core] Remove unnecessary copies in flash attn backend

Yard1 opened this pull request 7 months ago

[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU

tlrmchlsmth opened this pull request 7 months ago

[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)

alexm-neuralmagic opened this pull request 7 months ago

[Feature][Frontend]: Add support for `stream_options` in `ChatCompletionRequest`

Etelis opened this pull request 7 months ago

[Usage]: Do we have any tutorials for using vllm with tensorrt-LLM?

weiyunfei opened this issue 7 months ago

[Bug]: nsys cannot track the cuda kernel called by the process except rank 0

crazy-JiangDongHua opened this issue 7 months ago

[Speculative Decoding 1/2] Add typical acceptance sampling as one of the sampling techniques in the verifier

sroy745 opened this pull request 7 months ago

[CI/Build] increase wheel size limit to 200 MB

youkaichao opened this pull request 7 months ago

[Misc] remove duplicate definition of `seq_lens_tensor` in model_runner.py

ita9naiwa opened this pull request 7 months ago

[Feature]: How to Enable VLLM to Work with PreTrainedModel Objects in my MOE-LoRA? THX

zhaofangtao opened this issue 7 months ago

[Feature]:

double-vin opened this issue 7 months ago

[Usage]: extractive question answering using VLLM

suryavan11 opened this issue 7 months ago

[Doc] Use intersphinx and update entrypoints docs

DarkLight1337 opened this pull request 7 months ago

[New Model]: LLaVA-NeXT-Video support

AmazDeng opened this issue 7 months ago

[Bug]: The tail problem

ZixinxinWang opened this issue 7 months ago

Add gptq_marlin test to cover bug report https://github.com/vllm-project/vllm/issues/5088

alexm-neuralmagic opened this pull request 7 months ago

Add gptq_marlin test to cover bug report #5088

alexm-neuralmagic opened this pull request 7 months ago

[Bugfix] Avoid Warnings in SparseML Activation Quantization

robertgshaw2-neuralmagic opened this pull request 7 months ago

[Bugfix] Automatically Detect SparseML models

robertgshaw2-neuralmagic opened this pull request 7 months ago

[Misc] Simplify code and fix type annotations in `conftest.py`

DarkLight1337 opened this pull request 7 months ago

[Usage]: Multiple sampling params with OpenAI library

JH-lee95 opened this issue 7 months ago

[Kernel] Add `w4a16` support for `compressed_tensors` models

dsikka opened this pull request 7 months ago

[Kernel] Add support for block size 96 to the paged attention kernel.

bfontain opened this pull request 7 months ago

[Kernel] CUTLASS epilogue refactor and kernels with quantized outputs

tlrmchlsmth opened this pull request 7 months ago

[Bug]: Crash sometimes using LLM entrypoint and LoRA adapters

flexorRegev opened this issue 7 months ago

[CI/Build] Docker cleanup functionality for amd servers

okakarpa opened this pull request 7 months ago

[Bug]: vLLM embeddings example code doesn't work

orionw opened this issue 7 months ago

New CI template on AWS stack

khluu opened this pull request 7 months ago

[ibm-granite/granite-8b-code-instruct]: Empty responses on ibm-granite

eduardozamudio opened this issue 7 months ago

[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter

alexm-neuralmagic opened this pull request 7 months ago

[Misc]: Loading microsoft/Phi-3-medium-128k-instruct with vLLM

AkshataDM opened this issue 7 months ago

[Bug]: async engine failure when placing multi lora adapter under load

DavidPeleg6 opened this issue 7 months ago

[Bug]: can not clean up the memory usage after instantiating the LLM class.

c3-ali opened this issue 7 months ago

[Doc][Build] update after removing vllm-nccl

youkaichao opened this pull request 7 months ago

[Bug]: [WSL] no response when vllm.entrypoints.openai.api_server run

sung-ho-moon opened this issue 7 months ago

[Speculative Decoding] Enable arbitrary model inputs

abhigoyal1997 opened this pull request 7 months ago

[CI/Build] Simplify OpenAI server setup in tests

DarkLight1337 opened this pull request 7 months ago

[Core] Avoid the need to pass `None` values to `Sequence.inputs`

DarkLight1337 opened this pull request 7 months ago

[Misc] Add vLLM version getter to utils

DarkLight1337 opened this pull request 7 months ago

[Bugfix][CI/Build] Fix codespell failing to skip files in `git diff`

DarkLight1337 opened this pull request 7 months ago

[Bugfix][CI/Build] Fix test and improve code for `merge_async_iterators`

DarkLight1337 opened this pull request 7 months ago

[Bug]: Can't run vllm distributed inference with vLLM + Ray

linchen111 opened this issue 7 months ago

[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors.

macheng6 opened this issue 7 months ago

[Feature] vLLM CLI for serving and querying OpenAI compatible server

EthanqX opened this pull request 8 months ago

[Bug]: Gemma model fails with GPTQ marlin

arunpatala opened this issue 8 months ago

[Installation]: Error when importing LLM from vllm

manishkumar0709 opened this issue 8 months ago

[Bug]: The vllm is disconnected after running for some time

zxcdsa45687 opened this issue 8 months ago

[RFC]: OpenAI Triton-only backend

bringlein opened this issue 8 months ago

curl http://localhost:8000/generate {"detail":"Not Found"} [Usage] generate relu can not use

fishingcatgo opened this issue 8 months ago

[Model] Support MAP-NEO model

xingweiqu opened this pull request 8 months ago

[Usage]: quantization option usage

Juelianqvq opened this issue 8 months ago

[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint

youkaichao opened this pull request 8 months ago

[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label

KuntaiDu opened this pull request 8 months ago

[Bug]: Build/Install Issues with pip install -e .

Msiavashi opened this issue 8 months ago

[Model] Add support for falcon-11B

Isotr0py opened this pull request 8 months ago

[Misc] Add a test case for 'microsoft/Phi-3-small-8k-instruct', because special tokens can cause a crash

AllenDou opened this pull request 8 months ago

[Bug]: The VRAM usage of calculating log_probs is not considered in profile run

Conless opened this issue 8 months ago

[Feature]: Integration of transformers past_key_values into the vllm kvcache Function

ChaoZhou2023 opened this issue 8 months ago

Heterogeneous Speculative Decoding (CPU + GPU)

jiqing-feng opened this pull request 8 months ago

[Model] Add Internlm2 LoRA support

Isotr0py opened this pull request 8 months ago

[Misc]: How to use guided decoding and regex as well?

debraj135 opened this issue 8 months ago

[Bug]: Infinite loading when loading model weights

tjrlwjd1 opened this issue 8 months ago

[Usage]: no support for mistralai/Mistral-7B-Instruct-v0.3

yananchen1989 opened this issue 8 months ago

[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.

heungson opened this issue 8 months ago

[Core] Allow AQLM on Pascal

sasha0552 opened this pull request 8 months ago

[Bug]: Cannot build cpu docker image

licryle opened this issue 8 months ago

[Feature]: multi-steps model_runner?

leiwen83 opened this issue 8 months ago

[Frontend] Add tokenize/detokenize endpoints

sasha0552 opened this pull request 8 months ago

[Bugfix] Adds outlines performance improvement

lynkz-matt-psaltis opened this pull request 8 months ago

Running Vllm on ray cluster, logging stuck at loading

maherr13 opened this issue 8 months ago

[Feature]: Add num_requests_preempted metric

sathyanarays opened this issue 8 months ago

Chat method for offline llm

nunjunj opened this pull request 8 months ago

[Installation]:

Kastycupra opened this issue 8 months ago

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops

bnellnm opened this pull request 8 months ago

Bump version to v0.4.3

simon-mo opened this pull request 8 months ago

[Misc] add logging level env var

youkaichao opened this pull request 8 months ago

[Misc] Make Serving Benchmark More User-friendly

ywang96 opened this pull request 8 months ago

[Misc]: Understanding Batching Mechanism in Prefill and Decode Phases

Msiavashi opened this issue 8 months ago

[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes

achandrasekar opened this issue 8 months ago

ci draft

khluu opened this pull request 8 months ago

[Model] Enable FP8 QKV in MoE and refine kernel tuning script

comaniac opened this pull request 8 months ago

[Core] Change LoRA embedding sharding to support loading methods

Yard1 opened this pull request 8 months ago

[Kernel] Dynamic Per-Token Activation Quantization

dsikka opened this pull request 8 months ago

[Kernel][RFC] Refactor the punica kernel based on Triton

jeejeelee opened this pull request 8 months ago

[Bug]: vllm reports an error when running with the latest NVIDIA driver 555.85

gaye746560359 opened this issue 8 months ago

[CI/Build] CMakeLists: build all extensions' cmake targets at the same time

dtrifiro opened this pull request 8 months ago

[Misc]: LLM is responding with advertisement

Pocoyo7798 opened this issue 8 months ago

[FRONTEND] OpenAI `tools` support named functions

br3no opened this pull request 8 months ago

[Bugfix] logprobs is not compatible with the OpenAI spec #4795

Etelis opened this pull request 8 months ago

[Bug]: Command-R incorrect output contains `<EOS_TOKEN>` and seems to do text prediction rather than conversation

epignatelli opened this issue 8 months ago

[BUGFIX] [FRONTEND] Correct chat logprobs

br3no opened this pull request 8 months ago

[Usage]: I use llama3. I found that one token is 'Ġor' in tokenizer.get_vocab(). But when I use vllm server, I got ' or' in response.

fengshansi opened this issue 8 months ago

[Bugfix][Frontend] Cleanup "fix chat logprobs"

DarkLight1337 opened this pull request 8 months ago

[Kernel] Initial commit containing new Triton kernels for multi lora serving.

FurtherAI opened this pull request 8 months ago

[Bug]: Wrong results in LangChain integration

Warit314 opened this issue 8 months ago

[Bug]: Mistral 7b inst v0.3 fails to run

yaronr opened this issue 8 months ago