Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Usage]: how should I do data parallelism using vLLM?

YuWang916 opened this issue 7 months ago
[Bugfix] Fix KV head calculation for MPT models when using GQA

bfontain opened this pull request 7 months ago
[CI/Build] Test buildkite monorepo plugin

dgoupil opened this pull request 7 months ago
[Core] Remove unnecessary copies in flash attn backend

Yard1 opened this pull request 7 months ago
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU

tlrmchlsmth opened this pull request 7 months ago
[Bug]: nsys cannot track the cuda kernel called by the process except rank 0

crazy-JiangDongHua opened this issue 7 months ago
[CI/Build] increase wheel size limit to 200 MB

youkaichao opened this pull request 7 months ago
[Misc] remove duplicate definition of `seq_lens_tensor` in model_runner.py

ita9naiwa opened this pull request 7 months ago
[Feature]:

double-vin opened this issue 7 months ago
[Usage]: extractive question answering using VLLM

suryavan11 opened this issue 7 months ago
[Doc] Use intersphinx and update entrypoints docs

DarkLight1337 opened this pull request 7 months ago
[New Model]: LLaVA-NeXT-Video support

AmazDeng opened this issue 7 months ago
[Bug]: The tail problem

ZixinxinWang opened this issue 7 months ago
Add gptq_marlin test to cover bug report #5088

alexm-neuralmagic opened this pull request 7 months ago
[Bugfix] Avoid Warnings in SparseML Activation Quantization

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bugfix] Automatically Detect SparseML models

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Misc] Simplify code and fix type annotations in `conftest.py`

DarkLight1337 opened this pull request 7 months ago
[Usage]: Multiple sampling params with OpenAI library

JH-lee95 opened this issue 7 months ago
[Kernel] Add `w4a16` support for `compressed_tensors` models

dsikka opened this pull request 7 months ago
[Kernel] Add support for block size 96 to the paged attention kernel.

bfontain opened this pull request 7 months ago
[Kernel] CUTLASS epilogue refactor and kernels with quantized outputs

tlrmchlsmth opened this pull request 7 months ago
[Bug]: Crash sometimes using LLM entrypoint and LoRA adapters

flexorRegev opened this issue 7 months ago
[CI/Build] Docker cleanup functionality for amd servers

okakarpa opened this pull request 7 months ago
[Bug]: vLLM embeddings example code doesn't work

orionw opened this issue 7 months ago
New CI template on AWS stack

khluu opened this pull request 7 months ago
[ibm-granite/granite-8b-code-instruct]: Empty responses on ibm-granite

eduardozamudio opened this issue 7 months ago
[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter

alexm-neuralmagic opened this pull request 7 months ago
[Misc]: Loading microsoft/Phi-3-medium-128k-instruct with vLLM

AkshataDM opened this issue 7 months ago
[Doc][Build] update after removing vllm-nccl

youkaichao opened this pull request 7 months ago
[Bug]: [WSL] no response when vllm.entrypoints.openai.api_server run

sung-ho-moon opened this issue 7 months ago
[Speculative Decoding] Enable arbitrary model inputs

abhigoyal1997 opened this pull request 7 months ago
[CI/Build] Simplify OpenAI server setup in tests

DarkLight1337 opened this pull request 7 months ago
[Core] Avoid the need to pass `None` values to `Sequence.inputs`

DarkLight1337 opened this pull request 7 months ago
[Misc] Add vLLM version getter to utils

DarkLight1337 opened this pull request 7 months ago
[Bugfix][CI/Build] Fix codespell failing to skip files in `git diff`

DarkLight1337 opened this pull request 7 months ago
[Bugfix][CI/Build] Fix test and improve code for `merge_async_iterators`

DarkLight1337 opened this pull request 7 months ago
[Bug]: Can't run vllm distributed inference with vLLM + Ray

linchen111 opened this issue 7 months ago
[Feature] vLLM CLI for serving and querying OpenAI compatible server

EthanqX opened this pull request 8 months ago
[Bug]: Gemma model fails with GPTQ marlin

arunpatala opened this issue 8 months ago
[Installation]: Error when importing LLM from vllm

manishkumar0709 opened this issue 8 months ago
[Bug]: The vllm is disconnected after running for some time

zxcdsa45687 opened this issue 8 months ago
[RFC]: OpenAI Triton-only backend

bringlein opened this issue 8 months ago
[Model] Support MAP-NEO model

xingweiqu opened this pull request 8 months ago
[Usage]: quantization option usage

Juelianqvq opened this issue 8 months ago
[Bug]: Build/Install Issues with pip install -e .

Msiavashi opened this issue 8 months ago
[Model] Add support for falcon-11B

Isotr0py opened this pull request 8 months ago
Heterogeneous Speculative Decoding (CPU + GPU)

jiqing-feng opened this pull request 8 months ago
[Model] Add Internlm2 LoRA support

Isotr0py opened this pull request 8 months ago
[Misc]: How to use guided decoding and regex as well?

debraj135 opened this issue 8 months ago
[Bug]: Loading model weights never finishes (infinite loading)

tjrlwjd1 opened this issue 8 months ago
[Usage]: No support for mistralai/Mistral-7B-Instruct-v0.3

yananchen1989 opened this issue 8 months ago
[Core] Allow AQLM on Pascal

sasha0552 opened this pull request 8 months ago
[Bug]: Cannot build cpu docker image

licryle opened this issue 8 months ago
[Feature]: multi-steps model_runner?

leiwen83 opened this issue 8 months ago
[Frontend] Add tokenize/detokenize endpoints

sasha0552 opened this pull request 8 months ago
[Bugfix] Adds outlines performance improvement

lynkz-matt-psaltis opened this pull request 8 months ago
Running vLLM on Ray cluster, logging stuck at loading

maherr13 opened this issue 8 months ago
[Feature]: Add num_requests_preempted metric

sathyanarays opened this issue 8 months ago
Chat method for offline llm

nunjunj opened this pull request 8 months ago
[Installation]:

Kastycupra opened this issue 8 months ago
Bump version to v0.4.3

simon-mo opened this pull request 8 months ago
[Misc] add logging level env var

youkaichao opened this pull request 8 months ago
[Misc] Make Serving Benchmark More User-friendly

ywang96 opened this pull request 8 months ago
ci draft

khluu opened this pull request 8 months ago
[Model] Enable FP8 QKV in MoE and refine kernel tuning script

comaniac opened this pull request 8 months ago
[Core] Change LoRA embedding sharding to support loading methods

Yard1 opened this pull request 8 months ago
[Kernel] Dynamic Per-Token Activation Quantization

dsikka opened this pull request 8 months ago
[Kernel][RFC] Refactor the punica kernel based on Triton

jeejeelee opened this pull request 8 months ago
[Bug]: vLLM fails to run with the latest NVIDIA driver 555.85

gaye746560359 opened this issue 8 months ago
[Misc]: LLM is responding with advertisement

Pocoyo7798 opened this issue 8 months ago
[FRONTEND] OpenAI `tools` support named functions

br3no opened this pull request 8 months ago
[Bugfix] logprobs is not compatible with the OpenAI spec #4795

Etelis opened this pull request 8 months ago
[BUGFIX] [FRONTEND] Correct chat logprobs

br3no opened this pull request 8 months ago
[Bugfix][Frontend] Cleanup "fix chat logprobs"

DarkLight1337 opened this pull request 8 months ago
[Bug]: Wrong results in LangChain integration

Warit314 opened this issue 8 months ago
[Bug]: Mistral 7b inst v0.3 fails to run

yaronr opened this issue 8 months ago