Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[New Model]: Mistral-Nemo
Hambaobao opened this issue 6 months ago

[Usage]: Can't utilize all VRAM for context
vlsav opened this issue 6 months ago

[Performance]: GPU utilization is low when running large batches on H100
sleepwalker2017 opened this issue 6 months ago

[ Misc ] `fbgemm` checkpoints
robertgshaw2-neuralmagic opened this pull request 6 months ago

[Bug]: Cannot load fp8 model of internlm2-chat-7b offline
EstellaXinyuZhang opened this issue 6 months ago

[Core] Allow specifying custom Executor
Yard1 opened this pull request 6 months ago

[RFC]: Single Program Multiple Data (SPMD) Worker Control Plane
ruisearch42 opened this issue 6 months ago

[Bug]: vllm doesn't support multi-instance GPU
cfhammill opened this issue 6 months ago

[ci][test] add correctness test for cpu offloading
youkaichao opened this pull request 6 months ago

[Model] Support Mistral-Nemo
mgoin opened this pull request 6 months ago

[ Kernel ] Enable Dynamic Per Token `fp8`
robertgshaw2-neuralmagic opened this pull request 6 months ago

[CI/Build] bump ruff version, fix linting issues
dtrifiro opened this pull request 6 months ago

[Feature]: mistralai/Mistral-Nemo-Instruct-2407 support
bjoernpl opened this issue 6 months ago

[Usage]: How to release GPU of vLLM model in python code
quanshr opened this issue 6 months ago

[CI/Build] replace yapf with ruff
dtrifiro opened this pull request 6 months ago

[Misc] Consolidate and optimize logic for building padded tensors
DarkLight1337 opened this pull request 6 months ago

[Bug]: vllm turned off my pc (loading mixtral8x7b)
juanluis17 opened this issue 6 months ago

[Bug]: vllm not support fp8 kv cache when use flashinfer
kuangdao opened this issue 6 months ago

[Bugfix] Corrected Typographical Errors from "indicies" to "indices"
JHLEE17 opened this pull request 6 months ago

[Core] Reduce unnecessary compute when logprobs=None
peng1999 opened this pull request 6 months ago

[doc][distributed] add more doc for setting up multi-node environment
youkaichao opened this pull request 6 months ago

[Misc] Support FP8 kv cache scales from compressed-tensors
mgoin opened this pull request 6 months ago

added bitsandbytes dependency in common requirement.txt file
dipatidar opened this pull request 6 months ago

[Misc] Small perf improvements
Yard1 opened this pull request 6 months ago

[Model] Pipeline Parallel Support for DeepSeek v2
tjohnson31415 opened this pull request 6 months ago

[Model] Initialize support for InternVL2 series models
Isotr0py opened this pull request 6 months ago

FP8 Dynamic-Per-Token Quant
varun-sundar-rabindranath opened this pull request 6 months ago

[DOC] - Add docker image to Cerebrium Integration
milo157 opened this pull request 6 months ago

[Feature]: Add OpenAI server `prompt_logprobs` support
Theodotus1243 opened this issue 6 months ago

[TPU] Refactor TPU worker & model runner
WoosukKwon opened this pull request 6 months ago

[Misc] Use `torch.Tensor` for type annotation
WoosukKwon opened this pull request 6 months ago

[TPU] Remove multi-modal args in TPU backend
WoosukKwon opened this pull request 6 months ago

[New Model]: Support for Telechat
hzhaoy opened this issue 6 months ago

[Model] Add Support for GPTQ Fused MOE
izhuhaoran opened this pull request 6 months ago

deploying embedding model in same way as LLM
riyajatar37003 opened this issue 6 months ago

[core][model] yet another cpu offload implementation
youkaichao opened this pull request 6 months ago

[Bugfix] Fix for multinode crash on 4 PP
andoorve opened this pull request 6 months ago

[Bug]: The metrics have not improved.
zjjznw123 opened this issue 6 months ago

Sequence parallel
wbdr opened this pull request 6 months ago

[Not for review]test gemma lora
jeejeelee opened this pull request 6 months ago

[misc][distributed] add seed to dummy weights
youkaichao opened this pull request 6 months ago

[CI/Build] Update flashinfer to v0.0.9 (#6489)
170928 opened this pull request 6 months ago

[misc][distributed] improve tests
youkaichao opened this pull request 6 months ago

[ Kernel ] Fp8 Channelwise Weight Support
robertgshaw2-neuralmagic opened this pull request 6 months ago

[Bug]: No module named `jsonschema.protocols`.
eff-kay opened this issue 6 months ago

[Model] Support Mamba
tlrmchlsmth opened this pull request 6 months ago

[Not for review] Spmd tp rebase
ruisearch42 opened this pull request 6 months ago

[ROCm] Cleanup Dockerfile and remove outdated patch
hongxiayang opened this pull request 6 months ago

[New Model]: Codestral Mamba
K-Mistele opened this issue 6 months ago

[Bug]: Gemma 27B crashes on GCP A100
noamgat opened this issue 6 months ago

[Feature]: Pipeline parallelism support for qwen model
hiyforever opened this issue 6 months ago

[Usage]: PeftModelForCausalLM is not JSON serializable
jazzisfuture opened this issue 6 months ago

[Misc][Speculative decoding] Typos and typing fixes
ShangmingCai opened this pull request 7 months ago

[Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE)
weiminw opened this issue 7 months ago

unable to run vllm model deployment
riyajatar37003 opened this issue 7 months ago

[Bugfix][Frontend] Fix missing `/metrics` endpoint
DarkLight1337 opened this pull request 7 months ago

[Bug]: Can't load gemma-2-9b-it with vllm 0.5.2
vlsav opened this issue 7 months ago

[CI/Build] Remove "boardwalk" image asset
DarkLight1337 opened this pull request 7 months ago

[Misc] Log spec decode metrics
comaniac opened this pull request 7 months ago

[Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron
servient-ashwin opened this issue 7 months ago

[Model] H2O Danube3-4b
g-eoj opened this pull request 7 months ago

[Bug]: Seed issue with Pipeline Parallel
andoorve opened this issue 7 months ago

[Not for review] PP ADAG
ruisearch42 opened this pull request 7 months ago

[Core] Use numpy to speed up padded token processing
peng1999 opened this pull request 7 months ago

[Draft] proposal for ipex quant support
jikunshang opened this pull request 7 months ago

[doc][misc] doc update
youkaichao opened this pull request 7 months ago

[Doc] add env docs for flashinfer backend
DefTruth opened this pull request 7 months ago

[VLM] Minor space optimization for `ClipVisionModel`
ywang96 opened this pull request 7 months ago

Add FUNDING.yml
simon-mo opened this pull request 7 months ago

v0.5.2, v0.5.3, v0.6.0 Release Tracker
simon-mo opened this issue 7 months ago

bump version to v0.5.2
simon-mo opened this pull request 7 months ago

[Bug]: autogen can't work with vllm v0.5.1
tonyaw opened this issue 7 months ago

[Doc][CI/Build] Update docs and tests to use `vllm serve`
DarkLight1337 opened this pull request 7 months ago

[Bugfix] Convert image to RGB by default
DarkLight1337 opened this pull request 7 months ago

[Bug]: Paligemma support for PNG files
BabyChouSr opened this issue 7 months ago

[ CI ] 0.4.3.post1 Hotfix
robertgshaw2-neuralmagic opened this pull request 7 months ago

[Feature]: Return softmax of attention layer.
DouHappy opened this issue 7 months ago

[ Misc ] Enable Quantizing All Layers of DeekSeekv2
robertgshaw2-neuralmagic opened this pull request 7 months ago