Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Bug]: Qwen2-VL incoherent output with OpenAI API

SinanAkkoyun opened this issue 3 months ago
[Bug]: tensor parallelism multinode

gpucce opened this issue 3 months ago
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode

llsj14 opened this pull request 3 months ago
[Bug]: Jetson support regression

conroy-cheers opened this issue 3 months ago
[Doc] Specify async engine args in docs

DarkLight1337 opened this pull request 3 months ago
[V1] Prototype Fully Async Detokenizer

robertgshaw2-neuralmagic opened this pull request 3 months ago
[core] cudagraph output with tensor weak reference

youkaichao opened this pull request 3 months ago
[Bug]: Incoherent Offline Inference Single Video with Qwen2-VL

hector-gr opened this issue 3 months ago
[Bugfix] Use temporary directory in registry

DarkLight1337 opened this pull request 3 months ago
[Model] Add BNB quantization support for Mllama

Isotr0py opened this pull request 3 months ago
[Misc] SpecDecodeWorker supports profiling

Abatom opened this pull request 3 months ago
[torch.compile] rework compile control with piecewise cudagraph

youkaichao opened this pull request 3 months ago
[Model] Add classification Task with Qwen2ForSequenceClassification

kakao-kevin-us opened this pull request 3 months ago
[Usage]: Using a model for inference and embedding

micuentadecasa opened this issue 3 months ago
CI TEST

maxdebayser opened this pull request 3 months ago
[Model] Support math-shepherd-mistral-7b-prm model

Went-Liang opened this pull request 3 months ago
[Model] Support GGUF models newly added in `transformers` 4.46.0

Isotr0py opened this pull request 3 months ago
[Core] Support offloading KV cache to CPU

KuntaiDu opened this pull request 3 months ago
[Build] skip renaming files for release wheels pipeline

simon-mo opened this pull request 3 months ago
[Doc] Update FAQ links in spec_decode.rst

whyiug opened this pull request 3 months ago
[Usage]: Pass multiple LoRA modules through YAML config

andreapairon opened this issue 3 months ago
[Feature]: support SageAttention

LSC527 opened this issue 3 months ago
[Performance]: Low GPU utilization - is it normal?

fzyzcjy opened this issue 3 months ago
[V1] Move mm_input_mapper to a separate process

WoosukKwon opened this pull request 3 months ago
[Bug]: pipeline parallel performance issue for 1 sample.

littletomatodonkey opened this issue 3 months ago
[torch.compile] Adding torch compile annotations to some models

CRZbulabula opened this pull request 3 months ago
[Usage]: Multimodal content with benchmark_serving.py

khayamgondal opened this issue 3 months ago
[Bugfix] Fix edge cases for MistralTokenizer

tjohnson31415 opened this pull request 3 months ago
[Model][LoRA]LoRA support added for Qwen

jeejeelee opened this pull request 3 months ago
[CI/Build] improve python-only dev setup

dtrifiro opened this pull request 3 months ago
[Bug]: crash:RecursionError: maximum recursion depth exceeded

wciq1208 opened this issue 3 months ago
[New Model]: stepfun-ai/GOT-OCR2_0

akhileshsharma99 opened this issue 3 months ago
[Core] Make encoder-decoder inputs a nested structure to be more composable

DarkLight1337 opened this pull request 3 months ago
Linter test

maxdebayser opened this pull request 3 months ago
[Misc] Upgrade to pytorch 2.5

bnellnm opened this pull request 3 months ago
[Feature]: LoRA support for Qwen model

zhangfan-algo opened this issue 3 months ago
[Bugfix] use AF_INET6 instead of AF_INET for OpenAI Compatible Server

jxpxxzj opened this pull request 3 months ago
[Feature]: Support for 1.58-bit models.

RealMrCactus opened this issue 3 months ago
[Performance]: vllm Eagle performance is worse than expected

LiuXiaoxuanPKU opened this issue 3 months ago
[Bug]: benchmark serving does not support --best_of>1

homeffjy opened this issue 3 months ago
[Bug]: MistralTokenizer Detokenization Issue

prashantgupta24 opened this issue 3 months ago
[Usage]: Custom LLM Generate

Blaizzy opened this issue 3 months ago
[Bugfix][Misc]: fix graph capture for decoder

yudian0504 opened this pull request 3 months ago
[New Model]: bert-base-chinese

kangzemin opened this issue 3 months ago
[Feature]: Support for Controlled Decoding

simonucl opened this issue 3 months ago
[Performance]: bitsandbytes quantization slow

lance0108 opened this issue 3 months ago
[Feature]: EAGLE fp8 quantization

fengyang95 opened this issue 3 months ago
[Bugfix] Fix load config when using bools

madt2709 opened this pull request 3 months ago
[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger

heheda12345 opened this pull request 3 months ago
[Frontend] Support suffix in completions API (fill-in-the-middle)

njhill opened this pull request 3 months ago
Adds method to read the pooling types from model's files

flaviabeo opened this pull request 3 months ago
[Feature]: LoRA support for InternVLChatModel

AkshataABhat opened this issue 3 months ago
[Misc] Fix ImportError caused by triton

MengqingCao opened this pull request 3 months ago
[Usage]: When to use flashinfer as the default backend

ehuaa opened this issue 3 months ago
【Frontend】Add sampler_priority and repetition_penalty_range

ZeroYuJie opened this pull request 3 months ago
[Feature]: Alternating local-global attention layers

griff4692 opened this issue 3 months ago
[Feature]: Consider parallel_tool_calls parameter at the API level

lucasalvarezlacasa opened this issue 3 months ago
[Misc]: offline inference inconsistency result of qwen2-7b

poppybrown opened this issue 3 months ago
[Bug]: vllm startup model error /proc file not found

970602 opened this issue 3 months ago
[Misc] Compute query_start_loc/seq_start_loc on CPU

zhengy001 opened this pull request 3 months ago
[Frontend] re-enable multi-modality input in the new beam search implementation

FerdinandZhong opened this pull request 3 months ago
[Bug]: Speculative decoding breaks guided decoding.

roberthoenig opened this issue 3 months ago
Begin refactoring executor_base ABC

jberkhahn opened this pull request 4 months ago
Support Roberta embedding models

maxdebayser opened this pull request 4 months ago
[Performance][Kernel] Fused_moe Performance Improvement

charlifu opened this pull request 4 months ago
[New Model]: Support Zyphra/Zamba2-7B

mgoin opened this issue 4 months ago
[CI/Build] remove .github from .dockerignore

dtrifiro opened this pull request 4 months ago
[Neuron] [Bugfix] Fix neuron startup

xendo opened this pull request 4 months ago