Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Doc] Specify async engine args in docs
DarkLight1337 opened this pull request about 2 months ago

[V1] Prototype Fully Async Detokenizer
robertgshaw2-neuralmagic opened this pull request about 2 months ago

[core] cudagraph output with tensor weak reference
youkaichao opened this pull request about 2 months ago

[Bug]: Incoherent Offline Inference Single Video with Qwen2-VL
hector-gr opened this issue about 2 months ago

[Performance]: How to Improve Performance Under Concurrency
ljwps opened this issue about 2 months ago

[Bugfix] Use temporary directory in registry
DarkLight1337 opened this pull request about 2 months ago

[Model] Add BNB quantization support for Mllama
Isotr0py opened this pull request about 2 months ago

[Misc] SpecDecodeWorker supports profiling
Abatom opened this pull request about 2 months ago

[torch.compile] rework compile control with piecewise cudagraph
youkaichao opened this pull request about 2 months ago

[Model] Add classification Task with Qwen2ForSequenceClassification
kakao-kevin-us opened this pull request about 2 months ago

[Usage]: Using a model for inference and embedding
micuentadecasa opened this issue about 2 months ago

CI TEST
maxdebayser opened this pull request about 2 months ago

[Model] Support math-shepherd-mistral-7b-prm model
Went-Liang opened this pull request about 2 months ago

[Model] Support GGUF models newly added in `transformers` 4.46.0
Isotr0py opened this pull request about 2 months ago

[Core] Support offloading KV cache to CPU
KuntaiDu opened this pull request about 2 months ago

[Build] skip renaming files for release wheels pipeline
simon-mo opened this pull request about 2 months ago

[Doc] Update FAQ links in spec_decode.rst
whyiug opened this pull request about 2 months ago

[V1] Move mm_input_mapper to a separate process
WoosukKwon opened this pull request about 2 months ago

[torch.compile] Adding torch compile annotations to some models
CRZbulabula opened this pull request about 2 months ago

[Bugfix] Fix edge cases for MistralTokenizer
tjohnson31415 opened this pull request 2 months ago

[Model][LoRA]LoRA support added for Qwen
jeejeelee opened this pull request 2 months ago

[CI/Build] improve python-only dev setup
dtrifiro opened this pull request 2 months ago

[Bug]: crash:RecursionError: maximum recursion depth exceeded
wciq1208 opened this issue 2 months ago

[Core] Make encoder-decoder inputs a nested structure to be more composable
DarkLight1337 opened this pull request 2 months ago

Linter test
maxdebayser opened this pull request 2 months ago

[Misc] Upgrade to pytorch 2.5
bnellnm opened this pull request 2 months ago

[Feature]: LoRA support for Qwen model
zhangfan-algo opened this issue 2 months ago

[Bugfix] use AF_INET6 instead of AF_INET for OpenAI Compatible Server
jxpxxzj opened this pull request 2 months ago

[Performance]: vllm Eagle performance is worse than expected
LiuXiaoxuanPKU opened this issue 2 months ago

[Bug]: MistralTokenizer Detokenization Issue
prashantgupta24 opened this issue 2 months ago

[Bugfix][Misc]: fix graph capture for decoder
yudian0504 opened this pull request 2 months ago

[Feature]: Support for Controlled Decoding
simonucl opened this issue 2 months ago

[Bugfix] Fix load config when using bools
madt2709 opened this pull request 2 months ago

[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger
heheda12345 opened this pull request 2 months ago

[Frontend] Support suffix in completions API (fill-in-the-middle)
njhill opened this pull request 2 months ago

Adds method to read the pooling types from model's files
flaviabeo opened this pull request 2 months ago

[Feature]: LoRA support for InternVLChatModel
AkshataABhat opened this issue 2 months ago

[Misc] Fix ImportError causing by triton
MengqingCao opened this pull request 2 months ago

【Frontend】Add sampler_priority and repetition_penalty_range
ZeroYuJie opened this pull request 2 months ago

[Feature]: Consider parallel_tool_calls parameter at the API level
lucasalvarezlacasa opened this issue 2 months ago

[Misc] Compute query_start_loc/seq_start_loc on CPU
zhengy001 opened this pull request 2 months ago

[Frontend] re-enable multi-modality input in the new beam search implementation
FerdinandZhong opened this pull request 2 months ago

Begin refactoring executor_base ABC
jberkhahn opened this pull request 2 months ago

Support Roberta embedding models
maxdebayser opened this pull request 2 months ago

[Performance][Kernel] Fused_moe Performance Improvement
charlifu opened this pull request 2 months ago

[New Model]: Support Zyphra/Zamba2-7B
mgoin opened this issue 2 months ago

[CI/Build] remove .github from .dockerignore
dtrifiro opened this pull request 2 months ago

[Neuron] [Bugfix] Fix neuron startup
xendo opened this pull request 2 months ago

[Bug]: Tensor Parallelism performs poorly
DanielViglione opened this issue 2 months ago

[CI/Build] VLM Test Consolidation
alex-jw-brooks opened this pull request 2 months ago

[CI][Misc] Add tests for python-only development
cermeng opened this pull request 2 months ago

[Bug]: cannot run model when TP>1 (already run debug file)
jli943 opened this issue 2 months ago

[Feature]: support for prompt cache
wiluen opened this issue 2 months ago

[Bug]: 400 Bad Request
ErykCh opened this issue 2 months ago

[Bug]: Qwen2-VL-72B Inference on Multiple-GPUs
bhupendra1324 opened this issue 2 months ago

[Misc]: Im trying to host my finetuned Llama -3-8b instruct in Vllm
preethiisenthil opened this issue 2 months ago

[Bug]: Error running Molmo on API in v0.6.3
Inforeon opened this issue 2 months ago

[Bug]: guided_json fails on pixtral when using OpenAI API
ktrapeznikov opened this issue 2 months ago

[Bugfix]: Make chat content text allow type content
vrdn-23 opened this pull request 2 months ago

[BugFix] Fix chat API continuous usage stats
njhill opened this pull request 2 months ago

[Bug]: llama3.2-11B-Vision-Instruct not working
warlockedward opened this issue 2 months ago

bugfix on draft_tp value
qibaoyuan opened this pull request 2 months ago

[Installation]: v0.6.3 pip install -e . error
tolry418 opened this issue 2 months ago

[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage
joerunde opened this pull request 2 months ago

[Bugfix] Update InternVL input mapper to support image embeds
hhzhang16 opened this pull request 2 months ago

[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel
WoosukKwon opened this pull request 2 months ago

pass ignore_eos parameter to all benchmark_serving calls
gracehonv opened this pull request 2 months ago

[Doc] Fix code formatting in spec_decode.rst
mgoin opened this pull request 2 months ago

[Docs] Remove PDF build from Readtehdocs
simon-mo opened this pull request 2 months ago

[Usage]: Obtaining success / error rate % metrics
yqlu opened this issue 2 months ago

[Frontend] Clarify model_type error messages
stevegrubb opened this pull request 2 months ago

[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support
bigPYJ1151 opened this pull request 2 months ago

[Bugfix] Clean up some cruft in mamba.py
tlrmchlsmth opened this pull request 2 months ago

[Bug]: LLAMA 3.2 11B Vision Instruct Model not Running in VLLM 0.6.2
saikatscalers opened this issue 2 months ago

[Installation]: Adding opentelemetry packages in container image
sanketsudake opened this issue 2 months ago

[Usage]: --cpu-offload-gb no use
Rane2021 opened this issue 2 months ago

[Hardware] [Intel GPU] Add multistep scheduler for xpu device
jikunshang opened this pull request 2 months ago