Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Core] Centralize GPU Worker construction
njhill opened this pull request 8 months ago

[WIP][Hardware][Intel] support intel builds with intel c++
kannon92 opened this pull request 8 months ago

Add support for ReFT
RonanKMcGovern opened this issue 8 months ago

[Core] Pipeline Parallel Support
andoorve opened this pull request 8 months ago

[Doc]: Offline Inference Distributed Broken for TP
sam-h-bean opened this issue 8 months ago

[Hardware][Nvidia] Enable support for Pascal GPUs
jasonacox opened this pull request 8 months ago

[RFC]: environment variable management in vllm
youkaichao opened this issue 8 months ago

[kernel] fix sliding window in prefix prefill Triton kernel
mmoskal opened this pull request 8 months ago

[Bug]: Can not run openapi server with cpu backend
kannon92 opened this issue 8 months ago

[Frontend] add tok/s speed metric to llm class when using tqdm
MahmoudAshraf97 opened this pull request 8 months ago

[Bug]: TypeError in XFormersMetadata
skonto opened this issue 8 months ago

[Model]: Support for InternVL-Chat-V1-5
Iven2132 opened this issue 8 months ago

[Usage]: I doubt about the meaning of --enable-prefix-caching
chenchunhui97 opened this issue 8 months ago

[Model] Phi-3 4k sliding window temp. fix
caiom opened this pull request 8 months ago

[Speculative decoding] Support target-model logprobs
cadedaniel opened this pull request 8 months ago

[Bug]: Phi3 still not supported
andrew-vold opened this issue 8 months ago

✨ support local cache for models
prashantgupta24 opened this pull request 8 months ago

[Core] Add `multiproc_worker_utils` for multiprocessing-based workers
njhill opened this pull request 8 months ago

[Frontend] Add APIs for dynamic LoRA models load/unload
graceleeis opened this pull request 8 months ago

[Kernel] Use flashinfer for decoding
LiuXiaoxuanPKU opened this pull request 8 months ago

[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales
pcmoritz opened this pull request 8 months ago

[Bug]: Call to CUDA function failed - unknown error
roclark opened this issue 8 months ago

[Misc]: RuntimeError: Cannot find any model weights [vllm=0.4.0]
vishwa27yvs opened this issue 8 months ago

[Kernel] Support Fp8 Checkpoints (Dynamic + Static)
robertgshaw2-neuralmagic opened this pull request 8 months ago

[New Model]: launch error of Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
eigen2017 opened this issue 8 months ago

[Misc] Upgrade outlines to v0.0.41
psykhi opened this pull request 8 months ago

Add logger extra
olehviniarchyk opened this pull request 8 months ago

[Core] Consolidate prompt arguments to LLM engines
DarkLight1337 opened this pull request 8 months ago

[Kernel][Core][WIP] Tree attention and parallel decoding
yukavio opened this pull request 8 months ago

[Usage]: Flash Attention not working any more
Techinix opened this issue 8 months ago

[CI] check size of the wheels
simon-mo opened this pull request 8 months ago

[New Model]: Support Phi-3
alexkreidler opened this issue 8 months ago

Allow user to define whitespace pattern for outlines
robcaulk opened this pull request 8 months ago

[Feature]: batched parallel decoding
snyhlxde1 opened this issue 8 months ago

[Usage]: ValueError: Cannot find the config file for awq
grumpyp opened this issue 8 months ago

[New Model]: Llama 3 8B Instruct
K-Mistele opened this issue 8 months ago

[Speculative decoding] CUDA graph support
heeju-kim2 opened this pull request 8 months ago

[Hardware][Nvidia] Enable support for Pascal GPUs
cduk opened this pull request 8 months ago

[WIP] Infrastructure for encoder/decoder support
afeldman-nm opened this pull request 8 months ago

[Bug]: vllm stall on llama3-70b warmup with 0.4.1
piercefreeman opened this issue 8 months ago

[Bug]: CPU Inference vllm_ops not defined
bsu3338 opened this issue 8 months ago

add standalone_api_server
alex-k-cart opened this pull request 8 months ago

[CI/Build] AMD CI pipeline with extended set of tests.
Alexei-V-Ivanov-AMD opened this pull request 8 months ago

[Speculative decoding] Fix async executing
zxdvd opened this pull request 8 months ago

[Bug]: Ray memory leak
saattrupdan opened this issue 8 months ago

[Speculative decoding] Add ngram prompt lookup decoding
leiwen83 opened this pull request 8 months ago

[Bug]: NameError: name 'vllm_ops' is not defined
yananchen1989 opened this issue 8 months ago

[Model] Add moondream vision language model
vikhyat opened this pull request 8 months ago

[Bug]: NCCL locating mechanism in multi-user environment
ticoneva opened this issue 8 months ago

[Bugfix] Fix marlin kernel crash on H100
alexm-neuralmagic opened this pull request 8 months ago

[Speculative decoding] [Performance]: Re-enable bonus tokens
cadedaniel opened this issue 8 months ago

Performance Regression between v0.4.0 and v0.4.1
simon-mo opened this issue 8 months ago

[Usage]: Make request to LLAVA server.
premg16 opened this issue 8 months ago

[Usage]: How to use LoRARequest with AsyncLLMEngine?
Rares9999 opened this issue 8 months ago

[Frontend] Support GPT-4V Chat Completions API
DarkLight1337 opened this pull request 8 months ago

[Model] Initial support for LLaVA-NeXT
DarkLight1337 opened this pull request 8 months ago

[Core] Support image processor
DarkLight1337 opened this pull request 8 months ago

[Misc]: optimize eager mode host time
functionxu123 opened this pull request 8 months ago

[RFC]: Multi-modality Support Refactoring
ywang96 opened this issue 8 months ago

[New Model]: Please update docker to support llama3
HangLu123 opened this issue 8 months ago

Adding max queue time parameter
KrishnaM251 opened this pull request 8 months ago

[Usage]: Llama 3 8B Instruct Inference
aliozts opened this issue 8 months ago

Add `vllm serve` to wrap `vllm.entrypoints.openai.api_server`
simon-mo opened this pull request 8 months ago

[CI/Build] Further decouple HuggingFace implementation from ours during tests
DarkLight1337 opened this pull request 8 months ago

[BugFix] fix num_lookahead_slots missing in async executor
leiwen83 opened this pull request 8 months ago

[Misc]: How to access the KV cache directly?
BDHU opened this issue 8 months ago

[Feature]: AMD ROCm 6.1 Support
kannan-scalers-ai opened this issue 8 months ago

[Core] Enable prefix caching with block manager v2 enabled
leiwen83 opened this pull request 8 months ago

[Feature]: Phi2 LoRA support
zero-or-one opened this issue 8 months ago

[Misc]Add customized information for models
jeejeelee opened this pull request 8 months ago

[Bug]: Invalid Device Ordinal on ROCm
Bellk17 opened this issue 8 months ago