Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[ Kernel ] AWQ Fused MoE

robertgshaw2-neuralmagic opened this pull request 7 months ago
[ci][build] fix commit id

youkaichao opened this pull request 7 months ago
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests

g-eoj opened this pull request 7 months ago
[doc][distributed] add suggestion for distributed inference

youkaichao opened this pull request 7 months ago
[ Misc ] Apply MoE Refactor to Qwen2 + Deepseekv2 To Support Fp8

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Feature]: Apply chat template through `LLM` class

robertgshaw2-neuralmagic opened this issue 7 months ago
[ Kernel ] AWQ Fused MoE

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bugfix][CI/Build] Fix testing for generated commit hash

mgoin opened this pull request 7 months ago
[Doc] Add documentations for nightly benchmarks

KuntaiDu opened this pull request 7 months ago
Updating LM Format Enforcer version to v10.3

noamgat opened this pull request 7 months ago
[ci][distributed] add pipeline parallel correctness test

youkaichao opened this pull request 7 months ago
[core][distributed] simplify code to support pipeline parallel

youkaichao opened this pull request 7 months ago
Remove unnecessary trailing period in spec_decode.rst

terrytangyuan opened this pull request 7 months ago
Report usage for beam search

simon-mo opened this pull request 7 months ago
[Model] Pipeline parallel support for Mixtral

binxuan opened this pull request 7 months ago
[Misc] Add deprecation warning for beam search

WoosukKwon opened this pull request 7 months ago
[Misc] Disambiguate quantized types via a new ScalarType

LucasWilkinson opened this pull request 7 months ago
[CI/Build] Cross python wheel

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Doc] xpu backend requires running setvars.sh

rscohn2 opened this pull request 7 months ago
[Bug]: Problem loading Gemma 2 27b-it

rdaiello opened this issue 7 months ago
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace

tlrmchlsmth opened this pull request 7 months ago
torch.compile based model optimizer

bnellnm opened this pull request 7 months ago
[Bug]: vLLM 0.5.1 tensor parallel 2 hang

Flynn-Zh opened this issue 7 months ago
[BUGFIX] Raise an error for no draft token case when draft_tp>1

wooyeonlee0 opened this pull request 7 months ago
[Feature]: Request for Ascend NPU support

xuedinge233 opened this issue 7 months ago
[ Misc ] More Cleanup of Marlin

robertgshaw2-neuralmagic opened this pull request 7 months ago
[ Misc ] Support Act Order in Compressed Tensors

robertgshaw2-neuralmagic opened this pull request 7 months ago
[BugFix] Fix the lm_head in gpt_bigcode in lora mode

maxdebayser opened this pull request 7 months ago
[ Misc ] Support Models With Bias in `compressed-tensors` integration

robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bugfix] Fix Ray Metrics API usage

Yard1 opened this pull request 7 months ago
[ Misc ] Remove separate bias add

robertgshaw2-neuralmagic opened this pull request 7 months ago
[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in vllm to get device count

hongxiayang opened this pull request 7 months ago
[Misc] Remove flashinfer warning, add flashinfer tests to CI

LiuXiaoxuanPKU opened this pull request 7 months ago
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub

adityagoel14 opened this pull request 7 months ago
[Bugfix] Fix usage stats logging exception warning with OpenVINO

helena-intel opened this pull request 7 months ago
[Feature]: FlashAttention 3 support

orellavie1212 opened this issue 7 months ago
[doc] update pipeline parallel in readme

youkaichao opened this pull request 7 months ago
[distributed][misc] keep consistent with how pytorch finds libcudart.so

youkaichao opened this pull request 7 months ago
[BugFix] BatchResponseData body should be optional

zifeitong opened this pull request 7 months ago
[Kernel] Fix identical branches

stevegrubb opened this pull request 7 months ago
[Model][Phi3-Small] Remove scipy from blocksparse_attention

mgoin opened this pull request 7 months ago
[Bug]: OpenAI batch file format pydantic validation error

ArsalShakil opened this issue 7 months ago
[Misc] add fixture to guided processor tests

kevinbu233 opened this pull request 7 months ago
[bug fix] Fix llava next feature size calculation.

xwjiang2010 opened this pull request 7 months ago
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step

alexm-neuralmagic opened this pull request 7 months ago
[Performance]: how to use NVIDIA Nsight Compute in Linux

chenglu66 opened this issue 7 months ago
fix cuda118 can't find libcudart.so error

zhaotyer opened this pull request 7 months ago
[Bug]: Unable to run phi-3-small in latest release

ssmi153 opened this issue 7 months ago
wip

thri5ha opened this pull request 7 months ago
[Bug]: The Gloo library cannot communicate between two machines

JKYtydt opened this issue 7 months ago
[Bug]: VLLM's output is unstable version==0.5.1

ffxmm opened this issue 7 months ago
[Model] RowParallelLinear: pass bias to quant_method.apply

tdoublep opened this pull request 7 months ago
[Feature]: Hybrid Attention

leo6022 opened this issue 7 months ago
[Bug]: VLLM 0.5.1 with LLaVA 1.6 exceptions

andrePankraz opened this issue 7 months ago
[Model]: Support for InternVL2

Weiyun1025 opened this issue 7 months ago
[Misc] refactor(config): clean up unused code

aniaan opened this pull request 7 months ago
[Core] offload model weights to CPU conditionally

chenqianfzh opened this pull request 7 months ago
[Core] Support Lora lineage and base model metadata management

Jeffwan opened this pull request 7 months ago
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor

WoosukKwon opened this pull request 7 months ago
[BUG FIX]fix compile error when building with torch2.1

maidabu opened this pull request 7 months ago
[Bug]: Gloo Connection reset by peer

thies1006 opened this issue 7 months ago
[Usage]: In phi3 vision maximum context length issue

tusharraskar opened this issue 7 months ago
[Feature]: Multi-Proposers support for speculative decoding.

ShangmingCai opened this issue 7 months ago
[Bug]: Vllm 0.5.1+cu118 timeout when init CustomAllreduce

zhaotyer opened this issue 7 months ago
Speculative decoding leads to zombie requests

naturomics opened this issue 7 months ago
[Model] Add support for 'gte-Qwen2' embedding models

Nickydusk opened this pull request 7 months ago
[ci] try to add multi-node tests

youkaichao opened this pull request 7 months ago
[CI/Build][TPU] Add TPU CI test

WoosukKwon opened this pull request 7 months ago
[core] Sampling controller interface

mmoskal opened this pull request 7 months ago
[Doc]: Latency vs Throughput Configurations

antferdom opened this issue 7 months ago
[BugFix] get_and_reset only when scheduler outputs are not empty

mzusman opened this pull request 7 months ago
[Bug]: Qwen2 Moe FP8 not supported on L40

TopIdiot opened this issue 7 months ago
No executable after building vllm from source with CPU support

parkesorgua opened this issue 7 months ago
[BugFix]: fix engine timeout due to request abort

pushan01 opened this pull request 7 months ago