Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[Bugfix]Fix evict v2 with long context length

puf147 opened this pull request 7 months ago
[CI] docfix

rkooo567 opened this pull request 7 months ago
[Doc] add debugging tips

youkaichao opened this pull request 7 months ago
[Core] Refactor Worker and ModelRunner to consolidate control plane communication

stephanie-wang opened this pull request 7 months ago
hidden-states from final (or middle layers)

janphilippfranken opened this issue 7 months ago
[Bug]:The vllm service takes two hours to start Because of NCCL

zhaotyer opened this issue 7 months ago
[Bug]: topk=1 and temperature=0 cause different output in vllm

rangehow opened this issue 7 months ago
[Doc][Typo] Fixing Missing Comma

ywang96 opened this pull request 7 months ago
[Bugfix] Add device assertion to TorchSDPA

bigPYJ1151 opened this pull request 7 months ago
[Kernel] Suppress mma.sp warning on CUDA 12.5 and later

tlrmchlsmth opened this pull request 7 months ago
[Speculative decoding] Initial spec decode docs

cadedaniel opened this pull request 7 months ago
[Core][Distributed] add shm broadcast

youkaichao opened this pull request 7 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py

c3-ali opened this pull request 7 months ago
[Bug]: EngineArgs missing value type for `lora_dtype`

c3-ali opened this issue 7 months ago
[Kernel] Vectorized FP8 quantize kernel

comaniac opened this pull request 7 months ago
[Bug]: Llama3 output limited to around 10 tokens

arifsaeed opened this issue 7 months ago
[ci] Fix Buildkite agent path

khluu opened this pull request 7 months ago
[Kernel] Factor out epilogues from cutlass kernels

tlrmchlsmth opened this pull request 7 months ago
[Kernel] Adding fused bias add to cutlass_scaled_mm_dq kernel

cyang49 opened this pull request 7 months ago
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable

WoosukKwon opened this pull request 7 months ago
[Doc] Add documentation for FP8 W8A8

mgoin opened this pull request 7 months ago
[Kernel] `w4a16` support for `compressed-tensors`

dsikka opened this pull request 7 months ago
Bump version to v0.5.0

simon-mo opened this pull request 7 months ago
[Docs] Add Docs on Limitations of VLM Support

ywang96 opened this pull request 7 months ago
[CI] Upgrade codespell version.

rkooo567 opened this pull request 7 months ago
[Hardware][Intel] OpenVINO vLLM backend

ilya-lavrenov opened this pull request 7 months ago
[RFC]: OpenVINO vLLM backend

ilya-lavrenov opened this issue 7 months ago
0.4.3 error CUDA error: an illegal memory access was encountered

maxin9966 opened this issue 7 months ago
[misc][typo] fix typo

youkaichao opened this pull request 7 months ago
[Core][Distributed] add same-node detection

youkaichao opened this pull request 7 months ago
[Misc] Various simplifications and typing fixes

njhill opened this pull request 7 months ago
[WIP][Core] Support tensor parallel division with remainder of attention heads

NadavShmayo opened this pull request 7 months ago
[Bug]: load nvidia/Llama3-ChatQA-1.5-8B model 15 min

JJplane opened this issue 7 months ago
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy

KuntaiDu opened this pull request 7 months ago
[Model] Add GLM-4v support

songxxzp opened this pull request 7 months ago
[Bugfix] Take the VRAM usage of prompt_logprobs into account

Conless opened this pull request 7 months ago
[Core][Distributed] merge two broadcast_tensor_dict

youkaichao opened this pull request 7 months ago
[Bug Fix] Fix the support check for FP8 CUTLASS

cli99 opened this pull request 7 months ago
[Bug]: TorchSDPAMetadata is out of date

Reichenbachian opened this issue 7 months ago
[Misc] Update to comply with the new `compressed-tensors` config

dsikka opened this pull request 7 months ago
[Bugfix][Core] fix broken state for recompute

youkaichao opened this pull request 7 months ago
[RFC]: Refactor MoE

robertgshaw2-neuralmagic opened this issue 7 months ago
[Misc] Remove unused cuda_utils.h in CPU backend

DamonFool opened this pull request 7 months ago
fix DbrxFusedNormAttention missing cache_config

Calvinnncy97 opened this pull request 7 months ago
[Usage]: Howto quiet the terminal 'Info' outputs in vllm

rohitnanda1443 opened this issue 7 months ago
[Bug]: non-deterministic Python gc order leads to flaky tests

youkaichao opened this issue 7 months ago
[Misc] Add args for selecting distributed executor to benchmarks

BKitor opened this pull request 7 months ago
[Misc][Utils] allow get_open_port to be called for multiple times

youkaichao opened this pull request 7 months ago
remove sort_keys=True in guided_decoding

DeyangKong opened this pull request 7 months ago
[Core] Fix sharing of stateful logits processors

maxdebayser opened this pull request 7 months ago
[Bug]: vLLM does not support virtual GPU

youkaichao opened this issue 7 months ago
[MISC] Upgrade dependency to PyTorch 2.3.1

comaniac opened this pull request 7 months ago
Sa 24 sparse

dsikka opened this pull request 7 months ago
[Doc] Add an automatic prefix caching section in vllm documentation

KuntaiDu opened this pull request 7 months ago
[AMD][ROCm][CI] unit tests fixes or skip

hongxiayang opened this pull request 7 months ago
[Usage]: Streaming Response from vLLM 0.4.2 -> 0.4.3

BiboyQG opened this issue 7 months ago
[New Model]: mistralai/Codestral-22B-v0.1

eduardozamudio opened this issue 7 months ago
[Installation]: Compiling VLLM for cpu only.

Zibri opened this issue 7 months ago
GLM-4-9B-Chat:

Geaming-CHN opened this issue 7 months ago
[Installation]: Building editable for vllm fails (pip install -e .)

felixzhu555 opened this issue 7 months ago
[Bug]: Cannot request more than 5 logprobs

coder109 opened this issue 7 months ago
Addition of lacked ignored_seq_groups in _schedule_chunked_prefill

JamesLim-sy opened this pull request 7 months ago
[Core][Distributed] add coordinator to reduce code duplication in tp and pp

youkaichao opened this pull request 7 months ago
[Hardware] Initial TPU integration

WoosukKwon opened this pull request 7 months ago
[Misc] Skip for logits_scale == 1.0

WoosukKwon opened this pull request 7 months ago
[Usage]: the docker image v0.4.3 cannot work

BUJIDAOVS opened this issue 7 months ago
[Misc] Missing error message for custom ops import

DamonFool opened this pull request 7 months ago
trigger_ci_cd

sergey-tinkoff opened this pull request 7 months ago
[Bug]: Regression in predictions in v0.4.3

hibukipanim opened this issue 7 months ago
[Model] Dynamic image size support for LLaVA-NeXT

DarkLight1337 opened this pull request 7 months ago
test

geeker-smallwhite opened this pull request 7 months ago
[Core] Dynamic image size support for VLMs

DarkLight1337 opened this pull request 7 months ago
[Kernel] Update Cutlass int8 kernel configs for SM80

varun-sundar-rabindranath opened this pull request 7 months ago
[Bug]: chatglm3 with lora adapter

Qingyuncookie opened this issue 7 months ago
[Misc] Fix docstring of get_attn_backend

WoosukKwon opened this pull request 7 months ago
[Bug]: a bug

lambda7xx opened this issue 7 months ago
[Usage]: How to load a model with less CPU memory

liulfy opened this issue 7 months ago