Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

[New Model]: SparseLLM/prosparse-llama-2-7b

eljrte opened this issue about 2 months ago
[Frontend] Multi-Modality Support for Loading Local Image Files

chaunceyjiang opened this pull request about 2 months ago
[Bug]: awq marlin error for deepseek v2 lite

TechxGenus opened this issue about 2 months ago
[Hardware][CPU] Update torch 2.5

bigPYJ1151 opened this pull request about 2 months ago
[Bug]: vllm 0.6.3.post1 does not work with `response_format`

Quang-elec44 opened this issue about 2 months ago
[Bugfix] Respect modules_to_not_convert within awq_marlin

mgoin opened this pull request about 2 months ago
[Bugfix] Fix layer skip logic with bitsandbytes

mgoin opened this pull request about 2 months ago
[Bugfix]Using the correct type hints

gshtras opened this pull request about 2 months ago
add github action to build and push cpu-inference image

br3no opened this pull request about 2 months ago
fix point in path

youkaichao opened this pull request about 2 months ago
[Frontend] Add max_tokens prometheus metric

tomeras91 opened this pull request about 2 months ago
[V1] Complete v1 sample and prompt logprobs support

afeldman-nm opened this pull request about 2 months ago
[Usage]: Hidden States not working in Speculative Decode

ChiKaWa3077 opened this issue about 2 months ago
[Neuron] Skip model attention head compatibility check with tensor pa…

sssrijan-amazon opened this pull request about 2 months ago
[torch.compile] Adding torch compile annotations to some models

CRZbulabula opened this pull request about 2 months ago
[Bug]: Running on a single machine with multiple GPUs error

Wiselnn570 opened this issue about 2 months ago
[V1] Support VLMs with fine-grained scheduling

WoosukKwon opened this pull request about 2 months ago
[Bug]: LLaVA-v1.5-13B with OpenAI compatible API reported an error

ligeng0197 opened this issue about 2 months ago
[Doc] Update Qwen documentation

jeejeelee opened this pull request about 2 months ago
[New Model]: NV-Embed-v2

warlockedward opened this issue about 2 months ago
[torch.compile] rework test plans

youkaichao opened this pull request about 2 months ago
[Installation]: pynvml.NVMLError_InvalidArgument: Invalid Argument

jedi0605 opened this issue about 2 months ago
[Misc] Remove deprecated arg for cuda graph capture

ywang96 opened this pull request about 2 months ago
[Performance]: Qwen2-VL-7B AWQ model performance

zzf2grx opened this issue about 2 months ago
[Prototype][WIP] Prefix Cache Aware Scheduling for V0

rickyyx opened this pull request about 2 months ago
[Bugfix/Core] Remove assertion for Flashinfer k_scale and v_scale

pavanimajety opened this pull request about 2 months ago
[Frontend] Pythonic tool parser

mdepinet opened this pull request about 2 months ago
[V1] Multiprocessing Tensor Parallel Support for v1

tlrmchlsmth opened this pull request about 2 months ago
[Kernel] Initial Machete W4A8 support + Refactors

LucasWilkinson opened this pull request about 2 months ago
[help wanted]: add sliding window support for flashinfer

youkaichao opened this issue about 2 months ago
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices

wallashss opened this pull request about 2 months ago
[CI/Build] Adding a forced docker system prune to clean up space

Alexei-V-Ivanov-AMD opened this pull request about 2 months ago
[New Model]: BAAI/bge-m3

javiplav opened this issue about 2 months ago
[CI/Build] Add Model Tests for Qwen2-VL

alex-jw-brooks opened this pull request about 2 months ago
[Feature]: Online video support for VLMs

litianjian opened this issue about 2 months ago
[Bug]: api_server.py: error: unrecognized arguments: --task embedding

javiplav opened this issue about 2 months ago
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100

mzusman opened this pull request about 2 months ago
[Bug] params Type is not right?

cqray1990 opened this issue about 2 months ago
[Usage]: How to use GLM4v multi_modal_data to make Multi-turn dialogue

Jimmy-L99 opened this issue about 2 months ago
[Usage]: ValueError: Unexpected weight for Qwen2-VL GPTQ 4-bit custom model.

bhavyajoshi-mahindra opened this issue about 2 months ago
[Feature]: host wheel via pypi index?

youkaichao opened this issue about 2 months ago
[V1] `AsyncLLM` Implementation

robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Kernel][ROCm][AMD] fp8 moe configs for MI300X. Mixtral-8x(7B,22B) TP=1,2,4,8

divakar-amd opened this pull request about 2 months ago
[Model] Support quantization of Qwen2VisionTransformer for Qwen2-VL

mgoin opened this pull request about 2 months ago
[CI/Build] Add Model Tests for PixtralHF

mgoin opened this pull request about 2 months ago
[Feature]: Integrate Writing in the Margins inference pattern ($5,000 Bounty)

melisa-writer opened this issue about 2 months ago
[Bug]: Can't start `vllm serve microsoft/Phi-3-small-8k-instruct`

Yuto-24 opened this issue about 2 months ago
[Usage]: prefix caching support for multimodal models

mearcstapa-gqz opened this issue about 2 months ago
[Feature]: Is it supported Qwen2.5 tool_choice: auto?

deku0818 opened this issue about 2 months ago
[torch.compile] Simplify exception trace in compilation tests

CRZbulabula opened this pull request about 2 months ago
[Bug]: ModuleNotFoundError: No module named 'openai.types'

StevenTang1998 opened this issue about 2 months ago
[Hardware] using current_platform.seed_everything

wangshuai09 opened this pull request about 2 months ago
[Usage]: how to return logits

psh0628-eng opened this issue about 2 months ago
Prototyping `LLMEngineCore`

robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Frontend]: enable state callbacks for offline inference

sethkimmel3 opened this pull request about 2 months ago
[Misc] Refactor benchmark_throughput.py

lk-chen opened this pull request about 2 months ago
trigger

robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Bug]: [help wanted] MoE + TP + custom allreduce bug

youkaichao opened this issue about 2 months ago
[Doc] fix third-party model example

russellb opened this pull request about 2 months ago
[perf bench] H200 development

simon-mo opened this pull request about 2 months ago
[Model] Add Idefics3 support

jeejeelee opened this pull request about 2 months ago
[Model][Quantization] HQQ support through Marlin kernel expansion

ElizaWszola opened this pull request about 2 months ago
[Bug]: OSError: [Errno 98] Address already in use

ericxsun opened this issue about 2 months ago
[Model] add tool parser for openbmb/MiniCPM3-4B

Cppowboy opened this pull request about 2 months ago
[Frontend] Chat-based Embeddings API

DarkLight1337 opened this pull request about 2 months ago
[Bug]: Llama-3.2-11B-Vision-Instruct Inference Can't Stop

sudanl opened this issue about 2 months ago
[xpu] Use allreduce to replace gather can reduce extra cat

ys950902 opened this pull request about 2 months ago
[Model] Add support for H2OVL-Mississippi models

cooleel opened this pull request about 2 months ago
Bump actions/checkout from 4.2.1 to 4.2.2

dependabot[bot] opened this pull request about 2 months ago
Bump actions/setup-python from 5.2.0 to 5.3.0

dependabot[bot] opened this pull request about 2 months ago
[Usage]:

Slicknuts23 opened this issue about 2 months ago
[Feature]: OpenAI supply image path

SinanAkkoyun opened this issue about 2 months ago
[misc] use out argument for flash attention

youkaichao opened this pull request about 2 months ago
[Bug]: ValueError: At most 1 image(s) may be provided in one request.

eav-solution opened this issue about 2 months ago
[Bug]: offline inference with ray fails on multinode

gpucce opened this issue about 2 months ago
[Bug]: "Address already in use" for 1 minute after crash (since 0.6.2)

hibukipanim opened this issue about 2 months ago
[Usage]: MistralModel architecture not supported

dequeueing opened this issue about 2 months ago
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson

conroy-cheers opened this pull request about 2 months ago
Fix cache management in "Close inactive issues and PRs" actions workflow

hmellor opened this pull request about 2 months ago
[Bug]: Qwen2-VL incoherent output with OpenAI API

SinanAkkoyun opened this issue about 2 months ago
[Bug]: tensor parallelism multinode

gpucce opened this issue about 2 months ago
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode

llsj14 opened this pull request about 2 months ago
[Bug]: Jetson support regression

conroy-cheers opened this issue about 2 months ago