github.com/vllm-project/vllm issues | Ecosyste.ms: OpenCollective

[New Model]: SparseLLM/prosparse-llama-2-7b

eljrte opened this issue about 2 months ago

[Frontend] Multi-Modality Support for Loading Local Image Files

chaunceyjiang opened this pull request about 2 months ago

[Bug]: awq marlin error for deepseek v2 lite

TechxGenus opened this issue about 2 months ago

[Hardware][CPU] Update torch 2.5

bigPYJ1151 opened this pull request about 2 months ago

[Bug]: Mistral 'SentencePieceTokenizer' object has no attribute 'id_to_byte_piece'

liziniu opened this issue about 2 months ago

[Bug]: vllm 0.6.3.post1 does not work with `response_format`

Quang-elec44 opened this issue about 2 months ago

[Bugfix] Respect modules_to_not_convert within awq_marlin

mgoin opened this pull request about 2 months ago

[Bugfix] Fix layer skip logic with bitsandbytes

mgoin opened this pull request about 2 months ago

[Kernels] Add an inductor pass to rewrite and fuse collective communication ops with gemms

bnellnm opened this pull request about 2 months ago

[Bugfix]Using the correct type hints

gshtras opened this pull request about 2 months ago

add github action to build and push cpu-inference image

br3no opened this pull request about 2 months ago

[Kernels] Add an inductor pass to rewrite and fuse collective communication ops with gemms (WIP not for review)

bnellnm opened this pull request about 2 months ago

fix point in path

youkaichao opened this pull request about 2 months ago

[Frontend] Add max_tokens prometheus metric

tomeras91 opened this pull request about 2 months ago

[V1] Complete v1 sample and prompt logprobs support

afeldman-nm opened this pull request about 2 months ago

[Usage]: Hidden States not working in Speculative Decode

ChiKaWa3077 opened this issue about 2 months ago

[Bug]: seq_group_metadata.encoder_seq_data.get_len() AttributeError: 'NoneType' object has no attribute 'get_len'

bingwork opened this issue about 2 months ago

[Neuron] Skip model attention head compatibility check with tensor pa…

sssrijan-amazon opened this pull request about 2 months ago

[torch.compile] Adding torch compile annotations to some models

CRZbulabula opened this pull request about 2 months ago

[Bug]: Running on a single machine with multiple GPUs error

Wiselnn570 opened this issue about 2 months ago

[Bug]: Function calling with Qwen & Streaming ('NoneType' object has no attribute 'get')

githebs opened this issue about 2 months ago

[New Model]: Are there plans to support large speech-generation models such as GPT-SoVITS?

wyp19960713 opened this issue about 2 months ago

[Bug]: You are using a model of type qwen2_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.

cqray1990 opened this issue about 2 months ago

[V1] Support VLMs with fine-grained scheduling

WoosukKwon opened this pull request about 2 months ago

[Bug]: LLaVA-v1.5-13B with OpenAI compatible API reported an error

ligeng0197 opened this issue about 2 months ago

[Doc] Update Qwen documentation

jeejeelee opened this pull request about 2 months ago

[New Model]: NV-Embed-v2

warlockedward opened this issue about 2 months ago

[torch.compile] rework test plans

youkaichao opened this pull request about 2 months ago

[Installation]: pynvml.NVMLError_InvalidArgument: Invalid Argument

jedi0605 opened this issue about 2 months ago

[Misc] Remove deprecated arg for cuda graph capture

ywang96 opened this pull request about 2 months ago

[Performance]: Qwen2-VL-7B AWQ model performance

zzf2grx opened this issue about 2 months ago

[Prototype][WIP] Prefix Cache Aware Scheduling for V0

rickyyx opened this pull request about 2 months ago

[Bugfix/Core] Remove assertion for Flashinfer k_scale and v_scale

pavanimajety opened this pull request about 2 months ago

[Frontend] Pythonic tool parser

mdepinet opened this pull request about 2 months ago

[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support int8 SmoothQuant, symmetric case

rasmith opened this pull request about 2 months ago

[V1] Multiprocessing Tensor Parallel Support for v1

tlrmchlsmth opened this pull request about 2 months ago

[Kernel] Initial Machete W4A8 support + Refactors

LucasWilkinson opened this pull request about 2 months ago

[help wanted]: add sliding window support for flashinfer

youkaichao opened this issue about 2 months ago

[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices

wallashss opened this pull request about 2 months ago

[CI/Build] Adding a forced docker system prune to clean up space

Alexei-V-Ivanov-AMD opened this pull request about 2 months ago

[Bug]: could not broadcast input array from shape (944,) into shape (512,)

olegkhr opened this issue about 2 months ago

[New Model]: BAAI/bge-m3

javiplav opened this issue about 2 months ago

[CI/Build] Add Model Tests for Qwen2-VL

alex-jw-brooks opened this pull request about 2 months ago

[Feature]: Online video support for VLMs

litianjian opened this issue about 2 months ago

[Bug]: api_server.py: error: unrecognized arguments: --task embedding

javiplav opened this issue about 2 months ago

[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100

mzusman opened this pull request about 2 months ago

[Bug] params Type is not right?

cqray1990 opened this issue about 2 months ago

[Usage]: How to use GLM4v multi_modal_data to make Multi-turn dialogue

Jimmy-L99 opened this issue about 2 months ago

[Usage]: ValueError: Unexpected weight for Qwen2-VL GPTQ 4-bit custom model.

bhavyajoshi-mahindra opened this issue about 2 months ago

[Feature]: host wheel via pypi index?

youkaichao opened this issue about 2 months ago

> **Bug**: Interesting finding: The official pip package v0.6.3 is broken. However, installing `https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl` fixes this issue. (`vLLM API server version 0.6.3.post2.dev139+g622b7ab9`)

Wiselnn570 opened this issue about 2 months ago

[V1] `AsyncLLM` Implementation

robertgshaw2-neuralmagic opened this pull request about 2 months ago

[Bug]: The Qwen series models produce garbled output when generating long texts.

hongqing1986 opened this issue about 2 months ago

[Kernel][ROCm][AMD] fp8 moe configs for MI300X. Mixtral-8x(7B,22B) TP=1,2,4,8

divakar-amd opened this pull request about 2 months ago

[Model] Support quantization of Qwen2VisionTransformer for Qwen2-VL

mgoin opened this pull request about 2 months ago

[CI/Build] Add Model Tests for PixtralHF

mgoin opened this pull request about 2 months ago

[Feature]: Integrate Writing in the Margins inference pattern ($5,000 Bounty)

melisa-writer opened this issue about 2 months ago

[Bugfix][Quantization]Fix support for non quantized visual layers in otherwise quantized mllama model, including missing scaling factors

gshtras opened this pull request about 2 months ago

[Bug]: Can't start `vllm serve microsoft/Phi-3-small-8k-instruct`

Yuto-24 opened this issue about 2 months ago

[Usage]: prefix caching support for multimodal models

mearcstapa-gqz opened this issue about 2 months ago

[Feature]: Is it supported Qwen2.5 tool_choice: auto?

deku0818 opened this issue about 2 months ago

[Performance]: Sampler account for most of time comparing to prefill and decode

zhjunqin opened this issue about 2 months ago

[torch.compile] Simplify exception trace in compilation tests

CRZbulabula opened this pull request about 2 months ago

[Bug]: ModuleNotFoundError: No module named 'openai.types'

StevenTang1998 opened this issue about 2 months ago

[Hardware] using current_platform.seed_everything

wangshuai09 opened this pull request about 2 months ago

[Usage]: how to return logits

psh0628-eng opened this issue about 2 months ago

Prototyping `LLMEngineCore`

robertgshaw2-neuralmagic opened this pull request about 2 months ago

[Frontend]: enable state callbacks for offline inference

sethkimmel3 opened this pull request about 2 months ago

[Misc] Refactor benchmark_throughput.py

lk-chen opened this pull request about 2 months ago

trigger

robertgshaw2-neuralmagic opened this pull request about 2 months ago

[Bug]: [help wanted] MoE + TP + custom allreduce bug

youkaichao opened this issue about 2 months ago

[Doc] fix third-party model example

russellb opened this pull request about 2 months ago

[Bug]: Nonsense output for Qwen2.5 72B after upgrading to latest vllm 0.6.3.post1 [REPROs]

pseudotensor opened this issue about 2 months ago

[perf bench] H200 development

simon-mo opened this pull request about 2 months ago

[Model] Add Idefics3 support

jeejeelee opened this pull request about 2 months ago

[Model][Quantization] HQQ support through Marlin kernel expansion

ElizaWszola opened this pull request about 2 months ago

[Bug]: OSError: [Errno 98] Address already in use

ericxsun opened this issue about 2 months ago

[Model] add tool parser for openbmb/MiniCPM3-4B

Cppowboy opened this pull request about 2 months ago

[Feature]: Qwen2.5 model : ValueError: This model does not support the 'embedding' task. Supported tasks: {'generate'}

tarikbeijing opened this issue about 2 months ago

[Frontend] Chat-based Embeddings API

DarkLight1337 opened this pull request about 2 months ago

[Bug]: Llama-3.2-11B-Vision-Instruct Inference Can't Stop

sudanl opened this issue about 2 months ago

[xpu] Use allreduce to replace gather can reduce extra cat

ys950902 opened this pull request about 2 months ago

[Model] Add support for H2OVL-Mississippi models

cooleel opened this pull request about 2 months ago

Bump actions/checkout from 4.2.1 to 4.2.2

dependabot[bot] opened this pull request about 2 months ago

Bump actions/setup-python from 5.2.0 to 5.3.0

dependabot[bot] opened this pull request about 2 months ago

[Usage]:

Slicknuts23 opened this issue about 2 months ago

[Feature]: OpenAI supply image path

SinanAkkoyun opened this issue about 2 months ago

[misc] use out argument for flash attention

youkaichao opened this pull request about 2 months ago

[Bug]: ValueError: At most 1 image(s) may be provided in one request.

eav-solution opened this issue about 2 months ago

[Bug]: offline inference with ray fails on multinode

gpucce opened this issue about 2 months ago

[Bug]: "Address already in use" for 1 minute after crash (since 0.6.2)

hibukipanim opened this issue about 2 months ago

[Usage]: MistralModel architecture not supported

dequeueing opened this issue about 2 months ago

[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson

conroy-cheers opened this pull request about 2 months ago

Fix cache management in "Close inactive issues and PRs" actions workflow

hmellor opened this pull request about 2 months ago

[Usage]: It seems succeed to deploy llm. but I get errors when "curl http://127.0.0.1:8011/v1/models"

gaohang opened this issue about 2 months ago

[Bug]: Qwen2-VL incoherent output with OpenAI API

SinanAkkoyun opened this issue about 2 months ago

[Bug]: tensor parallelism multinode

gpucce opened this issue about 2 months ago

[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode

llsj14 opened this pull request about 2 months ago

[Bug]: Bfloat16 or Half are not compatible with HF float16/bfloat16 result.

jason9693 opened this issue about 2 months ago

[Bug]: Jetson support regression

conroy-cheers opened this issue about 2 months ago