Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm - Host: opensource
Code: https://github.com/vllm-project/vllm
[Doc] Update quantization supported hardware table
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Kernel][Misc] dynamo support for ScalarType
github.com/vllm-project/vllm - bnellnm opened this pull request 2 months ago
[Bug]: vllm 0.5.4 with enable_chunked_prefill =True, throughput is slightly lower than 0.5.3~0.5.0. Caused by #6867
github.com/vllm-project/vllm - noooop opened this issue 2 months ago
[Kernel] register punica functions as torch ops
github.com/vllm-project/vllm - bnellnm opened this pull request 2 months ago
[Bug]: Phi-3-small-128k-instruct on 4 T4 GPUs - Memory error: Tried to allocate 1024.00 GiB
github.com/vllm-project/vllm - jgen1 opened this issue 2 months ago
[Bug]: p2p check in custom all reduce not working
github.com/vllm-project/vllm - cjackal opened this issue 2 months ago
[Feature]: Benchmark script with speculative decode metrics
github.com/vllm-project/vllm - cermeng opened this issue 2 months ago
[Kernel][LoRA] Add assertion for punica sgmv kernels
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Ray backend] Better error when pg topology is bad.
github.com/vllm-project/vllm - rkooo567 opened this pull request 2 months ago
[misc] use nvml to get consistent device name
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Bug]: Extremely high GPU memory consumption when running the server with vLLM Docker
github.com/vllm-project/vllm - rjwharry opened this issue 2 months ago
[Bug]: Error when running a quantized model
github.com/vllm-project/vllm - soulzzz opened this issue 2 months ago
[Bug]: fp8 performance is worse than fp16 when batch size is 1
github.com/vllm-project/vllm - kuangdao opened this issue 2 months ago
[Usage]: The swap_blocks function in the cache_kernels.cu file does not handle errors.
github.com/vllm-project/vllm - zeroorhero opened this issue 2 months ago
[Installation]: vllm install error in jetson agx orin
github.com/vllm-project/vllm - FanZhang91 opened this issue 2 months ago
[CI] Move quantization cpu offload tests out of fastcheck
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Core.aDAG] Temporarily turn off NCCL in aDAG tests
github.com/vllm-project/vllm - ruisearch42 opened this pull request 2 months ago
[ci/test] rearrange tests and make adag test soft fail
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[spec decode] [4/N] Move update_flash_attn_metadata to attn backend
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[Core] Use uvloop with zmq-decoupled front-end
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Prototype] Create and use custom NCCL group for aDAG
github.com/vllm-project/vllm - ruisearch42 opened this pull request 2 months ago
[Usage]: Extremely slow inference with Llama 3.1 70b Instruct
github.com/vllm-project/vllm - harsh244 opened this issue 2 months ago
[Bugfix][Harmless] Fix hardcoded float16 dtype for model_is_embedding
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Tests] Disable retries and use context manager for openai client
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Kernel] Use mutable_data_ptr or const_data_ptr instead of data_ptr
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 2 months ago
Varun/multi step chunked prefill
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 2 months ago
[Bugfix] neuron: enable tensor parallelism
github.com/vllm-project/vllm - omrishiv opened this pull request 2 months ago
[Bug]: Llama3.1 casting torch.bfloat16 to torch.float16
github.com/vllm-project/vllm - jgreer013 opened this issue 2 months ago
[Feature]: Automatic Prefix Caching and Truncating. Possibility for Context Shifting.
github.com/vllm-project/vllm - derpyhue opened this issue 2 months ago
[model] Support for Llava-Next-Video model
github.com/vllm-project/vllm - TKONIY opened this pull request 2 months ago
[Bug]: Pre-built Docker container crashes when run on CPU/MacOS
github.com/vllm-project/vllm - redevined opened this issue 2 months ago
[Misc]: gpu-memory-utilization and Memory-Usage (by nvidia-smi)
github.com/vllm-project/vllm - ChuanhongLi opened this issue 2 months ago
[Bug]: NCCL error: invalid usage (run with NCCL_DEBUG=WARN for details)
github.com/vllm-project/vllm - zhaotyer opened this issue 2 months ago
[Feature]: Inquiry about Multi-modal Support in VLLM for MiniCPM-V2.6
github.com/vllm-project/vllm - Dong148 opened this issue 2 months ago
[Bugfix] Add sharded_state to load format
github.com/vllm-project/vllm - tjandy98 opened this pull request 2 months ago
[Usage]: Access weight of model with tp=2
github.com/vllm-project/vllm - floatingbigcat opened this issue 2 months ago
[Performance]: Why does VLLM perform worse than TGI in Speculative decoding?
github.com/vllm-project/vllm - skylee-01 opened this issue 2 months ago
[Bug]: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - wangye360 opened this issue 2 months ago
[Usage]: vLLM streaming output stalls for 6-9 seconds; what causes this? The input is over 40,000 characters long
github.com/vllm-project/vllm - ZZhangxian opened this issue 2 months ago
register custom op for flash attn and use from torch.ops
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[VLM] Refactor `MultiModalConfig` initialization and profiling
github.com/vllm-project/vllm - ywang96 opened this pull request 2 months ago
[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups
github.com/vllm-project/vllm - comaniac opened this issue 2 months ago
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[CI/Build] custom build backend and dynamic build dependencies
github.com/vllm-project/vllm - dtrifiro opened this pull request 2 months ago
[Misc] Revert `compressed-tensors` code reuse
github.com/vllm-project/vllm - kylesayrs opened this pull request 2 months ago
[Core] Support tensor parallelism for GGUF quantization
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Feature]: Context Parallelism
github.com/vllm-project/vllm - huseinzol05 opened this issue 2 months ago
[Bug]: AutoAWQ marlin methods error
github.com/vllm-project/vllm - MichoChan opened this issue 2 months ago
[Hardware][CPU] Support AWQ for CPU backend
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 2 months ago
[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method
github.com/vllm-project/vllm - gongdao123 opened this pull request 2 months ago
[Bug]: Docker.xpu build failed
github.com/vllm-project/vllm - liuxingbin opened this issue 2 months ago
support tqdm in notebooks
github.com/vllm-project/vllm - fzyzcjy opened this pull request 2 months ago
[BUG] Fix crash on flashinfer backend with cudagraph disabled, when attention group_size is not in [1,2,4,8]
github.com/vllm-project/vllm - learninmou opened this pull request 2 months ago
[Speculative Decoding] Fixing hidden states handling in batch expansion
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 2 months ago
[ci] fix model tests
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Usage]: Can the embedding model be deployed using the openai interface?
github.com/vllm-project/vllm - LIUKAI0815 opened this issue 2 months ago
[Bug]: Error in how HiddenStates are handled for speculative decoding
github.com/vllm-project/vllm - abhigoyal1997 opened this issue 2 months ago
[Bugfix][Frontend] Disable embedding API for chat models
github.com/vllm-project/vllm - QwertyJack opened this pull request 2 months ago
[Model]: Does vLLM currently support Llama-3.1-405B-Instruct multimodal?
github.com/vllm-project/vllm - jasonstens opened this issue 2 months ago
[BUG] OpenAI server stalled after processing an embedding request while serving a chat model
github.com/vllm-project/vllm - QwertyJack opened this issue 2 months ago
[doc] update test script to include cudagraph
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[CI] Fix crashes of performance benchmark
github.com/vllm-project/vllm - KuntaiDu opened this pull request 2 months ago
[Bug]: pytest failure on kernels/test_cutlass.py test_cutlass_fp8_gemm
github.com/vllm-project/vllm - Juelianqvq opened this issue 2 months ago
[Installation]: Building Docker images fails: Failed to build mamba-ssm
github.com/vllm-project/vllm - chenchunhui97 opened this issue 2 months ago
[Bug]: Dockerfile build breaks locally
github.com/vllm-project/vllm - palash-fin opened this issue 2 months ago
add causal parameter for flash attention
github.com/vllm-project/vllm - WanXiaopei opened this pull request 2 months ago
[Usage]: Is there an option to reduce GPU memory usage?
github.com/vllm-project/vllm - garyyang85 opened this issue 2 months ago
[Bug]: DeepSeek-Coder-V2-Instruct-AWQ assert self.quant_method is not None
github.com/vllm-project/vllm - fengyang95 opened this issue 2 months ago
[Bugfix][Docs] Update list of mock imports
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Bug]: Unable to generate text tokens when add_eos_token = true for Mistral 7B instruct v0.1
github.com/vllm-project/vllm - yutsai84 opened this issue 2 months ago
[Bug]: Unable to generate output when add_eos_token = True for Mistral 7b instruct v0.1
github.com/vllm-project/vllm - yutsai84 opened this issue 2 months ago
[Feature]: need to be able to load tiny models with vllm for testing - PagedAttention forces large models
github.com/vllm-project/vllm - stas00 opened this issue 2 months ago
[misc][ci] fix cpu test with plugins
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Doc]: A massive number of docs are missing - just empty pages
github.com/vllm-project/vllm - stas00 opened this issue 2 months ago
[Bugfix][CI] Import ray under guard
github.com/vllm-project/vllm - WoosukKwon opened this pull request 2 months ago
[TPU] Make sure worker index aligns with node boundary
github.com/vllm-project/vllm - WoosukKwon opened this issue 2 months ago
[frontend] spawn engine process from api server process
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
Announce NVIDIA Meetup
github.com/vllm-project/vllm - simon-mo opened this pull request 2 months ago
[RFC]: Add a FastChat like routing server to provide a central endpoint for multiple models
github.com/vllm-project/vllm - fozziethebeat opened this issue 2 months ago
[Bug]: aqlm test failing on H100
github.com/vllm-project/vllm - bnellnm opened this issue 2 months ago
[Bug]: Support Falcon Mamba
github.com/vllm-project/vllm - hahmad2008 opened this issue 2 months ago
[AMD][CI/Build] Disambiguation of the function call for ROCm 6.2 headers compatibility
github.com/vllm-project/vllm - gshtras opened this pull request 2 months ago
[Frontend] Using sync llm engine in a separate process in openai server mode
github.com/vllm-project/vllm - gshtras opened this pull request 2 months ago
[CI/Build] Add text-only test for Qwen models
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request 2 months ago
[Bug]: A bug in the CUDA capabilities test when different GPUs are available
github.com/vllm-project/vllm - alllexx88 opened this issue 2 months ago
[Bug]: BlockManagerV2 allocate with Sliding Window Attention
github.com/vllm-project/vllm - sylviayangyy opened this issue 2 months ago
[Feature]: ROCm 6.2 support & FP8 Support
github.com/vllm-project/vllm - ferrybaltimore opened this issue 2 months ago
[Usage]: NCCL error when deploying vLLM in k8s with multiple GPUs
github.com/vllm-project/vllm - ZhaoGuoXin opened this issue 2 months ago
[Bug]: Gemma-2-2b-it load model hangs by vLLM==0.5.1 on Tesla T4 GPU
github.com/vllm-project/vllm - wlwqq opened this issue 2 months ago
[TPU] Support multi-host inference
github.com/vllm-project/vllm - WoosukKwon opened this pull request 2 months ago
[Feature]: Add OpenAI server prompt_logprobs support #6508
github.com/vllm-project/vllm - gnpinkert opened this pull request 2 months ago