Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (host: opensource)
Code: https://github.com/vllm-project/vllm
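Many of the items below concern vLLM's OpenAI-compatible serving endpoints. As a minimal, hedged sketch (the port and model id are placeholder assumptions, not taken from this feed; the server would be started separately, e.g. with vLLM's OpenAI API server entrypoint), a chat-completion request body looks like:

```python
import json

# Sketch of a request payload for vLLM's OpenAI-compatible server.
# Assumptions (not from this page): the server listens on localhost:8000
# and the model id below is a placeholder.
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "placeholder-model-id",          # hypothetical model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.0,                       # greedy-like sampling
    "max_tokens": 64,
}

# Serialize the body as it would be POSTed with Content-Type: application/json.
body = json.dumps(payload)
print(body)
```

The same payload shape works with any OpenAI-compatible client pointed at the vLLM server's base URL.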
[vlm] Remove vision language config.
github.com/vllm-project/vllm - xwjiang2010 opened this pull request 4 months ago
[ Misc ] Expand Fp8 MoE Support to Qwen
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Usage]: Why is usage information missing in streaming calls when non-streaming calls include it?
github.com/vllm-project/vllm - alfgo opened this issue 4 months ago
[ Misc ] Refactor Marlin Python Utilities
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and `v_scale`
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Feature]: Add readiness endpoint /ready and return /health earlier (vLLM on Kubernetes)
github.com/vllm-project/vllm - frittentheke opened this issue 4 months ago
[Bug]: Loading LoRA is super slow when using tensor parallel
github.com/vllm-project/vllm - markovalexander opened this issue 4 months ago
[Gemma 2 27B]: Update docker hub image to support gemma-2-27B-it
github.com/vllm-project/vllm - vipulgote1999 opened this issue 4 months ago
[Usage]: how to initialize gemma2-27b with 4-bit quantization?
github.com/vllm-project/vllm - maxin9966 opened this issue 4 months ago
[Feature]: support Ascend 910B in the future
github.com/vllm-project/vllm - jkl375 opened this issue 4 months ago
[Bug]: benchmark_serving.py cannot calculate Median TTFT correctly
github.com/vllm-project/vllm - Sekri0 opened this issue 4 months ago
[Installation]: how to disable NCCL support on Jetson devices
github.com/vllm-project/vllm - thunder95 opened this issue 4 months ago
[Bug]: ValidationError using langchain_community.llms.VLLM
github.com/vllm-project/vllm - santurini opened this issue 4 months ago
[Doc] Reinstate doc dependencies
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bug]: Garbled tokens appear in vLLM generation results every time a new LLM model is loaded (Qwen)
github.com/vllm-project/vllm - Jason-csc opened this issue 4 months ago
[Usage]: How to use beam search when request OpenAI Completions API
github.com/vllm-project/vllm - nguyenhoanganh2002 opened this issue 4 months ago
[Usage]: How to use --pipeline-parallel-size
github.com/vllm-project/vllm - XiaoYu2022 opened this issue 4 months ago
[Kernel] Unify the kernel used in flash attention backend
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 4 months ago
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 4 months ago
Benchmark: add H100 suite
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Bug]: call for stack trace for "Watchdog caught collective operation timeout"
github.com/vllm-project/vllm - youkaichao opened this issue 4 months ago
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation
github.com/vllm-project/vllm - LeiWang1999 opened this pull request 4 months ago
[Bugfix] Make spec. decode respect per-request seed.
github.com/vllm-project/vllm - tdoublep opened this pull request 4 months ago
[Core] Introduce SPMD worker execution using Ray accelerated DAG
github.com/vllm-project/vllm - ruisearch42 opened this pull request 4 months ago
Support for quantized kv cache (compressed-tensors)
github.com/vllm-project/vllm - dbogunowicz opened this pull request 4 months ago
[Bug]: Producer process has been terminated before all shared CUDA tensors released (v 0.5.0 post1, v 0.4.3)
github.com/vllm-project/vllm - yaronr opened this issue 4 months ago
[Bug]: There are differences in the output results of the same prompt between vllm offline and online calls
github.com/vllm-project/vllm - ArlanCooper opened this issue 4 months ago
[New Model]: facebook/seamless-m4t-v2-large
github.com/vllm-project/vllm - frittentheke opened this issue 4 months ago
[ci][distributed] add distributed test gptq_marlin with tp = 2
github.com/vllm-project/vllm - llmpros opened this pull request 4 months ago
[Hardware][Intel CPU] Adding intel openmp tunings in Docker file
github.com/vllm-project/vllm - zhouyuan opened this pull request 4 months ago
[Feature][Hardware][AMD] Enable Scaled FP8 GEMM on ROCm
github.com/vllm-project/vllm - HaiShaw opened this pull request 4 months ago
[ci][distributed] fix some cuda init that makes it necessary to use spawn
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[ci][distributed] fix phi-3v test failure
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[CI/Build] Temporarily Remove Phi3-Vision from TP Test
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[CI/Build] Reuse code for checking output consistency
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[BugFix] Ensure worker model loop is always stopped at the right time
github.com/vllm-project/vllm - njhill opened this pull request 4 months ago
[Frontend] Bad words sampling parameter
github.com/vllm-project/vllm - Alvant opened this pull request 4 months ago
[New Model]: support for BartForSequenceClassification
github.com/vllm-project/vllm - Sapessii opened this issue 4 months ago
[Bug]: TypeError: FlashAttentionMetadata.__init__() missing 10 required positional arguments
github.com/vllm-project/vllm - lonngxiang opened this issue 4 months ago
[Bug]: AttributeError: 'NoneType' object has no attribute 'prefill_metadata'
github.com/vllm-project/vllm - lonngxiang opened this issue 4 months ago
[Misc] Update Phi-3-Vision Example
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[wip][Core] Introduce SPMD worker execution using Ray accelerated DAG
github.com/vllm-project/vllm - stephanie-wang opened this pull request 4 months ago
[misc][doc] try to add warning for latest html
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bugfix][TPU] Fix TPU sampler output
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Bugfix][TPU] Fix pad slot id
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Core] Optimize `SequenceStatus.is_finished` by switching to IntEnum
github.com/vllm-project/vllm - Yard1 opened this pull request 4 months ago
[Draft][Core] Refactor _prepare_model_input_tensors
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
[Misc] Fix `get_min_capability`
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[ Misc ] Isolate Fp8Moe From Mixtral
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Bug]: Phi-3 vision crash: TypeError: only integer tensors of a single element can be converted to an index
github.com/vllm-project/vllm - pseudotensor opened this issue 4 months ago
[misc][optimization] optimize data structure in allocator
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[CI/Build] [3/3] Reorganize entrypoints tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Model] Changes to MLPSpeculator to support tie_weights and input_scale
github.com/vllm-project/vllm - tdoublep opened this pull request 4 months ago
[Bugfix] Fix Engine Failing After Invalid Request - AsyncEngineDeadError
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
Unmark more files as executable
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Bug]: vLLM crash when running Phi-3-small-8k-instruct with enable-chunked-prefill
github.com/vllm-project/vllm - yaronr opened this issue 4 months ago
[Core] Adding Priority Scheduling
github.com/vllm-project/vllm - apatke opened this pull request 4 months ago
[Bug]: qwen1.5-32b-chat no response
github.com/vllm-project/vllm - linpan opened this issue 4 months ago
Add support for multi-node on CI
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
[Bugfix] Support `eos_token_id` from `config.json`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bug]: Chunked prefill vs. non-chunked output is different for a long prompt
github.com/vllm-project/vllm - felixzhu555 opened this issue 4 months ago
[Bugfix][CI/Build][Hardware][AMD] Install matching torchvision to fix AMD tests
github.com/vllm-project/vllm - mawong-amd opened this pull request 4 months ago
add FAQ doc under 'serving'
github.com/vllm-project/vllm - llmpros opened this pull request 4 months ago
[Usage]: can I save log to a file?
github.com/vllm-project/vllm - chenchunhui97 opened this issue 4 months ago
[Hardware][Intel CPU] Use ipex varlen attention to compute prompts for better performance
github.com/vllm-project/vllm - jikunshang opened this pull request 4 months ago
[core][optimization] use a pool of numpy ndarray to hold seq data
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Kernel] Add per-tensor and per-token AZP epilogues
github.com/vllm-project/vllm - ProExpertProg opened this pull request 4 months ago
[ Misc ] Refactor w8a8 to use `process_weights_after_load` (Simplify Weight Loading)
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
Virtual Office Hours: July 9 and July 25
github.com/vllm-project/vllm - mgoin opened this issue 4 months ago
[Bugfix] Only add `Attention.kv_scale` if kv cache quantization is enabled
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Frontend] OpenAI base64 embedding: remove the message blocker for base64 embedding
github.com/vllm-project/vllm - llmpros opened this pull request 4 months ago
[Kernel] Add punica dimensions for Granite 3b and 8b
github.com/vllm-project/vllm - joerunde opened this pull request 4 months ago
[Bugfix] fix missing last itl in openai completions benchmark
github.com/vllm-project/vllm - mcalman opened this pull request 4 months ago
[Misc] Extend vLLM Metrics logging API
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 4 months ago
[Frontend] Support for chat completions input in the tokenize endpoint
github.com/vllm-project/vllm - sasha0552 opened this pull request 4 months ago
[ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Model] Initial support for BLIP-2
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Kernel] Prototype integration of bytedance/flux kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Bug]: FP8 checkpoints with fused linear modules fail to load scales correctly
github.com/vllm-project/vllm - mgoin opened this issue 4 months ago
[Bugfix] Fix precisions in Gemma 1
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules.
github.com/vllm-project/vllm - rkooo567 opened this pull request 4 months ago
[Bug]: TRACKING ISSUE: CUDA OOM with Logprobs
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 4 months ago
[Bug]: TRACKING ISSUE: `AsyncEngineDeadError`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 4 months ago
[Bug]: Inconsistent responses with vLLM when batch size > 1, even with temperature = 0
github.com/vllm-project/vllm - gjgjos opened this issue 4 months ago