vLLM issues | Ecosyste.ms: OpenCollective

[Bug]: Can't load gemma-2-9b-it with vllm 0.5.2

github.com/vllm-project/vllm - vlsav opened this issue 3 months ago

[Bug]: No metrics exposed at /metrics with 0.5.2 (0.5.1 is fine), possible regression?

github.com/vllm-project/vllm - frittentheke opened this issue 3 months ago

[CI/Build] Remove "boardwalk" image asset

github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago

[Bugfix] enable prefix caching for AsyncLLMEngine when requesting prompt_logprobs

github.com/vllm-project/vllm - KrishnaM251 opened this pull request 3 months ago

[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization

github.com/vllm-project/vllm - wushidonguc opened this pull request 3 months ago

[Misc] Log spec decode metrics

github.com/vllm-project/vllm - comaniac opened this pull request 3 months ago

[Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron

github.com/vllm-project/vllm - servient-ashwin opened this issue 3 months ago

[Model] H2O Danube3-4b

github.com/vllm-project/vllm - g-eoj opened this pull request 3 months ago

[Bug]: Seed issue with Pipeline Parallel

github.com/vllm-project/vllm - andoorve opened this issue 3 months ago

[Not for review] PP ADAG

github.com/vllm-project/vllm - ruisearch42 opened this pull request 3 months ago

[Bug]: TypeError: 'NoneType' object is not callable when starting Gemma2-27b-it

github.com/vllm-project/vllm - candowu opened this issue 3 months ago

[Core] Use numpy to speed up padded token processing

github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago

[Draft] proposal for ipex quant support

github.com/vllm-project/vllm - jikunshang opened this pull request 3 months ago

[doc][misc] doc update

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

[Bug]: Severe computation errors when batching requests for microsoft/Phi-3-mini-128k-instruct

github.com/vllm-project/vllm - lance0108 opened this issue 3 months ago

[Doc] add env docs for flashinfer backend

github.com/vllm-project/vllm - DefTruth opened this pull request 3 months ago

[VLM] Minor space optimization for `ClipVisionModel`

github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago

Add FUNDING.yml

github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago

v0.5.2, v0.5.3, v0.6.0 Release Tracker

github.com/vllm-project/vllm - simon-mo opened this issue 3 months ago

bump version to v0.5.2

github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago

[Bug]: autogen can't work with vllm v0.5.1

github.com/vllm-project/vllm - tonyaw opened this issue 3 months ago

[Doc][CI/Build] Update docs and tests to use `vllm serve`

github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago

[Bugfix] Convert image to RGB by default

github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago

[Bug]: Illegal memory access when increasing max_model_length on FP8 models

github.com/vllm-project/vllm - IEI-mjx opened this issue 3 months ago

[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests'

github.com/vllm-project/vllm - lxline opened this pull request 3 months ago

[Bug]: Paligemma support for PNG files

github.com/vllm-project/vllm - BabyChouSr opened this issue 3 months ago

[ CI ] 0.4.3.post1 Hotfix

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug

github.com/vllm-project/vllm - mzusman opened this pull request 3 months ago

[Feature]: Return softmax of attention layer.

github.com/vllm-project/vllm - DouHappy opened this issue 3 months ago

[ Misc ] Enable Quantizing All Layers of DeepSeekv2

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[ Kernel ] AWQ Fused MoE

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after several hundred requests: "vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already."

github.com/vllm-project/vllm - ZHJ19970917 opened this issue 3 months ago

[ci][build] fix commit id

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests

github.com/vllm-project/vllm - g-eoj opened this pull request 3 months ago

[doc][distributed] add suggestion for distributed inference

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

[ Misc ] Apply MoE Refactor to Qwen2 + Deepseekv2 To Support Fp8

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[Feature]: Apply chat template through `LLM` class

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 3 months ago

[Bug]: Timeout Error When Deploying Llamafied InternLM2-5-7B-Chat-1M Model via vLLM OpenAI API Server

github.com/vllm-project/vllm - mf-skjung opened this issue 3 months ago

[Bugfix][CI/Build] Fix testing for generated commit hash

github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago

[Doc] Add documentations for nightly benchmarks

github.com/vllm-project/vllm - KuntaiDu opened this pull request 3 months ago

Updating LM Format Enforcer version to v10.3

github.com/vllm-project/vllm - noamgat opened this pull request 3 months ago

[ci][distributed] add pipeline parallel correctness test

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF

github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago

When I set tensor_parallel_size>1 (4x A100), it does not work

github.com/vllm-project/vllm - cx-hub opened this issue 3 months ago

[core][distributed] simplify code to support pipeline parallel

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

Remove unnecessary trailing period in spec_decode.rst

github.com/vllm-project/vllm - terrytangyuan opened this pull request 3 months ago

Report usage for beam search

github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago

[Model] Pipeline parallel support for Mixtral

github.com/vllm-project/vllm - binxuan opened this pull request 3 months ago

[Misc] Add deprecation warning for beam search

github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago

[Misc] Disambiguate quantized types via a new ScalarType

github.com/vllm-project/vllm - LucasWilkinson opened this pull request 3 months ago

[Bug]: Gemma-2 + FlashInfer: ValueError: Unsupported max_frags_z:

github.com/vllm-project/vllm - HanGuo97 opened this issue 3 months ago

[CI/Build] Cross python wheel

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[Doc] xpu backend requires running setvars.sh

github.com/vllm-project/vllm - rscohn2 opened this pull request 3 months ago

[Bug]: Problem loading Gemma 2 27b-it

github.com/vllm-project/vllm - rdaiello opened this issue 3 months ago

[Bug]: Runtime AssertionError: 32768 is not divisible by 3, multiproc_worker_utils.py:120, when using 3 GPUs for tensor-parallel

github.com/vllm-project/vllm - haltingstate opened this issue 3 months ago

[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace

github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago

[RFC]: A Graph Optimization System in vLLM using torch.compile

github.com/vllm-project/vllm - bnellnm opened this issue 3 months ago

torch.compile based model optimizer

github.com/vllm-project/vllm - bnellnm opened this pull request 3 months ago

[Bug]: vLLM 0.5.1 tensor parallel 2 hang

github.com/vllm-project/vllm - Flynn-Zh opened this issue 3 months ago

[BUGFIX] Raise an error for no draft token case when draft_tp>1

github.com/vllm-project/vllm - wooyeonlee0 opened this pull request 3 months ago

[Feature]: Request for Ascend NPU support

github.com/vllm-project/vllm - xuedinge233 opened this issue 3 months ago

[ Misc ] More Cleanup of Marlin

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[ Misc ] Support Act Order in Compressed Tensors

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[BugFix] Fix the lm_head in gpt_bigcode in LoRA mode

github.com/vllm-project/vllm - maxdebayser opened this pull request 3 months ago

[ Misc ] Support Models With Bias in `compressed-tensors` integration

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[Installation]: Running ohereForAI/c4ai-command-r-v01 with main pytorch

github.com/vllm-project/vllm - laithsakka opened this issue 3 months ago

[Bugfix] Fix Ray Metrics API usage

github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago

[ Misc ] Remove separate bias add

github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago

[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in vllm to get device count

github.com/vllm-project/vllm - hongxiayang opened this pull request 3 months ago

[Misc] Remove flashinfer warning, add flashinfer tests to CI

github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 3 months ago

[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub

github.com/vllm-project/vllm - adityagoel14 opened this pull request 3 months ago

[Bugfix] Fix usage stats logging exception warning with OpenVINO

github.com/vllm-project/vllm - helena-intel opened this pull request 3 months ago

[Feature]: FlashAttention 3 support

github.com/vllm-project/vllm - orellavie1212 opened this issue 3 months ago

[doc] update pipeline parallel in readme

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

[distributed][misc] keep consistent with how pytorch finds libcudart.so

github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago

[BugFix] BatchResponseData body should be optional

github.com/vllm-project/vllm - zifeitong opened this pull request 3 months ago

[Kernel] Fix identical branches

github.com/vllm-project/vllm - stevegrubb opened this pull request 3 months ago

[Model][Phi3-Small] Remove scipy from blocksparse_attention

github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago

[Bug]: OpenAI batch file format pydantic validation error

github.com/vllm-project/vllm - ArsalShakil opened this issue 3 months ago

[Misc] add fixture to guided processor tests

github.com/vllm-project/vllm - kevinbu233 opened this pull request 3 months ago

[Bug]: Exception in thread Thread-3 (_report_usage_worker) when running python3 vllm/benchmarks/benchmark_throughput.py with vLLM OpenVINO

github.com/vllm-project/vllm - HPUedCSLearner opened this issue 3 months ago

[bug fix] Fix llava next feature size calculation.

github.com/vllm-project/vllm - xwjiang2010 opened this pull request 3 months ago

[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step

github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago

[Bug]: Metrics time_to_first_token_seconds, time_per_output_token_seconds not working correctly

github.com/vllm-project/vllm - thies1006 opened this issue 3 months ago

[Performance]: How to use NVIDIA Nsight Compute on Linux

github.com/vllm-project/vllm - chenglu66 opened this issue 3 months ago

fix cuda118 can't find libcudart.so error

github.com/vllm-project/vllm - zhaotyer opened this pull request 3 months ago

[Bug]: Unable to run phi-3-small in latest release

github.com/vllm-project/vllm - ssmi153 opened this issue 3 months ago

[Bug]: Error on inference with LoRA request (safetensors format)

github.com/vllm-project/vllm - tsvisab opened this issue 3 months ago

[Bug]: `tests/basic_correctness/test_chunked_prefill.py` is failing on main in fp32

github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago

[Bug]: Gemma 2 GPTQ - Complete output via API but incomplete through batch inference

github.com/vllm-project/vllm - ArsalShakil opened this issue 3 months ago

wip

github.com/vllm-project/vllm - thri5ha opened this pull request 3 months ago

[Bug]: Gloo library cannot communicate between two machines

github.com/vllm-project/vllm - JKYtydt opened this issue 3 months ago

[Bug]: vLLM's output is unstable with version 0.5.1

github.com/vllm-project/vllm - ffxmm opened this issue 3 months ago

[Model] RowParallelLinear: pass bias to quant_method.apply

github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago

[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules.

github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago

[Usage]: Maximum Context Length Exceeded Due to Base64-Encoded Image in Prompt

github.com/vllm-project/vllm - tusharraskar opened this issue 3 months ago

[Feature]: Hybrid Attention

github.com/vllm-project/vllm - leo6022 opened this issue 3 months ago

[Bug]: VLLM 0.5.1 with LLaVA 1.6 exceptions

github.com/vllm-project/vllm - andrePankraz opened this issue 3 months ago

[Model]: Support for InternVL2

github.com/vllm-project/vllm - Weiyun1025 opened this issue 3 months ago