Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
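For context, vLLM's offline API is compact; a minimal sketch (model name and prompt are just examples):

    from vllm import LLM, SamplingParams

    # Load any supported Hugging Face causal-LM checkpoint.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)
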
Performance Regression between v0.4.0 and v0.4.1
github.com/vllm-project/vllm - simon-mo opened this issue 9 months ago
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0
github.com/vllm-project/vllm - sangstar opened this pull request 9 months ago
[Usage]: Make a request to the LLaVA server.
github.com/vllm-project/vllm - premg16 opened this issue 9 months ago
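The general shape of such a request once multimodal chat support landed in the OpenAI-compatible server (see the GPT-4V Chat Completions PR below); server address, model name, and image URL here are assumptions:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
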
[Usage]: How to use LoRARequest with AsyncLLMEngine?
github.com/vllm-project/vllm - Rares9999 opened this issue 9 months ago
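A sketch of the documented LoRA flow (adapter name and path are placeholders): build the engine with enable_lora=True, then pass a LoRARequest per call; AsyncLLMEngine.generate accepts the same lora_request keyword.

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # LoRA support must be enabled when the engine is constructed.
    llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

    # LoRARequest(adapter name, unique integer id, local adapter path)
    lora = LoRARequest("my-adapter", 1, "/path/to/adapter")

    outputs = llm.generate(
        ["Hello, my name is"],
        SamplingParams(max_tokens=32),
        lora_request=lora,
    )
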
[Installation]: Failed to build from source code. Python=3.9, CUDA=12.1
github.com/vllm-project/vllm - WJMacro opened this issue 9 months ago
[Frontend] Support GPT-4V Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 9 months ago
[Model] Initial support for LLaVA-NeXT
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 9 months ago
[Bug]: KeyError: 'model.layers.24.mlp.down_proj.weight' for llama 7b model SqueezeLLM quantization
github.com/vllm-project/vllm - condy0919 opened this issue 9 months ago
[Core] Support image processor
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 9 months ago
[Misc]: optimize eager mode host time
github.com/vllm-project/vllm - functionxu123 opened this pull request 9 months ago
[RFC]: Multi-modality Support Refactoring
github.com/vllm-project/vllm - ywang96 opened this issue 9 months ago
[Bug]: Disk I/O Error when using tools due to shared outlines cache database
github.com/vllm-project/vllm - AaronFriel opened this issue 9 months ago
[New Model]: Please update docker to support llama3
github.com/vllm-project/vllm - HangLu123 opened this issue 9 months ago
Adding max queue time parameter
github.com/vllm-project/vllm - KrishnaM251 opened this pull request 9 months ago
[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported
github.com/vllm-project/vllm - u650080 opened this issue 9 months ago
[Usage]: Llama 3 8B Instruct Inference
github.com/vllm-project/vllm - aliozts opened this issue 9 months ago
[Bug]: Server crash for bloom-3b while using prefix_caching, `AssertionError assert Lk in {16, 32, 64, 128}`
github.com/vllm-project/vllm - DefTruth opened this issue 9 months ago
Add `vllm serve` to wrap `vllm.entrypoints.openai.api_server`
github.com/vllm-project/vllm - simon-mo opened this pull request 9 months ago
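Both invocations start the same OpenAI-compatible server; the model name is an example:

    # before this PR
    python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-8B-Instruct
    # after
    vllm serve meta-llama/Meta-Llama-3-8B-Instruct
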
[CI/Build] Further decouple HuggingFace implementation from ours during tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 9 months ago
[BugFix] fix num_lookahead_slots missing in async executor
github.com/vllm-project/vllm - leiwen83 opened this pull request 9 months ago
[Misc]: How to access the KV cache directly?
github.com/vllm-project/vllm - BDHU opened this issue 9 months ago
[Feature]: AMD ROCm 6.1 Support
github.com/vllm-project/vllm - kannan-scalers-ai opened this issue 9 months ago
[Bug]: Processed prompts: 5%|▌ | 429/8535 [00:27<08:37, 15.68it/s] RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
github.com/vllm-project/vllm - pangpang-xuan opened this issue 9 months ago
[Usage]: If I want to run a 34B model like Yi-34B-Chat, how can I use multiple GPUs? I only have an A100 40G
github.com/vllm-project/vllm - hellostronger opened this issue 9 months ago
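Rough arithmetic behind the question: a 34B-parameter model in fp16 needs about 68 GB for weights alone, so it cannot fit on one A100 40G. With two such GPUs, tensor parallelism shards the weights (KV-cache headroom will still be tight); a minimal sketch:

    from vllm import LLM

    # Shard the model across 2 GPUs.
    llm = LLM(model="01-ai/Yi-34B-Chat", tensor_parallel_size=2)
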
[Usage]: How to get the latency of each request with benchmark_serving.py
github.com/vllm-project/vllm - wanzhenchn opened this issue 9 months ago
[Core] Enable prefix caching with block manager v2 enabled
github.com/vllm-project/vllm - leiwen83 opened this pull request 9 months ago
[Feature]: Phi2 LoRA support
github.com/vllm-project/vllm - zero-or-one opened this issue 9 months ago
[Misc] Add customized information for models
github.com/vllm-project/vllm - jeejeelee opened this pull request 9 months ago
[Bug]: Invalid Device Ordinal on ROCm
github.com/vllm-project/vllm - Bellk17 opened this issue 9 months ago
Added Support for guided decoding in offline interface
github.com/vllm-project/vllm - kevinbu233 opened this pull request 9 months ago
[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring
github.com/vllm-project/vllm - hongxiayang opened this pull request 9 months ago
[Feature]: Support HuggingFaceM4/idefics2-8b as vision model
github.com/vllm-project/vllm - pseudotensor opened this issue 9 months ago
[Misc] [CI]: AMD test flaky on main CI
github.com/vllm-project/vllm - cadedaniel opened this issue 9 months ago
[Model] Update MPT model with GLU and rope and add low precision layer norm
github.com/vllm-project/vllm - marov opened this pull request 9 months ago
[CI/BUILD] enable intel queue for longer CPU tests
github.com/vllm-project/vllm - zhouyuan opened this pull request 9 months ago
[Bug]: vLLM's output is unstable when handling requests concurrently.
github.com/vllm-project/vllm - zhengwei-gao opened this issue 9 months ago
[Bug]: deepseek-coder-33b-instruct and deepseek-coder-6.7b-instruct broken, but deepseek-llm-7b-chat and deepseek-llm-67b-chat work well
github.com/vllm-project/vllm - lgw2023 opened this issue 9 months ago
[Frontend][Core] Update Outlines Integration from `FSM` to `Guide`
github.com/vllm-project/vllm - br3no opened this pull request 9 months ago
[Bug]: NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - pseudotensor opened this issue 9 months ago
[Bug]: --engine-use-ray is broken. #4100
github.com/vllm-project/vllm - jdinalt opened this pull request 9 months ago
[Bugfix] Fix naive attention typos and make it run on navi3x
github.com/vllm-project/vllm - maleksan85 opened this pull request 9 months ago
[Bug]: guided_json bad output for llama2-13b
github.com/vllm-project/vllm - pseudotensor opened this issue 9 months ago
[Model] Adding support for MiniCPM-V
github.com/vllm-project/vllm - HwwwwwwwH opened this pull request 9 months ago
[FacebookAI/roberta-large]: vllm support for FacebookAI/roberta-large
github.com/vllm-project/vllm - pradeepdev-1995 opened this issue 9 months ago
[Bug]: vllm_C is missing.
github.com/vllm-project/vllm - Calvinnncy97 opened this issue 9 months ago
[Model] Add support for 360zhinao
github.com/vllm-project/vllm - garycaokai opened this pull request 9 months ago
[Bug]: RuntimeError: Unknown layout
github.com/vllm-project/vllm - zzlgreat opened this issue 9 months ago
[Bug]: sending request using response_format json twice breaks vLLM
github.com/vllm-project/vllm - samos123 opened this issue 9 months ago
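For reproduction context, this is the kind of request involved, sketched with the OpenAI client (server address and model name are assumptions):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",
        messages=[{"role": "user",
                   "content": "Return a JSON object describing a cat."}],
        response_format={"type": "json_object"},
    )
    print(resp.choices[0].message.content)
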
[Feature]: Allow LoRA adapters to be specified as in-memory dict of tensors
github.com/vllm-project/vllm - jacobthebanana opened this issue 9 months ago
[Usage]: Unable to load mistralai/Mixtral-8x7B-Instruct-v0.1
github.com/vllm-project/vllm - rohitnanda1443 opened this issue 9 months ago
Does vllm support both CUDA 11.3 version and PyTorch 1.12?
github.com/vllm-project/vllm - iclgg opened this issue 9 months ago
[Usage]: Problem when loading my trained model.
github.com/vllm-project/vllm - hummingbird2030 opened this issue 9 months ago
[Feature][Chunked prefill]: Make sliding window work
github.com/vllm-project/vllm - rkooo567 opened this issue 9 months ago
[Feature]: bitsandbytes support
github.com/vllm-project/vllm - orellavie1212 opened this issue 9 months ago
[Frontend] Refactor prompt processing
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 9 months ago
[Bug]: start api server stuck
github.com/vllm-project/vllm - QianguoS opened this issue 9 months ago
[Model] [Kernel] Add 16, 32 kernel sizes in compilation
github.com/vllm-project/vllm - nbardy opened this pull request 9 months ago
[Installation]: Any plans on providing vLLM pre-compiled for ROCm?
github.com/vllm-project/vllm - satyamk7054 opened this issue 9 months ago
[Core] Support LoRA on quantized models
github.com/vllm-project/vllm - jeejeelee opened this pull request 9 months ago
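A sketch of what this PR enables, stacking a LoRA adapter on an AWQ-quantized base model (checkpoint and adapter path are placeholders):

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(
        model="TheBloke/Llama-2-7B-AWQ",  # quantized base model
        quantization="awq",
        enable_lora=True,
    )
    outputs = llm.generate(
        ["Hello, my name is"],
        SamplingParams(max_tokens=32),
        lora_request=LoRARequest("my-adapter", 1, "/path/to/adapter"),
    )
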
[Installation]: vLLM is impossible to install.
github.com/vllm-project/vllm - GPaolo opened this issue 9 months ago
[Kernel] Fused MoE Config for Mixtral 8x22
github.com/vllm-project/vllm - ywang96 opened this pull request 9 months ago
[Bug]: Qwen1.5-14B-Chat deployed with vllm==0.3.3 on a Tesla V100-PCIE-32GB GPU outputs only exclamation marks, no results
github.com/vllm-project/vllm - li995495592 opened this issue 9 months ago
[Usage]: flash_attn vs xformers
github.com/vllm-project/vllm - VeryVery opened this issue 9 months ago
[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 9 months ago
[Bug]: Command R+ GPTQ bad output on ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 9 months ago
[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 9 months ago
[Feature]: Tree attention about Speculative Decoding
github.com/vllm-project/vllm - yukavio opened this issue 10 months ago
[CI/Build] Reduce race condition in docker build
github.com/vllm-project/vllm - youkaichao opened this pull request 10 months ago
[Misc]: Does prefix caching work together with multi lora?
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 10 months ago
[Bug]: StableLM 12b head size incorrect
github.com/vllm-project/vllm - bjoernpl opened this issue 10 months ago
[Model] LoRA gptbigcode implementation
github.com/vllm-project/vllm - raywanb opened this pull request 10 months ago
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics
github.com/vllm-project/vllm - DearPlanet opened this pull request 10 months ago
[Bug]: leading space within content via OpenAI Compatible Server
github.com/vllm-project/vllm - bufferoverflow opened this issue 10 months ago
[Usage]: How to offload some layers to CPU?
github.com/vllm-project/vllm - cheney369 opened this issue 10 months ago
Is there a stable version of the Docker image?
github.com/vllm-project/vllm - huyang19881115 opened this issue 10 months ago
[Model] Initialize Fuyu-8B support
github.com/vllm-project/vllm - Isotr0py opened this pull request 10 months ago
[Bug]: Cannot use FlashAttention because the package is not found. Please install it for better performance.
github.com/vllm-project/vllm - pseudotensor opened this issue 10 months ago
[Bug]: Getting subprocess.CalledProcessError: Command '['/usr/bin/gcc',] error message.
github.com/vllm-project/vllm - yk287 opened this issue 10 months ago
[Usage]: I have two GPUs, how do I make my model run on 2 GPUs
github.com/vllm-project/vllm - hxujal opened this issue 10 months ago
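Same answer as the Yi-34B question above, but server-side: pass --tensor-parallel-size at launch (model name is an example):

    python -m vllm.entrypoints.openai.api_server \
        --model meta-llama/Llama-2-13b-hf \
        --tensor-parallel-size 2
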
[Kernel] PyTorch Labs Fused MoE Kernel Integration
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 10 months ago
[Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes
github.com/vllm-project/vllm - Jeffwan opened this issue 10 months ago
[Bug]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
github.com/vllm-project/vllm - guangweiShaw opened this issue 10 months ago
[Bug]: Failed to generate prompts with the google/gemma-2b model via Python code
github.com/vllm-project/vllm - 936187425 opened this issue 10 months ago
[Usage]: How to determine whether the vllm engine is full with requests or not
github.com/vllm-project/vllm - man2machine opened this issue 10 months ago
[Bug]: killed due to high memory usage
github.com/vllm-project/vllm - xiewf1990 opened this issue 10 months ago
[Bug]: Cannot load lora adapters in WSL 2
github.com/vllm-project/vllm - invokeinnovation opened this issue 10 months ago
[Bug]: vllm 0.4.0.post1 crashed when loading dbrx-instruct on AMD MI250x
github.com/vllm-project/vllm - vgod-dbx opened this issue 10 months ago
[Doc/Feature]: Llava 1.5 in OpenAI compatible server
github.com/vllm-project/vllm - stikkireddy opened this issue 10 months ago
[Roadmap] vLLM Roadmap Q2 2024
github.com/vllm-project/vllm - simon-mo opened this issue 10 months ago
[Misc]: Can we remove `vllm/entrypoints/api_server.py`?
github.com/vllm-project/vllm - hmellor opened this issue 10 months ago
[Frontend] openAI entrypoint dynamic adapter load
github.com/vllm-project/vllm - DavidPeleg6 opened this pull request 10 months ago
[Bug]: Error happen in async_llm_engine when use multiple GPUs
github.com/vllm-project/vllm - for-just-we opened this issue 10 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
github.com/vllm-project/vllm - Kaiyang-Chen opened this pull request 10 months ago
[Core] :loud_sound: Improve request logging truncation
github.com/vllm-project/vllm - joerunde opened this pull request 10 months ago
[Model] Cohere CommandR+
github.com/vllm-project/vllm - saurabhdash2512 opened this pull request 10 months ago
[Hardware][Intel GPU] Add initial Intel GPU (XPU) inference backend
github.com/vllm-project/vllm - jikunshang opened this pull request 10 months ago
[Feature]: Make `outlines` dependency optional
github.com/vllm-project/vllm - saattrupdan opened this issue 10 months ago