Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: deploy Phi-3-mini-128k-instruct AssertionError
github.com/vllm-project/vllm - hxujal opened this issue 5 months ago
[Usage]: How to change the batch size when testing the throughput of VLLM by running benchmark_throughput
github.com/vllm-project/vllm - Ourspolaire1 opened this issue 5 months ago
[Kernel] add bfloat16 support for gptq kernel
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 5 months ago
[Installation]: Stuck for two hours during the installation of vllm
github.com/vllm-project/vllm - loxs123 opened this issue 5 months ago
[Feature][Kernel] Support bitsandbytes quantization and QLoRA
github.com/vllm-project/vllm - chenqianfzh opened this pull request 5 months ago
[core] SequenceController in SamplingParams
github.com/vllm-project/vllm - mmoskal opened this pull request 5 months ago
Sync huggingface modifications of qwen Moe model
github.com/vllm-project/vllm - eigen2017 opened this pull request 5 months ago
[CI/Build] build on empty device for better dev experience
github.com/vllm-project/vllm - tomeras91 opened this pull request 5 months ago
[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt
github.com/vllm-project/vllm - leejamesss opened this issue 5 months ago
[Feature]: Host CPU Docker image on Docker Hub
github.com/vllm-project/vllm - VMinB12 opened this issue 5 months ago
[Feature]: CI: Test on NVLink-enabled machine
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Misc] Logits processor plugins
github.com/vllm-project/vllm - NadavShmayo opened this pull request 5 months ago
[Kernel] sliding window support in paged_attention_v1/v2 kernels
github.com/vllm-project/vllm - mmoskal opened this pull request 5 months ago
[Misc] Easier access to the nccl library
github.com/vllm-project/vllm - Cyuchuan opened this pull request 5 months ago
[Feature]: could paged_attention_v1 support parameter 'attn_bias'
github.com/vllm-project/vllm - cillinzhang opened this issue 5 months ago
[Bugfix] Fix call to init_logger in openai server
github.com/vllm-project/vllm - NadavShmayo opened this pull request 5 months ago
[Core][Bugfix]: fix prefix caching for blockv2
github.com/vllm-project/vllm - leiwen83 opened this pull request 5 months ago
[Feature]: Support W4A8KV4 Quantization(QServe/QoQ)
github.com/vllm-project/vllm - bratao opened this issue 5 months ago
[Performance]: Why the avg. generation throughput is low?
github.com/vllm-project/vllm - rvsh2 opened this issue 5 months ago
[CI/Build] Enable entrypoints tests to be run in a single command
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Frontend] Re-enable custom roles in Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Core][Distributed] add fast broadcast for tensor dict
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Bug]: CUDA error when running mistral-7b + lora with tensor_para=8
github.com/vllm-project/vllm - sfc-gh-zhwang opened this issue 5 months ago
Regression in support of customized "role" in OpenAI compatible API (v.0.4.2)
github.com/vllm-project/vllm - simon-mo opened this issue 5 months ago
[Core][Distributed] refactor custom allreduce to support multiple tp groups
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
Support fp8 KV cache in `context_attention_fwd`
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
Add TensorizerArgs to client api server
github.com/vllm-project/vllm - vrdn-23 opened this pull request 5 months ago
[Misc] Enhance attention selector
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Core][Test] fix function name typo in custom allreduce
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Kernel] Add w8a8 CUTLASS kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[CI] Nits for bad initialization of SeqGroup in testing
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Usage]: prompt_logprompt from endpoint
github.com/vllm-project/vllm - basma-b opened this issue 5 months ago
[Usage]: How to batch requests to chat models with OpenAI server?
github.com/vllm-project/vllm - sidjha1 opened this issue 5 months ago
[Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize GPU
github.com/vllm-project/vllm - danielstankw opened this issue 5 months ago
[RFC]: Support specifying quant_config details in the LLM or Server entrypoints
github.com/vllm-project/vllm - mgoin opened this issue 5 months ago
[Bug]: ValueError when using LoRA with CohereForCausalLM model
github.com/vllm-project/vllm - onlyfish79 opened this issue 5 months ago
[Bug]: squeezeLLM with sparse could not work.
github.com/vllm-project/vllm - RyanWMHI opened this issue 5 months ago
[Bug]: why the logits are different between 0.4.1 and 0.4.2
github.com/vllm-project/vllm - sitabulaixizawaluduo opened this issue 5 months ago
[New Model]: Blip2 Support required
github.com/vllm-project/vllm - anisingh1 opened this issue 5 months ago
[CI/Build] use setuptools-scm to set __version__
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
[Core] Fix circular reference which leaked llm instance in local dev env
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[New Model]: fastspeech2_conformer (just need a new attention mechanism: RelPositionMultiHeadedAttention)
github.com/vllm-project/vllm - cillinzhang opened this issue 5 months ago
[Bugfix] Fix CLI arguments in OpenAI server docs
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Core] Fix type annotation for `swap_blocks`
github.com/vllm-project/vllm - jikunshang opened this pull request 5 months ago
[Bug]: Unable to serve Llama3 using vLLM Docker container
github.com/vllm-project/vllm - vecorro opened this issue 5 months ago
[Speculative decoding] Improve n-gram efficiency
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Performance]: Why does vllm spend so much memory even using OPT model?
github.com/vllm-project/vllm - MitchellX opened this issue 5 months ago
[CI/Build] Enforce style for C++ and CUDA code with `clang-format`
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Feature]: Enforce formatting standards for C++ and CUDA code
github.com/vllm-project/vllm - mgoin opened this issue 5 months ago
[Misc] Added devcontainer to help vscode dev setup
github.com/vllm-project/vllm - ElefHead opened this pull request 5 months ago
[Misc] Apply a couple g++ cleanups
github.com/vllm-project/vllm - stevegrubb opened this pull request 5 months ago
[CORE] Improvement in ranks code
github.com/vllm-project/vllm - SwapnilDreams100 opened this pull request 5 months ago
[Bugfix] Add logs for all model dtype casting
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Misc] Keep only one implementation of the create_dummy_prompt function.
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Bug]: Not able to do lora inference with phi-3
github.com/vllm-project/vllm - WeiXiaoSummer opened this issue 5 months ago
[Bug]: export failed when kv cache fp8 quantizing Qwen1.5-72B-Chat-GPTQ-Int4
github.com/vllm-project/vllm - frankxyy opened this issue 5 months ago
[CI/Build] Tweak Marlin Nondeterminism Issues
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
add `TypeLogitsProcessor`
github.com/vllm-project/vllm - eitanturok opened this pull request 5 months ago
[Bugfix] Update grafana.json
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Doc] Add API reference for offline inference
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bugfix] Fix CLI arguments in OpenAI server docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bug]: Running the punica lora on Qwen1.5 32B model encountered RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16
github.com/vllm-project/vllm - victorzhz111 opened this issue 5 months ago
[Doc]: OpenAI Server Command Line Args Broken
github.com/vllm-project/vllm - noamgat opened this issue 5 months ago
[Bug]: using a thread after calling multiple times raises KeyError: request_id
github.com/vllm-project/vllm - xubzhlin opened this issue 5 months ago
[Misc] Set block size at initialization & Fix test_model_runner
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Feature]: Is it possible to dynamically adjust lora tp policy to different situations ?
github.com/vllm-project/vllm - yyccli opened this issue 5 months ago
[Misc] Remove unnecessary ModelRunner imports
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Performance]: why hf is better than vllm when using benchmark throughput
github.com/vllm-project/vllm - yuki252111 opened this issue 5 months ago
[Feature]: Supporting a version of Consistency LLM
github.com/vllm-project/vllm - usaxena-asapp opened this issue 5 months ago
[Performance]: large rate of decrease in generation throughput when SamplingParams.logprobs increases
github.com/vllm-project/vllm - jeffrey-fong opened this issue 5 months ago
[Performance]: benchmarking vllm copy kernel and pytorch index copy
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Bug]: Chunked prefill returning gibberish in some cases.
github.com/vllm-project/vllm - fmmoret opened this issue 5 months ago
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies
github.com/vllm-project/vllm - KuntaiDu opened this pull request 5 months ago
[Feature]: bind python and c++ through tools other than pybind11
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
Installation on CPU fails with errors
github.com/vllm-project/vllm - ming-ddtechcg opened this issue 5 months ago
[Kernel] [FP8] Improve FP8 linear layer performance
github.com/vllm-project/vllm - pcmoritz opened this pull request 5 months ago
[Core] Implement sharded state loader
github.com/vllm-project/vllm - aurickq opened this pull request 5 months ago
[Frontend] OpenAI API server: Do not add bos token by default when encoding
github.com/vllm-project/vllm - bofenghuang opened this pull request 5 months ago
[Misc] Add OpenTelemetry support
github.com/vllm-project/vllm - ronensc opened this pull request 5 months ago
[Doc]: API reference for LLM class
github.com/vllm-project/vllm - zplizzi opened this issue 5 months ago
[Usage]: Get time statistics with each request
github.com/vllm-project/vllm - arunpatala opened this issue 5 months ago
[Bug]: assert parts[0] == "base_model" AssertionError
github.com/vllm-project/vllm - Edisonwei54 opened this issue 5 months ago
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[Feature]: support for aixcoder
github.com/vllm-project/vllm - chucksylar opened this issue 5 months ago
[Feature]: support lora such as qwen-7b and qwen1.5
github.com/vllm-project/vllm - kynow2 opened this issue 5 months ago
[Frontend] Move async logic outside of constructor
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Speculative decoding] [Bugfix] Fix overallocation in ngram + spec logprobs
github.com/vllm-project/vllm - cadedaniel opened this pull request 5 months ago
[Feature]: Support for a draft model that takes inputs from base model (to support Medusa/EAGLE/Hydra)
github.com/vllm-project/vllm - abhigoyal1997 opened this issue 5 months ago
[Misc] enable DynamicNTKScalingRotaryEmbedding YaRNScalingRotaryEmbedding test case
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Bug]: Unable to use vLLM for serving a fine-tuned Mistral model
github.com/vllm-project/vllm - praveen-kanamarlapudi opened this issue 5 months ago
[RFC]: Inline Golden (Expected) Tests
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Core][Distributed] support both cpu and device tensor in broadcast tensor dict
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used
github.com/vllm-project/vllm - alexeykondrat opened this pull request 5 months ago