Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (Host: opensource)
Code: https://github.com/vllm-project/vllm
[Misc]: Understanding Batching Mechanism in Prefill and Decode Phases
github.com/vllm-project/vllm - Msiavashi opened this issue 5 months ago
[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes
github.com/vllm-project/vllm - achandrasekar opened this issue 5 months ago
[Model] Enable FP8 QKV in MoE and refine kernel tuning script
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Core] Change LoRA embedding sharding to support loading methods
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
[Kernel] Dynamic Per-Token Activation Quantization
github.com/vllm-project/vllm - dsikka opened this pull request 5 months ago
[Kernel][RFC] Refactor the punica kernel based on Triton
github.com/vllm-project/vllm - jeejeelee opened this pull request 5 months ago
[Bug]: vLLM fails with an error on NVIDIA's latest driver 555.85
github.com/vllm-project/vllm - gaye746560359 opened this issue 5 months ago
[CI/Build] CMakeLists: build all extensions' cmake targets at the same time
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
[Misc]: LLM is responding with advertisement
github.com/vllm-project/vllm - Pocoyo7798 opened this issue 5 months ago
[FRONTEND] OpenAI `tools` support named functions
github.com/vllm-project/vllm - br3no opened this pull request 5 months ago
[Bugfix] logprobs is not compatible with the OpenAI spec #4795
github.com/vllm-project/vllm - Etelis opened this pull request 5 months ago
[Bug]: Command-R incorrect output contains `<EOS_TOKEN>` and seems to do text prediction rather than conversation
github.com/vllm-project/vllm - epignatelli opened this issue 5 months ago
[BUGFIX] [FRONTEND] Correct chat logprobs
github.com/vllm-project/vllm - br3no opened this pull request 5 months ago
[Usage]: I use llama3. I found that one token is 'Ġor' in tokenizer.get_vocab(). But when I use vllm server, I got ' or' in response.
github.com/vllm-project/vllm - fengshansi opened this issue 5 months ago
[Bugfix][Frontend] Cleanup "fix chat logprobs"
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Kernel] Initial commit containing new Triton kernels for multi lora serving.
github.com/vllm-project/vllm - FurtherAI opened this pull request 5 months ago
[Bug]: Wrong results in LangChain integration
github.com/vllm-project/vllm - Warit314 opened this issue 5 months ago
[Bug]: Mistral 7b inst v0.3 fails to run
github.com/vllm-project/vllm - yaronr opened this issue 5 months ago
[Bug]: UnboundLocalError: local variable 'lora_b_k' referenced before assignment
github.com/vllm-project/vllm - Stealthwriter opened this issue 5 months ago
[Core][2/N] Helpers for PP
github.com/vllm-project/vllm - andoorve opened this pull request 5 months ago
Marlin moe integration
github.com/vllm-project/vllm - ElizaWszola opened this pull request 5 months ago
[Model] Add base class for LoRA-supported models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[mypy] Enable type checking for test directory
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding
github.com/vllm-project/vllm - cadedaniel opened this issue 5 months ago
[Help wanted] [Spec decode]: Increase acceptance rate via Medusa's typical acceptance
github.com/vllm-project/vllm - cadedaniel opened this issue 5 months ago
[Bug]: inferences are not the same with batch mode when using
github.com/vllm-project/vllm - yananchen1989 opened this issue 5 months ago
[Bug]: enable_chunked_prefill feature hangs on AMD Radeon PRO W7900 (gfx1100)
github.com/vllm-project/vllm - hongxiayang opened this issue 5 months ago
[Doc] add ccache guide in doc
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Usage]: Run local models using vLLM
github.com/vllm-project/vllm - bibutikoley opened this issue 5 months ago
[New Model]: tiiuae/falcon-11B
github.com/vllm-project/vllm - s-smits opened this issue 5 months ago
[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is not defined
github.com/vllm-project/vllm - LetianLee opened this pull request 5 months ago
[Bug]: OpenAI LogProbs format for Chat-Completion is incorrect
github.com/vllm-project/vllm - br3no opened this issue 5 months ago
[Performance]: Vllm performance on L40s GPU
github.com/vllm-project/vllm - warlock135 opened this issue 5 months ago
[Bugfix] Adds outlines performance improvement
github.com/vllm-project/vllm - lynkz-matt-psaltis opened this pull request 5 months ago
[Bugfix] Fix Mistral v0.3 Weight Loading
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Misc] Improve organization of utility and test code
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Feature]: Tensor Parallelism with a non-divisible number of attention heads
github.com/vllm-project/vllm - NadavShmayo opened this issue 5 months ago
[Dynamic Spec Decoding] Minor fix for disabling speculative decoding
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 5 months ago
[Bug]: Confusion about the Ray implementation
github.com/vllm-project/vllm - vincent-pli opened this issue 5 months ago
[Performance]: Splitting model across GPUs with varying vRAM
github.com/vllm-project/vllm - ccruttjr opened this issue 5 months ago
[Bugfix] fix sharded state loader for lora
github.com/vllm-project/vllm - aurickq opened this pull request 5 months ago
[Usage]: There is no response after the "GPU P2P capability or P2P test failed" warning is displayed. What can I do?
github.com/vllm-project/vllm - wzz981 opened this issue 5 months ago
[Feature]: Chunked prefill + lora
github.com/vllm-project/vllm - rkooo567 opened this issue 5 months ago
[WIP] Make chunked prefill work with LoRA
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[New Model]: microsoft/Phi-3-small-128k-instruct
github.com/vllm-project/vllm - PeterAronZentai opened this issue 5 months ago
[Core][Distributed] improve p2p access check
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Frontend] [Core] Support for sharded tensorized models
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 5 months ago
[Bug]: Loading mistral-7B-instruct-v03 KeyError: 'layers.0.attention.wk.weight'
github.com/vllm-project/vllm - timbmg opened this issue 5 months ago
[Core][1/N] Support PP PyNCCL Groups
github.com/vllm-project/vllm - andoorve opened this pull request 5 months ago
[Model] Initialize Phi-3-vision support
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Core]: Option To Use Prompt Token Ids Inside Logits Processor
github.com/vllm-project/vllm - kezouke opened this pull request 5 months ago
[Usage]: How to start vLLM on a particular GPU?
github.com/vllm-project/vllm - kstyagi23 opened this issue 5 months ago
[Speculative Decoding] Medusa Implementation with Top-1 proposer
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 5 months ago
[Hardware][Intel] Optimize CPU backend and add more performance tips
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 5 months ago
[Usage]: Is it possible to start 8 tp=1 LLMEngine on a 8-GPU machine?
github.com/vllm-project/vllm - sfc-gh-zhwang opened this issue 5 months ago
[Feature]: Please optimize the output print info about time count.
github.com/vllm-project/vllm - Zhenzhong1 opened this issue 5 months ago
[Feature]: support `stream_options` option
github.com/vllm-project/vllm - NiuBlibing opened this issue 5 months ago
[ROCm][AMD] Use pytorch sdpa math backend to do naive attention
github.com/vllm-project/vllm - hongxiayang opened this pull request 5 months ago
[Bugfix] Pass in CPU selector from worker
github.com/vllm-project/vllm - casassg opened this pull request 5 months ago
[Installation]: editable install fails with setuptools 70.0.0
github.com/vllm-project/vllm - 5cp opened this issue 5 months ago
[Misc] Take user preference in attention selector
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[FastAPI related] Built an API, ran into torch.cuda.OutOfMemoryError: CUDA out of memory
github.com/vllm-project/vllm - rsong0606 opened this issue 5 months ago
[Feature]: microsoft/Phi-3-vision-128k-instruct Vision support
github.com/vllm-project/vllm - pseudotensor opened this issue 5 months ago
[Feature]: Support loading of sharded vLLM serialized models with Tensorizer
github.com/vllm-project/vllm - tjohnson31415 opened this issue 5 months ago
[model] AddRelPositionMultiHeadedAttention
github.com/vllm-project/vllm - rajveer43 opened this pull request 5 months ago
[Feature]: automatically select distributed inference backend
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Kernel] Fixup for CUTLASS kernels in CUDA graphs
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[New Model]: Phi-3-medium-128k-instruct support
github.com/vllm-project/vllm - ai8hyf opened this issue 5 months ago
[Misc] Small refactor on create_lora_manager()
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Kernel][ROCm][AMD] Add fused_moe Triton configs for MI300X
github.com/vllm-project/vllm - divakar-amd opened this pull request 5 months ago
[Installation]: Reduce Image size when installing wheel with cuda 11.8
github.com/vllm-project/vllm - ch9hn opened this issue 5 months ago
[Feature]: pre release or nightly builds
github.com/vllm-project/vllm - nivibilla opened this issue 5 months ago
[Model] MLPSpeculator speculative decoding support
github.com/vllm-project/vllm - JRosenkranz opened this pull request 5 months ago
[Bug]: Error executing method load_model. This might cause deadlock in distributed execution.
github.com/vllm-project/vllm - userandpass opened this issue 5 months ago
[CI/Build] Codespell ignore `build/` directory
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Bugfix][Kernel] Add head size check for attention backend selection
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Feature]: Support for MiniCPM-Llama3-V-2_5 the Multi-modal LLM
github.com/vllm-project/vllm - wizd opened this issue 5 months ago
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support)
github.com/vllm-project/vllm - afeldman-nm opened this pull request 5 months ago
[Build/CI] set CMAKE_BUILD_TYPE=release on wheel build
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
[Bug]: run vllm api with docker
github.com/vllm-project/vllm - shudct opened this issue 5 months ago
[Misc] Support HF Hub remote loading for LoRA adapters
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Usage]: How to reload model when tensor_parallel_size > 1 ?
github.com/vllm-project/vllm - qy1026 opened this issue 5 months ago
text_generation_router::infer: router/src/infer.rs:130: no permits available
github.com/vllm-project/vllm - Ling-CF opened this issue 5 months ago
[Bugfix] Fix flag name for `max_seq_len_to_capture`
github.com/vllm-project/vllm - kerthcet opened this pull request 5 months ago
[Bug]: speculative decoding got `shape mismatch` error with n>1 and random sample
github.com/vllm-project/vllm - zxdvd opened this issue 5 months ago
[Performance] [Speculative decoding] Support draft model on different tensor-parallel size than target model
github.com/vllm-project/vllm - GeauxEric opened this pull request 5 months ago
[Frontend] Add prompt token ids into logit processor
github.com/vllm-project/vllm - xz-liu opened this pull request 5 months ago
[Misc] Minor change to improve FP8 throughput by 2%
github.com/vllm-project/vllm - elfiegg opened this pull request 5 months ago
[Model] add rope_scaling support for qwen2
github.com/vllm-project/vllm - hzhwcmhf opened this pull request 5 months ago
[Feature]: Add prompt token ids into logit processor?
github.com/vllm-project/vllm - xz-liu opened this issue 5 months ago
[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer
github.com/vllm-project/vllm - divakar-amd opened this pull request 5 months ago
[RFC]: Postmerge performance suite
github.com/vllm-project/vllm - simon-mo opened this issue 5 months ago
[Docs] Add acknowledgment for sponsors
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Bug]: Embedding model not working with tensor parallel
github.com/vllm-project/vllm - Vincent-Li-9701 opened this issue 5 months ago
[Build/CI] Switching to ROCm v. 6.1 in Dockerfile.rocm
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 5 months ago