Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[CI/Build] Add TP test for vision models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bugfix] Fix img_sizes Parsing in Phi3-Vision
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Feature]: Request for SmartSpec Method Support
github.com/vllm-project/vllm - bong-furiosa opened this issue 4 months ago
feat: controlling max queue time
github.com/vllm-project/vllm - KrishnaM251 opened this pull request 4 months ago
[Bug]: Exception in ASGI application
github.com/vllm-project/vllm - houshuai-cs opened this issue 4 months ago
[Misc] Refactor linear layer weight loading; introduce `BasevLLMParameter` and `weight_loader_v2`
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bug]: vllm stuck when using prompt_token_ids and setting prompt_logprobs
github.com/vllm-project/vllm - xinyangz opened this issue 4 months ago
[Hardware][TPU] Implement tensor parallelism with Ray
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Bug]: LLaVa Next Value Error - "Incorrect type of image sizes" when running in Docker
github.com/vllm-project/vllm - FennFlyer opened this issue 4 months ago
[Bug]: A bug when running examples/llava_example.py with image_features as input and multiple GPUs enabled
github.com/vllm-project/vllm - yinsong1986 opened this issue 4 months ago
[Feature]: support logging input and output
github.com/vllm-project/vllm - NiuBlibing opened this issue 4 months ago
add benchmark for fixed-length input and output
github.com/vllm-project/vllm - haichuan1221 opened this pull request 4 months ago
[WIP] [Speculative Decoding] Support draft model on different tensor-parallel size than target model (Extended)
github.com/vllm-project/vllm - wooyeonlee0 opened this pull request 4 months ago
[Hardware][TPU] Support parallel sampling
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Misc]: CUDAGraph captured generation stuck with custom_all_reduce and tensor_parallel=2
github.com/vllm-project/vllm - nuzant opened this issue 4 months ago
[Bug]: server error when hosting TheBloke/Llama-2-7B-Chat-GPTQ with chunked-prefill
github.com/vllm-project/vllm - George-ao opened this issue 4 months ago
[VLM] Remove `image_input_type` from VLM config
github.com/vllm-project/vllm - xwjiang2010 opened this pull request 4 months ago
[VLM] Remove support for pixel_values and image_features.
github.com/vllm-project/vllm - xwjiang2010 opened this pull request 4 months ago
[Hardware][TPU] Raise errors for unsupported sampling params
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[BugFix] Fix `min_tokens` behaviour for multiple eos tokens
github.com/vllm-project/vllm - njhill opened this pull request 4 months ago
[CI] [Flaky test] distributed/test_shm_broadcast.py is flaky
github.com/vllm-project/vllm - cadedaniel opened this issue 4 months ago
[Bug]: OutOfMemoryError when loading a small model with a huge context length
github.com/vllm-project/vllm - alugowski opened this issue 4 months ago
[Bugfix] Fix off-by-one bug in decode_prompt_logprobs_inplace()
github.com/vllm-project/vllm - zifeitong opened this pull request 4 months ago
[CI/Build] This PR is an experiment only. No need to merge it.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 4 months ago
[Bugfix] Add await to async_get_and_parse_image to ensure image is properly uploaded.
github.com/vllm-project/vllm - thealmightygrant opened this pull request 4 months ago
[Bug]: VLM same chat different image results in serving_chat.py:238 Error in loading image data:
github.com/vllm-project/vllm - thealmightygrant opened this issue 4 months ago
[Bugfix] Fix assertion in NeuronExecutor
github.com/vllm-project/vllm - aws-patlange opened this pull request 4 months ago
[Bug]: Neuron offline inference example assertion error
github.com/vllm-project/vllm - aws-patlange opened this issue 4 months ago
[ CI/Build ] Added E2E Test For Compressed Tensors
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[ CI/Build ] LM Eval Harness Based CI Testing
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
feat: passing hf_config args through openai server
github.com/vllm-project/vllm - KrishnaM251 opened this pull request 4 months ago
[Feature]: Support in distributed speculative inference
github.com/vllm-project/vllm - keyboardAnt opened this issue 4 months ago
[doc][distributed] add both gloo and nccl tests
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bug]: Inconsistent Output from OPT-x models
github.com/vllm-project/vllm - NihalPotdar opened this issue 4 months ago
[Misc][Doc] Add Example of using OpenAI Server with VLM
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Hardware][TPU] Refactor TPU backend
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Bugfix] Fix embedding to support 2D inputs
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Bug]: unhandled system error with NCCL on v0.5.0.post1
github.com/vllm-project/vllm - eByteTheDust opened this issue 4 months ago
[Bug]: Internal Server Error when hosting Alibaba-NLP/gte-Qwen2-7B-instruct
github.com/vllm-project/vllm - markkofler opened this issue 4 months ago
[Bug]: Model architectures ['NVEmbedModel'] are not supported for now
github.com/vllm-project/vllm - markkofler opened this issue 4 months ago
[RFC]: Classifier-Free Guidance
github.com/vllm-project/vllm - Vermeille opened this issue 4 months ago
[Misc] Remove useless code in cpu_worker
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[Bug]: AsyncEngineDeadError: Background loop is stopped after invalid parameter in request
github.com/vllm-project/vllm - guillaumerenault opened this issue 4 months ago
[CI/Build] Refactor image test assets
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Model] Initialize deepseek-vl support
github.com/vllm-project/vllm - liuyancong-enflame-tech opened this pull request 4 months ago
[RFC] Changes to CI workflow for PRs
github.com/vllm-project/vllm - khluu opened this issue 4 months ago
[Bug]: Test_skip_speculation fails in distributed execution
github.com/vllm-project/vllm - wooyeonlee0 opened this issue 4 months ago
[distributed][kernel]support tensor-parallelism in bitsandbytes quant…
github.com/vllm-project/vllm - chenqianfzh opened this pull request 4 months ago
[distributed][models] add out-of-tree model registration support for distributed inference
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Feature]: MLPSpeculator Tensor Parallel support
github.com/vllm-project/vllm - njhill opened this issue 4 months ago
[Model]: Adding support for MiniCPM-Llama3-V-2_5
github.com/vllm-project/vllm - ssuncheol opened this issue 4 months ago
[Roadmap] vLLM Roadmap Q3 2024
github.com/vllm-project/vllm - simon-mo opened this issue 4 months ago
[Usage]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
github.com/vllm-project/vllm - mikestut opened this issue 4 months ago
[Misc]: vLLM logger disables other existing loggers by default
github.com/vllm-project/vllm - a-ys opened this issue 4 months ago
[bugfix][distributed] fix shm broadcast when the queue size is full
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Feature]: Integration Testing with lm-eval-harness
github.com/vllm-project/vllm - simon-mo opened this issue 4 months ago
[Spec Decode] Introduce DraftModelRunner
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
[CI/Build] Add unit testing for FlexibleArgumentParser
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Misc] Update `w4a16` `compressed-tensors` support to include `w8a16`
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bug]: Different quality responses using GPTQ / marlin kernels on A10 vs A100 GPUs
github.com/vllm-project/vllm - joe-schwartz-certara opened this issue 4 months ago
[CI/Build] Add E2E tests for MLPSpeculator
github.com/vllm-project/vllm - tdoublep opened this pull request 4 months ago
[Bug]: please set tensor_parallel_size to less than max local gpu count
github.com/vllm-project/vllm - RodriMora opened this issue 4 months ago
[Usage]: About the use of benchmark_latency.py
github.com/vllm-project/vllm - xwentian2020 opened this issue 4 months ago
[Bug]: vLLM 0.4.2 8xH100 init failed
github.com/vllm-project/vllm - xiejibing opened this issue 4 months ago
[doc][faq] add warning to download models for every node
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bug]: When using multi-node offline distributed inference, VLLM gets stuck at the last few samples of inference
github.com/vllm-project/vllm - LuJunru opened this issue 4 months ago
[Feature]: Use 64-bit integers as indices in cuda kernels
github.com/vllm-project/vllm - courage17340 opened this issue 4 months ago
[Usage]: Starting a model with the latest vLLM version 0.5.0.post1
github.com/vllm-project/vllm - lxb0425 opened this issue 4 months ago
[Bug]: vLLM-deployed InternLM2 does not stop generating until max model length is reached
github.com/vllm-project/vllm - xiyao23 opened this issue 4 months ago
[Model] add telechat52b
github.com/vllm-project/vllm - shunxing12345 opened this pull request 4 months ago
[RFC]: A Flexible Architecture for Distributed Inference
github.com/vllm-project/vllm - youkaichao opened this issue 4 months ago
[Bugfix] Update run_batch.py to handle larger numbers of batches
github.com/vllm-project/vllm - w013nad opened this pull request 4 months ago
[New Model]: bump a new version of vllm to support Qwen2 series
github.com/vllm-project/vllm - AlphaINF opened this issue 4 months ago
[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement
github.com/vllm-project/vllm - Isotr0py opened this pull request 4 months ago
[Bugfix] Mitigate SSRF vulnerability by adding URL validation
github.com/vllm-project/vllm - gitworkflows opened this pull request 4 months ago
[Model] Initial Support for Chameleon
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bugfix] Code hardening to scales_shard_indexer
github.com/vllm-project/vllm - HaiShaw opened this pull request 4 months ago
[core][distributed] support variable length object in shm broadcast
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bug]: Different Image Size support with Phi-3-Vision and torchvision dependency
github.com/vllm-project/vllm - CatherineSue opened this issue 4 months ago
[ CI ] Enable Distributed Testing
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models
github.com/vllm-project/vllm - sroy745 opened this pull request 4 months ago
[Frontend] `request.tool_choice` is not completely exhausted in non-stream mode
github.com/vllm-project/vllm - MatheMatrix opened this pull request 4 months ago
[Usage]: How to set --max-logprobs to the default length of LLM's vocab_size.
github.com/vllm-project/vllm - fengshansi opened this issue 4 months ago
[Docs][TPU] Add installation tip for TPU
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Bugfix] Fix pin_lora error in TPU executor
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
File "/public/home/huangchensen/miniconda3/envs/pytorch21/lib/python3.9/site-packages/vllm/executor/ray_gpu_executor.py", line 324, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
File "/public/home/huangchensen/miniconda3/envs/pytorch21/lib/python3.9/site-packages/vllm/worker/worker.py", line 100, in init_device
    init_distributed_environment(self.parallel_config, self.rank,
File "/public/home/huangchensen/miniconda3/envs/pytorch21/lib/python3.9/site-packages/vllm/worker/worker.py", line 287, in init_distributed_environment
    pynccl_utils.init_process_group(
File "/public/home/huangchensen/miniconda3/envs/pytorch21/lib/python3.9/site-packages/vllm/model_executor/parallel_utils/pynccl_utils.py", line 46, in init_process_group
    comm = NCCLCommunicator(init_method=init_method,
File "/public/home/huangchensen/miniconda3/envs/pytorch21/lib/python3.9/site-packages/vllm/model_executor/parallel_utils/pynccl.py", line 249, in __init__
    assert result == 0
AssertionError
github.com/vllm-project/vllm - WUHU-G opened this issue 4 months ago
[Installation]: Failed to install the packages at entrypoint
github.com/vllm-project/vllm - xuyifann opened this issue 4 months ago
[Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py
github.com/vllm-project/vllm - zifeitong opened this pull request 4 months ago
[core][distributed] add message queue for cross-node broadcast
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[core][distributed] improve shared memory broadcast
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago