Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[bugfix][distributed] fix multi-node bug for shared memory
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Core] Modulize prepare input and attention metadata builder
github.com/vllm-project/vllm - comaniac opened this pull request 3 months ago
[Feature]: Implementation of Sliding Window Attention for Full Context Support with Gemma-2
github.com/vllm-project/vllm - Motorratte opened this issue 3 months ago
[Frontend] Kill the server on engine death
github.com/vllm-project/vllm - joerunde opened this pull request 3 months ago
[ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 3 months ago
Add FP8 quantization `ignored_layers` support in llama
github.com/vllm-project/vllm - cli99 opened this pull request 3 months ago
[Bug]: Intel GPU Test failing in CI
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
Fp8 dyn per tok
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bugfix][Core] Output sampling: heuristic to choose between candidates
github.com/vllm-project/vllm - NihalPotdar opened this pull request 3 months ago
[Bugfix][Core]: Guard for KeyErrors that can occur if a request is aborted with Pipeline Parallel
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 3 months ago
[Bug]: Crash possible with Pipeline Parallel when aborting requests
github.com/vllm-project/vllm - tjohnson31415 opened this issue 3 months ago
[Feature]: LLM2Vec (Fine-Tuned Embeddings) Support
github.com/vllm-project/vllm - DorotheaMueller opened this issue 3 months ago
[Bugfix] cuda: handle case visible devices is a MIG or GPU uuid
github.com/vllm-project/vllm - cfhammill opened this pull request 3 months ago
[Bugfix][Frontend] remove duplicate init logger
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Docs] Update docs for wheel location
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
[Misc] Fix input_scale typing in w8a8_utils.py
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Bugfix] [SpecDecode] AsyncMetricsCollector: update time since last collection
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Bug]: SpecDecode AsyncMetricsCollector _last_metrics_collect_time is never reset
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
[Usage]: ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error.
github.com/vllm-project/vllm - jueming0312 opened this issue 3 months ago
[Feature]: Support Lora Adapter generated from mistral-finetune
github.com/vllm-project/vllm - tensimixt opened this issue 3 months ago
[Model]: Llava-Next-Video support
github.com/vllm-project/vllm - TKONIY opened this issue 3 months ago
add tqdm when loading checkpoint shards
github.com/vllm-project/vllm - zhaotyer opened this pull request 3 months ago
[Misc] Enhance prefix-caching benchmark tool
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[Bug]: gptq model fails on pascal gpu with long prompt
github.com/vllm-project/vllm - shesung opened this issue 3 months ago
[Core] Support load and unload LoRA in api server
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[Feature]: Any thoughts about MI50 support ?
github.com/vllm-project/vllm - linchen111 opened this issue 3 months ago
[Bug]: Distributed Inference and Serving
github.com/vllm-project/vllm - warlockedward opened this issue 3 months ago
[Bug]: Failed to import from vllm._C with ImportError("/lib64/libc.so.6: version `GLIBC_2.32' not found
github.com/vllm-project/vllm - balcklive opened this issue 3 months ago
[Usage]: Can't utilize all VRAM for context
github.com/vllm-project/vllm - vlsav opened this issue 3 months ago
[Performance]: GPU utilization is low when running large batches on H100
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 3 months ago
[ Misc ] `fbgemm` checkpoints
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: Cannot load fp8 model of internlm2-chat-7b offline
github.com/vllm-project/vllm - EstellaXinyuZhang opened this issue 3 months ago
[Core] Allow specifying custom Executor
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Bug]: vllm doesn't support multi-instance GPU
github.com/vllm-project/vllm - cfhammill opened this issue 3 months ago
[ci][test] add correctness test for cpu offloading
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Model] Support Mistral-Nemo
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[ Kernel ] Enable Dynamic Per Token `fp8`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[CI/Build] bump ruff version, fix linting issues
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Feature]: mistralai/Mistral-Nemo-Instruct-2407 support
github.com/vllm-project/vllm - bjoernpl opened this issue 3 months ago
[Usage]: How to release GPU of vLLM model in python code
github.com/vllm-project/vllm - quanshr opened this issue 3 months ago
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes
github.com/vllm-project/vllm - mawong-amd opened this pull request 3 months ago
[CI/Build] replace yapf with ruff
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Misc] Consolidate and optimize logic for building padded tensors
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Feature]: return Usage info for streaming request for each chunk in ChatCompletion
github.com/vllm-project/vllm - yecohn opened this issue 3 months ago
[Bug]: vllm turned off my pc (loading mixtral8x7b)
github.com/vllm-project/vllm - juanluis17 opened this issue 3 months ago
[Bug]: vllm not support fp8 kv cache when use flashinfer
github.com/vllm-project/vllm - kuangdao opened this issue 3 months ago
[Bugfix] Corrected Typographical Errors from "indicies" to "indices"
github.com/vllm-project/vllm - JHLEE17 opened this pull request 3 months ago
[Core] Reduce unnecessary compute when logprobs=None
github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago
[Bug]: inter-token latency is lower than TPOT in serving benchmark result
github.com/vllm-project/vllm - Jeffwan opened this issue 3 months ago
[doc][distributed] add more doc for setting up multi-node environment
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Misc] Support FP8 kv cache scales from compressed-tensors
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
added bitsandbytes dependency in common requirement.txt file
github.com/vllm-project/vllm - dipatidar opened this pull request 3 months ago
[Misc] Small perf improvements
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Model] Pipeline Parallel Support for DeepSeek v2
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 3 months ago
[Model] Initialize support for InternVL2 series models
github.com/vllm-project/vllm - Isotr0py opened this pull request 3 months ago
FP8 Dynamic-Per-Token Quant
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 3 months ago
[DOC] - Add docker image to Cerebrium Integration
github.com/vllm-project/vllm - milo157 opened this pull request 3 months ago
[Usage]: No chat template provided. Chat API will not work. How do I get vllm to support Codellama-34B in openai format?
github.com/vllm-project/vllm - x0w3n opened this issue 3 months ago
[Feature]: Add OpenAI server `prompt_logprobs` support
github.com/vllm-project/vllm - Theodotus1243 opened this issue 3 months ago
[Bug]: The _get_stats() are called multiple times which cause incorrect metrics collecting in do_log_stats()
github.com/vllm-project/vllm - yejingfu opened this issue 3 months ago
[TPU] Refactor TPU worker & model runner
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Misc] Use `torch.Tensor` for type annotation
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[TPU] Remove multi-modal args in TPU backend
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[New Model]: Support for Telechat
github.com/vllm-project/vllm - hzhaoy opened this issue 3 months ago
[Model] Add Support for GPTQ Fused MOE
github.com/vllm-project/vllm - izhuhaoran opened this pull request 3 months ago
[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash
github.com/vllm-project/vllm - noamgat opened this pull request 3 months ago
deploying embedding model in same way as LLM
github.com/vllm-project/vllm - riyajatar37003 opened this issue 3 months ago
[Installation]: ERROR: Could not find a version that satisfies the requirement numba (from outlines) (from versions: none); ERROR: No matching distribution found for numba; ERROR: Ignored the following versions that require a different python version: 1.6.2, 1.6.3, 1.7.0, 1.7.1 (Requires-Python >=3.7,<3.10)
github.com/vllm-project/vllm - XyLove0223 opened this issue 3 months ago
[core][model] yet another cpu offload implementation
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bugfix] Fix for multinode crash on 4 PP
github.com/vllm-project/vllm - andoorve opened this pull request 3 months ago
[Bug]: The metrics have not improved.
github.com/vllm-project/vllm - zjjznw123 opened this issue 3 months ago
[Not for review]test gemma lora
github.com/vllm-project/vllm - jeejeelee opened this pull request 3 months ago
[misc][distributed] add seed to dummy weights
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[CI/Build] Update flashinfer to v0.0.9 (#6489)
github.com/vllm-project/vllm - 170928 opened this pull request 3 months ago
[Misc] Updated flashinfer to v0.0.9 in the following test scripts:
github.com/vllm-project/vllm - 170928 opened this issue 3 months ago
[misc][distributed] improve tests
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[ Kernel ] Fp8 Channelwise Weight Support
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: No module named `jsonschema.protocols`.
github.com/vllm-project/vllm - eff-kay opened this issue 3 months ago
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models.
github.com/vllm-project/vllm - sroy745 opened this pull request 3 months ago
[Model] Support Mamba
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago
[Not for review] Spmd tp rebase
github.com/vllm-project/vllm - ruisearch42 opened this pull request 3 months ago
[ROCm] Cleanup Dockerfile and remove outdated patch
github.com/vllm-project/vllm - hongxiayang opened this pull request 3 months ago
[New Model]: Codestral Mamba
github.com/vllm-project/vllm - K-Mistele opened this issue 3 months ago
[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2
github.com/vllm-project/vllm - choco9966 opened this issue 3 months ago
[Bug]: Gemma 27B crashes on GCP A100
github.com/vllm-project/vllm - noamgat opened this issue 3 months ago
[Bug]: [vllm-openvino]: ValueError: `use_cache` was set to `True` but the loaded model only supports `use_cache=False`.
github.com/vllm-project/vllm - HPUedCSLearner opened this issue 3 months ago
[Feature]: Pipeline parallelism support for qwen model
github.com/vllm-project/vllm - hiyforever opened this issue 3 months ago
[Usage]: PeftModelForCausalLM is not JSON serializable
github.com/vllm-project/vllm - jazzisfuture opened this issue 3 months ago
[Performance]: [Speculative Decoding] Measurement of Cost Coefficient through vLLM
github.com/vllm-project/vllm - bong-furiosa opened this issue 3 months ago
[Misc][Speculative decoding] Typos and typing fixes
github.com/vllm-project/vllm - ShangmingCai opened this pull request 3 months ago
[Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE)
github.com/vllm-project/vllm - weiminw opened this issue 3 months ago
unable to run vllm model deployment
github.com/vllm-project/vllm - riyajatar37003 opened this issue 3 months ago
[Bugfix][Frontend] Fix missing `/metrics` endpoint
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago