Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bugfix] Fix illegal memory access for lora
github.com/vllm-project/vllm - sfc-gh-zhwang opened this pull request 5 months ago
[Build] Guard against older CUDA versions when building CUTLASS 3.x kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Performance]: What can we learn from OctoAI
github.com/vllm-project/vllm - hmellor opened this issue 5 months ago
[Build] Do not compile cutlass scaled_mm on CUDA 11
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Bugfix] Fix KeyError: 1 When Using LoRA adapters
github.com/vllm-project/vllm - BlackBird-Coding opened this pull request 5 months ago
[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
github.com/vllm-project/vllm - kezouke opened this issue 5 months ago
[Bug]: WSL2 (also Docker): 1 GPU works but 2 do not (--tensor-parallel-size 2)
github.com/vllm-project/vllm - goodmaney opened this issue 5 months ago
[Bug]: Issue with Token Processing Efficiency and Key-Value Cache Utilization in AsyncLLMEngine
github.com/vllm-project/vllm - kezouke opened this issue 5 months ago
[Kernel] Pass a device pointer into the quantize kernel for the scales
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Core] Bump up the default of --gpu_memory_utilization to be more similar to TensorRT Triton's default
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Feature]: vLLM support for function calling in Mistral-7B-Instruct-v0.3
github.com/vllm-project/vllm - javierquin opened this issue 5 months ago
[Feature]: Linear adapter support for Mixtral
github.com/vllm-project/vllm - DhruvaBansal00 opened this issue 5 months ago
[Bug] [spec decode] [flash_attn]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
github.com/vllm-project/vllm - khluu opened this issue 5 months ago
[Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py
github.com/vllm-project/vllm - dashanji opened this pull request 5 months ago
[Misc]: Should inference with temperature 0 generate the same results for a lora adapter and equivalent merged model?
github.com/vllm-project/vllm - rohan-daniscox opened this issue 5 months ago
[Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when Handle inference requests
github.com/vllm-project/vllm - zhaotyer opened this issue 5 months ago
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Kernel] Update Cutlass fp8 configs
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 5 months ago
[Usage]: how should I do data parallelism using vLLM?
github.com/vllm-project/vllm - YuWang916 opened this issue 5 months ago
[Bugfix] Fix KV head calculation for MPT models when using GQA
github.com/vllm-project/vllm - bfontain opened this pull request 5 months ago
[CI/Build] Test buildkite monorepo plugin
github.com/vllm-project/vllm - dgoupil opened this pull request 5 months ago
[Frontend] token_ids is an unused param sent to the logit_bias_logits_processor
github.com/vllm-project/vllm - Etelis opened this pull request 5 months ago
[Core] Remove unnecessary copies in flash attn backend
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Feature][Frontend]: Add support for `stream_options` in `ChatCompletionRequest`
github.com/vllm-project/vllm - Etelis opened this pull request 5 months ago
[Usage]: Do we have any tutorials for using vllm with tensorrt-LLM?
github.com/vllm-project/vllm - weiyunfei opened this issue 5 months ago
[Bug]: nsys cannot track the cuda kernel called by the process except rank 0
github.com/vllm-project/vllm - crazy-JiangDongHua opened this issue 5 months ago
[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier
github.com/vllm-project/vllm - sroy745 opened this pull request 5 months ago
[CI/Build] increase wheel size limit to 200 MB
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Misc] remove duplicate definition of `seq_lens_tensor` in model_runner.py
github.com/vllm-project/vllm - ita9naiwa opened this pull request 5 months ago
[Feature]: How to Enable VLLM to Work with PreTrainedModel Objects in my MOE-LoRA? THX
github.com/vllm-project/vllm - zhaofangtao opened this issue 5 months ago
[Usage]: extractive question answering using VLLM
github.com/vllm-project/vllm - suryavan11 opened this issue 5 months ago
[Doc] Use intersphinx and update entrypoints docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[New Model]: LLaVA-NeXT-Video support
github.com/vllm-project/vllm - AmazDeng opened this issue 5 months ago
Add gptq_marlin test to cover bug report https://github.com/vllm-project/vllm/issues/5088
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
Add gptq_marlin test to cover bug report #5088
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Bugfix] Avoid Warnings in SparseML Activation Quantization
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Bugfix] Automatically Detect SparseML models
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Misc] Simplify code and fix type annotations in `conftest.py`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Usage]: Multiple sampling params with OpenAI library
github.com/vllm-project/vllm - JH-lee95 opened this issue 5 months ago
[Kernel] Add `w4a16` support for `compressed_tensors` models
github.com/vllm-project/vllm - dsikka opened this pull request 5 months ago
[Kernel] Add support for block size 96 to the paged attention kernel.
github.com/vllm-project/vllm - bfontain opened this pull request 5 months ago
[Kernel] CUTLASS epilogue refactor and kernels with quantized outputs
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Bug]: Crash sometimes using LLM entrypoint and LoRA adapters
github.com/vllm-project/vllm - flexorRegev opened this issue 5 months ago
[CI/Build] Docker cleanup functionality for amd servers
github.com/vllm-project/vllm - okakarpa opened this pull request 5 months ago
[Bug]: vLLM embeddings example code doesn't work
github.com/vllm-project/vllm - orionw opened this issue 5 months ago
New CI template on AWS stack
github.com/vllm-project/vllm - khluu opened this pull request 5 months ago
[ibm-granite/granite-8b-code-instruct]: Empty responses on ibm-granite
github.com/vllm-project/vllm - eduardozamudio opened this issue 5 months ago
[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Misc]: Loading microsoft/Phi-3-medium-128k-instruct with vLLM
github.com/vllm-project/vllm - AkshataDM opened this issue 5 months ago
[Bug]: async engine failure when placing multi lora adapter under load
github.com/vllm-project/vllm - DavidPeleg6 opened this issue 5 months ago
[Bug]: cannot clean up memory usage after instantiating the LLM class
github.com/vllm-project/vllm - c3-ali opened this issue 5 months ago
[Doc][Build] update after removing vllm-nccl
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Bug]: [WSL] no response when vllm.entrypoints.openai.api_server run
github.com/vllm-project/vllm - sung-ho-moon opened this issue 5 months ago
[Speculative Decoding] Enable arbitrary model inputs
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 5 months ago
[CI/Build] Simplify OpenAI server setup in tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Core] Avoid the need to pass `None` values to `Sequence.inputs`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Misc] Add vLLM version getter to utils
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bugfix][CI/Build] Fix codespell failing to skip files in `git diff`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bugfix][CI/Build] Fix test and improve code for `merge_async_iterators`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bug]: Can't run vllm distributed inference with vLLM + Ray
github.com/vllm-project/vllm - linchen111 opened this issue 5 months ago
[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors.
github.com/vllm-project/vllm - macheng6 opened this issue 5 months ago
[Feature] vLLM CLI for serving and querying OpenAI compatible server
github.com/vllm-project/vllm - EthanqX opened this pull request 5 months ago
[Bug]: Gemma model fails with GPTQ marlin
github.com/vllm-project/vllm - arunpatala opened this issue 5 months ago
[Installation]: Error when importing LLM from vllm
github.com/vllm-project/vllm - manishkumar0709 opened this issue 5 months ago
[Bug]: vLLM disconnects after running for some time
github.com/vllm-project/vllm - zxcdsa45687 opened this issue 5 months ago
[RFC]: OpenAI Triton-only backend
github.com/vllm-project/vllm - bringlein opened this issue 5 months ago
[Usage]: curl http://localhost:8000/generate returns {"detail":"Not Found"}
github.com/vllm-project/vllm - fishingcatgo opened this issue 5 months ago
[Model] Support MAP-NEO model
github.com/vllm-project/vllm - xingweiqu opened this pull request 5 months ago
[Usage]: quantization option usage
github.com/vllm-project/vllm - Juelianqvq opened this issue 5 months ago
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label
github.com/vllm-project/vllm - KuntaiDu opened this pull request 5 months ago
[Bug]: Build/Install Issues with pip install -e .
github.com/vllm-project/vllm - Msiavashi opened this issue 5 months ago
[Model] Add support for falcon-11B
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Misc] Add a test case for 'microsoft/Phi-3-small-8k-instruct', because special tokens can cause a crash
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Bug]: The VRAM usage of calculating log_probs is not considered in profile run
github.com/vllm-project/vllm - Conless opened this issue 5 months ago
[Feature]: Integration of transformers past_key_values into the vllm kvcache Function
github.com/vllm-project/vllm - ChaoZhou2023 opened this issue 5 months ago
Heterogeneous Speculative Decoding (CPU + GPU)
github.com/vllm-project/vllm - jiqing-feng opened this pull request 5 months ago
[Model] Add Internlm2 LoRA support
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Misc]: How to use guided decoding and regex as well?
github.com/vllm-project/vllm - debraj135 opened this issue 5 months ago
[Bug]: When loading model weights, loading hangs indefinitely
github.com/vllm-project/vllm - tjrlwjd1 opened this issue 5 months ago
[Usage]: no support for mistralai/Mistral-7B-Instruct-v0.3
github.com/vllm-project/vllm - yananchen1989 opened this issue 5 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
github.com/vllm-project/vllm - heungson opened this issue 5 months ago
[Core] Allow AQLM on Pascal
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[Bug]: Cannot build cpu docker image
github.com/vllm-project/vllm - licryle opened this issue 5 months ago
[Feature]: multi-steps model_runner?
github.com/vllm-project/vllm - leiwen83 opened this issue 5 months ago
[Frontend] Add tokenize/detokenize endpoints
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[Bugfix] Adds outlines performance improvement
github.com/vllm-project/vllm - lynkz-matt-psaltis opened this pull request 5 months ago
Running vLLM on a Ray cluster, logging stuck at loading
github.com/vllm-project/vllm - maherr13 opened this issue 5 months ago
[Feature]: Add num_requests_preempted metric
github.com/vllm-project/vllm - sathyanarays opened this issue 5 months ago
Chat method for offline LLM
github.com/vllm-project/vllm - nunjunj opened this pull request 5 months ago
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops
github.com/vllm-project/vllm - bnellnm opened this pull request 5 months ago
Bump version to v0.4.3
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Misc] add logging level env var
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Misc] Make Serving Benchmark More User-friendly
github.com/vllm-project/vllm - ywang96 opened this pull request 5 months ago