Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Misc] Use scalar type to dispatch to different `gptq_marlin` kernels
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 2 months ago
[Feature]: DeepSeek-Coder-V2-Instruct-FP8 on 8xA100
github.com/vllm-project/vllm - halexan opened this issue 2 months ago
Add bitsandbytes fp4 support
github.com/vllm-project/vllm - thesues opened this pull request 2 months ago
[Kernel] Flashinfer correctness fix for v0.1.3
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 2 months ago
[Feature]: Testing - Use `torch.testing.assert_close` instead of `torch.allclose` as a Recommended Practice
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Bugfix][Kernel] Increased atol to fix failing tests
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Usage]: how to save sharded state?
github.com/vllm-project/vllm - aldwnesx opened this issue 2 months ago
[Bug]: vllm hangs after model download / load
github.com/vllm-project/vllm - ArtificialEU opened this issue 2 months ago
[Core] Use Appropriate `torch.dtype` for FP8 KV Cache
github.com/vllm-project/vllm - jon-chuang opened this pull request 2 months ago
[Usage]: Acceptance rate for Speculative Decoding
github.com/vllm-project/vllm - itsdaniele opened this issue 2 months ago
[Bug]: EfficientQAT GPTQ Does load but does not output through api
github.com/vllm-project/vllm - derpyhue opened this issue 2 months ago
[Feature]: For Meta-Llama-3.1-70B-Instruct model, no usage info included while stream equal to True
github.com/vllm-project/vllm - nikhilcms opened this issue 2 months ago
[CI/Build] Dockerfile.cpu improvements
github.com/vllm-project/vllm - dtrifiro opened this pull request 2 months ago
[Bug]: vllm hangs after upgrade to v0.5.4
github.com/vllm-project/vllm - tonyaw opened this issue 2 months ago
[Usage]: When I installed vllm version 0.5.3.post1, there was a problem deploying qwen2
github.com/vllm-project/vllm - Uhao-P opened this issue 2 months ago
[Bug]: ValueError: BitAndBytes with enforce_eager = False is not supported yet.
github.com/vllm-project/vllm - XCYXHL opened this issue 2 months ago
[CI/Build] Allow building for CUDA compute capability 8.7
github.com/vllm-project/vllm - hacker1024 opened this pull request 2 months ago
[Bugfix] Fix LoRA with PP
github.com/vllm-project/vllm - andoorve opened this pull request 2 months ago
[Bug]: Provided example for loading GGUF model is not working
github.com/vllm-project/vllm - sarthakd112 opened this issue 2 months ago
[Bug]: "500 Internal Server Error" after upgrade to v0.5.4
github.com/vllm-project/vllm - tonyaw opened this issue 2 months ago
[Feature]: Beam Search also requires diversity
github.com/vllm-project/vllm - Jack-mi opened this issue 2 months ago
[Frontend] remove max_num_batched_tokens limit for lora
github.com/vllm-project/vllm - NiuBlibing opened this pull request 2 months ago
[Usage]: add multiple lora in docker
github.com/vllm-project/vllm - chintanshrinath opened this issue 2 months ago
[Kernel] Fix Flashinfer Correctness
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 2 months ago
[Bug]: Empty prompt kills vllm server (AsyncEngineDeadError: Background loop is stopped.)
github.com/vllm-project/vllm - shimizust opened this issue 2 months ago
[Misc] Update `gptq_marlin` to use new vLLMParameters
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[ Bugfix ] Fix Prometheus Metrics With `zeromq` Frontend
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 2 months ago
[Misc] `compressed-tensors` code reuse
github.com/vllm-project/vllm - kylesayrs opened this pull request 2 months ago
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Bugfix] Fix new Llama3.1 GGUF model loading
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[New Model]: Is MiniCPM-V-2_6 supported?
github.com/vllm-project/vllm - det-tu opened this issue 2 months ago
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer
github.com/vllm-project/vllm - ronensc opened this pull request 2 months ago
[Feature]: Custom Embedding function for vector databases (Chroma & Quadrant)
github.com/vllm-project/vllm - S-M-Ammar opened this issue 2 months ago
[Feature]: support longer max_num_batched_tokens for lora
github.com/vllm-project/vllm - NiuBlibing opened this issue 2 months ago
[Core] Support serving encoder/decoder models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 2 months ago
[Performance] [Speculative decoding]: Replace scoring spec tokens via batched 1-step generation by n-step prefill
github.com/vllm-project/vllm - sergeykochetkov opened this pull request 2 months ago
[Bug]: Lora is incompatible with distributed pipeline parallelism
github.com/vllm-project/vllm - CNTRYROA opened this issue 2 months ago
[OpenVINO] migrate to latest dependencies versions
github.com/vllm-project/vllm - ilya-lavrenov opened this pull request 2 months ago
[mypy] Enable following imports for entrypoints
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[RFC]: Initial support for RBLN NPU
github.com/vllm-project/vllm - rebel-jonghewk opened this issue 2 months ago
[Bug]: ngc24.05 "RuntimeError: Cannot re-initialize CUDA in forked subprocess."
github.com/vllm-project/vllm - LSC527 opened this issue 2 months ago
[SpecDecode][Kernel] Use Flashinfer for Rejection Sampling in Speculative Decoding
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 2 months ago
[Usage]: Increase the maximum number of running reqs, which now seems to default to 100
github.com/vllm-project/vllm - Vodkazy opened this issue 2 months ago
[Bug]: The new version (v0.5.4) cannot load the gptq model, but the old version (vllm=0.5.3.post1) can do it.
github.com/vllm-project/vllm - ningwebbeginner opened this issue 2 months ago
[Bug]: Failure to instantiate mixtral 8x7b model requires restarting script
github.com/vllm-project/vllm - gnpinkert opened this issue 2 months ago
[CI/Build][ROCm] Enabling tensorizer tests for ROCm
github.com/vllm-project/vllm - alexeykondrat opened this pull request 2 months ago
[New Model]: LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
github.com/vllm-project/vllm - shing100 opened this issue 2 months ago
[wip][misc] register custom op for flash attention
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Kernel] Replaced `blockReduce[...]` functions with `cub::BlockReduce`
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Model] Add multi-image input support for LLaVA-Next offline inference
github.com/vllm-project/vllm - zifeitong opened this pull request 2 months ago
[mypy] Enable mypy type checking for `vllm/core`
github.com/vllm-project/vllm - jberkhahn opened this pull request 2 months ago
[Bug]: llama 3.1 70b vs mistral-large generation speed
github.com/vllm-project/vllm - ccdv-ai opened this issue 2 months ago
[Core] Shut down aDAG workers with clean async llm engine exit
github.com/vllm-project/vllm - ruisearch42 opened this pull request 2 months ago
[Bug]: Unhandled tool invocations with Llama 3.1 using LangChain and OpenAI-compatible API
github.com/vllm-project/vllm - dolanp83 opened this issue 2 months ago
[Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 2 months ago
[Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm
github.com/vllm-project/vllm - charlifu opened this pull request 2 months ago
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel
github.com/vllm-project/vllm - jon-chuang opened this pull request 2 months ago
[Installation]: running vllm in the rocm docker raised: RuntimeError: No HIP GPUs are available
github.com/vllm-project/vllm - hnhyzz opened this issue 2 months ago
[Bug]: GPTQ Marlin with cpu-offload-gb fails on `0.5.4`
github.com/vllm-project/vllm - w013nad opened this issue 2 months ago
[Usage]: Set pipeline-parallel-size to 8 but only 1 or 2 GPUs running at the same time.
github.com/vllm-project/vllm - ybdesire opened this issue 2 months ago
[Bug]: NCCL gives an error when I use tensor_parallel :RuntimeError: NCCL error: invalid usage
github.com/vllm-project/vllm - wlll123456 opened this issue 2 months ago
[Bug]: loading fp16 model as fp8 quantized caused OOM
github.com/vllm-project/vllm - AlphaINF opened this issue 2 months ago
[Model][LoRA]LoRA support added for MiniCPMV2.5
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Bug]: ZMQError: Address already in use (addr='tcp://127.0.0.1:5570')
github.com/vllm-project/vllm - WMeng1 opened this issue 2 months ago
[Feature]: Adjust max_model_len based on viable KV space
github.com/vllm-project/vllm - mscheong01 opened this issue 2 months ago
[Bug]: Incomplete tool calling response for pipeline-parallel vllm with ray
github.com/vllm-project/vllm - sfbemerk opened this issue 2 months ago
[Bug]: flash_attn_varlen_func() got an unexpected keyword argument 'softcap'
github.com/vllm-project/vllm - cjfcsjt opened this issue 2 months ago
Updating LM Format Enforcer version to v0.10.6
github.com/vllm-project/vllm - noamgat opened this pull request 2 months ago
[Model] Add AWQ quantization support for InternVL2 model
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[VLM][Model] TP support for ViTs
github.com/vllm-project/vllm - ChristopherCho opened this pull request 3 months ago
[Misc] Update dockerfile for CPU to cover protobuf installation
github.com/vllm-project/vllm - PHILO-HE opened this pull request 3 months ago
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 3 months ago
[Bug]: Vllm crashes with asyncio.exceptions.CancelledError
github.com/vllm-project/vllm - drawnwren opened this issue 3 months ago
Suri vllm cpchung
github.com/vllm-project/vllm - chakpongchung opened this pull request 3 months ago
[Models] Add remaining model PP support
github.com/vllm-project/vllm - andoorve opened this pull request 3 months ago
[Bug]: vLLM latest version on Inf2 fails
github.com/vllm-project/vllm - ratnopamc opened this issue 3 months ago
[Hardware][AMD] Update rocm base image and add openai server entrypoint
github.com/vllm-project/vllm - dgoupil opened this pull request 3 months ago
[Performance] Optimize e2e overheads: Reduce python allocations
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago
[Bug]: vllm is crashed on v0.5.3.post1
github.com/vllm-project/vllm - tonyaw opened this issue 3 months ago
[Bugfix][Frontend] Enable tools
github.com/vllm-project/vllm - tomeras91 opened this pull request 3 months ago
[Usage]: assert error: assert self._num_computed_tokens <= self.get_len()
github.com/vllm-project/vllm - Mr-Rosan opened this issue 3 months ago
[Model] Support SigLIP encoder and alternative decoders for LLaVA models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Bug]: After updating to vllm 0.5.3.post1, "Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method."
github.com/vllm-project/vllm - Jerry-jwz opened this issue 3 months ago
[Core] Use flashinfer sampling kernel when available
github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago
[Bug]: No response on a single GPU (model fails to load with tensor_parallel_size=1)
github.com/vllm-project/vllm - efficentdet opened this issue 3 months ago
[Frontend] Support embeddings in the run_batch API
github.com/vllm-project/vllm - pooyadavoodi opened this pull request 3 months ago
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Feature]: How to run the int4 quantized version of the gemma2-27b model
github.com/vllm-project/vllm - maxin9966 opened this issue 3 months ago
[RFC]: Model architecture plugins
github.com/vllm-project/vllm - NadavShmayo opened this issue 3 months ago
[Model] Add multi-image support for minicpmv offline inference
github.com/vllm-project/vllm - HwwwwwwwH opened this pull request 3 months ago