Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm
- Host: opensource
- Code: https://github.com/vllm-project/vllm
[Usage]: vllm Docker: OSError: Incorrect path_or_model_id
github.com/vllm-project/vllm - brian-you98 opened this issue about 1 month ago
[Bug]: num_scheduler_steps > 1, n > 1 raise error
github.com/vllm-project/vllm - efsotr opened this issue about 1 month ago
[tpu][misc] fix typo
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[Feature]: Dockerfile.cpu for aarch64
github.com/vllm-project/vllm - khayamgondal opened this issue about 1 month ago
[Bugfix] Fix broken OpenAI tensorizer test
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Usage]: Single-node multi-GPU inference
github.com/vllm-project/vllm - zhentingqi opened this issue about 1 month ago
[Bugfix] Fix Hermes tool call chat template bug
github.com/vllm-project/vllm - K-Mistele opened this pull request about 1 month ago
[Bugfix] Fix LongRoPE bug
github.com/vllm-project/vllm - garg-amit opened this pull request about 1 month ago
[not-for-review] test PR multi py ver
github.com/vllm-project/vllm - khluu opened this pull request about 1 month ago
[Frontend][Core] Move guided decoding params into sampling params
github.com/vllm-project/vllm - joerunde opened this pull request about 1 month ago
[Bugfix][Frontend] Update all fastapi requests based on OpenAPIBase with annotations
github.com/vllm-project/vllm - drikster80 opened this pull request about 1 month ago
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv
github.com/vllm-project/vllm - zifeitong opened this pull request about 1 month ago
[Bug]: Tensorizer test is broken
github.com/vllm-project/vllm - alexeykondrat opened this issue about 1 month ago
[Kernel] [Triton] Memory optimization for awq_gemm and awq_dequantize, 2x throughput
github.com/vllm-project/vllm - rasmith opened this pull request about 1 month ago
[Model] Support multiple images for qwen-vl
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request about 1 month ago
Memory optimization for awq_gemm and awq_dequantize, 2x throughput
github.com/vllm-project/vllm - rasmith opened this pull request about 1 month ago
[Kernel] Build flash-attn from source
github.com/vllm-project/vllm - ProExpertProg opened this pull request about 1 month ago
[Performance]: Using vLLM for Llama3.1 405b fp8 on 8xH100 yields poor throughput
github.com/vllm-project/vllm - jorgeantonio21 opened this issue about 1 month ago
[Installation]: NotImplementedError get_device_capability
github.com/vllm-project/vllm - joestein-ssc opened this issue about 1 month ago
[Bug]: GPU Memory Utilization Lower Than Expected with --enable-prefix-caching
github.com/vllm-project/vllm - hxer7963 opened this issue about 1 month ago
Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py)
github.com/vllm-project/vllm - wschin opened this pull request about 1 month ago
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching
github.com/vllm-project/vllm - llsj14 opened this pull request about 1 month ago
[Usage]: How to determine the batch size for batch offline inference?
github.com/vllm-project/vllm - pspdada opened this issue about 1 month ago
[Model] Multi-input support for LLaVA and fix embedding inputs for multi-image models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Bug]: requests with response_format cause vllm to hang with pipeline parallel
github.com/vllm-project/vllm - rymc opened this issue about 1 month ago
[Feature]: Add multi-image input support for LLaVA offline inference (similar to #7230)
github.com/vllm-project/vllm - yinsong1986 opened this issue about 1 month ago
[Bug]: Missing TextTokenPrompts class
github.com/vllm-project/vllm - shubh9m opened this issue about 1 month ago
[BugFix] Fix metrics error for --num-scheduler-steps > 1
github.com/vllm-project/vllm - yuleil opened this pull request about 1 month ago
[Bug]: Metrics error for --num-scheduler-steps > 1
github.com/vllm-project/vllm - yuleil opened this issue about 1 month ago
[New Model]: When will the MiniCPM3ForCausalLM (MiniCPM3-4B) model be supported?
github.com/vllm-project/vllm - ML-GCN opened this issue about 1 month ago
[Misc]: Throughput calculation in benchmark_throughput.py
github.com/vllm-project/vllm - Andy0422 opened this issue about 1 month ago
[Bug]: vLLM 0.5.5 using prefix caching causing CUDA error: illegal memory access
github.com/vllm-project/vllm - Sekri0 opened this issue about 1 month ago
[Misc]: kvcache hash collision
github.com/vllm-project/vllm - WangErXiao opened this issue about 1 month ago
[Performance]: Clarification on Base Model Inference Count with Multiple LoRA Models in vLLM Deployment
github.com/vllm-project/vllm - zhangyuqi-1 opened this issue about 1 month ago
[Bug]: In the case of quantization=compressed-tensors, Qwen2-57B-A14B-Instruct is not supported.
github.com/vllm-project/vllm - liangshaopeng opened this issue about 1 month ago
[Bug]: sm75 --num-scheduler-steps 8, unhandled errors in a TaskGroup
github.com/vllm-project/vllm - maxin9966 opened this issue about 1 month ago
[Usage]: FP8 and INT8
github.com/vllm-project/vllm - chenchunhui97 opened this issue about 1 month ago
[Spec Decode] Move ops.advance_step to flash attn advance_step
github.com/vllm-project/vllm - kevin314 opened this pull request about 1 month ago
[Bug]: Poor TTFT performance with simultaneous --enable-chunked-prefill and --enable-prefix-caching
github.com/vllm-project/vllm - hxer7963 opened this issue about 2 months ago
[Installation]: Is there no more "***.whl" based on cuda12?
github.com/vllm-project/vllm - zejunwang1 opened this issue about 2 months ago
[Installation]: Dockerfile for aarch64
github.com/vllm-project/vllm - khayamgondal opened this issue about 2 months ago
[Misc] Remove `SqueezeLLM`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Bug]: vLLM v0.6.0 Instability issue. "ValueError: max() arg is an empty sequence" under load.
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Feature]: Supporting Guided Decoding via AsyncLLMEngine
github.com/vllm-project/vllm - DhruvaBansal00 opened this issue about 2 months ago
[Misc] Fused MoE Marlin support for GPTQ
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[BugFix] Fix Granite model configuration
github.com/vllm-project/vllm - njhill opened this pull request about 2 months ago
[Usage]: How can I perform multi-image inference with the MiniCPM-V-2_6 model (or any vision-language model) in vLLM?
github.com/vllm-project/vllm - dahwin opened this issue about 2 months ago
[Misc] Add GPTQ Marlin Fused MoE Support
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
Add VLLM_LOGGING_INTERVAL_SEC envvar to control logging rate
github.com/vllm-project/vllm - mgoin opened this pull request about 2 months ago
[Bug]: FastAPI 0.113.0 breaks vLLM OpenAPI
github.com/vllm-project/vllm - drikster80 opened this issue about 2 months ago
[Misc] Upgrade vllm-flash-attn to v2.6.2
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Usage]: Is there an argument to adjust the interval time for logs displayed on the terminal?
github.com/vllm-project/vllm - KimMinSang96 opened this issue about 2 months ago
Fix shutdown problem
github.com/vllm-project/vllm - Bye-legumes opened this pull request about 2 months ago
[Bug]: Shutdown problem when we use ADAG
github.com/vllm-project/vllm - Bye-legumes opened this issue about 2 months ago
[Model] Adding Granite MoE.
github.com/vllm-project/vllm - shawntan opened this pull request about 2 months ago
[Misc]: benchmark_serving with image input
github.com/vllm-project/vllm - Mrxiangli opened this issue about 2 months ago
[Bug]: [Errno 98] error while attempting to bind on address ('0.0.0.0', 8000): address already in use
github.com/vllm-project/vllm - youkaichao opened this issue about 2 months ago
[CI/Build] Increasing timeout for multiproc worker tests
github.com/vllm-project/vllm - alexeykondrat opened this pull request about 2 months ago
[Model][VLM] Support multi-images inputs for InternVL2 models
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Core] *Prompt* logprobs support in Multi-step
github.com/vllm-project/vllm - afeldman-nm opened this pull request about 2 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError
github.com/vllm-project/vllm - NicolasDrapier opened this issue about 2 months ago
[Feature]: `benchmark_serving.py` should support `--logprobs`
github.com/vllm-project/vllm - afeldman-nm opened this issue about 2 months ago
[OpenVINO] Enable GPU support for OpenVINO vLLM backend
github.com/vllm-project/vllm - sshlyapn opened this pull request about 2 months ago
[Frontend] Add --logprobs argument to `benchmark_serving.py`
github.com/vllm-project/vllm - afeldman-nm opened this pull request about 2 months ago
[Usage]: how to release cuda memory
github.com/vllm-project/vllm - UCC-team opened this issue about 2 months ago
[Misc]: HELPPP! Implement vLLM Library in FastAPI using MultiGPUS got Force Shutdown after some warning
github.com/vllm-project/vllm - hanifabd opened this issue about 2 months ago
[Bug]: Phi-3.5-MoE-Instruct on vLLM produces weird strings
github.com/vllm-project/vllm - chiwanpark opened this issue about 2 months ago
[Usage]: number of allocated GPU blocks depending on max_seq_length ??
github.com/vllm-project/vllm - vpellegrain opened this issue about 2 months ago
[Bug]: (OOM) Find two places that cause a significant increase in GPU memory usage (probably lead to memory leak)
github.com/vllm-project/vllm - cafeii opened this issue about 2 months ago
[Bug]: AssertionError: Logits Processors are not supported in multi-step decoding
github.com/vllm-project/vllm - Quang-elec44 opened this issue about 2 months ago
[Doc] Add multi-image input example and update supported models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Usage]: "RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`" when serving w8a8
github.com/vllm-project/vllm - xyionwu opened this issue about 2 months ago
[Bug]: Frequent Errors:async_llm_engine.py:158] Aborted request
github.com/vllm-project/vllm - TangJiakai opened this issue about 2 months ago
[Bug]: In v0.6.0 and above, Some of monitoring metrics are not correct.
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Bug]: watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - DreamGenX opened this issue about 2 months ago
[Performance]: reproducing vLLM performance benchmark
github.com/vllm-project/vllm - KuntaiDu opened this issue about 2 months ago
[Benchmark] Add block_size option to benchmark_throughput.py
github.com/vllm-project/vllm - liangfu opened this pull request about 2 months ago
[Installation]: error: can't copy 'build/lib.linux-x86_64-3.10/vllm/_core_C.abi3.so': doesn't exist or not a regular file
github.com/vllm-project/vllm - DreamerZhang11 opened this issue about 2 months ago
[Core/Bugfix] Add query dtype as per FlashInfer API requirements.
github.com/vllm-project/vllm - elfiegg opened this pull request about 2 months ago
[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[New Model]: Support for allenai/OLMoE-1B-7B-0924
github.com/vllm-project/vllm - GulatiAditya opened this issue about 2 months ago
[bugfix] Upgrade minimum OpenAI version
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[Model] Allow loading from original Mistral format
github.com/vllm-project/vllm - patrickvonplaten opened this pull request about 2 months ago
[Usage]: KV cache memory utilization tracking
github.com/vllm-project/vllm - shubh9m opened this issue about 2 months ago
Bump version to v0.6.0
github.com/vllm-project/vllm - simon-mo opened this pull request about 2 months ago
Move verify_marlin_supported to GPTQMarlinLinearMethod
github.com/vllm-project/vllm - mgoin opened this pull request about 2 months ago
[MISC] Replace input token throughput with total token throughput
github.com/vllm-project/vllm - comaniac opened this pull request about 2 months ago
[CI] Change test input in Gemma LoRA test
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Misc] remove peft as dependency for prompt models
github.com/vllm-project/vllm - prashantgupta24 opened this pull request about 2 months ago
[Doc] [Misc] Create CODE_OF_CONDUCT.md
github.com/vllm-project/vllm - mmcelaney opened this pull request about 2 months ago
[ci] Mark LoRA test as soft-fail
github.com/vllm-project/vllm - khluu opened this pull request about 2 months ago
[Feature]: Allow partial context in speculative decoding when using draft models with smaller context than target model
github.com/vllm-project/vllm - dsingal0 opened this issue about 2 months ago
[Bug]: vllm async engine can not use adag
github.com/vllm-project/vllm - Bye-legumes opened this issue about 2 months ago
[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Misc]: Use of response_format and guided output in LLMEngine and AsyncLLMEngine
github.com/vllm-project/vllm - ingambe opened this issue about 2 months ago
[Bugfix] Fix missing `post_layernorm` in CLIP
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Usage]: How to use vllm infer video with Internvl2 8b multimodal model
github.com/vllm-project/vllm - PancakeAwesome opened this issue about 2 months ago