Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Installation]: LGPL license in dependencies
github.com/vllm-project/vllm - laurens-gs opened this issue about 2 months ago
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat)
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request about 2 months ago
[Core] Increase default `max_num_batched_tokens` for multimodal models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
ppc64le: Dockerfile fixed, and a script for buildkite
github.com/vllm-project/vllm - sumitd2 opened this pull request about 2 months ago
[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - chenchunhui97 opened this issue about 2 months ago
[Bug]: deploy on V100, mma -> mma layout conversion is only supported on Ampere
github.com/vllm-project/vllm - brosoul opened this issue about 2 months ago
[Bug]: Flashinfer now supports SM75, but VLLM is still encountering errors.
github.com/vllm-project/vllm - maxin9966 opened this issue about 2 months ago
[Usage]: Can and How we start server on multi-node multi-gpu with torchrun?
github.com/vllm-project/vllm - ericxsun opened this issue about 2 months ago
[Bug]: Trailing newline as outputs
github.com/vllm-project/vllm - dawu415 opened this issue about 2 months ago
[Core][Kernel][Misc] Support external swapper for vllm
github.com/vllm-project/vllm - zeroorhero opened this pull request about 2 months ago
[Bug]: InternVL2-2B outputs gibberish with tensor parallel inference
github.com/vllm-project/vllm - Isotr0py opened this issue about 2 months ago
[Bug]: v0.5.5 crash: "AssertionError: expected running sequences"
github.com/vllm-project/vllm - zoltan-fedor opened this issue about 2 months ago
[Bugfix] Address #8009 and add model test for flashinfer fp8 kv cache.
github.com/vllm-project/vllm - pavanimajety opened this pull request about 2 months ago
[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request about 2 months ago
Neuron cache blocks must be 1 more than max num seqs
github.com/vllm-project/vllm - ajayvohra2005 opened this pull request about 2 months ago
[Performance]: INT4 quantisation does not lead to any observable throughput increase
github.com/vllm-project/vllm - captify-sivakhno opened this issue about 2 months ago
[Installation]: vLLM source install on rocm 6.2 still requires libamdhip64.so.6
github.com/vllm-project/vllm - gounley opened this issue about 2 months ago
[Feature]: Proof of Work value ($5,000 Bounty) from Manifold Labs
github.com/vllm-project/vllm - GentikSolm opened this issue about 2 months ago
[RFC]: Build `vllm-flash-attn` from source
github.com/vllm-project/vllm - ProExpertProg opened this issue about 2 months ago
[WIP] Multi Step Chunked Prefill - Prefill Steps
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request about 2 months ago
[Bug]: InternVL2-26B infer error:Attempted to assign 7 x 256 = 1792 multimodal tokens to 506 placeholders
github.com/vllm-project/vllm - SovereignRemedy opened this issue about 2 months ago
[Bug]: gguf file without .gguf extension fails to run, even with "--quantization gguf --load-format gguf" flags
github.com/vllm-project/vllm - ericcurtin opened this issue about 2 months ago
[Bug]: Jamba-1.5-mini doesn't run on A100 with 70GB available memory
github.com/vllm-project/vllm - Tejaswgupta opened this issue about 2 months ago
[Bug]: vllm0.5.5 Ignores VLLM_USE_MODELSCOPE=True and Accesses huggingface.co
github.com/vllm-project/vllm - NaiveYan opened this issue about 2 months ago
[New Model]: LlavaQwen2ForCausalLM
github.com/vllm-project/vllm - Chuyun-Shen opened this issue about 2 months ago
[Usage]: run gguf model need template, how to write?
github.com/vllm-project/vllm - lonngxiang opened this issue about 2 months ago
[Bug]: When enabling LoRA, greedy search got different answers.
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Misc] Update `GPTQ` to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Misc] Update fbgemmfp8 to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Misc] Remove `SqueezeLLM`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
Roberta embedding
github.com/vllm-project/vllm - maxdebayser opened this pull request about 2 months ago
[Bug]: Multistep with n>1 Fails
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue about 2 months ago
[Model] Add Ultravox support for multiple audio chunks
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM
github.com/vllm-project/vllm - sroy745 opened this pull request about 2 months ago
[Bug]: segfault when loading MoE models
github.com/vllm-project/vllm - nivibilla opened this issue about 2 months ago
[Bugfix] Fix incorrect vocal embedding shards for GGUF model in tensor parallelism
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Feature]: Context Caching
github.com/vllm-project/vllm - RonanKMcGovern opened this issue about 2 months ago
[Bug]: Mi300x x8 unable to launch openai/api_server.py on rocm vllm branch.
github.com/vllm-project/vllm - ferrybaltimore opened this issue about 2 months ago
[Feature]: Gemma 2 models logit softcapping for TPU pallas attention backend
github.com/vllm-project/vllm - sparsh35 opened this issue about 2 months ago
[Performance]: vLLM version issue.
github.com/vllm-project/vllm - zjjznw123 opened this issue about 2 months ago
[Bugfix][VLM] Fix incompatibility between #7902 and #7230
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[WIP][Spec Decode] Add multi-proposer support for variable and flexible speculative decoding
github.com/vllm-project/vllm - ShangmingCai opened this pull request about 2 months ago
[Bug]: deploy multi lora by vllm mode error
github.com/vllm-project/vllm - askcs517 opened this issue about 2 months ago
[ci][test] fix pp test failure
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Bug]: reset LLM for each inference
github.com/vllm-project/vllm - victorzhz111 opened this issue about 2 months ago
[misc] [doc] [frontend] LLM torch profiler support
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[Model] EXAONE 3.0 model support
github.com/vllm-project/vllm - Deepfocused opened this pull request about 2 months ago
[Bug]: def _schedule_running(...) the seqs num of budget not updated
github.com/vllm-project/vllm - yblir opened this issue about 2 months ago
[Bug]: RuntimeError: operator torchvision::nms does not exist
github.com/vllm-project/vllm - murray-z opened this issue about 2 months ago
[Bug]: Is vllm compatible with torchrun?
github.com/vllm-project/vllm - HwwwwwwwH opened this issue about 2 months ago
[Misc] Use ray[adag] dependency instead of cuda
github.com/vllm-project/vllm - ruisearch42 opened this pull request about 2 months ago
[Doc] fix the autoAWQ example
github.com/vllm-project/vllm - stas00 opened this pull request about 2 months ago
[Bug]: vllm api_server often crashes when the version is higher than 0.5.3.
github.com/vllm-project/vllm - BaiMoHan opened this issue about 2 months ago
[Performance]: 5x slower throughput with OpenAI client/server than native one
github.com/vllm-project/vllm - stas00 opened this issue about 2 months ago
[Bug]: An abnormal delay of 300 milliseconds was detected.
github.com/vllm-project/vllm - skylee-01 opened this issue about 2 months ago
[Feature]: Slurm run_cluster.sh launcher instead of just Ray
github.com/vllm-project/vllm - OrenLeung opened this issue about 2 months ago
[TPU] Align worker index with node boundary
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Core] Add support for recursively loading weights by model ID
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[New Model]: Could you please help me to support google/madlad400-3b-mt translator model in vLLM?
github.com/vllm-project/vllm - aitrainingcrew opened this issue about 2 months ago
[mypy][CI/Build] Fix mypy errors
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[multi-step] add flashinfer backend
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[This PR is not supposed to be merged] Testing regression in Tensorizer Test
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request about 2 months ago
[hardware][rocm] allow rocm to override default env var
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
Draft - [CI/Build] Add shell script linting using shellcheck
github.com/vllm-project/vllm - russellb opened this pull request about 2 months ago
[Frontend][VLM] Add support for multiple multi-modal items in the OpenAI frontend
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Model] Add OLMoE
github.com/vllm-project/vllm - Muennighoff opened this pull request about 2 months ago
[Core] Combine async postprocessor and multi-step
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Bug]: OpenAI server errors out with "ZMQError Too many open files" under heavy load
github.com/vllm-project/vllm - zifeitong opened this issue about 2 months ago
Remove request.max_tokens assertion in serving_completion.py
github.com/vllm-project/vllm - zifeitong opened this pull request about 2 months ago
[Bug]: vllm:num_requests_waiting is not being published at /metrics endpoint
github.com/vllm-project/vllm - IshmeetMehta opened this issue about 2 months ago
[benchmark] Update TGI version
github.com/vllm-project/vllm - philschmid opened this pull request about 2 months ago
[Bugfix] Fix phi3v incorrect image_idx when using async engine
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Feature]: Does VLLM only support MistralModel Architecture for embedding?
github.com/vllm-project/vllm - hahmad2008 opened this issue about 2 months ago
[Bug]: On a machine with an A100 GPU, when running the Dockerfile of version 0.5.5, an error occurs.
github.com/vllm-project/vllm - zjjznw123 opened this issue about 2 months ago
[Bug]: command r server hangs randomly with no error
github.com/vllm-project/vllm - nivibilla opened this issue about 2 months ago
[Usage]: Confirm tool calling is not supported and this is the closest thing can be done
github.com/vllm-project/vllm - summersonnn opened this issue about 2 months ago
[Core] Async_output_proc: Add virtual engine support (towards pipeline parallel)
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Core] Enable Memory Tiering for vLLM
github.com/vllm-project/vllm - PanJason opened this pull request about 2 months ago
[Feature]: Lora for MiniCPM_2_6
github.com/vllm-project/vllm - tristan279 opened this issue about 2 months ago
[Core] Adding Control Vector Support
github.com/vllm-project/vllm - raywanb opened this pull request about 2 months ago
[Model][VLM] Add Qwen2-VL model support
github.com/vllm-project/vllm - fyabc opened this pull request about 2 months ago
[Bug]: ray + vllm async engine: Background loop is stopped
github.com/vllm-project/vllm - Jack47 opened this issue about 2 months ago
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Model] EXAONE 3.0 model support - closed
github.com/vllm-project/vllm - Deepfocused opened this pull request about 2 months ago
[Bugfix] Unify rank computation across regular decoding and speculative decoding
github.com/vllm-project/vllm - jmkuebler opened this pull request about 2 months ago
[CI/Build][VLM] Cleanup multiple images inputs model test
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Bug]: RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details) [repeated 6x across cluster]
github.com/vllm-project/vllm - soumyasmruti opened this issue about 2 months ago
extend cuda graph size for H200
github.com/vllm-project/vllm - kushanam opened this pull request about 2 months ago
[Frontend] Add option for LLMEngine to return model hidden states.
github.com/vllm-project/vllm - jdvin opened this pull request about 2 months ago
[Usage]: how to test the time of response about minicpm-v-2.6 served by VLLM
github.com/vllm-project/vllm - Mysnake opened this issue about 2 months ago
[Bug]: CUDA_VISIBLE_DEVICES not detected
github.com/vllm-project/vllm - paolovic opened this issue about 2 months ago
[Doc]: Update tensorizer docs to include vllm[tensorizer]
github.com/vllm-project/vllm - sethkimmel3 opened this pull request about 2 months ago
Adding new cutlass configurations for llama70B
github.com/vllm-project/vllm - kushanam opened this pull request about 2 months ago
[Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support.
github.com/vllm-project/vllm - bnellnm opened this pull request about 2 months ago
[Neuron] Adding support for context-length, token-gen buckets.
github.com/vllm-project/vllm - hbikki opened this pull request about 2 months ago
[Performance]: Prefix-caching aware scheduling
github.com/vllm-project/vllm - comaniac opened this issue about 2 months ago
[Bugfix] Fix single output condition in output processor
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Bug]: Requests larger than 75k input tokens cause `Input prompt (512 tokens) is too long and exceeds the capacity of block_manager` error
github.com/vllm-project/vllm - servient-ashwin opened this issue about 2 months ago
[Bug]: Request Cancelation w/ Scheduler Steps Set Causes K8s Pod Restart
github.com/vllm-project/vllm - sam-h-bean opened this issue about 2 months ago
[CI/Build] Add linting for github actions workflows
github.com/vllm-project/vllm - russellb opened this pull request about 2 months ago