Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm - Host: opensource
Code: https://github.com/vllm-project/vllm
[Doc] Update quantization supported hardware table
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Kernel][Misc] dynamo support for ScalarType
github.com/vllm-project/vllm - bnellnm opened this pull request 2 months ago
[Bug]: vllm 0.5.4 with enable_chunked_prefill =True, throughput is slightly lower than 0.5.3~0.5.0. Caused by #6867
github.com/vllm-project/vllm - noooop opened this issue 2 months ago
[Kernel] register punica functions as torch ops
github.com/vllm-project/vllm - bnellnm opened this pull request 2 months ago
[Bug]: Phi-3-small-128k-instruct on 4 T4 GPUs - Memory error: Tried to allocate 1024.00 GiB
github.com/vllm-project/vllm - jgen1 opened this issue 2 months ago
[Bug]: p2p check in custom all reduce not working
github.com/vllm-project/vllm - cjackal opened this issue 2 months ago
[Feature]: Benchmark script with speculative decode metrics
github.com/vllm-project/vllm - cermeng opened this issue 2 months ago
[Kernel][LoRA] Add assertion for punica sgmv kernels
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Ray backend] Better error when pg topology is bad.
github.com/vllm-project/vllm - rkooo567 opened this pull request 2 months ago
[misc] use nvml to get consistent device name
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Bug]: Extremely high GPU memory consumption when running the server with vLLM Docker
github.com/vllm-project/vllm - rjwharry opened this issue 2 months ago
[Bug]: Error when running a quantized model
github.com/vllm-project/vllm - soulzzz opened this issue 2 months ago
[Bug]: fp8 performance is worse than fp16 when batch size is 1
github.com/vllm-project/vllm - kuangdao opened this issue 2 months ago
[Usage]: The swap_blocks function in the cache_kernels.cu file does not handle errors.
github.com/vllm-project/vllm - zeroorhero opened this issue 2 months ago
[Installation]: vllm install error in jetson agx orin
github.com/vllm-project/vllm - FanZhang91 opened this issue 2 months ago
[CI] Move quantization cpu offload tests out of fastcheck
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Core.aDAG] Temporarily turn off NCCL in aDAG tests
github.com/vllm-project/vllm - ruisearch42 opened this pull request 2 months ago
[ci/test] rearrange tests and make adag test soft fail
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[spec decode] [4/N] Move update_flash_attn_metadata to attn backend
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[Core] Use uvloop with zmq-decoupled front-end
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Prototype] Create and use custom NCCL group for aDAG
github.com/vllm-project/vllm - ruisearch42 opened this pull request 2 months ago
[Usage]: Extremely slow inference with Llama 3.1 70b Instruct
github.com/vllm-project/vllm - harsh244 opened this issue 2 months ago
[Bugfix][Harmless] Fix hardcoded float16 dtype for model_is_embedding
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Tests] Disable retries and use context manager for openai client
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Kernel] Use mutable_data_ptr or const_data_ptr instead of data_ptr
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 2 months ago
Varun/multi step chunked prefill
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 2 months ago
[Bugfix] neuron: enable tensor parallelism
github.com/vllm-project/vllm - omrishiv opened this pull request 2 months ago
[Bug]: Llama3.1 casting torch.bfloat16 to torch.float16
github.com/vllm-project/vllm - jgreer013 opened this issue 2 months ago
[Feature]: Automatic Prefix Caching and Truncating. Possibility for Context Shifting.
github.com/vllm-project/vllm - derpyhue opened this issue 2 months ago
[model] Support for Llava-Next-Video model
github.com/vllm-project/vllm - TKONIY opened this pull request 2 months ago
[Bug]: Pre-built Docker container crashes when run on CPU/MacOS
github.com/vllm-project/vllm - redevined opened this issue 2 months ago
[Misc]: gpu-memory-utilization and Memory-Usage (by nvidia-smi)
github.com/vllm-project/vllm - ChuanhongLi opened this issue 2 months ago
[Bug]: NCCL error: invalid usage (run with NCCL_DEBUG=WARN for details)
github.com/vllm-project/vllm - zhaotyer opened this issue 2 months ago
[Feature]: Inquiry about Multi-modal Support in VLLM for MiniCPM-V2.6
github.com/vllm-project/vllm - Dong148 opened this issue 2 months ago
[Bugfix] Add sharded_state to load format
github.com/vllm-project/vllm - tjandy98 opened this pull request 2 months ago
[Usage]: Access weight of model with tp=2
github.com/vllm-project/vllm - floatingbigcat opened this issue 2 months ago
[Performance]: Why does VLLM perform worse than TGI in Speculative decoding?
github.com/vllm-project/vllm - skylee-01 opened this issue 2 months ago
[Bug]: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - wangye360 opened this issue 2 months ago
[Usage]: vLLM streaming output stalls for 6-9 seconds; what causes this? The input is over 40,000 characters long
github.com/vllm-project/vllm - ZZhangxian opened this issue 2 months ago
register custom op for flash attn and use from torch.ops
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[VLM] Refactor `MultiModalConfig` initialization and profiling
github.com/vllm-project/vllm - ywang96 opened this pull request 2 months ago
[Tracking issue] [Help wanted]: Multi-step scheduling follow-ups
github.com/vllm-project/vllm - comaniac opened this issue 2 months ago
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[CI/Build] custom build backend and dynamic build dependencies
github.com/vllm-project/vllm - dtrifiro opened this pull request 2 months ago
[Misc] Revert `compressed-tensors` code reuse
github.com/vllm-project/vllm - kylesayrs opened this pull request 2 months ago
[Core] Support tensor parallelism for GGUF quantization
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Feature]: Context Parallelism
github.com/vllm-project/vllm - huseinzol05 opened this issue 2 months ago
[Bug]: AutoAWQ marlin methods error
github.com/vllm-project/vllm - MichoChan opened this issue 2 months ago
[Hardware][CPU] Support AWQ for CPU backend
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 2 months ago
[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method
github.com/vllm-project/vllm - gongdao123 opened this pull request 2 months ago
[Bug]: Docker.xpu build failed
github.com/vllm-project/vllm - liuxingbin opened this issue 2 months ago
support tqdm in notebooks
github.com/vllm-project/vllm - fzyzcjy opened this pull request 2 months ago
[BUG] Fix crash on flashinfer backend with cudagraph disabled, when attention group_size is not in [1,2,4,8]
github.com/vllm-project/vllm - learninmou opened this pull request 2 months ago
[Speculative Decoding] Fixing hidden states handling in batch expansion
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 2 months ago
[ci] fix model tests
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Usage]: Can the embedding model be deployed using the openai interface?
github.com/vllm-project/vllm - LIUKAI0815 opened this issue 2 months ago
[Bug]: Error in how HiddenStates are handled for speculative decoding
github.com/vllm-project/vllm - abhigoyal1997 opened this issue 2 months ago
[Bugfix][Frontend] Disable embedding API for chat models
github.com/vllm-project/vllm - QwertyJack opened this pull request 2 months ago
[Model]: Does vLLM currently support Llama-3.1-405B-Instruct multimodal?
github.com/vllm-project/vllm - jasonstens opened this issue 2 months ago
[BUG] OpenAI server stalled after processing an embedding request while serving a chat model
github.com/vllm-project/vllm - QwertyJack opened this issue 2 months ago
[doc] update test script to include cudagraph
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[CI] Fix crashes of performance benchmark
github.com/vllm-project/vllm - KuntaiDu opened this pull request 2 months ago
[Bug]: pytest failure on kernels/test_cutlass.py test_cutlass_fp8_gemm
github.com/vllm-project/vllm - Juelianqvq opened this issue 2 months ago
[Installation]: Building Docker images fails: Failed to build mamba-ssm
github.com/vllm-project/vllm - chenchunhui97 opened this issue 2 months ago
[Bug]: Dockerfile build breaks locally
github.com/vllm-project/vllm - palash-fin opened this issue 2 months ago
add causal parameter for flash attention
github.com/vllm-project/vllm - WanXiaopei opened this pull request 2 months ago
[Usage]: Is there an option to reduce GPU memory usage?
github.com/vllm-project/vllm - garyyang85 opened this issue 2 months ago
[Bug]: DeepSeek-Coder-V2-Instruct-AWQ assert self.quant_method is not None
github.com/vllm-project/vllm - fengyang95 opened this issue 2 months ago
[Bugfix][Docs] Update list of mock imports
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Bug]: Unable to generate text tokens when add_eos_token = true for Mistral 7B instruct v0.1
github.com/vllm-project/vllm - yutsai84 opened this issue 2 months ago
[Bug]: Unable to generate output when add_eos_token = True for Mistral 7b instruct v0.1
github.com/vllm-project/vllm - yutsai84 opened this issue 2 months ago
[Feature]: need to be able to load tiny models with vllm for testing - PagedAttention forces large models
github.com/vllm-project/vllm - stas00 opened this issue 2 months ago
[misc][ci] fix cpu test with plugins
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Doc]: A massive number of docs are missing - just empty pages
github.com/vllm-project/vllm - stas00 opened this issue 2 months ago
[Bugfix][CI] Import ray under guard
github.com/vllm-project/vllm - WoosukKwon opened this pull request 2 months ago
[TPU] Make sure worker index aligns with node boundary
github.com/vllm-project/vllm - WoosukKwon opened this issue 2 months ago
[frontend] spawn engine process from api server process
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
Announce NVIDIA Meetup
github.com/vllm-project/vllm - simon-mo opened this pull request 2 months ago
[RFC]: Add a FastChat like routing server to provide a central endpoint for multiple models
github.com/vllm-project/vllm - fozziethebeat opened this issue 2 months ago
[Bug]: aqlm test failing on H100
github.com/vllm-project/vllm - bnellnm opened this issue 2 months ago
[Bug]: Support Falcon Mamba
github.com/vllm-project/vllm - hahmad2008 opened this issue 2 months ago
[AMD][CI/Build] Disambiguation of the function call for ROCm 6.2 headers compatibility
github.com/vllm-project/vllm - gshtras opened this pull request 2 months ago
[Frontend] Using sync llm engine in a separate process in openai server mode
github.com/vllm-project/vllm - gshtras opened this pull request 2 months ago
[CI/Build] Add text-only test for Qwen models
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request 2 months ago
[Bug]: A bug in the CUDA capabilities test when different GPUs are available
github.com/vllm-project/vllm - alllexx88 opened this issue 2 months ago
[Bug]: BlockManagerV2 allocate with Sliding Window Attention
github.com/vllm-project/vllm - sylviayangyy opened this issue 2 months ago
[Feature]: ROCm 6.2 support & FP8 Support
github.com/vllm-project/vllm - ferrybaltimore opened this issue 2 months ago
[Usage]: NCCL error when deploying vLLM in k8s with multiple GPUs
github.com/vllm-project/vllm - ZhaoGuoXin opened this issue 2 months ago
[Bug]: Gemma-2-2b-it load model hangs by vLLM==0.5.1 on Tesla T4 GPU
github.com/vllm-project/vllm - wlwqq opened this issue 2 months ago
[TPU] Support multi-host inference
github.com/vllm-project/vllm - WoosukKwon opened this pull request 2 months ago
[Feature]: Add OpenAI server prompt_logprobs support #6508
github.com/vllm-project/vllm - gnpinkert opened this pull request 2 months ago