Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
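As orientation for the feed below: a minimal offline-inference sketch, assuming vLLM's documented `LLM` entry point and a placeholder model name.

```python
# A minimal sketch, not taken from this page: offline generation with the
# documented vLLM Python API. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder small model
params = SamplingParams(temperature=0.0, max_tokens=16)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```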
[Bug]: When curling /chat/completions, TypeError: Unable to evaluate type annotation 'Required[Union[str, Iterable[ChatCompletionContentPartTextParam]]]'.
github.com/vllm-project/vllm - youqugit opened this issue about 1 month ago
[Bug]: Two-node serving hangs
github.com/vllm-project/vllm - AlvL1225 opened this issue about 1 month ago
[plugin][torch.compile] allow to add custom compile backend
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[Bug]: GPU can only load the model once; it gets stuck when loaded again
github.com/vllm-project/vllm - hz20091942 opened this issue about 1 month ago
[Bugfix] Fix code for downloading models from modelscope
github.com/vllm-project/vllm - tastelikefeet opened this pull request about 1 month ago
[Feature][kernel] tensor parallelism with bitsandbytes quantization
github.com/vllm-project/vllm - chenqianfzh opened this pull request about 1 month ago
[CI/Build] Making xformers import conditional, cannot use them on ROCM
github.com/vllm-project/vllm - alexeykondrat opened this pull request about 1 month ago
[BugFix] fix group_topk
github.com/vllm-project/vllm - dsikka opened this pull request about 1 month ago
[Bug]: Pixtral + guided_json fails with Internal Server Error
github.com/vllm-project/vllm - pseudotensor opened this issue about 1 month ago
[Kernel] Factor registrations
github.com/vllm-project/vllm - bnellnm opened this pull request about 1 month ago
[Bug]: mismatch between multimodal tokens and placeholders for Llava-Next (4 GPUs)
github.com/vllm-project/vllm - sayakpaul opened this issue about 1 month ago
[Bug]: Using a speculative model in vLLM raises TypeError: Worker.__init__() got an unexpected keyword argument 'num_speculative_tokens'
github.com/vllm-project/vllm - xhjcxxl opened this issue about 1 month ago
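For context on the entry above, a hedged sketch of how speculative decoding was configured through engine arguments in vLLM releases of this era; the model names and token count are illustrative, mirroring the documented example.

```python
# A hedged sketch of speculative decoding engine arguments (~v0.5/v0.6 era);
# model names are placeholders from the documented example.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",              # target model (placeholder)
    speculative_model="facebook/opt-125m",  # draft model (placeholder)
    num_speculative_tokens=5,
    use_v2_block_manager=True,  # required by spec decode in these releases
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```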
[Bug]: The accuracy of vllm-Qwen2-VL-7B-Instruct is low.
github.com/vllm-project/vllm - xiangxinhello opened this issue about 1 month ago
[Model] Refactor BLIP/BLIP-2 to support composite model loading
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model
github.com/vllm-project/vllm - sydnash opened this pull request about 1 month ago
[Misc]: Memory Order in Custom Allreduce
github.com/vllm-project/vllm - HydraQYH opened this issue about 1 month ago
[Bug]: CUDA device detection issue with KubeRay distributed inference for quantized models
github.com/vllm-project/vllm - jradikk opened this issue about 1 month ago
[Bug]: Pixtral leads to 'Expected at least 18286 dummy tokens for profiling, but found 16640 tokens instead' or 'seq_len 25254 should be equal to N_txt + N_img (806, 12224, 24448)'
github.com/vllm-project/vllm - pseudotensor opened this issue about 1 month ago
[torch.compile] A simple solution to recursively compile loaded model: using phi3-small as an example
github.com/vllm-project/vllm - wschin opened this pull request about 1 month ago
[Bug]: Error when using tensor_parallel in v0.6.1
github.com/vllm-project/vllm - pspdada opened this issue about 1 month ago
[New Model]: qwen2-audio
github.com/vllm-project/vllm - seetimee opened this issue about 1 month ago
[Usage]: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
github.com/vllm-project/vllm - TianSongS opened this issue about 1 month ago
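The error in the entry above is the generic CUDA-with-fork restriction rather than anything vLLM-specific; a minimal sketch of the standard workaround, assuming a plain `multiprocessing` driver script:

```python
# A minimal sketch of the 'spawn' fix: CUDA contexts cannot be inherited
# across fork(), so worker processes must be spawned instead.
import multiprocessing as mp

def worker() -> None:
    import torch  # import inside the worker so CUDA initializes after spawn
    print("CUDA available:", torch.cuda.is_available())

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    p = mp.Process(target=worker)
    p.start()
    p.join()
```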
[Model] Support Solar Model
github.com/vllm-project/vllm - shing100 opened this pull request about 1 month ago
[Feature]: MultiModal benchmark_latency, benchmark_throughput, and benchmark_online
github.com/vllm-project/vllm - OrenLeung opened this issue about 1 month ago
[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call
github.com/vllm-project/vllm - gshtras opened this pull request about 1 month ago
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request about 1 month ago
[Hardware][intel GPU] bump up ipex version to 2.3
github.com/vllm-project/vllm - jikunshang opened this pull request about 1 month ago
[Bug]: MistralTokenizer object has no attribute 'get_vocab'
github.com/vllm-project/vllm - maxDavid40 opened this issue about 1 month ago
[Bugfix][Kernel] Add `IQ1_M` quantization implementation to GGUF kernel
github.com/vllm-project/vllm - Isotr0py opened this pull request about 1 month ago
[Usage]: How to shut down the vLLM server
github.com/vllm-project/vllm - wiluen opened this issue about 1 month ago
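For the question above, a hedged sketch of one common pattern: run the OpenAI-compatible server as a child process and stop it with SIGINT (the signal Ctrl-C sends), letting it shut down gracefully. The model name is a placeholder.

```python
# A hedged sketch: manage the server's lifetime from a parent process.
import signal
import subprocess
import time

server = subprocess.Popen(["vllm", "serve", "facebook/opt-125m"])  # placeholder
try:
    time.sleep(60)  # stand-in for real work against http://localhost:8000/v1
finally:
    server.send_signal(signal.SIGINT)  # equivalent to Ctrl-C in a terminal
    server.wait(timeout=120)
```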
[Misc] Add warning for using encoder/decoder model with cpu backend
github.com/vllm-project/vllm - kevin314 opened this pull request about 1 month ago
[bugfix] torch profiler bug for single gpu with GPUExecutor
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 1 month ago
[CI/Build]: Add Bandit security check to workflow
github.com/vllm-project/vllm - ChengyuZhu6 opened this pull request about 1 month ago
[Bug]: Kernel died while waiting for execute reply in Kaggle TPU VM v3-8 (2024-08-22)
github.com/vllm-project/vllm - BrandonStudio opened this issue about 1 month ago
[Bug]: vllm v0.6.0 profiler report GPUExecutorAsync object has no attribute '_run_workers' on ROCm and NV H20
github.com/vllm-project/vllm - danielhua23 opened this issue about 1 month ago
[Bug]: guided generation can't always finish generating the requested structure
github.com/vllm-project/vllm - stas00 opened this issue about 1 month ago
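For readers unfamiliar with the feature in the entry above: guided generation is requested through vLLM-specific fields on the OpenAI-compatible API. A hedged client-side sketch follows (server URL and model name are placeholders); note that a too-small `max_tokens` can truncate the structure before it closes, which is one way the reported symptom can appear.

```python
# A hedged sketch of vLLM's guided_json extension via the OpenAI SDK's
# extra_body passthrough; base_url and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
resp = client.chat.completions.create(
    model="facebook/opt-125m",  # placeholder model
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    extra_body={"guided_json": schema},  # vLLM-specific guided decoding field
    max_tokens=256,  # leave headroom so the JSON object can close
)
print(resp.choices[0].message.content)
```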
[Performance]: Benchmarking v0.6.0, the number of successful responses accounted for only half of the number of requests; why is this?
github.com/vllm-project/vllm - Amber-Believe opened this issue about 1 month ago
[Core][VLM] Add support for placeholder token content hashes
github.com/vllm-project/vllm - petersalas opened this pull request about 1 month ago
[Frontend] Create ErrorResponse instead of raising exceptions in run_batch
github.com/vllm-project/vllm - pooyadavoodi opened this pull request about 1 month ago
[Core][VLM] Add precise multi-modal placeholder tracking
github.com/vllm-project/vllm - petersalas opened this pull request about 1 month ago
[Kernel] Add prefix-caching support for phi-3-small-8k/128k model triton kernel
github.com/vllm-project/vllm - congcongchen123 opened this pull request about 1 month ago
Restoring missing CI file.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request about 1 month ago
[Feature] Add support for Llama 3.1 and 3.2 tool use
github.com/vllm-project/vllm - maxdebayser opened this pull request about 1 month ago
[MISC] Keep chunked prefill enabled by default with long context when prefix caching is enabled
github.com/vllm-project/vllm - comaniac opened this pull request about 1 month ago
[Bugfix] Ensure multistep lookahead allocation is compatible with cuda graph max capture
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 1 month ago
[Model] tool calling support for ibm-granite/granite-20b-functioncalling
github.com/vllm-project/vllm - wseaton opened this pull request about 1 month ago
[Gemma2] add bitsandbytes support for Gemma2
github.com/vllm-project/vllm - blueyo0 opened this pull request about 1 month ago
[misc] CUDA Time Layerwise Profiler
github.com/vllm-project/vllm - LucasWilkinson opened this pull request about 1 month ago
Add output streaming support to multi-step + async
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 1 month ago
[Core] feat: support pinned caching with prefix caching
github.com/vllm-project/vllm - llsj14 opened this pull request about 1 month ago
[RFC]: Pinned Caching with Automatic Prefix Caching (Related to Anthropic Prompt Caching API)
github.com/vllm-project/vllm - llsj14 opened this issue about 1 month ago
[Usage]: example/offline_inference_chat.py run error.
github.com/vllm-project/vllm - dshm opened this issue about 1 month ago
[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models
github.com/vllm-project/vllm - jeejeelee opened this pull request about 1 month ago
[Bug]: TimeoutError During Benchmark Profiling with Torch Profiler on vLLM v0.6.0
github.com/vllm-project/vllm - hxer7963 opened this issue about 1 month ago
[BugFix] Spec Decode error: No available block found in 60 Second.
github.com/vllm-project/vllm - xq25478 opened this pull request about 1 month ago
[CI/Build] Buildkite pipeline generator
github.com/vllm-project/vllm - khluu opened this pull request about 1 month ago
Does vLLM support `input_embeds` as input while using Llama?
github.com/vllm-project/vllm - OswaldoBornemann opened this issue about 1 month ago
Fix verify tokens with the correct bonus token
github.com/vllm-project/vllm - jiqing-feng opened this pull request about 1 month ago
[Bug]: vLLM crashes with larger context sizes on TPUs
github.com/vllm-project/vllm - francescov1 opened this issue about 1 month ago
[Speculative Decoding] Test refactor
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request about 1 month ago
[Performance]: Image preprocessing is executed twice for the same image during vLLM (Qwen2-VL) inference
github.com/vllm-project/vllm - ZhangYaoFu opened this issue about 1 month ago
[Frontend] Clean up type annotations for mistral tokenizer
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Performance]: guided generation is very slow in offline mode
github.com/vllm-project/vllm - stas00 opened this issue about 1 month ago
[Misc] Benchmark for awq_triton kernels
github.com/vllm-project/vllm - rasmith opened this pull request about 1 month ago
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm
github.com/vllm-project/vllm - charlifu opened this pull request about 1 month ago
Fix ppc64le buildkite job
github.com/vllm-project/vllm - sumitd2 opened this pull request about 1 month ago
[Bugfix] Reenable LRU cache on Outlines' guide getters
github.com/vllm-project/vllm - Lap1n opened this pull request about 1 month ago
[RFC]: Reimplement and separate beam search on top of vLLM core
github.com/vllm-project/vllm - youkaichao opened this issue about 1 month ago
[MISC] Dump model runner inputs when crashing
github.com/vllm-project/vllm - comaniac opened this pull request about 1 month ago
[Bug]: Mistral Large Instruct 2407 tool calling leakage
github.com/vllm-project/vllm - dsingal0 opened this issue about 1 month ago
[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel
github.com/vllm-project/vllm - Isotr0py opened this pull request about 1 month ago
[Model] support minicpm3
github.com/vllm-project/vllm - SUDA-HLT-ywfang opened this pull request about 1 month ago
Correct adapter usage for cohere
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request about 1 month ago
[Bug]: Speculative Decode + OpenTelemetry not working
github.com/vllm-project/vllm - cermeng opened this issue about 1 month ago
[Bugfix] Mapping physical device indices for e2e test utils
github.com/vllm-project/vllm - ShangmingCai opened this pull request about 1 month ago
[Bug]: Phi-3-V with vllm serve: function pickling error
github.com/vllm-project/vllm - BabyChouSr opened this issue about 1 month ago
[Bug]: vllm 0.5.4 NCCL error when applying speculative decoding
github.com/vllm-project/vllm - Armod-I opened this issue about 1 month ago
[Misc]: What is the difference between `vllm.core.interfaces.BlockAllocator` and `vllm.core.block_manager_v1.BlockAllocatorBase`?
github.com/vllm-project/vllm - mino-park7 opened this issue about 1 month ago
[Usage]: Throughput and quality issue with vllm 0.6.0.
github.com/vllm-project/vllm - Agrawalchitranshu opened this issue about 1 month ago
[Bug]: deepseek_v2 236B on 8xA100 produces wrong output with vllm==0.5.4
github.com/vllm-project/vllm - shuailong616 opened this issue about 1 month ago
[Usage]: How can I use the scheduler with the OpenAI-compatible server?
github.com/vllm-project/vllm - nguyenhoanganh2002 opened this issue about 1 month ago
[Bug]: Qwen2-VL AssertionError: assert "factor" in rope_scaling.
github.com/vllm-project/vllm - zhangxi1997 opened this issue about 1 month ago
[Feature]: Reflection-Llama-3.1-70B tool-choice support
github.com/vllm-project/vllm - warlockedward opened this issue about 1 month ago
[RFC]: More functionality for API control
github.com/vllm-project/vllm - paulcx opened this issue about 1 month ago
[Usage]: How to access the MLP layer using the current vLLM version (0.4.0)
github.com/vllm-project/vllm - waterluck opened this issue about 1 month ago
[Usage]: multi-image inference for "OpenGVLab/InternVL2-8B" not working
github.com/vllm-project/vllm - dahwin opened this issue about 1 month ago
[Bug]: RuntimeError: shape mismatch: value tensor of shape [3328, 7168] cannot be broadcast to indexing result of shape [3328] for OpenGVLab/InternVL2-40B
github.com/vllm-project/vllm - Manikandan-Thangaraj-ZS0321 opened this issue about 1 month ago
[Bug]: Chunked prefill can't be used together with --num-scheduler-steps
github.com/vllm-project/vllm - ndao600 opened this issue about 1 month ago
[Bug]: vllm: error: unrecognized arguments: --config
github.com/vllm-project/vllm - FloWsnr opened this issue about 1 month ago
[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility
github.com/vllm-project/vllm - K-Mistele opened this pull request about 1 month ago
[New Model]: Reflection-Llama-3.1-70B
github.com/vllm-project/vllm - sekh77 opened this issue about 1 month ago
[Usage]: Distributed inference with edge case: model fits multiple GPUs, but number of GPUs cannot divide the model size evenly
github.com/vllm-project/vllm - leszekhanusz opened this issue about 1 month ago
[Model][VLM] Decouple weight loading logic for `Paligemma`
github.com/vllm-project/vllm - Isotr0py opened this pull request about 1 month ago
[Bug]: AssertionError when using automatic prefix caching and prompt_logprobs
github.com/vllm-project/vllm - novoselrok opened this issue about 1 month ago
[Bugfix] Fix async postprocessor in case of preemption
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 1 month ago
[Usage]: How to call my custom agent through vLLM's API service
github.com/vllm-project/vllm - sumic opened this issue about 1 month ago
[Bug]: How to set the GPU id in code?
github.com/vllm-project/vllm - cqray1990 opened this issue about 1 month ago
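A common answer to the question above is to mask device visibility before CUDA initializes; a minimal sketch under that assumption:

```python
# A minimal sketch: pin the process to one GPU by setting visibility before
# the first CUDA-touching import; afterwards that device appears as cuda:0.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # use physical GPU 1 only

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
print(llm.generate(["Hi"], SamplingParams(max_tokens=4))[0].outputs[0].text)
```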
[New Model]: Support for Idefics3 8B Llama3
github.com/vllm-project/vllm - costelter opened this issue about 1 month ago