Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm (host: opensource)
- Code: https://github.com/vllm-project/vllm
[Kernel] Enhance MoE benchmarking & tuning script
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Doc]Add documentation to benchmarking script when running TGI
github.com/vllm-project/vllm - KuntaiDu opened this pull request 5 months ago
Virtual Office Hours: Jun 5 and Jun 20
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 5 months ago
[Performance]: Automatic Prefix Caching in multi-turn conversations
github.com/vllm-project/vllm - hmellor opened this issue 5 months ago
[Bugfix] Fix dummy weight for fp8
github.com/vllm-project/vllm - mzusman opened this pull request 5 months ago
[Bug]: Phi3 lora module not loading
github.com/vllm-project/vllm - arunpatala opened this issue 5 months ago
[Bugfix]: Fix communication Timeout error in safety-constrained distributed System
github.com/vllm-project/vllm - ZwwWayne opened this pull request 5 months ago
[Installation]: Failed building editable for vllm
github.com/vllm-project/vllm - Fanb1ing opened this issue 5 months ago
[Misc]:pydantic version conflict between vllm openai server and transformers
github.com/vllm-project/vllm - yunll opened this issue 5 months ago
[Bug]: `max_context_len_to_capture` deprecated, confusion with `max_seq_len_to_capture`
github.com/vllm-project/vllm - lianghsun opened this issue 5 months ago
[Bug]: Cannot use FlashAttention-2 backend because the flash_attn package is not found
github.com/vllm-project/vllm - maxin9966 opened this issue 5 months ago
[CI/Build] Make marlin kernel build conditional.
github.com/vllm-project/vllm - esmeetu opened this pull request 5 months ago
[Bug]: llm_engine_example.py (more requests) get stuck
github.com/vllm-project/vllm - CsRic opened this issue 5 months ago
[Usage]: Passing a guided_json in offline inference
github.com/vllm-project/vllm - ccdv-ai opened this issue 5 months ago
Update test_ignore_eos
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Core] Fix scheduler considering "no LoRA" as "LoRA"
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
[Core] Eliminate parallel worker per-step task scheduling overhead
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Misc] Load FP8 kv-cache scaling factors from checkpoints
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Misc]: When is the planned date for the next release?
github.com/vllm-project/vllm - vrdn-23 opened this issue 5 months ago
[Bug]: `CohereForAI/c4ai-command-r-v01`OSError: [Errno 12] Cannot allocate memory
github.com/vllm-project/vllm - epignatelli opened this issue 5 months ago
[Bugfix] Relax tiktoken to >= 0.6.0
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Core] Sharded State Loader download from HF
github.com/vllm-project/vllm - aurickq opened this pull request 5 months ago
[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support)
github.com/vllm-project/vllm - afeldman-nm opened this pull request 5 months ago
[Model] Add Phi-2 LoRA support
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Bugfix] Fix with verifying model max len
github.com/vllm-project/vllm - dimaioksha opened this pull request 5 months ago
[Bug]: Too strict version requirement on `tiktoken`
github.com/vllm-project/vllm - saattrupdan opened this issue 5 months ago
[Bug]: assert parts[0] == "base_model" AssertionError
github.com/vllm-project/vllm - Edisonwei54 opened this issue 5 months ago
[Usage]: why can't I set gpu nums while use "tensor_parallel_size"?
github.com/vllm-project/vllm - GodHforever opened this issue 5 months ago
[Installation]: Do we have the plan to update the pip package installation method for the CPU backend.
github.com/vllm-project/vllm - Zhenzhong1 opened this issue 5 months ago
[Usage]: gpu memory usage when using tensor parallel
github.com/vllm-project/vllm - DaiJianghai opened this issue 5 months ago
[Bug]: single lora request error make all processing requests error
github.com/vllm-project/vllm - jinzhen-lin opened this issue 5 months ago
[Build/CI] Extending AMD Tests
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 5 months ago
[Draft][CI/Build] Optimize models tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[RFC]: Add control panel support for vLLM
github.com/vllm-project/vllm - leiwen83 opened this issue 5 months ago
[Bug]: Shape error encountered in speculative decoding when `enable_lora=True`
github.com/vllm-project/vllm - mitchellstern opened this issue 5 months ago
[Doc] Update Ray Data distributed offline inference example
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
[Misc] remove old comments
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Usage]: distributed inference with kuberay
github.com/vllm-project/vllm - hetian127 opened this issue 5 months ago
[Misc]: a question about chunked-prefill in flash-attn backends
github.com/vllm-project/vllm - HarryWu99 opened this issue 5 months ago
Add control panel allow manage multi vllm instances
github.com/vllm-project/vllm - leiwen83 opened this pull request 5 months ago
[Doc]: Why is the PA kernel time cost in the decode phase optimized after turning on Prefix Caching?
github.com/vllm-project/vllm - wjj19950828 opened this issue 5 months ago
[Feature]: add local_files_only parameter
github.com/vllm-project/vllm - yananchen1989 opened this issue 5 months ago
[Bug]: No CUDA GPUs are available on 'CPU' use
github.com/vllm-project/vllm - mcr-ksh opened this issue 5 months ago
[Bugfix] Still download from huggingface while set VLLM_USE_MODELSCOPE = true
github.com/vllm-project/vllm - liuzhenghua opened this pull request 5 months ago
[Usage]: How to determine how many concurrent requests can be supported in an acceptable time duration with demo api server?
github.com/vllm-project/vllm - senbinyu opened this issue 5 months ago
[Bug]: Qwen1.5-72B L20x8 latest vLLM TPOT slower than v0.4.0.post, 48ms vs 39ms, why?
github.com/vllm-project/vllm - DefTruth opened this issue 5 months ago
[Bugfix / Core] Prefix Caching Guards (merged with main)
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
[Core] Avoid one broadcast op when propagating metadata
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Doc] Highlight the fourth meetup in the README
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
Add a new kernel for fusing the dequantization in fused-moe gemm
github.com/vllm-project/vllm - RezaYazdaniAminabadi opened this pull request 5 months ago
[Speculative decoding][Re-take] Enable TP>1 speculative decoding
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Bug]: Cache operations are not supported for Neuron backend.
github.com/vllm-project/vllm - milo157 opened this issue 5 months ago
[Feature]: Build and publish Neuron docker image
github.com/vllm-project/vllm - yaronr opened this issue 5 months ago
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support)
github.com/vllm-project/vllm - afeldman-nm opened this pull request 5 months ago
[Bug]: Running vllm docker image with neuron fails
github.com/vllm-project/vllm - yaronr opened this issue 5 months ago
[Bugfix] fix rope error when load models with different dtypes
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 5 months ago
[Build/CI] Enabling AMD Entrypoints Test
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 5 months ago
[New Model]: Google's Paligemma family of models
github.com/vllm-project/vllm - nfplay opened this issue 5 months ago
[Usage]: how to use run in mixed mode CPU/GPU (device_map="auto")
github.com/vllm-project/vllm - osafaimal opened this issue 5 months ago
[Bug]: llava inference result is wrong !
github.com/vllm-project/vllm - xiaoyudxy opened this issue 5 months ago
[Hardware][Intel] Add LoRA adapter support for CPU backend
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
Support to serve vLLM on Kubernetes with LWS
github.com/vllm-project/vllm - kerthcet opened this pull request 5 months ago
[Bugfix] Avoid circular import in model loader
github.com/vllm-project/vllm - hiyouga opened this pull request 5 months ago
Can I still use FP8 E5M2 KV Cache if my GPU capability is less than 8.9?
github.com/vllm-project/vllm - blacker521 opened this issue 5 months ago
[Usage]: Passing image to the vllm api endpoint
github.com/vllm-project/vllm - davidramous opened this issue 5 months ago
[Usage]: How to use tensor-parallel-size argument when deploy Llama3-8b with AsyncLLMEngine
github.com/vllm-project/vllm - ANYMS-A opened this issue 5 months ago
[Feature]: rope_scaling for qwen2
github.com/vllm-project/vllm - HappyLynn opened this issue 5 months ago
[Performance]: Will memcpy happen with distributed kv caches while decoding ?
github.com/vllm-project/vllm - GodHforever opened this issue 5 months ago
[Bug]: llava, output is truncated, not fully displayed
github.com/vllm-project/vllm - xiaoyudxy opened this issue 5 months ago
[Bug]: Llama 3 - Out of memory - RTX 4060 TI
github.com/vllm-project/vllm - savi8sant8s opened this issue 5 months ago
Revert "[Kernel] Use flash-attn for decoding (#3648)"
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
temporarily prioritize xformer for lora test
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[Core][Distributed] remove graph mode function
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
Add 4th meetup announcement to readme
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Bugfix] Properly set distributed_executor_backend in ParallelConfig
github.com/vllm-project/vllm - zifeitong opened this pull request 5 months ago
Add marlin unit tests and marlin benchmark script
github.com/vllm-project/vllm - alexm-nm opened this pull request 5 months ago
Remove EOS token before passing the tokenized input to model
github.com/vllm-project/vllm - VallabhMahajan1 opened this issue 5 months ago
[Bug]: 'ArgumentHelper' has no attribute 'enable_prefix_caching'
github.com/vllm-project/vllm - xiaohangguo opened this issue 5 months ago
[Usage]: convert llava-v1.5-7b to liuhaotian/llava-v1.5-7b-hf format
github.com/vllm-project/vllm - xiaoyudxy opened this issue 5 months ago
Qwen1.5-14B-Chat-GPTQ-Int4: quantization is not fully optimized yet. The speed can be slower than non-quantized models.
github.com/vllm-project/vllm - lostsollar opened this issue 5 months ago
[Bugfix][Model] Add base class for vision-language models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Speculative decoding] Enable TP>1 speculative decoding
github.com/vllm-project/vllm - cadedaniel opened this pull request 5 months ago
[Usage]: Seems nn.module definition may affect the output tokens. Don't know the reason.
github.com/vllm-project/vllm - Zhenzhong1 opened this issue 5 months ago
[Bugfix][Doc] Fix CI failure in docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Performance]: how to test tensorrt-llm serving correctly
github.com/vllm-project/vllm - RunningLeon opened this issue 5 months ago
[Performance]: Deepseek-v2 support
github.com/vllm-project/vllm - ZixinxinWang opened this issue 5 months ago
[Doc] Add page for `PoolingParams`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model
github.com/vllm-project/vllm - linxihui opened this pull request 5 months ago
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 5 months ago
[Doc] Shorten README by removing supported model list
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
[Bug]: `logprobs` is not compatible with the OpenAI spec
github.com/vllm-project/vllm - GabrielBianconi opened this issue 5 months ago
[Frontend] Support OpenAI batch file format
github.com/vllm-project/vllm - wuisawesome opened this pull request 5 months ago
[CI/Build] PEP 517/518 improvements
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
Add GPTQ Marlin 2:4 sparse structured support
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Bug]: Async engine hangs with 0.4.* releases
github.com/vllm-project/vllm - glos-nv opened this issue 5 months ago
[Kernel] add bfloat16 support for gptq marlin kernel
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 5 months ago
[Lora] Support long context lora
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago