Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: deploy Phi-3-mini-128k-instruct AssertionError
github.com/vllm-project/vllm - hxujal opened this issue 5 months ago
[Usage]: How to change the batch size when testing the throughput of VLLM by running benchmark_throughput
github.com/vllm-project/vllm - Ourspolaire1 opened this issue 5 months ago
[Kernel] add bfloat16 support for gptq kernel
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 5 months ago
[Installation]: Stuck for two hours during the installation of vllm
github.com/vllm-project/vllm - loxs123 opened this issue 5 months ago
[Feature][Kernel] Support bitsandbytes quantization and QLoRA
github.com/vllm-project/vllm - chenqianfzh opened this pull request 5 months ago
[core] SequenceController in SamplingParams
github.com/vllm-project/vllm - mmoskal opened this pull request 5 months ago
Sync huggingface modifications of qwen Moe model
github.com/vllm-project/vllm - eigen2017 opened this pull request 5 months ago
[CI/Build] build on empty device for better dev experience
github.com/vllm-project/vllm - tomeras91 opened this pull request 5 months ago
[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt
github.com/vllm-project/vllm - leejamesss opened this issue 5 months ago
[Feature]: Host CPU Docker image on Docker Hub
github.com/vllm-project/vllm - VMinB12 opened this issue 5 months ago
[Feature]: CI: Test on NVLink-enabled machine
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Misc] Logits processor plugins
github.com/vllm-project/vllm - NadavShmayo opened this pull request 5 months ago
[Kernel] sliding window support in paged_attention_v1/v2 kernels
github.com/vllm-project/vllm - mmoskal opened this pull request 5 months ago
[Misc] Easier access to the nccl library
github.com/vllm-project/vllm - Cyuchuan opened this pull request 5 months ago
[Feature]: could paged_attention_v1 support parameter 'attn_bias'
github.com/vllm-project/vllm - cillinzhang opened this issue 5 months ago
[Bugfix] Fix call to init_logger in openai server
github.com/vllm-project/vllm - NadavShmayo opened this pull request 5 months ago
[Core][Bugfix]: fix prefix caching for blockv2
github.com/vllm-project/vllm - leiwen83 opened this pull request 5 months ago
[Feature]: Support W4A8KV4 Quantization(QServe/QoQ)
github.com/vllm-project/vllm - bratao opened this issue 5 months ago
[Performance]: Why the avg. generation throughput is low?
github.com/vllm-project/vllm - rvsh2 opened this issue 5 months ago
[CI/Build] Enable entrypoints tests to be run in a single command
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Frontend] Re-enable custom roles in Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Core][Distributed] add fast broadcast for tensor dict
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Bug]: CUDA error when running mistral-7b + lora with tensor_para=8
github.com/vllm-project/vllm - sfc-gh-zhwang opened this issue 5 months ago
Regression in support of customized "role" in OpenAI compatible API (v.0.4.2)
github.com/vllm-project/vllm - simon-mo opened this issue 5 months ago
[Core][Distributed] refactor custom allreduce to support multiple tp groups
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
Support fp8 KV cache in `context_attention_fwd`
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
Add TensorizerArgs to client api server
github.com/vllm-project/vllm - vrdn-23 opened this pull request 5 months ago
[Misc] Enhance attention selector
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Core][Test] fix function name typo in custom allreduce
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Kernel] Add w8a8 CUTLASS kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[CI] Nits for bad initialization of SeqGroup in testing
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Usage]: prompt_logprompt from endpoint
github.com/vllm-project/vllm - basma-b opened this issue 5 months ago
[Usage]: How to batch requests to chat models with OpenAI server?
github.com/vllm-project/vllm - sidjha1 opened this issue 5 months ago
[Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize GPU
github.com/vllm-project/vllm - danielstankw opened this issue 5 months ago
[RFC]: Support specifying quant_config details in the LLM or Server entrypoints
github.com/vllm-project/vllm - mgoin opened this issue 5 months ago
[Bug]: ValueError when using LoRA with CohereForCausalLM model
github.com/vllm-project/vllm - onlyfish79 opened this issue 5 months ago
[Bug]: squeezeLLM with sparse could not work.
github.com/vllm-project/vllm - RyanWMHI opened this issue 5 months ago
[Bug]: why the logits are different between 0.4.1 and 0.4.2
github.com/vllm-project/vllm - sitabulaixizawaluduo opened this issue 5 months ago
[New Model]: Blip2 Support required
github.com/vllm-project/vllm - anisingh1 opened this issue 5 months ago
[CI/Build] use setuptools-scm to set __version__
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
[Core] Fix circular reference which leaked llm instance in local dev env
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[New Model]: fastspeech2_conformer (just need a new attention mechanism: RelPositionMultiHeadedAttention)
github.com/vllm-project/vllm - cillinzhang opened this issue 5 months ago
[Bugfix] Fix CLI arguments in OpenAI server docs
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Core] Fix type annotation for `swap_blocks`
github.com/vllm-project/vllm - jikunshang opened this pull request 5 months ago
[Bug]: Unable to serve Llama3 using vLLM Docker container
github.com/vllm-project/vllm - vecorro opened this issue 5 months ago
[Speculative decoding] Improve n-gram efficiency
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Performance]: Why does vllm spend so much memory even using OPT model?
github.com/vllm-project/vllm - MitchellX opened this issue 5 months ago
[CI/Build] Enforce style for C++ and CUDA code with `clang-format`
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Feature]: Enforce formatting standards for C++ and CUDA code
github.com/vllm-project/vllm - mgoin opened this issue 5 months ago
[Misc] Added devcontainer to help vscode dev setup
github.com/vllm-project/vllm - ElefHead opened this pull request 5 months ago
[Misc] Apply a couple g++ cleanups
github.com/vllm-project/vllm - stevegrubb opened this pull request 5 months ago
[CORE] Improvement in ranks code
github.com/vllm-project/vllm - SwapnilDreams100 opened this pull request 5 months ago
[Bugfix] Add logs for all model dtype casting
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Misc] Keep only one implementation of the create_dummy_prompt function.
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Bug]: Not able to do lora inference with phi-3
github.com/vllm-project/vllm - WeiXiaoSummer opened this issue 5 months ago
[Bug]: export failed when kv cache fp8 quantizing Qwen1.5-72B-Chat-GPTQ-Int4
github.com/vllm-project/vllm - frankxyy opened this issue 5 months ago
[CI/Build] Tweak Marlin Nondeterminism Issues
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
add `TypeLogitsProcessor`
github.com/vllm-project/vllm - eitanturok opened this pull request 5 months ago
[Bugfix] Update grafana.json
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Doc] Add API reference for offline inference
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bugfix] Fix CLI arguments in OpenAI server docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bug]: Running the punica lora on Qwen1.5 32B model encountered RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16
github.com/vllm-project/vllm - victorzhz111 opened this issue 5 months ago
[Doc]: OpenAI Server Command Line Args Broken
github.com/vllm-project/vllm - noamgat opened this issue 5 months ago
[Bug]: using a thread after calling multiple times raises KeyError: request_id
github.com/vllm-project/vllm - xubzhlin opened this issue 5 months ago
[Misc] Set block size at initialization & Fix test_model_runner
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Feature]: Is it possible to dynamically adjust lora tp policy to different situations ?
github.com/vllm-project/vllm - yyccli opened this issue 5 months ago
[Misc] Remove unnecessary ModelRunner imports
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Performance]: why hf is better than vllm when using benchmark throughput
github.com/vllm-project/vllm - yuki252111 opened this issue 5 months ago
[Feature]: Supporting a version of Consistency LLM
github.com/vllm-project/vllm - usaxena-asapp opened this issue 5 months ago
[Performance]: large rate of decrease in generation throughput when SamplingParams.logprobs increases
github.com/vllm-project/vllm - jeffrey-fong opened this issue 5 months ago
[Performance]: benchmarking vllm copy kernel and pytorch index copy
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Bug]: Chunked prefill returning gibberish in some cases.
github.com/vllm-project/vllm - fmmoret opened this issue 5 months ago
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies
github.com/vllm-project/vllm - KuntaiDu opened this pull request 5 months ago
[Feature]: bind python and c++ through tools other than pybind11
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
Installation on CPU fails with errors
github.com/vllm-project/vllm - ming-ddtechcg opened this issue 5 months ago
[Kernel] [FP8] Improve FP8 linear layer performance
github.com/vllm-project/vllm - pcmoritz opened this pull request 5 months ago
[Core] Implement sharded state loader
github.com/vllm-project/vllm - aurickq opened this pull request 5 months ago
[Frontend] OpenAI API server: Do not add bos token by default when encoding
github.com/vllm-project/vllm - bofenghuang opened this pull request 5 months ago
[Misc] Add OpenTelemetry support
github.com/vllm-project/vllm - ronensc opened this pull request 5 months ago
[Doc]: API reference for LLM class
github.com/vllm-project/vllm - zplizzi opened this issue 5 months ago
[Usage]: Get time statistics with each request
github.com/vllm-project/vllm - arunpatala opened this issue 5 months ago
[Bug]: assert parts[0] == "base_model" AssertionError
github.com/vllm-project/vllm - Edisonwei54 opened this issue 5 months ago
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[Feature]: support for aixcoder
github.com/vllm-project/vllm - chucksylar opened this issue 5 months ago
[Feature]: support lora such as qwen-7b and qwen1.5
github.com/vllm-project/vllm - kynow2 opened this issue 5 months ago
[Frontend] Move async logic outside of constructor
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Speculative decoding] [Bugfix] Fix overallocation in ngram + spec logprobs
github.com/vllm-project/vllm - cadedaniel opened this pull request 5 months ago
[Feature]: Support for a draft model that takes inputs from base model (to support Medusa/EAGLE/Hydra)
github.com/vllm-project/vllm - abhigoyal1997 opened this issue 5 months ago
[Misc] enable DynamicNTKScalingRotaryEmbedding YaRNScalingRotaryEmbedding test case
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Bug]: Unable to use vLLM for serving a fine-tuned Mistral model
github.com/vllm-project/vllm - praveen-kanamarlapudi opened this issue 5 months ago
[RFC]: Inline Golden (Expected) Tests
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Core][Distributed] support both cpu and device tensor in broadcast tensor dict
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used
github.com/vllm-project/vllm - alexeykondrat opened this pull request 5 months ago