Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
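For context on what the project does, here is a minimal offline-inference sketch using vLLM's documented Python entry points (`LLM` and `SamplingParams`); the model name, prompts, and sampling values are illustrative, not project defaults:

```python
from vllm import LLM, SamplingParams

# Illustrative prompts and sampling settings, not project defaults.
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="facebook/opt-125m")  # any supported Hugging Face model id
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```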
Log system stats
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Update example prompts in `simple_server.py`
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Support various sampling parameters
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Make sure the system can run on T4 and V100
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Clean up the scheduler code
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add a system logger
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use slow tokenizer for LLaMA
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
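For reference, the "slow" tokenizer is selected in Hugging Face Transformers with `use_fast=False`, which picks the Python/SentencePiece implementation over the Rust "fast" one; a minimal sketch (the model id is illustrative, and this is not necessarily the project's exact call site):

```python
from transformers import AutoTokenizer

# use_fast=False selects the Python/SentencePiece "slow" tokenizer.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b", use_fast=False)
print(tokenizer.tokenize("Hello, world!"))
```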
Enhance model loader
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Refactor system architecture
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use runtime profiling to replace manual memory analyzers
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Bug in LLaMA fast tokenizer
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
[Minor] Fix a dtype bug
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Specify python package dependencies in requirements.txt
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Clean up Megatron-LM code
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Dangerous floating point comparison
github.com/vllm-project/vllm - merrymercy opened this issue almost 2 years ago
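As a hedged illustration of the pitfall this issue names (not the project's actual code): exact equality on floats fails where a tolerance-based check succeeds:

```python
import math

a = 0.1 + 0.2
print(a == 0.3)                             # False: binary rounding error
print(math.isclose(a, 0.3, rel_tol=1e-9))   # True: tolerance-based comparison
```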
Replace FlashAttention with xformers
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Decrease the default size of swap space
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Fix a bug in attention kernel
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use O3 optimization instead of O2 for CUDA compilation?
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
A critical bug in attention kernel after refactoring
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add documents on how to add new models
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Use dtype from model config & Add Dolly V2
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement a system logger to print system status and warnings
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add support for GPT-2
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use pytest for unit tests
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
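A minimal self-contained sketch of the pytest style this issue proposes; the function under test is hypothetical, not from vLLM:

```python
import pytest

def normalize_probs(xs):
    # Hypothetical helper: rescale non-negative weights to sum to 1.
    total = sum(xs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [x / total for x in xs]

def test_normalize_probs_sums_to_one():
    assert sum(normalize_probs([1.0, 3.0])) == pytest.approx(1.0)

def test_normalize_probs_rejects_zero_total():
    with pytest.raises(ValueError):
        normalize_probs([0.0, 0.0])
```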
Add code formatting script & Add CI to check code format
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add dependencies in setup.py
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Support bfloat16 data type
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Refactor attention kernels
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
New weight loader without np copy
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add an option to launch cacheflow without ray
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add support for GPT-NeoX (Pythia)
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add plot scripts
github.com/vllm-project/vllm - Ying1123 opened this pull request almost 2 years ago
Improve Weight Loading
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Frontend Improvements
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Debug the optimal upper-bound performance for swapping (0-cost swapping).
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Turn shareGPT data into a standard benchmark
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Fix the rushed out multi-query kernel
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Add support for Stable-LM and OpenAssistant
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Modify the current PyTorch model to C++
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
[DO NOT MERGE] Orca prefix sharing benchmark
github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
[DO NOT MERGE] Prefix sharing (bug fixed)
github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
[DO NOT MERGE] Prefix stash siyuan
github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
Support various block sizes
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement prefix sharing
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add chatbot benchmark scripts
github.com/vllm-project/vllm - merrymercy opened this pull request almost 2 years ago
Support block size 32
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Fix timeout error in the FastAPI frontend
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add an option to use dummy weights
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement block copy kernel to optimize beam search
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
[DO NOT MERGE] Hao integration
github.com/vllm-project/vllm - zhisbug opened this pull request almost 2 years ago
Add a script for serving experiments & Collect system stats in scheduler
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Memcpy kernel for flash attention
github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
Fix potential bugs in FastAPI frontend and add comments
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add query stride to multi_query_cached_kv_attention & Add kernel benchmark script
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add CUDA graph-based all reduce launcher
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Batched benchmark script and more detailed benchmark metrics
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Basic attention kernel that supports cached KV + (multi-)prompts
github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
Add an option to disable Ray when using a single GPU
github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Tensor Parallel profiling result
github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Add ninja to dependency
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Optimize data movement
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use FP32 for log probabilities
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
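The general technique this title refers to, sketched in PyTorch (shapes and dtypes are illustrative): cast half-precision logits to FP32 before the log-softmax so low-probability tokens keep precision:

```python
import torch

logits = torch.randn(4, 32000, dtype=torch.float16)  # illustrative batch x vocab
# log-softmax in FP16 loses accuracy for small probabilities; compute in FP32.
logprobs = torch.log_softmax(logits.float(), dim=-1)
```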
Modify README to include info on loading LLaMA
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Optimize tensor parallel execution speed
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add custom kernel for RMS normalization
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
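For reference, the operation such a kernel fuses is compact; a plain PyTorch sketch of RMS normalization (not the CUDA kernel from the PR):

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Scale by the reciprocal root-mean-square of the last dimension, then apply weight.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight
```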
Merge QKV into one linear layer
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Implement custom kernel for LLaMA rotary embedding
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
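As a sketch of the math such a kernel implements, here is one common half-rotation convention for rotary position embeddings in plain PyTorch (not the PR's CUDA code; conventions vary between implementations):

```python
import torch

def apply_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (..., head_dim); rotate the two halves of each head dimension by (cos, sin).
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
```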
Refactor the test code for attention kernels
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement preemption via recomputation & Refactor scheduling logic
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add cache watermark to avoid frequent cache eviction
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
FastAPI-based working frontend
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Implement LLaMA
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add miscellaneous updates
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Support beam search & parallel generation
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Automatically configure KV cache size
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
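The sizing arithmetic behind KV-cache configuration, with hypothetical 7B-class model numbers (illustrative back-of-the-envelope math, not vLLM's exact accounting, which profiles at runtime):

```python
# Per-token KV bytes = 2 (K and V) x layers x heads x head_dim x bytes/element.
num_layers, num_heads, head_dim = 32, 32, 128   # hypothetical 7B-class model
bytes_per_elem = 2                              # FP16/BF16
kv_per_token = 2 * num_layers * num_heads * head_dim * bytes_per_elem
print(kv_per_token)                             # 524288 bytes = 0.5 MiB per token
free_bytes = 10 * 1024**3                       # say 10 GiB left after weights
print(free_bytes // kv_per_token, "tokens fit") # ~20480 tokens
```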
Fix a bug in 1D input shape
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use FlashAttention for `multi_query_kv_attention`
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement `single_query_cached_kv_attention` kernel
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Support tensor parallel
github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Fix a bug in tying OPT embeddings
github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago