Ecosyste.ms: Open Collective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
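
For orientation, here is a minimal sketch of driving the engine through its offline Python interface (the `LLM` and `SamplingParams` classes from vLLM's quickstart); the model name, prompt, and sampling values below are illustrative placeholders, not taken from this listing.

```python
from vllm import LLM, SamplingParams

# Illustrative values only; any Hugging Face causal LM supported by vLLM can be substituted.
prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # loads the model and allocates the KV cache
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each result holds the original prompt and its generated completion(s).
    print(output.prompt, output.outputs[0].text)
```

Issues and pull requests indexed for this repository: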

Log system stats
WoosukKwon opened this pull request over 1 year ago

Update example prompts in `simple_server.py`
WoosukKwon opened this pull request over 1 year ago

Support various sampling parameters
WoosukKwon opened this issue over 1 year ago

Make sure the system can run on T4 and V100
WoosukKwon opened this issue over 1 year ago

Clean up the scheduler code
WoosukKwon opened this issue over 1 year ago

Add a system logger
WoosukKwon opened this pull request over 1 year ago

Use slow tokenizer for LLaMA
WoosukKwon opened this pull request over 1 year ago

Enhance model loader
WoosukKwon opened this pull request over 1 year ago

Refactor system architecture
WoosukKwon opened this pull request over 1 year ago

Use runtime profiling to replace manual memory analyzers
zhuohan123 opened this pull request over 1 year ago

Bug in LLaMA fast tokenizer
WoosukKwon opened this issue over 1 year ago

[Minor] Fix a dtype bug
WoosukKwon opened this pull request over 1 year ago

Specify python package dependencies in requirements.txt
WoosukKwon opened this pull request over 1 year ago

Clean up Megatron-LM code
WoosukKwon opened this issue over 1 year ago

Add license
WoosukKwon opened this issue over 1 year ago

Implement client API
WoosukKwon opened this issue over 1 year ago

Add docstring
zhuohan123 opened this issue over 1 year ago

Use mypy
WoosukKwon opened this issue over 1 year ago

Support FP32
WoosukKwon opened this issue over 1 year ago

Dangerous floating point comparison
merrymercy opened this issue over 1 year ago

Replace FlashAttention with xformers
WoosukKwon opened this pull request over 1 year ago

Decrease the default size of swap space
WoosukKwon opened this issue over 1 year ago

Fix a bug in attention kernel
WoosukKwon opened this pull request over 1 year ago

Use O3 optimization instead of O2 for CUDA compilation?
WoosukKwon opened this issue over 1 year ago

A critical bug in attention kernel after refactoring
WoosukKwon opened this issue over 1 year ago

Add documents on how to add new models
zhuohan123 opened this issue over 1 year ago

Enhance model mapper
WoosukKwon opened this issue over 1 year ago

Use dtype from model config & Add Dolly V2
WoosukKwon opened this pull request over 1 year ago

Implement a system logger to print system status and warnings
WoosukKwon opened this issue over 1 year ago

Support BLOOM
WoosukKwon opened this issue over 1 year ago

Add support for GPT-2
WoosukKwon opened this pull request over 1 year ago

Profile memory usage
zhuohan123 opened this issue over 1 year ago

Use pytest for unit tests
WoosukKwon opened this issue over 1 year ago

Add code formatting script & Add CI to check code format
WoosukKwon opened this issue over 1 year ago

Add dependencies in setup.py
WoosukKwon opened this issue over 1 year ago

Support GPT-2
WoosukKwon opened this issue over 1 year ago

Support bfloat16 data type
WoosukKwon opened this pull request over 1 year ago

Refactor attention kernels
WoosukKwon opened this pull request over 1 year ago

New weight loader without np copy
zhuohan123 opened this pull request over 1 year ago

Add an option to launch cacheflow without ray
zhuohan123 opened this pull request almost 2 years ago

Add support for GPT-NeoX (Pythia)
WoosukKwon opened this pull request almost 2 years ago

Add plot scripts
Ying1123 opened this pull request almost 2 years ago

Improve Weight Loading
zhuohan123 opened this issue almost 2 years ago

Frontend Improvements
zhuohan123 opened this issue almost 2 years ago

Debug the optimal upper-bound performance for swapping (0-cost swapping).
zhuohan123 opened this issue almost 2 years ago

Turn shareGPT data into a standard benchmark
zhuohan123 opened this issue almost 2 years ago

Fix the rushed out multi-query kernel
zhuohan123 opened this issue almost 2 years ago

Add support for Stable-LM and OpenAssistant
WoosukKwon opened this issue almost 2 years ago

Modify the current PyTorch model to C++
zhuohan123 opened this issue almost 2 years ago

[DO NOT MERGE] Orca prefix sharing benchmark
suquark opened this pull request almost 2 years ago

[DO NOT MERGE] Prefix sharing (bug fixed)
suquark opened this pull request almost 2 years ago

[DO NOT MERGE] Prefix stash siyuan
suquark opened this pull request almost 2 years ago

Support various block sizes
WoosukKwon opened this pull request almost 2 years ago

Implement prefix sharing
WoosukKwon opened this pull request almost 2 years ago

Add chatbot benchmark scripts
merrymercy opened this pull request almost 2 years ago

Support block size 32
WoosukKwon opened this pull request almost 2 years ago

Fix timeout error in the FastAPI frontend
zhuohan123 opened this pull request almost 2 years ago

Add an option to use dummy weights
WoosukKwon opened this pull request almost 2 years ago

Implement block copy kernel to optimize beam search
WoosukKwon opened this pull request almost 2 years ago

[DO NOT MERGE] Hao integration
zhisbug opened this pull request almost 2 years ago

Add a script for serving experiments & Collect system stats in scheduler
WoosukKwon opened this pull request almost 2 years ago

Memcpy kernel for flash attention
suquark opened this pull request almost 2 years ago

Fix potential bugs in FastAPI frontend and add comments
zhuohan123 opened this pull request almost 2 years ago

Add query stride to multi_query_cached_kv_attention & Add kernel benchmark script
WoosukKwon opened this pull request almost 2 years ago

Add CUDA graph-based all reduce launcher
WoosukKwon opened this pull request almost 2 years ago

Batched benchmark script and more detailed benchmark metrics
zhuohan123 opened this pull request almost 2 years ago

Basic attention kernel that supports cached KV + (multi-)prompts
suquark opened this pull request almost 2 years ago

Add an option to disable Ray when using a single GPU
WoosukKwon opened this issue almost 2 years ago

Tensor Parallel profiling result
zhuohan123 opened this issue almost 2 years ago

Add ninja to dependency
WoosukKwon opened this pull request almost 2 years ago

Optimize data movement
WoosukKwon opened this pull request almost 2 years ago

Use FP32 for log probabilities
WoosukKwon opened this pull request almost 2 years ago

Modify README to include info on loading LLaMA
zhuohan123 opened this pull request almost 2 years ago

Optimize tensor parallel execution speed
zhuohan123 opened this pull request almost 2 years ago

Add custom kernel for RMS normalization
WoosukKwon opened this pull request almost 2 years ago

Merge QKV into one linear layer
zhuohan123 opened this pull request almost 2 years ago

Implement custom kernel for LLaMA rotary embedding
WoosukKwon opened this pull request almost 2 years ago

Refactor the test code for attention kernels
WoosukKwon opened this pull request almost 2 years ago

Implement preemption via recomputation & Refactor scheduling logic
WoosukKwon opened this pull request almost 2 years ago

Add cache watermark to avoid frequent cache eviction
WoosukKwon opened this pull request almost 2 years ago

FastAPI-based working frontend
zhuohan123 opened this pull request almost 2 years ago

Implement LLaMA
WoosukKwon opened this pull request almost 2 years ago

Add miscellaneous updates
WoosukKwon opened this pull request almost 2 years ago

Support beam search & parallel generation
WoosukKwon opened this pull request almost 2 years ago

Automatically configure KV cache size
WoosukKwon opened this pull request almost 2 years ago

Fix a bug in 1D input shape
WoosukKwon opened this pull request almost 2 years ago

Use FlashAttention for `multi_query_kv_attention`
WoosukKwon opened this pull request almost 2 years ago

Implement `single_query_cached_kv_attention` kernel
WoosukKwon opened this pull request almost 2 years ago

Support tensor parallel
zhuohan123 opened this pull request almost 2 years ago

Fix a bug in tying OPT embeddings
WoosukKwon opened this pull request almost 2 years ago