Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective - Host: opensource - https://opencollective.com/vllm - Code: https://github.com/vllm-project/vllm
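
As orientation for the feed below, here is a minimal sketch of vLLM's offline-inference API; the model name is just an illustrative example.

```python
from vllm import LLM, SamplingParams

# Load any Hugging Face causal LM supported by vLLM (model name illustrative).
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```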

Log system stats

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Update example prompts in `simple_server.py`

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Support various sampling parameters

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
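
The sampling-parameters work above grew into today's `SamplingParams` object; a hedged sketch of the kinds of knobs it exposes:

```python
from vllm import SamplingParams

# Greedy decoding: temperature 0 picks the argmax token at every step.
greedy = SamplingParams(temperature=0.0, max_tokens=64)

# Stochastic decoding: nucleus (top-p) plus top-k filtering,
# with n parallel completions per prompt.
sampled = SamplingParams(temperature=0.8, top_p=0.95, top_k=50,
                         n=4, max_tokens=64)
```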
Make sure the system can run on T4 and V100

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Clean up the scheduler code

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add a system logger

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use slow tokenizer for LLaMA

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Enhance model loader

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Refactor system architecture

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use runtime profiling to replace manual memory analyzers

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Bug in LLaMA fast tokenizer

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
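
The two LLaMA tokenizer entries above reflect early bugs in the Hugging Face fast (Rust-based) LLaMA tokenizer; the workaround was to fall back to the slow SentencePiece implementation. A sketch using the standard transformers API (model path illustrative):

```python
from transformers import AutoTokenizer

# use_fast=False selects the slow, SentencePiece-backed tokenizer,
# sidestepping the early fast-tokenizer bugs for LLaMA.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b", use_fast=False)
print(tokenizer.encode("Hello, world"))
```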
[Minor] Fix a dtype bug

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Specify python package dependencies in requirements.txt

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Clean up Megatron-LM code

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add license

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Implement client API

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add docstring

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Use mypy

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Support FP32

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Dangerous floating point comparison

github.com/vllm-project/vllm - merrymercy opened this issue almost 2 years ago
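
For context on the issue above: exact `==` on floats is fragile, and the usual fix is tolerance-based comparison.

```python
import math

a = 0.1 + 0.2
print(a == 0.3)                            # False: a is 0.30000000000000004
print(math.isclose(a, 0.3, rel_tol=1e-9))  # True: compare within a tolerance
```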
Replace FlashAttention with xformers

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Decrease the default size of swap space

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Fix a bug in attention kernel

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add documents on how to add new models

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Enhance model mapper

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Use dtype from model config & Add Dolly V2

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Support BLOOM

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Add support for GPT-2

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Profile memory usage

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Use pytest for unit tests

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
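
A minimal illustration of the pytest convention the issue above proposes; the file and function names are hypothetical, not from the vLLM codebase.

```python
# test_sampling.py -- pytest discovers test_* functions automatically;
# run with `pytest` from the repository root.
def clamp_temperature(t: float) -> float:
    return max(t, 0.0)

def test_clamp_temperature():
    assert clamp_temperature(-1.0) == 0.0
    assert clamp_temperature(0.7) == 0.7
```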
Add dependencies in setup.py

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Support GPT-2

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Support bfloat16 data type

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
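
bfloat16 keeps float32's 8-bit exponent but only 8 bits of mantissa, trading precision for range relative to float16; a quick demonstration:

```python
import torch

x = torch.tensor([131072.0])   # exceeds float16's max finite value (~65504)
print(x.to(torch.float16))     # tensor([inf], dtype=torch.float16): overflow
print(x.to(torch.bfloat16))    # finite: bf16 shares float32's exponent range
```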
Refactor attention kernels

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
New weight loader without np copy

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add an option to launch cacheflow without ray

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add support for GPT-NeoX (Pythia)

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add plot scripts

github.com/vllm-project/vllm - Ying1123 opened this pull request almost 2 years ago
Improve Weight Loading

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Frontend Improvements

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Turn shareGPT data into a standard benchmark

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Fix the rushed out multi-query kernel

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Add support for Stable-LM and OpenAssistant

github.com/vllm-project/vllm - WoosukKwon opened this issue almost 2 years ago
Modify the current PyTorch model to C++

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
[DO NOT MERGE] Orca prefix sharing benchmark

github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
[DO NOT MERGE] Prefix sharing (bug fixed)

github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
[DO NOT MERGE] Prefix stash siyuan

github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
Support various block sizes

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement prefix sharing

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add chatbot benchmark scripts

github.com/vllm-project/vllm - merrymercy opened this pull request almost 2 years ago
Support block size 32

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Fix timeout error in the FastAPI frontend

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add an option to use dummy weights

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement block copy kernel to optimize beam search

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
[DO NOT MERGE] Hao integration

github.com/vllm-project/vllm - zhisbug opened this pull request almost 2 years ago
Memcpy kernel for flash attention

github.com/vllm-project/vllm - suquark opened this pull request almost 2 years ago
Fix potential bugs in FastAPI frontend and add comments

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add CUDA graph-based all reduce launcher

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Batched benchmark script and more detailed benchmark metrics

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Tensor Parallel profiling result

github.com/vllm-project/vllm - zhuohan123 opened this issue almost 2 years ago
Add ninja to dependency

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Optimize data movement

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use FP32 for log probabilities

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
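
The motivation here: log-softmax over a large vocabulary loses precision in half-precision arithmetic, so logits are upcast before normalizing. A sketch:

```python
import torch

logits = torch.randn(4, 32000, dtype=torch.float16)  # [batch, vocab]
# Upcast to float32 before log_softmax; keeps tiny probabilities distinguishable.
logprobs = torch.log_softmax(logits.float(), dim=-1)
```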
Modify README to include info on loading LLaMA

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Optimize tensor parallel execution speed

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Add custom kernel for RMS normalization

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Merge QKV into one linear layer

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Implement custom kernel for LLaMA rotary embedding

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
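
For reference, the math the fused kernel implements: rotary embeddings rotate channel pairs by a position-dependent angle. A hedged sketch of one common (interleaved) variant, not the kernel itself:

```python
import torch

def apply_rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0):
    # x: [seq_len, head_dim] with even head_dim; positions: [seq_len]
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = positions[:, None].float() * inv_freq   # [seq_len, d/2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]              # interleaved channel pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin             # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```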
Refactor the test code for attention kernels

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add cache watermark to avoid frequent cache eviction

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
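
The watermark idea, in a hypothetical sketch: reserve a small fraction of KV-cache blocks so that admitting a new sequence cannot immediately force evictions of running ones. Names and the 1% default are assumptions, not vLLM's exact code:

```python
# Hypothetical sketch of a watermark check in a block allocator.
def can_allocate(num_required_blocks: int, num_free_blocks: int,
                 num_total_blocks: int, watermark: float = 0.01) -> bool:
    # Keep `watermark * total` blocks in reserve: a new sequence is admitted
    # only if the allocation leaves that cushion intact.
    watermark_blocks = int(watermark * num_total_blocks)
    return num_free_blocks - num_required_blocks >= watermark_blocks
```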
FastAPI-based working frontend

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Implement LLaMA

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Add miscellaneous updates

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Support beam search & parallel generation

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Automatically configure KV cache size

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
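
Automatic configuration replaces manual memory math with a profiling pass: run a forward pass at maximum batch size, measure peak activation memory, and hand the remaining GPU memory to the KV cache. A hypothetical arithmetic sketch (names and the 0.9 utilization default are illustrative):

```python
# Hypothetical sketch of profiling-based KV-cache sizing.
def num_gpu_blocks(total_gpu_mem: int, weight_mem: int,
                   peak_activation_mem: int, block_bytes: int,
                   gpu_memory_utilization: float = 0.9) -> int:
    usable = int(total_gpu_mem * gpu_memory_utilization)
    free_for_cache = usable - weight_mem - peak_activation_mem
    return max(free_for_cache // block_bytes, 0)
```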
Fix a bug in 1D input shape

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Use FlashAttention for `multi_query_kv_attention`

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
Implement `single_query_cached_kv_attention` kernel

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago
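
This kernel is the heart of paged attention: K/V vectors live in fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks, so the cache need not be contiguous. A single-head reference sketch in plain PyTorch (the real kernel is CUDA):

```python
import torch

def single_query_cached_kv_attention(q, k_cache, v_cache, block_table,
                                     seq_len: int, block_size: int):
    # q: [head_dim]; k_cache, v_cache: [num_blocks, block_size, head_dim]
    keys, values = [], []
    for pos in range(seq_len):
        block = block_table[pos // block_size]    # logical -> physical block
        offset = pos % block_size
        keys.append(k_cache[block, offset])
        values.append(v_cache[block, offset])
    K, V = torch.stack(keys), torch.stack(values)  # [seq_len, head_dim]
    scores = (K @ q) / (q.shape[-1] ** 0.5)        # scaled dot product
    return torch.softmax(scores, dim=-1) @ V       # [head_dim]
```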
Support tensor parallel

github.com/vllm-project/vllm - zhuohan123 opened this pull request almost 2 years ago
Fix a bug in tying OPT embeddings

github.com/vllm-project/vllm - WoosukKwon opened this pull request almost 2 years ago