Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

Modify README to include info on loading LLaMA
zhuohan123 opened this pull request almost 2 years ago

Optimize tensor parallel execution speed
zhuohan123 opened this pull request almost 2 years ago

Add custom kernel for RMS normalization
WoosukKwon opened this pull request almost 2 years ago

Merge QKV into one linear layer
zhuohan123 opened this pull request almost 2 years ago

Implement custom kernel for LLaMA rotary embedding
WoosukKwon opened this pull request almost 2 years ago

Refactor the test code for attention kernels
WoosukKwon opened this pull request almost 2 years ago

Implement preemption via recomputation & Refactor scheduling logic
WoosukKwon opened this pull request almost 2 years ago

Add cache watermark to avoid frequent cache eviction
WoosukKwon opened this pull request almost 2 years ago

FastAPI-based working frontend
zhuohan123 opened this pull request almost 2 years ago

Implement LLaMA
WoosukKwon opened this pull request almost 2 years ago

Add miscellaneous updates
WoosukKwon opened this pull request almost 2 years ago

Support beam search & parallel generation
WoosukKwon opened this pull request almost 2 years ago

Automatically configure KV cache size
WoosukKwon opened this pull request almost 2 years ago

Fix a bug in 1D input shape
WoosukKwon opened this pull request almost 2 years ago

Use FlashAttention for `multi_query_kv_attention`
WoosukKwon opened this pull request almost 2 years ago

Implement `single_query_cached_kv_attention` kernel
WoosukKwon opened this pull request almost 2 years ago

Support tensor parallel
zhuohan123 opened this pull request almost 2 years ago

Fix a bug in tying OPT embeddings
WoosukKwon opened this pull request almost 2 years ago