Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

Modify README to include info on loading LLaMA

zhuohan123 opened this pull request almost 2 years ago

Optimize tensor parallel execution speed

zhuohan123 opened this pull request almost 2 years ago

Add custom kernel for RMS normalization

WoosukKwon opened this pull request almost 2 years ago

Merge QKV into one linear layer

zhuohan123 opened this pull request almost 2 years ago

Implement custom kernel for LLaMA rotary embedding

WoosukKwon opened this pull request almost 2 years ago

Refactor the test code for attention kernels

WoosukKwon opened this pull request almost 2 years ago

Implement preemption via recomputation & Refactor scheduling logic

WoosukKwon opened this pull request almost 2 years ago

Add cache watermark to avoid frequent cache eviction

WoosukKwon opened this pull request almost 2 years ago

FastAPI-based working frontend

zhuohan123 opened this pull request almost 2 years ago

Implement LLaMA

WoosukKwon opened this pull request almost 2 years ago

Add miscellaneous updates

WoosukKwon opened this pull request almost 2 years ago

Support beam search & parallel generation

WoosukKwon opened this pull request almost 2 years ago

Automatically configure KV cache size

WoosukKwon opened this pull request almost 2 years ago

Fix a bug in 1D input shape

WoosukKwon opened this pull request almost 2 years ago

Use FlashAttention for `multi_query_kv_attention`

WoosukKwon opened this pull request almost 2 years ago

Implement `single_query_cached_kv_attention` kernel

WoosukKwon opened this pull request almost 2 years ago

Support tensor parallel

zhuohan123 opened this pull request almost 2 years ago

Fix a bug in tying OPT embeddings

WoosukKwon opened this pull request almost 2 years ago