Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
Modify README to include info on loading LLaMA
zhuohan123 opened this pull request almost 2 years ago
Optimize tensor parallel execution speed
zhuohan123 opened this pull request almost 2 years ago
Add custom kernel for RMS normalization
WoosukKwon opened this pull request almost 2 years ago
Merge QKV into one linear layer
zhuohan123 opened this pull request almost 2 years ago
Implement custom kernel for LLaMA rotary embedding
WoosukKwon opened this pull request almost 2 years ago
Refactor the test code for attention kernels
WoosukKwon opened this pull request almost 2 years ago
Implement preemption via recomputation & Refactor scheduling logic
WoosukKwon opened this pull request almost 2 years ago
Add cache watermark to avoid frequent cache eviction
WoosukKwon opened this pull request almost 2 years ago
FastAPI-based working frontend
zhuohan123 opened this pull request almost 2 years ago
Implement LLaMA
WoosukKwon opened this pull request almost 2 years ago
Add miscellaneous updates
WoosukKwon opened this pull request almost 2 years ago
Support beam search & parallel generation
WoosukKwon opened this pull request almost 2 years ago
Automatically configure KV cache size
WoosukKwon opened this pull request almost 2 years ago
Fix a bug in 1D input shape
WoosukKwon opened this pull request almost 2 years ago
Use FlashAttention for `multi_query_kv_attention`
WoosukKwon opened this pull request almost 2 years ago
Implement `single_query_cached_kv_attention` kernel
WoosukKwon opened this pull request almost 2 years ago
Support tensor parallel
zhuohan123 opened this pull request almost 2 years ago
Fix a bug in tying OPT embeddings
WoosukKwon opened this pull request almost 2 years ago