Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (Host: opensource)
Code: https://github.com/vllm-project/vllm
[Misc]: Understanding Batching Mechanism in Prefill and Decode Phases
github.com/vllm-project/vllm - Msiavashi opened this issue 5 months ago
[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes
github.com/vllm-project/vllm - achandrasekar opened this issue 5 months ago
[Model] Enable FP8 QKV in MoE and refine kernel tuning script
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Core] Change LoRA embedding sharding to support loading methods
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
[Kernel] Dynamic Per-Token Activation Quantization
github.com/vllm-project/vllm - dsikka opened this pull request 5 months ago
[Kernel][RFC] Refactor the punica kernel based on Triton
github.com/vllm-project/vllm - jeejeelee opened this pull request 5 months ago
[Bug]: vLLM fails with an error on NVIDIA's latest driver 555.85
github.com/vllm-project/vllm - gaye746560359 opened this issue 5 months ago
[CI/Build] CMakeLists: build all extensions' cmake targets at the same time
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
[Misc]: LLM is responding with advertisement
github.com/vllm-project/vllm - Pocoyo7798 opened this issue 5 months ago
[FRONTEND] OpenAI `tools` support named functions
github.com/vllm-project/vllm - br3no opened this pull request 5 months ago
[Bugfix] logprobs is not compatible with the OpenAI spec #4795
github.com/vllm-project/vllm - Etelis opened this pull request 5 months ago
[Bug]: Command-R incorrect output contains `<EOS_TOKEN>` and seems to do text prediction rather than conversation
github.com/vllm-project/vllm - epignatelli opened this issue 5 months ago
[BUGFIX] [FRONTEND] Correct chat logprobs
github.com/vllm-project/vllm - br3no opened this pull request 5 months ago
[Usage]: I use llama3. I found that one token is 'Ġor' in tokenizer.get_vocab(). But when I use vllm server, I got ' or' in response.
github.com/vllm-project/vllm - fengshansi opened this issue 5 months ago
[Bugfix][Frontend] Cleanup "fix chat logprobs"
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Kernel] Initial commit containing new Triton kernels for multi lora serving.
github.com/vllm-project/vllm - FurtherAI opened this pull request 5 months ago
[Bug]: Wrong results in LangChain integration
github.com/vllm-project/vllm - Warit314 opened this issue 5 months ago
[Bug]: Mistral 7b inst v0.3 fails to run
github.com/vllm-project/vllm - yaronr opened this issue 5 months ago
[Bug]: UnboundLocalError: local variable 'lora_b_k' referenced before assignment
github.com/vllm-project/vllm - Stealthwriter opened this issue 5 months ago
[Core][2/N] Helpers for PP
github.com/vllm-project/vllm - andoorve opened this pull request 5 months ago
Marlin moe integration
github.com/vllm-project/vllm - ElizaWszola opened this pull request 5 months ago
[Model] Add base class for LoRA-supported models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[mypy] Enable type checking for test directory
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding
github.com/vllm-project/vllm - cadedaniel opened this issue 5 months ago
[Help wanted] [Spec decode]: Increase acceptance rate via Medusa's typical acceptance
github.com/vllm-project/vllm - cadedaniel opened this issue 5 months ago
[Bug]: inferences are not the same with batch mode when using
github.com/vllm-project/vllm - yananchen1989 opened this issue 5 months ago
[Bug]: enable_chunked_prefill feature hangs on AMD Radeon PRO W7900 (gfx1100)
github.com/vllm-project/vllm - hongxiayang opened this issue 5 months ago
[Doc] add ccache guide in doc
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Usage]: Run local models using vLLM
github.com/vllm-project/vllm - bibutikoley opened this issue 5 months ago
[New Model]: tiiuae/falcon-11B
github.com/vllm-project/vllm - s-smits opened this issue 5 months ago
[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is not defined
github.com/vllm-project/vllm - LetianLee opened this pull request 5 months ago
[Bug]: OpenAI LogProbs format for Chat-Completion is incorrect
github.com/vllm-project/vllm - br3no opened this issue 5 months ago
[Performance]: Vllm performance on L40s GPU
github.com/vllm-project/vllm - warlock135 opened this issue 5 months ago
[Bugfix] Adds outlines performance improvement
github.com/vllm-project/vllm - lynkz-matt-psaltis opened this pull request 5 months ago
[Bugfix] Fix Mistral v0.3 Weight Loading
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Misc] Improve organization of utility and test code
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Feature]: Tensor Parallelism with a non-divisible number of attention heads
github.com/vllm-project/vllm - NadavShmayo opened this issue 5 months ago
[Dynamic Spec Decoding] Minor fix for disabling speculative decoding
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 5 months ago
[Bug]: Confusion about the Ray implementation
github.com/vllm-project/vllm - vincent-pli opened this issue 5 months ago
[Performance]: Splitting model across GPUs with varying vRAM
github.com/vllm-project/vllm - ccruttjr opened this issue 5 months ago
[Bugfix] fix sharded state loader for lora
github.com/vllm-project/vllm - aurickq opened this pull request 5 months ago
[Usage]: There is no response after the "GPU P2P capability or P2P test failed" warning is displayed. What can I do?
github.com/vllm-project/vllm - wzz981 opened this issue 5 months ago
[Feature]: Chunked prefill + lora
github.com/vllm-project/vllm - rkooo567 opened this issue 5 months ago
[WIP] Make chunked prefill work with LoRA
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[New Model]: microsoft/Phi-3-small-128k-instruct
github.com/vllm-project/vllm - PeterAronZentai opened this issue 5 months ago
[Core][Distributed] improve p2p access check
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Frontend] [Core] Support for sharded tensorized models
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 5 months ago
[Bug]: Loading mistral-7B-instruct-v03 KeyError: 'layers.0.attention.wk.weight'
github.com/vllm-project/vllm - timbmg opened this issue 5 months ago
[Core][1/N] Support PP PyNCCL Groups
github.com/vllm-project/vllm - andoorve opened this pull request 5 months ago
[Model] Initialize Phi-3-vision support
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Core]: Option To Use Prompt Token Ids Inside Logits Processor
github.com/vllm-project/vllm - kezouke opened this pull request 5 months ago
[Usage]: How to start vLLM on a particular GPU?
github.com/vllm-project/vllm - kstyagi23 opened this issue 5 months ago
[Speculative Decoding] Medusa Implementation with Top-1 proposer
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 5 months ago
[Hardware][Intel] Optimize CPU backend and add more performance tips
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 5 months ago
[Usage]: Is it possible to start 8 tp=1 LLMEngine on a 8-GPU machine?
github.com/vllm-project/vllm - sfc-gh-zhwang opened this issue 5 months ago
[Feature]: Please optimize the output print info about time count.
github.com/vllm-project/vllm - Zhenzhong1 opened this issue 5 months ago
[Feature]: support `stream_options` option
github.com/vllm-project/vllm - NiuBlibing opened this issue 5 months ago
[ROCm][AMD] Use pytorch sdpa math backend to do naive attention
github.com/vllm-project/vllm - hongxiayang opened this pull request 5 months ago
[Bugfix] Pass in CPU selector from worker
github.com/vllm-project/vllm - casassg opened this pull request 5 months ago
[Installation]: editable install fails with setuptools 70.0.0
github.com/vllm-project/vllm - 5cp opened this issue 5 months ago
[Misc] Take user preference in attention selector
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[FastAPI related] Built an API, ran into torch.cuda.OutOfMemoryError: CUDA out of memory
github.com/vllm-project/vllm - rsong0606 opened this issue 5 months ago
[Feature]: microsoft/Phi-3-vision-128k-instruct Vision support
github.com/vllm-project/vllm - pseudotensor opened this issue 5 months ago
[Feature]: Support loading of sharded vLLM serialized models with Tensorizer
github.com/vllm-project/vllm - tjohnson31415 opened this issue 5 months ago
[model] AddRelPositionMultiHeadedAttention
github.com/vllm-project/vllm - rajveer43 opened this pull request 5 months ago
[Feature]: automatically select distributed inference backend
github.com/vllm-project/vllm - youkaichao opened this issue 5 months ago
[Kernel] Fixup for CUTLASS kernels in CUDA graphs
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[New Model]: Phi-3-medium-128k-instruct support
github.com/vllm-project/vllm - ai8hyf opened this issue 5 months ago
[Misc] Small refactor on create_lora_manager()
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Kernel][ROCm][AMD] Add fused_moe Triton configs for MI300X
github.com/vllm-project/vllm - divakar-amd opened this pull request 5 months ago
[Installation]: Reduce Image size when installing wheel with cuda 11.8
github.com/vllm-project/vllm - ch9hn opened this issue 5 months ago
[Feature]: pre release or nightly builds
github.com/vllm-project/vllm - nivibilla opened this issue 5 months ago
[Model] MLPSpeculator speculative decoding support
github.com/vllm-project/vllm - JRosenkranz opened this pull request 5 months ago
[Bug]: Error executing method load_model. This might cause deadlock in distributed execution.
github.com/vllm-project/vllm - userandpass opened this issue 5 months ago
[CI/Build] Codespell ignore `build/` directory
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Bugfix][Kernel] Add head size check for attention backend selection
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Feature]: Support for MiniCPM-Llama3-V-2_5 the Multi-modal LLM
github.com/vllm-project/vllm - wizd opened this issue 5 months ago
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support)
github.com/vllm-project/vllm - afeldman-nm opened this pull request 5 months ago
[Build/CI] set CMAKE_BUILD_TYPE=release on wheel build
github.com/vllm-project/vllm - dtrifiro opened this pull request 5 months ago
[Bug]: run vllm api with docker
github.com/vllm-project/vllm - shudct opened this issue 5 months ago
[Misc] Support HF Hub remote loading for LoRA adapters
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Usage]: How to reload model when tensor_parallel_size > 1 ?
github.com/vllm-project/vllm - qy1026 opened this issue 5 months ago
text_generation_router::infer: router/src/infer.rs:130: no permits available
github.com/vllm-project/vllm - Ling-CF opened this issue 5 months ago
[Bugfix] Fix flag name for `max_seq_len_to_capture`
github.com/vllm-project/vllm - kerthcet opened this pull request 5 months ago
[Bug]: speculative decoding got `shape mismatch` error with n>1 and random sample
github.com/vllm-project/vllm - zxdvd opened this issue 5 months ago
[Performance] [Speculative decoding] Support draft model on different tensor-parallel size than target model
github.com/vllm-project/vllm - GeauxEric opened this pull request 5 months ago
[Frontend] Add prompt token ids into logit processor
github.com/vllm-project/vllm - xz-liu opened this pull request 5 months ago
[Misc] Minor change to improve FP8 throughput by 2%
github.com/vllm-project/vllm - elfiegg opened this pull request 5 months ago
[Model] add rope_scaling support for qwen2
github.com/vllm-project/vllm - hzhwcmhf opened this pull request 5 months ago
[Feature]: Add prompt token ids into logit processor?
github.com/vllm-project/vllm - xz-liu opened this issue 5 months ago
[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer
github.com/vllm-project/vllm - divakar-amd opened this pull request 5 months ago
[RFC]: Postmerge performance suite
github.com/vllm-project/vllm - simon-mo opened this issue 5 months ago
[Docs] Add acknowledgment for sponsors
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Bug]: Embedding model not working with tensor parallel
github.com/vllm-project/vllm - Vincent-Li-9701 opened this issue 5 months ago
[Build/CI] Switching to ROCm v. 6.1 in Dockerfile.rocm
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 5 months ago