Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Misc] Use scalar type to dispatch to different `gptq_marlin` kernels
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 2 months ago
[Feature]: DeepSeek-Coder-V2-Instruct-FP8 on 8xA100
github.com/vllm-project/vllm - halexan opened this issue 2 months ago
Add bitsandbytes fp4 support
github.com/vllm-project/vllm - thesues opened this pull request 2 months ago
[Kernel] Flashinfer correctness fix for v0.1.3
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 2 months ago
[Feature]: Testing - Use `torch.testing.assert_close` instead of `torch.allclose` as a Recommended Practice
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Bugfix][Kernel] Increased atol to fix failing tests
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Usage]: how to save sharded state?
github.com/vllm-project/vllm - aldwnesx opened this issue 2 months ago
[Bug]: vllm hangs after model download / load
github.com/vllm-project/vllm - ArtificialEU opened this issue 2 months ago
[Core] Use Appropriate `torch.dtype` for FP8 KV Cache
github.com/vllm-project/vllm - jon-chuang opened this pull request 2 months ago
[Usage]: Acceptance rate for Speculative Decoding
github.com/vllm-project/vllm - itsdaniele opened this issue 2 months ago
[Bug]: EfficientQAT GPTQ Does load but does not output through api
github.com/vllm-project/vllm - derpyhue opened this issue 2 months ago
[Feature]: For Meta-Llama-3.1-70B-Instruct model, no usage info included while stream equal to True
github.com/vllm-project/vllm - nikhilcms opened this issue 2 months ago
[CI/Build] Dockerfile.cpu improvements
github.com/vllm-project/vllm - dtrifiro opened this pull request 2 months ago
[Bug]: vllm hangs after upgrade to v0.5.4
github.com/vllm-project/vllm - tonyaw opened this issue 2 months ago
[Usage]: When I installed vllm version 0.5.3.post1, there was a problem deploying qwen2
github.com/vllm-project/vllm - Uhao-P opened this issue 2 months ago
[Bug]: ValueError: BitAndBytes with enforce_eager = False is not supported yet.
github.com/vllm-project/vllm - XCYXHL opened this issue 2 months ago
[CI/Build] Allow building for CUDA compute capability 8.7
github.com/vllm-project/vllm - hacker1024 opened this pull request 2 months ago
[Bugfix] Fix LoRA with PP
github.com/vllm-project/vllm - andoorve opened this pull request 2 months ago
[Bug]: Provided example for loading GGUF model is not working
github.com/vllm-project/vllm - sarthakd112 opened this issue 2 months ago
[Bug]: "500 Internal Server Error" after upgrade to v0.5.4
github.com/vllm-project/vllm - tonyaw opened this issue 2 months ago
[Feature]: Beam Search also requires diversity
github.com/vllm-project/vllm - Jack-mi opened this issue 2 months ago
[Frontend] remove max_num_batched_tokens limit for lora
github.com/vllm-project/vllm - NiuBlibing opened this pull request 2 months ago
[Usage]: add multiple lora in docker
github.com/vllm-project/vllm - chintanshrinath opened this issue 2 months ago
[Kernel] Fix Flashinfer Correctness
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 2 months ago
[Bug]: Empty prompt kills vllm server (AsyncEngineDeadError: Background loop is stopped.)
github.com/vllm-project/vllm - shimizust opened this issue 2 months ago
[Misc] Update `gptq_marlin` to use new vLLMParameters
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[ Bugfix ] Fix Prometheus Metrics With `zeromq` Frontend
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 2 months ago
[Misc] `compressed-tensors` code reuse
github.com/vllm-project/vllm - kylesayrs opened this pull request 2 months ago
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Bugfix] Fix new Llama3.1 GGUF model loading
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[New Model]: Is MiniCPM-V-2_6 supported?
github.com/vllm-project/vllm - det-tu opened this issue 2 months ago
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer
github.com/vllm-project/vllm - ronensc opened this pull request 2 months ago
[Feature]: Custom Embedding function for vector databases (Chroma & Quadrant)
github.com/vllm-project/vllm - S-M-Ammar opened this issue 2 months ago
[Feature]: support longer max_num_batched_tokens for lora
github.com/vllm-project/vllm - NiuBlibing opened this issue 2 months ago
[Core] Support serving encoder/decoder models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 2 months ago
[Performance] [Speculative decoding]: Replace scoring spec tokens via batched 1-step generation by n-step prefill
github.com/vllm-project/vllm - sergeykochetkov opened this pull request 2 months ago
[Bug]: Lora is incompatible with distributed pipeline parallelism
github.com/vllm-project/vllm - CNTRYROA opened this issue 2 months ago
[OpenVINO] migrate to latest dependencies versions
github.com/vllm-project/vllm - ilya-lavrenov opened this pull request 2 months ago
[mypy] Enable following imports for entrypoints
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[RFC]: Initial support for RBLN NPU
github.com/vllm-project/vllm - rebel-jonghewk opened this issue 2 months ago
[Bug]: ngc24.05 "RuntimeError: Cannot re-initialize CUDA in forked subprocess."
github.com/vllm-project/vllm - LSC527 opened this issue 2 months ago
[SpecDecode][Kernel] Use Flashinfer for Rejection Sampling in Speculative Decoding
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 2 months ago
[Usage]: Increase the maximum number of running reqs, which now seems to default to 100
github.com/vllm-project/vllm - Vodkazy opened this issue 2 months ago
[Bug]: The new version (v0.5.4) cannot load the gptq model, but the old version (vllm=0.5.3.post1) can do it.
github.com/vllm-project/vllm - ningwebbeginner opened this issue 2 months ago
[Bug]: Failure to instantiate mixtral 8x7b model requires restarting script
github.com/vllm-project/vllm - gnpinkert opened this issue 2 months ago
[CI/Build][ROCm] Enabling tensorizer tests for ROCm
github.com/vllm-project/vllm - alexeykondrat opened this pull request 2 months ago
[New Model]: LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
github.com/vllm-project/vllm - shing100 opened this issue 2 months ago
[wip][misc] register custom op for flash attention
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Kernel] Replaced `blockReduce[...]` functions with `cub::BlockReduce`
github.com/vllm-project/vllm - ProExpertProg opened this pull request 2 months ago
[Model] Add multi-image input support for LLaVA-Next offline inference
github.com/vllm-project/vllm - zifeitong opened this pull request 2 months ago
[mypy] Enable mypy type checking for `vllm/core`
github.com/vllm-project/vllm - jberkhahn opened this pull request 2 months ago
[Bug]: llama 3.1 70b vs mistral-large generation speed
github.com/vllm-project/vllm - ccdv-ai opened this issue 2 months ago
[Core] Shut down aDAG workers with clean async llm engine exit
github.com/vllm-project/vllm - ruisearch42 opened this pull request 2 months ago
[Bug]: Unhandled tool invocations with Llama 3.1 using LangChain and OpenAI-compatible API
github.com/vllm-project/vllm - dolanp83 opened this issue 2 months ago
[Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 2 months ago
[Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm
github.com/vllm-project/vllm - charlifu opened this pull request 2 months ago
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel
github.com/vllm-project/vllm - jon-chuang opened this pull request 2 months ago
[Installation]: running vllm in the rocm docker raised: RuntimeError: No HIP GPUs are available
github.com/vllm-project/vllm - hnhyzz opened this issue 2 months ago
[Bug]: GPTQ Marlin with cpu-offload-gb fails on `0.5.4`
github.com/vllm-project/vllm - w013nad opened this issue 2 months ago
[Usage]: Set pipeline-parallel-size to 8 but only 1 or 2 GPUs running at the same time.
github.com/vllm-project/vllm - ybdesire opened this issue 2 months ago
[Bug]: NCCL gives an error when I use tensor_parallel :RuntimeError: NCCL error: invalid usage
github.com/vllm-project/vllm - wlll123456 opened this issue 2 months ago
[Bug]: loading fp16 model as fp8 quantized caused OOM
github.com/vllm-project/vllm - AlphaINF opened this issue 2 months ago
[Model][LoRA]LoRA support added for MiniCPMV2.5
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Bug]: ZMQError: Address already in use (addr='tcp://127.0.0.1:5570')
github.com/vllm-project/vllm - WMeng1 opened this issue 2 months ago
[Feature]: Adjust max_model_len based on viable KV space
github.com/vllm-project/vllm - mscheong01 opened this issue 2 months ago
[Bug]: Incomplete tool calling response for pipeline-parallel vllm with ray
github.com/vllm-project/vllm - sfbemerk opened this issue 2 months ago
[Bug]: flash_attn_varlen_func() got an unexpected keyword argument 'softcap'
github.com/vllm-project/vllm - cjfcsjt opened this issue 2 months ago
Updating LM Format Enforcer version to v0.10.6
github.com/vllm-project/vllm - noamgat opened this pull request 2 months ago
[Model] Add AWQ quantization support for InternVL2 model
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[VLM][Model] TP support for ViTs
github.com/vllm-project/vllm - ChristopherCho opened this pull request 3 months ago
[Misc] Update dockerfile for CPU to cover protobuf installation
github.com/vllm-project/vllm - PHILO-HE opened this pull request 3 months ago
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 3 months ago
[Bug]: Vllm crashes with asyncio.exceptions.CancelledError
github.com/vllm-project/vllm - drawnwren opened this issue 3 months ago
Suri vllm cpchung
github.com/vllm-project/vllm - chakpongchung opened this pull request 3 months ago
[Models] Add remaining model PP support
github.com/vllm-project/vllm - andoorve opened this pull request 3 months ago
[Bug]: vLLM latest version on Inf2 fails
github.com/vllm-project/vllm - ratnopamc opened this issue 3 months ago
[Hardware][AMD] Update rocm base image and add openai server entrypoint
github.com/vllm-project/vllm - dgoupil opened this pull request 3 months ago
[Performance] Optimize e2e overheads: Reduce python allocations
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago
[Bug]: vllm is crashed on v0.5.3.post1
github.com/vllm-project/vllm - tonyaw opened this issue 3 months ago
[Bugfix][Frontend] Enable tools
github.com/vllm-project/vllm - tomeras91 opened this pull request 3 months ago
[Usage]: assert error: assert self._num_computed_tokens <= self.get_len()
github.com/vllm-project/vllm - Mr-Rosan opened this issue 3 months ago
[Model] Support SigLIP encoder and alternative decoders for LLaVA models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Bug]: After updating to vllm 0.5.3.post1, "Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method."
github.com/vllm-project/vllm - Jerry-jwz opened this issue 3 months ago
[Core] Use flashinfer sampling kernel when available
github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago
[Bug]: No response on a single GPU (model fails to load with tensor_parallel_size=1)
github.com/vllm-project/vllm - efficentdet opened this issue 3 months ago
[Frontend] Support embeddings in the run_batch API
github.com/vllm-project/vllm - pooyadavoodi opened this pull request 3 months ago
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Feature]: How to run the int4 quantized version of the gemma2-27b model
github.com/vllm-project/vllm - maxin9966 opened this issue 3 months ago
[RFC]: Model architecture plugins
github.com/vllm-project/vllm - NadavShmayo opened this issue 3 months ago
[Model] Add multi-image support for minicpmv offline inference
github.com/vllm-project/vllm - HwwwwwwwH opened this pull request 3 months ago