Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
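The service exposes this data as JSON over HTTP. A minimal lookup sketch in Python follows; note that the endpoint path and query parameter here are assumptions for illustration only and should be checked against the service's API documentation:

```python
# Hypothetical query against the ecosyste.ms Open Collective API.
# NOTE: the endpoint path and the "q" parameter are assumptions for
# illustration; consult the service's API docs for the real routes.
import requests

resp = requests.get(
    "https://opencollective.ecosyste.ms/api/v1/projects",
    params={"q": "vllm"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```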
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm
- Host: opensource
- Code: https://github.com/vllm-project/vllm
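As context for the issues and pull requests listed below, here is a minimal offline-inference sketch using vLLM's Python API. It assumes `pip install vllm`, a CUDA-capable GPU, and a deliberately small model chosen purely for illustration:

```python
# Minimal vLLM offline-inference example.
from vllm import LLM, SamplingParams

# Load a small model; any compatible Hugging Face model ID works here.
llm = LLM(model="facebook/opt-125m")

# Sampling settings: mildly creative output, capped at 64 tokens.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for output in outputs:
    print(output.outputs[0].text)
```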
[Kernel] Use flashinfer for decoding
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 6 months ago
[Bug]: mistralai/Mixtral-8x22B-Instruct-v0.1 fails to load 2/3 times on aae08249acca69060d0a8220cab920e00520932c
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales
github.com/vllm-project/vllm - pcmoritz opened this pull request 6 months ago
[Bug]: Call to CUDA function failed - unknown error
github.com/vllm-project/vllm - roclark opened this issue 6 months ago
[Misc]: RuntimeError: Cannot find any model weights [vllm=0.4.0]
github.com/vllm-project/vllm - vishwa27yvs opened this issue 6 months ago
[Kernel] Support Fp8 Checkpoints (Dynamic + Static)
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 6 months ago
[New Model]: launch error of Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
github.com/vllm-project/vllm - eigen2017 opened this issue 6 months ago
[Misc] Upgrade outlines to v0.0.41
github.com/vllm-project/vllm - psykhi opened this pull request 6 months ago
Add logger extra
github.com/vllm-project/vllm - olehviniarchyk opened this pull request 6 months ago
[Core] Consolidate prompt arguments to LLM engines
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Kernel][Core][WIP] Tree attention and parallel decoding
github.com/vllm-project/vllm - yukavio opened this pull request 6 months ago
[Bug]: phi-3 (microsoft/Phi-3-mini-128k-instruct) fails with assert "factor" in rope_scaling
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Usage]: Flash Attention not working any more
github.com/vllm-project/vllm - Techinix opened this issue 6 months ago
[CI] check size of the wheels
github.com/vllm-project/vllm - simon-mo opened this pull request 6 months ago
[Misc]: How is the continuous batching feature of vLLM implemented?
github.com/vllm-project/vllm - llx-08 opened this issue 6 months ago
[New Model]: Support Phi-3
github.com/vllm-project/vllm - alexkreidler opened this issue 6 months ago
Allow user to define whitespace pattern for outlines
github.com/vllm-project/vllm - robcaulk opened this pull request 6 months ago
[Usage]: ValueError: Cannot find the config file for awq
github.com/vllm-project/vllm - grumpyp opened this issue 6 months ago
[New Model]: Llama 3 8B Instruct
github.com/vllm-project/vllm - K-Mistele opened this issue 6 months ago
[Speculative decoding] CUDA graph support
github.com/vllm-project/vllm - heeju-kim2 opened this pull request 6 months ago
[Bug]: "Engine iteration timed out. This should never happen" occurred when vLLM 0.4.1 deployed Llama 3.
github.com/vllm-project/vllm - blackblue9 opened this issue 6 months ago
[Hardware][Nvidia] Enable support for Pascal GPUs
github.com/vllm-project/vllm - cduk opened this pull request 6 months ago
[WIP] Infrastructure for encoder/decoder support
github.com/vllm-project/vllm - afeldman-nm opened this pull request 6 months ago
[Bug]: vllm stall on llama3-70b warmup with 0.4.1
github.com/vllm-project/vllm - piercefreeman opened this issue 6 months ago
[Bug]: CPU Inference vllm_ops not defined
github.com/vllm-project/vllm - bsu3338 opened this issue 6 months ago
[MISC] Rework logger to enable pythonic custom logging configuration to be provided
github.com/vllm-project/vllm - tdg5 opened this pull request 6 months ago
add standalone_api_server
github.com/vllm-project/vllm - alex-k-cart opened this pull request 6 months ago
[CI/Build] AMD CI pipeline with extended set of tests.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 6 months ago
[Bug]: offline test, Process hangs without exiting when using cuda graph
github.com/vllm-project/vllm - DefTruth opened this issue 6 months ago
[Bug]: Repeatedly printing <|im_end|><|im_start|> after the conversation ends
github.com/vllm-project/vllm - huangshengfu opened this issue 6 months ago
[Speculative decoding] Fix async executing
github.com/vllm-project/vllm - zxdvd opened this pull request 6 months ago
[Feature]: Cannot use FlashAttention backend for Volta and Turing GPUs. (but FlashAttention v1.0.9 supports Turing GPU.)
github.com/vllm-project/vllm - tutu329 opened this issue 6 months ago
Llama-3-70b: Should I apply some special template to use llama-3?
github.com/vllm-project/vllm - UbeCc opened this issue 6 months ago
[Speculative decoding] Add ngram prompt lookup decoding
github.com/vllm-project/vllm - leiwen83 opened this pull request 6 months ago
[Misc]: Is it possible to load a LoRA adapter on a per-request basis without restarting the base model for every newly trained LoRA?
github.com/vllm-project/vllm - Wizmak9 opened this issue 6 months ago
[Misc]: Total number of attention heads (40) must be divisible by tensor parallel size (6)
github.com/vllm-project/vllm - CNXDZS opened this issue 6 months ago
[Bug]: NameError: name 'vllm_ops' is not defined
github.com/vllm-project/vllm - yananchen1989 opened this issue 6 months ago
[Model] Add moondream vision language model
github.com/vllm-project/vllm - vikhyat opened this pull request 6 months ago
[Bugfix] Fix marlin kernel crash on H100
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 6 months ago
[Feature]: beam search mode to allow for more options in sampling process
github.com/vllm-project/vllm - GeauxEric opened this issue 6 months ago
[Speculative decoding] [Performance]: Re-enable bonus tokens
github.com/vllm-project/vllm - cadedaniel opened this issue 6 months ago
Performance Regression between v0.4.0 and v0.4.1
github.com/vllm-project/vllm - simon-mo opened this issue 6 months ago
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0
github.com/vllm-project/vllm - sangstar opened this pull request 6 months ago
[Usage]: Make request to LLAVA server.
github.com/vllm-project/vllm - premg16 opened this issue 6 months ago
[Usage]: How to use LoRARequest with AsyncLLMEngine?
github.com/vllm-project/vllm - Rares9999 opened this issue 6 months ago
[Installation]: Failed to build from source code. Python=3.9 CUDA=12.1
github.com/vllm-project/vllm - WJMacro opened this issue 6 months ago
[Frontend] Support GPT-4V Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Model] Initial support for LLaVA-NeXT
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Bug]: KeyError: 'model.layers.24.mlp.down_proj.weight' for llama 7b model SqueezeLLM quantization
github.com/vllm-project/vllm - condy0919 opened this issue 6 months ago
[Core] Support image processor
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Misc]: optimize eager mode host time
github.com/vllm-project/vllm - functionxu123 opened this pull request 6 months ago
[RFC]: Multi-modality Support Refactoring
github.com/vllm-project/vllm - ywang96 opened this issue 6 months ago
[Bug]: Disk I/O Error when using tools due to shared outlines cache database
github.com/vllm-project/vllm - AaronFriel opened this issue 6 months ago
[New Model]: Please update docker to support llama3
github.com/vllm-project/vllm - HangLu123 opened this issue 6 months ago
Adding max queue time parameter
github.com/vllm-project/vllm - KrishnaM251 opened this pull request 6 months ago
[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported
github.com/vllm-project/vllm - u650080 opened this issue 6 months ago
[Usage]: Llama 3 8B Instruct Inference
github.com/vllm-project/vllm - aliozts opened this issue 6 months ago
[Bug]: Server crash for bloom-3b while use prefix_caching, `AssertionError assert Lk in {16, 32, 64, 128}`
github.com/vllm-project/vllm - DefTruth opened this issue 6 months ago
Add `vllm serve` to wrap `vllm.entrypoints.openai.api_server`
github.com/vllm-project/vllm - simon-mo opened this pull request 6 months ago
[CI/Build] Further decouple HuggingFace implementation from ours during tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[BugFix] fix num_lookahead_slots missing in async executor
github.com/vllm-project/vllm - leiwen83 opened this pull request 6 months ago
[Misc]: How to access the KV cache directly?
github.com/vllm-project/vllm - BDHU opened this issue 6 months ago
[Feature]: AMD ROCm 6.1 Support
github.com/vllm-project/vllm - kannan-scalers-ai opened this issue 6 months ago
[Usage]: If I want to run a 34B model, like Yi-34B-Chat, how can I use multiple GPUs? I just have A100 40G
github.com/vllm-project/vllm - hellostronger opened this issue 6 months ago
[Usage]: How to get the latency of each request with benchmark_serving.py
github.com/vllm-project/vllm - wanzhenchn opened this issue 6 months ago
[Core] Enable prefix caching with block manager v2 enabled
github.com/vllm-project/vllm - leiwen83 opened this pull request 6 months ago
[Feature]: Phi2 LoRA support
github.com/vllm-project/vllm - zero-or-one opened this issue 6 months ago
[Misc]Add customized information for models
github.com/vllm-project/vllm - jeejeelee opened this pull request 6 months ago
[Bug]: Invalid Device Ordinal on ROCm
github.com/vllm-project/vllm - Bellk17 opened this issue 6 months ago
Added Support for guided decoding in offline interface
github.com/vllm-project/vllm - kevinbu233 opened this pull request 6 months ago
[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring
github.com/vllm-project/vllm - hongxiayang opened this pull request 6 months ago
[Feature]: Support HuggingFaceM4/idefics2-8b as vision model
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Misc] [CI]: AMD test flaky on main CI
github.com/vllm-project/vllm - cadedaniel opened this issue 6 months ago
[Model] Update MPT model with GLU and rope and add low precision layer norm
github.com/vllm-project/vllm - marov opened this pull request 6 months ago
[CI/BUILD] enable intel queue for longer CPU tests
github.com/vllm-project/vllm - zhouyuan opened this pull request 6 months ago
[Bug]: VLLM's output is unstable when handling requests CONCURRENTLY.
github.com/vllm-project/vllm - zhengwei-gao opened this issue 6 months ago
[Bug]: deepseek-coder-33b-instruct and deepseek-coder-6.7b-instruct broken, but deepseek-llm-7b-chat and deepseek-llm-67b-chat work well
github.com/vllm-project/vllm - lgw2023 opened this issue 6 months ago
[Frontend][Core] Update Outlines Integration from `FSM` to `Guide`
github.com/vllm-project/vllm - br3no opened this pull request 6 months ago
[Bug]: NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Bug]: --engine-use-ray is broken. #4100
github.com/vllm-project/vllm - jdinalt opened this pull request 6 months ago
[Bugfix] Fix naive attention typos and make it run on navi3x
github.com/vllm-project/vllm - maleksan85 opened this pull request 6 months ago
[Bug]: guided_json bad output for llama2-13b
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Model] Adding support for MiniCPM-V
github.com/vllm-project/vllm - HwwwwwwwH opened this pull request 6 months ago
[FacebookAI/roberta-large]: vllm support for FacebookAI/roberta-large
github.com/vllm-project/vllm - pradeepdev-1995 opened this issue 6 months ago
[Bug]: vllm_C is missing.
github.com/vllm-project/vllm - Calvinnncy97 opened this issue 6 months ago
[Model] Add support for 360zhinao
github.com/vllm-project/vllm - garycaokai opened this pull request 6 months ago
[Bug]: RuntimeError: Unknown layout
github.com/vllm-project/vllm - zzlgreat opened this issue 6 months ago
[Bug]: sending request using response_format json twice breaks vLLM
github.com/vllm-project/vllm - samos123 opened this issue 6 months ago
[Feature]: Allow LoRA adapters to be specified as in-memory dict of tensors
github.com/vllm-project/vllm - jacobthebanana opened this issue 6 months ago
[Usage]: Unable to load mistralai/Mixtral-8x7B-Instruct-v0.1
github.com/vllm-project/vllm - rohitnanda1443 opened this issue 6 months ago
Does vllm support both CUDA 11.3 version and PyTorch 1.12?
github.com/vllm-project/vllm - iclgg opened this issue 6 months ago
[Usage]: Problem when loading my trained model.
github.com/vllm-project/vllm - hummingbird2030 opened this issue 6 months ago
[Feature][Chunked prefill]: Make sliding window work
github.com/vllm-project/vllm - rkooo567 opened this issue 6 months ago
[Feature]: bitsandbytes support
github.com/vllm-project/vllm - orellavie1212 opened this issue 6 months ago
[Frontend] Refactor prompt processing
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Bug]: start api server stuck
github.com/vllm-project/vllm - QianguoS opened this issue 6 months ago
[Installation]: Any plans on providing vLLM pre-compiled for ROCm?
github.com/vllm-project/vllm - satyamk7054 opened this issue 6 months ago
[Core] Support LoRA on quantized models
github.com/vllm-project/vllm - jeejeelee opened this pull request 6 months ago
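Several entries above concern vLLM's OpenAI-compatible HTTP frontend (for example, the pull request adding a `vllm serve` wrapper around `vllm.entrypoints.openai.api_server`). As a hedged sketch of how that server is typically consumed, assuming it is running locally on vLLM's default port 8000 and serving `facebook/opt-125m` (model name chosen for illustration):

```python
# Client for a running vLLM OpenAI-compatible server, started with e.g.:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
# Assumes the server listens on localhost:8000, vLLM's default port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="The capital of France is",
    max_tokens=32,
)
print(completion.choices[0].text)
```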