Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bugfix] Fix illegal memory access for lora
github.com/vllm-project/vllm - sfc-gh-zhwang opened this pull request 5 months ago
[Build] Guard against older CUDA versions when building CUTLASS 3.x kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Performance]: What can we learn from OctoAI
github.com/vllm-project/vllm - hmellor opened this issue 5 months ago
[Build] Do not compile cutlass scaled_mm on CUDA 11
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Bugfix] Fix KeyError: 1 When Using LoRA adapters
github.com/vllm-project/vllm - BlackBird-Coding opened this pull request 5 months ago
[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
github.com/vllm-project/vllm - kezouke opened this issue 5 months ago
[Bug]: WSL2 (also Docker): 1 GPU works but 2 do not (--tensor-parallel-size 2)
github.com/vllm-project/vllm - goodmaney opened this issue 5 months ago
[Bug]: Issue with Token Processing Efficiency and Key-Value Cache Utilization in AsyncLLMEngine
github.com/vllm-project/vllm - kezouke opened this issue 5 months ago
[Kernel] Pass a device pointer into the quantize kernel for the scales
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Core] Bump up the default of --gpu_memory_utilization to be more similar to TensorRT Triton's default
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Feature]: vLLM support for function calling in Mistral-7B-Instruct-v0.3
github.com/vllm-project/vllm - javierquin opened this issue 5 months ago
[Feature]: Linear adapter support for Mixtral
github.com/vllm-project/vllm - DhruvaBansal00 opened this issue 5 months ago
[Bug] [spec decode] [flash_attn]: CUDA illegal memory access when calling flash_attn_cuda.fwd_kvcache
github.com/vllm-project/vllm - khluu opened this issue 5 months ago
[Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py
github.com/vllm-project/vllm - dashanji opened this pull request 5 months ago
[Misc]: Should inference with temperature 0 generate the same results for a lora adapter and equivalent merged model?
github.com/vllm-project/vllm - rohan-daniscox opened this issue 5 months ago
[Bug]: torch.cuda.OutOfMemoryError: CUDA out of memory when Handle inference requests
github.com/vllm-project/vllm - zhaotyer opened this issue 5 months ago
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Kernel] Update Cutlass fp8 configs
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 5 months ago
[Usage]: how should I do data parallelism using vLLM?
github.com/vllm-project/vllm - YuWang916 opened this issue 5 months ago
[Bugfix] Fix KV head calculation for MPT models when using GQA
github.com/vllm-project/vllm - bfontain opened this pull request 5 months ago
[CI/Build] Test buildkite monorepo plugin
github.com/vllm-project/vllm - dgoupil opened this pull request 5 months ago
[Frontend] token_ids is an unused param sent to the logit_bias_logits_processor
github.com/vllm-project/vllm - Etelis opened this pull request 5 months ago
[Core] Remove unnecessary copies in flash attn backend
github.com/vllm-project/vllm - Yard1 opened this pull request 5 months ago
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Feature][Frontend]: Add support for `stream_options` in `ChatCompletionRequest`
github.com/vllm-project/vllm - Etelis opened this pull request 5 months ago
[Usage]: Do we have any tutorials for using vllm with tensorrt-LLM?
github.com/vllm-project/vllm - weiyunfei opened this issue 5 months ago
[Bug]: nsys cannot track the cuda kernel called by the process except rank 0
github.com/vllm-project/vllm - crazy-JiangDongHua opened this issue 5 months ago
[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier
github.com/vllm-project/vllm - sroy745 opened this pull request 5 months ago
[CI/Build] increase wheel size limit to 200 MB
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Misc] remove duplicate definition of `seq_lens_tensor` in model_runner.py
github.com/vllm-project/vllm - ita9naiwa opened this pull request 5 months ago
[Feature]: How to Enable VLLM to Work with PreTrainedModel Objects in my MOE-LoRA? THX
github.com/vllm-project/vllm - zhaofangtao opened this issue 5 months ago
[Usage]: extractive question answering using VLLM
github.com/vllm-project/vllm - suryavan11 opened this issue 5 months ago
[Doc] Use intersphinx and update entrypoints docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[New Model]: LLaVA-NeXT-Video support
github.com/vllm-project/vllm - AmazDeng opened this issue 5 months ago
Add gptq_marlin test to cover bug report https://github.com/vllm-project/vllm/issues/5088
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
Add gptq_marlin test to cover bug report #5088
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Bugfix] Avoid Warnings in SparseML Activation Quantization
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Bugfix] Automatically Detect SparseML models
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Misc] Simplify code and fix type annotations in `conftest.py`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Usage]: Multiple sampling params with OpenAI library
github.com/vllm-project/vllm - JH-lee95 opened this issue 5 months ago
[Kernel] Add `w4a16` support for `compressed_tensors` models
github.com/vllm-project/vllm - dsikka opened this pull request 5 months ago
[Kernel] Add support for block size 96 to the paged attention kernel.
github.com/vllm-project/vllm - bfontain opened this pull request 5 months ago
[Kernel] CUTLASS epilogue refactor and kernels with quantized outputs
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Bug]: Crash sometimes using LLM entrypoint and LoRA adapters
github.com/vllm-project/vllm - flexorRegev opened this issue 5 months ago
[CI/Build] Docker cleanup functionality for amd servers
github.com/vllm-project/vllm - okakarpa opened this pull request 5 months ago
[Bug]: vLLM embeddings example code doesn't work
github.com/vllm-project/vllm - orionw opened this issue 5 months ago
New CI template on AWS stack
github.com/vllm-project/vllm - khluu opened this pull request 5 months ago
[ibm-granite/granite-8b-code-instruct]: Empty responses on ibm-granite
github.com/vllm-project/vllm - eduardozamudio opened this issue 5 months ago
[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 5 months ago
[Misc]: Loading microsoft/Phi-3-medium-128k-instruct with vLLM
github.com/vllm-project/vllm - AkshataDM opened this issue 5 months ago
[Bug]: async engine failure when placing multi lora adapter under load
github.com/vllm-project/vllm - DavidPeleg6 opened this issue 5 months ago
[Bug]: cannot clean up memory usage after instantiating the LLM class
github.com/vllm-project/vllm - c3-ali opened this issue 5 months ago
[Doc][Build] update after removing vllm-nccl
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Bug]: [WSL] no response when vllm.entrypoints.openai.api_server run
github.com/vllm-project/vllm - sung-ho-moon opened this issue 5 months ago
[Speculative Decoding] Enable arbitrary model inputs
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 5 months ago
[CI/Build] Simplify OpenAI server setup in tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Core] Avoid the need to pass `None` values to `Sequence.inputs`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Misc] Add vLLM version getter to utils
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bugfix][CI/Build] Fix codespell failing to skip files in `git diff`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bugfix][CI/Build] Fix test and improve code for `merge_async_iterators`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bug]: Can't run vllm distributed inference with vLLM + Ray
github.com/vllm-project/vllm - linchen111 opened this issue 5 months ago
[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors.
github.com/vllm-project/vllm - macheng6 opened this issue 5 months ago
[Feature] vLLM CLI for serving and querying OpenAI compatible server
github.com/vllm-project/vllm - EthanqX opened this pull request 5 months ago
[Bug]: Gemma model fails with GPTQ marlin
github.com/vllm-project/vllm - arunpatala opened this issue 5 months ago
[Installation]: Error when importing LLM from vllm
github.com/vllm-project/vllm - manishkumar0709 opened this issue 5 months ago
[Bug]: vLLM disconnects after running for some time
github.com/vllm-project/vllm - zxcdsa45687 opened this issue 5 months ago
[RFC]: OpenAI Triton-only backend
github.com/vllm-project/vllm - bringlein opened this issue 5 months ago
[Usage]: curl http://localhost:8000/generate returns {"detail":"Not Found"}
github.com/vllm-project/vllm - fishingcatgo opened this issue 5 months ago
[Model] Support MAP-NEO model
github.com/vllm-project/vllm - xingweiqu opened this pull request 5 months ago
[Usage]: quantization option usage
github.com/vllm-project/vllm - Juelianqvq opened this issue 5 months ago
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label
github.com/vllm-project/vllm - KuntaiDu opened this pull request 5 months ago
[Bug]: Build/Install Issues with pip install -e .
github.com/vllm-project/vllm - Msiavashi opened this issue 5 months ago
[Model] Add support for falcon-11B
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Misc] Add a test case for 'microsoft/Phi-3-small-8k-instruct', because special tokens can cause a crash
github.com/vllm-project/vllm - AllenDou opened this pull request 5 months ago
[Bug]: The VRAM usage of calculating log_probs is not considered in profile run
github.com/vllm-project/vllm - Conless opened this issue 5 months ago
[Feature]: Integration of transformers past_key_values into the vllm kvcache Function
github.com/vllm-project/vllm - ChaoZhou2023 opened this issue 5 months ago
Heterogeneous Speculative Decoding (CPU + GPU)
github.com/vllm-project/vllm - jiqing-feng opened this pull request 5 months ago
[Model] Add Internlm2 LoRA support
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Misc]: How to use guided decoding and regex as well?
github.com/vllm-project/vllm - debraj135 opened this issue 5 months ago
[Bug]: When loading model weights, loading hangs indefinitely
github.com/vllm-project/vllm - tjrlwjd1 opened this issue 5 months ago
[Usage]: no support for mistralai/Mistral-7B-Instruct-v0.3
github.com/vllm-project/vllm - yananchen1989 opened this issue 5 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
github.com/vllm-project/vllm - heungson opened this issue 5 months ago
[Core] Allow AQLM on Pascal
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[Bug]: Cannot build cpu docker image
github.com/vllm-project/vllm - licryle opened this issue 5 months ago
[Feature]: multi-steps model_runner?
github.com/vllm-project/vllm - leiwen83 opened this issue 5 months ago
[Frontend] Add tokenize/detokenize endpoints
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[Bugfix] Adds outlines performance improvement
github.com/vllm-project/vllm - lynkz-matt-psaltis opened this pull request 5 months ago
Running vLLM on a Ray cluster, logging stuck at loading
github.com/vllm-project/vllm - maherr13 opened this issue 5 months ago
[Feature]: Add num_requests_preempted metric
github.com/vllm-project/vllm - sathyanarays opened this issue 5 months ago
Chat method for offline LLM
github.com/vllm-project/vllm - nunjunj opened this pull request 5 months ago
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops
github.com/vllm-project/vllm - bnellnm opened this pull request 5 months ago
Bump version to v0.4.3
github.com/vllm-project/vllm - simon-mo opened this pull request 5 months ago
[Misc] add logging level env var
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Misc] Make Serving Benchmark More User-friendly
github.com/vllm-project/vllm - ywang96 opened this pull request 5 months ago