Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Installation]: LGPL license in dependencies
github.com/vllm-project/vllm - laurens-gs opened this issue about 2 months ago
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat)
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request about 2 months ago
[Core] Increase default `max_num_batched_tokens` for multimodal models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
ppc64le: Dockerfile fixed, and a script for buildkite
github.com/vllm-project/vllm - sumitd2 opened this pull request about 2 months ago
[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - chenchunhui97 opened this issue about 2 months ago
[Bug]: deploy on V100, mma -> mma layout conversion is only supported on Ampere
github.com/vllm-project/vllm - brosoul opened this issue about 2 months ago
[Bug]: Flashinfer now supports SM75, but VLLM is still encountering errors.
github.com/vllm-project/vllm - maxin9966 opened this issue about 2 months ago
[Usage]: Can and How we start server on multi-node multi-gpu with torchrun?
github.com/vllm-project/vllm - ericxsun opened this issue about 2 months ago
[Bug]: Trailing newline as outputs
github.com/vllm-project/vllm - dawu415 opened this issue about 2 months ago
[Core][Kernel][Misc] Support external swapper for vllm
github.com/vllm-project/vllm - zeroorhero opened this pull request about 2 months ago
[Bug]: InternVL2-2B outputs gibberish with tensor parallel inference
github.com/vllm-project/vllm - Isotr0py opened this issue about 2 months ago
[Bug]: v0.5.5 crash: "AssertionError: expected running sequences"
github.com/vllm-project/vllm - zoltan-fedor opened this issue about 2 months ago
[Bugfix] Address #8009 and add model test for flashinfer fp8 kv cache.
github.com/vllm-project/vllm - pavanimajety opened this pull request about 2 months ago
[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request about 2 months ago
Neuron cache blocks must be 1 more than max num seqs
github.com/vllm-project/vllm - ajayvohra2005 opened this pull request about 2 months ago
[Performance]: INT4 quantisation does not lead to any observable throughput increase
github.com/vllm-project/vllm - captify-sivakhno opened this issue about 2 months ago
[Installation]: vLLM source install on rocm 6.2 still requires libamdhip64.so.6
github.com/vllm-project/vllm - gounley opened this issue about 2 months ago
[Feature]: Proof of Work value ($5,000 Bounty) from Manifold Labs
github.com/vllm-project/vllm - GentikSolm opened this issue about 2 months ago
[RFC]: Build `vllm-flash-attn` from source
github.com/vllm-project/vllm - ProExpertProg opened this issue about 2 months ago
[WIP] Multi Step Chunked Prefill - Prefill Steps
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request about 2 months ago
[Bug]: InternVL2-26B infer error:Attempted to assign 7 x 256 = 1792 multimodal tokens to 506 placeholders
github.com/vllm-project/vllm - SovereignRemedy opened this issue about 2 months ago
[Bug]: gguf file without .gguf extension fails to run, even with "--quantization gguf --load-format gguf" flags
github.com/vllm-project/vllm - ericcurtin opened this issue about 2 months ago
[Bug]: Jamba-1.5-mini doesn't run on A100 with 70GB available memory
github.com/vllm-project/vllm - Tejaswgupta opened this issue about 2 months ago
[Bug]: vllm0.5.5 Ignores VLLM_USE_MODELSCOPE=True and Accesses huggingface.co
github.com/vllm-project/vllm - NaiveYan opened this issue about 2 months ago
[New Model]: LlavaQwen2ForCausalLM
github.com/vllm-project/vllm - Chuyun-Shen opened this issue about 2 months ago
[Usage]: run gguf model need template, how to write?
github.com/vllm-project/vllm - lonngxiang opened this issue about 2 months ago
[Bug]: When enabling LoRA, greedy search got different answers.
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Misc] Update `GPTQ` to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Misc] Update fbgemmfp8 to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Misc] Remove `SqueezeLLM`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
Roberta embedding
github.com/vllm-project/vllm - maxdebayser opened this pull request about 2 months ago
[Bug]: Multistep with n>1 Fails
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue about 2 months ago
[Model] Add Ultravox support for multiple audio chunks
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM
github.com/vllm-project/vllm - sroy745 opened this pull request about 2 months ago
[Bug]: segfault when loading MoE models
github.com/vllm-project/vllm - nivibilla opened this issue about 2 months ago
[Bugfix] Fix incorrect vocal embedding shards for GGUF model in tensor parallelism
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Feature]: Context Caching
github.com/vllm-project/vllm - RonanKMcGovern opened this issue about 2 months ago
[Bug]: Mi300x x8 unable to launch openai/api_server.py on rocm vllm branch.
github.com/vllm-project/vllm - ferrybaltimore opened this issue about 2 months ago
[Feature]: Gemma 2 models logit softcapping for TPU pallas attention backend
github.com/vllm-project/vllm - sparsh35 opened this issue about 2 months ago
[Performance]: vLLM version issue.
github.com/vllm-project/vllm - zjjznw123 opened this issue about 2 months ago
[Bugfix][VLM] Fix incompatibility between #7902 and #7230
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[WIP][Spec Decode] Add multi-proposer support for variable and flexible speculative decoding
github.com/vllm-project/vllm - ShangmingCai opened this pull request about 2 months ago
[Bug]: deploy multi lora by vllm mode error
github.com/vllm-project/vllm - askcs517 opened this issue about 2 months ago
[ci][test] fix pp test failure
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Bug]: reset LLM for each inference
github.com/vllm-project/vllm - victorzhz111 opened this issue about 2 months ago
[misc] [doc] [frontend] LLM torch profiler support
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[Model] EXAONE 3.0 model support
github.com/vllm-project/vllm - Deepfocused opened this pull request about 2 months ago
[Bug]: def _schedule_running(...) the seqs num of budget not updated
github.com/vllm-project/vllm - yblir opened this issue about 2 months ago
[Bug]: RuntimeError: operator torchvision::nms does not exist
github.com/vllm-project/vllm - murray-z opened this issue about 2 months ago
[Bug]: Is vllm compatible with torchrun?
github.com/vllm-project/vllm - HwwwwwwwH opened this issue about 2 months ago
[Misc] Use ray[adag] dependency instead of cuda
github.com/vllm-project/vllm - ruisearch42 opened this pull request about 2 months ago
[Doc] fix the autoAWQ example
github.com/vllm-project/vllm - stas00 opened this pull request about 2 months ago
[Bug]: vllm api_server often crashes when the version is higher than 0.5.3.
github.com/vllm-project/vllm - BaiMoHan opened this issue about 2 months ago
[Performance]: 5x slower throughput with OpenAI client/server than native one
github.com/vllm-project/vllm - stas00 opened this issue about 2 months ago
[Bug]: An abnormal delay of 300 milliseconds was detected.
github.com/vllm-project/vllm - skylee-01 opened this issue about 2 months ago
[Feature]: Slurm run_cluster.sh launcher instead of just Ray
github.com/vllm-project/vllm - OrenLeung opened this issue about 2 months ago
[TPU] Align worker index with node boundary
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Core] Add support for recursively loading weights by model ID
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[New Model]: Could you please help me to support google/madlad400-3b-mt translator model in vLLM?
github.com/vllm-project/vllm - aitrainingcrew opened this issue about 2 months ago
[mypy][CI/Build] Fix mypy errors
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[multi-step] add flashinfer backend
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[This PR is not supposed to be merged] Testing regression in Tensorizer Test
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request about 2 months ago
[hardware][rocm] allow rocm to override default env var
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
Draft - [CI/Build] Add shell script linting using shellcheck
github.com/vllm-project/vllm - russellb opened this pull request about 2 months ago
[Frontend][VLM] Add support for multiple multi-modal items in the OpenAI frontend
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Model] Add OLMoE
github.com/vllm-project/vllm - Muennighoff opened this pull request about 2 months ago
[Core] Combine async postprocessor and multi-step
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Bug]: OpenAI server errors out with "ZMQError Too many open files" under heavy load
github.com/vllm-project/vllm - zifeitong opened this issue about 2 months ago
Remove request.max_tokens assertion in serving_completion.py
github.com/vllm-project/vllm - zifeitong opened this pull request about 2 months ago
[Bug]: vllm:num_requests_waiting is not being published at /metrics endpoint
github.com/vllm-project/vllm - IshmeetMehta opened this issue about 2 months ago
[benchmark] Update TGI version
github.com/vllm-project/vllm - philschmid opened this pull request about 2 months ago
[Bugfix] Fix phi3v incorrect image_idx when using async engine
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Feature]: Does VLLM only support MistralModel Architecture for embedding?
github.com/vllm-project/vllm - hahmad2008 opened this issue about 2 months ago
[Bug]: On a machine with an A100 GPU, when running the Dockerfile of version 0.5.5, an error occurs.
github.com/vllm-project/vllm - zjjznw123 opened this issue about 2 months ago
[Bug]: command r server hangs randomly with no error
github.com/vllm-project/vllm - nivibilla opened this issue about 2 months ago
[Usage]: Confirm tool calling is not supported and this is the closest thing can be done
github.com/vllm-project/vllm - summersonnn opened this issue about 2 months ago
[Core] Async_output_proc: Add virtual engine support (towards pipeline parallel)
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Core] Enable Memory Tiering for vLLM
github.com/vllm-project/vllm - PanJason opened this pull request about 2 months ago
[Feature]: Lora for MiniCPM_2_6
github.com/vllm-project/vllm - tristan279 opened this issue about 2 months ago
[Core] Adding Control Vector Support
github.com/vllm-project/vllm - raywanb opened this pull request about 2 months ago
[Model][VLM] Add Qwen2-VL model support
github.com/vllm-project/vllm - fyabc opened this pull request about 2 months ago
[Bug]: ray + vllm async engine: Background loop is stopped
github.com/vllm-project/vllm - Jack47 opened this issue about 2 months ago
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Model] EXAONE 3.0 model support - closed
github.com/vllm-project/vllm - Deepfocused opened this pull request about 2 months ago
[Bugfix] Unify rank computation across regular decoding and speculative decoding
github.com/vllm-project/vllm - jmkuebler opened this pull request about 2 months ago
[CI/Build][VLM] Cleanup multiple images inputs model test
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Bug]: RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details) [repeated 6x across cluster]
github.com/vllm-project/vllm - soumyasmruti opened this issue about 2 months ago
extend cuda graph size for H200
github.com/vllm-project/vllm - kushanam opened this pull request about 2 months ago
[Frontend] Add option for LLMEngine to return model hidden states.
github.com/vllm-project/vllm - jdvin opened this pull request about 2 months ago
[Usage]: how to test the time of response about minicpm-v-2.6 served by VLLM
github.com/vllm-project/vllm - Mysnake opened this issue about 2 months ago
[Bug]: CUDA_VISIBLE_DEVICES not detected
github.com/vllm-project/vllm - paolovic opened this issue about 2 months ago
[Doc]: Update tensorizer docs to include vllm[tensorizer]
github.com/vllm-project/vllm - sethkimmel3 opened this pull request about 2 months ago
Adding new cutlass configurations for llama70B
github.com/vllm-project/vllm - kushanam opened this pull request about 2 months ago
[Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support.
github.com/vllm-project/vllm - bnellnm opened this pull request about 2 months ago
[Neuron] Adding support for context-length, token-gen buckets.
github.com/vllm-project/vllm - hbikki opened this pull request about 2 months ago
[Performance]: Prefix-caching aware scheduling
github.com/vllm-project/vllm - comaniac opened this issue about 2 months ago
[Bugfix] Fix single output condition in output processor
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Bug]: Requests larger than 75k input tokens cause `Input prompt (512 tokens) is too long and exceeds the capacity of block_manager` error
github.com/vllm-project/vllm - servient-ashwin opened this issue about 2 months ago
[Bug]: Request Cancelation w/ Scheduler Steps Set Causes K8s Pod Restart
github.com/vllm-project/vllm - sam-h-bean opened this issue about 2 months ago
[CI/Build] Add linting for github actions workflows
github.com/vllm-project/vllm - russellb opened this pull request about 2 months ago