Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm
- Host: opensource
- Code: https://github.com/vllm-project/vllm
[Usage]: vllm Docker: OSError: Incorrect path_or_model_id
github.com/vllm-project/vllm - brian-you98 opened this issue about 1 month ago
[Bug]: num_scheduler_steps > 1, n > 1 raise error
github.com/vllm-project/vllm - efsotr opened this issue about 1 month ago
[tpu][misc] fix typo
github.com/vllm-project/vllm - youkaichao opened this pull request about 1 month ago
[Feature]: Dockerfile.cpu for aarch64
github.com/vllm-project/vllm - khayamgondal opened this issue about 1 month ago
[Bugfix] Fix broken OpenAI tensorizer test
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Usage]: Single-node multi-GPU inference
github.com/vllm-project/vllm - zhentingqi opened this issue about 1 month ago
[Bugfix] Fix Hermes tool call chat template bug
github.com/vllm-project/vllm - K-Mistele opened this pull request about 1 month ago
[Bugfix] Fix LongRoPE bug
github.com/vllm-project/vllm - garg-amit opened this pull request about 1 month ago
[not-for-review] test PR multi py ver
github.com/vllm-project/vllm - khluu opened this pull request about 1 month ago
[Frontend][Core] Move guided decoding params into sampling params
github.com/vllm-project/vllm - joerunde opened this pull request about 1 month ago
[Bugfix][Frontend] Update all fastapi requests based on OpenAPIBase with annotations
github.com/vllm-project/vllm - drikster80 opened this pull request about 1 month ago
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv
github.com/vllm-project/vllm - zifeitong opened this pull request about 1 month ago
[Bug]: Tensorizer test is broken
github.com/vllm-project/vllm - alexeykondrat opened this issue about 1 month ago
[Kernel] [Triton] Memory optimization for awq_gemm and awq_dequantize, 2x throughput
github.com/vllm-project/vllm - rasmith opened this pull request about 1 month ago
[Model] Support multiple images for qwen-vl
github.com/vllm-project/vllm - alex-jw-brooks opened this pull request about 1 month ago
Memory optimization for awq_gemm and awq_dequantize, 2x throughput
github.com/vllm-project/vllm - rasmith opened this pull request about 1 month ago
[Kernel] Build flash-attn from source
github.com/vllm-project/vllm - ProExpertProg opened this pull request about 1 month ago
[Performance]: Using vLLM for Llama3.1 405b fp8 on 8xH100 yields poor throughput
github.com/vllm-project/vllm - jorgeantonio21 opened this issue about 1 month ago
[Installation]: NotImplementedError get_device_capability
github.com/vllm-project/vllm - joestein-ssc opened this issue about 1 month ago
[Bug]: GPU Memory Utilization Lower Than Expected with --enable-prefix-caching
github.com/vllm-project/vllm - hxer7963 opened this issue about 1 month ago
Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py)
github.com/vllm-project/vllm - wschin opened this pull request about 1 month ago
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching
github.com/vllm-project/vllm - llsj14 opened this pull request about 1 month ago
[Usage]: How to determine the batch size for batch offline inference?
github.com/vllm-project/vllm - pspdada opened this issue about 1 month ago
[Model] Multi-input support for LLaVA and fix embedding inputs for multi-image models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 1 month ago
[Bug]: requests with response_format cause vllm to hang with pipeline parallel
github.com/vllm-project/vllm - rymc opened this issue about 1 month ago
[Feature]: Add multi-image input support for LLaVA offline inference (similar to #7230)
github.com/vllm-project/vllm - yinsong1986 opened this issue about 1 month ago
[Bug]: Missing TextTokenPrompts class
github.com/vllm-project/vllm - shubh9m opened this issue about 1 month ago
[BugFix] Fix metrics error for --num-scheduler-steps > 1
github.com/vllm-project/vllm - yuleil opened this pull request about 1 month ago
[Bug]: Metrics error for --num-scheduler-steps > 1
github.com/vllm-project/vllm - yuleil opened this issue about 1 month ago
[New Model]: When will the MiniCPM3ForCausalLM (MiniCPM3-4B) model be supported?
github.com/vllm-project/vllm - ML-GCN opened this issue about 1 month ago
[Misc]: Throughput calculation in benchmark_throughput.py
github.com/vllm-project/vllm - Andy0422 opened this issue about 1 month ago
[Bug]: vLLM 0.5.5 using prefix caching causing CUDA error: illegal memory access
github.com/vllm-project/vllm - Sekri0 opened this issue about 1 month ago
[Misc]: kvcache hash collision
github.com/vllm-project/vllm - WangErXiao opened this issue about 1 month ago
[Performance]: Clarification on Base Model Inference Count with Multiple LoRA Models in vLLM Deployment
github.com/vllm-project/vllm - zhangyuqi-1 opened this issue about 1 month ago
[Bug]: In the case of quantization=compressed-tensors, Qwen2-57B-A14B-Instruct is not supported.
github.com/vllm-project/vllm - liangshaopeng opened this issue about 1 month ago
[Bug]: sm75 --num-scheduler-steps 8, unhandled errors in a TaskGroup
github.com/vllm-project/vllm - maxin9966 opened this issue about 1 month ago
[Usage]: FP8 and INT8
github.com/vllm-project/vllm - chenchunhui97 opened this issue about 1 month ago
[Spec Decode] Move ops.advance_step to flash attn advance_step
github.com/vllm-project/vllm - kevin314 opened this pull request about 1 month ago
[Bug]: Poor TTFT performance with simultaneous --enable-chunked-prefill and --enable-prefix-caching
github.com/vllm-project/vllm - hxer7963 opened this issue about 2 months ago
[Installation]: Is there no more "***.whl" based on cuda12?
github.com/vllm-project/vllm - zejunwang1 opened this issue about 2 months ago
[Installation]: Dockerfile for aarch64
github.com/vllm-project/vllm - khayamgondal opened this issue about 2 months ago
[Misc] Remove `SqueezeLLM`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Bug]: vLLM v0.6.0 Instability issue. "ValueError: max() arg is an empty sequence" under load.
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Feature]: Supporting Guided Decoding via AsyncLLMEngine
github.com/vllm-project/vllm - DhruvaBansal00 opened this issue about 2 months ago
[Misc] Fused MoE Marlin support for GPTQ
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[BugFix] Fix Granite model configuration
github.com/vllm-project/vllm - njhill opened this pull request about 2 months ago
[Usage]: How can I perform multi-image inference with the MiniCPM-V-2_6 model (or any vision-language model) in vLLM?
github.com/vllm-project/vllm - dahwin opened this issue about 2 months ago
[Misc] Add GPTQ Marlin Fused MoE Support
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
Add VLLM_LOGGING_INTERVAL_SEC envvar to control logging rate
github.com/vllm-project/vllm - mgoin opened this pull request about 2 months ago
[Bug]: FastAPI 0.113.0 breaks vLLM OpenAPI
github.com/vllm-project/vllm - drikster80 opened this issue about 2 months ago
[Misc] Upgrade vllm-flash-attn to v2.6.2
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Usage]: Is there an argument to adjust the interval time for logs displayed on the terminal?
github.com/vllm-project/vllm - KimMinSang96 opened this issue about 2 months ago
Fix shutdown problem
github.com/vllm-project/vllm - Bye-legumes opened this pull request about 2 months ago
[Bug]: Shutdown problem when we use ADAG
github.com/vllm-project/vllm - Bye-legumes opened this issue about 2 months ago
[Model] Adding Granite MoE.
github.com/vllm-project/vllm - shawntan opened this pull request about 2 months ago
[Misc]: benchmark_serving with image input
github.com/vllm-project/vllm - Mrxiangli opened this issue about 2 months ago
[Bug]: [Errno 98] error while attempting to bind on address ('0.0.0.0', 8000): address already in use
github.com/vllm-project/vllm - youkaichao opened this issue about 2 months ago
[CI/Build] Increasing timeout for multiproc worker tests
github.com/vllm-project/vllm - alexeykondrat opened this pull request about 2 months ago
[Model][VLM] Support multi-images inputs for InternVL2 models
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Core] *Prompt* logprobs support in Multi-step
github.com/vllm-project/vllm - afeldman-nm opened this pull request about 2 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError
github.com/vllm-project/vllm - NicolasDrapier opened this issue about 2 months ago
[Feature]: `benchmark_serving.py` should support `--logprobs`
github.com/vllm-project/vllm - afeldman-nm opened this issue about 2 months ago
[OpenVINO] Enable GPU support for OpenVINO vLLM backend
github.com/vllm-project/vllm - sshlyapn opened this pull request about 2 months ago
[Frontend] Add --logprobs argument to `benchmark_serving.py`
github.com/vllm-project/vllm - afeldman-nm opened this pull request about 2 months ago
[Usage]: how to release cuda memory
github.com/vllm-project/vllm - UCC-team opened this issue about 2 months ago
[Misc]: HELPPP! Implement vLLM Library in FastAPI using MultiGPUS got Force Shutdown after some warning
github.com/vllm-project/vllm - hanifabd opened this issue about 2 months ago
[Bug]: Phi-3.5-MoE-Instruct on vLLM produces weird strings
github.com/vllm-project/vllm - chiwanpark opened this issue about 2 months ago
[Usage]: number of allocated GPU blocks depending on max_seq_length ??
github.com/vllm-project/vllm - vpellegrain opened this issue about 2 months ago
[Bug]: (OOM) Find two places that cause a significant increase in GPU memory usage (probably lead to memory leak)
github.com/vllm-project/vllm - cafeii opened this issue about 2 months ago
[Bug]: AssertionError: Logits Processors are not supported in multi-step decoding
github.com/vllm-project/vllm - Quang-elec44 opened this issue about 2 months ago
[Doc] Add multi-image input example and update supported models
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Usage]: "RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`" when serving w8a8
github.com/vllm-project/vllm - xyionwu opened this issue about 2 months ago
[Bug]: Frequent Errors:async_llm_engine.py:158] Aborted request
github.com/vllm-project/vllm - TangJiakai opened this issue about 2 months ago
[Bug]: In v0.6.0 and above, Some of monitoring metrics are not correct.
github.com/vllm-project/vllm - ashgold opened this issue about 2 months ago
[Bug]: watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - DreamGenX opened this issue about 2 months ago
[Performance]: reproducing vLLM performance benchmark
github.com/vllm-project/vllm - KuntaiDu opened this issue about 2 months ago
[Benchmark] Add block_size option to benchmark_throughput.py
github.com/vllm-project/vllm - liangfu opened this pull request about 2 months ago
[Installation]: error: can't copy 'build/lib.linux-x86_64-3.10/vllm/_core_C.abi3.so': doesn't exist or not a regular file
github.com/vllm-project/vllm - DreamerZhang11 opened this issue about 2 months ago
[Core/Bugfix] Add query dtype as per FlashInfer API requirements.
github.com/vllm-project/vllm - elfiegg opened this pull request about 2 months ago
[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[New Model]: Support for allenai/OLMoE-1B-7B-0924
github.com/vllm-project/vllm - GulatiAditya opened this issue about 2 months ago
[bugfix] Upgrade minimum OpenAI version
github.com/vllm-project/vllm - SolitaryThinker opened this pull request about 2 months ago
[Model] Allow loading from original Mistral format
github.com/vllm-project/vllm - patrickvonplaten opened this pull request about 2 months ago
[Usage]: KV cache memory utilization tracking
github.com/vllm-project/vllm - shubh9m opened this issue about 2 months ago
Bump version to v0.6.0
github.com/vllm-project/vllm - simon-mo opened this pull request about 2 months ago
Move verify_marlin_supported to GPTQMarlinLinearMethod
github.com/vllm-project/vllm - mgoin opened this pull request about 2 months ago
[MISC] Replace input token throughput with total token throughput
github.com/vllm-project/vllm - comaniac opened this pull request about 2 months ago
[CI] Change test input in Gemma LoRA test
github.com/vllm-project/vllm - WoosukKwon opened this pull request about 2 months ago
[Misc] remove peft as dependency for prompt models
github.com/vllm-project/vllm - prashantgupta24 opened this pull request about 2 months ago
[Doc] [Misc] Create CODE_OF_CONDUCT.md
github.com/vllm-project/vllm - mmcelaney opened this pull request about 2 months ago
[ci] Mark LoRA test as soft-fail
github.com/vllm-project/vllm - khluu opened this pull request about 2 months ago
[Feature]: Allow partial context in speculative decoding when using draft models with smaller context than target model
github.com/vllm-project/vllm - dsingal0 opened this issue about 2 months ago
[Bug]: vllm async engine can not use adag
github.com/vllm-project/vllm - Bye-legumes opened this issue about 2 months ago
[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Misc]: Use of response_format and guided output in LLMEngine and AsyncLLMEngine
github.com/vllm-project/vllm - ingambe opened this issue about 2 months ago
[Bugfix] Fix missing `post_layernorm` in CLIP
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Usage]: How to use vllm infer video with Internvl2 8b multimodal model
github.com/vllm-project/vllm - PancakeAwesome opened this issue about 2 months ago