Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
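As a serving engine, vLLM exposes an OpenAI-compatible HTTP API. The sketch below only builds the JSON body for a completions request; the base URL and model name are assumptions (a locally started server typically listens on port 8000), and no request is actually sent.

```python
import json

# Assumption: a vLLM server (e.g. started via its OpenAI-compatible
# entrypoint) is reachable at this hypothetical local address.
BASE_URL = "http://localhost:8000/v1"

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for a /v1/completions request."""
    return {
        "model": model,          # model name as served by vLLM
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,      # greedy decoding for reproducibility
    }

payload = build_completion_request("facebook/opt-125m", "Hello, my name is")
print(json.dumps(payload))
```

The payload could then be POSTed to `BASE_URL + "/completions"` with any HTTP client; the model name here is purely illustrative.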
[Bug]: Embedding doesn't work with `device="cpu"`
github.com/vllm-project/vllm - TheRoadQaQ opened this issue 4 months ago
[Model] Port over CLIPVisionModel for VLMs
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Usage]: 'InternVLChatConfig' object has no attribute 'num_attention_heads'
github.com/vllm-project/vllm - wangdong1992 opened this issue 4 months ago
[Optimization] use a pool to reuse LogicalTokenBlock.token_ids
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Hardware][Intel] Add AWQ support for CPU backend
github.com/vllm-project/vllm - zhouyuan opened this pull request 4 months ago
[Frontend] Add model peak memory usage to loading weights log
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug]: chunked prefill scheduler uses up swap on many n>=2 requests
github.com/vllm-project/vllm - toslunar opened this issue 4 months ago
[CI/BUILD] Support non-AVX512 vLLM building and testing
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[Bug]: BitsandBytes quantization is not working as expected
github.com/vllm-project/vllm - QwertyJack opened this issue 4 months ago
[Bug]: Regression in LoRA Adapter loading speed between vllm 0.4.3 and 0.5.0
github.com/vllm-project/vllm - sampritipanda opened this issue 4 months ago
[Bug]: Speculative decoding server: `ValueError: could not broadcast input array from shape (513,) into shape (512,)`
github.com/vllm-project/vllm - jeffreyling opened this issue 4 months ago
[Performance] [Speculative decoding] Speed up autoregressive proposal methods by making sampler CPU serialization optional
github.com/vllm-project/vllm - cadedaniel opened this issue 4 months ago
[Kernel] Adding bias epilogue support for `cutlass_scaled_mm`
github.com/vllm-project/vllm - ProExpertProg opened this pull request 4 months ago
[Kernel] Add punica dimensions for Granite 13b
github.com/vllm-project/vllm - joerunde opened this pull request 4 months ago
[RFC]: Implement disaggregated prefilling via KV cache transfer
github.com/vllm-project/vllm - KuntaiDu opened this issue 4 months ago
[Bug]: RuntimeError: CUDA error: no kernel image is available for execution on the device
github.com/vllm-project/vllm - seungyoonee opened this issue 4 months ago
[Bug]: prefix-caching: inconsistent completions
github.com/vllm-project/vllm - hibukipanim opened this issue 4 months ago
[Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Feature]: LoRA support for Mixtral GPTQ and AWQ
github.com/vllm-project/vllm - StrikerRUS opened this issue 4 months ago
[Bug]: CUDA illegal memory access error when `enable_prefix_caching=True`
github.com/vllm-project/vllm - mpoemsl opened this issue 4 months ago
[Usage]: how to use enable-chunked-prefill?
github.com/vllm-project/vllm - chenchunhui97 opened this issue 4 months ago
[Bug]: Very slow execution of from_lora_tensors() when using mp instead of ray as --distributed-executor-backend.
github.com/vllm-project/vllm - ashgold opened this issue 4 months ago
[Bug]: In vLLM v0.4.3 and later, calling list_loras() in a tensor parallelism situation causes the system to hang.
github.com/vllm-project/vllm - ashgold opened this issue 4 months ago
[CI/Build] Disable LLaVA-NeXT CPU test
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Core][Distributed] improve p2p cache generation
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bug]: MoE model with 2-GPU inference fails with AssertionError("Invalid device id")
github.com/vllm-project/vllm - Elissa0723 opened this issue 4 months ago
[CI/Build] [1/3] Reorganize entrypoints tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Core] Remove duplicate processing in async engine
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bug]: The speed of loading the qwen2 72b model, glm-4-9b-chat-1m model in v0.5.0 is much lower than that in v0.4.2.
github.com/vllm-project/vllm - majestichou opened this issue 4 months ago
bump version to v0.5.0.post1
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Bug]: Shutdown error when using multiproc_gpu_executor
github.com/vllm-project/vllm - wooyeonlee0 opened this issue 4 months ago
[RFC]: Usage Data Enhancement for v0.5.*
github.com/vllm-project/vllm - simon-mo opened this issue 4 months ago
Limit visible devices for 2gpu tests
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
Add basic correctness 2 GPU tests to 4 GPU pipeline
github.com/vllm-project/vllm - Yard1 opened this pull request 4 months ago
[Bug]: Excessive Memory Consumption of Cudagraph on A10G/L4 GPUs
github.com/vllm-project/vllm - ymwangg opened this issue 4 months ago
[Kernel] Fix CUTLASS 3.x custom broadcast load epilogue
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Misc] Log cudagraph memory usage
github.com/vllm-project/vllm - ymwangg opened this pull request 4 months ago
[Kernel] Update Cutlass int8 kernel configs for SM90
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 4 months ago
[Bug]: Error loading FP8 weights for `gpt_bigcode` model
github.com/vllm-project/vllm - tdoublep opened this issue 4 months ago
[misc][distributed] fix benign error in `is_in_the_same_node`
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[misc] fix format.sh
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[CI/Build] Disable test_fp8.py
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Bug]: Illegal memory access in CUTLASS FP8 kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this issue 4 months ago
[Kernel] Disable CUTLASS kernels for fp8
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Bug]: ModuleNotFoundError: No module named 'bitsandbytes'
github.com/vllm-project/vllm - emillykkejensen opened this issue 4 months ago
[Bug]: RuntimeError: out must have shape (total_q, num_heads, head_size_og)
github.com/vllm-project/vllm - zhihui96 opened this issue 4 months ago
Support loading qwen2-72b-instruct LoRA
github.com/vllm-project/vllm - NiuBlibing opened this pull request 4 months ago
[Bug]: Qwen/Qwen2-72B-Instruct 128k server down
github.com/vllm-project/vllm - junior-zsy opened this issue 4 months ago
[Bug]: ray not work when tp>=2
github.com/vllm-project/vllm - Jimmy-Lu opened this issue 4 months ago
[Usage]: How do I get the FP8 scaling factors for KV cache?
github.com/vllm-project/vllm - CharlesRiggins opened this issue 4 months ago
[Hardware][Intel] fp8 kv cache support for CPU
github.com/vllm-project/vllm - jikunshang opened this pull request 4 months ago
[Feature]: load/unload API to run multiple LLMs in a single GPU instance
github.com/vllm-project/vllm - lizzzcai opened this issue 4 months ago
When calling the API without a system prompt, the output gets stuck and consists entirely of "!!!!!"
github.com/vllm-project/vllm - shujun1992 opened this issue 4 months ago
[Feature]: Can vLLM support engines compiled by TensorRT?
github.com/vllm-project/vllm - huai-ying opened this issue 4 months ago
Enable random seed option to make latency benchmarking more configurable
github.com/vllm-project/vllm - qingquansong opened this pull request 4 months ago
[Bug]: ImportError: cannot import name 'boolean_dispatched' from partially initialized module 'torch._jit_internal'
github.com/vllm-project/vllm - morestart opened this issue 4 months ago
[Bug]: NCCL hangs and causes timeout
github.com/vllm-project/vllm - wjj19950828 opened this issue 4 months ago
[Misc] add code to get git hash info for vllm
github.com/vllm-project/vllm - dhuangnm opened this pull request 4 months ago
[CI/Build] Enable CPU test for VLMs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Usage]: Can I use vllm.LLM(quantization="bitsandbytes"...) when bitsandbytes is supported in the v0.5.0 version
github.com/vllm-project/vllm - cywuuuu opened this issue 4 months ago
[Bug]: Loading Mixtral-8x22B-Instruct-v0.1-FP8 on 8xL40S causes a SIGSEGV
github.com/vllm-project/vllm - nickandbro opened this issue 4 months ago
[Usage]: OpenRLHF: How can I create a second NCCL Group in a vLLM v0.4.3+ Ray worker?
github.com/vllm-project/vllm - hijkzzz opened this issue 4 months ago
Add `cuda_device_count_stateless`
github.com/vllm-project/vllm - Yard1 opened this pull request 4 months ago
[Doc] Update documentation on Tensorizer
github.com/vllm-project/vllm - sangstar opened this pull request 4 months ago
[Bug][v0.5.0]: Benign error reported by Python multiprocessing resource_tracker
github.com/vllm-project/vllm - mgoin opened this issue 4 months ago
[Feature]: Allow user defined extra request args to be logged in OpenAI compatible server
github.com/vllm-project/vllm - davidgxue opened this issue 4 months ago
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug]: Runtime Error: GET was unable to find an engine to execute this computation for LLaVa-NEXT
github.com/vllm-project/vllm - XkunW opened this issue 4 months ago
[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations"
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[misc] add hint for AttributeError
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models
github.com/vllm-project/vllm - tdoublep opened this pull request 4 months ago
[Feature]: PagedAttention multiple of 8
github.com/vllm-project/vllm - barschiiii opened this issue 4 months ago
[Bug]: Error when --tensor-parallel-size > 1
github.com/vllm-project/vllm - javi111717 opened this issue 4 months ago
[Installation]: M2 Mac Dependency Torch 2.1.2 (Incompatible)
github.com/vllm-project/vllm - velocity33 opened this issue 4 months ago
[Bug]: Outdated binaries when re-building vLLM from source
github.com/vllm-project/vllm - DarkLight1337 opened this issue 4 months ago
[Bugfix] Skip test temporarily; failing quantization test
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bug]: 0.5.0 AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
github.com/vllm-project/vllm - WangErXiao opened this issue 4 months ago
[Usage] Clarify and Update Argument for Specifying Model Revisions
github.com/vllm-project/vllm - Etelis opened this pull request 4 months ago
[Hardware][Intel] Support CPU inference with AVX2 ISA
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[Bugfix] Fix wrong multi_modal_input format for CPU runner
github.com/vllm-project/vllm - Isotr0py opened this pull request 4 months ago
[Bug]: vllm v0.5.0 internal assert failed
github.com/vllm-project/vllm - changshivek opened this issue 4 months ago
[Usage]: How to serve embedding model and LLM at the same time
github.com/vllm-project/vllm - weiyunfei opened this issue 4 months ago
[Bug]: AttributeError: '_OpNamespace' '_C_cache_ops' object has no attribute 'reshape_and_cache_flash'
github.com/vllm-project/vllm - syuoni opened this issue 4 months ago
[Model] Bert Embedding Model
github.com/vllm-project/vllm - laishzh opened this pull request 4 months ago
[Hardware][Intel] Generate custom activation ops using torch.compile for CPU backend.
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 4 months ago
multilora_inference errors out when calling qwen2-1.5b
github.com/vllm-project/vllm - zigangzhao-ai opened this issue 4 months ago
[Bugfix] TYPE_CHECKING for MultiModalData
github.com/vllm-project/vllm - kimdwkimdw opened this pull request 4 months ago
[Bug]: v0.4.3 AsyncEngineDeadError
github.com/vllm-project/vllm - changshivek opened this issue 4 months ago
[Bugfix] Avoid to warmup when world size is 1
github.com/vllm-project/vllm - kerthcet opened this pull request 4 months ago
[Kernel] Add punica dimension for Qwen2 LoRA
github.com/vllm-project/vllm - jinzhen-lin opened this pull request 4 months ago
[Bug]: TypeError: a bytes-like object is required, not 'str'
github.com/vllm-project/vllm - yaoyasong opened this issue 4 months ago