Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Bug]: Very slow execution of from_lora_tensors() when using mp instead of ray as --distributed-executor-backend.
ashgold opened this issue 7 months ago
[Bug]: In vLLM v0.4.3 and later, calling list_loras() in a tensor parallelism situation causes the system to hang.
ashgold opened this issue 7 months ago
[ci] Diff check step
khluu opened this pull request 7 months ago
[CI/Build] Disable LLaVA-NeXT CPU test
DarkLight1337 opened this pull request 7 months ago
[Core][Distributed] improve p2p cache generation
youkaichao opened this pull request 7 months ago
[Bug]: MoE model, 2-GPU inference, fails with AssertionError("Invalid device id")
Elissa0723 opened this issue 7 months ago
[CI/Build] [1/3] Reorganize entrypoints tests
DarkLight1337 opened this pull request 7 months ago
[Core] Remove duplicate processing in async engine
DarkLight1337 opened this pull request 7 months ago
[Misc] Fix arg names
AllenDou opened this pull request 7 months ago
[Bug]: The speed of loading the qwen2 72b model, glm-4-9b-chat-1m model in v0.5.0 is much lower than that in v0.4.2.
majestichou opened this issue 7 months ago
bump version to v0.5.0.post1
simon-mo opened this pull request 7 months ago
[Bug]: Shutdown error when using multiproc_gpu_executor
wooyeonlee0 opened this issue 7 months ago
[RFC]: Usage Data Enhancement for v0.5.*
simon-mo opened this issue 7 months ago
Limit visible devices for 2gpu tests
khluu opened this pull request 7 months ago
Add basic correctness 2 GPU tests to 4 GPU pipeline
Yard1 opened this pull request 7 months ago
[Bug]: Excessive Memory Consumption of Cudagraph on A10G/L4 GPUs
ymwangg opened this issue 7 months ago
[Kernel] Fix CUTLASS 3.x custom broadcast load epilogue
tlrmchlsmth opened this pull request 7 months ago
[Misc] Log cudagraph memory usage
ymwangg opened this pull request 7 months ago
[Kernel] Update Cutlass int8 kernel configs for SM90
varun-sundar-rabindranath opened this pull request 7 months ago
[Bug]: Error loading FP8 weights for `gpt_bigcode` model
tdoublep opened this issue 7 months ago
[misc][distributed] fix benign error in `is_in_the_same_node`
youkaichao opened this pull request 7 months ago
[misc] fix format.sh
youkaichao opened this pull request 7 months ago
[CI/Build] Disable test_fp8.py
tlrmchlsmth opened this pull request 7 months ago
[Bugfix]typofix
AllenDou opened this pull request 7 months ago
[Bug]: Illegal memory access in CUTLASS FP8 kernels
tlrmchlsmth opened this issue 7 months ago
[Kernel] Disable CUTLASS kernels for fp8
tlrmchlsmth opened this pull request 7 months ago
[Bug]: ModuleNotFoundError: No module named 'bitsandbytes'
emillykkejensen opened this issue 7 months ago
[Bug]: Failed to import from vllm._C with ImportError('/usr/local/lib/python3.8/dist-packages/vllm/_C.abi3.so: undefined symbol: _ZN5torch7LibraryC1ENS0_4KindESsSt8optionalIN3c1011DispatchKeyEEPKcj')
MonolithFoundation opened this issue 7 months ago
[Bug]: RuntimeError: out must have shape (total_q, num_heads, head_size_og)
zhihui96 opened this issue 7 months ago
support load qwen2-72b-instruct lora
NiuBlibing opened this pull request 7 months ago
[Bug]: Qwen/Qwen2-72B-Instruct 128k server down
junior-zsy opened this issue 7 months ago
[Bug]: ray not work when tp>=2
Jimmy-Lu opened this issue 7 months ago
[Usage]: How do I get the FP8 scaling factors for KV cache?
CharlesRiggins opened this issue 7 months ago
[Hardware][Intel] fp8 kv cache support for CPU
jikunshang opened this pull request 7 months ago
[Feature]: load/unload API to run multiple LLMs in a single GPU instance
lizzzcai opened this issue 7 months ago
When calling the API without passing a system prompt, the output gets stuck and consists entirely of !!!!!
shujun1992 opened this issue 7 months ago
[Feature]: Can we make vllm support engines compiled with TensorRT?
huai-ying opened this issue 7 months ago
Enable random seed option to make latency benchmarking more configurable
qingquansong opened this pull request 7 months ago
[Bug]: ImportError: cannot import name 'boolean_dispatched' from partially initialized module 'torch._jit_internal'
morestart opened this issue 7 months ago
[Bug]: NCCL hangs and causes timeout
wjj19950828 opened this issue 7 months ago
[Misc] add code to get git hash info for vllm
dhuangnm opened this pull request 7 months ago
[CI/Build] Update CPU tests to include all "standard" tests
DarkLight1337 opened this pull request 7 months ago
[Usage]: Can I use vllm.LLM(quantization="bitsandbytes"...) when bitsandbytes is supported in the v0.5.0 version
cywuuuu opened this issue 7 months ago
[Bug]: Loading Mixtral-8x22B-Instruct-v0.1-FP8 on 8xL40S causes a SIGSEGV
nickandbro opened this issue 7 months ago
[Usage]: OpenRLHF: How can I create a second NCCL Group in a vLLM v0.4.3+ Ray worker?
hijkzzz opened this issue 7 months ago
Add `cuda_device_count_stateless`
Yard1 opened this pull request 7 months ago
[Doc] Update documentation on Tensorizer
sangstar opened this pull request 7 months ago
[ci] Upload wheels
khluu opened this pull request 7 months ago
[Bug][v0.5.0]: Benign error reported by Python multiprocessing resource_tracker
mgoin opened this issue 7 months ago
[Feature]: Allow user defined extra request args to be logged in OpenAI compatible server
davidgxue opened this issue 7 months ago
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations
mgoin opened this pull request 7 months ago
[Bug]: Runtime Error: GET was unable to find an engine to execute this computation for LLaVa-NEXT
XkunW opened this issue 7 months ago
[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests
khluu opened this pull request 7 months ago
Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations"
simon-mo opened this pull request 7 months ago
[misc] add hint for AttributeError
youkaichao opened this pull request 7 months ago
[Bug]: Torch2.3 run fail
lucasjinreal opened this issue 7 months ago
[Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models
tdoublep opened this pull request 7 months ago
[Feature]: PagedAttention multiple of 8
barschiiii opened this issue 7 months ago
[Bug]: Error when --tensor-parallel-size > 1
javi111717 opened this issue 7 months ago
[Installation]: M2 Mac Dependency Torch 2.1.2 (Incompatible)
velocity33 opened this issue 7 months ago
[Bug]: Outdated binaries when re-building vLLM from source
DarkLight1337 opened this issue 7 months ago
[Bugfix] Skip test temporarily; failing quantization test
dsikka opened this pull request 7 months ago
[Bug]: 0.5.0 AttributeError: '_OpNamespace' '_C' object has no attribute 'rms_norm'
WangErXiao opened this issue 7 months ago
[Usage] Clarify and Update Argument for Specifying Model Revisions
Etelis opened this pull request 7 months ago
[Hardware][Intel] Support CPU inference with AVX2 ISA
DamonFool opened this pull request 7 months ago
[Bugfix] Fix wrong multi_modal_input format for CPU runner
Isotr0py opened this pull request 7 months ago
[Bug]: vllm v0.5.0 internal assert failed
changshivek opened this issue 7 months ago
[Usage]: How to serve embedding model and LLM at the same time
weiyunfei opened this issue 7 months ago
[Bug]: AttributeError: '_OpNamespace' '_C_cache_ops' object has no attribute 'reshape_and_cache_flash'
syuoni opened this issue 7 months ago
[Model] Bert Embedding Model
laishzh opened this pull request 7 months ago
[Hardware][Intel] Generate custom activation ops using torch.compile for CPU backend.
bigPYJ1151 opened this pull request 7 months ago
multilora_inference raises an error when calling qwen2-1.5b
zigangzhao-ai opened this issue 7 months ago
[Bugfix] TYPE_CHECKING for MultiModalData
kimdwkimdw opened this pull request 7 months ago
[Bug]: v0.4.3 AsyncEngineDeadError
changshivek opened this issue 7 months ago
[Bugfix] Avoid to warmup when world size is 1
kerthcet opened this pull request 7 months ago
[Kernel] Add punica dimension for Qwen2 LoRA
jinzhen-lin opened this pull request 7 months ago
[Bug]: TypeError: a bytes-like object is required, not 'str'
yaoyasong opened this issue 7 months ago
[Bug]: resource_tracker unregister error with 2*3090
xuhao916 opened this issue 7 months ago
[Doc] Update debug docs
DarkLight1337 opened this pull request 7 months ago
[Doc] Update LLaVA docs
DarkLight1337 opened this pull request 7 months ago
[Bug]: get the degree of the `outlines FSM` compilation progress from vllm 0.5.0 engine (via a route)
syGOAT opened this issue 7 months ago
`compressed-tensors` marlin 24 support
dsikka opened this pull request 7 months ago
[Feature]: PagedAttention for CPU-memory constrained environments?
peeteeman opened this issue 7 months ago
[Feature]: Add guided-* Parameters to Sampling Parameters
zhanghx0905 opened this issue 7 months ago
[ Misc ] Rs/compressed tensors cleanup
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Feature]: Support [RecurrentGemmaForCausalLM]
sung-ho-moon opened this issue 7 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py - part 2
c3-ali opened this pull request 7 months ago
[Docs] [Spec decode] Fix docs error in code example
cadedaniel opened this pull request 7 months ago
[Feature]: ci test with vGPU
youkaichao opened this issue 7 months ago
[Frontend] Add "input speed" to tqdm postfix alongside output speed
mgoin opened this pull request 7 months ago
[Bug]: CUDA out of memory when setting prompt_logprobs with larger batch_size
qaz-wsx-1 opened this issue 7 months ago
[RFC]: Improve guided decoding (logit_processor) APIs and performance.
rkooo567 opened this issue 7 months ago
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes
mawong-amd opened this pull request 7 months ago
[Bug]: Automatic Prefix caching not working while hitting same request multiple times
Abhinay2323 opened this issue 7 months ago
cache image build
khluu opened this pull request 7 months ago
[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
zhaobu opened this issue 7 months ago
[Bug]: Small context lengths consume more memory than large context lengths
majestichou opened this issue 7 months ago
[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?
fake-name opened this issue 7 months ago
[Speculative Decoding] Support draft model on different tensor-parallel size than target model
wooyeonlee0 opened this pull request 7 months ago
[Doc]: Urgent MoE question
ymmm-4 opened this issue 7 months ago