Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
Performance Regression between v0.4.0 and v0.4.1
simon-mo opened this issue 9 months ago
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0
sangstar opened this pull request 9 months ago
[Usage]: Make request to LLAVA server.
premg16 opened this issue 9 months ago
[Usage]: How to use LoRARequest with AsyncLLMEngine?
Rares9999 opened this issue 9 months ago
[Installation]: Failed to build from source code. Python=3.9 CUDA=12.1
WJMacro opened this issue 9 months ago
[Frontend] Support GPT-4V Chat Completions API
DarkLight1337 opened this pull request 9 months ago
[Model] Initial support for LLaVA-NeXT
DarkLight1337 opened this pull request 9 months ago
[Bug]: KeyError: 'model.layers.24.mlp.down_proj.weight' for llama 7b model SqueezeLLM quantization
condy0919 opened this issue 9 months ago
[Core] Support image processor
DarkLight1337 opened this pull request 9 months ago
[Misc]: optimize eager mode host time
functionxu123 opened this pull request 9 months ago
[RFC]: Multi-modality Support Refactoring
ywang96 opened this issue 9 months ago
[Bug]: Disk I/O Error when using tools due to shared outlines cache database
AaronFriel opened this issue 9 months ago
[New Model]: Please update docker to support llama3
HangLu123 opened this issue 9 months ago
Adding max queue time parameter
KrishnaM251 opened this pull request 9 months ago
[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported
u650080 opened this issue 9 months ago
[Usage]: Llama 3 8B Instruct Inference
aliozts opened this issue 9 months ago
[Bug]: Server crash for bloom-3b while use prefix_caching, `AssertionError assert Lk in {16, 32, 64, 128}`
DefTruth opened this issue 9 months ago
Add `vllm serve` to wrap `vllm.entrypoints.openai.api_server`
simon-mo opened this pull request 9 months ago
[CI/Build] Further decouple HuggingFace implementation from ours during tests
DarkLight1337 opened this pull request 9 months ago
[BugFix] fix num_lookahead_slots missing in async executor
leiwen83 opened this pull request 9 months ago
[Misc]: How to access the KV cache directly?
BDHU opened this issue 9 months ago
[Feature]: AMD ROCm 6.1 Support
kannan-scalers-ai opened this issue 9 months ago
[Bug]: Processed prompts: 5%|▌ | 429/8535 [00:27<08:37, 15.68it/s] RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
pangpang-xuan opened this issue 9 months ago
[Usage]: If I want to run a 34B model like Yi-34B-Chat, how can I use multiple GPUs? I only have A100 40G
hellostronger opened this issue 9 months ago
[Usage]: How to get the latency of each request with benchmark_serving.py
wanzhenchn opened this issue 9 months ago
[Core] Enable prefix caching with block manager v2 enabled
leiwen83 opened this pull request 9 months ago
[Feature]: Phi2 LoRA support
zero-or-one opened this issue 9 months ago
[Misc] Add customized information for models
jeejeelee opened this pull request 9 months ago
[Bug]: Invalid Device Ordinal on ROCm
Bellk17 opened this issue 9 months ago
Added Support for guided decoding in offline interface
kevinbu233 opened this pull request 9 months ago
[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring
hongxiayang opened this pull request 9 months ago
[Feature]: Support HuggingFaceM4/idefics2-8b as vision model
pseudotensor opened this issue 9 months ago
[Misc] [CI]: AMD test flaky on main CI
cadedaniel opened this issue 9 months ago
[Model] Update MPT model with GLU and rope and add low precision layer norm
marov opened this pull request 9 months ago
[Model] Jamba support
mzusman opened this pull request 9 months ago
[CI/BUILD] enable intel queue for longer CPU tests
zhouyuan opened this pull request 9 months ago
[Bug]: VLLM's output is unstable when handling requests CONCURRENTLY.
zhengwei-gao opened this issue 9 months ago
[Bug]: deepseek-coder-33b-instruct and deepseek-coder-6.7b-instruct broken, but deepseek-llm-7b-chat and deepseek-llm-67b-chat work well
lgw2023 opened this issue 9 months ago
[Frontend][Core] Update Outlines Integration from `FSM` to `Guide`
br3no opened this pull request 9 months ago
[Bug]: NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
pseudotensor opened this issue 9 months ago
[Bug]: --engine-use-ray is broken. #4100
jdinalt opened this pull request 9 months ago
[Bugfix] Fix naive attention typos and make it run on navi3x
maleksan85 opened this pull request 9 months ago
[Bug]: guided_json bad output for llama2-13b
pseudotensor opened this issue 9 months ago
[Model] Adding support for MiniCPM-V
HwwwwwwwH opened this pull request 9 months ago
[FacebookAI/roberta-large]: vllm support for FacebookAI/roberta-large
pradeepdev-1995 opened this issue 9 months ago
[Bug]: vllm_C is missing.
Calvinnncy97 opened this issue 9 months ago
[Model] Add support for 360zhinao
garycaokai opened this pull request 9 months ago
[Bug]: RuntimeError: Unknown layout
zzlgreat opened this issue 9 months ago
[Bug]: sending request using response_format json twice breaks vLLM
samos123 opened this issue 9 months ago
[Feature]: Allow LoRA adapters to be specified as in-memory dict of tensors
jacobthebanana opened this issue 9 months ago
[Usage]: Unable to load mistralai/Mixtral-8x7B-Instruct-v0.1
rohitnanda1443 opened this issue 9 months ago
Does vllm support both CUDA 11.3 version and PyTorch 1.12?
iclgg opened this issue 9 months ago
[Usage]: Problem when loading my trained model.
hummingbird2030 opened this issue 9 months ago
[Feature][Chunked prefill]: Make sliding window work
rkooo567 opened this issue 9 months ago
[Feature]: bitsandbytes support
orellavie1212 opened this issue 9 months ago
[Frontend] Refactor prompt processing
DarkLight1337 opened this pull request 9 months ago
[Bug]: start api server stuck
QianguoS opened this issue 9 months ago
[Model] [Kernel] Add 16, 32 kernel sizes in compilation
nbardy opened this pull request 9 months ago
[Installation]: Any plans on providing vLLM pre-compiled for ROCm?
satyamk7054 opened this issue 9 months ago
[Core] Support LoRA on quantized models
jeejeelee opened this pull request 9 months ago
[Installation]: VLLM is impossible to install.
GPaolo opened this issue 9 months ago
[Kernel] Fused MoE Config for Mixtral 8x22
ywang96 opened this pull request 9 months ago
[Bug]: Qwen1.5-14B-Chat deployed with vllm==0.3.3 on a Tesla V100-PCIE-32GB GPU outputs nothing but exclamation marks, with no actual result
li995495592 opened this issue 9 months ago
[Usage]: flash_attn vs xformers
VeryVery opened this issue 9 months ago
[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm
TNT3530 opened this issue 9 months ago
[Bug]: Command R+ GPTQ bad output on ROCm
TNT3530 opened this issue 9 months ago
[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API
DarkLight1337 opened this pull request 9 months ago
[Feature]: Tree attention about Speculative Decoding
yukavio opened this issue 10 months ago
[CI/Build] Reduce race condition in docker build
youkaichao opened this pull request 10 months ago
[Misc]: Does prefix caching work together with multi lora?
sleepwalker2017 opened this issue 10 months ago
[Bug]: StableLM 12b head size incorrect
bjoernpl opened this issue 10 months ago
[Model] LoRA gptbigcode implementation
raywanb opened this pull request 10 months ago
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics
DearPlanet opened this pull request 10 months ago
[Bug]: leading space within content via OpenAI Compatible Server
bufferoverflow opened this issue 10 months ago
[Usage]: How to offload some layers to CPU?
cheney369 opened this issue 10 months ago
Is there a stable version of the Docker image?
huyang19881115 opened this issue 10 months ago
[Model] Initialize Fuyu-8B support
Isotr0py opened this pull request 10 months ago
[Bug]: Cannot use FlashAttention because the package is not found. Please install it for better performance.
pseudotensor opened this issue 10 months ago
[Bug]: Getting subprocess.CalledProcessError: Command '['/usr/bin/gcc',] error message.
yk287 opened this issue 10 months ago
[Usage]: I have two GPUs, how do I make my model run on 2 GPUs?
hxujal opened this issue 10 months ago
[Kernel] PyTorch Labs Fused MoE Kernel Integration
robertgshaw2-neuralmagic opened this pull request 10 months ago
[Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes
Jeffwan opened this issue 10 months ago
[Bug]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
guangweiShaw opened this issue 10 months ago
[Bug]: Failed to generate the prompts with the google/gemma-2b model from Python code
936187425 opened this issue 10 months ago
[Usage]: How to determine whether the vllm engine is full with requests or not
man2machine opened this issue 10 months ago
[Bug]: killed due to high memory usage
xiewf1990 opened this issue 10 months ago
[Bug]: Cannot load lora adapters in WSL 2
invokeinnovation opened this issue 10 months ago
[Bug]: vllm 0.4.0.post1 crashed when loading dbrx-instruct on AMD MI250x
vgod-dbx opened this issue 10 months ago
[Doc/Feature]: Llava 1.5 in OpenAI compatible server
stikkireddy opened this issue 10 months ago
[Roadmap] vLLM Roadmap Q2 2024
simon-mo opened this issue 10 months ago
[Misc]: Can we remove `vllm/entrypoints/api_server.py`?
hmellor opened this issue 10 months ago
[Frontend] openAI entrypoint dynamic adapter load
DavidPeleg6 opened this pull request 10 months ago
[Bug]: Error happen in async_llm_engine when use multiple GPUs
for-just-we opened this issue 10 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
Kaiyang-Chen opened this pull request 10 months ago
[Core] :loud_sound: Improve request logging truncation
joerunde opened this pull request 10 months ago
[Model] Cohere CommandR+
saurabhdash2512 opened this pull request 10 months ago
[Hardware][Intel GPU]Add Initial Intel GPU(XPU) inference backend
jikunshang opened this pull request 10 months ago
[Installation]: Tesla V100 with CUDA 11.4. I have no permission to install a newer CUDA driver; how can I install vLLM? I have tried both building from source and using pip, and both failed.
LaVieEnRose365 opened this issue 10 months ago
[Bug]: YI:34B cannot stop generating during use.
cat2353050774 opened this issue 10 months ago
[Feature]: Make `outlines` dependency optional
saattrupdan opened this issue 10 months ago