Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
Added Support for guided decoding in offline interface
kevinbu233 opened this pull request 8 months ago
[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring
hongxiayang opened this pull request 8 months ago
[Feature]: Support HuggingFaceM4/idefics2-8b as vision model
pseudotensor opened this issue 8 months ago
[Misc] [CI]: AMD test flaky on main CI
cadedaniel opened this issue 8 months ago
[Model] Update MPT model with GLU and rope and add low precision layer norm
marov opened this pull request 8 months ago
[Model] Jamba support
mzusman opened this pull request 8 months ago
[CI/BUILD] enable intel queue for longer CPU tests
zhouyuan opened this pull request 8 months ago
[Bug]: VLLM's output is unstable when handling requests CONCURRENTLY.
zhengwei-gao opened this issue 8 months ago
[Bug]: deepseek-coder-33b-instruct and deepseek-coder-6.7b-instruct broken, but deepseek-llm-7b-chat and deepseek-llm-67b-chat work well
lgw2023 opened this issue 8 months ago
[Frontend][Core] Update Outlines Integration from `FSM` to `Guide`
br3no opened this pull request 8 months ago
[Bug]: NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
pseudotensor opened this issue 8 months ago
[Bug]: --engine-use-ray is broken. #4100
jdinalt opened this pull request 8 months ago
[Bugfix] Fix naive attention typos and make it run on navi3x
maleksan85 opened this pull request 8 months ago
[Bug]: guided_json bad output for llama2-13b
pseudotensor opened this issue 8 months ago
[Model] Adding support for MiniCPM-V
HwwwwwwwH opened this pull request 8 months ago
[FacebookAI/roberta-large]: vllm support for FacebookAI/roberta-large
pradeepdev-1995 opened this issue 8 months ago
[Bug]: vllm_C is missing.
Calvinnncy97 opened this issue 8 months ago
[Model] Add support for 360zhinao
garycaokai opened this pull request 8 months ago
[Bug]: RuntimeError: Unknown layout
zzlgreat opened this issue 8 months ago
[Bug]: sending request using response_format json twice breaks vLLM
samos123 opened this issue 8 months ago
[Feature]: Allow LoRA adapters to be specified as in-memory dict of tensors
jacobthebanana opened this issue 8 months ago
[Usage]: Unable to load mistralai/Mixtral-8x7B-Instruct-v0.1
rohitnanda1443 opened this issue 8 months ago
Does vllm support both CUDA 11.3 version and PyTorch 1.12?
iclgg opened this issue 8 months ago
[Usage]: Problem when loading my trained model.
hummingbird2030 opened this issue 8 months ago
[Feature][Chunked prefill]: Make sliding window work
rkooo567 opened this issue 8 months ago
[Feature]: bitsandbytes support
orellavie1212 opened this issue 8 months ago
[Frontend] Refactor prompt processing
DarkLight1337 opened this pull request 8 months ago
[Bug]: start api server stuck
QianguoS opened this issue 8 months ago
[Model] [Kernel] Add 16, 32 kernel sizes in compliation
nbardy opened this pull request 8 months ago
[Installation]: Any plans on providing vLLM pre-compiled for ROCm?
satyamk7054 opened this issue 8 months ago
[Core] Support LoRA on quantized models
jeejeelee opened this pull request 9 months ago
[Installation]: VLLM is impossible to install.
GPaolo opened this issue 9 months ago
[Kernel] Fused MoE Config for Mixtral 8x22
ywang96 opened this pull request 9 months ago
[Bug]: Qwen1.5-14B-Chat deployed with vllm==0.3.3 on a Tesla V100-PCIE-32GB GPU outputs only exclamation marks, with no usable result
li995495592 opened this issue 9 months ago
[Usage]: flash_attn vs xformers
VeryVery opened this issue 9 months ago
[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm
TNT3530 opened this issue 9 months ago
[Bug]: Command R+ GPTQ bad output on ROCm
TNT3530 opened this issue 9 months ago
[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API
DarkLight1337 opened this pull request 9 months ago
[Feature]: Tree attention about Speculative Decoding
yukavio opened this issue 9 months ago
[CI/Build] Reduce race condition in docker build
youkaichao opened this pull request 9 months ago
[Misc]: Does prefix caching work together with multi lora?
sleepwalker2017 opened this issue 9 months ago
[Bug]: StableLM 12b head size incorrect
bjoernpl opened this issue 9 months ago
[Model] LoRA gptbigcode implementation
raywanb opened this pull request 9 months ago
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics
DearPlanet opened this pull request 9 months ago
[Bug]: leading space within content via OpenAI Compatible Server
bufferoverflow opened this issue 9 months ago
[Usage]: How to offload some layers to CPU?
cheney369 opened this issue 9 months ago
Is there a stable version of the Docker image?
huyang19881115 opened this issue 9 months ago
[Model] Initialize Fuyu-8B support
Isotr0py opened this pull request 9 months ago
[Bug]: Cannot use FlashAttention because the package is not found. Please install it for better performance.
pseudotensor opened this issue 9 months ago
[Bug]: Getting subprocess.CalledProcessError: Command '['/usr/bin/gcc',] error message.
yk287 opened this issue 9 months ago
[Usage]: I have two Gpus, how do I make my model run on 2 gpus
hxujal opened this issue 9 months ago
[Kernel] PyTorch Labs Fused MoE Kernel Integration
robertgshaw2-neuralmagic opened this pull request 9 months ago
[Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes
Jeffwan opened this issue 9 months ago
[Bug]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
guangweiShaw opened this issue 9 months ago
[Bug]:Failed that we generate the pompts with the google/gemma-2b model by the python code,
936187425 opened this issue 9 months ago
[Usage]: How to determine whether the vllm engine is full with requests or not
man2machine opened this issue 9 months ago
[Bug]: killed due to high memory usage
xiewf1990 opened this issue 9 months ago
[Bug]: Cannot load lora adapters in WSL 2
invokeinnovation opened this issue 9 months ago
[Bug]: vllm 0.4.0.post1 crashed when loading dbrx-instruct on AMD MI250x
vgod-dbx opened this issue 9 months ago
[Doc/Feature]: Llava 1.5 in OpenAI compatible server
stikkireddy opened this issue 9 months ago
[Roadmap] vLLM Roadmap Q2 2024
simon-mo opened this issue 9 months ago
[Misc]: Can we remove `vllm/entrypoints/api_server.py`?
hmellor opened this issue 9 months ago
[Frontend] openAI entrypoint dynamic adapter load
DavidPeleg6 opened this pull request 9 months ago
[Bug]: Error happen in async_llm_engine when use multiple GPUs
for-just-we opened this issue 9 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
Kaiyang-Chen opened this pull request 9 months ago
[Core] :loud_sound: Improve request logging truncation
joerunde opened this pull request 9 months ago
[Model] Cohere CommandR+
saurabhdash2512 opened this pull request 9 months ago
[Hardware][Intel GPU]Add Initial Intel GPU(XPU) inference backend
jikunshang opened this pull request 9 months ago
[Installation]: Tesla V100 cuda11.4, I have no permission to install a upper-version cuda driver, how can I install vllm? I have tried to build from source and use pip, both failed.
LaVieEnRose365 opened this issue 9 months ago
[Bug]: YI:34B cannot stop generating during use.
cat2353050774 opened this issue 9 months ago
[Feature]: Make `outlines` dependency optional
saattrupdan opened this issue 9 months ago
[Bug]: RuntimeError: No suitable kernel. h_in=16 h_out=3424 dtype=Float out_dtype=BFloat16
Edisonwei54 opened this issue 9 months ago
[Feature]: Add OpenTelemetry distributed tracing
ronensc opened this issue 9 months ago
[Feature]: cuda12.2 support
s-natsubori opened this issue 9 months ago
[Bug]: 【P100】RuntimeError: CUDA error: no kernel image is available for execution on the device [repeated 6x across cluster]
matrixssy opened this issue 9 months ago
vllm-0.4.0.post1+neuron213; ModuleNotFoundError: No module named 'vllm._C' [Bug]:
MojHnd opened this issue 9 months ago
Best server cmd for mistralai/Mistral-7B-v0.1
sshleifer opened this issue 9 months ago
[RFC] How do we test and support third-party models
youkaichao opened this issue 9 months ago
[Bug]: Qwen-14B-Chat-Int4 with guided_json error
xunfeng1980 opened this issue 9 months ago
[Bug]: n_inner divisible to number of GPUs
aliozts opened this issue 9 months ago
[Bug]: Starting vllm in Docker with host_IP configured still gives [W socket.cpp:663] [c10d] The client socket has failed to connect to [::ffff:172.16.8.232]:39623 (errno: 110 - Connection timed out)
huyang19881115 opened this issue 9 months ago
[Core] Eliminate parallel worker per-step task scheduling overhead
njhill opened this pull request 9 months ago
[WIP][Core] fully composible launcher/task/coordinator/communicator design and implementation
youkaichao opened this pull request 9 months ago
[Usage]: Expected output when prompt_logprobs=1
thefirebanks opened this issue 9 months ago
[Bug]: trying to run vllm inference behind the fastapi's server, but it stucks
sigridjineth opened this issue 9 months ago
[Bug]: CUDA error: invalid argument
qingjiaozyn opened this issue 9 months ago
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API
CatherineSue opened this pull request 9 months ago
[CI/Build] A perplexity-computing test for the FP8 KV cache system. Originally used in the context of PR #3290
Alexei-V-Ivanov-AMD opened this pull request 9 months ago
[Model Loading] Speedup model loading with distributed loading
chestnut-Q opened this pull request 9 months ago
[Misc]: Cohere models are not working due to transformers library outdated?
Playerrrrr opened this issue 9 months ago
[RFC] Initial support for Intel GPU
jikunshang opened this issue 9 months ago
[Bug]: RuntimeError: CUDA error: invalid device ordinal with multi node multi gpus
kn1011 opened this issue 9 months ago
[Usage]: vllm can host offline? with internet connection?
juud79 opened this issue 9 months ago
[Feature]: A instruction/chat method for offline LLM class.
simon-mo opened this issue 9 months ago
[Bug]: VLLM OOMing unpredictably on prediction
hillarysanders opened this issue 9 months ago
[Bug]: Custom all reduce not work.
esmeetu opened this issue 9 months ago
[Usage]: Error Segmentation fault(core dumped) while testing asynchronous high concurrency
alex1996-ljl opened this issue 9 months ago
Using the VLLM engine framework for inference, why is the first character generated always a space?
cy565025164 opened this issue 9 months ago
Enable mypy type checking
simon-mo opened this issue 9 months ago