Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (host: opensource)
Code: https://github.com/vllm-project/vllm
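As a serving engine, vLLM exposes an OpenAI-compatible HTTP API. The sketch below builds a chat-completion request body of the kind that such a server accepts; the model name and any server address are illustrative assumptions, not taken from this page.

```python
import json

# Sketch of a chat-completion request payload for vLLM's
# OpenAI-compatible server. The model name here is an example;
# substitute whichever model the server was launched with.
payload = {
    "model": "mistralai/Mistral-7B-v0.1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
    "temperature": 0.7,
}

# Serialize to the JSON body that would be POSTed to the server.
body = json.dumps(payload)
print(body)
```

In practice this body would be POSTed to the server's /v1/chat/completions endpoint (for example with curl or an OpenAI client library pointed at the vLLM host).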
[Installation]: VLLM is impossible to install.
github.com/vllm-project/vllm - GPaolo opened this issue 6 months ago
[Bug]: Qwen1.5-14B-Chat deployed with vllm==0.3.3 on a Tesla V100-PCIE-32GB GPU outputs only exclamation marks, no results
github.com/vllm-project/vllm - li995495592 opened this issue 6 months ago
[Usage]: flash_attn vs xformers
github.com/vllm-project/vllm - VeryVery opened this issue 6 months ago
[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 6 months ago
[Bug]: Command R+ GPTQ bad output on ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 6 months ago
[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Feature]: Tree attention about Speculative Decoding
github.com/vllm-project/vllm - yukavio opened this issue 6 months ago
[CI/Build] Reduce race condition in docker build
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bug]: StableLM 12b head size incorrect
github.com/vllm-project/vllm - bjoernpl opened this issue 6 months ago
[Model] LoRA gptbigcode implementation
github.com/vllm-project/vllm - raywanb opened this pull request 6 months ago
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics
github.com/vllm-project/vllm - DearPlanet opened this pull request 6 months ago
[Usage]: How to offload some layers to CPU?
github.com/vllm-project/vllm - cheney369 opened this issue 6 months ago
[Model] Initialize Fuyu-8B support
github.com/vllm-project/vllm - Isotr0py opened this pull request 6 months ago
[Bug]: Cannot use FlashAttention because the package is not found. Please install it for better performance.
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Bug]: Getting subprocess.CalledProcessError: Command '['/usr/bin/gcc',] error message.
github.com/vllm-project/vllm - yk287 opened this issue 6 months ago
[Usage]: I have two GPUs; how do I make my model run on 2 GPUs
github.com/vllm-project/vllm - hxujal opened this issue 6 months ago
[Kernel] PyTorch Labs Fused MoE Kernel Integration
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 6 months ago
[Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes
github.com/vllm-project/vllm - Jeffwan opened this issue 6 months ago
[Bug]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
github.com/vllm-project/vllm - guangweiShaw opened this issue 6 months ago
[Usage]: How to determine whether the vllm engine is saturated with requests
github.com/vllm-project/vllm - man2machine opened this issue 6 months ago
[Bug]: killed due to high memory usage
github.com/vllm-project/vllm - xiewf1990 opened this issue 6 months ago
[Bug]: Cannot load lora adapters in WSL 2
github.com/vllm-project/vllm - invokeinnovation opened this issue 6 months ago
[Bug]: vllm 0.4.0.post1 crashed when loading dbrx-instruct on AMD MI250x
github.com/vllm-project/vllm - vgod-dbx opened this issue 7 months ago
[Doc/Feature]: Llava 1.5 in OpenAI compatible server
github.com/vllm-project/vllm - stikkireddy opened this issue 7 months ago
[Roadmap] vLLM Roadmap Q2 2024
github.com/vllm-project/vllm - simon-mo opened this issue 7 months ago
[Misc]: Can we remove `vllm/entrypoints/api_server.py`?
github.com/vllm-project/vllm - hmellor opened this issue 7 months ago
[Frontend] openAI entrypoint dynamic adapter load
github.com/vllm-project/vllm - DavidPeleg6 opened this pull request 7 months ago
[Bug]: Error happens in async_llm_engine when using multiple GPUs
github.com/vllm-project/vllm - for-just-we opened this issue 7 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
github.com/vllm-project/vllm - Kaiyang-Chen opened this pull request 7 months ago
[Core] :loud_sound: Improve request logging truncation
github.com/vllm-project/vllm - joerunde opened this pull request 7 months ago
[Model] Cohere CommandR+
github.com/vllm-project/vllm - saurabhdash2512 opened this pull request 7 months ago
[Hardware][Intel GPU] Add initial Intel GPU (XPU) inference backend
github.com/vllm-project/vllm - jikunshang opened this pull request 7 months ago
[Bug]: RuntimeError: No suitable kernel. h_in=16 h_out=3424 dtype=Float out_dtype=BFloat16
github.com/vllm-project/vllm - Edisonwei54 opened this issue 7 months ago
[Feature]: Add OpenTelemetry distributed tracing
github.com/vllm-project/vllm - ronensc opened this issue 7 months ago
[Feature]: cuda12.2 support
github.com/vllm-project/vllm - s-natsubori opened this issue 7 months ago
[Bug]: vllm-0.4.0.post1+neuron213; ModuleNotFoundError: No module named 'vllm._C'
github.com/vllm-project/vllm - MojHnd opened this issue 7 months ago
Best server cmd for mistralai/Mistral-7B-v0.1
github.com/vllm-project/vllm - sshleifer opened this issue 7 months ago
[Bug]: Qwen-14B-Chat-Int4 with guided_json error
github.com/vllm-project/vllm - xunfeng1980 opened this issue 7 months ago
[Bug]: n_inner divisible by number of GPUs
github.com/vllm-project/vllm - aliozts opened this issue 7 months ago
[Bug]: Starting vllm in Docker with host_IP configured, but still getting [W socket.cpp:663] [c10d] The client socket has failed to connect to [::ffff:172.16.8.232]:39623 (errno: 110 - Connection timed out)
github.com/vllm-project/vllm - huyang19881115 opened this issue 7 months ago
[Core] Eliminate parallel worker per-step task scheduling overhead
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[WIP][Core] Fully composable launcher/task/coordinator/communicator design and implementation
github.com/vllm-project/vllm - youkaichao opened this pull request 7 months ago
[Usage]: Expected output when prompt_logprobs=1
github.com/vllm-project/vllm - thefirebanks opened this issue 7 months ago
[Bug]: Trying to run vllm inference behind FastAPI's server, but it gets stuck
github.com/vllm-project/vllm - sigridjineth opened this issue 7 months ago
[Bug]: CUDA error: invalid argument
github.com/vllm-project/vllm - qingjiaozyn opened this issue 7 months ago
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API
github.com/vllm-project/vllm - CatherineSue opened this pull request 7 months ago
[CI/Build] A perplexity-computing test for the FP8 KV cache system. Originally used in the context of PR #3290
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 7 months ago
[Model Loading] Speedup model loading with distributed loading
github.com/vllm-project/vllm - chestnut-Q opened this pull request 7 months ago
[Misc]: Cohere models are not working due to an outdated transformers library?
github.com/vllm-project/vllm - Playerrrrr opened this issue 7 months ago
[Bug]: RuntimeError: CUDA error: invalid device ordinal with multi node multi gpus
github.com/vllm-project/vllm - kn1011 opened this issue 7 months ago
[Usage]: Can vllm be hosted offline, without an internet connection?
github.com/vllm-project/vllm - juud79 opened this issue 7 months ago
[Feature]: An instruction/chat method for the offline LLM class
github.com/vllm-project/vllm - simon-mo opened this issue 7 months ago
[Bug]: Custom all-reduce does not work.
github.com/vllm-project/vllm - esmeetu opened this issue 7 months ago
[Usage]: Segmentation fault (core dumped) error while testing asynchronous high concurrency
github.com/vllm-project/vllm - alex1996-ljl opened this issue 7 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
github.com/vllm-project/vllm - cadedaniel opened this issue 7 months ago
[Hardware][AMD][Kernel] Add custom kernel for vector query on ROCm
github.com/vllm-project/vllm - charlifu opened this pull request 7 months ago
[Bug]: ChatCompletion prompt_logprobs does not work
github.com/vllm-project/vllm - noamgat opened this issue 7 months ago
[RFC] Initial Support for CPUs
github.com/vllm-project/vllm - bigPYJ1151 opened this issue 7 months ago
[Usage]: Generate specified number of tokens for each request individually
github.com/vllm-project/vllm - oximi123 opened this issue 7 months ago
[Kernel] Use flash-attn for decoding
github.com/vllm-project/vllm - skrider opened this pull request 7 months ago
[Misc] add the "download-dir" option to the latency/throughput benchmarks
github.com/vllm-project/vllm - AmadeusChan opened this pull request 7 months ago
[RFC] Initial Support for Cloud TPUs
github.com/vllm-project/vllm - WoosukKwon opened this issue 7 months ago
parent_child_dict[sample.parent_seq_id].append(sample) KeyError: 4
github.com/vllm-project/vllm - Stosan opened this issue 7 months ago
ModuleNotFoundError: No module named 'transformers_modules' with API serving using phi-2b
github.com/vllm-project/vllm - haining78zhang opened this issue 7 months ago
[BugFix] Fix Falcon tied embeddings
github.com/vllm-project/vllm - WoosukKwon opened this pull request 7 months ago
[RFC]: Interface and Abstraction for Distributed Inference Environment
github.com/vllm-project/vllm - youkaichao opened this issue 7 months ago
[Misc]: Throughput/Latency for guided_json with ~100% GPU cache utilization
github.com/vllm-project/vllm - jens-create opened this issue 7 months ago
[Feature]: Offload Model Weights to CPU
github.com/vllm-project/vllm - chenqianfzh opened this issue 7 months ago
[New Model]: Phi-2 support for LoRA
github.com/vllm-project/vllm - andykhanna opened this issue 7 months ago
[Feature]: Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
github.com/vllm-project/vllm - tchaton opened this issue 7 months ago
[Usage]: punica LoRA kernels could not be imported. If you built vLLM from source, make sure VLLM_INSTALL_PUNICA_KERNELS=1 env var was set.
github.com/vllm-project/vllm - nlp-learner opened this issue 7 months ago
[Feature]: Support Guided Decoding in `LLM` entrypoint
github.com/vllm-project/vllm - simon-mo opened this issue 7 months ago
[Bug]: When installing vllm via pip, some errors happened.
github.com/vllm-project/vllm - finylink opened this issue 7 months ago
[Usage]: How to run inference on a model with multiple GPUs
github.com/vllm-project/vllm - ckj18 opened this issue 7 months ago
[Kernel] Full Tensor Parallelism for LoRA Layers
github.com/vllm-project/vllm - FurtherAI opened this pull request 7 months ago
[Bug]: aisingapore/sea-lion-7b-instruct fails with assert config.embedding_fraction == 1.0
github.com/vllm-project/vllm - pseudotensor opened this issue 7 months ago
[Feature]: Support distributing serving with KubeRay's autoscaler
github.com/vllm-project/vllm - TrafalgarZZZ opened this issue 7 months ago
[Bug]: vllm slows down after a long run
github.com/vllm-project/vllm - momomobinx opened this issue 7 months ago
[New Model]: Please support CogVLM
github.com/vllm-project/vllm - kietna1809 opened this issue 7 months ago
[Misc] Add attention sinks
github.com/vllm-project/vllm - felixzhu555 opened this pull request 7 months ago
[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[Core] Add generic typing to `LRUCache`
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[Usage]: Set dtype for VLLM using YAML
github.com/vllm-project/vllm - telekoteko opened this issue 7 months ago
[Feature]: Compute and log the serving FLOPs
github.com/vllm-project/vllm - zhuohan123 opened this issue 7 months ago
[Usage]: Why does increasing max-num-seqs use less memory
github.com/vllm-project/vllm - TaChao opened this issue 7 months ago
[Bug]: DynamicNTKScalingRotaryEmbedding implementation is different from Transformers
github.com/vllm-project/vllm - killawhale2 opened this issue 7 months ago
[Frontend] [Core] feat: Add model loading using `tensorizer`
github.com/vllm-project/vllm - sangstar opened this pull request 7 months ago
[Frontend] Support complex message content for chat completions endpoint
github.com/vllm-project/vllm - fgreinacher opened this pull request 7 months ago
[Core] Multiprocessing executor for single-node multi-GPU deployment
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[Misc] add HOST_IP env var
github.com/vllm-project/vllm - youkaichao opened this pull request 7 months ago
Unable to load LoRA fine-tuned LLM from HF (AssertionError)
github.com/vllm-project/vllm - oscar-martin opened this issue 7 months ago
Multi-LoRA - Support for providing /load and /unload API
github.com/vllm-project/vllm - gauravkr2108 opened this issue 7 months ago
Question regarding GPU memory allocation
github.com/vllm-project/vllm - wx971025 opened this issue 7 months ago
lm-evaluation-harness broken on master
github.com/vllm-project/vllm - pcmoritz opened this issue 7 months ago
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU)
github.com/vllm-project/vllm - AdrianAbeyta opened this pull request 7 months ago