Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm
- Host: opensource
- Code: https://github.com/vllm-project/vllm
[Usage]: How to serve fine-tuned torchtune model with vllm
github.com/vllm-project/vllm - Some-random opened this issue 5 months ago
[Frontend][OpenAI] Add support for OpenAI tools calling
github.com/vllm-project/vllm - Xwdit opened this pull request 5 months ago
[Bug]: Ray on multi machine cluster fails to detect all nodes.
github.com/vllm-project/vllm - bks5881 opened this issue 5 months ago
[Bug]: NCCL timed out during inference
github.com/vllm-project/vllm - enkiid opened this issue 5 months ago
[Model] Snowflake arctic model implementation
github.com/vllm-project/vllm - sfc-gh-hazhang opened this pull request 5 months ago
[Bug]: openapi running but "POST /v1/chat/completions HTTP/1.1" 404 Not Found
github.com/vllm-project/vllm - yebangyu opened this issue 5 months ago
[Scheduler] Warning upon preemption and Swapping
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[CORE] Adding support for insertion of soft-tuned prompts
github.com/vllm-project/vllm - SwapnilDreams100 opened this pull request 5 months ago
[Frontend][OpenAI] Support for returning max_model_len on /v1/models response
github.com/vllm-project/vllm - Avinash-Raj opened this pull request 5 months ago
fix MiniCPM tie_word_embeddings
github.com/vllm-project/vllm - Receiling opened this pull request 5 months ago
[Bug]: with `worker_use_ray = true`, and tensor_parallel_size > 1, the process is pending forever
github.com/vllm-project/vllm - depenglee1707 opened this issue 5 months ago
[Frontend] Dynamic RoPE scaling
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[CI] Add llama 3 model test
github.com/vllm-project/vllm - rkooo567 opened this pull request 5 months ago
[Model] Add support for IBM Granite Code models
github.com/vllm-project/vllm - yikangshen opened this pull request 5 months ago
[CI] Add retry for agent lost
github.com/vllm-project/vllm - cadedaniel opened this pull request 6 months ago
[Performance] [Speculative decoding]: Support draft model on different tensor-parallel size than target model
github.com/vllm-project/vllm - cadedaniel opened this issue 6 months ago
Update lm-format-enforcer to 0.10.1
github.com/vllm-project/vllm - noamgat opened this pull request 6 months ago
[Speculative decoding] [Help wanted] [Performance] Optimize draft-model speculative decoding
github.com/vllm-project/vllm - cadedaniel opened this issue 6 months ago
[CI] use ccache actions properly in release workflow
github.com/vllm-project/vllm - simon-mo opened this pull request 6 months ago
[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 6 months ago
[Bug]: Add logger and redirect logs to a file
github.com/vllm-project/vllm - hahmad2008 opened this issue 6 months ago
[Bugfix] Fine-tune gptq_marlin configs to be more similar to marlin
github.com/vllm-project/vllm - alexm-nm opened this pull request 6 months ago
[Misc]: int4 support on CPU backend
github.com/vllm-project/vllm - leiwen83 opened this issue 6 months ago
[Bugfix] Fix `asyncio.Task` not being subscriptable
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Usage]: doubt on computational complexity
github.com/vllm-project/vllm - Juelianqvq opened this issue 6 months ago
[Bug]: `v0.4.2` python3.8 `TypeError: 'type' object is not subscriptable` (python3.9 syntax)
github.com/vllm-project/vllm - Theodotus1243 opened this issue 6 months ago
[Usage]: How to install vllm in cuda10.2? Cuda version cannot be upgraded due to environmental issues
github.com/vllm-project/vllm - 1193700079 opened this issue 6 months ago
[Bug]: always returns invalid tokens in FP8 static mode
github.com/vllm-project/vllm - AnyISalIn opened this issue 6 months ago
[Core] Update `_earliest_arrival_time` calculation of the waiting seq_groups
github.com/vllm-project/vllm - Felix-Zhenghao opened this pull request 6 months ago
Main backup 2024 05 05
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 6 months ago
Upstream sync 2024 05 05
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 6 months ago
Revert to previous main
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 6 months ago
[Bugfix] Fixed error in slice_lora_b for MergedQKVParallelLinearWithLora
github.com/vllm-project/vllm - FurtherAI opened this pull request 6 months ago
[Usage]: Cannot run the starter code in tutorial
github.com/vllm-project/vllm - zhimin-z opened this issue 6 months ago
[Core][Optimization] change python dict to pytorch tensor
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bug]: when dtype='bfloat16', batch_size will cause different inference results
github.com/vllm-project/vllm - yananchen1989 opened this issue 6 months ago
[Bug]: local variable 'lora_b_k' referenced before assignment
github.com/vllm-project/vllm - LucienShui opened this issue 6 months ago
[Bug]: RuntimeError: CUDA error: no kernel image is available for execution on the device
github.com/vllm-project/vllm - JPonsa opened this issue 6 months ago
chunked-prefill-doc-syntax
github.com/vllm-project/vllm - simon-mo opened this pull request 6 months ago
[CI] Reduce wheel size by not shipping debug symbols
github.com/vllm-project/vllm - simon-mo opened this pull request 6 months ago
[CI/Build] from scratch build for dockerfile
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
bump version to v0.4.2
github.com/vllm-project/vllm - simon-mo opened this pull request 6 months ago
[Bug]: I used vllm=0.4.1 to run the squeezellm, I meet the bug RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED at "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/torch/include/c10/cuda/impl/CUDAGuardImpl.h":25, please report a bug to PyTorch.
github.com/vllm-project/vllm - RyanWMHI opened this issue 6 months ago
[Bugfix] add truncate_prompt_tokens to work offline, directly from LLM class.
github.com/vllm-project/vllm - yecohn opened this pull request 6 months ago
[CI] Make mistral tests pass
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[RFC][WIP] Use llama-3 instead of llama-2 for basic testing
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Core] Optimize sampler get_logprobs
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[BugFix] Fix fp8 quantizer
github.com/vllm-project/vllm - Kev1ntan opened this pull request 6 months ago
[Dynamic Spec Decoding] Auto-disable by the running queue size
github.com/vllm-project/vllm - comaniac opened this pull request 6 months ago
[Core][Distributed] refactor pynccl to hold multiple communicators
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bugfix] fix func var in cpuworker.execute_model() [bug 4568]
github.com/vllm-project/vllm - peterauyeung opened this pull request 6 months ago
[Bug]: Loading GenerationConfig to SamplingParams.stop_token_ids interfere with ignore_eos=True
github.com/vllm-project/vllm - CatherineSue opened this issue 6 months ago
[Usage]: Difference in language model usage post updating versions form 0.2 to 0.4
github.com/vllm-project/vllm - servient-ashwin opened this issue 6 months ago
[Bug]: vllm 0.4.1 crashing after checking P2P status on single GPU
github.com/vllm-project/vllm - alexandergagliano opened this issue 6 months ago
[Bugfix] Allow "None" or "" to be passed to CLI for string args that default to None
github.com/vllm-project/vllm - mgoin opened this pull request 6 months ago
[Bug]: `vllm.entrypoints.openai.api_server` CLI command doesn't accept `None` value for `--quantization`
github.com/vllm-project/vllm - dbarbuzzi opened this issue 6 months ago
[Bug]: Tensorizer model loader blocks multi-GPU loading even for HF serialized models
github.com/vllm-project/vllm - bbrowning opened this issue 6 months ago
[Bug]: guided jsons with date fields are not valid
github.com/vllm-project/vllm - andreas-22 opened this issue 6 months ago
add spec infer related into prometheus metrics.
github.com/vllm-project/vllm - leiwen83 opened this pull request 6 months ago
[Doc]: i want to know. How to run vllms with remote ray cluster
github.com/vllm-project/vllm - Prashantsaini25 opened this issue 6 months ago
[Doc] Chunked Prefill Documentation
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Misc]: openai compatible server
github.com/vllm-project/vllm - aqx95 opened this issue 6 months ago
[Bug]: Special tokens split when decoding after 0.4.0.post1
github.com/vllm-project/vllm - DreamGenX opened this issue 6 months ago
[Core] Log more GPU memory reservation info
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Misc] add installation time env vars
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi
github.com/vllm-project/vllm - DefTruth opened this pull request 6 months ago
[Doc] add env vars to the doc
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Misc] remove chunk detected debug logs
github.com/vllm-project/vllm - DefTruth opened this pull request 6 months ago
[Kernel] Make static FP8 scaling more robust
github.com/vllm-project/vllm - pcmoritz opened this pull request 6 months ago
[CI][Contribution Welcomed] Conditional Testing
github.com/vllm-project/vllm - simon-mo opened this issue 6 months ago
[Bug]: Query to the openapi server with cpu backend is throwing error
github.com/vllm-project/vllm - navpreet-np7 opened this issue 6 months ago
[BugFix] Prevent the task of `_force_log` from being garbage collected
github.com/vllm-project/vllm - Atry opened this pull request 6 months ago
[Core][Distributed] enable allreduce for multiple tp groups
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[RFC]: Automate Speculative Decoding
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this issue 6 months ago
Update requirements-dev.txt
github.com/vllm-project/vllm - yecohn opened this pull request 6 months ago
[Misc]: Server Does Not Follow Scheduler Policy
github.com/vllm-project/vllm - Bojun-Feng opened this issue 6 months ago
[BugFix] Include target-device specific requirements.txt in sdist
github.com/vllm-project/vllm - markmc opened this pull request 6 months ago
[CI/Build] Unpin outlines
github.com/vllm-project/vllm - br3no opened this pull request 6 months ago
[Core] Ignore infeasible swap requests.
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Bug]: Scheduler fail with assertion on "meta-llama/Meta-Llama-3-70B-Instruct" when calling concurrently
github.com/vllm-project/vllm - tsvisab opened this issue 6 months ago
[mypy][7/N] Cover all directories
github.com/vllm-project/vllm - rkooo567 opened this pull request 6 months ago
[Usage]: Experiencing weird import bugs and errors after installing with pip install -e .
github.com/vllm-project/vllm - KevinCL16 opened this issue 6 months ago
[Bug]: AssertionError in neuron_model_runner.py assert len(block_table) == 1
github.com/vllm-project/vllm - calvintwr opened this issue 6 months ago
[Misc] Exclude the `tests` directory from being packaged
github.com/vllm-project/vllm - itechbear opened this pull request 6 months ago
[Bug fix][Core] fixup ngram not setup correctly
github.com/vllm-project/vllm - leiwen83 opened this pull request 6 months ago
[WIP] Enhance MoE Triton kernel & tuning
github.com/vllm-project/vllm - WoosukKwon opened this pull request 6 months ago
[Misc] centralize all usage of environment variables
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bug]: v0.4.1 The output results of the MoE kinds models are incorrect on the V100
github.com/vllm-project/vllm - keyword1983 opened this issue 6 months ago
[Core] Sliding window for block manager v2
github.com/vllm-project/vllm - mmoskal opened this pull request 6 months ago
[Installation]: vLLM does not work on old CPU
github.com/vllm-project/vllm - dimaioksha opened this issue 6 months ago
[Misc][Refactor] Introduce ExecuteModelData
github.com/vllm-project/vllm - comaniac opened this pull request 6 months ago
[Core] Add MultiprocessingGPUExecutor
github.com/vllm-project/vllm - njhill opened this pull request 6 months ago
Virtual Office Hours: May 15 2pm ET
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 6 months ago