Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[core] [3/N] multi-step args and sequence.py
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[misc] Add Torch profiler support
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[Feature]: Integrate `flash-infer` FP8 KV Cache Chunked-Prefill (Append Attention)
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Core] More-efficient cross-attention parallel QKV computation
github.com/vllm-project/vllm - afeldman-nm opened this pull request 2 months ago
support bitsandbytes 8-bit and FP4 quantized models
github.com/vllm-project/vllm - chenqianfzh opened this pull request 2 months ago
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Bug][Frontend] Add and test client timeouts
github.com/vllm-project/vllm - joerunde opened this pull request 2 months ago
[Core] Fix tracking of model forward time to the span traces in case of PP>1
github.com/vllm-project/vllm - sfc-gh-mkeralapura opened this pull request 2 months ago
[Feature]: CI - Split up "Models Test" and "Vision Language Models Test"
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Core][Model][Frontend] Model architecture plugins
github.com/vllm-project/vllm - NadavShmayo opened this pull request 2 months ago
[Misc] update fp8 to use `vLLMParameter`
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[Model] Adding Granite model.
github.com/vllm-project/vllm - shawntan opened this pull request 2 months ago
[Usage]: GPTQ quantization behavior
github.com/vllm-project/vllm - onlinex opened this issue 2 months ago
Simplify Jamba state management
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 2 months ago
[misc][plugin] add plugin system implementation
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Misc] Deprecation Warning when setting --engine-use-ray
github.com/vllm-project/vllm - wallashss opened this pull request 2 months ago
[Misc] Update `awq` and `awq_marlin` to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[New Model]: LLaVA-OneVision
github.com/vllm-project/vllm - EthanZoneCoding opened this issue 2 months ago
[Misc]: How to use intel-gpu in openvino
github.com/vllm-project/vllm - liuxingbin opened this issue 2 months ago
[Kernel] W8A16 Int8 inside FusedMoE
github.com/vllm-project/vllm - mzusman opened this pull request 2 months ago
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang
github.com/vllm-project/vllm - KuntaiDu opened this pull request 2 months ago
[frontend] isolate api server process and engine process
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[VLM][Model] Add test for InternViT vision encoder
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[RFC]: Refactor the service pipeline to overlap GPU execution and CPU operations
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 2 months ago
[CI/Build] Minor refactoring for vLLM assets
github.com/vllm-project/vllm - ywang96 opened this pull request 2 months ago
[Misc] Update `awq_marlin` to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[misc] add commit id in collect env
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Usage]: KV Cache Warning for `gemma2`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 2 months ago
[Doc] add instructions about building vLLM with VLLM_TARGET_DEVICE=empty
github.com/vllm-project/vllm - tomeras91 opened this pull request 2 months ago
[Core] Move detokenization to front-end process
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Bug]: Dockerfile build error
github.com/vllm-project/vllm - palash-fin opened this issue 2 months ago
[Bug]: Bug in quantization/awq/gemm_kernels.cu gemm_forward_4bit_cuda_m16nXk32: more results have been written
github.com/vllm-project/vllm - mengsoso opened this issue 2 months ago
[Bug]: Bug in vllm/csrc/quantization/awq/gemm_kernels.cu
github.com/vllm-project/vllm - mengsoso opened this issue 2 months ago
[Bugfix] Handle PackageNotFoundError when checking for xpu version
github.com/vllm-project/vllm - sasha0552 opened this pull request 2 months ago
[Misc]: Cross-attention QKV computation is inefficient
github.com/vllm-project/vllm - afeldman-nm opened this issue 2 months ago
[CI/Build] Reduce the time consumption for LoRA tests
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Core] More-efficient cross-attention parallel QKV computation
github.com/vllm-project/vllm - afeldman-nm opened this pull request 2 months ago
[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 2 months ago
[Bug]: using `openai_vision_api_client.py` gets an error
github.com/vllm-project/vllm - jaffe-fly opened this issue 2 months ago
[Bugfix] Fix phi3v batch inference when images have different aspect ratio
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Core] fix _get_num_new_tokens() for _schedule_default()
github.com/vllm-project/vllm - George-ao opened this pull request 2 months ago
[Bug]: guided regex (using outlines and lm-format-enforcer) returns a bad error description on invalid regex
github.com/vllm-project/vllm - itaybar opened this issue 2 months ago
[Bug]: `facebook/chameleon-30b` triggers assertion error while loading weights
github.com/vllm-project/vllm - jaywonchung opened this issue 2 months ago
[core] [2/N] refactor worker_base input preparation for multi-step
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ
github.com/vllm-project/vllm - rasmith opened this pull request 2 months ago
[Core] RequestMetrics add preempt metrics
github.com/vllm-project/vllm - zeroorhero opened this pull request 2 months ago
[Bug]: some questions regarding the usage of NCCL allreduce/broadcast/allgather/send/recv in VLLM using pycomm and torch's distributed.
github.com/vllm-project/vllm - kanghui0204 opened this issue 2 months ago
[Bug]: LLaMA 3.1 8B/70B/405B all behave poorly and differently with the completions API compared to the chat API
github.com/vllm-project/vllm - pseudotensor opened this issue 2 months ago
[Core] Add engine option to return only deltas or final output
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Core] Fix edge case in chunked prefill + block manager v2
github.com/vllm-project/vllm - cadedaniel opened this pull request 2 months ago
[Performance]: vLLM inference on a CPU instance generates < 10 tokens/second
github.com/vllm-project/vllm - gracequeen opened this issue 2 months ago
[Bug]: prefill/prefix FP8 triton kernel for opt-125m - an illegal memory access was encountered
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Misc] Add numpy implementation of `compute_slot_mapping`
github.com/vllm-project/vllm - Yard1 opened this pull request 2 months ago
[Bugfix] Fix `PerTensorScaleParameter` weight loading for fused models
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[Usage]: Getting empty text from llm.generate with the Mixtral-8x7B-Instruct AWQ model
github.com/vllm-project/vllm - ab6995 opened this issue 2 months ago
[Bug]: Tensor Parallel > 1 causes desc_act=True GPTQ models to give bad output on ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 2 months ago
[Bugfix] Fix ITL recording in serving benchmark
github.com/vllm-project/vllm - ywang96 opened this pull request 2 months ago
[Feature]: Support block manager v2 for chunked prefill
github.com/vllm-project/vllm - comaniac opened this issue 2 months ago
[Misc]: Improving vLLM KV cache transfer efficiency with NCCL P2P communication
github.com/vllm-project/vllm - liweiqing1997 opened this issue 2 months ago
[CI/Build][ROCm] Enabling LoRA tests on ROCm
github.com/vllm-project/vllm - alexeykondrat opened this pull request 2 months ago
[Installation]: git clone cutlass fails
github.com/vllm-project/vllm - paolovic opened this issue 2 months ago
[Usage]: how to use LLM class with AsyncLLMEngine
github.com/vllm-project/vllm - henry-y opened this issue 2 months ago
[RFC]: Encoder/decoder models & feature compatibility
github.com/vllm-project/vllm - afeldman-nm opened this issue 2 months ago
[FrontEnd] Keep RPC server tcp protocol
github.com/vllm-project/vllm - esmeetu opened this pull request 2 months ago
[Performance] e2e overheads reduction: Small followup diff
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 2 months ago
Create speculative decode dynamic parallel strategy
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
Create parallel scorer
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
Allow model executor to return many next tokens
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
[Bugfix] Fix reinit procedure in ModelInputForGPUBuilder
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 2 months ago
Create draft from random tokens from prompt
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
Save speculative decoding states
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
[Bug]: `gemma-2-27b-it-GGUF`: `Architecture gemma2 not supported`
github.com/vllm-project/vllm - alllexx88 opened this issue 2 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
github.com/vllm-project/vllm - yitianlian opened this issue 2 months ago
[Bug]: Using the LLM engine to run inference on the MiniCPM-V-2_6 model gives wrong results
github.com/vllm-project/vllm - orderer0001 opened this issue 2 months ago
[VLM][Doc] Add `stop_token_ids` to InternVL example
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Feature]: continuous batching for vllm.LLM
github.com/vllm-project/vllm - babu111 opened this issue 2 months ago
[Bug]: CUDA out of memory for Llama 3.1 70B GPTQ, while Llama 3 70B GPTQ doesn't run out
github.com/vllm-project/vllm - orellavie1212 opened this issue 2 months ago
Create speculative decode dynamic parallel strategy
github.com/vllm-project/vllm - vladislavkruglikov opened this issue 2 months ago
[Feature]: Why does the vllm CLI not provide a config arg?
github.com/vllm-project/vllm - dsp6414 opened this issue 2 months ago
[Bug]: internvl2-8b loops infinitely when asked a question
github.com/vllm-project/vllm - haoduoyu1203 opened this issue 2 months ago
[Bug]: internvl2-8b answers in an infinite loop when asked a question
github.com/vllm-project/vllm - haoduoyu1203 opened this issue 2 months ago
[Frontend] Disallow passing `model` as both argument and option
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Core] RequestMetrics add preempt metrics
github.com/vllm-project/vllm - zeroorhero opened this pull request 2 months ago
[Bug]: `vllm server` gets errors while `python3 -m vllm.entrypoints.openai.api_server` works correctly
github.com/vllm-project/vllm - jaffe-fly opened this issue 2 months ago
[Bug]: Llama 3 LoRA load failed
github.com/vllm-project/vllm - victorlwchen opened this issue 2 months ago
[Misc] Add quantization config support for speculative model.
github.com/vllm-project/vllm - ShangmingCai opened this pull request 2 months ago
[TPU] Use mark_dynamic to reduce compilation time
github.com/vllm-project/vllm - WoosukKwon opened this pull request 2 months ago
[Feature]: Small model, large latency compared to SGLang and TensorRT-LLM
github.com/vllm-project/vllm - CambioML opened this issue 2 months ago
Enable FusedSDPA for prompt attention with env VLLM_PREFILL_USE_FUSEDSDPA=true
github.com/vllm-project/vllm - libinta opened this pull request 2 months ago
[Bug]: Extra body doesn't work when response_format is also sent for serving
github.com/vllm-project/vllm - HwwwwwwwH opened this issue 2 months ago
[Core] Streamline stream termination in `AsyncLLMEngine`
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Misc] Update Fused MoE weight loading
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[Usage]: Does vLLM support dynamic quantization?
github.com/vllm-project/vllm - garyyang85 opened this issue 2 months ago
[Core] Factor out input preprocessing to a separate class
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Bug]: Endless generation with fine tuned llama 3.1 model
github.com/vllm-project/vllm - shreshtshettybs opened this issue 2 months ago
[CI/Build] Add e2e correctness in oai
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 2 months ago
[Usage]: How to configure the parameters to support higher concurrency when deploying the qwen2-7b model as an API on an 8-GPU A800 (80G) server?
github.com/vllm-project/vllm - ybdesire opened this issue 2 months ago
[Misc/Testing] Use `torch.testing.assert_close`
github.com/vllm-project/vllm - jon-chuang opened this pull request 2 months ago