Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (host: opensource)
Code: https://github.com/vllm-project/vllm
Add missing ignored_seq_groups in _schedule_chunked_prefill
github.com/vllm-project/vllm - JamesLim-sy opened this pull request 5 months ago
[Core][Distributed] add coordinator to reduce code duplication in tp and pp
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Hardware] Initial TPU integration
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Misc] Skip for logits_scale == 1.0
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Usage]: the docker image v0.4.3 cannot work
github.com/vllm-project/vllm - BUJIDAOVS opened this issue 5 months ago
[Misc] Missing error message for custom ops import
github.com/vllm-project/vllm - DamonFool opened this pull request 5 months ago
[Bug]: Regression in predictions in v0.4.3
github.com/vllm-project/vllm - hibukipanim opened this issue 5 months ago
[Model] Dynamic image size support for LLaVA-NeXT
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False)
github.com/vllm-project/vllm - tomeras91 opened this pull request 5 months ago
[Core] Dynamic image size support for VLMs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Kernel] Update Cutlass int8 kernel configs for SM80
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 5 months ago
[Bug]: high gpu_memory_utilization with 'OOM' and low gpu_memory_utilization with 'No available memory for the cache blocks'
github.com/vllm-project/vllm - mars-ch opened this issue 5 months ago
[Bug]: chatglm3 with lora adapter
github.com/vllm-project/vllm - Qingyuncookie opened this issue 5 months ago
[Bug]: When I call the speculative model through the vllm interface, an error is reported: TypeError: 'type' object is not subscriptable
github.com/vllm-project/vllm - YuCheng-Qi opened this issue 5 months ago
[Misc] Fix docstring of get_attn_backend
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Usage]: How to load a model with less CPU memory
github.com/vllm-project/vllm - liulfy opened this issue 5 months ago
vllm inference with THUDM/chatglm3-6b-128k fails to stop
github.com/vllm-project/vllm - linzm1007 opened this issue 5 months ago
[Bug]: Pending but Avg generation throughput: 0.0 tokens/s
github.com/vllm-project/vllm - hitsz-zxw opened this issue 5 months ago
[Usage]:how to get the output embedding for a text generation model using vllm
github.com/vllm-project/vllm - Apricot1225 opened this issue 5 months ago
[Bugfix] Destroy PP groups properly
github.com/vllm-project/vllm - andoorve opened this pull request 5 months ago
[Bug]: prompt_logprobs doesn't work with openai compatible server
github.com/vllm-project/vllm - Some-random opened this issue 5 months ago
[misc] benchmark_serving.py -- add ITL results and tweak TPOT results
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Kernel] Allow 8-bit outputs for cutlass_scaled_mm
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Misc] Add CustomOp interface for device portability
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Bugfix] Fix `MultiprocessingGPUExecutor.check_health` when world_size == 1
github.com/vllm-project/vllm - jsato8094 opened this pull request 5 months ago
[CI/Build] Add `is_quant_method_supported` to control quantization test configurations
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Speculative Decoding] Add `ProposerWorkerBase` abstract class
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Misc]: vllm ONLY allocate KVCache on the first device in CUDA_VISIBLE_DEVICES
github.com/vllm-project/vllm - CatYing opened this issue 5 months ago
how to compile with GLIBCXX_USE_CXX11_ABI=1
github.com/vllm-project/vllm - demonatic opened this issue 5 months ago
[BugFix]Fix the problem that StopChecker assumes a single token produ…
github.com/vllm-project/vllm - IcyFeather233 opened this pull request 5 months ago
[Kernel] Add back batch size 1536 and 3072 to MoE tuning
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[CI/Build] Reducing CPU CI execution time
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 5 months ago
[Bug]: Tokenizer setter of LLM without CachedTokenizer adapter
github.com/vllm-project/vllm - DriverSong opened this issue 5 months ago
[Performance]: Speculative Performance almost same or lower
github.com/vllm-project/vllm - tolry418 opened this issue 5 months ago
[Kernel] Re-tune Mixtral MoE configurations for FP8 on H100
github.com/vllm-project/vllm - pcmoritz opened this pull request 5 months ago
[Frontend] Add OpenAI Vision API Support
github.com/vllm-project/vllm - ywang96 opened this pull request 5 months ago
[Bug]: LLM.generate() collapse with some padding side
github.com/vllm-project/vllm - kevin3314 opened this issue 5 months ago
[Bugfix] Add warmup for prefix caching example
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
[Feature]: Add efficient interface for evaluating probabilities of fixed prompt-completion pairs
github.com/vllm-project/vllm - xinyangz opened this issue 5 months ago
Bugfix: fix broken of download models from modelscope
github.com/vllm-project/vllm - liuyhwangyh opened this pull request 5 months ago
[Feature]: vllm-flash-attn cu118 compatibility
github.com/vllm-project/vllm - epark001 opened this issue 5 months ago
[Model] Correct Mixtral FP8 checkpoint loading
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Core][Doc] Default to multiprocessing for single-node distributed case
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor
github.com/vllm-project/vllm - zifeitong opened this pull request 5 months ago
[Feature]: Custom attention masks
github.com/vllm-project/vllm - ojus1 opened this issue 5 months ago
[Usage]: How to start inference serving through `LLM` object
github.com/vllm-project/vllm - Jiayi-Pan opened this issue 5 months ago
[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to False
github.com/vllm-project/vllm - zifeitong opened this pull request 5 months ago
[Misc] Adding Speculative decoding to Throughput Benchmarking script
github.com/vllm-project/vllm - abhibambhaniya opened this pull request 5 months ago
[Usage]: RuntimeError: CUDA error: uncorrectable ECC error encountered
github.com/vllm-project/vllm - DJCoolDev opened this issue 5 months ago
[Doc]: Update the vllm distributed Inference and Serving with the new MultiprocessingGPUExecutor
github.com/vllm-project/vllm - rcarrata opened this issue 5 months ago
[Bug]: Mixtral-8x22 request cancelled by cancel scope when client sends multiple concurrent requests
github.com/vllm-project/vllm - markovalexander opened this issue 5 months ago
[Bug]: Mistral 7B crashes on NVidia Tesla P100 with a CUDA Error
github.com/vllm-project/vllm - oe3gwu opened this issue 5 months ago
Support W4A8 quantization for vllm
github.com/vllm-project/vllm - HandH1998 opened this pull request 5 months ago
[Bugfix] Support `prompt_logprobs==0`
github.com/vllm-project/vllm - toslunar opened this pull request 5 months ago
[CI/Build] Add inputs tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Core] Registry for processing model inputs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bug]: prompt_logprobs=0 raises AssertionError
github.com/vllm-project/vllm - toslunar opened this issue 5 months ago
[Installation]: Failed to build punica
github.com/vllm-project/vllm - asinglestep opened this issue 5 months ago
[Usage]: how to terminal a vllm model and free or release gpu memory
github.com/vllm-project/vllm - wellcasa opened this issue 5 months ago
[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend
github.com/vllm-project/vllm - afeldman-nm opened this pull request 5 months ago
[Feature]: Support for Mirostat, Dynamic Temperature, and Quadratic Sampling
github.com/vllm-project/vllm - Emmie411 opened this issue 5 months ago
[Bug]: VLLM_ATTENTION_BACKEND set to ROCM_FLASH only in GHA environment, overriding automatic backend selection; this breaks other kernel unit tests.
github.com/vllm-project/vllm - afeldman-nm opened this issue 5 months ago
[BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM
github.com/vllm-project/vllm - DriverSong opened this pull request 5 months ago
[Feature]: Option to override HuggingFace's configurations
github.com/vllm-project/vllm - DarkLight1337 opened this issue 5 months ago
[Feature]: inconsistent vocab_sizes support for draft and target workers while using Speculative Decoding
github.com/vllm-project/vllm - ShangmingCai opened this issue 5 months ago
[Feature]: Speculative edits
github.com/vllm-project/vllm - Muhtasham opened this issue 5 months ago
[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
github.com/vllm-project/vllm - rikitomo opened this issue 5 months ago
[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
github.com/vllm-project/vllm - rikioka-tomokazu opened this issue 5 months ago
[Frontend] Customizable RoPE theta
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[Usage]: how to use the gpu_cache_usage_perc as a custom metric in k8s HPA?
github.com/vllm-project/vllm - chakpongchung opened this issue 5 months ago
[Misc] Improve error message when LoRA parsing fails
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs with lora and CUDA graph.
github.com/vllm-project/vllm - AlphaINF opened this issue 5 months ago
[Core] Support loading GGUF model
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Bug]: loading squeezellm model
github.com/vllm-project/vllm - yuhuixu1993 opened this issue 5 months ago
[Core][Prefix Caching] Fix hashing logic for non-full blocks
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
[Bugfix] [Frontend] vLLM api_server.py when using with prompt_token_ids causes error.
github.com/vllm-project/vllm - TikZSZ opened this pull request 5 months ago
[Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
github.com/vllm-project/vllm - TikZSZ opened this issue 5 months ago
[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only ?
github.com/vllm-project/vllm - xxll88 opened this issue 5 months ago
[BugFix] Prevent `LLM.encode` for non-generation Models
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Kernel] Switch fp8 layers to use the CUTLASS kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary `asyncio.exceptions.CancelledError`
github.com/vllm-project/vllm - jlcmoore opened this issue 5 months ago
[Bug]: The Offline Inference Embedding Example Fails
github.com/vllm-project/vllm - cuizhuyefei opened this issue 5 months ago
[Bugfix]: Fix issues related to prefix caching example (#5177)
github.com/vllm-project/vllm - Delviet opened this pull request 5 months ago
[Feature]: BERT models for embeddings
github.com/vllm-project/vllm - mevince opened this issue 5 months ago
[Model] LoRA support added for command-r
github.com/vllm-project/vllm - sergey-tinkoff opened this pull request 5 months ago
[Bug]: Incorrect Example for the Inference with Prefix
github.com/vllm-project/vllm - Delviet opened this issue 5 months ago
[Usage]: Prefix caching in VLLM
github.com/vllm-project/vllm - Abhinay2323 opened this issue 5 months ago
[Bugfix] Remove deprecated @abstractproperty
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
Bugfix: CUDA out of memory leads to 'AsyncEngineDeadError: Background loop has errored already'
github.com/vllm-project/vllm - charent opened this pull request 5 months ago
Adding fp8 gemm computation
github.com/vllm-project/vllm - charlifu opened this pull request 5 months ago
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Bug]: Model Launch Hangs with 16+ Ranks in vLLM
github.com/vllm-project/vllm - wushidonguc opened this issue 5 months ago