Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm

vLLM inference with THUDM/chatglm3-6b-128k cannot stop
linzm1007 opened this issue 7 months ago

[Bug]: Pending but Avg generation throughput: 0.0 tokens/s
hitsz-zxw opened this issue 7 months ago

[Bugfix] Destroy PP groups properly
andoorve opened this pull request 7 months ago

[Bug]: prompt_logprobs doesn't work with openai compatible server
Some-random opened this issue 7 months ago

[misc] benchmark_serving.py -- add ITL results and tweak TPOT results
tlrmchlsmth opened this pull request 7 months ago

[Kernel] Allow 8-bit outputs for cutlass_scaled_mm
tlrmchlsmth opened this pull request 7 months ago

p
khluu opened this pull request 7 months ago

[Misc] Add CustomOp interface for device portability
WoosukKwon opened this pull request 7 months ago

[Bugfix] Fix `MultiprocessingGPUExecutor.check_health` when world_size == 1
jsato8094 opened this pull request 7 months ago

[Speculative Decoding] Add `ProposerWorkerBase` abstract class
njhill opened this pull request 7 months ago

how to compile with GLIBCXX_USE_CXX11_ABI=1
demonatic opened this issue 7 months ago

[BugFix]Fix the problem that StopChecker assumes a single token produ…
IcyFeather233 opened this pull request 7 months ago

[Kernel] Add back batch size 1536 and 3072 to MoE tuning
WoosukKwon opened this pull request 7 months ago

[CI/Build] Reducing CPU CI execution time
bigPYJ1151 opened this pull request 7 months ago

[Bug]: Tokenizer setter of LLM without CachedTokenizer adapter
DriverSong opened this issue 7 months ago

[Performance]: Speculative Performance almost same or lower
tolry418 opened this issue 7 months ago

[Kernel] Re-tune Mixtral MoE configurations for FP8 on H100
pcmoritz opened this pull request 7 months ago

[Frontend] Add OpenAI Vision API Support
ywang96 opened this pull request 7 months ago

[Bug]: LLM.generate() collapse with some padding side
kevin3314 opened this issue 7 months ago

[Bugfix] Add warmup for prefix caching example
zhuohan123 opened this pull request 7 months ago

Bugfix: fix broken of download models from modelscope
liuyhwangyh opened this pull request 7 months ago

[Feature]: vllm-flash-attn cu118 compatibility
epark001 opened this issue 7 months ago

[Model] Correct Mixtral FP8 checkpoint loading
comaniac opened this pull request 7 months ago

[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor
zifeitong opened this pull request 7 months ago

[Feature]: Custom attention masks
ojus1 opened this issue 7 months ago

[Usage]: How to start inference serving through `LLM` object
Jiayi-Pan opened this issue 7 months ago

[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to False
zifeitong opened this pull request 7 months ago

v0.5.0 Release Tracker
simon-mo opened this issue 7 months ago

[Misc] Adding Speculative decoding to Throughput Benchmarking script
abhibambhaniya opened this pull request 7 months ago

Support W4A8 quantization for vllm
HandH1998 opened this pull request 7 months ago

[Bugfix] Support `prompt_logprobs==0`
toslunar opened this pull request 7 months ago

[CI/Build] Add inputs tests
DarkLight1337 opened this pull request 7 months ago

[Core] Registry for processing model inputs
DarkLight1337 opened this pull request 7 months ago

[Bug]: prompt_logprobs=0 raises AssertionError
toslunar opened this issue 7 months ago

[Installation]: Failed to build punica
asinglestep opened this issue 7 months ago

[BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM
DriverSong opened this pull request 7 months ago

[Feature]: Option to override HuggingFace's configurations
DarkLight1337 opened this issue 7 months ago

[Feature]: Speculative edits
Muhtasham opened this issue 7 months ago

[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
rikitomo opened this issue 7 months ago

[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
rikioka-tomokazu opened this issue 7 months ago

[Frontend] Customizable RoPE theta
sasha0552 opened this pull request 7 months ago

push error
triple-Mu opened this pull request 7 months ago

[Misc] Improve error message when LoRA parsing fails
DarkLight1337 opened this pull request 7 months ago

[Core] Support loading GGUF model
Isotr0py opened this pull request 7 months ago

[Bug]: loading squeezellm model
yuhuixu1993 opened this issue 7 months ago

[Model] Add PaliGemma
ywang96 opened this pull request 7 months ago

[Core][Prefix Caching] Fix hashing logic for non-full blocks
zhuohan123 opened this pull request 7 months ago

[BugFix] Prevent `LLM.encode` for non-generation Models
robertgshaw2-neuralmagic opened this pull request 7 months ago

[Kernel] Switch fp8 layers to use the CUTLASS kernels
tlrmchlsmth opened this pull request 7 months ago

[Bug]: The Offline Inference Embedding Example Fails
cuizhuyefei opened this issue 7 months ago

[Bugfix]: Fix issues related to prefix caching example (#5177)
Delviet opened this pull request 7 months ago

[Feature]: BERT models for embeddings
mevince opened this issue 7 months ago

[Model] LoRA support added for command-r
sergey-tinkoff opened this pull request 7 months ago

[Bug]: Incorrect Example for the Inference with Prefix
Delviet opened this issue 7 months ago

[Usage]: Prefix caching in VLLM
Abhinay2323 opened this issue 7 months ago

draft2
khluu opened this pull request 7 months ago

[Bugfix] Remove deprecated @abstractproperty
zhuohan123 opened this pull request 7 months ago

Adding fp8 gemm computation
charlifu opened this pull request 7 months ago

[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support
njhill opened this pull request 7 months ago

[Bug]: Model Launch Hangs with 16+ Ranks in vLLM
wushidonguc opened this issue 7 months ago

[Bugfix] Fix illegal memory access for lora
sfc-gh-zhwang opened this pull request 7 months ago

[Build] Guard against older CUDA versions when building CUTLASS 3.x kernels
tlrmchlsmth opened this pull request 7 months ago

[Performance]: What can we learn from OctoAI
hmellor opened this issue 7 months ago

[Build] Do not compile cutlass scaled_mm on CUDA 11
simon-mo opened this pull request 7 months ago

[Bugfix] Fix KeyError: 1 When Using LoRA adapters
BlackBird-Coding opened this pull request 7 months ago

[Bug]: Unable to Use Prefix Caching in AsyncLLMEngine
kezouke opened this issue 7 months ago

[Kernel] Pass a device pointer into the quantize kernel for the scales
tlrmchlsmth opened this pull request 7 months ago

[Feature]: Linear adapter support for Mixtral
DhruvaBansal00 opened this issue 7 months ago

add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088
alexm-neuralmagic opened this pull request 7 months ago

[Kernel] Update Cutlass fp8 configs
varun-sundar-rabindranath opened this pull request 7 months ago