Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Bug]: Qwen2-VL incoherent output with OpenAI API
SinanAkkoyun opened this issue 3 months ago
[Bug]: tensor parallelism multinode
gpucce opened this issue 3 months ago
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode
llsj14 opened this pull request 3 months ago
[Bug]: Bfloat16 or Half are not compatible with HF float16/bfloat16 result.
jason9693 opened this issue 3 months ago
[Bug]: Jetson support regression
conroy-cheers opened this issue 3 months ago
[Doc] Specify async engine args in docs
DarkLight1337 opened this pull request 3 months ago
[V1] Prototype Fully Async Detokenizer
robertgshaw2-neuralmagic opened this pull request 3 months ago
[core] cudagraph output with tensor weak reference
youkaichao opened this pull request 3 months ago
[Bug]: Incoherent Offline Inference Single Video with Qwen2-VL
hector-gr opened this issue 3 months ago
[Performance]: How to Improve Performance Under Concurrency
ljwps opened this issue 3 months ago
[Bugfix] Use temporary directory in registry
DarkLight1337 opened this pull request 3 months ago
[Model] Add BNB quantization support for Mllama
Isotr0py opened this pull request 3 months ago
[Misc] SpecDecodeWorker supports profiling
Abatom opened this pull request 3 months ago
[torch.compile] rework compile control with piecewise cudagraph
youkaichao opened this pull request 3 months ago
[Usage]: ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now.
Mjiegu opened this issue 3 months ago
[Bug]: Inconsistent evaluations when enabling / disabling chunked_prefill?
Jingyu6 opened this issue 3 months ago
[Model] Add classification Task with Qwen2ForSequenceClassification
kakao-kevin-us opened this pull request 3 months ago
[Usage]: Using a model for inference and embedding
micuentadecasa opened this issue 3 months ago
[Installation] pip install vllm (0.6.3) will force a reinstallation of the CPU version torch and replace cuda torch on windows
xiezhipeng-git opened this issue 3 months ago
CI TEST
maxdebayser opened this pull request 3 months ago
[Model] Support math-shepherd-mistral-7b-prm model
Went-Liang opened this pull request 3 months ago
[Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled
ankush13r opened this issue 3 months ago
[Model] Support GGUF models newly added in `transformers` 4.46.0
Isotr0py opened this pull request 3 months ago
[Core] Support offloading KV cache to CPU
KuntaiDu opened this pull request 3 months ago
[Build] skip renaming files for release wheels pipeline
simon-mo opened this pull request 3 months ago
[Bug]: Input length greater than 32K in nvidia/Llama-3.1-Nemotron-70B-Instruct-HF generate garbage on v0.6.3 ( issue is not seen in v0.6.2)
source-ram opened this issue 3 months ago
[Bug]: hung when start openai api server with multiple gpu in one node.
weiminw opened this issue 3 months ago
[Doc] Update FAQ links in spec_decode.rst
whyiug opened this pull request 3 months ago
[Usage]: Llama-3.1-70B-Instruct best arguments for throughput at scale for multiple users
squinn1 opened this issue 3 months ago
[Misc]: huggingface_hub.errors.HFValidationError using LLama3.1-405b
unrue opened this issue 3 months ago
[Usage]: Pass multiple LoRA modules through YAML config
andreapairon opened this issue 3 months ago
[Feature]: support SageAttention
LSC527 opened this issue 3 months ago
[Performance]: Low GPU utilization - is it normal?
fzyzcjy opened this issue 3 months ago
[V1] Move mm_input_mapper to a separate process
WoosukKwon opened this pull request 3 months ago
[Bug]: pipeline parallel performance issue for 1 sample.
littletomatodonkey opened this issue 3 months ago
[torch.compile] Adding torch compile annotations to some models
CRZbulabula opened this pull request 3 months ago
[Bug]: glm4-9b-chat-lora-merge model with VLLM for concurrent requests, the process gets stuck and returns an "Aborted request" error.
Jimmy-L99 opened this issue 3 months ago
[Usage]: Multimodal content with benchmark_serving.py
khayamgondal opened this issue 3 months ago
[Bugfix] Fix edge cases for MistralTokenizer
tjohnson31415 opened this pull request 3 months ago
[Bug]: Incompatible shape in block table when running Phi-3.5-mini-instruct
vizsatiz opened this issue 3 months ago
[Model][LoRA]LoRA support added for Qwen
jeejeelee opened this pull request 3 months ago
[CI/Build] improve python-only dev setup
dtrifiro opened this pull request 3 months ago
[Bug]: crash:RecursionError: maximum recursion depth exceeded
wciq1208 opened this issue 3 months ago
[New Model]: stepfun-ai/GOT-OCR2_0
akhileshsharma99 opened this issue 3 months ago
[Core] Make encoder-decoder inputs a nested structure to be more composable
DarkLight1337 opened this pull request 3 months ago
Linter test
maxdebayser opened this pull request 3 months ago
[Misc] Upgrade to pytorch 2.5
bnellnm opened this pull request 3 months ago
[Feature]: LoRA support for Qwen model
zhangfan-algo opened this issue 3 months ago
[Bugfix] use AF_INET6 instead of AF_INET for OpenAI Compatible Server
jxpxxzj opened this pull request 3 months ago
[Feature]: Support for 1.58-bit models.
RealMrCactus opened this issue 3 months ago
[Performance]: vllm Eagle performance is worse than expected
LiuXiaoxuanPKU opened this issue 3 months ago
[Bug]: benchmark serving does not support --best_of>1
homeffjy opened this issue 3 months ago
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models
sroy745 opened this pull request 3 months ago
[Bug]: GGUF Llama-3.1-Nemotron-70B-Instruct-HF ValueError: cannot reshape array of size into shape
paolovic opened this issue 3 months ago
[Bug]: MistralTokenizer Detokenization Issue
prashantgupta24 opened this issue 3 months ago
[Usage]: Custom LLM Generate
Blaizzy opened this issue 3 months ago
[Bugfix][Misc]: fix graph capture for decoder
yudian0504 opened this pull request 3 months ago
[New Model]: bert-base-chinese
kangzemin opened this issue 3 months ago
[Feature]: Support for Controlled Decoding
simonucl opened this issue 3 months ago
[Performance]: bitsandbytes quantization slow
lance0108 opened this issue 3 months ago
[Feature]: EAGLE fp8 quantization
fengyang95 opened this issue 3 months ago
[Bugfix] Fix load config when using bools
madt2709 opened this pull request 3 months ago
[Bugfix] Fix `illegal memory access` error with chunked prefill, prefix caching, block manager v2 and xformers enabled together
sasha0552 opened this pull request 3 months ago
[Usage]: When using vllm to start the interpl2-8b model service, an error occurs. The command is as follows: vllm serve/ internvl2-8b
hyyuananran opened this issue 3 months ago
[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger
heheda12345 opened this pull request 3 months ago
[Frontend] Support suffix in completions API (fill-in-the-middle)
njhill opened this pull request 3 months ago
[Bug]: Multiple inconsistencies wrt BOS injection and BOS duplication
stas00 opened this issue 3 months ago
Adds method to read the pooling types from model's files
flaviabeo opened this pull request 3 months ago
[Model] Update MPT model with GLU and rope and add low precision layer norm
kazuki opened this pull request 3 months ago
[Bug]: When reading the content from the configuration file specified by the --config parameter, the parameter type was not considered.
SakigamiYang opened this issue 3 months ago
[Bug]: [Performance] 100% performance drop using multiple lora vs no lora(qwen-chat model)
askcs517 opened this issue 3 months ago
[Feature]: LoRA support for InternVLChatModel
AkshataABhat opened this issue 3 months ago
[Misc] Fix ImportError causing by triton
MengqingCao opened this pull request 3 months ago
[Usage]: When to use flashinfer as the default backend
ehuaa opened this issue 3 months ago
【Frontend】Add sampler_priority and repetition_penalty_range
ZeroYuJie opened this pull request 3 months ago
[Performance]: InternVL multi image speed is not improved compare to original
luohao123 opened this issue 3 months ago
[Feature]: Support for Diff-Transformer to limit noise in attention calculation @ runtime
nightflight-dk opened this issue 3 months ago
[Feature]: Alternating local-global attention layers
griff4692 opened this issue 3 months ago
[Bug]: Too Many Tokens are Empty Strings and Empty Bytes, and `top_logprobs` Can't Identify End-of-Text (EOT) Tokens
DIYer22 opened this issue 3 months ago
[Installation]: No module named 'vllm._version' from vllm.version import __version__ as VLLM_VERSION
yangxin60-tal opened this issue 3 months ago
[Feature]: Consider parallel_tool_calls parameter at the API level
lucasalvarezlacasa opened this issue 3 months ago
[Misc]: offline inference inconsistency result of qwen2-7b
poppybrown opened this issue 3 months ago
[Bug]: vllm startup model error /proc file not found
970602 opened this issue 3 months ago
[Misc] Compute query_start_loc/seq_start_loc on CPU
zhengy001 opened this pull request 3 months ago
[Bug]: Could we provide an interface for setting the "dtype" when calling the example/benchmarks python?
hongfeng2013 opened this issue 3 months ago
[Bug]: Speculative decoding generate gibberish when receiving parallel requests with different seeds
wallashss opened this issue 3 months ago
[Frontend] re-enable multi-modality input in the new beam search implementation
FerdinandZhong opened this pull request 3 months ago
[Feature]: Allow setting tool_choice="none" in LLM calls if the OpenAI compatible vllm server is started with --enable-auto-tool-choice
deheim opened this issue 3 months ago
[Bug]: Speculative decoding breaks guided decoding.
roberthoenig opened this issue 3 months ago
[Bug]: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241016-170451.pkl): view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
double-vin opened this issue 3 months ago
[Performance]: inference with qwen2.5 using version vLLM 0.6.3 is felt to be slower
Jimmy-L99 opened this issue 3 months ago
[Usage]: Which branch should I use to test speculative decoding
v-lmn opened this issue 3 months ago
Begin refactoring executor_base ABC
jberkhahn opened this pull request 4 months ago
Support Roberta embedding models
maxdebayser opened this pull request 4 months ago
[Performance][Kernel] Fused_moe Performance Improvement
charlifu opened this pull request 4 months ago
[New Model]: Support Zyphra/Zamba2-7B
mgoin opened this issue 4 months ago
[Bug]: KeyError: 'layers.60.mlp.gate_up_proj.weight' mistral large bitsandbytes
copasseron opened this issue 4 months ago
[CI/Build] remove .github from .dockerignore
dtrifiro opened this pull request 4 months ago
[Neuron] [Bugfix] Fix neuron startup
xendo opened this pull request 4 months ago