Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Doc] Specify async engine args in docs
DarkLight1337 opened this pull request about 2 months ago
[V1] Prototype Fully Async Detokenizer
robertgshaw2-neuralmagic opened this pull request about 2 months ago
[core] cudagraph output with tensor weak reference
youkaichao opened this pull request about 2 months ago
[Bug]: Incoherent Offline Inference Single Video with Qwen2-VL
hector-gr opened this issue about 2 months ago
[Performance]: How to Improve Performance Under Concurrency
ljwps opened this issue about 2 months ago
[Bugfix] Use temporary directory in registry
DarkLight1337 opened this pull request about 2 months ago
[Model] Add BNB quantization support for Mllama
Isotr0py opened this pull request about 2 months ago
[Misc] SpecDecodeWorker supports profiling
Abatom opened this pull request about 2 months ago
[torch.compile] rework compile control with piecewise cudagraph
youkaichao opened this pull request about 2 months ago
[Usage]: ValueError: Model architectures ['LlamaForCausalLM'] are not supported for now.
Mjiegu opened this issue about 2 months ago
[Bug]: Inconsistent evaluations when enabling / disabling chunked_prefill?
Jingyu6 opened this issue about 2 months ago
[Model] Add classification Task with Qwen2ForSequenceClassification
kakao-kevin-us opened this pull request about 2 months ago
[Usage]: Using a model for inference and embedding
micuentadecasa opened this issue about 2 months ago
[Installation] pip install vllm (0.6.3) will force a reinstallation of the CPU version torch and replace cuda torch on windows
xiezhipeng-git opened this issue about 2 months ago
CI TEST
maxdebayser opened this pull request about 2 months ago
[Model] Support math-shepherd-mistral-7b-prm model
Went-Liang opened this pull request about 2 months ago
[Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled
ankush13r opened this issue about 2 months ago
[Model] Support GGUF models newly added in `transformers` 4.46.0
Isotr0py opened this pull request about 2 months ago
[Core] Support offloading KV cache to CPU
KuntaiDu opened this pull request about 2 months ago
[Build] skip renaming files for release wheels pipeline
simon-mo opened this pull request about 2 months ago
[Bug]: Input length greater than 32K in nvidia/Llama-3.1-Nemotron-70B-Instruct-HF generate garbage on v0.6.3 ( issue is not seen in v0.6.2)
source-ram opened this issue about 2 months ago
[Doc] Update FAQ links in spec_decode.rst
whyiug opened this pull request about 2 months ago
[Misc]: huggingface_hub.errors.HFValidationError using LLama3.1-405b
unrue opened this issue about 2 months ago
[V1] Move mm_input_mapper to a separate process
WoosukKwon opened this pull request about 2 months ago
[torch.compile] Adding torch compile annotations to some models
CRZbulabula opened this pull request about 2 months ago
[Bugfix] Fix edge cases for MistralTokenizer
tjohnson31415 opened this pull request 2 months ago
[Model][LoRA]LoRA support added for Qwen
jeejeelee opened this pull request 2 months ago
[CI/Build] improve python-only dev setup
dtrifiro opened this pull request 2 months ago
[Bug]: crash:RecursionError: maximum recursion depth exceeded
wciq1208 opened this issue 2 months ago
[Core] Make encoder-decoder inputs a nested structure to be more composable
DarkLight1337 opened this pull request 2 months ago
Linter test
maxdebayser opened this pull request 2 months ago
[Misc] Upgrade to pytorch 2.5
bnellnm opened this pull request 2 months ago
[Feature]: LoRA support for Qwen model
zhangfan-algo opened this issue 2 months ago
[Bugfix] use AF_INET6 instead of AF_INET for OpenAI Compatible Server
jxpxxzj opened this pull request 2 months ago
[Performance]: vllm Eagle performance is worse than expected
LiuXiaoxuanPKU opened this issue 2 months ago
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models
sroy745 opened this pull request 2 months ago
[Bug]: GGUF Llama-3.1-Nemotron-70B-Instruct-HF ValueError: cannot reshape array of size into shape
paolovic opened this issue 2 months ago
[Bug]: MistralTokenizer Detokenization Issue
prashantgupta24 opened this issue 2 months ago
[Bugfix][Misc]: fix graph capture for decoder
yudian0504 opened this pull request 2 months ago
[Feature]: Support for Controlled Decoding
simonucl opened this issue 2 months ago
[Bugfix] Fix load config when using bools
madt2709 opened this pull request 2 months ago
[Bugfix] Fix `illegal memory access` error with chunked prefill, prefix caching, block manager v2 and xformers enabled together
sasha0552 opened this pull request 2 months ago
[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger
heheda12345 opened this pull request 2 months ago
[Frontend] Support suffix in completions API (fill-in-the-middle)
njhill opened this pull request 2 months ago
Adds method to read the pooling types from model's files
flaviabeo opened this pull request 2 months ago
[Model] Update MPT model with GLU and rope and add low precision layer norm
kazuki opened this pull request 2 months ago
[Bug]: When reading the content from the configuration file specified by the --config parameter, the parameter type was not considered.
SakigamiYang opened this issue 2 months ago
[Bug]: [Performance] 100% performance drop using multiple lora vs no lora(qwen-chat model)
askcs517 opened this issue 2 months ago
[Feature]: LoRA support for InternVLChatModel
AkshataABhat opened this issue 2 months ago
[Misc] Fix ImportError causing by triton
MengqingCao opened this pull request 2 months ago
【Frontend】Add sampler_priority and repetition_penalty_range
ZeroYuJie opened this pull request 2 months ago
[Performance]: InternVL multi image speed is not improved compare to original
luohao123 opened this issue 2 months ago
[Feature]: Consider parallel_tool_calls parameter at the API level
lucasalvarezlacasa opened this issue 2 months ago
[Misc] Compute query_start_loc/seq_start_loc on CPU
zhengy001 opened this pull request 2 months ago
[Frontend] re-enable multi-modality input in the new beam search implementation
FerdinandZhong opened this pull request 2 months ago
[Bug]: RuntimeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241016-170451.pkl): view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
double-vin opened this issue 2 months ago
[Performance]: inference with qwen2.5 using version vLLM 0.6.3 is felt to be slower
Jimmy-L99 opened this issue 2 months ago
[Usage]: Which branch should I use to test speculative decoding
v-lmn opened this issue 2 months ago
Begin refactoring executor_base ABC
jberkhahn opened this pull request 2 months ago
Support Roberta embedding models
maxdebayser opened this pull request 2 months ago
[Performance][Kernel] Fused_moe Performance Improvement
charlifu opened this pull request 2 months ago
[New Model]: Support Zyphra/Zamba2-7B
mgoin opened this issue 2 months ago
[Bug]: KeyError: 'layers.60.mlp.gate_up_proj.weight' mistral large bitsandbytes
copasseron opened this issue 2 months ago
[CI/Build] remove .github from .dockerignore
dtrifiro opened this pull request 2 months ago
[Neuron] [Bugfix] Fix neuron startup
xendo opened this pull request 2 months ago
[Bug]: Tensor Parallelism performs poorly
DanielViglione opened this issue 2 months ago
[CI/Build] VLM Test Consolidation
alex-jw-brooks opened this pull request 2 months ago
[Bug]: process killed when I set tp>1 for running benchmark_throughput.py
zeyang12-jpg opened this issue 2 months ago
[CI][Misc] Add tests for python-only development
cermeng opened this pull request 2 months ago
[Bug]: cannot run model when TP>1 (already run debug file)
jli943 opened this issue 2 months ago
[Feature]: support for prompt cache
wiluen opened this issue 2 months ago
[Bug]: --cpu-offload-gb flag not honored in vllm/vllm-openai container on amazon g5.2xlarge
DanielViglione opened this issue 2 months ago
[Usage]: When using vLLM to run model inference with a loaded LoRA adapter, the behavior does not match expectations
PeaceAndJoyAaron opened this issue 2 months ago
[Bug]: In function calls, when outputting Chinese, a backslash character "\" appears before Chinese characters.
yhhit opened this issue 2 months ago
[Bug]: 400 Bad Request
ErykCh opened this issue 2 months ago
[Bug]: Qwen2-VL-72B Inference on Multiple-GPUs
bhupendra1324 opened this issue 2 months ago
[Misc]: Im trying to host my finetuned Llama -3-8b instruct in Vllm
preethiisenthil opened this issue 2 months ago
[Bug]: Error running Molmo on API in v0.6.3
Inforeon opened this issue 2 months ago
[Bug]: guided_json fails on pixtral when using OpenAI API
ktrapeznikov opened this issue 2 months ago
[Bugfix]: Make chat content text allow type content
vrdn-23 opened this pull request 2 months ago
[BugFix] Fix chat API continuous usage stats
njhill opened this pull request 2 months ago
[Bug]: llama3.2-11B-Vision-Instruct not working
warlockedward opened this issue 2 months ago
bugfix on draft_tp value
qibaoyuan opened this pull request 2 months ago
[Installation]: v0.6.3 pip install -e . error
tolry418 opened this issue 2 months ago
[Installation]: When can release the WHL package for version v0.6.3 of cu118?
controlRun opened this issue 2 months ago
[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage
joerunde opened this pull request 2 months ago
[Bugfix] Update InternVL input mapper to support image embeds
hhzhang16 opened this pull request 2 months ago
[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel
WoosukKwon opened this pull request 2 months ago
pass ignore_eos parameter to all benchmark_serving calls
gracehonv opened this pull request 2 months ago
[Doc] Fix code formatting in spec_decode.rst
mgoin opened this pull request 2 months ago
[Docs] Remove PDF build from Readtehdocs
simon-mo opened this pull request 2 months ago
[Usage]: Obtaining success / error rate % metrics
yqlu opened this issue 2 months ago
[Frontend] Clarify model_type error messages
stevegrubb opened this pull request 2 months ago
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support
bigPYJ1151 opened this pull request 2 months ago
[Bugfix] Clean up some cruft in mamba.py
tlrmchlsmth opened this pull request 2 months ago
[Bug]: vllm crashes when preemption of priority scheduling is triggered on vllm-0.6.3.dev173+g36ea7907.d20241011
tonyaw opened this issue 2 months ago
[Bug]: LLAMA 3.2 11B Vision Instruct Model not Running in VLLM 0.6.2
saikatscalers opened this issue 2 months ago
[Installation]: Adding opentelemetry packages in container image
sanketsudake opened this issue 2 months ago
[Usage]: --cpu-offload-gb no use
Rane2021 opened this issue 2 months ago
[Hardware] [Intel GPU] Add multistep scheduler for xpu device
jikunshang opened this pull request 2 months ago