Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Bug]: Tensor Parallelism performs poorly
DanielViglione opened this issue 4 months ago
[CI/Build] VLM Test Consolidation
alex-jw-brooks opened this pull request 4 months ago
[Bug]: process killed when I set tp>1 for running benchmark_throughput.py
zeyang12-jpg opened this issue 4 months ago
[CI][Misc] Add tests for python-only development
cermeng opened this pull request 4 months ago
[Bug]: cannot run model when TP>1 (already run debug file)
jli943 opened this issue 4 months ago
[Feature]: support for prompt cache
wiluen opened this issue 4 months ago
[Bug]: --cpu-offload-gb flag not honored in vllm/vllm-openai container on amazon g5.2xlarge
DanielViglione opened this issue 4 months ago
[Usage]: When using vLLM to run model inference by loading a LoRA adapter, the behavior does not match expectations
PeaceAndJoyAaron opened this issue 4 months ago
[Bug]: In function calls, when outputting Chinese, a backslash character "\" appears before Chinese characters.
yhhit opened this issue 4 months ago
[Bug]: 400 Bad Request
ErykCh opened this issue 4 months ago
[Bug]: Qwen2-VL-72B Inference on Multiple-GPUs
bhupendra1324 opened this issue 4 months ago
[Misc]: Im trying to host my finetuned Llama -3-8b instruct in Vllm
preethiisenthil opened this issue 4 months ago
[Bug]: Error running Molmo on API in v0.6.3
Inforeon opened this issue 4 months ago
[Bug]: guided_json fails on pixtral when using OpenAI API
ktrapeznikov opened this issue 4 months ago
[Bugfix]: Make chat content text allow type content
vrdn-23 opened this pull request 4 months ago
[BugFix] Fix chat API continuous usage stats
njhill opened this pull request 4 months ago
[Bug]: llama3.2-11B-Vision-Instruct not working
warlockedward opened this issue 4 months ago
bugfix on draft_tp value
qibaoyuan opened this pull request 4 months ago
[Installation]: v0.6.3 install -cutlass failed (Fetchcontent.cmake build step)
tolry418 opened this issue 4 months ago
[Installation]: When can release the WHL package for version v0.6.3 of cu118?
controlRun opened this issue 4 months ago
[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage
joerunde opened this pull request 4 months ago
[Bugfix] Update InternVL input mapper to support image embeds
hhzhang16 opened this pull request 4 months ago
[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel
WoosukKwon opened this pull request 4 months ago
pass ignore_eos parameter to all benchmark_serving calls
gracehonv opened this pull request 4 months ago
[Doc] Fix code formatting in spec_decode.rst
mgoin opened this pull request 4 months ago
[Docs] Remove PDF build from Readthedocs
simon-mo opened this pull request 4 months ago
[Usage]: Obtaining success / error rate % metrics
yqlu opened this issue 4 months ago
[Frontend] Clarify model_type error messages
stevegrubb opened this pull request 4 months ago
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support
bigPYJ1151 opened this pull request 4 months ago
[Bugfix] Clean up some cruft in mamba.py
tlrmchlsmth opened this pull request 4 months ago
[Bug]: vllm crashes when preemption of priority scheduling is triggered on vllm-0.6.3.dev173+g36ea7907.d20241011
tonyaw opened this issue 4 months ago
[Bug]: LLAMA 3.2 11B Vision Instruct Model not Running in VLLM 0.6.2
saikatscalers opened this issue 4 months ago
[Installation]: Adding opentelemetry packages in container image
sanketsudake opened this issue 4 months ago
[Usage]: --cpu-offload-gb no use
Rane2021 opened this issue 4 months ago
[Hardware] [Intel GPU] Add multistep scheduler for xpu device
jikunshang opened this pull request 4 months ago
[Feature]: Allow max_tokens = 0
fgebhart opened this issue 4 months ago
[Bug]: missing 'Finished request xxxx' log
jinzhen-lin opened this issue 4 months ago
[Bug]: TPU single-host v5e-8 HBM OOM with Llama 3.1 70B and tpu_int8 quantization
samos123 opened this issue 4 months ago
[Bug]: Exception in worker VllmWorkerProcess while processing method init_device: NCCL error: unhandled cuda error
wangyao123456a opened this issue 4 months ago
[Bug]: Gemma 27B Produces no Outputs (2B and 9B work fine)
RonanKMcGovern opened this issue 4 months ago
[Feature]: Support for rhymes-ai/Aria
engchina opened this issue 4 months ago
[Misc]: remove dropout related stuff from triton flash attention kernel
HaiShaw opened this issue 4 months ago
[Bug]: When vLLM is deployed as an OpenAI API server and the generation model is llama 3 8b instruct on a RAG task, generation never stops
asilverlight opened this issue 4 months ago
[Bug]: Installed vllm successfully for AMD MI60 but inference is failing
Said-Akbar opened this issue 4 months ago
[Usage]: [rank0]: AttributeError: 'LLMEngine' object has no attribute 'driver_worker'
xuyuemei opened this issue 4 months ago
[CI] Fix merge conflict
LiuXiaoxuanPKU opened this pull request 4 months ago
[Bug]: KeyError during loading of Mixtral 8x22B in FP8
IowaSovereign opened this issue 4 months ago
[help wanted]: write tests for python-only development
youkaichao opened this issue 4 months ago
[RFC]: Let every model be a reward model/embedding model for PRMs
zhuzilin opened this issue 4 months ago
[Bug]: different generation result when changing parameters using `copy_` and `=` method
hxdtest opened this issue 4 months ago
[Bug]: api_server.py: error: argument --tool-call-parser: invalid choice: 'llama3_json' (choose from 'mistral', 'hermes')
joestein-ssc opened this issue 4 months ago
[Bugfix] Update grafana dashboard
zhan9san opened this pull request 4 months ago
[Bug]: vllm mistralai--Codestral-22B-v0.1 response is truncated
Fly-Pluche opened this issue 4 months ago
[Misc][Installation] Improve source installation script and related documentation
cermeng opened this pull request 4 months ago
[Bug]: Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
eyuansu62 opened this issue 4 months ago
[Bug]: latest docker build (0.6.2) got error due to VLLM_MAX_SIZE_MB
ZJLi2013 opened this issue 4 months ago
[Bug]: Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
Clint-chan opened this issue 4 months ago
[Installation]: vllm installation error
leoneyar opened this issue 4 months ago
[Bug]: Hermes 2 Pro Tool parser could not locate tool call start/end tokens in the tokenizer!
LuckLittleBoy opened this issue 4 months ago
[Model] VLM2Vec, the first multimodal embedding model in vLLM
DarkLight1337 opened this pull request 4 months ago
[core] move parallel sampling out from vllm core
youkaichao opened this pull request 4 months ago
[Quantization][TPU] `compressed-tensors` integration for TPU
robertgshaw2-neuralmagic opened this pull request 4 months ago
[misc] Fine-grained CustomOp enabling mechanism
ProExpertProg opened this pull request 4 months ago
[Bugfix] Fix support for dimension like integers and ScalarType
bnellnm opened this pull request 4 months ago
[SpecDec] Remove Batch Expansion (2/3)
LiuXiaoxuanPKU opened this pull request 4 months ago
[CI/Build] Adds a test for multi step with TPUs
allenwang28 opened this pull request 4 months ago
[Frontend] merge beam search implementations
LunrEclipse opened this pull request 4 months ago
[bugfix] fix f-string for error
prashantgupta24 opened this pull request 4 months ago
[New Model]: meta-llama/Llama-Guard-3-1B
ayeganov opened this issue 4 months ago
[Misc] Add environment variables collection in collect_env.py tool
ycool opened this pull request 4 months ago
[Model] Support Mamba2 (Codestral Mamba)
tlrmchlsmth opened this pull request 4 months ago
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding
NickLucche opened this pull request 4 months ago
[Bug]: Out of memory with large multi-step and large gpu-memory-utilization values - `--num-scheduler-steps 16 --gpu-memory-utilization 0.941`
varun-sundar-rabindranath opened this issue 4 months ago
Add `vllm_v1`
WoosukKwon opened this pull request 4 months ago
[Doc] Remove outdated comment to avoid misunderstanding
homeffjy opened this pull request 4 months ago
[Bugfix]Fix MiniCPM's LoRA bug
jeejeelee opened this pull request 4 months ago
Fixes a typo about 'max_decode_seq_len' which causes crashes with cuda graph.
sighingnow opened this pull request 4 months ago
May I ask what is the image parameter of vllm's api about blip2, and I have an error here,INFO: 127.0.0.1:40608 - "POST /v1/completions HTTP/1.1" 400 Bad Request
zhaoxueqi6666 opened this issue 4 months ago
[Bug]: Simultaneous mm calls lead to permanently degraded performance.
SeanIsYoung opened this issue 4 months ago
[Bug]: MiniCPM3-4B is support lora by --enable-lora ?
ML-GCN opened this issue 4 months ago
`seed_everything` doesn't handle HPU
SanjuCSudhakaran opened this pull request 4 months ago
[Bug]: VLLM doesn't support LoRa with config `modules_to_save`
fahadh4ilyas opened this issue 4 months ago
[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs
hissu-hyvarinen opened this pull request 4 months ago
[CI] add `ignore_eos` for `benchmark_serving.py`
jikunshang opened this pull request 4 months ago
[Bugfix] Fix priority in multiprocessing engine
schoennenbeck opened this pull request 4 months ago
[Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps`
junstar92 opened this pull request 4 months ago
[Misc][LoRA] Support loading LoRA weights for target_modules in reg format
jeejeelee opened this pull request 4 months ago
[Usage]: Manually Increasing inference time
Playerrrrr opened this issue 4 months ago
[Usage]: VLLM 0.6.2 includes vllm-flash-attn, is it no longer necessary to install flash-attn separately?
Rssevenyu opened this issue 4 months ago
[Bug]: priority scheduling doesn't work on vllm-0.6.3.dev152+gde895f16.d20241010
tonyaw opened this issue 4 months ago
Max num seqs
seungrokj opened this pull request 4 months ago
[Usage]: blip2 inference code
zhaoxueqi6666 opened this issue 4 months ago
[RFC]: Make device agnostic for diverse hardware support
wangshuai09 opened this issue 4 months ago
[CI/Build] mypy: Resolve some errors from checking vllm/engine
russellb opened this pull request 4 months ago
Pytorch hete spec
jiqing-feng opened this pull request 4 months ago
[Feature]: Improve Logging For Embedding Models
robertgshaw2-neuralmagic opened this issue 4 months ago
[Frontend, Core] Adding stop and stop_token_ids for beam search.
nFunctor opened this pull request 4 months ago
[Bug]: AsyncLLMEngine stuck on a single too long request
rickyyx opened this issue 4 months ago