Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Feature]: Allow max_tokens = 0
fgebhart opened this issue 2 months ago
[Bug]: Exception in worker VllmWorkerProcess while processing method init_device: NCCL error: unhandled cuda error
wangyao123456a opened this issue 2 months ago
[Feature]: Support for rhymes-ai/Aria
engchina opened this issue 2 months ago
[Misc]: remove dropout related stuff from triton flash attention kernel
HaiShaw opened this issue 2 months ago
[Bug]: When vLLM is deployed to serve the OpenAI API and Llama 3 8B Instruct is used as the generation model for a RAG task, the model does not stop generating
asilverlight opened this issue 2 months ago
[Bug]: Installed vllm successfully for AMD MI60 but inference is failing
Said-Akbar opened this issue 2 months ago
[Usage]: [rank0]: AttributeError: 'LLMEngine' object has no attribute 'driver_worker'
xuyuemei opened this issue 2 months ago
[CI] Fix merge conflict
LiuXiaoxuanPKU opened this pull request 2 months ago
[Bug]: KeyError during loading of Mixtral 8x22B in FP8
IowaSovereign opened this issue 2 months ago
[help wanted]: write tests for python-only development
youkaichao opened this issue 2 months ago
[RFC]: Let every model be a reward model/embedding model for PRMs
zhuzilin opened this issue 2 months ago
[Bug]: different generation result when changing parameters using `copy_` and `=` method
hxdtest opened this issue 2 months ago
[Bug]: api_server.py: error: argument --tool-call-parser: invalid choice: 'llama3_json' (choose from 'mistral', 'hermes')
joestein-ssc opened this issue 2 months ago
[Bugfix] Update grafana dashboard
zhan9san opened this pull request 2 months ago
[Bug]: vllm mistralai--Codestral-22B-v0.1 response is truncated
Fly-Pluche opened this issue 2 months ago
[Misc][Installation] Improve source installation script and related documentation
cermeng opened this pull request 2 months ago
[Bug]: Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
eyuansu62 opened this issue 2 months ago
[Bug]: latest docker build (0.6.2) got error due to VLLM_MAX_SIZE_MB
ZJLi2013 opened this issue 2 months ago
[Bug]: Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
Clint-chan opened this issue 2 months ago
[Installation]: vllm installation error
leoneyar opened this issue 2 months ago
[Bug]: Hermes 2 Pro Tool parser could not locate tool call start/end tokens in the tokenizer!
LuckLittleBoy opened this issue 2 months ago
[Model] VLM2Vec, the first multimodal embedding model in vLLM
DarkLight1337 opened this pull request 2 months ago
[core] move parallel sampling out from vllm core
youkaichao opened this pull request 2 months ago
[Quantization][TPU] `compressed-tensors` integration for TPU
robertgshaw2-neuralmagic opened this pull request 2 months ago
[misc] Fine-grained CustomOp enabling mechanism
ProExpertProg opened this pull request 2 months ago
[Bugfix] Fix support for dimension like integers and ScalarType
bnellnm opened this pull request 2 months ago
[SpecDec] Remove Batch Expansion (2/3)
LiuXiaoxuanPKU opened this pull request 2 months ago
[CI/Build] Adds a test for multi step with TPUs
allenwang28 opened this pull request 2 months ago
[Frontend] merge beam search implementations
LunrEclipse opened this pull request 2 months ago
[bugfix] fix f-string for error
prashantgupta24 opened this pull request 2 months ago
[New Model]: meta-llama/Llama-Guard-3-1B
ayeganov opened this issue 2 months ago
[Misc] Add environment variables collection in collect_env.py tool
ycool opened this pull request 2 months ago
[Model] Support Mamba2 (Codestral Mamba)
tlrmchlsmth opened this pull request 2 months ago
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding
NickLucche opened this pull request 2 months ago
[Bug]: Out of memory with large multi-step and large gpu-memory-utilization values - `--num-scheduler-steps 16 --gpu-memory-utilization 0.941`
varun-sundar-rabindranath opened this issue 2 months ago
Add `vllm_v1`
WoosukKwon opened this pull request 2 months ago
[Doc] Remove outdated comment to avoid misunderstanding
homeffjy opened this pull request 2 months ago
[Bugfix]Fix MiniCPM's LoRA bug
jeejeelee opened this pull request 2 months ago
Fixes a typo about 'max_decode_seq_len' which causes crashes with cuda graph.
sighingnow opened this pull request 2 months ago
May I ask what is the image parameter of vllm's api about blip2, and I have an error here,INFO: 127.0.0.1:40608 - "POST /v1/completions HTTP/1.1" 400 Bad Request
zhaoxueqi6666 opened this issue 2 months ago
[Bug]: Simultaneous mm calls lead to permanently degraded performance.
SeanIsYoung opened this issue 2 months ago
[Bug]: MiniCPM3-4B is support lora by --enable-lora ?
ML-GCN opened this issue 2 months ago
`seed_everything` doesn't handle HPU
SanjuCSudhakaran opened this pull request 2 months ago
[Bug]: VLLM doesn't support LoRa with config `modules_to_save`
fahadh4ilyas opened this issue 2 months ago
[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs
hissu-hyvarinen opened this pull request 2 months ago
[CI] add `ignore_eos` for `benchmark_serving.py`
jikunshang opened this pull request 2 months ago
[Bugfix] Fix priority in multiprocessing engine
schoennenbeck opened this pull request 2 months ago
[Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps`
junstar92 opened this pull request 2 months ago
[Misc][LoRA] Support loading LoRA weights for target_modules in reg format
jeejeelee opened this pull request 2 months ago
[Usage]: Manually Increasing inference time
Playerrrrr opened this issue 2 months ago
[Usage]: VLLM 0.6.2 includes vllm-flash-attn, is it no longer necessary to install flash-attn separately?
Rssevenyu opened this issue 2 months ago
[Bug]: priority scheduling doesn't work on vllm-0.6.3.dev152+gde895f16.d20241010
tonyaw opened this issue 2 months ago
Max num seqs
seungrokj opened this pull request 2 months ago
[Usage]: blip2 inference code
zhaoxueqi6666 opened this issue 2 months ago
[RFC]: Make device agnostic for diverse hardware support
wangshuai09 opened this issue 2 months ago
[CI/Build] mypy: Resolve some errors from checking vllm/engine
russellb opened this pull request 2 months ago
Pytorch hete spec
jiqing-feng opened this pull request 2 months ago
[Feature]: Improve Logging For Embedding Models
robertgshaw2-neuralmagic opened this issue 2 months ago
[Frontend, Core] Adding stop and stop_token_ids for beam search.
nFunctor opened this pull request 2 months ago
[Bug]: AsyncLLMEngine stuck on a single too long request
rickyyx opened this issue 2 months ago
[ci/build] Add placeholder command for custom models test and add comments
khluu opened this pull request 2 months ago
[misc] hide best_of from engine
youkaichao opened this pull request 2 months ago
[Bug]: Streaming response fails after one token (0.5.3.post1)
NeonDaniel opened this issue 2 months ago
[CI/Build] Adopt Mergify for auto-labeling PRs
russellb opened this pull request 2 months ago
[torch.compile] generic decorators
youkaichao opened this pull request 2 months ago
[Doc][Neuron] add note to neuron documentation about resolving triton issue
omrishiv opened this pull request 2 months ago
[Doc] Improve quickstart documentation
rafvasq opened this pull request 2 months ago
[Usage]: running gated models offline
SamuelBG13 opened this issue 2 months ago
[Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected
LucasWilkinson opened this pull request 2 months ago
[Bug]: new beam search implementation ignores stop conditions
nFunctor opened this issue 2 months ago
[CI/Build] Make the `Dockerfile.cpu` file's `PIP_EXTRA_INDEX_URL` Configurable as a Build Argument
jyono opened this pull request 2 months ago
[Misc] Standardize RoPE handling for Qwen2-VL
DarkLight1337 opened this pull request 2 months ago
[Model] Add Qwen2-Audio model support
faychu opened this pull request 2 months ago
[Doc]: The relationship between FlashAttentionBackend and paged_attention_kernel
zhaotyer opened this issue 2 months ago
[Kernel] adding fused moe kernel config for L40S TP4
bringlein opened this pull request 2 months ago
[Bug]: vllm0.6.2 Using FLASHINFER to start VLLM reported an error, enabling -- quantification gptq -- kv cache dtype fp8_e5m2
Rssevenyu opened this issue 2 months ago
[Model] Add GLM-4v support and meet vllm==0.6.2
sixsixcoder opened this pull request 2 months ago
Questions about the inference performance of the GPTQ model
Rssevenyu opened this issue 2 months ago
[Model] support input image embedding for minicpmv
whyiug opened this pull request 2 months ago
[Bug]: AssertionError When deploy API serve of Qwen2-VL-72B in Docker
FBR65 opened this issue 2 months ago
[Misc] Fix sampling from sonnet for long context case
Imss27 opened this pull request 2 months ago
[issue tracker] make quantization compatible with dynamo dynamic shape
youkaichao opened this issue 2 months ago
[Misc] Collect model support info in a single process per model
DarkLight1337 opened this pull request 2 months ago
[Bug]: Stress-testing Qwen2.5-72B-Instruct triggers "AsyncLLMEngine has failed, terminating server process"
WangJianQ-0118 opened this issue 2 months ago
[Feature] vLLM ARM Enablement for AARCH64 CPUs
sanketkaleoss opened this pull request 2 months ago
[Bug]: Could not `pip install vllm` inside dockerfile after certain commit in `main` branch
fahadh4ilyas opened this issue 2 months ago
[VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args
DarkLight1337 opened this pull request 2 months ago
[BugFix] Fix tool call finish reason in streaming case
maxdebayser opened this pull request 2 months ago
[Bugfix] Sets `is_first_step_output` for TPUModelRunner
allenwang28 opened this pull request 2 months ago
Add example of helm chart for vllm deployment on k8s
mfournioux opened this pull request 2 months ago
Bump actions/setup-python from 3 to 5
dependabot[bot] opened this pull request 2 months ago
[RFC]: Adopt mergify for auto-labeling PRs
russellb opened this issue 2 months ago
[Performance]: phi 3.5 vision model consuming high CPU RAM and the process getting killed
kuladeephx opened this issue 2 months ago
[Kernel][Model] Improve continuous batching for Jamba and Mamba
mzusman opened this pull request 2 months ago
[Misc]: Repeat the sample sonnet.txt contents to accommodate large seq lengths in benchmarking
Bihan opened this issue 2 months ago
[Bug]: Qwen2.5-Math-7B-Instruct vllm output garbled code, but huggingface not
ziyuwan opened this issue 2 months ago
[Installation]: pip install vllm-0.6.2.zip err:setuptools-scm was unable to detect version for /tmp/pip-req-build-7ptioibj
uRENu opened this issue 2 months ago