Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
- Collective: https://opencollective.com/vllm (host: opensource)
- Code: https://github.com/vllm-project/vllm
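For context on the engine behind the issues and pull requests listed below, here is a minimal offline-inference sketch of vLLM's Python API (assumptions: a CUDA-capable GPU, vLLM installed via `pip install vllm`, and `facebook/opt-125m` as a small stand-in model — any supported Hugging Face causal LM would do):

```python
# Minimal vLLM offline-inference sketch; requires a CUDA-capable GPU.
# The model name is illustrative only.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
params = SamplingParams(temperature=0.0, max_tokens=16)

llm = LLM(model="facebook/opt-125m")  # weights are downloaded on first run
outputs = llm.generate(prompts, params)

for out in outputs:
    # Each RequestOutput carries the prompt and one or more completions.
    print(out.prompt, "->", out.outputs[0].text)
```

The same engine can instead be exposed as an OpenAI-compatible HTTP server (`vllm serve <model>`), which is what several of the frontend issues below concern.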
[Bug]: PaliGemma detection task is failing
github.com/vllm-project/vllm - nph4rd opened this issue 3 months ago
[Bug]: Cannot use FlashAttention-2 backend because the vllm_flash_attn package is not found. But I have installed vllm-flash-attn.
github.com/vllm-project/vllm - xyfZzz opened this issue 3 months ago
[Misc] Use torch.compile for basic custom ops
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Core] Optimize SPMD architecture with delta + serialization optimization
github.com/vllm-project/vllm - rkooo567 opened this pull request 3 months ago
[Bugfix] fix spec decode with cuda graph
github.com/vllm-project/vllm - aurickq opened this pull request 3 months ago
[Core] Add span metrics for model_forward, scheduler and sampler time
github.com/vllm-project/vllm - sfc-gh-mkeralapura opened this pull request 3 months ago
[Bug]: InternVL2 Inference RuntimeError: GET was unable to find an engine to execute this computation
github.com/vllm-project/vllm - HuichiZhou opened this issue 3 months ago
Integrate fused Mixtral MoE with Marlin kernels
github.com/vllm-project/vllm - ElizaWszola opened this pull request 3 months ago
[Frontend] Add readiness and liveness endpoints to OpenAI API server
github.com/vllm-project/vllm - mfournioux opened this pull request 3 months ago
[Bug]: OutOfMemoryError when server running multi requests
github.com/vllm-project/vllm - lzcchl opened this issue 3 months ago
[Bug]: RuntimeError: CHECK_EQ(paged_kv_indptr.size(0), batch_size + 1) failed. 1 vs 257. When load gemma-2-9b-it using vllm
github.com/vllm-project/vllm - seongjiko opened this issue 3 months ago
[Core] Asynchronous Output Processor
github.com/vllm-project/vllm - megha95 opened this pull request 3 months ago
[BugFix] Fix multiprocessing shutdown errors
github.com/vllm-project/vllm - njhill opened this pull request 3 months ago
[Usage]: weird GPU RAM usage
github.com/vllm-project/vllm - hieunguyenquoc opened this issue 3 months ago
Add Classifier free guidance
github.com/vllm-project/vllm - zhaoyinglia opened this pull request 3 months ago
[Bug] [ROCm]: ROCm fails to stop generating tokens on multiple GPTQ models
github.com/vllm-project/vllm - TNT3530 opened this issue 3 months ago
[TPU] Add Load-time W8A16 quantization for TPU Backend
github.com/vllm-project/vllm - lsy323 opened this pull request 3 months ago
[Bug]: VLLM crashes when prefix caching is enabled
github.com/vllm-project/vllm - m-harmonic opened this issue 3 months ago
[core] Multi Step Scheduling
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 3 months ago
[CI/Build] bump minimum cmake version
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Doc] Proofreading documentation
github.com/vllm-project/vllm - sgolebiewski-intel opened this pull request 3 months ago
[WIP] Add Fused MoE W8A8 (Int8) Support
github.com/vllm-project/vllm - qingquansong opened this pull request 3 months ago
[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - chenchunhui97 opened this issue 3 months ago
[CI/Build][ROCm] Enabling tensorizer tests for ROCm
github.com/vllm-project/vllm - alexeykondrat opened this pull request 3 months ago
[Bug]: ValueError: Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
github.com/vllm-project/vllm - youkaichao opened this issue 3 months ago
[Installation]: my env :cuda version is 12.0,python 3.10, which release should i choose?
github.com/vllm-project/vllm - fanjikang opened this issue 3 months ago
[Frontend]: Add apply_chat_template method and update generate method in LLM class
github.com/vllm-project/vllm - llStringll opened this pull request 3 months ago
[Model] Pipeline parallel support for Qwen2
github.com/vllm-project/vllm - xuyi opened this pull request 3 months ago
[MISC] Introduce pipeline parallelism partition strategies
github.com/vllm-project/vllm - comaniac opened this pull request 3 months ago
[Bug]: error: Segmentation fault(SIGSEGV received at time)
github.com/vllm-project/vllm - Archmilio opened this issue 3 months ago
[Kernel][Misc] Add meta functions for ops to prevent graph breaks
github.com/vllm-project/vllm - bnellnm opened this pull request 3 months ago
[Bug]: JSON-guided generation failing to close text values
github.com/vllm-project/vllm - vecorro opened this issue 3 months ago
[Bug]: vLLM takes forever to load a locally stored 7B model
github.com/vllm-project/vllm - vibhas-singh opened this issue 3 months ago
[Bug]: Error Running DeepSeek-v2-Lite w/ FP8
github.com/vllm-project/vllm - Jiayi-Pan opened this issue 3 months ago
[Bug]: Error: Failed to initialize the TMA descriptor 700 for LLaMa 3.1 405B on 8*H100 -- prefill error?
github.com/vllm-project/vllm - pseudotensor opened this issue 3 months ago
[Core] generate from input embeds
github.com/vllm-project/vllm - Nan2018 opened this pull request 3 months ago
[Kernel] [Triton] [AMD] Add Triton implementation of awq_dequantize
github.com/vllm-project/vllm - rasmith opened this pull request 3 months ago
[Speculative Decoding] EAGLE Implementation with Top-1 proposer
github.com/vllm-project/vllm - abhigoyal1997 opened this pull request 3 months ago
[Bug]: Pipeline parallelism is very slow when inferencing one request
github.com/vllm-project/vllm - gty111 opened this issue 3 months ago
[Usage]: How do I deploy a model on two GPUs with different memory?
github.com/vllm-project/vllm - Halflifefa opened this issue 3 months ago
[Bug]: ERROR 07-26 14:50:35 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 214281 died, exit code: -11
github.com/vllm-project/vllm - TypeFloat opened this issue 3 months ago
[Model] Teleflm Support
github.com/vllm-project/vllm - horizon94 opened this pull request 3 months ago
[CI/Build] upgrade Dockerfile to ubuntu 22.04
github.com/vllm-project/vllm - samos123 opened this pull request 3 months ago
[RFC]: Isolate OpenAI Server Into Separate Process
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 3 months ago
[CI] Reproduce SGLANG benchmark results
github.com/vllm-project/vllm - KuntaiDu opened this pull request 3 months ago
[Bug]: Engine iteration timed out. This should never happen!
github.com/vllm-project/vllm - Kelcin2 opened this issue 3 months ago
[Usage]: can I use it with classification model (e.g. GemmaForSequenceClassification) ?
github.com/vllm-project/vllm - dodler opened this issue 3 months ago
[Bugfix] Add synchronize to prevent possible data race
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago
[Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V
github.com/vllm-project/vllm - HwwwwwwwH opened this pull request 3 months ago
[Bugfix] Allow vllm to still work if triton is not installed.
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Feature]: ngram-spec-decode
github.com/vllm-project/vllm - chenglu66 opened this issue 3 months ago
[Bugfix][Model] Jamba assertions and no chunked prefill by default for Jamba
github.com/vllm-project/vllm - tomeras91 opened this pull request 3 months ago
[Bug]: SIGSEGV received at time=1721904360 on cpu 140, Fatal Python error: Segmentation fault
github.com/vllm-project/vllm - eldarkurtic opened this issue 3 months ago
[BugFix][Speculative Decoding] Fixes the generation token numbers with sps
github.com/vllm-project/vllm - sighingnow opened this pull request 3 months ago
[Performance]: Slow TTFT(?) for Qwen2-72B-GPTQ-Int4 on H100 *2
github.com/vllm-project/vllm - cyc00518 opened this issue 3 months ago
[Bug]: N-gram spec_decode in flash_attention bug
github.com/vllm-project/vllm - chenglu66 opened this issue 3 months ago
[Core] Use array to speedup padding
github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago
[Feature]: support Mistral-Large-Instruct-2407 function calling
github.com/vllm-project/vllm - ybdesire opened this issue 3 months ago
[Performance]: Medusa SD have poor performance than baseline
github.com/vllm-project/vllm - cwlseu opened this issue 3 months ago
[Bug]: qwen2-72b-instruct model with RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - izhuhaoran opened this issue 3 months ago
[Bug]: Reproducing Llama 3.1 distributed inference from the blog
github.com/vllm-project/vllm - eldarkurtic opened this issue 3 months ago
[Bug]: --max-model-len configuration robustness
github.com/vllm-project/vllm - gargnipungarg opened this issue 3 months ago
[Usage]: Pipeline Parallelism but with quantized model?
github.com/vllm-project/vllm - fahadh4ilyas opened this issue 3 months ago
[Feature]: chat API assistant prefill
github.com/vllm-project/vllm - pseudotensor opened this issue 3 months ago
[wip] spmd delta optimization
github.com/vllm-project/vllm - rkooo567 opened this pull request 3 months ago
[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor.
github.com/vllm-project/vllm - eaplatanios opened this pull request 3 months ago
[Installation]: Unable to build docker image using Dockerfile.openvino
github.com/vllm-project/vllm - zahidulhaque opened this issue 3 months ago
[Usage]: How to inference a model with medusa speculative sampling.
github.com/vllm-project/vllm - cwlseu opened this issue 3 months ago
[Bug]: Possible data race when running Llama 405b fp8
github.com/vllm-project/vllm - tlrmchlsmth opened this issue 3 months ago
[Bug]: `pt_main_thread` processes are not killed after main process is killed in MP distributed executor backend
github.com/vllm-project/vllm - oandreeva-nv opened this issue 3 months ago
[Bug]: FP8 Quantization (static and dynamic) incompatible with `--cpu-offload-gb`
github.com/vllm-project/vllm - drikster80 opened this issue 3 months ago
[ Kernel ] Add Fused Layernorm + Dynamic-Per-Token Quant Kernels
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 3 months ago
[Bugfix] Fix `kv_cache_dtype=fp8` without scales for FP8 checkpoints
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Bug]: Broken accuracy on LLaMa 3.1 70B -- worse than even 8B
github.com/vllm-project/vllm - pseudotensor opened this issue 3 months ago
[Bugfix] Fix decode tokens w. CUDA graph
github.com/vllm-project/vllm - comaniac opened this pull request 3 months ago
[Bugfix] Fix encoding_format in examples/openai_embedding_client.py
github.com/vllm-project/vllm - CatherineSue opened this pull request 3 months ago
[Bugfix]: use PretrainedConfig to communicate config objects with trust remote code
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 3 months ago
[Usage]: The 8xH100 device failed to run meta-llama/Meta-Llama-3.1-405B-Instruct-FP8.
github.com/vllm-project/vllm - jueming0312 opened this issue 3 months ago
[Bugfix] Fix awq_marlin and gptq_marlin flags
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago
[Bug]: openai_embedding_client returns len 8192 embedding not 4096
github.com/vllm-project/vllm - ehuaa opened this issue 3 months ago
[Bugfix] Fix speculative decode seeded test
github.com/vllm-project/vllm - njhill opened this pull request 3 months ago
[Installation]: ImportError: cannot import name 'LogicalTokenBlock' from 'vllm.block'
github.com/vllm-project/vllm - peak-coco opened this issue 3 months ago
[Frontend] split run_server into build_server and run_server
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Model][Jamba] Mamba cache single buffer
github.com/vllm-project/vllm - mzusman opened this pull request 3 months ago
[Bug]: The FP8 models and FP8 KV-Cache-Scales loaded together failed on the latest 0.5.3
github.com/vllm-project/vllm - wanzhenchn opened this issue 3 months ago
[Usage]: Using vllm==0.4.2 to infer the qwen2-0.5b model on one 80G H800, but the GPU's compute utilization is only around 20%
github.com/vllm-project/vllm - Ajay-Wong opened this issue 3 months ago
[Bug]: TypeError: snapshot_download() got an unexpected keyword argument 'ignore_patterns' when set VLLM_USE_MODELSCOPE=True
github.com/vllm-project/vllm - wutz opened this issue 3 months ago
[Bug]: batch inference not consistent (even temperature=0)
github.com/vllm-project/vllm - GGuo555 opened this issue 3 months ago
[Bug]: vllm-0.5.3.post1 serving the Qwen2-72b-instruct-awq model works normally at first, but errors occur under high concurrency
github.com/vllm-project/vllm - xinzaifeixiang1992 opened this issue 3 months ago
[Bugfix] Fix speculative decode seeded test
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Bug]: VLLM 0.5.3.post1 [rank0]: RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
github.com/vllm-project/vllm - jueming0312 opened this issue 3 months ago
[Feature]: Add support to Llama-3.1
github.com/vllm-project/vllm - KaifAhmad1 opened this issue 3 months ago
[Bugfix]fix modelscope compatible issue
github.com/vllm-project/vllm - liuyhwangyh opened this pull request 3 months ago
Adjust/openai api server turbo 20240724 v2
github.com/vllm-project/vllm - zyearw1024 opened this pull request 3 months ago
[Feature]: vllm support for Ascend NPU
github.com/vllm-project/vllm - hi-yifeng opened this issue 3 months ago
[Bug]: Cannot find any of ['adapter_name_or_path'] in the model's quantization config
github.com/vllm-project/vllm - fengyunflya opened this issue 3 months ago