Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Fix private accessors, rename pdf to canvas

github.com/ocrmypdf/OCRmyPDF - 2ca6e110ca15f291bfd1136691e96830a9d0f550 authored about 1 year ago by James R. Barlow <[email protected]>
Further exploratory improvements

github.com/ocrmypdf/OCRmyPDF - 30a0c315fba50354e6d963813de083c31b51efc3 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor debug printing

github.com/ocrmypdf/OCRmyPDF - 334a07c8390cfc76842ccea93dbca66fe69789d9 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor: extract methods

github.com/ocrmypdf/OCRmyPDF - a57c39358d73168c69b8648cd61a329e34990b37 authored about 1 year ago by James R. Barlow <[email protected]>
More colors

github.com/ocrmypdf/OCRmyPDF - 7ab5c55d4678423a983c0a50dbb61bb558eb2464 authored about 1 year ago by James R. Barlow <[email protected]>
Remove concept of HOCR_OK_LANGS

github.com/ocrmypdf/OCRmyPDF - 491b6bdb1f24bd7b019da631148189858c2db94b authored about 1 year ago by James R. Barlow <[email protected]>
Fix line and rect drawing

github.com/ocrmypdf/OCRmyPDF - 8b6ecd5971b255fe283fe12107ca33e13dd8a6c9 authored about 1 year ago by James R. Barlow <[email protected]>
Add rendering of space between boxes

github.com/ocrmypdf/OCRmyPDF - 686cfb25394041af66392d44696f6bbce75dd5fe authored about 1 year ago by James R. Barlow <[email protected]>
Fix position errors; ignore non-glyphless font

github.com/ocrmypdf/OCRmyPDF - 7b0871ae4c95fe8c12372d0c7e503b74185fa2c8 authored about 1 year ago by James R. Barlow <[email protected]>
Significant improvement overall

github.com/ocrmypdf/OCRmyPDF - d9ae453a63f9c1be5a05b18b74f160da74ad9286 authored about 1 year ago by James R. Barlow <[email protected]>
Tidy up

github.com/ocrmypdf/OCRmyPDF - d739b91aef1b51aad712eef363a8bf8e323b5236 authored about 1 year ago by James R. Barlow <[email protected]>
Tidying new hOCR renderer

github.com/ocrmypdf/OCRmyPDF - b14f6f778abac2cbf406664128442f5e7c134111 authored about 1 year ago by James R. Barlow <[email protected]>
WIP improve text positioning (not there yet)

github.com/ocrmypdf/OCRmyPDF - 14f4c19f5a3be1f55b26a841bb9e8056b16fe869 authored about 1 year ago by James R. Barlow <[email protected]>
Create pikepdf backend renderer

github.com/ocrmypdf/OCRmyPDF - d0133f86412d28f12495fb90cd5d4a1390665a7c authored about 1 year ago by James R. Barlow <[email protected]>
Add pdf.ttf

github.com/ocrmypdf/OCRmyPDF - e966c1fceb0b6b7a12e2a936e9e7596490f89300 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor reportlab into generic backend

github.com/ocrmypdf/OCRmyPDF - 6d30b497dcdb6c6abefd2c31a9539809d126cda6 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor reportlab backend out of hocrtransform

github.com/ocrmypdf/OCRmyPDF - f3b89e66eb9fec87769dc4658c79f4237234c797 authored about 1 year ago by James R. Barlow <[email protected]>
Test pikepdf canvas - renders... something at this point

github.com/ocrmypdf/OCRmyPDF - 60645717e27af004d1dde278511ba2f8ed89d814 authored about 1 year ago by James R. Barlow <[email protected]>
hocrtransform: move to module

github.com/ocrmypdf/OCRmyPDF - 04154e207c989719af95e531435c12a4eb0c556f authored about 1 year ago by James R. Barlow <[email protected]>
Fix import of pdf.ttf

github.com/ocrmypdf/OCRmyPDF - 1cbf578538753e6d8663d7abd8ee2f69b8e23577 authored about 1 year ago by James R. Barlow <[email protected]>
Fix dashes

github.com/ocrmypdf/OCRmyPDF - b73af7ce1001fb6a9b14c03c5c1e44f68f39fbc4 authored about 1 year ago by James R. Barlow <[email protected]>
Fix pikepdf PdfMatrix deprecation warning; v15.4.3 release notes

github.com/ocrmypdf/OCRmyPDF - 9898904be70bf957154cc66cb5d6f57bac7c26b4 authored about 1 year ago by James R. Barlow <[email protected]>
Make logger names unique

github.com/ocrmypdf/OCRmyPDF - 27d52298420e785300e418bef4e4219ef97720e1 authored about 1 year ago by James R. Barlow <[email protected]>
ghostscript: better comments

github.com/ocrmypdf/OCRmyPDF - 4a9a575ef0f2cd7459538a43e802c8d8b3846168 authored about 1 year ago by James R. Barlow <[email protected]>
v15.4.2 release notes

github.com/ocrmypdf/OCRmyPDF - 52fd9a630d57ac7af91db423b17e6b0297fc1c66 authored about 1 year ago by James R. Barlow <[email protected]>
Raise exception if the resulting PDF might appear blank in some PDF viewers

Fixes #1187

github.com/ocrmypdf/OCRmyPDF - a596ccf84469c81c6d19ab0eb59f53a5df0bc3c7 authored about 1 year ago by James R. Barlow <[email protected]>
ghostscript duplicate filter: filter within a window of previous messages

github.com/ocrmypdf/OCRmyPDF - e7fa97731f507c1d91e4289b9d5f3344533d7c4c authored about 1 year ago by James R. Barlow <[email protected]>
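
The commit above describes filtering duplicate Ghostscript messages within a window of previous messages. As a rough illustration of that general idea (not the project's actual code), here is a minimal sketch using the standard `logging` module. The class name `WindowedDuplicateFilter`, the `window` parameter, and the `demo` helper are all hypothetical names chosen for this example.

```python
import logging
from collections import deque


class WindowedDuplicateFilter(logging.Filter):
    """Hypothetical filter: drop a record if its message matches any of the
    last `window` messages that were allowed through."""

    def __init__(self, window=5):
        super().__init__()
        # A bounded deque discards the oldest entry automatically.
        self._recent = deque(maxlen=window)

    def filter(self, record):
        message = record.getMessage()
        if message in self._recent:
            return False  # seen recently: suppress
        self._recent.append(message)
        return True  # new within the window: let it through


def demo():
    """Run a few messages through the filter and return the ones kept."""
    dup_filter = WindowedDuplicateFilter(window=2)
    kept = []
    for text in ["a", "b", "a", "c", "d", "a"]:
        record = logging.LogRecord(
            "demo", logging.WARNING, "demo.py", 0, text, None, None
        )
        if dup_filter.filter(record):
            kept.append(text)
    return kept
```

In this sketch a suppressed message does not refresh the window, so a message that keeps repeating will reappear once enough different messages have pushed it out. Whether the real filter behaves that way is an assumption.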
Fix error on attempt to write to debug log after removing debug log handler

github.com/ocrmypdf/OCRmyPDF - 290aa2810868fb506a0ded2db8dabb580d5a2459 authored about 1 year ago by James R. Barlow <[email protected]>
v15.4.1 release notes

Closes #1185
Closes #1183

github.com/ocrmypdf/OCRmyPDF - a95640ed9ec0a5caa360bd64b9ad6cc9f916fa61 authored about 1 year ago by James R. Barlow <[email protected]>
watcher: restore ability to read json from file or command line string

github.com/ocrmypdf/OCRmyPDF - f69267bb675c3261121b70451f6e673f433bc7ad authored about 1 year ago by James R. Barlow <[email protected]>
Make grafting a little bit more configurable

github.com/ocrmypdf/OCRmyPDF - e36d5a309fd250927ed2c14983be35e806f7fdf0 authored about 1 year ago by James R. Barlow <[email protected]>
Fix watcher.py kwarg error

github.com/ocrmypdf/OCRmyPDF - 55566d9830cc12a27e7c47b9a467bd021698ddbf authored about 1 year ago by James R. Barlow <[email protected]>
docs: plugin documentation missing key special members

github.com/ocrmypdf/OCRmyPDF - f02ea20678ab07f0f847347720b1db1f8a52cd2e authored about 1 year ago by James R. Barlow <[email protected]>
docs: improve

github.com/ocrmypdf/OCRmyPDF - 372c22d42b449c9c0d3ca2826890ee296ac6ffb1 authored about 1 year ago by James R. Barlow <[email protected]>
graft: improve typing and remove procset tracking

ProcSet is optional and deprecated in PDF 2.0, and does little anyway; so
we removed it.

github.com/ocrmypdf/OCRmyPDF - 949265bbd0e4b0c3c606129e37039905c58b22c9 authored about 1 year ago by James R. Barlow <[email protected]>
Skip semfree unless on Linux

github.com/ocrmypdf/OCRmyPDF - 916106733c36c080e4707f0fc8e8613800ad0ed7 authored about 1 year ago by James R. Barlow <[email protected]>
Add missing file header

github.com/ocrmypdf/OCRmyPDF - 44bcafd3aa2dcf3270c87bdc79c5ccff8852f0eb authored about 1 year ago by James R. Barlow <[email protected]>
Make hocr API experimental for now

This commit can be reverted when we are ready to release a new version.

github.com/ocrmypdf/OCRmyPDF - 71166f7be8ef8a04fd4477dc3bdec07c043fed06 authored about 1 year ago by James R. Barlow <[email protected]>
Merge branch 'feature/gscan2pdf'

Reconcile release notes and copy_final() with new pipeline.

github.com/ocrmypdf/OCRmyPDF - 580252a1a0c822f9a057bf08b995a6b5fa4d9490 authored about 1 year ago by James R. Barlow <[email protected]>
build: add repository -y

github.com/ocrmypdf/OCRmyPDF - c0b60dae6a5adde8b2d26dfb42b077e6825a8a53 authored about 1 year ago by James R. Barlow <[email protected]>
Try to retain/copy xattrs

github.com/ocrmypdf/OCRmyPDF - ae123fd20994d649388ed0e297ff0233b534aed0 authored about 1 year ago by James R. Barlow <[email protected]>
build/macos: add openssl

github.com/ocrmypdf/OCRmyPDF - 454ad0acc520dc09cc5c6ef36dee0428536878b8 authored about 1 year ago by James R. Barlow <[email protected]>
v15.3.1 release notes

github.com/ocrmypdf/OCRmyPDF - 0c306ac328374820a10f582c6ee33c5e9e73fb3a authored about 1 year ago by James R. Barlow <[email protected]>
Fix mistakes with watcher loglevel handling

github.com/ocrmypdf/OCRmyPDF - 52d99732b12ce734881e7115388b88d810bd1858 authored about 1 year ago by James R. Barlow <[email protected]>
Tweak documentation of --output-type

github.com/ocrmypdf/OCRmyPDF - 5b5827983be2d6d341717c11368b08fded9386fa authored about 1 year ago by James R. Barlow <[email protected]>
Improve verbosity of colorspace selection

github.com/ocrmypdf/OCRmyPDF - 56f9bc311d37a0c5dd07bcc6074f9f12b25a6543 authored about 1 year ago by James R. Barlow <[email protected]>
Fix pdf save settings at metadata_fixup

github.com/ocrmypdf/OCRmyPDF - eb17dc1ecfca676af76140198529cf706013fda1 authored about 1 year ago by James R. Barlow <[email protected]>
Fix import of metadata_fixup

github.com/ocrmypdf/OCRmyPDF - 6f8115a052794ce6289eb96d8d3e6ccf3442aa39 authored about 1 year ago by James R. Barlow <[email protected]>
tesseract: EAFP

github.com/ocrmypdf/OCRmyPDF - aac913c666cdb72f11fda8cde5a5fa3d863cb2cb authored about 1 year ago by James R. Barlow <[email protected]>
Drop check for obsolete .dockerinit file

github.com/ocrmypdf/OCRmyPDF - b5e73ac4e411877cbfd5037e14117eca0d8feb4e authored about 1 year ago by James R. Barlow <[email protected]>
docs: note on docker performance

github.com/ocrmypdf/OCRmyPDF - 9e98c908919b295ae6583568a351b1928879d795 authored about 1 year ago by James R. Barlow <[email protected]>
Update draft release notes

github.com/ocrmypdf/OCRmyPDF - ca2592c1d95e3f4204659d36fba1b8f26da216d2 authored about 1 year ago by James R. Barlow <[email protected]>
Update release notes so far

github.com/ocrmypdf/OCRmyPDF - 5a759947dd1743f57f22bd5b7290104088435aa8 authored about 1 year ago by James R. Barlow <[email protected]>
Update comments and make worker functions private

github.com/ocrmypdf/OCRmyPDF - a31f17bb9d44c961b01a3d7492162962c661ee5d authored about 1 year ago by James R. Barlow <[email protected]>
Update pluginspec docs

github.com/ocrmypdf/OCRmyPDF - 1cb46afa94eca2c9f418d186df020b4744c605de authored about 1 year ago by James R. Barlow <[email protected]>
Simplify function signature of extract_image_filter

github.com/ocrmypdf/OCRmyPDF - dfa4ebf1a60e4233ab425d23c2243b4a76c72076 authored about 1 year ago by James R. Barlow <[email protected]>
pngquant: remove unused ability to quantize a non-PNG

Coverage testing showed this branch was never used, and when tested it didn't work.

github.com/ocrmypdf/OCRmyPDF - cd61c4efd9ef369d1ca707cda6c7824df42b8a60 authored about 1 year ago by James R. Barlow <[email protected]>
Remove public domain congress.jpg and replace with baiona_color.jpg

For reuse compliance we are phasing out public domain licenses

github.com/ocrmypdf/OCRmyPDF - 9ffb45f28332a92d7b3ed67b0d72abdc505a2b9e authored about 1 year ago by James R. Barlow <[email protected]>
Update dep5

github.com/ocrmypdf/OCRmyPDF - 299f0c4003db2e3c25a414fff536d0ab545fcaa2 authored about 1 year ago by James R. Barlow <[email protected]>
Improve passing of arguments to workers

The executor system was built around passing only a single argument to workers, which was always...

github.com/ocrmypdf/OCRmyPDF - 46a279a49aa4bd42896bcde842ae36691bd00104 authored about 1 year ago by James R. Barlow <[email protected]>
Reorganize progress bars so they can be typed properly

github.com/ocrmypdf/OCRmyPDF - d2dbea6cf832646428f8c5faf671bc79c446fa0b authored about 1 year ago by James R. Barlow <[email protected]>
info: clarify pageinfo context management

github.com/ocrmypdf/OCRmyPDF - e4cd081d4d02e1c7cc77ada26c9e82200b2fa300 authored about 1 year ago by James R. Barlow <[email protected]>
info: clarify ICC -> components checking

github.com/ocrmypdf/OCRmyPDF - d2297b39d0868c2b9c17da1d56f817acfe99695c authored about 1 year ago by James R. Barlow <[email protected]>
unpaper: Remove format conversion

Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.

github.com/ocrmypdf/OCRmyPDF - a06ab2a1c5176f3aeb141923366295c225d1e007 authored about 1 year ago by James R. Barlow <[email protected]>
Fix hocrtransform test to generate blank hocr

github.com/ocrmypdf/OCRmyPDF - a4059762e63e6f237ec1d895b83067a57a21cb12 authored about 1 year ago by James R. Barlow <[email protected]>
Eliminate more run_ocrmypdf calls

github.com/ocrmypdf/OCRmyPDF - 82bef40aa68417b0951b4b70acb54739ad1182c9 authored about 1 year ago by James R. Barlow <[email protected]>
pluginspec: spacing

github.com/ocrmypdf/OCRmyPDF - 40afcd68a79b1b4618b9c7be44626b80ab6bb50d authored about 1 year ago by James R. Barlow <[email protected]>
Convert many run_ocrmypdf -> run_ocrmypdf_api

github.com/ocrmypdf/OCRmyPDF - 8916955f45de66d27dfcf47ceda24f21831ac5ca authored about 1 year ago by James R. Barlow <[email protected]>
Improve documentation of new public hOCR APIs

github.com/ocrmypdf/OCRmyPDF - f238e721edad9ffe9a8da3b005ada6c0faeb5cf5 authored about 1 year ago by James R. Barlow <[email protected]>
Fix unused imports and other trivia

github.com/ocrmypdf/OCRmyPDF - 16eb5627a77e074ad07e7680d727a97c23eab6e8 authored about 1 year ago by James R. Barlow <[email protected]>
hocr_to_ocr_pdf: handle missing hocr json file

github.com/ocrmypdf/OCRmyPDF - fbf06741896fac650f936738162f9554b82fc594 authored about 1 year ago by James R. Barlow <[email protected]>
Remove ocrmypdf._sync

github.com/ocrmypdf/OCRmyPDF - db3df13e959b892a4423cde90809e0f8328469ba authored about 1 year ago by James R. Barlow <[email protected]>
Improve ._pipelines naming

github.com/ocrmypdf/OCRmyPDF - e400112f3277abb29e2b49385ea944b6ae99124f authored about 1 year ago by James R. Barlow <[email protected]>
Use empty .hocr file instead of dummy template for symmetry with sandwich

github.com/ocrmypdf/OCRmyPDF - 7935914f5573dff1b7a89a3d6c405499adbffd31 authored about 1 year ago by James R. Barlow <[email protected]>
optimize: typing

github.com/ocrmypdf/OCRmyPDF - 2a8bc03167d4448390ea3bc7e902d7b1a473f9ac authored about 1 year ago by James R. Barlow <[email protected]>
Remove duplicate thread local storage of page numbers

github.com/ocrmypdf/OCRmyPDF - 62c4f65fc36c434ceeb5804dfa3835e0d6dfbb15 authored about 1 year ago by James R. Barlow <[email protected]>
Fix some typing issues

github.com/ocrmypdf/OCRmyPDF - 4dbc5e1dbaa848c8a3ca7ed54455c98cc8688001 authored about 1 year ago by James R. Barlow <[email protected]>
vscode isn't ready for black py312, revert

github.com/ocrmypdf/OCRmyPDF - c0637c287e0a76cdf252f93d1fcd4908ac703f94 authored about 1 year ago by James R. Barlow <[email protected]>
tests: replace many run_ocrmypdf -> run_ocrmypdf_api

github.com/ocrmypdf/OCRmyPDF - 1c45f329411e17613aa631ff221b7f47dd7793cc authored about 1 year ago by James R. Barlow <[email protected]>
Fix coverage settings and cover semfree

github.com/ocrmypdf/OCRmyPDF - 990b462a94403c5b388b203bc6aada268a300442 authored about 1 year ago by James R. Barlow <[email protected]>
Replace cryptic test error messages with more informative ones

github.com/ocrmypdf/OCRmyPDF - fadc0cf69b7e9db6f679de870f94e1b70762559f authored about 1 year ago by James R. Barlow <[email protected]>
tqdm_kwargs to progress_kwargs

github.com/ocrmypdf/OCRmyPDF - 6127f7abd66f765cab98e3404f0124962cd0a270 authored about 1 year ago by James R. Barlow <[email protected]>
Define progress bar plugins formally instead of "tqdm-like"

github.com/ocrmypdf/OCRmyPDF - 7ce9d08b2d8c170f94545913bf56d1a799bf7818 authored about 1 year ago by James R. Barlow <[email protected]>
optimize: better coverage

github.com/ocrmypdf/OCRmyPDF - 58f388c69d0d6a9b303bc8c8c3ba251401c0125a authored about 1 year ago by James R. Barlow <[email protected]>
deps: update PyMuPDF req

github.com/ocrmypdf/OCRmyPDF - ad3a1dbbad03ad16f2f871b869f670aae7ba47be authored about 1 year ago by James R. Barlow <[email protected]>
Prefer pikepdf's newer Page.mediabox accessor over .MediaBox

github.com/ocrmypdf/OCRmyPDF - eb3a51e33a7ec577c681bc025e87cf5870d9a468 authored about 1 year ago by James R. Barlow <[email protected]>
Skip fewer tests

github.com/ocrmypdf/OCRmyPDF - b928dc0808e580674faf9e58ddfe7d4459812f8b authored about 1 year ago by James R. Barlow <[email protected]>
optimize: explore page container as objects instead of page helpers

github.com/ocrmypdf/OCRmyPDF - f3dd73377309bc854bad1c80b0276a52548c7107 authored about 1 year ago by James R. Barlow <[email protected]>
Rename post_process -> postprocess

For consistency with preprocess

github.com/ocrmypdf/OCRmyPDF - c278fecb34df2c77372fec3c9c49b2d20a9a0ad7 authored about 1 year ago by James R. Barlow <[email protected]>
Enable multiprocessing freeze_support (for Windows) and enable forkserver

Since we use a thread for logging among other possibilities, we were
never able to use threads s...

github.com/ocrmypdf/OCRmyPDF - b9646b6f85fa9cbcc44f72df7c0fc315176a646b authored about 1 year ago by James R. Barlow <[email protected]>
Fix error on no languages available

github.com/ocrmypdf/OCRmyPDF - 05721ba84af4e75cf8d0493d5769ba4d40457750 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor: migrate metadata repair to new module

github.com/ocrmypdf/OCRmyPDF - 1a7738a925ed42a0ab1671ee22eb3bb3c7adb178 authored about 1 year ago by James R. Barlow <[email protected]>
docs: some copyediting

github.com/ocrmypdf/OCRmyPDF - 04a937258474bc9211b13fb7d45ac5439c8fb364 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor metadata handling

github.com/ocrmypdf/OCRmyPDF - d153a6f6df923d49a841079536b98797287bc2c5 authored about 1 year ago by James R. Barlow <[email protected]>
Automatically set document language to OCR language

github.com/ocrmypdf/OCRmyPDF - 38c3422e5ee8319d0110fdb6c5235bc541f4034d authored about 1 year ago by James R. Barlow <[email protected]>
Add py312 to black coverage

github.com/ocrmypdf/OCRmyPDF - 0655f8e7ae5db1a66864011ae224c86a630d836d authored about 1 year ago by James R. Barlow <[email protected]>
Fix exit code error on Ghostscript failure

github.com/ocrmypdf/OCRmyPDF - 0856750ee2f0aeb1cfe7962d620fb7fdd134261c authored about 1 year ago by James R. Barlow <[email protected]>
languages: kwargs are overkill

github.com/ocrmypdf/OCRmyPDF - e38d569d8f2a63a0d2e0060441fed5bf0f7405cd authored about 1 year ago by James R. Barlow <[email protected]>
Refactor debug log and work folder context cleanup

github.com/ocrmypdf/OCRmyPDF - fc6f959d21b3b19309c1e50359aa39a629ac3e07 authored about 1 year ago by James R. Barlow <[email protected]>
Refactor setup_pipeline to decouple manage_work_folder

github.com/ocrmypdf/OCRmyPDF - 6f82097d141cb7d2ae76487aec654e7787b2a89a authored about 1 year ago by James R. Barlow <[email protected]>