Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Add advisory note to release notes

github.com/ocrmypdf/OCRmyPDF - 1d0584c64468aed832f84e4e352d7555d6672072 authored almost 3 years ago by James R. Barlow <[email protected]>
Speculation: Ghostscript 9.56 new PDF interpreter breaks things

github.com/ocrmypdf/OCRmyPDF - 84b9d4d021113560948274f35712668381d00ea2 authored almost 3 years ago by James Barlow <[email protected]>
Fix Python "3.10"

github.com/ocrmypdf/OCRmyPDF - 41efd3bf0fb6357de0d5078e3e75de5116823a33 authored almost 3 years ago by James Barlow <[email protected]>
Upgrade pre-commit and associated tools; various lints

github.com/ocrmypdf/OCRmyPDF - 776ada671391a6282cdf397c78a3487fb1607059 authored almost 3 years ago by James Barlow <[email protected]>
ci: test Python 3.10

github.com/ocrmypdf/OCRmyPDF - f3593c915dbfdafa1117990436dd4140badd3d68 authored almost 3 years ago by James Barlow <[email protected]>
Add lock to certain "with patch" cases

Switch to --use-threads seems to have broken tests that assumed they could
monkeypatch things. A...

github.com/ocrmypdf/OCRmyPDF - dfe31a2f6d70906b3f21410573cd8c032b596bad authored almost 3 years ago by James Barlow <[email protected]>
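
The commit above mentions that tests relying on monkeypatching broke once work moved to threads. The following is a minimal sketch of the general idea, not the project's actual test code: when several threads patch the same attribute, a shared lock keeps the patch/restore pairs from interleaving. The helper name `call_with_patched_getcwd` and the choice of `os.getcwd` as the patch target are assumptions made purely for illustration.

```python
import os
import threading
from unittest.mock import patch

# Hypothetical module-level lock shared by every test that patches os.getcwd.
_patch_lock = threading.Lock()


def call_with_patched_getcwd(fake: str) -> str:
    """Patch os.getcwd under a lock so concurrent patches cannot interleave."""
    with _patch_lock:
        # Only one thread at a time may install and later remove the patch.
        with patch("os.getcwd", return_value=fake):
            return os.getcwd()
```

Without the lock, one thread could save another thread's mock as the "original" and restore it on exit, which leaves the attribute patched after all the tests have finished.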
Fix pytest deprecation warnings

github.com/ocrmypdf/OCRmyPDF - 0c43963d697038ee5e6dbdf10462ef6cb1515656 authored almost 3 years ago by James Barlow <[email protected]>
Fix Pillow deprecation warnings

github.com/ocrmypdf/OCRmyPDF - f29fe7f23eb21342b9dc8ff53f7d3b824d59f9c8 authored almost 3 years ago by James Barlow <[email protected]>
pdfminer 20220319

github.com/ocrmypdf/OCRmyPDF - 04996caac34a418cf233c0f3c8ac436b6f2b5920 authored almost 3 years ago by James R. Barlow <[email protected]>
Disable oom killer test for --use-threads

github.com/ocrmypdf/OCRmyPDF - 13917c051c91e677b245d2e57098d6b8ab316e0d authored almost 3 years ago by James R. Barlow <[email protected]>
Make --use-threads default and update release notes

Re: issue Hanging on Random Files #814

github.com/ocrmypdf/OCRmyPDF - 8182fe9c927c8ba1053cf649bf222a1b5be2c837 authored almost 3 years ago by James R. Barlow <[email protected]>
docs: proofread plugins

github.com/ocrmypdf/OCRmyPDF - 1950acfbda3a659ca70658c848f900306ab2e35e authored almost 3 years ago by James R. Barlow <[email protected]>
Disallow pikepdf 5.0.0

github.com/ocrmypdf/OCRmyPDF - fca6403083206ee3a90098143d56dadc35169595 authored almost 3 years ago by James R. Barlow <[email protected]>
v13.4.0 release notes (2)

github.com/ocrmypdf/OCRmyPDF - c4e2fce1efe2f3ebe6c20198812384fa4c19cb1f authored almost 3 years ago by James R. Barlow <[email protected]>
optimize: don't deflate JPEGs with fancy DecodeParms settings

This is overly cautious but will do for now.

github.com/ocrmypdf/OCRmyPDF - 354647965893d6803c32f69fe21767ec8397cebf authored almost 3 years ago by James R. Barlow <[email protected]>
Fix error messages when run with pikepdf 5.0.0

Appears that these are spurious errors from qpdf probing the /DecodeParms
dict on images that do...

github.com/ocrmypdf/OCRmyPDF - 72442fa3d01027313679eaab3839e09254e462e0 authored almost 3 years ago by James R. Barlow <[email protected]>
v13.4.0 release notes

github.com/ocrmypdf/OCRmyPDF - 8f714b1375e0f8bde2cef09e6fe05b8af7041038 authored almost 3 years ago by James R. Barlow <[email protected]>
pdfinfo: a few annotations

github.com/ocrmypdf/OCRmyPDF - cb05c1d1224bcb6685c41d2e87fe2fc5e5ef6410 authored almost 3 years ago by James R. Barlow <[email protected]>
Merge branch 'master' of github.com:ocrmypdf/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - b0ad07bc5f34af7f99517ad9ba1101ed19db1364 authored almost 3 years ago by James R. Barlow <[email protected]>
optimize: recognize and produce [/FlateDecode /DCTDecode] images

github.com/ocrmypdf/OCRmyPDF - 514038d4ec2ce249d039a7fc9d1ef6023181e936 authored almost 3 years ago by James R. Barlow <[email protected]>
optimize: remove comment about issue in Pillow that is now fixed

github.com/ocrmypdf/OCRmyPDF - 50d76e7f6c063bed71b88c45a9f3cbf863a64446 authored almost 3 years ago by James R. Barlow <[email protected]>
optimize: remove inaccurate comment about ICCs

pikepdf will now get the ICC profile out and put it in the JPEG.

github.com/ocrmypdf/OCRmyPDF - 6c78a462856ef44af7272492f260769a308576bf authored almost 3 years ago by James R. Barlow <[email protected]>
optimize: clarify log message about skipping images with multiple filters

github.com/ocrmypdf/OCRmyPDF - 863d56063262d7c0789da65d4a90fcf4d2be2840 authored almost 3 years ago by James R. Barlow <[email protected]>
release notes: typo

github.com/ocrmypdf/OCRmyPDF - 73934c854c0ee24c21bba369a5de34636c751c8b authored almost 3 years ago by James R. Barlow <[email protected]>
Fix spelling of 'ephemeral' (#908)

github.com/ocrmypdf/OCRmyPDF - 2be8eeec2cffa7860f9b078aeea4d063767ee64b authored almost 3 years ago by rdiez <[email protected]>
The world is not ready for :=

github.com/ocrmypdf/OCRmyPDF - 3dfde479e2d0d6dac4c6f55093038068e475b82c authored almost 3 years ago by James R. Barlow <[email protected]>
v13.3.0 release notes

github.com/ocrmypdf/OCRmyPDF - aea1862644caf5101b532558a53f294eb1a6e92f authored almost 3 years ago by James R. Barlow <[email protected]>
ghostscript: improve test coverage of error cases

github.com/ocrmypdf/OCRmyPDF - 3b406112d0b59d525ddf81996918c035168d928c authored almost 3 years ago by James R. Barlow <[email protected]>
ghostscript: improve error message if image cannot be opened

github.com/ocrmypdf/OCRmyPDF - fcc4c2d37180cd5ef6fce57e662e1aafc50713e5 authored almost 3 years ago by James R. Barlow <[email protected]>
tesseract: account for more tesseract 5 output differences

github.com/ocrmypdf/OCRmyPDF - 3de18ed6123db1517ed0dd364051554e9bda600b authored almost 3 years ago by James R. Barlow <[email protected]>
optimize: don't try to optimize an image we can't save

github.com/ocrmypdf/OCRmyPDF - 93cca42e2056ef323e2a4bc60bef04159581f4f7 authored almost 3 years ago by James R. Barlow <[email protected]>
Use better img2pdf settings where possible while supporting old versions

Fixes #894

github.com/ocrmypdf/OCRmyPDF - 2d0ac4707c6b19614bf56bede0892656cd0e1f0c authored almost 3 years ago by James R. Barlow <[email protected]>
unpaper: refactoring

github.com/ocrmypdf/OCRmyPDF - 7d208175cf3fe5da3db27c12a245abf360dcb64c authored almost 3 years ago by James R. Barlow <[email protected]>
unpaper: issue warning if image too large to clean

github.com/ocrmypdf/OCRmyPDF - ea69e868ed95a335b362a3708628c0372cb7abb8 authored almost 3 years ago by James R. Barlow <[email protected]>
Revert "docs: add sphinx-panels"

This reverts commit 7966192d6edb989f208cbbc6487346fdda635e78.

github.com/ocrmypdf/OCRmyPDF - beea603ab32eee22c1e9fa00f55a15b7e45d5ed2 authored about 3 years ago by James R. Barlow <[email protected]>
docs: add sphinx-panels

github.com/ocrmypdf/OCRmyPDF - 7966192d6edb989f208cbbc6487346fdda635e78 authored about 3 years ago by James R. Barlow <[email protected]>
Wrap exception on non-CMYK images into the log warning (#881)

github.com/ocrmypdf/OCRmyPDF - 5acbd7a2525aa312fc57a7b9108403937fdfb60b authored about 3 years ago by Anton Gladky <[email protected]>
Fix typo (#882)

github.com/ocrmypdf/OCRmyPDF - aed955ca8c70798a4ed6c642c768cbd77f483c7f authored about 3 years ago by Krasimir Nedelchev <[email protected]>
v13.2.0 release notes

github.com/ocrmypdf/OCRmyPDF - 298bdb8690a4cbb7ce134372e5a4f6a018134684 authored about 3 years ago by James R. Barlow <[email protected]>
concurrency: fix extra update of progressbar

github.com/ocrmypdf/OCRmyPDF - 1a58abcc6a9c0cf555d91d0621e8a9ce017e9006 authored about 3 years ago by James R. Barlow <[email protected]>
Standardize ghostscript version default

github.com/ocrmypdf/OCRmyPDF - dbfceba0201af5cede0529f27d3f203e831295b7 authored about 3 years ago by James R. Barlow <[email protected]>
Replace deprecated distutils with packaging.version

github.com/ocrmypdf/OCRmyPDF - 0faa618c3c27642460fced1b4c2d81d1e7534c9b authored about 3 years ago by James R. Barlow <[email protected]>
Order of operations suppressed detailed Ghostscript missing error message

github.com/ocrmypdf/OCRmyPDF - 7035002c03dd12bfc5db3dbf9479a9688d02946a authored about 3 years ago by James R. Barlow <[email protected]>
_windows: remove use of deprecated distutils

github.com/ocrmypdf/OCRmyPDF - f8fadaef41ae6ee8b84e93f8369e4f5e840326d8 authored about 3 years ago by James R. Barlow <[email protected]>
Update cache

github.com/ocrmypdf/OCRmyPDF - ee21bf9ef61faafa3444ef4a1f1efe6f09a879b8 authored about 3 years ago by James R. Barlow <[email protected]>
v13.1.1 release notes

github.com/ocrmypdf/OCRmyPDF - 190ca8195131d033dd1bb2bf4ff9bbbdd82dca1a authored about 3 years ago by James R. Barlow <[email protected]>
Fix issue with attempting to deskew a blank page on Tesseract 5

Closes #868

github.com/ocrmypdf/OCRmyPDF - d48254d477c45402134dfa4683781739346bf336 authored about 3 years ago by James R. Barlow <[email protected]>
docs: add warning about multiproc on macOS

github.com/ocrmypdf/OCRmyPDF - 1ec2ccca14f2bc1a8c8f3ef11a659f9f96407562 authored about 3 years ago by James R. Barlow <[email protected]>
v13.1.0 release notes

github.com/ocrmypdf/OCRmyPDF - e78f0cc56f3c144a57eedad268160cc97b066730 authored about 3 years ago by James R. Barlow <[email protected]>
tests: simplify run_ocrmypdf API

github.com/ocrmypdf/OCRmyPDF - 13af3252ff68d6d82b5f06ac5aa4391f3143bcea authored about 3 years ago by James R. Barlow <[email protected]>
config: yaml strings for versions

github.com/ocrmypdf/OCRmyPDF - 0528867e0be5da69b50501ee508afc01532c7986 authored about 3 years ago by James R. Barlow <[email protected]>
Fix test_outputtype_none on Windows and cleanup docs

github.com/ocrmypdf/OCRmyPDF - 6910c48b8113ace3ddc14392c87c938a2b1d7624 authored about 3 years ago by James R. Barlow <[email protected]>
docs: Remove reference to long removed 'tesseract' renderer

github.com/ocrmypdf/OCRmyPDF - 69aa3981c43af6ba689f3e63081a975283b29341 authored about 3 years ago by James R. Barlow <[email protected]>
docs: remove Ubuntu 16.04 install instructions

It's EOL.

github.com/ocrmypdf/OCRmyPDF - 9c1e5adfe61c22bf1259bb685bc2ee91756cb8bd authored about 3 years ago by James R. Barlow <[email protected]>
Fix kill signal on Windows

github.com/ocrmypdf/OCRmyPDF - e642dd4b356ba18691648729fb7f6b7428e56bfa authored about 3 years ago by James R. Barlow <[email protected]>
Tidy pyproject.toml

github.com/ocrmypdf/OCRmyPDF - 1414a8f5dcc1a6697f2c42efb50e1672b771107b authored about 3 years ago by James R. Barlow <[email protected]>
Use Python executors instead of pools

ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. Proc...

github.com/ocrmypdf/OCRmyPDF - 9de06f62eec88fcb1dae5f6bda88c249e6962746 authored about 3 years ago by James R. Barlow <[email protected]>
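
To illustrate the point made in the commit above, here is a small sketch (not taken from OCRmyPDF) of how a `concurrent.futures` executor surfaces the abrupt death of a worker process. It assumes a POSIX platform where the `fork` start method is available; the function names `die_abruptly` and `worker_death_is_detected` are hypothetical.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly() -> None:
    """Simulate a worker killed mid-task (e.g. by the OOM killer)."""
    os._exit(1)  # exit immediately, without cleanup or raising an exception


def worker_death_is_detected() -> bool:
    """Return True if the executor reports that its worker died."""
    # Assumption: the "fork" start method exists (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        future = executor.submit(die_abruptly)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            # The executor noticed the dead worker and failed the future.
            return True
    return False
```

With an executor, the caller gets an exception it can handle rather than waiting indefinitely for a result that will never arrive.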
typing: small improvements

github.com/ocrmypdf/OCRmyPDF - 26badf2882a64798ed14d06aa1c3f2867fe932bd authored about 3 years ago by James R. Barlow <[email protected]>
sync: typing improvements

github.com/ocrmypdf/OCRmyPDF - 8f873aaa45aada0b8db0cdc3a6f0e56843cdf880 authored about 3 years ago by James R. Barlow <[email protected]>
tests: improve typing and remove some legacy code

github.com/ocrmypdf/OCRmyPDF - 8fdcb15b4ea42ea875ed00c0b5de2662d763e947 authored about 3 years ago by James R. Barlow <[email protected]>
ocrmypdf.fish: fix indents

[ci skip]

github.com/ocrmypdf/OCRmyPDF - 0323738adaf213bc23747940a58a281ee2b5119c authored about 3 years ago by James R. Barlow <[email protected]>
Update ocrmypdf.bash completion

Squashed commit of the following:

commit 974de2e8ccad7fd34694f2c3a7a17c64bb52cdab
Merge: a8d7f9...

github.com/ocrmypdf/OCRmyPDF - aae5591f7e6e90f0fd212a164cf95d1785d09910 authored about 3 years ago by FPille <[email protected]>
tess cache: don't include full platform - could be sensitive

github.com/ocrmypdf/OCRmyPDF - 4c1ff1086c7e183a185bc1384ff59aec2201411d authored about 3 years ago by James R. Barlow <[email protected]>
Add new argument --tesseract-thresholding to control tesseract thresholding where available

Also add missing test for --tesseract-oem

github.com/ocrmypdf/OCRmyPDF - f91faf97955087704366df0060df398522fb622a authored about 3 years ago by James R. Barlow <[email protected]>
Whitespace

github.com/ocrmypdf/OCRmyPDF - 793cc33a905ffda5ea43e7d845c590570bf5df3c authored about 3 years ago by James R. Barlow <[email protected]>
build: typo

github.com/ocrmypdf/OCRmyPDF - fbd72efd45d52cb7706eb25207928138d1f98858 authored about 3 years ago by James R. Barlow <[email protected]>
build: address checksum error from choco

github.com/ocrmypdf/OCRmyPDF - 1115923995abe01f29f63be6975b113eb656c28e authored about 3 years ago by James R. Barlow <[email protected]>
Merge branch 'release/v13' of github.com:jbarlow83/OCRmyPDF into release/v13

github.com/ocrmypdf/OCRmyPDF - 8478d67b2879689740cd34707d8353f1fd0ef17f authored about 3 years ago by James R. Barlow <[email protected]>
Turning on Ghostscript interpolation changes this test

Seems acceptable. We don't normally use Ghostscript to downsample PDFs
like is happening in this...

github.com/ocrmypdf/OCRmyPDF - c75ff4687a545f8e6a3d24eae085fddc1ed5b358 authored about 3 years ago by James R. Barlow <[email protected]>
[ci skip] minor corrections to maintainers.rst (#858)

github.com/ocrmypdf/OCRmyPDF - 312c1e51b5ca9b0be6e8ab3dd39540e36e126291 authored about 3 years ago by mara004 <[email protected]>
Merge commit 'cd49e70154f82f54bf74fc5bb2586fe7e0358971' into release/v13

github.com/ocrmypdf/OCRmyPDF - cfe2bb25ba0e0c9cdc87868f8ba57d2bf616679c authored about 3 years ago by James R. Barlow <[email protected]>
ghostscript: force interpolation when rendering (#855)

Specifying option --oversample tends to introduce upsampling in rendering
by rasterizing page t...

github.com/ocrmypdf/OCRmyPDF - cd49e70154f82f54bf74fc5bb2586fe7e0358971 authored about 3 years ago by Tristan Porteries <[email protected]>
windows: default version to '0' when looking for Ghostscript

To avoid ValueError: max() arg is an empty sequence

As suggested by @meet1919 in #833.

github.com/ocrmypdf/OCRmyPDF - 7ce1692eef82880e60c921196862a2b01db42c77 authored about 3 years ago by James R. Barlow <[email protected]>
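
The `ValueError` named in the commit above is what Python's built-in `max()` raises on an empty sequence unless a `default` is supplied. The sketch below shows that pattern in isolation. The helper `newest_version` and its simple dotted-number parsing are assumptions for illustration and do not reproduce the project's Windows lookup code.

```python
from typing import Iterable, Tuple


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '9.55.0' into (9, 55, 0)."""
    return tuple(int(part) for part in version.split("."))


def newest_version(found: Iterable[str]) -> str:
    """Return the highest version string, or '0' when nothing was found."""
    # default= prevents "ValueError: max() arg is an empty sequence".
    return max(found, key=_as_tuple, default="0")
```

Called with an empty list, `newest_version([])` returns `'0'` rather than raising, so the caller can report a clear "not found" condition itself.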
pyproject: tell black to target py37

github.com/ocrmypdf/OCRmyPDF - 7959f7628d3429f3f2cf0e90311f81a810231369 authored about 3 years ago by James R. Barlow <[email protected]>
Raise max-image-mpixels again

PDFs are quite likely to have a lot of pixels, e.g. large high resolution scans.
250 MP is a pag...

github.com/ocrmypdf/OCRmyPDF - 4634b20de5992748eb6296fe4f9d454fe163023b authored about 3 years ago by James R. Barlow <[email protected]>
optimize: fix mypy lint

github.com/ocrmypdf/OCRmyPDF - 3810e576ffda761a17cb1352dfbb3afc521af03d authored about 3 years ago by James R. Barlow <[email protected]>
pipeline: tidy

github.com/ocrmypdf/OCRmyPDF - 01c7895044aaffa426a803beb32a36383e1c2e51 authored about 3 years ago by James R. Barlow <[email protected]>
docs: new maintainer notes

github.com/ocrmypdf/OCRmyPDF - fdc6aa03fb54e0d72a199f12a161ed13ee089646 authored about 3 years ago by James R. Barlow <[email protected]>
v13 release notes (2)

github.com/ocrmypdf/OCRmyPDF - 25cc17ee038fdd6eea9b8cf2d708f6a6059a9985 authored about 3 years ago by James R. Barlow <[email protected]>
Dockerfile: remove requirements/

github.com/ocrmypdf/OCRmyPDF - e8098a147563825cdf411aab74c7e66eac41edaf authored about 3 years ago by James R. Barlow <[email protected]>
build: use latest pip and wheel in all cases

github.com/ocrmypdf/OCRmyPDF - 6b773883dc6094a1b46b0eaf97983441d8020165 authored about 3 years ago by James R. Barlow <[email protected]>
v13 release notes

github.com/ocrmypdf/OCRmyPDF - 4ed962233519d9530c4960c7b7d546d7e85005dc authored about 3 years ago by James R. Barlow <[email protected]>
Skip no language test for Tess 5

github.com/ocrmypdf/OCRmyPDF - acc9d58c390b7977c75c888b5f2cfba29abea614 authored about 3 years ago by James R. Barlow <[email protected]>
Remove some 'liblept' references we no longer need

github.com/ocrmypdf/OCRmyPDF - 659e738f929dba5cf36949792d9774674ae1f725 authored about 3 years ago by James R. Barlow <[email protected]>
ghostscript: choco doesn't put Ghostscript on PATH anymore

It seems that Chocolatey doesn't put gswin[32,64]c on PATH anymore,
so compensate.

github.com/ocrmypdf/OCRmyPDF - 7b3d7ca92ae1572286a8656628d2f9b9bc5fa68d authored about 3 years ago by James R. Barlow <[email protected]>
Adjust test to support Tesseract 5 working harder to find its files

github.com/ocrmypdf/OCRmyPDF - e3126d28068b39cff2b21e669e96517cdf4a2e6d authored about 3 years ago by James R. Barlow <[email protected]>
build: tweak CI

github.com/ocrmypdf/OCRmyPDF - 45020a7fcd56b259cb2e90625d2a0c61c304dcce authored about 3 years ago by James R. Barlow <[email protected]>
Upgrade test version of pymupdf

github.com/ocrmypdf/OCRmyPDF - f51164aff8ebf29496e132393953554cf62c5bf7 authored about 3 years ago by James R. Barlow <[email protected]>
pdfa: remove deprecated pkg_resources based access and tests

github.com/ocrmypdf/OCRmyPDF - 6f58a143510431588b7d76d1eb36047f48e6653b authored about 3 years ago by James R. Barlow <[email protected]>
Remove shims that supported old versions of pikepdf < 4

github.com/ocrmypdf/OCRmyPDF - 7ba04267b19ac59a55a61127e009192a032e23d6 authored about 3 years ago by James R. Barlow <[email protected]>
Remove requirements/*.txt - use pip install ocrmypdf[etc] instead

github.com/ocrmypdf/OCRmyPDF - 974956431365272688179d2feb352b7b7e1c7b0b authored about 3 years ago by James R. Barlow <[email protected]>
Remove Python 3.6 specific unicode environment checks

github.com/ocrmypdf/OCRmyPDF - 698e8791d7724f02ff211aff5b57ca349d0b3688 authored about 3 years ago by James R. Barlow <[email protected]>
Remove most Python 3.6 special casing

github.com/ocrmypdf/OCRmyPDF - 380b981763cc38760b758722ff2a9ec8e077bc6b authored about 3 years ago by James R. Barlow <[email protected]>
Remove leptonica and cffi

github.com/ocrmypdf/OCRmyPDF - 5abfb14c2a1bdcc5bf0a287616996d6aae3e33cd authored about 3 years ago by James R. Barlow <[email protected]>
Update cache, related to previous apparently

github.com/ocrmypdf/OCRmyPDF - 036afc4d88eb2677a38d22425a1862f737522f51 authored about 3 years ago by James R. Barlow <[email protected]>
Disable --remove-background so we can remove leptonica

github.com/ocrmypdf/OCRmyPDF - 59642a98b2e0d8a3b60f2dbd5cdf525fb2d9c48a authored about 3 years ago by James R. Barlow <[email protected]>
test_rotation: replace leptonica test with Pillow channel ops

New function is likely not as robust but seems capable of inexact image comparison.

github.com/ocrmypdf/OCRmyPDF - f8c6be2e26145b0f27d41d55983f812b914c4702 authored about 3 years ago by James R. Barlow <[email protected]>
optimize: replace leptonica compdata with direct insert of JPEG

Confirmed that img2pdf just inserts JPEG verbatim. Never had to go through
the trouble we did.

github.com/ocrmypdf/OCRmyPDF - 42bf5476ddbfe920420bba0d3fb848f3de03b333 authored about 3 years ago by James R. Barlow <[email protected]>
Remove --threshold argument

Tesseract now includes better thresholding (binarization) in v5. Users that have
thresholding...

github.com/ocrmypdf/OCRmyPDF - 30440104ba498490b5ca6a4fda694ce8aa5b19de authored about 3 years ago by James R. Barlow <[email protected]>
Convert deskew to use degrees, since all our other angles are in degrees

github.com/ocrmypdf/OCRmyPDF - b159e021104b5a97dbb7bbc8d4c9779b6cd19285 authored about 3 years ago by James R. Barlow <[email protected]>