Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

tesseract_cache: update explanatory notes

github.com/ocrmypdf/OCRmyPDF - 5de107d44cbd2b42428d1a846ab6f4b096fa7771 authored over 7 years ago by James R. Barlow <[email protected]>
tesseract.py: update canned HOCR template to tess 3.05 output

Seems better to not claim the existence of several entities that don’t
exist as the older one does

github.com/ocrmypdf/OCRmyPDF - 131a5b741df0099f9e988ea03f5e1cd96db519b7 authored over 7 years ago by James R. Barlow <[email protected]>
ghostscript: fix missing “import sys”, only applicable for an exception

github.com/ocrmypdf/OCRmyPDF - 65b89687a9715b319d384d2076776e4a06094ba2 authored over 7 years ago by James R. Barlow <[email protected]>
Update copyrights

github.com/ocrmypdf/OCRmyPDF - 048ae40e75fa8f26688bf90f81a589f829ca2ceb authored over 7 years ago by James R. Barlow <[email protected]>
Fix: Tesseract 3.04 is sensitive to order of configuration commands

“txt hocr” is not acceptable and does not produce expected output .txt
while “hocr text” works f...

github.com/ocrmypdf/OCRmyPDF - 234183ecd230b89f83ce872eb5e8dc776e546897 authored over 7 years ago by James R. Barlow <[email protected]>
cookbook: more on improving OCR

github.com/ocrmypdf/OCRmyPDF - fb067dc97bf4b8b5a376e54e0202a8bb0fde39d9 authored over 7 years ago by James R. Barlow <[email protected]>
docs: link to OCRmyPDF-web

github.com/ocrmypdf/OCRmyPDF - a1fea0ce160e77f2bb03193df33d87c969d5c152 authored over 7 years ago by James R. Barlow <[email protected]>
Test suite: tidy up imports

github.com/ocrmypdf/OCRmyPDF - e1e9135e93a8d17461e6c6135955c3e66aa21980 authored over 7 years ago by James R. Barlow <[email protected]>
autobrew: fix brew audit error on double blank line

github.com/ocrmypdf/OCRmyPDF - aff982036bed4f0eb7575332a1ac62e9153825be authored over 7 years ago by James R. Barlow <[email protected]>
Remove “null deploy script” since “/usr/bin/true” is equivalent

github.com/ocrmypdf/OCRmyPDF - d087649eab0efa1f81a6c5688c9ffd8b83f9a840 authored over 7 years ago by James R. Barlow <[email protected]>
v5.0 release notes

github.com/ocrmypdf/OCRmyPDF - 7f3fa46a40f61f1e78071bc48b108ef4cf13b198 authored over 7 years ago by James R. Barlow <[email protected]>
Disable other use redo_ocr

github.com/ocrmypdf/OCRmyPDF - b1f79e4d9797a1d2c7ab84f491d02ae4ca2db9f8 authored over 7 years ago by James R. Barlow <[email protected]>
Warn user when --image-dpi is supplied but ignored

github.com/ocrmypdf/OCRmyPDF - 115d6df94ff48e6dc4f05da5976775b2fc4f3596 authored over 7 years ago by James R. Barlow <[email protected]>
--redo-ocr is not implemented, so disable

github.com/ocrmypdf/OCRmyPDF - 559af9635fa3e7afbeb9db66e2eae05d09b9d6ff authored over 7 years ago by James R. Barlow <[email protected]>
Turn on Tesseract 4 cache in test suite

Travis is too slow without it, and perhaps it’s overly paranoid to
never cache Tess4. Maybe nuke...

github.com/ocrmypdf/OCRmyPDF - cb06359c0b3628061f5b091a5b68542fe6f54a6f authored over 7 years ago by James R. Barlow <[email protected]>
Update requirements files

github.com/ocrmypdf/OCRmyPDF - 5e26bb29d974a442f0f13adc357fe2005b2014c7 authored over 7 years ago by James R. Barlow <[email protected]>
Fix Travis CI errors while looking around for Tess4

github.com/ocrmypdf/OCRmyPDF - b0e95842b89585281348761b9fc9648ea28643e8 authored over 7 years ago by James R. Barlow <[email protected]>
rst: Clean up indentation

github.com/ocrmypdf/OCRmyPDF - 08e678f21f81497ede59ed49c35c5b550ae8526b authored over 7 years ago by James R. Barlow <[email protected]>
Update documentation for 3.03 support removal

github.com/ocrmypdf/OCRmyPDF - c17817810f6db11a7965a2b935e288db5b0a2a95 authored over 7 years ago by James R. Barlow <[email protected]>
Tell Travis to download Tesseract 4.00 from a PPA for testing

github.com/ocrmypdf/OCRmyPDF - ff5c38b1f78d2c7dead3357692bcd74a4d4ed7ae authored over 7 years ago by James R. Barlow <[email protected]>
Insist on Python 3.5 wherever we check for it

github.com/ocrmypdf/OCRmyPDF - 64314c1b827380326a8b6441880cc38895cee906 authored over 7 years ago by James R. Barlow <[email protected]>
Insist on Tesseract 3.04 wherever we check for it

github.com/ocrmypdf/OCRmyPDF - 83230097aebfc13ba856f72a5d55405b1679d6a1 authored over 7 years ago by James R. Barlow <[email protected]>
Remove Tesseract 3.02 and 3.03 compatibility shims

github.com/ocrmypdf/OCRmyPDF - 8f91acf95693d7eb743e4ff0cd3d6b770a6cc26f authored over 7 years ago by James R. Barlow <[email protected]>
.gitignore the docs Makefile

github.com/ocrmypdf/OCRmyPDF - d211722a2ff8b7207285eae424986e6f34698466 authored over 7 years ago by James R. Barlow <[email protected]>
Fix missing import; all tests passing!

github.com/ocrmypdf/OCRmyPDF - 56e6ed1249d300608674d87d32425ca8713cf62e authored over 7 years ago by James R. Barlow <[email protected]>
baiona_gray remove alpha channel

github.com/ocrmypdf/OCRmyPDF - 21982cf1cb62df9b549d146c2bad76df607b4140 authored over 7 years ago by James R. Barlow <[email protected]>
Update the .png files, again, hopefully without corruption

github.com/ocrmypdf/OCRmyPDF - edc01408da8c0ba1832fd3bd00bfdfb207dce210 authored over 7 years ago by James R. Barlow <[email protected]>
Merge release notes

github.com/ocrmypdf/OCRmyPDF - aee33c87eda2b9c3ea5b50563a12281779beea15 authored over 7 years ago by James R. Barlow <[email protected]>
Fix missing import PIPE

github.com/ocrmypdf/OCRmyPDF - 0dae1602c705da4e125eb619d01d223c9fc9ed0e authored over 7 years ago by James R. Barlow <[email protected]>
Stop git from corrupting .pngs

Grrr.

github.com/ocrmypdf/OCRmyPDF - d926f07ac1a7df3c4cdb653af9e04d4c29a55bb5 authored over 7 years ago by James R. Barlow <[email protected]>
Update develop with master changes

We’re well out of the “trivial updates” zone

github.com/ocrmypdf/OCRmyPDF - 96045e98f493c334f0f64009b803bb2e18bda9ff authored over 7 years ago by James R. Barlow <[email protected]>
Ensure skipped pages are explained in sidecars

github.com/ocrmypdf/OCRmyPDF - 01b7205e2c0940577b795dbb0405c30af74723a6 authored over 7 years ago by James R. Barlow <[email protected]>
Fix test suite breakage after sidecar feature added

Forgot to update tesseract spoofers to account for change in tesseract
parameters. Also the cha...

github.com/ocrmypdf/OCRmyPDF - c8a4cbcf17389a0a10e261c6d9e66b8dc3aeeb48 authored over 7 years ago by James R. Barlow <[email protected]>
Add changes to __main__.py that should have been in last commit

github.com/ocrmypdf/OCRmyPDF - 16b6442b23cc7a45a416dc9a8b094e45f49f8a6f authored over 7 years ago by James R. Barlow <[email protected]>
Implement sidecar text files (#126)

github.com/ocrmypdf/OCRmyPDF - 183eafa587b1952fbded889918cf75fa9b20ff1c authored over 7 years ago by James R. Barlow <[email protected]>
Reorganize --help text

github.com/ocrmypdf/OCRmyPDF - 47a2997538780a52bda4d3cae8ef7d228e20df8a authored over 7 years ago by James R. Barlow <[email protected]>
Implement --user-words, --user-patterns

github.com/ocrmypdf/OCRmyPDF - 37ebcadfa16ffaacd1c1120d1b246e056efbf34d authored over 7 years ago by James R. Barlow <[email protected]>
Update documentation for Ghostscript behavior

github.com/ocrmypdf/OCRmyPDF - 74d98216f150aa58a194b8a62dcb1f9fb5c015f5 authored over 7 years ago by James R. Barlow <[email protected]>
Tell Travis CI to use multiple cores

Let’s see if this helps the build go faster

github.com/ocrmypdf/OCRmyPDF - 4bdebf573e0f8cdf68697701fbf00c203f5a09b8 authored over 7 years ago by James R. Barlow <[email protected]>
Add --quiet (fixes #143), stop using ruffus to partially generate argparser

github.com/ocrmypdf/OCRmyPDF - 1606b6a383a82c2015477305bdf10d9cb4bd6d47 authored over 7 years ago by James R. Barlow <[email protected]>
Merge commit 'c4f01de231d22da5cea02c25aa581a965a37640b'

github.com/ocrmypdf/OCRmyPDF - 2a61902df59111a5ea4502dfbae701ddd3b178a8 authored over 7 years ago by James R. Barlow <[email protected]>
Implement --pdfa-image-compression to control Ghostscript’s compression

Fixes #163

github.com/ocrmypdf/OCRmyPDF - 01a1c2b57642ab8e687e3b187251f18ba5302e41 authored over 7 years ago by James R. Barlow <[email protected]>
Fix typo "cutput" -> "output" (#164)

[ci skip]

github.com/ocrmypdf/OCRmyPDF - c4f01de231d22da5cea02c25aa581a965a37640b authored over 7 years ago by Ingo Feinerer <[email protected]>
Revert "v4.5.7 release notes"

The change introduced regressions, so find another way to fix.

This reverts commit d077c0368698...

github.com/ocrmypdf/OCRmyPDF - 63a4a761dd47e9b8c86074545c6846377f1a74d6 authored over 7 years ago by James R. Barlow <[email protected]>
v4.5.7 release notes

github.com/ocrmypdf/OCRmyPDF - d077c03686981c1601305cac2eb7b97e7f823a34 authored over 7 years ago by James R. Barlow <[email protected]>
Update high DPI test case to confirm the output image is not downsampled

github.com/ocrmypdf/OCRmyPDF - c97ea1f2a98f81b266dbb4a78014c1be63b4c9b3 authored over 7 years ago by James R. Barlow <[email protected]>
Update documentation to warn that transparency is not tested

github.com/ocrmypdf/OCRmyPDF - fd27df2abb57ea7c7c0f0bdcbc5fb8d28803e1f7 authored over 7 years ago by James R. Barlow <[email protected]>
Fix corrupt test file “typewriter.png”

This file is not currently used in any tests, but could be, so replace
corrupt version with a us...

github.com/ocrmypdf/OCRmyPDF - bf04f03c4cb5c7e44e466d985ffc2bca532563ce authored over 7 years ago by James R. Barlow <[email protected]>
Fix issue #163, color and grayscale images JPEG compressed when not needed

github.com/ocrmypdf/OCRmyPDF - 93e802f473184deec68a39f91986c5a836da5d59 authored over 7 years ago by James R. Barlow <[email protected]>
Try Travis again with null deploy for OSX

github.com/ocrmypdf/OCRmyPDF - 1464b9087ac253ea46081ed3939de2d2f346960c authored over 7 years ago by James R. Barlow <[email protected]>
Add travis null_deploy for osx

github.com/ocrmypdf/OCRmyPDF - e8cc8fc87989ff230e70f7c6718d87f369677091 authored over 7 years ago by James R. Barlow <[email protected]>
v4.5.6 release notes

github.com/ocrmypdf/OCRmyPDF - fae2119b1ef16e1a10bbb083def90f1e13780709 authored over 7 years ago by James R. Barlow <[email protected]>
Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents

github.com/ocrmypdf/OCRmyPDF - aa859a4139271e4f44c46894b6fcb42f5b61ddc1 authored over 7 years ago by James R. Barlow <[email protected]>
Ensure that ocrmypdf stops and reports an error if Ghostscript fails

Past behavior was to continue and let ruffus puke eventually

github.com/ocrmypdf/OCRmyPDF - b9b12e28798f6fe0614859887cf297f99fa3fc18 authored over 7 years ago by James R. Barlow <[email protected]>
Fix argparse.ArgumentError needs two positional args

github.com/ocrmypdf/OCRmyPDF - cf643c9f431cf5d68df8b275c6210072cd9c53b2 authored over 7 years ago by James R. Barlow <[email protected]>
Switch to Travis triggering Docker build to skip race condition with PyPI

github.com/ocrmypdf/OCRmyPDF - 5b1a7880a94cd85588fae55d3aa347a4b220680e authored over 7 years ago by James R. Barlow <[email protected]>
v4.5.5 release notes

github.com/ocrmypdf/OCRmyPDF - 474b6b050031f31548a0be3623098af8b983fdfc authored over 7 years ago by James R. Barlow <[email protected]>
Fix #154: KeyError ‘/Contents’ on blank pages with /Contents record

github.com/ocrmypdf/OCRmyPDF - 6c8c1d8173047a3ae9241d4c8b30ce39c442383c authored over 7 years ago by James R. Barlow <[email protected]>
Squash merge improvements to auto-homebrewing macOS version

github.com/ocrmypdf/OCRmyPDF - 6a91fa637f7bedd48a0b368a0f91c0e9356c07e5 authored over 7 years ago by James R. Barlow <[email protected]>
Remove misplaced flags from re.sub() call (#153)

The 4th argument of re.sub() is maximum number of substitutions,
not flags.

Moreover, re.MUL...

github.com/ocrmypdf/OCRmyPDF - 2846fb4e310c331c502367f13011396335c803f6 authored over 7 years ago by Jakub Wilk <[email protected]>
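
The pitfall described in the commit above can be reproduced in a few lines. This is a minimal sketch, not the code from the commit: the sample string and the choice of re.IGNORECASE are assumptions made for illustration. Because the fourth parameter of re.sub() is count, a flag passed in that position is read as a substitution limit equal to the flag's integer value.

```python
import re

text = "a-a-a-A"  # illustrative input, not taken from OCRmyPDF

# Mistaken call shape: re.sub(pattern, repl, text, re.IGNORECASE)
# The flag lands in the `count` slot. re.IGNORECASE has the integer
# value 2, so the call behaves like count=2 with no flags at all.
as_count = re.sub("a", "x", text, count=int(re.IGNORECASE))

# Intended call: pass the flag by keyword so it is treated as a flag.
as_flags = re.sub("a", "x", text, flags=re.IGNORECASE)

print(as_count)  # x-x-a-A
print(as_flags)  # x-x-x-x
```

Passing flags by keyword avoids the ambiguity entirely.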
osx_brew: show output before letting “brew audit” check it

github.com/ocrmypdf/OCRmyPDF - a1033cdc64f48a907cc136e8ec44bc267eefde7e authored over 7 years ago by James R. Barlow <[email protected]>
Move release notes into the rest of documentation

github.com/ocrmypdf/OCRmyPDF - 204336e1a5fb6f49447070e11cc57ceeb2453b6a authored over 7 years ago by James R. Barlow <[email protected]>
v4.5.4 Update release notes

github.com/ocrmypdf/OCRmyPDF - 8954e6c3b9e609f78021943f84235c2c3b39fcea authored over 7 years ago by James R. Barlow <[email protected]>
Fix #151, cannot write mode P as JPEG

all(<empty generator>) is True.

github.com/ocrmypdf/OCRmyPDF - fee22b6b0b8efc3451b9cc4e50e4d70b1fed35b1 authored over 7 years ago by James R. Barlow <[email protected]>
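
The remark "all(<empty generator>) is True" in the commit above refers to a general Python behavior that is easy to overlook. The sketch below is hypothetical and does not reproduce OCRmyPDF's code: the list of image modes and the variable names are assumptions chosen to show how a check over an empty collection passes vacuously, and one way to guard against it.

```python
# Hypothetical: color modes of the images found on a page (none here).
image_modes = []

# all() over an empty iterable is True, so this check passes even
# though there is nothing to confirm.
all_rgb = all(mode == "RGB" for mode in image_modes)

# One possible guard: require at least one item before trusting all().
all_rgb_guarded = bool(image_modes) and all(
    mode == "RGB" for mode in image_modes
)

print(all_rgb)          # True
print(all_rgb_guarded)  # False
```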
Update documentation

github.com/ocrmypdf/OCRmyPDF - 2b82c31b85f422dba502fe54198306514065106d authored over 7 years ago by James R. Barlow <[email protected]>
autobrew: remove homebrew dependency “zlib”, causes audit failure

github.com/ocrmypdf/OCRmyPDF - 9a4813089c3c80b75a019d0a68f848c5e9fd9f23 authored over 7 years ago by James R. Barlow <[email protected]>
Add test case for #152

github.com/ocrmypdf/OCRmyPDF - 554fcc8b9dd228ebc791f2d2749fb4a17770c030 authored over 7 years ago by James R. Barlow <[email protected]>
Fix --skip-big when there are no images in pdf (#152)

* fixed skip-big when there are no images in pdf

* added only_text pdf

* updated only_text...

github.com/ocrmypdf/OCRmyPDF - 345256ee99c5ac6b4de552437c4960782cf96d47 authored over 7 years ago by Tom <[email protected]>
v4.5.3 release notes update

github.com/ocrmypdf/OCRmyPDF - 58d1042147ec7a430caa530d36e841e0e5966612 authored almost 8 years ago by James R. Barlow <[email protected]>
Enable lossless reconstruction for --pdf-renderer tess4 where appropriate

github.com/ocrmypdf/OCRmyPDF - 7b7e3a3e03d4052982c69b69caed32fb16315244 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix issues with --pdf-renderer tess4 page skipping

If tess4 renderer needed to skip OCR on a page it would end up
duplicating the page contents ont...

github.com/ocrmypdf/OCRmyPDF - 1e7fbd4202753eefad9eab7d9818f04fba9a0999 authored almost 8 years ago by James R. Barlow <[email protected]>
Refresh requirements

github.com/ocrmypdf/OCRmyPDF - 6e907856f233cece47dfecb60a3559fda0116462 authored almost 8 years ago by James R. Barlow <[email protected]>
Begin adding new option to redo ocr

github.com/ocrmypdf/OCRmyPDF - 8bc601917252d6f7af431f36ab958c01f882b3da authored almost 8 years ago by James R. Barlow <[email protected]>
Phase out subprocess.Popen

github.com/ocrmypdf/OCRmyPDF - 059f79242e174cb5d4b994855708a09ce1b82298 authored almost 8 years ago by James R. Barlow <[email protected]>
Drop Python 3.4 compatibility

github.com/ocrmypdf/OCRmyPDF - 89599b4812b71b5985049d2f5e580efc6930dafe authored almost 8 years ago by James R. Barlow <[email protected]>
Remove backward compatible API deprecations from v4.x

github.com/ocrmypdf/OCRmyPDF - a9f4047a978d7fcc8064ca824948c1939fcc5cd4 authored almost 8 years ago by James R. Barlow <[email protected]>
Deprecate old files

github.com/ocrmypdf/OCRmyPDF - 23227ae763e20120b5921b6fed4eb040d7b081cb authored almost 8 years ago by James R. Barlow <[email protected]>
v4.5.3 release notes

github.com/ocrmypdf/OCRmyPDF - 4a9e9e9db2117688e3371e2cd92339da40228dda authored almost 8 years ago by James R. Barlow <[email protected]>
Reject high Unicode metadata at command line

Ghostscript 9.21 does not seem to accept Unicode above U+FFFF. Previous
versions did, but it now...

github.com/ocrmypdf/OCRmyPDF - 88ef2718f13f3e15667dc0e2751fcb73a619764c authored almost 8 years ago by James R. Barlow <[email protected]>
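
A check for characters above U+FFFF, as the commit above describes, could look roughly like the following. This is a sketch under stated assumptions: the helper name has_non_bmp and the sample strings are hypothetical, and the actual validation in OCRmyPDF may differ.

```python
def has_non_bmp(text: str) -> bool:
    """Return True if any character lies above U+FFFF.

    Hypothetical helper for illustration; not OCRmyPDF's API.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# Characters inside the Basic Multilingual Plane pass the check.
print(has_non_bmp("Zürich survey"))    # False

# U+1F4C4 lies above U+FFFF, so this value would be rejected.
print(has_non_bmp("Scan \U0001F4C4"))  # True
```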
Workaround for GS VMerror -25 bug

Avoid inserting docinfo keys that would be translated to null strings,
to avoid running afoul of...

github.com/ocrmypdf/OCRmyPDF - e71e8ca3ada46860a69ff251093cc6566d3ecd09 authored almost 8 years ago by James R. Barlow <[email protected]>
Don’t use filename “pdfa_def.ps” for GS file

At recommendation of Artifex people, don’t use the filename pdfa_def.ps
because if given without...

github.com/ocrmypdf/OCRmyPDF - 45e9257d6edd2b1be2d8d4b8fc54e376ab3b8998 authored almost 8 years ago by James R. Barlow <[email protected]>
Some examples of Ghostscript and Tesseract warnings/errors were not tagged properly

github.com/ocrmypdf/OCRmyPDF - 2954e72652dd7d6240d0246263b57060dbde33f3 authored almost 8 years ago by James R. Barlow <[email protected]>
Ghostscript 9.21 seems to have a regression related to Unicode metadata

github.com/ocrmypdf/OCRmyPDF - 199de96cff214afe3b972027c49f1f03f32b8c29 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix issue #147: unpaper loses DPI information, affects --pdf-renderer tess4

github.com/ocrmypdf/OCRmyPDF - 8ddbe8151311243a6060728a90319a960ebef30b authored almost 8 years ago by James R. Barlow <[email protected]>
Make --pdf-renderer tess4 more informative, less FUD

github.com/ocrmypdf/OCRmyPDF - a3e26e049819a43d77ebb4b4b81d0ebfe3e8379e authored almost 8 years ago by James R. Barlow <[email protected]>
docs: Don't recommend system-site-packages anymore (Ubuntu 16.04)

Not needed since reportlab 3.4 comes with a wheel, and that was the main difficulty.

github.com/ocrmypdf/OCRmyPDF - 4ad129d8d8a39a6c50022100e8e3b91652c641b2 authored almost 8 years ago by James R. Barlow <[email protected]>
autobrew: missing deps

github.com/ocrmypdf/OCRmyPDF - dfb9fa0736159a7dac5fd32b160f28c2d2a60848 authored almost 8 years ago by James R. Barlow <[email protected]>
Update documentation with macOS homebrew tap

[ci skip]

github.com/ocrmypdf/OCRmyPDF - eb036898e9e3710700607b9f5fcf87c70588f87e authored almost 8 years ago by James R. Barlow <[email protected]>
Fix brew audit failure

github.com/ocrmypdf/OCRmyPDF - 7c6aa76a2a64444d48a28d69a49f2763d7e588c7 authored almost 8 years ago by James R. Barlow <[email protected]>
Fixed issue #142 — closed streams raise an exception on fork attempt

github.com/ocrmypdf/OCRmyPDF - f035cb10886a26394f10c11efe1a4b9a9bcdc123 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix UnboundLocalError in autobrew.py

github.com/ocrmypdf/OCRmyPDF - 35162166c5de5bf5430975e5ee38b6274efd48c7 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix autobrew build issues - missing deps

github.com/ocrmypdf/OCRmyPDF - 107f6abcb19e96ec5c10918ee9038e1294285971 authored almost 8 years ago by James R. Barlow <[email protected]>
Further autobrew tweaks

github.com/ocrmypdf/OCRmyPDF - 760a939e7d809f3bc357d65c0f430da81946b464 authored almost 8 years ago by James R. Barlow <[email protected]>
MacOS skip the one test that needs poppler, to save installing poppler

github.com/ocrmypdf/OCRmyPDF - 72660d0deca8cd69fb3ebcc85df2722be98c47ce authored almost 8 years ago by James R. Barlow <[email protected]>
before_deploy doesn’t run unless something is going to be deployed

github.com/ocrmypdf/OCRmyPDF - 8444a8f211d57797e528a76d8966800be1cfd0fa authored almost 8 years ago by James R. Barlow <[email protected]>
Improvements to macOS test and work on homebrew tap autobrew

Squashed commits:
[3f06c1e] Try setting up homebrew tap autobuilding
[01532f1] Strict mode error...

github.com/ocrmypdf/OCRmyPDF - 4a1fec8328d204aa4de6c1b4b4a4be0867bdfcf1 authored almost 8 years ago by James R. Barlow <[email protected]>
Revert "Finalize Dockerfile move; unfortunately not supported by Docker Hub"

Unfortunately because of this issue
https://github.com/docker/hub-feedback/issues/292

Docker Hu...

github.com/ocrmypdf/OCRmyPDF - 42547f601756393e5538225e28fc1c8b5ea59585 authored almost 8 years ago by James R. Barlow <[email protected]>
Revert "Move Dockerfiles out of root"

This reverts commit 3d3b3abc1bccf05eefb656309543312c19e3fb47.

github.com/ocrmypdf/OCRmyPDF - 0ccf564f03395b58304ddca62f197b3ba09c4f91 authored almost 8 years ago by James R. Barlow <[email protected]>
Finalize Dockerfile move; unfortunately not supported by Docker Hub

Unfortunately because of this issue
https://github.com/docker/hub-feedback/issues/292

Docker Hu...

github.com/ocrmypdf/OCRmyPDF - 65c9a07ddedb021b317383cfc1d8dc156869c92b authored almost 8 years ago by James R. Barlow <[email protected]>
Move pipeline.svg out of root

github.com/ocrmypdf/OCRmyPDF - 4700a193225b00da909b3abad672d3923adb6e6c authored almost 8 years ago by James R. Barlow <[email protected]>