Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Move Dockerfiles out of root

github.com/ocrmypdf/OCRmyPDF - 3d3b3abc1bccf05eefb656309543312c19e3fb47 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix issue #137 - proportions of non-square resolution distorted

Distortion mainly affected --force-ocr

github.com/ocrmypdf/OCRmyPDF - 7cd2770a13653ec647279f44775c80c67ab8ab41 authored almost 8 years ago by James R. Barlow <[email protected]>
v4.5 notes

github.com/ocrmypdf/OCRmyPDF - 7b94129d9eb8b427bc9fd8870839a33a74b29c08 authored almost 8 years ago by James R. Barlow <[email protected]>
Create test case for Form XObjects

github.com/ocrmypdf/OCRmyPDF - d1a0065ef878712ac4d79a6da758eec051c721b9 authored almost 8 years ago by James R. Barlow <[email protected]>
Warn more strongly about --pdf-renderer tesseract until fix is widely propagated

github.com/ocrmypdf/OCRmyPDF - 5a817370fdf3431fb9b05b1369081067307941de authored almost 8 years ago by James R. Barlow <[email protected]>
Update dockerfile.tess4 yet again

Installing Tess4 PPA over Tess3 proved too much pain, so sever the link
between this and the jba...

github.com/ocrmypdf/OCRmyPDF - ab0a210763118abe4c146faf1721e9a0d91c0ad7 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix running_in_docker() check failing on newer Docker

This test has to work to ensure spoof/tesseract_cache.py has a writable
directory to put cache i...

github.com/ocrmypdf/OCRmyPDF - 9f800736bcc130592b2a0f72f4d43f6269af3238 authored almost 8 years ago by James R. Barlow <[email protected]>
Improve batch processing examples

github.com/ocrmypdf/OCRmyPDF - c9a83afad69c287db7183daf7de859c397586900 authored almost 8 years ago by James R. Barlow <[email protected]>
pageinfo: learn to extract image information from Form XObjects

github.com/ocrmypdf/OCRmyPDF - 5e14274f10a8df284bcb591dc4b58697d1d9cd61 authored almost 8 years ago by James R. Barlow <[email protected]>
Re-fix Dockerfile.tess4

[ci skip]

github.com/ocrmypdf/OCRmyPDF - 167470b4bdd0051f94ad3eade387c933b1a898a5 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix tesseract 3.04 on tesseract 4 on image

[skip ci]

github.com/ocrmypdf/OCRmyPDF - f06d3c2ec2f19e4ca76c6604af06eb3769fbff8e authored almost 8 years ago by James R. Barlow <[email protected]>
v4.4.2 release notes

github.com/ocrmypdf/OCRmyPDF - 74c99a8a774d790f51afc3b3042ab7105eebfdf9 authored almost 8 years ago by James R. Barlow <[email protected]>
Adjust Travis deploy to PyPI settings

-only on master branch
-only Python 3.6 build uploads, so the others don’t compete
-don’t upload...

github.com/ocrmypdf/OCRmyPDF - 0e4d312ee200df5476a8b10a1536d1a739a7215a authored almost 8 years ago by James R. Barlow <[email protected]>
Rewrite Dockerfiles to use ubuntu 16.10 base system

Debian now has a few disadvantages:
-there is no convenient PPA for Debian tesseract 4.0, but th...

github.com/ocrmypdf/OCRmyPDF - 589f19559d7a9716c96d76f38fed7b9514b2db0f authored almost 8 years ago by James R. Barlow <[email protected]>
Configure travis to handle deployment to PyPI; also lint .travis.yml

github.com/ocrmypdf/OCRmyPDF - f28bc25dc0e5be8c402f5a96bc2b110724d78454 authored almost 8 years ago by James R. Barlow <[email protected]>
Prevent use of --pdf-renderer tess4 on tesseract 3

github.com/ocrmypdf/OCRmyPDF - a0657ad937839df77d297b2a7b82392c1a1bc9ff authored almost 8 years ago by James R. Barlow <[email protected]>
Suggest use of aliases to hide docker run

github.com/ocrmypdf/OCRmyPDF - 5b8d88af4ce7f1d68e81774963edd0bcae8715c4 authored almost 8 years ago by James R. Barlow <[email protected]>
Adding missing file Dockerfile.tess4

github.com/ocrmypdf/OCRmyPDF - fa82b5034075a918eb0d77167d888b8069c5b8b2 authored almost 8 years ago by James R. Barlow <[email protected]>
Support ocrmypdf-tess4

github.com/ocrmypdf/OCRmyPDF - 005216bc5712d589db61f752519f207a14ff8b14 authored almost 8 years ago by James R. Barlow <[email protected]>
v4.4.1 release notes

github.com/ocrmypdf/OCRmyPDF - e748fdcf6f0a944b289d7bb4fb018c026ee5de8f authored almost 8 years ago by James R. Barlow <[email protected]>
Add documentation and test cases for --tesseract-config

This parameter has existed for a long time but never really got any
attention.

github.com/ocrmypdf/OCRmyPDF - 8c17c9918e2987b9dafab51a26b1a0967b02d5a0 authored almost 8 years ago by James R. Barlow <[email protected]>
More documentation updates

github.com/ocrmypdf/OCRmyPDF - ea0dd99d0b1770e07a724e02b390ba6aabc6b092 authored almost 8 years ago by James R. Barlow <[email protected]>
docs: suggest --oem 1

github.com/ocrmypdf/OCRmyPDF - e0cc67afaedc02e30ee765a53481da7e83eb1e51 authored almost 8 years ago by James R. Barlow <[email protected]>
Describe how to use tesseract 4.0 while 3.04 is installed

github.com/ocrmypdf/OCRmyPDF - 04f9cbe3642a745146608e6db7a51b03de591bf4 authored almost 8 years ago by James R. Barlow <[email protected]>
tesseract jobs_limit(2)

At least on macOS with my quadcore performance improves with two
tesseracts in parallel (20% gai...

github.com/ocrmypdf/OCRmyPDF - 99afebd0339284313e6a6ca9238fdf9e44f04ac9 authored almost 8 years ago by James R. Barlow <[email protected]>
travis: fix ‘pip install’ by moving working code out of the way

github.com/ocrmypdf/OCRmyPDF - a6feacc8101ee875c5b042c2a64c1749f9210877 authored almost 8 years ago by James R. Barlow <[email protected]>
cffi: verbose=True

github.com/ocrmypdf/OCRmyPDF - 65e4b1672fdb265b7c9bf2cc0c610eff565647d5 authored almost 8 years ago by James R. Barlow <[email protected]>
Revert "Do we need to exclude ocrmypdf.lib?"

This reverts commit 678b9fb603e2ce1bc12a34e14a715dcce5fc4a9c.

github.com/ocrmypdf/OCRmyPDF - 46cc0dd19098e764a3129bd0f09f5865b1839316 authored almost 8 years ago by James R. Barlow <[email protected]>
Do we need to exclude ocrmypdf.lib?

github.com/ocrmypdf/OCRmyPDF - 678b9fb603e2ce1bc12a34e14a715dcce5fc4a9c authored almost 8 years ago by James R. Barlow <[email protected]>
setup.py: cffi is definitely needed in setup_requires

github.com/ocrmypdf/OCRmyPDF - 49ab0c1f0b7da2d5860ba9f5fa56266886b7fb07 authored almost 8 years ago by James R. Barlow <[email protected]>
Experiment: update *requirements.txt, use more current travis build steps

Perhaps this works around the pip/setup.py asymmetry that broke the
4.4 release.

github.com/ocrmypdf/OCRmyPDF - ab490a77367aafa2fe88c67c51db31e19cac23ad authored almost 8 years ago by James R. Barlow <[email protected]>
setup.py: for some reason, subpackages must be explicitly specified

github.com/ocrmypdf/OCRmyPDF - e4ce1dae3549f46309f62a66a8542cab8d73ee22 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix readthedocs build error

github.com/ocrmypdf/OCRmyPDF - 179b812acbaceb1dfc91844f6640356872eca219 authored almost 8 years ago by James R. Barlow <[email protected]>
Note about pytest-helpers-namespace

github.com/ocrmypdf/OCRmyPDF - 7f170517ec0aa2174effb151ebb828aa455b3424 authored almost 8 years ago by jbarlow83 <[email protected]>
Additional docs updates for v4.4

github.com/ocrmypdf/OCRmyPDF - 5480da4f0478be0f90775c4f2994de9adfa94c1e authored almost 8 years ago by James R. Barlow <[email protected]>
Ensure specified destination is writable before starting pipeline process

github.com/ocrmypdf/OCRmyPDF - 9a15a4db10555756880b8e4d5f1de04424130185 authored almost 8 years ago by James R. Barlow <[email protected]>
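
The early check described in the commit above could look roughly like the sketch below. It is an illustration only, not OCRmyPDF's actual code: the helper name `ensure_writable` and the choice of `PermissionError` are assumptions.

```python
import os


def ensure_writable(output_file):
    """Fail early if output_file cannot be written (hypothetical helper)."""
    if os.path.exists(output_file):
        # An existing file must itself be writable.
        if not os.access(output_file, os.W_OK):
            raise PermissionError(f"{output_file}: not writable")
    else:
        # A new file needs a parent directory we can create entries in.
        parent = os.path.dirname(os.path.abspath(output_file))
        if not os.access(parent, os.W_OK | os.X_OK):
            raise PermissionError(f"{parent}: cannot create files here")
```

Running a check like this before any processing starts means a bad destination is reported immediately rather than after a long OCR run.
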
Autorotation check: Replace duplicated tests with parameterized test

github.com/ocrmypdf/OCRmyPDF - 55aeaec293150c936fac5c41743408487086df81 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix test suite regression: output files dumped in tests/resources

github.com/ocrmypdf/OCRmyPDF - f6df1fb40cb8eee177615d481c2d4e9fa735c190 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix remaining 3.4/3.5 regressions

github.com/ocrmypdf/OCRmyPDF - b889a89c36dc4d6576441a28fca35d7691bfb9a2 authored almost 8 years ago by James R. Barlow <[email protected]>
Fix issue #121 “pop from empty list” (content stream parsing error)

github.com/ocrmypdf/OCRmyPDF - 1976dc6f30ba126332482b325bab230066f8e82b authored almost 8 years ago by James R. Barlow <[email protected]>
(Hopefully) Fix Path <-> py.path conversion on Py3.4/3.5

github.com/ocrmypdf/OCRmyPDF - e864c65d2657a40a1731d5c6f3d370f617b8a824 authored almost 8 years ago by James R. Barlow <[email protected]>
Refactor test suite to use fixtures to manage paths

github.com/ocrmypdf/OCRmyPDF - 02fba02d312011b977a479d4bec47936166aa055 authored almost 8 years ago by James R. Barlow <[email protected]>
Move duplicate test code into common namespace

github.com/ocrmypdf/OCRmyPDF - fb9e7c82f681236d81824f9e806d27d569107e3a authored almost 8 years ago by James R. Barlow <[email protected]>
Add renderers page (missed from previous)

github.com/ocrmypdf/OCRmyPDF - 77d31bf646c6ee3e50b6e487cc18f2cac93fcbcd authored almost 8 years ago by James R. Barlow <[email protected]>
Move pytest.ini into setup.cfg

github.com/ocrmypdf/OCRmyPDF - 29ca799bcf7310ce67014c6c7085381c08393352 authored almost 8 years ago by James R. Barlow <[email protected]>
Update docs for eventual v4.4 release

github.com/ocrmypdf/OCRmyPDF - 467b7f016337e014bdb45cacc7c8ee186209aede authored almost 8 years ago by James R. Barlow <[email protected]>
Rename ‘tesstop’ to ‘tess4’

There’s no reason text-only PDF shouldn’t become the default for
tesseract 4.

github.com/ocrmypdf/OCRmyPDF - bad67c6dc51603f12f5f359cbb4804a97e326053 authored almost 8 years ago by James R. Barlow <[email protected]>
Implement “tesstop” (tesseract v4 text-only pages - working name)

github.com/ocrmypdf/OCRmyPDF - ac40426971e436050863274ad3d6402d12ecac77 authored almost 8 years ago by James R. Barlow <[email protected]>
pipeline: rename some of the stages, for clarity

github.com/ocrmypdf/OCRmyPDF - 7acfaf6d3476b43683d3cf63adcb414749fcc384 authored almost 8 years ago by James R. Barlow <[email protected]>
tesseract: add support for using v4 textonly_pdf feature

github.com/ocrmypdf/OCRmyPDF - 99e47c9c046b23b352e7f82a9cbc06ba6505c59c authored almost 8 years ago by James R. Barlow <[email protected]>
Travis now has Python 3.6, test against it

github.com/ocrmypdf/OCRmyPDF - d7904e2251a8deb61dadca9308b059045086a81e authored almost 8 years ago by James R. Barlow <[email protected]>
Merge branch 'master' (4.3.5, Python 3.6 support) into develop

# Conflicts:
# dev_requirements.txt
# requirements.txt

github.com/ocrmypdf/OCRmyPDF - 68aef489de211826beae514ec5502b1ac9fce21c authored almost 8 years ago by James R. Barlow <[email protected]>
Document idea for producing companion text files

github.com/ocrmypdf/OCRmyPDF - 3f9adcd5e0a47bd3b20c39ee23a9b56420ed3090 authored almost 8 years ago by James R. Barlow <[email protected]>
Output to stdout: ensure stdout is flushed to prevent truncation errors

github.com/ocrmypdf/OCRmyPDF - 6cc5135d2d7100684b4d3741b7be0927c7698159 authored almost 8 years ago by James R. Barlow <[email protected]>
Forward --oem argument to tesseract 4

github.com/ocrmypdf/OCRmyPDF - d4c72b371f4e0c764aeef77e9734272cf34581f4 authored almost 8 years ago by James R. Barlow <[email protected]>
Resolve issue #124 - poor performance with Tesseract v4

It seems that Tesseract v4 on a platform with OpenMP working correctly
will perform poorly with...

github.com/ocrmypdf/OCRmyPDF - 18b6f056572e8c1138e642080be4bd7b348263a5 authored almost 8 years ago by James R. Barlow <[email protected]>
tesseract: for v4, use --psm while keeping -psm for v3

At the moment v4 accepts both but who knows if this will get dropped,
so do as documented for each...

github.com/ocrmypdf/OCRmyPDF - c42d9baa26242eb56e5d01648105494df24abc23 authored almost 8 years ago by James R. Barlow <[email protected]>
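
The version-dependent flag choice described in the commit above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function name `psm_args` and the tuple form of the version argument are hypothetical.

```python
def psm_args(tesseract_version, page_seg_mode):
    """Build page segmentation arguments for a tesseract command line.

    Hypothetical helper: tesseract_version is a tuple such as (3, 4) or (4, 0).
    """
    # Per the commit message: use --psm for v4 and keep -psm for v3.
    flag = "--psm" if tesseract_version >= (4,) else "-psm"
    return [flag, str(page_seg_mode)]
```

For example, `psm_args((4, 0), 3)` yields `["--psm", "3"]`, while `psm_args((3, 4), 3)` yields `["-psm", "3"]`.
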
Finalize ‘exec’ migration and make it backward compatible for now

github.com/ocrmypdf/OCRmyPDF - 6e27ecd2b98dff7569a6fccfa016cc1b33c156dc authored almost 8 years ago by James R. Barlow <[email protected]>
Add installation instructions for Ubuntu 16.04

github.com/ocrmypdf/OCRmyPDF - 482692396e78caf38bae923cd322498a49f32ac7 authored almost 8 years ago by James R. Barlow <[email protected]>
v4.3.5: Python 3.6 compatibility

github.com/ocrmypdf/OCRmyPDF - c48acf165ae557076de86bc3d9e8e5a5dd309f2a authored about 8 years ago by James R. Barlow <[email protected]>
Another attempt at py 3.4/3.5

Revert to exactly what the previous passing build specified.

github.com/ocrmypdf/OCRmyPDF - 9e004c3ec022ae94570a9447b8892ad52b63e014 authored about 8 years ago by James R. Barlow <[email protected]>
fix setuptools-scm for py 3.4, 3.5

github.com/ocrmypdf/OCRmyPDF - 7be4e9c9198630cf75aecf1233c453dd6e07671b authored about 8 years ago by James R. Barlow <[email protected]>
Update requirements files and documentation for Python 3.6 - no code changes

github.com/ocrmypdf/OCRmyPDF - 5ec38a4bed1d263356e3590160321abd07469457 authored about 8 years ago by James R. Barlow <[email protected]>
pdfa: documentation, remove from __future__

github.com/ocrmypdf/OCRmyPDF - f246779b8e489a14b23972eb800b97ed4ad5d166 authored about 8 years ago by James R. Barlow <[email protected]>
Don’t copy pageinfo - job manager already provides a copy of real pdfinfo

github.com/ocrmypdf/OCRmyPDF - a7d8cdf061a99da7f9358b2176e3cd2d8ce9001c authored about 8 years ago by James R. Barlow <[email protected]>
pipeline: don’t use qpdf to check page count again

We already know the number of pages at this stage.

github.com/ocrmypdf/OCRmyPDF - 620745c8121c6bd5914c4dccd2f920ccd6ffa4d4 authored about 8 years ago by James R. Barlow <[email protected]>
Rename exe -> exec, more Unix-y and suggestive

github.com/ocrmypdf/OCRmyPDF - b8767e5ba99c9dbb7137f65535319c960c942c32 authored about 8 years ago by James R. Barlow <[email protected]>
Replace most sys.exit() with raising exceptions

Because ruffus doesn’t handle exceptions well I tended to call sys.exit
to make sure we got out ...

github.com/ocrmypdf/OCRmyPDF - d33a50660d92d43cc83d13c8ea6c420dd99c9728 authored about 8 years ago by James R. Barlow <[email protected]>
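
The general pattern named in the commit above, raising exceptions deep in the code and converting them to an exit status only at the top level, might be sketched like this. All class names, function names and exit code values here are hypothetical and are not taken from OCRmyPDF.

```python
import sys


class ExitCodeException(Exception):
    """Hypothetical base class carrying the exit code the CLI should return."""
    exit_code = 1


class MissingDependencyError(ExitCodeException):
    exit_code = 3  # illustrative value only


def run_pipeline(have_tesseract):
    # Deep in the pipeline: raise instead of calling sys.exit().
    if not have_tesseract:
        raise MissingDependencyError("tesseract not found")
    return "ok"


def main(have_tesseract=True):
    # Only the outermost layer translates exceptions into an exit code.
    try:
        run_pipeline(have_tesseract)
    except ExitCodeException as e:
        print(e, file=sys.stderr)
        return e.exit_code
    return 0
```

A script entry point would then call `sys.exit(main())`, so the inner functions can be exercised in tests without terminating the interpreter.
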
Move external program wrappers to ocrmypdf.exe package

github.com/ocrmypdf/OCRmyPDF - 4ee9658e976d533e25d31622295f5557bbae1f90 authored about 8 years ago by James R. Barlow <[email protected]>
More refactoring - helpers.py

github.com/ocrmypdf/OCRmyPDF - dd1b84e7bae6fb6904278e22f6b27167245479df authored about 8 years ago by James R. Barlow <[email protected]>
Extract pipeline out of __main__.py and into pipeline.py

This leaves __main__.py to handle command line arguments while pipeline.py
runs the pipeline - m...

github.com/ocrmypdf/OCRmyPDF - 4c677e6c47cdff85d48b7336c00d94b1fddeb227 authored about 8 years ago by James R. Barlow <[email protected]>
Merge branch 'master' into feature/ooruffus

github.com/ocrmypdf/OCRmyPDF - f0f889440be7df6da3db5f917e4fc29adebe9ca2 authored about 8 years ago by James R. Barlow <[email protected]>
v4.3.4: release notes

github.com/ocrmypdf/OCRmyPDF - cc9ceaeb742e4cc449c3a82a1dcb3636eb9dfe69 authored about 8 years ago by James R. Barlow <[email protected]>
Fix MANIFEST for .png

github.com/ocrmypdf/OCRmyPDF - ad2fa8d1d7dd4dd04b7353181a0d83076c90c4bd authored about 8 years ago by James R. Barlow <[email protected]>
Help py.test collect output in more cases

github.com/ocrmypdf/OCRmyPDF - adc1580742a320ed4cf70026cfa7dd5a19ce8fb9 authored about 8 years ago by James R. Barlow <[email protected]>
ghostscript: cleanup harmless error message printed for overprint

Redirect stderr->stdout to hopefully make GS output easier to work with
overall, since the previ...

github.com/ocrmypdf/OCRmyPDF - 4d3b44d6dfe9c8639755914171ca17657c7e2eac authored about 8 years ago by James R. Barlow <[email protected]>
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”

And add new test case for this.

github.com/ocrmypdf/OCRmyPDF - e57aa0eee2b89f9fd08a6f54af8f3697d111b4fa authored about 8 years ago by James R. Barlow <[email protected]>
Make setup.py license internally consistent

github.com/ocrmypdf/OCRmyPDF - 1ae1d116c7f1bd98040fe12002dd26a0d3260149 authored about 8 years ago by James R. Barlow <[email protected]>
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”

And add new test case for this.

github.com/ocrmypdf/OCRmyPDF - 097a69d07f1276237de7dded4dcd7b6d29fd29b1 authored about 8 years ago by James R. Barlow <[email protected]>
Remove non-reentrant options checking and logging setup

github.com/ocrmypdf/OCRmyPDF - a81ce87a50f219a3ec86b06b0c6df8e14b977b8f authored about 8 years ago by James R. Barlow <[email protected]>
Make setup.py license internally consistent

github.com/ocrmypdf/OCRmyPDF - 88be0d43a078c25d87d3828a9b820127eb03159a authored about 8 years ago by James R. Barlow <[email protected]>
Remove test for Pillow JPEG and PNG

As of 3.1.1, our minimum version, these codecs are now required by
default for a successful inst...

github.com/ocrmypdf/OCRmyPDF - ff16a00a3d18daaeeda4f73838c447b74f4bebf4 authored about 8 years ago by James R. Barlow <[email protected]>
Update requirements

-update requirements.txt and dev_requirements.txt to more recent version
-setup.py updated to Ub...

github.com/ocrmypdf/OCRmyPDF - 8982b3e1e21ce4dbe9021a62c27ede7d6a9b9d13 authored about 8 years ago by James R. Barlow <[email protected]>
Merge branch 'master' into feature/ooruffus

github.com/ocrmypdf/OCRmyPDF - be0fa35d14b0430c9a594a11d4abba3def114ef8 authored about 8 years ago by James R. Barlow <[email protected]>
Finalize v4.3.3 release notes

github.com/ocrmypdf/OCRmyPDF - 9f51ed9d0114e18618f7a1d75e3b096b8c2c1260 authored about 8 years ago by James R. Barlow <[email protected]>
Add test cases for Ghostscript PDF/A warnings

github.com/ocrmypdf/OCRmyPDF - 731e6792c7b9141442e64b38b3ff002caceea960 authored about 8 years ago by James R. Barlow <[email protected]>
ghostscript: more effort at error logging

github.com/ocrmypdf/OCRmyPDF - c35ec0b4aaec4c4c6c84bf0e38e54155912f4b21 authored about 8 years ago by James R. Barlow <[email protected]>
v4.3.3 release notes, fix more gs 9.20 issues

github.com/ocrmypdf/OCRmyPDF - 03aaf575dc8c94877f528f59f2b2bc7fa26fbd64 authored about 8 years ago by James R. Barlow <[email protected]>
Move work_folder into multiprocessing manager

github.com/ocrmypdf/OCRmyPDF - 9a060579bac3ede8c259b05680ece08fdade6c99 authored about 8 years ago by James R. Barlow <[email protected]>
Remove all remaining traces of ‘options’ global state from task runners

github.com/ocrmypdf/OCRmyPDF - d40a5c4f7a5edd25cc986a60cbce44c7f7e495ec authored about 8 years ago by James R. Barlow <[email protected]>
Distribute ‘options’ to worker processes via the multiprocessing manager

github.com/ocrmypdf/OCRmyPDF - 21f7dc337774ed453333d4bd39d68db9052db68b authored about 8 years ago by James R. Barlow <[email protected]>
Replace pdfinfo, pdfinfo_lock with multiprocessing manager

Using a context manager to guard the pdfinfo list makes the lock
unnecessary. (Although it was p...

github.com/ocrmypdf/OCRmyPDF - 43c13a1ed90399c06ca7d526de5a94e032772441 authored about 8 years ago by James R. Barlow <[email protected]>
Remove “WrappedLogger” - does not do anything useful

Never really investigated the reason why ruffus returns a mutex to go
along with its logger. It ...

github.com/ocrmypdf/OCRmyPDF - 6bc3f189e166ae98680a5fc7a114bbfe479755da authored about 8 years ago by James R. Barlow <[email protected]>
Remove temporary re_symlink logging shim

github.com/ocrmypdf/OCRmyPDF - 2c5437135cc46dd9c325957eb6f9d8f764fc58cc authored about 8 years ago by James R. Barlow <[email protected]>
Fix mistake made in converting pipeline; incredibly, all tests pass now

github.com/ocrmypdf/OCRmyPDF - 444da025230923322e8edd2fe4a075caaf244059 authored about 8 years ago by James R. Barlow <[email protected]>
Reactivate the pipeline; surprisingly works in quick test

github.com/ocrmypdf/OCRmyPDF - 00e8af2381f9c3641a13bc1a42854f56cccf7e2e authored about 8 years ago by James R. Barlow <[email protected]>
Convert to object oriented ruffus syntax (does not run)

I experimented with the idea of using asyncio-based processing but
realized that that does not s...

github.com/ocrmypdf/OCRmyPDF - 401b21864f6faa82fc4a5cc594ac42bdaff5c92f authored about 8 years ago by James R. Barlow <[email protected]>
Record version in debug log

github.com/ocrmypdf/OCRmyPDF - de939951d49397d9bf2ce3b4cc4eeacc524c20a4 authored about 8 years ago by James R. Barlow <[email protected]>
Fix exception on inline stencil masks with no /CS attribute

github.com/ocrmypdf/OCRmyPDF - 7725d16a26eb7a25540110e80636ca1a87a725f2 authored about 8 years ago by James R. Barlow <[email protected]>
Add security suggestions

github.com/ocrmypdf/OCRmyPDF - 8a74408d83eccaf1c3937f58ff99dfd9226bd0e9 authored about 8 years ago by James R. Barlow <[email protected]>