Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective: https://opencollective.com/ocrmypdf - Host: opensource - Code: https://github.com/jbarlow83/OCRmyPDF

Add support for inline images

github.com/ocrmypdf/OCRmyPDF - 5cc3adb39a552f2ef028ac43b9a6d3386596f0e7 authored almost 9 years ago by James R. Barlow
Compute image pixel density without performing rectangle intersection (+5 squashed commits)

Squashed commits:
[0e27904] Partially implement DPI calculation with rotation of the image

Fixes...

github.com/ocrmypdf/OCRmyPDF - 3957a0606c48ebe7ca760c3255f81c7afb2b63bb authored almost 9 years ago by James R. Barlow
Add comments and remove debugging, improve inline handling

Squashed commits:
[bfff3c9] pageinfo, have a main()

github.com/ocrmypdf/OCRmyPDF - 570bbe9a0532c9eadafdc7e23c62583147293f1b authored almost 9 years ago by James R. Barlow
v4.0.3 release notes

github.com/ocrmypdf/OCRmyPDF - 11a561dbce979e8a8120a741b48bf95d566131c7 authored almost 9 years ago by James R. Barlow
Log information about detected page orientations in a summary line

github.com/ocrmypdf/OCRmyPDF - dad2198394fd670864b4eb06c7d95fed297c6ad1 authored almost 9 years ago by James R. Barlow
Always dump stack trace for unexpected errors

github.com/ocrmypdf/OCRmyPDF - e40fdc502d28772d3815bd9c221fe7b0d9af146e authored almost 9 years ago by James R. Barlow
Fix "too few characters" reported as error by tesseract -psm 0

github.com/ocrmypdf/OCRmyPDF - d446fe592243749add568b4b0d8a1db937cadbd1 authored almost 9 years ago by James R. Barlow
Docker: fix blank JPEG2000 PDF issue

github.com/ocrmypdf/OCRmyPDF - 4ca90c106d6ee52e9589a11298f6ab91830dd4ba authored almost 9 years ago by James R. Barlow
Fix test cases that break in Docker, improve test for running in Docker

github.com/ocrmypdf/OCRmyPDF - 7c5e58a497cb7f3bc2baf46207b0f93187d17b26 authored almost 9 years ago by James R. Barlow
Add other missing files

github.com/ocrmypdf/OCRmyPDF - 323b9a5f8efda8e4283081958d9dbe96f193a2e6 authored almost 9 years ago by James R. Barlow
Add JPEG 2000 test case

github.com/ocrmypdf/OCRmyPDF - cab381a339e4e1e86337c56db1bee252c7844d81 authored almost 9 years ago by James R. Barlow
Merge commit '6f3ac46b1c176d48782347cfa14d9ef6ce773f37' into develop

github.com/ocrmypdf/OCRmyPDF - fe4d4c39cdf69ec7baa73cc0b535806b9788f856 authored almost 9 years ago by James R. Barlow
Docker: supply openjpeg to address JPXDecode errors

github.com/ocrmypdf/OCRmyPDF - ad188d7ae1f931cf69e50ebff3a23c05f9536fbf authored almost 9 years ago by James R. Barlow
Gracefully recover from tesseract's failure to process very large images

And test cases to check this

github.com/ocrmypdf/OCRmyPDF - 8246cc0538f7bf2d436977e6c071bf6dc0204fc6 authored almost 9 years ago by James R. Barlow
Gracefully recover from tesseract's failure to process very large images

And test cases to check this

github.com/ocrmypdf/OCRmyPDF - 6f3ac46b1c176d48782347cfa14d9ef6ce773f37 authored almost 9 years ago by James R. Barlow
4.0.2rc1 - release notes, add missing file caught by Travis

github.com/ocrmypdf/OCRmyPDF - ac71c3be638e990dabe8f61797bf5a0d2f0d74ed authored almost 9 years ago by James R. Barlow
Fix error on --tesseract-timeout timing out

github.com/ocrmypdf/OCRmyPDF - ecc0ac9b197832ec5e8690beb011f834c8d9fe12 authored almost 9 years ago by James R. Barlow
leptonica: serialization tweaks, memory handling

github.com/ocrmypdf/OCRmyPDF - ea4e6bf67da1e01527d66eeabe3229aa0a5750b3 authored almost 9 years ago by James R. Barlow
Fix leptonica pickling

github.com/ocrmypdf/OCRmyPDF - 46c204f533d1433664b4844c0307a05d6a00b0b3 authored almost 9 years ago by James R. Barlow
Adjust page orientation parsing to deal with change in Tess 3.04.01

github.com/ocrmypdf/OCRmyPDF - 71fbda8bf68fffa6c1d6c4cb7bb6f86f20a1ab9c authored almost 9 years ago by James R. Barlow
Leptonica: documentation, helper functions

github.com/ocrmypdf/OCRmyPDF - 9b79b4a7c8ee5cb6dfe981848b12f98c781e2326 authored almost 9 years ago by James R. Barlow
leptonica: remove special PNM handling

We no longer use PNM as an intermediate format, so there's no need to
handle leptonica's PNM qui...

github.com/ocrmypdf/OCRmyPDF - c04cc853d7808a688e35295e030d1a0d6d649284 authored almost 9 years ago by James R. Barlow
leptonica: nit

github.com/ocrmypdf/OCRmyPDF - dd41e70ccc764f0c6b52e28e03fa741d58cd6a02 authored almost 9 years ago by James R. Barlow
tests: also check that monochrome correlation correctly detects matches

github.com/ocrmypdf/OCRmyPDF - 4206e74f42daa48b1d6c0a4af6cba592a570dfbf authored almost 9 years ago by James R. Barlow
Don't do chmod unless necessary (breaks py.test on Docker)

github.com/ocrmypdf/OCRmyPDF - 68c3ce56a9e5387816481786bacf44252fb102ce authored almost 9 years ago by James R. Barlow
Improve error checking for tesseract -psm 0 (orientation) errors

github.com/ocrmypdf/OCRmyPDF - ab0e5fa4256095e8d027aca65664b550d2b4ac0e authored almost 9 years ago by James R. Barlow
Improve ability to capture error messages from tesseract on a crash

github.com/ocrmypdf/OCRmyPDF - f3b0434a87a9fde844b10d93934d580f2526f090 authored almost 9 years ago by James R. Barlow
Just use the PyPI version of ocrmypdf in dockerfile

Apparently setuptools_scm_git_archive is ineffective on hub.docker.com
automatic build, it still...

github.com/ocrmypdf/OCRmyPDF - aa394440dbdb9e55fc64f32dc0ba07333c69a595 authored almost 9 years ago by James R. Barlow
Fix KeyError on unexpected tess output

github.com/ocrmypdf/OCRmyPDF - 3b98a1a04b19f84114bb1587e97116dcfb8b6d95 authored almost 9 years ago by James R. Barlow
Forgot to save release notes

github.com/ocrmypdf/OCRmyPDF - fcb89b0c58cb9c27e2702ffdbe8651fcd2dcb6a6 authored almost 9 years ago by James R. Barlow
v4.0: release notes

github.com/ocrmypdf/OCRmyPDF - ac65d6a03a7751d16043165bb37152629bbc79f5 authored almost 9 years ago by James R. Barlow
Merge branch 'release/v4.0.0'

github.com/ocrmypdf/OCRmyPDF - 2103f6090673a0814fe6bf2dadc3cc9370e7c2da authored almost 9 years ago by James R. Barlow
Save Dockerfile comment

github.com/ocrmypdf/OCRmyPDF - e3c3d848c109afed6eddc1622747c4d55a401df8 authored almost 9 years ago by James R. Barlow
Suppress --pdf-renderer tesseract warning in Docker image

Since the corrected font is provided in the Docker image, there's no
reason to show the warning.

github.com/ocrmypdf/OCRmyPDF - d4ef3411e03a3b6f4349433d9b4da0e19002c230 authored almost 9 years ago by James R. Barlow
Restore Dockerfile on local and probably on automated build as well

github.com/ocrmypdf/OCRmyPDF - 71d616e4139af81745ac7e1c5af3d12b839bf0ea authored almost 9 years ago by James R. Barlow
Overwrite Tesseract 3.04 default pdf font with better pdf font

github.com/ocrmypdf/OCRmyPDF - fe651d1bf5cfd42f06a44791c576302fe1968ae0 authored almost 9 years ago by James R. Barlow
Provide sharp2.ttf for Docker images

github.com/ocrmypdf/OCRmyPDF - 582ba8cfad5df51b3a3e32db8604355bb2bd2b0b authored almost 9 years ago by James R. Barlow
Remove duplicate line from documentation

github.com/ocrmypdf/OCRmyPDF - d23291650a2d90c192d8e835bc0276b6b2fbf6b5 authored almost 9 years ago by James R. Barlow
Remove redundant line from resources

github.com/ocrmypdf/OCRmyPDF - 812fd745b63ff8adbebb4c33c85729d6a59ec4b9 authored almost 9 years ago by James R. Barlow
Remove old documentation about Pillow not linking jpeg, zlib

As of Pillow 3.0.0 this is fixed, so make Pillow 3 a requirement

github.com/ocrmypdf/OCRmyPDF - a87aa71d85443dc07404d0d7fdc6e298e6c95e9b authored almost 9 years ago by James R. Barlow
Fix JPEG DPI: Pillow expects dpi=(x,y)

github.com/ocrmypdf/OCRmyPDF - 60b2eb14553eda314c85148b8f7f9ba2d60e4deb authored almost 9 years ago by James R. Barlow
Work around Leptonica < 1.72 bug that breaks Travis

github.com/ocrmypdf/OCRmyPDF - ab3c1988c16951a0b7526ea0123f985690be8fa2 authored almost 9 years ago by James R. Barlow
Travis again: are invalid correlation measurements a use-after-free?

Try explicitly casting the value to a float.

github.com/ocrmypdf/OCRmyPDF - ee5223eea808da89bc7e6eae424e499440bf6ea2 authored almost 9 years ago by James R. Barlow
Fix pytest-runner not understanding 'norecursedirs'

As discussed here
https://github.com/pytest-dev/pytest-runner/issues/7
and sort of
https://githu...

github.com/ocrmypdf/OCRmyPDF - edd2185268cdc1fca15710674ad3d10fbf92ee35 authored almost 9 years ago by James R. Barlow
Travis: try replacing non-standard invocation of py.test

It seems the normal thing to wire up python setup.py test to invoke
the test suite rather than p...

github.com/ocrmypdf/OCRmyPDF - 35b1ca2be2d63b09c34946b6f209c9c92fcadfee authored almost 9 years ago by James R. Barlow
Fix case of JPEG missing DPI field

github.com/ocrmypdf/OCRmyPDF - 71e493a810833586458eca3e18be5ac597f5f563 authored almost 9 years ago by James R. Barlow
Travis: force compile leptonica?

github.com/ocrmypdf/OCRmyPDF - 6178e22e7feff895402bd9218da826ed6eebbab5 authored almost 9 years ago by James R. Barlow
Make debug output more verbose on failure

github.com/ocrmypdf/OCRmyPDF - ef0aab060a9a8c58b0a3569b8d983d1206517b92 authored almost 9 years ago by James R. Barlow
Travis: maybe it's just the missing __init__.py?

github.com/ocrmypdf/OCRmyPDF - d70ce61cfd660c8c952b739aee6f1ad031db1c06 authored almost 9 years ago by James R. Barlow
Revert "Try moving leptonica build script, playing with wheels a bit"

This reverts commit ec2c6c312bc7e64c25b26563e9093d89ea1b9032.

github.com/ocrmypdf/OCRmyPDF - 8cd84afac8109fa367b6d22aeec14f16a81b5977 authored almost 9 years ago by James R. Barlow
Try moving leptonica build script, playing with wheels a bit

github.com/ocrmypdf/OCRmyPDF - ec2c6c312bc7e64c25b26563e9093d89ea1b9032 authored almost 9 years ago by James R. Barlow
Too soon, try again

github.com/ocrmypdf/OCRmyPDF - 3946bba3184f3a59ece01ecbd5c8421afce012ff authored almost 9 years ago by James R. Barlow
Travis: are you creating _leptonica.py?

github.com/ocrmypdf/OCRmyPDF - 2ed0b78a7b42db7ce1fd3c6a4176448b387f3cd4 authored almost 9 years ago by James R. Barlow
Does Travis need explicit install libffi-dev?

github.com/ocrmypdf/OCRmyPDF - ed346d032cdf388ebde4d3e03aff615ed15e8eb7 authored almost 9 years ago by James R. Barlow
Fix travis syntax error

github.com/ocrmypdf/OCRmyPDF - acd645f19299ffd839a1a9b76338adecf2bd261c authored almost 9 years ago by James R. Barlow
Fiddle with travis, try to get better debug output

Essentially cffi failed somehow, not clear how

github.com/ocrmypdf/OCRmyPDF - 88433e4c340b28897676e5352a7951bfaebd1072 authored almost 9 years ago by James R. Barlow
Update test resources to address files with unknown source

-Remove Test_Issue_28.pdf (inherited from fritz-hh, source unknown)
-Replace missing_docinfo.pdf...

github.com/ocrmypdf/OCRmyPDF - 1224af17809e730bf5a2f353342510dcc9bb963f authored almost 9 years ago by James R. Barlow
Revise rotation tests in prep for adding a few more

github.com/ocrmypdf/OCRmyPDF - ab13342931ffd1ab64c45bf1644b137051395cec authored almost 9 years ago by James R. Barlow
Test case: remove filename conflict

github.com/ocrmypdf/OCRmyPDF - d7913da4848e74acfaf9baef4d5a95e4c1732d1c authored almost 9 years ago by James R. Barlow
Complain about older tesseracts that don't have sharp2.ttf installed

github.com/ocrmypdf/OCRmyPDF - c50e3f1329fe15e75785d3648578bc738ce615e6 authored almost 9 years ago by James R. Barlow
Update release notes

github.com/ocrmypdf/OCRmyPDF - a62f86dbd74e81774ca559f05b3a883310602b44 authored almost 9 years ago by James R. Barlow
Update the notes

github.com/ocrmypdf/OCRmyPDF - 33b88b18db411ebfa96e7ff25007db6397f93e23 authored almost 9 years ago by James R. Barlow
Fix image layer rotation for pages with nonzero crop boxes

github.com/ocrmypdf/OCRmyPDF - 7c691c21ab1faf8a06bcaf80151091d6ea33e4fe authored almost 9 years ago by James R. Barlow
Partial fix for images not anchored to (0, 0)

github.com/ocrmypdf/OCRmyPDF - 4ec51729d86cca0a740e9b40ac73a98fa74e6636 authored almost 9 years ago by James R. Barlow
Cleaner access to mediabox

github.com/ocrmypdf/OCRmyPDF - 07b41e479aabf955669095d63bfe9aaaa6a96096 authored almost 9 years ago by James R. Barlow
DPI information not transferred automatically from PNG to JPEG

github.com/ocrmypdf/OCRmyPDF - 6510bcad1995f2eb6defe9cce9cc1b56372bad7e authored almost 9 years ago by James R. Barlow
Better skewed image

github.com/ocrmypdf/OCRmyPDF - 265d2ce39bfa2132f5691e6080b8fab5c52f5987 authored almost 9 years ago by James R. Barlow
Better logging output for autorotation

github.com/ocrmypdf/OCRmyPDF - 1928a64cae5eedc303e864f6f58e5704334411c4 authored almost 9 years ago by James R. Barlow
leptonica: suppress debug output

github.com/ocrmypdf/OCRmyPDF - 11e575a5a320965f1668e83d64ec3289f877573c authored almost 9 years ago by James R. Barlow
tesseract: unify logging function

github.com/ocrmypdf/OCRmyPDF - 7fbc0d6460835e88c8f1293096fb96e7450d86e2 authored almost 9 years ago by James R. Barlow
unpaper is lousy at deskewing, so let leptonica do it

github.com/ocrmypdf/OCRmyPDF - 1ba8b1aa4bdf54d492897cfa9166936f73aeb995 authored almost 9 years ago by James R. Barlow
Also include cardinal.pdf

github.com/ocrmypdf/OCRmyPDF - 3569c76c0f3bf25b7c7a321aa9fc968df12285ca authored almost 9 years ago by James R. Barlow
Fix test_deskew for new Leptonica API

github.com/ocrmypdf/OCRmyPDF - 16c7ac25828ace2b96c326c18fb979ecb08459a5 authored almost 9 years ago by James R. Barlow
Leptonica: classes are better

github.com/ocrmypdf/OCRmyPDF - 4ceb59215f8eee9e8cb462a6944985a7398aeb1d authored almost 9 years ago by James R. Barlow
Introduce Leptonica class for Pix

github.com/ocrmypdf/OCRmyPDF - 2e6879ee515722930ad03eca65fbd0fde67e8ee3 authored almost 9 years ago by James R. Barlow
Add rotate 180 correlation sanity check

github.com/ocrmypdf/OCRmyPDF - 66fc2e9d7d0212e15148154da384633500f31068 authored almost 9 years ago by James R. Barlow
Shorten names of _make_input/output

github.com/ocrmypdf/OCRmyPDF - 2c7a6e574f6eeb6e61dbf196bba5fed94c9899a0 authored almost 9 years ago by James R. Barlow
Check autorotate using leptonica correlation

github.com/ocrmypdf/OCRmyPDF - 78c3bf5dba498212ea47afd82349e51839c03d8d authored almost 9 years ago by James R. Barlow
Cache wasn't enabled properly for test_autorotate

github.com/ocrmypdf/OCRmyPDF - 98c115e3bbfe8269a5b6da654eef1e2bacee61a3 authored almost 9 years ago by James R. Barlow
Merge branch 'feature/leptdeskew' into feature/logging

Need leptonica for testing now, I think
# Conflicts:
# ocrmypdf/tesseract.py
# requirements.txt
...

github.com/ocrmypdf/OCRmyPDF - 2752bda80bd7d22e7a290ba1e960582d8f0930e2 authored almost 9 years ago by James R. Barlow
Take a stab at writing test case for autorotate

github.com/ocrmypdf/OCRmyPDF - 7c0940609a1526ef4d6d6357ca343b4aa742bb59 authored almost 9 years ago by James R. Barlow
Fix test suite by running select_image_for_pdf unconditionally

The purpose of this change that caused the problem was a minor
optimization for the tesseract re...

github.com/ocrmypdf/OCRmyPDF - d30a879e2dd9d108f196b38ce5e556dde21c8c34 authored almost 9 years ago by James R. Barlow
Update tesseract spoofing to cache orientation and script detection checks

No cache: 269 s
With cache: 144 s

test_oversample[tesseract] now fails, all others good

github.com/ocrmypdf/OCRmyPDF - b907234d5ccd9f642bed38dde3fbfef1a4007577 authored almost 9 years ago by James R. Barlow
More logging improvements

github.com/ocrmypdf/OCRmyPDF - b0114c917499d319287bc678aa94cb3e33e07297 authored almost 9 years ago by James R. Barlow
Restore invisibletext for normal output

github.com/ocrmypdf/OCRmyPDF - d2ba8c501f5e19db5790386909d54ede60c58cef authored almost 9 years ago by James R. Barlow
Make logging output a lot more useful

github.com/ocrmypdf/OCRmyPDF - 6a7ed7d359042ada1898b86c4b3ff8ff18a0b873 authored almost 9 years ago by James R. Barlow
Better: custom logging factory to avoid whatever ruffus is doing

github.com/ocrmypdf/OCRmyPDF - 6289afa1a60c66074f0a91c77734f66e8f462235 authored almost 9 years ago by James R. Barlow
Return logging to a semblance of normalcy

github.com/ocrmypdf/OCRmyPDF - 9bb6fa04cb0e2cc64fcd15d291fa97b81ad9bb9a authored almost 9 years ago by James R. Barlow
Render preview as .jpg instead of .png

Smaller file size of JPEG seems to help performance, although the
difference is only about 1%.

github.com/ocrmypdf/OCRmyPDF - afb6f6f5c9b9c550c003b6173f64c4061a7e4b11 authored almost 9 years ago by James R. Barlow
Suppress debug message

github.com/ocrmypdf/OCRmyPDF - 8a69671dbd8de8a09bd23585e6fd6b4b112be3ae authored almost 9 years ago by James R. Barlow
Make rotation optional (for now it's off, possibly should be on)

github.com/ocrmypdf/OCRmyPDF - 178aee4687243cdfaf24f75b13f3edbc7bfdc34b authored almost 9 years ago by James R. Barlow
Tweak pipeline, allowing --pdf-renderer to use JPEGs instead of PNGs

github.com/ocrmypdf/OCRmyPDF - 8484caddfb7d47284d44cc6a1213579348bc78ce authored almost 9 years ago by James R. Barlow
Cleanup auto-rotation

github.com/ocrmypdf/OCRmyPDF - 08313316de2934aca538d64e771a48b70a9844cb authored almost 9 years ago by James R. Barlow
All four rotation directions working

github.com/ocrmypdf/OCRmyPDF - 1d0eca5c6366b5efc1a70ff8304579c73f505128 authored almost 9 years ago by James R. Barlow
Fix autorotate for some lossless cases

github.com/ocrmypdf/OCRmyPDF - fe89232a301a3ac3999bbbb593808df88ea39ba5 authored almost 9 years ago by James R. Barlow
Implement autorotate (provided lossless reconstruction is disabled)

Works for a single page file, probably

Although arguably rotation is not quite lossless, and th...

github.com/ocrmypdf/OCRmyPDF - 4b51b521e24f8b42df6c188e5caea1b8c44fe147 authored almost 9 years ago by James R. Barlow
tesseract: add command to access OSD values

github.com/ocrmypdf/OCRmyPDF - e9ec458304d1bba36a065a523d4d64d987044ff6 authored almost 9 years ago by James R. Barlow
ghostscript: don't try to "help" autorotation

It uses text direction alone -- unreliable guide.

github.com/ocrmypdf/OCRmyPDF - 54b0ddd7878b67f33bf558830336552eb7940ee0 authored almost 9 years ago by James R. Barlow
README: mention polyglot, fix container vs image

github.com/ocrmypdf/OCRmyPDF - 93bec22f9c369fc7e0f8724238e50df0649c4857 authored almost 9 years ago by jbarlow83
Fix img2pdf usage in test case (to make Travis CI happy again)

github.com/ocrmypdf/OCRmyPDF - 0dc96442d8ed0309d5baffd5fcd4215b36f1010e authored almost 9 years ago by James R. Barlow