Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective: https://opencollective.com/ocrmypdf
Host: opensource
Code: https://github.com/jbarlow83/OCRmyPDF
I'm not fully happy with this arrangement, as it effectively downloads
OCRmyPDF twice, not to me...
github.com/ocrmypdf/OCRmyPDF - 2d15c09cca214fa81dbc219b89c3c1414702a1f7 authored almost 9 years ago by James R. Barlow <[email protected]>
setuptools_scm barfs because it can't find the version, because Docker hub
retrieves the applica...
github.com/ocrmypdf/OCRmyPDF - 6fe32bbaf7089af06b756c04068a9b34b577893a authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 4abb20390dc7f881f25e2aad9d337c296123d67f authored almost 9 years ago by James R. Barlow <[email protected]>
All tests pass when forced to rely on img2pdf, so seems okay
github.com/ocrmypdf/OCRmyPDF - daa3916430558c1923b2201bfc9cd6336043cf4c authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e9b87cefccffc0afaf74f9531615a7617aee0e4a authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 60593b5ad300ca1791aca7f1b8be541ded714b9f authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f708b11ea49162ac6465d1a5db2baf173a521a7a authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7982f58b2ea053cd7b55795821a6d890454b8c56 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e805c1908a8cbd6c0d01f2ca22c03814e3fc465c authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - cb3ba8e97391f7b20363dafea3c5dbfd0ccb645b authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 344fc40cbcb6a0349cd853c864d905d5b1592ce9 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7e5c37137b44fc4ded8f32dbe55a554b3bc10357 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 1aae11714b243ad47dd796af06c06f9f2daa2ca6 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d82f14a7aabee34ec9a863163725cac71099f379 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 4b65e0b09384621df94c2cee0676953593386574 authored almost 9 years ago by James R. Barlow <[email protected]>
Now checks input image to ensure the implied page size of its .hocr file
matches the rest of the...
github.com/ocrmypdf/OCRmyPDF - 8674c9fb2083dc93fe2473b03cbd24dffb45df9a authored almost 9 years ago by James R. Barlow <[email protected]>
Fix the notes
github.com/ocrmypdf/OCRmyPDF - ccfbb54e8c26784e438ba2fcac2179f21e7d857b authored almost 9 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9893ebf889c066537796ca33e3da6410d200ae4e authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 303eb3e93aa876019ea6dd33e3b345725acaa41f authored almost 9 years ago by James R. Barlow <[email protected]>
fix shebang in hocrtransform.py
github.com/ocrmypdf/OCRmyPDF - ca546d70e5bff9e9b115371f7813f3c326822bd8 authored almost 9 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6a5ea2d64ae825b339d31bffafb3f4b77421cbb7 authored almost 9 years ago by Sean Whitton <[email protected]>
github.com/ocrmypdf/OCRmyPDF - ec3d92ad8e71971f606a45d476324447970ef4c4 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 66a095d7de3d440379b69d428bd3d0d39c701891 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 411981efbcba27175dbc928f5ce528e959782ce0 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 350ad5210e075f2b9496931c26c2fdd495db8514 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f3b588764ee0779a45be4ab653f4dcb6e444e15f authored almost 9 years ago by James R. Barlow <[email protected]>
unpaper doesn't seem to be good at deskewing. It fails on test case
with a lot of italics. I thi...
github.com/ocrmypdf/OCRmyPDF - bacbcba58a0e33d68f941163c0d2678c71c15978 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 52e8aa434fd1029e0f6d3b19716935a11cb226e9 authored almost 9 years ago by James R. Barlow <[email protected]>
Small price to pay.
github.com/ocrmypdf/OCRmyPDF - 37c508f3f884e7203d6053ea0d319cf2e5eaf8b6 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 26e36422cc66e6ebff7dfacce116ac5deb661c5c authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f82cb002bcfa4f55452575096e93eaeaef85f2aa authored almost 9 years ago by James R. Barlow <[email protected]>
Used to include a copy of the parent dir's name.
github.com/ocrmypdf/OCRmyPDF - c1eb047a4b76aeeca1bb4983c99f8967e65119e5 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 626ca18f5c06188fb3f54310be299f3bc5f5981f authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9058dedfbefa5ca35904cbc1b11c021ca20c6474 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - a0952bfca39f8a1550baae54ba4e0e18f9e42290 authored almost 9 years ago by James R. Barlow <[email protected]>
Broke Travis
github.com/ocrmypdf/OCRmyPDF - 354e61946e0ad7ec090189c609ebdb99824e1973 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - fd6d1d748a419086849d2a54d8c6a2cb35895502 authored almost 9 years ago by James R. Barlow <[email protected]>
Add -f to force generation of the background image at the desired
oversample resolution. Our ne...
github.com/ocrmypdf/OCRmyPDF - fc0479f1100a3baea8726b477546d5bdf9c798c3 authored almost 9 years ago by James R. Barlow <[email protected]>
5 failed, 28 passed
failures:
test_oversample[hocr], test_skip_ocr, test_skip_big, test_maximum...
github.com/ocrmypdf/OCRmyPDF - dc0fb25e64d8387e454988ba492a1693a8ea6a12 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f3e04cce56cd153d3f888d25269e4b7471619910 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7067110308bbcbca4cbb26249cb9313b2d52fb37 authored almost 9 years ago by James R. Barlow <[email protected]>
Works, does not account for changes to clean/deskew, etc.
Surprisingly, it works. PyPDF2 fixes s...
github.com/ocrmypdf/OCRmyPDF - 2fa8366632db2db54ec38212ba040da89ac31422 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - c368c51badd7352a0b01a7cc9107e88ee5a6feda authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7c558b37133a9cb18f0966165a8b361036c1286c authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 8d323ae5102569640d3f55fca228f4154037b22c authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 3b53e9adac014aa46ab7a2f378632feef5271755 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 074c1d71b49a49f025efed1f76c353000bab29a9 authored almost 9 years ago by James R. Barlow <[email protected]>
Was splitting each argument to --tesseract-config into a list of single
character strings
Ruffus treats omitted parameter as -j1. For our purposes it makes more
sense for omitting the pa...
# Conflicts:
# RELEASE_NOTES.rst
github.com/ocrmypdf/OCRmyPDF - 12bc58b5b63a83a4a4988070ac4114471603bd14 authored almost 9 years ago by James R. Barlow <[email protected]>
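The `--tesseract-config` note above describes a common Python pitfall: a string is itself iterable, so extending a list with it adds one character at a time. The following is a minimal sketch of that class of bug and its fix. The function names are hypothetical and this is not the actual OCRmyPDF code, which is not shown in this listing.

```python
def collect_configs_buggy(values):
    """Illustrates the pitfall: each str is split into single characters."""
    configs = []
    for value in values:
        # list.extend iterates its argument; iterating a str yields characters
        configs.extend(value)
    return configs


def collect_configs_fixed(values):
    """Keeps each argument intact as one list element."""
    configs = []
    for value in values:
        # list.append adds the whole string as a single item
        configs.append(value)
    return configs
```

With an input such as `["hocr"]`, the first version produces four one-character strings, while the second keeps the argument whole.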
github.com/ocrmypdf/OCRmyPDF - 6af0815681ea9cd22532e73ad9feec24cf9e7df5 authored almost 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 66c2b9b78e22db3d81fbc70693b899992bfb4242 authored almost 9 years ago by James R. Barlow <[email protected]>
Make it a special image
github.com/ocrmypdf/OCRmyPDF - d03c056cb11f8ec9cc4b01c9f77c220ff5b7a3fe authored about 9 years ago by James R. Barlow <[email protected]>
Fortunately unpaper now exists as binary package, eliminating the need
to install all of the bui...
Also update ignore files
github.com/ocrmypdf/OCRmyPDF - a64c7dbe99946ea7c6e1abc4ef1acfb901beb8f4 authored about 9 years ago by James R. Barlow <[email protected]>
Because we don't really use ruffus checkpoint feature, putting the
database in a permanent locat...
github.com/ocrmypdf/OCRmyPDF - 424b4b33b15b923d4d55929a8205f0570a6ce022 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e510f89792d3fbe4def6832f314ec6263a2d09c1 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 49cd6cc619fc7ce3a1be28a898af06be3cc2eb69 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9aa3d340d46cd1fb1f57a937848bef01c8771399 authored about 9 years ago by James R. Barlow <[email protected]>
This reduces total execution time to 164s on my machine, down from
about double that.
github.com/ocrmypdf/OCRmyPDF - 9ec4aa039dbf512ea91ca4f7dac0d48ca2b122ee authored about 9 years ago by James R. Barlow <[email protected]>
Where getting OCR doesn't matter
github.com/ocrmypdf/OCRmyPDF - ecebe2f24b5ca5e5a9cb2fa1ce2ebc8ee58858f7 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7313a77c2a0f8d0e20b61a50b55a13fd18420498 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 45113676a3c635c25f1b431d84a9fdd6b1dc6ff0 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 102bd07019ee494a02436bfdbfaa9360470d717f authored about 9 years ago by James R. Barlow <[email protected]>
And get rid of the messy binary replacement spoofing
github.com/ocrmypdf/OCRmyPDF - 9622e31da9e783e0b7b936ff12b93ae452487f92 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 1731ce2a44b6c62674e8a29182542cf9f4494093 authored about 9 years ago by James R. Barlow <[email protected]>
qpdf won so hard it wasn't funny, even though it must be called once
per page to do the job. Per...
github.com/ocrmypdf/OCRmyPDF - 133357779aa2e947a98ab72bd6da79bcb21d7980 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5d8167b232eb2fb64d01cb61b75af77f16d1908c authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e76ae8c46c44400adb81b9e889372d72cfe54d0a authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 53a7c0e66892acff8afeee41fd4c846738ca160f authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 4ca243e4900dcee8e815c76516ce2306e337ce93 authored about 9 years ago by James R. Barlow <[email protected]>
Don't exit when qpdf repairs the file successfully but displays warning
github.com/ocrmypdf/OCRmyPDF - 9f374461559460527e47237323e511123f31b6b0 authored about 9 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d7c7559b05d49a461c5a41a21fa7082ea827c77e authored about 9 years ago by Shem Pasamba <[email protected]>
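The qpdf note above ("Don't exit when qpdf repairs the file successfully but displays warning") can be illustrated by how a caller interprets qpdf's exit status. The sketch below assumes qpdf's documented exit codes (0 for success, 2 for errors, 3 for success with warnings). The constant and function names are hypothetical, and this is not the code from the commit itself.

```python
# Exit codes as documented by qpdf (assumed here, not taken from this listing)
QPDF_EXIT_OK = 0
QPDF_EXIT_ERROR = 2
QPDF_EXIT_WARNING = 3


def qpdf_output_usable(returncode: int) -> bool:
    """Treat a clean exit and a warnings-only exit as usable output."""
    return returncode in (QPDF_EXIT_OK, QPDF_EXIT_WARNING)
```

A caller would pass in the `returncode` of the finished qpdf process and abort only when this returns `False`.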
github.com/ocrmypdf/OCRmyPDF - b2b66d134482f559922c072f2bddd7e0948207cc authored about 9 years ago by Shem Pasamba <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5d111a3c04d1c1fb6d0f1e8cf5a7475b66252539 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 10416f847f20968af2e542de6512da2fa3baab5f authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 79b3472b26d32183664bfc8d91f79c2259f36e76 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f1b2f1ae0857ff03cdad170e566eb7857ca35c45 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - ee7d97ae8c50187ddf35a4f95e4a57f4e7af8e89 authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7d9f473bb1b3dcf39c1f5142b09a917728739ddd authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e77a5e5e75dba89322e9437864ddabd010d09e5f authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6ab19af1220fa2493c6028c701e2a0e68f11515d authored about 9 years ago by James R. Barlow <[email protected]>
Not as good as finding a general way to deal with ruffus exceptions, but
better than nil.
github.com/ocrmypdf/OCRmyPDF - acb31abe86bc3fd7b55e81557f216ed50237bee5 authored about 9 years ago by James R. Barlow <[email protected]>
Tess 3.03 has various quality problems like wrong DPI that are fixed
in Tess 3.04. Idea here i...
It appears that extractText() does not find all text. At a glance it
may be that Tesseract's PDF...
github.com/ocrmypdf/OCRmyPDF - d6124c17878a99f8ed7dde24a7c83c35cca79899 authored about 9 years ago by James R. Barlow <[email protected]>
with reference to Tess version and settings
github.com/ocrmypdf/OCRmyPDF - 80d89b54208c911f56a3949a7f07b8f5cc56e7df authored about 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 74059eecf1240961e64d84a2a34ae95091a369cf authored about 9 years ago by James R. Barlow <[email protected]>