Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF
github.com/ocrmypdf/OCRmyPDF - 3d3b3abc1bccf05eefb656309543312c19e3fb47 authored almost 8 years ago by James R. Barlow <[email protected]>
Distortion mainly affected --force-ocr
github.com/ocrmypdf/OCRmyPDF - 7cd2770a13653ec647279f44775c80c67ab8ab41 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7b94129d9eb8b427bc9fd8870839a33a74b29c08 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d1a0065ef878712ac4d79a6da758eec051c721b9 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5a817370fdf3431fb9b05b1369081067307941de authored almost 8 years ago by James R. Barlow <[email protected]>
Installing Tess4 PPA over Tess3 proved too much pain, so sever the link
between this and the jba...
This test has to work to ensure spoof/tesseract_cache.py has a writable
directory to put cache i...
github.com/ocrmypdf/OCRmyPDF - c9a83afad69c287db7183daf7de859c397586900 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5e14274f10a8df284bcb591dc4b58697d1d9cd61 authored almost 8 years ago by James R. Barlow <[email protected]>
[ci skip]
github.com/ocrmypdf/OCRmyPDF - 167470b4bdd0051f94ad3eade387c933b1a898a5 authored almost 8 years ago by James R. Barlow <[email protected]>
[skip ci]
github.com/ocrmypdf/OCRmyPDF - f06d3c2ec2f19e4ca76c6604af06eb3769fbff8e authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 74c99a8a774d790f51afc3b3042ab7105eebfdf9 authored almost 8 years ago by James R. Barlow <[email protected]>
-only on master branch
-only Python 3.6 build uploads, so the others don’t compete
-don’t upload...
Debian now has a few disadvantages:
-there is no convenient PPA for Debian tesseract 4.0, but th...
github.com/ocrmypdf/OCRmyPDF - f28bc25dc0e5be8c402f5a96bc2b110724d78454 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - a0657ad937839df77d297b2a7b82392c1a1bc9ff authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5b8d88af4ce7f1d68e81774963edd0bcae8715c4 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - fa82b5034075a918eb0d77167d888b8069c5b8b2 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 005216bc5712d589db61f752519f207a14ff8b14 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e748fdcf6f0a944b289d7bb4fb018c026ee5de8f authored almost 8 years ago by James R. Barlow <[email protected]>
This parameter has existed for a long time but never really got any
attention.
github.com/ocrmypdf/OCRmyPDF - ea0dd99d0b1770e07a724e02b390ba6aabc6b092 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e0cc67afaedc02e30ee765a53481da7e83eb1e51 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 04f9cbe3642a745146608e6db7a51b03de591bf4 authored almost 8 years ago by James R. Barlow <[email protected]>
At least on macOS with my quad-core, performance improves with two
tesseracts in parallel (20% gai...
github.com/ocrmypdf/OCRmyPDF - a6feacc8101ee875c5b042c2a64c1749f9210877 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 65e4b1672fdb265b7c9bf2cc0c610eff565647d5 authored almost 8 years ago by James R. Barlow <[email protected]>
This reverts commit 678b9fb603e2ce1bc12a34e14a715dcce5fc4a9c.
github.com/ocrmypdf/OCRmyPDF - 46cc0dd19098e764a3129bd0f09f5865b1839316 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 678b9fb603e2ce1bc12a34e14a715dcce5fc4a9c authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 49ab0c1f0b7da2d5860ba9f5fa56266886b7fb07 authored almost 8 years ago by James R. Barlow <[email protected]>
Perhaps this works around the pip/setup.py asymmetry that broke the
4.4 release.
github.com/ocrmypdf/OCRmyPDF - e4ce1dae3549f46309f62a66a8542cab8d73ee22 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 179b812acbaceb1dfc91844f6640356872eca219 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7f170517ec0aa2174effb151ebb828aa455b3424 authored almost 8 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5480da4f0478be0f90775c4f2994de9adfa94c1e authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9a15a4db10555756880b8e4d5f1de04424130185 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 55aeaec293150c936fac5c41743408487086df81 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f6df1fb40cb8eee177615d481c2d4e9fa735c190 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - b889a89c36dc4d6576441a28fca35d7691bfb9a2 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 1976dc6f30ba126332482b325bab230066f8e82b authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e864c65d2657a40a1731d5c6f3d370f617b8a824 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 02fba02d312011b977a479d4bec47936166aa055 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - fb9e7c82f681236d81824f9e806d27d569107e3a authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 77d31bf646c6ee3e50b6e487cc18f2cac93fcbcd authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 29ca799bcf7310ce67014c6c7085381c08393352 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 467b7f016337e014bdb45cacc7c8ee186209aede authored almost 8 years ago by James R. Barlow <[email protected]>
There’s no reason text-only PDF shouldn’t become the default for
tesseract 4.
github.com/ocrmypdf/OCRmyPDF - ac40426971e436050863274ad3d6402d12ecac77 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7acfaf6d3476b43683d3cf63adcb414749fcc384 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 99e47c9c046b23b352e7f82a9cbc06ba6505c59c authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d7904e2251a8deb61dadca9308b059045086a81e authored almost 8 years ago by James R. Barlow <[email protected]>
# Conflicts:
# dev_requirements.txt
# requirements.txt
github.com/ocrmypdf/OCRmyPDF - 3f9adcd5e0a47bd3b20c39ee23a9b56420ed3090 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6cc5135d2d7100684b4d3741b7be0927c7698159 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d4c72b371f4e0c764aeef77e9734272cf34581f4 authored almost 8 years ago by James R. Barlow <[email protected]>
It seems that Tesseract v4 on a platform with OpenMP working correctly
will perform poorly with...
At the moment v4 accepts both but who knows if this will get dropped,
so do as documented for each...
github.com/ocrmypdf/OCRmyPDF - 6e27ecd2b98dff7569a6fccfa016cc1b33c156dc authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 482692396e78caf38bae923cd322498a49f32ac7 authored almost 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - c48acf165ae557076de86bc3d9e8e5a5dd309f2a authored about 8 years ago by James R. Barlow <[email protected]>
Revert to exactly what the previous passing build specified.
github.com/ocrmypdf/OCRmyPDF - 9e004c3ec022ae94570a9447b8892ad52b63e014 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7be4e9c9198630cf75aecf1233c453dd6e07671b authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5ec38a4bed1d263356e3590160321abd07469457 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f246779b8e489a14b23972eb800b97ed4ad5d166 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - a7d8cdf061a99da7f9358b2176e3cd2d8ce9001c authored about 8 years ago by James R. Barlow <[email protected]>
We already know the number of pages at this stage.
github.com/ocrmypdf/OCRmyPDF - 620745c8121c6bd5914c4dccd2f920ccd6ffa4d4 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - b8767e5ba99c9dbb7137f65535319c960c942c32 authored about 8 years ago by James R. Barlow <[email protected]>
Because ruffus doesn’t handle exceptions well I tended to call sys.exit
to make sure we got out ...
github.com/ocrmypdf/OCRmyPDF - 4ee9658e976d533e25d31622295f5557bbae1f90 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - dd1b84e7bae6fb6904278e22f6b27167245479df authored about 8 years ago by James R. Barlow <[email protected]>
This leaves __main__.py to handle command line arguments while pipeline.py
runs the pipeline - m...
github.com/ocrmypdf/OCRmyPDF - f0f889440be7df6da3db5f917e4fc29adebe9ca2 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - cc9ceaeb742e4cc449c3a82a1dcb3636eb9dfe69 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - ad2fa8d1d7dd4dd04b7353181a0d83076c90c4bd authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - adc1580742a320ed4cf70026cfa7dd5a19ce8fb9 authored about 8 years ago by James R. Barlow <[email protected]>
Redirect stderr->stdout to hopefully make GS output easier to work with
overall, since the previ...
And add new test case for this.
github.com/ocrmypdf/OCRmyPDF - e57aa0eee2b89f9fd08a6f54af8f3697d111b4fa authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 1ae1d116c7f1bd98040fe12002dd26a0d3260149 authored about 8 years ago by James R. Barlow <[email protected]>
And add new test case for this.
github.com/ocrmypdf/OCRmyPDF - 097a69d07f1276237de7dded4dcd7b6d29fd29b1 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - a81ce87a50f219a3ec86b06b0c6df8e14b977b8f authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 88be0d43a078c25d87d3828a9b820127eb03159a authored about 8 years ago by James R. Barlow <[email protected]>
As of 3.1.1, our minimum version, these codecs are now required by
default for a successful inst...
-update requirements.txt and dev_requirements.txt to more recent version
-setup.py updated to Ub...
github.com/ocrmypdf/OCRmyPDF - be0fa35d14b0430c9a594a11d4abba3def114ef8 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9f51ed9d0114e18618f7a1d75e3b096b8c2c1260 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 731e6792c7b9141442e64b38b3ff002caceea960 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - c35ec0b4aaec4c4c6c84bf0e38e54155912f4b21 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 03aaf575dc8c94877f528f59f2b2bc7fa26fbd64 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9a060579bac3ede8c259b05680ece08fdade6c99 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d40a5c4f7a5edd25cc986a60cbce44c7f7e495ec authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 21f7dc337774ed453333d4bd39d68db9052db68b authored about 8 years ago by James R. Barlow <[email protected]>
Using a context manager to guard the pdfinfo list makes the lock
unnecessary. (Although it was p...
Never really investigated the reason why ruffus returns a mutex to go
along with its logger. It ...
github.com/ocrmypdf/OCRmyPDF - 2c5437135cc46dd9c325957eb6f9d8f764fc58cc authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 444da025230923322e8edd2fe4a075caaf244059 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 00e8af2381f9c3641a13bc1a42854f56cccf7e2e authored about 8 years ago by James R. Barlow <[email protected]>
I experimented with the idea of using asyncio-based processing but
realized that that does not s...
github.com/ocrmypdf/OCRmyPDF - de939951d49397d9bf2ce3b4cc4eeacc524c20a4 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7725d16a26eb7a25540110e80636ca1a87a725f2 authored about 8 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 8a74408d83eccaf1c3937f58ff99dfd9226bd0e9 authored about 8 years ago by James R. Barlow <[email protected]>