Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF
github.com/ocrmypdf/OCRmyPDF - 6e6f918630bba7077ba9a50d75a138767422bce7 authored over 9 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 46338122461b510add63fd83c39693ff96a52020 authored over 9 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 14bd1555aa2f5b214e662b036fe8774236aa6d3c authored over 9 years ago by jbarlow83 <[email protected]>
github.com/ocrmypdf/OCRmyPDF - b9d7687fa096cca819adcc1800cc652238857af2 authored over 9 years ago by James R. Barlow <[email protected]>
# Conflicts:
# RELEASE_NOTES.md
# src/config.sh
# src/hocrTransform.py
# src/ocrPage.sh
github.com/ocrmypdf/OCRmyPDF - 9e0c443c2f2d9923feeb05c3e5c0778bf1c209d6 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 60832152b1d25698ef8d9b7de7dc96a8256a77de authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6a160d22fe4a5712156013fd6aeace9fc5ed3fe7 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e35526192ce0de18713a75b6bc20b1cf01ee6ad5 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - bea57bdded53ddd117740f289861e45b37dcbcd4 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 2a9da225e4636c0b5b81776f4dfb5a8f99380883 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - a3f37de9b5f24323dfc34e17d60a302b050b5ed9 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 606416095389c170d43bdb1962f9bc1423c7c6b9 authored over 9 years ago by James R. Barlow <[email protected]>
Although the real issue was that the ruffus pipeline cannot be executed
twice in the same proces...
Specifically it trips over the need to reimport ocrmypdf.main. That in
turn raises questions ab...
github.com/ocrmypdf/OCRmyPDF - 587fa63c8e8e0364c33e4384a1fad8794bb2f2f3 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - b40eec4cb0b07d30b8dec32b6ebfbe1efc6c9559 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7bcd48c26924b79d988248db4ac1890c209f1597 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 2e7cd52c0f7002f9fa17003b97e7465c5c02dd3e authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 77d4cb367e14b7f86371e28e8795a28989084448 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 2c45c5abc63d29cc91b8caa956163a6ee26b6ec0 authored over 9 years ago by James R. Barlow <[email protected]>
It's much better at rendering text baselines than hocr and seems to
produce small file sizes, so ...
github.com/ocrmypdf/OCRmyPDF - 03f7c9bf07aaa381cc752e4ddd34cad2b6322b59 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d5f4862749eba85a1d309a676c5551f269f7d469 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 8aced0b6d3c5d003d11a309651ed24ac1b61e2e0 authored over 9 years ago by James R. Barlow <[email protected]>
...except that Ghostscript will sometimes turn out of line images into
inline images on its own,...
Originally it had a smaller image centred in a page, which is not quite
supported.
github.com/ocrmypdf/OCRmyPDF - 30da4fc569ac3f8ca68cc32510b1b5cc10384bd1 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 2c1b5e100b79d7c620a21cdef75e9741e6b4159e authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 3684f278ed6b00d51bf2177eb3369387a07bcf98 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6c3cb6acba9856bd5af312537cdb6875c06d5b77 authored over 9 years ago by James R. Barlow <[email protected]>
GitHub supports both, and PyPI expects .rst files, so use .rst and make
everyone happy.
Auto-co...
github.com/ocrmypdf/OCRmyPDF - b98ba8d17451e1a89a4fa710abcc1fd53d63077e authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - d3088829af87192719077f93abc80c98374bab17 authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9aaaba17149600c787217e444d4428b8b075ddfd authored over 9 years ago by James R. Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 9adb0d696f6d0112bca8fe7ae6376ed1048bcd12 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - c270f1ba5fd14d41f62524fcbb4abbf1c0972d50 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7b255b575abb0fbd6f47c766cb0892ee9d2ab532 authored over 9 years ago by Jim Barlow <[email protected]>
What a pain getting Unicode right, but there it is.
I cannot find anything to confirm that it i...
github.com/ocrmypdf/OCRmyPDF - d7a9f3a2ab4622410b50afb9b948a533e5f9d69e authored over 9 years ago by Jim Barlow <[email protected]>
This works for ASCII only; will do Unicode version.
github.com/ocrmypdf/OCRmyPDF - abf2e7e9bb2a62052ae6531100169f3df91f84ef authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 72e5fa9ba04900214545bc21602fc09625214e03 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 32c1078d2cd721c4ea441cbd6978c997ce5551d0 authored over 9 years ago by Jim Barlow <[email protected]>
@split is for "1 to many" operations, so it's the right tool for this
case.
github.com/ocrmypdf/OCRmyPDF - 42cd683ec072275a36b125191c18a362a384c26a authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 151eb0537751c32ce5d5144e38607f14f30fb40c authored over 9 years ago by Jim Barlow <[email protected]>
Little point to this feature - on most platforms the environment
variable can be overridden if d...
github.com/ocrmypdf/OCRmyPDF - 5ce544289fd206485d8f8f1b0ad75e0cf2e11f46 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 77bd35c3c7d91456c612db68d760b9e16b1e90fe authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 0c5c208db0ac18fbf309bb6a3e065adcfd4a30f4 authored over 9 years ago by Jim Barlow <[email protected]>
Simplifies the logic - one deals with all images, the other deals
with an image and .hocr. Als...
github.com/ocrmypdf/OCRmyPDF - 9f90b5cb0a31927617e23728ece2e162cadf09df authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5adff94545ea2df71da5932260fc7b22456f2547 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - aa2baabfa9914fc06276f8a5c6dac1fa8404c1c8 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 75c2b23efcf32a0e291d08bd405ae101e7ea744f authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6451017962a2b1c83d7906349193f3f9333ce9c8 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 0f857a6a3459973bc8f0f688ef947523f81b69e5 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7638a88a6a32e59f48bfd8ff474ba0cfbf35470c authored over 9 years ago by Jim Barlow <[email protected]>
Ghostscript is more reliable than Poppler's pdftoppm renderer. gs is
also a hard dependency, as ...
github.com/ocrmypdf/OCRmyPDF - 587569fcb6a302af61e8cb3160c1719fb5f0a5b1 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 8c0dc9a06da9a9fac10a220570fcce22dacd3136 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 289e4025ad61df4fbdec2ff2d340b39b4b38cb17 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5476eafe4ca4046fc38a02894bb151b227f3bb99 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - df32f283cd01f713d9b8c59a440aa26371d6b2fc authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 68ecaac9cca59765b6b6e6c31429d4850c500463 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - cffd4623ca77d49c54e9f0c827bb9b69c9ac7bd9 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6dc2782e806d4a7d5df7032550a44d15ad1a7560 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5df187c086ad306347b5e221dc6ee0bed1aea231 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 7fd172e41e5f277f8dffdced3dcb1bc991f42b42 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 619528a1b57ee6e765d0d94c7f5b4f7e7d4ba88d authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 596d468c1457eb45f532d0a36e96a06d333ca4cf authored over 9 years ago by Jim Barlow <[email protected]>
index 68d1591..95afa8f 100755
--- a/src/ocrmypdf.py
+++ b/src/ocrmypdf.py
@@ -24,6 +24,7 @@ impor...
github.com/ocrmypdf/OCRmyPDF - 33731a686448768ef328e75a3ce8a73d0679ab42 authored over 9 years ago by Jim Barlow <[email protected]>
Mainly work around the lack of @split(...output_dir) in ruffus
github.com/ocrmypdf/OCRmyPDF - 0c36cd2e24bb48b296cc6cfbcc29237f00a90a10 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5cef1be26d4b9b047d711bb59d242f96d5908ea8 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - e89f482c3dc09c654317ca42bff51dfb23f835c6 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - fe3e40305dcd3976d9da2f9e4938639d8ddcd1ef authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - a92b5ceb6b944a285e2c1d16a68f874998c2c659 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 0e7e7d843794dfd6266ee70332fa4f2fd60524e7 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f47fa98f3343166feca0ad50ec363c28038475b8 authored over 9 years ago by Jim Barlow <[email protected]>
The immediate reason for doing this is that (newer?) versions of parse()
seem to choke on the pa...
github.com/ocrmypdf/OCRmyPDF - b2168e11db59d5c73a1e1fb059ef9e5e589b1560 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 6d5d8be70897c31526512d2f73740340f6a97d62 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - ce2dbdf372ec711f6c134914363e035943f5cbb8 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - ec8a35a7a676dc32d8e2563c3d8adb6bc2639eee authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f6577c22c3c2de8cf6ba818e369699cb47dbdb58 authored over 9 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 43d6c030930ca14eeb09798b7a33b3dec6a32e53 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 1870f116bbb0fea87c7e5d64046dd53dad561860 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 8b87def013489c7f1817ba0f81762ff5db3cc413 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - de599d97b5eb97854e39fe3660314af847f6824b authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 5d7e6b45c4b0634ba56d344213b6341f1dca8575 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - c6091bcfe18bd3e32309118ea8716514af3b0336 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 466a8a13186bf7f49c25bb420cca429da8a6ea54 authored almost 10 years ago by Jim Barlow <[email protected]>
It appears to be very fragile due to weaknesses in PyPDF. Better
option is probably to use pdftk...
Prior to this change, hocrtransform would render printable text (black
on white) and then a full...
github.com/ocrmypdf/OCRmyPDF - bf114bb1883c2cf0d3ca6312f62a003fee685e3c authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - b8eed2f8612c9fbbab15e3fe5e2b0543043bef7f authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - ccb1e347be58f03eb73cb02a5140a52945893967 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 8698974f11830b73fafc7264a0b2f0d36388761b authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - f2c79c4341f9e09cf6b7c887a0746aac61bb91c4 authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 4966d1346b0f9e633dfed26ae49466a4861dc98f authored almost 10 years ago by Jim Barlow <[email protected]>
github.com/ocrmypdf/OCRmyPDF - 4a9337f757b25dd767217cb26b1e50d1d70a879f authored almost 10 years ago by Jim Barlow <[email protected]>