Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Collective: https://opencollective.com/ocrmypdf
Host: opensource
Code: https://github.com/jbarlow83/OCRmyPDF
github.com/ocrmypdf/OCRmyPDF - db311fb6a2100484e0f384473dacfca57dcc6748 authored almost 10 years ago by Jim Barlow
As documented, Tesseract does not escape the filename when inserting it
into .hocr, potentially ...
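Because the filename lands inside XML, characters such as `&` or `<` can leave the .hocr file unparseable. The Python sketch below illustrates this, assuming the name ends up in a quoted attribute (Tesseract's hOCR page element usually carries `image "<name>"` in its `title` attribute; the helper only imitates that shape). `page_element` and `parses` are hypothetical helpers, and `xml.sax.saxutils.quoteattr` is one possible fix, not necessarily the one OCRmyPDF adopted.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def page_element(filename: str, escape: bool) -> str:
    """Build a simplified hOCR-style page <div> (hypothetical helper)."""
    title = f'image "{filename}"'
    if escape:
        # quoteattr() escapes &, < and > and picks a safe quote character
        return f"<div class='ocr_page' title={quoteattr(title)}></div>"
    # Naive insertion: the filename is pasted into the attribute unescaped
    return f"<div class='ocr_page' title='{title}'></div>"


def parses(xml_text: str) -> bool:
    """True if the fragment is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


name = "scan & notes <draft>.png"
print(parses(page_element(name, escape=False)))  # False
print(parses(page_element(name, escape=True)))   # True
```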
github.com/ocrmypdf/OCRmyPDF - 52dc74d3cee9c82fccd2093dba5b5cd43c705bec authored almost 10 years ago by Jim Barlow
Of course, this introduces recompression artifacts, and is unnecessary
if no options are given t...
github.com/ocrmypdf/OCRmyPDF - 638c6db05de22495eb85bc904a353adefec836bc authored almost 10 years ago by Jim Barlow
Ghostscript has the clunkiest imaginable syntax, obtuse documentation,
quirky behavior, and poor...
The flag -dUseCIEColor is now deprecated, as it invokes the old engine
which introduces color er...
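A minimal sketch of assembling a Ghostscript `pdfwrite` invocation as an argument list while leaving out the deprecated `-dUseCIEColor` flag. The helper name and the particular flags used are assumptions for illustration, not OCRmyPDF's actual command line. The command is only built here, not executed.

```python
from typing import List, Sequence

DEPRECATED_FLAGS = {"-dUseCIEColor"}


def gs_pdfwrite_args(infile: str, outfile: str, extra: Sequence[str] = ()) -> List[str]:
    """Build a Ghostscript pdfwrite command line (hypothetical helper)."""
    args = [
        "gs",
        "-dSAFER",    # restrict file operations performed by the PostScript code
        "-dBATCH",    # exit after processing instead of waiting for input
        "-dNOPAUSE",  # do not pause between pages
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={outfile}",
    ]
    # Drop deprecated flags even if a caller passes them in
    args += [flag for flag in extra if flag not in DEPRECATED_FLAGS]
    args.append(infile)
    return args


print(gs_pdfwrite_args("in.pdf", "out.pdf", extra=["-dUseCIEColor", "-dPDFSETTINGS=/prepress"]))
```

Building a list rather than a shell string also sidesteps quoting problems when the list is later passed to `subprocess.run`.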
github.com/ocrmypdf/OCRmyPDF - 4d88e64774b88002d1535661ffd196b7910e4ad2 authored almost 10 years ago by Jim Barlow
It doesn't make much sense to do anything with an all vector page
except extract the page unmodi...
github.com/ocrmypdf/OCRmyPDF - 40058e99e0550311eded97e8b12822064996a680 authored almost 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - bece4c3e024c91d8bb35d5d4d334a07cf848db6e authored almost 10 years ago by Jim Barlow
It appears to be possible to have a PDF with an embedded font that is
either unused or used only...
github.com/ocrmypdf/OCRmyPDF - dc2a4ab04494d80ca99b052120b2d0d739ff33d8 authored almost 10 years ago by Jim Barlow
Appears to be necessary to disable each state of the pipeline that is
inactive, not just initial...
github.com/ocrmypdf/OCRmyPDF - 69ce6ff7b5f4d1199a31566dbd97d7dbd0b6e3ec authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 32ba50b8dca1eb5555b557549e8bba1ae22207cb authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 36aca45f35260297fe4f626a01f4c1353f1b1ad7 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 925290342d29564111a491ba61adb50fe371dcb6 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 4dc0370c57f1c590cf62370984e4922231c8acde authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - b92f8e43f22040bc79755034798d809e980450d8 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 22b0733a1df0abfeb0badeb86ea4e762c06c1295 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 6021684ab6b43ad564342454284143360da4d9da authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - f4b1d0cdfe516953ad45c2f5963dfc084513b031 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - d0d804862102df7140ee16975014bf4858be9072 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - cfd119325dc1b3e01a42d1cb2b251b16763eba1e authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - ad30833ffca8bb6f0a151f8b8bc5075f13acb4d9 authored about 10 years ago by Jim Barlow
pdftoppm in recent versions (0.26.4,5) seems to be incapable of
producing valid TIFFs, so have i...
convert .pnm -deskew <...> .pnm seems to have a bug that produces an
invalid .pnm file which lat...
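Both notes describe intermediate image files that were silently invalid. One way to catch that early is to check that a binary PNM file's raster size matches its header. The sketch below is a simplified check, not the one OCRmyPDF uses: it handles only binary P5 (graymap) and P6 (pixmap) files with 8-bit samples and no `#` comments in the header, and `pnm_is_consistent` is a hypothetical name.

```python
def pnm_is_consistent(data: bytes) -> bool:
    """Return True if a binary P5/P6 file's pixel data matches its header.

    Simplifying assumptions: no '#' comments in the header, maxval <= 255.
    """
    # Header: four whitespace-separated tokens (magic, width, height, maxval),
    # then exactly one whitespace byte before the raster starts.
    fields = []
    pos = 0
    while len(fields) < 4:
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1  # skip whitespace between tokens
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            return False  # header is truncated
        fields.append(data[start:pos])
    pos += 1  # the single whitespace byte after maxval

    magic, width, height, maxval = fields
    if magic not in (b"P5", b"P6"):
        return False
    try:
        w, h, m = int(width), int(height), int(maxval)
    except ValueError:
        return False
    if w <= 0 or h <= 0 or not 0 < m <= 255:
        return False
    channels = 1 if magic == b"P5" else 3
    return len(data) - pos == w * h * channels


good = b"P5\n2 2\n255\n" + bytes(4)
bad = b"P5\n2 2\n255\n" + bytes(3)  # one pixel short
print(pnm_is_consistent(good), pnm_is_consistent(bad))  # True False
```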
github.com/ocrmypdf/OCRmyPDF - 017bc1f25214fc94aed7fe30400c2e927b527916 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - bcd67c009d4cb63d8e4b2b51234bdcc663ea5722 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 635358884e7e1f1f03ee9564e090801be7b8a72d authored about 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 2f6cfafdfc64713e99b6ee7ec78961b9b784ec77 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 25234fa30bbf53ae8494057586cfcadc72a78822 authored about 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 5b173418041b9534a693c1f348bf427427cf8011 authored about 10 years ago by fritz-hh
Exit if the output path points to a folder
Exit if the output path points to an existing file
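The two checks above can be expressed as a small pre-flight function. This is a Python sketch with a hypothetical name and error messages; the original script was a shell script, and details such as whether it offered to overwrite are not shown here. `SystemExit` with a message prints it to stderr and exits with status 1.

```python
import os


def check_output_path(path: str) -> None:
    """Stop early if the output path is unusable (hypothetical pre-flight check)."""
    if os.path.isdir(path):
        raise SystemExit(f"Output path {path!r} is a folder, not a file")
    if os.path.exists(path):
        raise SystemExit(f"Output file {path!r} already exists")
```

The folder test comes first because `os.path.exists` is also true for folders, which would otherwise hide the more specific message.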
github.com/ocrmypdf/OCRmyPDF - e1f122097034861a564dc33fd39c08d409f9ef3e authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 5855bcd1fe5cb71ac80fe7f57ee3f461dda5b098 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - a14af5b9eeea786c9c1ca7a785d4f5c68a051c4f authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - f11c03750e3200cf0a100aa2638d3ee58314b0f6 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - ea5cfa40c181663ad80d38acef8cd2ba2d6cf6c4 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - c562754d8148b6fad91529a47dfef600d0a6de15 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 90d892512af4f36c21bb77b3156b51d8c6fad9dd authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 9c6fedb15b99b9714a540c76a2102b5cc0828b77 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 3a7175115f6b194ad850fb14e854dd62521faca0 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 98c41f322386166453eb2b51d02490c328801bd9 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - d101e96e1665f6a20348c32726b0aa2a6fa7bf21 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - a446b6c4407e3b9c67b0bae1a7ce187d130e981e authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - b1fec0f1b19854098496e60209f74fda784a15dc authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 1dfdc93745224c98aedd1785ef1930a1f93713c9 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 6c5ee4095cf379bcbe844efe29e34882652fd3d3 authored over 10 years ago by fritz-hh
- Introduce -s option to not OCR pages containing fonts
- Solve issue with -f and -s if -C is not...
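A hedged sketch of the per-page choice these options imply. From the source: `-s` skips OCR on pages that contain fonts, and a later note says such a page otherwise aborts the script unless `-f` is given. Assumed beyond this excerpt: a skipped page is kept unchanged in the output, and `-f` means "OCR it anyway". `-C` is left out because its meaning is not stated, and letting `-f` win when both flags are set is an arbitrary choice. `PageAction` and `page_action` are hypothetical names.

```python
from enum import Enum


class PageAction(Enum):
    OCR = "ocr"      # rasterize the page and run OCR on it
    COPY = "copy"    # keep the page unchanged (assumed behavior of -s)
    ABORT = "abort"  # stop processing the document


def page_action(has_fonts: bool, force: bool, skip_text: bool) -> PageAction:
    """Decide what to do with one page under the assumed -f / -s semantics."""
    if not has_fonts:
        return PageAction.OCR
    if force:  # -f (assumed: OCR even pages that already have fonts)
        return PageAction.OCR
    if skip_text:  # -s: do not OCR pages containing fonts
        return PageAction.COPY
    return PageAction.ABORT  # default for a page with font data
```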
github.com/ocrmypdf/OCRmyPDF - 2612105d3213b7b6a1ab4ee74450945a4f9cf755 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 954fe13f542c9e743749b9b94dc2fa7b3fd6cfe3 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - bb5a00685e29bb3abe69ba4479eb28b9b752bfd3 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - dabbddb04eea4754bf3f350789c1d5e392141efa authored over 10 years ago by Jim Barlow
Python does not map the expression to its return code automatically, so
this line returns succes...
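The point of the note above is that a Python script's exit status is not taken from the value of an expression; it has to be passed to `sys.exit()` explicitly. The sketch runs two tiny scripts in a fresh interpreter to show the difference; the snippet contents and `exit_status` are illustrative, not taken from OCRmyPDF.

```python
import subprocess
import sys

# Two tiny scripts whose main() reports failure by returning 1.
WITHOUT_EXIT = "def main():\n    return 1\nmain()\n"
WITH_EXIT = "import sys\ndef main():\n    return 1\nsys.exit(main())\n"


def exit_status(source: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode


print(exit_status(WITHOUT_EXIT))  # 0: the return value of main() is discarded
print(exit_status(WITH_EXIT))     # 1: sys.exit() turns it into the exit status
```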
remove patch that was required for versions of reportlab <3.0 (fixed in
3.0 now)
patch was neces...
github.com/ocrmypdf/OCRmyPDF - fccfb4589e93f2209a7f326528ce9733fd62a16f authored over 10 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 5384c980137c026613508e47030813eec2cacdb3 authored over 10 years ago by Jim Barlow
More portable solution (also works on OS X) to get the OCRmyPDF.sh path (following symlinks)
github.com/ocrmypdf/OCRmyPDF - 2ed2307573d1769bf52fa212266e6d1eed6b1603 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 3f8a2d8d3ed18583b46ff7a5707ce7958b7d2ec5 authored over 10 years ago by Jim Barlow
Previously I only checked whether the folder that should contain the input file
exists
github.com/ocrmypdf/OCRmyPDF - d7130a1e56f93c9544a19f2d50813b83488be8b9 authored over 10 years ago by Jim Barlow
Put TESS_CFG_FILES last because it is optional and can be blank. If
omitted it breaks the sequen...
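Tesseract's command line takes the image and output base first, then options, and any config file names at the very end. That is why an optional, possibly blank list belongs last: when it is empty, nothing in the middle of the sequence shifts. A Python sketch, where `tess_cfg_files` stands in for the script's `TESS_CFG_FILES` variable and `tesseract_args` is a hypothetical helper:

```python
import shlex
from typing import List


def tesseract_args(image: str, outbase: str, lang: str, tess_cfg_files: str = "") -> List[str]:
    """Build a tesseract command line with config file names last (hypothetical helper)."""
    args = ["tesseract", image, outbase, "-l", lang]
    # shlex.split("") returns [], so a blank setting adds no argument at all
    # and cannot disturb the positions of the arguments before it.
    args += shlex.split(tess_cfg_files)
    return args


print(tesseract_args("page.pnm", "page", "eng"))
print(tesseract_args("page.pnm", "page", "eng", "hocr"))
```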
Conflicts:
OCRmyPDF.sh
readlink -f is a GNU coreutils extension, so not available on OS X and
other platforms.
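In Python, the standard library already offers a portable counterpart to GNU `readlink -f`: `os.path.realpath` resolves symbolic links on any platform. The sketch creates a symlink to a script in a temporary folder and resolves it back to the folder holding the real file. It illustrates the idea rather than the shell solution the commit adopted, and it assumes a POSIX system where unprivileged users can create symlinks. `script_dir` is a hypothetical helper.

```python
import os
import tempfile


def script_dir(path: str) -> str:
    """Folder containing the real file behind `path`, following symlinks."""
    return os.path.dirname(os.path.realpath(path))


with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "install")
    os.mkdir(real_dir)
    target = os.path.join(real_dir, "OCRmyPDF.sh")
    open(target, "w").close()
    link = os.path.join(tmp, "ocrmypdf")  # e.g. a link placed on the PATH
    os.symlink(target, link)
    # Compare against realpath(real_dir) because tmp itself may sit behind a
    # symlink (on OS X, /var points to /private/var)
    print(script_dir(link) == os.path.realpath(real_dir))  # True
```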
If a page contains font data, the script would abort, unless -f was given,
in which case it woul...
When I upgraded to poppler 0.24.5, pdftoppm was not compiled because the
script had --disable-sp...
github.com/ocrmypdf/OCRmyPDF - d510e7e4aee471c7a167a79c9819e3edbece38b8 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 5893290dd9fba6c878798d2e445218417043ebb6 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 5c3bbc4031ab44050a182fcd6431309030f4b1c4 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 27cd8cf0db14b7f77a3da51b2d4f26ddeace3029 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - b403016d5b8ec428cbcd2e124e8fbf78e4720221 authored over 10 years ago by fritz-hh
Fixed typo
github.com/ocrmypdf/OCRmyPDF - 5a81823969a021f05c9b75b4123b69097013a6a8 authored over 10 years ago by fritz-hh
- small changes to make this work on Ubuntu 12.04 called via symlink
- lowered minimum parallel...
github.com/ocrmypdf/OCRmyPDF - 5c7b2a2a364573f47eb8f8696b7dbd1447a19a4e authored over 10 years ago by Dorian Scholz
github.com/ocrmypdf/OCRmyPDF - 1db06de2877f352c97d01c99f3030144adb33350 authored over 10 years ago by Dorian Scholz
github.com/ocrmypdf/OCRmyPDF - 3904178d44609bc4755df9eaacd8e48018571ecf authored over 10 years ago by Martin Ettl
fixed typo: ghostcript to ghostscript
github.com/ocrmypdf/OCRmyPDF - 8bb9c3610cd5044f7450ba76f7411163056a7e26 authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 7dcc382ccc07f9a00035e54b1f930128efb48646 authored over 10 years ago by MoritzFago
Fixed typo in help text
github.com/ocrmypdf/OCRmyPDF - b71fc807d208251401155146be0a3980b1a9ed5c authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 15d28d970aa89c0dbde2a0f4795aad63c53431ed authored over 10 years ago by Andy Signer
Fixed typo in import of reportlab.
github.com/ocrmypdf/OCRmyPDF - e083a860e9805eda0e37711a8cc1ed8a02d1444a authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 6463b9dd840cc0eb76aeeb558b5e8c5c9d58dcba authored over 10 years ago by Andreas Christ
Closes #72
github.com/ocrmypdf/OCRmyPDF - c873de6ca4ce15880a62c221eea4699090805bcb authored over 10 years ago by fritz-hh
closes #71
github.com/ocrmypdf/OCRmyPDF - b70863b47e3e092d8761a9bee1e4564e68caec5f authored over 10 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 3546f84c6d13224daad9744309a880f92009bdc3 authored over 10 years ago by fritz-hh
If a page contains font data, the script would abort, unless -f was given,
in which case it woul...
github.com/ocrmypdf/OCRmyPDF - 1c34fd69cf9bd40578ec33fc303c23cf9db9a714 authored almost 11 years ago by fritz-hh
Allow tesseract 3.02.01 to be used.
Even 3.02.01 fails in few cases (see issue #28). I decided t...
github.com/ocrmypdf/OCRmyPDF - 112fb5098bfcddd6a5d23a620d47623170f6db38 authored almost 11 years ago by Jim Barlow
Leptonica does not interpret those extensions correctly. However, when
asked to produce a .pnm ...
github.com/ocrmypdf/OCRmyPDF - 8cfbdaf0d022dea822e76f1b93bf6a672f5ac07d authored almost 11 years ago by Jim Barlow
github.com/ocrmypdf/OCRmyPDF - 670343497677ac44b52b66ed1daedc2876442a46 authored almost 11 years ago by Jim Barlow
A few design notes:
Leptonica's deskew is far superior to ImageMagick's convert -deskew command ...
In case a requested language is not supported, list the supported languages in the error
message
Check whether the languages passed to tesseract with -l are supported
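Tesseract's `-l` option accepts several languages joined with `+` (for example `eng+deu`), so each part has to be checked separately. A sketch, assuming the list of installed languages has already been obtained, for instance from `tesseract --list-langs` on versions that support it; `unsupported_languages` is a hypothetical helper and the message wording is illustrative.

```python
from typing import Iterable, List


def unsupported_languages(requested: str, installed: Iterable[str]) -> List[str]:
    """Return the languages in a '+'-joined -l value that are not installed."""
    available = set(installed)
    return [lang for lang in requested.split("+") if lang not in available]


installed = ["eng", "deu", "fra"]
missing = unsupported_languages("eng+xyz", installed)
if missing:
    # Name the supported languages in the error message
    print(f"Language(s) {', '.join(missing)} not supported; "
          f"supported: {', '.join(sorted(installed))}")
```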
github.com/ocrmypdf/OCRmyPDF - 18322b424f4a9824dab417eaadb5d5dd4f9337a3 authored almost 11 years ago by fritz-hh
better way of checking if the tesseract version is compatible with the
script.
If the required t...
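Comparing version strings as plain text breaks as soon as a component reaches two digits ("3.10" sorts before "3.9"), so a sturdier check parses versions into integer tuples first. A sketch only: the minimum shown is taken from the earlier note that tesseract 3.02.01 is now allowed, the parsing assumes purely numeric dot-separated components (not guaranteed for every Tesseract build), and both function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'3.02.01' -> (3, 2, 1); assumes numeric, dot-separated components."""
    return tuple(int(part) for part in text.strip().split("."))


MIN_TESSERACT = parse_version("3.02.01")  # assumed minimum, per the note above


def is_compatible(version: str) -> bool:
    # Tuples compare element by element, so (3, 10) > (3, 2, 1) as intended
    return parse_version(version) >= MIN_TESSERACT


print(is_compatible("3.02.02"), is_compatible("3.02"), is_compatible("3.10"))  # True False True
```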
github.com/ocrmypdf/OCRmyPDF - e369ce67669f5f9a9a1cac4c8d8f3b96b7d3d0e4 authored almost 11 years ago by fritz-hh
github.com/ocrmypdf/OCRmyPDF - 64e4e5d91e7f71ca7d2843a028561bf96f7fdf1c authored almost 11 years ago by fritz-hh