Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

wording corrected

github.com/ocrmypdf/OCRmyPDF - efce7de9aea113015babdb0a301bcb2b5bd47c0c authored almost 11 years ago by fritz-hh <[email protected]>
dependency to pdftk removed

concatenation is now done also with ghostscript

github.com/ocrmypdf/OCRmyPDF - 38c64ac689aa9a3d40d0b786c8dbd38fc4e6e5c9 authored almost 11 years ago by fritz-hh <[email protected]>
portability improvements + minor changes

github.com/ocrmypdf/OCRmyPDF - 6d203e3eee87137220ca0275a95ad2b2973f9126 authored almost 11 years ago by fritz-hh <[email protected]>
disclaimer added

github.com/ocrmypdf/OCRmyPDF - 81f461e5576af86859fd55a452bbb3071afd7cfc authored almost 11 years ago by fritz-hh <[email protected]>
tmpfiles to $TMPDIR + better portability (mktemp)

mktemp: consider both FreeBSD/OSX and Linux OS having incompatible
syntax
From now on temporary ...

github.com/ocrmypdf/OCRmyPDF - 988bde138703a63eac3437047bd58c62d4d9c5ee authored almost 11 years ago by fritz-hh <[email protected]>
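The mktemp incompatibility mentioned in the commit above can be sidestepped in a few ways. The following is only a minimal sketch, not the code from that commit: it assumes that passing a full template path ending in X characters is accepted by both the GNU and the BSD/OS X variants of mktemp, and the function name `make_tmp_dir` and the prefix argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from OCRmyPDF): create a private temporary
# directory under $TMPDIR, falling back to /tmp when TMPDIR is unset.
make_tmp_dir() {
    prefix="$1"
    # A full template path with trailing X's avoids the "-t" option, whose
    # argument means different things to GNU mktemp and BSD mktemp.
    mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX"
}
```

A caller would capture the printed path, for example `tmp=$(make_tmp_dir ocrmypdf)`, and remove the directory once processing is finished.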
merged pull request from oxplot

github.com/ocrmypdf/OCRmyPDF - aedbabdbe8d0965612a4eaeb409e33b56a510808 authored almost 11 years ago by fritz-hh <[email protected]>
Readme improved

github.com/ocrmypdf/OCRmyPDF - 6ed53e53c7d95d9245f372b60204fdb078fb9834 authored almost 11 years ago by fritz-hh <[email protected]>
Make src scripts executable

Signed-off-by: Mansour Behabadi <[email protected]>

github.com/ocrmypdf/OCRmyPDF - a78630ce99c9ec8844f3c6a14b1a195b535909c0 authored almost 11 years ago by Mansour Behabadi <[email protected]>
Use --gnu in parallel and XX for mktemp

Signed-off-by: Mansour Behabadi <[email protected]>

github.com/ocrmypdf/OCRmyPDF - 6653066784275ccacd2d600405b23bf88d0b9e41 authored almost 11 years ago by Mansour Behabadi <[email protected]>
better handling of ligatures: fixes #58

github.com/ocrmypdf/OCRmyPDF - e40f1fa0811e9fb06100c548453c5aa406a6a8cc authored almost 11 years ago by fritz-hh <[email protected]>
config file restructured

to make clear which parameters are allowed to be changed by the user

github.com/ocrmypdf/OCRmyPDF - a872ce751d33f9a7faf88eb9927839b3f19ae2b5 authored almost 11 years ago by fritz-hh <[email protected]>
Check if tmp folder creation was successful

github.com/ocrmypdf/OCRmyPDF - 317846fbdca262026c76c6c1bb72836fae95791b authored almost 11 years ago by fritz-hh <[email protected]>
Merge pull request #57 from jbarlow83/for-upstream/tmpfolder

Fix temporary folder name generation collisions

github.com/ocrmypdf/OCRmyPDF - f581a5554416cc00dfe659b6edba9942eaf93ba0 authored almost 11 years ago by fritz-hh <[email protected]>
minor changes

github.com/ocrmypdf/OCRmyPDF - 447b291e70aa45ab74ff3d7484db834afcfb732f authored almost 11 years ago by fritz-hh <[email protected]>
indicate python2 to be used in header

github.com/ocrmypdf/OCRmyPDF - 01d07253e84c92753f9ca4f31cce201fe291ff9e authored almost 11 years ago by fritz-hh <[email protected]>
Merge pull request #56 from jbarlow83/for-upstream/hocr-selfwidth

Fix AttributeError on self.width if Tesseract finds no OCR text

github.com/ocrmypdf/OCRmyPDF - 034a4660942bf25072821c79de5907bbaa8d4502 authored almost 11 years ago by fritz-hh <[email protected]>
Merge pull request #55 from jbarlow83/for-upstream/check-poppler

Verify that pdftoppm is the Poppler version, not xpdf version

github.com/ocrmypdf/OCRmyPDF - c6211e23354de7110f66f1b94e6b2d39e31f8ccd authored almost 11 years ago by fritz-hh <[email protected]>
Verify that pdftoppm is the Poppler version, not xpdf version

github.com/ocrmypdf/OCRmyPDF - 1d03a6417dce15b179e72f2e259a328d8d8bcedc authored almost 11 years ago by Jim Barlow <[email protected]>
Fix AttributeError on self.width if Tesseract finds no OCR text

self.width remains undefined unless hOCR finds text. It might not, if
a page contains only an i...

github.com/ocrmypdf/OCRmyPDF - 1d62ef27a2f9e635d493045045f8a6e46c5e7090 authored almost 11 years ago by Jim Barlow <[email protected]>
Fix temporary folder name generation collisions

First, the regular expression matches everything after the first period
in a filename. Adding t...

github.com/ocrmypdf/OCRmyPDF - 996048dc08fd9b67f7fba540a521ec36229ea236 authored almost 11 years ago by Jim Barlow <[email protected]>
Resolved conflicts with jbarlow83 pull request

github.com/ocrmypdf/OCRmyPDF - bf02ee3bdc7399fd5d34b9229d493eff6b754671 authored almost 11 years ago by fritz-hh <[email protected]>
minor changes (comments)

github.com/ocrmypdf/OCRmyPDF - a3c7fba02d5cff767eb7174707cadb2562e75d32 authored almost 11 years ago by fritz-hh <[email protected]>
remove spurious space in img number

Tell the script that "nbImg" is a number, so that leading/trailing
spaces are removed

github.com/ocrmypdf/OCRmyPDF - a8cd7febf6ef61f77ab5d73651e5074f6fb26ced authored almost 11 years ago by fritz-hh <[email protected]>
avoid spurious error msg if no image in pdf

github.com/ocrmypdf/OCRmyPDF - 20c008b84fd0383929f788c01e43592bb4c8e55c authored almost 11 years ago by fritz-hh <[email protected]>
check if python libs are installed

Check if reportlab and lxml are installed, otherwise exit with an error

github.com/ocrmypdf/OCRmyPDF - 7cd73566bef8f93385267da4e87fb03b7819e6fc authored almost 11 years ago by fritz-hh <[email protected]>
poppler syntax (rather than xpdf syntax)

github.com/ocrmypdf/OCRmyPDF - e56fd53d0649147d27a499abb93ecdd5e7ad8448 authored almost 11 years ago by fritz-hh <[email protected]>
Merge pull request #48 from jbarlow83/for-upstream/osx-errors

Fix pdffonts error when filename contains a space

github.com/ocrmypdf/OCRmyPDF - 810b1b3b3e335e944524624f840df961c7b63ee9 authored almost 11 years ago by fritz-hh <[email protected]>
Merge branch 'v2.x' of https://github.com/fritz-hh/OCRmyPDF into v2.x

github.com/ocrmypdf/OCRmyPDF - cb0b033fe76ee91605eece0d71d3426b67c57d74 authored almost 11 years ago by fritz-hh <[email protected]>
exit if bad parallel/tesseract version installed

github.com/ocrmypdf/OCRmyPDF - 46f673a3b7696b2b4357d39ff732b3cfb2c1939d authored almost 11 years ago by fritz-hh <[email protected]>
parallel version added in RELEASE_NOTES

github.com/ocrmypdf/OCRmyPDF - 455303b3d47a413086be96641ad1535d6ce1fb6e authored almost 11 years ago by fritz-hh <[email protected]>
Fix pdffonts error when filename contains a space

github.com/ocrmypdf/OCRmyPDF - 24a84d63803adb1b46ee7b252ed7e66cbc9683ae authored almost 11 years ago by Jim Barlow <[email protected]>
Monkeypatch reportlab to output grayscale and monochrome colorspaces

github.com/ocrmypdf/OCRmyPDF - 9aa21710522896f6aaacc47e383b95f3ebe75181 authored almost 11 years ago by Jim Barlow <[email protected]>
Merge branch 'for-upstream/pdftoppm-error' into for-upstream/mono

github.com/ocrmypdf/OCRmyPDF - 3a46ea1f3660536ccb1ddcb1326795d3b30563f5 authored almost 11 years ago by Jim Barlow <[email protected]>
Detect monochrome images and extract them as PBM (1 bpp)

github.com/ocrmypdf/OCRmyPDF - d33779f301d4c116b5d179a70407cd8699f63335 authored almost 11 years ago by Jim Barlow <[email protected]>
Fix ocrPage.sh pdftoppm error on OS X 10.9

github.com/ocrmypdf/OCRmyPDF - d6ea0793b8434c6a879f22b1ce506ac47a65eba5 authored almost 11 years ago by Jim Barlow <[email protected]>
version changed to v2.x

github.com/ocrmypdf/OCRmyPDF - 4e5e5bb92539f56a4043bbabc9ab7257c5ffcf15 authored almost 11 years ago by fritz-hh <[email protected]>
link to releases updated

github.com/ocrmypdf/OCRmyPDF - 3232ed8e38c98ba23d7d61cdf72dcd286d7a3cea authored almost 11 years ago by fritz-hh <[email protected]>
release_notes and readme updated for v2.0-rc1

github.com/ocrmypdf/OCRmyPDF - 29d6748af8f372ddc2360420a355b0bbc0b8a22a authored almost 11 years ago by fritz-hh <[email protected]>
erroneous exit code corrected

github.com/ocrmypdf/OCRmyPDF - 828f1950716d96c6e455974ef16ca7d148f4cf35 authored almost 11 years ago by fritz-hh <[email protected]>
fixes #40 and code cleanup

github.com/ocrmypdf/OCRmyPDF - b0b7e327830030d56b8bc24c6b1573ed8623b858 authored almost 11 years ago by fritz-hh <[email protected]>
check tesseract version

fixes #41
versions older than 3.02.02 are known to produce invalid hocr output (in
some cases)

github.com/ocrmypdf/OCRmyPDF - c1103c0248c0082936f5d6615218736aa6ac758f authored almost 11 years ago by fritz-hh <[email protected]>
link to issue tracking system added

github.com/ocrmypdf/OCRmyPDF - 940a016e952f7faab737d008d80faf02ee8abb8c authored almost 11 years ago by fritz-hh <[email protected]>
create symbolic links and not copy

If deskew and/or cleanup is not requested, do not copy the files, but
just create symbolic link....

github.com/ocrmypdf/OCRmyPDF - c6cc098e4743d74ba28a9abe9ef8aeb45e7d87c4 authored almost 11 years ago by fritz-hh <[email protected]>
Minor change

github.com/ocrmypdf/OCRmyPDF - 54f47ab89bb4c7ef8117e6d667c227191c2106e6 authored almost 11 years ago by fritz-hh <[email protected]>
Changed debug page name

In order to have the debug page after the normal panel in the final PDF
file

github.com/ocrmypdf/OCRmyPDF - fc3de64dceb4dbeb6e0c07d26afeda372239adca authored almost 11 years ago by fritz-hh <[email protected]>
round dpi value correctly

github.com/ocrmypdf/OCRmyPDF - 414c4e3f3cad4c12278f30211d514c68e268fa2b authored almost 11 years ago by fritz-hh <[email protected]>
removed unused variables

github.com/ocrmypdf/OCRmyPDF - 6a9f38d31eaae7338292736bb2cd8fa06e89a9c1 authored almost 11 years ago by fritz-hh <[email protected]>
fixes #44

The x/y resolutions are not computed separately anymore.
We do not check anymore if x and y reso...

github.com/ocrmypdf/OCRmyPDF - aa4256d35cdaad61a40765e8001610b92bee5d25 authored almost 11 years ago by fritz-hh <[email protected]>
minor changes (indentation and fct name)

github.com/ocrmypdf/OCRmyPDF - 8a1241ba44e7d9aa028b42f45f2455196bfc8dca authored almost 11 years ago by fritz-hh <[email protected]>
Improved consistency of tmp file names

github.com/ocrmypdf/OCRmyPDF - 7eab052e0f9d19f279fc46670880ab026ad2d658 authored almost 11 years ago by fritz-hh <[email protected]>
v1.1-stable added in release notes

github.com/ocrmypdf/OCRmyPDF - 552d19e36b01e74f8efbf93ebf3cdbe5aae0cf76 authored almost 11 years ago by fritz-hh <[email protected]>
minor change in log msg

github.com/ocrmypdf/OCRmyPDF - c0d8508264afe6e856d25c7c55ce046a3cc638a1 authored almost 11 years ago by fritz-hh <[email protected]>
help and documentation improved

github.com/ocrmypdf/OCRmyPDF - 6ef4ba31e21dbac8f35f23d50539967948d38e35 authored almost 11 years ago by fritz-hh <[email protected]>
default DPI definition moved to cfg file

github.com/ocrmypdf/OCRmyPDF - 10a3d26291a9f2ff74253b50651ece0b8a44635c authored almost 11 years ago by fritz-hh <[email protected]>
explanations added for no_ligature cfg file

github.com/ocrmypdf/OCRmyPDF - ab994b32eefc0dbc0dd65b31973461e69cda4171 authored almost 11 years ago by fritz-hh <[email protected]>
Copyright added

github.com/ocrmypdf/OCRmyPDF - 9352b71d787c7309983cec21cf63a2040be3c9e3 authored almost 11 years ago by fritz-hh <[email protected]>
minor change

github.com/ocrmypdf/OCRmyPDF - 71593421ed6159505b80ea924f922f797192910d authored almost 11 years ago by fritz-hh <[email protected]>
Echo arguments of script in debug mode

github.com/ocrmypdf/OCRmyPDF - 2754970f37d5d2302851080d4c91d37e5dcced4a authored almost 11 years ago by fritz-hh <[email protected]>
Support for -f option

Fixes #16

github.com/ocrmypdf/OCRmyPDF - 5945454597cb11df2fb8e4e75f8368055b03d4be authored almost 11 years ago by fritz-hh <[email protected]>
copyright years updated

github.com/ocrmypdf/OCRmyPDF - 884dbce712c6107405e4feb6aa766e697f69a302 authored almost 11 years ago by fritz-hh <[email protected]>
Minor change

github.com/ocrmypdf/OCRmyPDF - 8ee1bc659821c7a62b0e0e538cef8997e0e4792a authored almost 11 years ago by fritz-hh <[email protected]>
Check if page already contains a font

github.com/ocrmypdf/OCRmyPDF - 7d76c467312485e3814df0242e13ccd5be35d8a8 authored almost 11 years ago by fritz-hh <[email protected]>
path to tmp folder now defined in config.sh

github.com/ocrmypdf/OCRmyPDF - f8ccf42c06b4040e10d7a45562922273877e447d authored almost 11 years ago by fritz-hh <[email protected]>
minor change

github.com/ocrmypdf/OCRmyPDF - 0abe0f1f10c38d6ed51a7f6fe75b360cacde7c5c authored about 11 years ago by fritz-hh <[email protected]>
echo also java version in debug mode

github.com/ocrmypdf/OCRmyPDF - ee8a5d80ff17d929d6319bb17edf538247d576cc authored about 11 years ago by fritz-hh <[email protected]>
Support section added

github.com/ocrmypdf/OCRmyPDF - f08893b5c8dfa5e6af12d30df626920d52011d33 authored about 11 years ago by fritz-hh <[email protected]>
Echo version of the used tools

Fixes #35

github.com/ocrmypdf/OCRmyPDF - 41cd88506e040597b0252936ee4f91837bc34dbf authored about 11 years ago by fritz-hh <[email protected]>
Delete 2013_09_LED_und_Energiesparlampen.pdf

file committed by mistake... So deleting it now

github.com/ocrmypdf/OCRmyPDF - 081223b1381811bac5be7ed8b7df9b4891aa0c8a authored about 11 years ago by fritz-hh <[email protected]>
Warn user in case of low resolution

github.com/ocrmypdf/OCRmyPDF - 4e60c9ba0964e0bcad87d8c796951e16a48d0a46 authored about 11 years ago by fritz-hh <[email protected]>
Oversampling + more than 1 img

- Oversampling resolution can now be set from the cmd line (-o option)
- If a page contains more...

github.com/ocrmypdf/OCRmyPDF - 95fe7cd3bc5bcef41b032be07ff70891bffd9740 authored about 11 years ago by fritz-hh <[email protected]>
Automatic oversampling

- If resolution is too low (<250dpi) perform automatic oversampling of
the image
- comments impr...

github.com/ocrmypdf/OCRmyPDF - 79ec1d994e98d5d5431cba2cd778a82bee5c94c3 authored about 11 years ago by fritz-hh <[email protected]>
minor change

github.com/ocrmypdf/OCRmyPDF - 045362425f2170c7a4438e39fadeed29371ab8b8 authored about 11 years ago by fritz-hh <[email protected]>
minor change

github.com/ocrmypdf/OCRmyPDF - 2b2637fbc3f43710a0ea2d155c7aea8cf6898601 authored about 11 years ago by fritz-hh <[email protected]>
better resolution handling (fixes #38)

- dpi computation moved to a dedicated function
- do not exit in case of resolution mismatch (f...

github.com/ocrmypdf/OCRmyPDF - bfc4f7a28d51efc7aec44be0801972a0f870cc63 authored about 11 years ago by fritz-hh <[email protected]>
Minor change

github.com/ocrmypdf/OCRmyPDF - 407670e1f3a5e1a2023146885962ec7f6986b40f authored about 11 years ago by fritz-hh <[email protected]>
New log level added (LOG_WARN)

github.com/ocrmypdf/OCRmyPDF - d0671d81b5c91fa97534165e6b25f509633a63ca authored about 11 years ago by fritz-hh <[email protected]>
comments and log messages improved

github.com/ocrmypdf/OCRmyPDF - 7a74ebbcc3cf60633a03802bc8a12e4e9b88d9d9 authored about 11 years ago by fritz-hh <[email protected]>
Removed bashism

== does not exist in bourne shell

github.com/ocrmypdf/OCRmyPDF - 7542188592a46252bc817b88db16a4c50aa1fcb8 authored about 11 years ago by fritz-hh <[email protected]>
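For context on the bashism removed above: POSIX specifies a single `=` for string equality in `test` and `[`, while `==` is an extension that some shells accept. The sketch below is illustrative only; the function name `same_string` is hypothetical and does not come from the OCRmyPDF scripts.

```shell
#!/bin/sh
# Hypothetical helper: compare two strings in a way that works in any
# POSIX-compatible shell, not only in bash.
same_string() {
    # "=" is the portable string-equality operator for [ ... ];
    # "==" is not guaranteed outside shells that add it as an extension.
    if [ "$1" = "$2" ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

Quoting both operands keeps the comparison well-formed when a value is empty or contains spaces.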
fixes #34

tell GNU parallel to protect against evaluation by the sub shell (-q
flag).
This is required in ...

github.com/ocrmypdf/OCRmyPDF - b4a23c005d02c347658a78fb3f5f37f8892a3d43 authored about 11 years ago by fritz-hh <[email protected]>
Various improvements

- Constants moved to config.sh
- Use "python2" cmd instead of "python"
- few other minor changes

github.com/ocrmypdf/OCRmyPDF - 5e0f8be4b1519e1d70bdd4cfe68dbd88d0c10e31 authored about 11 years ago by fritz-hh <[email protected]>
File Test_Issue_#28 renamed

github.com/ocrmypdf/OCRmyPDF - 50dee556069f01ddd22e5ecdd8209fff1d4d5cf1 authored about 11 years ago by fritz-hh <[email protected]>
copyright line added

github.com/ocrmypdf/OCRmyPDF - da5cd01fe48cb62954f38200fe677d4ec379c017 authored over 11 years ago by fritz-hh <[email protected]>
readme updated

new feature: Process several pages in parallel if more than one CPU core
is available

github.com/ocrmypdf/OCRmyPDF - d3fb317d4162003c283abe18ba233fae9d7d2f8c authored over 11 years ago by fritz-hh <[email protected]>
OCRmyPDF.sh: added dependency to GNU parallel

github.com/ocrmypdf/OCRmyPDF - 88ddeb1fb64a72ce8e95bf293c4d0e0d51b3079e authored over 11 years ago by fritz-hh <[email protected]>
Merge remote-tracking branch 'origin/v1.x' into v2.x

github.com/ocrmypdf/OCRmyPDF - f9e2e74bf3846af7a05bb9ed1035fd9a4427d7fd authored over 11 years ago by fritz-hh <[email protected]>
readme updated for v1.0-stable

github.com/ocrmypdf/OCRmyPDF - 87e01aff607c56ad7e315669a22af9b745708d83 authored over 11 years ago by fritz-hh <[email protected]>
OCRmyPDF.sh: metadata not added anymore

Removed feature to add metadata in final pdf file (because it led to a
final PDF file that doe...

github.com/ocrmypdf/OCRmyPDF - 7e8481186abf9f3c339602cbf37a23bfe30ce40a authored over 11 years ago by fritz-hh <[email protected]>
basic implementation of parallel page processing

- basic implementation of parallel page processing using GNU parallel
- processing around 40% fa...

github.com/ocrmypdf/OCRmyPDF - 2b0103a4e6dcb7a5c3f37a14c05e0b1e89d96e57 authored over 11 years ago by fritz-hh <[email protected]>
Merge remote-tracking branch 'origin/v1.x' into v2.x

Conflicts:
OCRmyPDF.sh

Fixes #31

github.com/ocrmypdf/OCRmyPDF - 064d4be83cf406a51260e5ac75c4d6531d5e3b75 authored over 11 years ago by fritz-hh <[email protected]>
OCRmyPDF.sh: fixes issue for files having spaces

fixes #31

github.com/ocrmypdf/OCRmyPDF - ab536d5678b8441f8830d5add1603d25f620f4fd authored over 11 years ago by fritz-hh <[email protected]>
new file to OCR one page

Required to perform OCR of several pages in parallel (using GNU
parallel)

github.com/ocrmypdf/OCRmyPDF - 9db805c4ad26337e7697fdc60c4430a0bdede41f authored over 11 years ago by fritz-hh <[email protected]>
OCRmyPDF.sh: few variables renamed for clarity

github.com/ocrmypdf/OCRmyPDF - f7923a9761e22a5156000b8ee0c7f1bbf21e7f22 authored over 11 years ago by fritz-hh <[email protected]>
.gitattribute: handle *.jar and *.pdf as binary

github.com/ocrmypdf/OCRmyPDF - fd52650255060b9dabaf6b5665637ac5e147a5ba authored over 11 years ago by fritz-hh <[email protected]>
jhove config: fixes #29

github.com/ocrmypdf/OCRmyPDF - f0fe2951752fd23544e2bed787e47cd6b6e7a85f authored over 11 years ago by fritz-hh <[email protected]>
.gitignore corrected + jhove jar files added

.gitignore file corrected, because it prevented some required jhove
binary files from being chec...

github.com/ocrmypdf/OCRmyPDF - 2f89aa393516f0460523e5e6c8d04191d4522057 authored over 11 years ago by fritz-hh <[email protected]>
delete test file

github.com/ocrmypdf/OCRmyPDF - 5aa27343e0d399163e1cc5b719c27ae86045bdb6 authored over 11 years ago by fritz-hh <[email protected]>
JHove: deleted doc + source

Deleted a number of jhove files that are not required
(documentation and java source code mainly)
...

github.com/ocrmypdf/OCRmyPDF - 5ce2841389ad3c716d3ad00cea23b8926f11613b authored over 11 years ago by fritz-hh <[email protected]>
OCRmyPDF.sh: provision for parallel pages processing

github.com/ocrmypdf/OCRmyPDF - e4ffb58269a5ce55e223eb4dd87c6f2b728c54ad authored over 11 years ago by fritz-hh <[email protected]>