Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
https://github.com/ocrmypdf/OCRmyPDF

wording corrected

efce7de9aea113015babdb0a301bcb2b5bd47c0c authored about 11 years ago
dependency on pdftk removed

concatenation is now also done with ghostscript

38c64ac689aa9a3d40d0b786c8dbd38fc4e6e5c9 authored about 11 years ago
portability improvements + minor changes

6d203e3eee87137220ca0275a95ad2b2973f9126 authored about 11 years ago
disclaimer added

81f461e5576af86859fd55a452bbb3071afd7cfc authored about 11 years ago
tmpfiles to $TMPDIR + better portability (mktemp)

mktemp: handle both FreeBSD/OSX and Linux, which have incompatible
syntax
From now on temporary ...

988bde138703a63eac3437047bd58c62d4d9c5ee authored about 11 years ago
merged pull request from oxplot

aedbabdbe8d0965612a4eaeb409e33b56a510808 authored about 11 years ago
Readme improved

6ed53e53c7d95d9245f372b60204fdb078fb9834 authored about 11 years ago
Make src scripts executable

Signed-off-by: Mansour Behabadi <[email protected]>

a78630ce99c9ec8844f3c6a14b1a195b535909c0 authored about 11 years ago
Use --gnu in parallel and XX for mktemp

Signed-off-by: Mansour Behabadi <[email protected]>

6653066784275ccacd2d600405b23bf88d0b9e41 authored about 11 years ago
better handling of ligatures: fixes #58

e40f1fa0811e9fb06100c548453c5aa406a6a8cc authored about 11 years ago
config file restructured

to make clear which parameters are allowed to be changed by the user

a872ce751d33f9a7faf88eb9927839b3f19ae2b5 authored about 11 years ago
Check if tmp folder creation was successful

317846fbdca262026c76c6c1bb72836fae95791b authored about 11 years ago
Merge pull request #57 from jbarlow83/for-upstream/tmpfolder

Fix temporary folder name generation collisions

f581a5554416cc00dfe659b6edba9942eaf93ba0 authored about 11 years ago
minor changes

447b291e70aa45ab74ff3d7484db834afcfb732f authored about 11 years ago
indicate python2 to be used in header

01d07253e84c92753f9ca4f31cce201fe291ff9e authored about 11 years ago
Merge pull request #56 from jbarlow83/for-upstream/hocr-selfwidth

Fix AttributeError on self.width if Tesseract finds no OCR text

034a4660942bf25072821c79de5907bbaa8d4502 authored about 11 years ago
Merge pull request #55 from jbarlow83/for-upstream/check-poppler

Verify that pdftoppm is the Poppler version, not xpdf version

c6211e23354de7110f66f1b94e6b2d39e31f8ccd authored about 11 years ago
Verify that pdftoppm is the Poppler version, not xpdf version

1d03a6417dce15b179e72f2e259a328d8d8bcedc authored about 11 years ago
Fix AttributeError on self.width if Tesseract finds no OCR text

self.width remains undefined unless hOCR finds text. It might not, if
a page contains only an i...

1d62ef27a2f9e635d493045045f8a6e46c5e7090 authored about 11 years ago
Fix temporary folder name generation collisions

First, the regular expression matches everything after the first period
in a filename. Adding t...

996048dc08fd9b67f7fba540a521ec36229ea236 authored about 11 years ago
Resolved conflicts with jbarlow83 pull request

bf02ee3bdc7399fd5d34b9229d493eff6b754671 authored about 11 years ago
minor changes (comments)

a3c7fba02d5cff767eb7174707cadb2562e75d32 authored about 11 years ago
remove spurious space in img number

Tell the script that "nbImg" is a number, so that leading/trailing
spaces are removed

a8cd7febf6ef61f77ab5d73651e5074f6fb26ced authored about 11 years ago
avoid spurious error msg if no image in pdf

20c008b84fd0383929f788c01e43592bb4c8e55c authored about 11 years ago
check if python libs are installed

Check if reportlab and lxml are installed, otherwise exit with an error

7cd73566bef8f93385267da4e87fb03b7819e6fc authored about 11 years ago
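
The dependency check described in the commit above can be sketched as follows. This is an illustration only, not the project's code: the helper names `missing_modules` and `require_modules` are hypothetical, and the sketch uses Python 3's `importlib.util.find_spec`, whereas the project at that time targeted python2.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names):
    """Exit with an error message (non-zero status) if any module is missing."""
    missing = missing_modules(names)
    if missing:
        raise SystemExit("Missing Python libraries: " + ", ".join(missing))
```

Calling `require_modules(["reportlab", "lxml"])` at startup would stop the run early when either library is absent.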
poppler syntax (rather than xpdf syntax)

e56fd53d0649147d27a499abb93ecdd5e7ad8448 authored about 11 years ago
Merge pull request #48 from jbarlow83/for-upstream/osx-errors

Fix pdffonts error when filename contains a space

810b1b3b3e335e944524624f840df961c7b63ee9 authored about 11 years ago
Merge branch 'v2.x' of https://github.com/fritz-hh/OCRmyPDF into v2.x

cb0b033fe76ee91605eece0d71d3426b67c57d74 authored about 11 years ago
exit if bad parallel/tesseract version installed

46f673a3b7696b2b4357d39ff732b3cfb2c1939d authored about 11 years ago
parallel version added in RELEASE_NOTES

455303b3d47a413086be96641ad1535d6ce1fb6e authored about 11 years ago
Fix pdffonts error when filename contains a space

24a84d63803adb1b46ee7b252ed7e66cbc9683ae authored about 11 years ago
Monkeypatch reportlab to output grayscale and monochrome colorspaces

9aa21710522896f6aaacc47e383b95f3ebe75181 authored about 11 years ago
Merge branch 'for-upstream/pdftoppm-error' into for-upstream/mono

3a46ea1f3660536ccb1ddcb1326795d3b30563f5 authored about 11 years ago
Detect monochrome images and extract them as PBM (1 bpp)

d33779f301d4c116b5d179a70407cd8699f63335 authored about 11 years ago
Fix ocrPage.sh pdftoppm error on OS X 10.9

d6ea0793b8434c6a879f22b1ce506ac47a65eba5 authored about 11 years ago
version changed to v2.x

4e5e5bb92539f56a4043bbabc9ab7257c5ffcf15 authored about 11 years ago
link to releases updated

3232ed8e38c98ba23d7d61cdf72dcd286d7a3cea authored about 11 years ago
release_notes and readme updated for v2.0-rc1

29d6748af8f372ddc2360420a355b0bbc0b8a22a authored about 11 years ago
erroneous exit code corrected

828f1950716d96c6e455974ef16ca7d148f4cf35 authored about 11 years ago
fixes #40 and code cleanup

b0b7e327830030d56b8bc24c6b1573ed8623b858 authored about 11 years ago
check tesseract version

fixes #41
versions older than 3.02.02 are known to produce invalid hocr output (in
some cases)

c1103c0248c0082936f5d6615218736aa6ac758f authored about 11 years ago
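
A minimal sketch of the version check described above, assuming the Tesseract version is already available as a purely numeric dotted string. The names `parse_version` and `tesseract_is_supported` are hypothetical; how the original script obtains and compares the version is not shown in the commit message.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.02.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum version taken from the commit message: older versions are
# known to produce invalid hOCR output in some cases.
MIN_TESSERACT = parse_version("3.02.02")


def tesseract_is_supported(version_text):
    """Return True if the given version is at least 3.02.02."""
    return parse_version(version_text) >= MIN_TESSERACT
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.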
link to issue tracking system added

940a016e952f7faab737d008d80faf02ee8abb8c authored about 11 years ago
create symbolic links and not copy

If deskew and/or cleanup is not requested, do not copy the files, but
just create symbolic link....

c6cc098e4743d74ba28a9abe9ef8aeb45e7d87c4 authored about 11 years ago
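
The idea of linking instead of copying when an optional step is skipped can be sketched as below. This is an assumption-laden illustration: `pass_through_or_process` and its `process` callback are hypothetical names, standing in for whatever deskew or cleanup command the original shell script invokes.

```python
import os


def pass_through_or_process(src, dst, requested, process):
    """Produce `dst` from `src`.

    If the optional step was requested, run `process(src, dst)`.
    Otherwise create a symbolic link to `src` instead of copying it.
    """
    if requested:
        process(src, dst)
    else:
        # An absolute target keeps the link valid regardless of the
        # directory from which it is later resolved.
        os.symlink(os.path.abspath(src), dst)
```

A link avoids duplicating large page images in the temporary folder when no processing step would change them.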
Minor change

54f47ab89bb4c7ef8117e6d667c227191c2106e6 authored about 11 years ago
Changed debug page name

In order to have the debug page after the normal panel in the final PDF
file

fc3de64dceb4dbeb6e0c07d26afeda372239adca authored about 11 years ago
round dpi value correctly

414c4e3f3cad4c12278f30211d514c68e268fa2b authored about 11 years ago
removed unused variables

6a9f38d31eaae7338292736bb2cd8fa06e89a9c1 authored about 11 years ago
fixes #44

The x/y resolutions are not computed separately anymore.
We do not check anymore if x and y reso...

aa4256d35cdaad61a40765e8001610b92bee5d25 authored about 11 years ago
minor changes (indentation and fct name)

8a1241ba44e7d9aa028b42f45f2455196bfc8dca authored about 11 years ago
Improved consistency of tmp file names

7eab052e0f9d19f279fc46670880ab026ad2d658 authored about 11 years ago
v1.1-stable added in release notes

552d19e36b01e74f8efbf93ebf3cdbe5aae0cf76 authored about 11 years ago
typo

463b04e795ac5eae256cc6f3ea4e9ba401d85c7f authored about 11 years ago
minor change in log msg

c0d8508264afe6e856d25c7c55ce046a3cc638a1 authored about 11 years ago
help and documentation improved

6ef4ba31e21dbac8f35f23d50539967948d38e35 authored about 11 years ago
default DPI definition moved to cfg file

10a3d26291a9f2ff74253b50651ece0b8a44635c authored about 11 years ago
explanations added for no_ligature cfg file

ab994b32eefc0dbc0dd65b31973461e69cda4171 authored about 11 years ago
Copyright added

9352b71d787c7309983cec21cf63a2040be3c9e3 authored about 11 years ago
minor change

71593421ed6159505b80ea924f922f797192910d authored about 11 years ago
Echo arguments of script in debug mode

2754970f37d5d2302851080d4c91d37e5dcced4a authored about 11 years ago
Support for -f option

Fixes #16

5945454597cb11df2fb8e4e75f8368055b03d4be authored about 11 years ago
copyright years updated

884dbce712c6107405e4feb6aa766e697f69a302 authored about 11 years ago
Minor change

8ee1bc659821c7a62b0e0e538cef8997e0e4792a authored about 11 years ago
Check if page already contains a font

7d76c467312485e3814df0242e13ccd5be35d8a8 authored about 11 years ago
path to tmp folder now defined in config.sh

f8ccf42c06b4040e10d7a45562922273877e447d authored about 11 years ago
minor change

0abe0f1f10c38d6ed51a7f6fe75b360cacde7c5c authored about 11 years ago
echo also java version in debug mode

ee8a5d80ff17d929d6319bb17edf538247d576cc authored about 11 years ago
Support section added

f08893b5c8dfa5e6af12d30df626920d52011d33 authored about 11 years ago
Echo version of the used tools

Fixes #35

41cd88506e040597b0252936ee4f91837bc34dbf authored about 11 years ago
Delete 2013_09_LED_und_Energiesparlampen.pdf

file committed by mistake... So deleting it now

081223b1381811bac5be7ed8b7df9b4891aa0c8a authored about 11 years ago
Warn user in case of low resolution

4e60c9ba0964e0bcad87d8c796951e16a48d0a46 authored about 11 years ago
Oversampling + more than 1 img

- Oversampling resolution can now be set from the cmd line (-o option)
- If a page contains more...

95fe7cd3bc5bcef41b032be07ff70891bffd9740 authored about 11 years ago
Automatic oversampling

- If resolution is too low (<250dpi) perform automatic oversampling of
the image
- comments impr...

79ec1d994e98d5d5431cba2cd778a82bee5c94c3 authored about 11 years ago
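
The threshold rule in the commit above might look like the sketch below. Only the 250 dpi threshold comes from the commit message. The function name `rendering_dpi` is hypothetical, and the choice to oversample exactly up to the threshold is an assumption, since the message does not state the target resolution.

```python
MIN_DPI = 250  # threshold stated in the commit message


def rendering_dpi(image_dpi, min_dpi=MIN_DPI):
    """Return the resolution at which the page image would be processed."""
    if image_dpi < min_dpi:
        # Assumption: oversample up to the threshold itself.
        return min_dpi
    return image_dpi
```

Images at or above the threshold keep their native resolution under this rule.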
minor change

045362425f2170c7a4438e39fadeed29371ab8b8 authored about 11 years ago
minor change

2b2637fbc3f43710a0ea2d155c7aea8cf6898601 authored about 11 years ago
better resolution handling (fixes #38)

- dpi computation moved to a dedicated function
- do not exit in case of resolution mismatch (f...

bfc4f7a28d51efc7aec44be0801972a0f870cc63 authored about 11 years ago
Minor change

407670e1f3a5e1a2023146885962ec7f6986b40f authored about 11 years ago
New log level added (LOG_WARN)

d0671d81b5c91fa97534165e6b25f509633a63ca authored about 11 years ago
comments and log messages improved

7a74ebbcc3cf60633a03802bc8a12e4e9b88d9d9 authored about 11 years ago
typo

9e698003326e5a7e1266b7f85d998b3436641a66 authored about 11 years ago
Removed bashism

== does not exist in bourne shell

7542188592a46252bc817b88db16a4c50aa1fcb8 authored about 11 years ago
fixes #34

tell GNU parallel to protect against evaluation by the sub shell (-q
flag).
This is required in ...

b4a23c005d02c347658a78fb3f5f37f8892a3d43 authored about 11 years ago
Various improvements

-Constants moved to config.sh
- Use "python2" cmd instead of "python"
- few other minor changes

5e0f8be4b1519e1d70bdd4cfe68dbd88d0c10e31 authored about 11 years ago
File Test_Issue_#28 renamed

50dee556069f01ddd22e5ecdd8209fff1d4d5cf1 authored about 11 years ago
copyright line added

da5cd01fe48cb62954f38200fe677d4ec379c017 authored almost 12 years ago
readme updated

new feature: Process several pages in parallel if more than one CPU core
is available

d3fb317d4162003c283abe18ba233fae9d7d2f8c authored almost 12 years ago
OCRmyPDF.sh: added dependency to GNU parallel

88ddeb1fb64a72ce8e95bf293c4d0e0d51b3079e authored almost 12 years ago
Merge remote-tracking branch 'origin/v1.x' into v2.x

f9e2e74bf3846af7a05bb9ed1035fd9a4427d7fd authored almost 12 years ago
readme updated for v1.0-stable

87e01aff607c56ad7e315669a22af9b745708d83 authored almost 12 years ago
OCRmyPDF.sh: metadata not added anymore

Removed feature to add metadata in final pdf file (because it led to a
final PDF file that doe...

7e8481186abf9f3c339602cbf37a23bfe30ce40a authored almost 12 years ago
basic implementation of parallel page processing

- basic implementation of parallel page processing using GNU parallel
- processing around 40% fa...

2b0103a4e6dcb7a5c3f37a14c05e0b1e89d96e57 authored almost 12 years ago
Merge remote-tracking branch 'origin/v1.x' into v2.x

Conflicts:
OCRmyPDF.sh

Fixes #31

064d4be83cf406a51260e5ac75c4d6531d5e3b75 authored almost 12 years ago
OCRmyPDF.sh: fixes issue for files having spaces

fixes #31

ab536d5678b8441f8830d5add1603d25f620f4fd authored almost 12 years ago
new file to OCR one page

Required to perform OCR of several pages in parallel (using GNU
parallel)

9db805c4ad26337e7697fdc60c4430a0bdede41f authored almost 12 years ago
OCRmyPDF.sh: few variables renamed for clarity

f7923a9761e22a5156000b8ee0c7f1bbf21e7f22 authored almost 12 years ago
.gitattributes: handle *.jar and *.pdf as binary

fd52650255060b9dabaf6b5665637ac5e147a5ba authored almost 12 years ago
jhove config: fixes #29

f0fe2951752fd23544e2bed787e47cd6b6e7a85f authored almost 12 years ago
.gitignore corrected + jhove jar files added

.gitignore file corrected, because it prevented some required jhove
binary files from being chec...

2f89aa393516f0460523e5e6c8d04191d4522057 authored almost 12 years ago
delete test file

5aa27343e0d399163e1cc5b719c27ae86045bdb6 authored almost 12 years ago
JHove: deleted doc + source

Deleted a number of jhove files that are not required
(documentation and java source code mainly)
...

5ce2841389ad3c716d3ad00cea23b8926f11613b authored almost 12 years ago
OCRmyPDF.sh: provision for parallel pages processing

e4ffb58269a5ce55e223eb4dd87c6f2b728c54ad authored almost 12 years ago