Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
https://github.com/ocrmypdf/OCRmyPDF

added file to reproduce #28

2ce3d9e19d50bcd42e259e0fb1838919ec2f62ef authored almost 12 years ago
OCRmyPDF.sh: fixes #27

The fix should now be compatible with most implementations of grep

9271fe73a8519f15286381826bba2f92640d7e2d authored almost 12 years ago
OCRmyPDF.sh: fixes #25 and fixes #26

- In debug mode: compute and echo time required for processing
- Resolutions (x/y) that are near...

edaa70b97f7a57d128724a24718c99c0a0f87b2a authored almost 12 years ago
OCRmyPDF.sh: handling of path with spaces

- corrected fct absolutePath() to handle path with spaces correctly
- pdf title metadata: split ...

ab07f4deeafc9520a8402f62c9886acdb7f2548a authored almost 12 years ago
release notes: updated for v1.0-rc2

beb1d7ab5459dd6346e3942321eeaadd53c3c7d4 authored almost 12 years ago
OCRmyPDF.sh: Version number updated

5ce3e9bfec97ecc06b1ce5f1cb0692ce487d9d39 authored almost 12 years ago
OCRmyPDF.sh: added metadata in final pdf file

- added metadata in final pdf file: fixes #4
- improved logging of PDF/A validation results

2bed210a3026398e17381a4e83067588629327d8 authored almost 12 years ago
OCRmyPDF.sh: final pdf same owner & permissions

fixes #9

24415511565baf5f537603ca7d8353823c299942 authored almost 12 years ago
HocrTransform.py: exit if page size is not found

fixes #21

15baca5e080ae228663624ee994ce00bc149e8e6 authored almost 12 years ago
OCRmyPDF.sh: keep tmp files in debug mode

fixes #22

062ef0ca3a37fa9546d1245d9ee45b6209a40b74 authored almost 12 years ago
release notes: unpaper version added

24b46869448b35a76e6a5de34d4f96736a27375c authored almost 12 years ago
Correct version number

fixes #19

d3d1c20ca2820481d789ff11d0ec41ee33e1dc88 authored almost 12 years ago
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF

7f7b81154f7a1dbec07b647616332b617c4fe0fa authored almost 12 years ago
OCRmyPDF.sh: Fixed major problem with deskew

After deskew, the image was cropped to the wrong size

6372cec6b801b36ad31698edd108d9c200280c91 authored almost 12 years ago
Update README.md

5ec875325e0ea33644ddd3af0a669b3966c0919e authored almost 12 years ago
Update README.md

4d80709cfd63e145dd1ae9b325800675d18866ae authored almost 12 years ago
Update README.md

2642c1b3d31c8c17f4dda0d741c5c06fd47926c7 authored almost 12 years ago
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF

c4cd7e198281d89bd5a2265aa4d277f5030be3f8 authored almost 12 years ago
OCRmyPDF: log msg corrected

b993c158d06d314e23e831e9f70230ffdbe0e3cd authored almost 12 years ago
release notes updated for v1.0-rc1

1b727042fe7530d727607ef08450a2983026e6e0 authored almost 12 years ago
folder structure cleaned

- put all src files (except OCRmyPDF.sh) to src
- rename tesseract_cfg to tess-cfg

ec2673657767fc5db550d4b262284438e6f70bfa authored almost 12 years ago
typo

a766c5f2b7df1bd555be65dbc9c5f82560ba7d09 authored almost 12 years ago
Update README.md

3b2c804f23a52f6ac00258210b97bd4683d3e93a authored almost 12 years ago
Readme updated

486ed6f2170ebffbb028fb4c88b5518ca97eb359 authored almost 12 years ago
jhove paths corrected

ae716a91cbb0995b7534ed5c55ebe2d1b1dc41ce authored almost 12 years ago
jhove package added

d4195b4362464c7824a6649ee6220cc730b1b99e authored almost 12 years ago
added readme

6ae0452d87600c8acef87598cfc95d40efc490d9 authored almost 12 years ago
add test script

aimed at checking if the quality of the images drops quickly or not

815117f653b631b40cfdc53ff403d4cbd87c066c authored almost 12 years ago
OCRmyPDF.sh: minor improvements

- additional data logged
- width/height were inverted: corrected
- few other minor changes

1c0eb03b3b15a399cc92c52174ae111918c531b1 authored almost 12 years ago
OCRmyPDF.sh: log to stderr + check PDF/A profile

- fixes #10
- check not only if the final PDF is well formed and valid, but also if
it conforms ...

3249fba4a2148d824da0fc7b6363b4dd7b7e3212 authored almost 12 years ago
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF

7c173dcc679805bc42c1b9d4675a90b97ee323ca authored almost 12 years ago
OCRmyPDF.sh: check if python is installed

- fixes #14
- minor other changes

357f449e076afc3267d02a1f11884c61c3c7f3d5 authored almost 12 years ago
hocrTransform: font changed to Helvetica

- Font changed to Helvetica (instead of Courier)
- License text deleted (license file already a...

83560cbd1d83be3428b4899db07ed028a9a0ba05 authored almost 12 years ago
Update COPYRIGHT.md

1860f80cae5dc549b64fe9dfa0e90c046313d924 authored almost 12 years ago
Update README.md

4ea97c4fe48fa8f4951a64de94116ff52de57234 authored almost 12 years ago
Fixed: issue with deskewing: size sometimes wrong

fixes #13

ee738be6814bdf4c0b881908abc4fa6380a9c42b authored almost 12 years ago
OCRmyPDF.sh: corrected dpi computation

fixes #12

e21b3155e582940294777a35d635a2755660f25b authored almost 12 years ago
OCRmyPDF.sh: minor change in code documentation

c293ffd621d63f6cbf8b6c455a0feb44e9190245 authored almost 12 years ago
OCRmyPDF.sh: better handling of path and tmp folder

- user can now define the name/location of the output file
- check if the folder in which in/out...

2fdaa7595c569bd69b7a282aa83e31bb7968597b authored almost 12 years ago
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF

968a66f66bf45f3c695109376b57567dc99b9179 authored almost 12 years ago
Support for additional tesseract config files

This corresponds to the -C option

5992afb70713eeab2d128216c2408c6a8e6f1e8e authored almost 12 years ago
OCRmyPDF.sh: typo in usage

9aa83215c45ab9a77d88519f62759244bdeb38b3 authored almost 12 years ago
Update README.md

939a148812cbdba1c3cbc77af1c5fc310799e41c authored almost 12 years ago
OCRmyPDF.sh: new debug option (-g) added

4ce249e6eddb7b40b6e4869e13b62898620ab08f authored almost 12 years ago
hocrTransform.py: various changes

-a option removed
bounding boxes for paragraphs added
color and style of bounding boxes improved

422aaa80f3c9fa23db1f32bff1ff8443981c1d3a authored almost 12 years ago
OCRmyPDF.sh: log levels implemented

fixes #5

b9a346ce7da812b9c05f45a64bc64efa4e8dd80e authored almost 12 years ago
Usage described

fixes #6

64b92ed1809aac6abf0a6c1008e594caafa344cf authored almost 12 years ago
Update README.md

90fc5c9de48e766b3f10a92007ed2ef4000233b2 authored almost 12 years ago
Update README.md

7118c2f04b191b6000159ffa32a1f87147f0dfe8 authored almost 12 years ago
Update README.md

d66712ab4233e3e30d91d42b410ae325edbfcd7c authored almost 12 years ago
OCRmyPDF.sh: various changes

fixes #3
fixes #2

c5f2158b85b3e4849928c48d9c3c064b788376e3 authored almost 12 years ago
Create COPYRIGHT.md

a5c5353fbdf2cab8f550f846febbfba3b9e3da24 authored almost 12 years ago
OCRmyPDF.sh: various improvements

- check if x_dpi = y_dpi
- separate options for image deskewing and cleaning
- exit codes define...

d5a3f762344346d00ca69323fa7db0d7c009ca96 authored almost 12 years ago
Update README.md

7c1820384551190c5a51df8dd7f8cb6e86f808af authored almost 12 years ago
readme: new sections "features" & "Motivation"

d7c238723bc86df347422f3ba9a646501a9738a0 authored almost 12 years ago
OCRmyPDF.sh: minor changes

f3e581d162188855669943bf1470fc2998bfb662 authored almost 12 years ago
OCRmyPDF.sh: check if utilities are installed

4f65a31eba54332d52ae27f8cea22dbf219b582d authored almost 12 years ago
OCRmyPDF.sh: fix error exit not exiting

Fixes an error that led the script not to exit correctly in case more
than 1 image is detected ...

35d8cffad41f4a3b1e5f3ac6ca641720ecbc91dd authored almost 12 years ago
OCRmyPDF.sh: many improvements!

- automatic analysis of jhove validation report
- quiet generation of PDF/A with gs
- deletion o...

0c46a723bd930ed32735da06368b97e78be7ac00 authored almost 12 years ago
OCRmyPDF.sh: code clean-up

fcac99bc73d21a793aba84cac93b70e40b917cf5 authored almost 12 years ago
Readme: Installation section started

42208aa5feed809e3ab73ef81ae32fa559daca7f authored almost 12 years ago
OCRmyPDF.sh: page number now with leading zeros

7c3abea2320d4291d379e210811ed5034d656e0c authored almost 12 years ago
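
The leading-zero change above is most likely about keeping per-page files in page order when they are sorted by name. A minimal sketch of the idea in Python, not the script's actual code; the function name `page_file_name`, the padding width of 4, and the `.pdf` suffix are all assumptions made for illustration:

```python
def page_file_name(page_number: int, width: int = 4, suffix: str = ".pdf") -> str:
    """Build a zero-padded file name for one page (hypothetical naming scheme)."""
    if page_number < 0:
        raise ValueError("page_number must be non-negative")
    # The nested format spec pads the number with zeros up to `width` digits.
    return f"{page_number:0{width}d}{suffix}"


if __name__ == "__main__":
    # Without padding, "10.pdf" sorts before "2.pdf"; with padding the
    # lexical order matches the page order.
    names = [page_file_name(n) for n in (10, 2, 1)]
    print(sorted(names))
```

Padding only helps while the page count fits in the chosen width, so the width has to be picked with the largest expected document in mind.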
OCRmyPDF.sh: conversion to PDF/A added

2c23bca913ecb8c17c4947cc2ac74ce55380986a authored almost 12 years ago
OCRmyPDF.sh: computation of resolution

Added computation of resolution of each PDF page
Added extract of image of pgm if colorspace is G...

4188d702edeb0225be9144f4cef89fe5f4b6d31c authored almost 12 years ago
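
For readers unfamiliar with how a page resolution can be derived, here is a hedged sketch in Python. It is not taken from OCRmyPDF.sh; it only assumes that the pixel size of the embedded image and the page size in PDF points are known, and relies on the default PDF user space unit of 1/72 inch. The names `image_dpi` and `POINTS_PER_INCH` are hypothetical:

```python
POINTS_PER_INCH = 72.0  # default PDF user space unit: 1 point = 1/72 inch


def image_dpi(pixels: int, size_in_points: float) -> float:
    """Resolution along one axis of an image that fills `size_in_points`."""
    if size_in_points <= 0:
        raise ValueError("size_in_points must be positive")
    # Inches covered on the page = points / 72, so dpi = pixels / inches.
    return pixels * POINTS_PER_INCH / size_in_points


if __name__ == "__main__":
    # Assumed example: a 2480 x 3508 pixel scan placed on an A4 page
    # (about 595.276 x 841.89 points).
    x_dpi = image_dpi(2480, 595.276)
    y_dpi = image_dpi(3508, 841.89)
    print(round(x_dpi), round(y_dpi))
```

Computing both axes separately makes it possible to compare the horizontal and vertical resolutions before passing a single value on to later steps.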
OCRmyPDF.sh: prepare intelligent image extraction

preparation of extraction of the image in the same resolution as the
original image inside the...

318c77b93412a376d18aa0f569e647f5e4ce457f authored almost 12 years ago
OCRmyPDF.sh: new cmd line I/F of hocrTransform.py

Adapted to new cmd line I/F of hocrTransform.py

b041c0080b4a85b7c75977d38e45cb461e540abe authored almost 12 years ago
hocrTransform.py: cmd line interface improved

Command line interface improved in order to allow:
- show bounding boxes border
- set OCR resolu...

ed938788511249c1d73d166976fbfff8c273a6dc authored almost 12 years ago
hocrTransform.py: moved size computation to init

df56c134e416c8ff914c976e91a78c918d9c77de authored almost 12 years ago
hocrTransform.py: A4 page size corrected

c51babfd279aa1ee8ccf0e92b03b9127a6b7fd71 authored almost 12 years ago
hocrTransform.py: license added

8fdbfc3c95bb0d94e14a09a85fba82979d570fb0 authored almost 12 years ago
hocrTransform: code cleanup

4d378c3b148802ecabb0f13ac75c88226907f852 authored almost 12 years ago
readme: warning that still in development

81d5b7b5e52d7e8adde9daf531b964743ff9fd88 authored almost 12 years ago
hocrTransform: code cleanup

accc082b918487e0bb85d45a87d9d9edc492ce1a authored almost 12 years ago
initial version

4e4b5ddc58067a91d019e2f9e3fad3e00cbd1f80 authored almost 12 years ago
gitignore, gitattributes and releaseNotes added

4202826dfac73040baa21bcf7c5fe7d76bcd1e2c authored almost 12 years ago
Update README.md

b011ddd2d950edfccc020286931f43088c300dc8 authored almost 12 years ago
Initial commit

7972a156fc441c33cd6ddd60c9ff793fc523c3ff authored almost 12 years ago