Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
https://github.com/ocrmypdf/OCRmyPDF
efce7de9aea113015babdb0a301bcb2b5bd47c0c authored about 11 years ago
concatenation is now done also with ghostscript
38c64ac689aa9a3d40d0b786c8dbd38fc4e6e5c9 authored about 11 years ago
6d203e3eee87137220ca0275a95ad2b2973f9126 authored about 11 years ago
81f461e5576af86859fd55a452bbb3071afd7cfc authored about 11 years ago
mktemp: consider both FreeBSD/OSX and Linux OS having incompatible
syntax
From now on temporary ...
aedbabdbe8d0965612a4eaeb409e33b56a510808 authored about 11 years ago
6ed53e53c7d95d9245f372b60204fdb078fb9834 authored about 11 years ago
Signed-off-by: Mansour Behabadi <[email protected]>
a78630ce99c9ec8844f3c6a14b1a195b535909c0 authored about 11 years ago
Signed-off-by: Mansour Behabadi <[email protected]>
6653066784275ccacd2d600405b23bf88d0b9e41 authored about 11 years ago
e40f1fa0811e9fb06100c548453c5aa406a6a8cc authored about 11 years ago
to be make which parameters are allowed to be changed by the user
a872ce751d33f9a7faf88eb9927839b3f19ae2b5 authored about 11 years ago
317846fbdca262026c76c6c1bb72836fae95791b authored about 11 years ago
Fix temporary folder name generation collisions
f581a5554416cc00dfe659b6edba9942eaf93ba0 authored about 11 years ago
447b291e70aa45ab74ff3d7484db834afcfb732f authored about 11 years ago
01d07253e84c92753f9ca4f31cce201fe291ff9e authored about 11 years ago
Fix AttributeError on self.width if Tesseract finds no OCR text
034a4660942bf25072821c79de5907bbaa8d4502 authored about 11 years ago
Verify that pdftoppm is the Poppler version, not xpdf version
c6211e23354de7110f66f1b94e6b2d39e31f8ccd authored about 11 years ago
1d03a6417dce15b179e72f2e259a328d8d8bcedc authored about 11 years ago
self.width remains undefined unless hOCR finds text. It might not, if
a page contains only an i...
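The failure mode described in this commit message (an attribute that is only assigned when OCR text is found) can be sketched as follows. This is a minimal illustration, not the project's code: the class name HocrPage, its constructor argument, and the has_text helper are hypothetical, and word boxes are assumed to be (x0, y0, x1, y1) tuples.

```python
class HocrPage:
    """Hypothetical stand-in for a class that derives page size from OCR boxes."""

    def __init__(self, word_boxes):
        # Assign defaults up front so the attributes always exist,
        # even when the OCR engine recognized no text on the page.
        self.width = None
        self.height = None
        for x0, y0, x1, y1 in word_boxes:
            # Only reached when at least one word box is present.
            self.width = max(self.width or 0, x1)
            self.height = max(self.height or 0, y1)

    def has_text(self):
        # Callers can test this instead of hitting an AttributeError.
        return self.width is not None


empty_page = HocrPage([])                  # e.g. a page with no recognized text
text_page = HocrPage([(0, 0, 100, 50)])
```

Without the two default assignments, reading empty_page.width would raise AttributeError, which is the kind of error the commit title refers to.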
First, the regular expression matches everything after the first period
in a filename. Adding t...
bf02ee3bdc7399fd5d34b9229d493eff6b754671 authored about 11 years ago
a3c7fba02d5cff767eb7174707cadb2562e75d32 authored about 11 years ago
Tell the script that "nbImg" is a number, so that leading/trailing
spaces are removed
20c008b84fd0383929f788c01e43592bb4c8e55c authored about 11 years ago
Check if reportlab and lxml are installed, otherwise exit with an error
7cd73566bef8f93385267da4e87fb03b7819e6fc authored about 11 years ago
e56fd53d0649147d27a499abb93ecdd5e7ad8448 authored about 11 years ago
Fix pdffonts error when filename contains a space
810b1b3b3e335e944524624f840df961c7b63ee9 authored about 11 years ago
cb0b033fe76ee91605eece0d71d3426b67c57d74 authored about 11 years ago
46f673a3b7696b2b4357d39ff732b3cfb2c1939d authored about 11 years ago
455303b3d47a413086be96641ad1535d6ce1fb6e authored about 11 years ago
24a84d63803adb1b46ee7b252ed7e66cbc9683ae authored about 11 years ago
9aa21710522896f6aaacc47e383b95f3ebe75181 authored about 11 years ago
3a46ea1f3660536ccb1ddcb1326795d3b30563f5 authored about 11 years ago
d33779f301d4c116b5d179a70407cd8699f63335 authored about 11 years ago
d6ea0793b8434c6a879f22b1ce506ac47a65eba5 authored about 11 years ago
4e5e5bb92539f56a4043bbabc9ab7257c5ffcf15 authored about 11 years ago
3232ed8e38c98ba23d7d61cdf72dcd286d7a3cea authored about 11 years ago
29d6748af8f372ddc2360420a355b0bbc0b8a22a authored about 11 years ago
828f1950716d96c6e455974ef16ca7d148f4cf35 authored about 11 years ago
b0b7e327830030d56b8bc24c6b1573ed8623b858 authored about 11 years ago
fixes #41
versions older than 3.02.02 are known to produce invalid hocr output (in
some cases)
940a016e952f7faab737d008d80faf02ee8abb8c authored about 11 years ago
If deskew and/or cleanup is not requested, do not copy the files, but
just create symbolic link....
54f47ab89bb4c7ef8117e6d667c227191c2106e6 authored about 11 years ago
In order to have the debug page after the normal panel in the final PDF
file
414c4e3f3cad4c12278f30211d514c68e268fa2b authored about 11 years ago
6a9f38d31eaae7338292736bb2cd8fa06e89a9c1 authored about 11 years ago
The x/y resolutions are not computed separately anymore.
We do not check anymore if x and y reso...
8a1241ba44e7d9aa028b42f45f2455196bfc8dca authored about 11 years ago
7eab052e0f9d19f279fc46670880ab026ad2d658 authored about 11 years ago
552d19e36b01e74f8efbf93ebf3cdbe5aae0cf76 authored about 11 years ago
463b04e795ac5eae256cc6f3ea4e9ba401d85c7f authored about 11 years ago
c0d8508264afe6e856d25c7c55ce046a3cc638a1 authored about 11 years ago
6ef4ba31e21dbac8f35f23d50539967948d38e35 authored about 11 years ago
10a3d26291a9f2ff74253b50651ece0b8a44635c authored about 11 years ago
ab994b32eefc0dbc0dd65b31973461e69cda4171 authored about 11 years ago
9352b71d787c7309983cec21cf63a2040be3c9e3 authored about 11 years ago
71593421ed6159505b80ea924f922f797192910d authored about 11 years ago
2754970f37d5d2302851080d4c91d37e5dcced4a authored about 11 years ago
Fixes #16
5945454597cb11df2fb8e4e75f8368055b03d4be authored about 11 years ago
884dbce712c6107405e4feb6aa766e697f69a302 authored about 11 years ago
8ee1bc659821c7a62b0e0e538cef8997e0e4792a authored about 11 years ago
7d76c467312485e3814df0242e13ccd5be35d8a8 authored about 11 years ago
f8ccf42c06b4040e10d7a45562922273877e447d authored about 11 years ago
0abe0f1f10c38d6ed51a7f6fe75b360cacde7c5c authored about 11 years ago
ee8a5d80ff17d929d6319bb17edf538247d576cc authored about 11 years ago
f08893b5c8dfa5e6af12d30df626920d52011d33 authored about 11 years ago
Fixes #35
41cd88506e040597b0252936ee4f91837bc34dbf authored about 11 years ago
file committed by mistake... So deleting it now
081223b1381811bac5be7ed8b7df9b4891aa0c8a authored about 11 years ago
4e60c9ba0964e0bcad87d8c796951e16a48d0a46 authored about 11 years ago
- Oversampling resolution can now be set from the cmd line (-o option)
- If a page contains more...
- If resolution is too low (<250dpi) perform automatic oversampling of
the image
- comments impr...
045362425f2170c7a4438e39fadeed29371ab8b8 authored about 11 years ago
2b2637fbc3f43710a0ea2d155c7aea8cf6898601 authored about 11 years ago
- dpi computation moved to a dedicated function
- do not exit in case of resolution mismatch (f...
407670e1f3a5e1a2023146885962ec7f6986b40f authored about 11 years ago
d0671d81b5c91fa97534165e6b25f509633a63ca authored about 11 years ago
7a74ebbcc3cf60633a03802bc8a12e4e9b88d9d9 authored about 11 years ago
9e698003326e5a7e1266b7f85d998b3436641a66 authored about 11 years ago
== does not exist in bourne shell
7542188592a46252bc817b88db16a4c50aa1fcb8 authored about 11 years ago
tell GNU parallel to protect against evaluation by the sub shell (-q
flag).
This is required in ...
- Constants moved to config.sh
- Use "python2" cmd instead of "python"
- few other minor changes
50dee556069f01ddd22e5ecdd8209fff1d4d5cf1 authored about 11 years ago
da5cd01fe48cb62954f38200fe677d4ec379c017 authored almost 12 years ago
new feature: Process several pages in parallel if more than one CPU core
is available
88ddeb1fb64a72ce8e95bf293c4d0e0d51b3079e authored almost 12 years ago
f9e2e74bf3846af7a05bb9ed1035fd9a4427d7fd authored almost 12 years ago
87e01aff607c56ad7e315669a22af9b745708d83 authored almost 12 years ago
Removed feature to add metadata in final pdf file (because it led to a
final PDF file that doe...
- basic implementation of parallel page processing using GNU parallel
- processing around 40% fa...
Conflicts:
OCRmyPDF.sh
Fixes #31
064d4be83cf406a51260e5ac75c4d6531d5e3b75 authored almost 12 years ago
fixes #31
ab536d5678b8441f8830d5add1603d25f620f4fd authored almost 12 years ago
Required to perform OCR of several pages in parallel (using GNU
parallel)
f7923a9761e22a5156000b8ee0c7f1bbf21e7f22 authored almost 12 years ago
fd52650255060b9dabaf6b5665637ac5e147a5ba authored almost 12 years ago
f0fe2951752fd23544e2bed787e47cd6b6e7a85f authored almost 12 years ago
.gitignore file corrected, because it prevented some required jhove
binary files from being chec...
5aa27343e0d399163e1cc5b719c27ae86045bdb6 authored almost 12 years ago
Deleted a number of jhove files that are not required
(documentation and java source code mainly)
...
e4ffb58269a5ce55e223eb4dd87c6f2b728c54ad authored almost 12 years ago