Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

main.txt: wrong pdfminer

github.com/ocrmypdf/OCRmyPDF - e3fce112ed546464525c0bcd3b8c6ff3449cd742 authored about 6 years ago
Mention v6.2.5 release

github.com/ocrmypdf/OCRmyPDF - eacd26a68bd8cde34885d898515b02b4d84b4d53 authored about 6 years ago
Update v7.3.0 release notes

github.com/ocrmypdf/OCRmyPDF - 0e88b3c38a397f4b86c4d80138169d3a6c6d7108 authored about 6 years ago
test: test version check code

github.com/ocrmypdf/OCRmyPDF - a2170ef8d6b4eb1ee6ce550cb380c7df0ea30e6c authored about 6 years ago
Update requirements

github.com/ocrmypdf/OCRmyPDF - eed04243904909e7be61f038d4c17f757fbab7ba authored about 6 years ago
Fix "no languages" test and misuse of os.environ

github.com/ocrmypdf/OCRmyPDF - 5ed05e08b1f71e663316a4b2c8b1c84967c22701 authored about 6 years ago
Leptonica: learn to despeckle 1bpp images

github.com/ocrmypdf/OCRmyPDF - 58b26f6715dde2f1cdf2f9960c6986144c85eeeb authored about 6 years ago
leptonica: reduce boilerplate for PIX (2/2)

github.com/ocrmypdf/OCRmyPDF - 806daf42846de7cefc176f05bbb42472bc3cd4de authored about 6 years ago
leptonica: reduce boilerplate for wrapper classes (except PIX)

github.com/ocrmypdf/OCRmyPDF - c64bc9329ed8d58390a5de2049646db932f92141 authored about 6 years ago
Leptonica: add masked threshold fn

github.com/ocrmypdf/OCRmyPDF - dd0174551932fe70a307c29d14928d9ba0064a5c authored about 6 years ago
Fix two failing tests

github.com/ocrmypdf/OCRmyPDF - 501ce726e719d789963cb7d8be74320b95e1ca5e authored over 6 years ago
Leptonica: reduce verbosity, more error trapping, more garbage collection

github.com/ocrmypdf/OCRmyPDF - 03076e89cee7715f07979b28c7b51160126d4ddb authored over 6 years ago
Integrate barcode masking

github.com/ocrmypdf/OCRmyPDF - 02f37293ee8459bddea26e499bab4a3020a09149 authored over 6 years ago
Leptonica: Add barcode API

github.com/ocrmypdf/OCRmyPDF - 590942ad14c91df4a1455496bfbf8fadb0541bf2 authored over 6 years ago
test: Add a basic redo OCR test

github.com/ocrmypdf/OCRmyPDF - 2ac028c7590611cd1af09f51668a13b42838b0dd authored over 6 years ago
Remove text detection from our parser interpret_contents

It's redundant now

github.com/ocrmypdf/OCRmyPDF - 2125b5bfab7913b1a765bc12b5c911ed124a8122 authored over 6 years ago
Only do detailed page analysis when needed by --redo-ocr

github.com/ocrmypdf/OCRmyPDF - b96532caa4f95990d82df091929a39bd4da9c3d6 authored over 6 years ago
Move Ghostscript text analysis into its own module

github.com/ocrmypdf/OCRmyPDF - 995fc58466535ffe26b9e833f3f4c0893e91f4b6 authored over 6 years ago
Make pdfminer Type3 patch conditional on PScript5.dll

It appears that PDFs created by this software have a bug in their BBox
which will cause us to mi...

github.com/ocrmypdf/OCRmyPDF - c023cae299ddb5bfd5c8bfa32ab7a3c750105ebe authored over 6 years ago
Exception message not printed in some cases

Closes #310

github.com/ocrmypdf/OCRmyPDF - 237eaf9130444c2d1013b803a1f4b553342e5a2e authored over 6 years ago
coverage: test compile leptonica

github.com/ocrmypdf/OCRmyPDF - 8b9ab25125d0b38152e03d2104b3a1c53d9a13b3 authored over 6 years ago
coverage: ensure get_orientation is checked

github.com/ocrmypdf/OCRmyPDF - 77e87abe8f39f0511e28359b3265a6b742420b1b authored over 6 years ago
coverage: improve leptonic; don't create objects with null pointers

github.com/ocrmypdf/OCRmyPDF - 3be02e1e8debdf1aedf5f9cb5ec02f4ccad11b4c authored over 6 years ago
leptonica: barcodes, BOXA

github.com/ocrmypdf/OCRmyPDF - 64c9ede979059f580a1f02f0dbad84e943e2e4cc authored over 6 years ago
coverage: make it more likely timeout is tested

github.com/ocrmypdf/OCRmyPDF - 5b8d197812656fe0c3bfd23e9a63ed95c4915e34 authored over 6 years ago
coverage: ensure rotation is actually tested

github.com/ocrmypdf/OCRmyPDF - 2cba62dc4f2358e6cae4b6d58245e1af26845a1e authored over 6 years ago
coverage: add qpdf

github.com/ocrmypdf/OCRmyPDF - 288e28328f21099210a1cb92aca7a956b8ebfb91 authored over 6 years ago
coverage: exclude unicodefun.py

github.com/ocrmypdf/OCRmyPDF - b8214b3c492b1dcaa0268022f2790045314bed43 authored over 6 years ago
Set up code coverage (it works with multiprocessing now!)

github.com/ocrmypdf/OCRmyPDF - 86816939942444ac7a12041762cb4319bae160e5 authored over 6 years ago
Fix failure to pickle file with AcroForm

github.com/ocrmypdf/OCRmyPDF - 1364c63b7c89046af4ccb959177639cac68d0dad authored over 6 years ago
Add AcroForm detection

github.com/ocrmypdf/OCRmyPDF - 4ba9e8fe256f6c28c816da0fa4d183760751b44d authored over 6 years ago
Throw exception on corrupt text

github.com/ocrmypdf/OCRmyPDF - a195713bb4355f847858804652abdcf5cd572e01 authored over 6 years ago
Require pikepdf 0.3.7

github.com/ocrmypdf/OCRmyPDF - 600d31a9075406dc04616bf707f09232a2d9de3b authored over 6 years ago
Add corrupt text warning (when using --redo-ocr)

github.com/ocrmypdf/OCRmyPDF - be31cec33217977bf58ab85a93c08e31d6e49f03 authored over 6 years ago
Add argument checks for --redo-ocr

github.com/ocrmypdf/OCRmyPDF - 22a7cd34210334334b3a5c8035b4db78e9d3d5d3 authored over 6 years ago
pdfminer: If font descent claims to be positive, treat it as negative

github.com/ocrmypdf/OCRmyPDF - 8b61d2d5214ca236ea4bef182901f7147456b626 authored over 6 years ago
Ensure inline image is parsed correctly

Requires pikepdf > 0.3.6

github.com/ocrmypdf/OCRmyPDF - 559e5269d2ecadf06584310830b5898e60cbc7b2 authored over 6 years ago
pdfminer patch: Type3 font height calculation is incorrect

Not sure where it goes wrong or why it needs special treatment, but
this does address it.

github.com/ocrmypdf/OCRmyPDF - ebf6acb3186eacccdd48e703bb13ce9b3a4c547c authored over 6 years ago
pipeline: fix bbox coordinates

github.com/ocrmypdf/OCRmyPDF - 7acd75f013830f8d170d7278743510db68bed0d7 authored over 6 years ago
Refactor TextboxInfo

github.com/ocrmypdf/OCRmyPDF - 93623b2226db2943940d9dff5d3ca30eb77a1a46 authored over 6 years ago
layout: allow names beginning with /i0123 for now

Showed up in GGastro2.pdf. Need to check if this pattern has valid
Unicode mappings but allow fo...

github.com/ocrmypdf/OCRmyPDF - d71fd089cb9beddfca6e15eb66e09de297f9baae authored over 6 years ago
Require pdfminer

github.com/ocrmypdf/OCRmyPDF - 05aa43c856499d1f6b7d1d6f328fb71f0881ca08 authored over 6 years ago
Fix some failing tests after --redo-ocr changes

github.com/ocrmypdf/OCRmyPDF - de80fb6bc8edb7fc4f57a75255f554d11da97647 authored over 6 years ago
Document --redo-ocr more accurately

github.com/ocrmypdf/OCRmyPDF - 8e396f4be2932851ae5e3a794a6b7968bf1030a6 authored over 6 years ago
Fix error on serializing bad character markers

(Since they held a reference to their font, which in turn, had an
open file handle.)

github.com/ocrmypdf/OCRmyPDF - efec6da377f1c7f43c8b854525f567e8a6efb6a0 authored over 6 years ago
Fix corrupt Unicode mapping detection's false positives

github.com/ocrmypdf/OCRmyPDF - 00ef53195ee7572d938ec21e10480672b53ec22b authored over 6 years ago
Remove only_ocr_text

github.com/ocrmypdf/OCRmyPDF - f564aaf485aa530b8fcb50fcc5ca7ee949b7eb2d authored over 6 years ago
Redo OCR can now handle visible and invisible text, so adjust accordingly

Still can't filter out corrupt text

github.com/ocrmypdf/OCRmyPDF - 5ac2d31d0d5869fde1d0deacc740ec52f87d1d13 authored over 6 years ago
pdfinfo: further layout improvements

Rather than grouping visible/invisible in a custom analysis step,
use pdfminer's analysis and it...

github.com/ocrmypdf/OCRmyPDF - fda890ab47e130cf24c1a5cafa2285513a3a20a2 authored over 6 years ago
Fix some recommendations from LGTM (#309)

* Fix unreachable code

This fixes an issue reported by LGTM.

Signed-off-by: Stefan Weil <s...

github.com/ocrmypdf/OCRmyPDF - a873278c2a27808a164ae5d852b20061d920efc2 authored over 6 years ago
pdfinfo: formatting

github.com/ocrmypdf/OCRmyPDF - e6d64be89062e99ca68a7368cd43298dad62e4b6 authored over 6 years ago
pdfinfo: all -> not any

github.com/ocrmypdf/OCRmyPDF - 0e4d978d200b372318539c2b3f855fb9632a7915 authored over 6 years ago
Fix handling of Type3 fonts with no ToUnicode mapping

github.com/ocrmypdf/OCRmyPDF - b12c2cfedf0a2673f17390e769b81e060edfbf55 authored over 6 years ago
Reorganize around getting bboxes for visible/invisible text

github.com/ocrmypdf/OCRmyPDF - 58cc70725e0c64b02c2ec3293a7979c46071816c authored over 6 years ago
--redo-ocr now works in the presence of printable text

github.com/ocrmypdf/OCRmyPDF - 339afb02aa17ba244deeca1fd18cbe73f5e906f8 authored over 6 years ago
Fix strip invisible text bug: missing BT operator

github.com/ocrmypdf/OCRmyPDF - 7ba0ff5c363d76cb401011dd32dccd6576f3f6cd authored over 6 years ago
Add pdfminer based layout analysis

github.com/ocrmypdf/OCRmyPDF - ff41fbf67347a9d7261f3a65825d655f7f1ee832 authored over 6 years ago
Move pdfinfo into a package

github.com/ocrmypdf/OCRmyPDF - 2435cd23cea5858c1d63871c53a41e95dad6af53 authored over 6 years ago
Rename/expose strip_invisible_text

github.com/ocrmypdf/OCRmyPDF - a063cff720d00b5362e3e0891797bb0526958c7f authored over 6 years ago
option check: Remove always-True condition

Both renderers are now lossless reconstruction-capable. (Have
been since 7.0)

github.com/ocrmypdf/OCRmyPDF - 0d396e1ac07cb141cadb3f9a69b69b47a2253417 authored over 6 years ago
Require pikepdf 0.3.5

github.com/ocrmypdf/OCRmyPDF - f5807a2053118ae33a2fb2c0d55046a8f247acc6 authored over 6 years ago
Fix KeyError 'has_vector'

github.com/ocrmypdf/OCRmyPDF - eb4938a36f78ba5feae8d0ae18c88fa85c6fd8e5 authored over 6 years ago
pdfinfo: reminder about 'INLINE IMAGE' sentinel

github.com/ocrmypdf/OCRmyPDF - c5ad530bbf383aed627ec6d3e469e48698e7d91c authored over 6 years ago
Redo OCR: disallow in cases that will damage the output PDF

github.com/ocrmypdf/OCRmyPDF - d11c428407f42d7cbcee77efbe48132ba7b5d472 authored over 6 years ago
Merge branch 'feature/remove-vectors' into feature/redo-ocr

github.com/ocrmypdf/OCRmyPDF - 6182b1f53eb7af0e314369856e24ffa8ee04e0db authored over 6 years ago
optimize: should remove unreference resources too

github.com/ocrmypdf/OCRmyPDF - 00fc1a12e2b153c1daf3260ea764f243a46c4d78 authored over 6 years ago
Add functional "redo OCR" feature

Needs argument validation and some other changes. Needs testing
with mixed-content PDFs.

Only r...

github.com/ocrmypdf/OCRmyPDF - 16af753206e509fb4276f6d6235829448307814f authored over 6 years ago
Add feature to remove vector graphics objects

github.com/ocrmypdf/OCRmyPDF - fa48205bb8fa74484f542ea1f9ba6ef02f5c2135 authored over 6 years ago
pipeline: if vector graphic objects exist, ensure the DPI is reasonable

github.com/ocrmypdf/OCRmyPDF - f7dbf94071d1e23e55ead96c3cfee13464d80e98 authored over 6 years ago
pdfinfo: learn to detect vector graphic objects

github.com/ocrmypdf/OCRmyPDF - b18e66e2ca0edc8730b0115a08bbafa4b18f7054 authored over 6 years ago
pdfinfo: fix terminology (operands, command) -> (operands, operator)

github.com/ocrmypdf/OCRmyPDF - 7a5504dfa5f9326fce5158d03e72b285fe52ef6e authored over 6 years ago
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - d1cad7bc687a7b5a2c70726419895c9386f92ffe authored over 6 years ago
Add Fedora install instructions. (#304)

* Add Fedora install instructions.

* Fix path to fedora_rawhide badget

github.com/ocrmypdf/OCRmyPDF - c58d5c097cc4b008476cd84e4b7166b4b9abbf29 authored over 6 years ago
docs: some redundancies

github.com/ocrmypdf/OCRmyPDF - 46157ca94ec6985195a2c79ffd7ee5f803f6eae4 authored over 6 years ago
Fix broken badges in README

github.com/ocrmypdf/OCRmyPDF - dd99511bcc2e356e30024072e1bc2b158f9c4952 authored over 6 years ago
Removed extra word from docs (#303)

github.com/ocrmypdf/OCRmyPDF - 5bc2efd3c7cfc3940e27766d4116ed4f7a94355c authored over 6 years ago
Fix filename test.txt

github.com/ocrmypdf/OCRmyPDF - 1b18dbecf50f6e6682b819ec821fde4bc090d547 authored over 6 years ago
v7.2.1 release notes

github.com/ocrmypdf/OCRmyPDF - 9f82c0eb6e9892356f710e9d7e392dc7e295747e authored over 6 years ago
Fix compatibility with pikepdf 0.3.5 API change

github.com/ocrmypdf/OCRmyPDF - 68bac1b177be76a9f81f0745a7f6fc4447415905 authored over 6 years ago
Remove cruft to support leptonica < 1.72 in test suite

github.com/ocrmypdf/OCRmyPDF - 1495b78330a151b2c1a745a763dfa0339edf57be authored over 6 years ago
Include Debian copyright file

github.com/ocrmypdf/OCRmyPDF - 6f777d2848ed7d12d686e2f28deb677cd6846937 authored over 6 years ago
Cleanup MANIFEST.in, reorg requirements/*.txt, fix non-Unicode readme

github.com/ocrmypdf/OCRmyPDF - 5650eba84822c1bbc9e68cde8339ad4d1a634006 authored over 6 years ago
v7.2.0 release notes update

github.com/ocrmypdf/OCRmyPDF - 5bc5dc93f30e2fd70c27edf5d28e1b846ad49fc6 authored over 6 years ago
optimize: Exclude soft masks (SMasks) from optimization

Soft masks are only allowed to be of colorspace DeviceGray so we
shouldn't use pngquant on them....

github.com/ocrmypdf/OCRmyPDF - c1e18bb825d75ce19b47801aeb5d02e5520832fe authored over 6 years ago
optimize: more refactoring

Now properly generalized/specialized where it should be

github.com/ocrmypdf/OCRmyPDF - 58282ea0fb15249b88e793d0dae68aba54149186 authored over 6 years ago
optimize: refactor image extraction

github.com/ocrmypdf/OCRmyPDF - 891da7834cf4991dadcedb1743b972216f2b9b22 authored over 6 years ago
optimize: Reorganize so JBIG2 can be performed on images reduced to 1bpp

Closes #297

github.com/ocrmypdf/OCRmyPDF - 5c229d48d5023bf74e76e943c7ecb1b8c942c691 authored over 6 years ago
Travis: use newer macos image

github.com/ocrmypdf/OCRmyPDF - 53f660cf3595f78e684d91c19f08684375120325 authored over 6 years ago
...and document lossy JBIG2

github.com/ocrmypdf/OCRmyPDF - 7b66ca68f2a3af4b5c18f1d9ec7e8649757d29df authored over 6 years ago
requirements: request pikepdf 0.3.4

github.com/ocrmypdf/OCRmyPDF - ba71c3ffbd34cf3d0666d4f13199367a409acbcc authored over 6 years ago
v7.2.0 release notes

github.com/ocrmypdf/OCRmyPDF - 6707ad427a808505ccf4512917db553330db6858 authored over 6 years ago
Change JBIG2 lossy mode to require --jbig2-lossy

github.com/ocrmypdf/OCRmyPDF - 5b84549716260afe8d62436051f38752d65feb1b authored over 6 years ago
Refactor the detailed error messages

github.com/ocrmypdf/OCRmyPDF - c74f2ee6e8f35871b50180d7c926cb20c79d2f46 authored over 6 years ago
Fix lossless JBIG2 when there are multiple JBIG2 images on a single page

github.com/ocrmypdf/OCRmyPDF - b32dd9f9d36b0599b73568b7a34286781d13392f authored over 6 years ago
Fix suppression of tesseract config error messages

github.com/ocrmypdf/OCRmyPDF - fb8b161f6c0faaea66cb9f4e00d416983a5130d4 authored over 6 years ago
Remove libtiff from Brewfile

For some reason, brew complains about it now.

github.com/ocrmypdf/OCRmyPDF - baddd6d233a1b39f7f18e37d51453d6475f6b828 authored over 6 years ago
tesseract: account for behavior changes when params are missing

Tesseract 4.0-rc1 now accepts invalid parameters in config and
won't return an error anymore. We...

github.com/ocrmypdf/OCRmyPDF - 6f554c6ae885a552c1aead27bf0f7e6c23960f55 authored over 6 years ago
test: fix pytest warning about direct use of a fixture

github.com/ocrmypdf/OCRmyPDF - a71e4488b3e5abb5c1cf8277e5f06e6884cc23c5 authored over 6 years ago
Degrade more gracefully when --optimize is set but JBIG2 is not present

github.com/ocrmypdf/OCRmyPDF - 72156b5653289e3587879450fa3ab6fb7f499b11 authored over 6 years ago
Test: send stderr to stderr, why don't we?

github.com/ocrmypdf/OCRmyPDF - 9fa471e05373e7d50ea9f76a5d0c522fedd4ba0b authored over 6 years ago