Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Fix issue #275: doesn't work when installed in non-Unicode path

Closes #275

github.com/ocrmypdf/OCRmyPDF - 58642aa98b7df984b42e7a292d8f60e5c02e6c69 authored over 6 years ago
Fix wrong return code tested

github.com/ocrmypdf/OCRmyPDF - 7baaf00a38d7db757710d8627b262a6f28349bbd authored over 6 years ago
pdfinfo: more robustness

github.com/ocrmypdf/OCRmyPDF - 5cc23dbf2413ec2a296d3af58f70a8ed85e7d410 authored over 6 years ago
pdfinfo: improve the regex

github.com/ocrmypdf/OCRmyPDF - 216d60ea2c2f227a6772fe5eedbbbd8d761cb103 authored over 6 years ago
Fix invalid XML characters choking parser

github.com/ocrmypdf/OCRmyPDF - 8b0496d35ed94f37d52303de39f182951342c96e authored over 6 years ago
Return a distinct error code if PDF/A fails

github.com/ocrmypdf/OCRmyPDF - e44001641c49324ac7ff4888dffb14f07ba048fa authored over 6 years ago
Remove initial qpdf.repair

Since pikepdf is doing the work, the initial repair takes time and gives
little benefit.

It turns...

github.com/ocrmypdf/OCRmyPDF - 47885f4230dd0f0618eeb4be025c56d2fd9c1040 authored over 6 years ago
ocrmypdf.exec: trap FileNotFoundError too

github.com/ocrmypdf/OCRmyPDF - 921767e82e89f859694f0d300b97a570568376ab authored over 6 years ago
Add test to optimize if jbig2 is present

github.com/ocrmypdf/OCRmyPDF - 85f96b7fb0179363ad7a0ba02a9d1b9637772a5a authored over 6 years ago
optimize: allow modification of quality settings in command line mode

github.com/ocrmypdf/OCRmyPDF - 890c7fd0f659a55567e49e10dce5824450a55631 authored over 6 years ago
Don't use --optimize in test since jbig2enc is not always installed

github.com/ocrmypdf/OCRmyPDF - 39c44bdd2f3438436e40d484d98e72cfd7ea96f0 authored over 6 years ago
Upgrade to Py3.7 locally and resolve a few issues

github.com/ocrmypdf/OCRmyPDF - 5f99f7f6ca39a5cf9e892667138b517c89964a85 authored over 6 years ago
Update macOS Brewfile

github.com/ocrmypdf/OCRmyPDF - 4f864bce983646d1e362c1ea130ebaf45f6fc77d authored over 6 years ago
Make jpeg/png quality tunable args

github.com/ocrmypdf/OCRmyPDF - 2974929b26036d0ac24956a48f0bf5996ceabc38 authored over 6 years ago
Improve release notes

github.com/ocrmypdf/OCRmyPDF - db837aa55cbb316e45ddd40299d181a21549a9a3 authored over 6 years ago
Fix installation for Python 3.7

Need to use private fork of ruffus for Python 3.7. Backward compatible with Python 3.6 for ruffu...

github.com/ocrmypdf/OCRmyPDF - 72006230078727f166f595e3ffde02613a422209 authored over 6 years ago
Hopefully work around Py3.5 marshal error

https://github.com/eliben/pycparser/issues/251

github.com/ocrmypdf/OCRmyPDF - 73e02ae4ea12397847b34306a8230e12766a2ea7 authored over 6 years ago
Update test cache with naming rule change

github.com/ocrmypdf/OCRmyPDF - d4cbef94571917beb1ade4e01843df6ccadd8511 authored over 6 years ago
Optimize some of our bigger test files

Only partially optimize multipage.pdf so that it hopefully
improves speed of test suite without ...

github.com/ocrmypdf/OCRmyPDF - ed8ff79e101d8dfe7faefa470ed876f36140aa9d authored over 6 years ago
Add test case to ensure mono is not inverted

github.com/ocrmypdf/OCRmyPDF - e725f64b6a2c656df2c170b1b500b45f4e4f2d54 authored over 6 years ago
optimize: fix PNGs that were reduced to 1-bit being inverted

At some point the color gets flipped, so we have to flip it again
for mono.

Incidentally this exp...

github.com/ocrmypdf/OCRmyPDF - 0029cc4fe75214f7ab4a356646286becff1a19d3 authored over 6 years ago
Fix test resources naming inconsistency

github.com/ocrmypdf/OCRmyPDF - 9637696a546d4fd115d65393762b5719388a3315 authored over 6 years ago
Compress test images more heavily

github.com/ocrmypdf/OCRmyPDF - 02b3ca6862be7a4036460d2fa71df80d04cf55a1 authored over 6 years ago
Replace all Pix.read with Pix.open

github.com/ocrmypdf/OCRmyPDF - bc90f40a8fd76d3a30311e6d55a2bd188dd34a29 authored over 6 years ago
Fix leptonica: remove_colormap was replaced with a no-op at some point

github.com/ocrmypdf/OCRmyPDF - 3d727ff4c03c884e55c1ad692826fc3cc725076a authored over 6 years ago
Add Python 3.7 support

github.com/ocrmypdf/OCRmyPDF - b0eacd6586c9e8217f9b4924e7093ae976360e7d authored over 6 years ago
Merge branch 'test/ignore-masks'

github.com/ocrmypdf/OCRmyPDF - 779570159581e3c9509ad50b64106ddc719ca8a3 authored over 6 years ago
Use newer pikepdf API for objgen

github.com/ocrmypdf/OCRmyPDF - bf214eecb36e6914cb040e2a4897b70fa2fcafea authored over 6 years ago
optimize: skip incremental images if any

These are fairly rare

github.com/ocrmypdf/OCRmyPDF - 434b96d7348c691b348ecd5af0bdf55f16533148 authored over 6 years ago
optimize: use new pikepdf api for objgen

github.com/ocrmypdf/OCRmyPDF - b9dc1098928e0886f54b09a59eeb7cc9322493e0 authored over 6 years ago
Use qpdf 8.0.2 backport, force old pytest-timeout to fix build

github.com/ocrmypdf/OCRmyPDF - 1f40a7055408e30628470c724ff694af3d67f7da authored over 6 years ago
v6.2.1 release notes

github.com/ocrmypdf/OCRmyPDF - e14ffbf03f7be91319e856de66315b9475279cd2 authored over 6 years ago
Fix recent versions of tesseract not registering as textonly_pdf

This change happened sometime after the 4.0.0-beta1 release in
Ubuntu 18.04

github.com/ocrmypdf/OCRmyPDF - 25a1dde57cbab92ef0a6e2f077a37eac75f84ff6 authored over 6 years ago
Ignore whether or not textonly_pdf was used in cache

The difference doesn't matter in 7.0.0 anymore.

github.com/ocrmypdf/OCRmyPDF - bf96171b6514c010189fd80e5baca34b8f18e141 authored over 6 years ago
Fix recent versions of tesseract not registering as textonly_pdf

This change happened sometime after the 4.0.0-beta1 release in
Ubuntu 18.04

github.com/ocrmypdf/OCRmyPDF - b7ff821fa3f6318b3ae45f66526f038209f9854b authored over 6 years ago
Regenerate test cache

github.com/ocrmypdf/OCRmyPDF - b81daf71d1b8d651f9158eae28b4f968adc0e440 authored over 6 years ago
Reactivate two tests that weren't using their fixtures properly

github.com/ocrmypdf/OCRmyPDF - faad1fc58ae85b82c60c19a9451dd1f810ef87d9 authored over 6 years ago
Disable a pylint

github.com/ocrmypdf/OCRmyPDF - 6f48181a56abc608a3c88ce81c88ed213660f2f1 authored over 6 years ago
pdfa: fix function using closure when it shouldn't

github.com/ocrmypdf/OCRmyPDF - f1305e5a375018c2093c633b2b1f5d023ad48d1e authored over 6 years ago
leptonica: fix variables defined on class outside __init__

github.com/ocrmypdf/OCRmyPDF - f0e0f92776aafffc2cac7fea14995c1f198bd868 authored over 6 years ago
Trailing whitespace

github.com/ocrmypdf/OCRmyPDF - 807c8b072638c7f8852f559ef640f88ed1b3f3d7 authored over 6 years ago
Cleanup some cases where log was lazy and should be

github.com/ocrmypdf/OCRmyPDF - 6333ec928ca679b79d3070310eb22704671a8871 authored over 6 years ago
pipeline: search_window variable not actually used

github.com/ocrmypdf/OCRmyPDF - cd220d9ed9b82d94812ecfdd00cea8c1662048f3 authored over 6 years ago
tesseract.get_orientation: removed unused language parameter

github.com/ocrmypdf/OCRmyPDF - 76532649b829cdbf2650fdbfe7ae9911780157f5 authored over 6 years ago
Cleanup unused imports

github.com/ocrmypdf/OCRmyPDF - b0dbaeafc52de3f77211e0cd7f618961858e9490 authored over 6 years ago
Fix several pylint errors and warnings

github.com/ocrmypdf/OCRmyPDF - 2530d1791bc608010ba061a2e3b07a5a3523d5d9 authored over 6 years ago
Remove qpdf.merge

We no longer need to merge pages this way. Much of the functionality
was there to implement page...

github.com/ocrmypdf/OCRmyPDF - 94150f414adf154bda55573e1c442f91cbd86119 authored over 6 years ago
Remove special handling of TypeError from ruffus

split_pages would still run if repair_pdf failed, for some reason.
Since we are no longer splitt...

github.com/ocrmypdf/OCRmyPDF - 54e74f84cc4b438fcbfecc3c55776e4b9d39a5f4 authored over 6 years ago
Replace several uses of str(path) with fspath(path)

Helps make it more explicit. Did not do this to tests because use of paths
is more involved there.
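A short illustration of why `os.fspath()` is more explicit than `str()`. This sketch is not taken from the OCRmyPDF source; it only uses the standard library (`os.fspath`, available since Python 3.6) to show that `fspath` accepts only `str`, `bytes`, or `os.PathLike` objects, while `str()` converts anything without complaint.

```python
import os
from pathlib import Path

# os.fspath() returns the filesystem representation of a path-like object.
p = Path("input") / "scan.pdf"
as_fspath = os.fspath(p)  # same text as str(p) for a Path

# str() converts any object, so a wrong argument slips through silently.
wrong = str(42)  # '42', no error raised

# os.fspath() refuses objects that are not str, bytes, or os.PathLike.
rejected = False
try:
    os.fspath(42)
except TypeError:
    rejected = True  # an int is not path-like
```

The practical benefit is that a non-path value passed by mistake fails loudly at the conversion point instead of producing a bogus filename later.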

github.com/ocrmypdf/OCRmyPDF - 76e7e8dbbb5d9fe920ddaa5ec9c1bff5182d81bb authored over 6 years ago
Remove helpers.universal_open()

This helper function only had a single usage; this was always an awkward
way to support Python 3...

github.com/ocrmypdf/OCRmyPDF - 324598e9924b908730a317455710920ec4589050 authored over 6 years ago
Rename _optimize to optimize.py

github.com/ocrmypdf/OCRmyPDF - 9e765ddf4644c70b8bf85433c947d76810d5c239 authored over 6 years ago
Fix PEP8 docstring convention misuse in a few places

github.com/ocrmypdf/OCRmyPDF - 6ac9e92f17f86e4aac38d6c7e804f4103921caee authored over 6 years ago
Ghostscript, PDF/A: support pathlib

github.com/ocrmypdf/OCRmyPDF - faaa4a1def562ce8ca83884e4cc9f9a9952d6685 authored over 6 years ago
Remove fitz from Travis

github.com/ocrmypdf/OCRmyPDF - 0aa51f0f3ab3685f35b368533db747cf38dc3e26 authored over 6 years ago
Remove obsolete _naive_find_text

github.com/ocrmypdf/OCRmyPDF - 73431d9761eb10be02acd06e602b8627e9c72c56 authored over 6 years ago
Remove other references to PyMuPDF

github.com/ocrmypdf/OCRmyPDF - 45cb4525cf83894919842b67c82a6cd8ba0bfe1a authored over 6 years ago
Use Ghostscript for text region detection

Ghostscript txtwrite seems to be quite effective at the task.

Eliminates dependency on fitz

github.com/ocrmypdf/OCRmyPDF - 8c84c515b6e5c06290b40cce708c5b57765fdb87 authored over 6 years ago
Adjust for pikepdf API change

github.com/ocrmypdf/OCRmyPDF - 1dfbbdebf4affdf1afd1ebecc2fcadc4392dfe7a authored over 6 years ago
Create debug envvar to override Creator or Producer

Note that Ghostscript always overrides Producer

github.com/ocrmypdf/OCRmyPDF - 740918daeedd46b2a6ea072afdfd6c4260a501b4 authored over 6 years ago
Add wiki link to issue template

[ci skip]

github.com/ocrmypdf/OCRmyPDF - 1d10eac764dc16db7ab34dcccf08493fb42ad6de authored over 6 years ago
Remove gpg

[ci skip]

github.com/ocrmypdf/OCRmyPDF - 3f868118cd117e95375d30824713d86ad8c86974 authored over 6 years ago
optimize: fix error in Py3.5

github.com/ocrmypdf/OCRmyPDF - 04d79b15b4551a248186ac11d1485f02148d7d88 authored over 6 years ago
Suppress some spurious tesseract errors

github.com/ocrmypdf/OCRmyPDF - a13c398c064743371ba3e1245a858a7b59d53474 authored over 6 years ago
optimize: use tempdir for cmdline invocation

github.com/ocrmypdf/OCRmyPDF - e3b3f716ee39b12c4bb3e3daac949ecf78feaf1e authored over 6 years ago
Use python-xmp-toolkit for xmp check

Eliminates PyPDF2 and defusedxml as dependencies.

github.com/ocrmypdf/OCRmyPDF - cf43c06f46504d4d53ab62e7b850cd74b9a39d50 authored over 6 years ago
Tweak release notes

github.com/ocrmypdf/OCRmyPDF - 74a5a18607e32abd27c2081363d9e9bec3435717 authored over 6 years ago
Travis: remove deploy to testpypi since it's broken

github.com/ocrmypdf/OCRmyPDF - 44241c6dd531b28885e3a8fcdb24e59923b5d0ab authored over 6 years ago
Fix Py3.5 not understanding os.path.exists(Path(...))
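Background for the fix above: `os.path` functions only began accepting `pathlib.Path` objects in Python 3.6 (PEP 519, which added `os.PathLike`). A minimal sketch of a portable check, assuming a hypothetical helper named `exists_compat` (not from OCRmyPDF), converts the path with `str()` first, which works on 3.5 and later.

```python
import os.path
from pathlib import Path


def exists_compat(path):
    """Hypothetical helper: check existence in a way Python 3.5 accepts.

    On 3.5, os.path.exists() did not accept Path objects, so convert
    to str first; on 3.6+ this conversion is harmless.
    """
    return os.path.exists(str(path))
```

On Python 3.6 and later, `os.fspath(path)` would be the more explicit conversion, but it does not exist on 3.5.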

github.com/ocrmypdf/OCRmyPDF - 8fff496ffd0c5ff935169a490669ec078b81950a authored over 6 years ago
Update v7 release notes

github.com/ocrmypdf/OCRmyPDF - edf75c519cb81f985d6a62e7439cdca6289cb43a authored over 6 years ago
Remove all uses of PyPDF2 except PDF/A check

Leave PDF/A check alone for now, since pikepdf has no equivalent.

github.com/ocrmypdf/OCRmyPDF - 9608b22d347019faf52ce6fe8cce961fab3d1f4d authored over 6 years ago
pdfinfo: more robustness

github.com/ocrmypdf/OCRmyPDF - 8ba4968c4825ff5d4d4d138841759979f78b89db authored over 6 years ago
pdfinfo: Fix text_operators type not changed in related commit

github.com/ocrmypdf/OCRmyPDF - ffdd78f1a56c19a76ea04cd5aba71f6351402043 authored over 6 years ago
pdfinfo: reinstate stack normalization for q/Q

github.com/ocrmypdf/OCRmyPDF - ad9f8ca78e0ccafefe0c426b8998111e9fafd5cf authored over 6 years ago
Consider qpdf behavior on algo4 a pass

qpdf opens files with null user password, so do the same.

github.com/ocrmypdf/OCRmyPDF - 78a686ecb446855eccaa60feb4d156ed708763ee authored over 6 years ago
Remove old code to deal with single page only things

github.com/ocrmypdf/OCRmyPDF - 59e786eb3c4170788466ec92b8c6681c908eec3c authored over 6 years ago
Use OperandGrouper whitelist

github.com/ocrmypdf/OCRmyPDF - 6d0461435f9908025bb87749a89aa2988625966a authored over 6 years ago
Document need for pdfinfo to be pickleable

github.com/ocrmypdf/OCRmyPDF - 0a04a60f69f441d303ea14283ef32e1435049b86 authored over 6 years ago
Found out this test was extremely slow - no reason to actually use a large file

github.com/ocrmypdf/OCRmyPDF - 68d864298804b35f6edf0747d65e6e8dd751906a authored over 6 years ago
Main changeset for pikepdf-based refactor of pdfinfo

github.com/ocrmypdf/OCRmyPDF - 16f70ff054854815422d7967e43866d151875dda authored over 6 years ago
Add scratch file

github.com/ocrmypdf/OCRmyPDF - c00aeafff0b0fef423a1d7ae10cb5bf2adbcced8 authored over 6 years ago
Start removing PyPDF2

github.com/ocrmypdf/OCRmyPDF - 83f35e00f3edaf06725e7af86ce859501053e391 authored over 6 years ago
Make optimize test do a little more

github.com/ocrmypdf/OCRmyPDF - 786a2ad65a7069f6416d99c38a42870b3b9cb826 authored over 6 years ago
Use pikepdf to handle paletted images

Removes all use of PyMuPDF in optimize

github.com/ocrmypdf/OCRmyPDF - 9425506c2aac1693962830602e17fd7cc9f3c638 authored over 6 years ago
Remove qpdf appimage support for now, check for pngquant

github.com/ocrmypdf/OCRmyPDF - 93b858afd1df04ad1aae6de3daf157eba95e7315 authored over 6 years ago
Add notes for v7

github.com/ocrmypdf/OCRmyPDF - 7b0a3ec3653defd06305f68cfe317a171e99cee4 authored over 6 years ago
main: wording change

github.com/ocrmypdf/OCRmyPDF - 083d442529b40e149a0f38541ea171d8870c4829 authored over 6 years ago
optimize: use pikepdf to save PIL images

Eliminates another usage of PyMuPDF in the main path.

github.com/ocrmypdf/OCRmyPDF - b52eb95cf8dd210dc8260634eee978e7e4713d48 authored over 6 years ago
Ensure we try to compress anything that's not compressed when saving

github.com/ocrmypdf/OCRmyPDF - f4571e25083977e9e8d114dab4a41b7b700d1321 authored over 6 years ago
pipeline: use the resolution of the OCR image rather than recalculating

(Recalculating would fail if the image is not centered.)

github.com/ocrmypdf/OCRmyPDF - b06ef03aac29588e2526f487e260425289a03762 authored over 6 years ago
weave: fix rescaling logic

rotation % 90 == 0 is always true.

github.com/ocrmypdf/OCRmyPDF - 1d1962a106b34c76ae4e2d921a87fdb0baad7f14 authored over 6 years ago
weave: if we don't have textonly_pdf, delete instruction to draw image

github.com/ocrmypdf/OCRmyPDF - 4b98e9ff08b1123139571be90a1956014768d529 authored over 6 years ago
weave: whitespace

github.com/ocrmypdf/OCRmyPDF - f83ca5d8ac9c57f287cc8bd05819359b5165c02f authored over 6 years ago
pipeline: make /Info from indirect object as required

github.com/ocrmypdf/OCRmyPDF - 95cb4d22d7eb1877860f510f3ce823e31b2c7fc4 authored over 6 years ago
Fix test failure on missing JobContext

github.com/ocrmypdf/OCRmyPDF - 0c279b01a4bad739234596a1cce7c35d1ef63ccc authored over 6 years ago
test_metadata: change from xfail to skipif without fitz

github.com/ocrmypdf/OCRmyPDF - 3b820ffa7b26d3895634fbfc8d4704622d0b92a5 authored over 6 years ago
pipeline: remove fitz-based attempt to repair table of contents

Prior to unsplit, if we were rebuilding the PDF we'd lose the
table of contents. With unsplit we...

github.com/ocrmypdf/OCRmyPDF - 35cb416563fb9c072caaf9de14af62d6fd26896a authored over 6 years ago
pipeline: remove old page merge strategies

github.com/ocrmypdf/OCRmyPDF - cdb737259c09b260f222bc2698fe2400b55dafdf authored over 6 years ago
pipeline: Move weave* to its own file

github.com/ocrmypdf/OCRmyPDF - 0843b5939ce2bb01353613c3f0cb86e9452e03cd authored over 6 years ago
Add code to repair ToC with pikepdf

github.com/ocrmypdf/OCRmyPDF - 2b5f23a2d1f5303bb2bc8c1fe2a80cde88bca27d authored over 6 years ago
metadata: Fix failing test on __getitem__['/CreationDate']

github.com/ocrmypdf/OCRmyPDF - 5e20d1d5540629bc769573db8cdcbb12a4e4ea49 authored over 6 years ago