Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Add a simple test for image to PDF

github.com/ocrmypdf/OCRmyPDF - e70387b1af33e5b6a5a4c5c86a4b1223236ad462 authored over 8 years ago by James R. Barlow <[email protected]>
PDF/A: handle case of no XMP metadata gracefully

github.com/ocrmypdf/OCRmyPDF - 44f47fba216bf7927897d5bdfc0d9fdd1d946cdf authored over 8 years ago by James R. Barlow <[email protected]>
Suppress NUL bytes in metadata from input files

github.com/ocrmypdf/OCRmyPDF - 02584094a1bcfcc29002ac2a31f8303ef643de30 authored over 8 years ago by James R. Barlow <[email protected]>
Add test cases for --output-type

github.com/ocrmypdf/OCRmyPDF - 91d715ac934d0f8c357b335678eea95f26b0c298 authored over 8 years ago by James R. Barlow <[email protected]>
Complain if Chinese is requested with settings known to not work

Should extend test for other Asian languages

github.com/ocrmypdf/OCRmyPDF - 35addb8a33485986766e80b1580a023a4bfbdbd8 authored over 8 years ago by James R. Barlow <[email protected]>
Remove dead code from qpdf merge + PyPDF2 metadata patching

I tried "qpdf merge + PyPDF2 metadata patching" first. The problem is
that PyPDF2 produces a 1.3...

github.com/ocrmypdf/OCRmyPDF - d32ea8d0ddc321639ee4a231b77561e63ce1be02 authored over 8 years ago by James R. Barlow <[email protected]>
Improve PDF/A validity checking at end

github.com/ocrmypdf/OCRmyPDF - 12575d594a0fa9468a892feb52473a0cd588d5e7 authored over 8 years ago by James R. Barlow <[email protected]>
Fix failing test case - unbound local variable in finally block

github.com/ocrmypdf/OCRmyPDF - 0746083301a7d074525a3fa2a970c732cb0c65dd authored over 8 years ago by James R. Barlow <[email protected]>
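
The commit above does not show the failing code, so the following is only a minimal sketch of the general pattern its title describes: a name assigned inside `try` and used in `finally` is unbound if the assignment never ran. The functions `broken_read` and `fixed_read` and the missing path are hypothetical and are not taken from OCRmyPDF.

```python
import os
import tempfile


def broken_read(path):
    """Hypothetical example: 'f' is unbound in 'finally' if open() raises."""
    try:
        f = open(path, "rb")
        return f.read()
    finally:
        f.close()  # raises UnboundLocalError when open() failed


def fixed_read(path):
    """Same logic, but the name is bound before 'try' and checked in 'finally'."""
    f = None
    try:
        f = open(path, "rb")
        return f.read()
    finally:
        if f is not None:
            f.close()


# A path assumed not to exist, used only to trigger the failure.
missing = os.path.join(tempfile.gettempdir(), "no-such-dir-for-demo", "input.pdf")
```

With the fix, the caller sees the original `FileNotFoundError` instead of a misleading `UnboundLocalError`.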
Experimental change to use qpdf to merge files (disables Ghostscript)

All but one test passes: test_input_file_not_a_pdf

Not sure if PyPDF2 metadata generation will m...

github.com/ocrmypdf/OCRmyPDF - 5c99acf6d12997e337d582f5517c66038dd62a2f authored over 8 years ago by James R. Barlow <[email protected]>
leptonica: note about when it may be safe to drop <1.72 workaround

github.com/ocrmypdf/OCRmyPDF - 2b10df7b7449d0e08df457add75f4603f1f3dc6b authored over 8 years ago by James R. Barlow <[email protected]>
Functional qpdfmerge with PyPDF2 for DocumentInfo block

Tests mostly passing. For the moment this is the new default.

Although PyPDF2 produces a PDF-1....

github.com/ocrmypdf/OCRmyPDF - ebe68de4ff9bd033dd7fe793b334b32c77fc0665 authored over 8 years ago by James R. Barlow <[email protected]>
Experimental qpdf merging

Does not copy /Catalog metadata, but otherwise functional

github.com/ocrmypdf/OCRmyPDF - b17c6a146d18ff22d095b575842d32a05493508d authored over 8 years ago by James R. Barlow <[email protected]>
Clarify trusty/precise stuff

github.com/ocrmypdf/OCRmyPDF - 46d837c86652514fceaa969869e677da54aeafa1 authored over 8 years ago by James R. Barlow <[email protected]>
Fix typo in readme

github.com/ocrmypdf/OCRmyPDF - 24856b61e41ab06cceee5f042c694827c0dae5b4 authored over 8 years ago by James R. Barlow <[email protected]>
pyvenv -> python3 -m venv

Sadly the Python developers are removing this script

github.com/ocrmypdf/OCRmyPDF - 8d0c6ff616d8d3f5b4ed1cbab6b445f39a5543f3 authored over 8 years ago by James R. Barlow <[email protected]>
ocrmyimage: complain about ICC profiles being presumed

github.com/ocrmypdf/OCRmyPDF - 0b24f971cd7f38984696d2eccc9ea9d348f88473 authored over 8 years ago by James R. Barlow <[email protected]>
Don't overload --oversample, use --image-dpi instead for images

github.com/ocrmypdf/OCRmyPDF - bc5d3824bd0c4893e8d37e3ccbd2cac255cee58e authored over 8 years ago by James R. Barlow <[email protected]>
Suppress overly long stack traces on traverse_ruffus_exception

github.com/ocrmypdf/OCRmyPDF - 43569837071ffc8dfb4a07fe07a892cff44ae4fb authored over 8 years ago by James R. Barlow <[email protected]>
More cleanup of exception related errors

github.com/ocrmypdf/OCRmyPDF - 2414b79ee61c1352873578f36544e5e6a56f2a88 authored over 8 years ago by James R. Barlow <[email protected]>
Refactor image file triage

github.com/ocrmypdf/OCRmyPDF - 968e1546f01dd3ec0d17bf6145599dff8bbb21c5 authored over 8 years ago by James R. Barlow <[email protected]>
Update release notes and readme

github.com/ocrmypdf/OCRmyPDF - 48213c9c3f8e91295566979a2c913ce71f75829b authored over 8 years ago by James R. Barlow <[email protected]>
Refactor "is this an iterable that's not a string?" test

github.com/ocrmypdf/OCRmyPDF - f385772d212fb6eb7fb7e7b233fb3a10b3eac50a authored over 8 years ago by James R. Barlow <[email protected]>
Most tests were failing at split_pages()

It seems that ruffus sometimes decides to send a ['inputfile.pdf']
instead of a bare string.

github.com/ocrmypdf/OCRmyPDF - d257c835200904331295259ca4e156c566d2eed9 authored over 8 years ago by James R. Barlow <[email protected]>
ocrmyimage: better handling of missing/invalid DPI

github.com/ocrmypdf/OCRmyPDF - 7b72ffec4f8ed1243c222fe83a16a679e8fb3a79 authored over 8 years ago by James R. Barlow <[email protected]>
ocrmyimage - Attempt conversion to PDF if input file is not a PDF

First cut.

May have broken ruffus errors again too.

github.com/ocrmypdf/OCRmyPDF - 757f6826dca355c30ce0e081299eebca5dec0e47 authored over 8 years ago by James R. Barlow <[email protected]>
Travis: use Python 3.5 too

github.com/ocrmypdf/OCRmyPDF - 5df83a0d30c60f5cb57229b2c5e406bb5cfe1ed1 authored over 8 years ago by James R. Barlow <[email protected]>
ruffus exceptions: for clarity only, don't iterate strings

It's a good habit to ensure any iterator test is explicit about
allowing or disallowing strings.

github.com/ocrmypdf/OCRmyPDF - d70e3d37531b6298433d888c11384f9b331d0968 authored over 8 years ago by James R. Barlow <[email protected]>
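
The habit described in the commit message above can be sketched as follows. This also covers the case from the earlier commit where ruffus passes `['inputfile.pdf']` instead of a bare string. The names `is_iterable_notstr` and `first_input` are illustrative assumptions, not necessarily the names used in OCRmyPDF.

```python
from collections.abc import Iterable


def is_iterable_notstr(thing):
    """Return True for iterables other than str/bytes (hypothetical helper)."""
    return isinstance(thing, Iterable) and not isinstance(thing, (str, bytes))


def first_input(arg):
    """Accept 'inputfile.pdf' or ['inputfile.pdf'] and return the bare string."""
    if is_iterable_notstr(arg):
        return next(iter(arg))
    return arg
```

Being explicit about strings matters because a `str` is itself iterable: without the second check, `'inputfile.pdf'` would be treated as a sequence of single characters.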
Remove old OCRmyPDF 2.x from release notes; update 4.2 notes

github.com/ocrmypdf/OCRmyPDF - 0dfceedcfb35a373f635f9d8d2748a410a79323c authored over 8 years ago by James R. Barlow <[email protected]>
Travis: build partly working on trusty; tweak requirements again

The build is #122
https://travis-ci.org/jbarlow83/OCRmyPDF/builds/148255615

Errors seem to be r...

github.com/ocrmypdf/OCRmyPDF - 2c30f4bfc5357b54f5eb1d850e8339beea733eaf authored over 8 years ago by James R. Barlow <[email protected]>
Travis: add PPA to support unpaper

github.com/ocrmypdf/OCRmyPDF - 9e7fb52b4798e1e098a18b57598b3950bc8fe8d5 authored over 8 years ago by James R. Barlow <[email protected]>
Remove additional PPAs and try again

github.com/ocrmypdf/OCRmyPDF - bb5fd38e3870d2c253394a8dba882c35c4a75bab authored over 8 years ago by James R. Barlow <[email protected]>
Try travis-trusty

This removes some backports for packages that Ubuntu trusty offers but
for which Ubuntu precise ...

github.com/ocrmypdf/OCRmyPDF - 7c8cf5cfa25c4af6db596f408b260f59a522f375 authored over 8 years ago by James R. Barlow <[email protected]>
Fix handling of DPI for rare case of JPEG recompression after deskew/clean

This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/cl...

github.com/ocrmypdf/OCRmyPDF - fef35e4eb26a3af3786fd3b7868a8ea3529880b5 authored over 8 years ago by James R. Barlow <[email protected]>
Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1

Tesseract renderer not immediately fixable.

github.com/ocrmypdf/OCRmyPDF - 8f77576dc4558f31e4bbe193c4685a36323939ab authored over 8 years ago by James R. Barlow <[email protected]>
Refactor DPI: fix regressions in test suite

Some called functions are particular about the data format of DPI and
don't like to deal with th...

github.com/ocrmypdf/OCRmyPDF - b3fcf24a267355d4f0c5235598cd4ad8fcd591e9 authored over 8 years ago by James R. Barlow <[email protected]>
Bug fix: --force-ocr should still run on pages with no images

Useful for people who want to reprocess text.

This also requires --oversample because DPI is un...

github.com/ocrmypdf/OCRmyPDF - 16e4d342d2d18e88f36a1a12e679380655cdc577 authored over 8 years ago by James R. Barlow <[email protected]>
Tighten requirements and dependencies

github.com/ocrmypdf/OCRmyPDF - 8458a51860c18ecae7fdaba0b0c15387814f8421 authored over 8 years ago by James R. Barlow <[email protected]>
Ghostscript: do raster output with -dSAFER

-dSAFER does not work when rendering PDF/A, because that needs to load
the ICC file, and -dSAFER...

github.com/ocrmypdf/OCRmyPDF - 636d1903b35fed6b07a01af53769fea81f388b82 authored over 8 years ago by James R. Barlow <[email protected]>
Readme: Add table of contents, brew install tesseract --with-language packs

github.com/ocrmypdf/OCRmyPDF - 514efa36fcc2f79ae173f429cb208a63ae968f5b authored over 8 years ago by jbarlow83 <[email protected]>
v4.1.4 release notes

github.com/ocrmypdf/OCRmyPDF - bd48f40d3d049d605aa571fdddd9be2a070afac2 authored over 8 years ago by James R. Barlow <[email protected]>
Merge commit '68cf9cbd87c188823027f9d1bfe9029017e7281f' into develop

github.com/ocrmypdf/OCRmyPDF - c02dbc809ab0cbfbf16c0b6485eef7335979255f authored over 8 years ago by James R. Barlow <[email protected]>
Bug fix: Monochrome images with ICC treated as full color images

Issue #79.
User submitted PDF with ICC profile attached to the monochrome image
in the input fil...

github.com/ocrmypdf/OCRmyPDF - 410111d6fbd131516956c1af872ef19762e62d59 authored over 8 years ago by James R. Barlow <[email protected]>
.rst: add code-block markup

github.com/ocrmypdf/OCRmyPDF - 68cf9cbd87c188823027f9d1bfe9029017e7281f authored over 8 years ago by jbarlow83 <[email protected]>
Fix some .rst formatting errors

github.com/ocrmypdf/OCRmyPDF - c9b2540d9d69c3cffd92d4881c6b6a4aaff53561 authored over 8 years ago by jbarlow83 <[email protected]>
Update license information for encrypted_algo4.pdf

github.com/ocrmypdf/OCRmyPDF - 1bacf35a2c84fe57154d39d64852d522280f8305 authored over 8 years ago by jbarlow83 <[email protected]>
Merge pull request #76 from Jmuccigr/patch-2

Adding explicit reference to help

github.com/ocrmypdf/OCRmyPDF - 8aef0d92779af722da0e18938a7907c2b4c3ac8d authored over 8 years ago by jbarlow83 <[email protected]>
Adding explicit reference to help

github.com/ocrmypdf/OCRmyPDF - b2fa8645ba34e9ebf2546f86fea5274a13d1f3ff authored over 8 years ago by John Muccigrosso <[email protected]>
v4.1.3 release notes

github.com/ocrmypdf/OCRmyPDF - c96823a6485cfceb722959f60107e840043c87ca authored over 8 years ago by James R. Barlow <[email protected]>
Merge branch 'feature/leptfun' into develop

github.com/ocrmypdf/OCRmyPDF - 3807b7d65506a5bdbb1602039d72ffed7b188e1b authored over 8 years ago by James R. Barlow <[email protected]>
Fix order of operations in matrix multiplication

Issue #73. The order of operations happens to not matter for scaling
but does matter for transla...

github.com/ocrmypdf/OCRmyPDF - a45505cf1d014c1d5fcec0086c0fb6c60bf3c659 authored over 8 years ago by James R. Barlow <[email protected]>
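
The commit message above is truncated, so the following is only a sketch of the general point it appears to make: two scaling matrices commute, but a scaling matrix and a translation matrix do not. The 3x3 row-vector layout follows the usual PDF convention for transformation matrices. The helper names `matmul`, `scale` and `translate` are assumptions for illustration, not OCRmyPDF code.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def scale(sx, sy):
    """Scaling matrix in row-vector form."""
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def translate(tx, ty):
    """Translation matrix in row-vector form (offsets in the last row)."""
    return [[1, 0, 0], [0, 1, 0], [tx, ty, 1]]


# Two scalings give the same product in either order.
scalings_commute = matmul(scale(2, 3), scale(4, 5)) == matmul(scale(4, 5), scale(2, 3))

# Scaling then translating differs from translating then scaling:
# in the second product the offset itself is scaled.
scale_then_translate = matmul(scale(2, 2), translate(10, 0))
translate_then_scale = matmul(translate(10, 0), scale(2, 2))
```

This is why a bug of this kind can go unnoticed while only scaling is exercised and surface only once an offset is involved.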
Test case for "algorithm 4" test

Algorithm 4 -> PDF version 1.6

github.com/ocrmypdf/OCRmyPDF - b4a734fc0d6051c587239cf9adac351f71287a66 authored over 8 years ago by James R. Barlow <[email protected]>
Add helpful error message for PDFs that use algorithm 4

github.com/ocrmypdf/OCRmyPDF - bbd02926e14d8d75f1dfcff21809e7dfaa8f0667 authored over 8 years ago by James R. Barlow <[email protected]>
Update Windows directions

github.com/ocrmypdf/OCRmyPDF - 5022ded27647bd598e56c46b27fb6a2f95067ea9 authored over 8 years ago by jbarlow83 <[email protected]>
cpix -> _pix

github.com/ocrmypdf/OCRmyPDF - 8d79b94b8456742e8bebfe395af9e60e6ba0051a authored over 8 years ago by James R. Barlow <[email protected]>
More leptonica functions for page manipulation

github.com/ocrmypdf/OCRmyPDF - d7f60b96c107cd5c385aec229b6b9b1fe4d7442b authored over 8 years ago by James R. Barlow <[email protected]>
leptonica: pillow interop

github.com/ocrmypdf/OCRmyPDF - c7612152ef8096d32d0222abeb5a012bc13a3ac9 authored over 8 years ago by James R. Barlow <[email protected]>
lept: fix __getstate/__setstate

github.com/ocrmypdf/OCRmyPDF - af91642cd177ba4f59fb3d0020b2cc0164bbd5ff authored over 8 years ago by James R. Barlow <[email protected]>
Leptonica - ortho rotate, background norm

github.com/ocrmypdf/OCRmyPDF - 9c66334c38f9fe738e4187de35c3d3b201fdee26 authored over 8 years ago by James R. Barlow <[email protected]>
Update filename references from sRGB_IEC to sRGB

github.com/ocrmypdf/OCRmyPDF - b964999427c01317b991322f46409c5e02482e1d authored over 8 years ago by James R. Barlow <[email protected]>
Replace sRGB_IEC with MIT license compatible sRGB

New file is from Debian package icc-profiles-free

github.com/ocrmypdf/OCRmyPDF - 3473345ea61b9bf57bb9146bd57984b91e92ad14 authored over 8 years ago by James R. Barlow <[email protected]>
Provide more helpful error message if pypdf can't merge pages

github.com/ocrmypdf/OCRmyPDF - 349ec5c81fb5153dabc78239d20be11d22be5797 authored over 8 years ago by James R. Barlow <[email protected]>
v4.1 release notes

github.com/ocrmypdf/OCRmyPDF - ff78d7c56c18f3f6affb53779a6ac4245dc2ca5f authored over 8 years ago by James R. Barlow <[email protected]>
Fix race condition between these tests when run in parallel

github.com/ocrmypdf/OCRmyPDF - ff092c86299e09bc9bba5c176c791a5ad9814e16 authored over 8 years ago by James R. Barlow <[email protected]>
Fix ruffus exception output

I found this issue in ruffus 2.6.3
https://github.com/bunbun/ruffus/issues/65
also discussed her...

github.com/ocrmypdf/OCRmyPDF - fe14cb57c0c27b5c54392a9c839491193f24e1e5 authored over 8 years ago by James R. Barlow <[email protected]>
Refactor _find_page_images

github.com/ocrmypdf/OCRmyPDF - 507fbc01d5fe62397f72c3e17c47b8e8c1b63b20 authored over 8 years ago by James R. Barlow <[email protected]>
Fix test failure: inline images with multiple image filters specified

github.com/ocrmypdf/OCRmyPDF - 325479e5be3daf29ddf03415841278a6d351951e authored over 8 years ago by James R. Barlow <[email protected]>
Fuzzing: check for graphics stack overflow

Very unlikely to occur

github.com/ocrmypdf/OCRmyPDF - e926ecb8b26adc8b8d6c3b6829f27efb87d86105 authored over 8 years ago by James R. Barlow <[email protected]>
Replace private hypotenuse formula with hypot()

github.com/ocrmypdf/OCRmyPDF - d0cb6c0e924586fe96205c29b7780014644820a4 authored over 8 years ago by James R. Barlow <[email protected]>
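
The replaced formula is not shown in the commit above; the sketch below assumes it was the usual `sqrt(x*x + y*y)` and compares it with the standard library's `math.hypot`. The function name `private_hypotenuse` is hypothetical.

```python
from math import hypot, isinf, sqrt


def private_hypotenuse(x, y):
    """Assumed form of a hand-written hypotenuse formula."""
    return sqrt(x * x + y * y)


# For ordinary values the two agree.
ordinary_match = private_hypotenuse(3.0, 4.0) == hypot(3.0, 4.0)

# For very large values the squares overflow to infinity in the naive
# version, while hypot() still returns a finite result.
naive_large = private_hypotenuse(1e200, 1e200)
stdlib_large = hypot(1e200, 1e200)
```

Besides being shorter, `hypot()` is therefore also more robust against intermediate overflow.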
Remove check for /ImageMask

/ImageMask means the image is a stencil mask for a grayscale or
color image. From issue #63 ...

github.com/ocrmypdf/OCRmyPDF - 5b7c8cf5d3d46cbba8e214803dbef0fee18560fd authored over 8 years ago by James R. Barlow <[email protected]>
Remove dead code "import stuff in testcase"

github.com/ocrmypdf/OCRmyPDF - 40baab32acbec996079adf545f5681a66e15bb3f authored over 8 years ago by James R. Barlow <[email protected]>
--rotate-pages: Only apply rotation if we're reasonably confident

Take the threshold from tesseract's default value for -psm 1.

github.com/ocrmypdf/OCRmyPDF - e877d37ac869cef03df6b6d8bd5448294a39d1a8 authored over 8 years ago by James R. Barlow <[email protected]>
Merge commit '1605408c23fa1b9252c5d3f10f279b43733b0728' into develop

github.com/ocrmypdf/OCRmyPDF - 5a9f77e4382aadcb6b4ae0fc3f52feb2646f15c6 authored over 8 years ago by James R. Barlow <[email protected]>
Check encoding of inline images

github.com/ocrmypdf/OCRmyPDF - 8ddd67d1e292f0cebcac5c77f9e2e1d4370923de authored almost 9 years ago by James R. Barlow <[email protected]>
README: add libffi-dev

github.com/ocrmypdf/OCRmyPDF - 1605408c23fa1b9252c5d3f10f279b43733b0728 authored almost 9 years ago by jbarlow83 <[email protected]>
Simplify DPI calculation with algebraic derivation

Needs testing

github.com/ocrmypdf/OCRmyPDF - 2d3b1ebf6ef6533cc88254407e1bf1239c1197fb authored almost 9 years ago by James R. Barlow <[email protected]>
Update license: sRGB ICC

github.com/ocrmypdf/OCRmyPDF - c74eaab7f55883284d818d6e21d41aaceff8fbc6 authored almost 9 years ago by James R. Barlow <[email protected]>
Merge commit 'a73afc4e769202b916d35dee481d741cf6bb7224'

github.com/ocrmypdf/OCRmyPDF - c21d231388ab52f450f47104889046e6883e9bfb authored almost 9 years ago by James R. Barlow <[email protected]>
Merge pull request #59 from spwhitton/apt-get

README: Debian and Ubuntu installation option

github.com/ocrmypdf/OCRmyPDF - a73afc4e769202b916d35dee481d741cf6bb7224 authored almost 9 years ago by jbarlow83 <[email protected]>
README: Debian and Ubuntu installation option

github.com/ocrmypdf/OCRmyPDF - 76c364150d42b80e07e053fe8aec89ff89d93b76 authored almost 9 years ago by Sean Whitton <[email protected]>
Add otsu threshold to leptonica

github.com/ocrmypdf/OCRmyPDF - 94a3e447cc4cfd09b697108919df8fbc3d737bdc authored almost 9 years ago by James R. Barlow <[email protected]>
Travis: install unpaper.deb instead of compiling from source

github.com/ocrmypdf/OCRmyPDF - 12868b461a3dc032a49a4a0ed645406a7298eef0 authored almost 9 years ago by James R. Barlow <[email protected]>
unpaper: fix check for missing and old versions, add test case

github.com/ocrmypdf/OCRmyPDF - 322085933bc1e095b0ae99e7a2ae9ad5c691a186 authored almost 9 years ago by James R. Barlow <[email protected]>
v4.0.7

github.com/ocrmypdf/OCRmyPDF - 3fed94bb796bae3686cd05270bce8b7d344243a3 authored almost 9 years ago by James R. Barlow <[email protected]>
Fix leptonica initializers

github.com/ocrmypdf/OCRmyPDF - 8c877482bd5406376dcbd2411437b19270b9d28c authored almost 9 years ago by James R. Barlow <[email protected]>
Don't set -sOutputICCProfile

Ghostscript dev advised against. It appears that this is for
creating target for a device that c...

github.com/ocrmypdf/OCRmyPDF - b17d589e84a25959c84630f3dcbfb9344292ac8f authored almost 9 years ago by James R. Barlow <[email protected]>
setuptools_scm_git_archive seems suddenly broken

github.com/ocrmypdf/OCRmyPDF - 368252a2439f14116637142278f03a220b3219d0 authored almost 9 years ago by James R. Barlow <[email protected]>
v4.0.6 notes

github.com/ocrmypdf/OCRmyPDF - ccefda1beea2445102133d230e4cf9b05bc38a59 authored almost 9 years ago by James R. Barlow <[email protected]>
Provide our own sRGB profile instead of Ghostscript's

github.com/ocrmypdf/OCRmyPDF - 3d0e8c9629b89b89a8e5e589746e0efd6e1d84bb authored almost 9 years ago by James R. Barlow <[email protected]>
setup_scm_git_archive: add additional files

github.com/ocrmypdf/OCRmyPDF - 313bbbb94c8afd5d5f6360a67efb46c7b8a977c4 authored almost 9 years ago by James R. Barlow <[email protected]>
get_postscript_icc_path: don't check the same path multiple times

github.com/ocrmypdf/OCRmyPDF - 0360f078de904159a641b80c6e09a185adaffeee authored almost 9 years ago by James R. Barlow <[email protected]>
Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - c8901666c41f45ed2715e87e84f46d94acf419be authored almost 9 years ago by James R. Barlow <[email protected]>
Improve install instructions for OS X (unpaper)

github.com/ocrmypdf/OCRmyPDF - 7430006596fb71d367e5cc87526863a389cf8556 authored almost 9 years ago by James R. Barlow <[email protected]>
Add bookmarks to file for more testing

github.com/ocrmypdf/OCRmyPDF - f3e06b2dbd688e6a40cb42edb6a925151b79af7f authored almost 9 years ago by James R. Barlow <[email protected]>
Merge pull request #54 from stweil/master

Replace broken link to c't article by permalink

github.com/ocrmypdf/OCRmyPDF - e97df307ffe2f7499793a6f20c234156495204ef authored almost 9 years ago by jbarlow83 <[email protected]>
Replace broken link to c't article by permalink

Also update the 2nd article link to use a permalink.

Signed-off-by: Stefan Weil <sw@weilne...

github.com/ocrmypdf/OCRmyPDF - 1443354aa2d701ebef2f28f9bd47709e4c451d05 authored almost 9 years ago by Stefan Weil <[email protected]>
v4.0.5 release notes

github.com/ocrmypdf/OCRmyPDF - 250e68c1cd3546a5b1349973158ad516401fac8e authored almost 9 years ago by James R. Barlow <[email protected]>
Fix temporary file placed in wrong folder

github.com/ocrmypdf/OCRmyPDF - 6a380ee99ce6b91a13d160888185d6fee6111903 authored almost 9 years ago by James R. Barlow <[email protected]>
Remove extraneous debug print() messages

github.com/ocrmypdf/OCRmyPDF - 3c90bd96a99efde05378f51aceec9e0b10a87f9a authored almost 9 years ago by James R. Barlow <[email protected]>
v4.0.4 Updates release notes

github.com/ocrmypdf/OCRmyPDF - 06a7ceb25a8e443f6ae0a09e7f45b76aeebc5405 authored almost 9 years ago by James R. Barlow <[email protected]>
Merge branch 'feature/parsecontent'

github.com/ocrmypdf/OCRmyPDF - 733a8e7d58c26d46b714d02379f2361370ce94d1 authored almost 9 years ago by James R. Barlow <[email protected]>