Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Moved venvs

github.com/ocrmypdf/OCRmyPDF - 3d0dc95a06b77a1ab2f9c09447c5aa2a345d708d authored about 8 years ago by James R. Barlow <[email protected]>
OS X -> macOS

github.com/ocrmypdf/OCRmyPDF - 04a57a3cc28ddae2157dac8c5517522e751a48d4 authored about 8 years ago by James R. Barlow <[email protected]>
v4.3.2 release notes

github.com/ocrmypdf/OCRmyPDF - d0c22ce01dd4df447066353c03cf13d627ff6b81 authored about 8 years ago by James R. Barlow <[email protected]>
ghostscript: elide overprinting to fix PDF/A errors in GS 9.20

It looks like GS 9.19 can incorrectly set overprinting for the text layer
even though this makes...

github.com/ocrmypdf/OCRmyPDF - 23c95e966011580398ecdd2bbae748693af03e2c authored about 8 years ago by James R. Barlow <[email protected]>
pdfa: fix KeyError on pdfa_dict if document has some xmp metadata but

not exactly what we’re looking for

github.com/ocrmypdf/OCRmyPDF - eecab9b95d64cf684147ec6edc594f6a4df1a18a authored about 8 years ago by James R. Barlow <[email protected]>
Merge branch 'develop'

github.com/ocrmypdf/OCRmyPDF - 8abc2f113cce0789be3d3888d7da7fcb5951cb90 authored about 8 years ago by James R. Barlow <[email protected]>
v4.3.1 release notes

github.com/ocrmypdf/OCRmyPDF - 949d2ff1c23b042fb78904a133340ac25496712b authored about 8 years ago by James R. Barlow <[email protected]>
test_pageinfo: Remove bits per component test

The behavior of this test will ultimately depend on what version of
img2pdf is installed, since ...

github.com/ocrmypdf/OCRmyPDF - 1c8b763d53bd038f1b2d3ef9525c331d428136a4 authored about 8 years ago by James R. Barlow <[email protected]>
Fix “deskew-rotate” bug.

Turns out this occurred in any case where pdf-renderer hocr was used
and a tesseract timeout or ...

github.com/ocrmypdf/OCRmyPDF - bb91393b8518c5b7154b2634bb9eb2934ea54584 authored about 8 years ago by James R. Barlow <[email protected]>
Add test case for documents that get rotated incorrectly after deskew

github.com/ocrmypdf/OCRmyPDF - cc9c0d819eaa1ea7a9c5d98df54460012213d6c7 authored about 8 years ago by James R. Barlow <[email protected]>
Update documentation on other languages, multilingual documents

github.com/ocrmypdf/OCRmyPDF - a72b8caf476c6875108708a1194ca03eb04e7cad authored about 8 years ago by James R. Barlow <[email protected]>
Optimize some of the test resources to reduce file sizes

Mostly by reducing RGB -> monochrome and applying JBIG2 compression

github.com/ocrmypdf/OCRmyPDF - fdd9b8b8ce90d6b223ece399f6fc426b8710617e authored about 8 years ago by James R. Barlow <[email protected]>
Make debug dump of pageinfo at the end of processing readable

github.com/ocrmypdf/OCRmyPDF - c096b4ca8cac00436448976e0c2630f01497d590 authored about 8 years ago by James R. Barlow <[email protected]>
Add @posttask debug hooks

github.com/ocrmypdf/OCRmyPDF - 427add30086960dc07308df9125580dba3571343 authored about 8 years ago by James R. Barlow <[email protected]>
Fix bug: LeptonicaErrorTrap() leaks file handles

github.com/ocrmypdf/OCRmyPDF - c45871700dd2a8748b4cba47ecd63e22a9d1849b authored about 8 years ago by James R. Barlow <[email protected]>
disable mathjax sphinx extension (#103)

Mathjax isn't actually needed for OCRmyPDF's docs, but enabling this
extension causes the brows...

github.com/ocrmypdf/OCRmyPDF - 6821e8eeb2ba317f56ee712fc6363252fb831bec authored about 8 years ago by Sean Whitton <[email protected]>
tesseract caching: don't transcode tesseract's output, hash source file

For sanity's sake, deal with tesseract streams in binary without
transcoding (via universal_newl...

github.com/ocrmypdf/OCRmyPDF - a4f07756a57dc30eeb551cdfda513de90992cef8 authored about 8 years ago by James R. Barlow <[email protected]>
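The "hash source file" part of the commit above can be illustrated with a content-based cache key. This is a minimal sketch, not the code from the commit: the function name file_cache_key, the choice of SHA-256, and the chunk size are all assumptions made for illustration. The file is opened in binary mode so no newline or encoding translation takes place.

```python
import hashlib


def file_cache_key(path, chunk_size=65536):
    """Return a hex digest of the file's bytes (hypothetical cache key)."""
    digest = hashlib.sha256()
    # Binary mode: bytes are hashed exactly as stored on disk.
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same key regardless of their names, which is the property a cache keyed on the source file would rely on.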
Obligatory MANIFEST.in repair

github.com/ocrmypdf/OCRmyPDF - f24fb0e0c522dc98b15e4818747e7180bdfc9614 authored about 8 years ago by James R. Barlow <[email protected]>
More work on documentation

github.com/ocrmypdf/OCRmyPDF - 73b88a0a6ffcbbbca6832108172257c39b06247c authored about 8 years ago by James R. Barlow <[email protected]>
Update README to point to ReadTheDocs

github.com/ocrmypdf/OCRmyPDF - c42f39e2d4eb4e8bf4eed1e35e8bd637dcd6ac47 authored about 8 years ago by James R. Barlow <[email protected]>
docs: OS X -> macOS branding change

github.com/ocrmypdf/OCRmyPDF - 5e5fe3175f1bf79aee8036242fed4c1eb06f7e2c authored about 8 years ago by James R. Barlow <[email protected]>
pageinfo: add a python3.4 implementation of isclose()

github.com/ocrmypdf/OCRmyPDF - cab65d1f11a18ebee7d55856de23af1ad5bc7eea authored about 8 years ago by James R. Barlow <[email protected]>
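For context on the isclose() commit above: math.isclose() entered the standard library in Python 3.5 (PEP 485), so code that must run on Python 3.4 needs its own version. The following is a minimal sketch of such a fallback based on the PEP 485 definition. It is not the code from the commit, and the name isclose_compat is hypothetical.

```python
import math


def isclose_compat(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Approximate equality test following the PEP 485 definition."""
    if rel_tol < 0.0 or abs_tol < 0.0:
        raise ValueError("tolerances must be non-negative")
    if a == b:
        # Covers exact matches, including two infinities of the same sign.
        return True
    if math.isinf(a) or math.isinf(b):
        # An infinity is only close to itself, which was handled above.
        return False
    diff = abs(b - a)
    # Close if within the relative tolerance of the larger magnitude,
    # or within the absolute tolerance (useful for comparisons near zero).
    return diff <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
```

With the default abs_tol of 0.0, nothing except zero itself is close to zero, so comparisons near zero need an explicit abs_tol.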
docs: allow python setup.py install --force to bypass checks

ReadTheDocs needs this.

github.com/ocrmypdf/OCRmyPDF - 245f05d5f4b94ac2b3cabd8e30ca292cb1415fd5 authored about 8 years ago by James R. Barlow <[email protected]>
Merge branch 'feature/docs' into develop

# Conflicts:
# ocrmypdf/__main__.py

github.com/ocrmypdf/OCRmyPDF - dda751f9e3db516e59c9c15bae8c70c5f726d431 authored about 8 years ago by James R. Barlow <[email protected]>
Update release notes for 4.3

github.com/ocrmypdf/OCRmyPDF - 3d37ae988a4c92480ae01478c899ae7b02993231 authored about 8 years ago by James R. Barlow <[email protected]>
Prevent dumping binary PDFs to stdout

github.com/ocrmypdf/OCRmyPDF - 717acd9855688aeab1f47c9fd4242999c757cd66 authored about 8 years ago by James R. Barlow <[email protected]>
Allow piping output to stdout

github.com/ocrmypdf/OCRmyPDF - 2e4431cc638d3b320e244ae197513be9889cdc55 authored about 8 years ago by James R. Barlow <[email protected]>
test_stdin: simplify this test

No need to involve 'cat', just hook the file up to stdin.

github.com/ocrmypdf/OCRmyPDF - f7387b0859133774f11b305b252b42c084fef8e4 authored about 8 years ago by James R. Barlow <[email protected]>
Test cases: check that stdout is clear of output

To ensure piping to stdout is possible.

github.com/ocrmypdf/OCRmyPDF - a09f6b8977477707b82fd2dabd832a25358b9934 authored about 8 years ago by James R. Barlow <[email protected]>
main: don't print output file location to stdout, use stderr

github.com/ocrmypdf/OCRmyPDF - d63449c2143caabb61f1eaaf95d3bd0070c93160 authored about 8 years ago by James R. Barlow <[email protected]>
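The idea behind the stdout and stderr commits above is that a tool can only be piped safely if stdout carries nothing but the output data. A minimal sketch of that separation follows. The function write_result and its parameters are hypothetical and are not taken from OCRmyPDF.

```python
import io
import sys


def write_result(pdf_bytes, message, out=None, err=None):
    """Write binary output to stdout and the status message to stderr."""
    # sys.stdout.buffer is the binary layer under the text stream.
    out = out if out is not None else sys.stdout.buffer
    err = err if err is not None else sys.stderr
    out.write(pdf_bytes)
    # Diagnostics go to stderr so they never mix with the piped data.
    print(message, file=err)


if __name__ == "__main__":
    # Demonstration with in-memory streams instead of the real ones.
    fake_out, fake_err = io.BytesIO(), io.StringIO()
    write_result(b"%PDF-1.4\n", "Output written", fake_out, fake_err)
```

Passing the streams as parameters also makes it easy for a test to check that stdout stays clear of messages.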
Remove possibly non-free page from "multipage.pdf"

github.com/ocrmypdf/OCRmyPDF - a86805f0d98f58e1c419c3f7e95fc0d82d55ee1e authored about 8 years ago by James R. Barlow <[email protected]>
ghostscript: log errors from stdout

github.com/ocrmypdf/OCRmyPDF - 7d2009ccefbfe9fcfea096f62c564ce99ca69ce9 authored about 8 years ago by James R. Barlow <[email protected]>
ghostscript: ensure raster resolution is specified in integer units

github.com/ocrmypdf/OCRmyPDF - 18ae5db06da6db751c5b1b4880f657d87ed66290 authored about 8 years ago by James R. Barlow <[email protected]>
pageinfo: accept "cm/Do" image drawing without the usual "q/Q" wrapper

Some PDFs omit the traditional q/Q wrapper and alter ctm with a stack
depth of zero, so make our...

github.com/ocrmypdf/OCRmyPDF - 9a1838f1023d676f2dcfbe6ec4844b8e69b14898 authored about 8 years ago by James R. Barlow <[email protected]>
leptonica: add color testing functions for future experiments

github.com/ocrmypdf/OCRmyPDF - e20346032d2663d13d48039541c8a7317ebcdc85 authored about 8 years ago by James R. Barlow <[email protected]>
leptonica: add iPython display hook and equality test

github.com/ocrmypdf/OCRmyPDF - 693a27d76c91c2cb0423cb9af365fee41987b394 authored about 8 years ago by James R. Barlow <[email protected]>
leptonica: fix Pillow conversion for 1-bit and 8-bit gray images

github.com/ocrmypdf/OCRmyPDF - 203966d86b2fffdc3b2e1d189efcf9be83cd5e76 authored about 8 years ago by James R. Barlow <[email protected]>
Implement new preprocessing feature, background removal

github.com/ocrmypdf/OCRmyPDF - 7eca8508fd53f6faab4cd83b4b865a84f1709faf authored about 8 years ago by James R. Barlow <[email protected]>
Merge branch 'master' into develop

github.com/ocrmypdf/OCRmyPDF - b85270df1c644edba2f9b45431171653da92b6b3 authored about 8 years ago by James R. Barlow <[email protected]>
v4.2.5: update release notes, fix silly typo in pageinfo.py

github.com/ocrmypdf/OCRmyPDF - aff597cef4384f4690b1bc36e10c374c106f08aa authored about 8 years ago by James R. Barlow <[email protected]>
Fix issue: BitsPerComponent is an optional field, sometimes omitted

github.com/ocrmypdf/OCRmyPDF - 61b05b3dee4fa8ac6371c72bcb6237850aa6b905 authored about 8 years ago by James R. Barlow <[email protected]>
Update README.rst (#98)

`brew install tesseract` just installed the english language pack not French, German or Spanish

github.com/ocrmypdf/OCRmyPDF - 453c4ef602ff88baa428644483f6af70519f4d44 authored about 8 years ago by Julian Kahnert <[email protected]>
The main 'quick' test should be a file that OCRs to recognizable text

github.com/ocrmypdf/OCRmyPDF - cf4b04f92d435e7d82a388f840cb270a6558215f authored over 8 years ago by James R. Barlow <[email protected]>
Merge commit '07891d994aab92e7a14aebe1ac509aab2d4f170c'

github.com/ocrmypdf/OCRmyPDF - 06c699998708079052b606b7aaedcaec1ec1b856 authored over 8 years ago by James R. Barlow <[email protected]>
Replace redacted file with an OCR-able file

github.com/ocrmypdf/OCRmyPDF - 013c5a369f608ba756813358fd68f1dad0df82d1 authored over 8 years ago by James R. Barlow <[email protected]>
Replace redacted file with an OCR-able file

github.com/ocrmypdf/OCRmyPDF - 07891d994aab92e7a14aebe1ac509aab2d4f170c authored over 8 years ago by James R. Barlow <[email protected]>
Replace non-free file milk.pdf with free equivalent

github.com/ocrmypdf/OCRmyPDF - 6baf8668a66efa4325dd3e0b3b7eef01ce1c1aa8 authored over 8 years ago by James R. Barlow <[email protected]>
Comment on non-free files

github.com/ocrmypdf/OCRmyPDF - 4ba2962c5698e3dc9724c645dedf7c8ffb104bfd authored over 8 years ago by James R. Barlow <[email protected]>
Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - 7ad92f5db4a9b7aec229d91bf8361bd03afc10f7 authored over 8 years ago by James R. Barlow <[email protected]>
resources/README: replace the other large table with a list table

github.com/ocrmypdf/OCRmyPDF - 4dad09cc919575d928c5d43501079d4db5d460bb authored over 8 years ago by James R. Barlow <[email protected]>
also exclude .git in pytest.ini (#94)

github.com/ocrmypdf/OCRmyPDF - 7b2e0c7a7a7188fba39fbfaa798278e5cd6ebcfa authored over 8 years ago by Sean Whitton <[email protected]>
pytest skipif for milk.pdf test (#95)

Skip the test if the fair use restricted milk.pdf is not present.

github.com/ocrmypdf/OCRmyPDF - 7f08f15fc9483b33819b3b040b52a854a911fc39 authored over 8 years ago by Sean Whitton <[email protected]>
Note that milk.pdf is non-free, start using list-tables

github.com/ocrmypdf/OCRmyPDF - 825c0f8b2a726b7f78bb1e15cb26ad6547b6a47c authored over 8 years ago by James R. Barlow <[email protected]>
Update tesseract supported languages

github.com/ocrmypdf/OCRmyPDF - dbe880bc417e96b6f3559f5ff7d954971ffb30f3 authored over 8 years ago by James R. Barlow <[email protected]>
leptonica: learn a few new tricks

Found some interesting options for background norm.

github.com/ocrmypdf/OCRmyPDF - 2ec516b6ffdf1e91b7a3621fc4aa2db6221997a7 authored over 8 years ago by James R. Barlow <[email protected]>
leptonica: This is not a Py2 module anymore

github.com/ocrmypdf/OCRmyPDF - 7942a01e504c65feeace2a9b178cb33515a1a872 authored over 8 years ago by James R. Barlow <[email protected]>
Update tesseract supported languages

github.com/ocrmypdf/OCRmyPDF - df684f9344b80573022aff55ff306cbc64314ca9 authored over 8 years ago by James R. Barlow <[email protected]>
leptonica: scale should be a tuple for consistency

github.com/ocrmypdf/OCRmyPDF - ae16e95e42977deb01a99ebbda361ef8623df1e1 authored over 8 years ago by James R. Barlow <[email protected]>
More doc tweaks, mainly introduction

github.com/ocrmypdf/OCRmyPDF - 2ac8e8a0cc1ebd313b43699beda4225726d919ed authored over 8 years ago by James R. Barlow <[email protected]>
Start the documentation

github.com/ocrmypdf/OCRmyPDF - 0a0ceda71f51ee154c763893f3796f685488cfd6 authored over 8 years ago by James R. Barlow <[email protected]>
tasks.py: stop tracking this file for now

This helper script is still in development and needs to be changed each
release, which breaks th...

github.com/ocrmypdf/OCRmyPDF - 220f1ce161e2f41d0780619ca480cff21772cb20 authored over 8 years ago by James R. Barlow <[email protected]>
v4.2.4 release notes

github.com/ocrmypdf/OCRmyPDF - c62a8a97c9d95429060dbe48259be1b87e0ef336 authored over 8 years ago by James R. Barlow <[email protected]>
tasks: show logging info

github.com/ocrmypdf/OCRmyPDF - f8a1136979f06a7a8264c524628b5941864ea162 authored over 8 years ago by James R. Barlow <[email protected]>
Update description of masks.pdf to reflect what it actually tests

github.com/ocrmypdf/OCRmyPDF - 9ca29c787b85dd1924840ce006301a90cbe55004 authored over 8 years ago by James R. Barlow <[email protected]>
pageinfo: regression - didn't add inline images to list

github.com/ocrmypdf/OCRmyPDF - 6af748a251bd3a27db166b6f0d91092f98c0ea82 authored over 8 years ago by James R. Barlow <[email protected]>
pageinfo: exclude images from DPI calculation if drawn at stack depth 0

More thorough testing showed that Acrobat does not presume that images
fill the page if the CTM is...

github.com/ocrmypdf/OCRmyPDF - 9041867f865f8527c63ac8637d6d8bef66cc81e1 authored over 8 years ago by James R. Barlow <[email protected]>
pageinfo: handle stencil masks when stack depth > 0

github.com/ocrmypdf/OCRmyPDF - 04099b087c501bece05a39041e7875e8e34b099b authored over 8 years ago by James R. Barlow <[email protected]>
tasks: fix logic error and make magic numbers disappear

github.com/ocrmypdf/OCRmyPDF - 6d6234714cb65d67513b69a071e71d5efbabfd6b authored over 8 years ago by James R. Barlow <[email protected]>
Add release helper script

github.com/ocrmypdf/OCRmyPDF - 520be23481596681c1886023cff00a4d110387a7 authored over 8 years ago by James R. Barlow <[email protected]>
Start tracking development requirements

github.com/ocrmypdf/OCRmyPDF - 346c3c8dd3f56ac6ed39a5de2c60d4f437dce73b authored over 8 years ago by James R. Barlow <[email protected]>
main.py -> __main__.py

Executing a package with python -m packagename will check for
__main__.py inside the package. I...

github.com/ocrmypdf/OCRmyPDF - bd534c33137eccf9627a5606eaf2546bf644c6bd authored over 8 years ago by James R. Barlow <[email protected]>
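The behavior described in the commit above, where python -m packagename runs the package's __main__.py, can be checked with a throwaway package. This is a sketch for illustration only: the package name demo_pkg and the helper run_package_as_module are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package_as_module():
    """Create a tiny package and execute it with `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # This file is what `python -m demo_pkg` executes.
        (pkg / "__main__.py").write_text(
            'print("running demo_pkg.__main__")\n'
        )
        # Put the temporary directory on the import path of the child.
        env = dict(os.environ, PYTHONPATH=tmp)
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp,
            env=env,
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        return result.stdout.strip()
```

If __main__.py is missing, the same command fails with an error saying the package cannot be directly executed.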
link: more MANIFEST.in tweaks

github.com/ocrmypdf/OCRmyPDF - 2625368aed389a6b4fefef8b000ea14af910d31f authored over 8 years ago by James R. Barlow <[email protected]>
lint: no need to check for DEVNULL; all supported versions have it

github.com/ocrmypdf/OCRmyPDF - 8ac94879f109948246cf755bd0d5a0d346edcf20 authored over 8 years ago by James R. Barlow <[email protected]>
Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - dd8c0f3756a97e4f3f61dbbd3ddd2f73fdb416ce authored over 8 years ago by James R. Barlow <[email protected]>
v4.2.3 release notes

github.com/ocrmypdf/OCRmyPDF - 010f353a5e28610847dbb9a3810d9b550ae7c452 authored over 8 years ago by James R. Barlow <[email protected]>
Fix MANIFEST.in, as Python packages require

github.com/ocrmypdf/OCRmyPDF - e0a18edb924745a8c3fd6b337ec311bd1d926042 authored over 8 years ago by James R. Barlow <[email protected]>
Reinstate OCRmyPDF.sh with a deprecation warning

github.com/ocrmypdf/OCRmyPDF - c6f2eea058e806179be3579fccf12b49a09999cf authored over 8 years ago by James R. Barlow <[email protected]>
Add milk.pdf test case

github.com/ocrmypdf/OCRmyPDF - bf89e38c69c6035871c3493f4d96608e6418b5b9 authored over 8 years ago by James R. Barlow <[email protected]>
Create issue template

github.com/ocrmypdf/OCRmyPDF - e1f0640d42955fc2501199fca30815fc683ffcd0 authored over 8 years ago by jbarlow83 <[email protected]>
Bug fix issue #89: trying to perform arithmetic on IndirectObject

TypeError: bad operand type for unary -: 'IndirectObject'

github.com/ocrmypdf/OCRmyPDF - 71b54035ba192313956fd66f579f05c6cc835871 authored over 8 years ago by James R. Barlow <[email protected]>
Allow test cases to run without installing first

As @spwhitton found:

The test suite needs to call "python3 -m ocrmypdf.main" instead of
just "o...

github.com/ocrmypdf/OCRmyPDF - 325cc0beca8f268744779f5aa5e17061626e440f authored over 8 years ago by James R. Barlow <[email protected]>
Remove OCRmyPDF.sh and its usage in all test cases

github.com/ocrmypdf/OCRmyPDF - 1a9f09c4d519eebce7d0e2b7890b939ab2bbc8f6 authored over 8 years ago by James R. Barlow <[email protected]>
tests: don't try to pass Unicode arguments on command line on Linux

Depends on locale being configured properly, and it's not necessary
to be able to do this.

github.com/ocrmypdf/OCRmyPDF - 4fed4e2af334db2e86d6f14c4fbda1a18f467cbe authored over 8 years ago by James R. Barlow <[email protected]>
pytest.ini: apply patch from Debian to exclude .pc dir

https://sources.debian.net/src/ocrmypdf/4.2.1%2Bgit.20160824.1.5d67cc7-1/debian/patches/0003-pyt...

github.com/ocrmypdf/OCRmyPDF - 74cc2346a5cd869b7fa7037db09afbb9e8cd6b90 authored over 8 years ago by James R. Barlow <[email protected]>
Improve some documentation for tests

github.com/ocrmypdf/OCRmyPDF - cc7e328358af8e4b1f9fdf78a7582572764cf08c authored over 8 years ago by James R. Barlow <[email protected]>
Add test case for PDFs with masks and stencil masks

github.com/ocrmypdf/OCRmyPDF - d25397e2b005996041d253787873ba7f47b2afa3 authored over 8 years ago by James R. Barlow <[email protected]>
Help text: example of shell pipeline with img2pdf

github.com/ocrmypdf/OCRmyPDF - bc11454e1c70fbaff0c2236bf2911b4bd8096dea authored over 8 years ago by James R. Barlow <[email protected]>
Test case for stdin streaming

github.com/ocrmypdf/OCRmyPDF - 2025a096c369fae6078964398476b4b7c525c534 authored over 8 years ago by James R. Barlow <[email protected]>
Make final PDF/A output message less obtuse

github.com/ocrmypdf/OCRmyPDF - 38fe14b1080e750df7c9a01193aabd11bfd0a9d4 authored over 8 years ago by James R. Barlow <[email protected]>
v4.2.2 release notes, documentation improvements

github.com/ocrmypdf/OCRmyPDF - 1b7b2f3695717ddb0e8e05f50286174342083dd3 authored over 8 years ago by James R. Barlow <[email protected]>
Update 4.2.1 release notes

github.com/ocrmypdf/OCRmyPDF - 5d67cc76cc7aaa4f22c45ed13138dfb880a8a902 authored over 8 years ago by James R. Barlow <[email protected]>
Recover input filename from symlink on error message

The recent commit to accept files from stdin broke the feature of
returning the input filename ...

github.com/ocrmypdf/OCRmyPDF - 27a3813207cae05d6d16df0b57e1a303b3f56a31 authored over 8 years ago by James R. Barlow <[email protected]>
Merge branch 'develop'

github.com/ocrmypdf/OCRmyPDF - b06e0bfdcde943aa5863e5824687450db18f6ba9 authored over 8 years ago by James R. Barlow <[email protected]>
Implement DPI checking for stencil masks

github.com/ocrmypdf/OCRmyPDF - d616f25324a4cb98e581c59942b5b914d82de26e authored over 8 years ago by James R. Barlow <[email protected]>
setup.py -> license is MIT

github.com/ocrmypdf/OCRmyPDF - b03028e31feff9244410ad3e646023f1b2e3cbd1 authored over 8 years ago by James R. Barlow <[email protected]>
Tweak pipeline again

github.com/ocrmypdf/OCRmyPDF - e08c42fd3d78a4dd89f276cd4486eaa1ff7d4c5f authored over 8 years ago by James R. Barlow <[email protected]>
Accept input from stdin if input filename is '-'

github.com/ocrmypdf/OCRmyPDF - 16901f7134047b9681fd579ff9c9039c2f6ebc7a authored over 8 years ago by James R. Barlow <[email protected]>
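Treating '-' as "read from stdin", as in the commit above, is a common command-line convention. A minimal sketch of the convention is shown below. The function read_input is hypothetical and does not reflect how OCRmyPDF implements it.

```python
import sys


def read_input(filename, stdin=None):
    """Return the input bytes, reading stdin when filename is '-'."""
    if filename == "-":
        # Use the binary layer of stdin so PDF bytes are not decoded.
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(filename, "rb") as f:
        return f.read()
```

The optional stdin parameter exists only so the function can be exercised with an in-memory stream.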
Update the pipeline image

github.com/ocrmypdf/OCRmyPDF - dffceedd85da173216836b10609776a6b18e4552 authored over 8 years ago by James R. Barlow <[email protected]>
New test to confirm we can emit JBIG2 with appropriate settings

github.com/ocrmypdf/OCRmyPDF - e5541e435caf5552d507c27f4b2cf70b369124e3 authored over 8 years ago by James R. Barlow <[email protected]>
Tweak release notes

github.com/ocrmypdf/OCRmyPDF - b969aad67b4adb8f47b9f9d6892b2d307b1ff764 authored over 8 years ago by James R. Barlow <[email protected]>