Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Don't generate PDF/A-1b with object streams

Acrobat insists that PDF/A-1b should not have object streams.
Other programs like veraPDF disagr...

github.com/ocrmypdf/OCRmyPDF - 4124889f360446bdfdab3ba0aaef1b154579c63c authored almost 4 years ago by James R. Barlow <[email protected]>
helpers: tidy check_pdf

github.com/ocrmypdf/OCRmyPDF - a23c22b0e8b373cd421f11aa6df2ce9b81ed8621 authored almost 4 years ago by James R. Barlow <[email protected]>
pyproject: black doesn't like py39 yet

github.com/ocrmypdf/OCRmyPDF - dd1f5f7215ed6cd12fa1c9065a4226befc41e1ca authored almost 4 years ago by James R. Barlow <[email protected]>
Allow --sidecar along --pages (#735)

github.com/ocrmypdf/OCRmyPDF - 5e2206bae76fe4b98f353676fdb37d8092a305e4 authored almost 4 years ago by Dima Kuznetsov <[email protected]>
pyproject: also target py39

github.com/ocrmypdf/OCRmyPDF - 079ee86d4354470616051abaf07acb48d54bd1a7 authored almost 4 years ago by James R. Barlow <[email protected]>
v11.6.2 release notes

github.com/ocrmypdf/OCRmyPDF - 3692868004c2c4255603b3f75b4b1d872ffd1bec authored almost 4 years ago by James R. Barlow <[email protected]>
Fix page rotation regression

Page size fixes in commit b26749 accounted for a "kept" rotation,
but not a corrected rotati...

github.com/ocrmypdf/OCRmyPDF - 064f935699e9763045f89f8128d18e6ba9b26635 authored almost 4 years ago by James R. Barlow <[email protected]>
tests: remove unreliable/incomplete test

github.com/ocrmypdf/OCRmyPDF - 8770fff96885dd16b2bc95e5815592bb18d7ef94 authored almost 4 years ago by James R. Barlow <[email protected]>
v11.6.1 release notes

github.com/ocrmypdf/OCRmyPDF - 82de78b6b0ec35b038a9e9db560bd3e99f70d650 authored almost 4 years ago by James R. Barlow <[email protected]>
optimize: skip images with unusually small dimensions

They're unlikely to be handled well by our recompressors. It seems
that JBIG2 cannot handle very...

github.com/ocrmypdf/OCRmyPDF - 2a52c6dec2ea58a7ba0a108b5dd60b14691c8d7b authored almost 4 years ago by James R. Barlow <[email protected]>
docker-compose: fix typo

github.com/ocrmypdf/OCRmyPDF - 2898879be7bdf4b526c3924626ea673aa6768a56 authored almost 4 years ago by James R. Barlow <[email protected]>
docker-compose: fix typo

github.com/ocrmypdf/OCRmyPDF - 18e613657cac71e7c12b0a3aa950747a181299b4 authored almost 4 years ago by James R. Barlow <[email protected]>
Add filter_pdf_page hook

github.com/ocrmypdf/OCRmyPDF - a48ca556c7f568d3245096219c0a1c34be184803 authored almost 4 years ago by James R. Barlow <[email protected]>
Remove deprecated code

github.com/ocrmypdf/OCRmyPDF - 9cba738b485fec161c50768c0eaa8aacc1783995 authored almost 4 years ago by James R. Barlow <[email protected]>
Package OCR in Form XObject

Should improve results in some situations where the initial content
stream is messy or not well-...

github.com/ocrmypdf/OCRmyPDF - 390fdf8c05f5a07f25748add31d3775d7a7cf1fb authored almost 4 years ago by James R. Barlow <[email protected]>
Stricter parameter checking for many public functions

github.com/ocrmypdf/OCRmyPDF - bccf2f423f43b61425ccdc10d3001d8e2a0c278a authored almost 4 years ago by James R. Barlow <[email protected]>
Merge branch 'feature/colorstrategy'

github.com/ocrmypdf/OCRmyPDF - 166de3086b333fc2625fda59ffcccb4c414bdb52 authored almost 4 years ago by James R. Barlow <[email protected]>
docs: api

github.com/ocrmypdf/OCRmyPDF - 206c675df68bbca06a7358b6c7a137d5f27d3032 authored almost 4 years ago by James R. Barlow <[email protected]>
Update awslambda to new pluginspec

github.com/ocrmypdf/OCRmyPDF - 6c8f9223e9e3e18f95ccd9cd60c77e0b6a6d86d0 authored almost 4 years ago by James R. Barlow <[email protected]>
Fix calls to hook.get_executor

github.com/ocrmypdf/OCRmyPDF - 85c6a974ca4ef8335d6e413ad008d959535d951a authored almost 4 years ago by James R. Barlow <[email protected]>
leptonica: tidy

github.com/ocrmypdf/OCRmyPDF - dccdcfaa913db3c4f4258927a806f7eb36a8ed36 authored almost 4 years ago by James R. Barlow <[email protected]>
Add plugin for setting logging console

So that we are not tied to tqdm.

github.com/ocrmypdf/OCRmyPDF - b1da09f141fe1c9af0aa6c7e3b4cc8b39b5c31ed authored almost 4 years ago by James R. Barlow <[email protected]>
optimize: rewrite JPEG optimize to avoid use of tqdm and parallelize

For some reason JPEG optimization was not done in parallel, and was
perhaps never done in parall...

github.com/ocrmypdf/OCRmyPDF - 42c84531e42d909b1e1a295483f32bceca7b8d37 authored almost 4 years ago by James R. Barlow <[email protected]>
optimize: Remove shim for unsupported pikepdf version

github.com/ocrmypdf/OCRmyPDF - a9ad805347e48df3cd6ed53be46bc3519b46a429 authored almost 4 years ago by James R. Barlow <[email protected]>
Refactor - decouple progressbar from executor

github.com/ocrmypdf/OCRmyPDF - 16bda74974df2802f44f421402ed969b3f119e3b authored almost 4 years ago by James R. Barlow <[email protected]>
Refactor to eliminate global state in _concurrent

github.com/ocrmypdf/OCRmyPDF - d274d88929d7d87b0e41313325ca56b1e4739b87 authored almost 4 years ago by James R. Barlow <[email protected]>
Use ColorConversionStrategy "LeaveColorUnchanged"

Faster, still produces PDF/A

github.com/ocrmypdf/OCRmyPDF - 327df5cbbc78ce160e877eb27bae603791fe397b authored almost 4 years ago by James R. Barlow <[email protected]>
v11.6.0 release notes

github.com/ocrmypdf/OCRmyPDF - 46d0632fe27f75e583e957edcc21beec63a2223e authored almost 4 years ago by James R. Barlow <[email protected]>
Delinting

github.com/ocrmypdf/OCRmyPDF - ef1e7a814ec7578c81589fa5e34042f3f7a7e9ae authored almost 4 years ago by James R. Barlow <[email protected]>
docs: improve API docs

github.com/ocrmypdf/OCRmyPDF - 108472493762cb529b158e238432e8a1f1e44cbf authored almost 4 years ago by James R. Barlow <[email protected]>
docs: fix rst formatting error

github.com/ocrmypdf/OCRmyPDF - ecb0109d79fcbfc015a3a329e113aadc8605dac3 authored almost 4 years ago by James R. Barlow <[email protected]>
Make progress pool common rather than plugin-specific

github.com/ocrmypdf/OCRmyPDF - 386cabff001dabfb8aa3e8644fc02ba6ce7ba083 authored almost 4 years ago by James R. Barlow <[email protected]>
lambda: move to extra_plugins folder

github.com/ocrmypdf/OCRmyPDF - 3bd5054634446cec9fa2152e7dc6fa66ed2c4fa8 authored almost 4 years ago by James R. Barlow <[email protected]>
lambda: more issues related to new executor semantics

Now all tests pass, except for:
-tests that check the progress bar
-tests where xdist may or may...

github.com/ocrmypdf/OCRmyPDF - 6a8dd65aa28ac87ef8ec38167991de29948829fb authored almost 4 years ago by James R. Barlow <[email protected]>
lambda: don't overrun number of workers needed

github.com/ocrmypdf/OCRmyPDF - 6083b4f0a7e71c7547136aaa91b14c0a0309e87e authored almost 4 years ago by James R. Barlow <[email protected]>
lambda: Don't be paranoid about exception marshalling

It works

github.com/ocrmypdf/OCRmyPDF - 1a3ce59476df8f77a9a7fb297963358833516b1d authored almost 4 years ago by James R. Barlow <[email protected]>
Temporary move into package

github.com/ocrmypdf/OCRmyPDF - c6a2716cdbe85ca94cc69216ba94b5b04f29cfcf authored almost 4 years ago by James R. Barlow <[email protected]>
lambda: tidying, special casing use_threads

github.com/ocrmypdf/OCRmyPDF - c395436ba30bc78d529c147bb75aaf18c741fe92 authored almost 4 years ago by James R. Barlow <[email protected]>
Operational lambda executor

github.com/ocrmypdf/OCRmyPDF - 8d23d0b4414ac955483d433edc568c6b7b0c7804 authored almost 4 years ago by James R. Barlow <[email protected]>
tests: fix concurrency

github.com/ocrmypdf/OCRmyPDF - 7bccb8c74844af7b1b7f1376a2abc3bdce7bb466 authored almost 4 years ago by James R. Barlow <[email protected]>
lambda_plugin.py: doesn't work since entry point needs to be in package

github.com/ocrmypdf/OCRmyPDF - 5545bae76f986f8d965c0aa0c4787c71e77d8b30 authored almost 4 years ago by James R. Barlow <[email protected]>
concurrency: lock progress pool

For API sanity and to communicate expectations. One progress pool at
a time is plenty of complex...

github.com/ocrmypdf/OCRmyPDF - 173c0d12740cfc9a6731a427670e1a9590effa28 authored almost 4 years ago by James R. Barlow <[email protected]>
pdfinfo: remove some messy concurrency handling

We can cut down on the use of global variables and save opening
an extra copy of the Pdf when th...

github.com/ocrmypdf/OCRmyPDF - 6953f324653ea4b5903e6137a142578733df7c70 authored almost 4 years ago by James R. Barlow <[email protected]>
Refactor concurrency so that it is pluggable

However, this may not be the best idea because it involves global
state that could be overridden...

github.com/ocrmypdf/OCRmyPDF - 26b4d9bb4b4bf508ed8d22419eff8a8e64512c67 authored almost 4 years ago by James R. Barlow <[email protected]>
Use queue.Queue instead of multiprocessing.Queue in threaded mode

github.com/ocrmypdf/OCRmyPDF - 34e564cd7de9847c63093b8d3601968341b22148 authored almost 4 years ago by James R. Barlow <[email protected]>
Refactor plugin manager to eliminate callback

github.com/ocrmypdf/OCRmyPDF - 504d5776d2118bfea6bddcc2b7407604378c43fb authored almost 4 years ago by James R. Barlow <[email protected]>
Re-sequence plugin installation

github.com/ocrmypdf/OCRmyPDF - ee23976858418a464956fcd48c719d18f36816d4 authored almost 4 years ago by James R. Barlow <[email protected]>
Insert setuptools plugins with ocrmypdf prefix

github.com/ocrmypdf/OCRmyPDF - f559316881befc67d389530599d0bbd10f31019e authored almost 4 years ago by James R. Barlow <[email protected]>
Automate insertion of builtin modules

github.com/ocrmypdf/OCRmyPDF - 084610c242be607d5cf586146f0829c71b3a7951 authored almost 4 years ago by James R. Barlow <[email protected]>
Update pre-commit

github.com/ocrmypdf/OCRmyPDF - 9ff627472b24dfab4ef33a4cc06faccc98298dc3 authored almost 4 years ago by James R. Barlow <[email protected]>
Import PageContext, PdfContext since they are referenced in pluginspec

github.com/ocrmypdf/OCRmyPDF - 956310d1ecc69c9eca121560aa35533daf0c55d6 authored almost 4 years ago by James R. Barlow <[email protected]>
tests: confirm that we produce pdf when optimization is off

github.com/ocrmypdf/OCRmyPDF - 1a982da442a91bea8a50b0d67b4de60820be1b9e authored almost 4 years ago by James R. Barlow <[email protected]>
docs: no MS Store Python

github.com/ocrmypdf/OCRmyPDF - 4879a1f0ded5e1e4cbeb98e6f6f1b4f4943ee4be authored almost 4 years ago by James R. Barlow <[email protected]>
github: Ask how ocrmypdf was installed

github.com/ocrmypdf/OCRmyPDF - ce66bcc9c84b224bb557a8265f6332b317771ab5 authored almost 4 years ago by James R. Barlow <[email protected]>
v11.5.0 release notes

github.com/ocrmypdf/OCRmyPDF - 1ebf3144afd0ba454551e043d71523fb7f46182f authored almost 4 years ago by James R. Barlow <[email protected]>
Fallback to LeptonicaErrorTrap_Redirect if ffi.callback fails

Might fix issue #709, Apple silicon support.

github.com/ocrmypdf/OCRmyPDF - 7a1cccbc4e098762fb0a6303cf5193b94a503985 authored almost 4 years ago by James R. Barlow <[email protected]>
tests: Fix debug logging test

github.com/ocrmypdf/OCRmyPDF - ebacff1b3915435365b9391e768f4547b558081d authored almost 4 years ago by James R. Barlow <[email protected]>
Add test for configure_debug_logging

Since we can't directly test it

github.com/ocrmypdf/OCRmyPDF - c7c447be66cb51b9a389901e66a56d738512fad8 authored almost 4 years ago by James R. Barlow <[email protected]>
Consider text when determining page raster DPI

Previously if we found vectors of any sort on a page, we would bump
the DPI up to 400. We did no...

github.com/ocrmypdf/OCRmyPDF - 91aa175602e2b3a878dc4bc2bda8c8865e8a5e4b authored almost 4 years ago by James R. Barlow <[email protected]>
Create raster PDF pages to match input page size

Previously we produced a raster image, then multiplied image width
by DPI to get the page size. ...

github.com/ocrmypdf/OCRmyPDF - b267494e4a38e694178f8f49bba7d0a760a8a847 authored about 4 years ago by James R. Barlow <[email protected]>
tests: tidy pdfinfo

github.com/ocrmypdf/OCRmyPDF - f687180ecc3561ee129ee4991755f7d6b3bea02f authored about 4 years ago by James R. Barlow <[email protected]>
ghostscript: tidy comments

github.com/ocrmypdf/OCRmyPDF - 6f4b38b103d8481986bf870257185f5875f775ab authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.5 release notes

github.com/ocrmypdf/OCRmyPDF - d32324859ca54cc0ef225f3724ea3296438442c7 authored about 4 years ago by James R. Barlow <[email protected]>
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - 48222b87b5eafbe054178501b72a68436b7656a4 authored about 4 years ago by James R. Barlow <[email protected]>
fix unclosed file warnings. (#710)

Co-authored-by: Jonas Winkler <[email protected]>

github.com/ocrmypdf/OCRmyPDF - 62e5edc72bfbff8640f1589355a89c1d30d46b7c authored about 4 years ago by Jonas Winkler <[email protected]>
Remove .coveragerc and fold into setup.cfg

github.com/ocrmypdf/OCRmyPDF - 2846d46bb83ea846fb71dfafab46ef03067943ec authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.4 release notes

github.com/ocrmypdf/OCRmyPDF - 47ef1914d491f9a6652c554aa6c93099410f2439 authored about 4 years ago by James R. Barlow <[email protected]>
Make ocrmypdf.ocr take a threading lock

github.com/ocrmypdf/OCRmyPDF - df157552f3040773eaac84a2d6dee807e75a1b7e authored about 4 years ago by James R. Barlow <[email protected]>
Partial fix crash on 'userunit' None (#700)

Our method of getting data from pdfminer would silently consume a StopIteration
if pdfminer retu...

github.com/ocrmypdf/OCRmyPDF - 0b3a526049f10032a29939a82dcd5113cd895935 authored about 4 years ago by James R. Barlow <[email protected]>
tesseract: fix typing of some optional arguments

github.com/ocrmypdf/OCRmyPDF - 1e80d412fa983e96d30128770fbaba366132cc58 authored about 4 years ago by James R. Barlow <[email protected]>
concurrent: simplify results loop

github.com/ocrmypdf/OCRmyPDF - df6e1062033003ca3dcf5463b95d04b74d00481b authored about 4 years ago by James R. Barlow <[email protected]>
tests: tag tests that need pngquant, jbig2enc

github.com/ocrmypdf/OCRmyPDF - bd0f00586147795993629c8d362134213fae77ff authored about 4 years ago by James R. Barlow <[email protected]>
ci: temporarily disable pngquant on Windows

Looks like a packaging error, choco complains of bad hashes.

github.com/ocrmypdf/OCRmyPDF - 6ba4b7b3f3a2dbbec9f974df97f551d23fff025d authored about 4 years ago by James R. Barlow <[email protected]>
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF

github.com/ocrmypdf/OCRmyPDF - 2c11349ee83475d85a21ab53cbfa972cec55e5bb authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.3 release notes

github.com/ocrmypdf/OCRmyPDF - b0afef09efe23124adbdef95e625739719286f14 authored about 4 years ago by James R. Barlow <[email protected]>
tests: skip metadata test for two pikepdf versions that warn incorrectly

github.com/ocrmypdf/OCRmyPDF - 72fa347c38319613516dc0b6a2004fbfc0367b39 authored about 4 years ago by James R. Barlow <[email protected]>
pipeline: refactor metadata_fixup

github.com/ocrmypdf/OCRmyPDF - 96d68c2413672dd6b37c2aef8b5365b75d76cd28 authored about 4 years ago by James R. Barlow <[email protected]>
tests: assert that most patched functions are called

We were not actually checking if functions we patched were called when
expected.

github.com/ocrmypdf/OCRmyPDF - babc76fa740a282240fada0620088c439a1cc12d authored about 4 years ago by James R. Barlow <[email protected]>
docs: fix simple typo, instsalled -> installed (#704)

There is a small typo in docs/installation.rst.

Should read `installed` rather than `instsall...

github.com/ocrmypdf/OCRmyPDF - dc06990e5d67f4b89e4f342437384c3ad41925ae authored about 4 years ago by Tim Gates <[email protected]>
Remove PDF/A overprint debug message

Since we currently log all of a process's output at debug it's
redundant to log this separate me...

github.com/ocrmypdf/OCRmyPDF - 0ff0d2f8d16a38bb0216cc848cff7baed6f08994 authored about 4 years ago by James R. Barlow <[email protected]>
Fix test not patching properly after Ghostscript polling change

github.com/ocrmypdf/OCRmyPDF - 81602cf420e8e79609097c39b62d85cdf5828604 authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.2 release notes

github.com/ocrmypdf/OCRmyPDF - 607e2d7e8172da0534744ff89b2c7a930d9eb794 authored about 4 years ago by James R. Barlow <[email protected]>
Deal with missing pthread_sigmask on Cygwin

Closes #701

github.com/ocrmypdf/OCRmyPDF - b01d9e07e80435f755429c978018b850015ee60d authored about 4 years ago by James R. Barlow <[email protected]>
watcher: fix OCR_LOGLEVEL env var not processed

Closes #702

github.com/ocrmypdf/OCRmyPDF - 91db94cf2ec166b7a30bdf01391f2ed5c771f84c authored about 4 years ago by James R. Barlow <[email protected]>
pdfinfo: stricter typing

github.com/ocrmypdf/OCRmyPDF - 416df803d46e39235fd05d61ab0a4e34f57efd67 authored about 4 years ago by James R. Barlow <[email protected]>
pdfinfo: refactor to eliminate RawPageInfo

github.com/ocrmypdf/OCRmyPDF - 037b96ca16217bb91f30e0f838bdab3eee74c5d8 authored about 4 years ago by James R. Barlow <[email protected]>
pdfinfo: Refactor pageinfo dictionary into a class

github.com/ocrmypdf/OCRmyPDF - bb258fc99c3d3676977d708dded660d3802bc283 authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.1 release notes

github.com/ocrmypdf/OCRmyPDF - 4b8ccbe8cb76480b03ab42b0c61814acd1c59a60 authored about 4 years ago by James R. Barlow <[email protected]>
misc: synology fix

Accept user-contributed fix. Not testable.

Close #690.

github.com/ocrmypdf/OCRmyPDF - ab1ff3331b3d9c42ec4b8920fedac5834f7e10aa authored about 4 years ago by James R. Barlow <[email protected]>
Fix certain invalid page ranges causing exception

Closes #686

github.com/ocrmypdf/OCRmyPDF - 3675ae918cf4e91b4fed9a237350c2c0f7a5d1aa authored about 4 years ago by James R. Barlow <[email protected]>
Revert "v11.4.0 release notes - remove change not actually implemented"

This reverts commit ad202693b3dcf905e180a665a54f349d00d8dfba.
Temporary folder prefix was actual...

github.com/ocrmypdf/OCRmyPDF - 0ba32b96b71bc683139b1c52ed75ec37fb36028e authored about 4 years ago by James R. Barlow <[email protected]>
docs: com.github.ocrmypdf -> ocrmypdf.io

github.com/ocrmypdf/OCRmyPDF - add64e4fa2e8887be42759b8117b0feec62aa320 authored about 4 years ago by James R. Barlow <[email protected]>
Change wheel tag to py36, update package_data to include py.typed

github.com/ocrmypdf/OCRmyPDF - 7fe2954edef586c93cb9a1b9aaef666b10d96d65 authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.0 release notes - remove change not actually implemented

Remove a change that was pushed back to a future release.

github.com/ocrmypdf/OCRmyPDF - ad202693b3dcf905e180a665a54f349d00d8dfba authored about 4 years ago by James R. Barlow <[email protected]>
v11.4.0 release notes

github.com/ocrmypdf/OCRmyPDF - 594ef83551be9233c6fe12a4b958ed793c7b51ba authored about 4 years ago by James R. Barlow <[email protected]>
Fix BufferedReader TypeError

github.com/ocrmypdf/OCRmyPDF - 78b71618c1ac8244f2fd96bb37d185f32c69186f authored about 4 years ago by James R. Barlow <[email protected]>
Fix log message queue flooding on certain files

Fixes #692

github.com/ocrmypdf/OCRmyPDF - b8aa89e1ece7fbd58910e390057415dab96fea62 authored about 4 years ago by James R. Barlow <[email protected]>
cli: typing

github.com/ocrmypdf/OCRmyPDF - 156d5d9a9c2739a4c5b1837a99f7ec5efb5b818b authored about 4 years ago by James R. Barlow <[email protected]>
typing: tidy up

github.com/ocrmypdf/OCRmyPDF - b4c1f66bc1d9b39811644f33ce1e3da80c0d0a53 authored about 4 years ago by James R. Barlow <[email protected]>
pdfa: help mypy figure out a type

github.com/ocrmypdf/OCRmyPDF - d2908640c61480011091ddd20e1245e72507f038 authored about 4 years ago by James R. Barlow <[email protected]>