Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Collective - Host: opensource - https://opencollective.com/ocrmypdf - Code: https://github.com/jbarlow83/OCRmyPDF

Use pikepdf for get_pdfmark

It does fine.

github.com/ocrmypdf/OCRmyPDF - 18595ca86a5212ab8737e63ed6efd82ae2a930eb authored over 6 years ago
Ubuntu 14.04 has a qpdf 8.0.2 backport, making life easier

github.com/ocrmypdf/OCRmyPDF - 3e269fa1882fe19f51974654b6abfa669a9c9c4f authored over 6 years ago
Try getting qpdf from Ubuntu 18.04

github.com/ocrmypdf/OCRmyPDF - 65405c2cb92eba2eb77c5e3253a18a36f378627a authored over 6 years ago
Travis: maybe upgrading wheel?

github.com/ocrmypdf/OCRmyPDF - 442cf8897a2df8d332473704a349c38387a541f5 authored over 6 years ago
Travis: hack in qpdf appimage version

qpdf from appimage does not report its version with --version if renamed
or accessed via symlink...

github.com/ocrmypdf/OCRmyPDF - d5fb275e9eccad62a2181354b78c3253a800494d authored over 6 years ago
Travis: why can't we use qpdf appimage?

github.com/ocrmypdf/OCRmyPDF - e60aec81ca947948eead7f8652215582fe96c5cc authored over 6 years ago
optimize: Changed pikepdf API

github.com/ocrmypdf/OCRmyPDF - 398e9e535ef014b54464bfe7833b0f4d8db8be25 authored over 6 years ago
Refactor JBIG2 path for non-CCITT monochrome images

github.com/ocrmypdf/OCRmyPDF - 08bf651ef25fd5ca1b84879ef57738e62a0e4af2 authored over 6 years ago
optimize: move a lot of image scanning code to pikepdf

github.com/ocrmypdf/OCRmyPDF - 6171de41bff1e300b8d7dd09a2bde6939ad7a9e8 authored over 6 years ago
Pull JobContext out of pipeline.py to avoid circular reference

github.com/ocrmypdf/OCRmyPDF - f0a56592e294d66cef919df7f9bb26f862ee482d authored over 6 years ago
Another fitz failure - incorrect object reference introduced

MuPDF/fitz changed some font references to point to table of contents
entries, corrupting the pa...

github.com/ocrmypdf/OCRmyPDF - 87a7d4d1a83ce28caaefa4a2a4c1363fa33a49bb authored over 6 years ago
Travis: Tweak setup so it can run

github.com/ocrmypdf/OCRmyPDF - 96e453feb6001b50202b4075fe39cca02d36cbbf authored over 6 years ago
Move qpdf to before_script

github.com/ocrmypdf/OCRmyPDF - 3bde0715b084a92e57f2490e76d0b75800bd1ccf authored over 6 years ago
Travis: adjust qpdf appimage

github.com/ocrmypdf/OCRmyPDF - e2ec3d8b9bfc43038139e21edc84aef5d80c93dd authored over 6 years ago
Travis: try using qpdf appimage to speed up build

github.com/ocrmypdf/OCRmyPDF - ad91eaf8a7528f285e590c0792cdfb7987c9427f authored over 6 years ago
PyMuPDF 1.13.4 looks good, use it

github.com/ocrmypdf/OCRmyPDF - b6d30214fd978718ec33aae1f172dae7b5f2f492 authored over 6 years ago
Fix "AttributeError: 'ImageInfo' object has no attribute '_type'"

Also deal with 'fixme' imagemask comment.

Also fix bpc incorrectly set to 8 by default on stenc...

github.com/ocrmypdf/OCRmyPDF - c4ab01d63d9a4f1ac473d5c5de955402f2dbb90a authored over 6 years ago
Fix rotate_pages_threshold test failure

github.com/ocrmypdf/OCRmyPDF - 4ba3b3f55a3e53fc2b4da6e717f0f203dd95e2ad authored over 6 years ago
optimize: Fix error causing many images to be skipped

github.com/ocrmypdf/OCRmyPDF - 52d2706a9e3cb0e435a2c4cd07602ab0f843e9c3 authored over 6 years ago
leptonica: ErrorTrap is an implementation detail

github.com/ocrmypdf/OCRmyPDF - 964afc69f69310714fd9ad83456ab73cb0404b54 authored over 6 years ago
optimize: leptonica can fail to open PNG

ERROR - Info in pixReadStreamPng: converting (cmap + alpha) ==> RGBA
Error in pixReadStreamPng:...

github.com/ocrmypdf/OCRmyPDF - 3ddf545ccdbaad8a74710b964730c701f4668e3b authored over 6 years ago
optimize: process ICCBased images that declare an /Alternate we recognize

github.com/ocrmypdf/OCRmyPDF - f9374733bb55be96244da137da76cee3b7b7218d authored over 6 years ago
optimize: Refactor naming helpers

github.com/ocrmypdf/OCRmyPDF - 5930135f4550a4c739363c266aa805cc83228a20 authored over 6 years ago
optimize: document problem with transcode free compressed image data

github.com/ocrmypdf/OCRmyPDF - f03f6bc1281649e0907b486af8a2e0b48feda518 authored over 6 years ago
Try to optimize paletted images

github.com/ocrmypdf/OCRmyPDF - 6c50c7023580012c7118ab7bde4111b0ef7fa5ad authored over 6 years ago
optimize: add knobs to control image quality but don't show the user yet

github.com/ocrmypdf/OCRmyPDF - 8790fc2c1bfab810b3d9a22315da3f5dbf9a6474 authored over 6 years ago
optimize: don't alter >8 bpc images

github.com/ocrmypdf/OCRmyPDF - f86c4fccf4ad7a96fa8f35106a10de36350e4fb9 authored over 6 years ago
main: do better parameter validation

github.com/ocrmypdf/OCRmyPDF - 7d0785e9ed621744ba1ad9c7db0f9fd6eacf56d2 authored over 6 years ago
Ignore masks when deciding what color to rasterize at

github.com/ocrmypdf/OCRmyPDF - 2cac88162c2d85bc353f8446b22496a156818219 authored over 6 years ago
Fix jbig2enc name

github.com/ocrmypdf/OCRmyPDF - 4809627d8a891063d351e004e41dc5e4fa742fc9 authored over 6 years ago
Temporarily unbreak without fitz mode

github.com/ocrmypdf/OCRmyPDF - 871979abd6cb170315b926696fd26e4894e503c8 authored over 6 years ago
Travis: Use declarative APT for Tesseract too

github.com/ocrmypdf/OCRmyPDF - efb95722ca953f3559c3447716ee2137e03f52c2 authored over 6 years ago
Don't try to run jbig2 when not available

github.com/ocrmypdf/OCRmyPDF - d9bbb80a6b81e68ccc269dc5e8af2dad8628b1b9 authored over 6 years ago
Update test cache

github.com/ocrmypdf/OCRmyPDF - 3254315127eeb96739defd796569e79d8504b8b6 authored over 6 years ago
Warn about --user-words not having any effect

Might be available in the full release of Tesseract 4.

github.com/ocrmypdf/OCRmyPDF - ac36a43cef85ba96eab3a22063908f6e18ef2bbe authored over 6 years ago
Update our dependencies

github.com/ocrmypdf/OCRmyPDF - f00183115d69d52ef98d589469ec8c6af2f494e4 authored over 6 years ago
Check jbig2 when optimizing is requested

github.com/ocrmypdf/OCRmyPDF - 161b29a899a999e34e629e1397ab7321484e8a1e authored over 6 years ago
Add arguments to control optimization

github.com/ocrmypdf/OCRmyPDF - 72253d09fa3603e2b32b7ee270300f9f3c355613 authored over 6 years ago
Fix merge error in Leptonica

github.com/ocrmypdf/OCRmyPDF - 40d09ddb23a3d87da27ffa5588a4f57fee233280 authored over 6 years ago
Remove jbig2enc.py

github.com/ocrmypdf/OCRmyPDF - 3026d86a9e46fe1a330b1493a6039ac7a81a2599 authored over 6 years ago
Merge optimize

github.com/ocrmypdf/OCRmyPDF - 0661a7edc374a8dc09e8b9cfbc7f88954e049ce5 authored over 6 years ago
Merge branch 'master' into develop

github.com/ocrmypdf/OCRmyPDF - 24b0adfacc2b4b06f96122659444c99ce1ec3a3b authored over 6 years ago
Make XML metadata test actually work

github.com/ocrmypdf/OCRmyPDF - acc6698ab3eda3d2a46ee92781d62fbb30afd1fd authored over 6 years ago
Remove tests that exercise obsolete features (tesseract, -g)

github.com/ocrmypdf/OCRmyPDF - 606d3e6aa1683aee6015a61c5c63bc0e23a66999 authored over 6 years ago
test_main: uses leptonica

github.com/ocrmypdf/OCRmyPDF - 687a7954d6881e5b08d20b9a71068c803591ff40 authored over 6 years ago
Weave: Unconditionally rotate and scale the text layer

This solves two issues. First, the text layer can end up being a different size, probably if the DPI is not an integer; scaling helps it fit slightly better. Second, other printable text on the page can end up horizontally scaled or misaligned if we don't do all of our drawing in a q/Q pair.

github.com/ocrmypdf/OCRmyPDF - 36a53a7b37b4c1031dd4da4222e6a9f74bd5b12c authored over 6 years ago
PyMuPDF tweaks: don't clean

In MuPDF 1.13 clean might be unreliable, so explicitly don't do it,
even though it doesn't cause...

github.com/ocrmypdf/OCRmyPDF - 0a5982a9025eaf8976226b9251244b6858d79d8b authored over 6 years ago
Return to PyMuPDF 1.12.5

github.com/ocrmypdf/OCRmyPDF - 601863f9e966a572a6afba5a36c39ff4449182ea authored over 6 years ago
Fix DPI mismatch between OCR page and source page

github.com/ocrmypdf/OCRmyPDF - c9ce731119c537b700cce92ea06e1fe1ccfc02c4 authored over 6 years ago
Add metadata preservation test from stash

github.com/ocrmypdf/OCRmyPDF - abed8e034e6bc97032676747898ba5d9b835dbc9 authored over 6 years ago
Revert "Since PyMuPDF 1.13.3 corrupts text, pin 1.12.5 and work around it"

This reverts commit b0ce7c63dd27257d9c979fde9013243b8ae38c98.

github.com/ocrmypdf/OCRmyPDF - 63032d304d9076c748fcc960fe3203650c69d890 authored over 6 years ago
Refactor textareas to remove duplicate code

github.com/ocrmypdf/OCRmyPDF - a57ecede7806322299b92171c6e99f62804a1fd8 authored over 6 years ago
Since PyMuPDF 1.13.3 corrupts text, pin 1.12.5 and work around it

github.com/ocrmypdf/OCRmyPDF - b0ce7c63dd27257d9c979fde9013243b8ae38c98 authored over 6 years ago
Weave: periodically save to prevent indefinite growth of open file list

github.com/ocrmypdf/OCRmyPDF - d139a11c166a4d485ac8c7a2b6f0cd5bd8d2b562 authored over 6 years ago
Revise parameter validation for output-type, pdf-renderer, lang

github.com/ocrmypdf/OCRmyPDF - aef043db0b563ac774fa0768facd0116b59c34ca authored over 6 years ago
Remove tesseract renderer entirely

Grafting lets us work with older Tesseract versions as if they could use
sandwich, so there is n...

github.com/ocrmypdf/OCRmyPDF - b8f3ead541b0e64669c06af8b2271b50a79a4bcb authored over 6 years ago
Remove hocr debug renderer (-g)

The fact that this produces additional pages makes it a maintenance
burden. hocr can be debugged...

github.com/ocrmypdf/OCRmyPDF - e0bb898f29c7e854121575c964dad82cd3e35025 authored over 6 years ago
textareas: filter out images

github.com/ocrmypdf/OCRmyPDF - 45336c7c2844dc0b090868161e38eb93eb014ac1 authored over 6 years ago
When deciding if there is text on a page, ignore the margins

Margins may include watermarks or digital stamps on otherwise
text-free pages.

github.com/ocrmypdf/OCRmyPDF - 20aabb2e838eb52b2e75189ace05908d4cc1d3c7 authored over 6 years ago
Ignore masks when deciding what color to rasterize at

github.com/ocrmypdf/OCRmyPDF - 1539e24d61b98569e1611827d0e0b0032403ca40 authored over 6 years ago
Fixed language option example (French) (#266)

Replace fre with fra.

github.com/ocrmypdf/OCRmyPDF - c7cf041e4a33b22923d4e1cfb0e4a96c6926875d authored over 6 years ago
Add unconditional (for now) whiteout of text areas

github.com/ocrmypdf/OCRmyPDF - da80d3f3545e80325b9ce87ed3c352a7fd6cb553 authored over 6 years ago
Upgrade PyMuPDF version

github.com/ocrmypdf/OCRmyPDF - 001c8d767847209d53b89e22c47470ec7fd9e976 authored over 6 years ago
Restore unpaper

It's a suggested/recommended dependency, not a required one, in Debian/Ubuntu.

github.com/ocrmypdf/OCRmyPDF - 38ab03655b229c4206fa58ff18c2217c5b70d7c2 authored over 6 years ago
Trap PDF/A-3 errors on old Ghostscript

github.com/ocrmypdf/OCRmyPDF - 9226f8a5d1bd8dfd6e4b9d6b035de6a4703ea4fb authored over 6 years ago
Fix failure to prevent use of Ghostscript on /UserUnit files

github.com/ocrmypdf/OCRmyPDF - 5c8a007f3ed8d15c7dc2a815c7692d544fb6b103 authored over 6 years ago
v6.2.0 Release notes

github.com/ocrmypdf/OCRmyPDF - d607553e482f9b63ce9a89603d48d2387516cdc5 authored over 6 years ago
Merge branch 'feature/pdfa3'

github.com/ocrmypdf/OCRmyPDF - 7cf83c77ca5a93aed5ed25debbef10b78cfdd8f3 authored over 6 years ago
Fix XMP validation issue with /CreationDate

Related to previous validation issue. If the /CreationDate had no
timezone, Ghostscript also cre...

github.com/ocrmypdf/OCRmyPDF - 8a9f174f63ec09881de996ff6751cc830e80ff21 authored over 6 years ago
Add 18.04 update procedure

github.com/ocrmypdf/OCRmyPDF - 98a0786c320217ca7bc2d3f43adf2738727ac9a2 authored over 6 years ago
Update Dockerfile for Ubuntu 18.04

github.com/ocrmypdf/OCRmyPDF - df1129724c6b3b9c9c9c6a85a053c3a6dede5809 authored over 6 years ago
Handle procset properly

github.com/ocrmypdf/OCRmyPDF - 423cef08bf200f650b65a83690d0127de9662137 authored over 6 years ago
Document aliasing of tesseract renderer

github.com/ocrmypdf/OCRmyPDF - 04580accb419ad185a999977d9277ffe49d447de authored over 6 years ago
Refactor, remove trigonometry

github.com/ocrmypdf/OCRmyPDF - 6376f77b8cfcbc54fb7a3f7d39ec0075b6a51245 authored over 6 years ago
Fixed rotation hard case

github.com/ocrmypdf/OCRmyPDF - e27e614ed944cf92270c60022755159ba8f3e1f5 authored over 6 years ago
Fixed all but one rotation case

github.com/ocrmypdf/OCRmyPDF - b0c04704a124ca59a33b463afae5578dbaed597a authored over 6 years ago
Fix correction angle used from wrong page

github.com/ocrmypdf/OCRmyPDF - 6bb6bf8323d9addbbe10e3774c64495a62fc23c5 authored over 6 years ago
Silence debug messages

github.com/ocrmypdf/OCRmyPDF - e22fe8aefc11c78f633707c06c0f7b22d3c9ad69 authored over 6 years ago
Split out rotation related tests

github.com/ocrmypdf/OCRmyPDF - 76276f61e5a56124bb52177d01dd0b163922653a authored over 6 years ago
Tests: confirm OCR layer copied

github.com/ocrmypdf/OCRmyPDF - bfd26e6ec6b22815e851f64a4f66601d2d03c500 authored over 6 years ago
ghostscript.py not saved in last commit

Given the importance of the last one, confirmed that all tests also pass when the file is saved.
Passing ...

github.com/ocrmypdf/OCRmyPDF - d787e1ea0fcbac6aae645b22af7c244b0c7bfd22 authored over 6 years ago
Fix all issues with rotations

All tests now pass

github.com/ocrmypdf/OCRmyPDF - b5d7e9cbb06b491dd585d367197711915e25cfe0 authored over 6 years ago
Fix a comment about Tesseract behavior in certain versions

github.com/ocrmypdf/OCRmyPDF - f3b6d9dcdf0b1093a16fa1eb45fa64311e839187 authored over 6 years ago
Remove the old tesseract pdf_renderer

github.com/ocrmypdf/OCRmyPDF - a9abe13185b51a836020de9b6410952ccad53aa0 authored over 6 years ago
Add ability to disable cache

github.com/ocrmypdf/OCRmyPDF - 6b315e83156d911776dc05e5e7aec51fa7796b77 authored over 6 years ago
Fix regressions: pdfa.ps not used, PDF/A failures, handling of text layers with no font

github.com/ocrmypdf/OCRmyPDF - 37677de8845319d180bbfa384a767246a5e10a04 authored over 6 years ago
Fix auto rotate

github.com/ocrmypdf/OCRmyPDF - c7387de325c769bf0b746085f41d3811f5cbc94b authored over 6 years ago
Refactor find font, get test cases working again

github.com/ocrmypdf/OCRmyPDF - 2495b1e038fc2519b56225541d317c912c602509 authored over 6 years ago
Use hocr and weave; eliminate old combine layers and merge pages

github.com/ocrmypdf/OCRmyPDF - 073ee52ce74c766641c5b82aa7086acc07558510 authored over 6 years ago
Further elimination of tesseract renderer special casing

We don't need to keep a "skip page" around anymore since
skipping means just not grafting on the...

github.com/ocrmypdf/OCRmyPDF - 54150a14e96e8a9d787196380f195c3b2f87010e authored over 6 years ago
Unify tesseract and sandwich renderer paths

Since the new weaving method copies the font and content
stream from the Tesseract PDF, it doesn...

github.com/ocrmypdf/OCRmyPDF - 88ff091ccecfb6c185426ca0d4c874a7ab00b7d8 authored over 6 years ago
Remove now-unnecessary code to rotate pages

Track only the decision to change rotation.

github.com/ocrmypdf/OCRmyPDF - e87a5776f1b431d6f19b6f09bd9026abae2b2ce1 authored over 6 years ago
Fix rotation for unsplit (modulo --rotate-pages)

github.com/ocrmypdf/OCRmyPDF - 0806ce6406b042bbade3917ac54e7789ca5ea488 authored over 6 years ago
feature/unsplit-try-imagerotate

github.com/ocrmypdf/OCRmyPDF - 6409894a713f40fd57eb25058ed24cd2f243438e authored over 6 years ago
Unsplit now works with multipage, --force-ocr

github.com/ocrmypdf/OCRmyPDF - e7286f61294d92cdb8bc9b0d38f4a6509d310d96 authored over 6 years ago
unsplit: it's alive

First successful file output.

github.com/ocrmypdf/OCRmyPDF - 2ab94b3151329ba0ad209df1565a59baf43721f9 authored almost 7 years ago