Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
https://github.com/ocrmypdf/OCRmyPDF

v10.0.1 release notes

863835f66060c933cfb7b7922d650d9128531342 authored over 4 years ago
Fix error on -l lang1+lang2

393c5a9ea4240593a2c6a90ed10a039588109a5f authored over 4 years ago
Fix tests that fail in CI

c6b9a49cbb2188e8036e3ba01229459de41b1e57 authored over 4 years ago
v10 release notes and dependencies

17a4831745a31151db130f47a6cd792c1e0a42b1 authored over 4 years ago
info: change "Scan" message

7caf1e85ff430dbe6fb45fd08e399b179b2cbdfc authored over 4 years ago
info: tidy handling of content streams

f59a757e8b2752fa99e607225051e45edaf7d308 authored over 4 years ago
Reinstate quick test for text/no text

Partial revert of commit 991db17

872bafad4b3bc50c2dcd40b82b4098ad0acd6296 authored over 4 years ago
Only do page analysis on pages we will do OCR on

8599400445d160c97bdb00ee1ae7d69ff4d0f5a2 authored over 4 years ago
Use pikepdf.open with block to manage PdfInfo

b6eebadf054b53480ff1f32ff92b6721683576d2 authored over 4 years ago
Simplify plugin_manager pickling

a4e88eb8f083b285771cc203f21aa1bb0ff243ff authored over 4 years ago
subprocess: lru_cache version checks

f6257c21839e93bf7fee9ff06196fabe71980010 authored over 4 years ago
Pre-release delinting

64891c2fc33712c5f9c6d69098469a8543a175e4 authored over 4 years ago
Merge branch 'release/v10' into trialmerge

fe156db41dc418a25b21fc48853234c724021b41 authored over 4 years ago
Rename ocrmypdf.exec -> ocrmypdf._exec

0f942fb714d5a5b7d8ea868f1221899ce85b4626 authored over 4 years ago
Move ocrmypdf.exec.run and friends to ocrmypdf.subprocess

be8ca589d42a980e0b3d3c2c9fdfb871adb0c764 authored over 4 years ago
Remove tesseract_env, --tesseract-env

3b6f6782f0a3950d0c0f4e6f19f16ec2f2acbd9a authored over 4 years ago
Remove _OCRMYPDF_TEST_PATH environment variable

21c0e045cbdd2a7a7253532ff34085a248239f03 authored over 4 years ago
The big payoff: abolishing spoofing machinery

ebbf68bd08dd25ea2afd5f674f9a7f3bb2903685 authored over 4 years ago
Convert all ghostscript spoofs to test plugins

2059e916da89bbfc2fc45b41707a40ac08c166d7 authored over 4 years ago
Plugins must return not-None if they intend to stop builtin

c22f2456064d70cbae6e25c9ac799fd6e76c47a8 authored over 4 years ago
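
The "Plugins must return not-None if they intend to stop builtin" commit above describes a dispatch rule similar in spirit to pluggy's `firstresult` hooks: implementations are tried in order, and the first one that returns something other than `None` wins. The following is a plain-Python sketch of that rule only, not OCRmyPDF's or pluggy's code; `call_first_result` and the three hook functions are hypothetical names made up for illustration.

```python
from typing import Callable, Iterable, Optional

Hook = Callable[[str], Optional[str]]


def call_first_result(hooks: Iterable[Hook], arg: str) -> Optional[str]:
    """Call hooks in order and return the first result that is not None."""
    for hook in hooks:
        result = hook(arg)
        if result is not None:
            # A non-None result claims the call; later hooks never run.
            return result
    return None


# Hypothetical hooks, named for illustration only.
def declining_plugin(path: str) -> Optional[str]:
    return None  # declines, so the next hook gets a turn


def overriding_plugin(path: str) -> Optional[str]:
    return f"plugin handled {path}"


def builtin(path: str) -> Optional[str]:
    return f"builtin handled {path}"


print(call_first_result([declining_plugin, builtin], "page.pdf"))
print(call_first_result([overriding_plugin, builtin], "page.pdf"))
```

Under this rule a plugin that returns `None`, even by accident, lets the builtin run, which is presumably why the commit asks plugins to return an explicit value when they mean to take over.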
Convert generate_pdfa to plugin

7b9025f3977bf7dc423ff4bc1eb374f06eb6bb6b authored over 4 years ago
Move Ghostscript rasterize_pdf to plugin

b109445215155fdd5d8707f96266885ca4706bbb authored over 4 years ago
docs: explain --rotate-pages-threshold

fd1cd8e50afaaada707f15d10af00f3be8f2c609 authored over 4 years ago
docs: Ubuntu 20.04 install instructions

c6c70c21712af36d065467abed1c7dc0278568cc authored over 4 years ago
Convert all tesseract cache usages to plugin

a9a473f2e5d544082f70946f53b26b75f4a3496e authored over 4 years ago
Begin replacing tests/spoof/tesseract_cache with plugin

6268e2fafff02469381f29550626b059e4eb9256 authored over 4 years ago
Convert tesseract_badutf8 to plugin

ec3f506500175b37cb76154e4a6c4dc33edd58bd authored over 4 years ago
v9.8.2 release notes

00daa51a7393cf4cf0db36241515b6835b2205dd authored over 4 years ago
docs: tidy Cygwin install

e60f4d3f43254e6040af4271877a4aad2c54f642 authored over 4 years ago
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF

7460745f80b271586acc01c8247c5008f6a53f46 authored over 4 years ago
Fix test_report_file_size

Use more realistic test data

5e14d5b0dd853e8958a8e81c866dbce1687f7889 authored over 4 years ago
layout: look for text in XObjects too

d118132fa6f0c52210276acfde80d9f91be37f4f authored over 4 years ago
Add installation instructions for Windows/Cygwin64 (#571)

Co-authored-by: Jim Garrison

5f47aac36f9fb0efa87b562039bcb1d5073303b2 authored over 4 years ago
Remove unpaper spoof; no plugin needed

c6b2fa8851df3c1b23c3c2f2c525700d76ddb0e3 authored over 4 years ago
Convert tesseract_crash to plugin

1b92f447c3c44440641e87acd1af2bc13bd13b37 authored over 4 years ago
Tidy tesseract_noop

82e7eb91d2d3f56fd627bc3ac48699c17b09e771 authored over 4 years ago
Convert tesseract_big_image_error to plugin

4f4ad0fb7602f4c10e9acf17ae67e2d0b778fe2c authored over 4 years ago
Improve file size increase warning to account for changes to small files

Fixes #569

1d0b8641a0447d5f120921c336ee46b6a6207d7f authored over 4 years ago
Mark pdfminer.six 20200517 as supported

daca9197755e2872b26d010ebf4b27a5b2661b6f authored over 4 years ago
Abolish spoof_tesseract_noop

1598f2f0e5f1a07f3af3c292adf0622925c8110d authored over 4 years ago
tesseract_noop: begin implementing with plugin

2b23f7ec73121214c91496f32b8669538d911d94 authored over 4 years ago
Fix tesseract_ocr.py errors

6528234608b66216843fe0ed17e78ef2b3e967dd authored over 4 years ago
Fix test that failed on Windows

642ebc6098da30a1aa41006fee809ff3b654d55a authored over 4 years ago
v9.8.1 notes

74fdfeea3f3ad06107ec1923d73326811f7de39c authored over 4 years ago
Mark pdfminer.six 20200517 as supported

3754185f56b3d8f0b978d2fc91b83281f4b4b493 authored over 4 years ago
Fix shim_paths to account for unexpected files in Program Files\gs

Fixes #565

df9f5157bd4dd5cb447f18031d35ee030c15ff66 authored over 4 years ago
Refactor tesseract_env variable into the plugin

Removed all cases except one in api.py, which isn't worth solving because
it should be removed a...

aa060db5bc7f2c4017902360cc2b039bfca4d8bc authored over 4 years ago
Refactor --language argument into set

d43212d30b6e26dd272efffcbe4650c9b4c94970 authored over 4 years ago
Move Tesseract options validation into plugin

a0f9ca3a30d3de8b3b4f555985f2fd5decee0d7f authored over 4 years ago
Update email

0cefe886ec816632a963d1d88a57dca72436cf51 authored over 4 years ago
docs: Note about OCRmyPDF speed

f656c00f41a02e395fa8e40b6588b0fccb07136a authored over 4 years ago
Test files needed!

03da34ee2473de1121707d795dcec146bccfef82 authored almost 5 years ago
Move Tesseract specific arguments to plugin

9bccff4f885b4cbf3e38aa36437073c9003997d4 authored almost 5 years ago
Compare requested languages to OCR engine instead of tesseract directly

Also refactoring to facilitate validation that needs the plugin manager.

2bd586e093a191fb6e9e939b56eb801de5069035 authored almost 5 years ago
pipeline: use OCR engine abstraction instead of Tesseract

9af94ac9b7b75601689443f146052e9e45876998 authored almost 5 years ago
Begin transforming Tesseract into pluggable OCR engine

8174089c8be254152b4f9b80f2833f108bdb0ff9 authored almost 5 years ago
Standardize tesseract.generate_hocr and _pdf parameters

41eb54cc0a59855aaa2d5d03001f80507d2fb2b6 authored almost 5 years ago
Fix validation of languages not using tesseract_env

And some related issues.

12a2f78c4dda83c50ba1b484a0f29f5adc8a0a6e authored almost 5 years ago
Remove "skip page" from tesseract interface

Breaks tests/test_main.py::test_tesseract_missing_tessdata because
conftest.py does not update o...

d372f1f7fa70ce29b49cb29efa475d10243de5c9 authored almost 5 years ago
Remove lru_cache on get_version

Does not play well with forking.

6f5b75bcd04040b262600f9eaff781f99aadb679 authored almost 5 years ago
Convert remaining imports to absolute

a2d3e0b53ea8b4d5430788dbd9f72ff42fda1165 authored almost 5 years ago
ocrmypdf.__init__: Hide _HookimplMarker

7f67556995568ddef50bef76c21266dc0c1201b4 authored almost 5 years ago
Refactor ocrmypdf.exec.__init__.py

db8c37e58cb6d7f572f8437da5cad24f78d68def authored almost 5 years ago
helpers: remove unnecessary isinstance test

a87c81a64fc0347f2a2624b69f18d257f16dc239 authored almost 5 years ago
cli: make ArgumentParser._api_mode private

4b986a5943e578c2a0efa187792333757c024c8f authored almost 5 years ago
Remove **kwargs from check_external_program; deprecated

2fae9b655e9c5c7271cee74b869511858a42a9b5 authored almost 5 years ago
Fix missing jbig2enc reported as error with -O3 instead of warning

Fixes #558

2541f6cf899f3e7e282c92de403598dabf3fecd4 authored almost 5 years ago
watcher: cleanup getenv casting

33b68454f35b483aac5edc349d90253143b0ac18 authored almost 5 years ago
Delint some tests

977665d2b6175f0ac27759bd26df779feb2e2635 authored almost 5 years ago
Remove old function tesseract.v4()

fd7497f00d2bdc9263fc348a6a868c46ec3479ef authored almost 5 years ago
Add fix for bug in Windows Python 3.6/3.7

TypeError: argument of type 'WindowsPath' is not iterable

790ff58f675bab850fca41a8ac0c2918bc8dbd96 authored almost 5 years ago
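
The `TypeError` quoted in the Windows fix above is typical of passing `pathlib.Path` objects in a subprocess argument list on Windows before Python 3.8. Assuming that is the cause here (the log does not say what the actual fix was), one common workaround is to convert path-like arguments to strings before launching the program. `stringify_args` below is a hypothetical helper, not a function from OCRmyPDF.

```python
import os
from pathlib import Path
from typing import List, Sequence, Union

Arg = Union[str, "os.PathLike[str]"]


def stringify_args(args: Sequence[Arg]) -> List[str]:
    """Return a copy of args with every path-like item converted to str."""
    # os.fspath() returns str unchanged and calls __fspath__ on Path objects.
    return [os.fspath(arg) for arg in args]


# Hypothetical command line; the file names are placeholders.
command = stringify_args(["tesseract", Path("input.png"), "output", "-l", "eng"])
print(command)
```

The resulting list contains only `str` items and can be handed to `subprocess.run` on any supported Python version.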
docs: rename security->pdfsecurity so github won't misinterpret it

4b98ce391b161b92c37f724cb85b755023367896 authored almost 5 years ago
docs: plugin documentation

417dbd43f6d4dfa4f87e939547c8e77e485187f1 authored almost 5 years ago
Relocate example plugin

7a12908db904cfbda9a43e5ea805d210d5c0652c authored almost 5 years ago
graft: more refactoring

9462f0a28fd73f235ee4a79b440b144f4406f673 authored almost 5 years ago
graft: refactor

e760622a5c4ea69cb90e9e147daaa62e4ed03f8a authored almost 5 years ago
tesseract.py: api cleanup

1b086f60a9b82e86aac214aab5390bda1b6d0d43 authored almost 5 years ago
Convert many uses of str paths to Path

85cbf94a6e76338b764708a26075a49e2639ce8e authored almost 5 years ago
New hook: filter_page_image

6f4286e1b11b580c0d31e476906e659d828749b6 authored almost 5 years ago
Rename install_cli to add_options

39888ae8c9485806df9d93bbb7bc431cd391c366 authored almost 5 years ago
Support importing plugin by filename

dd361ecd059a5fb0eaae160a093eef71c10bcd77 authored almost 5 years ago
Change argument from --plugins to --plugin

32759c902522a80d59572fc3cbd4de148a6023a1 authored almost 5 years ago
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF

59440448eedd7c8a21783ec1193dda66a9ba62ca authored almost 5 years ago
docs: update Arch Linux install instructions (#540)

The python-pdfminer.six package is now available in the official Arch
repositories. The depende...

51b54893ceed111ca74d3db12c03ea12b5711a36 authored almost 5 years ago
docs: remove reference to brewfile

1f3665f6144f731fe6bfca9bbda64c14276ac436 authored almost 5 years ago
optimize: convert from executor to progress pool

75c34b873a46a2561f8c4fb75c9f1010567d0a70 authored almost 5 years ago
safe_symlink: remove deprecated params

fe4296c53b151a2a408f093b0700c537d82a7dd2 authored almost 5 years ago
Delinting

c85278b31d546d85f425803df761b258502b2183 authored almost 5 years ago
Rename PDFContext->PdfContext

5dbc080fa034702a076db37b4d60091cb71dfedf authored almost 5 years ago
Support plugin invocation with API

e02f6c1e97c4353834f7c982ec2d79c15b60aef7 authored almost 5 years ago
pluginspec: avoid circular reference

8c9a8fc85cb0cafdd41ab305d20077cc21e8540a authored almost 5 years ago
Allow plugins to add command line arguments

23d558ad8c4149a503aed237893cf6b8af6761e9 authored almost 5 years ago
Set up filter_ocr_image hook

be107b4fedb838b1c4b4599abd75488edf6f6fa1 authored almost 5 years ago
Get pluggy to work with forking workers

8d2535e327d98ac8dc3995880117b026e5a43edd authored almost 5 years ago
Refactor plugin setup to get_plugin_manager

5eb4fe00525dfec915b2b394eae05a7aa12e44fc authored almost 5 years ago
Move samefile to helpers

d8ff4485f8482411480431c7774833f8cc916439 authored almost 5 years ago
Start pluggy-based plugin system

82bce463aece0c2e423a2fc7c0d4319546e77b24 authored almost 5 years ago
Add warning if problematic --tesseract-pagesegmode is selected

Fixes #549

016dfd420c7d42d01583ac6902c8926bd6a96126 authored almost 5 years ago
v9.8.0 release notes

b59e761a14af27a079a286b4fe52359d671ab8e3 authored almost 5 years ago
Don't utf-8 decode tesseract --print-parameters

Output not guaranteed to be UTF-8.

Fixes #543.

17cd655752fba4e261a4a1260fa33bef22428b11 authored almost 5 years ago
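
The "Don't utf-8 decode tesseract --print-parameters" commit above deals with program output that is not guaranteed to be valid UTF-8. The log does not show how the fix was made. As a minimal sketch of one general option, bytes can be decoded with a lenient error handler so that invalid sequences do not raise. `decode_tool_output` is a hypothetical helper, and the sample bytes are made up to resemble a parameter line.

```python
def decode_tool_output(raw: bytes) -> str:
    """Decode output from an external program without raising on bad UTF-8."""
    # errors="replace" substitutes U+FFFD for each undecodable byte
    # instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="replace")


# Made-up sample: 0xFF can never appear in valid UTF-8.
sample = b"tessedit_char_whitelist\t\xff\tWhitelist of chars to recognize\n"
text = decode_tool_output(sample)
print(repr(text))
```

Another option is to keep the output as `bytes` and parse it without decoding at all, which avoids picking an encoding in the first place.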