Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
https://github.com/ocrmypdf/OCRmyPDF

Fix private accessors, rename pdf to canvas

2ca6e110ca15f291bfd1136691e96830a9d0f550 authored about 1 year ago
Further exploratory improvements

30a0c315fba50354e6d963813de083c31b51efc3 authored about 1 year ago
Refactor debug printing

334a07c8390cfc76842ccea93dbca66fe69789d9 authored about 1 year ago
Refactor: extract methods

a57c39358d73168c69b8648cd61a329e34990b37 authored about 1 year ago
More colors

7ab5c55d4678423a983c0a50dbb61bb558eb2464 authored about 1 year ago
Remove concept of HOCR_OK_LANGS

491b6bdb1f24bd7b019da631148189858c2db94b authored about 1 year ago
Fix line and rect drawing

8b6ecd5971b255fe283fe12107ca33e13dd8a6c9 authored about 1 year ago
Add rendering of space between boxes

686cfb25394041af66392d44696f6bbce75dd5fe authored about 1 year ago
Fix position errors; ignore non-glyphless font

7b0871ae4c95fe8c12372d0c7e503b74185fa2c8 authored about 1 year ago
Significant improvement overall

d9ae453a63f9c1be5a05b18b74f160da74ad9286 authored about 1 year ago
Tidy up

d739b91aef1b51aad712eef363a8bf8e323b5236 authored about 1 year ago
Tidying new hOCR renderer

b14f6f778abac2cbf406664128442f5e7c134111 authored about 1 year ago
WIP improve text positioning (not there yet)

14f4c19f5a3be1f55b26a841bb9e8056b16fe869 authored about 1 year ago
Create pikepdf backend renderer

d0133f86412d28f12495fb90cd5d4a1390665a7c authored about 1 year ago
Add pdf.ttf

e966c1fceb0b6b7a12e2a936e9e7596490f89300 authored about 1 year ago
Refactor reportlab into generic backend

6d30b497dcdb6c6abefd2c31a9539809d126cda6 authored about 1 year ago
Refactor reportlab backend out of hocrtransform

f3b89e66eb9fec87769dc4658c79f4237234c797 authored about 1 year ago
Test pikepdf canvas - renders... something at this point

60645717e27af004d1dde278511ba2f8ed89d814 authored about 1 year ago
hocrtransform: move to module

04154e207c989719af95e531435c12a4eb0c556f authored about 1 year ago
Fix import of pdf.ttf

1cbf578538753e6d8663d7abd8ee2f69b8e23577 authored about 1 year ago
Fix dashes

b73af7ce1001fb6a9b14c03c5c1e44f68f39fbc4 authored about 1 year ago
Fix pikepdf PdfMatrix deprecation warning; v15.4.3 release notes

9898904be70bf957154cc66cb5d6f57bac7c26b4 authored about 1 year ago
Make logger names unique

27d52298420e785300e418bef4e4219ef97720e1 authored about 1 year ago
ghostscript: better comments

4a9a575ef0f2cd7459538a43e802c8d8b3846168 authored about 1 year ago
v15.4.2 release notes

52fd9a630d57ac7af91db423b17e6b0297fc1c66 authored about 1 year ago
Raise exception if resulting PDF might appear blank in some PDF viewers

Fixes #1187

a596ccf84469c81c6d19ab0eb59f53a5df0bc3c7 authored about 1 year ago
ghostscript duplicate filter: filter within a window of previous messages

e7fa97731f507c1d91e4289b9d5f3344533d7c4c authored about 1 year ago
Fix error on attempt to write to debug log after removing debug log handler

290aa2810868fb506a0ded2db8dabb580d5a2459 authored about 1 year ago
v15.4.1 release notes

Closes #1185
Closes #1183

a95640ed9ec0a5caa360bd64b9ad6cc9f916fa61 authored about 1 year ago
watcher: restore ability to read json from file or command line string

f69267bb675c3261121b70451f6e673f433bc7ad authored about 1 year ago
Make grafting a little bit more configurable

e36d5a309fd250927ed2c14983be35e806f7fdf0 authored over 1 year ago
Fix watcher.py kwarg error

55566d9830cc12a27e7c47b9a467bd021698ddbf authored over 1 year ago
docs: plugin documentation missing key special members

f02ea20678ab07f0f847347720b1db1f8a52cd2e authored over 1 year ago
docs: improve

372c22d42b449c9c0d3ca2826890ee296ac6ffb1 authored over 1 year ago
graft: improve typing and remove procset tracking

ProcSet is optional and deprecated in PDF 2.0, and does little anyway; so
we removed it.

949265bbd0e4b0c3c606129e37039905c58b22c9 authored over 1 year ago
Skip semfree unless on Linux

916106733c36c080e4707f0fc8e8613800ad0ed7 authored over 1 year ago
Add missing file header

44bcafd3aa2dcf3270c87bdc79c5ccff8852f0eb authored over 1 year ago
Make hocr API experimental for now

This commit can be reverted when we are ready to release a new version.

71166f7be8ef8a04fd4477dc3bdec07c043fed06 authored over 1 year ago
Merge branch 'feature/gscan2pdf'

Reconcile release notes and copy_final() with new pipeline.

580252a1a0c822f9a057bf08b995a6b5fa4d9490 authored over 1 year ago
build: add repository -y

c0b60dae6a5adde8b2d26dfb42b077e6825a8a53 authored over 1 year ago
Try to retain/copy xattrs

ae123fd20994d649388ed0e297ff0233b534aed0 authored over 1 year ago
build/macos: add openssl

454ad0acc520dc09cc5c6ef36dee0428536878b8 authored over 1 year ago
v15.3.1 release notes

0c306ac328374820a10f582c6ee33c5e9e73fb3a authored over 1 year ago
Fix mistakes with watcher loglevel handling

52d99732b12ce734881e7115388b88d810bd1858 authored over 1 year ago
Tweak documentation of --output-type

5b5827983be2d6d341717c11368b08fded9386fa authored over 1 year ago
Improve verbosity of colorspace selection

56f9bc311d37a0c5dd07bcc6074f9f12b25a6543 authored over 1 year ago
Fix pdf save settings at metadata_fixup

eb17dc1ecfca676af76140198529cf706013fda1 authored over 1 year ago
Fix import of metadata_fixup

6f8115a052794ce6289eb96d8d3e6ccf3442aa39 authored over 1 year ago
tesseract: EAFP

aac913c666cdb72f11fda8cde5a5fa3d863cb2cb authored over 1 year ago
Drop check for obsolete .dockerinit file

b5e73ac4e411877cbfd5037e14117eca0d8feb4e authored over 1 year ago
docs: note on docker performance

9e98c908919b295ae6583568a351b1928879d795 authored over 1 year ago
Update draft release notes

ca2592c1d95e3f4204659d36fba1b8f26da216d2 authored over 1 year ago
Update release notes so far

5a759947dd1743f57f22bd5b7290104088435aa8 authored over 1 year ago
Update comments and make worker functions private

a31f17bb9d44c961b01a3d7492162962c661ee5d authored over 1 year ago
Update pluginspec docs

1cb46afa94eca2c9f418d186df020b4744c605de authored over 1 year ago
Simplify function signature of extract_image_filter

dfa4ebf1a60e4233ab425d23c2243b4a76c72076 authored over 1 year ago
pngquant: remove unused ability to quantize a non-PNG

Coverage testing showed this branch was never used, and when tested it didn't work.

cd61c4efd9ef369d1ca707cda6c7824df42b8a60 authored over 1 year ago
Remove public domain congress.jpg and replace with baiona_color.jpg

For REUSE compliance, we are phasing out public domain licenses

9ffb45f28332a92d7b3ed67b0d72abdc505a2b9e authored over 1 year ago
Update dep5

299f0c4003db2e3c25a414fff536d0ab545fcaa2 authored over 1 year ago
Improve passing of arguments to workers

The executor system was built around passing only a single
argument to workers, which was always...

46a279a49aa4bd42896bcde842ae36691bd00104 authored over 1 year ago
Reorganize progress bars so they can be typed properly

d2dbea6cf832646428f8c5faf671bc79c446fa0b authored over 1 year ago
info: clarify pageinfo context management

e4cd081d4d02e1c7cc77ada26c9e82200b2fa300 authored over 1 year ago
info: clarify ICC -> components checking

d2297b39d0868c2b9c17da1d56f817acfe99695c authored over 1 year ago
unpaper: Remove format conversion

Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.

a06ab2a1c5176f3aeb141923366295c225d1e007 authored over 1 year ago
Fix hocrtransform test to generate blank hocr

a4059762e63e6f237ec1d895b83067a57a21cb12 authored over 1 year ago
Eliminate more run_ocrmypdf calls

82bef40aa68417b0951b4b70acb54739ad1182c9 authored over 1 year ago
pluginspec: spacing

40afcd68a79b1b4618b9c7be44626b80ab6bb50d authored over 1 year ago
Convert many run_ocrmypdf -> run_ocrmypdf_api

8916955f45de66d27dfcf47ceda24f21831ac5ca authored over 1 year ago
Improve documentation of new public hOCR APIs

f238e721edad9ffe9a8da3b005ada6c0faeb5cf5 authored over 1 year ago
Fix unused imports and other trivia

16eb5627a77e074ad07e7680d727a97c23eab6e8 authored over 1 year ago
hocr_to_ocr_pdf: handle missing hocr json file

fbf06741896fac650f936738162f9554b82fc594 authored over 1 year ago
Remove ocrmypdf._sync

db3df13e959b892a4423cde90809e0f8328469ba authored over 1 year ago
Improve ._pipelines naming

e400112f3277abb29e2b49385ea944b6ae99124f authored over 1 year ago
Use empty .hocr file instead of dummy template for symmetry with sandwich

7935914f5573dff1b7a89a3d6c405499adbffd31 authored over 1 year ago
optimize: typing

2a8bc03167d4448390ea3bc7e902d7b1a473f9ac authored over 1 year ago
Remove duplicate thread local storage of page numbers

62c4f65fc36c434ceeb5804dfa3835e0d6dfbb15 authored over 1 year ago
Fix some typing issues

4dbc5e1dbaa848c8a3ca7ed54455c98cc8688001 authored over 1 year ago
vscode isn't ready for black py312, revert

c0637c287e0a76cdf252f93d1fcd4908ac703f94 authored over 1 year ago
tests: replace many run_ocrmypdf -> run_ocrmypdf_api

1c45f329411e17613aa631ff221b7f47dd7793cc authored over 1 year ago
Fix coverage settings and cover semfree

990b462a94403c5b388b203bc6aada268a300442 authored over 1 year ago
Replace cryptic test error messages with more informative ones

fadc0cf69b7e9db6f679de870f94e1b70762559f authored over 1 year ago
tqdm_kwargs to progress_kwargs

6127f7abd66f765cab98e3404f0124962cd0a270 authored over 1 year ago
Define progress bar plugins formally instead of "tqdm-like"

7ce9d08b2d8c170f94545913bf56d1a799bf7818 authored over 1 year ago
optimize: better coverage

58f388c69d0d6a9b303bc8c8c3ba251401c0125a authored over 1 year ago
deps: update PyMuPDF req

ad3a1dbbad03ad16f2f871b869f670aae7ba47be authored over 1 year ago
Prefer pikepdf's newer Page.mediabox accessor over .MediaBox

eb3a51e33a7ec577c681bc025e87cf5870d9a468 authored over 1 year ago
Skip fewer tests

b928dc0808e580674faf9e58ddfe7d4459812f8b authored over 1 year ago
optimize: explore page container as objects instead of page helpers

f3dd73377309bc854bad1c80b0276a52548c7107 authored over 1 year ago
Rename post_process -> postprocess

For consistency with preprocess

c278fecb34df2c77372fec3c9c49b2d20a9a0ad7 authored over 1 year ago
Enable multiprocessing freeze_support (for Windows) and enable forkserver

Since we use a thread for logging among other possibilities, we were
never able to use threads s...

b9646b6f85fa9cbcc44f72df7c0fc315176a646b authored over 1 year ago
Fix error on no languages available

05721ba84af4e75cf8d0493d5769ba4d40457750 authored over 1 year ago
Refactor: migrate metadata repair to new module

1a7738a925ed42a0ab1671ee22eb3bb3c7adb178 authored over 1 year ago
docs: some copyediting

04a937258474bc9211b13fb7d45ac5439c8fb364 authored over 1 year ago
Refactor metadata handling

d153a6f6df923d49a841079536b98797287bc2c5 authored over 1 year ago
Automatically set document language to OCR language

38c3422e5ee8319d0110fdb6c5235bc541f4034d authored over 1 year ago
Add py312 to black coverage

0655f8e7ae5db1a66864011ae224c86a630d836d authored over 1 year ago
Fix exit code error on Ghostscript failure

0856750ee2f0aeb1cfe7962d620fb7fdd134261c authored over 1 year ago
languages: kwargs are overkill

e38d569d8f2a63a0d2e0060441fed5bf0f7405cd authored over 1 year ago
Refactor debug log and work folder context cleanup

fc6f959d21b3b19309c1e50359aa39a629ac3e07 authored over 1 year ago
Refactor setup_pipeline to decouple manage_work_folder

6f82097d141cb7d2ae76487aec654e7787b2a89a authored over 1 year ago