Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/zstd-dictionary-trainer
Training ZSTD dictionaries for use in ZST WARCs.
https://github.com/ArchiveTeam/zstd-dictionary-trainer
Version 20230531.01.
c0a175476ca7f6de34bcfd9ce613df38b6cb6a8d authored over 1 year ago by arkiver <[email protected]>
Better determine if mimetype is text based.
d0b9d1a2b9a9e8ba72ab768d1c1cafac1e7fc4d4 authored over 1 year ago by arkiver <[email protected]>
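The commit message above does not say which rules decide whether a mimetype is text based. As a rough illustration only, one way such a check could look is sketched below. The function name `is_text_mimetype` and the list of accepted types are assumptions made for this example, not taken from the repository.

```python
# Hypothetical helper: NOT the repository's code, only an illustration.
# The set of "text-like" application types below is an assumption.
TEXT_LIKE_TYPES = {
    "application/json",
    "application/xml",
    "application/javascript",
    "application/xhtml+xml",
}


def is_text_mimetype(mimetype: str) -> bool:
    """Return True if the mimetype looks like it describes textual content."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = mimetype.split(";", 1)[0].strip().lower()
    if base.startswith("text/"):
        return True
    # Structured-syntax suffixes such as "+xml" and "+json" are text formats too.
    return base in TEXT_LIKE_TYPES or base.endswith(("+xml", "+json"))


print(is_text_mimetype("text/html; charset=utf-8"))
print(is_text_mimetype("image/png"))
```

Stripping the parameters first keeps a value like `text/html; charset=utf-8` from being compared as a whole string.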
Set sample size on cdx URLs function.
f677aa132a06ddb6e96e590c01c4993ffde736a0 authored over 1 year ago by Arkiver <[email protected]>
Support setting initial text for initial dictionary.
1f991691436991e00878a97e88f37d8b1c1f8f00 authored almost 3 years ago by arkiver <[email protected]>
Version 20210426.01. Use transfer.archivete.am instead of transfer.notkiska.pw. Use scope=all param for finding web items.
6a4b2896aab41cb9dba0b1283152d1a3b5cdca59 authored over 3 years ago by arkiver <[email protected]>
Use list of selected CDX records.
f47691738edc18224d8749c582eb4eb14269a835 authored over 4 years ago by arkiver <[email protected]>
Support gz and uncompressed WARC files.
f4f3f4148fbc72ddded7c986559bf3b1d5037175 authored over 4 years ago by arkiver <[email protected]>
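How the trainer tells gzip-compressed WARC files from uncompressed ones is not stated in the commit message. A common approach, shown here purely as a sketch, is to look at the two-byte gzip magic number at the start of the data. The function name `open_maybe_gzip` and the sample payload are assumptions made for this example.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(raw: bytes):
    """Hypothetical helper: return a readable stream of the decompressed bytes.

    Data starting with the gzip magic number is wrapped in a GzipFile;
    anything else is treated as already uncompressed.
    """
    if raw[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(raw))
    return io.BytesIO(raw)


# Made-up minimal payload standing in for the start of a WARC record.
payload = b"WARC/1.0\r\nWARC-Type: warcinfo\r\n\r\n"

plain = open_maybe_gzip(payload).read()
unzipped = open_maybe_gzip(gzip.compress(payload)).read()
print(plain == unzipped == payload)
```

Sniffing the content rather than trusting a `.gz` file extension also covers files that were renamed or served without a meaningful name.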
Version 20200804.01.
0cc8e4ad35d3ad172be1fa3ff18ecf2fa1bdd993 authored over 4 years ago by arkiver <[email protected]>
Use random.choices instead of random.sample for drawing CDX records.
71a23a9f9e8094f3a672152640d6f0d3986b1f77 authored over 4 years ago by arkiver <[email protected]>
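The two standard-library functions named in the commit above differ in one important way: `random.sample` draws without replacement and raises `ValueError` when asked for more items than the population holds, while `random.choices` draws with replacement and accepts any count. The sketch below shows that difference on made-up data. The record names and the helper `draw_records` are assumptions made for this example, and the commit message does not state why the change was made.

```python
import random


def draw_records(records, k, seed=None):
    """Hypothetical helper: draw k records with replacement."""
    rng = random.Random(seed)
    return rng.choices(records, k=k)


# Made-up stand-ins for CDX records.
records = ["cdx-record-a", "cdx-record-b", "cdx-record-c"]

# Drawing with replacement works even when k exceeds the population size.
drawn = draw_records(records, k=5, seed=0)
print(len(drawn))

# Drawing without replacement cannot return more items than exist.
try:
    random.Random(0).sample(records, k=5)
    sample_failed = False
except ValueError:
    sample_failed = True
print(sample_failed)
```

With replacement, the same record can appear more than once in the result, which matters if duplicates in the training set are undesirable.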
Also use mimetype text/xml to train dictionary.
29e35b7a1093515d4c0376ca20577a4bffac8001 authored over 4 years ago by arkiver <[email protected]>
Add option to run dashboard, set refresh time. Only generate initial dictionary once.
929cb0300166e9c844319e870bcd8251e13d7f0d authored over 4 years ago by arkiver <[email protected]>
Add dummy dictionary trainer for initialization of project. Increase IA upload retries to 30.
761b4c5b0b7b8984c9743383e7c9fd5b4a3bfc91 authored over 4 years ago by arkiver <[email protected]>
Extract WARC records from online WARC at IA. Add option to set concurrency.
bc31c0a1ab115aa49b3898911fbf6bc7a2bfee54 authored over 4 years ago by arkiver <[email protected]>
Change collection to archiveteam_inbox.
832fe602f813213bf6bc29304374416ec5289799 authored almost 5 years ago by arkiver <[email protected]>
Change 'version' to 'id'. Retry after 5 minutes on error. Compress zstdict with ZSTD.
96f31f0ca6c8d34710a23bbb1060bc12a74afbe7 authored almost 5 years ago by arkiver <[email protected]>
initial
d7b7c2f7cdf67572c2061c981a12681cd5a0ac11 authored almost 5 years ago by arkiver <[email protected]>