Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/ArchiveBot
ArchiveBot, an IRC bot for archiving websites
https://github.com/ArchiveTeam/ArchiveBot
This requires at least git 2.6.0 due to --date=format:x
aecfb01d0b5d2b39fd3ba3188d631996a3cb4fea authored over 5 years ago by JustAnotherArchivist <[email protected]>
Require that the time zone is set to UTC on pipelines
d258af0a76ddbce2e2f531926262159f4ed772b4 authored over 5 years ago by JustAnotherArchivist <[email protected]>
fc7d9de79ebf5faab95c1827b43db47f8f97651f authored over 5 years ago by JustAnotherArchivist <[email protected]>
Notify IRC channel on pipeline changes
5cd1e38a11c62a4d65c16e2c340cf67e777273ba authored over 5 years ago by JustAnotherArchivist <[email protected]>
d26dedaa5dc50d20efb734fdc5ab2d427a367e82 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Register pipeline immediately on startup
3a2c8d266eed661a710d903162687563cf86d315 authored over 5 years ago by JustAnotherArchivist <[email protected]>
The data directory is created by seesaw when it starts running the Pipeline (or specifically, th...
b39fe701b98a98dde8e7aca475ec2409de9c465a authored over 5 years ago by JustAnotherArchivist <[email protected]>
1585458ad778036b32203422fbd76f628fa2b9cb authored over 5 years ago by JustAnotherArchivist <[email protected]>
Fix log shipper crash
48092a88c2a4b9251d7cd954dd767ed93eaad491 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Previously, only connection errors were caught. But if the disk on the control node was full, fo...
01a4e9620d290086a7d1f43e9e86978e9b4d121f authored over 5 years ago by JustAnotherArchivist <[email protected]>
63810f21e60335736d32e23f9aa4e425c1563ebd authored over 5 years ago by JustAnotherArchivist <[email protected]>
cogs: Run the Broadcaster only if necessary, i.e. if tweeting is enabled
bd2abc6db7e50ffbd0cad4f3ac16fe4ffd40dba0 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Block uploading to an rsync URL that doesn't end in a slash
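The rsync rule in the message above can be expressed as a small startup guard. This is a minimal sketch, not ArchiveBot's code: the function name `validate_rsync_url` is hypothetical, and the stated motivation is an assumption (when the destination directory does not exist, rsync may write a single uploaded file under the destination name itself instead of placing it inside a directory).

```python
def validate_rsync_url(url: str) -> str:
    """Hypothetical guard: return url unchanged if it ends in '/', else raise.

    Assumption: without the trailing slash, rsync may treat the last path
    component as the target file name rather than as a directory.
    """
    if not url.endswith("/"):
        raise ValueError(f"rsync URL must end in a slash: {url!r}")
    return url


if __name__ == "__main__":
    for candidate in ("rsync://example.org/module/dir/", "rsync://example.org/module/dir"):
        try:
            validate_rsync_url(candidate)
            print("ok:", candidate)
        except ValueError as exc:
            print("rejected:", exc)
```

Failing early at pipeline startup, rather than at upload time, means a misconfigured target is noticed before any job has been run.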
c7c53ee09f4840af3634e625f1209b61565202f0 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Ignore a CBL sinkhole domain
2161a6e6533bc2124fa5740098aee6ed718fef3c authored over 5 years ago by JustAnotherArchivist <[email protected]>
Check for local webservers
86cb4257a7a7673ef5bbc13985489c0f4dff9bf3 authored over 5 years ago by JustAnotherArchivist <[email protected]>
ad0f1616e2a3699f0c8c2eb858cf185969000f41 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Verify that inexistent domains do not resolve
a7ac103fbf1badead45768ad1a5736b2ef7075a9 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Compress the log file if wpull didn't finish cleanly (and didn't write a meta WARC)
38f81b5795d37a2930b7cbdfabc48a72ddc6cc92 authored over 5 years ago by JustAnotherArchivist <[email protected]>
9e5e39ed3d33396a199de3e3ddbfcede77c0e285 authored over 5 years ago by JustAnotherArchivist <[email protected]>
63f60fa243b54948a73a77eb070bbfe7d0a760b1 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Update tests (August 2019)
686b8951d373d4e5fb3f2b46e63377f9af4ba068 authored over 5 years ago by JustAnotherArchivist <[email protected]>
6f7579d644c7410b76b8ad24e78dbc0490a66e40 authored over 5 years ago by JustAnotherArchivist <[email protected]>
26eeec1e5d2cb076fdc16d7515e00f455174bc9b authored over 5 years ago by JustAnotherArchivist <[email protected]>
7a2a7693414e5314852f2625ea75ebd9abdb6029 authored over 5 years ago by JustAnotherArchivist <[email protected]>
5c858b080dc3a2448d05cb4323b0989b4519cf65 authored over 5 years ago by JustAnotherArchivist <[email protected]>
35af69bcde3e5aae25bba78ac67f1666b456cfc9 authored over 5 years ago by JustAnotherArchivist <[email protected]>
308a5fde2b2b57aaf2f1a4cecef2577353697c14 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Remove PhantomJS support
14c4a8423fc1df54c9558e42990923894ccc64d1 authored over 5 years ago by JustAnotherArchivist <[email protected]>
* PhantomJS is no longer maintained. The last stable release was in early 2016, little developme...
abcb77fac8ecd849b0aae9d7b78e632505430b33 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Check that rsync exists in pipeline and add it to the installation command
7aad15ae7817a66cd098b7d82423a225b6fa9e17 authored over 5 years ago by JustAnotherArchivist <[email protected]>
323e1a959f5a43f36f3e40275d74c1200f9a9820 authored over 5 years ago by JustAnotherArchivist <[email protected]>
www<dot>cgzxb<dot>com is associated with some malware, and accessing it causes a listing on the ...
5482b4f585414c379b9b8173f04aa431025c3e97 authored over 5 years ago by JustAnotherArchivist <[email protected]>
The Broadcaster uses a ridiculous amount of CPU, and if tweeting's disabled, that's just a waste...
9ce5206d33d6af16a9b60e026d5574290305a55b authored over 5 years ago by JustAnotherArchivist <[email protected]>
https://github.com/ArchiveTeam/ArchiveBot/issues/305
Fixes issue 305.
d9df1e5b1af8ee9c2bc2c7cc9a68f261d0c66400 authored over 5 years ago by Matt Iggo <[email protected]>
Ignore more WhatsApp, Facebook, LinkedIn, and LINE share links
7d7fd51aedd07b9aef6a2a1f0e35b2fae61bc42c authored over 5 years ago by JustAnotherArchivist <[email protected]>
Ignore WordPress Fastest Cache's minified JS recursion
717cae6969f048006e889c83798cfe42fab22fa1 authored over 5 years ago by JustAnotherArchivist <[email protected]>
fbc7ab25b7d6cb54b1a1ff71189b1fd357550258 authored over 5 years ago by JustAnotherArchivist <[email protected]>
bf157e5738d35786094526ca4c71373c87d6d169 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Self-host the dashboard's JS dependencies and update them
6d8f2996d90646e8b7bbd47997c051553c80c76b authored over 5 years ago by JustAnotherArchivist <[email protected]>
jQuery from 2.1.1 to 3.4.1, DataTables from 1.10.2 to 1.10.19.
ef37770938b3adda75b3110ce80f5a682e75334f authored over 5 years ago by JustAnotherArchivist <[email protected]>
Update user agents to current versions
ce1115cf9cc3a4f6412967fc421118d5b1c3366d authored over 5 years ago by JustAnotherArchivist <[email protected]>
Mostly based on https://techblog.willshouse.com/2012/01/03/most-common-user-agents/
Couldn't fi...
9e94db32fb9232c5c293780e1a6baa8541a6bf40 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Ignore reverse pagination on custom Blogspot domains as well
3ee8f321f41c1f8d35a623406abcbe96e8f7aaa9 authored over 5 years ago by JustAnotherArchivist <[email protected]>
2327e205ca09c0c917f3a1d4e9811af415b049ed authored over 5 years ago by JustAnotherArchivist <[email protected]>
Add separate igsets for MediaWiki locales and a script to generate them from MW message files
b2917de5a35d0564336860634f951be530733f00 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Add igsets for GitHub
cb73a624eaa6bb81aad5c43f46b0abde0a924005 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Basic Mastodon igset
040aadd86f0cbc20fd036d4af3f5eb92e1f2ce13 authored over 5 years ago by JustAnotherArchivist <[email protected]>
A few blogs igset improvements
c08e4de42390a28f344fcd3beab23371485dc4ec authored over 5 years ago by JustAnotherArchivist <[email protected]>
Ignore Squarespace JS hell
6a5cb653ef4d18fb147c66356f466c1ec1b0426d authored over 5 years ago by JustAnotherArchivist <[email protected]>
Various improvements for MediaWiki ignores
c842c1f7f298783959ee57b97b28f746281ac054 authored over 5 years ago by JustAnotherArchivist <[email protected]>
8737d8a6ca8cd2739a36a4a8769755ea0c753ef5 authored over 5 years ago by Flashfire42 <[email protected]>
Update Safari UA
bc1d9b96f6add654a590bf28862ba0150386e46b authored over 5 years ago by JustAnotherArchivist <[email protected]>
1514d162e035c4cb1d304b41ced32a9f971638f8 authored over 5 years ago by JustAnotherArchivist <[email protected]>
"github" is the general igset that should be used with GitHub grabs. "nogithubcode" ignores the ...
5235b191e1b525805992f51f2f2a72b1e6b9629a authored over 5 years ago by JustAnotherArchivist <[email protected]>
edc80952495f154c987f83eeafd6a80710968775 authored over 5 years ago by JustAnotherArchivist <[email protected]>
- Fix CSI ignore on Blogspot, and remove search and label ignore since those pages are actually ...
3b90d44286e79aadcbc0c24670b0308a35b8e1d0 authored over 5 years ago by JustAnotherArchivist <[email protected]>
bbd5807ec3569ac55386e39a60c96fc88cdb8ef1 authored over 5 years ago by JustAnotherArchivist <[email protected]>
- I've seen a couple jobs recurse through the entire log of a user with limit=1. Nothing good ca...
900586fe05912a2c0bfb86b3b7760ecdb3bdb49c authored over 5 years ago by JustAnotherArchivist <[email protected]>
New dashboard WebSocket server
467cd28b02857520007b23217db6f3a0ea1f034b authored over 5 years ago by JustAnotherArchivist <[email protected]>
The old server was unable to keep up with the messages (#333), ate all the CPU and RAM it could ...
286970b33c89782703a795c30b690256ca2ffd6c authored over 5 years ago by JustAnotherArchivist <[email protected]>
Updated Safari UA for Mac OSX with the string for High Sierra
79eb529f827a2b11b08cdeccda14898222293d3a authored over 5 years ago by Flashfire42 <[email protected]>
5013966112419ac6c88eabee6a030cc07be80032 authored over 5 years ago by Flashfire42 <[email protected]>
Add pending job list to the dashboard and disable !pending when the queue is long
336cc412bc6ca4d5b7ec982e88f04935e7562fb3 authored over 5 years ago by JustAnotherArchivist <[email protected]>
9b4ee44257d0feaf7228a439b4614eeb997f818f authored over 5 years ago by JustAnotherArchivist <[email protected]>
176cb710fc99c5b88ab072668fef01c6daa61d21 authored over 5 years ago by JustAnotherArchivist <[email protected]>
522bcc6594c2a84ae8cb32556c1e9b787a894678 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Allow unauthorised users to add an explanation to their own jobs (fixes #223)
7f8ef3202c6ed976dba2cc0929d27fc9f328358f authored over 5 years ago by JustAnotherArchivist <[email protected]>
Add alias --concurrent for !a/!ao command and !concurrent for !concurrency
eeb9e2e109280a7843ea364ebcb0f324631b8f8b authored over 5 years ago by JustAnotherArchivist <[email protected]>
Add pending-ao counter to !status reply
8c045f0547d38105ed67321b25ce2c67f70ca8b7 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Add some error handling
a50778c51453ff12f9690d37b22ae1b63512bee6 authored over 5 years ago by JustAnotherArchivist <[email protected]>
6421c07ec88c739e8a2dea3faf72b2b3388cab91 authored over 5 years ago by JustAnotherArchivist <[email protected]>
Add WPULL_MONITOR_DISK/MEMORY env vars
2b8ef1f6cc89d34ea2c70e2cfab4b4df9fba219b authored over 5 years ago by JustAnotherArchivist <[email protected]>
008495b45c38a629bde316ab28cffb67f34148de authored over 5 years ago by JustAnotherArchivist <[email protected]>
77b0b86def8171fcdb18ca4ca77b1e2af8bea4e8 authored over 5 years ago by JustAnotherArchivist <[email protected]>
90622b50b8b1d3df1cd6e8f9f17fd9223b56f4d4 authored over 5 years ago by JustAnotherArchivist <[email protected]>
wpull's option is called --concurrent, so it makes sense to support that on ArchiveBot as well.
35af01a175d5fd6f4b7d5085a237db68114d95a8 authored over 5 years ago by JustAnotherArchivist <[email protected]>
b0977ce5b573824899624dc9dcf6dfc034468068 authored over 5 years ago by JustAnotherArchivist <[email protected]>
The defaults might not fit every pipeline and are a bit low when it comes to large file download...
Updated grab-site repo URL in README
47b0f4371a95c169c19381125743ec99199f9583 authored almost 6 years ago by JustAnotherArchivist <[email protected]>
8d0427f65af4b1eb0ecc5e31d311789f58bc13d1 authored almost 6 years ago by hook <[email protected]>
Allow only opped users to add jobs when there is a queue of 5 or more jobs
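The queue rule stated above (only opped users may add jobs once 5 or more are queued) reduces to a single predicate. This is a hypothetical sketch; the function and parameter names are illustrative and not taken from ArchiveBot's source.

```python
# Threshold taken from the commit message: 5 or more queued jobs.
QUEUE_LIMIT_FOR_UNOPPED = 5


def may_add_job(is_op: bool, queue_length: int, limit: int = QUEUE_LIMIT_FOR_UNOPPED) -> bool:
    """Hypothetical check: ops may always add jobs; other users only while
    fewer than `limit` jobs are waiting in the queue."""
    return is_op or queue_length < limit
```

Keeping the threshold as a parameter with a default makes it easy to tune without touching the permission logic.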
0a82925930358ef3be23b30f94fadee410231efa authored about 6 years ago by JustAnotherArchivist <[email protected]>
document the archive --large flag
0cbc710b815eb8c9f430833cfb28faa6cbcbc4e1 authored about 6 years ago by JustAnotherArchivist <[email protected]>
dd5bae2f26761accf8fc316edf8ae9f74525e683 authored about 6 years ago by JustAnotherArchivist <[email protected]>
It's in the source but not in the docs.
4143ca59eda0ed8eeb50744a2d0589ad6136e149 authored over 6 years ago by anarcat <[email protected]>
Disable youtube-dl until #291 is fixed
9f194a4b6df5f8e90b10af2f1755350d81efca2c authored over 6 years ago by David Yip <[email protected]>
196fd335fb22e1734483d901035ab66d803465dd authored over 6 years ago by JustAnotherArchivist <[email protected]>
This reflects the ignoracle speedups.
958583bb0c380568640cabd96eba00591af9d5c3 authored over 6 years ago by David Yip <[email protected]>
make job status less chatty
90bf00bcd7665c1278551bb8fa3bc89b193b6aa5 authored over 6 years ago by David Yip <[email protected]>
... and the rest of the universe
a05779825d86ac2294a7609379655d1d695fd062 authored over 6 years ago by Antoine Beaupré <[email protected]>
The reference is noisy for nothing - it's already in the channel topic and it's not used in othe...
The job status is currently too verbose and spams the channel with four lines of log.
So instea...
69b525ff21f3ca32d1f1f18701fdfb8408742ec7 authored over 6 years ago by Antoine Beaupré <[email protected]>
Import grab-site's dashboard fixes
8df159a3048926cb77181d36e1b6d9fbd6cbf0b1 authored over 6 years ago by David Yip <[email protected]>
Cache the ignoracle patterns while primary_url and primary_netloc stay constant
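The caching idea in the message above (reuse compiled ignore patterns as long as primary_url and primary_netloc do not change) can be sketched as a cache keyed on that pair. This is a minimal illustration under stated assumptions, not ArchiveBot's ignoracle: the class `PatternCache` and its methods are hypothetical, and it assumes raw patterns may contain `{primary_url}` / `{primary_netloc}` placeholders that are regex-escaped and substituted before compiling.

```python
import re
from typing import List, Optional, Tuple


class PatternCache:
    """Hypothetical sketch: compile ignore patterns once per (primary_url, primary_netloc)."""

    def __init__(self, raw_patterns: List[str]) -> None:
        self.raw_patterns = raw_patterns
        self._key: Optional[Tuple[str, str]] = None
        self._compiled: List["re.Pattern"] = []
        self.compilations = 0  # how many times the pattern list was actually rebuilt

    def patterns_for(self, primary_url: str, primary_netloc: str) -> List["re.Pattern"]:
        key = (primary_url, primary_netloc)
        if key != self._key:
            # Inputs changed: substitute the (escaped) placeholders and recompile.
            self._compiled = [
                re.compile(
                    p.replace("{primary_url}", re.escape(primary_url))
                     .replace("{primary_netloc}", re.escape(primary_netloc))
                )
                for p in self.raw_patterns
            ]
            self._key = key
            self.compilations += 1
        # Same inputs as last time: reuse the already-compiled patterns.
        return self._compiled

    def ignores(self, url: str, primary_url: str, primary_netloc: str) -> bool:
        """True if any pattern matches anywhere in url."""
        return any(p.search(url) for p in self.patterns_for(primary_url, primary_netloc))
```

Since a crawl usually processes many URLs in a row with the same primary URL, a single-entry cache like this avoids recompiling every pattern on each check while still picking up a change of primary URL.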
fc99354db4010bcf649ccd63116ba643975ab234 authored over 6 years ago by David Yip <[email protected]>
132d56a68ef827c3ec7821054833b36559b1aa7a authored over 6 years ago by Ivan Kozik <[email protected]>
773904767c49fbd8ccf581d9f17b0d7fdb4023a2 authored over 6 years ago by Ivan Kozik <[email protected]>
5c29a4860a4ebe68199424320e8980ab0880a61f authored over 6 years ago by Ivan Kozik <[email protected]>
171221982d4044d3e57fc630a760eeb65baa7e8f authored over 6 years ago by Ivan Kozik <[email protected]>
b2c2bc5a59fbc8c9ee65399cccf371bb3f561729 authored over 6 years ago by Ivan Kozik <[email protected]>
fdb29ab7ba5e11e95c545f44bb89b91a3f55b416 authored over 6 years ago by Ivan Kozik <[email protected]>
c6ae77ee696e921e82a9510b34bc0ff342dbb99c authored over 6 years ago by Ivan Kozik <[email protected]>