Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

Archive Team

We are going to rescue your shit
Collective - Host: opensource - https://opencollective.com/archiveteam - Website: https://archiveteam.org/ - Code: https://github.com/ArchiveTeam

Version 20231112.01. Also support trailing / for 301 to other domain to handle spam.

github.com/ArchiveTeam/urls-grab - dcdee944eec699a45abea4b47837bd361af81e02 authored 11 months ago by arkiver <[email protected]>
README: install on NixOS: this _can_ be run as root

github.com/ArchiveTeam/grab-site - dfb99dfdcd8b68d31612ab493044244f4f455f28 authored 11 months ago by Ivan Kozik <[email protected]>
README: nixpkgs 22.11 -> 23.05

github.com/ArchiveTeam/grab-site - edca2cda84be00ae19c39132dc2bc7febb34e05a authored 11 months ago by Ivan Kozik <[email protected]>
Version 20231111.02.

github.com/ArchiveTeam/reddit-grab - 8fc86a11ca8d34ec4d00b1bfa6e23178b0bbe2ac authored 11 months ago by arkiver <[email protected]>
Version 20231111.01. Switch ciphers again.

github.com/ArchiveTeam/reddit-grab - 6fdf778e19de3826b8cd3922f1f33b9f1abc6eec authored 11 months ago by arkiver <[email protected]>
Merge pull request #563 from ivan/stopped-logged-window

github.com/ArchiveTeam/ArchiveBot - df5ccbe16a89d7dcc96ca5a20ced9141e9cee364 authored 11 months ago by Ivan Kozik <[email protected]>
When a new job is added, don't scroll the stopped logged window being hovered over

This fixes:

"when you hover over a log window, it stops scrolling as intended, but
each time a ...

github.com/ArchiveTeam/ArchiveBot - 8345744b2cc0bebfbef9347e4723d894e2fa4d15 authored 11 months ago by Ivan Kozik <[email protected]>
Add people who regularly delete blog posts

github.com/ArchiveTeam/urls-sources - 9b497f202d8bbe93a6769431e5827e0802409897 authored 11 months ago by Paul Wise <[email protected]>
Version 20231108.01. Do not install utf8 with luarocks, this is now in base parent image.

github.com/ArchiveTeam/telegram-grab - 4960ffc19e85bf9c923d041c26f5ccbe5e46c8b8 authored 11 months ago by arkiver <[email protected]>
Version 20231108.02. Move to another cipher.

github.com/ArchiveTeam/reddit-grab - e87de8969cf9ec94eb28826fbbb74ab8c65e0ff3 authored 11 months ago by arkiver <[email protected]>
Version 20231108.01. Do not install utf8 with luarocks, this is now in base parent image.

github.com/ArchiveTeam/reddit-grab - 9c9b59dafdb8ce0ffee2eea04055ebc06d471800 authored 11 months ago by arkiver <[email protected]>
Version 20231108.06. Do not download 301 redirected to URL in same session when not queued back.

github.com/ArchiveTeam/urls-grab - 24ec555db94a95afe261280978176a0b3b319fe6 authored 11 months ago by arkiver <[email protected]>
Version 20231108.01. Remove broken temporary hack introduced in b46ba0a05e16e7106ee4019726e2fdf23fc74331

With the recent addition of sudo in grab-base, the manual removal of /usr/bin/sudo breaks sudo f...

github.com/ArchiveTeam/warrior-dockerfile - 62ee62887d777d12a324d428b0ac1da1cda1e0c6 authored 11 months ago by JustAnotherArchivist <[email protected]>
Version 20231108.05. Queue again from ads.txt and app-ads.txt. Do not queue the URL that a 301 redirects to if it is a front page without trailing /.

github.com/ArchiveTeam/urls-grab - c7c2718124f502d3ac0dc0c0a16f65c7345779a7 authored 11 months ago by arkiver <[email protected]>
Version 20231108.04. Stop queuing from ads.txt and app-ads.txt. Multi item size to 100, to limit at tracker side.

github.com/ArchiveTeam/urls-grab - 0d53e097f74dcf9db76c9f5540ba2b6b2a74c28b authored 11 months ago by arkiver <[email protected]>
Version 20231108.03. Append / to URL when normalising for aborted URLs check if not enough / in URL.

github.com/ArchiveTeam/urls-grab - 51da137bece442f1ddecd3e1881602225c6bc91b authored 11 months ago by arkiver <[email protected]>
Version 20231108.02. Exit on S?SID and _?s URL parameters.

github.com/ArchiveTeam/urls-grab - 10695bfba2c44fa216171132931d7be37fe84b03 authored 11 months ago by arkiver <[email protected]>
Version 20231108.01. Take out new /template/news/{xzx,b1/} spam changes.

github.com/ArchiveTeam/urls-grab - b1d6ce647b5e1d9afa8bc69ce396cde5eb1232a3 authored 11 months ago by arkiver <[email protected]>
Version 20231107.01. To build with latest grab-base-df.

github.com/ArchiveTeam/warrior-dockerfile - 7aeb1dbf2a197cb9e6666529d78e1774d1944307 authored 12 months ago by arkiver <[email protected]>
Version 20231107.02. Initial commented out code for using pandoc to convert file to PDF for further processing for URLs extraction.

github.com/ArchiveTeam/urls-grab - 7050f9ecb4023a0d8a7a11abec8d8b5e2d76a445 authored 12 months ago by arkiver <[email protected]>
Version 20231107.01. Do not label wget-at.version anymore.

github.com/ArchiveTeam/grab-base-df - fc28f661bb26c8608aefac8254f673fc7953ea3c authored 12 months ago by arkiver <[email protected]>
Version 20231107.01. Rewrite ˜ to ~ in PDF extracted URLs. Handle port in extracted URLs without protocol from PDF.

github.com/ArchiveTeam/urls-grab - 4abe1c23611b9b4f9fe5de5398ead04fc4832e37 authored 12 months ago by arkiver <[email protected]>
Version 20231107.01. Domain telegram-cdn.org moved to cdn-telegram.org.

github.com/ArchiveTeam/telegram-grab - 31a4fbf9b9f23dbd04af7dcc67f885996d1a8858 authored 12 months ago by arkiver <[email protected]>
Version 20231103.01. Install sudo. Install utf8 with luarocks.

github.com/ArchiveTeam/grab-base-df - d95deaa568493ea7ffef134d2da2396cf1db434c authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/imgur-grab - 390f4dae7058673d44bbcd4d204f1a0591740b58 authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/pastebin-grab - 10a6d7382730b67fd0a19d7d63bee8106a2f8882 authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/mediafire-grab - 34720d16a675503b1bbf638aef5efac8086fb7fc authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/github-grab - b0b703bc562d55e86871bbcd658e3111a9904e9a authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/telegram-grab - fed8478dbb87be24de60fd78bdcce131176f3826 authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/urls-grab - d99c6ba6958e974361f39f416d600b70ce49f2eb authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync.

github.com/ArchiveTeam/reddit-grab - 388e4325c5b5520b9211540e60927398ac17d8d9 authored 12 months ago by arkiver <[email protected]>
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.

github.com/ArchiveTeam/youtube-grab - 9a7e5f438f5ad43f05746c8e329c6f893f293cde authored 12 months ago by arkiver <[email protected]>
Version 20231031.02. Increase maximum multi item size to 100.

github.com/ArchiveTeam/telegram-grab - 91171138bf28e30dde9cd5d4ca7fed1a915948e3 authored 12 months ago by arkiver <[email protected]>
Version 20231031.01. Do not queue telegram.me URL for comment page. Fix Lua error.

github.com/ArchiveTeam/telegram-grab - 20eab4e6909d35552854f3b80d048bcf81a1584c authored 12 months ago by arkiver <[email protected]>
Version 20231031.01. Take out new /template/ loops.

github.com/ArchiveTeam/urls-grab - 0c36bf8ee7c61a163e5d238f90ef8b89d1a3b3ee authored 12 months ago by arkiver <[email protected]>
Version 20231026.01. Use --ciphers HIGH:+SHA384.

github.com/ArchiveTeam/reddit-grab - 1c2723f9f2895e656924c0fd3fe73e569ce3f05b authored 12 months ago by arkiver <[email protected]>
Version 20231024.03. Extract every candidate URL from set of strings to join.

github.com/ArchiveTeam/urls-grab - b4d2520b4c7e863adc15458412606db58c119d62 authored 12 months ago by arkiver <[email protected]>
Version 20231024.02. Filter bad extracted URL.

github.com/ArchiveTeam/urls-grab - 624906dd015c17fb78e6b31e8e49023e5464f8b4 authored 12 months ago by arkiver <[email protected]>
Version 20231024.01. Remove some too wide filter patterns.

github.com/ArchiveTeam/urls-grab - b549441f1ca1fe17ed5dc41b569443b707a7fff9 authored 12 months ago by arkiver <[email protected]>
Merge pull request #21 from chosak/patch-1

Fix WARC CDX writing

github.com/ArchiveTeam/wget-lua - 01bae48b489b93efe26fee97f10f6f5b5ba4583e authored 12 months ago by arkiver <[email protected]>
Version 20231020.02. Fix filter pattern to allow for - in URL.

github.com/ArchiveTeam/urls-grab - 9c7488c70f9780a0635ef6acb60b09cd8e3db2bb authored 12 months ago by arkiver <[email protected]>
Version 20231020.01. Add more ads URLs to one-time patterns list.

github.com/ArchiveTeam/urls-grab - 82d36071cdb3de185ea7c72c6dcbf171c53e78d4 authored 12 months ago by arkiver <[email protected]>
Version 20231020.01. Use gnutls. Support new method of serving Reddit comments.

github.com/ArchiveTeam/reddit-grab - e350e69f898482dc6e250683761e718d76f8c44b authored 12 months ago by arkiver <[email protected]>
Version 20231019.02. Queue back all URLs found on special interest pages.

github.com/ArchiveTeam/urls-grab - 8b395ccab59bd3b5c9c3551d32539f4631c73430 authored 12 months ago by arkiver <[email protected]>
Version 20231019.01. Support extracting URLs from PDF with obfuscated '.' as ' dot ' or ' (dot) ' or ' [dot] '. Handle extra white spaces after newline in PDF URL extraction.

github.com/ArchiveTeam/urls-grab - f3e6c3caaf889eb4b460f2809f4b51778c0efd86 authored 12 months ago by arkiver <[email protected]>
Version 20231019.01. Use --secure-protocol=TLSv1_2.

github.com/ArchiveTeam/reddit-grab - 0e7392acd3b1e586fb142de61fb90fd27966ba93 authored almost 1 year ago by arkiver <[email protected]>
Version 20231018.02. Handle og:image in ?comment= URL equally as post URL.

github.com/ArchiveTeam/telegram-grab - 9525e1f1be6fb459b0864727a279927b6311f382 authored about 1 year ago by arkiver <[email protected]>
Version 20231018.01. Support comment items.

github.com/ArchiveTeam/telegram-grab - 99ebc02cac1eb91e1c311af07cf819c645fe0f8c authored about 1 year ago by arkiver <[email protected]>
Version 20231017.02. Use --secure-protocol=TLSv1_3.

github.com/ArchiveTeam/reddit-grab - 4bcc04734fd5503b6689d4f35669ca70428dc45d authored about 1 year ago by arkiver <[email protected]>
Version 20231017.01. Exit URL on _event_transid parameter.

github.com/ArchiveTeam/urls-grab - 32d22d21432810b1993561d1990701c87c8b19eb authored about 1 year ago by arkiver <[email protected]>
Version 20231017.01. Use --secure-protocol=auto. Use new minimum Wget version checker.

github.com/ArchiveTeam/reddit-grab - b1bf682030f05070f1a3ec9f5062324df6566bd7 authored about 1 year ago by arkiver <[email protected]>
Fix WARC CDX writing

Commit fd873c1ecb96467e633f145aeaad256ca36fcd63
introduced a bug in the CDX file writing logic ...

github.com/ArchiveTeam/wget-lua - 33515e0478e4dbd62b6aec8f89c0d68b96718883 authored about 1 year ago by Andy Chosak <[email protected]>
Version 20231016.02.

github.com/ArchiveTeam/urls-grab - c2fdb4c2e3f987251e479e5cf8fb521caf4d38ee authored about 1 year ago by arkiver <[email protected]>
Filter out /read/ loop.

github.com/ArchiveTeam/urls-grab - d982c15f7832cd3b00ade9a51dd06170df243c3b authored about 1 year ago by arkiver <[email protected]>
Prevent /pics/K888 loop.

github.com/ArchiveTeam/urls-grab - 1b86e04abc8ff2dcf8032e0657d2b0e04d75cbdc authored about 1 year ago by arkiver <[email protected]>
Handle newshtml loop.

github.com/ArchiveTeam/urls-grab - c7c5a8e2cf1867ab894d8ce48f496ba995f03b56 authored about 1 year ago by arkiver <[email protected]>
Version 20231016.01. Ignore upluds yamaxun loop.

github.com/ArchiveTeam/urls-grab - f4c4e39f7954014b779f5586541a1399912ae1bf authored about 1 year ago by arkiver <[email protected]>
Version 20231015.02. Ignore loops.

github.com/ArchiveTeam/urls-grab - 68977d2e3131cbda44392f9eaf8ba358b6d03d8b authored about 1 year ago by arkiver <[email protected]>
Version 20231015.01. googlesyndication.com and googletagmanager.com URLs are one-time URLs.

github.com/ArchiveTeam/urls-grab - 5bb9f7ccd9e3c14911659227e533740caf0ac624 authored about 1 year ago by arkiver <[email protected]>
Version 20231011.02. Support ALLOW_IPV6 environment variable to not use --inet4-only in Wget-AT.

github.com/ArchiveTeam/telegram-grab - ce9007c7248c76571ce211be89dbdbcf00be9e9d authored about 1 year ago by arkiver <[email protected]>
Version 20231011.01. Allow only IPv4.

github.com/ArchiveTeam/telegram-grab - f5eda9e5e484c0cdbea02b399ecf277e369bca94 authored about 1 year ago by arkiver <[email protected]>
Version 20231010.03. Actually stop opening user-agents.txt file as well.

github.com/ArchiveTeam/urls-grab - 3ac6baafa6d35ac9aed7b2f2ba0c22195e288abd authored about 1 year ago by arkiver <[email protected]>
Version 20231010.02. Load user agents list only once.

github.com/ArchiveTeam/urls-grab - e3e5e74136e469ff07437f61a837a5f90a0e9cd0 authored about 1 year ago by arkiver <[email protected]>
Version 20231010.01. Disable check on http://on.quad9.net/.

github.com/ArchiveTeam/urls-grab - 1ebafa4f63cfcd9a922a2eccc5c3feb8e97395dc authored about 1 year ago by arkiver <[email protected]>
Version 20231003.01. Handle problematic items.

github.com/ArchiveTeam/pagespersoorange-grab - bc158f9d77b085d680d079410b12f0541752f9e9 authored about 1 year ago by arkiver <[email protected]>
Version 20231002.01. Abort item if URL is seen more than 5 times.

github.com/ArchiveTeam/pagespersoorange-grab - 49ca43f1fa635c88abc1bb6b7a756f9b39a9a863 authored about 1 year ago by arkiver <[email protected]>
Version 20231001.03. Increase multi item size to 100 to limit on tracker side. 6 second sleep time for 404 redirect, else 1 second.

github.com/ArchiveTeam/pagespersoorange-grab - 1b5a037d38d36b62eff5860daa4f58c6df4f6958 authored about 1 year ago by arkiver <[email protected]>
Version 20231001.02. Only strip marie, ecole, assoc or pagespro-orange.

github.com/ArchiveTeam/pagespersoorange-grab - c779112eb897f3c2874d31a88ac9363e86ff9083 authored about 1 year ago by arkiver <[email protected]>
Version 20231001.01. Strip .ecole, .assoc, .marie from 'site'.

github.com/ArchiveTeam/pagespersoorange-grab - c34768ce568217ec7c3e772ad6f2d2565ec8dcde authored about 1 year ago by arkiver <[email protected]>
Version 20230929.02. Prevent subdomain loop.

github.com/ArchiveTeam/pagespersoorange-grab - 903a2cb10f8c85baa391fe683bf5eb7cda220989 authored about 1 year ago by arkiver <[email protected]>
Version 20230929.01. Skip several domains from queuing to.

github.com/ArchiveTeam/pagespersoorange-grab - 1e60275fc0606a8ef0978b91951c4ccd39ebaaa0 authored about 1 year ago by arkiver <[email protected]>
Version 20230928.04. Improvements.

github.com/ArchiveTeam/pagespersoorange-grab - 48fc3b422ca345fb3fb14c855304efc0e96bfd69 authored about 1 year ago by arkiver <[email protected]>
Version 20230928.03. Fix resetting some variables.

github.com/ArchiveTeam/pagespersoorange-grab - f76fc64a5dd90f06bbb5e2800ea27a2970b414e5 authored about 1 year ago by arkiver <[email protected]>
Version 20230928.02. Also queue *.orange and monsite.wanadoo.

github.com/ArchiveTeam/pagespersoorange-grab - aaeb1a7b35f02c3d56d944651cc4dd7c655f9553 authored about 1 year ago by arkiver <[email protected]>
Version 20230928.01. Do not queue some domains if another is found. Handle woopic.com. Use different sleep times. Discover outlinks.

github.com/ArchiveTeam/pagespersoorange-grab - 814852c35f4be3c099667788e74607f0c1b96d83 authored about 1 year ago by arkiver <[email protected]>
Version 20230919.01. Support all item types.

github.com/ArchiveTeam/zowa-grab - c267607d140254b728108b21ae053751a6e44839 authored about 1 year ago by arkiver <[email protected]>
Version 20230918.02. Fix check for api.zowa.app.

github.com/ArchiveTeam/zowa-grab - d1f4c9952739a4f210167825d6e135df3f729d5b authored about 1 year ago by arkiver <[email protected]>
Version 20230918.01. Initial.

github.com/ArchiveTeam/zowa-grab - 9cfb20e070520c135e4e29b4daccc5068c196212 authored about 1 year ago by arkiver <[email protected]>
Version 20230914.01. Pull wget-at-gnutls from atdr.meo.ws/archiveteam/grab-base:gnutls.

github.com/ArchiveTeam/warrior-dockerfile - 2af6c94f6597521874868c010aa9d2b9d4991a22 authored about 1 year ago by arkiver <[email protected]>
Version 20230913.01. Use cjson. Get rid of not used Lua files.

github.com/ArchiveTeam/telegram-grab - 0eeef780e3cb9b9c36003dd21e818f180a34e916 authored about 1 year ago by arkiver <[email protected]>
Merge pull request #9 from projectten/master

Remove unnecessary JSON parse in comments API path

github.com/ArchiveTeam/telegram-grab - 0b4c84210d4416bf8ca364b80dfb3467f8fc8ca0 authored about 1 year ago by arkiver <[email protected]>
Version 20230912.02. Update user agent.

github.com/ArchiveTeam/youtube-grab - 2091c607d7a7d293520a6dd5563aa8c56dacf2de authored about 1 year ago by arkiver <[email protected]>
Version 20230912.01. Prevent consent redirect.

github.com/ArchiveTeam/youtube-grab - 8547a575293cb89db4c97593fbbad06d184b4dea authored about 1 year ago by arkiver <[email protected]>
remove logic for deprecated Python versions

github.com/ArchiveTeam/ludios_wpull - 07a6340610158cd6f85f7ff4ff84f1fa187d8130 authored about 1 year ago by HeliosLHC <[email protected]>
update references to use new collections.abc module

github.com/ArchiveTeam/ludios_wpull - d228c01f108e42022aceacdbbd9feb61d884a2c0 authored about 1 year ago by HeliosLHC <[email protected]>
Version 20230910.01. Temporary hack to get sudo to work.

github.com/ArchiveTeam/warrior-dockerfile - b46ba0a05e16e7106ee4019726e2fdf23fc74331 authored about 1 year ago by arkiver <[email protected]>
replaced namedlist and ordereddefaultdict with stdlib implementations

github.com/ArchiveTeam/ludios_wpull - 7d1db472d848d116255dbf683a94e3954c6397e5 authored about 1 year ago by HeliosLHC <[email protected]>
Version 20230910.05. Install Lua utf8 library through warrior-install.sh.

github.com/ArchiveTeam/reddit-grab - a0e35bb72d0267fddfd9c56a238613845ca96283 authored about 1 year ago by arkiver <[email protected]>
Version 20230910.04. Install lua utf8 library. Fix converting unicode codepoint to utf8 character support.

github.com/ArchiveTeam/reddit-grab - 3add4f891ce3600b1b8e7328566353ddde1d5981 authored about 1 year ago by arkiver <[email protected]>
Version 20230910.03. Increase hardcoded multi item size to 100, for soft limiting on tracker side.

github.com/ArchiveTeam/reddit-grab - 12abd58d4dbf7496f811039a0c76bc12a58004bb authored about 1 year ago by arkiver <[email protected]>
Version 20230910.02. Remove old Lua files.

github.com/ArchiveTeam/reddit-grab - 8a46824231ca9725939c5e700da9ffc32f83af39 authored about 1 year ago by arkiver <[email protected]>
Version 20230910.01. Use cjson instead of JSON.lua.

github.com/ArchiveTeam/reddit-grab - a2ffd1f6712fd58861c5b1d1b26ebc74dd59c589 authored about 1 year ago by arkiver <[email protected]>
Remove unnecessary JSON parse in comments API path

github.com/ArchiveTeam/telegram-grab - db56bd3cef5dde3252e10a61d95df497012f0c81 authored about 1 year ago by project10 <[email protected]>
Version 20230901.01. Allow 404. Reduce max tries to 5.

github.com/ArchiveTeam/gfycat-grab - bfe80417b4d8340ab1896a329bb040fcf4e34748 authored about 1 year ago by arkiver <[email protected]>
Version 20230831.01. Fix archiving gif with title containing gif URL. Attempt to get redirect to redgifs.com if 404 on gfycat.com. Accept some 403s but do not write to WARC.

github.com/ArchiveTeam/gfycat-grab - ffd3fbeaeb906bcc3f03613ed114d6591506816e authored about 1 year ago by arkiver <[email protected]>
Version 20230829.02. Support . in user name.

github.com/ArchiveTeam/gfycat-grab - f146ee1b6d92daeb0033069a48b2f02df117328d authored about 1 year ago by arkiver <[email protected]>
Version 20230829.01. Support _ and - user profile name. Ignore /stickers/search/. Ignore non-gfycat user profile images.

github.com/ArchiveTeam/gfycat-grab - 9767145da685dbc7ab049a4f8acf9ebcf28a819d authored about 1 year ago by arkiver <[email protected]>