Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/google-sites-grab
Archiving Google Sites Classic.
https://github.com/ArchiveTeam/google-sites-grab
Version 20230607.01. Use GNU Wget 1.21.3-at.20230605.01 and arguments around DNS.
0824c04bb340ab247d134eac126fdb2b4cf3a79d authored over 1 year ago by arkiver <[email protected]>
Version 20221201.01. Support GNU Wget 1.21.3-at.20220608.02.
631cdd82b3f73470d6f0b449c2ea60cf248645fd authored about 2 years ago by arkiver <[email protected]>
Version 20211001.01. Use GNU Wget 1.20.3-at.20211001.01.
b8d469fef6beceeb7f45767a3e7e63d35978eec9 authored about 3 years ago by arkiver <[email protected]>
Version 20210609.03. Only skip 403 on 'Access denied' match.
4c9c835dedecfa5b879aff70cd1419bfc5493ce9 authored over 3 years ago by arkiver <[email protected]>
Add timeout of 1200 seconds on requests.
aacd7204255510bdb367ea89691c96f67a6bf107 authored over 3 years ago by arkiver <[email protected]>
20210609.02 - Do not abort on 403's
8469b1600dca8d477b66411b22d146f2dad93ace authored over 3 years ago by Thomas Glass <[email protected]>
20210609.01 - 429 handling & new wget-at
c4166c916f5a2d8dd9c959b44d5a88975b72d514 authored over 3 years ago by Thomas Glass <[email protected]>
20210410.01 - New day new wget-at
36b8de790a29a28e67db1fe1291d90b7637538c4 authored over 3 years ago by Thomas Glass <[email protected]>
Update Dockerfile
2230fd4c25f3933214981cbda7b5526411ba0831 authored almost 4 years ago by Thomas Glass <[email protected]>
Update pipeline.py
e22a603fb235aa9079b61a8acd364bc11bbf17aa authored almost 4 years ago by Thomas Glass <[email protected]>
Merge pull request #8 from NGTmeaty/patch-1
Update dictionary URL
aec44d53b69c628e8b5d88047a86421f6ae34d9e authored almost 4 years ago by Thomas Glass <[email protected]>
Merge pull request #7 from tech234a/patch-1
Fix Warrior support
41fb7f56a7587227b414a3bfcbe2c988035d8620 authored almost 4 years ago by Thomas Glass <[email protected]>
Update dictionary URL
UNTESTED but should work.
ed35cd36c6d57c543e489ec8f5f9fe4d96666855 authored almost 4 years ago by NGTmeaty <[email protected]>
Fix Warrior support
e9d076669027ecb8bc0e33ad451a1948b0924427 authored almost 4 years ago by tech234a <[email protected]>
Enable warrior + new tracker host
e3b030f93e31497036bcbc45f3746c6f13753232 authored almost 4 years ago by Thomas Glass <[email protected]>
Merge pull request #5 from OrIdow6/same-name-bug-pr
Fix bug where it would recurse to external sites
8b5f8fc3bf9f62ffb00fbec72d0f75708531a0d8 authored almost 4 years ago by Arkiver2 <[email protected]>
Move the match to the inside of the token check loop
Per EggplantN - more caution for when domain: items are added
80d0a9c17561420e16131af63d37ef2526b40891 authored almost 4 years ago by OrIdow6 <[email protected]>
Fix bug where it would recurse to external sites
Occurred when they had a token in their URLs which was the same as the site name
E.g. site:homepi...
Merge branch 'master' of https://github.com/ArchiveTeam/google-sites-grab
4555b2efca73282189ef47b435e9673561b82a7d authored about 4 years ago by arkiver <[email protected]>
Version 20201110.01. Extra check for correct imports.
415e509436e539dcb598e31f1fc94099a0bc55d1 authored about 4 years ago by arkiver <[email protected]>
Update grab-base
ae68cb49ba0066d71571d1a2f0135f17aa94c107 authored about 4 years ago by km09 <[email protected]>
Version 20201031.01. Fix conflict.
e3ccc63b4230baeebb55f368d37a523666aa42fc authored about 4 years ago by arkiver <[email protected]>
Version 20201031.01. Support Wget-AT version 1.20.3-at.20201030.01.
b79b4a8ef4fa0f20bf4f33bbfc70558b4414fcf8 authored about 4 years ago by arkiver <[email protected]>
Merge pull request #2 from HarryC145/patch-1
Change UA
7c4eacefe62d601b48405b94f9016ddf2aaf9838 authored about 4 years ago by Arkiver2 <[email protected]>
Change UA
bc04a4d0cb8af53490c3b6c7d205718676dd284b authored about 4 years ago by HarryC145 <[email protected]>
Version 20201007.02. Sleep 600 seconds in case of CAPTCHA.
9704249af01685c8cea0ad977e3e20ca158eacd1 authored about 4 years ago by arkiver <[email protected]>
Version 20201007.01. Add URLs to ignore.
a3c0f0ccdf3f865e5f252a470dd72b3b67393e45 authored about 4 years ago by arkiver <[email protected]>
Version 20201006.08. Do not get /a/defaultdomain/. Prevent large numbers of overlapping feed URLs.
aa0509e72cd187d1edad3d2473262e74ccc32445 authored about 4 years ago by arkiver <[email protected]>
Version 20201006.07. Ignore picasaembed URL.
d447aa72c3ae423aa00f35aa5ac63c15207eb304 authored about 4 years ago by arkiver <[email protected]>
Version 20201006.06. Do not report item site:sites to backfeed.
30b078dcfa9d110a1fba17d57383e8c292889aeb authored about 4 years ago by arkiver <[email protected]>
Version 20201006.05. Fix typo.
7fa78fcbe1fd5668329943ae7867d7de5336e67f authored about 4 years ago by arkiver <[email protected]>
Version 20201006.04. Add 18+ cookie.
8f5251b5fc87567adc0806f77ef0df1e8bc0300a authored about 4 years ago by arkiver <[email protected]>
Version 20201006.03. Support Wget-AT 1.20.3-at.20200919.01.
fd2aa76ab2735ac8ecb8a0ecc9015317b35ed9ed authored about 4 years ago by arkiver <[email protected]>
Version 20201006.02. Strip more URLs of attredirects params. Set max tries on external URLs to 1.
d4245ccbb9a0d6b27d63bc61efb6ae024f469429 authored about 4 years ago by arkiver <[email protected]>
Version 20201006.01. Overall major improvements.
5633349774c4e670950a5c053fd94496418d26e6 authored about 4 years ago by arkiver <[email protected]>
Version 20201001.01. Overall improvements.
017ac752d7bc9caf94b0e4b1fc2b21cbcb8469be authored about 4 years ago by arkiver <[email protected]>
Merge pull request #1 from OrIdow6/master
Misc
61f6c6f07c9138fa00c18bd5d3da3d01b74a7def authored about 4 years ago by Arkiver2 <[email protected]>
Bump version
a41ef9950ec30a92dcf2b60c92dba7dea62d8fb9 authored over 4 years ago by OrIdow6 <[email protected]>
Ignore all URLs under https://ssl.gstatic.com/sites/p/[a-z0-9]+
Having these as static ignores would have required updating the list too frequently.
230068a30fdb2946356ec42f0b5516a7a4ac6f0c authored over 4 years ago by OrIdow6 <[email protected]>
Get the feed with "?max-results=1000000000" on it as well
19771e76d4c1d5f0f0402f267880a6f0ce6ff189 authored over 4 years ago by OrIdow6 <[email protected]>
Fixed format
9cc6f074e95ade1deb8650dd485769cf695e3e1b authored over 4 years ago by OrIdow6 <[email protected]>
More ignores
76155c2dd09a6ff8c3ffc4a785f3587698a44d71 authored over 4 years ago by OrIdow6 <[email protected]>
Bump version
20128da5baad82c55695b0270ad5c833b9fd3514 authored over 4 years ago by OrIdow6 <[email protected]>
More ignores
6e574e57f4dce64c098ce3c1289c5f520c355fee authored over 4 years ago by OrIdow6 <[email protected]>
Fetch feed
21a3f88175a0cf59a57441694ed9edbeaabacf27 authored over 4 years ago by OrIdow6 <[email protected]>
Related to attredirects
Strip attredirects param off of resource URLs; and also follow redirects to googlegroups.com that...
915ab55c51505da7495811c0cabbfa6edb607277 authored over 4 years ago by OrIdow6 <[email protected]>
Some static ignores
52f79dac06de5cb4b14e7ff57728508f596f952c authored over 4 years ago by OrIdow6 <[email protected]>
Version 20200910.03. Empty ignore-list file.
ea223a9ed0d5f8df58b1cdfaca56b43e41830aab authored over 4 years ago by arkiver <[email protected]>
Version 20200910.02. Add discovery. Get external resources.
f1aebbd4310f0d9c637f4f0421d3679ebdbd8c8f authored over 4 years ago by arkiver <[email protected]>
initial
9b1585e1ca36f682b0867374474e01f1f0d46cae authored over 4 years ago by arkiver <[email protected]>