Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/github-grab
Archiving GitHub
https://github.com/ArchiveTeam/github-grab
Version 20231102.01. Do not keep partial files over rsync. Check for minimum version of Wget-AT instead of specific version.
b0b703bc562d55e86871bbcd658e3111a9904e9a authored about 1 year ago by arkiver <[email protected]>
Merge pull request #10 from imerr/master-1
Extra docker container params
3ac2d2a45bf99417a47cb9355e8bcc2f2f371b06 authored over 1 year ago by arkiver <[email protected]>
Extra docker container params
watchtower: `--include-restarting` also update if the container is in a crash loop due to a bad ...
7a1283f624027d94c0c08426db9bf2d800ec39ac authored over 1 year ago by Robin Rolf <[email protected]>
Version 20230727.01. Use GNU Wget 1.21.3-at.20230623.01. Use Wget-AT option --reject-reserved-subnets. Remove old Wget files. Update README to latest.
f7437a39934f79cf4da07fe3a5638a42ef4e155c authored over 1 year ago by arkiver <[email protected]>
Version 20230607.01. Use GNU Wget 1.21.3-at.20230605.01 and arguments around DNS.
8c14051dca1d74dcbbb47b8064debcf9e2cd1c48 authored over 1 year ago by arkiver <[email protected]>
Version 20221201.01. Support GNU Wget 1.21.3-at.20220608.02.
e06ba18668b65808087012853a669cd126f47c20 authored about 2 years ago by arkiver <[email protected]>
Version 20220605.01. Support GNU Wget 1.21.3-at.20220503.02. Fix killing crawl when items cannot be queued.
dd15afda2d754f6bac780b19c27c32b348dce07e authored over 2 years ago by arkiver <[email protected]>
Version 20220425.01. Fix backfeed queuing.
94a6aab7e69037b4b497430f3e44a2418efc16f2 authored over 2 years ago by arkiver <[email protected]>
Version 20220324.01. Fix variable name.
967c5e599b212e4200dd63b1d213da9b2ca01aeb authored over 2 years ago by arkiver <[email protected]>
Version 20220323.02. Fix items to maxtries variable name. Fix variable name items -> newurls.
f31a1225e793adfab8d7190d26069c60511b6531 authored almost 3 years ago by arkiver <[email protected]>
Version 20220323.01. Fix backfeed for large numbers of discovered URLs. Fix maxtries use.
d10d24b685c7ec282902f8406e2005299f9958fc authored almost 3 years ago by arkiver <[email protected]>
Version 20220318.01. Restrict query value to single term. Support , in fork number. Allow 403 on githubusercontent.com URLs ending with /.
7462786564c04e2b433e1f9319eb5b58ce809453 authored almost 3 years ago by arkiver <[email protected]>
Version 20220316.01. Various fixes.
5ba6f0fbf38fd00c28ebae344499fea5aff34e83 authored almost 3 years ago by arkiver <[email protected]>
Version 20220315.02. Backfeed v2. Do not queue timestamp items for web:complete:.
76ede1b4d8fcc7589db473e0b85ea76211fe92dd authored almost 3 years ago by arkiver <[email protected]>
Version 20220315.01. Backfeed v2.
8a8ded732c032f8dc18362c8737acbcdf694c413 authored almost 3 years ago by arkiver <[email protected]>
Version 20211228.02. Support discussions. Queue outlinks to urls-grab.
246eda2f82a04dbde4b23c1d0f5d1ccd20b2e468 authored almost 3 years ago by arkiver <[email protected]>
Version 20211228.01. Fix archiving zip and tar.gz archives.
d263fd634a1416ac58a3188c6097132418fa3572 authored almost 3 years ago by arkiver <[email protected]>
Version 20211220.01. Initial fixes.
9d92d782ed884a3594bb1d8214f8a8484539d5db authored about 3 years ago by arkiver <[email protected]>
Version 20211025.01. Use GNU Wget 1.20.3-at.20211001.01.
d59d9b050c32f22d75223b8f0504a045b49a3dfd authored about 3 years ago by arkiver <[email protected]>
Version 20210730.02. Fix conflicts.
d5ed86473bc1ebeaa3255c9633c332731b58bbed authored over 3 years ago by arkiver <[email protected]>
Version 20210730.01. Ignore /refs$, handle signup? and various chunk-*.js files. Use new legacy-api.arpa.li tracker URL.
8c276b93de5034bf65446fc0ae6787816526c2a5 authored over 3 years ago by arkiver <[email protected]>
20210410.01 - New day new wget-at
75464a606c00706fcac4e928885804db750069db authored over 3 years ago by Thomas Glass <[email protected]>
Fix docker base image
d8657cb656befab586205e0f231e4441d1b15148 authored almost 4 years ago by Thomas Glass <[email protected]>
Fix zstd, now get, latest wget-at and version bump
c1bc2399a2f49c04d06ddf915b7aded6cef723c4 authored almost 4 years ago by Thomas Glass <[email protected]>
Merge pull request #2 from tech234a/patch-1
Fix Warrior support
3919fccb4568d6ea16a85271a210aafa53a79400 authored almost 4 years ago by Thomas Glass <[email protected]>
Fix Warrior support
d6d48d51f9908d9f0520f8838a0d6ccfd9c5913c authored almost 4 years ago by tech234a <[email protected]>
Enable warriors + updated tracker host
08b3a9fbf8462bf1063a694fea23789efbe4b846 authored almost 4 years ago by Thomas Glass <[email protected]>
Version 20210119.03. Do not get amazonaws.com URLs from the front project page.
b35b8d9f979c144392833fad75dfad114823d7e5 authored almost 4 years ago by arkiver <[email protected]>
Version 20210119.02. Fix getting timestamp. Also it is queuing to github-next, not reddit-next.
844c5423c152eecf639daf23b8b52355e3e5b46c authored almost 4 years ago by arkiver <[email protected]>
Merge branch 'master' of https://github.com/ArchiveTeam/github-grab
52478f0a40d7df39ff2d1b4415046c89b9a88072 authored almost 4 years ago by arkiver <[email protected]>
Version 20210119.01. Queue new items to reddit-next project.
ed1ba3a221c4816fff9bc9e32db0d0f27d8eaa1c authored almost 4 years ago by arkiver <[email protected]>
Update grab-base
a2a0b33e3072c1ac005448c7e8c00f5b41434598 authored about 4 years ago by km09 <[email protected]>
Version 20201031.02. Support Wget-AT version 1.20.3-at.20201030.01.
9aeb5012a736fc011253a93056dc3f85f5b20841 authored about 4 years ago by arkiver <[email protected]>
Version 20201031.01. Get branches for :complete: items.
630da80153d1c15babb409d54e0fc333d09ecb37 authored about 4 years ago by arkiver <[email protected]>
Version 20201022.01. Fix ignoring renamed js files on GitHub.
80860858f921f3829cfb3b6e2fb7e538edff4213 authored about 4 years ago by arkiver <[email protected]>
Version 20200930.01. Support web:complete: item.
ac1cbfffb3251c26a716dbbbff0d4178bf8967a1 authored about 4 years ago by arkiver <[email protected]>
Version 20200919.02. Archive tar.gz archive if no stars or forks, but custom downloads or noted.
8b9189072a58b24d1f4811509dd516351a63e030 authored over 4 years ago by arkiver <[email protected]>
Version 20200919.01. Support Wget-AT version 1.20.3-at.20200919.01.
f0b90b9f1e00b687b1f957bc0a6bd5d2c19fee6a authored over 4 years ago by arkiver <[email protected]>
Version 20200917.01. Support Wget-AT version 1.20.3-at.20200917.01.
4b6b16caff926a8a68c903949aee2ea6d0cddc7b authored over 4 years ago by arkiver <[email protected]>
Version 20200911.01. Add SHA1 hash of item name to warc_file_base.
3fa2956c4b3d06249e8302c97d53a77d0a54d2e5 authored over 4 years ago by arkiver <[email protected]>
Version 20200909.02. Use socket.url to get absolute URL. Ignore URLs with certain characters for github.io. Ignore certain relative URLs.
90dfcc9ed8923abe528301f3e27b3b888e7f557b authored over 4 years ago by arkiver <[email protected]>
Version 20200909.01. Add timeout of 3 seconds for getting target information. Choose new target on failing upload.
09a9a0785503852382d67371784ecb57069b7ad8 authored over 4 years ago by arkiver <[email protected]>
Version 20200904.03. Archive certain /_render_node/ URLs. Archive old edited comments.
660e0ccd55fb47db029381a78a8f8d04781c2473 authored over 4 years ago by arkiver <[email protected]>
Version 20200904.02. Remove debug line.
a59f5a7c49fdbf70474e4fcc5702473d569a569b authored over 4 years ago by arkiver <[email protected]>
Version 20200904.01. Make URL check case insensitive.
9f6ced6b05bb8c9b4854036beb6139baec4f0324 authored over 4 years ago by arkiver <[email protected]>
Version 20200903.01. Add rsync finding capabilities.
76c6302e2aeac49bd15b61fb3bc5d8e16e4e01d9 authored over 4 years ago by arkiver <[email protected]>
Version 20200902.01. Support Wget-AT version 1.20.3-at.20200902.01. Do not write response record on 429 on github.com.
ef5848d80016738503009f13805e7a9f19d68dd1 authored over 4 years ago by arkiver <[email protected]>
Version 20200901.05. Fix extraction of stars and forks when number is 1.
3e376b8b743b90726862db12c5dfd5a78d25d3a6 authored over 4 years ago by arkiver <[email protected]>
Version 20200901.04. Fix extraction of large numbers of forks or stars.
f6f39d74699a7d034c8a55875a1718672f1d6f0b authored over 4 years ago by arkiver <[email protected]>
Version 20200901.03. Do not ignore two tree URLs containing partial HTML data.
87f7a7eb81c9da68a83a23362c60471e8c1c9f03 authored over 4 years ago by arkiver <[email protected]>
Version 20200901.02. Use trackerproxy for getting ZSTD dictionary.
f3c6a3f12eda7a14766fdbeb7ec59267eb42b408 authored over 4 years ago by arkiver <[email protected]>
Version 20200901.01. Get unix time from trackerproxy.archiveteam.org. Remove debug line in github.lua.
cafaa95b7c343e5c4f2c68303bcc9f5393cca20e authored over 4 years ago by arkiver <[email protected]>
Version 20200831.03. Set tracker ID to github.
29212bfa96df69b1b0a33a492392584938f96d71 authored over 4 years ago by arkiver <[email protected]>
Version 20200831.02. Assert config is 'initial'.
a918e6125d05a51d62d06cc280742e743e10355f authored over 4 years ago by arkiver <[email protected]>
Version 20200831.01. Use number of stars and forks to decide on downloading archives. Write new item to _data.txt with current unix timestamp.
b6a7493b5e109c98f033e46928b60c433e7da9c7 authored over 4 years ago by arkiver <[email protected]>
Version 20200821.03. Ignore zip archives.
2f12723d4c9dcda72332ec9fd2176db6f9739866 authored over 4 years ago by arkiver <[email protected]>
Version 20200821.02. Set tracker host to trackerproxy.archiveteam.org.
8776e3b1193ce26cdaf27fc66dae2b5d683a8838 authored over 4 years ago by arkiver <[email protected]>
Version 20200821.01. Ignore bad status codes from camo.githubusercontent.com. Set max tries for non-important URLs to 3. Ignore bad status codes for non-GitHub amazonaws.com URL.
dc5298402ffbc9dbd750cfa1d0a615f6faaffe81 authored over 4 years ago by arkiver <[email protected]>
Version 20200813.01. Do not use API anymore due to harsh rate limit.
51db9ef3aaa44a31e279b6fc2eabc35773836aa9 authored over 4 years ago by arkiver <[email protected]>
Version 20200812.07. Ignore external homepages.
493dfa9526a887f007a8a15563c30dba17fb873c authored over 4 years ago by arkiver <[email protected]>
Version 20200812.06. Improve check for valid homepage.
df6a5c266644fd2c61ddc6529f66da92689d3377 authored over 4 years ago by arkiver <[email protected]>
Version 20200812.05. Check length of homepage before processing.
e5fa58ab1b7f0723d835e4ce7aef3b0c4bfc29f6 authored over 4 years ago by arkiver <[email protected]>
Version 20200812.04. Use tracker githubtest2.
c8dbd0091fe3ff3cf305fe5b08d8bd75475e48a5 authored over 4 years ago by arkiver <[email protected]>
Version 20200812.03.
05c2b4947fe0d83ce76722ab5db1c900a14f93e3 authored over 4 years ago by arkiver <[email protected]>
Version 20200812.02. Support external pages sites.
2dd0fbf910b29fc23e6785f42434f67684b49060 authored over 4 years ago by arkiver <[email protected]>
Check validity of URL.
7cec7b5cf8f27a71b5b47c66dd47bcdd91f20fd5 authored over 4 years ago by arkiver <[email protected]>
Version 20200812.01. Commented out support for external domains.
bd0c7ed1c21a58a4baaeed3505d653001a838d31 authored over 4 years ago by arkiver <[email protected]>
Version 20200811.02. Handle github.io repositories. Handle amazonaws.com release download redirects.
9ebed94a69b745d532a85d4c0e03e1f19bc89b7a authored over 4 years ago by arkiver <[email protected]>
Version 20200811.01. Archive all tags if repository is no fork, else only download tags with extra downloads or notes attached.
b9a34a753c859c6db335cdfb5e1dd72e2cb944c4 authored over 4 years ago by arkiver <[email protected]>
Version 20200806.03. Skip status code 451.
fd52c3b9f8bc8c3f9ccfa3281db777ff7f238b94 authored over 4 years ago by arkiver <[email protected]>
Version 20200806.02. Support github.io URLs.
e1e19b92f10cd470c1a122fdd504d11a68db1825 authored over 4 years ago by arkiver <[email protected]>
Merge branch 'master' of https://github.com/ArchiveTeam/github-grab
bc04b52a8b84af6fd5bc6d4c770143b2e2a27de4 authored over 4 years ago by arkiver <[email protected]>
Version 20200806.01. Add ignores.
e46aa65bd7de8d4512b090dfcf61619f47991f98 authored over 4 years ago by arkiver <[email protected]>
use wget-at instead of wget-lua
b5d68306b21977ddc64f60e56ab2fbd5abd515cf authored over 4 years ago by Katie Holly <[email protected]>
update to latest grab-base
ccb524b0a79e182c8003f2edc4af00404207994f authored over 4 years ago by Katie Holly <[email protected]>
Version 20200805.02. Force github for dictionary project ID.
2d65a47f4047a8618f7ef91b711d72f48e8b971d authored over 4 years ago by arkiver <[email protected]>
Version 20200805.01. Change tracker to githubtest.
b6562fe5c7bebb38af27248d7ac5dc3fa68e60ae authored over 4 years ago by arkiver <[email protected]>
Version 20200804.02. Remove sleep.
244882c8b94b9b2ed882b3ef3ad96166ecbb3102 authored over 4 years ago by arkiver <[email protected]>
Version 20200804.01. Do not write BANNED file.
5a8f6af17dac1e1995f5d0f91a1198e1db1dd18d authored over 4 years ago by arkiver <[email protected]>
Initial
7b5ba4bac4a9f22754e85932eeb45d3eacb42fdd authored over 4 years ago by arkiver <[email protected]>