Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/urls-grab
Archiving URLs (outlinks) from a variety of sources.
https://github.com/ArchiveTeam/urls-grab
Version 20210302.06. Stop getting outlinks.
251c493971c724cf1199c41ad8c295fdb0d88ddb authored almost 4 years ago by arkiver <[email protected]>
Version 20210302.05. Check differences between page requisites and page URL.
358c971e058e302b027f69e6b8ce14a04dc3cac2 authored almost 4 years ago by arkiver <[email protected]>
Version 20210302.04. Unquote bad URLs.
afde2d5bd93f9c64a677beefc4c42b57869741c1 authored almost 4 years ago by arkiver <[email protected]>
Version 20210302.03. Disable page requisites for now.
476c60230b2a28e61fbee1b952eac8b35e1e433a authored almost 4 years ago by arkiver <[email protected]>
Version 20210302.02. Use lowercase string for URL loop checks.
69a2c27fe75d147cd5ea07ed3c67c7bccc443573 authored almost 4 years ago by arkiver <[email protected]>
Version 20210302.01. Extract page requisites. Percent encode URLs to be submitted.
178b9b8160688779d35d20e1913313cd3bcb7c15 authored almost 4 years ago by arkiver <[email protected]>
Version 20210301.05. Disable page requisites archiving.
641a1c117d4fdcbdfc51e854b99262075e68945f authored almost 4 years ago by arkiver <[email protected]>
Version 20210301.04. Add local cache of previously found URLs.
fe3c113765ec41042a3b8b6eb16338ccdb22fc90 authored almost 4 years ago by arkiver <[email protected]>
Version 20210301.03. Multi item size 100.
f6f1fb8b9ec9250acc3f902bbaa62a9ef6284297 authored almost 4 years ago by arkiver <[email protected]>
Version 20210301.02. Prevent loops with max 3 occurrences of [^/]+ strings in URL.
c3a382531377340c2b7ef3cc1a451b6ee12ab876 authored almost 4 years ago by arkiver <[email protected]>
Version 20210301.01. Get page requisites.
5a2ee0b329048de1a61b62bcb14afd462b564972 authored almost 4 years ago by arkiver <[email protected]>
20210225.02: update dict url
1d7e08465974c64447908428139342ac2b46a15d authored almost 4 years ago by Katie Holly <[email protected]>
Version 20210225.01. Disable dedup from AT collection due to 429s from IA.
025ba35767ee2c61187188543b823ffae5a36c4f authored almost 4 years ago by arkiver <[email protected]>
Version 20210224.01. Check URLs of over 5 MB in size with the Wayback Machine.
22cdd38f787173f8c0b4106d23b69c5f1cbfbddf authored almost 4 years ago by arkiver <[email protected]>
Drone poke .05
43128f53904d35e8f1dafc8fed61a6527890de8a authored almost 4 years ago by Thomas Glass <[email protected]>
Version 20210221.04. Use Wget-AT 1.20.3-at.20210212.02.
5766b336914d7920acda14b35b176f406e80f365 authored almost 4 years ago by arkiver <[email protected]>
Version 20210221.03. Do not set bad URL on looping redirect.
25aad941b17af4bf324ed8e826f14fdca9c1711f authored almost 4 years ago by arkiver <[email protected]>
Add note on version.
90627697d2a73aedf7081eafc28e16a3275912ec authored almost 4 years ago by arkiver <[email protected]>
Version 20210221.02. Fix conflict.
b66ab5b795cd0e2414ea4b6f7976ea2883b87bd3 authored almost 4 years ago by arkiver <[email protected]>
Version 20210221.01. Only queue URLs on 2xx status code.
d36a239c1647fb17b7885185a0654a37b95ac2d7 authored almost 4 years ago by arkiver <[email protected]>
Allow warriors on this project
c27dee2dd11a30586c71ed5e66487daf53d53c57 authored almost 4 years ago by km09 <[email protected]>
update base image
2d6549a8ce77d5076415880eccf5eff284b6dd21 authored almost 4 years ago by Katie Holly <[email protected]>
update tracker url
86032fece361538f96146e298e6eff3ba0ef28f3 authored almost 4 years ago by Katie Holly <[email protected]>
Revert "report_bad_url and exit on being redirected to Instagram login page"
This reverts commit 857e96c5f44d124b8a0f9af3a7e9a414e14cf707.
e48bad18bc3b38b3cd5b738394182d80bd1c0fad authored almost 4 years ago by km09 <[email protected]>
Revert "HTTP or https"
This reverts commit 636a56846da0b4225c25d7ef2bc995945fb74dd3.
75bb3181745309236248a21bf088c6a741374454 authored almost 4 years ago by km09 <[email protected]>
Revert "Bump version"
This reverts commit ca0ed028861d2e6562e599859ab876f9cf6d691e.
a8026b76f567b9a298043e8515f6db2d8d1f4329 authored almost 4 years ago by km09 <[email protected]>
Bump version
ca0ed028861d2e6562e599859ab876f9cf6d691e authored almost 4 years ago by km09 <[email protected]>
HTTP or https
636a56846da0b4225c25d7ef2bc995945fb74dd3 authored almost 4 years ago by OrIdow6 <[email protected]>
report_bad_url and exit on being redirected to Instagram login page
Apparently taking up a large amount of time.
I'm assuming here that report_bad_urls requeues ...
857e96c5f44d124b8a0f9af3a7e9a414e14cf707 authored almost 4 years ago by OrIdow6 <[email protected]>
Version 20201123.01.
6a934a07886e2d0210b51398aae9a29e706124a3 authored about 4 years ago by arkiver <[email protected]>
Merge pull request #1 from OrIdow6/patch-1
Add more params to strip
6185c463c22b205433536ccf2b6eee76d933e263 authored about 4 years ago by Arkiver2 <[email protected]>
Revert "Version"
This reverts commit 933e68c02e1e6de3ddc5587b77e45460f014cea5.
51f084e6c4e03e0f25870f1d0fe4dbae92e371cf authored about 4 years ago by OrIdow6 <[email protected]>
Version 20201116.01. Ignore certain URLs with certain patterns.
3777be54570b3300efbc64ac8bd9fbf0c8cf67fa authored about 4 years ago by arkiver <[email protected]>
Version
933e68c02e1e6de3ddc5587b77e45460f014cea5 authored about 4 years ago by OrIdow6 <[email protected]>
Add more params to strip
Based on the list at https://gitlab.com/KevinRoebert/ClearUrls/-/blob/master/data/data.json
3857a7470e9a3cdfacc86bca5e23995c983682f9 authored about 4 years ago by OrIdow6 <[email protected]>
Version 20201112.01. Split off list of tracker parameters. Improve extraction of URLs to queue to prevent infinite loops.
06d76c1c695b69f87eae6126e809b0dd18cd4415 authored about 4 years ago by arkiver <[email protected]>
Version 20201109.04. Report bad URL as lowercase.
b9606c33f463fe4c839d6044a464a87585ec02a0 authored about 4 years ago by arkiver <[email protected]>
Version 20201109.03. Set bad URL and skip when redirected to Google sorry page.
3b6d17e38f175afc3ade111871b8d680bc03fc83 authored about 4 years ago by arkiver <[email protected]>
Version 20201109.02. Also queue new URLs for 3xx URLs.
31885798e820de5dbd1fa63d5d6ce107a2173c45 authored about 4 years ago by arkiver <[email protected]>
Version 20201109.01. Queue URLs with replaced ?amp; and &.
d589881d1e25a2e078fa27fcd1bfd0d389478459 authored about 4 years ago by arkiver <[email protected]>
Version 20201106.01. Increase Wget-AT process timeout to 600 seconds.
1063f1195681dfeaf8fea0d7ab9d8bcda9cb6b05 authored about 4 years ago by arkiver <[email protected]>
Version 20201104.04. Add ZSTD+dict compression.
fd1647d59193e5f87f581a4a24def66c75565efd authored about 4 years ago by arkiver <[email protected]>
Version 20201104.03. Fix queue for queuing URLs.
12659f1371e310047c7c6206a2c466dfdd1d5400 authored about 4 years ago by arkiver <[email protected]>
Version 20201104.02. Queue URLs without parameters and with certain parameters removed.
104776d6a9c25e43bf5e68e8adaec78d2ed090ef authored about 4 years ago by arkiver <[email protected]>
Version 20201104.01. Use Connection: keep-alive (lowercase). Update user agent.
6fc1926a0971f86fbe69e87296fef25738b02de7 authored about 4 years ago by arkiver <[email protected]>
Version 20201103.02. Experimental and temporary timeout for Wget-AT command.
05bcd309725e873026c2e054fe725565bec33922 authored about 4 years ago by arkiver <[email protected]>
Version 20201103.01. Do not wait between retries. Retry at max 2 times in Wget-AT options.
ff8d368567aaa519bda2f48745a7f42ca27ba405 authored about 4 years ago by arkiver <[email protected]>
Version 20201101.05. Return when completing item.
9092597803a1166b7242a3c78772b6e1b3acd8b1 authored about 4 years ago by arkiver <[email protected]>
Version 20201101.04. Maybe SendDoneToTracker.
7cfe710f6d67916d5f95bd7ecb427647d9391c8c authored about 4 years ago by arkiver <[email protected]>
Version 20201101.03. Also pop bad URLs in lowercase items list.
ba55d492f35660882fd4f08a27c9879934313e14 authored about 4 years ago by arkiver <[email protected]>
Version 20201101.02. No retries on bad responses. Remove bad URLs from multi-item based on lowercase.
422c6162c2054ea2d187f15558a7ced6448239b8 authored about 4 years ago by arkiver <[email protected]>
Version 20201101.01. Prevent loop if item URL could not be found.
fe0a81222f8dc7c7638e416053a3ae4d4bffa641 authored about 4 years ago by arkiver <[email protected]>
Add README.
8aa7daa0bd394066e760c3ce425964045134b6ed authored about 4 years ago by arkiver <[email protected]>
Version 20201031.06. Only report correctly downloaded URLs as finished to tracker.
cbbe3dec49408830116bc03067c3fe62c098f4ca authored about 4 years ago by arkiver <[email protected]>
Version 20201031.05. Abort item on bad URL. Treat status code 0 as bad status code.
f88a4f61b7d0a131683727fe207c7326d787755d authored about 4 years ago by arkiver <[email protected]>
Version 20201031.04. Remove bad char.
279eb92602f42341f1bed39deab6e2f74a3b33b7 authored about 4 years ago by arkiver <[email protected]>
Version 20201031.03. Remove more printing.
108018633eb0fb5a0b9787db35bb9a88129639c3 authored about 4 years ago by arkiver <[email protected]>
Version 20201031.02. Do not print URLs at start.
ea7c7664eb658fdc8945ca886affb8e315ebf8b1 authored about 4 years ago by arkiver <[email protected]>
initial
eddeb3951176ddb007d6085dca9dfad3dd88bcdc authored about 4 years ago by arkiver <[email protected]>
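
Note on the loop check: commits 20210301.02 and 20210302.02 describe rejecting URLs in which a [^/]+ string occurs more than 3 times, compared in lowercase. The sketch below is one possible reading of that rule and is not taken from the repository. The function name looks_like_loop, the max_repeats parameter, and the choice to count only path segments are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_loop(url: str, max_repeats: int = 3) -> bool:
    """Return True if any path segment occurs more than max_repeats times.

    Hypothetical helper: the name, the threshold parameter, and the
    restriction to the URL path are assumptions, not the project's code.
    """
    # Lowercase first so "A" and "a" count as the same segment.
    path = urlsplit(url.lower()).path
    # Each non-empty piece between slashes matches the [^/]+ pattern.
    segments = [segment for segment in path.split("/") if segment]
    counts = Counter(segments)
    return any(count > max_repeats for count in counts.values())
```

Under this reading, https://example.com/a/b/a/b/a/b is still accepted because each segment occurs exactly 3 times, while one more repetition of either segment would mark the URL as a likely crawler loop.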