Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/reddit-grab
Grabbing everything from reddit.
https://github.com/ArchiveTeam/reddit-grab
Version 20210114.04. Support cookies.
0be16f775ab8ad877d365237ead6bdf4f2dbb645 authored almost 4 years ago by arkiver <[email protected]>
Version 20210114.03. Do not accept 403.
0d0e824421c09b7de34840c20199e466d4e6ab39 authored almost 4 years ago by arkiver <[email protected]>
Version 20210114.02. Actually add the user-agents file.
94d8b551f883277e76f0e51dbae0207001ba997e authored almost 4 years ago by arkiver <[email protected]>
Version 20210114.01. Use a random user-agent.
bc94cf036f3d4256451e86adaa00dc44163f79f7 authored almost 4 years ago by arkiver <[email protected]>
Version 20210109.01. Use browser user-agent.
df1f60079daa6889b28a83463f54a002d456c097 authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.09. Ignore over18 URLs on old.reddit.com (cookie fix coming up, not a problem on www.reddit.com).
2911934fd4308901b0a09f16a7bb887d7b3bef47 authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.08. Do not archive URLs with utm_source for old.reddit.com.
f3d41ea2e11e218aab01852c079c5e50be1655d0 authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.07. Use tracker reddit.
992fb6b953ebed28fc957ac3d86e966eeef39851 authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.06.
6a8d5a62ac3b367543da98fd785660acc52b239a authored almost 4 years ago by arkiver <[email protected]>
Fix for archiving videos.
1b220e014bec4aceee27e78edd771df63eb341b0 authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.05.
4a371be167910dfecdf764faed8790b2db2e7b40 authored almost 4 years ago by arkiver <[email protected]>
Handle NULL byte separated multi items. Support unicode chars in JSON permalink.
3d20ca90afe2418ed63b62e35ad016654c289deb authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.03.
7c5ea717a8767eb7c14c41e0d537314143e9c71d authored almost 4 years ago by arkiver <[email protected]>
Merge branch 'master' of https://github.com/ArchiveTeam/reddit-grab
5f3958c28244fdcf19d388dc7ee9c3f41f5b6ab7 authored almost 4 years ago by arkiver <[email protected]>
Version 20210108.02.
1924d5217e22e91c76c65a2cd3980b693ba87da7 authored almost 4 years ago by arkiver <[email protected]>
Support single comment and post items. Queue outlinks to URLs project.
16836ba20140dda9c3d2b9de1329c2188d2d187f authored almost 4 years ago by arkiver <[email protected]>
Use multi items.
ae57a81bafd2c792c082ca19296cc6df51968008 authored almost 4 years ago by arkiver <[email protected]>
Use updated grab-base
eb945d2470d44d9355bfa00b8da7492f730093b2 authored about 4 years ago by km09 <[email protected]>
Version 20201031.01. Support Wget-AT version 1.20.3-at.20201030.01.
4284d24b47b8f288f9fec889854893a78aae6c2d authored about 4 years ago by arkiver <[email protected]>
Version 20200902.01. Support Wget-AT version 1.20.3-at.20200902.01.
9ecf9a3a3050cc051e4738dbf5297de5d6e00442 authored over 4 years ago by arkiver <[email protected]>
Version 20200821.02. Set tracker host to trackerproxy.archiveteam.org.
99875895b64fe0a859aa4a4510c536ae59a24df5 authored over 4 years ago by arkiver <[email protected]>
Version 20200821.01. Ignore comment URL with utm_source param.
2087174a5ca35a48067eae57b030a54c9c74dbd9 authored over 4 years ago by arkiver <[email protected]>
Version 20200805.01. Support Wget-AT version 1.20.3-at.20200804.01.
ace1a4f037bf9054e51e479b64d9ae573aeb2c3a authored over 4 years ago by arkiver <[email protected]>
Use new README template.
8b40429e9552edf5bf484fe2c70368c310d3c6cf authored over 4 years ago by arkiver <[email protected]>
Version 20200730.01. Support /user/ post better (like /r/).
23bfe8b12cb5ee31bef9d63004f8a1317f7e12ae authored over 4 years ago by arkiver <[email protected]>
Version 20200728.01. Ignore non-reddit URLs. Fix extraction of tokens for morecomments.
450d4e04135866411dc36ae06d62b85188040d14 authored over 4 years ago by arkiver <[email protected]>
Version 20200727.03. Fix handling video URLs without extension.
9a6417ecbc0ffe55fc47aef83acfd7236c389090 authored over 4 years ago by arkiver <[email protected]>
Remove unused cookies.txt file. Update README.
869cdc4e6e1f319bf4dafe3cd99c417a42ac264c authored over 4 years ago by arkiver <[email protected]>
Version 20200727.02. Set TRACKER_ID to reddittest.
911c675e749d6b528c2b849ec2106d2cd5f5bd9d authored over 4 years ago by arkiver <[email protected]>
Version 20200727.01. Use trackerproxy for dictionaries. Ignore irc: URLs.
147c6416ed84c4b17baeaec4ac7c7346d46f27bc authored over 4 years ago by arkiver <[email protected]>
Version 20200726.06. Fix project name for ZSTD dictionary request.
910687b053d33d0350f540dc6e7b611e9973cdec authored over 4 years ago by arkiver <[email protected]>
Version 20200726.05. Add cookies to access some quarantined subreddits.
496c018eef2c8e61b1ea8923769b49430332b008 authored over 4 years ago by arkiver <[email protected]>
Version 20200726.04. Use reddittest tracker for size estimate.
3d5e7e17f91f78b206c16e2f573d092cc43fca52 authored over 4 years ago by arkiver <[email protected]>
Version 20200726.03. Support galleries and comments.
23fec56409e562f5d5a4263aeda937f0567c64ef authored over 4 years ago by arkiver <[email protected]>
Version 20200726.01. Fully support new and old design for posts.
2f6a6023133d9905a186feadb10fef57ece9ee12 authored over 4 years ago by arkiver <[email protected]>
Use default upload concurrent of 2.
56571306dd2db943661ff3c6937a258a4f8cd9ce authored over 4 years ago by arkiver <[email protected]>
Use wget-at with ZSTD.
40063adcaf4825af4cc342897eaffaff511139fa authored over 4 years ago by arkiver <[email protected]>
Do not import warcio. Update version to 20200102.03.
831f79f0d9c79ce6c805d0ab26a08eb655024f84 authored about 5 years ago by Arkiver2 <[email protected]>
Skip URL on status code 204. Update version to 20200102.02.
cf3f6c7af94249ce58ebc4dadcfce9dc111406a5 authored about 5 years ago by Arkiver2 <[email protected]>
Update version to 20200102.01.
ac65b0a818f7dba064ea4a085be289724a78959e authored about 5 years ago by Arkiver2 <[email protected]>
Fix string joining.
0eb4b6205ae2df241e20e5bbf1f5d444e78008ac authored about 5 years ago by Arkiver2 <[email protected]>
Split off checking if URL was processed. Do not add URL without trailing / already added with trailing /.
ad2cf89404c7dead6d1680ea7c835156f5213cee authored over 5 years ago by Arkiver2 <[email protected]>
Skip amp.reddit.com post pages.
d4d5c9a93f844ff1789fe017c66febca2f2a45b0 authored over 5 years ago by Arkiver2 <[email protected]>
Version 20190729.01; do not get page requisites from outlinks; do not pip install warcio.
4cf7bd18f0feb18a0917391a5d1896ddf05355a0 authored over 5 years ago by Arkiver2 <[email protected]>
Version 20190405.01; support www.reddit.com; support videos; support outlinks
8902255c76c9c66cf43fcccd58b860556b335128 authored over 5 years ago by Arkiver2 <[email protected]>
rewrite
9d1ea0c6888125db219b9007f8434e1bdb472dec authored almost 6 years ago by Arkiver2 <[email protected]>
reddit.lua: ignore urls, fixes
c08fd59a29b67edaff3eb372783456f600c40ea1 authored over 9 years ago by Arkiver2 <[email protected]>
pipeline.py: cookies!
11aef69a3206655efe9eced31259a286977fc1cb authored over 9 years ago by Arkiver2 <[email protected]>
cookies
38074381c4f759f9f62a9751522c7525dd3b1609 authored over 9 years ago by Arkiver2 <[email protected]>
README.md
e87a2e4a51b96db2fec7653c103226683ed5c1af authored over 9 years ago by Arkiver2 <[email protected]>
reddit.lua
2dd4e29062a3b0ae8788a4925802d42c5f3abe39 authored over 9 years ago by Arkiver2 <[email protected]>
pipeline.py: use redd.it to discover comments
9f531c900fbb1f99061a78b00644d28b4079d719 authored over 9 years ago by Arkiver2 <[email protected]>
Update pipeline.py
61dd537f153f18ed20253bc08f4635af0d88291e authored over 9 years ago by Arkiver2 <[email protected]>
pipeline.py
ff1bb532c6fe10195ef9fabe7162e81239b2323e authored over 9 years ago by Arkiver2 <[email protected]>
first files
2ab52c1b3dbd5642c9314631072088b432246596 authored over 9 years ago by Arkiver2 <[email protected]>