Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/yahooanswers-grab
Saving all questions and answers from Yahoo! Answers.
https://github.com/ArchiveTeam/yahooanswers-grab
Version 20210505.02. Disable support for qid items.
2801ed94a5281f230376158273b8ec3db74ae693 authored over 3 years ago by arkiver <[email protected]>
Version 20210505.01. Support new Wget-AT version 1.20.3-at.20210504.01. Do not get _reservice_ URLs.
81a1b9687db98666438202e2dd83101c55a0c9ae authored over 3 years ago by arkiver <[email protected]>
Version 20210430.02. Allow off-by-one payload count differences.
331cf89a5caba4728223bbde618f26fb319ba22f authored over 3 years ago by arkiver <[email protected]>
Version 20210430.01. Set max tries on 403 to 2.
471248dd2832efdb23ec88444449349222fe5909 authored over 3 years ago by arkiver <[email protected]>
Version 20210429.02. Do not require 3 related questions on a FETCH_EXTRA_QUESTION_LIST_END _reservice_ call. Retry _reservice_ URL for kid item in case of timeout.
c129c503bb8e19e168f9ee9163a3df238c9b3314 authored over 3 years ago by arkiver <[email protected]>
20210429.01. Re-enable support for KID items
e3004fc5e243885f992118e3ea4af12c80b5bb9a authored over 3 years ago by Thomas Glass <[email protected]>
Version 20210428.01. Do not check for related questions on web page of question.
f683b8bfdb93a282dba139b1a31f6b76417dbbfd authored over 3 years ago by arkiver <[email protected]>
Version 20210427.06. Change order of getting _reservice_ URLs.
9a58cea80c7e90d154c351b2c92c572fc3cc53ed authored over 3 years ago by arkiver <[email protected]>
Version 20210427.05. Do not abort on empty payload for FETCH_EXTRA_QUESTION_LIST_END for tw, th, hk questions.
a90873a43bafe2c039114d95fc0bbaebeaa94dbf authored over 3 years ago by arkiver <[email protected]>
Version 20210427.04. Move discovery of items.
95bd52d879980e040e60250d54275812b1edeb29 authored over 3 years ago by arkiver <[email protected]>
Version 20210427.03. Accept at least 1 trending question on web page.
c31d78837cfd9b3780c58358cc70e5da3c39f4f4 authored over 3 years ago by arkiver <[email protected]>
Version 20210427.02. Disable support for kid items.
8dc9c87dd0fa469d8e289e928b929f2a20cef5be authored over 3 years ago by arkiver <[email protected]>
Version 20210427.01. Get only main answers.yahoo.com domains. Check for only 5 related questions.
871e8ec17d8585454a78b3cb9b9e26b50dd94468 authored over 3 years ago by arkiver <[email protected]>
Version 20210423.06. Relax strict 10 related questions on HTML page for language domains.
7504a5b55d43fe321bd13bba5ff29854ac75c377 authored over 3 years ago by arkiver <[email protected]>
Version 20210423.05.
17114d691706599823bd7475ff34df94501c16f4 authored over 3 years ago by arkiver <[email protected]>
Version 20210423.04. Handle espanol and malaysia language domains.
b3c68bd250f1d8f94111155a080e002231180efe authored over 3 years ago by arkiver <[email protected]>
Version 20210423.03. Only get /_reservice_/ URLs on answers.yahoo.com.
d12e62fd071c8621dc2a498f93ca088eeac1f916 authored over 3 years ago by arkiver <[email protected]>
Version 20210423.02.
f40b0625377c751645b74fb2fa2a13eb22bb130b authored over 3 years ago by arkiver <[email protected]>
Version 20210423.01. Support kid items.
556001fdd8334c2c8c88742ddbee2cd68f04b042 authored over 3 years ago by arkiver <[email protected]>
Version 20210421.01. Submit discovered items at abort.
165663bcac3b8b65138fdc24ba05b7b5a329e624 authored over 3 years ago by arkiver <[email protected]>
Change backfeed domain
a06166bb9b20635acc9b5959dfc09325772e6820 authored over 3 years ago by JustAnotherArchivist <[email protected]>
Version 20210415.02. Handle inconsistencies in Yahoo's data.
85824449d273a4022acc23ed32f1004e3705c148 authored over 3 years ago by arkiver <[email protected]>
Version 20210415.01. Use browser useragents. Add many checks to ensure correct data. Extract URLs from reference text.
986423d7649fa667deddf8f2c57aea473af96495 authored over 3 years ago by arkiver <[email protected]>
Merge pull request #6 from Nintendofan885/patch-1
Add 'hk' to languages
5b55e771c7e3224d17ac296dc2339ad3351a8f90 authored over 3 years ago by Arkiver2 <[email protected]>
Add 'hk' to languages
45e0abe9a43a900292571e27284aa46328526988 authored over 3 years ago by Nintendofan885 <[email protected]>
Version 20210413.07. Bump minimum found qid items to very strict 10.
057cdfaa8886de1b054a8105f6cbed5829dfed3f authored over 3 years ago by arkiver <[email protected]>
Version 20210413.06. Ensure enough qid items were found on page.
1d59f2eed7d7331948f1df269adc8a5a2ea9df78 authored over 3 years ago by arkiver <[email protected]>
Version 20210413.05. Check if answers are in HTML as well.
f94f2f4886bbb56c0763b96044ae2a44e95e361e authored over 3 years ago by arkiver <[email protected]>
Version 20210413.04. Extras check to ensure data on webpage is correct.
95898c191f1d62a89cc6a69900d50e19ec92c012 authored over 3 years ago by arkiver <[email protected]>
Version 20210413.04. Extras check to ensure data on webpage is correct.
db25246e43f54141b17ecf788bbd8169a6f45a3d authored over 3 years ago by arkiver <[email protected]>
Version 20210413.03. Add languages file.
46675106f3a1afb2a6cad1c0b078cbf03a528d0d authored over 3 years ago by arkiver <[email protected]>
Version 20210413.02. Get only regional and global Yahoo Answers.
0475319c15c5ab94eb879acc429ff2b0c3368231 authored over 3 years ago by arkiver <[email protected]>
Version 20210413.01. Do not write 429 and 500 responses to WARC.
ba60225e8850fdf5805fe738b45892f42457926a authored over 3 years ago by arkiver <[email protected]>
Version 20210412.01. Skip getting related questions question, and first answers to related questions.
c6be2f2ef84e219c0216a59271c75006083360f1 authored over 3 years ago by arkiver <[email protected]>
poke drone (hello :wave:)
180e4b55fabcb73b39e61f629e80a35c1b454457 authored over 3 years ago by Thomas Glass <[email protected]>
- Use openssl wget-at on warrior, no version bump required
f40cb1edc63110bda9e1a8cbb6c1ec1b4dc511bd authored over 3 years ago by Thomas Glass <[email protected]>
Version 20210410.06. Use Wget-AT 1.20.3-at.20210410.01. More reliably get values from JSON. Fix abort on 403 for s.yimg.com. Retry on bad status code for main question page. Do not get all sort types. Ignore rss and amp URLs.
c030c827f3af8a1c48985308c4c0c7734278dc7e authored over 3 years ago by arkiver <[email protected]>
20210410.05 - Cleaner handling of kid/dir items
e952ee4014eca28607c530d3acc592c5f10b5b21 authored over 3 years ago by Thomas Glass <[email protected]>
20210410.04 - change dockerfile & change error for dir/kid items
35766b1464bd464074d2106cbff5f6b9ee6b5896 authored over 3 years ago by Thomas Glass <[email protected]>
20210410.03 - disable multi item
c581a5c52e7e04a0f4efc1aa10d3655a438f49be authored over 3 years ago by Thomas Glass <[email protected]>
Version 20210410.02. Commit ignore-list.
2fbabc601dea782b172844a5f20381d6f77a99a2 authored over 3 years ago by arkiver <[email protected]>
Fix website spelling.
5cef589fe4f41c25c7b352a92ff7a5027ea66eeb authored over 3 years ago by arkiver <[email protected]>
Version 20210410.01.
376fb6acf834b472ad25ce6c29a14631c7f13e80 authored over 3 years ago by arkiver <[email protected]>
Discover outlinks.
4c25d1930b2f26e407463401aab84ae065e70a8b authored over 3 years ago by arkiver <[email protected]>
Support getting images. Get all sort types for paginated comments.
9445da880ca3b599054681769c6580c818c4f5e0 authored over 3 years ago by arkiver <[email protected]>
Use ZSTD dictionary compression.
b2906e79904fb85016067d050215991dc94ecac3 authored over 3 years ago by arkiver <[email protected]>
Disable support for dir and kid items for now.
2d7fe50741bd2e0ee30dcc8458fd9257b2a76dbb authored over 3 years ago by arkiver <[email protected]>
Support queueing items.
06ad4fe7efc88bb13fcea436c424264e3200cff1 authored over 3 years ago by arkiver <[email protected]>
Extract more items. Rename user item to kid.
5837e129c3612ed571a1b2f14f48f698bd9e09ea authored over 3 years ago by arkiver <[email protected]>
Update README.md
5211113acae0b69106c0e60c7a1d84546ba56c18 authored over 3 years ago by Thomas Glass <[email protected]>
Version 20210408.02. Use tracker ID yahooanswers2.
502a46a907ec3c96a0d5a4aee186fd2a9fb82f1f authored over 3 years ago by arkiver <[email protected]>
Version 20210408.01. Support qid items.
35f81e4e4e9d94eb7c5a05e92bb3d4c8f65c64a1 authored over 3 years ago by arkiver <[email protected]>
README: add flex and autoconf dependencies
f3141cb23461baf4b643ca9628a971f9612e88a7 authored almost 8 years ago by Arkiver2 <[email protected]>
Fix unicode encoding problem.
Bump version to 20160831.01.
dbf2e6bab3ad30145bb70c812cc5cadd54cc4f90 authored over 8 years ago by Arkiver2 <[email protected]>
Remove urllib2 import. Bump version to 20160825.01.
30c048aaa6894e37a55a9b91fef0a55fcd4aaa70 authored over 8 years ago by Arkiver2 <[email protected]>
Merge pull request #2 from bzc6p/patch-1
0.7 sec sleep time between requests
3e6e6af0c8870b53d9e46ffd687bbba20ff3601a authored over 8 years ago by Arkiver2 <[email protected]>
0.7 sec sleep time between requests
With concurrency of 2, it is 3 urls/sec, probably safe.
b68634afd7648736b0f202a48721d092b4e5f8ab authored over 8 years ago by bzc6p <[email protected]>
set as executable files
3580d75c391e3e9a455962eec7325369c2017a9a authored over 8 years ago by Arkiver2 <[email protected]>
initial
edb393aa86a6fd30ca3b5907462b2b62b6d506e0 authored over 8 years ago by Arkiver2 <[email protected]>