Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/proust-pulling
Hacky tools for downloading public Proust stories
https://github.com/ArchiveTeam/proust-pulling
Retrieve non-story content from Proust.
1a68556e7d7eb7382d3d7bbdbf4ee5bab86131cb authored almost 13 years ago by David Yip
Retrieve http://www.proust.com/story/user (without trailing /).
7cfadb8b91cfa770f4e309e2613661e685ab7150 authored almost 13 years ago by David Yip
Simpler upload script.
51485327c7286f3fe7814fcb41cfe48672af2ff6 authored almost 13 years ago by David Yip
Upload script.
3bb9b1f7682605dcd8b15012defe9876a6e7ad0d authored almost 13 years ago by David Yip
Fix escaping.
3266696cccb934d58abb7dee1cff817494f474a3 authored almost 13 years ago by David Yip
If a member's page can't be found, log it and move on.
e8ddae0d5377ebd7e2618ec3d252c7fd9949db51 authored almost 13 years ago by David Yip
Defer retrieval of private stories.
f946ff6f4c2756d01c979139757b55ecf62333a8 authored almost 13 years ago by David Yip
Remove fetch data after a story has been successfully grabbed.
f1fa6b66e484495aaddd8f0c0106c1f69b826433 authored almost 13 years ago by David Yip
A script to download blog.proust.com.
ed7fd7cac641b4e775e61ba1377c236d951c83e9 authored almost 13 years ago by David Yip
Use Proust's sitemap to discover users.
0ba46a47984dbbefe5b5cc969b7b4fae5ae3cd1c authored almost 13 years ago by David Yip
Fix log location.
4c447e85990f7f2057eaea73c647ead0f477e034 authored almost 13 years ago by David Yip
A script to download all of Proust.
2bb0bf37b0abdf99f3ae66dc5f2b27c4f713f857 authored almost 13 years ago by David Yip
Move Redis configuration to a separate file.
baf10ab776e4aefe85b0cdecf4d1166193502eab authored almost 13 years ago by David Yip
Use Ruby's logger for, uh, logging.
f2a6c1246288782ea655e9ff050cfc46a2b1ecf6 authored almost 13 years ago by David Yip
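As an illustration of the logging approach named in this commit, here is a minimal sketch using Ruby's standard `Logger`. The helper name `build_logger` and the log messages are hypothetical and are not taken from the repository.

```ruby
require 'logger'

# Hypothetical helper: build a logger that writes to the given IO.
# Defaulting to $stderr keeps log lines out of standard output.
def build_logger(io = $stderr)
  logger = Logger.new(io)
  logger.level = Logger::INFO  # suppress DEBUG messages
  logger
end

log = build_logger
log.info('starting fetch')          # illustrative message
log.warn('member page not found')   # illustrative message
```

Passing the IO as a parameter makes it easy to point the same logger at a file or an in-memory buffer instead.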
Fix typo.
e27ad5859a481eae7a1e417055156ff67a43bbcc authored almost 13 years ago by David Yip
More sophisticated wget status checks.
We can tolerate 404s: Proust issues those from a wget mirroring
operation, and it doesn't seem t...
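One way such a status check could look is sketched below. It assumes the check is based on wget's exit status; GNU wget documents status 0 as success and status 8 as "server issued an error response", which is what an HTTP 404 produces. The helper name `wget_ok?` and the constant are hypothetical, and the repository's actual check may differ.

```ruby
# Hypothetical helper, not taken from the repository.
# 0 = no problems; 8 = the server issued an error response (e.g. a 404).
TOLERATED_WGET_STATUSES = [0, 8].freeze

def wget_ok?(exit_status)
  TOLERATED_WGET_STATUSES.include?(exit_status)
end

# Possible use after running wget through Kernel#system:
#   system('wget', '--mirror', url)
#   raise 'wget failed' unless wget_ok?($?.exitstatus)
```

Any other status, such as 4 for a network failure, would still be treated as a failed fetch.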
Use Redis key expiration for work tracking.
5276aa1fcb66da4f656516702f35ce24d53592fe authored almost 13 years ago by David Yip
Fetch the last page of Proust public URLs.
93c234175123d4bc96823f695424136024baf3d3 authored almost 13 years ago by David Yip
If something goes wrong fetching memorabilia URLs, yield an empty list.
2d2cdd429932846369c3c9c26ff21288cfb61ca5 authored almost 13 years ago by David Yip
Use system to execute subordinate scripts.
This avoids unnecessary capture of standard output.
7e6171ca7fae758f17184d63cb6375f39fc4f781 authored almost 13 years ago by David Yip
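The difference this commit relies on can be sketched as follows, assuming a POSIX system where `echo` is available. The commands are placeholders rather than the repository's scripts.

```ruby
# Backticks run the command and return its standard output as a String,
# so the whole output is buffered in memory.
captured = `echo hello`

# Kernel#system lets the child write directly to this process's stdout
# and returns true when the command exits with status 0.
succeeded = system('echo', 'hello')

# A non-zero exit status makes system return false.
failed = system('false')
```

When the output is not needed by the calling script, `system` avoids holding it in memory and reports success or failure directly.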
Redirect fetcher output to stderr.
ea5fd90b1bb8c075b2ada7b741a9bf94281b5947 authored almost 13 years ago by David Yip
Grab memorabilia URLs from get_one_story.rb.
675e996bb5c1fc8c6cbb54aaa7aca83540c09c89 authored almost 13 years ago by David Yip
Include full-size memorabilia in the WARC.
08d8ff4d376eca7f56ce05ebfde9b1bdbeebe24b authored almost 13 years ago by David Yip
Include external Javascripts referenced by Proust pages.
These seem to be needed: Proust pages break in funny ways when they're
missing. Also, systems l...
Move command line generation code to a utility module.
Also add a script to download one story.
09d6133ec27cd5a12e78a4ac6855de563d02e84b authored almost 13 years ago by David Yip
Don't force a directory structure.
Proust uses paths like this:
/memorabilia
/memorabilia/12345/post
wget will encounter the firs...
953b5d908cd3538e9c21a1a6e44212ae617794c8 authored almost 13 years ago by David Yip
Record in-progress and pending members.
This allows for multiple get_stories processes.
a39afb03fabad9367d9e4a6e9b54fc980f72ef65 authored almost 13 years ago by David Yip
Remove unused ACCEPT array.
ca967e62444a2c4daf34911b23497722eecc0270 authored almost 13 years ago by David Yip
Add story grabber; rename get_story to better reflect its usage.
3e8d3f5f9cdeefde08b949c3b84b93e73ec06557 authored almost 13 years ago by David Yip
Fix ignore-robots command.
9e1fed7a12601480f6f1a382f60282da9afa63e1 authored almost 13 years ago by David Yip
Escape shell arguments, rearrange code.
da0bed5086a0b3f31b6b60be1c82c0396556db5a authored almost 13 years ago by David Yip
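Shell-argument escaping of the kind this commit mentions can be sketched with Ruby's standard `Shellwords` module. Whether the repository uses `Shellwords` is an assumption, and the member name and wget flag below are illustrative only.

```ruby
require 'shellwords'

# Hypothetical member name containing characters the shell would interpret.
member = "O'Brien & co"

# Shellwords.escape makes a single argument safe to embed in a command line.
escaped = Shellwords.escape(member)

# Shellwords.join escapes every element and joins them with spaces.
command = Shellwords.join(['wget', '--mirror', member])

# Shellwords.split reverses the process, recovering the original arguments.
recovered = Shellwords.split(command)
```

Passing arguments as separate strings to `system` sidesteps the shell entirely and is often simpler when no shell features are needed.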
Ignore generated data.
51508e121a8be07f6286bc544e2f859d1cb43321 authored almost 13 years ago by David Yip
Limit scope to a single UID.
9992ff9e4c1cb02196737a1c69d63a4e60c4314d authored almost 13 years ago by David Yip
First cut at story puller.
468c84bbb5585fbd35f21417f1d12a44d73f48fe authored almost 13 years ago by David Yip
Remove compiled wget-warc.
d7145b095263e738125e120d2a12a460cd1b0012 authored almost 13 years ago by David Yip
Gem environment.
a8949730602fef8fa3056cff8ac05a6945bb65fd authored almost 13 years ago by David Yip
Initial commit
3b34c248f67bb9ac701b1d79a2a2a9202cae5bfa authored almost 13 years ago by David Yip