github.com/ArchiveTeam/proust-pulling

Hacky tools for downloading public Proust stories
https://github.com/ArchiveTeam/proust-pulling

Retrieve non-story content from Proust.

1a68556e7d7eb7382d3d7bbdbf4ee5bab86131cb authored almost 13 years ago by David Yip <[email protected]>
Retrieve http://www.proust.com/story/user (without trailing /).

7cfadb8b91cfa770f4e309e2613661e685ab7150 authored almost 13 years ago by David Yip <[email protected]>
Simpler upload script.

51485327c7286f3fe7814fcb41cfe48672af2ff6 authored almost 13 years ago by David Yip <[email protected]>
Upload script.

3bb9b1f7682605dcd8b15012defe9876a6e7ad0d authored almost 13 years ago by David Yip <[email protected]>
Fix escaping.

3266696cccb934d58abb7dee1cff817494f474a3 authored almost 13 years ago by David Yip <[email protected]>
If a member's page can't be found, log it and move on.

e8ddae0d5377ebd7e2618ec3d252c7fd9949db51 authored almost 13 years ago by David Yip <[email protected]>
Defer retrieval of private stories.

f946ff6f4c2756d01c979139757b55ecf62333a8 authored almost 13 years ago by David Yip <[email protected]>
Remove fetch data after a story has been successfully grabbed.

f1fa6b66e484495aaddd8f0c0106c1f69b826433 authored almost 13 years ago by David Yip <[email protected]>
A script to download blog.proust.com.

ed7fd7cac641b4e775e61ba1377c236d951c83e9 authored almost 13 years ago by David Yip <[email protected]>
Use Proust's sitemap to discover users.

0ba46a47984dbbefe5b5cc969b7b4fae5ae3cd1c authored almost 13 years ago by David Yip <[email protected]>
Fix log location.

4c447e85990f7f2057eaea73c647ead0f477e034 authored almost 13 years ago by David Yip <[email protected]>
A script to download all of Proust.

2bb0bf37b0abdf99f3ae66dc5f2b27c4f713f857 authored almost 13 years ago by David Yip <[email protected]>
Move Redis configuration to a separate file.

baf10ab776e4aefe85b0cdecf4d1166193502eab authored almost 13 years ago by David Yip <[email protected]>
Use Ruby's logger for, uh, logging.

f2a6c1246288782ea655e9ff050cfc46a2b1ecf6 authored almost 13 years ago by David Yip <[email protected]>
Fix typo.

e27ad5859a481eae7a1e417055156ff67a43bbcc authored almost 13 years ago by David Yip <[email protected]>
More sophisticated wget status checks.

We can tolerate 404s: Proust issues those from a wget mirroring
operation, and it doesn't seem t...

bb6e05cd1f7c0d6e1118ec36541f206b676d55bd authored almost 13 years ago by David Yip <[email protected]>
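
The commit message above is truncated, so the exact checks are not known. As a rough sketch of the idea only: wget documents exit status 8 as "server issued an error response", which is what a 404 during a mirroring run produces. The helper name `wget_ok?` and the tolerated list are assumptions for illustration, not code from the repository. Note that status 8 also covers other server errors, so a real check may need to look further.

```ruby
require "rbconfig"

# Hypothetical helper: decide whether a wget exit status is acceptable.
# 0 means no problems; 8 means the server issued an error response
# (for example a 404 on one of the mirrored URLs).
TOLERATED_WGET_STATUSES = [0, 8].freeze

def wget_ok?(exit_status)
  TOLERATED_WGET_STATUSES.include?(exit_status)
end

# Stand-in for a wget run: a child process that exits with status 8.
# RbConfig.ruby is the path of the running interpreter.
system(RbConfig.ruby, "-e", "exit 8")
puts wget_ok?($?.exitstatus)

# A network failure (wget exit status 4) is still treated as an error.
puts wget_ok?(4)
```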
Use Redis key expiration for work tracking.

5276aa1fcb66da4f656516702f35ce24d53592fe authored almost 13 years ago by David Yip <[email protected]>
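
How the repository uses expiration is not shown here. The general pattern can be sketched as follows: a worker claims an item under a key with a time-to-live, and if the worker dies the claim lapses on its own so another worker can pick the item up. The class below is an in-memory stand-in, not Redis, and every name in it (`ExpiringClaims`, `claim`, `release`, the `member:alice` key) is hypothetical. With a real Redis server the same claim is commonly expressed as a SET with the NX and EX options.

```ruby
# In-memory stand-in for a key-value store with per-key expiry.
# This is NOT Redis; it only mimics expiry so the pattern can be run
# without a server. The clock is injectable to make time controllable.
class ExpiringClaims
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @expires_at = {}
  end

  # Claim a work item for ttl seconds. Returns true if the claim was
  # taken, false if an unexpired claim already exists.
  def claim(key, ttl)
    now = @clock.call
    return false if @expires_at.key?(key) && @expires_at[key] > now

    @expires_at[key] = now + ttl
    true
  end

  # Drop the claim once the work has finished.
  def release(key)
    @expires_at.delete(key)
  end
end

now = 0.0
claims = ExpiringClaims.new(clock: -> { now })
puts claims.claim("member:alice", 60) # first worker takes the claim
puts claims.claim("member:alice", 60) # second worker is refused
now += 61                             # first worker crashed; time passes
puts claims.claim("member:alice", 60) # the claim expired and is retaken
```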
Fetch the last page of Proust public URLs.

93c234175123d4bc96823f695424136024baf3d3 authored almost 13 years ago by David Yip <[email protected]>
If something goes wrong fetching memorabilia URLs, yield an empty list.

2d2cdd429932846369c3c9c26ff21288cfb61ca5 authored almost 13 years ago by David Yip <[email protected]>
Use system to execute subordinate scripts.

This avoids unnecessary capture of standard output.

7e6171ca7fae758f17184d63cb6375f39fc4f781 authored almost 13 years ago by David Yip <[email protected]>
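
To illustrate the difference the message describes, here is a minimal sketch using a stand-in child command rather than the repository's scripts. Capturing forms such as backticks or `IO.popen` collect the child's standard output into a String, while `Kernel#system` lets the child write straight to the parent's standard output and only reports success or failure.

```ruby
require "rbconfig"

# A child command that writes one line to standard output.
# RbConfig.ruby is the path of the running interpreter.
child = [RbConfig.ruby, "-e", "puts 'fetching story'"]

# Capturing form: the child's output ends up in a String in memory.
captured = IO.popen(child, &:read)

# system: the child's output goes directly to our stdout, and the
# return value is true, false, or nil depending on how the child exited.
succeeded = system(*child)

puts captured.inspect
puts succeeded.inspect
```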
Redirect fetcher output to stderr.

ea5fd90b1bb8c075b2ada7b741a9bf94281b5947 authored almost 13 years ago by David Yip <[email protected]>
Grab memorabilia URLs from get_one_story.rb.

675e996bb5c1fc8c6cbb54aaa7aca83540c09c89 authored almost 13 years ago by David Yip <[email protected]>
Include full-size memorabilia in the WARC.

08d8ff4d376eca7f56ce05ebfde9b1bdbeebe24b authored almost 13 years ago by David Yip <[email protected]>
Include external Javascripts referenced by Proust pages.

These seem to be needed: Proust pages break in funny ways when they're
missing. Also, systems l...

74339069ca7388a507cb928a95222ac930c67168 authored almost 13 years ago by David Yip <[email protected]>
Move command line generation code to a utility module.

Also add a script to download one story.

09d6133ec27cd5a12e78a4ac6855de563d02e84b authored almost 13 years ago by David Yip <[email protected]>
Don't force a directory structure.

Proust uses paths like this:

/memorabilia
/memorabilia/12345/post

wget will encounter the firs...

953b5d908cd3538e9c21a1a6e44212ae617794c8 authored almost 13 years ago by David Yip <[email protected]>
Record in-progress and pending members.

This allows for multiple get_stories processes.

a39afb03fabad9367d9e4a6e9b54fc980f72ef65 authored almost 13 years ago by David Yip <[email protected]>
Remove unused ACCEPT array.

ca967e62444a2c4daf34911b23497722eecc0270 authored almost 13 years ago by David Yip <[email protected]>
Add story grabber; rename get_story to better reflect its usage.

3e8d3f5f9cdeefde08b949c3b84b93e73ec06557 authored almost 13 years ago by David Yip <[email protected]>
Fix ignore-robots command.

9e1fed7a12601480f6f1a382f60282da9afa63e1 authored almost 13 years ago by David Yip <[email protected]>
Escape shell arguments, rearrange code.

da0bed5086a0b3f31b6b60be1c82c0396556db5a authored almost 13 years ago by David Yip <[email protected]>
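
The commit does not say how the arguments are escaped. One way to do it in Ruby is the standard library's `Shellwords.escape`, sketched below with a made-up title; whether the repository uses this module is an assumption.

```ruby
require "shellwords"

# Hypothetical story title containing characters a shell would interpret.
title = "it's a story; rm -rf /"

# Escape the title so it can be interpolated into a command line.
escaped = Shellwords.escape(title)
command = "echo #{escaped}"

# Splitting the command line the way a POSIX shell would shows that the
# title survives as a single argument.
puts Shellwords.split(command).inspect
```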
Ignore generated data.

51508e121a8be07f6286bc544e2f859d1cb43321 authored almost 13 years ago by David Yip <[email protected]>
Limit scope to a single UID.

9992ff9e4c1cb02196737a1c69d63a4e60c4314d authored almost 13 years ago by David Yip <[email protected]>
First cut at story puller.

468c84bbb5585fbd35f21417f1d12a44d73f48fe authored almost 13 years ago by David Yip <[email protected]>
Remove compiled wget-warc.

d7145b095263e738125e120d2a12a460cd1b0012 authored almost 13 years ago by David Yip <[email protected]>
Gem environment.

a8949730602fef8fa3056cff8ac05a6945bb65fd authored almost 13 years ago by David Yip <[email protected]>
Initial commit

3b34c248f67bb9ac701b1d79a2a2a9202cae5bfa authored almost 13 years ago by David Yip <[email protected]>