Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

https://github.com/ArchiveTeam/yahooblog2-grab

Filter out another URL pattern

c769e8f6bfc15857ae532a2e561001385eae6d17 authored about 11 years ago by Ivan Kozik <[email protected]>
Get less redundant data from blog.yahoo.com

e0f459a9dc2b92543e7dcf8fc677dec4534ad2f2 authored about 11 years ago by Ivan Kozik <[email protected]>
Avoid all URLs with a backslash to avoid descending into the void

45e102dc2dcb11ccef43d9104a9490f940929f9f authored about 11 years ago by Ivan Kozik <[email protected]>
Ports give-up logic from wretch-grab.

5c872f053910ed8ecd90c6114646d452f2aa4f02 authored about 11 years ago by Christopher Foo <[email protected]>
Bump version. Closes #5.

a5a31a67514559380e6f1518b1b303ce387449e4 authored about 11 years ago by Christopher Foo <[email protected]>
Don't fall down rabbit holes.

Some Yahoo! Blog user accounts have data that generates an infinitely
(or at least unpleasantly)...

a40e5491a1f57c5fda37f3b4184a41959c11a1bb authored about 11 years ago by David Yip <[email protected]>
Bump pipeline version.

3310109f6d16e4f356576c0979d31d152285c26a authored about 11 years ago by David Yip <[email protected]>
Merge pull request #4 from ArchiveTeam/subvert-resolution

Override wget's DNS resolution with previously known Y!B IPs.

b50c66a33dfd7b48ddf7746c9c3367e41d65ee22 authored about 11 years ago by yipdw <[email protected]>
Override wget's DNS resolution with previously known Y!B IPs.

193847d366aca2d28156a0cc87a048fbfbd4be66 authored about 11 years ago by David Yip <[email protected]>
Use random user agent on each invoke of wget.

5e31eb8622d5d7874daa9a037660f6067dbaf7d5 authored about 11 years ago by Christopher Foo <[email protected]>
Enable --no-cookies and decrease sleep time to 0.1sec.

dd4594b1625ad4f36fe910bd65ea997a3f38cf06 authored about 11 years ago by Christopher Foo <[email protected]>
Merge pull request #3 from aggroskater/master

Layout tweak in readme.

69883a9b2869652467be0a972d8ef584416121b0 authored about 11 years ago by Christopher Foo <[email protected]>
Layout tweak in readme.

f3247a68fb613681a1d52af233cfb37ba0d3e9e7 authored about 11 years ago by Preston Maness <[email protected]>
readme.md: ~/.local/bin appears to be standard.

Also removes the statement about wget-lua

ccd4b95add929b57900ce998747b72f394c9a423 authored about 11 years ago by Christopher Foo <[email protected]>
Update README.md

Clarify instructions for unlucky users with no root access

0c07514159a489796d80d7fe28c6cb085c1d7d8b authored about 11 years ago by nemobis <[email protected]>
Skip broken rss feed (Closes #2)

c433758f0eb87d10ac251d6e4c5d94ea73d5b936 authored about 11 years ago by Christopher Foo <[email protected]>
Closes #1. Fixes --tries not set to inf.

Not using inf may miss pages while Yahoo is returning 999 (ban) responses.

38587fda908d50caa85340ccba4764f18e9c77c0 authored about 11 years ago by Christopher Foo <[email protected]>
Reject any abuse links. Fixes false 500.

Wget has a nasty habit of scraping urls from javascript.

In this page,
http://blog.yahoo.com/_V...

c7aa07ccb7e9252cdc0995f183df973633e263d3 authored about 11 years ago by Christopher Foo <[email protected]>
hyves-grab -> yahooblog2-grab.

03f6151264c91d9ca3ba754834ac60ff74f387b0 authored about 11 years ago by David Yip <[email protected]>
Use random user agent. Increase sleep time to 5 sec.

8dae0aaacc8997bfe5f5037018b7cf4251153b7f authored about 11 years ago by Christopher Foo <[email protected]>
Increase sleep request time to 3 sec.

c4590105528531ee27fc40aa46ba2f63dcd7368e authored about 11 years ago by Christopher Foo <[email protected]>
Increase ban sleep time to 60sec, request to 2sec.

e2d3b56f0179a07259f64be50daecb97e94f4dff authored about 11 years ago by Christopher Foo <[email protected]>
Replace the get-wget-lua.sh with newest version.

6fb877bf9666bf8f923e0db568dac25c444f4c25 authored about 11 years ago by Christopher Foo <[email protected]>
Add instructions to readme

276c0c81ccadc1aa63834f0b27213c5bb1ccf1be authored about 11 years ago by Christopher Foo <[email protected]>
Modernize the scripts.

6965f7b1625bd8a10cc01c9deb07fd4a312d6afa authored about 11 years ago by Christopher Foo <[email protected]>
Update version number.

d2874dcf9c77777bdaa52a234100b1cfb224d669 authored almost 12 years ago by Alard <[email protected]>
Escape the periods in .wikipedia.org.

134c65de2ea1af306d34d1e0238a22b4cd4014db authored almost 12 years ago by Alard <[email protected]>
fix the "fix" for --reject-regex

revert changes for the original regex portion, remove superfluous escaping on wikipedia portion

967b894ec1f60ea022da25207a0316f6d3c0cd4d authored almost 12 years ago by Brian Boucheron <[email protected]>
fix --reject-regex, add wikipedia url exclusion

fix existing --reject-regex (had one too many backslashes) and add .wikipedia.org to reject wiki...

67e0dc4bee518c79889ef17f3f1293d0f8736782 authored almost 12 years ago by Brian Boucheron <[email protected]>
Ignore urls with \, ', ".

e875aa61214351fe2ea38e6bf3c040f14749fe4f authored almost 12 years ago by Alard <[email protected]>
Back to normal uploads to fos.

605f9a943bc22854766525440ee7ac79c692c69c authored almost 12 years ago by Alard <[email protected]>
Handle 999.

68d6a1a6a5d585d50624ea1f9b2cffe9d33dd3d8 authored almost 12 years ago by Alard <[email protected]>
Upload to debugging area.

0fbfd3b686834ead97510d3a7e53727cbc06ff19 authored almost 12 years ago by Alard <[email protected]>
Show download progress.

1c6971cd7eae611b6685bc9ea39c880fe90cc25a authored about 12 years ago by Alard <[email protected]>
There is no usernames.txt to upload.

9faf7828a3816c72423c3480e33075c69eb44437 authored about 12 years ago by Alard <[email protected]>
Remove reference to previous project.

5c87c48b1703f2c3481a5adf9cf402d4181d97e4 authored about 12 years ago by Alard <[email protected]>
Initial commit.

0172c5ba03fc8604d1c3abf484bee1c82e8f04ed authored about 12 years ago by Alard <[email protected]>
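
Several commits above adjust URL filtering: ignoring URLs that contain \, ', or ", and escaping the periods in .wikipedia.org. The following is a minimal Python sketch of why the escaping matters. The pattern and the function name `should_reject` are hypothetical illustrations, not the project's actual --reject-regex.

```python
import re

# Hypothetical reject pattern (an assumption, not the repository's real regex):
# reject URLs containing a backslash, single quote, or double quote,
# or any URL that mentions a .wikipedia.org host.
REJECT_PATTERN = re.compile(r"""[\\'"]|\.wikipedia\.org""")

# The same Wikipedia rule with unescaped periods, for comparison.
# Here "." matches any character, so the rule is broader than intended.
UNESCAPED_PATTERN = re.compile(r".wikipedia.org")


def should_reject(url: str) -> bool:
    """Return True if the URL matches the hypothetical reject pattern."""
    return REJECT_PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in (
        "http://blog.yahoo.com/user/articles",
        "http://en.wikipedia.org/wiki/Yahoo",
        "http://blog.yahoo.com/a\\b",
        "http://example.com/awikipediaborg",
    ):
        print(url, should_reject(url))
```

With unescaped periods, a URL such as http://example.com/awikipediaborg would be matched by accident, because each "." stands for any character. The escaped form only matches a literal ".wikipedia.org".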