Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/greader-grab

http://www.archiveteam.org/index.php?title=Google_Reader
https://github.com/ArchiveTeam/greader-grab

A 302 is a very likely CAPTCHA

6f22c350cdd5260956f0da738790cd222d057edf authored over 11 years ago by Ivan Kozik
Accept all response codes except 503 to deal with the ~12,000 items that result in at least one 400 Bad Request

1146513968af29c11eb379c368e3d546dc2b679d authored over 11 years ago by Ivan Kozik
Lower the unexpected response code delay to 60 seconds

63a31e06ce99b09bd94538306ac25fbfd7c7e34e authored over 11 years ago by Ivan Kozik
Accept 414 Request URI Too Long responses as well

d15994be2d3f78308309a95c3f6f15f5a8bc8803 authored over 11 years ago by Ivan Kozik
Allow two rsync threads by default

f1c3b82809b5834e07ce55d172a13c88cc73153e authored over 11 years ago by Ivan Kozik
Yeah, let's get the last character of the href too

583f7f5bf4d855298b820f70d26c9704255862da authored over 11 years ago by Ivan Kozik
Extract hrefs from response bodies in the CookWARC task and upload them to the target

e3c808cc07a270ef59b337eeb7c8cfbe96bae843 authored over 11 years ago by Ivan Kozik
Monkeypatch seesaw's AsyncPopen to avoid crash resulting in endless spew of AttributeError: 'AsyncPopen' object has no attribute 'pipe'

c25d2eddb4a942d68b6344ed5a3f0d28878d08c0 authored over 11 years ago by Ivan Kozik
CentOS may need --no-check-certificate as well due to domain mismatch

2f2fa9c386b53e469ee5a69e55bb85a7e1e2d58e authored over 11 years ago by Ivan Kozik
Update dashboard URL

f4b7aa3c72608c6983336cffa671343f2fd54099 authored over 11 years ago by Ivan Kozik
Print progress less frequently

1412c3034ce7b5209a0d1b98cab6e6cabba1071e authored over 11 years ago by Ivan Kozik
README: suggest --disable-web-server

a0baa07a82dcf0ae233755952f6ace20bdad95b8 authored over 11 years ago by Ivan Kozik
Report size of the cooked WARC to the tracker, not the original

ff6aa802db68e67e9c0f5011c0bfff0af32f0b6e authored over 11 years ago by Ivan Kozik
Change tracker location

e9c4301ad1ea38bdce21be377807325c1bc5f5b7 authored over 11 years ago by Ivan Kozik
Make the CookWARC step strip out the request/response pairs that 404ed

141ca4abec6ef6fa29f9f510003609e8f2e6cba6 authored over 11 years ago by Ivan Kozik
Create a cooked WARC that has gunzipped HTTP responses; upload the cooked WARC

e948205f9c6e32a88e9ad6565f7b67ed0a945614 authored over 11 years ago by Ivan Kozik
Quiet the 'Broken pipe' spew

e8da49e6374109bcc7c19adad7777a39ba101218 authored over 11 years ago by Ivan Kozik
Add hack to append warc-tools to sys.path

3d48384ac525d77655aae56ded38619c4f596cb3 authored over 11 years ago by Ivan Kozik
Fix docstring

af5adad5422cb9f61840d1f3d86120c9d277754a authored over 11 years ago by Ivan Kozik
Rename, as this is not the vanilla warc2warc.py

4a38fb074ca68342d3c65eacef318871dbded927 authored over 11 years ago by Ivan Kozik
Fix whitespace

d187edc322a8adf453180ec6b51b232fab69a98c authored over 11 years ago by Ivan Kozik
Tabify warc2warc.py

711aeb11e7c0ba3a44f3ece01140a2619ce37d75 authored over 11 years ago by Ivan Kozik
Import https://github.com/alard/warctozip/blob/master/warc2warc.py d36bbbaa529d146b5b25252edbcc8f966528d8aa

cd5fbea1b0e2b340887acebe6b70e74ca421a567 authored over 11 years ago by Ivan Kozik
Import http://code.hanzoarchives.com/warc-tools fd3b49a

483588d27a11e30d7c6e5ce41354318f8fcc6b08 authored over 11 years ago by Ivan Kozik
Ask for gzip-encoded responses from Google

46409b435b543d62218342e6842766bab5ef9b83 authored over 11 years ago by Ivan Kozik
Support grabbing the task_urls from another server instead of the tracker

1c1d2d74d457b71daa9312e00ce0efa7fc6e1d32 authored over 11 years ago by Ivan Kozik
Don't try to read from `nil` file

a1038c78d7d4eb72297fdfb9dabce2200c26e378 authored over 11 years ago by Ivan Kozik
Give up on the WARC if we get a CAPTCHA page from Google

fffccee80b8a97b3694e8b455d9fc613367ece98 authored over 11 years ago by Ivan Kozik
Revert autoinstall scripts - this pull request was created before the instructions were fixed

5e93663a7bd7407113f503f607c0a9b36797bac4 authored over 11 years ago by Ivan Kozik
Update README according to 929d3bc

ac29b02f0b3b9b92d5b497b8d9a12798f36aacd4 authored over 11 years ago by Terry Wrist
Clone autoinstall script for other distros

Cloning autoinstall script from https://gist.github.com/citruspi/5675678/raw/2e958708063a885abd1...

1d87df61ce2cf1e0d05610e959607ad7011ae50e authored over 11 years ago by Terry Wrist
"Clone" Ubuntu one-line install script

Cloning one-line install script from https://gist.github.com/citruspi/5675678/raw/85fb1e43b965b0...

8db5f755f44978b3a3ee1c19d704ca222a0078b6 authored over 11 years ago by Terry Wrist
You might need a little more disk

0400041300a00446ad61999e8683a302b2b236a6 authored over 11 years ago by Ivan Kozik
Mention IRC channel here for those who miss the wiki

a71e271394e102db79b8921c1e2abdcabd031c9e authored over 11 years ago by Ivan Kozik
Raise the --concurrent recommendation to 3

e6c43bb4db6f69105907577d6c3c2bf4d055ae95 authored over 11 years ago by Ivan Kozik
Grab the continuation without decoding JSON. This should drastically reduce CPU and memory usage.

6ee171d250b4b43ba18cd3eea84e07e1c2bc25cf authored over 11 years ago by Ivan Kozik
Revert "Grab the continuation without decoding JSON. This should drastically reduce CPU and memory usage."

This reverts commit 8125d15b7d569d15a283b32c13c1e3691a08cdf5.

Something went horribly wrong wit...

dfa397a4774b30868479c61e44e2c86a4c743620 authored over 11 years ago by Ivan Kozik
Grab the continuation without decoding JSON. This should drastically reduce CPU and memory usage.

8125d15b7d569d15a283b32c13c1e3691a08cdf5 authored over 11 years ago by Ivan Kozik
Fix whitespace

e15c3db740a55782efc76767b48e568753f96744 authored over 11 years ago by Ivan Kozik
Move 'start downloading' into per-OS steps; describe this grab

87254fc83a833ba3d19e097b83d029491a33db9b authored over 11 years ago by Ivan Kozik
Fix minor README problems

db12c16cc49c1189b74d047882f1201dbd48aa6e authored over 11 years ago by Ivan Kozik
No more need for libgnutls-dev

76ec5bfbf7d6ac25801c01b8acddec2114d1d8da authored over 11 years ago by Ivan Kozik
Fix certs/ for openssl 0.9.8

5dbd33027345563cecad5e98c49e476063e1b338 authored over 11 years ago by Ivan Kozik
Fix Debian 6 steps: libgnutls-dev is needed (though we might remove this requirement later)

b68113e765978ef480a5bcc77d49b038e78c8658 authored over 11 years ago by Ivan Kozik
README: fix pip-related steps; add Debian 6 steps

f6c7415d74053f2c01207d8378529c27d0f5ea28 authored over 11 years ago by Ivan Kozik
This also works on Debian

c914efd7a978406ea0ff976e2ccbb50ffe75186a authored over 11 years ago by Ivan Kozik
Bump VERSION

4ad4a7c07ee0dd7fcb813c09bf2332f2a9cfe2a7 authored over 11 years ago by Ivan Kozik
Fix WgetDownloadWithStdin by copying the rest of the definition from yahoo-upcoming-grab

6f4db8e2d777f3fcc54046d93a340345c6f5f410 authored over 11 years ago by Ivan Kozik
chmod +x get-wget-lua.sh

b05c4618f2d44800ec7c8fc1431db76cbae2270e authored over 11 years ago by Ivan Kozik
chmod +x the binaries

94821fae6e57730ba6f6cfc15ad961e609e928fd authored over 11 years ago by Ivan Kozik
Fix docstring; there is .gz now

4f7a228748fec321b28642f0c4735cac7923f356 authored over 11 years ago by Ivan Kozik
Read URLs from .gz'ed files to use 3-6x less disk space

b84840b71df98a2bf16fa0417886add17e9d6b00 authored over 11 years ago by Ivan Kozik
Link to newer pip

5437f57ef0defaa9002b903ebfa19d0df3c0ef15 authored over 11 years ago by Ivan Kozik
Fix some README bugs

a082a8fbd31e225e633c6f5ab003b6dafc2c7b28 authored over 11 years ago by Ivan Kozik
Add README

945a04eebc7ffec210a3bd5d5de0c653115a4d6b authored over 11 years ago by Ivan Kozik
Allow customizing TRACKER_URL for debugging

c6457a6633feb445319bcf40739dd1cb81e4279c authored over 11 years ago by Ivan Kozik
Switch tracker to tracker.archiveteam.org

aaf339eff30ae12aa6497b11084392d4713bce5e authored over 11 years ago by Ivan Kozik
Accept only SSL certs signed by Google's root CA, EquifaxSecureCA; improve docstring

379591c5b1921d771bc95eca8d544db9d9ee6a72 authored over 11 years ago by Ivan Kozik
Don't try to decode non-JSON responses; print better error for failed decodings

1d49757671b9c351ad52743aa5c1192513ed2d84 authored over 11 years ago by Ivan Kozik
Add non-dummy code for universal-tracker (grab encoded URLs from files)

3a52b9bc51b380d04e10579fe01668b98a195d4b authored over 11 years ago by Ivan Kozik
Get feed URLs from universal-tracker in a nice JSON Array instead of doing hairy splitting; get more parameters from tracker; pass URLs to wget through stdin; accept only latest wget-lua

b8c62bdd713086c260597ea37258c171c6a5a921 authored over 11 years ago by Ivan Kozik
Get newer wget-lua tarball

a220a10a010d7549bbf9bd7ff70722df9dfcbc42 authored over 11 years ago by Ivan Kozik
Copy newer wget-lua-warrior from posterous-grab

209cf38d3044db3e23b524a4a5392f4197b7d521 authored over 11 years ago by Ivan Kozik
Copy wget-lua-warrior from posterous-grab

6e8cbd0db483d078e463ba709001e1581c3d57bb authored over 11 years ago by Ivan Kozik
Don't set user_agent if it's already in item

d24c69d3fa159069b1e2cb0e9054d6024206d826 authored over 11 years ago by Ivan Kozik
Avoid URL spew in terminal

7f112c4a63fd1bcb45a5557a1cac8a027f30dd34 authored over 11 years ago by Ivan Kozik
Fix broken code; hit /reader/ with the right URL params so that we get a continuation

b577b11ae4fc623de1fd819f6ccb16aa48b0fef6 authored over 11 years ago by Ivan Kozik
Make it work

fdcce80ad4c9485bd839d89b997b135d06829923 authored over 11 years ago by Ivan Kozik
Fix whitespace

ac2f94b587c98740e3d6c2e7a5a29394e6e533a6 authored over 11 years ago by Ivan Kozik
Fix some wget options for greader

aab4e9b5e2b7f70c96b73ab29032edbff7232010 authored over 11 years ago by Ivan Kozik
Update VERSION

698595792d08b9161039f30c9e9eb4114fefe452 authored over 11 years ago by Ivan Kozik
Avoid using wildcard imports so that pyflakes can scan the file

ff550388cbe688a857c23fb90c7110af3be25252 authored over 11 years ago by Ivan Kozik
Removed unused imports

2b4d8bf042b89d621aef664e9ff006be26a06f9d authored over 11 years ago by Ivan Kozik
Fix project description; use temporary tracker URL for testing

56a3e8e3287657e138d758b3782c92bba73d03a8 authored over 11 years ago by Ivan Kozik
Fix user agent

0647651c2dd548205ff20d43eecb0de55f5f3d71 authored over 11 years ago by Ivan Kozik
Import pipeline.py from posterous-grab

fd31b384b85fb2640756e67f85353dd0708f86b3 authored over 11 years ago by Ivan Kozik
Print a message every 100 URLs, else this will be quite noisy

45843f9dd42fca3483ce18adcb4c0c16a19a339f authored over 11 years ago by Ivan Kozik
Test commit

3a97fb18ea1cccb9de2f7638583ed29672acaf42 authored over 11 years ago by Ivan Kozik
Add greader.lua and shell script to test it

e8f8439079bfbdb77c78e448e164580763292cd1 authored over 11 years ago by Ivan Kozik
Import dkjson.lua 2.3 (2013-04-14) from http://dkolf.de/src/dkjson-lua.fsl/raw/dkjson.lua?name=1a7969ae3ff9a6e0268e6555d31092213060fd62

http://dkolf.de/src/dkjson-lua.fsl/home

f4e1c0ce0cbb155c83f35a26f25004d2b5a07b3c authored over 11 years ago by Ivan Kozik
Add .gitignore

20eddbf5ca3e39217a2e666f71970d35a319e7b3 authored over 11 years ago by Ivan Kozik
Add EquifaxSecureCA cert so that we can use certificate-pinning using wget and SSL_CERT_DIR=`pwd`/certs

http://wiki.openwrt.org/doc/howto/wget-ssl-certs

(www.google.com is signed with Google Internet...

ccc1305f0c581c2bcaada2a77df2c5b25408d9e0 authored over 11 years ago by Ivan Kozik
Verify sha1 of tarball

777d5108e11fd0ae3839b1b4fdbfdb6c14858392 authored over 11 years ago by Ivan Kozik
Get newer wget-lua

abcc728152619ffbe25148ded5ed65fb229951cc authored over 11 years ago by Ivan Kozik
Import get-wget-lua.sh from posterous-grab

bc865b80b4f5960e348a6c581e4988bddc305732 authored over 11 years ago by Ivan Kozik