Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/twitpic-grab

Grabbing Twitpic images and webpages
https://github.com/ArchiveTeam/twitpic-grab

twitpic.lua: small fix

50495541bcfa5218ed04bedb2eff04aa2e202723 authored about 10 years ago by Arkiver2
Ignore /show/{large,thumb}.

aa9e24aa0c8f68b1cbd8cf82bb8f38085737b178 authored about 10 years ago by David Yip
Don't generate URLs for resized versions.

84aa2c8cb5e19d3bfeeca2515c3b24d390cb411c authored about 10 years ago by David Yip
Bump pipeline version.

6a1a6204b39d0ffd49f04d7927fd0855d984fca3 authored about 10 years ago by David Yip
Merge branch 'with-image-fetch'

123b32c0540ef4869a4c8e95d5b9286c1e17faee authored about 10 years ago by David Yip
Merge remote-tracking branch 'origin/master'

7647b8f3633352678866336116947a1730847192 authored about 10 years ago by David Yip
Grab Cloudfront-hosted assets also.

fd5f72f3482cf2cd743c90cf53cb6d07b011f65a authored about 10 years ago by David Yip
pipeline.py: bump version

30a099160c21ced369bc4449785fc8873c6f8350 authored about 10 years ago by Arkiver2
twitpic.lua: fix infinite downloading

185c37aef9aeec2959956bd892c9500af79caa1c authored about 10 years ago by Arkiver2
pipeline: Remove concurrency message from header

749199ab7caf40e97a066a56dc728e234d8aa8a5 authored about 10 years ago by Christopher Foo
pipeline.py: bump version

8370d7eeae97118ced28a6d3bf4e8197cbde79c1 authored about 10 years ago by Arkiver2
twitpic.lua: fix to not download xxxxx1 for item image:1

5a28b6b81ed36870e75bfb183f47cd8a24484437 authored about 10 years ago by Arkiver2
pipeline.py: bump version

a90c9758c78830a74b7eb88e92876cdd783a07af authored about 10 years ago by Arkiver2
twitpic.lua: update for x, xx, xxx and xxxx ranges

a7d094c6ce0b7f5557597c30d73242a139538502 authored about 10 years ago by Arkiver2
twitpic.lua: fix

c9557e1d4701d10ebca236b90f8507f237857386 authored about 10 years ago by Arkiver2
twitpic.lua: enhancement to future script

0b188693968d9f88f795e93a66f8f6707bb6b2fa authored about 10 years ago by Arkiver2
twitpic.lua: future scripts for downloading images

Twitpic is now using other security in their urls and has made us unable to download the images from...

22f462ca5fff1f777bfbe9fa0c9845806ad88881 authored about 10 years ago by Arkiver2
twitpic.lua: add future script

66929b9e21d142701b51913ea061d6d6cdbcf3d1 authored about 10 years ago by Arkiver2
Restore image fetch.

This reverts the following commits:

e16bf4460572170fd1e93a9d2c1bf1ea3891815c
894e0bf19bed5bf0f8...

8be8d75fffec0260e92b577e2fa200a8d61fdaaf authored about 10 years ago by David Yip
Ignore twimg.com stuff.

Same reason as before.

e16bf4460572170fd1e93a9d2c1bf1ea3891815c authored about 10 years ago by David Yip
Bump pipeline version.

ff26a23cc5e273942e74000930f90bc91789d74a authored about 10 years ago by David Yip
Do not fetch thumbnails in <link>s.

Also will be handled by the Cloudfront grab + some post-processing.

894e0bf19bed5bf0f8ca949634305ef39580aac5 authored about 10 years ago by David Yip
Do not fetch any assets for image items.

These are handled by the Cloudfront grab.

83e4f7e01fcb35080e930261ed773d63cf8e6902 authored about 10 years ago by David Yip
Use one second delay between requests.

This significantly reduces the 503 rate.

c4d74f05c5f0e1078135290cf2561ccce0c60ec0 authored about 10 years ago by David Yip
Don't fetch large image.

See previous commits for reason.

7f3b8153e471bcfb3279ad37ea6f75e004d4d8bf authored about 10 years ago by David Yip
Also avoid grabbing mini photo.

<RKenshin> why are you grabbing mini even
<yipdw> I can take it all out
<RKenshin> y...

434c8281816c20c29283f5be27b446109ebd00bc authored about 10 years ago by David Yip
Bump pipeline version.

9938929733d3583e154e1b02b6b34984f90d78a7 authored about 10 years ago by David Yip
Revert "twitpic.lua: don't show all lines"

This reverts commit 9c5c05082d2d111c9fe79fe190575a964403b922.

We're trying to figure out what's...

a64455588a6b1870361f13c1dadecf8932d6a47f authored about 10 years ago by David Yip
Actually reset retry counter when retries exceeded.

951e99c326b9ba0aafeab5887ec1af98b8986b3a authored about 10 years ago by David Yip
Don't fetch thumbnails.

Reason:

<RKenshin> but i was wondering why we're grabbing the small thumbs
<yipdw> prob...

baf02d9b0954d00fd413e4e3ab424055142d435a authored about 10 years ago by David Yip
pipeline.py: bump version

3cc0056722c05a37fd84b6f2978b2fa18079c0be authored about 10 years ago by Arkiver2
pipeline.py: fix double downloading http and https

3faf8628654bb015614c63e3fd9110537e85dca4 authored about 10 years ago by Arkiver2
twitpic.lua: fix double downloading http and https

45bea6af24401c290a2e66bcd52082e46b5e6353 authored about 10 years ago by Arkiver2
pipeline: Bump version.

fcc131ecac6301eca86e03ed8fc9da0abf0315bc authored over 10 years ago by Christopher Foo
Merge branch 'llnz-master'

c0950296e8e604053e8689c574caa7cae6dc0f4d authored over 10 years ago by Christopher Foo
Added missing interpolation checking for wget without compression.

79dd006032a73c8c5d55697c99e9b856bfd99041 authored over 10 years ago by Lee Begg
pipeline: Don't continue 302 to cloudfront on /show/. Bump version.

Wget log will not show the url being fetched, but the 302 will be in the warc file.

Cloudfront ...

013eb550bd8ffe2fcb0d318788fd4103c8c580f3 authored over 10 years ago by Christopher Foo
Ignore Cloudfront in other code paths; bump version.

65c439c47dd4bd0380b53a6c3c50d266a761757b authored over 10 years ago by David Yip
Ignore Cloudfront URLs; bump version.

d0c58d55042147e1ba8d19300be446e79d4cd0bd authored over 10 years ago by David Yip
pipeline: Add concurrency 1 suggestion notice. Bump version.

aba66d82e9ad4fc29739058d71432d0af35947c7 authored over 10 years ago by Christopher Foo
pipeline: Use fewer user-agents. Bump version.

03bea93f0edf52276b38fcd988c5f1c0621e932d authored over 10 years ago by Christopher Foo
Merge pull request #6 from dequis/chmod

chmod +x get-wget-lua.sh

f45065566a4d3a3f1c0704fc158edc26101d20e5 authored over 10 years ago by Christopher Foo
chmod +x get-wget-lua.sh

a316e72179125234338040ed080d6d5a6ffb8df2 authored over 10 years ago by dequis
pipeline.py: check if banned from twitpic.

bump version

d19be78d10f5f7f6aa9184b5377481ce500babdc authored over 10 years ago by Arkiver2
pipeline.py: check if banned from twitpic.

bump version

101c65b93875bd09dd9bbf1d6b1312d1135648d0 authored over 10 years ago by Arkiver2
pipeline.py: bump version

df59c994bdfbac48bc23464a7eb69f9992866a91 authored over 10 years ago by Arkiver2
twitpic.lua: fix downloading from /tag/

91eb95440d8e75654457721d7ab3ea1c5f37244b authored over 10 years ago by Arkiver2
pipeline.py: delete IP address part

4475c711ff0eaa01bdf61ad2e5c376c8ca082787 authored over 10 years ago by Arkiver2
pipeline.py: check if banned from twitpic.

1131ab9b1672dcdf763663ffea2f1301f0942c30 authored over 10 years ago by Arkiver2
pipeline.py: 6 to 7. bump version.

fd4b86234af41c6ddcd7047f36abb91fce44b333 authored over 10 years ago by Arkiver2
pipeline.py: fix typo

8928088643f0863677031ec9198ea60ea477d340 authored over 10 years ago by Arkiver2
pipeline.py: set message for being banned:

You are banned from Twitpic or are behind a proxy! Please try to use another IP.

2e6d098cfa424dbd9a68216f14727336945f43cc authored over 10 years ago by Arkiver2
pipeline.py: check if banned from twitpic.

bump version.

37903e0904e673f369c3aa0cd2d60e7f404e32a5 authored over 10 years ago by Arkiver2
Fill in readme using template

Template from ArchiveTeam/standalone-readme-template@f5100e2a876feb96b0982625fa3f48495c8f1e56

5dfed6fe944667e68f17e99c7774d3d751c758c9 authored over 10 years ago by Christopher Foo
twitpic.lua: Fix 403 logic bug where all 403s went into retry block. Bump ver.

My bad :)

5ea065fe3a3fee34d434b7ab7d19d28ad7a4c5b1 authored over 10 years ago by Christopher Foo
Update README.md

0eaecff950c008709f3ab7379b2c1156d070f1da authored over 10 years ago by Arkiver2
pipeline.py: bump version

dc1031a0cf29cd45c5d8d7ea03a5fdee272aba95 authored over 10 years ago by Arkiver2
twitpic.lua: do not abort on domains other than twitpic.com

138841f8a1cc2754d4194525adb40db7b2a2a7df authored over 10 years ago by Arkiver2
pipeline.py: bump version

505b950296a4d7aaebb7fd9abffda7e1f1fc9925 authored over 10 years ago by Arkiver2
twitpic.lua: remove failure server.

replace NOTHING by ABORT

c3bbaff02bcebaf482f0aabb79bad883cf071411 authored over 10 years ago by Arkiver2
pipeline.py: only 4 args

6aa2ed5c73071d0d6252a1a8afe8261389164aeb authored over 10 years ago by Arkiver2
twitpic.lua: take api urls out of 'image'

da95a56322b278e932ee14387324b4da696a5d9c authored over 10 years ago by Arkiver2
pipeline.py: take api urls out of 'image'

928f7ff6726cf95f53426666f1e565d763c688bd authored over 10 years ago by Arkiver2
pipeline.py: bump version

9d8d8cde9468bdd7176579998dfe7d31817a383d authored over 10 years ago by Arkiver2
twitpic.lua: report failure and skip url immediately when getting status code 403.

a7aa4ddf6f6c588877549cbddac92cbfc7f9a5f2 authored over 10 years ago by Arkiver2
twitpic.lua: don't show all lines

9c5c05082d2d111c9fe79fe190575a964403b922 authored over 10 years ago by Arkiver2
fixup! Random user agents and headers. Treat 403 from twitpic as problem. Bump version.

710699fcb243ae3cf90097a4bc8f58bef25a267c authored over 10 years ago by Christopher Foo
Random user agents and headers. Treat 403 from twitpic as problem. Bump version.

88c6da7adad2071028f173a6e41ce843af572695 authored over 10 years ago by Christopher Foo
twitpic.lua: add support for places

c955d725050213415c7d54a19b32d43c93a74387 authored over 10 years ago by Arkiver2
pipeline.py: add support for faces url for 'user'

39694fe7d5811a4ee3134ec221b5253ce0ff276f authored over 10 years ago by Arkiver2
twitpic.lua: download json files of event /e/ urls

a6e2a06f91abacf82d67c276c16839383f47e39e authored over 10 years ago by Arkiver2
twitpic.lua: download all urls from api.twitpic.com and all urls that have the extension .json

1a90623e642feae5bb582712295c0bde751b5af4 authored over 10 years ago by Arkiver2
twitpic.lua: load_file > read_file

472dcbfdc7e105c8570f91e9efca5895efb45042 authored over 10 years ago by Arkiver2
twitpic.lua: typo fixes

be0a696421f602b60abc5b02f0760ca34bae505a authored over 10 years ago by Arkiver2
twitpic.lua: fix for scraping event urls

175c8553c52b3cd1538ab1558819249fa51a0d21 authored over 10 years ago by Arkiver2
twitpic.lua: fix eventjsonurl

f1f4149510cc565e61f236d05e936ea6a7a8771b authored over 10 years ago by Arkiver2
twitpic.lua: fix missing ')'

45b194bdc9b59544146e27f077134775d1899e2d authored over 10 years ago by Arkiver2
twitpic.lua: download event /e/ urls with 'user'

e0ae4ea84d8ae5c3f6f0e8e7507887bb69962ebd authored over 10 years ago by Arkiver2
twitpic.lua: always download events

ec6578a8064ad815e4402e8e094fb4ff7994d0d8 authored over 10 years ago by Arkiver2
pipeline.py: add support for events and places for 'user'

bf3a416d91359e8a1b01bdaa2c6f9e8380b20fc2 authored over 10 years ago by Arkiver2
pipeline.py: add support for an 'image' url. bump version

2228d14d2682a3f8ea5e8984756f09b8890fd0f1 authored over 10 years ago by Arkiver2
pipeline.py: add support for new 'tag' url. bump version

0d8b21abb3d37a99ff46b2a9f594c9c155c6c5cc authored over 10 years ago by Arkiver2
twitpic.lua: don't download individual comment JSON files

5e8cf3f3bb2e5d2297332f073dde507d26ff47d6 authored over 10 years ago by Arkiver2
twitpic.lua: exclude every second tag page when last page is reached

118c1d5fcf5e8c3e23b87665875f31de905c396c authored over 10 years ago by Arkiver2
twitpic.lua: &page > ?page

0f6a183fedce6d30daf6643a58f78762ef7341c4 authored over 10 years ago by Arkiver2
twitpic.lua: check if tag url is page url

bdff2b9bf8d17e2ed6d2733acab6e19b130eeafc authored over 10 years ago by Arkiver2
twitpic.lua: replace & by ?

6a342561c96481cf35c7311a3128d1a8c2efc2c5 authored over 10 years ago by Arkiver2
twitpic.lua: stop getting pages when page is empty

fix

30ffd1d95940365d8bfbf6bee305bdcd75b97800 authored over 10 years ago by Arkiver2
twitpic.lua: fix

971de8f3c137eaf80ede917e5d4bfb3803037fbc authored over 10 years ago by Arkiver2
twitpic.lua: stop getting pages when page is empty

2656d31cd3a7a5ebd9b5adc8d7cd452b7ea60b6b authored over 10 years ago by Arkiver2
twitpic.lua: download only pages containing string

375fed23c8c563683901ede366b9b29709ace546 authored over 10 years ago by Arkiver2
twitpic.lua: stop getting pages when page is empty

e9a71e398f74e1309fb333bf2d2f95e18d5b70f0 authored over 10 years ago by Arkiver2
twitpic.lua: add support for tag pages

e55787204eb184638320f134db38abf5c1d2bf74 authored over 10 years ago by Arkiver2
pipeline.py: change USER_AGENT so tests can run

d7f905acf73da4d7693c9156a5cd363b34648521 authored over 10 years ago by Arkiver2
twitpic.lua: get number of pages for comment and download those pages of comments

c942ee419735eea1b53ab2c5afaf07e71444433b authored over 10 years ago by Arkiver2
twitpic.lua: load JSON file as JSON

c0e4f5e235eeed277f30f0ef197b073b2add0944 authored over 10 years ago by Arkiver2
twitpic.lua: read JSON files

2066095be66f262a1e37747367ef6eb5ecdd1e11 authored over 10 years ago by Arkiver2
twitpic.lua: add locals

3e9bfb5ceea6fe1538412d02f4f269682c52be38 authored over 10 years ago by Arkiver2
twitpic.lua: do not check if downloaded already

de526a1b68297627daed6d068ee17513c1386c9c authored over 10 years ago by Arkiver2
twitpic.lua: fix urls > urls,

cc48bc906def59243bcffe23d3836af43a8ce54c authored over 10 years ago by Arkiver2