Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/friendster-scrape

Friendster archiving
https://github.com/ArchiveTeam/friendster-scrape

tell wget to use only ipv4

Since friendster does not have ipv6 servers, reduce DNS lookup
time and overhead by telling wget...

ba1e2dfae45d8caf24705e8afdb0edbdfbd2f253 authored over 13 years ago by Thad Ward
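
The option this commit describes is wget's `--inet4-only` (short form `-4`). A minimal sketch, assuming a hypothetical `wget_ipv4_cmd` helper and an illustrative URL, neither of which comes from the repository; it only prints the command line, so nothing is fetched:

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print a wget command
# line restricted to IPv4. With --inet4-only, wget connects only to
# IPv4 addresses and ignores AAAA records in DNS.
wget_ipv4_cmd() {
    printf 'wget --inet4-only %s\n' "$1"
}

wget_ipv4_cmd "http://www.friendster.com/"
```
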
much better estimation of tar file size

Produce a size which is much closer to the actual tar file size.
It is still a little off, but i...

3746cb6356c02ce6fc1c4b48e47de035f7e836c7 authored over 13 years ago by Thad Ward
remove --no-recursion flag from non-pv packrange

when removing the sorting logic from packrange, I forgot to remove the
no-recursion flag from th...

8fa75c0648dae5dfcf8742da322448e681119aa8 authored over 13 years ago by Thad Ward
add ampersand and semicolon to the reject list

to avoid infinite recursion in blogs, reject files or directories named
& or ;, as well as the p...

484e7a59b3c171ca8fe5a87f9c82ab82775e90a6 authored over 13 years ago by Thad Ward
getfriends.sh downloads friends lists using the API.

acec107efac272e4e10351dc56021f5e8dd57260 authored over 13 years ago by Alard
Another fix for the fixing script. (I really should be more careful.)

44d9fb7caf406b34108001d60074bad45f3c4674 authored over 13 years ago by Alard
Updated script. (Forgot to commit last time. Grr.)

68cd4863856cd559944b967f8cf64c411374220b authored over 13 years ago by Alard
Version 4: fix a problem with the bulletins. Groups downloaded with version 3 or earlier should be fixed by running fix-bgf-bulletins.sh.

I made a mistake in the part of bgf.sh that downloads the bulletin messages. Instead of the mess...

c26fceabd5dcde084dd7f13d6a76fb1572d5ecc9 authored over 13 years ago by Alard
fix a couple problems in packrange

I should have tested changes before committing. There is still a slight
discrepancy in the size c...

7fb6db831e0231293c33fd09c34ab540eab8ab08 authored over 13 years ago by Thad Ward
add simple pack script that gives progress with pv

78d11bd7acda6c128d6e3315fefafdabfab4a59a authored over 13 years ago by Thad Ward
Store groups in a separate directory to prevent collisions with profiles.

631e3eef6f0e6cf6a3eddf03b3ccbe6938a80ddd authored over 13 years ago by Alard
Fixed the solution for Friendster's empty file problem. The script now works, I think.

I did not think to remove the empty result before restarting wget. Without removing the empty fi...

da62a711dd1ba739ff161df50602c7f9b2ab3b1c authored over 13 years ago by Alard
Added bgf.sh for downloading Friendster groups. Still contains a bug with the forum download.

343b8d41689a563ab2ca59d7e2b9e840159d0954 authored over 13 years ago by Alard
fix a bug that made the mv fail. (/me sighs) r=Coderjoe

5ab4a65078d3ed6ecb226d4a5c93d77cca7eb4e1 authored over 13 years ago by Daniel Brooks
fix bugs, pretty bad ones too. thanks to Coderjoe for noticing them

3e6b108b15bec863530660626474a227460b39fd authored over 13 years ago by Daniel Brooks
update to handle larger userids even on filesystems with low limits on the number of subdirectories in a directory

3924797a7654c18fcdb8b930462bd4e17c833634 authored over 13 years ago by Daniel Brooks
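
Some filesystems cap the number of subdirectories a single directory can hold (ext3, for example, allows roughly 32,000), so storing one directory per user id in a flat layout stops working for large id ranges. The sketch below shows one common way around that: nesting each id under groups of its digits. The `profile_dir` helper, the `data/` prefix, and the three-digit grouping are assumptions for illustration and are not the layout the repository actually uses; it also assumes ids of at most nine digits with no leading zeros.

```shell
#!/bin/sh
# Hypothetical layout (not the repository's): zero-pad the id to nine
# digits and use the first two groups of three digits as nested
# directories, so the two upper levels hold at most 1000 subdirectories
# each and each leaf directory holds at most 1000 profiles.
profile_dir() {
    id=$1
    padded=$(printf '%09d' "$id")
    top=$(printf '%s' "$padded" | cut -c1-3)
    mid=$(printf '%s' "$padded" | cut -c4-6)
    printf 'data/%s/%s/%s\n' "$top" "$mid" "$id"
}

profile_dir 1234567
profile_dir 123456789
```
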
detect when chunky is running under an old bash

detect when chunky.sh is running under a version of bash too old to
support associative arrays, ...

6c03ae2a8aa006484d878a3486e51245ff906cac authored over 13 years ago by Thad Ward
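
Associative arrays were added in bash 4.0, so a version check along these lines can detect an older shell. This is a sketch only: the `supports_assoc_arrays` helper and the warning text are assumptions, and what chunky.sh does after detecting an old bash is not shown in the truncated message above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given version string has a major
# version of 4 or higher. Defaults to the running shell's BASH_VERSION,
# or "0" when the shell is not bash at all.
supports_assoc_arrays() {
    version=${1:-${BASH_VERSION:-0}}
    major=${version%%.*}
    [ "$major" -ge 4 ]
}

if ! supports_assoc_arrays; then
    echo "this shell does not support associative arrays; bash 4.0 or newer is needed" >&2
fi
```
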
fix edge case in detection of end-of-range

when end-start was exactly a multiple of step, the script would erroneously
omit the last profil...

2629af2246f6a386d30f93e5a65760943a9c0c7d authored over 13 years ago by Thad Ward
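
The edge case is an off-by-one at the end of the range. The sketch below illustrates the general problem with a hypothetical `chunk_ranges` function; it is not the code from the repository. Comparing with `-le` rather than `-lt` keeps the final profile when `end - start` is an exact multiple of `step`.

```shell
#!/bin/sh
# Hypothetical function: print "first last" for each chunk covering
# start..end inclusive, in steps of the given size.
chunk_ranges() {
    start=$1 end=$2 step=$3
    first=$start
    # -le (not -lt) so that a chunk starting exactly at $end is kept.
    while [ "$first" -le "$end" ]; do
        last=$((first + step - 1))
        if [ "$last" -gt "$end" ]; then
            last=$end
        fi
        echo "$first $last"
        first=$((first + step))
    done
}

# end - start = 200 is an exact multiple of step = 100,
# so the last chunk contains only profile 300.
chunk_ranges 100 300 100
```

With `-lt` in the loop condition, the same call would stop after the chunk ending at 299 and omit profile 300.
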
move thread command name to a variable

274a4453b52f3bf259458686e996aa5c5684e521 authored over 13 years ago by Thad Ward
reject blog pages beginning with <

e8fa445541b50d036a75c3537b669966efbded26 authored over 13 years ago by Thad Ward
add parameter validation

d2a450eb23ee12015e9fd4f46ad439a8da6ec010 authored over 13 years ago by Thad Ward
move per-thread report details to function

move per-thread report details to a function, and report threads
during the shutdown loop as well.

048202080f784f02cd07e824e1e1029548dd2a91 authored over 13 years ago by Thad Ward
Stop on detection of the STOP file

allow for other processes to tell chunky to stop by creation of a
file named STOP. This allows f...

389cbff2c61b9cdd390f01d675e2c4f7c8a0feb6 authored over 13 years ago by Thad Ward
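
A stop-file check of this kind can be sketched as follows. The `run_until_stop` function, its arguments, and its messages are hypothetical; only the idea of a file named STOP comes from the commit message.

```shell
#!/bin/sh
# Hypothetical worker loop: handle items one by one, but check for a
# file named STOP in $stop_dir before starting each new item.
run_until_stop() {
    stop_dir=$1
    shift
    for item in "$@"; do
        if [ -e "$stop_dir/STOP" ]; then
            echo "STOP file found, not starting new work"
            return 0
        fi
        echo "processing $item"
    done
}

demo_dir=$(mktemp -d)
run_until_stop "$demo_dir" 1 2   # no STOP file yet: both items run
touch "$demo_dir/STOP"           # another process could do this
run_until_stop "$demo_dir" 3 4   # stops before item 3
rm -rf "$demo_dir"
```

Because the check happens between items, work already in progress finishes normally and only new work is refused.
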
Don't display "next chunk" message when out of blocks

983a74688e08f11ab63f8f2eeb10fe764b9c4696 authored over 13 years ago by Thad Ward
better handle end of non-mod-100 ranges

4e27eab251c46ea2fdad0f472a9a46848070ef3b authored over 13 years ago by Thad Ward
change chunksize to 100 profiles

change chunky's work unit to 100 profiles so lowering thread count
reacts a little faster, while...

478f2ffbe0949d32fb47779e8618500c229bb3b9 authored over 13 years ago by Thad Ward
lower blog fetch read timeout to 2 minutes

wget defaults read timeouts to 900 seconds (15 minutes), and retries
to 20. This results in wget...

a0c96050b722ed55a90985b8d2cea1f1923d62cf authored over 13 years ago by Thad Ward
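
The effect of the lower timeout can be estimated with simple arithmetic, using the defaults quoted in the commit message (900 seconds, 20 tries). The `worst_case_stall` helper is hypothetical, and the estimate ignores connection time and the waits wget inserts between retries. Whether the commit also changed the retry count is not visible in the truncated message, so the sketch keeps it at 20.

```shell
#!/bin/sh
# Hypothetical helper: rough upper bound, in seconds, on how long one
# stalled URL can hold up a download: read timeout x number of tries.
worst_case_stall() {
    echo $(( $1 * $2 ))
}

worst_case_stall 900 20   # wget defaults: 18000 seconds (5 hours)
worst_case_stall 120 20   # 2-minute read timeout: 2400 seconds (40 minutes)

# The wget option that sets the read timeout is --read-timeout=SECONDS:
echo "wget --read-timeout=120 URL"
```
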
use case instead of cascading ifs for menus

ccdd9a00c30e4057ff179ef8fc434fdaa0fa21d4 authored over 13 years ago by Thad Ward
fix total percentage not increasing

53d64a35ee9f5534eb2d998805e317fb0adb8616 authored over 13 years ago by Thad Ward
append to log files, rather than overwrite

2915b9c5da0d44fa4eadec1e10d87316bfccd271 authored over 13 years ago by Thad Ward
message formatting changes

0bdddf7c011c0fa4eca963e1b5b75a67e1f632e0 authored over 13 years ago by Thad Ward
remove use of ^c for attention

have read handle the role of sleep and getting input from the user.
Using ^c was a bad idea, sin...

506e493d8029d14779663c04f204c3547c90273a authored over 13 years ago by Thad Ward
display progress for each running thread

3b33258ed974a6cc16494ce8b1b3cf8c676129d7 authored over 13 years ago by Thad Ward
report profileid to caller via file

8e28e7909f80df05f8c9461920ada6621f349c64 authored over 13 years ago by Thad Ward
provide status updates while waiting to stop

316df2d0d0efa1811ba8c96c99660750fe7c63bb authored over 13 years ago by Thad Ward
add another thread control script

Add another thread control script that assigns blocks of 1000 profiles
to each thread. It also a...

00a03140584023b79fdb8f38deb4fd47e5738555 authored over 13 years ago by Thad Ward
add an optional cookie file parameter

7bc0ffe5011cc2504ea04b1c48f6d110e9534881 authored over 13 years ago by Thad Ward
fix broken thread cookie jar filename

747d749c64a85e41b25e728634c6fcd11d0e0545 authored over 13 years ago by Thad Ward
use bash math for better compatibility

modified from pull request submitted by ip2k

c000111beb34ba4ea5b28ee3646ee552834e4064 authored over 13 years ago by Thad Ward
Fixed status script bug

f89fd2cd7ffec1ec9687489636c9e2f0ffb3c176 authored over 13 years ago by Alex Buie
Fixed status script bug

49cb27c481d295e2d1db549e30d43b12292b64e2 authored over 13 years ago by Alex Buie
Fixed status script bug

0be4204bf98b28aeaf0f90d94cdbfbd729811f14 authored over 13 years ago by Alex Buie
Added run-once progress script

c8dc6cc7f32b37bd5979dde8ce5f0c079d5a12ca authored over 13 years ago by Alex Buie
Added progress script

f6dbf65b9feb2098b9c4e50219e4c39bac350641 authored over 13 years ago by Alex Buie
break out the thread script into a separate file for readability

ef6041c262c59d498289eacd62b5d71c9e910a96 authored over 13 years ago by Daniel Brooks
rejigger the rejigger, because it's still early and I can't count

178f55ee77edf5e4544719f8ffcc03622b42855a authored over 13 years ago by Daniel Brooks
rejigger the start and end indexes so that we download exactly the range the user wanted instead of shifting it by two

1f2ceddeb75937e5778d38667fa6ef00014b2854 authored over 13 years ago by Daniel Brooks
Merge remote branch 'alard/master'

d529e57081bbc8241f85af811dd020c024630acd authored over 13 years ago by Daniel Brooks
Fix for shoutoutstream: used $PAGE instead of $page, so only the first page was downloaded.

a7388e06c0277dc725fa75a690f1bf8cf7b2d282 authored over 13 years ago by Alard
Prevent infinite recursion when mirroring blogs with unlucky HTML errors.

4bf72ce552e6ef80891cb6313b8590b694d01647 authored over 13 years ago by Alard
Merge remote branch 'alard/master'

763e8f29ad2358ac1919eeb4ac9701a276fb57f4 authored over 13 years ago by Daniel Brooks
Better check for login result. (Use unique filename for login result.)

2342775910516b7afecf884c3672f90eef6a7c49 authored over 13 years ago by Alard
Some textual changes to reduce confusion.

The script now checks the result of the login request.
Profiles that were retrieved without a pro...

86eb16a5c13d82fa1adbb6503b4bea202a505550 authored over 13 years ago by Alard
add logs and cookies files created by snook.sh to .gitignore

043546870657c3b9d0b2e5795be2396aa86dcee5 authored over 13 years ago by Daniel Brooks
clean up snook.sh and make it actually use a different cookies file for each copy of bff.sh

10f93021242ca07515864775d05bc71a5f97eb5f authored over 13 years ago by Daniel Brooks
remove an emacs backup file and modify .gitignore so that it doesn't happen again

98225bbf7141f6103b0499ba0c33f605db9d16c6 authored over 13 years ago by Daniel Brooks
add snook.sh to the repository

1fca67a1ce446d640f321a77b50517209ffc27e6 authored over 13 years ago by Daniel Brooks
bump to version 9, re: grep fix

eb52c4055dab3f9910f2cf6ade0c2ef51c8970da authored over 13 years ago by Thad Ward
Merge remote branch 'yipdw/master'

f1817a79fddfcb6c7dae5870e434bc1757d410b0 authored over 13 years ago by Thad Ward
Previous fix was too broad; [^\s]+ is closer in meaning to \S+.

ab31ff0b58e76aee1704aa06b0749b049a4d506d authored over 13 years ago by David Yip
Compatibility fix for GNU grep 2.5.4.

While 2.6.3 appears to correctly interpret the \S metacharacter, 2.5.4 doesn't.
The end result i...

b1f5b72cd13e20d6b02c20d8fc7b2710fc816a61 authored over 13 years ago by David Yip
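
For comparison, a pattern that avoids the `\S` extension altogether is the POSIX character class `[^[:space:]]`. This is an alternative shown for illustration, not the pattern the two commits above settled on, and the `first_token` helper is hypothetical. It relies on grep's `-o` option, which GNU grep supports but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: print the first run of non-whitespace characters
# in a string. [^[:space:]]+ is a POSIX bracket expression meaning
# "one or more non-whitespace characters" and needs no \S support.
first_token() {
    printf '%s\n' "$1" | grep -oE '[^[:space:]]+' | head -n 1
}

first_token "  hello world"
```
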
grab blog content from both *.blog.friendster.com and *.blogs.friendster.com

271a495fd8a4928855379f6eca4f2beba0d53f4f authored over 13 years ago by Thad Ward
Add warning about blog.

7acbb14ff10da7138dd0eb5ab6025b2de2e2632d authored over 13 years ago by Alard
Show warning if the login has failed.

2a123cbb52edc74a8d3d535ee2e27665a1d86855 authored over 13 years ago by Alard
Script to fix an error in version 3 and earlier, which left the .incomplete for unavailable profiles.

f0ae0c7dd0ce3c43c1e8ac441e76e151d1930a46 authored over 13 years ago by Alard
Ignore username and password files.

6f623da8400a37e588ae53454806596adc2c8ada authored over 13 years ago by Alard
Version 4 + 5: remove .incomplete for unavailable profiles

295cec5fa5de5245d402220539e1b351bc652333 authored over 13 years ago by Alard
Version 3.

560799c98fbf9d662c9874fdb98d3303ba43bd06 authored over 13 years ago by Alard
readme

2cdf35e4d48b9e87fd5be994cb9c4ec3aa7ef8c9 authored over 13 years ago by Alard