Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/ArchiveBot

ArchiveBot, an IRC bot for archiving websites
https://github.com/ArchiveTeam/ArchiveBot

Update to seesaw 0.4.0.

75403e8b6227112936c68aa86e69a257afcb5225 authored over 10 years ago by David Yip <[email protected]>
Merge branch 'master' into next

134f910d7bbc1dafc71dd42e37843d52d5bd9424 authored over 10 years ago by David Yip <[email protected]>
Show concurrency level and delay in dashboard.

56a49fb83c155f1b090d56b27d39157a9083e0d4 authored over 10 years ago by David Yip <[email protected]>
Return concurrency level and request delay in job JSON.

2462dde0e94c9f743506549018df0a6b539091b2 authored over 10 years ago by David Yip <[email protected]>
Ignore addtoany.com/share_save

950ced115f793489e68c46d5d2f8760a5f5c0557 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore localhost

9495447cc875b82f190ebf9fa4710a1910b4ba66 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore https as well

8d22efe28192dd6084a5665fa64891a5a4c89ff7 authored over 10 years ago by Ivan Kozik <[email protected]>
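Most commits in this log add URL rules to ArchiveBot's ignore sets. The sketch below shows one way such a set might be applied. It assumes each rule is a Python regular expression searched against the full URL. The two patterns are hypothetical examples, not the rules these commits actually added. Writing the scheme as `https?` lets a single rule cover both http and https.

```python
import re

# Hypothetical ignore patterns, written for this sketch only.
# "https?" lets one rule match both http:// and https:// URLs.
IGNORE_PATTERNS = [
    r"^https?://localhost(?::\d+)?/",
    r"^https?://(?:www\.)?addtoany\.com/share_save",
]

# Compile once so each URL check does not re-parse the patterns.
COMPILED = [re.compile(p) for p in IGNORE_PATTERNS]


def first_matching_pattern(url):
    """Return the first pattern that matches url, or None if none match."""
    for regex in COMPILED:
        if regex.search(url):
            return regex.pattern
    return None


def should_ignore(url):
    return first_matching_pattern(url) is not None
```

Returning the matching pattern, rather than a bare boolean, makes it possible to report which rule suppressed a URL.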
Ignore frequently-encountered wikipedia thumbnails

dd50509e034de7b6b0f26a7b98112c6834e4717d authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore pages on draft.blogger.com

cdb747a54d1c3930fc382d576dc84033685fbb61 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore some reddit wiki pages

f13015a8c83ddd4227aae118ffd4661ad5ef4771 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore more radioscoop

3a503b10dfc5b2a2f3e165f8018f038f2e8ac2bd authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore more js-agent.newrelic.com

e7a70246e3d9a78bff0789f1bd93c2dac0db8761 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another Icecast site

dd26c15de7f81e67a9dd82d65f7c36987e5115cf authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore &mobileaction=

94f340bd52ede46f710b99d4953931403f84544d authored over 10 years ago by Ivan Kozik <[email protected]>
Remove moved rule

7dfc5e377564397a139fa5b35e5823cbc103209b authored over 10 years ago by Ivan Kozik <[email protected]>
Copy tumblr rule from blogs set

5cfb8bf999a2c501f28ff5a3e909cf4a5e19d55e authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore more &amp;

36afc820c486838a73c391f97d4c453e53b13139 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore Special:ListFiles.*&user=

486f852f49e87d4f7c1779ffe8b0328e9f990a1d authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore some Special:ListFiles

Note: &amp; args in URL like

https://wiki.unrealengine.com/index.php?title=Special:ListFiles&...

b2b0cd358a2d7c81c68c9c5418a8f35a11e455cb authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore stumbleupon without www. as well

7390959da20744b43de733ba8717e5b268a432b1 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore per-section edit pages

ddce1c05c3df21cb34db7b1ffef9c10a8294b9ca authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore Special:RecentChanges&from=

30760dc9ff1619d1955b8dffcad8cd165b052a83 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore Special:RecentChangesLinked

c2f3be1a9081ccf34f95ae28b6aec784f25d1de7 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore a SHOUTcast site

e5c6c016a519ac42fa9c0543f13c941fe4353df2 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore www as well

5865a7c7dad2cafad596260e770aaf7e28aab04d authored over 10 years ago by Ivan Kozik <[email protected]>
Fix literal .

63c44c7b254c3b2e326986e6dc7323d5f7620cd9 authored over 10 years ago by Ivan Kozik <[email protected]>
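In a regular expression, an unescaped `.` matches any character, so a rule meant to match a hostname also matches look-alike strings. The log does not say which rule was fixed. This sketch uses a hypothetical `example.com` rule to show the difference, and uses `re.escape` to build the escaped form from a literal hostname.

```python
import re

# Hypothetical rules: the bare "." matches any character; "\." matches only a dot.
loose = re.compile(r"^https?://example.com/")
strict = re.compile(r"^https?://example\.com/")

# re.escape turns a literal hostname into a pattern with "." escaped.
escaped = re.compile(r"^https?://" + re.escape("example.com") + "/")

for url in ["http://example.com/", "http://exampleXcom/"]:
    # The loose rule accepts both URLs; the strict and escaped rules accept only the first.
    print(url, bool(loose.search(url)), bool(strict.search(url)), bool(escaped.search(url)))
```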
Ignore another Icecast site

c865e4a8c056b83cf979ab613c04957a73915eed authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another Icecast site

c07fef15bf764e16457993286a5abcf26ae6a4a1 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore Special:MobileOptions and Special:MobileFeedback

4c370069b0aece5fcd4078b7c1c8a9cd5322020b authored over 10 years ago by Ivan Kozik <[email protected]>
dashboard: Added queue counter to job status line

09a527c8103a9b1f926a39bc3b6aa97f59fc0965 authored over 10 years ago by Marc Hoersken <[email protected]>
pipeline: Added queue counter based upon queue hooks

34eea55edee290e23a0e6047d33aee89e3ff158e authored over 10 years ago by Marc Hoersken <[email protected]>
bot: Added display of queue counter to status

c1910df0a0c5c12bfdf78d189b766115b3da648c authored over 10 years ago by Marc Hoersken <[email protected]>
Ignore TMZ videos

These have popped up as page requisites twice now

9f778f61c52c5a1385d378e0e5d8bed45e8a5b8a authored over 10 years ago by Ivan Kozik <[email protected]>
Merge branch 'master' into next

21b442a1a6f57722b0a3be43f2ea5359eb1b47f5 authored over 10 years ago by David Yip <[email protected]>
Improve "ignore reports suppressed" indicator.

b0568ac281994a7027c8ca7c615064af15f0cc98 authored over 10 years ago by David Yip <[email protected]>
A utility to reverse the pending queue.

Sometimes we end up with high-priority !a items below lower-priority !a
items. This is one way ...

7017803b47c3fc5c0297702ea477a68da0f3a528 authored over 10 years ago by David Yip <[email protected]>
Revert "Revert "Eliminate stop_control atexit hook. #91.""

This reverts commit fa11a2a2835322461809e2c0a969b24b9ba54c38.

The dual at_exit/on_cleanup mecha...

80ae1012263b54b3cb5f9e2d316bbdc16832c6c8 authored over 10 years ago by David Yip <[email protected]>
It's pipeline_id, not pipe_id. #91.

I need a test suite.

fc4045d2f23b81f3f214dc7317378f635044b463 authored over 10 years ago by David Yip <[email protected]>
Collapse atexit hooks. Fixes #91.

efbbc89c2446dd16772e8605744c134c79dba099 authored over 10 years ago by David Yip <[email protected]>
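The commit does not show its code, but the general idea of collapsing several exit hooks can be sketched in Python. Register a single `atexit` handler and have it run each cleanup step in a fixed order. Python's `atexit` runs separately registered handlers in reverse order of registration, so routing everything through one handler makes the shutdown order explicit. The helper names `on_exit` and `run_cleanup` are hypothetical.

```python
import atexit

# Hypothetical registry of cleanup steps, run in the order they were added.
_cleanup_steps = []


def on_exit(step):
    """Record a cleanup step to run at shutdown; usable as a decorator."""
    _cleanup_steps.append(step)
    return step


def run_cleanup():
    # Pop steps as they run, so each step runs at most once even if
    # run_cleanup is also called explicitly before interpreter exit.
    while _cleanup_steps:
        step = _cleanup_steps.pop(0)
        step()


# Only this one function is registered with atexit.
atexit.register(run_cleanup)
```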
Revert "Eliminate stop_control atexit hook. #91."

This reverts commit 89fbbe5af1204c1c264794a088e3807112e8c518.

The atexit hook is used to trigge...

fa11a2a2835322461809e2c0a969b24b9ba54c38 authored over 10 years ago by David Yip <[email protected]>
Ignore another Icecast site

fa8db17df8d8fa1832830265afdcda5d21dada59 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge pull request #94 from mback2k/master

Small improvements to INSTALL doc

9bd3c60ce4d10dc45661f49491762718cfc572d6 authored over 10 years ago by yipdw <[email protected]>
INSTALL: Small improvement to CouchDB explanation

c5b9e8c1fa95ead3b1eb3aa80bc61a16d3c70e86 authored over 10 years ago by Marc Hoersken <[email protected]>
INSTALL: Added dependency of ExecJS supported runtime

531ddf1a34dd3352610e8498d0d21fa779ac6539 authored over 10 years ago by Marc Hoersken <[email protected]>
INSTALL: Added dependency of PhantomJS 1.9.7 and its requirements

e78380c951d3af8f8a1d3adf727f1b5bb58006c0 authored over 10 years ago by Marc Hoersken <[email protected]>
INSTALL: Improved parameter examples and removed obsolete env vars

ce9e5a362ec83a106227ecd0285dc33f89bb3af0 authored over 10 years ago by Marc Hoersken <[email protected]>
INSTALL: Fixed path to cogs start application

7fdd84cba4b80fcf1b1f28892ad6dac749bc3bef authored over 10 years ago by Marc Hoersken <[email protected]>
Move new cogs into their own directory.

These programs will have their own specs and dependencies, so might as
well isolate them.

82728465480d1637edc0971cdf2af1690c81311b authored over 10 years ago by David Yip <[email protected]>
Add a utility to retrieve up the most recent N log entries for a job.

aded7ba1706356ab0836f52459fb8d0d9698f60f authored over 10 years ago by David Yip <[email protected]>
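The log does not show where the utility reads job logs from. The sketch below assumes the entries arrive as an iterable, oldest first. It keeps only the last N entries using `collections.deque` with a `maxlen`. The function name is hypothetical.

```python
from collections import deque


def most_recent(entries, n):
    """Return the last n items of an iterable, oldest first.

    deque(maxlen=n) discards the oldest item each time a new one arrives
    once it is full, so memory stays bounded by n even for very long logs.
    """
    if n <= 0:
        return []
    return list(deque(entries, maxlen=n))
```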
Remove unused cogs.

c759161b3d2224fab61d5667653fe81477712d23 authored over 10 years ago by David Yip <[email protected]>
Disable ArchiveFinder.

ArchiveFinder hasn't been updated to handle the multi-WARC paradigm that
ArchiveBot now uses, an...

0b1649d1238bee998db5c23f0b6c404c5591a9d4 authored over 10 years ago by David Yip <[email protected]>
A program to show idents of all jobs in the working set.

fb127253b5bb5e9d02c8c281c46de749a192f5db authored over 10 years ago by David Yip <[email protected]>
Make dashboard accept input from log-firehose.

b45fc5e335a6eed9ad378885779cc15f4d086264 authored over 10 years ago by David Yip <[email protected]>
Introduce log-firehose, a utility to spit out job logs.

This seems to exhibit saner memory usage patterns than the dashboard's log
broadcaster. (And if...

2a5eb5624b1a79d43d00db318bece3d7022c3998 authored over 10 years ago by David Yip <[email protected]>
We don't need to use tcpserver/tcpclient.

Originally, I thought that I could get away with using one Redis
subscription and attach all the...

10ae1d22e7704050449f7794c357cdc14c856996 authored over 10 years ago by David Yip <[email protected]>
Reimplement JobRecorder.

7e7b7f3ffff311c727739188b640e1d67fcd46f8 authored over 10 years ago by David Yip <[email protected]>
Remove unnecessary signal traps.

22c781cedc2ae11ca47a7edc20fbaedfb311cadf authored over 10 years ago by David Yip <[email protected]>
Remove old log analyzer and log trimmer cogs.

bfdfd83ab8d000594f62b09ee762b77274542d4b authored over 10 years ago by David Yip <[email protected]>
Remove superfluous subdirectories.

77c0b445d96db4276af64b13cf08ac4cbcc2b8a6 authored over 10 years ago by David Yip <[email protected]>
A simpler LogTrimmer.

e4fe5d17d2838b9c9798ef66f4d27f668f878c76 authored over 10 years ago by David Yip <[email protected]>
Disable old JobRecorder.

b1ccff8a882ba45b5f385b8f27c58e7fe841204c authored over 10 years ago by David Yip <[email protected]>
Remove unused stringio import.

81eb753147cc899b099d61695d75d9ff91086b73 authored over 10 years ago by David Yip <[email protected]>
Replace LogAnalyzer with a few smaller scripts.

The cogs program is suffering from inexplicable memory leaks and/or
process bloat. (I can't tel...

11edaf17e34206550e21519823df7e24991dbfa7 authored over 10 years ago by David Yip <[email protected]>
requirements.txt: Use seesaw>=0.3.1

5b3c5bf5ac501c78839955c4b4a3e70a1d5a3bd3 authored over 10 years ago by Christopher Foo <[email protected]>
Ignore delicious.com/save

2abcab4e34ef7ac131771efb5d959605962e41e9 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore add.my.yahoo.com/rss

df6c1c66a574b6963a4bfc851925765f6b015ae2 authored over 10 years ago by Ivan Kozik <[email protected]>
requirements.txt: Use seesaw>=0.3

Python 3 branch of seesaw has been merged to master.

64738a94a5571e477a14ac5d4b369f1756a2b4a1 authored over 10 years ago by Christopher Foo <[email protected]>
Ignore local/private URLs that have a :port

a0b0fa0b6cecbfb76f5c80662ba574470d450942 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore local and private IP ranges

3fdef88ea12f919cf65cc67b44444960352f72f5 authored over 10 years ago by Ivan Kozik <[email protected]>
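Ignore rules like these two commits are written as URL patterns. As an alternative formulation of the same check, the sketch below swaps in Python's `ipaddress` module, which classifies private and loopback addresses directly. `urlsplit(...).hostname` drops any `:port` suffix, so it also covers the ":port" case above. The function name is hypothetical, and DNS names are deliberately not resolved.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_or_private(url):
    """True if the URL's host is a literal loopback, private, or link-local IP.

    urlsplit().hostname strips any :port suffix and IPv6 brackets, so
    http://10.0.0.5:8080/ and http://10.0.0.5/ are treated alike.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a DNS name); this sketch does not resolve it.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local
```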
Ignore dead host

ca18e47d944a7874640efb5d03c36057e512ddea authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore infinite /styles/ and /scripts/ loops

7edc27b3f3d6758a1189bf8d2a232425397a8c53 authored over 10 years ago by Ivan Kozik <[email protected]>
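A common source of such loops is a relative link like `styles/site.css` being resolved from a page that is already under `/styles/`, which yields `/styles/styles/site.css`, and so on indefinitely. One way to catch this is a regex backreference that fires when the same segment reappears later in the path. The pattern below is a hypothetical sketch, not the rule the commit added.

```python
import re

# Hypothetical rule: a /styles/ or /scripts/ segment that appears again
# later in the same path. \1 refers back to whichever name group 1 matched.
LOOP = re.compile(r"/(styles|scripts)/(?:.*/)?\1/")


def is_path_loop(url):
    return LOOP.search(url) is not None
```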
Ignore Google Analytics tracking image

8295f19c8332d5723d2947612c70e17958924c11 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore /privmsg.php

6bbdf74e212fba73db79dc13429e54722dfaeb31 authored over 10 years ago by Ivan Kozik <[email protected]>
Fix double-spacing error in "Queued ..." message.

9b0ceaf91ee1387068d3e281303ba4dcc724d8bc authored over 10 years ago by David Yip <[email protected]>
Add nogravatar ignore set

18ab43fdedb95d7a84a7aac62bb56e460edfa30c authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore dead ad/tracking site

27524b5b13d0f22e3974f9e16b760e8497c9c2f9 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge remote-tracking branch 'origin/master'

33c9d63383b41c0571edc58a659dc083028ff6d6 authored over 10 years ago by David Yip <[email protected]>
Merge branch 'ao-many-2'

Useful for backing up news stories, like the ones related to
trigger-happy Ferguson, MO police. ...

0e1d85875962c08219a32b9be1067b083f0f27dd authored over 10 years ago by David Yip <[email protected]>
Ignore runaway %25252525

706319956184175d7dc6ad31f8dab77cd78637ab authored over 10 years ago by Ivan Kozik <[email protected]>
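Runaway strings like `%25252525` come from repeated percent-encoding. Each round rewrites `%` as `%25`, so the run of `25`s grows by one per round. The sketch below demonstrates this with `urllib.parse.quote`. It uses a hypothetical threshold of three or more `25`s after a `%` and does not reproduce the actual rule.

```python
import re
from urllib.parse import quote

# Each round of percent-encoding rewrites "%" as "%25", so a value that is
# re-encoded over and over becomes %2520, %252520, %25252520, ...
# Hypothetical rule: three or more "25"s right after a "%" count as runaway.
RUNAWAY = re.compile(r"%(?:25){3,}")


def reencode(value, rounds):
    """Apply quote() repeatedly, illustrating a value that gets re-escaped each time."""
    for _ in range(rounds):
        value = quote(value)
    return value
```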
Ignore blogger.com/comment-iframe.g

995cd092406d846e8fd64256568d6ac815210696 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore loop on photobucket.com

bd713f3c4a9e7145b54d417c31d126afaa0bc018 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore blogger.com/email-post.g

1916c314ec0e850baf68b8191986ba708ce0ec18 authored over 10 years ago by Ivan Kozik <[email protected]>
Remove unnecessary text()

f1f0a6f43ce72a7c29749865b71cfac5f9d3b4bc authored over 10 years ago by Ivan Kozik <[email protected]>
Remove quote-fixing hack; ArchiveBot was fixed

2512f43c496abe44265b0150f4a7841735441940 authored over 10 years ago by Ivan Kozik <[email protected]>
Update stats only on download lines

e5571002d2e2414f44525141ca7b0b8d7662af02 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore web-beta.archive.org

73292a2af01274bd5950ec6f4041e0fbb45860ce authored over 10 years ago by Ivan Kozik <[email protected]>
Show # of responses/sec

bcacad6cfd7bb34066e07b3235fca48a728507eb authored over 10 years ago by Ivan Kozik <[email protected]>
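The log does not show how the dashboard computes this rate. A minimal sketch, assuming the rate is simply responses counted divided by seconds elapsed since tracking started: the class name is hypothetical, and the injectable clock exists only so the arithmetic can be checked.

```python
import time


class ResponseRate:
    """Hypothetical helper: average responses per second since creation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.count = 0

    def record(self, n=1):
        self.count += n

    def per_second(self):
        elapsed = self._clock() - self._start
        # Avoid dividing by zero when no time has passed yet.
        return self.count / elapsed if elapsed > 0 else 0.0
```

A cumulative average like this smooths out bursts. A sliding window over the last few seconds would track the current speed more closely.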
Show number of responses

ddd727609bb79ee2d44ed19b507bb470599e3809 authored over 10 years ago by Ivan Kozik <[email protected]>
Show how many MB downloaded

9ae44bc50354874fcaecc4511deb96e219b6310f authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another facebook infinite loop

https://www.facebook.com/connect.php/js/FB.SharePro/rsrc.php/v2/yo/r/rsrc.php/v2/yY/r/rsrc.php/v...

19b28d75d8a926c710bb9df9bf1980a483491d47 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore more khaleejtimes.com loops

ed68107becf1566645d6f3b256cc0f481b47ce44 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore infinite loops on khaleejtimes.com

7fc4460ca4c31caaec51acd162d38269ce5478c2 authored over 10 years ago by Ivan Kozik <[email protected]>
Make job queuing + queued_at an atomic action.

This hasn't (as far as I know) caused any issues, but I think of job
queuing + setting queued_at...

ba974c06a8dc69ede7e4ab16344b03fe238b8cb5 authored over 10 years ago by David Yip <[email protected]>
Teach the bot how to queue !ao < jobs. #14.

67f9c831eddf2f85c14a8b23a0a3f8ed8151a467 authored over 10 years ago by David Yip <[email protected]>
Stream URL file from source; write file as we receive it. #14.

This shifts responsibility to wpull for parsing the URL file; all we do
is fetch it.

Potential ...

68f54002eca468d3a5a021d458c51dac5d1c4907 authored over 10 years ago by David Yip <[email protected]>
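"Write file as we receive it" amounts to a chunked copy, so that a large URL list never has to sit in memory all at once. A minimal sketch, assuming the source is any binary file-like object with `read(n)`, such as the response returned by `urllib.request.urlopen`. The function name and chunk size are hypothetical. Parsing the resulting file is left to wpull, as the commit describes.

```python
def stream_to_file(source, dest_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to dest_path chunk by chunk.

    Only one chunk is held in memory at a time. Returns the number of
    bytes written.
    """
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # b"" signals end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written
```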
Generate a different slug for !ao < jobs. #14.

(This is a good illustration of how the Job class has probably reached
its limits.)

3814f7405e2cc4e6c973aee453acff821f2777ba authored over 10 years ago by David Yip <[email protected]>
Teach pipeline how to handle a file containing URLs. #14.

6c6ae8f6fce60b61b55e4fd578f932602df4f180 authored over 10 years ago by David Yip <[email protected]>
Teach the bot to recognize !ao < FILE. #14.

This commit also teaches the bot to differentiate between !ao URL and
!ao < URL. (Cinch trigger...

1eb3b17db37f03c5724310de07ad29fe8adb001f authored over 10 years ago by David Yip <[email protected]>
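The bot itself is written in Ruby using Cinch triggers, and its actual pattern is truncated above. As a rough illustration of telling `!ao URL` apart from `!ao < URL`, here is a hypothetical parser sketched in Python. An optional `<` group decides whether the argument is a single page or a URL pointing at a list of URLs.

```python
import re

# Hypothetical pattern: "!ao URL" archives one URL; "!ao < URL" treats URL
# as a file containing URLs. Group 1 captures the optional "<".
AO_COMMAND = re.compile(r"^!ao\s+(<\s*)?(\S+)\s*$")


def parse_ao(message):
    """Return ("url_list" | "single", url), or None if not an !ao command."""
    m = AO_COMMAND.match(message)
    if m is None:
        return None
    kind = "url_list" if m.group(1) else "single"
    return kind, m.group(2)
```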
Eliminate stop_control atexit hook. #91.

The pipeline.on_cleanup event was added to seesaw in commit
658e53ecbaebd7f21f303ac203b048cb0b6c...

89fbbe5af1204c1c264794a088e3807112e8c518 authored over 10 years ago by David Yip <[email protected]>
Ignore &hidepatrolled=

3703d9487045eaa2548d77831e9760a4ff040766 authored over 10 years ago by Ivan Kozik <[email protected]>