Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/ArchiveBot

ArchiveBot, an IRC bot for archiving websites
https://github.com/ArchiveTeam/ArchiveBot

README: point to readthedocs; sync INSTALL references.

5f012c4f3e1d6e73ceb848f428133e25e89077bb authored over 10 years ago by David Yip <[email protected]>
Split INSTALL into backend and pipeline instructions.

7afbb960f1287206cc7e7177baa2609bb96e688a authored over 10 years ago by David Yip <[email protected]>
plumbing: Celluloid is no longer a dependency.

3ec5cde95360b076af502bd3b0c705f694c17b55 authored over 10 years ago by David Yip <[email protected]>
Make the test server exhibit more interesting behavior.

Specifically, this commit adds multiple links on each test page. This
means that concurrency le...

c5f02c0f2a1ca3773a2ec5a9b9701f3e0ff6df14 authored over 10 years ago by David Yip <[email protected]>
pipeline: lock at known-good trollius revision.

Trollius 1.0.1 contains a bug that, over time, causes wpull to fail to
retrieve data over any co...

06b19efd13ceda368e6f1045a1fcae5eff636f9e authored over 10 years ago by David Yip <[email protected]>
Merge branch 'remove-pykka' into next

e86553e38007ef870f82fff4753fbbe2a52e41fe authored over 10 years ago by David Yip <[email protected]>
Synchronize access to settings data.

The settings listener runs in one thread; wpull runs in another.

5355d9a4885bf52f1fcaceb507214979aae576cd authored over 10 years ago by David Yip <[email protected]>
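
The two-thread arrangement described in this commit (a settings listener in one thread, wpull in another) is the classic case for guarding shared state with a lock. A minimal sketch follows, not ArchiveBot's actual code: the `Settings` class, its method names, and the `delay` key are hypothetical and chosen only for illustration.

```python
import threading


class Settings:
    """Hypothetical settings store shared by a listener and a crawler thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def update(self, new_values):
        # Called from the listener thread when new settings arrive.
        with self._lock:
            self._values.update(new_values)

    def get(self, key, default=None):
        # Called from the crawler thread; reads take the same lock,
        # so a read never interleaves with an update in progress.
        with self._lock:
            return self._values.get(key, default)
```

With this shape, the listener would call `settings.update({"delay": 250})` and the crawler would call `settings.get("delay")`, and neither touches the underlying dict directly.
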
Remove Pykka.

Pykka's ThreadingFuture object is interacting with ArchiveBot's wpull
hooks and wpull in weird w...

9fc00ef233ccd0b7cdca6d7dcc251aa3fc78cfe1 authored over 10 years ago by David Yip <[email protected]>
Link to the new docs

87774aad46e4440978a8d8ed26ead86ab97b594c authored over 10 years ago by Ivan Kozik <[email protected]>
Document some more basic regexps

777ef833685f254c8d57d45f3d353fbd44191d75 authored over 10 years ago by Ivan Kozik <[email protected]>
Document all-except-domain-or-subdomain regexp

b2dc99c90965f3829d01d4204138d5e1937ee6bc authored over 10 years ago by Ivan Kozik <[email protected]>
No need for the .*

95d5140f5498e746bb6a77a6bb6ea5bc50eef46d authored over 10 years ago by Ivan Kozik <[email protected]>
Document all-except-domain regexp

db1c5f48fab1ba720baaf31d7365aa9cfaabab5f authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another Icecast site

eeb30dcb5721818abac229b9d8857681756cb278 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another mp3 streaming site

d14ad6719722646c081fc5ccc014c974397a7ac7 authored over 10 years ago by Ivan Kozik <[email protected]>
Move WARCs to FINISHED_WARCS_DIR as they finish

FINISHED_WARCS_DIR must be on the same drive as the pipeline data
directory to allow for atomic ...

74334038842bd7ef1df85e9a812a7dabe646b3b1 authored over 10 years ago by Ivan Kozik <[email protected]>
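
The same-drive requirement in this commit can be checked before a file is moved. The sketch below assumes the move is a plain rename; the commit text is truncated, so that mechanism is a guess, and `move_finished_warc` is a hypothetical helper rather than the pipeline's real function.

```python
import os


def move_finished_warc(src_path, finished_dir):
    """Move src_path into finished_dir using a single rename."""
    # Assumption: refuse to move across filesystems, where a rename
    # cannot be performed as one step.
    if os.stat(src_path).st_dev != os.stat(finished_dir).st_dev:
        raise OSError("source and destination are on different filesystems")
    dest_path = os.path.join(finished_dir, os.path.basename(src_path))
    os.rename(src_path, dest_path)
    return dest_path
```

Comparing `st_dev` is one simple way to detect that two paths live on different filesystems before attempting the rename.
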
Ignore dotfiles

df506deadd983a71dd68f936c6632011b450171b authored over 10 years ago by Ivan Kozik <[email protected]>
Sort files before choosing one to upload

27467c09abe01a6e6a2b66f9f7553bb36cb8a977 authored over 10 years ago by Ivan Kozik <[email protected]>
Add uploader script

ad0fbdbeee5d828e3c4b77b298d37d499a817ee1 authored over 10 years ago by Ivan Kozik <[email protected]>
Dashboard: use IEC prefixes for pipeline disk space readout.

GNU df insists on reporting GB as 2^30 bytes instead of 10^9 bytes.
(It'll use SI prefixes with ...

b604de7283eb6f94df4f20bde18e32f3e9d3f1b1 authored over 10 years ago by David Yip <[email protected]>
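
For reference, IEC prefixes are powers of 1024 (1 GiB is 2^30 bytes), while SI prefixes are powers of 1000 (1 GB is 10^9 bytes). A small sketch of formatting a byte count with IEC prefixes follows; `format_iec` is a hypothetical helper, not the dashboard's actual code.

```python
def format_iec(num_bytes):
    """Format a byte count using IEC binary prefixes (KiB, MiB, GiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


print(format_iec(2**30))   # 1.0 GiB
print(format_iec(10**9))   # 953.7 MiB
```

The second call shows why the labels matter: 10^9 bytes is noticeably less than one GiB.
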
Merge branch 'master' into next

424224bc0451380e6f3f853581a9d7cac1963c07 authored over 10 years ago by David Yip <[email protected]>
doc: code-format !igon/!igoff.

377619347234892b66301413e046e8ffccd8ab65 authored over 10 years ago by David Yip <[email protected]>
doc: describe !archiveonly < FILE.

0f39cbce5d3d634504713ceefaa8b917210653dc authored over 10 years ago by David Yip <[email protected]>
Rename "timestamp" to "last checkin".

IMO, this title works better with the relative time presentation of this
column.

9efea1ed85c6cbbb8286664918307e158f779c65 authored over 10 years ago by David Yip <[email protected]>
Display nickname and free space on pipeline report.

Closes #106.

e852f382c805f33c4c2d9bfd74e4e43d26a46c8b authored over 10 years ago by David Yip <[email protected]>
Dashboard: eliminate pipelines set and PublicPipelineRecord.

We now SCAN for keys matching pipeline:* and just return the result of
HGETALL on those keys. T...

b9637f03e7081a759d1800842023af823c9059c8 authored over 10 years ago by David Yip <[email protected]>
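
The SCAN-then-HGETALL approach in this commit can be sketched as follows. This is an illustration in Python, not the dashboard's own code. `list_pipelines` relies only on `scan_iter(match=...)` and `hgetall(name)`, which redis-py clients provide; `FakeRedis` and the `nickname` field are hypothetical stand-ins so the sketch runs without a server.

```python
import fnmatch


def list_pipelines(client, pattern="pipeline:*"):
    """Return {key: hash fields} for every key matching pattern."""
    # scan_iter wraps the incremental SCAN command; hgetall wraps HGETALL.
    return {key: client.hgetall(key) for key in client.scan_iter(match=pattern)}


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client."""

    def __init__(self, hashes):
        self._hashes = hashes

    def scan_iter(self, match=None):
        for key in self._hashes:
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))
```

Because the keys themselves are the index, no separate set of pipeline identifiers has to be kept in sync with the hashes.
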
Merge branch 'master' into dupes-db

0561a0ab0306ba71b1bde8c7bfcd388631be8e51 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge branch 'master' into next

9af99498e1e2b8ba5526733ab63f08ec36dbf24c authored over 10 years ago by David Yip <[email protected]>
Merge remote-tracking branch 'origin/master' into next

Conflicts:
pipeline/requirements.txt

6cf85d77bafa0698c099c347068c0c28ae730239 authored over 10 years ago by David Yip <[email protected]>
Merge pull request #105 from chfoo/topic/pipeline_monitor

Add pipeline monitor page for dashboard.

f257887d40dfd160216cbd0793e3124dd7529b06 authored over 10 years ago by yipdw <[email protected]>
Add pipeline monitor page for dashboard.

4798519fff92b559417ccedc1ea9d1aef67bdd26 authored over 10 years ago by Christopher Foo <[email protected]>
Fix very broken Google Finance ignore

e0a876e7f59d8ffaacc6073dd92e206696b4ecae authored over 10 years ago by Ivan Kozik <[email protected]>
Merge branch 'master' of github.com:ArchiveTeam/ArchiveBot

669747243941139d0fbd4e21380164e565f6ca44 authored over 10 years ago by Christopher Foo <[email protected]>
pipeline: Bump version. requirements.txt: wpull>=0.1000

71480d208ac9da2170877d9eec3ada67d5871f5a authored over 10 years ago by Christopher Foo <[email protected]>
pipeline: Pass the phantomjs path found by the pipeline to wpull.

0793f924567a2d0bf2c409fd0801eda456301ecf authored over 10 years ago by Christopher Foo <[email protected]>
Support all Google TLDs in Google Finance regexp

fd2eef3546078e5a89722a6fecd9016c8a6b7075 authored over 10 years ago by Ivan Kozik <[email protected]>
pipeline: Include nickname and Python version in monitor stats. Bump version.

Closes ArchiveTeam/ArchiveBot#79

a8aa519aa81d97d525a4fe98492417c8104a22ab authored over 10 years ago by Christopher Foo <[email protected]>
Ignore more incorrect flickr URLs

7e68ff51e2c62e7736b081b121c252b0ae2fd032 authored over 10 years ago by Ivan Kozik <[email protected]>
Fix google finance ignore

59c5f41a43d894eedf6ec19e5e9934e71a1e8574 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another Icecast site

215db966e5a464f1c8c818d91753c6fddc9a3790 authored over 10 years ago by Ivan Kozik <[email protected]>
Fix flickr rule

b6b5b128f0b1c9fdb5fd17ebe1d9bf28e72df101 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore incorrect flickr URLs found by wpull

69b9425711f683a94b2666d87a962da6c2eb6ece authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore webcam streams

7367a30cdae333acecc48eb33a7d29f016826c46 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore Google finance pages that wpull finds

Consider removing this after page requisites of page requisites/linked pages are not grabbed

c14553a1bd35a1b56f95a9943b4bdbb61e72519a authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore another mp3 streaming site

6c3f01b5cd6410f7a5e07b489c3dc4a930638da0 authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore more share links

91d3dda8d3b886394a29178e59af6f6fa2febb35 authored over 10 years ago by Ivan Kozik <[email protected]>
doc: correct typo in "ignorereports"

ab4f38d7169aa406944f24721028489dd1d305b8 authored over 10 years ago by David Yip <[email protected]>
Merge pull request #102 from ArchiveTeam/issue/100-doc

Add Sphinx documentation. Convert COMMANDS to doc/commands.rst.

61d4ca07af9d0168447aa50dbac566fb3b575f85 authored over 10 years ago by yipdw <[email protected]>
Add Sphinx documentation. Convert COMMANDS to doc/commands.rst.

RE: ArchiveTeam/ArchiveBot#100

5e7f3f85eab14f34154dc35d37ff00aaecd9140b authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'master' into dupes-db

3a017298761d950b8762ca86be4560f34fc0a614 authored over 10 years ago by Ivan Kozik <[email protected]>
Drop data: URLs before any ignore checking or logging

Else you sometimes get massive data: URLs in your dashboard

bf17c2f9e79d7193ab1181773aac1bc9a6d8dd89 authored over 10 years ago by Ivan Kozik <[email protected]>
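
The ordering this commit describes (discard data: URLs first, then run ignore checks and logging) might look like the sketch below. `filter_urls`, `is_ignored`, and `log` are hypothetical names; the real hook code is not shown in this listing.

```python
def filter_urls(urls, is_ignored, log):
    """Return the URLs to fetch, dropping data: URLs before anything else."""
    accepted = []
    for url in urls:
        # Drop data: URLs first so they never reach ignore checks or logs.
        if url.lower().startswith("data:"):
            continue
        if is_ignored(url):
            log(f"IGNORED {url}")
            continue
        accepted.append(url)
    return accepted
```

Putting the data: check first means a multi-kilobyte inline image never appears in a log line, whatever the ignore patterns say.
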
Ignore another default gravatar

28405fd05943be94f17a63fc7570d2e2e10f5016 authored over 10 years ago by Ivan Kozik <[email protected]>
Pass a --dupes-db for ludios/wpull@hax-6

2c2df48886f5e736b610e0f457ee56a242e158c8 authored over 10 years ago by Ivan Kozik <[email protected]>
Add nosortedindex ignore set

9d759734e94163f44a76ae8d9dbb48fc14c249dd authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore linkedin loop

Remove this when wpull has dupe detection

0a436e89d7f992d1431061862bb1e3d4562e75b9 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge branch 'master' into next

3d43002aa8f1946d1ade65d125bbb1c79ceab2a5 authored over 10 years ago by Ivan Kozik <[email protected]>
Remove link to dashboard2

d93a16c0785c35a17758603dfe9fa050665fed63 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge remote-tracking branch 'origin/master' into next

Conflicts:
dashboard/assets/javascripts/controllers.js.coffee
dashboard/assets/javascripts/mod...

0665685824acfb0777b7a43eb1a7610420f16b66 authored over 10 years ago by David Yip <[email protected]>
Update Gemfile.lock

d08e4593afbd9723fe98333bac05691f3b41c3f0 authored over 10 years ago by Ivan Kozik <[email protected]>
Fix ws:// and /logs/recent URLs

ee25b653b52b5ccaa50d33f9f14c4dcfe6a2eb73 authored over 10 years ago by Ivan Kozik <[email protected]>
Move the new dashboard into place and remove the old dashboard

ff64febeadf0af8055ec8ac1c1fbccb300b16fdd authored over 10 years ago by Ivan Kozik <[email protected]>
Copy links to RSS/Atom feeds and favicon from old dashboard

c5cfe8477bdd064864e4ac54bfda21bc1d355a6a authored over 10 years ago by Ivan Kozik <[email protected]>
Merge branch 'master' of https://github.com/ludios/dashboard2

357a86c3c8e9841ff81965aaa3e235cc46a44708 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge branch 'master' into next

Conflicts:
pipeline/requirements.txt

534bb622c09d6fad7c5812d3f22e06bd7106252c authored over 10 years ago by David Yip <[email protected]>
Ignore another share link

517efaccac5338a41c9e8f2ea16b4f429f6342c4 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge remote-tracking branch 'origin/master'

eabd848a0a419a1cdc2a2b1ab00424f2de007758 authored over 10 years ago by David Yip <[email protected]>
Describe --no-offsite-links. #90.

0d4f68111820e2b10d3828c6c6cc7a87f821f0b9 authored over 10 years ago by David Yip <[email protected]>
Bump pipeline version.

5929b1bfe6bd057db548eb1f21c90238992f2616 authored over 10 years ago by David Yip <[email protected]>
Remove duplicate span-hosts-allow argument. #90.

e538b6d313a1a9c7e8f9602c3703824d3b8c9aa1 authored over 10 years ago by David Yip <[email protected]>
Teach the bot about --no-offsite-links. #90.

994a45f146ca4d880b04f4cca8b5630369ce6154 authored over 10 years ago by David Yip <[email protected]>
Use relative imports for the Lua pattern conversion test.

This gets them running using nosetests under Python 3.4.

8b6719f936a11d783b20f5c8a1348c76da9fbea8 authored over 10 years ago by David Yip <[email protected]>
Teach the pipeline a no_offsite_links item attribute. #90.

Also disable span-hosts-allow=linked-pages in !ao mode and add a test
suite for wpull argument g...

a78e5c0549aafecc2f4a86c50dee41ab3dfeb340 authored over 10 years ago by David Yip <[email protected]>
Revert "Also ignore pages on np.reddit.com"

These pages are most likely comments linked from a www.reddit.com
page, and we don't want to ign...

f2bae7db0765910644474ae2f9990ec5c7e69b88 authored over 10 years ago by Ivan Kozik <[email protected]>
Also ignore pages on np.reddit.com

18d84ee3c73c54593df96a2d4740c12b4bf2a7f1 authored over 10 years ago by Ivan Kozik <[email protected]>
Add !igoff, !igon commands.

Added after I accidentally typed !igoff and decided that maybe adding it
wasn't such a bad idea.

8db6c063b0ce16b5b61e77f21974b57fec47f7b6 authored over 10 years ago by David Yip <[email protected]>
Ignore another Icecast site

7290a706856d65d02d64d49573e8e24ff549bc6d authored over 10 years ago by Ivan Kozik <[email protected]>
javascript: links don't work with target=_blank in IE11, so don't use one

bc374f02a38c98731e11b0c7829d629c5dcb5ca8 authored over 10 years ago by Ivan Kozik <[email protected]>
Merge pull request #97 from mback2k/topic/job_status

Show concurrency level and delay in job status report

954501751603275be440df6d3882cebc458183c3 authored over 10 years ago by yipdw <[email protected]>
bot: Show concurrency level and delay in job status report

dd030951bfd815d465b19a4b3270f7d344f4fc73 authored over 10 years ago by Marc Hoersken <[email protected]>
Format some long lines

0761ceb0ac9296057eb8dd231b76069f35c04854 authored over 10 years ago by Ivan Kozik <[email protected]>
Mobile devices have less memory, so give them fewer history lines

dd64f32f058688858036538eff7f5640e1fc0874 authored over 10 years ago by Ivan Kozik <[email protected]>
Can't hurt to actually have an <html> tag

b379fe1163b1fd78a499ae4a29f03ddcf278f681 authored over 10 years ago by Ivan Kozik <[email protected]>
Remove some unnecessary checks

7da6cf837ecca2fcf019639115999893b296eddc authored over 10 years ago by Ivan Kozik <[email protected]>
Refactor

f4dbb854fd85b5616826a984731f31a09221237f authored over 10 years ago by Ivan Kozik <[email protected]>
Update stats on any message type; show ignore suppression status

ab1844fe9b03cc5de085adf4090ba46ff0a3896a authored over 10 years ago by Ivan Kozik <[email protected]>
Minor tooltip tweak

767131330fc3339ecceae40415d4deadfd0afb2a authored over 10 years ago by Ivan Kozik <[email protected]>
Provide mouseover tooltip for more info about start date or request counts

8fd1c9f907fc39f9cb3afd3eb611dd7fc05493f8 authored over 10 years ago by Ivan Kozik <[email protected]>
Mention ignore sets and GitHub links

0637fa432760e0c448bf9dfca6af24ec27d08506 authored over 10 years ago by Ivan Kozik <[email protected]>
Add some help

658dc6b604cb64ad191a7455b25e6dc72a7b7f33 authored over 10 years ago by Ivan Kozik <[email protected]>
Decrement jobs counter when a job finishes

e79b0e9e4dd9ffc74f7b507ad3f7293a224fde7f authored over 10 years ago by Ivan Kozik <[email protected]>
Cachebreak the request to /recent to avoid possible AJAX caching in Firefox

9584768033b8c28a54d4e498a1eb2077d0f4d2e6 authored over 10 years ago by Ivan Kozik <[email protected]>
Clarify JavaScript notice

588b163564262777d353d5235207f0723f203873 authored over 10 years ago by Ivan Kozik <[email protected]>
Nicer delay representation

29aa1f24b309790048b75ddbc6e8d0bc0f77faf1 authored over 10 years ago by Ivan Kozik <[email protected]>
Show # of connections, delay

cae0e663e798f7d93bc7a2ad490fba733f5bdb3e authored over 10 years ago by Ivan Kozik <[email protected]>
Don't show unnecessary horizontal scrollbars

862420b4e13dc2737fbc71bccd9bd688828f2eff authored over 10 years ago by Ivan Kozik <[email protected]>
Adjust padding

866285170cf88ba1c34f068945835485707033a3 authored over 10 years ago by Ivan Kozik <[email protected]>
Use padding instead of space character, to avoid messing up text copies

f526e8d50750dc90d5f966e95630026046625ddf authored over 10 years ago by Ivan Kozik <[email protected]>
Show ignores

595de9f317c5bd3cbb82a52b136df500c32e693a authored over 10 years ago by Ivan Kozik <[email protected]>
Ignore /navbar.g because wpull doesn't decode the URL properly

14c2d88ff3ea55f4118046a61cb328c0d6b33318 authored over 10 years ago by Ivan Kozik <[email protected]>
Switch to ludios/wpull@hax-3.

This is https://github.com/chfoo/wpull @ develop with some debugging
additions and database fsyn...

414806825a6acff8a6ffd15f004144a174a87852 authored over 10 years ago by David Yip <[email protected]>