Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
https://github.com/ArchiveTeam/grab-site

README: Tweak

75452363d03bae54383f5fc272c6b34de6eb1a6a authored about 8 years ago by Ivan Kozik <[email protected]>
Opt out of Chrome's misbehaving Scroll Anchoring

ec5cc3f287fe67c2437718d3d7e8ddf591d0a76d authored about 8 years ago by Ivan Kozik <[email protected]>
Bump Firefox UA

c30e92ee02f1a9d44d3970ab4a7b6a4b46656ed8 authored about 8 years ago by Ivan Kozik <[email protected]>
chmod +x gs-dump-urls

76e173f1c9e3a5e27dbf8863d45d4c93a935d403 authored over 8 years ago by Ivan Kozik <[email protected]>
chmod +x pause_resume_grab_sites.sh

957c8f8aeec00d0feb33dce3b6a7f575ac98a792 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: Add getpocket.com/save

90d747d9bc688243340d3836d7425b2eedaba2c1 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: Ignore another loop

9e360a8a13689c6634fd98788b530745c618ff6d authored over 8 years ago by Ivan Kozik <[email protected]>
Bump version

2136bae9be3c4ebd06d276ed2754d11c7ef4e99b authored over 8 years ago by Ivan Kozik <[email protected]>
Revert "dashboard: Tweak font stack and size"

This reverts commit 3f31e251784fa4986751ce5613ef4884fe7656ec.

8a9a0b0d9f3f8ac4c60fdd784185527698d24f01 authored over 8 years ago by Ivan Kozik <[email protected]>
Revert "dashboard: Improve alignment when using a font with variable-width numbers like San Francisco"

This reverts commit 154e99349ca4233d5233582c806719fa0564e1e8.

4ef946300f4ea62d84511c7281bb17905deb88ea authored over 8 years ago by Ivan Kozik <[email protected]>
dashboard: Improve alignment when using a font with variable-width numbers like San Francisco

154e99349ca4233d5233582c806719fa0564e1e8 authored over 8 years ago by Ivan Kozik <[email protected]>
dashboard: Tweak font stack and size

3f31e251784fa4986751ce5613ef4884fe7656ec authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: remove some icecast stations that are covered by the icecast skipper in wpull_hooks

19738bf60377981ad16cf1a03fb305ac63ee58fd authored over 8 years ago by Ivan Kozik <[email protected]>
.travis.yml: Change URL to try to get an exit code of 0

ecf9b2e717d87f511ed5c0f23428f5815c74eb39 authored over 8 years ago by Ivan Kozik <[email protected]>
Set each timeout individually and use a session-timeout of two days

(we want to avoid hanging crawls forever, but we don't want to prevent the
downloading of large f...

659b25481ed024859ab20b3a3d99f871069bd4c0 authored over 8 years ago by Ivan Kozik <[email protected]>
README: Advise downgrading tmux, not upgrading with some ppa

016a166f14588cc8aa44a566bcbb9d2544a408bd authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: fix addtoany.com ignore and use https?:// for all ignores

6685bf51a827c59561ce725712440eab63ea9cf6 authored over 8 years ago by Ivan Kozik <[email protected]>
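
For readers unfamiliar with the `https?://` convention named in the commit above, here is a minimal sketch of what such a pattern does. The pattern and the `is_ignored` helper are hypothetical illustrations and are not copied from grab-site's global ignore set; the point is only that the optional `s` lets one regular expression cover both the http and https forms of a URL.

```python
import re

# Hypothetical ignore pattern written in the `https?://` style.
# This is an illustration, NOT an entry from grab-site's global igset.
IGNORE = re.compile(r"^https?://www\.addtoany\.com/")


def is_ignored(url: str) -> bool:
    """Return True when the URL matches the pattern under either scheme."""
    return IGNORE.search(url) is not None


for candidate in (
    "http://www.addtoany.com/share",
    "https://www.addtoany.com/share",
    "https://example.com/",
):
    print(candidate, is_ignored(candidate))
```

A pattern anchored on `http://` alone would let the https variant of the same link through, which is presumably why the commit applies the convention to every ignore.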
README: Fix note; wpull 2.0.1 does work on Python 3.5

d66011e86a98b667d73d1dab9b23c4a665c180d1 authored over 8 years ago by Ivan Kozik <[email protected]>
Lock html5lib version to work around https://github.com/chfoo/wpull/issues/332

63fdb9d5c6ef61998f513f421f50e6a0ea2166c3 authored over 8 years ago by Ivan Kozik <[email protected]>
.travis.yml: Upgrade setuptools to try to fix html5lib install failing due to old setuptools

f4230097eb51d780162d835f4a821820b4d153a3 authored over 8 years ago by Ivan Kozik <[email protected]>
.travis.yml: Upgrade pip3 to try to fix html5lib install failing due to old setuptools

20bee1bdf5e77360936879bae13fdf1007704157 authored over 8 years ago by Ivan Kozik <[email protected]>
dashboard: fix '?' shortcut key

32b68e93425c7b47bd7d89f8ae48cd7f480da82c authored over 8 years ago by Ivan Kozik <[email protected]>
README: Add warning about tmux 2.1

ca7bc71045784b1cfda2a6d1d6dbc44a4000aa13 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: make libsyn ignore libsyn-specific

e7fcec6f858a32e6d36f9925834b8d49e54ab16c authored over 8 years ago by Ivan Kozik <[email protected]>
Lock wpull dependency to 1.2.3 for now

76f8b2cf488c8ba719bed8227b36c2d79901e662 authored over 8 years ago by Ivan Kozik <[email protected]>
README: bugs and questions

c313bfb2a1566cfd53c07edefe2a9154eb2b7021 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: Add facebook login.php

c08f229e1daa02aaf8d1f01fe01a6f5d027fa3b8 authored over 8 years ago by Ivan Kozik <[email protected]>
Bump Firefox UA

450a5c394f1567e7a59e68c0784d50872c25926b authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: Add /%3Ca%20href= pattern

febee9c85eb1421fd54b4e4941db6067c18269f5 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: Add /%20https?:/ pattern

fb6e01caa777528c0222c3bd8caf32b155806537 authored over 8 years ago by Ivan Kozik <[email protected]>
Update grab-site URL in setup.py and dashboard

aa366bbb27a82155a7fcf273cd57a3b1d75a2447 authored over 8 years ago by Ivan Kozik <[email protected]>
Stop listening on legacy ws port 29001

842fab4b23ef0c6a8302449affc63592c36ca05c authored over 8 years ago by Ivan Kozik <[email protected]>
Actually install the favicon.ico

f0bb696dc80bc05027a5993d60251800e18645fa authored over 8 years ago by Ivan Kozik <[email protected]>
dashboard: Add a favicon

b3c75b0ffb02611fb52bceb0b981f629bf00a99e authored over 8 years ago by Ivan Kozik <[email protected]>
dashboard: Allow for another digit in the MB stat

7df1761bf0fe54e47f6a5727f5edadb610091a5e authored over 8 years ago by Ivan Kozik <[email protected]>
dashboard: Align the req/s stat properly

8d93776742a638884cc43b40c91af0b0ba0e1516 authored over 8 years ago by Ivan Kozik <[email protected]>
Don't raise an exception if client lacks User-Agent

38877106ef0559e4c844da7aaaec50c699cbb6cc authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: ignore amp%3Bamp%3Bamp%3B loops

fe2530e667230d46ba18912656cad284d4ec1f31 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: tumblr serves 16px avatars on https now as well

01ac84da068a51a7cc4653e254aa4db06389d508 authored over 8 years ago by Ivan Kozik <[email protected]>
global igset: Ignore instapaper share links

ffecfcabdae8f34927a4eda7eadd0ee50db68433 authored over 8 years ago by Ivan Kozik <[email protected]>
README: Fix TOC link

bd375b31f376adacc7324ca5f06265ce7762fa4a authored over 8 years ago by Ivan Kozik <[email protected]>
README: Add Ubuntu 16.04 instructions

bcb7d8832b126ab8ac25aff0f6b4183d2d45c8e1 authored over 8 years ago by Ivan Kozik <[email protected]>
README: Add note about multiple URLs

987893eeff0d7603f5a5a958d1da3e251f2cc28d authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Advise to restart gs-server after upgrading

675274296305ceec5ff87fd44093c2bb754ed06e authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

54320040617c34de9c66bf71c3e30464424a81b8 authored almost 9 years ago by Ivan Kozik <[email protected]>
grab-site 0.11

316db6eec4bb06241c2c37074d326babb2bc34a5 authored almost 9 years ago by Ivan Kozik <[email protected]>
Bump dependencies since we only test with the latest versions

3ee97f9fcd866806012ac7db45747a26bd9da6f3 authored almost 9 years ago by Ivan Kozik <[email protected]>
Merge branch 'ws-http-same-port'

49271c77e80003ac20262ad72ab54891b6cbe6ae authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Be more specific about port deprecation

f9db61259c5288bcf2cea1c4d4e0ef7424a9aeed authored almost 9 years ago by Ivan Kozik <[email protected]>
s/started/listening/

bfb866dfed47e36d5036f0c19b3aa701129016ea authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Add note about port 29001

147e6d913fa32baae1b9f390806e8f9b8e883a7b authored almost 9 years ago by Ivan Kozik <[email protected]>
Listen on port 29001 as well until everyone's old port-29001 crawls finish

fcc3a7f9046386c16120f3e8afe5e838a68999ec authored almost 9 years ago by Ivan Kozik <[email protected]>
Don't allow anyone to frame/iframe us

be8418f52c8b4fd0c2d79d57491c0a87c7a3d7f9 authored almost 9 years ago by Ivan Kozik <[email protected]>
Factor out a sendPage

0f5342b9c09afda832eb5018befe3d202ef667a8 authored almost 9 years ago by Ivan Kozik <[email protected]>
Read bytes from file and send the correct Content-Length

25b3c9c076fa81be19bc122f519d7c6fb16be9ab authored almost 9 years ago by Ivan Kozik <[email protected]>
Use \r\n instead of \x0d\x0a

7486daa33f50f79e136a462bfdcdd414bd5e6a74 authored almost 9 years ago by Ivan Kozik <[email protected]>
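
The two commits above touch on details that are easy to get wrong when writing an HTTP response by hand: `Content-Length` counts bytes rather than characters, and header lines end in CRLF, where `\r\n` and `\x0d\x0a` are the same two bytes spelled differently. The sketch below is an assumption-laden illustration, not the code from grab-site's server.py; `build_response` is a hypothetical helper and the body is held in memory rather than read from a file.

```python
def build_response(body: bytes, content_type: str = "text/html; charset=utf-8") -> bytes:
    """Build a minimal HTTP/1.1 200 response around an already-encoded body."""
    headers = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # len() of a bytes object is the byte count, which is what the header needs.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separates headers from body
    )
    return headers.encode("ascii") + body


text = "<p>café</p>"
body = text.encode("utf-8")  # 'é' takes two bytes in UTF-8
response = build_response(body)

print(len(text), len(body))   # character count vs byte count
print("\r\n" == "\x0d\x0a")   # both spellings are the same string
```

Opening a file in binary mode (`open(path, "rb")`) yields bytes directly, so the length of what was read is the length that should be advertised.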
Fix comment

7024f1e232d4c21b460d45ce29942c299e4f6248 authored almost 9 years ago by Ivan Kozik <[email protected]>
Merge branch 'pr_76' into ws-http-same-port

c20a0f76badb3fa741367a5c88025ff0e4e3c18a authored almost 9 years ago by Ivan Kozik <[email protected]>
Update server.py

fixed errors

5da8e85d433d43f5f302ad5b1f45009c8df68510 authored almost 9 years ago by 12As <[email protected]>
Update server.py based on comments in PR #76

Changed server to handle paths, added a send404 method and fixed letter case on variables

e7e3030d4ee220623a5f9543987567d49c540679 authored almost 9 years ago by 12As <[email protected]>
Update dashboard.html, per comments in PR.

3f0c4336682ae1dd6a7ca3c823075f0bc70dfff0 authored almost 9 years ago by 12As <[email protected]>
Create 404 Not Found page

4a51ee1c8377968278ebe0f0105449614fb9a0ca authored almost 9 years ago by 12As <[email protected]>
Remove aiohttp dependency from setup.py

990f24684a6ff8dd299c288541d4316d0f601970 authored almost 9 years ago by 12As <[email protected]>
README: Expand on abuse@ complaints

21ab037cf122df94ceaf6e4e276efb9351f01995 authored almost 9 years ago by Ivan Kozik <[email protected]>
Add more tests to .travis.yml

b368a7b210ec9bb304ea54c777c238f65ed2b464 authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

be685a7e65603c6e18cd0d1d612d82a17f32fbef authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

37910d0382a66d60a82e7a14b2434f5def894d79 authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Document --which-wpull-args-partial and --which-wpull-command

eb6f90ad84b39f49c856fba27545abdade7ed5e1 authored almost 9 years ago by Ivan Kozik <[email protected]>
Rename --which-wpull-args-full to --which-wpull-command

506a7604efdc6bf868e3375435776861744b67f3 authored almost 9 years ago by Ivan Kozik <[email protected]>
Implement --which-wpull-args-partial and --which-wpull-args-full for figuring out which wpull arguments grab-site would use, without actually starting wpull

5805e4c155eb07f41436a0c7533c7578e20ade54 authored almost 9 years ago by Ivan Kozik <[email protected]>
Pass maybe_log_ignore and print_to_terminal as globals to custom_hooks.py as well

bda4d8cf6d491404f92ae6ce9c4d9cf81a1baec9 authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Point to update_custom_hooks

bc04df2dd2196a1435aebb8cdfec1caf9831263a authored almost 9 years ago by Ivan Kozik <[email protected]>
Move pause-or-resume grab-sites script to extra_docs/

4b0740ecf6bafff7fa421dd10b2c84983ead8472 authored almost 9 years ago by Ivan Kozik <[email protected]>
Implement --custom-hooks so that users can modify wpull_hook

c37b32bd1c95a39b7af92917c20d423c26b183af authored almost 9 years ago by Ivan Kozik <[email protected]>
Update Readme.md to remove multiple port mention

e01e0a38b1c39a5d61be2341d42549afb27f0a28 authored almost 9 years ago by 12As <[email protected]>
Change default port in wpull_hooks.py

dec4150969070429fbbf0b7d21222832777d6935 authored almost 9 years ago by 12As <[email protected]>
Change default port in dashboard.html

492625a09dadff48b01b4cc6effbc8f4f9255cf3 authored almost 9 years ago by 12As <[email protected]>
Make gs-server use single port

Remove the need for gs-server to require multiple ports

53e079228d778bde99df5bd3ff9626793d7aac14 authored almost 9 years ago by 12As <[email protected]>
README: Tweak

7f426d2aadeceb555f35187045cb0ab24d9bde3c authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

557ae40982185aa33c459663103151859ed19f95 authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

67d650c28840eadad459fb4a20a37dab2ad33fec authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Fix GCE name

6189c7c124781123d19aba14ed9ac74f30e3006d authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

08a90865a2d5dd4e52a7d2e81ea8c1d1ef6a76f0 authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Add warnings section

e53a8a8956e10a6acd1d9dcacf39b1c0c446d2ff authored almost 9 years ago by Ivan Kozik <[email protected]>
README Move pause-resume-grab-site script to a gist

0f2d7afc8f02104584c83d25227e279859c2cd52 authored almost 9 years ago by Ivan Kozik <[email protected]>
Bump version

292682a48f176883e426bd794d48962f0f747d5c authored almost 9 years ago by Ivan Kozik <[email protected]>
gs-server: Use env instead of py3 directly, makes virtualenvs nicer

95012d1e0c233ce6ebcd448584fed45d3477b19a authored almost 9 years ago by Daniel Oaks <[email protected]>
README: Tweak

12eb6e20afec45411f18dc9adfcf8145c67f730f authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Document locale requirement on OS X

c7927a155fa4d6599d00343fb14782bec85d7ba5 authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Tweak

dbed789ff99f976bf5f8a793d11aecde37af552d authored almost 9 years ago by Ivan Kozik <[email protected]>
README: Automatically pausing grab-site processes when free disk space runs low

50ff7759ba542c9a68286ad742292ce939ecd5eb authored almost 9 years ago by Ivan Kozik <[email protected]>
Use wpull 1.2.3

7c1afbefe0471e5a71059433f0d7b45c9f8d16ff authored almost 9 years ago by Ivan Kozik <[email protected]>
Update UA

ef5137ae86d224bedbcbbdef23841f9140729738 authored almost 9 years ago by Ivan Kozik <[email protected]>
.travis.yml: a grab-site of github.com/ludios/grab-site is returning exit code 8 because img.shields.io is down, so crawl google.com instead

1e06fa0a6e442a12a50530f7564bcc57133d56f3 authored almost 9 years ago by Ivan Kozik <[email protected]>
global igset: Ignore /CSI/CSI/ loops on blogspot

7ec2f90534b924bd56281f119faf142f96442623 authored almost 9 years ago by Ivan Kozik <[email protected]>
global igset: ignore bogus /search/label/CSI/ links on blogspot

0214558d5ebca8bfb571a1a9af49873e85c4f993 authored almost 9 years ago by Ivan Kozik <[email protected]>
Mention that the defaults work for Discourse forums

192553d6b90d58e1dbe1deb17519bcabecbacebc authored almost 9 years ago by Ivan Kozik <[email protected]>
Document how to grab GitHub issues and PRs

4315cf183170f31a9887ed0db367b2ad6dae448e authored almost 9 years ago by Ivan Kozik <[email protected]>
global igset: ignore /CaptchaImage.axd

3b9f8c1a4c4c587c4184588c77bf94eea45b0c01 authored almost 9 years ago by Ivan Kozik <[email protected]>
global igset: also ignore www.digg.com/submit

dff87eba2f6748c83fbb7ea5d622b8b1f20dfe02 authored almost 9 years ago by Ivan Kozik <[email protected]>