Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
https://github.com/ArchiveTeam/grab-site

Pin ludios_wpull dependency to version 3.0.9

8016234f3b0cac0e275137c977d6d177dbc34f2a authored 12 months ago by JustAnotherArchivist <[email protected]>
README: install on NixOS: this _can_ be run as root

dfb99dfdcd8b68d31612ab493044244f4f455f28 authored about 1 year ago by Ivan Kozik <[email protected]>
README: nixpkgs 22.11 -> 23.05

edca2cda84be00ae19c39132dc2bc7febb34e05a authored about 1 year ago by Ivan Kozik <[email protected]>
README: verify the install steps on Ubuntu 22.04; remove EOL'ed Ubuntu 16.04 and Debian 9 (stretch)

80919db0acd6a68d351561711ac8e7429fd5facb authored about 2 years ago by Ivan Kozik <[email protected]>
README: Python 3.8.13 -> 3.8.15

f6a73c1c56d69cda711e76b182392271ebb97a29 authored about 2 years ago by Ivan Kozik <[email protected]>
README: nixpkgs 22.05 -> 22.11

98dcecc212cc9bb52ff9a489b54e37628d708abe authored about 2 years ago by Ivan Kozik <[email protected]>
Revert "README: add nix-env install steps for installing the latest version of grab-site"

This reverts commit 32b127067e92d2c2d3aa8286cdd6ff4abe9c0955.

fc6ced917fbde8cebc6f9deff9c9da5b0a1a5356 authored about 2 years ago by Ivan Kozik <[email protected]>
README: tweak BrowserStack message

b297c9e28009730b39a7bb95be781a1b0dceed7f authored over 2 years ago by Ivan Kozik <[email protected]>
Revert "README: remove BrowserStack mention"

This reverts commit e45f6f5b97f6104693c9b35bd6f98d92c9a200c0.

eed00f234adb62dadfb61498eb14fb5061f56c24 authored over 2 years ago by Ivan Kozik <[email protected]>
README: remove old tmux 2.1 note

b659255c1c08eafcea7cedc10c5fe17d9274ef4c authored over 2 years ago by Ivan Kozik <[email protected]>
README: add nix-env install steps for installing the latest version of grab-site

32b127067e92d2c2d3aa8286cdd6ff4abe9c0955 authored over 2 years ago by Ivan Kozik <[email protected]>
2.2.7

a2e49a1d9656f504b3958de8261d64d6c5ab897d authored over 2 years ago by Ivan Kozik <[email protected]>
Update Firefox UA

077d61af03bb3776de8ebc68486270210db4f41e authored over 2 years ago by Ivan Kozik <[email protected]>
2.2.6

951522ce650e27a5a0e24ea2375f5f882b805a38 authored over 2 years ago by Ivan Kozik <[email protected]>
README: update link to ludios_wpull

f04e5f61662450e0c16f726859fad0f0c39e76d6 authored over 2 years ago by Ivan Kozik <[email protected]>
README: document --no-global-igset

5ee2132472f1502c63c8951977e8f8b213acbf2b authored over 2 years ago by Ivan Kozik <[email protected]>
Fix: when there are no ignores, ignore nothing rather than everything

d2bf2844dc7ed18693ced7d9c4e51686d5dc42c1 authored over 2 years ago by Ivan Kozik <[email protected]>
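
The fix above matches a common pitfall when many ignore patterns are joined into a single regular expression: joining an empty list produces the empty pattern, which matches every string. The sketch below only illustrates that pitfall and one way to guard against it. It is not grab-site's actual code, and the helper names `compile_combined_ignore` and `should_ignore` are hypothetical.

```python
import re

def compile_combined_ignore(patterns):
    """Hypothetical helper: join ignore patterns into one regex.

    Returns None when there are no patterns, because "|".join([]) is the
    empty string and an empty regex matches every URL.
    """
    if not patterns:
        return None
    return re.compile("|".join(f"(?:{p})" for p in patterns))

def should_ignore(url, combined):
    """Ignore a URL only if a combined pattern exists and matches it."""
    return combined is not None and combined.search(url) is not None
```

With this guard, an empty ignore list ignores nothing, while a non-empty list behaves as before.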
Don't crash when igsets file is empty

This fixes:

Traceback (most recent call last):
File "gs-venv/lib/python3.8/site-packages/libg...

a178d76ea720f77e4e06a2406585c04f468a7164 authored over 2 years ago by Ivan Kozik <[email protected]>
Add --no-global-igset for starting a crawl without the "global" ignore set

4994331eea218f2cd11725857c05dd356cbfc4db authored over 2 years ago by Ivan Kozik <[email protected]>
2.2.5

0e67d79ae70b2b3bf41e99c3f20dcbdae5c3f21d authored over 2 years ago by Ivan Kozik <[email protected]>
global igset: don't ignore Wikipedia thumbnails

If you would like to keep using this ignore, add it to a file and use --import-ignores=FILE

09b26c88fd8d548ec9a20aaec40fcc5129ae6fa1 authored over 2 years ago by Ivan Kozik <[email protected]>
2.2.4

a8538b01184825b19934651a67fcf637377f632e authored over 2 years ago by Ivan Kozik <[email protected]>
Merge pull request #222 from JustAnotherArchivist/warc-header-gs-version

Record grab-site version in WARC headers

df06e1441579672e0d4dcb2f660ab4465d4e757d authored over 2 years ago by Ivan Kozik <[email protected]>
README: nixpkgs 21.11 -> 22.05

cb477e68a5d6d2d82f22de6c05b5517a784973e5 authored over 2 years ago by Ivan Kozik <[email protected]>
Record grab-site version in WARC headers

a2ca38053450cc7ea9656f6a935ad736839029d1 authored over 2 years ago by JustAnotherArchivist <[email protected]>
2.2.3

9e6e95b5132bcfe41a9199a0d864258b0dda967a authored over 2 years ago by Ivan Kozik <[email protected]>
gs-server: fix RuntimeError: To use txaio, you must first select a framework with .use_twisted() or .use_asyncio()

Caused by an updated autobahn, probably.

Fixes https://github.com/ArchiveTeam/grab-site/issues/220

4f2526dbc6f4e0e3725b4a16f7463bbc03fb397f authored over 2 years ago by Ivan Kozik <[email protected]>
README: Python 3.8.12 -> 3.8.13

24a67521ef6793ed72989b87680699e27ae7e8e6 authored over 2 years ago by Ivan Kozik <[email protected]>
README: Debian install: add packages `wget ca-certificates` for the subsequent steps

53899380a956cd3c3009e080aa1c79ed65192058 authored over 2 years ago by Ivan Kozik <[email protected]>
README: add a note

9987b25af9dbd4b9a6ca852bf031ce62a9a12bc2 authored over 2 years ago by Ivan Kozik <[email protected]>
README: fix anchor link

87c725a3336387cfcea0f9ad586ae4d5d8eb54e0 authored over 2 years ago by Ivan Kozik <[email protected]>
README: remove the Nix-based macOS install because it fails due to Yapsy test failures

https://github.com/ArchiveTeam/grab-site/issues/218

20e5fef01d30ae529da1b99ef5c91e78be01e9ec authored over 2 years ago by Ivan Kozik <[email protected]>
README: macOS Nix-based install: no need to edit shell startup files yourself now

3d2699fb2f32601ca6ae68a7b94d0aebacbb479b authored over 2 years ago by Ivan Kozik <[email protected]>
README: update the macOS Nix-based install

81995d67c2abf64d03beeedb3c8ad1ddeb7289e8 authored over 2 years ago by Ivan Kozik <[email protected]>
README: fix macOS homebrew-based install: update for Python 3.8 and M1 Macs

fe006e4fe19e4ce53f4ce8710249013cf357b330 authored over 2 years ago by Ivan Kozik <[email protected]>
README: nixpkgs 21.05 -> 21.11

14c3bbdf7156a923c3b5baeef60b4e9fa3ef363c authored almost 3 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.11 -> 3.8.12; mention Debian 11

5a306b69173c3af45e53380de44802c6582f0286 authored almost 3 years ago by Ivan Kozik <[email protected]>
README: have consistency among the 'and then restart your shell' text

41a77f0812d2c9365454dd6850096ef2f2d5f498 authored almost 3 years ago by Ivan Kozik <[email protected]>
README: tabs before spaces

d696604509bd8f708066bff90f2ef540a2a188f8 authored almost 3 years ago by Ivan Kozik <[email protected]>
Update README.md

Updated as requested.

865aae75c91df7b4e026fed76124e31701b0778c authored almost 3 years ago by Preservation-Quest <[email protected]>
Update README.md

9f6e012c833044f8f93b12b75de3546c65ead8f7 authored almost 3 years ago by Preservation-Quest <[email protected]>
Update README.md

9a3e5772950f8040ad32fce83c2037e1e6a33837 authored almost 3 years ago by Preservation-Quest <[email protected]>
README: document --wpull-args=--no-warc-compression

6269289a2ca874bae52f116016ca54dc8887d0cc authored about 3 years ago by Ivan Kozik <[email protected]>
Merge pull request #203 from TheTechRobo/simplemachineforums-igsets

6f4d435bf2c3dc9484558b21574e028c93a51683 authored about 3 years ago by Ivan Kozik <[email protected]>
Remove printpage from forum igset

f0df3737014d89c5166542647141ca976a904324 authored about 3 years ago by TheTechRobo <[email protected]>
Add SimpleMachineForum ignores to `forums` igset

I will test it on an SMF.

a376f67130da5c14d36b715f5970c4564752b728 authored about 3 years ago by TheTechRobo <[email protected]>
README: fix the nix-based install steps to use release-21.05 because master has the incompatible sqlalchemy 1.4

4fbf6469a19e352a3c09c9731e85261a68ea703b authored over 3 years ago by Ivan Kozik <[email protected]>
README: mention Ubuntu 20.04

4fec250a32d26a47c4f2d0d164b4cabf71edbb50 authored over 3 years ago by Ivan Kozik <[email protected]>
2.2.2

bf7da79ce5c4a4a1784df7e677c1ec9f57a70964 authored over 3 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.10 -> 3.7.11

0f9585db6fabb7b874bfbe4eb7ec402cb65d6cfe authored over 3 years ago by Ivan Kozik <[email protected]>
Use ludios_wpull 3.0.9 to fix https://github.com/ArchiveTeam/ludios_wpull/issues/16

b5962676bede4d2f5fb22da8beb1755cb946f0f5 authored over 3 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.9 -> 3.7.10

fe3cc6ab1465f4c8ea1f83255187085c43a95e0c authored over 3 years ago by Ivan Kozik <[email protected]>
2.2.1

2e3be5dd294947a797b118891ab1f16ddcbdad27 authored over 3 years ago by Ivan Kozik <[email protected]>
Use ludios_wpull 3.0.8 to fix https://github.com/ArchiveTeam/grab-site/issues/181

cb3068c0ca38f0293f6f44cd102336d8848c7e55 authored over 3 years ago by Ivan Kozik <[email protected]>
README: Debian-based install: Python 3.7.8 -> 3.7.9

132064a24eeedbad2881128f932fca8b0c56ac64 authored almost 4 years ago by Ivan Kozik <[email protected]>
README: macOS: update maximum version

8c5681c726c82cd00c337edd02bc928c5039b1ee authored almost 4 years ago by Ivan Kozik <[email protected]>
README: remove the note about the macOS-specific lxml bug because I can't repro it

d25e419fb6bd637975fe86a1973ffe13a9e0be57 authored almost 4 years ago by Ivan Kozik <[email protected]>
README: macOS: mention .zshrc

f1e6ae4cd1a5ab8e7d9de987a18755b00bcdcd3f authored almost 4 years ago by Ivan Kozik <[email protected]>
README: update both the Homebrew and Nix install steps on macOS

ce93f62a9aaaa1f1e05d586bba6a40a470211848 authored almost 4 years ago by Ivan Kozik <[email protected]>
README: link to better WARC playback tools

be38f488afe1bade46a147ad07582e9e62337003 authored over 4 years ago by Ivan Kozik <[email protected]>
2.2.0

12e798b07578e351142b0e7e38a3f6f017f87b3d authored over 4 years ago by Ivan Kozik <[email protected]>
wpull_hooks: compile combined ignore with re if re2 fails

087e14517505057ceea3e0aff891dfd9529ec1b8 authored over 4 years ago by Ivan Kozik <[email protected]>
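
The fallback named in the commit above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in wpull_hooks: it assumes an optional `re2` module that exposes a `compile()` function like the standard `re` module, and the function name `compile_with_fallback` is hypothetical. RE2 rejects some constructs that `re` accepts, such as lookahead, so a combined pattern may compile only with `re`.

```python
import re

try:
    import re2  # optional dependency; assumed to offer re2.compile()
except ImportError:
    re2 = None

def compile_with_fallback(pattern):
    """Hypothetical helper: prefer re2, fall back to re if re2 cannot be used."""
    if re2 is not None:
        try:
            return re2.compile(pattern)
        except Exception:
            # re2 could not compile this pattern; fall through to re.
            pass
    return re.compile(pattern)
```

The broad `except Exception` is a simplification for the sketch, since the exact exception type depends on which re2 binding is installed.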
README: Python 3.7.7 -> 3.7.8

e095e1c5b30590a634685c56afffacd520ef36d8 authored over 4 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.5 -> 3.7.7

8343916e3c9500074c5f5a46ee60ad3f75bba775 authored over 4 years ago by Ivan Kozik <[email protected]>
2.1.19

8365a9d5f5e2b9d45f812fd9dd6c5ebab7fe3874 authored over 4 years ago by Ivan Kozik <[email protected]>
Update Firefox UA

4032234758e72a8d465263e70a57a90093b698aa authored over 4 years ago by Ivan Kozik <[email protected]>
2.1.18

e98006f8d044fa12f2eddabe2d263c3951a46df8 authored about 5 years ago by Ivan Kozik <[email protected]>
Update Firefox UA

35f6b6bdaebe788f4eef03305fd3763017ea30a4 authored about 5 years ago by Ivan Kozik <[email protected]>
Mention that --finished-warc-dir takes an absolute path

7dfd0faa37f46e4d2344eadce8a2172cc3328661 authored about 5 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.3 -> 3.7.5

e22dadb759035dace9b5b521d7b3577cb8064be4 authored about 5 years ago by Ivan Kozik <[email protected]>
README: add note about having to run apt-get again

Experienced this odd problem on Ubuntu 19.10 on digitalocean:

root@ctest:~# sudo apt-get update...

d418ba4d9b0712667f15dacd48dbf35de71d8325 authored about 5 years ago by Ivan Kozik <[email protected]>
2.1.17

02343335b08b100798266751d8dd00d86a5e12ca authored about 5 years ago by Ivan Kozik <[email protected]>
dashboard: use wss:// if we're behind a reverse proxy serving with https://

e5637ca3b1512ddc8c0e953acc444a25be65b22f authored about 5 years ago by Ivan Kozik <[email protected]>
Use `--no-binary lxml` now that newer pip broke `--no-binary`

File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/pip/_internal/cli/cmdopt...

ecea5ec002e4b48fbbed9895a615c4bde2121935 authored about 5 years ago by Ivan Kozik <[email protected]>
README: link to macOS bug

5024832257e5f2b5281f9b989736f729714d295b authored over 5 years ago by Ivan Kozik <[email protected]>
2.1.16

7fb9eecbe2ff359d8f7eec421e607df1eb39b20c authored over 5 years ago by Ivan Kozik <[email protected]>
Don't retry initial URLs forever

Thanks to JAA for the diagnosis and fix.

Fixes #154
Fixes #129

6f48d929deab1ad574c32227388a7c3f0e4467ce authored over 5 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.2 -> 3.7.3

5e75c56a7d6ee405083b2f0c3534d67b2208edd8 authored over 5 years ago by Ivan Kozik <[email protected]>
2.1.15

bfe7cae5f821ae9cbeb53a6e6b9d5343e6a4f8b0 authored over 5 years ago by Ivan Kozik <[email protected]>
default_cookies.txt: skip the age gate on saidit.net

e9ebab6f9394ea865d0d5f55dd670d9fe5e1e827 authored over 5 years ago by Ivan Kozik <[email protected]>
README: fix macOS Homebrew install steps

Fixes https://github.com/ArchiveTeam/grab-site/issues/150

985e84abf55200280b0b5b9ab1d4b07f6c3b1194 authored over 5 years ago by Ivan Kozik <[email protected]>
2.1.14

33b5a98c7d2957bd0666baa75a05f90f644001b1 authored almost 6 years ago by Ivan Kozik <[email protected]>
setup.py: https://github.com/ludios/wpull -> https://github.com/ArchiveTeam/ludios_wpull

d33a42171d19e271ff3063382ad2110c153f9bfd authored almost 6 years ago by Ivan Kozik <[email protected]>
README: ludios/grab-site -> ArchiveTeam/grab-site

86280b6dcd46926287041362350bf22326106f63 authored almost 6 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.1 -> 3.7.2

e67f24c800fa15ee829e4b81cb0445c253a00817 authored almost 6 years ago by Ivan Kozik <[email protected]>
2.1.13

bebeb4c5797d3bae13aba88fbeb7b0863ff0c18c authored almost 6 years ago by Ivan Kozik <[email protected]>
Merge pull request #147 from Fusl/fix-145

Remove use of now-unavailable --process-dependency-links

502eccb3eb8fd9c89dceeea009a5f02fd50871ab authored almost 6 years ago by Ivan Kozik <[email protected]>
removed references for --process-dependency-links

fe814bf294e4dc3106da062e58c50eadfb713703 authored almost 6 years ago by Katie Holly <[email protected]>
Reference git repo in install_requires

Fixes #145

bb020e741d9cf0198e38ca58b1f6c2aee6a99fd0 authored almost 6 years ago by Katie Holly <[email protected]>
README: tweak indentation

5cbce99e64d0f690c8554a0e6632c0b8f9a877a1 authored about 6 years ago by Ivan Kozik <[email protected]>
README: update the upgrade steps

ab8009e0767f402e8ac795505b5e366daacbe565 authored about 6 years ago by Ivan Kozik <[email protected]>
README: add alternate Nix-based install steps for macOS

0dbfdec3ad23bc675a6bd7488a2b23bc5aea8133 authored about 6 years ago by Ivan Kozik <[email protected]>
README: update NixOS & other distribution install instructions

434ee19be026a6977658b0f2112bd36951feb316 authored about 6 years ago by Ivan Kozik <[email protected]>
Add back the tumblr Googlebot UA instruction now that it works again; bump Firefox UA

a9727902cd5e126c6d2cb8453dc10b9298cd76d2 authored about 6 years ago by Ivan Kozik <[email protected]>
2.1.11

d5b7aa67171a0186385a720dd0f0c140bd37f9ca authored about 6 years ago by Ivan Kozik <[email protected]>
Fix tests/online-tests

e7b99e761e1042cbf1faded33a9e72cf3c090d97 authored about 6 years ago by Ivan Kozik <[email protected]>
Make online tests use localhost only, and factor out tests that require no networking at all

2524e9c5e7b41743531767e43e3b121b7c240480 authored about 6 years ago by Ivan Kozik <[email protected]>
2.1.10

8d1402e410b784873091f025f458bfd6324d5300 authored about 6 years ago by Ivan Kozik <[email protected]>
Tweak .editorconfig

9d121a80afa792ee211645f619171a27c4c23a34 authored about 6 years ago by Ivan Kozik <[email protected]>
README: Python 3.7.0 -> 3.7.1

ac2c5d97eee3d0603189b6cf5c3a5d3682493930 authored about 6 years ago by Ivan Kozik <[email protected]>