Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/ArchiveTeam/grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
https://github.com/ArchiveTeam/grab-site
Can Grab-site be used in W7 with pip?
Snippet24816 opened this issue about 1 month ago
Getting 502 Bad Gateway Errors
syberphunk opened this issue 3 months ago
Fix FB-RE2 build error in setup.py
dannypage opened this pull request 6 months ago
Failed building wheel for fb-re2
mariospicross opened this issue 10 months ago
Dashboard
drzo opened this pull request 10 months ago
xFormers Support?
Astra060 opened this issue 11 months ago
Fallback to re if re2 can't be imported
rebane2001 opened this pull request 11 months ago
Fix --which-wpull-command not working correctly with certain paths
rebane2001 opened this pull request 11 months ago
fb-re2 dependency clang compile error on macOS Sonoma
xor-gate opened this issue about 1 year ago
Is it possible to crawl only a domain and its subdomains?
ghost opened this issue about 1 year ago
Support python 3.9-3.12
fruzitent opened this pull request about 1 year ago
Add instructions for when using nix profiles
tripleo1 opened this pull request about 1 year ago
Add instructions for when using nix profiles
tripleo1 opened this pull request about 1 year ago
Grab site is not actually compatible with python 3.8
cenodis opened this issue over 1 year ago
is it possible to output regular files instead of warc?
ftc2 opened this issue over 1 year ago
grab-site not displaying any content on Port 29000, but installed and running
DominicBilke opened this issue almost 2 years ago
Add upload option
upintheairsheep opened this issue about 2 years ago
Debian/Ubuntu install instructions fail on Raspbian
Billybangleballs opened this issue about 2 years ago
Add a --no-global-igset option
ivan opened this issue over 2 years ago
Can't grab Wikimedia thumbnails, even when global is removed from igset file
BrinBellway opened this issue over 2 years ago
Record grab-site version in WARC headers
JustAnotherArchivist opened this pull request over 2 years ago
Log settings changes and ignores
JustAnotherArchivist opened this pull request over 2 years ago
RuntimeError: To use txaio, you must first select a framework with .use_twisted() or .use_asyncio()
PadraigEire opened this issue over 2 years ago
No messege on Dashboard
CircleCrop opened this issue over 2 years ago
Nix-based macOS install does not work because of failing Yapsy tests
ivan opened this issue over 2 years ago
install error in macOS Catalina
LeeBinder opened this issue over 2 years ago
Update macOS install script to reflect Python 3.8.x (rather than 3.7)
LeeBinder opened this issue over 2 years ago
Make it work again in Python 3.10
iacore opened this pull request almost 3 years ago
Syntax Error on run
trentwiles opened this issue almost 3 years ago
Should we add an anti-porn igset?
TheTechRobo opened this issue almost 3 years ago
Dubious quickmod2 SMF forum ignore
TheTechRobo opened this issue almost 3 years ago
README: remove outdated "non-SMF forums"
TheTechRobo opened this pull request almost 3 years ago
Resuming a WARC after hard "No space left on device" error message?
Preservation-Quest opened this issue about 3 years ago
Update README.md
Preservation-Quest opened this pull request about 3 years ago
multiple --wpull-args
TheTechRobo opened this pull request about 3 years ago
How do you add custom hooks now?
TheTechRobo opened this issue about 3 years ago
Pause gracefully if OSError (No space left on device)
TheTechRobo opened this issue about 3 years ago
Cloudflare-protected site responds with 503 Service Temporarily Unavailable
rmfkdehd opened this issue about 3 years ago
Add some Tumblr ignores to global igset
TheTechRobo opened this issue about 3 years ago
Add SimpleMachineForum ignores to `forums` igset
TheTechRobo opened this pull request about 3 years ago
On the dashboard, make the background colour ACTUALLY a background colour
TheTechRobo opened this issue about 3 years ago
Add SimpleMachineForums igsets
TheTechRobo opened this issue about 3 years ago
No module named 'autobahn'
vitacell opened this issue about 3 years ago
Backslash to Forward slash correction
acrois opened this issue over 3 years ago
Fix ludios_wpull to support SQLAlchemy 1.4
ivan opened this issue over 3 years ago
Dupe spotter user-defined list of expressions / separation of default dupe spotter expressions
acrois opened this issue over 3 years ago
Error while starting a crawl in docker container
Z2Up1UwcaYOyZq opened this issue over 3 years ago
Full Docker support
acrois opened this pull request over 3 years ago
infinite recursion on offsite links?
TheTechRobo opened this issue over 3 years ago
Ignore errors and keep crawling
TowardMyth opened this issue over 3 years ago
Project Evolution
acrois opened this issue over 3 years ago
What does the ID do?
TheTechRobo opened this issue over 3 years ago
Document `--wpull-args=--no-warc-compression`
TheTechRobo opened this pull request over 3 years ago
Change settings mid-crawl
TheTechRobo opened this issue over 3 years ago
Grab-site gets only a single page
mathuryash5 opened this issue over 3 years ago
Cookies not staying
TheTechRobo opened this issue over 3 years ago
clearer error when URL is invalid
TheTechRobo opened this pull request over 3 years ago
My computer crashed. I'm 10gb into a crawl. How can I "resume" this crawl?
komali2 opened this issue over 3 years ago
Ignore local/lan-only hosts (and invalid domains).
jtagcat opened this issue over 3 years ago
--no-offsite-links doesn't work
tripleo1 opened this issue over 3 years ago
Dockerfile?
818S opened this issue over 3 years ago
Can't evaluate Select
TheTechRobo opened this issue over 3 years ago
Update setup.py
PythonCoderAS opened this pull request over 3 years ago
Consider an option to generate WACZ files after a crawl is done for better replay with ReplayWeb.page
ikreymer opened this issue almost 4 years ago
Ignore set: XenForo 1/2 and PostNuke forum engines
nekto-nekto opened this pull request almost 4 years ago
del
nekto-nekto opened this issue almost 4 years ago
Issue-175: First pass at creating a Dockerfile for Nix that actually runs
bknowles opened this pull request about 4 years ago
Add a Dockerfile for running grab-site in a Nix-based container
bknowles opened this issue about 4 years ago
Can't build lxml.etree (on macOS)
bknowles opened this issue about 4 years ago
[wpull] 'cython_function_or_method' object has no attribute 'lower'
tempname1024 opened this issue about 4 years ago
[BUG] Twitter pages potentially not downloading correctly
Coloradohusky opened this issue about 4 years ago
Bash script for automatic upload
raspher opened this pull request about 4 years ago
pull args for http-auth (e.g. --user --password) are ignored
mep85 opened this issue over 4 years ago
Regexp exclusion problem
manueldeprada opened this issue over 4 years ago
Change wpull args during a crawl
Coloradohusky opened this issue over 4 years ago
ImportError: cannot import name 'SSLCertificateError'
dragonxtek opened this issue over 4 years ago
Make WARC files searchable
Svekla opened this issue over 4 years ago
Any solutions for already mentioned errors: Event loop is closed / Task is destroyed?
weselow opened this issue almost 5 years ago
Pip build missing required package?
cfcs opened this issue about 5 years ago
cannot import name 'SSLCertificateError'
mkrzmr opened this issue about 5 years ago
More intelligent protocol selection
masterX244 opened this pull request about 5 years ago
--finished-warc-dir= not working for me
BradCoffield opened this issue over 5 years ago
What does the error status in URL queue mean?
Phasip opened this issue over 5 years ago
Possible to run in the cloud?
BradCoffield opened this issue over 5 years ago
WSL: lmdb.CorruptedError: mdb_get: MDB_CORRUPTED: Located page was wrong type
menmob opened this issue over 5 years ago
macOS-specific lxml crash: LookupError: unknown encoding: 'b'latin1''
ivan opened this issue over 5 years ago
DNS operation timed out
nihelmasell opened this issue over 5 years ago
Best way to grab this page?
sardaukar opened this issue over 5 years ago
Errors on initial URLs are retried forever
JustAnotherArchivist opened this issue over 5 years ago
Continuing or updating a grab
nihelmasell opened this issue over 5 years ago
Crawl eventually becomes nothing but "Disconnected from ws:// server:"...
BradCoffield opened this issue over 5 years ago
Crash on EOFError: Compressed file ended before the end-of-stream marker was reached
ivan opened this issue over 5 years ago
Homebrew install on macOS 10.14.4 (command 'clang' failed with exit status 1)
markhdavis opened this issue over 5 years ago
Add simplistic Dockerfile
Fusl opened this pull request almost 6 years ago
wpull crash when http_proxy is set
yi opened this issue almost 6 years ago
Reference git repo in install_requires
Fusl opened this pull request almost 6 years ago
Seeking new maintainer / project owner
ivan opened this issue almost 6 years ago
ftp:// crawls crash with AttributeError: 'ListingResponse' object has no attribute 'version'
ivan opened this issue about 6 years ago
Are there any plans on getting grab-site into the official Debian/Ubuntu software repositories?
github-userx opened this issue about 6 years ago
dashboard: Home/PgUp/PgDn/End keys usually fail in Firefox
ivan opened this issue about 6 years ago