Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/webrecorder/browsertrix-crawler
Run a high-fidelity browser-based crawler in a single Docker container
https://github.com/webrecorder/browsertrix-crawler
separate fetch api for autofetch behavior + additional improvements on partial responses:
ikreymer opened this pull request 8 days ago
Autofetch behavior results in empty 200 responses
karenhanson opened this issue 18 days ago
Possible encoding issue with some
karenhanson opened this issue 18 days ago
Consistent aspect ratio
SuaYoo opened this issue 27 days ago
Youtube video format
benoit74 opened this issue 28 days ago
Manually create a profile.tar.gz
djhmateer opened this issue 30 days ago
Add WARC resource containing DOM tree after load
magbb opened this pull request about 1 month ago
Autoclick Support
ikreymer opened this pull request about 1 month ago
[Feature] Support for clicking on links / other elements.
ikreymer opened this issue about 1 month ago
package: pin @novnc/novnc to 1.4.0 to prevent accidental upgrades
ikreymer opened this pull request about 1 month ago
Headful interactive profile creation broken after v1.4.0 update
halvir opened this issue about 1 month ago
Dependency Update
ikreymer opened this pull request about 1 month ago
support removing range from query (via wabac.js 2.20.6):
ikreymer opened this pull request about 2 months ago
some links on page not crawled
robert-1043 opened this issue about 2 months ago
WARC Record Write Failed
benoit74 opened this issue about 2 months ago
Ensure partial responses are not written
ikreymer opened this pull request about 2 months ago
add disable-lazy-loading flag, should fix #699
ikreymer opened this pull request about 2 months ago
Crawler is not failing when seed page returns an HTTP error code
benoit74 opened this issue about 2 months ago
Dependency Update
ikreymer opened this pull request about 2 months ago
Support loading custom behaviors from git repo
tw4l opened this pull request 2 months ago
WARC-Protocol + WARC-Cipher-Suite headers
ikreymer opened this pull request 2 months ago
fix indexing of cookie header:
ikreymer opened this pull request 2 months ago
fix cookie not being passed to replay regression: for now, add x-waba…
ikreymer opened this pull request 2 months ago
Support loading behaviors from a Git repo with branch + path
ikreymer opened this issue 2 months ago
Stop crawler when we have been hit by a WAF protection
benoit74 opened this issue 2 months ago
tests: use old.webrecorder.net for testing
ikreymer opened this pull request 2 months ago
various edge-case loading optimizations:
ikreymer opened this pull request 2 months ago
deps: update to latest wabac
ikreymer opened this pull request 2 months ago
Support loading custom behaviors from URLs and/or filepaths
tw4l opened this pull request 3 months ago
Browser disconnected (crashed?)
rgaudin opened this issue 3 months ago
Issue creating profile for intranet
MRLeflei opened this issue 3 months ago
dep: update to wabac.js 2.20
ikreymer opened this pull request 3 months ago
Archiving particular site kills networking on host machine
vitorio opened this issue 3 months ago
Automatically crawl `<form>` URLs when `method` is `get`
benoit74 opened this issue 3 months ago
link extraction promise cleanup:
ikreymer opened this pull request 3 months ago
bump puppeteer core to 23.5.1
ikreymer opened this pull request 3 months ago
Youtube video is not crawled when `loading="lazy"`
benoit74 opened this issue 3 months ago
Tests: disable blockrules test in CI
ikreymer opened this pull request 3 months ago
fix typo in QA exclude check, which resulted in all URLs being excluded
ikreymer opened this pull request 3 months ago
BUG: Create Browser Profile --headless can't render at viewport
gitreich opened this issue 3 months ago
Add documentation for crawl collections
tw4l opened this pull request 3 months ago
ensure extraHops also apply to maxDepth
ikreymer opened this pull request 3 months ago
Crawler never fetches `<img>` source image in `src` when `srcset` is activated
benoit74 opened this issue 3 months ago
[Bug]: Crawl Configuration Inconsistency: Max Depth and Include Any Linked Page
mona-ul opened this issue 3 months ago
Additional exception safety
ikreymer opened this pull request 3 months ago
Include depth in pages JSONL files
tw4l opened this pull request 3 months ago
Include depth in pages jsonl files
tw4l opened this issue 3 months ago
support custom css selectors for extracting links
ikreymer opened this pull request 4 months ago
direct fetch: when cancelling due to redirect, read full body
ikreymer opened this pull request 4 months ago
Brotli decompression error
rgaudin opened this issue 4 months ago
exit codes: exit with error code 10 if interrupt is caused by unexpected browser exit
ikreymer opened this pull request 4 months ago
update current crawl size in redis on each healthcheck call
ikreymer opened this pull request 4 months ago
eslint: add strict await checking:
ikreymer opened this pull request 4 months ago
Browser Crash & Docker Exit Code
gitreich opened this issue 4 months ago
cleanup: remove old config files from pywb
ikreymer opened this pull request 4 months ago
bump browser to 1.69.162
ikreymer opened this pull request 4 months ago
crawler args typing
ikreymer opened this pull request 4 months ago
WARC writer + incremental indexing fixes
ikreymer opened this pull request 4 months ago
Additional direct fetch improvements
ikreymer opened this pull request 4 months ago
fix for direct fetch timeouts
ikreymer opened this pull request 4 months ago
Issue crawling a web property with big PDFs
benoit74 opened this issue 4 months ago
Document crawl collection layout
tw4l opened this issue 4 months ago
Use in-place streaming to generate WACZ files
tw4l opened this issue 4 months ago
Streaming in-place WACZ creation + CDXJ indexing
ikreymer opened this pull request 4 months ago
Disable behaviors entirely if --behaviors array is empty
tw4l opened this pull request 5 months ago
SOCKS5 over SSH Tunnel Support
ikreymer opened this pull request 5 months ago
SSH Socks5 Tunnel Proxy Support
ikreymer opened this issue 5 months ago
Adds warning about crawling with basic auth
Shrinks99 opened this pull request 5 months ago
1.2.8 updates:
ikreymer opened this pull request 5 months ago
QA: Ensure empty string is propagated for comparison
tw4l opened this pull request 5 months ago
Crawl button with javascript navigation
hamzamac opened this issue 5 months ago
[Bug]: Pages with no text to extract should be shown as a 100% match
Shrinks99 opened this issue 5 months ago
[question] Missing or timed out dynamic request to resource
wsdookadr opened this issue 5 months ago
A suggestion for making WACZ and WARC-requests
hamoudak opened this issue 5 months ago
[BUG] invalid gzipped WARC
wsdookadr opened this issue 5 months ago
deps: update puppeteer-core to 22.14.0
ikreymer opened this pull request 5 months ago
ETA computation
wsdookadr opened this issue 5 months ago
deps: bump browsertrix-behaviors to 0.6.3
ikreymer opened this pull request 5 months ago
Ignore invalid URLs in redirects
ikreymer opened this pull request 5 months ago
remove crc32 computation, fixes #653
ikreymer opened this pull request 5 months ago
Implemented option for FullPage screenshot after the behaviours have run
fservida opened this pull request 5 months ago
Execution context was destroyed
rgaudin opened this issue 5 months ago
Should invalid URL halt the scraping process?
rgaudin opened this issue 5 months ago
Remove invalid crc32 calculation
ikreymer opened this issue 6 months ago
Behavior run partially failed - Protocol error
zlodejpapiru opened this issue 6 months ago
Don't log behavior-related messages if no behaviors are set
tw4l opened this issue 6 months ago
misc tweaks:
ikreymer opened this pull request 6 months ago
Make it clear that profile argument can be an HTTP(S) URL
benoit74 opened this pull request 6 months ago
Youtube Video Quality
fservida opened this issue 6 months ago
der-postillon.com: crawler considers that scrolling is not necessary while it seems mandatory
benoit74 opened this issue 6 months ago
Fix 206 response + general video handling
ikreymer opened this pull request 6 months ago
Fix skipping of 206 responses
ikreymer opened this issue 6 months ago
Can an AWS alternative to Access Keys be added?
jblukach opened this issue 6 months ago
Skipping URL from unknown frame
zlodejpapiru opened this issue 6 months ago
Crawler keeps signing out(?)
Azmodeszer opened this issue 6 months ago
Add support for WARC-Protocol and WARC-Cipher-Suite headers
ikreymer opened this issue 6 months ago
bump replayweb.page to 2.1.1
ikreymer opened this pull request 6 months ago
don't disable extraHops when using sitemaps:
ikreymer opened this pull request 6 months ago
Loosen selectors for login fields in automated profile creation
tw4l opened this pull request 6 months ago
"Login form could not be found"
Azmodeszer opened this issue 6 months ago