Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/webrecorder/browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container
https://github.com/webrecorder/browsertrix-crawler

Make numRetries configurable

ikreymer opened this pull request 4 days ago
hang protection: wrap remaining evaluate() calls to avoid rare hangs

ikreymer opened this pull request 9 days ago
if uploading wacz files, compute waczfile name on load to be able to …

ikreymer opened this pull request 11 days ago
Add a create WACZ command after that fact.

ikreymer opened this issue 11 days ago
Apply exclusions to redirects

ikreymer opened this pull request 13 days ago
Ensure exclusions apply to pages that redirect

ikreymer opened this issue 13 days ago
Retry support and additional fixes

ikreymer opened this pull request 22 days ago
set failed URL retry to 5 by default

ikreymer opened this pull request 22 days ago
Improve add exclusion test

stavares843 opened this pull request 23 days ago
clear out core dumps to avoid using up volume space:

ikreymer opened this pull request 23 days ago
Retry Failed Pages + Ignore Hashtags in Redirect Check

ikreymer opened this pull request 23 days ago
Consider disabling core dumps

ikreymer opened this issue 24 days ago
Autofetch behavior results in empty 200 responses

karenhanson opened this issue about 2 months ago
Possible encoding issue with some

karenhanson opened this issue about 2 months ago
Consistent aspect ratio

SuaYoo opened this issue about 2 months ago
Youtube video format

benoit74 opened this issue 2 months ago
Manually create a profile.tar.gz

djhmateer opened this issue 2 months ago
Add WARC resource containing DOM tree after load

magbb opened this pull request 2 months ago
Autoclick Support

ikreymer opened this pull request 2 months ago
[Feature] Support for clicking on links / other elements.

ikreymer opened this issue 2 months ago
package: pin @novnc/novnc to 1.4.0 to prevent accidental upgrades

ikreymer opened this pull request 3 months ago
Dependency Update

ikreymer opened this pull request 3 months ago
support removing range from query (via wabac.js 2.20.6):

ikreymer opened this pull request 3 months ago
some links on page not crawled

robert-1043 opened this issue 3 months ago
WARC Record Write Failed

benoit74 opened this issue 3 months ago
Ensure partial responses are not written

ikreymer opened this pull request 3 months ago
add disable-lazy-loading flag, should fix #699

ikreymer opened this pull request 3 months ago
Dependency Update

ikreymer opened this pull request 3 months ago
Support loading custom behaviors from git repo

tw4l opened this pull request 3 months ago
WARC-Protocol + WARC-Cipher-Suite headers

ikreymer opened this pull request 3 months ago
fix indexing of cookie header:

ikreymer opened this pull request 3 months ago
fix cookie not being passed to replay regression: for now, add x-waba…

ikreymer opened this pull request 3 months ago
Support loading behaviors from a Git repo with branch + path

ikreymer opened this issue 3 months ago
Stop crawler when we have been hit by a WAF protection

benoit74 opened this issue 3 months ago
tests: use old.webrecorder.net for testing

ikreymer opened this pull request 3 months ago
various edge-case loading optimizations:

ikreymer opened this pull request 3 months ago
deps: update to latest wabac

ikreymer opened this pull request 4 months ago
Support loading custom behaviors from URLs and/or filepaths

tw4l opened this pull request 4 months ago
Browser disconnected (crashed?)

rgaudin opened this issue 4 months ago
Issue creating profile for intranet

MRLeflei opened this issue 4 months ago
dep: update to wabac.js 2.20

ikreymer opened this pull request 4 months ago
Archiving particular site kills networking on host machine

vitorio opened this issue 4 months ago
Automatically crawl `<form>` URLs when `method` is `get`

benoit74 opened this issue 4 months ago
link extraction promise cleanup:

ikreymer opened this pull request 4 months ago
bump puppeteer core to 23.5.1

ikreymer opened this pull request 4 months ago
Youtube video is not crawled when `loading="lazy"`

benoit74 opened this issue 4 months ago
Tests: disable blockrules test in CI

ikreymer opened this pull request 4 months ago
fix typo in QA exclude check, which resulted in all URLs being excluded

ikreymer opened this pull request 4 months ago
BUG: Create Browser Profile --headless can't render at viewport

gitreich opened this issue 4 months ago
Add documentation for crawl collections

tw4l opened this pull request 4 months ago
ensure extraHops also apply to maxDepth

ikreymer opened this pull request 4 months ago
Additional exception safety

ikreymer opened this pull request 4 months ago
Include depth in pages JSONL files

tw4l opened this pull request 5 months ago
Include depth in pages jsonl files

tw4l opened this issue 5 months ago
support custom css selectors for extracting links

ikreymer opened this pull request 5 months ago
direct fetch: when cancelling due to redirect, read full body

ikreymer opened this pull request 5 months ago
Brotli decompression error

rgaudin opened this issue 5 months ago
update current crawl size in redis on each healthcheck call

ikreymer opened this pull request 5 months ago
eslint: add strict await checking:

ikreymer opened this pull request 5 months ago
Browser Crash & Docker Exit Code

gitreich opened this issue 5 months ago
cleanup: remove old config files from pywb

ikreymer opened this pull request 5 months ago
bump browser to 1.69.162

ikreymer opened this pull request 5 months ago
crawler args typing

ikreymer opened this pull request 5 months ago
WARC writer + incremental indexing fixes

ikreymer opened this pull request 5 months ago
Additional direct fetch improvements

ikreymer opened this pull request 5 months ago
fix for direct fetch timeouts

ikreymer opened this pull request 5 months ago
Issue crawling a web property with big PDFs

benoit74 opened this issue 5 months ago
Document crawl collection layout

tw4l opened this issue 5 months ago
Use in-place streaming to generate WACZ files

tw4l opened this issue 6 months ago
Streaming in-place WACZ creation + CDXJ indexing

ikreymer opened this pull request 6 months ago
Disable behaviors entirely if --behaviors array is empty

tw4l opened this pull request 6 months ago
SOCKS5 over SSH Tunnel Support

ikreymer opened this pull request 6 months ago
SSH Socks5 Tunnel Proxy Support

ikreymer opened this issue 6 months ago
Adds warning about crawling with basic auth

Shrinks99 opened this pull request 6 months ago
1.2.8 updates:

ikreymer opened this pull request 6 months ago
QA: Ensure empty string is propagated for comparison

tw4l opened this pull request 6 months ago
Crawl button with javascript navigation

hamzamac opened this issue 6 months ago
[question] Missing or timed out dynamic request to resource

wsdookadr opened this issue 6 months ago
A suggestion for making WACZ and WARC-requests

hamoudak opened this issue 6 months ago
[BUG] invalid gzipped WARC

wsdookadr opened this issue 6 months ago
deps: update puppeteer-core to 22.14.0

ikreymer opened this pull request 6 months ago
ETA computation

wsdookadr opened this issue 6 months ago
deps: bump browsertrix-behaviors to 0.6.3

ikreymer opened this pull request 6 months ago
Ignore invalid URLs in redirects

ikreymer opened this pull request 6 months ago
remove crc32 computation, fixes #653

ikreymer opened this pull request 6 months ago
Implemented option for FullPage screenshot after the behaviours have run

fservida opened this pull request 6 months ago
Execution context was destroyed

rgaudin opened this issue 6 months ago
Should invalid URL halt the scraping process?

rgaudin opened this issue 6 months ago
Remove invalid crc32 calculation

ikreymer opened this issue 7 months ago
Behavior run partially failed - Protocol error

zlodejpapiru opened this issue 7 months ago