Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/webrecorder/browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container
https://github.com/webrecorder/browsertrix-crawler

Autofetch behavior results in empty 200 responses

karenhanson opened this issue 18 days ago
Possible encoding issue with some

karenhanson opened this issue 18 days ago
Consistent aspect ratio

SuaYoo opened this issue 27 days ago
Youtube video format

benoit74 opened this issue 28 days ago
Manually create a profile.tar.gz

djhmateer opened this issue 30 days ago
Add WARC resource containing DOM tree after load

magbb opened this pull request about 1 month ago
Autoclick Support

ikreymer opened this pull request about 1 month ago
[Feature] Support for clicking on links / other elements.

ikreymer opened this issue about 1 month ago
package: pin @novnc/novnc to 1.4.0 to prevent accidental upgrades

ikreymer opened this pull request about 1 month ago
Headful interactive profile creation broken after v1.4.0 update

halvir opened this issue about 1 month ago
Dependency Update

ikreymer opened this pull request about 1 month ago
support removing range from query (via wabac.js 2.20.6):

ikreymer opened this pull request about 2 months ago
some links on page not crawled

robert-1043 opened this issue about 2 months ago
WARC Record Write Failed

benoit74 opened this issue about 2 months ago
Ensure partial responses are not written

ikreymer opened this pull request about 2 months ago
add disable-lazy-loading flag, should fix #699

ikreymer opened this pull request about 2 months ago
Crawler is not failing when seed page returns an HTTP error code

benoit74 opened this issue about 2 months ago
Dependency Update

ikreymer opened this pull request about 2 months ago
Support loading custom behaviors from git repo

tw4l opened this pull request 2 months ago
WARC-Protocol + WARC-Cipher-Suite headers

ikreymer opened this pull request 2 months ago
fix indexing of cookie header:

ikreymer opened this pull request 2 months ago
fix cookie not being passed to replay regression: for now, add x-waba…

ikreymer opened this pull request 2 months ago
Support loading behaviors from a Git repo with branch + path

ikreymer opened this issue 2 months ago
Stop crawler when we have been hit by a WAF protection

benoit74 opened this issue 2 months ago
tests: use old.webrecorder.net for testing

ikreymer opened this pull request 2 months ago
various edge-case loading optimizations:

ikreymer opened this pull request 2 months ago
deps: update to latest wabac

ikreymer opened this pull request 2 months ago
Support loading custom behaviors from URLs and/or filepaths

tw4l opened this pull request 3 months ago
Browser disconnected (crashed?)

rgaudin opened this issue 3 months ago
Issue creating profile for intranet

MRLeflei opened this issue 3 months ago
dep: update to wabac.js 2.20

ikreymer opened this pull request 3 months ago
Archiving particular site kills networking on host machine

vitorio opened this issue 3 months ago
Automatically crawl `<form>` URLs when `method` is `get`

benoit74 opened this issue 3 months ago
link extraction promise cleanup:

ikreymer opened this pull request 3 months ago
bump puppeteer core to 23.5.1

ikreymer opened this pull request 3 months ago
Youtube video is not crawled when `loading="lazy"`

benoit74 opened this issue 3 months ago
Tests: disable blockrules test in CI

ikreymer opened this pull request 3 months ago
fix typo in QA exclude check, which resulted in all URLs being excluded

ikreymer opened this pull request 3 months ago
BUG: Create Browser Profile --headless can't render at viewport

gitreich opened this issue 3 months ago
Add documentation for crawl collections

tw4l opened this pull request 3 months ago
ensure extraHops also apply to maxDepth

ikreymer opened this pull request 3 months ago
Additional exception safety

ikreymer opened this pull request 3 months ago
Include depth in pages JSONL files

tw4l opened this pull request 3 months ago
Include depth in pages jsonl files

tw4l opened this issue 3 months ago
support custom css selectors for extracting links

ikreymer opened this pull request 4 months ago
direct fetch: when cancelling due to redirect, read full body

ikreymer opened this pull request 4 months ago
Brotli decompression error

rgaudin opened this issue 4 months ago
update current crawl size in redis on each healthcheck call

ikreymer opened this pull request 4 months ago
eslint: add strict await checking:

ikreymer opened this pull request 4 months ago
Browser Crash & Docker Exit Code

gitreich opened this issue 4 months ago
cleanup: remove old config files from pywb

ikreymer opened this pull request 4 months ago
bump browser to 1.69.162

ikreymer opened this pull request 4 months ago
crawler args typing

ikreymer opened this pull request 4 months ago
WARC writer + incremental indexing fixes

ikreymer opened this pull request 4 months ago
Additional direct fetch improvements

ikreymer opened this pull request 4 months ago
fix for direct fetch timeouts

ikreymer opened this pull request 4 months ago
Issue crawling a web property with big PDFs

benoit74 opened this issue 4 months ago
Document crawl collection layout

tw4l opened this issue 4 months ago
Use in-place streaming to generate WACZ files

tw4l opened this issue 4 months ago
Streaming in-place WACZ creation + CDXJ indexing

ikreymer opened this pull request 4 months ago
Disable behaviors entirely if --behaviors array is empty

tw4l opened this pull request 5 months ago
SOCKS5 over SSH Tunnel Support

ikreymer opened this pull request 5 months ago
SSH Socks5 Tunnel Proxy Support

ikreymer opened this issue 5 months ago
Adds warning about crawling with basic auth

Shrinks99 opened this pull request 5 months ago
1.2.8 updates:

ikreymer opened this pull request 5 months ago
QA: Ensure empty string is propagated for comparison

tw4l opened this pull request 5 months ago
Crawl button with javascript navigation

hamzamac opened this issue 5 months ago
[question] Missing or timed out dynamic request to resource

wsdookadr opened this issue 5 months ago
A suggestion for making WACZ and WARC-requests

hamoudak opened this issue 5 months ago
[BUG] invalid gzipped WARC

wsdookadr opened this issue 5 months ago
deps: update puppeteer-core to 22.14.0

ikreymer opened this pull request 5 months ago
ETA computation

wsdookadr opened this issue 5 months ago
deps: bump browsertrix-behaviors to 0.6.3

ikreymer opened this pull request 5 months ago
Ignore invalid URLs in redirects

ikreymer opened this pull request 5 months ago
remove crc32 computation, fixes #653

ikreymer opened this pull request 5 months ago
Implemented option for FullPage screenshot after the behaviours have run

fservida opened this pull request 5 months ago
Execution context was destroyed

rgaudin opened this issue 5 months ago
Should invalid URL halt the scraping process?

rgaudin opened this issue 5 months ago
Remove invalid crc32 calculation

ikreymer opened this issue 6 months ago
Behavior run partially failed - Protocol error

zlodejpapiru opened this issue 6 months ago
misc tweaks:

ikreymer opened this pull request 6 months ago
Make it clear that profile argument can be an HTTP(S) URL

benoit74 opened this pull request 6 months ago
Youtube Video Quality

fservida opened this issue 6 months ago
Fix 206 response + general video handling

ikreymer opened this pull request 6 months ago
Fix skipping of 206 responses

ikreymer opened this issue 6 months ago
Can an AWS alternative to Access Keys be added?

jblukach opened this issue 6 months ago
Skipping URL from unknown frame

zlodejpapiru opened this issue 6 months ago
Crawler keeps signing out(?)

Azmodeszer opened this issue 6 months ago
Add support for WARC-Protocol and WARC-Cipher-Suite headers

ikreymer opened this issue 6 months ago
bump replayweb.page to 2.1.1

ikreymer opened this pull request 6 months ago
don't disable extraHops when using sitemaps:

ikreymer opened this pull request 6 months ago
Loosen selectors for login fields in automated profile creation

tw4l opened this pull request 6 months ago
"Login form could not be found"

Azmodeszer opened this issue 6 months ago