Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/webrecorder/browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container
https://github.com/webrecorder/browsertrix-crawler

Fix Pending Request causing timeout
ikreymer opened this pull request 6 months ago

Improved handling of pages that redirect back to the same page.
ikreymer opened this pull request 6 months ago

Better handling of redirect chains to same page
ikreymer opened this issue 6 months ago

Dependency Update / 1.2.2
ikreymer opened this pull request 6 months ago

Vimeo Playback: Retrieve full stream
kila58 opened this issue 6 months ago

Automatically add exclusion rules based on `robots.txt`
benoit74 opened this issue 7 months ago

Revisit WARC-Resource-Type or add a new header
benoit74 opened this issue 7 months ago

Always download PDF + non HTML page cleanup + enterprise policy cleanup
ikreymer opened this pull request 7 months ago

Don't filter saving redirect if no response body.
ikreymer opened this pull request 7 months ago

Remove DISPLAY env var from image
ikreymer opened this pull request 7 months ago

1.2.0 release - deps: bump wabac.js to 2.19.1, RWP for QA to 2.1.0
ikreymer opened this pull request 7 months ago

Updated rewriting for YouTube + dependency update
ikreymer opened this pull request 7 months ago

disable socat by default
ikreymer opened this pull request 7 months ago

add yarn.lock to Docker to ensure consistent builds!
ikreymer opened this pull request 7 months ago

bump brave to 1.67.119
ikreymer opened this pull request 7 months ago

logging: log error message when seed is failed to be created
ikreymer opened this pull request 7 months ago

btoa() the auth from yaml config too
edsu opened this pull request 7 months ago

http auth support per seed (supersedes #566):
ikreymer opened this pull request 7 months ago

clearer scope check
ikreymer opened this pull request 7 months ago

adjust browser viewport to avoid cutting off bottom of page
ikreymer opened this pull request 7 months ago

add EXPOSE for ports used inside container
ikreymer opened this pull request 7 months ago

merge 1.1.4 -> 1.2.0 beta
ikreymer opened this pull request 7 months ago

dependency: update RWP to 2.0.1
ikreymer opened this pull request 7 months ago

Fix header newline escape
ikreymer opened this pull request 7 months ago

add undici for 1.1.4 release, to fix #606
ikreymer opened this pull request 7 months ago

Fix synching extraSeeds state with multiple crawler instances
ikreymer opened this pull request 7 months ago

Fix seed redirect with multiple crawl instances (eg. scale > 1)
ikreymer opened this issue 7 months ago

tests: fix blockrules tests
ikreymer opened this pull request 7 months ago

recorder: add missing shouldSkip() to responseReceived callback
ikreymer opened this pull request 7 months ago

Ensure responseReceived callback also checks shouldSkip()
ikreymer opened this issue 7 months ago

Change some logged errors to warns
tw4l opened this pull request 7 months ago

Make timeout logging messages warns, not errors
tw4l opened this issue 7 months ago

Misleading error message
rgaudin opened this issue 7 months ago

--useSiteMap options is not working anymore
benoit74 opened this issue 7 months ago

tests: reduce logging
ikreymer opened this pull request 7 months ago

cleanup dockerfile + fix test
ikreymer opened this pull request 7 months ago

add --dryRun flag and mode
ikreymer opened this pull request 7 months ago

Dry Run Mode
ikreymer opened this issue 7 months ago

base image version bump to brave 1.66.115
ikreymer opened this pull request 7 months ago

Use isolated Python venv for dependencies installation
benoit74 opened this pull request 7 months ago

Get rid of Python dependencies now that pywb is gone
benoit74 opened this pull request 7 months ago

[Bug]: Profiles are cut off at the bottom
SuaYoo opened this issue 7 months ago

Consider disk usage of collDir instead of default /crawls
benoit74 opened this pull request 8 months ago

Crawler is not checking the proper disk usage
benoit74 opened this issue 8 months ago

Better indicate the interruption reason
benoit74 opened this issue 8 months ago

Load non-HTML resources directly whenever possible
ikreymer opened this pull request 8 months ago

Bump version to 1.2.0 Beta + make draft release for each commit
ikreymer opened this pull request 8 months ago

Fix failOnFailedLimit and add tests
tw4l opened this pull request 8 months ago

Add group policies, limit browser access to container filesystem
vnznznz opened this pull request 8 months ago

Sitemap Parsing Fixes
ikreymer opened this pull request 8 months ago

Mention command line options when restarting
edsu opened this pull request 8 months ago

headers: better filtering and encoding
ikreymer opened this pull request 8 months ago

Fix regressions with `failOnFailedSeed` option
tw4l opened this pull request 8 months ago

PDF loading status code fix
ikreymer opened this pull request 8 months ago

WARC record HTTP status code is 0 instead of 200
benoit74 opened this issue 8 months ago

Cannot convert argument to a ByteString
benoit74 opened this issue 8 months ago

Unable to resume crawl: not valid JSON
edsu opened this issue 8 months ago

Crawl error: missing context with id
edsu opened this issue 8 months ago

Adding support for HTTP Basic Auth
edsu opened this pull request 8 months ago

add STORE_REGION env var to be able to specify region
ikreymer opened this pull request 8 months ago

Skip Checking Empty Frame + eval timeout
ikreymer opened this pull request 8 months ago

Parameter --failOnFailedSeed exits Docker with ExitCode 0
gitreich opened this issue 8 months ago

improved handling of requests from workers:
ikreymer opened this pull request 8 months ago

profiles: ensure initial page.load() is awaited
ikreymer opened this pull request 8 months ago

pages.jsonl missing required ts timestamp field
halmos opened this issue 8 months ago

profiles: ensure all page.goto() promises have at least catch block/a…
ikreymer opened this pull request 9 months ago

Port 9222 can't be used for Screencast
gitreich opened this issue 9 months ago

Always add warcinfo records to all WARCs
ikreymer opened this pull request 9 months ago

`.warc` strict conforming output ?
dbuenzli opened this issue 9 months ago

`.warc` output, always write the warcinfo record ?
dbuenzli opened this issue 9 months ago

extra security: only add --no-sandbox flag if running as root
ikreymer opened this pull request 9 months ago

replace minio client with aws sdk
mguella opened this pull request 9 months ago

Bump base image 1.64.122 + Skip http/2 pseudoheaders
ikreymer opened this pull request 9 months ago

Config payload Digest sha1 base32
gitreich opened this issue 9 months ago

S3 upload and region error
cmillet2127 opened this issue 10 months ago

Switch to using JS WACZ
ikreymer opened this pull request 10 months ago

WARC Validation Error appears from time to time
gitreich opened this issue 10 months ago

Make screenshot after custom behaviors
cmillet2127 opened this issue 10 months ago

Use js-wacz to create WACZ files
tw4l opened this issue 10 months ago

Failure uploading large files (handling slowDown)
wvengen opened this issue 10 months ago

Brave Default Setting Improvements
Shrinks99 opened this issue 11 months ago

SOCKS proxy username and password parameters missing
vbanos opened this issue about 1 year ago

[Bug]: no warc-info header in any warc file included in a wacz
tuehlarsen opened this issue about 1 year ago

how configurable is the Automated Profile Creation feature
msramalho opened this issue about 1 year ago

Cloudflare interstitial wait isn't working
vbanos opened this issue about 1 year ago

Update README.md to mention Brave instead of Chrome
benoit74 opened this issue about 1 year ago

Convert to TypeScript
ikreymer opened this issue about 1 year ago