Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/webrecorder/browsertrix-crawler
Run a high-fidelity browser-based crawler in a single Docker container
https://github.com/webrecorder/browsertrix-crawler
Fix Pending Request causing timeout
ikreymer opened this pull request 6 months ago
Improved handling of pages that redirect back to the same page.
ikreymer opened this pull request 6 months ago
Better handling of redirect chains to same page
ikreymer opened this issue 6 months ago
Dependency Update / 1.2.2
ikreymer opened this pull request 6 months ago
Vimeo Playback: Retrieve full stream
kila58 opened this issue 6 months ago
Automatically add exclusion rules based on `robots.txt`
benoit74 opened this issue 7 months ago
Revisit WARC-Resource-Type or add a new header
benoit74 opened this issue 7 months ago
Always download PDF + non HTML page cleanup + enterprise policy cleanup
ikreymer opened this pull request 7 months ago
Don't filter saving redirect if no response body.
ikreymer opened this pull request 7 months ago
Crawler is not returning full seed page URL in WARC `WARC-Target-URI`
benoit74 opened this issue 7 months ago
browser policies: disable restoring any tabs on startup + set new tab URL to about:blank
ikreymer opened this pull request 7 months ago
Remove DISPLAY env var from image
ikreymer opened this pull request 7 months ago
1.2.0 release - deps: bump wabac.js to 2.19.1, RWP for QA to 2.1.0
ikreymer opened this pull request 7 months ago
Updated rewriting for YouTube + dependency update
ikreymer opened this pull request 7 months ago
disable socat by default
ikreymer opened this pull request 7 months ago
add yarn.lock to Docker to ensure consistent builds!
ikreymer opened this pull request 7 months ago
bump brave to 1.67.119
ikreymer opened this pull request 7 months ago
logging: log error message when seed is failed to be created
ikreymer opened this pull request 7 months ago
btoa() the auth from yaml config too
edsu opened this pull request 7 months ago
Could there be a way to create warcs with certain size after one RUN (combinewarc / rolloversize...)
ssairanen opened this issue 7 months ago
http auth support per seed (supersedes #566):
ikreymer opened this pull request 7 months ago
clearer scope check
ikreymer opened this pull request 7 months ago
adjust browser viewport to avoid cutting off bottom of page
ikreymer opened this pull request 7 months ago
add EXPOSE for ports used inside container
ikreymer opened this pull request 7 months ago
merge 1.1.4 -> 1.2.0 beta
ikreymer opened this pull request 7 months ago
dependency: update RWP to 2.0.1
ikreymer opened this pull request 7 months ago
Fix header newline escape
ikreymer opened this pull request 7 months ago
add undici for 1.1.4 release, to fix #606
ikreymer opened this pull request 7 months ago
Crawl stopping with TypeError: Headers.append:...is an invalid header value.
ldko opened this issue 7 months ago
Crawl stopping with Node throwing ERR_INVALID_STATE.TypeError
ldko opened this issue 7 months ago
Fix synching extraSeeds state with multiple crawler instances
ikreymer opened this pull request 7 months ago
Fix seed redirect with multiple crawl instances (eg. scale > 1)
ikreymer opened this issue 7 months ago
tests: fix blockrules tests
ikreymer opened this pull request 7 months ago
recorder: add missing shouldSkip() to responseReceived callback
ikreymer opened this pull request 7 months ago
Ensure responseReceived callback also checks shouldSkip()
ikreymer opened this issue 7 months ago
Change some logged errors to warns
tw4l opened this pull request 7 months ago
Make timeout logging messages warns, not errors
tw4l opened this issue 7 months ago
Misleading error message
rgaudin opened this issue 7 months ago
--useSiteMap options is not working anymore
benoit74 opened this issue 7 months ago
tests: reduce logging
ikreymer opened this pull request 7 months ago
cleanup dockerfile + fix test
ikreymer opened this pull request 7 months ago
add --dryRun flag and mode
ikreymer opened this pull request 7 months ago
Dry Run Mode
ikreymer opened this issue 7 months ago
base image version bump to brave 1.66.115
ikreymer opened this pull request 7 months ago
Use isolated Python venv for dependencies installation
benoit74 opened this pull request 7 months ago
Get rid of Python dependencies now that pywb is gone
benoit74 opened this pull request 7 months ago
proxy: support setting proxy via --proxyServer, PROXY_SERVER env var or PROXY_HOST + PROXY_PORT env vars
ikreymer opened this pull request 7 months ago
[Bug]: Profiles are cut off at the bottom
SuaYoo opened this issue 7 months ago
Investigate capture of dynamic videos that render only when page is scrolled.
ikreymer opened this issue 7 months ago
Add SOCKS5 + HTTP Proxy support (and add backwards compatible support for 0.12.x support in Browsertrix Crawler 1.x)
tw4l opened this issue 7 months ago
Consider disk usage of collDir instead of default /crawls
benoit74 opened this pull request 8 months ago
Crawler is not checking the proper disk usage
benoit74 opened this issue 8 months ago
Better indicate the interruption reason
benoit74 opened this issue 8 months ago
Load non-HTML resources directly whenever possible
ikreymer opened this pull request 8 months ago
Bump version to 1.2.0 Beta + make draft release for each commit
ikreymer opened this pull request 8 months ago
Failing to create a login profile for www.solidarite-numerique.fr
benoit74 opened this issue 8 months ago
Fix failOnFailedLimit and add tests
tw4l opened this pull request 8 months ago
Add group policies, limit browser access to container filesystem
vnznznz opened this pull request 8 months ago
Sitemap Parsing Fixes
ikreymer opened this pull request 8 months ago
Mention command line options when restarting
edsu opened this pull request 8 months ago
save state: export pending list as array of json strings + fix importing save state to support pending
ikreymer opened this pull request 8 months ago
Combine Parameters --failOnFailedLimit with --failOnInvalidStatus
gitreich opened this issue 8 months ago
What exactly is '--blockRules' blocking? Entire URLs where an element like an iframe matches a regex, or only the matching part of a page?
steph-nb opened this issue 8 months ago
headers: better filtering and encoding
ikreymer opened this pull request 8 months ago
Fix regressions with `failOnFailedSeed` option
tw4l opened this pull request 8 months ago
PDF loading status code fix
ikreymer opened this pull request 8 months ago
WARC record HTTP status code is 0 instead of 200
benoit74 opened this issue 8 months ago
Cannot convert argument to a ByteString
benoit74 opened this issue 8 months ago
Unable to resume crawl: not valid JSON
edsu opened this issue 8 months ago
Crawl error: missing context with id
edsu opened this issue 8 months ago
Adding support for HTTP Basic Auth
edsu opened this pull request 8 months ago
add STORE_REGION env var to be able to specify region
ikreymer opened this pull request 8 months ago
Skip Checking Empty Frame + eval timeout
ikreymer opened this pull request 8 months ago
Parameter --failOnFailedSeed exits Docker with ExitCode 0
gitreich opened this issue 8 months ago
improved handling of requests from workers:
ikreymer opened this pull request 8 months ago
profiles: ensure initial page.load() is awaited
ikreymer opened this pull request 8 months ago
pages.jsonl missing required ts timestamp field
halmos opened this issue 8 months ago
profiles: ensure all page.goto() promises have at least catch block/a…
ikreymer opened this pull request 9 months ago
Port 9222 can't be used for Screencast
gitreich opened this issue 9 months ago
Always add warcinfo records to all WARCs
ikreymer opened this pull request 9 months ago
`.warc` strict conforming output ?
dbuenzli opened this issue 9 months ago
`.warc` output, always write the warcinfo record ?
dbuenzli opened this issue 9 months ago
extra security: only add --no-sandbox flag if running as root
ikreymer opened this pull request 9 months ago
replace minio client with aws sdk
mguella opened this pull request 9 months ago
Bump base image 1.64.122 + Skip http/2 pseudoheaders
ikreymer opened this pull request 9 months ago
Config payload Digest sha1 base32
gitreich opened this issue 9 months ago
S3 upload and region error
cmillet2127 opened this issue 10 months ago
Switch to using JS WACZ
ikreymer opened this pull request 10 months ago
WARC Validation Error appears from time to time
gitreich opened this issue 10 months ago
Make screenshot after custom behaviors
cmillet2127 opened this issue 10 months ago
Use js-wacz to create WACZ files
tw4l opened this issue 10 months ago
Failure uploading large files (handling slowDown)
wvengen opened this issue 10 months ago
Brave Default Setting Improvements
Shrinks99 opened this issue 11 months ago
SOCKS proxy username and password parameters missing
vbanos opened this issue about 1 year ago
[Bug]: no warc-info header in any warc file included in a wacz
tuehlarsen opened this issue about 1 year ago
how configurable is the Automated Profile Creation feature
msramalho opened this issue about 1 year ago
Cloudflare interstitial wait isn't working
vbanos opened this issue about 1 year ago
Update README.md to mention Brave instead of Chrome
benoit74 opened this issue about 1 year ago
parentURL / sourceURL / Referer - flag ? to enable parentURL recording
Dooriin opened this issue about 1 year ago
Convert to TypeScript
ikreymer opened this issue about 1 year ago