Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/webrecorder/browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container
https://github.com/webrecorder/browsertrix-crawler

deps: update puppeteer-core to 20.4.0, fixes #324

ikreymer opened this pull request over 1 year ago
Update puppeteer-core

ikreymer opened this issue over 1 year ago
Ignore spaces in double quotes when splitting process.env.CRAWL_ARGS

tw4l opened this pull request over 1 year ago
Update argParser.js for #307

anjackson opened this pull request over 1 year ago
Skipping autoscroll when page should be able to scroll

edsu opened this issue over 1 year ago
allow adding --include with pre-existing --scopeType values (besides …

ikreymer opened this pull request over 1 year ago
Allow --includes to be added to --scopeType values

ikreymer opened this issue over 1 year ago
Created Invalid WARC record

rgaudin opened this issue over 1 year ago
Chrome 112 + new headless mode + consistent viewport tweaks

ikreymer opened this pull request over 1 year ago
Entire site is crawled, but no output warcs are generated.

ArtHoff opened this issue over 1 year ago
stopping: if crawl is marked as stopping, and no warcs found, mark st…

ikreymer opened this pull request over 1 year ago
Can't create intranet profile

ArtHoff opened this issue over 1 year ago
Disable Chrome optimization logic

malemburg opened this pull request over 1 year ago
black screen on interactive profile creation

jswrenn opened this issue over 1 year ago
state: adjust redis keys to be more consistent

ikreymer opened this pull request over 1 year ago
Disk utilization threshold

atomotic opened this issue over 1 year ago
Handling of CRAWL_ARGS cannot cope with quoted strings

anjackson opened this issue over 1 year ago
Consolidate wacz error loglines

tw4l opened this pull request over 1 year ago
Log fatal messages to redis errors

tw4l opened this pull request over 1 year ago
Improve thumbnails with sharp

tw4l opened this pull request over 1 year ago
crawl stopping / additional states:

ikreymer opened this pull request over 1 year ago
Improve thumbnail creation

tw4l opened this issue over 1 year ago
Switch back to Puppeteer from Playwright

ikreymer opened this pull request over 1 year ago
Catch 4xx and 5xx page.goto() responses to mark invalid URLs as failed

tw4l opened this pull request over 1 year ago
Full-page screenshots missing content

ArtHoff opened this issue over 1 year ago
Playwright persistent browser context causing memory issues

tw4l opened this issue over 1 year ago
Fixes from 0.9.1

ikreymer opened this pull request over 1 year ago
Fix full page screenshot

tw4l opened this pull request over 1 year ago
Browsertrix can't fetch articles to crawl list (only menu items)

gitreich opened this issue over 1 year ago
Allow switching capturing backend from pywb to warcprox

Sanqui opened this issue over 1 year ago
Allow spaces in userAgentSuffix command line option

anjackson opened this issue over 1 year ago
Quick exit on redis connection error after interrupt

ikreymer opened this pull request over 1 year ago
Store archive dir size in Redis

tw4l opened this pull request over 1 year ago
Introduce new Limit Parameter crawl-size

gitreich opened this issue over 1 year ago
worker: lower wait time, in case where no additional pages remain and…

ikreymer opened this pull request over 1 year ago
Store crawl size in Redis while crawl is running

ikreymer opened this issue over 1 year ago
Crawler doesn't mark invalid URL as failed

tw4l opened this issue over 1 year ago
feat: Add custom behavior injection

lambdahands opened this pull request over 1 year ago
is it possible to output regular files

ftc2 opened this issue almost 2 years ago
origin override: add --originOverride source=dest to allow routing wh…

ikreymer opened this pull request almost 2 years ago
Investigate removing done from Redis

tw4l opened this issue almost 2 years ago
Add option to log errors to redis

tw4l opened this pull request almost 2 years ago
Add option to log crawl errors to Redis

tw4l opened this issue almost 2 years ago
Error when restarting crawl with config via stdin

darcyparksliu opened this issue almost 2 years ago
Add --maxPageLimit override

ikreymer opened this pull request almost 2 years ago
blockrules/logger: use global logger var

ikreymer opened this pull request almost 2 years ago
Add unit test for sizeLimit

stavares843 opened this pull request almost 2 years ago
Update README for 0.9.0

tw4l opened this pull request almost 2 years ago
Add options to filter logs by --logLevel and --context

tw4l opened this pull request almost 2 years ago
Add CLI options to filter logs by logLevel and/or context

tw4l opened this issue almost 2 years ago
Network error when using --config and config file

darcyparksliu opened this issue almost 2 years ago
Support Contextual Information in datapackage.json for WACZ

markpbaggett opened this issue almost 2 years ago
Reset locked pending URLs when crawler restarts.

ikreymer opened this pull request almost 2 years ago
worker index: set worker index automatically to work with k8s naming

ikreymer opened this pull request almost 2 years ago
twitter Quote Tweets issue

polo1kani opened this issue almost 2 years ago
Ensure crawler can't run out of space with --diskUtilization param

tw4l opened this pull request almost 2 years ago
Support Custom Browsertrix Behaviors Loading

ikreymer opened this issue almost 2 years ago
Error puppeteer: Unable to get browser page

PedroG1515 opened this issue almost 2 years ago
Add more verbose logs in browsertrix

PedroG1515 opened this issue almost 2 years ago
What's your registry strategy?

rgaudin opened this issue almost 2 years ago
New parameter to add deduplication between crawls

PedroG1515 opened this issue almost 2 years ago
Add option for sleep interval after behaviors run

tw4l opened this pull request almost 2 years ago
Parameter sizeLimit is not ending the crawl correctly

gitreich opened this issue almost 2 years ago
Catch loading issues

ikreymer opened this pull request almost 2 years ago
Logger cleanup

ikreymer opened this pull request almost 2 years ago
Dev 0.9.0 Beta 1 Work - Playwright Removal + Worker Refactor + Redis State

ikreymer opened this pull request almost 2 years ago
State / Worker Refactor

ikreymer opened this pull request almost 2 years ago
Refactor / Cleanup of Crawl (for 1.0.0)

ikreymer opened this issue almost 2 years ago
Obtaining Screenshot Image Files After Crawl

thegrif opened this issue almost 2 years ago
Disable browser updates

rgaudin opened this issue almost 2 years ago
Support uploading/serialized crawled output to IPFS

RangerMauve opened this issue almost 2 years ago
Catch ioredis console errors and log "Waiting for redis" instead

tw4l opened this issue almost 2 years ago
Ensure Crawler Can Not Run out of Disk Space / Stops at Disk Utilization

ikreymer opened this issue almost 2 years ago
Add a 'finishing' state to RedisCrawlState

ikreymer opened this issue almost 2 years ago
Per-Crawler Instance status messages

Shrinks99 opened this issue almost 2 years ago
Add documentation for how to use drivers!

ikreymer opened this issue almost 2 years ago
Don't set viewport for full page screenshots

tw4l opened this pull request almost 2 years ago
Specifying selectors for extracting links.

ttaomae opened this issue almost 2 years ago
Serialize Redis pending pages as JSON objects

tw4l opened this pull request almost 2 years ago
Add RedisCrawlState test

tw4l opened this pull request almost 2 years ago
Success status code on failure

rgaudin opened this issue almost 2 years ago
Remove dead pywb configuration

edsu opened this pull request about 2 years ago
Consider switching to Brave for base browser.

ikreymer opened this issue about 2 years ago
Add cookie popup blocking via adblock-rs

tw4l opened this pull request about 2 years ago
HTTP Basic Auth

edsu opened this pull request about 2 years ago
SSLError

wenjin11 opened this issue over 2 years ago
Exclude example needs protocol

edsu opened this pull request over 2 years ago
[Feature request] Prioritize entries in queue (by regex?)

bjrne opened this issue over 2 years ago
How to run "Interactive Profile Creation" using docker compose?

rajasekhar-gundala opened this issue over 2 years ago
Suggestion: make it easy to integrate adblocker

phiresky opened this issue almost 3 years ago
get working screenshot functionality

emmadickson opened this pull request about 3 years ago
proxy support

phiresky opened this issue over 3 years ago