Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/mwmbl/crawler-extension
A browser extension that can be installed by volunteers to participate in mwmbl distributed crawling.
https://github.com/mwmbl/crawler-extension
Replace multiple white spaces in content
f7efb6c0a409de6eb3a1d5ef9a41be8cd4091bbc authored about 3 years ago
Add todo
dc3de71a738a714c5c3362219ae02f93e742d903 authored about 3 years ago
Check some edge cases
900ce29195b993547f54e4d52e316b72f0312ce4 authored about 3 years ago
Record links in paragraphs
6b0db1b938c6b53a35ea5004cee7d25eed42f8de authored about 3 years ago
Get paragraphs from retrieved web pages
33f9ad92836efda8933e8661d516687273fab2ba authored about 3 years ago
Remove unused constants
4f7654e00e2d25c72be08fa421218ae58b2ca4c4 authored about 3 years ago
Revise paragraphs
2fc64dad07fb5c95bd9e3a122c2618e4e8a68890 authored about 3 years ago
Classify paragraphs context free
fa26a78e3c8c016c6e3b5c48cba35cfcd2459daa authored about 3 years ago
Extract paragraphs
4a3cd497a0ec29043055a5e0bbb62c8aed8b3f1c authored about 3 years ago
Check we behave ok with nested tags to remove
548f0e17cd1a973077ec2c64918deaad58b4c999 authored about 3 years ago
Start on jusText port - remove unwanted tags
cc066f81900129be1cd504ad3727edd3ee937fa6 authored about 3 years ago
Merge pull request #6 from mwmbl/watch-files
Changed dev script to build with watch mode
fd5174262bc92f6dffa9626ec1edd5c6e3b8bd3b authored about 3 years ago
chore: changed dev script to build with watch mode
4bb3496730ee0f4fa2301b39c5ae31fb5eaf30da authored about 3 years ago
Update gitignore for Jetbrains
f3037490ae239f59343588ce0847509ef4216430 authored about 3 years ago
Merge pull request #4 from mwmbl/robots-txt
Respect robots.txt
87e479494ec6a70130b5e567744bf41e84f3def0 authored about 3 years ago
Revert package-lock to master
68e0aef3686fd72a326a88c9a8883b6e90e23e61 authored about 3 years ago
Merge branch 'master' into robots-txt
97cb0350d654c7878812e4053f95a39f1532d50b authored about 3 years ago
Check the parsed result against user agent and path
20bb483ea27d89285ab4907983b95686c7c6d041 authored about 3 years ago
Handle errors fetching robots
cdd3427a16b8be7fa3d2585a89cf37519ebcd3ff authored about 3 years ago
Load and parse robots.txt
bfd50490b0ea7265cc57e64f26798b900a83b5da authored about 3 years ago
Merge pull request #3 from mwmbl/retrieve-pages
Retrieve pages
2d8466a362796f806e51d349525686dc1a470d36 authored about 3 years ago
Fix using suggestions from code review
b9b98558942f7e0e5a88058fbe2fffa1e485fbec authored about 3 years ago
Don't instantiate Date
d0730b600823ae667bec5e6ef92882f9aea2e14d authored about 3 years ago
Make crawlURL async
fd928558e3b37e1788fc6af6d1bd326accfda05e authored about 3 years ago
Convert loadCuratedDomains to async/await
de507cfe4ae83a73f52f8390b6e5df433f5e8029 authored about 3 years ago
Use ES6 style classes; this.chooseDomain() is not working
b7de76bf75d930e92827efb187a793eb7128a482 authored about 3 years ago
Use manifest v2 style permissions
cb59f27204c85243c4404321be9d6cb4ca84b124 authored about 3 years ago
Merge pull request #2 from mwmbl/crawl-pages
Run a crawl iteration once every second
6667eee891ca56930255187f12306d9b4e547b78 authored about 3 years ago
Remove unwanted background section in package.json
728da25494aeae3cbc7e0eb1a57244e554627b67 authored about 3 years ago
Try and crawl URL
a41b00b4a058080c68c8712b3af6f554424ffd1c authored about 3 years ago
Move all functions to the module
2824d6b9397f2ce150329e7355498db2b75b8a28 authored about 3 years ago
Remove extraneous brackets
1867d3e2c4f7120082951f5eda11f11613f1d79f authored about 3 years ago
Load curated set of domains and choose a random one at startup
515292d4aa17737343b718dc16b8a9bccf7bfb5d authored about 3 years ago
Run a crawl iteration once every second
- Use manifest version 2 to be more compatible with Firefox
- We also need a persistent backgro...
chore: removed useless Vite starter assets
7d1cadf933ed7f208931640baeb10055f5abf028 authored about 3 years ago
feat: setup Vite bundling and getting-started example from docs
4aa1b4f16bbd8f6e78322642f713a9fe4a87ee08 authored about 3 years ago
chore: copied license file from mwmbl repo
03413c5f01486b8506dbe137bafd9cf8dac80df5 authored about 3 years ago
Initial commit
a99b5e90637837944ce7f0c82c89bda66b2cd5ed authored about 3 years ago