Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

https://github.com/mwmbl/crawler-script

Merge pull request #8 from mwmbl/fix-docker-file

Fix docker file

bc1a37d4d0b06a8462b4d07a8b38dd8193d381ed authored about 1 year ago
Fix Dockerfile

e0345bda1f9f3b8b1086c6bda587b3a28a47b5bf authored about 1 year ago
Add script to download links from HN

80b1b5c5466527a5355b25b44ac7c463ec6961ba authored about 1 year ago
Remove white space from URLs

577ec12b806d9a715edb855d22863db765f8e1e4 authored about 2 years ago
Crawl a site

cc16fc36ad174cfe722b598895749ff5b6740e6d authored about 2 years ago
Log URL when we get a DOM parsing error

6652faa21be9a1eac10e505e0e8d48f78bf8adf0 authored about 2 years ago
Try and decode robots.txt with another encoding

2114558e38b8da40f348e3d295170c8f4de368e3 authored about 2 years ago
Catch dom parsing exceptions

bb5f1d5cb168fefcc191dd39548cbeaf43bfc678 authored about 2 years ago
Improve robots decode logging

12ecf8111c09add1e7df8b04660e5526fb78fe1f authored about 2 years ago
Update README.md

c36868614deccb41a1a257ad84f2980b36f1554c authored about 2 years ago
Update README.md

a5da7dc3eed2ff569fddd82192f9b613a495cbb3 authored about 2 years ago
Update README.md

bca0919293882e07a5f9ab46d67ed2811631df8c authored about 2 years ago
Merge pull request #3 from mwmbl/crawl-given-url

Allow crawling specific URLs

db3d2107bc8958c4fdf8e79dbb87a5ec6c5a26c3 authored about 2 years ago
Allow crawling specific URLs

333c0805c0839fb927b3880be2195069c84dad6b authored about 2 years ago
Merge pull request #2 from EchedeyLR/feature/Dockerize

Feature/dockerize

9908e1552b6c3fb4335f741883c3e7be9e95efd9 authored about 2 years ago
Fix name

85ee88c01828a0a72c3a37ce41fd153ebf522b33 authored about 2 years ago
Dockerize

c98388499316ba3bb8687bfd4c7976b1d8ff2708 authored about 2 years ago
Fix pyproject.toml

c5d2f3b6cfe6800a376e2dbfc404ad46f3d7b52d authored about 2 years ago
Use a threadpool instead of a process pool

24ae505c86aa2c408abc61f89e8822f1a478380e authored about 2 years ago
Enable multi-process crawling

c12c794c5ba2b62cb03031ae22ec8ba5214ada0c authored about 2 years ago
Send a batch if it gets interrupted

b85ac4a5970027dd9e4edd64c12dd5e5d4cc8888 authored about 2 years ago
Add poetry files

5a913cbfabcc021fdf1b1162a78a7633d1410a66 authored about 2 years ago
Crawler script ready for testing

5eb3c14b6703b75b613f24a4d7ca6099110f3b99 authored about 2 years ago
Get and store user ID, send batch

c4be7e4a3d7343b366017f6b05f6e5023676cb6b authored about 2 years ago
Crawl batch

af04eec55d73af3a81f0f45f9a80ee20192468ef authored about 2 years ago
Get links

1f06c5d5127f8511d3c248cd07b37e3f2ea30b65 authored about 2 years ago
Check robots.txt

4dcc922114ff87b5672df5c8da55749f57896cde authored about 2 years ago
Maximum fetch size and timeout

1e7848141f5e1dec561a54419f72294a90a28865 authored about 2 years ago
Store the links with the paragraphs

67b6671bceb831503a8b17cbbccf1cac64abd29a authored about 2 years ago
Initial commit

94e5891411a76da5ffb64f244de7902c08a27053 authored about 2 years ago
Initial commit

6dfebf7f5154bcf63273cd3435d0dbe39cc6c362 authored about 2 years ago