Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/HTTPArchive/crawl

Controller logic for managing the monthly crawl testing
https://github.com/HTTPArchive/crawl

Change the CrUX crawl delay to 24 hours

6dbf72e7af728af66b82915209fa6aa95eb30551 authored over 1 year ago by Patrick Meenan
Switched to a 3-hour delay

3bef60d67fee39572c5d69254c918dbf904e3237 authored over 1 year ago by Patrick Meenan
Use hours for elapsed

819fc1c04450cdecec58686ceeeb1074686e2f43 authored over 1 year ago by Patrick Meenan
Added an elapsed time to the log

f9675aa6c205665a2a65838ea52d9f196c58b1a2 authored over 1 year ago by Patrick Meenan
Only log the status info if it changed

91b406fdbbbdc0bc92d74f8897f688da97e3ea38 authored over 1 year ago by Patrick Meenan
Added more CrUX logging

f39e0de1279e4e8f29eb49131795b53095f33eab authored over 1 year ago by Patrick Meenan
Added logging

151d1622550400e4d07faa9ca126a1745066d3c4 authored over 1 year ago by Patrick Meenan
Wait for the current month's CrUX URL lists before starting

ecac52ad45e0531d919c52ca156980f79c93500a authored over 1 year ago by Patrick Meenan
Changed done message to info

8341265f03b412588770bbceb260de1bac862618 authored almost 2 years ago by Patrick Meenan
Combine both done messages into a single message

46cacc3428de5064c1cf04153cc610f95c42a2e4 authored almost 2 years ago by Patrick Meenan
Post a message with the crawl dir to the complete queue

cc016d93830cae49b26f46b3fff57a63ce15f242 authored about 2 years ago by Patrick Meenan
Make SURE to only crawl to links that are same-origin as the parent

c532a360d7195552cb959dfa537de3f84c215cd6 authored about 2 years ago by Patrick Meenan
Re-enabled the crawl

01f23b335830140aeb0d0818658c289ca20f9e3d authored over 2 years ago by Patrick Meenan
Disable the crawl

c212e0ef956ff5c6c5b2ab82a7c052ef22caba5a authored over 2 years ago by Patrick Meenan
Reset crawl dates

a4a5d860eb2cc88b9309e10bbfe89d8ee3d6d06f authored over 2 years ago by Patrick Meenan
Increase the number of retry threads to better support the crush at the end of the crawl

bdd3e1c0e961ffee054bdc9bb140c472f399e3fe authored over 2 years ago by Patrick Meenan
Changed crawl date

5b4d793022a5a6e2ad6b026d9b761813ec2ec6b0 authored over 2 years ago by Patrick Meenan
Prepare for the June 8 re-run

bfd01dd76dda3beb7d2ce899caab94315576e861 authored over 2 years ago by Patrick Meenan
Slight tweak to the runtime to increase the chance that it won't be idle for 5 minutes

2b36d97d6c065c30272fa29719e7273795af9b6f authored over 2 years ago by Patrick Meenan
Clear the crawled history when testing is complete

44fe231f8ac8a94f6fcf529b2061ff374c026024 authored over 2 years ago by Patrick Meenan
Bump the timeout back up to an hour

e56bd46d3e7e7c1976150a3cc2838fca05a67420 authored over 2 years ago by Patrick Meenan
Align the timeouts

d025c237bbc49e7d1b5af0fb3464c5462dc61b8d authored over 2 years ago by Patrick Meenan
Back to shorter cycles

03bbcff1fa834fcd6dffc4e350b41c3cb7b49706 authored over 2 years ago by Patrick Meenan
Run for an hour at a time

2f17f31e89af284292cee003c8f83157f56352d7 authored over 2 years ago by Patrick Meenan
Load the crawled history separately so we're not re-loading it when idle

2cccdf72ee276bbf81536f08e1fe1b0581ec6ddf authored over 2 years ago by Patrick Meenan
Run for longer intervals

59610bd8dd1ab485dfcd057b1cdcd431cad480cd authored over 2 years ago by Patrick Meenan
Enable bodies

dee071a75e9e105d801aef838ccd8c43ea154e0d authored over 2 years ago by Patrick Meenan
Don't crawl into known-stand-alone file formats

8bd9cb5e49e7529a745a24e41d45a0ed62f9712a authored over 2 years ago by Patrick Meenan
Limit the children crawled in case we get duplicate messages

e61330d06dc25fda16dd2ba9c48f0fa86f56ef43 authored over 2 years ago by Patrick Meenan
Restored the crawl name

7989e7ec9279c730f3a21550b4d72cb2b4881bca authored over 2 years ago by Patrick Meenan
Double the retry threads

bbb7cc9f89f0b5fdd2297b49d7f54b2d7bf215af authored over 2 years ago by Patrick Meenan
More retry parallelism

47195ce3f4f20449aca8e25f8b9ab2c696f33662 authored over 2 years ago by Patrick Meenan
Increase the number of retry workers

5e801bafd7b7b93e21488b0466b54db8ce68f4d3 authored over 2 years ago by Patrick Meenan
Generate deterministic test IDs for crawled pages

85d402fc8e50e05ccf316eb14d27a37e5099cc62 authored over 2 years ago by Patrick Meenan
Run multiple subscription threads

fed7259fb6a8938a519da299fdbce5cc62e36d06 authored over 2 years ago by Patrick Meenan
re-enabled flow control

f5a062e10625c4fa1e00d06937ee3923dae77d80 authored over 2 years ago by Patrick Meenan
Removed all subscriber flow control

a56b50365f5867aa7eb90a529ac025e02806dd97 authored over 2 years ago by Patrick Meenan
Try disabling flow control on the completed message queue

1ac149ac2bf5a2281a840f52d49454861ecd64a0 authored over 2 years ago by Patrick Meenan
Tweaked the publish batch settings

998bea0b935e762c049747169504a7481a88ff39 authored over 2 years ago by Patrick Meenan
Give the batch publish API more time

669abc8b7c9ff709088039b7bc5e617c8c6b0dbe authored over 2 years ago by Patrick Meenan
Increased the size of the internal queue

bab7d9c1becde9cefad7354871a68024fa6d8e47 authored over 2 years ago by Patrick Meenan
Reduced the message queue size

07bd59065dc8b164889c68f604a68300369d847e authored over 2 years ago by Patrick Meenan
Don't wait for pubsub futures when publishing

b1da03230620c9b2512764884e7ee87288620f93 authored over 2 years ago by Patrick Meenan
Increased the queue size for the completed queue

e7f0a7b1aaf34061f132ee0f59569b9385d09a39 authored over 2 years ago by Patrick Meenan
Save the status after submits are complete

9201f3dcde5587d8790e107d011f31aabefe5a55 authored over 2 years ago by Patrick Meenan
Update the activity counter with each submit

28f6a9121a4c72cf685671198b2f2bc9fce045d8 authored over 2 years ago by Patrick Meenan
Switched queue implementations

67ab38044ad0972fb858251c0a927fd0e2c05bb4 authored over 2 years ago by Patrick Meenan
Suppress ack exceptions

51f3ea4f70cffc274d58234e8fc0c46dc4e1fbf0 authored over 2 years ago by Patrick Meenan
Moved the completed queue to the front of the subscriptions

1283235789559836da7b3d76608048ca06c86cd6 authored over 2 years ago by Patrick Meenan
Reduced the size of the failed queue

5976e4d0e055df05092500d48e48b226233d908e authored over 2 years ago by Patrick Meenan
Run for 10 minutes at a time

b43a1b0fa9c2badab15632330971c7a47820535d authored over 2 years ago by Patrick Meenan
More tweaks

0eeeb8a9348f765ca3fda8c20021dea368aa8c01 authored over 2 years ago by Patrick Meenan
Tweaked the queue sizes

68c0482a3bd09a9a2bbaff49dd28a5e9059fd176 authored over 2 years ago by Patrick Meenan
Increased to 10 messages

0fc7f4efa93b65d44a415a45bddcd6489a828d95 authored over 2 years ago by Patrick Meenan
Try limiting to 1 message at a time

435885ac07ca7c78d0cc7fffec707d943eac1a6d authored over 2 years ago by Patrick Meenan
Moved the logging setup back to the beginning

1224980b38e0d69dafe3e3d4681391f988e1e4b0 authored over 2 years ago by Patrick Meenan
Added exception handling

c5b6d7b0a119929ad7e46f6920f2fb3d8448fb99 authored over 2 years ago by Patrick Meenan
Tighten up the timeouts

a11005ef702b15c0d45dda515598f51e5662c436 authored over 2 years ago by Patrick Meenan
Use a separate lock file

03b990b918bbf4eb77d42aed80cb6c6de29733c0 authored over 2 years ago by Patrick Meenan
Updated logging for the start of crawl

218beaaf34b2e93faeb23e7e06cc85d1bb8135a8 authored over 2 years ago by Patrick Meenan
Switch to May 12th crawl name temporarily

843fdce2e25bef96a099e390c0fb14dc05c450ec authored over 2 years ago by Patrick Meenan
Populate the root page for all tests, including for the root page itself.

a89bc69f0b44b19672bf011355644b36150ff567 authored over 2 years ago by Patrick Meenan
Added information about the root page for a crawl as well

040659f6dd0b70ab2384ceb8f5f46e531f76a990 authored over 2 years ago by Patrick Meenan
Added metadata about the parent page

080f7d05c251e9a4eccecd3e894ec46e4518e55c authored over 2 years ago by Patrick Meenan
Disable updating the URL list temporarily while we re-crawl the same URLs

5a2e1d602a3fa395dce8398bc66ca29af28dce6e authored over 2 years ago by Patrick Meenan
Merge branch 'async' into main

8d6cf57071a739a651590a4f2e728e943920b28c authored over 2 years ago by Patrick Meenan
Don't use the completion queue when not crawling

dec80139b776a034147e857a8e2e949ff7f94385 authored over 2 years ago by Patrick Meenan
Disable crawling for now

08fbc03786418f73406e276b989140beb21fcb37 authored over 2 years ago by Patrick Meenan
Completed crawl depth logic

f6412dec595acac0932f950e265b61fe531ca8ed authored over 2 years ago by Patrick Meenan
In-flight work to switch to async pub/sub

9630fc15379d2a47a1469d36d5dd82e739042fc0 authored over 2 years ago by Patrick Meenan
Added the tested URL to the metadata

00ca8895437343a875555994346ebebbdad559a2 authored over 2 years ago by Patrick Meenan
Put a 4 minute limit on requeuing tests

74b4ab7ee83aaab099731e931ba0c25c91a01aac authored over 2 years ago by Patrick Meenan
put the return_immediately check back in

879e86866d9760a62472a1169c159cf122b6811d authored over 2 years ago by Patrick Meenan
Removed the return_immediately option

d5b5699cf012442fe2523b77b05ea21258659c47 authored over 2 years ago by Patrick Meenan
Fixed the rate calculation

b9cf23f36492001e3b685d33cd5b4473f680eb24 authored over 2 years ago by Patrick Meenan
Added a crawl rate calculation to the status

48633de264db68cb154b3e67475b9ae534538321 authored over 2 years ago by Patrick Meenan
Made the logged status compact

b21a4c8700df11d27a0cf27f81f649bc7afb80ee authored over 2 years ago by Patrick Meenan
Change the bulk retry size to 100

429403310e69be0232aaf3a9d969c98b066eb928 authored over 2 years ago by Patrick Meenan
Ack messages in bulk

a07078ff678927f8bce7e8d1e6aa3bde3bf0e073 authored over 2 years ago by Patrick Meenan
Put some protection in the retry processing

a8bc6f875a47dc2edb96b203078515eaf2317513 authored over 2 years ago by Patrick Meenan
Handle the retry queue in batches of 1000

3490dff01f85ec56048801c4517e0113c6f2f62d authored over 2 years ago by Patrick Meenan
Persist the log for the duration of the crawl

fe9f0a723575e71244c2e8432d25ab065df8e655 authored over 2 years ago by Patrick Meenan
Added crawl_depth and link_depth to the metadata

4f7180023e47534a8d1fc3b8dfa9dce3d3f9412b authored over 2 years ago by Patrick Meenan
Enable Lighthouse testing on the desktop tests

747aa2c25dff37b6150674aea004323d27bcb6b9 authored over 2 years ago by Patrick Meenan
Switched to the production config

1dc0e92e0fe8f49ee1acf83e65023377bb5ec683 authored over 2 years ago by Patrick Meenan
Reset the empty time when tests are resubmitted

4cc616616ceda8789d196d1428faa010362ab6d9 authored over 2 years ago by Patrick Meenan
Added more logging

cae1ce7ba8bb64c80ae3d0fda214f00661f32b09 authored over 2 years ago by Patrick Meenan
Fix the zero-based test IDs

6cf88df8f539f8c73dfcd0e18ff6e6cf574b2d48 authored over 2 years ago by Patrick Meenan
Fixed the crawl names

144b5413af029dc2117c3664ac4da991b9cafe8c authored over 2 years ago by Patrick Meenan
Updated the logging

ea533e7d93a05257141d26b07fd5649cb857bd89 authored over 2 years ago by Patrick Meenan
Update the status at the end of checking

8f890cb29f69e1f24711984becbf981dfe304a2e authored over 2 years ago by Patrick Meenan
Changed the log file to overwrite

5cd79aabd407e3199f1f1cc2851beac43b15ab46 authored over 2 years ago by Patrick Meenan
Changed to log to a file

cae5066cfaa0ba8e778da43bffdda25e060e40c8 authored over 2 years ago by Patrick Meenan
Set it up for testing

bc960e4f5f9ff46588aad2822aed0b9d53691e2c authored over 2 years ago by Patrick Meenan
Added done file handling

01b0af479635b089bbe22d454c861b011cc80287 authored over 2 years ago by Patrick Meenan
Added the CrUX API keys

4c414a4647e9f317560d6d3f890cc94d25956d5d authored over 2 years ago by Patrick Meenan
Initial work on the batch job for processing the crawl

0dd03cc087f4fec634bf8e695a203c49714ea05c authored over 2 years ago by Patrick Meenan
Initial commit

ed49fd9015f6cee8c35d3c2c38b0f7fc1963b218 authored over 2 years ago by Patrick Meenan