Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ooni/2022-04-websteps-illustrated

websteps: winter 2022 edition
https://github.com/ooni/2022-04-websteps-illustrated

fix(measurex): make NewURLAddressList deterministic

This matters when we're rerunning previous measurements using the cache.

Without this fix, the ...

fb5cf82ce1a3f820bca472e65716b560272f09c0 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): improve quality of logging

These changes are based on my trying to make sense of the
several measurements I'm using as test...

35429f1367dd30d0f3a680679234c2d8de99a587 authored over 2 years ago by Simone Basso <[email protected]>
fix: ease comparison with cached measurements

1. we need to rewrite the IDs when we pick results up from the cache
because the ID in general a...

910dd287360e6fd3b9a9e0ef05b80fc5896ecb43 authored over 2 years ago by Simone Basso <[email protected]>
feat(thd): allow logging on file

This is required to re-run measurements using the cache.

While there, add support for absolute t...

0e4601110a424b5ac958d43c0b3e165d124dc782 authored over 2 years ago by Simone Basso <[email protected]>
feat(websteps): allow writing a log file directly

I should have added this code to miniooni a long time ago.

We're using the same format of minioon...

e04b5a8b553a4b553b521eed47126f879c7fc735 authored over 2 years ago by Simone Basso <[email protected]>
chore: start documenting the purpose of each dir

7aeba5751a101005c669895a6027d659785ee473 authored over 2 years ago by Simone Basso <[email protected]>
feat: add flags to disable networking when cache is on

This seems one of the missing bits to ensure that, when we're
repeating a measurement, we're not...

7822a1420a35475f0def6b46d613082da5ff1fd3 authored over 2 years ago by Simone Basso <[email protected]>
feat: implement dnsping caching

We _also_ need to cache dnsping results to make measurements fully
reproducible later in time. W...

9b9c1e141ef7e2a71793084d74efbe3eefee0ab8 authored over 2 years ago by Simone Basso <[email protected]>
feat(websteps): allow filling fully reusable measurements cache

This diff introduces changes allowing one to use the websteps command
to fill a predictable, com...

1ae3d1573d4e12caee0a93d0eff5883b12a63a24 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): replace --deep and --fast with -m, --mode=MODE

This command line design is more flexible.

556a91c9a5dd50b57f777febf4df6b6d0ff6a958 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): force 2 addresses per domain in fast mode

If we only use one, we're not able to compare the DNS resolver and
the system resolver to say wh...

c8c2291df11acecd81c25359e0301ae3c6f24144 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): make the CLI more user friendly

cbe771fcc811e79dd59fc478de0094ac3cf2a2a4 authored over 2 years ago by Simone Basso <[email protected]>
refactor: split cache in probe and th caches

This should help me implement my plan of having two local caches
for re-running measurements loc...

ee3206d5b6bc4c872ac4f1a4978341860da2cd31 authored over 2 years ago by Simone Basso <[email protected]>
feat(thctl): use OONI backend by default

9c8f60c6873c18afd9bab7586bb81e8a132ac5d8 authored over 2 years ago by Simone Basso <[email protected]>
fix(logcat): synchronize with log processor before continuing

5855bdf23312c824ae28b369978f92aa02861db5 authored over 2 years ago by Simone Basso <[email protected]>
refactor: completely rewrite analysis code

1. stop relying on hashing because we're not convinced

2. cleanly split each part of the analys...

604ee8197e8c14f54c5bb4121f5c1f3da6fa6fb8 authored over 2 years ago by Simone Basso <[email protected]>
cleanup(measurex): we don't actually need to separate by resolver

This path was a bit too tricky, so I'll try a simpler path that does not
require me to separate ...

fbb437e565a8d93542181908e99b0dc74a37a185 authored over 2 years ago by Simone Basso <[email protected]>
refactor: separate scrutinize from inspect

Inspect is informing the user we're doing an activity.

Scrutinize is telling the user we're loo...

4a18ab80daedc223841f4407a6f31b020bcacf10 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): start log consumer before the experiment

Otherwise, we may end up missing some messages.

849058d7417153808bd386ef5db2dc674a3b65ba authored over 2 years ago by Simone Basso <[email protected]>
cleanup(measurex): suppress redundant piece of logging

1ebfb1c981a42384f3d3a772bbe04320d88143c1 authored over 2 years ago by Simone Basso <[email protected]>
feat(logcat): introduce all emojis we need

3f631ad6aa4b8666d0c3b2959aef49b2eb93aedb authored over 2 years ago by Simone Basso <[email protected]>
refactor: eliminate usage of the 🪀 emoji

It's very seldom used and so let's just not use it.

21e57fe5a5052db1d775321c829418d5ea873e19 authored over 2 years ago by Simone Basso <[email protected]>
feat(measurex): separate system and non-system lookups

23b45a348245d62e57f2a75b3640ede91011b98a authored over 2 years ago by Simone Basso <[email protected]>
refactor: introduce the "inspect" message

This message is an info message associated with 🧐.

487709611a72c3f0a33b9b0e3d1289712b36a8e1 authored over 2 years ago by Simone Basso <[email protected]>
refactor(logcat): make level vs emoji obvious

The previous code was not obvious and I could not fully explain why
it was working or non-workin...

36110c0ac3e87a664ccb37afe0c09c9f01757b23 authored over 2 years ago by Simone Basso <[email protected]>
feat(websteps): implement greedy mode

In greedy mode we stop following redirects as soon as we find
some signs of internet censorship....

50682db67f0e2ba06c49ee87bbe24bd61be96eb7 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): only use TH IPs for additional stage

If we also use dnsping, then analysis is going to have a hard
time because some TH endpoint meas...

74a2d5dadc91f3afd2d722f7fc6bd4a5332d4022 authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): endpoint consistency wrt the TH

1. ensure that the probe is going to test in its additional round
only new IP addresses returned...

911007ec1f3fe11fb37910c870bce9e74fb8b61d authored over 2 years ago by Simone Basso <[email protected]>
fix(websteps): proper fix for endpoints mismatch with th

After reasoning a bit more on it, it turns out that
d46101418537113d15c05d7309f4de8fcae28242 is ...

e8e30b8689784e1f0a07679585697ca19d64672b authored over 2 years ago by Simone Basso <[email protected]>
refactor(atomicx): make it a ~zero-cost abstraction

We can reduce the complexity of Int64 by just ensuring its int64 value is
the first element: thi...

a257c396b178d74c8a6f8a324c130fe326d9ea53 authored over 2 years ago by Simone Basso <[email protected]>
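
To illustrate why field order can matter here: Go's sync/atomic documentation notes that, on 32-bit platforms, the first word of an allocated struct can be relied upon to be 64-bit aligned, which 64-bit atomic operations require. The sketch below assumes this is what the truncated message refers to. The type and method names are hypothetical and are not taken from the repository's atomicx package.

```go
package main

import "sync/atomic"

// Int64 is a hypothetical atomic counter (not the repository's type).
// Keeping the int64 as the first (and only) field means that, when an
// Int64 is allocated on its own, the value is 64-bit aligned even on
// 32-bit platforms, so no padding or pointer indirection is needed.
type Int64 struct {
	v int64 // must remain the first field
}

// Add atomically adds delta and returns the new value.
func (i *Int64) Add(delta int64) int64 {
	return atomic.AddInt64(&i.v, delta)
}

// Load atomically reads the current value.
func (i *Int64) Load() int64 {
	return atomic.LoadInt64(&i.v)
}
```

A zero value is ready to use: declare `var c Int64`, call `c.Add(1)` from any goroutine, and read the result with `c.Load()`.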
fix(netxlite): avoid embedding for resolver

With embedding, every time we add a new method, we don't get a compiler
error, which means the n...

8848c8c516a40663c2718b1aff271884a116a147 authored over 2 years ago by Simone Basso <[email protected]>
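
A minimal sketch of the trade-off described above, using hypothetical names (the `Resolver` interface and both implementations are illustrative, not netxlite's real API). When a wrapper embeds the interface, it silently inherits any method added later. When it holds the wrapped value in a named field and forwards each method by hand, a new interface method causes a compile error until the wrapper handles it.

```go
package main

// Resolver is a hypothetical, simplified resolver interface.
type Resolver interface {
	LookupHost(domain string) ([]string, error)
	Network() string
}

// staticResolver is a trivial Resolver returning fixed addresses.
type staticResolver struct{ addrs []string }

func (r *staticResolver) LookupHost(domain string) ([]string, error) {
	return r.addrs, nil
}

func (r *staticResolver) Network() string { return "static" }

// countingResolver wraps a Resolver through a named field rather than
// by embedding, so every method must be forwarded explicitly.
type countingResolver struct {
	inner Resolver
	calls int
}

// Compile-time check: if Resolver gains a method that countingResolver
// does not implement, this line stops compiling.
var _ Resolver = &countingResolver{}

func (r *countingResolver) LookupHost(domain string) ([]string, error) {
	r.calls++
	return r.inner.LookupHost(domain)
}

func (r *countingResolver) Network() string { return r.inner.Network() }
```

Had `countingResolver` embedded `Resolver` instead, adding a method to the interface would compile cleanly and the wrapper would forward the call without counting or otherwise handling it.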
fix(websteps): pass to TH all the endpoints we discovered

d46101418537113d15c05d7309f4de8fcae28242 authored over 2 years ago by Simone Basso <[email protected]>
refactor(logcat): per-consumer emoji support

We encode emojis inside the loglevel and each consumer chooses
whether it wants messages with em...

d31043f74292f0ca4a39b88c2a4fd0d9f5608808 authored over 2 years ago by Simone Basso <[email protected]>
refactor(logcat): introduce ring buffer

We now have a ring buffer where we store recent log messages, which
allows the mobile app to see...

3299da5820482342eff0db3bdc382fa80e4930f5 authored over 2 years ago by Simone Basso <[email protected]>
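
For readers unfamiliar with the data structure, here is a minimal sketch of a fixed-size ring buffer that retains only the most recent log messages. It is an illustration under assumed names (`RingBuffer`, `Push`, `Snapshot`), not the logcat package's actual implementation, and it assumes a size greater than zero.

```go
package main

import "sync"

// RingBuffer is a hypothetical fixed-size buffer of recent messages.
type RingBuffer struct {
	mu    sync.Mutex
	items []string
	next  int  // index where the next message will be written
	full  bool // true once the buffer has wrapped around
}

// NewRingBuffer creates a buffer holding at most size messages.
// The size must be greater than zero.
func NewRingBuffer(size int) *RingBuffer {
	return &RingBuffer{items: make([]string, size)}
}

// Push stores msg, overwriting the oldest message when full.
func (rb *RingBuffer) Push(msg string) {
	rb.mu.Lock()
	defer rb.mu.Unlock()
	rb.items[rb.next] = msg
	rb.next = (rb.next + 1) % len(rb.items)
	if rb.next == 0 {
		rb.full = true
	}
}

// Snapshot returns a copy of the stored messages, oldest first.
func (rb *RingBuffer) Snapshot() []string {
	rb.mu.Lock()
	defer rb.mu.Unlock()
	if !rb.full {
		return append([]string{}, rb.items[:rb.next]...)
	}
	out := append([]string{}, rb.items[rb.next:]...)
	return append(out, rb.items[:rb.next]...)
}
```

A consumer that attaches late can call `Snapshot` to obtain the retained messages in order, while memory use stays bounded by the chosen size.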
feat(measurex): use public suffix to determine legit redirect

There is still _more_ work to do in this space, like determining the
full list of acceptable red...

d6b267a19e39ff837767453a48c78bea1e73854a authored almost 3 years ago by Simone Basso <[email protected]>
feat(websteps): improve logging and analysis

4572b55f445860a19bffb4c90d9e2cbb5456c33b authored almost 3 years ago by Simone Basso <[email protected]>
feat: start to organize logging levels

9dbcba1cb3d9236c28f43fec44d236b6680911b2 authored almost 3 years ago by Simone Basso <[email protected]>
refactor: replace logger with logcat singleton

This commit replaces model.Logger instances pretty much everywhere with
calls to the logcat pack...

018e5af72155bf2a81231205f003e00944acffb4 authored almost 3 years ago by Simone Basso <[email protected]>
feat(websteps): implement reverse DNS matching

86d8446aea75cb4c8905e9015db1475bd4228af8 authored almost 3 years ago by Simone Basso <[email protected]>
feat: issue reverse lookups in the test helper

This query collects extra information useful to further
disambiguate possible DNSDiff cases.

6e033f60a48800274f611c2cdfb5e8eb6d3b26d5 authored almost 3 years ago by Simone Basso <[email protected]>
feat: adapt go code for trimming the cache

9e8c43256cb79d7dce8c6e7c2a6ee72426edda3d authored almost 3 years ago by Simone Basso <[email protected]>
fix(cache): print name of matching endpoint

If we cut the name we cannot see what was matching. I am also
wondering whether we should invest...

7b8aa83b46e061862babd4bc5ef1eec15c93f1f0 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): stable and more readable endpoint summary

1. we need to serialize headers as a list rather than as a map
because in golang maps have unpre...

114fad21d7872452ca3298e7013c54284cbe088a authored almost 3 years ago by Simone Basso <[email protected]>
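
The first point above can be sketched as follows. Go does not specify the iteration order of maps, so code that walks a header map directly may visit entries in a different order on each run. Flattening the map into a sorted list gives a stable result. The `HeaderPair` type and the function name are hypothetical, chosen only for this illustration.

```go
package main

import "sort"

// HeaderPair is a hypothetical [name, value] entry.
type HeaderPair [2]string

// HeadersToSortedList flattens a header map into a list sorted by
// name and then by value, so the result does not depend on Go's
// unspecified map iteration order.
func HeadersToSortedList(headers map[string][]string) []HeaderPair {
	out := []HeaderPair{}
	for name, values := range headers {
		for _, value := range values {
			out = append(out, HeaderPair{name, value})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i][0] != out[j][0] {
			return out[i][0] < out[j][0]
		}
		return out[i][1] < out[j][1]
	})
	return out
}
```

Because `http.Header` is defined as `map[string][]string`, a value of that type can be passed after a simple conversion.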
fix(measurex): use correct referer and headers

The correct referer is the previous URL rather than the location.

We should not pass extra head...

4439c047ff0917409c758790947154b7cbb97a36 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): emit useful info in archival format

41c38a57f2b25f0538257b9c09679605b8387286 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): copy URL with correct query

af8a947eaaa68698ffa2c1f03c29c011b93114c6 authored almost 3 years ago by Simone Basso <[email protected]>
feat(websteps): help endpoint mismatch debugging

611b3eedd88870446c1eaaa16e58c230e896f641 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): include more information in DNS lookup summary

With this information, hopefully we're able to say enough on any
given lookup w/o more complicat...

6086d4a7e7c354b38f14b13feab844ead2a91014 authored almost 3 years ago by Simone Basso <[email protected]>
refactor(websteps): archival data format easier to process

1. collapse probe_initial into the single-step measurement, which
would be annoying to do when g...

2e8cb15786861cb6171b5c4d3cf55484891dcc4f authored almost 3 years ago by Simone Basso <[email protected]>
fix(th): use cloudflare as the resolver

b77a9548f74ffcd73c64a723a9703e5334c79433 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): tweak archival data format for endpoint

a30a1c7152156c33b609aa6785cc55c1ab5f083c authored almost 3 years ago by Simone Basso <[email protected]>
refactor: more consistent naming for flat data format

64d95f7a58332f458fe4e9adec96d0fec8dfb457 authored almost 3 years ago by Simone Basso <[email protected]>
fix(thctl): remember to extract title

7b0c316eaac5e8edcccd5fdba8184314203fd6f8 authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): remember to extract title

2d6eb5cc9c0dab6b0698e823d05c5ca1fd658dd1 authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): limit max number of address tested by the TH

We should limit the number of IP addresses tested by
the test helper. Let's add on...

4bc84e5fab900d31848d8d4749ba895a46e48855 authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): do not pass all IP addresses to the TH

For websites with _many_ IP addresses like YouTube, this would
mean testing a very large number ...

fd82f2207caa792ccbf6b567a70c525278a0de9c authored almost 3 years ago by Simone Basso <[email protected]>
feat(measurex): print elapsed time waiting for completion

This functionality is very useful when reading logs.

211e6f9b9fb963a2da370d098bae76a19292d05b authored almost 3 years ago by Simone Basso <[email protected]>
refactor(websteps): pass single cookie jar and use it

I think I had a bunch of false positives caused by the fact that
the test helper was not actuall...

753398e03a3875f8fc14f5f7b5f90d130949cf1b authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): another round of cache improvements

1. cache algorithms were wrong in that they did not prune
duplicate entries but just added new e...

8a9e3ba186b1a3297da5bfc2cddee516c978ba07 authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): correct log message

cc11bfd636d562361851584f009e9213a4339b8c authored almost 3 years ago by Simone Basso <[email protected]>
refactor(websteps): know when we started dnsping

By knowing it, we can emit more useful log messages.

a23341333644092ffd314e8a0fbe85747c1070f5 authored almost 3 years ago by Simone Basso <[email protected]>
fix: ensure that caching works as intended

We cannot exclude the possibility that a dns measurement and an
endpoint measurement could have ...

dcd44d0c701fd040b120eb2e2b1abfb4e5616173 authored almost 3 years ago by Simone Basso <[email protected]>
fix(archival): distinguish between doh and doh3

a7aa8ee2c4840758efd3b9a197bb3c18fdbc01e0 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): remove caching endpoints restrictions

Create a summary of an endpoint that also includes all the
relevant options. So, we're now also ...

c7fa6f9d541e4aa9c4cdb0407657f219f02e460a authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): cache endpoints by summary

While caching by IP address is nice in principle, we are going to
possibly have several records ...

f8d99093af93b38ea6bb6253067ceb16f82996ae authored almost 3 years ago by Simone Basso <[email protected]>
feat: extract CNAME from TH responses

This change requires adding support for extracting the CNAME from
a DNS round trip result. We st...

9249d14f80b4c685234da8e344b36c8904ee69e9 authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): omitempty more fields

It's easier to reason on a JSON when fields with their default
value are actually missing from t...

e91efaf0ee2550b3b7f81d7c8793255c566fca60 authored almost 3 years ago by Simone Basso <[email protected]>
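
As a small illustration of the effect described above, the `omitempty` option of Go's encoding/json package drops a field from the output when it holds its zero value. The struct and field names below are hypothetical and do not reproduce the websteps data format.

```go
package main

import "encoding/json"

// EndpointSummary is a hypothetical record used only for illustration.
type EndpointSummary struct {
	Address    string `json:"address"`
	Failure    string `json:"failure,omitempty"`
	StatusCode int    `json:"status_code,omitempty"`
}

// Encode serializes the summary to JSON. Fields tagged with omitempty
// are left out when they hold their zero value.
func Encode(s EndpointSummary) (string, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(data), nil
}
```

With only `Address` set, the encoded object contains a single key, which makes it easier to spot the fields that actually carry information.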
fix(websteps): round trip TH options

We need to know their values to construct a more precise
representation of endpoints where we'll...

ef5aa4527de4edfb8c2574e8c516122c052e80ed authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): round trip HTTP title from TH

We were missing this field, which was not so good in terms of
heuristics for detecting HTTP diff...

4874a6e2fa7a35a463d7c5b2d04e9d09c5ca448f authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): probe's DNS lookups are "external"

751d8eb8a09c2345386bc9a6ab9f9eec7f3eaf6a authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): propagate the NS result

71e82c634385dafd2d3480f859da3d83d1a7056a authored almost 3 years ago by Simone Basso <[email protected]>
feat(DNSLookupMeasurement): add accessor for NS

137c62a8af5a13c7959da5ebe027fc2d783e09c4 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): correctly handle todo for dns cache

bd2a44352687ddf3f4a7f5ee0042ff5ac243a401 authored almost 3 years ago by Simone Basso <[email protected]>
fix(thd): make cache optional

With a mandatory cache it's too difficult to develop in the
common case where I need to test wit...

079a1fbb915131bef18092eecce81fd13f812dce authored almost 3 years ago by Simone Basso <[email protected]>
refactor: change endpoint cache policy

Store all endpoints by IP address and check for equality using
the current summary (we still nee...

48d8ec193822cf71888693c5b8387142133d2676 authored almost 3 years ago by Simone Basso <[email protected]>
refactor(measurex): represent location as a SimpleURL

2b3a8aa556325ccdb27c1bf811e0008914c30c26 authored almost 3 years ago by Simone Basso <[email protected]>
feat: change DNS change policy

We're now caching on a domain basis. We have improved the rules
to check whether a plan and a me...

66340b2e97b12fc749e8b0285bc6f11556459031 authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): don't be fooled by weak comparison

When searching for a compatible DNS lookup measurement, we should
first try to find a "same as" ...

40effa36808cb58d46a9b3f10c8b2499e00e774b authored almost 3 years ago by Simone Basso <[email protected]>
feat: improve the definition of compatible DNS lookups

It's actually fine to cross compare getaddrinfo and HTTPSSvc as
long as both have gathered some ...

d7fedf2f1e2b209d088060ccf29014579af89c85 authored almost 3 years ago by Simone Basso <[email protected]>
refactor: create factories for constructing fake DNS results

5880741cea96eccb03237cd651614e93c01bab07 authored almost 3 years ago by Simone Basso <[email protected]>
refactor(measurex): move related code into the same place

ad17d7a8f6987f38f458c91e1d1c732dc24fd515 authored almost 3 years ago by Simone Basso <[email protected]>
refactor(measurex): use single DNSLookupPlan per LookupType

We need to refactor the code until there is a single plan for
each measurement. This is the best...

e7bfd1fc19938b61dacd57016b1eb2fa202c77f2 authored almost 3 years ago by Simone Basso <[email protected]>
refactor(measurex): use single DNSLookupPlan per resolver

This diff changes the way in which we create DNSLookupPlans to
produce a single plan per resolve...

e025b9d62a16a310b862ded3edfd1526f48acb7d authored almost 3 years ago by Simone Basso <[email protected]>
refactor(measurex): pass flags to NewDNSLookupPlan

These flags allow one to easily specify extra queries to
perform (e.g., HTTPSSvc or NS) as part ...

e956a646f394ca54f9305c40971e242caf77f118 authored almost 3 years ago by Simone Basso <[email protected]>
feat(thd): implement mostly-cache mode

The idea here is to never expire the cache. This allows us to
test the cache performance. It doe...

4ac42a46a9f3d97b1e66a4131aa09ef039186916 authored almost 3 years ago by Simone Basso <[email protected]>
feat(websteps): add support for persistent cache

This should allow us to retry measurements at a later time w/o
having to really do network I/O.
...

b94ffd00af584a564dc547d9ae4a0e5450051969 authored almost 3 years ago by Simone Basso <[email protected]>
feat(websteps): allow randomizing the input list

01cc3cea33373f2c7711561e2fa9b8008e152f9c authored almost 3 years ago by Simone Basso <[email protected]>
feat(websteps): read input list from file

efb5d220867ab4256e7b9a08716f4b6f2319833d authored almost 3 years ago by Simone Basso <[email protected]>
fix(thd): assign a default cache dir value

e7d335ac1d3143de5f7dd72ecce096a24dc97ac1 authored almost 3 years ago by Simone Basso <[email protected]>
fix(thd): ensure we really drop privs before opening the cache

The previous diff was not correct, sadly. So we need a new diff.

3ccf1fd9db3758e3d2df6546e9463fa06debe53c authored almost 3 years ago by Simone Basso <[email protected]>
fix(thd): drop privileges before touching the cache

Otherwise the cache ends up being owned by root and it does not
work because the unprivileged pr...

6e237e0a2e99009410dd2972867a21660896641e authored almost 3 years ago by Simone Basso <[email protected]>
feat: implement {dns,endpoint} measurement caching

We import the codebase used by Go to maintain its package cache, which
has been modified to remo...

ac30cc949f6229ccde9528b46742873cb58436be authored almost 3 years ago by Simone Basso <[email protected]>
refactor(measurex): introduce AbstractMeasurer

The AbstractMeasurer is an interface that replaces a concrete
Measurer with an abstract version ...

6cbd8e6b5619dc9414f88434de83de725b551031 authored almost 3 years ago by Simone Basso <[email protected]>
feat: shrink and rationalize flat data format

Basically, only transmit fields that are different from their
default value. This change makes t...

3edba045f1d6c0a8b89f0c8e863270805e48a956 authored almost 3 years ago by Simone Basso <[email protected]>
fix(measurex): make HTTPSvc lookups optional

Too often they do not work or are censored just like normal
queries, so it seems more economical t...

0bb995af3f355901e7c2e81968fcd7e3d9abd19f authored almost 3 years ago by Simone Basso <[email protected]>
refactor: emit DNS round trips using OONI data format

This diff modifies the emitted data format so that:

1. we only emit DNS round trips

2. we emit...

cdd80f4ffa74552e51bcb6a514841b447ec9d9eb authored almost 3 years ago by Simone Basso <[email protected]>
refactor: use a transport also for system resolver

The advantage of using a (obviously fake) transport for the system
resolver is that we can colle...

97bd13d18b49fd3703cf130667f25acef1f1273e authored almost 3 years ago by Simone Basso <[email protected]>
feat: add simple dnslookup command for testing

a4d669f0060a4fbba4c10d2dcb069e15a6aadabf authored almost 3 years ago by Simone Basso <[email protected]>
feat: implement NS lookup resolution

At the measurex level, you can ask for a NS lookup by using
a flag that extends the performed qu...

eb0bf38957e79fbad198fcdc9f9c7b36f61a8e2c authored almost 3 years ago by Simone Basso <[email protected]>
fix(websteps): implement stable redirect chains

If we're measuring http://torproject.org with the default config we'll
end up using just two of ...

8755fd05ed964ea45bca259dac8daa1ebbad10b0 authored almost 3 years ago by Simone Basso <[email protected]>