Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ooni/2022-04-websteps-illustrated

websteps: winter 2022 edition
https://github.com/ooni/2022-04-websteps-illustrated

fix: ignore .DS_Store files
9c331ec8f00b470e072611a6e0798e035639f020 authored almost 3 years ago by Simone Basso

doc(websteps): document more TODOs
ab20f1d628f5a097f079184b540b453f71018cde authored almost 3 years ago by Simone Basso

fix(measurex.go): download 1<<19 to detect throttling
574229fd82e08e8aef359fdcc01e6a1b2a5ea701 authored almost 3 years ago by Simone Basso

fix: update go.sum
c2c9a789b33616118bf6235b1c05994ee999e989 authored almost 3 years ago by Simone Basso

fix(websteps): second TH round to honour Alt-Svc
eea8e8e6fbb02029241f590ff16da523b00d947f authored almost 3 years ago by Simone Basso

fix(netxlite): short-circuit IPAddr for LookupHTTPS
2b51d144bf642f10237102bdc79a0defc30c1579 authored almost 3 years ago by Simone Basso

feat(cmd/websteps): -b option to specify the backend
342ee801308af151e7ef7c7963c7bd4f57c1981a authored almost 3 years ago by Simone Basso

chore: ensure we use ASN databases to analyze IPs
8e38daca817d82ce189c68a4cfbcafb7511ff840 authored almost 3 years ago by Simone Basso

fix(websteps): record both probe and TH data used to compare
5ad9ec91be57dea091fc143da0627a50d073a5e1 authored almost 3 years ago by Simone Basso

fix(websteps): ensure we save TH's HTTP data
6e92f4b66ac741ed8f1a9d5e5fe0ac870d2d6cfe authored almost 3 years ago by Simone Basso

fix(websteps): explain should check unexpected first
46e4036e0b144f2c03fe5fb541dc2953b2a98e5c authored almost 3 years ago by Simone Basso

fix: ensure we line-terminate the JSONs
cf1a46a9d379edb5eb68b96778121d9db02317b5 authored almost 3 years ago by Simone Basso

fix(measurex): flag and ignore cases where IPv6 is broken
7ff23d0420ba1f3d43d72bda828a50c4fc6ee2c7 authored almost 3 years ago by Simone Basso

fix(th): ensure we test all the endpoints
42833852d3d36d80e8c5e4b4adf016f25fd1cbb0 authored almost 3 years ago by Simone Basso

feat(websteps): implement measurement analysis
73e9319e4338966d25c509f508e95e076b4daa95 authored almost 3 years ago by Simone Basso

feat: add support for extracting the title
0c1f93cc6db46d3193f105bfa3f760cd536b2e37 authored almost 3 years ago by Simone Basso

fix(archival): type changes I forgot to commit
242f0bd471edf4ca0496a35683ba03cf8058407a authored almost 3 years ago by Simone Basso

feat(measurex): add code to support websteps analysis
ff6fc7568bfc1b393559cc58180d7265eddbb5ba authored almost 3 years ago by Simone Basso

refactor(websteps): just move around code
dbd7e9fe6de20a70d1528e1527e5ed844b0de333 authored almost 3 years ago by Simone Basso

feat(websteps): wire-in following redirects
b11933bbf17d98d049891ec6c37041024eb85568 authored almost 3 years ago by Simone Basso

feat(websteps): complete the TH measurement
f8802c915f95c341bc97ad51d793904e2613493e authored almost 3 years ago by Simone Basso

refactor(measurex): allow creating URLAddressList w/o URLMeasurement
c504882d7ef9c40c31bdf96944c4a1341f46cd33 authored almost 3 years ago by Simone Basso

cleanup(URLAddress): remove ~unused URL field
646262ff5e82f43fbd33a5dc711a54af4a1c0f65 authored almost 3 years ago by Simone Basso

fix(archival): export HTTP body length

Required to process TH measurement from Python.
ce4a706e07383ac23af6583ee053480181f8aac5 authored almost 3 years ago by Simone Basso

fix(websteps): better external import of probe DNS lookups

1. ensure that the resolver looks like external://probe
2. ensure that we deduplicate IP addresses
53abe73b62ab2db059408c7b522efadfa0b31af5 authored almost 3 years ago by Simone Basso

fix(thctl): repair build after previous commit
fcd90171116e11b4c5612d93e9a86136888e3e3e authored almost 3 years ago by Simone Basso

feat(websteps): import TH measurement on the probe side
884dd7063c3558ca20281f0e0c933b10e128734c authored almost 3 years ago by Simone Basso

refactor(measurex): better design for endpoint equality

Now we have a clear definition of endpoint measurement equality.

Now we have a slightly better...
c13493e4fd4c887bf5f6ec6675ddac8b177589ad authored almost 3 years ago by Simone Basso

feat(websteps): TH interaction now looks good
fd651f64b69c9fe40804cfaa1a30cf48a0b773fb authored almost 3 years ago by Simone Basso

feat(measurex): support DNS-over-HTTP{,3}
52b8e340b276d4d63c6249f0e77e193d56faebfa authored almost 3 years ago by Simone Basso

feat(measurex): optionally exclude bogons from endpoint planning
104bc2fe7490285215bdcc1b6df66c6f5e936065 authored almost 3 years ago by Simone Basso

feat(measurex): add exports required by websteps' TH
356bcf5a1b3fa21c1b0d99467fbe4a300b08a874 authored almost 3 years ago by Simone Basso

feat(archival): add support for serializing headers list
818c379c85db12e592179ca158a892b23046caec authored almost 3 years ago by Simone Basso

feat(URLMeasurement): allow adding fake DNS entries
c5975474631d2fb4a843e7edd0f0a45c525a31b4 authored almost 3 years ago by Simone Basso

feat(measurex): allow to flatten options

Useful in websteps TH.
f8ddf384fe6ec1e7d86203ae6c94d1495a85b78d authored almost 3 years ago by Simone Basso

feat(measurex): add code to serialize cookies
2d64368333b9259455b5c68b6d207137b2fd8008 authored almost 3 years ago by Simone Basso

chore: import unmodified httpx.go
732f4297f95ab75a7056c40b6d163c268e8a695e authored almost 3 years ago by Simone Basso

chore: sync crawler's main with websteps' main
3d7a60006b2b5a24568fef30f962b50ecb6b514d authored almost 3 years ago by Simone Basso

feat: start sketching out websteps implementation
b9a4ac070ef69319872d441395120e8339156cd5 authored almost 3 years ago by Simone Basso

refactor(measurex): obtain parallelism from options
59442dfd1c34b9f3ddb87870ff537817cb0c1c04 authored almost 3 years ago by Simone Basso

feat(cmd/crawler): specify host header and sni from cmdline
9ee9136cc16f7f8f498c8f13568921c15b02f7da authored almost 3 years ago by Simone Basso

feat(measurex): count the depth inside the deque

While there, ensure we have a default TLS handshaker.
a9576b8ec6db66a118f72d290c2dd6ce6baa120b authored almost 3 years ago by Simone Basso

refactor(measurex): common struct with options with good defaults
0e33f35d020f8cd8adc397c30d7c0a883ae6c4e2 authored almost 3 years ago by Simone Basso

feat: enforce maximum number of addresses per family
a1377df40dc402a06f296419f94c208dd84a1694 authored almost 3 years ago by Simone Basso

chore: add TODOs for myself
1e43760a071c520df572a933b0aa90dd13d67602 authored almost 3 years ago by Simone Basso

chore: show how using options changes websteps personality
3aa9076c5802b0f12b3d801a89d850faf5fdaa07 authored almost 3 years ago by Simone Basso

fix(archival): aggregate network events to reduce size
ca34e91e889999867a33bf0c0b5a457dec30e82b authored almost 3 years ago by Simone Basso

feat(archival): collect when events started
ac3aec5686dff157f63d13801b4d863d270892aa authored almost 3 years ago by Simone Basso

fix(measurex): ensure we fully support IDNA
815beb46878d36cf7bfc76b6d44af786f7b4bfa0 authored almost 3 years ago by Simone Basso

fix(archival): make sure we include the ALPN
6bf1b273b12bc3f75772d13c96e7b0516edb7530 authored almost 3 years ago by Simone Basso

fix(measurex): try harder to avoid duplicating work
b572fa3538e22b8d97f27b57626b2411c34b0fdc authored almost 3 years ago by Simone Basso

fix(measurex): check both http and https only initially
ae18a3751bd85b79574f36c27c5a0cfa5d9e555e authored almost 3 years ago by Simone Basso

feat: implement and use parallel DNS resolver
55231d73cd822a851f532dea1b8089694d58100e authored almost 3 years ago by Simone Basso

fix(measurex): we always want to query for HTTPS
acbc0d744c75e9de0f3c178661ec3192d615fb6a authored almost 3 years ago by Simone Basso

fix(measurex): canonicalize URLs

This avoids taking unnecessary steps when non-canonical versions
of the URL are returned by vari...
aff403501734d3709f65d7233e1d0bbdb4667090 authored almost 3 years ago by Simone Basso

fix(measurex): make debugging easier
cfb4c8f1f24fbe72cc9f2c1c3992d2a8c8120821 authored almost 3 years ago by Simone Basso

feat(crawler): dump data using the archival data format
d8b57d80f2c16edd548562b495ef190c08a602d5 authored almost 3 years ago by Simone Basso

refactor(archival): allow to convert individual events
6d38fa66cc5274960b72e13636da111eccfe95c7 authored almost 3 years ago by Simone Basso

feat: initial implementation of crawler
2fa69e00eac5d010324559b4e48daf6d5b2e5a90 authored almost 3 years ago by Simone Basso

fix(measurex): return HTTP{,S,3} plans

This happens regardless of the input URL scheme.
644b18761d13b5bc81342e50bc80ace800408ad4 authored almost 3 years ago by Simone Basso

fix(archival): collect TCPConnect separately

This simplifies managing a measurement result.
31c1a01cff7b1f54ea82b0b320838ce539bf4bf1 authored almost 3 years ago by Simone Basso

refactor(measurex): continue to rationalize the data format
d7f223d5751e196db240b34780a712802f0ae6f0 authored almost 3 years ago by Simone Basso

cleanup(measurex): remove the foreign resolver concept

In the current websteps, it's used to fit in the TH but that does
not actually seem to be a clea...
23dbed0b7efb6bfb7c05649da8bd5cb6798ad516 authored almost 3 years ago by Simone Basso

fix(archival): make package documentation accurate
57479511b57744e0af0c4e2eb76d6168703aeaf2 authored almost 3 years ago by Simone Basso

fix(measurex): make comment better
91507bd4d7225765f8b9cb579260540c43928c30 authored almost 3 years ago by Simone Basso

feat(measurex): link with netxlite
ab5852ea9f86b23582f72f111ddc629b751d4498 authored almost 3 years ago by Simone Basso

feat: include whole netxlite

Using https://github.com/ooni/probe-cli/commit/024eb42334721c06a37da7930c08836bab6ef2e8

The ide...
8cb5b0d3419962a3aab58adb3fa04873198000b8 authored almost 3 years ago by Simone Basso

refactor(measurex): more understandable endpoint model

Like in the previous commit, we're now using a more understandable
model for representing an end...
ab5914592636bc451503cfd02771997a31692aff authored almost 3 years ago by Simone Basso

refactor(archival): all handshakes together
efa280506ab00e93de96a25aae3dbce79cc61415 authored almost 3 years ago by Simone Basso

refactor(measurex): more understandable model

While in general we may have many lookups, it's fair to say that
in this case we just have a sin...
a7c003e2815d33182d84353dea8675c8d8bc7ada authored almost 3 years ago by Simone Basso

refactor(archival): collect all DNS lookups together
090d491e4e13bf888fe9fa99fd80e4cf9a4ab77f authored almost 3 years ago by Simone Basso

feat(measurex): compute redirects
c7b37b23f47913e1e41d58af26a90972a5d220c8 authored almost 3 years ago by Simone Basso

fix(measurex): raise the maximum body size
84746b704a436d72c887de03cc20967e233c9fb4 authored almost 3 years ago by Simone Basso

fix(measurex): don't stop collecting network events

We removed this functionality in the previous commit, so now
the code does not compile anymore :^).
6512bfaeccbdef50781a2e5767408913b8728784 authored almost 3 years ago by Simone Basso

fix(archival): always collect network events

Preventing them from being saved is perhaps premature optimization.
cdfabe95fa63d1dd9b3c42a1b8b27f9152e58975 authored almost 3 years ago by Simone Basso

feat(measurex): support for measuring URLs
a9c7b6e9cc69f5a29ad15c2642698f9f1464d977 authored almost 3 years ago by Simone Basso

refactor(measurex): cosmetic changes
298dd977d48901e20a560d508d39a6bfc9842ffc authored almost 3 years ago by Simone Basso

refactor(measurex): redesign around dns and endpoint

Hide most operations and only expose code to deal with either
endpoint measurements or DNS measu...
5b7a9c28348b02b40d2d31000cade5e4a5f6d1ea authored almost 3 years ago by Simone Basso

chore: run go mod tidy
a3a45211e11b54fa2a5bc24d59ae5ee804cb5e46 authored almost 3 years ago by Simone Basso

fix(measurex): clarify documentation
3e09785bd7cc5d88b75002c08bfa27508c6dbb28 authored almost 3 years ago by Simone Basso

feat(measurex): copy parallel code from original impl

While there, try to make the parallel lookup faster.
9a649ce648724a53fc9ce85b1bc823238a4e00fc authored almost 3 years ago by Simone Basso

fix(measurex): always wrap TCP connections
86eef6e17ed6820260a2c57a9ccea15d4b81f074 authored almost 3 years ago by Simone Basso

feat(measurex): more abstract ID generator
c03877e39fae01a448d8f5147d49280c1e6b6b1e authored almost 3 years ago by Simone Basso

feat(measurex): convert HTTPEndpoint to Endpoint
9f60a0da04f533ebb223ccb8143613a823f5fbce authored almost 3 years ago by Simone Basso

feat(measurex): incorporate cookies into the HTTPEndpoint concept
4c606f87797db5b5e558d601f377c3101c7a5863 authored almost 3 years ago by Simone Basso

feat: initial rewrite of measurex

The objective of this initial rewrite is to have a simpler
measurex. We have used lots of code f...
3b84f62b453643ecfb1e79e5aeb9afd756c4fb1a authored almost 3 years ago by Simone Basso

feat: add unmodified internal/engine/httpheader
f53df7dee364c5f5237539f747feb2cf536e4130 authored almost 3 years ago by Simone Basso

feat: add unmodified i/runtimex/runtimex.go
58007dac6e50e2b166119b7d2bc42f579923789e authored almost 3 years ago by Simone Basso

refactor(netxlite): tlsutil.go => tlsutils.go
5199635c9d57dfaa1f9bc9e79c74b5c275c426d2 authored almost 3 years ago by Simone Basso

fix(netxlite): we need more tlsutils
9de68616f2ed2ef45d05d2b2923e8758038c6757 authored almost 3 years ago by Simone Basso

chore(measurex): add unmodified logger.go file
3fdaceb6b0276676f15703bc239b3c58434d1e8d authored almost 3 years ago by Simone Basso

chore: add unmodified i/n/errno_linux.go
ffc326bd8ca52ee6e6734db492caafc166f59f0e authored almost 3 years ago by Simone Basso

feat(measurex): start sketching out rewrite

This is not meant to be a super destructive rewrite, rather I'd like
to more smoothly integrate ...
50b79c5025361bb922f041614f070fdf70be0f43 authored almost 3 years ago by Simone Basso

feat(archival): allow stop/resume collecting network events

This removes concerns regarding whether we'll ever collect a huge
trace when we're measuring thr...
9d813e2d92a3d0fd55cb3725202318450ce9e86f authored almost 3 years ago by Simone Basso

chore(model/netx.go): define TLSConn

This diff WILL need to be ported to probe-cli.
8e866dbab88ce732ee8aa7e748b4a194caec7708 authored almost 3 years ago by Simone Basso

chore: add unmodified internal/model/logger.go
9710a540c39af968153f2d8b0ec3317b278e700f authored almost 3 years ago by Simone Basso

chore: run go mod tidy
4c5fa6c749632beac49a051906c0432c1ed7d77c authored almost 3 years ago by Simone Basso

feat(archival): only expose wrap operations

This diff improves the abstraction of the internal/archival package
to only expose wrappers for ...
08c8def9e39286fb0e0c0833b8dde2eece3a3218 authored almost 3 years ago by Simone Basso

chore: add LICENSE and README.md
d400868f7a062f79db159c00b85b1645ba39689a authored almost 3 years ago by Simone Basso

chore: add unmodified internal/atomicx/atomicx.go
b267ba456249a83901adcba7fa90ae4856d4c77a authored almost 3 years ago by Simone Basso