Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ooni/pipeline

OONI data processing pipeline
https://github.com/ooni/pipeline

Replace msmt with msm. Leave the plural as msmts

* Add comments to tables

ed351f8e4d3a20fe495bddf1eadfeaf9c7264e14 authored almost 6 years ago
Add initial schema for some of the tables needed by OONI Explorer

652282ad47071c9f7f1aec2795de86b594173a8b authored almost 6 years ago
ooni-uuid.md: fix obvious typo

4a3a014256e4c2b46fc693d732da4e16aba3f0dd authored almost 6 years ago
Merge pull request #143 from ooni/drop-legacy

Delete all legacy pipeline code

0eab0a52c5dfce17a1fd4203c5bdf1749e4470ef authored almost 6 years ago
Readme.md: fix obvious typo

b6f43c0eaa98f6774535e7cfe06cdffcd7560eef authored almost 6 years ago
Revert "Drop daily_workflow.py"

This reverts commit 7ac04de98103b8e3eca6bc5c456d4e66b2da3425.

83bafc99f7e3b19d6cb72e79cbf191e5566969e7 authored almost 6 years ago
Drop daily_workflow.py

7ac04de98103b8e3eca6bc5c456d4e66b2da3425 authored almost 6 years ago
Drop private dir

36f4c46be4e93a3f3e71edd88a10fe903f918d15 authored almost 6 years ago
WIP: document report reprocessing, see #133, #134, #135

ee8258e654224d3656c8c5af8dbe57ce624a323f authored almost 6 years ago
Delete all legacy pipeline code

9a62ef13906032551f475af7d0fd70d2c32ab2e0 authored almost 6 years ago
Merge pull request #142 from gabelula/documentation

A few small changes for pipeline documentation.

1b2688d75a7abc09e446a7d965dd8011f5b5564d authored almost 6 years ago
A few small changes for pipeline documentation.

15f935d0fef4df3e8c234aacaaaa506d686bc937 authored almost 6 years ago
Merge pull request #128

31dd687563d7125460e863f39f054c74521a22b9 authored almost 6 years ago
Add docs/delete-report note & debugging scripts used during data cleanup

6108671bb92fd67164886433a32bc49181f005a5 authored almost 6 years ago
Ignore empty reports on reingestion, closes #141

Empty reports may be result of yaml to json conversion: the yaml report
header was there, but no...

220417f8aa57360fcb94eae2e1d6caf7d48a36b2 authored almost 6 years ago
Fix unnecessary Cartesian product during reingestion

f8e15241dffe707cce59a2cc2361f972c018ac59 authored almost 6 years ago
Fix ZeroDivisionError

3e9506ca3425a648e09552cbe5291558258c56cd authored almost 6 years ago
Fix chmod typo

ee816b71c550fca6519dd1c75bc9b42901bb78b0 authored almost 6 years ago
Add `--missing` to autoclaving to handle modified canned files

5ab3e90c28161c81e30a25f9a3b6ded2ce0cbca3 authored almost 6 years ago
Move tempfile wrappers to oonipl.tmp

39c9c1970ff6cd26ce33e03fe1e6a81e4cac8d3a authored almost 6 years ago
Add `delete_canned_report.py` to re-compress canned and reports-tgz

69cd5575f54516b5a9a41567436504c7c42ad8f4 authored almost 6 years ago
Move two more canning subroutines to `oonipl`

04cb81adc1d6ed0277726cc9dacc9887b671c5d4 authored almost 6 years ago
Move `filename` CLI validator to oonipl.cli

1e114bd7101c71e35564db1a3cebc0648db7d507 authored almost 6 years ago
Change uniq(input) to uniq(sha256(input)), closes #139

abef3b80e1b91c754781152c59ff691855b454f8 authored almost 6 years ago
Merge pull request #130 from ooni/add-license

Add standard OONI license file. It seems simple and good to me.

8de708cf3512c79cc2ab773e57bc32dfa6488d40 authored about 6 years ago
Add standard OONI license file

eafa16743c7eb624f9797e2bd263b8dfeaf1d671 authored about 6 years ago
Table/columns/attributes cleanup

a62726a7b861d153c61c2245d47247f21c96f4d3 authored about 6 years ago
Add ShouldCreateEmptyBucket image

ed16eed6cd842251c78f82da57333ce509e83ddd authored about 6 years ago
Enrich empty bucket text

dd0da9c4f92f68e556524f9f2868a4fbcc6bf429 authored about 6 years ago
Merge pull request #129 from ooni/doc-empty-bucket

Document empty bucket creation

19f9d28e5f0a7bd206b1157a4b72d8561d6f3baa authored about 6 years ago
Document empty bucket creation

4b2a40e1258e0974c6eeaddc42498485e57bcddb authored about 6 years ago
Merge pull request #127 from ooni/dag-failures

Add docs on investigating DAG failures

e9db58c071dc0515e842bc9bf9488f699e8fb1ea authored about 6 years ago
Add clear task screenshot

bde8114a4939a7ddf9add91173adc85553b2197c authored about 6 years ago
s/DAG/Task/

5b1c3b44b4622d52c5642887c7cd636353870392 authored about 6 years ago
Add docs on investigating DAG failures

51628c5cbb1073fcbefadd5dca9fb3c85668254e authored about 6 years ago
Add estimates of SimhashCache hit-rate and explanation for not-so-great speedup

3ba73981a38f92df7bd69fd4babd5c9aafa3e5d3 authored about 6 years ago
Log SimhashCache hit-rate both in items and in bytes, these values differ a lot

5fb852ef01a5200bea456bdd9213ed220636007c authored about 6 years ago
Preloading simhash cache from previous bucket to speed up ingestion

101d5ee8ea47a238bf1c587a84f31005309bcf86 authored about 6 years ago
Do not parse vanilla_tor test_keys if test_name does not match

e7b7ede8af96ebf418dfb5aa6dd51d96b18bd10a authored about 6 years ago
DNS TTL=0 is "valid" TTL value. At least it's not exceptional

c7acf01562b0c8bf4f82896685d3625040e91405 authored about 6 years ago
Use simhash cache on data reprocessing

It gives ~10x speedup of reprocessing on my machine

c8e6d3a4a5ca7b3f980da51bb659d93b9928cda9 authored about 6 years ago
Merge pull request #126 from badblob

Ingest `badblob` to track failures

fa7de28cd1baf30c84bca5980268561b4f5ba015 authored about 6 years ago
Ingest `badblob` to track failures

Non-ingested `badblob` blocked ingestion of data that is referenced from
Uganda report[U16] and,...

62f30b74e4b080385a452bf843530aa0ddc751e8 authored about 6 years ago
Fix typo

b1e17fbae563f1da6f961ac38331ed79c5366221 authored over 6 years ago
Merge pull request #124 from simhash

centrifugation: calculate simhash(body) and simhash(text(body))

7a549ee83dac396a107f15f3762f42005f84f6f2 authored over 6 years ago
Drop stale build script

0fec08f655766001f574ebab02bad54a921fe9c2 authored over 6 years ago
centrifugation: calculate simhash(body) and simhash(text(body))

It'll enable heuristics that rely on page similarity to mine more
blockpages out of existing OON...

8e14b20ec368572c0bb831fb958bcc70eb9108a6 authored over 6 years ago
Move some bits of code to `oonipl` module

Also, re-run mubench/hash64_shingles to check for possible regressions.

d8dde72d2622fc25488d82e27d5ca560822bc2d9 authored over 6 years ago
shovel: add rsync & ssh

11a5d747f796ee518841a23e4bfc121290c682e1 authored over 6 years ago
Merge pull request #120 refactor/ooni-uuid.md

f0f05668d3544143a079e79884fb361d7d0b4ba3 authored over 6 years ago
Add note on the fact that the offset may change

eb2faa7dfa0621c11120a0bb2cc62171c294727f authored over 6 years ago
Move "keyed hash" estimates to separate section

61ca75d248a279e7ac4f1ff3defb84bf445e5d18 authored over 6 years ago
Make a sentence more clear

e64ded763f6ab4966465d8e105f3750e9bc82ef4 authored over 6 years ago
Reorganise some of the text in ooni-uuid

Make some minor edits and formatting to the backfilling section

1545ccef92c8f9962e66b7fa026b5f10bfbc61da authored over 6 years ago
Add the reason to have "namespaces" -- to distinguish backfilled measurements from those stamped during _transition_

6986358f2ee607c9243cf8b7fc806bfe545235be authored over 6 years ago
Fix typos

ac56b4e3c3b9288eccaa488720f376383caa3823 authored over 6 years ago
OOID: on 24bit and 20bit counters

bc2872b5f555db5085508b4f5140d1fc0d8ad415 authored over 6 years ago
ooid: estimate max-report-size in bits

500d7aed91ffb1dd011fc11e4353ae100c6030c9 authored over 6 years ago
ooid: fix typos

91a113b6f63faae4a062e26c7a7543fb3ec31f53 authored over 6 years ago
ooid: add total number of measurements

c82fda2ad3f7024f0e062ba05418a07928e6ef37 authored over 6 years ago
doc: on OOID, OONI measurement UUID

5a57a1e07afd5d73c30f772171eee4c173df9f41 authored over 6 years ago
cleanup_uploaded: to drop uploaded `canned` and `reports-tgz` files

a083c7d878eea1728db8685edd94850c122e81c5 authored over 6 years ago
cleanup_reports_raw: update with new dependencies (local reports-tgz & canned)

0b6cb2e7e4c0360374ecf21eff954e06226640f7 authored over 6 years ago
shovel: convert tar `--add-file` to `--files-from` because of lots of tiny NDT measurements

cd75d38165dcc9e08d371f2011590a6870238620 authored almost 7 years ago
Add `tar_reports_raw.py` script to create reports-tgz bucket

6c3f7cefbb2713a816b32c29536c35a73c40b709 authored about 7 years ago
Hotfix: moar test_name enum values

179cdb2473b8b4b39327ebc622e61af40cc70592 authored about 7 years ago
Merge pull request #98 from feature/tor-metrics

a950764394637c2066e31d15839837cfd447c303 authored about 7 years ago
Make close() one-way ticket

26dc901ea497349fabc624d76931884311d93ee6 authored about 7 years ago
centrifugation: fix badrow handling

ca143c08d5b6cf61d9c9b384c1d5942f1cb14f90 authored about 7 years ago
Add missing file to Dockerfile

a924ddfacc34b90ca24d8282fb9ebe2a08ccc734 authored about 7 years ago
More good comments

90445631e86fac3c1dd053eb616a576c64e0a164 authored about 7 years ago
centrifugation: split steps `ingest`, `reingest` and `reprocess`

a86e4fe42dec869b02618ba2c3e282812052cab9 authored about 7 years ago
Add more good comments

d31a202c79d8adc7f36e562a476d8bbc011a1c0c authored about 7 years ago
Parse `tor_log` to ease further data mining

efb274c0d6119acdb3ae1d588279684f1ab59437 authored about 7 years ago
Fixup number of columns

727d6e1900a8ddeb9c9153790e269c0a887765ac authored about 7 years ago
Check for the vanilla_tor test_name

b6c90e85cb1c27f02fd88c5480a224385b1a488e authored about 7 years ago
Integrate feedback from @darkk in https://github.com/TheTorProject/ooni-pipeline/pull/97

5e8a091cdb65e0dc2aeb5265002c86a28672851e authored about 7 years ago
Add more information about schema changes in the Readme

a133595933a9e636f7bd1e7f7cecb85d689ac4db authored about 7 years ago
error is actually text

105d17c6045799fe910276a240a8dde2598f304a authored about 7 years ago
Add more docs on how to write custom feeders

b8125a830a3bdd29993a8faf658d89616fe7574c authored about 7 years ago
Add Feeder for vanilla_tor tests

902e6751340dd515096214f74c739751c9ddca55 authored about 7 years ago
Add vanilla-tor sql scripts

ac8e349b86246c6c50c4464bd971e5e43b4c277c authored about 7 years ago
Update README regarding `backfill` surprises

798796775849655f0230ba4c51f6f8d2f3278b9e authored about 7 years ago
shovel:0.0.19 - S3 upload resumption is fixed, fixing uploading to an empty bucket

84e738e71ac65cdc62cb816c91dc0e6d776b6448 authored about 7 years ago
shovel:0.0.18 - update `awscli`

705e8b0d68894aa6a13e2ea74a1833ccd4d2ce9a authored about 7 years ago
shovel:0.0.17 - re-implement part of `aws s3 sync` to speed up uncompressed file upload

eb028d8ba753ec99cd01d9cfe77bb9b5c74d2736 authored about 7 years ago
shovel:0.0.16 - avoid scanning WHOLE S3 bucket for every mini-batch

fd86c9cf45918aef69b8bd49ca436260a09b1827 authored about 7 years ago
shovel:0.0.15 - `tar -I lz4 | aws sync`

ecd5ab315510dc7c23a9721faf029b90baddee17 authored about 7 years ago
aws_s3_lz4cat_sync.py to upload decompressed jsonl.tar.lz4 to AWS S3

d446edae9462aea6031c109c19475504f55c1254 authored about 7 years ago
Add aws_s3_cp.py command

9ede0925d3b4cfb02b88d75b18fa5625def328bb authored over 7 years ago
Merge pull request #97 from TheTorProject/docs/3.0.0

bc938e62be3132d60362fc21d20e884d64130c53 authored over 7 years ago
Update .gitignores

3d1fd7fc379b34e07b7b1d2f330b8be7ef327806 authored over 7 years ago
Add link to more docs

d74f4ccca7b5c405061b8580606a0f630de46fe2 authored over 7 years ago
Update Readme.md with new pipeline information

33f976735e833d9aeb79fb698869d6b5dd2d6017 authored over 7 years ago
Rename old readme to readme-2.0.0

8907ac8ff37e05a55feebb2ff705fab3f882f4cb authored over 7 years ago
README: reference pipeline-16.10 (aka pipeline-v3)

e3e22ddc40a33ef49e992cd92a242d4012ba5e86 authored over 7 years ago
Merge pull request #62 from TheTorProject/pipeline-16.10

Declaring pipeline-16.10 stable. Just a day before 17.10 :-)

eed2c1a0d93f43677b6ae5b8cab0fb91c99eca57 authored over 7 years ago
metadb: set names for anomaly columns (without indexes at the moment)

421f1471c4c20a4c96f0b4f2e1c4c58387629add authored over 7 years ago
shovel:0.0.14 - canned/index.json.gz was not WRITTEN as a gzip file

cace129189216d66510f7fa921d6d8d80196ffc2 authored over 7 years ago
shovel:0.0.13 - canning/index is gzipped now, duplicate reports are tracked in metadb

dcfc6c109c1f686454abdc655d04e108aa3368d1 authored over 7 years ago