Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/openaddresses/dedupe

Code for deduplicating OpenAddresses records
https://github.com/openaddresses/dedupe

Indexing hashes only at the end

1f54d04eff6f03b0ef1037c3550d79a74ed0920f authored over 7 years ago by Michal Migurski
Improved output for debugging

f49f51ee77d55370476b129d4d849bb274ee9481 authored over 7 years ago by Michal Migurski
Rewrote expand-reduce.py to use sqlite3 instead of networkx

e2bd5ef9b4d3c55452966d3dd9f813f9ba9ee9f4 authored over 7 years ago by Michal Migurski
Only adding edges for different hashes

e959ac896ad8f790e0c2e1f640f495af194fd288 authored over 7 years ago by Michal Migurski
Documented new address-map.py usage

bed7470612ced7f80dd6fbcec2dc82895e6acd5b authored over 7 years ago by Michal Migurski
Rewrote address-map.py to work under parallel

c063893a592219436ef6efdb7e54ba226e153063 authored over 7 years ago by Michal Migurski
Moved blank check earlier in address-areas.py

078c0e3a086669547f58b4ceb7d4f4c1705587e3 authored over 7 years ago by Michal Migurski
Skipping blank addresses and omitting unneeded fields in address-areas.py

83c4a62e12624f305e79f3fe3765875857b96840 authored over 7 years ago by Michal Migurski
Capturing graph node removal exceptions in expand-reduce.py

7c39a9223f28b874c3621f95686c4585d42edf53 authored over 7 years ago by Michal Migurski
Writing directly to output instead of keeping a giant list of features in expand-reduce.py

c3f388674c9285376ad1bc8bbbe224b45224add8 authored over 7 years ago by Michal Migurski
Switched to removing graph nodes in expand-reduce.py to save memory usage

45911be1b287caeafd982b0e92f3cdcc0057736c authored over 7 years ago by Michal Migurski
Added notes for advanced parallel usage

dcf5cc437a9de3573d8be9f258398166a9d7155b authored over 7 years ago by Michal Migurski
Modified expand-reduce.py to output CSV instead of GeoJSON

9fb2a57ebade9c0d82bdb4bb6ee76a40b0508649 authored over 7 years ago by Michal Migurski
Started outputting merged points instead of multipoints

aa0c376a2e04791e84fa0332b221291d212762c4 authored over 7 years ago by Michal Migurski
Fixed a source of errors in expand-reduce.py

f2891cde6e0d56833bab71c4761eb0da6e5d1923 authored over 7 years ago by Michal Migurski
Switched to explicit PROJ.4 definition of web mercator projection

1acfe0669af056bc849223827184869c90c5f69f authored over 7 years ago by Michal Migurski
Added split-areas.py script to better parallelize large areas

cdae6e40ffee9ed1147f4baffa45560e0e8da3f8 authored over 7 years ago by Michal Migurski
Added required output filename to address-areas.py

b41baebe587def9e5c3298c88d140072fc99facb authored over 7 years ago by Michal Migurski
Updated expand-reduce.py to support GNU parallel use

0011960db62bc177125882fdb94a68c18f354545 authored over 7 years ago by Michal Migurski
Updated address-map.py to no longer need sorted input

5d14b8ddcd08c47b01fee7528c8acd184f220beb authored over 7 years ago by Michal Migurski
Added docstring and argument to address-areas.py

0652dfc7eed385185fc651c45670edaf64509dd5 authored over 7 years ago by Michal Migurski
Added base U.S. census shapefiles

a6d6ae4706e4c79de07a5df76d8895673389edcf authored over 7 years ago by Michal Migurski
Removed PostGIS dependencies

e2f9cb28d008aac8be87d8a000eb02ef99c2fa9e authored over 7 years ago by Michal Migurski
Added preparation for boxed areas

1a28e326d34283bbea8b965953063c542fbd5849 authored over 7 years ago by Michal Migurski
Connected address streams back to expand-reduce.py script

961129be8ccdf3968897f056c76c2199473d47c6 authored over 7 years ago by Michal Migurski
Switched from plain bounding box input to possible WKT polygons

8a97d05124e1680716fa6d466fb941603a24e291 authored over 7 years ago by Michal Migurski
Added requirements

ac05abb6810c83fc76794d8710d9658952dc2d79 authored over 7 years ago by Michal Migurski
Updated sample areas

11845f8e6c355af57d177b56a3319d5c5c626182 authored over 7 years ago by Michal Migurski
Added Butte, MT example description

d90ee9c75ae187416b316e4c47e5ccf6f3b8ef61 authored over 7 years ago by Michal Migurski
Added simple README

70e482ce9a137b21fb0322ec2c1b22d55987341c authored over 7 years ago by Michal Migurski
Split expansion script to use simple commandline mapreduce approach

33744ef83a52cfa2aaa9c07a8a1becd305228e7b authored over 7 years ago by Michal Migurski
Updated expansion script to use naive number/street string matching

0fa27830cf6f24cc1f1f94877b8b12d95b1f113b authored over 7 years ago by Michal Migurski
Created script to expand OA data and mark duplicates with libpostal

9ba5377de48f8bd959d5bacb9bc8a0fd333d57f8 authored over 7 years ago by Michal Migurski
Created script to import OA data from expanded zip files

20deb311646cd22102a8177a7bd00d4e7c00bf7e authored over 7 years ago by Michal Migurski