Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
https://github.com/openvenues/libpostal

sorted place names

865f99a0c1b426174c579e0d0545d3e631ed22ae authored about 7 years ago by jeffrey04 <[email protected]>
new place names

ceae1257af76bd29c946d24d9cd4c83994b86a89 authored about 7 years ago by jeffrey04 <[email protected]>
some new company types in malay

f3b76c1f284dd150356444924c2eecf527d73c8b authored about 7 years ago by jeffrey04 <[email protected]>
rearrange according to alphabetical order

c9d22d228f6eeefbcf2b3d6fdc47e3ab3ad7602e authored about 7 years ago by jeffrey04 <[email protected]>
rearrange into alphabetical order as in other languages

5e9d8f0a1ecf7794d0bbcec75932cac07f2378e7 authored about 7 years ago by jeffrey04 <[email protected]>
new building types

6d54cbcc825bb7519c65e9a346702e8e0d7ed49d authored about 7 years ago by jeffrey04 <[email protected]>
Merge pull request #1 from openvenues/master

Synching from upstream

867c3b825c445e8aaf88d8ef83b71e0b530988a3 authored about 7 years ago by Choon-Siang Lai <[email protected]>
[similarity] adding possible abbreviation functions to header, making everything const char *

fbf88aee8828f779a3d6805872609c3fae6721c7 authored about 7 years ago by Al <[email protected]>
[similarity] using new sequence alignment breakdown by operation to tell if any two words are an abbreviation. The loose variant requires that the alignment covers all characters in the shortest string, which matches things like Services vs. Svc, whereas the strict variant requires that either the shorter string is a prefix of the longer one (Inc and Incorporated) or that the two strings share both a prefix and a suffix (Dept and Department). Both variants require that the strings share at least the first letter in common.

b34e5783661990db8ba16ecd1f1ba68cff36c995 authored about 7 years ago by Al <[email protected]>
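The strict variant described in this commit can be illustrated without the alignment machinery. The following is a minimal sketch, not libpostal's implementation: the function name is hypothetical, it handles ASCII only, and it assumes that "share both a prefix and a suffix" means the shared prefix and shared suffix together cover the whole shorter string.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII-only, case-insensitive character comparison. */
static bool chars_equal_ci(char a, char b) {
    return tolower((unsigned char)a) == tolower((unsigned char)b);
}

/* Sketch of a "strict" abbreviation test (hypothetical name and rules). */
bool possible_abbreviation_strict_sketch(const char *s1, const char *s2) {
    size_t len1 = strlen(s1), len2 = strlen(s2);
    const char *shorter = len1 <= len2 ? s1 : s2;
    const char *longer = len1 <= len2 ? s2 : s1;
    size_t n_short = len1 <= len2 ? len1 : len2;
    size_t n_long = len1 <= len2 ? len2 : len1;

    if (n_short == 0) return false;
    /* Both variants require a shared first letter. */
    if (!chars_equal_ci(shorter[0], longer[0])) return false;

    /* Length of the common prefix. */
    size_t prefix = 0;
    while (prefix < n_short && chars_equal_ci(shorter[prefix], longer[prefix])) {
        prefix++;
    }
    /* Case 1: the shorter string is a prefix of the longer (Inc, Incorporated). */
    if (prefix == n_short) return true;

    /* Length of the common suffix, not overlapping the matched prefix. */
    size_t suffix = 0;
    while (suffix < n_short - prefix &&
           chars_equal_ci(shorter[n_short - 1 - suffix], longer[n_long - 1 - suffix])) {
        suffix++;
    }
    /* Case 2 (assumption): prefix plus suffix cover the shorter string
       (Dept, Department). */
    return suffix > 0 && prefix + suffix == n_short;
}
```

Under these assumptions "Svc" against "Services" is rejected, which is consistent with the commit's note that only the loose variant matches that pair.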
[similarity] a *NEW* sequence alignment algorithm which builds on Smith-Waterman-Gotoh with affine gap penalties. Like Smith-Waterman, it performs a local alignment, and like the cost-only version of Gotoh's improvement, it needs O(mn) time and O(m) space (where m is the length of the longer string). However, this version of the algorithm stores and returns a breakdown of the number and specific types of edits it makes (matches, mismatches, gap opens, gap extensions, and transpositions) rather than rolling them up into a single cost, and without needing to return/compute the full alignment as in Needleman-Wunsch or Hirschberg's variant

751873e56bd9ccf9e4e91b0307d31acc7776d4f8 authored about 7 years ago by Al <[email protected]>
[utils] adding unicode_equals function in string_utils for testing equality of unicode char arrays

665b7804227a411f09680dc17d86078000d478ba authored about 7 years ago by Al <[email protected]>
[fix] README badges

5f0e394ea8ba3ccf8e4e8ce650d7bdae3d742142 authored about 7 years ago by Al <[email protected]>
[build] adding --no-same-owner explicitly when untarring the data files for #267

669e52b329017348af81af2f40292f56f85d5de3 authored about 7 years ago by Al <[email protected]>
[dictionaries] adding variants of & as synonyms in all languages

3c6629ae3d24b6914cb3d065bbed8faaed9983a0 authored about 7 years ago by Al <[email protected]>
[similarity] exposing unicode versions of Damerau-Levenshtein and Jaro-Winkler distances

bc9f11d6e37a4c604648e8c3b4ddf21721195fd6 authored about 7 years ago by Al <[email protected]>
[expand] added search_address_dictionaries_substring to support the new use case (i.e. answers "is this substring in the trie?" regardless of whether it's stored under the special prefixes/suffixes namespaces)

2d6079b06f3a5be427f108ddf3dd42321774c790 authored about 7 years ago by Al <[email protected]>
[expand] adding a normalization for a single non-acronym internal period where there's an expansion at the prefix/suffix (for #218 and https://github.com/openvenues/libpostal/issues/216#issuecomment-306617824). Helps in cases like "St.Michaels" or "Jln.Utara" without needing to specify concatenated prefix phrases for every possibility

053dca82ba241547fad4c2b81bfb6bab444a8fd2 authored about 7 years ago by Al <[email protected]>
[utils] adding functions for finding the next index of a full stop/period character in a string

6d430f7e9ba230bf409b5cddfb966a62c1101ef0 authored about 7 years ago by Al <[email protected]>
[numex] fixing edge case where something like "IV Michael" could cause a partial Roman numeral to get added for the MI portion of "Michael"

e38e57b8e8b30610d4cf0d5439743798f394a2c3 authored about 7 years ago by Al <[email protected]>
[similarity] using NULL-terminated varargs in double metaphone instead of specifying the number of arguments, should be more maintainable

e8ae3bbbafd89b63fbef28544c1bac14c4788adc authored about 7 years ago by Al <[email protected]>
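The NULL-terminated varargs pattern mentioned in this commit can be sketched as follows. The function name and behavior are hypothetical and only illustrate the calling convention; this is not the double metaphone code itself.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if s starts with any of the candidate
   strings. The candidate list must end with a (const char *)NULL sentinel,
   so callers never pass an explicit argument count. */
bool starts_with_any_sketch(const char *s, ...) {
    va_list args;
    va_start(args, s);

    bool found = false;
    const char *candidate;
    /* Read candidates until the NULL sentinel is reached. */
    while ((candidate = va_arg(args, const char *)) != NULL) {
        if (strncmp(s, candidate, strlen(candidate)) == 0) {
            found = true;
            break;
        }
    }

    va_end(args);
    return found;
}
```

A call looks like `starts_with_any_sketch("SCHMIDT", "SCH", "SH", (const char *)NULL)`. The cast on the sentinel matters because a bare `NULL` may be passed as an integer in a variadic call on some platforms.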
[dedupe] Jaccard similarity

5c0ecf89637ed67d1c6a94de7c5bf0ddf69faf2c authored about 7 years ago by Al <[email protected]>
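For reference, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. A minimal sketch over token arrays follows; the function names are hypothetical, duplicates are ignored by treating each array as a set, and the quadratic lookup is chosen for brevity rather than speed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: linear search for token t in the first n tokens. */
static bool tokens_contain(const char **tokens, size_t n, const char *t) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tokens[i], t) == 0) return true;
    }
    return false;
}

/* Sketch: |A intersect B| / |A union B| over two token arrays treated as sets.
   Returns 0.0 when both sets are empty (an assumption, not a standard). */
double jaccard_similarity_sketch(const char **a, size_t na,
                                 const char **b, size_t nb) {
    size_t unique_a = 0, intersection = 0, only_b = 0;

    for (size_t i = 0; i < na; i++) {
        if (tokens_contain(a, i, a[i])) continue; /* duplicate within a */
        unique_a++;
        if (tokens_contain(b, nb, a[i])) intersection++;
    }
    for (size_t j = 0; j < nb; j++) {
        if (tokens_contain(b, j, b[j])) continue; /* duplicate within b */
        if (!tokens_contain(a, na, b[j])) only_b++;
    }

    size_t set_union = unique_a + only_b;
    if (set_union == 0) return 0.0;
    return (double)intersection / (double)set_union;
}
```

For the token sets {"main", "st"} and {"main", "street"} the intersection has one element and the union has three, so the similarity is one third.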
[fix] making string args const in string_similarity module

4ccc2a9e9fa32ff7f19570d1bee5aa2ab6a8a317 authored about 7 years ago by Al <[email protected]>
[expand] adding ability to expand Roman numerals with ordinal suffixes like IXe in French

5c927e780fb2edd60902c436e5b46ea660e7cccd authored about 7 years ago by Al <[email protected]>
[utils] adding utf8_is_digit to string_utils.h

b7eda37e444990d3e4f25d8c3e0ac82b483e9970 authored about 7 years ago by Al <[email protected]>
[numex] adding functions to parse and validate a Roman numeral

1fbc238b60e333ce94f7d8356b0442630a407689 authored about 7 years ago by Al <[email protected]>
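One common way to parse and validate a Roman numeral is to compute its value with the subtractive rule, re-encode that value in canonical form, and accept the input only if the two strings match. The sketch below takes that approach; the names are hypothetical, it accepts uppercase ASCII only, and it limits values to 1 through 3999. It is not the numex implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of a single Roman digit, or 0 if invalid. */
static int roman_digit_value(char c) {
    switch (c) {
        case 'I': return 1;
        case 'V': return 5;
        case 'X': return 10;
        case 'L': return 50;
        case 'C': return 100;
        case 'D': return 500;
        case 'M': return 1000;
        default: return 0;
    }
}

/* Sketch: returns the value (1..3999) of a canonical Roman numeral,
   or 0 if the string is not a valid canonical numeral. */
int parse_roman_numeral_sketch(const char *s) {
    static const int values[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *symbols[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                    "XL", "X", "IX", "V", "IV", "I"};
    size_t len = strlen(s);
    /* The longest canonical numeral up to 3999 has 15 characters. */
    if (len == 0 || len > 15) return 0;

    /* Pass 1: subtract a digit when a larger digit follows it. */
    int total = 0;
    for (size_t i = 0; i < len; i++) {
        int v = roman_digit_value(s[i]);
        if (v == 0) return 0;
        int next = (i + 1 < len) ? roman_digit_value(s[i + 1]) : 0;
        total += (v < next) ? -v : v;
    }
    if (total < 1 || total > 3999) return 0;

    /* Pass 2: re-encode the value and compare with the input. */
    char canonical[16];
    size_t pos = 0;
    int remaining = total;
    for (size_t k = 0; k < sizeof(values) / sizeof(values[0]); k++) {
        while (remaining >= values[k]) {
            size_t symbol_len = strlen(symbols[k]);
            memcpy(canonical + pos, symbols[k], symbol_len);
            pos += symbol_len;
            remaining -= values[k];
        }
    }
    canonical[pos] = '\0';
    return strcmp(canonical, s) == 0 ? total : 0;
}
```

The round trip rejects strings such as "VL" or "IIII", which evaluate to a number but are not how that number is canonically written.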
[phrases] when skipping/ignoring hyphens in trie search, make sure that the new longer phrase ends at a word boundary (space, hyphen, end of string, etc.)

1c5afcafd294e52fa9ec1296a84b1ebb46e581da authored about 7 years ago by Al <[email protected]>
[numex] when parsing numex, bail on rules in whole_tokens_only languages if there are contiguous rules with no right context rules (example: something that wouldn't make sense like VL in Latin)

9d2a111286451e9648787170718c1d2f4940730a authored about 7 years ago by Al <[email protected]>
[similarity] string similarity measures for Damerau-Levenshtein and Jaro-Winkler distances. Both operate on unicode points internally for lengths, etc. instead of byte strings and the Levenshtein distance uses only one array instead of needing to store the full matrix of transitions.

bd477976d1374f5edcd56e4b27f12c3615b52f9a authored about 7 years ago by Al <[email protected]>
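The single-array idea can be shown with plain Levenshtein distance over code point arrays. This is a minimal sketch with a hypothetical function name; it omits the transposition step of Damerau-Levenshtein and is not the code from this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch: Levenshtein distance between two arrays of Unicode code points,
   keeping one row of the dynamic programming table instead of the full
   matrix. Returns -1 if memory allocation fails. */
long levenshtein_single_row_sketch(const uint32_t *a, size_t m,
                                   const uint32_t *b, size_t n) {
    size_t *row = malloc((n + 1) * sizeof(size_t));
    if (row == NULL) return -1;

    /* Row 0: turning an empty prefix of a into b[0..j) costs j insertions. */
    for (size_t j = 0; j <= n; j++) row[j] = j;

    for (size_t i = 1; i <= m; i++) {
        size_t diagonal = row[0]; /* D[i-1][0] */
        row[0] = i;               /* D[i][0]: i deletions */
        for (size_t j = 1; j <= n; j++) {
            size_t above = row[j]; /* D[i-1][j], about to be overwritten */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t best = diagonal + cost;                     /* substitute or match */
            if (above + 1 < best) best = above + 1;            /* delete from a */
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1;  /* insert into a */
            row[j] = best;
            diagonal = above; /* becomes D[i-1][j] for the next column */
        }
    }

    long distance = (long)row[n];
    free(row);
    return distance;
}
```

Operating on `uint32_t` code points rather than bytes means a multi-byte character counts as a single edit.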
[utils] function to create an array of uint32_t codepoints from a UTF-8 string, a few bug fixes to string_utils

245aa226e087fd847947c9fa0c1953e7a12b43eb authored about 7 years ago by Al <[email protected]>
[similarity] bug fixes and additional French, Spanish, Italian, and Slavic phonetics

c61007388bde7cb92782747fe83ec80e97c7a244 authored about 7 years ago by Al <[email protected]>
[similarity] adding basic double metaphone implementation

3a3aca8490c9124327b0bb24cf0907ebcac1e5c7 authored about 7 years ago by Al <[email protected]>
[test] test for utf8_equal_ignore_separators

2f2d3da7220e4c64e41680f9998007c1b50b4616 authored about 7 years ago by Al <[email protected]>
[utils] adding utf8_equal_ignore_separators to string utils

09fbb02042882bdfa54a0ae23cffa6785c4bf6e3 authored about 7 years ago by Al <[email protected]>
[utils] adding utf8_len function for strings, and utf8_is_digit

f8a808e25426f7c29c6ad8d9420be5e0a219b6d8 authored about 7 years ago by Al <[email protected]>
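Counting the code points in a UTF-8 string, as opposed to its bytes, can be done by skipping continuation bytes. The sketch below uses a hypothetical name and assumes the input is valid, NUL-terminated UTF-8; the signature of the function added in this commit may differ.

```c
#include <stddef.h>

/* Sketch: number of Unicode code points in a valid UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx, so every other byte
   starts a new code point. */
size_t utf8_len_sketch(const char *s) {
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) count++;
    }
    return count;
}
```

For example, "café" encoded as UTF-8 occupies five bytes but has four code points.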
[merge] merging commit from v1.1

448ca6a61a1b0b5e2a8f539f3d4d0e475928a49a authored about 7 years ago by Al <[email protected]>
[auto][ci skip] Adding data files from Travis build #268

bb277fb3268fb77af12652d3e6dec8bbfeb879df authored about 7 years ago by Travis <[email protected]>
Merge pull request #257 from mkaranta/patch-1

Add 'bld' as an abbreviation for 'building'

e60139757f96825ed1372e800316a3147f5d2dae authored about 7 years ago by Al Barrentine <[email protected]>
Add 'bld' as an abbreviation for 'building'

I noticed this was missing while testing a batch of addresses. Hopefully it doesn't introduce mu...

c96a042e86546d7eb84f0653ef4fb94586526db4 authored about 7 years ago by mkaranta <[email protected]>
[fix] removing log error for sequences of length 0

c984dca459d5ed3bf2648b9015a255ab399de0ad authored over 7 years ago by Al <[email protected]>
[fix] typo

94a0e842e74598c6d52a4809ea4604b4be25ca60 authored over 7 years ago by Al Barrentine <[email protected]>
[code of conduct] adding stronger, more specific language about hate speech in code of conduct

34e2c4772e960c08a1151dccf5902b9b360751f5 authored over 7 years ago by Al Barrentine <[email protected]>
[docs] updating README examples of normalization now that canonical forms are no longer transliterated

2bfa8efefbf5258275af73ac95278abbe4a02b87 authored over 7 years ago by Al Barrentine <[email protected]>
[fix] normalize canonical strings (after expanding abbreviations, concatenated suffixes, etc.) with Latin-ASCII, Latin-ASCII-Simple or simple UTF-8 normalization depending on the options

0c6af2b74cd07a1f386d3a16e26db3fede56f99b authored over 7 years ago by Al <[email protected]>
[docs][ci skip] update contributing section in README

ed011e50d589fdc0eab6ce278ee7249c1b89619f authored over 7 years ago by Al <[email protected]>
[fix][ci skip] updates to contributions guide

caf241593806672b4fa89119dc381638995c8b1d authored over 7 years ago by Al <[email protected]>
[fix][ci skip] removing repetition in contributing guide

da2affbacbbb0023a8bc4e7eeaec6297addf1ebb authored over 7 years ago by Al <[email protected]>
[docs][ci skip] adding contributing guide for how to submit issues

2c06f26f3de393e3994a76ad64d8fea001f0ce23 authored over 7 years ago by Al <[email protected]>
Merge pull request #231 from michaelkrog/patch-1

Changes front matter of iis.yaml to correct description

6ca6493d0b1e36c2cee9e8857fa94ffca3865c4f authored over 7 years ago by Al Barrentine <[email protected]>
Update is.yaml

a36dcc8b9c028a2ee1d2d492cfdb0956de8847b5 authored over 7 years ago by Michael Krog <[email protected]>
Moving language around in code of conduct

7352dc74c6821afa572517d0089ddd277c10bc97 authored over 7 years ago by Al Barrentine <[email protected]>
Adding a custom libpostal Code of Conduct

4cde250463c2c9cadfd0411de404ebe09f3935af authored over 7 years ago by Al Barrentine <[email protected]>
Merge pull request #229 from openvenues/32bit_numex_fix

32-bit safety in numex table loading

dab3b95ae1361d9a2d60d855d58701439daf52a3 authored over 7 years ago by Al Barrentine <[email protected]>
[fix] 32-bit safety in numex table loading

97044f5a8baf08e405be0bb5182d3c66cf6f2ecf authored over 7 years ago by Al <[email protected]>
Merge pull request #215 from xiamx/patch-2

Add Elixir language binding to README.md

0cb8c61fb0619a47ab19ab97e847bea8af3a7e44 authored over 7 years ago by Al Barrentine <[email protected]>
Add Elixir language binding to Readme

abcf72be2e12655492cf74b67ba69312c0f18f97 authored over 7 years ago by Mengxuan Xia <[email protected]>
Merge pull request #214 from iestynpryce/master

Fix remaining log_* compile format warnings

50cf14846c6747408ec86ca62d1dfb9b00585743 authored over 7 years ago by Al Barrentine <[email protected]>
Merge https://github.com/openvenues/libpostal

b96a6871824b83ac404e50044ff6b82bf259bad2 authored over 7 years ago by Iestyn Pryce <[email protected]>
[auto][ci skip] Adding data files from Travis build #250

8dd84b71bad4c70150e60c7bda8071b9bd8902f8 authored over 7 years ago by Travis <[email protected]>
Merge pull request #212 from openvenues/bbraunay-master

modified Indonesian dictionary updates

e9696e91667ff0e7be769661546fa6dc8280bc41 authored over 7 years ago by Al Barrentine <[email protected]>
[dictionaries] adding a separable prefix for Jl. and Jln. so things like Jl.Utara get separated and expanded

1948634bf3ad1a305698336832592996f56eb9b2 authored over 7 years ago by Al <[email protected]>
[dictionaries] adding ambiguous expansions for all Indonesian abbreviations 1-2 characters as they could also be initials, etc.

3b5b5d8baa0edd992c7bf50fc0b7e0bfa819e31a authored over 7 years ago by Al <[email protected]>
[dictionaries] removing English words from Indonesian unit types

f5071024571aaa95ea759664b8ed4b85d0d13b41 authored over 7 years ago by Al <[email protected]>
[fix] changing national to nasional in Indonesian

4b24699e1f8bf3b5be05a6e1855ff922b6c682b4 authored over 7 years ago by Al <[email protected]>
[dictionaries] moving Kampong to normalize to Kampung in Indonesian, better if there's one canonical form

4df48fb412adaa5df7a80a6a670fb9e0bf1b302b authored over 7 years ago by Al <[email protected]>
[dictionaries] removing a few English words and dupes from Indonesian place names

ec79c610ebce297c09ee79327e4e49764de6b538 authored over 7 years ago by Al <[email protected]>
[dictionaries] removing no fixed address from Indonesian dictionaries

77365a56a53e5679a8bb540c0c752343908fcfea authored over 7 years ago by Al <[email protected]>
[dictionaries] removing level/platform/podium from Indonesian level types

8a35cfcd80fed268ecbb742e5fa8eb1ea956e037 authored over 7 years ago by Al <[email protected]>
[dictionaries] separating Mas and Abang

364b00da01f14157aa3d185e0dbd43acf500991f authored over 7 years ago by Al <[email protected]>
[dictionaries] remove Doktor from academic degrees in Indonesian dictionaries

83378049ee461fd13e1396685079767840ef0f99 authored over 7 years ago by Al <[email protected]>
[dictionaries] remove nonprofit from Indonesian company types

52593c6374d89225fbf4176158fea5223b3a47c2 authored over 7 years ago by Al <[email protected]>
[dictionaries] moving some of the existing chain stores for Indonesia to the all/chains.txt dictionary

08524f4b0708d934115117ab420355f42de7ba18 authored over 7 years ago by Al <[email protected]>
Merge branch 'master' of https://github.com/bbraunay/libpostal into bbraunay-master

18b2fb0ec88cf9171ab6d16bb39908f49a1779cb authored over 7 years ago by Al <[email protected]>
Add portable way of formatting khint_t type (from klib)

87cf7b5bca7fd4bec90a7dcea3c67a0904c6ba21 authored over 7 years ago by Iestyn Pryce <[email protected]>
Revert format regression introduced in ecd07b18c118fc1e52ea30d9a91d7dc6f049258c

d8239a9cc4bb61a9c12b643c275fba1c69a65ea8 authored over 7 years ago by Iestyn Pryce <[email protected]>
Fix log_* formats which expect long long uint but receive uint64_t.

73d27caeb977ede9878322fb1ea0a5786278777a authored over 7 years ago by Iestyn Pryce <[email protected]>
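Format warnings of this kind come from pairing a printf-style specifier with an argument whose type differs across platforms. A portable approach uses the `PRIu64` macro from `<inttypes.h>` for `uint64_t` and `%zu` for `size_t`. The sketch below uses `snprintf` and a hypothetical function name; the commit itself changes calls to libpostal's `log_*` macros.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch: format a uint64_t and a size_t portably.
   PRIu64 expands to the correct conversion specifier for uint64_t on the
   current platform; %zu is the standard specifier for size_t. */
int format_counts_sketch(char *buf, size_t size, uint64_t n, size_t len) {
    return snprintf(buf, size, "n=%" PRIu64 " len=%zu", n, len);
}
```

Writing `%llu` for a `uint64_t` happens to work where `uint64_t` is `unsigned long long`, but triggers a warning where it is `unsigned long`.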
[dictionaries] add more options for toponyms

695756d48421e1029f7b5e26e4682cd1e6463cc5 authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
Merge latest https://github.com/openvenues/libpostal

0c3ef33682319a6265ea1784e87dee3b7415872c authored over 7 years ago by Iestyn Pryce <[email protected]>
Fix log_* formats which expect long long int but receive int64_t.

6aa3cb61fda8aba41c5ac43fd2fc00c3601a8b0d authored over 7 years ago by Iestyn Pryce <[email protected]>
[dictionaries] Remove additional English words from ID dictionary

03be9eea4938c5b0e46a772eb5eb8982a381a88f authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
[dictionaries] Remove English words from ID dictionary

09cb28cb14bc6e421aca13093ffbcc183878ea0b authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
Merge pull request #204 from iestynpryce/master

Fix log_{debug,info} formats which expect size_t but receive int.

b79934394a596077c3ba223fb876f07118d6be5e authored over 7 years ago by Al Barrentine <[email protected]>
Fix log_* formats which expect size_t but receive uint32_t.

ecd07b18c118fc1e52ea30d9a91d7dc6f049258c authored over 7 years ago by Iestyn Pryce <[email protected]>
[dictionaries] Fix blank synonym in numbers

3b2fb597fe67ed8a5f0e5a461e317eee73227dbe authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
[dictionaries] Fix blank synonym in academic degrees

7f14dafd211ce66194c619ad3c00cba2a6457ba0 authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
[dictionaries] Indonesian dictionaries to support new config

251458061180f51e53fae65ee08f8a967955aea9 authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
[dictionaries] Indonesian dictionaries to support new config

60cde05c3d813fe40a9f3959231bb29292392463 authored over 7 years ago by Yanuar Budi Baskoro <[email protected]>
Fix log_{debug,info} formats which expect size_t but receive int.

87a76bf96713248fd13b693d51c6505f47a2cdcc authored over 7 years ago by Iestyn Pryce <[email protected]>
Merge pull request #201 from iestynpryce/master

Fix log_debug formats which expect unsigned int but receive size_t

2a0fb69ae57de7cdbbc0e0d141691c8c83ef7fab authored over 7 years ago by Al Barrentine <[email protected]>
Fix log_debug formats which expect unsigned int but receive size_t

f34fc56fec6a8f4e44cc84fe260ad518faa57431 authored over 7 years ago by Iestyn Pryce <[email protected]>
[fix] adding maximum number of permutations for libpostal_expand_address to consider (n=100 for both the inner and outer loop, so max strings=10000), fixes #200

a7e67c4967b9f12c2461e14cc4db2662309a7fb8 authored over 7 years ago by Al <[email protected]>
[fix] check that possible ordinal suffix also has non-zero digit length before normalizing

5780a08b4854d07452b65ef2df995e081f5614cc authored over 7 years ago by Al <[email protected]>
[fix] open files in binary format for #69

cea3ced5334f4626043d6bf3f0c794299b842822 authored over 7 years ago by Al <[email protected]>
[fix] terminate the char_array if input token is zero-length in add_normalized_token

6ea22732630da11948801dc60b458dcc66fd7af3 authored over 7 years ago by Al <[email protected]>
Merge pull request #189 from openvenues/fix_trie_search

Reset to root node in trie search on partial failed matches before rolling back pointer

04eb2d4539488fe61fa98bffde0ea2ce39242342 authored over 7 years ago by Al Barrentine <[email protected]>
[fix] in tokenized trie_search, in the case of a partial failed match, reset to the root node before rolling the pointer back to phrase start + 1

278679b7fb30f61bc67a78023493bd35ebaeaf56 authored over 7 years ago by Al <[email protected]>
[auto][ci skip] Adding data files from Travis build #231

074b6ff802aff20ccdd69b2c989e2aaa9595b882 authored over 7 years ago by Travis <[email protected]>
Merge pull request #187 from openvenues/degree_symbol_ordinal_suffix

Ordinal suffix tests

004d3d98c960c50d8c0ad543c27c9b1947c67bf9 authored over 7 years ago by Al Barrentine <[email protected]>
[fix] whitespace in numex config to trigger build

7bce358ca6c8bef72163c4736bfd82c5835eaeb0 authored over 7 years ago by Al <[email protected]>
[fix] no parens in travis config grep for numex change detection

676fb9bcbcb7173ac0eec97d966d92a9e9734676 authored over 7 years ago by Al <[email protected]>
[fix] adding numex change to trigger build

86956db055477ee7d28319e562eceef00f684421 authored over 7 years ago by Al <[email protected]>