Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
https://github.com/openvenues/libpostal

[fix] cast

5383640c14365e9501794cff5928c3789eabcd69 authored over 9 years ago by Al <[email protected]>
[numex] Separating rules from keys for Linux gcc compilation

dd391eabe59c849b616831fc8321e62eb04c14a4 authored over 9 years ago by Al <[email protected]>
[build] public-read permissions when uploading to S3

e346b831cbdab38738890c9e667e42a6b1664b84 authored over 9 years ago by Al <[email protected]>
[build] Not compiling with -Werror for now

ad584671c4817694d157052168dc5c47a71e4519 authored over 9 years ago by Al <[email protected]>
[build] Link to math library

f170f707273d70d13bb8b2744b5e05914f98c4e6 authored over 9 years ago by Al <[email protected]>
[build] builder programs are now in noinst_PROGRAMS, Makefile target to upload data tarball to S3 (with proper credentials)

423e2c86c75a810dda5b07ac7e6fc1193be6a3e9 authored over 9 years ago by Al <[email protected]>
[fix] stdint header in address expansion rule generation script

a5ce1f12dd6415957bdd398e6c8932fee0f6075b authored over 9 years ago by Al <[email protected]>
[dictionaries] Removing dictionaries/all/personal_suffixes, can add to languages as needed

ee982cd872e34301351f17d89487fd7e677eb085 authored over 9 years ago by Al <[email protected]>
[phrases] resetting node position when continuation falls off the trie

5acf7a4f3e29a3767cfb765542947c7199968257 authored over 9 years ago by Al <[email protected]>
[build] Adding bootstrap.sh script and removing configure from version control

a77c8e132186887607a894f2b1d61baec58e159f authored over 9 years ago by Al <[email protected]>
[fix] making transliteration path relative to data dir

cd0f95f9e2d48d60b1dba12e3668f4c93620aee1 authored over 9 years ago by Al <[email protected]>
[build] better autoconf checks for time and dirent headers

2ba0e814adb1c397462bf93fca012041cef7595e authored over 9 years ago by Al <[email protected]>
[config] Including Autoconf config.h in internal config

d0679450e364915ecd1398c55520a729f7d60b60 authored over 9 years ago by Al <[email protected]>
[numex] Fix to whole_tokens_only numeric expression parsing where numex was pushing a number onto the stack on encountering a new rule context even though the token had not been completely parsed

5df9e123af916923e175815fcf85906ae9d63e7a authored over 9 years ago by Al <[email protected]>
[fix] removing comment

53f54d6454ece5630ddcd152c35d3edcd7359ebd authored over 9 years ago by Al <[email protected]>
[build] Adding command-line test and bench programs

2106a6cfe4f44ebc57e14ff3e09ad5d98edad76c authored over 9 years ago by Al <[email protected]>
[fix] data dir for tar extraction

5aa2e99b92beeadd1946388a0b9c495a7a7d17e5 authored over 9 years ago by Al <[email protected]>
[build] Fixing runtime check/save of last updated file for package data tarball

54aa6fe7df7c13beeee3a8ebacc073bcb1fd6d26 authored over 9 years ago by Al <[email protected]>
[rm] Better not to keep that file in the repo

f38a53601b19285dbd56c0c567787d6dc6aabaec authored over 9 years ago by Al <[email protected]>
[build] Adding default file to track last updated date

770f44198c13df03db33892bc3f9932107366798 authored over 9 years ago by Al <[email protected]>
[build] Adding generated configure script

c0c21b81f261a68700e7566fc0826edbd7cbc17e authored over 9 years ago by Al <[email protected]>
[fix] float comparison

a197d04b1a87f58f8ce163803fefe9b76fd839b1 authored over 9 years ago by Al <[email protected]>
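
The "[fix] float comparison" commit does not say how the comparison was changed. A common approach is to compare doubles against a tolerance instead of testing them with ==. The sketch below assumes that approach. The function names are hypothetical and are not taken from libpostal. It avoids <math.h>, so it does not depend on linking the math library.

```c
#include <stdbool.h>

/* Absolute difference computed without <math.h>. */
static double abs_diff(double a, double b) {
    return a > b ? a - b : b - a;
}

/* Hypothetical helper: a and b count as equal when their difference is
   within epsilon in absolute terms (useful near zero) or within epsilon
   relative to the larger magnitude (useful for large values). */
static bool double_approx_equal(double a, double b, double epsilon) {
    double diff = abs_diff(a, b);
    if (diff <= epsilon) return true;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double largest = mag_a > mag_b ? mag_a : mag_b;
    return diff <= largest * epsilon;
}
```

Combining an absolute and a relative check keeps one epsilon usable for values near zero and also for very large values.
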
[build] Changes to Makefile.am to build on Debian/Ubuntu, fixing downloading of the data tarball for Mac and Linux

f161f68d53c1399bcf3436e9c6cfb526720964fe authored over 9 years ago by Al <[email protected]>
[fix] Removing C++ checks from all but the main API functions

9b69d1f67a83329a0daef4bd0a85f91952a12725 authored over 9 years ago by Al <[email protected]>
[fix] Adding stdint.h include to most of the header files for portability

359a1efb03bde4bef0b659e34f3c7da67cb990cf authored over 9 years ago by Al <[email protected]>
[fix] restoring ctype.h include

0738a57caae5c76c30311840200eeee741b1dc19 authored over 9 years ago by Al <[email protected]>
[fix] includes, matters on GCC/Linux

06d2e916a1b7692d65562c9b853711a37714454b authored over 9 years ago by Al <[email protected]>
[build] Fixing data dir download in Automake file

ae9825b9f9a0a54ec20e3bf41d94778e9046b57d authored over 9 years ago by Al <[email protected]>
[fix] includes

d7ebcd046e6393f68c9aa8169ba7bddb69b40a80 authored over 9 years ago by Al <[email protected]>
[api] Adding address component constants to libpostal.h, returning char ** instead of a cstring_array to simplify API/dependencies

f246c2ee95e26d6ab57dad6d45a50642db27898f authored over 9 years ago by Al <[email protected]>
[config] config.h=>libpostal_config.h so as not to conflict with autoconf

61d586fa1db3acc83214f6bcd47ee4e44f81903f authored over 9 years ago by Al <[email protected]>
[build] adding Automake file in src, including rule to download data dir tarball

2bedb695a216f7e03a9d0496a86dbcf95ae0ea90 authored over 9 years ago by Al <[email protected]>
[build] Main Automake file and modified version of Sparkey's Automake file

4b9f11eca56e2dad770b0b5033f1589e84ea4f2b authored over 9 years ago by Al <[email protected]>
[build] Adding Autoconf file

fe078cff66a59b3094f48d4027df16a6b9360aee authored over 9 years ago by Al <[email protected]>
[fix] Fixing warnings in unicode script data

1d39916aaad2cd8f818b5d1b144bfb028e4db6e0 authored over 9 years ago by Al <[email protected]>
[expansion] Re-generating address expansion data file

770ce4256f7b4ba38930f4f25a5535c847551a1e authored over 9 years ago by Al <[email protected]>
[dictionaries] condensed forms of sin numero in various languages

90cde298dd992112345cf2537fc8b4b3ccbc1e57 authored over 9 years ago by Al <[email protected]>
[api] Initial libpostal API, combining string normalization, transliteration, numex and address dictionaries

753c6efb1d923aaaf95c9d0e84d10efc2b69314c authored over 9 years ago by Al <[email protected]>
[fix] tokenized trie search was skipping tokens in some cases

b27030e39fd7aacc27cbf80f98deae5b4a933104 authored over 9 years ago by Al <[email protected]>
[utils] string_contains_hyphen method

3178eda501fbfde16e9dd79b513de66e35f5dd41 authored over 9 years ago by Al <[email protected]>
[normalize] Adding an option when normalizing tokens to split tokens of the form [\w]+[\.\-]?[\d]+ for cases like I35, CR123, R-66, RN.7, etc. where the alpha component is an expansion

46141a6c36eca1652a691ae8b364ebd1d9122a76 authored over 9 years ago by Al <[email protected]>
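
The normalization option described above splits tokens such as I35, CR123, R-66 and RN.7 into an alphabetic part and a numeric part, so the alphabetic part can be expanded on its own. The sketch below shows only the split. It makes two assumptions: the leading component is ASCII letters (a simplification of [\w]+), and the name split_alpha_numeric is hypothetical and not libpostal's API.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if token matches LETTERS [.-]? DIGITS through to the
   end (e.g. "I35", "CR123", "R-66", "RN.7"), store the length of the letter
   prefix in *alpha_len and the offset where the digits begin in *num_start,
   and return true. ASCII-only simplification. */
static bool split_alpha_numeric(const char *token, size_t *alpha_len, size_t *num_start) {
    size_t len = strlen(token);
    size_t i = 0;
    while (i < len && isalpha((unsigned char)token[i])) i++;
    if (i == 0) return false;                       /* need at least one letter */
    size_t prefix_end = i;
    if (i < len && (token[i] == '.' || token[i] == '-')) i++;  /* optional separator */
    size_t digits_begin = i;
    while (i < len && isdigit((unsigned char)token[i])) i++;
    if (i == digits_begin || i != len) return false;  /* digits must run to the end */
    *alpha_len = prefix_end;
    *num_start = digits_begin;
    return true;
}
```

For "CR123" this yields "CR" and "123". A later expansion step could then turn "CR" into a form like "county road" without touching the number.
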
[expansion] NULL_CANONICAL_INDEX constant

f10dd49c5895e68f13d2a1e41d501682850e3544 authored over 9 years ago by Al <[email protected]>
[dictionaries] Italian abbreviations for strada

6bf563ca89776564b02751bcf5ff054c568a5ce5 authored over 9 years ago by Al <[email protected]>
[fix] compiler warnings

fe4789a6659b9b85c4d35b1769d15440146590a6 authored over 9 years ago by Al <[email protected]>
[normalize] cstring_array instead of string_tree for token-based normalization

551904d2029d00bcd6b5f9005734fc62677ae715 authored over 9 years ago by Al <[email protected]>
[geodb] Adding an is_canonical bit field to geodb trie values

90d4da9e72c840ca77170af1c083fc7c726abedb authored over 9 years ago by Al <[email protected]>
[numex] LATIN_LANGUAGE_CODE constant for Roman numeral normalization

9bc902f575edf5fc69aa9fb8b625466e5f4b429f authored over 9 years ago by Al <[email protected]>
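
The commit above files Roman numerals under a "Latin" language code in numex. numex itself is driven by rule data, so the standalone function below is not how libpostal does it. It only illustrates the arithmetic that Roman numeral normalization needs: the subtractive rule, under which a smaller value written before a larger one is subtracted. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one Roman numeral letter (case-insensitive), or 0 if not a numeral. */
static int roman_digit_value(char c) {
    switch (toupper((unsigned char)c)) {
        case 'I': return 1;
        case 'V': return 5;
        case 'X': return 10;
        case 'L': return 50;
        case 'C': return 100;
        case 'D': return 500;
        case 'M': return 1000;
        default:  return 0;
    }
}

/* Convert a Roman numeral to an integer, or return -1 if the string is empty
   or contains a non-numeral. Non-canonical forms such as "IIII" are accepted. */
static int64_t roman_to_int(const char *s) {
    if (*s == '\0') return -1;
    int64_t total = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int v = roman_digit_value(s[i]);
        if (v == 0) return -1;
        int next = s[i + 1] != '\0' ? roman_digit_value(s[i + 1]) : 0;
        if (next > v) total -= v;   /* subtractive pair, e.g. the I in IV */
        else total += v;
    }
    return total;
}
```
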
[numex] Fixing numex parsing for lone stopwords and certain prefix matches that were getting mistakenly converted e.g. settembre => 7mbre

df1410da8c404489467c00f9cf77ca95e91846ef authored over 9 years ago by Al <[email protected]>
[numex] Fixing hyphen-initial numeric phrases that end the string

a16f0dabcb7cae258546956910c6d8afce721005 authored over 9 years ago by Al <[email protected]>
[dictionaries] Updates to English and Spanish dictionaries on looking through a data set of real test addresses

3dc6115a4eb0812ea7b8763cef2cd9551ed063ea authored over 9 years ago by Al <[email protected]>
[fix] transition to SEARCH_STATE_NO_MATCH in trie_search_tokens_from_index on a return to the start node

0f5b69c06b5ac6bac932ebf631f110c7b5f2f7ec authored over 9 years ago by Al <[email protected]>
[fix] NULL check

243f32792866a132ba385749b499b77ea2f2f746 authored over 9 years ago by Al <[email protected]>
[utils] string_tree_num_tokens

7aee159c0ca72f75a5813bedda155d9243221221 authored over 9 years ago by Al <[email protected]>
[fix] specifying numex dir with cross-platform PATH_SEPARATOR

b812d90c599f5106ea9458cc796afd8697daa5e3 authored over 9 years ago by Al <[email protected]>
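
The commit names a cross-platform PATH_SEPARATOR but does not show its definition. The sketch below assumes a simple preprocessor switch on _WIN32 and uses it to build a path to a subdirectory of the data dir. The join_path helper is hypothetical.

```c
#include <stdio.h>

/* Assumed definition: backslash on Windows, slash elsewhere. */
#ifdef _WIN32
#define PATH_SEPARATOR "\\"
#else
#define PATH_SEPARATOR "/"
#endif

/* Hypothetical helper: write dir + separator + name into buf. Returns the
   length of the full path, as snprintf does, so a result >= size means the
   output was truncated. */
static int join_path(char *buf, size_t size, const char *dir, const char *name) {
    return snprintf(buf, size, "%s%s%s", dir, PATH_SEPARATOR, name);
}
```
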
[geodb] trim strings in geodb builder

7ff9a6054df749e9e01b218dd36c41eb5d5c7ec8 authored over 9 years ago by Al <[email protected]>
[normalize] adding an option for string trimming in normalize

053b987d58383a5bc6e68ff4c1477cf59ced87e0 authored over 9 years ago by Al <[email protected]>
[utils] Making string_trim handle all kinds of UTF-8 whitespace/separators

b94526a27b25ed50da67f4814357bdd17024b876 authored over 9 years ago by Al <[email protected]>
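
The string_trim commit says it handles "all kinds of" UTF-8 whitespace and separators but does not list them. The sketch below trims ASCII whitespace plus two multi-byte examples:

- U+00A0 NO-BREAK SPACE, encoded as C2 A0.
- U+3000 IDEOGRAPHIC SPACE, encoded as E3 80 80.

A complete version would consult Unicode property data. The function names are hypothetical, and this is not libpostal's string_trim.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of a separator starting at s (0 if none), reading at most
   `remaining` bytes. */
static size_t ws_len_at(const unsigned char *s, size_t remaining) {
    if (remaining >= 1 && (s[0] == ' ' || (s[0] >= '\t' && s[0] <= '\r'))) return 1;
    if (remaining >= 2 && s[0] == 0xC2 && s[1] == 0xA0) return 2;
    if (remaining >= 3 && s[0] == 0xE3 && s[1] == 0x80 && s[2] == 0x80) return 3;
    return 0;
}

/* Trim leading and trailing separators in place; returns the new length. */
static size_t utf8_trim(char *str) {
    unsigned char *s = (unsigned char *)str;
    size_t start = 0, end = strlen(str), n;
    while (start < end && (n = ws_len_at(s + start, end - start)) > 0) start += n;
    while (end > start) {
        /* Try 1-, 2- and 3-byte separators ending exactly at `end`. */
        size_t found = 0;
        for (size_t k = 1; k <= 3 && k <= end - start; k++) {
            if (ws_len_at(s + end - k, k) == k) { found = k; break; }
        }
        if (found == 0) break;
        end -= found;
    }
    memmove(str, str + start, end - start);
    str[end - start] = '\0';
    return end - start;
}
```

Scanning backwards is safe here because ASCII bytes never occur inside a multi-byte UTF-8 sequence. Each multi-byte separator is also recognized only from its lead byte.
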
[numex] Regenerating numex data file

eab4c554d61d9f79e25b613fcd4b23f46e7b2d4a authored over 9 years ago by Al <[email protected]>
[numex] Making all languages except the ideographic writing systems (CJK) whole_tokens_only for numex. Otherwise non-number prefixes may accidentally get converted into numbers. May add some more options around this in the future.

0ab1434f205d203c3e25f790e4bdf34647b36765 authored over 9 years ago by Al <[email protected]>
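
The whole_tokens_only setting guards against errors like the earlier "settembre => 7mbre". In that case the Italian number word "sette" matched as a prefix of "settembre" and was converted. The sketch below shows only the boundary idea, and it assumes tokens are separated by ASCII spaces or hyphens. libpostal applies this inside its trie-based numex parser, and the function names here are hypothetical. CJK scripts are exempted in the commit because they do not separate words with spaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified token boundary: end of string, space or hyphen. */
static bool is_token_boundary(char c) {
    return c == '\0' || c == ' ' || c == '-';
}

/* Accept a match s[start, start + len) only if it spans whole tokens.
   Precondition: start + len <= strlen(s). */
static bool match_is_whole_token(const char *s, size_t start, size_t len) {
    bool left_ok = (start == 0) || is_token_boundary(s[start - 1]);
    bool right_ok = is_token_boundary(s[start + len]);
    return left_ok && right_ok;
}
```
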
[numex] Fixing case of hyphen/space-initial phrases in numex, as well as whole token only languages with ordinals

d2539f5b57e541375dd4a2669f4ef7f75171bb7d authored over 9 years ago by Al <[email protected]>
[phrases] Allowing trie_search to process tokenized input with or without whitespace, and to handle ideographic characters correctly

8ff4ace63b937237a3621884875c7b73088e9497 authored over 9 years ago by Al <[email protected]>
[fix] Clearing paths before reuse in geodb_builder

38b10b9dd0df7b86c82c57f8a13d429243164b93 authored over 9 years ago by Al <[email protected]>
[fix] warnings in string_utils.c

93042761ac9fcdd4db189e832107503ac4d3529c authored over 9 years ago by Al <[email protected]>
[geodb] Adding a msgpack'd list of ids for naked string keys in geodb builder

50ee95ff7dd8188187e7d4971136ea5cc36dcff5 authored over 9 years ago by Al <[email protected]>
[utils] cstring_array_terminate, moving msgpack_utils to separate file

a67ec44a087952899c5261021aa8d3220499dddc authored over 9 years ago by Al <[email protected]>
[fix] county road

42f6be7434debe017582385f3901dbe849478cde authored over 9 years ago by Al <[email protected]>
[transliteration] fixing length-based transliteration

2ff8c0fd1ef712f4800f9104a49a12f4ebc76d6e authored over 9 years ago by Al <[email protected]>
[expansion] tokenized version of search_address_dictionaries

71ffdf9cbc12d26dee941bdba898f8e88c053b2d authored over 9 years ago by Al <[email protected]>
[fix] unnecessary headers

ee96dab93cbfa10ea9924c03361c9760db2b81d9 authored over 9 years ago by Al <[email protected]>
[utils] string_tree_iterator_foreach_token

e549e76806a300bc49a7a5248d5fc32447699c25 authored over 9 years ago by Al <[email protected]>
[utils] cstring_array (contiguous) to array of malloc'd strings

2adaf475c269a36be79b8c91949d0ffc1057a5f1 authored over 9 years ago by Al <[email protected]>
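
A contiguous string array keeps every string in one buffer, which is compact but means individual entries cannot be freed. Copying the strings out gives callers a plain char **, which fits the later API change that returns char ** instead of a cstring_array. The sketch below assumes a simplified layout with NUL-terminated strings and an offset array. The packed_strings struct and its function are hypothetical and are not libpostal's cstring_array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: strings stored back to back, each NUL-terminated,
   with indices[i] giving the offset of string i in data. */
typedef struct {
    const char *data;
    const size_t *indices;
    size_t num_strings;
} packed_strings;

/* Copy each string into its own malloc'd buffer. Returns NULL on failure,
   after freeing anything already allocated. The caller frees each entry
   and then the array. */
static char **packed_strings_to_array(const packed_strings *p) {
    char **out = malloc(p->num_strings * sizeof(char *));
    if (out == NULL && p->num_strings > 0) return NULL;
    for (size_t i = 0; i < p->num_strings; i++) {
        const char *src = p->data + p->indices[i];
        size_t len = strlen(src);
        out[i] = malloc(len + 1);
        if (out[i] == NULL) {
            for (size_t j = 0; j < i; j++) free(out[j]);
            free(out);
            return NULL;
        }
        memcpy(out[i], src, len + 1);  /* include the terminator */
    }
    return out;
}
```
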
[utils] vector extend method

e9277d73399f84850270be8da8777bfb2ae2b58b authored over 9 years ago by Al <[email protected]>
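
An extend method appends a whole array in one step. It grows the capacity once and copies with a single memcpy instead of pushing element by element. The sketch below uses one concrete int vector in place of whatever generic vector mechanism the library uses. The int_vector type and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *a;
    size_t n;   /* number of elements */
    size_t m;   /* capacity */
} int_vector;

/* Grow capacity to at least min_cap, doubling from the current size. */
static bool int_vector_reserve(int_vector *v, size_t min_cap) {
    if (v->m >= min_cap) return true;
    size_t cap = v->m > 0 ? v->m : 8;
    while (cap < min_cap) cap *= 2;
    int *a = realloc(v->a, cap * sizeof(int));
    if (a == NULL) return false;
    v->a = a;
    v->m = cap;
    return true;
}

/* Append n values to v. `values` must not point into v itself, since
   realloc may move the buffer. */
static bool int_vector_extend(int_vector *v, const int *values, size_t n) {
    if (n == 0) return true;
    if (!int_vector_reserve(v, v->n + n)) return false;
    memcpy(v->a + v->n, values, n * sizeof(int));
    v->n += n;
    return true;
}
```
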
[fix] address training data carriage returns

cdb9afddd37d400f4744dd636ad11b6afcf41a3a authored over 9 years ago by Al <[email protected]>
[expansion] Regenerating address data file

9fb1eae8771d1ceedfe3dc2d7dc056f193acd741 authored over 9 years ago by Al <[email protected]>
[dictionaries] Adding a few versions of the phrase "centro commercial" in French, Spanish and Italian after a review of addresses in those languages

cff72a0cb3c7569c38cc700f5b197dced4d00fb1 authored over 9 years ago by Al <[email protected]>
[expansion] Add concatenated suffixes to the suffix keyspace of the address dictionary trie and concatenated prefixes and elisions to the prefix keyspace

351c7c8c2e0b45dd4af24e838d65c2155691511f authored over 9 years ago by Al <[email protected]>
[search] Modifying trie_search_prefixes to use the new key schema

90a91cadd0ac5c765761cf99d8b35d9be2ae1476 authored over 9 years ago by Al <[email protected]>
[phrases] trie_add_prefix method and a schema for prefix keys, e.g. elisions in French and Italian, separable prefixes like Hinter in German, etc.

bb7688d8d1fc68508d0bbf5f6ab3fc71e96be680 authored over 9 years ago by Al <[email protected]>
[numex] Adding a replace_numeric_expressions method (returns NULL if no replacements were made), fixing lengths in situations where two unrelated numbers are joined by a stopword e.g. in the phrase "one and one" the "and" acts as a delimiter vs a phrase where the stopword acts as a joiner like "one hundred and twenty"

359cd62e20dbf145a9aeebc250a93c3104bc7e17 authored over 9 years ago by Al <[email protected]>
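
The replace_numeric_expressions commit distinguishes a stopword that joins parts of one number ("one hundred and twenty") from one that separates two numbers ("one and one"). The toy parser below shows one simple rule: "and" joins only when it directly follows a multiplier such as "hundred". This rule is an assumption made for illustration. libpostal drives numex from per-language rule data, whereas this sketch hardcodes a few English words in hypothetical functions.

```c
#include <stddef.h>
#include <string.h>

/* Toy vocabulary: value of a number word, or -1 if not one. */
static long word_value(const char *w) {
    static const char *const units[] = {"one", "two", "three", "four", "five",
                                        "six", "seven", "eight", "nine"};
    for (size_t i = 0; i < sizeof units / sizeof units[0]; i++) {
        if (strcmp(w, units[i]) == 0) return (long)(i + 1);
    }
    if (strcmp(w, "twenty") == 0) return 20;
    return -1;
}

/* Emit the number in progress, if any, and reset the parser state. */
static void flush_number(long *current, int *after_multiplier,
                         long *out, size_t *count, size_t max_out) {
    if (*current >= 0 && *count < max_out) out[(*count)++] = *current;
    *current = -1;
    *after_multiplier = 0;
}

/* Parse tokens into numbers, writing up to max_out values; returns the count. */
static size_t parse_numbers(const char **tokens, size_t n, long *out, size_t max_out) {
    size_t count = 0;
    long current = -1;          /* -1: no number in progress */
    int after_multiplier = 0;   /* previous number word was "hundred" */
    for (size_t i = 0; i < n; i++) {
        const char *t = tokens[i];
        if (strcmp(t, "and") == 0) {
            /* Joiner right after a multiplier, delimiter everywhere else. */
            if (current >= 0 && after_multiplier) continue;
            flush_number(&current, &after_multiplier, out, &count, max_out);
        } else if (strcmp(t, "hundred") == 0) {
            current = (current < 0 ? 1 : current) * 100;
            after_multiplier = 1;
        } else {
            long v = word_value(t);
            if (v < 0) {  /* not a number word: ends the current number */
                flush_number(&current, &after_multiplier, out, &count, max_out);
                continue;
            }
            current = (current < 0) ? v : current + v;
            after_multiplier = 0;
        }
    }
    flush_number(&current, &after_multiplier, out, &count, max_out);
    return count;
}
```
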
[numex] Re-generating numex data

12959aa48332dbbe91f9b2ad3703732bcb35ac3c authored over 9 years ago by Al <[email protected]>
[docs] Adding some documentation for normalize.h options

5239c365d09bd4da4616864d7143f807ce25f808 authored over 9 years ago by Al <[email protected]>
[fix] typo and frivolous key

caf714f06f6fb0f3568a74752267087185051404 authored over 9 years ago by Al <[email protected]>
[numex] Adding validation checks for numex JSON

87566bb6a5db88aa58d1b806216d0768f92fc7da authored over 9 years ago by Al <[email protected]>
[utils] Adding a cstring_array_foreach macro

96538469ddeacfc65224f93bfc7736625e10321f authored over 9 years ago by Al <[email protected]>
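
A foreach macro over a contiguous string array hides the offset arithmetic at each call site. The sketch below assumes a simplified layout: NUL-terminated strings in one buffer plus an offset array. The macro is written as a single for statement, so break and continue behave normally. The string_pool type and string_pool_foreach macro are hypothetical and are not libpostal's cstring_array_foreach.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: offsets[i] is the start of string i in data. */
typedef struct {
    const char *data;
    const size_t *offsets;
    size_t count;
} string_pool;

/* idx is declared by the macro; s must be a `const char *` declared by the
   caller and is assigned the current string on each iteration. */
#define string_pool_foreach(p, idx, s)                                   \
    for (size_t idx = 0;                                                 \
         idx < (p)->count && ((s) = (p)->data + (p)->offsets[idx], 1);   \
         idx++)

/* Example use: total byte length of all strings in the pool. */
static size_t string_pool_total_length(const string_pool *p) {
    size_t total = 0;
    const char *s;
    string_pool_foreach(p, i, s) {
        total += strlen(s);
    }
    return total;
}
```
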
[expansion] Changes to address_expansion struct to allow for multiple dictionaries per record. Only adding unique canonical strings to the string array

27af28eacf4264a3d674f1659087519733f1d098 authored over 9 years ago by Al <[email protected]>
[expansion] generated header and data files

454be891215009542604eac1f3e71ce105a5a2dd authored over 9 years ago by Al <[email protected]>
[expansion] Adding an array of dictionaries to each (phrase, canonical) pair

b27af13f8acf10c5a577d8e58eb72d9dabc58402 authored over 9 years ago by Al <[email protected]>
[expansion] Adding both key (for membership tests) and language-prefixed key to address dictionary

0a9e92f11f73882035d17d0c8cfffe3543cefcf2 authored over 9 years ago by Al <[email protected]>
[expansion] Constant for the "all" dictionary

09004aa5f1e3266ad493ad73f023ef4d2224d79c authored over 9 years ago by Al <[email protected]>
[expansion] removing the self param from address_dictionary methods, adding search_address_dictionaries method which searches a string for phrases in a particular language

f61d9931579955d9fe658b6eda89f92badb07965 authored over 9 years ago by Al <[email protected]>
[numex] New numex generated data file

3da4b5d8c27a05a2241a8381b7871c8352f3f142 authored over 9 years ago by Al <[email protected]>
[expansion] Language prefixed keys

ba8ff2b0c673eaf453c8e82f66a1104f8eb402f6 authored over 9 years ago by Al <[email protected]>
[fix] method name, strlen and fclose

157727d2494e653c583556d3992ea277d06cdf19 authored over 9 years ago by Al <[email protected]>
[mv] Moving all repo data files to a resources dir, data is only for runtime files

64a63fdf51df03daf958f105fb5e9e050968a4cd authored over 9 years ago by Al <[email protected]>
[fix] add_token_alternatives

a38b924c5d9c70f0edab642554f0077372a51fa5 authored over 9 years ago by Al <[email protected]>
[tokenization] Adding a version of tokenize which keeps whitespace tokens

71be52275d269518eaa2870abc90be7053ca4e5f authored over 9 years ago by Al <[email protected]>
[expansion] Address dictionary builder

5d21cb1604f15492a68a2574a9fba2f29f622c92 authored over 9 years ago by Al <[email protected]>
[fix] trie_set_data_at_index

6eccde0df83a4ffdf57c17f8a622d594d2ce99f6 authored over 9 years ago by Al <[email protected]>
[expansion] Address dictionary allocation, I/O, get/set

c798876b3d1b76a3c49930f93a488c124119eb81 authored over 9 years ago by Al <[email protected]>
[fix] A few anomalies in the Wikipedia/Wiktionary-generated given names

2114b21399a5515517abb6c3cf3d46f55d3dc3cb authored over 9 years ago by Al <[email protected]>