Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/openvenues/libpostal
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
https://github.com/openvenues/libpostal
[fix] language disambiguation
7053c6b60b088d28adfb0c906f0b75b741a51ceb authored over 9 years ago by Al <[email protected]>
[dictionaries] Occitan stopwords for disambiguating from French
e26776a5e9227ad2c2d9f1f622210e1fc2d244a0 authored over 9 years ago by Al <[email protected]>
[languages] If a non-Latin script in a string would prohibit the found language, return ambiguous. Adding some test cases for sanity checking the labeling
f6d84531bc7395d2d23ed96d0effcbd8213dc9aa authored over 9 years ago by Al <[email protected]>
[mv] Moving the get regional/country languages logic out of language polygons
b8e4c191468f83ac669802a6a34d87767a67273e authored over 9 years ago by Al <[email protected]>
[languages] Using stopwords only to account for how ambiguous a phrase is, not for disambiguation
43178747f8251d1b2ad225670ee07ae2d3ec603d authored over 9 years ago by Al <[email protected]>
[languages] Adding non-canonicals only for streets, prefixes and suffixes. Better handling of default languages, abbreviations and ambiguity
d8763e9d6c26e868660e22f43f6b0478bbf8d153 authored over 9 years ago by Al <[email protected]>
[dictionaries] Norwegian street types from the suffix dictionary
9c176961ffb5594db70243347baacdd4f161c753 authored over 9 years ago by Al <[email protected]>
[languages] non-default languages can still be labeled from > 1 char abbreviations if there's no evidence of other languages in the string. Adding Python version of get_string_script from the C lib
122a81b61085b100f01e794d170bd3a0408d9e91 authored over 9 years ago by Al <[email protected]>
[languages] Adding canonical back in to language disambiguation (for prefixes/suffixes too), using non-canonicals/abbreviations in non-default languages if there are no other abbreviations found, adding in stopwords dictionaries
a419dad63079eab977a4338b3aa6ec933d6349d6 authored over 9 years ago by Al <[email protected]>
[fix] No longer using abbreviations for default languages, can be stopwords, etc.
a7d9cc17824142f014e6b16b148acfc4a7f0614a authored over 9 years ago by Al <[email protected]>
[fix] import
0701bb6f086c29eee39d3aa889a9f2d42e5dfd59 authored over 9 years ago by Al <[email protected]>
[languages] Disambiguation uses language defaults, unicode normalized canonicals are treated as canonicals
723058886a776d01053eba1d520e7cd499f62b77 authored over 9 years ago by Al <[email protected]>
[languages] Disambiguation in language labeling better handles default languages and only uses canonical forms for non-default languages
6231e17f2b05871a1fc8854435831f6c313df098 authored over 9 years ago by Al <[email protected]>
[polygons] Adding a main to generate language polygons
bf829f7cb6f6ae5adad7a1b981b86a36244c9751 authored over 9 years ago by Al <[email protected]>
[languages] Adding non-default Spanish and French gazetteers to the US, and giving the country of Jersey shared English/French defaults instead of just English
5c15c4a99f7b3995c8771b52567fe3b50214d472 authored over 9 years ago by Al <[email protected]>
[fix] import
e70c2453ee5e0491e288de3acd3952f620d34aba authored over 9 years ago by Al <[email protected]>
[osm] Some countries like Lebanon in OSM will list the same address under two languages (French/English), which creates an unreasonable task for a linear classifier, so running disambiguation in those cases
390271525896a658a951dd233592518d978d4dd8 authored over 9 years ago by Al <[email protected]>
[geonames] Adding covering index to geonames DB
f6e521e3f34a0319c13edac97de1a38f31fbe7ef authored over 9 years ago by Al <[email protected]>
[mv] csv_utils
bd31dc99f28a933ec4e4ad95480f1db23e3cbe25 authored over 9 years ago by Al <[email protected]>
[languages] Adding English gazetteers to many countries where the default language is Arabic but the road signs may be in English
cc43409b726337ba0acc467e891cfc12cb8b7198 authored over 9 years ago by Al <[email protected]>
[languages] Refactoring street_types_gazetteer a bit so dictionaries are configurable
c5a9c392d4d2bd4fb8573dcbbff3229c6f4c853f authored over 9 years ago by Al <[email protected]>
[fix] language disambiguation module
baa60aab65a638dc485c0f1dfdc1ebeab513bb98 authored over 9 years ago by Al <[email protected]>
[fix] var name
4976be64e585d96c2d9e794917d75f7880e35c90 authored over 9 years ago by Al <[email protected]>
[fix] typo
8e56568cabe940492c9718d536406658531f0854 authored over 9 years ago by Al <[email protected]>
[languages] Moving language id methods into a separate package
ca6d802a430e567d362051a562037ec9e75b909c authored over 9 years ago by Al <[email protected]>
[fix] var name
9d2f7e4bd1cdebf31a9380f488909b39956d6c6e authored over 9 years ago by Al <[email protected]>
[osm] OSM untagged formatted addresses try to use language namespaced tags
0528d1b578e7dfcd84c0069832a432162fb0d2eb authored over 9 years ago by Al <[email protected]>
[fix] via in English is a stopword, not a street type
330002197a2a320fc54521b4c17053f91764bf8d authored over 9 years ago by Al <[email protected]>
[osm] OSM untagged formatted addresses now use the new language labeling scheme
c09cb4dd82c7cd63749efdc88c401aba18078593 authored over 9 years ago by Al <[email protected]>
[fix] removing debug print
3daba2ddcd0f024c536adb757f66cbc7d6131287 authored over 9 years ago by Al <[email protected]>
[dictionaries] Updates to Galician and Catalan where they overlap with Spanish
089a197155146202e70b61dfee1fc9df68b5e065 authored over 9 years ago by Al <[email protected]>
[fix] English dictionaries
faf3435ffc9efc77f5f8572df262c4c675d26797 authored over 9 years ago by Al <[email protected]>
[dictionaries] Accented Gran Via for Catalan
9183ba4e0131adb54b1645b874f12c4311549dce authored over 9 years ago by Al <[email protected]>
[dictionaries] A few more Catalan terms that are the same as in Spanish
07b43e524e5db9a36e1bf71cb0be34e9609ef595 authored over 9 years ago by Al <[email protected]>
[languages/osm] Checking for existence of separable prefix/suffix in the given dictionaries
ffe76f04032590053be3b664acd4ec555625bcc8 authored over 9 years ago by Al <[email protected]>
[fix] English dictionary
3b55b51ef12fc750446eb0ef076d4b4ce1d841d8 authored over 9 years ago by Al <[email protected]>
[languages/osm] Adding a primitive phrase dictionary to the OSM training data construction script and a few heuristics to help disambiguate in the case of small local language groups that may not be specified with name:lang tags e.g. Occitan, Catalan, Basque, Galician, etc. Also throwing away ambiguous multilanguage names
0e00625dbd155556b14f1a164358f204a3f43200 authored over 9 years ago by Al <[email protected]>
[dictionaries] Moving a few terms in German dictionaries
fb7f2999e583912732b7acbae10b56a1f7736c84 authored over 9 years ago by Al <[email protected]>
[dictionaries] A few new terms in Dutch dictionaries to help distinguish from German
c5d14e9c4d8d4da73ce28eb31823c8782048cfba authored over 9 years ago by Al <[email protected]>
[dictionaries] Better categorization of French dictionaries
4d115fdd88e29df94ca3712e142f67541ed20357 authored over 9 years ago by Al <[email protected]>
[dictionaries] A few English dictionary terms that came up in language detection tests
0f883a887285ea658306c14ca595612558640cd7 authored over 9 years ago by Al <[email protected]>
[dictionaries] Updating Catalan dictionaries with place types to help distinguish from Spanish
db7ffa7cab9db6e8f5e016a04c26658a729b8b4c authored over 9 years ago by Al <[email protected]>
[dictionaries] Fixes to Spanish dictionaries
a1d8d3bf5fe8ea688fffc224404cb3ab9ea788f9 authored over 9 years ago by Al <[email protected]>
[fix] items
b72d9af7dcc6aa9b783047055246ac91d61afc68 authored over 9 years ago by Al <[email protected]>
[fix] getter
f3bb3c83569d18d38a0ce1f1bfa7013fe29ba3f4 authored over 9 years ago by Al <[email protected]>
[fix] name
ebd5e96bd73d7a31e40df02099cdebe7bedbc106 authored over 9 years ago by Al <[email protected]>
[fix] var name
b5be1e8df5c30126b6a756949d381f9f353df819 authored over 9 years ago by Al <[email protected]>
[fix] language polys
e84f932042e23b8f9f943404dbcd177d456c8318 authored over 9 years ago by Al <[email protected]>
[polygons] Changes to languages polygons to support new regional language handling
bada7fd13b48c1700b6d06cf13255dce360a20c4 authored over 9 years ago by Al <[email protected]>
[languages] Allowing specification of multiple regional languages
d97c725bbcab9ffd8f7802368d3696ea75ef4c1d authored over 9 years ago by Al <[email protected]>
[languages] Removing the Belarusian override as Russian appears to be used often in street signs and there are generally good name:ru/name:be tags
b8fbbb1917f76c374bcb25af296de0d161692684 authored over 9 years ago by Al <[email protected]>
[dictionaries] Adding French as equally likely language for Guernsey, which will effectively exclude it from the language training data (doesn't matter since there's already enough English/French addresses).
453aa7c633cd7cf94bfefeb5f6fd15d82aec12fe authored over 9 years ago by Al <[email protected]>
[osm] Omitting country in limited address data set (often abbreviated, doesn't convey language as well)
89071ea21a14146a52f4cdf64240323383f47264 authored over 9 years ago by Al <[email protected]>
[fix] var name
c50526091253f8cbd02ec1b993788fa762b571b7 authored over 9 years ago by Al <[email protected]>
[fix] street addresses by language
548ce79b99bd5064360336fd896b6dc5b9875758 authored over 9 years ago by Al <[email protected]>
[osm] Adding a new OSM training data option for writing out full formatted addresses without place names
74a751ce0afbad51ef5b257e1ff18d051e8f62d3 authored over 9 years ago by Al <[email protected]>
[languages] Bonaire admin1 as well as country code
133ce9e5b19b2a4961c02cbbadf5a023442ca8e8 authored over 9 years ago by Al <[email protected]>
[fix] language polygon index
05b8f555d53515cb6f579072e6679c641928fd31 authored over 9 years ago by Al <[email protected]>
[osm] Adding building tag to venues training set construction
0e92abd53e675a94f7549bcc044e8ad6e2ace686 authored over 9 years ago by Al <[email protected]>
[languages] Changing Bonaire's default road sign language to Papiamento to help distinguish from Dutch
191c0e3ce5d6f0a62c95005300915d9654ff02b7 authored over 9 years ago by Al <[email protected]>
[osm] Making minimal_only the default in formatted addresses, expanding list of acceptable combinations of address fields
cad1f95bbb0a9a118ead3c09c57a761633f56d4d authored over 9 years ago by Al <[email protected]>
[fix] road+house_number as minimal keys for formatting addresses
1e936ac9dc8f8259efbc4a6ed8bf257030dc26f1 authored over 9 years ago by Al <[email protected]>
[fix] param
83bbd67c9c6f54328d30b7aa186956d54c3046be authored over 9 years ago by Al <[email protected]>
[fix] splitter
e993ddcb51a4169ab9063cfa88513efb3c680dfb authored over 9 years ago by Al <[email protected]>
[fix] __init__
dc2766ae5d9acbce68bd857df0d59c19f4b68944 authored over 9 years ago by Al <[email protected]>
[osm] Using pipe splitter for address components
62c67aa970e0cc92d8e94950c6c2dcf0a9de5a1a authored over 9 years ago by Al <[email protected]>
[osm] Prefer amenity tag, skip if the building tag is simply building=yes
2bd763be035884dc658f782711a3eb11998669e9 authored over 9 years ago by Al <[email protected]>
[fix] carriage returns
c844d0484a296325d5cef5e70efd95a57836b82f authored over 9 years ago by Al <[email protected]>
[osm] Replacing escape chars at write time as there's no quoting, adding building key to venue training data
ef14aa2b7ed1106e979a3cd1bab456ca180ce62a authored over 9 years ago by Al <[email protected]>
[polygons] Separating out simplify polygon into a method in RTree index
9125f07af08d20daf813d86cf83c6fcbc17ab435 authored over 9 years ago by Al <[email protected]>
[osm] Using tsv_no_quote writers in all OSM training data files
46f2c68a690c549de7ee0af9a78e6389ebf3ae61 authored over 9 years ago by Al <[email protected]>
[scripts] Regenerating unicode_scripts_data file
9464670174bf1c13d2671ec7236ada60e02b004f authored over 9 years ago by Al <[email protected]>
[utils] no-quote CSV dialect
88d63c85d246379281baca0ab05bd22bc5637434 authored over 9 years ago by Al <[email protected]>
[scripts] Better script code aliasing
03febc7e209420e9e7cb2829ff016b2d03029204 authored over 9 years ago by Al <[email protected]>
[mv] csv_utils
b54ff95ecc4e527e961a223f557ea240fbed1de1 authored over 9 years ago by Al <[email protected]>
[normalize] Need to do a Latin-ASCII transliteration even if the string is entirely ASCII since it may contain HTML escapes
66a71ab70d64e33943abb64ce2bd671f69520cf6 authored over 9 years ago by Al <[email protected]>
[transliteration] Regenerating transliteration data file
87b275fcab9b4b2d9dbe3284c45d071fbf146c05 authored over 9 years ago by Al <[email protected]>
[transliteration] Doing HTML escapes first in Latin-ASCII transliteration as they may need to be resolved further in subsequent steps
cf706158508bff5c155fab471858589921c96269 authored over 9 years ago by Al <[email protected]>
[fix] phrase start in transliteration
9712e0fa8761c901dba89079048d4112ac858cd1 authored over 9 years ago by Al <[email protected]>
[phrases] Fixing tail searches in trie_get_prefix*
562a7c243da6940f8c77807de91ba27e616637eb authored over 9 years ago by Al <[email protected]>
[fix] check for local CLDR in unicode properties
51addec5f2bf54549af70527450037296418faaf authored over 9 years ago by Al <[email protected]>
[fix] ensure CLDR dir
882e4c2ab85a9a3917b3cfd16990e8bcb1120d97 authored over 9 years ago by Al <[email protected]>
[fix] cldr languages dir
48566bf0976f351498f44e1ba8d63c65874971e3 authored over 9 years ago by Al <[email protected]>
[build] Order-only dependencies for downloading data files, rm-ing the tarball when done extracting
e98a82266117abbb2b40e93a85697906c4500d53 authored over 9 years ago by Al <[email protected]>
[build] Fixing tarball uploading
0028c2bc53674e2496b84e7ea1b0e3ce4192d501 authored over 9 years ago by Al <[email protected]>
[build] Adding tarball back to pkgdata
f21b767696b3132842c9b203da0e8efde32e9303 authored over 9 years ago by Al <[email protected]>
[api] Better handling of strings with multiple scripts and strings that use more than one transliterator. Reducing complexity/allocations
c29cf5ac9a0b08bfce6e1ac5a54f2d3ad1e9e734 authored over 9 years ago by Al <[email protected]>
[normalize] Adding the original script as an alternative in transliteration mode as well
4bc6adf6699f3a3a991abd6d585d818458f01d40 authored over 9 years ago by Al <[email protected]>
[utils] string_tree_num_strings method
a13e5117b503983f2c897366272b5d8ab5dae0d7 authored over 9 years ago by Al <[email protected]>
[cli] delete_word_hyphens as a default option
219947722d026e9c7dd8d34e3d2d6c2ce957c3b1 authored over 9 years ago by Al <[email protected]>
[api] Add separable or inseparable non-canonical string affixes (e.g. foobg. => fooburg, foostrasse => foostraße|foo straße, l'ensemble => l' ensemble, etc.) in expand_address
78a80dd86e707a0d584b367cfc92edb67ae5a906 authored over 9 years ago by Al <[email protected]>
[expansion] Adding search_address_dictionaries_prefix/suffix for concatenated prefixes/suffixes e.g. in Germanic languages. Adding a flag to the address_expansion struct and trie value to denote separability, adding prefix/suffix keys during dictionary creation
de5d6945b553cdf1f9b2147d26d1c0c627e1ad07 authored over 9 years ago by Al <[email protected]>
[normalize] Adding a char_array version of normalize token
0f77ca1213571e136c99bbf754572dd645f04d43 authored over 9 years ago by Al <[email protected]>
[utils] char_array_append_reversed for adding reversed strings without a malloc
064b6b5898d3b62b202cb7bff9c2c22b39b5b86f authored over 9 years ago by Al <[email protected]>
[fix] Only the exact TRIE_PREFIX_CHAR/TRIE_SUFFIX_CHAR characters are disallowed as keys
dab181a4d7c69e80ee28653f29ef999bec54c75b authored over 9 years ago by Al <[email protected]>
[phrases] Prefix/suffix trie search using the new characters, fixing length of matched prefixes/suffixes and exiting early on falling off the trie
e511eede74db8caffbd95299fe0c7aefae518d0d authored over 9 years ago by Al <[email protected]>
[phrases] Changing prefix/suffix chars so both are control characters and neither is the NUL-byte. Modifying transliteration special characters accordingly
51572d65757efce01de5a8700cf3c23dc65ba9da authored over 9 years ago by Al <[email protected]>
[phrases] adding _from_index_get_prefix_char/_from_index_get_suffix_char methods
11a9881988c62ec7fb9ba379861111bcc62e4908 authored over 9 years ago by Al <[email protected]>
[phrases] trie_search_prefixes/trie_search_suffixes now take a length param
2eb67ad8501f8f66e01fe48109e2b3234af43fcf authored over 9 years ago by Al <[email protected]>
[fix] NUMEX_STOPWORD_RULE define
bbaa302e2e65bc6efdd8cf4c1c7cb48368ed8d74 authored over 9 years ago by Al <[email protected]>