Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
https://github.com/openvenues/libpostal

[test] adding tests for ordinal suffix normalization

e81580287daab07af9b187f0b864e89d8420454b authored over 7 years ago by Al <[email protected]>
[fix] numex change detection in Travis build

85297f333386097d0d0b4f1332418ed32fa720a2 authored over 7 years ago by Al <[email protected]>
[auto][ci skip] Adding data files from Travis build #228

4762ff2638fe0f1f3df803a4762f6c0c7aa8b638 authored over 7 years ago by Travis <[email protected]>
Merge pull request #186 from openvenues/degree_symbol_ordinal_suffix

Degree symbol ordinal suffix

e92c3c2867dff415abe2610bdf5b7790a0c3a8ff authored over 7 years ago by Al Barrentine <[email protected]>
[numex] adding ability to handle the degree symbol in numex parsing since it's technically a separate token

f3adde746e2e518c64e57c81d8b2acd3903abb07 authored over 7 years ago by Al <[email protected]>
[dictionaries] adding degree symbol "°" variant for any surface forms that have "º"

19899b2f7dca49b63f4cb2811ce203273418b5ee authored over 7 years ago by Al <[email protected]>
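A minimal sketch of the variant generation described in the "[dictionaries]" commit above, assuming UTF-8 input. The masculine ordinal indicator "º" (U+00BA, bytes C2 BA) and the degree sign "°" (U+00B0, bytes C2 B0) are both two bytes long, so the variant can be produced by an in-place byte substitution. The function name `ordinal_indicator_to_degree` is hypothetical and is not libpostal's dictionary builder.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: rewrite every UTF-8 "º" (U+00BA, 0xC2 0xBA) in `s`
   as "°" (U+00B0, 0xC2 0xB0). Both encodings are two bytes, so the string
   length is unchanged. Returns true if at least one substitution was made,
   i.e. if the rewritten string is a new variant worth adding. */
static bool ordinal_indicator_to_degree(char *s) {
    bool changed = false;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if ((unsigned char)s[i] == 0xC2 && (unsigned char)s[i + 1] == 0xBA) {
            s[i + 1] = (char)0xB0;
            changed = true;
            i++;  /* skip the continuation byte that was just rewritten */
        }
    }
    return changed;
}
```

A caller would copy a surface form such as "1º", run the helper on the copy, and keep the copy as an extra variant only when it returns true.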
[numex] adding "°" as additional ordinal suffix for Spanish, Italian, and Portuguese

c968dd4ecc0b1e7eddebb354c03f8c23335225f7 authored over 7 years ago by Al <[email protected]>
Merge pull request #185 from Ironholds/master

Remove unused variable

254f3622ea4e3473ea9538249c18a167961ba1b5 authored over 7 years ago by Al Barrentine <[email protected]>
Merge pull request #1 from Ironholds/Ironholds-patch-1

Remove unused variable

18a5d06427c49d50077d3b5242628fb43a47ced3 authored over 7 years ago by Oliver Keyes <[email protected]>
Remove unused variable

What it says on the tin!

35821f975e6f871cccfa8583a85e6192abc0a241 authored over 7 years ago by Oliver Keyes <[email protected]>
Merge pull request #184 from openvenues/remove_ordinal_suffix

Remove ordinal suffixes in libpostal_expand_address

e0c82b5edb2eb98af5f7b05255573d6fba689f63 authored over 7 years ago by Al Barrentine <[email protected]>
[build] rebuild numex table in Travis if either the configs change or numex_table_builder.c changes

9cd3ec37f963797a66b0e67e56836c641195e0fe authored over 7 years ago by Al <[email protected]>
[build] Makefile changes to support moving numeric expression parsing to normalize.c

f3cf119e5848d14d48252ffff0501f207468f031 authored over 7 years ago by Al <[email protected]>
[numex] adding one form of normalization which strips ordinal suffixes so {96th, Ninety-sixth} => 96. This is an additional form of normalization, so there's still one form where the suffixes are kept. One case that's still not handled is something like "IXe Arrondissement"

cddc368533b583b8b24218be79d0402b4d507214 authored over 7 years ago by Al <[email protected]>
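A simplified sketch of the suffix-stripping normalization, limited to the purely numeric English case ("96th" becomes "96"). This is an illustration under stated assumptions, not libpostal's approach. The commits describe routing this through the numex trie so that spelled-out forms like "Ninety-sixth" are covered too. The version below only checks for ASCII digits followed by st/nd/rd/th, and it does not verify that the suffix agrees with the number. `strip_ordinal_suffix` and `equals_ignore_case` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `s` equals the lowercase string `suffix`, ignoring ASCII case. */
static int equals_ignore_case(const char *s, const char *suffix) {
    while (*s && *suffix) {
        if (tolower((unsigned char)*s) != *suffix) return 0;
        s++;
        suffix++;
    }
    return *s == '\0' && *suffix == '\0';
}

/* Hypothetical helper: if `token` is a run of digits followed by an English
   ordinal suffix, copy just the digits into `out` and return 1. Otherwise
   return 0 and leave `out` untouched. */
static int strip_ordinal_suffix(const char *token, char *out, size_t out_size) {
    static const char *const suffixes[] = {"st", "nd", "rd", "th"};
    size_t n = 0;
    while (isdigit((unsigned char)token[n])) n++;
    if (n == 0 || n + 1 > out_size) return 0;  /* no digits, or no room for digits + NUL */
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        if (equals_ignore_case(token + n, suffixes[i])) {
            memcpy(out, token, n);
            out[n] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Because the commit keeps a form with the suffix as well, a caller would emit the stripped string as an additional normalization rather than replace the original.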
[numex] adding ordinal suffixes themselves to the numex trie so they can be removed from strings

92051863ba807b73fc7945b573afaa3c54e71d6c authored over 7 years ago by Al <[email protected]>
Merge pull request #183 from openvenues/cdn

Hosting model files and training data on CloudFront CDN

63ac3cf9210b4cb9df79fb857c12fb1da8194058 authored over 7 years ago by Al Barrentine <[email protected]>
[data] deployed model files and training data to CloudFront for easier downloading around the world, including places like China where the Great Firewall may prevent large downloads from abroad. TTL is set to 0, so CloudFront still caches the files themselves but checks with the origin using If-Modified-Since headers, allowing the files to be updated dynamically

d2732922c249f55a18550a7305642a2521bb938c authored over 7 years ago by Al <[email protected]>
Merge pull request #181 from eefi/bug/various/initializer

[fix] don't use unnamed fields in initializers

5699ef3da03aed752fa5f4dff266ea43e750b2ae authored over 7 years ago by Al Barrentine <[email protected]>
Merge branch 'master' of https://github.com/openvenues/libpostal

36dc41af8cdbae26136c5c62f9aa4865ddee8359 authored over 7 years ago by Al <[email protected]>
[fix] need to set prev_state to the NULL state in numex parsing after a non-space/non-hyphen is encountered and the previous match, if any, is added to the result array

413c584f08082e25fb599b741c4b5eebef4e365a authored over 7 years ago by Al <[email protected]>
[fix] don't use unnamed fields in initializers

GCC did not support assigning to unnamed fields from designated
initializers until 4.6 [1]. Unfo...

f9b57dbd42c94cc178f7ee82e48ae4178828ce94 authored over 7 years ago by Austin Chu <[email protected]>
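The GCC limitation mentioned above concerns designated initializers that target members of an anonymous struct or union. A minimal sketch of the portable pattern follows, using a hypothetical `numeric_value_t` type that is not one of libpostal's structs. Giving the union a name lets a nested designated initializer reach its members without depending on newer GCC behavior.

```c
#include <stdint.h>

/* Hypothetical type. The union is named (`value`) rather than anonymous,
   so initializers can use `.value = { .d = ... }`, which older GCC
   releases accept. Writing `.d = ...` against an anonymous union would
   not compile there. */
typedef struct {
    int kind;  /* 0 = integer, 1 = floating point */
    union {
        int64_t i;
        double d;
    } value;
} numeric_value_t;

static const numeric_value_t half = { .kind = 1, .value = { .d = 0.5 } };
static const numeric_value_t ninety_six = { .kind = 0, .value = { .i = 96 } };
```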
Merge pull request #180 from eefi/bug/tagger/include-guard

[fix] add #include guard to tagger.h

7bef84676ecddf29c0bff863fb05e9dc9bc6f42d authored over 7 years ago by Al Barrentine <[email protected]>
[fix] add #include guard to tagger.h

a966712e18dfc22e53bd85ac440215212c10029e authored over 7 years ago by Austin Chu <[email protected]>
Merge pull request #177 from eefi/bug/matrix/clbas

[fix] typo in compiler warning when no CBLAS found

32c8662f8d94a9d6ae123622670e80146f4965ce authored over 7 years ago by Al Barrentine <[email protected]>
[fix] typo in compiler warning when no CBLAS found

19a04511ba3e2b8eb6c6e3c907a7e5452b46414e authored over 7 years ago by Austin Chu <[email protected]>
[numex] fix numex parsing when the spelled-out number is followed by a comma or other punctuation

b464eb6c07a88ea2591917727b3bc53b8b989759 authored over 7 years ago by Al <[email protected]>
[osm/boundaries] check polygons with an ISO3166-2 as well in the country polygon index in case the country polygon is funky

fc91471434c01266a6dfc28afe94efb1f3440511 authored over 7 years ago by Al <[email protected]>
[formatting] removing the ability to insert city between house number and road in France from discussion in #27

4ecd6c23c6045a8e5842383711335f2d5e158405 authored over 7 years ago by Al <[email protected]>
[build] add another housekeeping file in the datadir for data_version. Blow away the existing files if that file either doesn't exist or doesn't contain a matching version string, to help with upgrades

7f7aada32ab1a65b94f880a45f9755bbd941eedc authored over 7 years ago by Al <[email protected]>
[docs][ci skip] adding note about using libpostal on mobile

4f9b0ef495ffb438da52866ddbf2bd08eb69f0e5 authored over 7 years ago by Al <[email protected]>
[docs][ci skip] add link to the 1.0 blog post

6984427eb99f2696cb7f5bbac064fc2c7ca759f1 authored over 7 years ago by Al <[email protected]>
[docs] adding note about the newly-trained language classifier trained with FTRL-Proximal (now 1/10th the size), which keeps its high accuracy while maintaining a sparse solution. This commit will trigger a build with the freshly uploaded model.

5605ba3185c5ed54cf36c6df52ddf06e3a4c621b authored over 7 years ago by Al <[email protected]>
[fix][ci skip] S3 upload paths in data upload/download script

5a96be5d5ca429c588c014cb26177147425aedc2 authored over 7 years ago by Al <[email protected]>
[auto][ci skip] Adding data files from Travis build #210

d8409f1f38fc73182e09ce22e99ae0fb07ecc213 authored over 7 years ago by Travis <[email protected]>
Merge pull request #171 from openvenues/parser-data

Libpostal 1.0

918342d4c34c967550456dacfe7c2226ce932044 authored over 7 years ago by Al Barrentine <[email protected]>
[fix] removing one of the warnings about C90 since this is entirely C99.

c01e67c1e43235e87bd167cb49a4a7591a69934d authored over 7 years ago by Al <[email protected]>
[classification] correcting cost functions in SGD and FTRL for use in parameter sweeps

caebf4e2c910089911f21688a24cb95e4b8c15db authored over 7 years ago by Al <[email protected]>
[numex] add dehyphenated form when building numex table

6219cc63783ab30040b6c3ed693468ff9f0198f3 authored over 7 years ago by Al <[email protected]>
[build/fix] autoconf syntax for the Ubuntu 12.04 version of autoconf, i.e. the one used on Travis

264866d7199e72c1cc4cb1feb7fc23aebb5677e0 authored over 7 years ago by Al <[email protected]>
[build] fixing checks in numex.py, run when the resources/numex directory changes

ef0d4c2ded8d86a5090e3904493bda8043f0b4d8 authored over 7 years ago by Al <[email protected]>
[fix] adding yaml to requirements-simple.txt for CI

0ec2e57afabf024854e2c2faf6d7c63bcae930d0 authored over 7 years ago by Al <[email protected]>
[fix] /AC_CONFIG_MACRO_DIRS/AC_CONFIG_MACRO_DIR/

64fae1e241e0d4ff472fa2ed0362fd15af9341fa authored over 7 years ago by Al <[email protected]>
[build] add pkg-config to packages in Travis config, remove libsnappy-dev

2b3fb196a111cee820eb7ded91170e2f1252cc40 authored over 7 years ago by Al <[email protected]>
[docs] new parser GIF, featuring addresses relevant to current events

8cef3c4eb9869b04e41ce165e4f1ab6d520ea28a authored over 7 years ago by Al <[email protected]>
[docs] fix spacing

aaae1e055ec7df4ada17c099b1586f599b383a55 authored over 7 years ago by Al <[email protected]>
[docs] merge README from master, move bindings below examples

9c7eac61ebfbc62722ba0c3d2070b2b6922dd060 authored over 7 years ago by Al <[email protected]>
[test] adding more tests from the demo

8ec6e546f5445c7b3f83d301478adc98b5bc3562 authored over 7 years ago by Al <[email protected]>
[parser] removing special commands other than .exit from address_parser_cli

22443e31cc194c1fc63c5c6e8e3cc978fc2aece5 authored over 7 years ago by Al <[email protected]>
[parser] storing address_parser_context on the parser struct itself so it doesn't have to be allocated every time

8742574257f5c2d647dcc746d08240d27a210a45 authored over 7 years ago by Al <[email protected]>
[docs] moving blog post to first paragraph

67157fbd98bf45ad326480a7f15890855f51b5ba authored over 7 years ago by Al <[email protected]>
[docs] aesthetic README changes

b8f65d0a06abd9480e383136bdee3c0875067ad9 authored over 7 years ago by Al <[email protected]>
[openaddresses] Sampson and Yadkin counties, NC, and Union County, SC

f746c6eec634abb9168499a758c48e612632ae78 authored over 7 years ago by Al <[email protected]>
[openaddresses] Rowan County, NC

bca449e653cae257fbeb72b834297f8131be793b authored over 7 years ago by Al <[email protected]>
[openaddresses] Carteret County, NC

6102fd345980757f8620c6992c854726e5972376 authored over 7 years ago by Al <[email protected]>
[openaddresses] Bladen County, NC

342740c3a6422fb72fadf52114d16e5e21c4b0dd authored over 7 years ago by Al <[email protected]>
[openaddresses] Beaufort County, NC

7c67ca6edba26e9c873bb8d513eb262a8eb5abb0 authored over 7 years ago by Al <[email protected]>
[openaddresses] city of Ruidoso, NM

680a2e6357c0d7f1b0aa64d0b824ca1ad2a930d9 authored over 7 years ago by Al <[email protected]>
[openaddresses] add Caddo Parish, LA

921e635b7a69da3bdcc2f0468750baa86597e5ec authored over 7 years ago by Al <[email protected]>
[openaddresses] add Desoto County, FL

e0dc0c9b8646aa36fb023e45cf40adf312d4d4b5 authored over 7 years ago by Al <[email protected]>
[openaddresses] adding OSM boundaries to Clear Creek County, CO as new data set doesn't list city

20adc591a872f6426b5205dd6e11e71e8cb83270 authored over 7 years ago by Al <[email protected]>
[docs] README fixes

4b16b5bccd953d6aeb096243fc14f512bb8e696a authored over 7 years ago by Al <[email protected]>
[openaddresses] removing Lawrence County, SD. Covered by the new statewide data set, and has some weird addresses

97ffdbaee0557b71d460013021a802c7b5ee06a5 authored over 7 years ago by Al <[email protected]>
[openaddresses] Fall River County, SD

e4290a489f6e73c3ad25d536805965aef2cbc32c authored over 7 years ago by Al <[email protected]>
[docs] README updates for 1.0 release, adding training data section

c3a64452902d327c7c54020c2b0c8c2d89629e89 authored over 7 years ago by Al <[email protected]>
[openaddresses] moving Buenos Aires, adding Boulder County, CO

65a0d82bda70a665844d46c179591ccd07875ab1 authored over 7 years ago by Al <[email protected]>
[optimization] moving regularization methods to their own module

eff7a7a27a8e21bdd10d2964e7e845356c9099c7 authored over 7 years ago by Al <[email protected]>
[utils] cartesian product iterator for grid search during model selection

957aa0c0c95335f8f5b962c4c6739b0544710e3b authored over 7 years ago by Al <[email protected]>
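One way to write such an iterator is to keep one index per hyperparameter and advance the indices like an odometer. Each call then yields the next combination of, say, regularization strengths and learning rates. This is a hypothetical sketch, not libpostal's implementation, and `cartesian_next` is an illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Advance `idx` (length n) to the next combination, odometer-style, with the
   last position varying fastest. Each sizes[k] must be at least 1. Returns
   false once every combination has been visited, at which point idx has
   wrapped back to all zeros. */
static bool cartesian_next(size_t *idx, const size_t *sizes, size_t n) {
    for (size_t k = n; k-- > 0;) {
        if (++idx[k] < sizes[k]) return true;
        idx[k] = 0;
    }
    return false;
}
```

Usage: start from an all-zero `idx`, evaluate that grid point, then keep calling `cartesian_next` and evaluating until it returns false. For sizes {2, 3} this visits 6 combinations.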
[build] Makefile changes for new language_classifier_train

4a72afc7127ea6d3e421141ec8606d2fe2519aa3 authored over 7 years ago by Al <[email protected]>
[fix] expansion array destroy API in libpostal expand program

378a11c88fd1785b47c3fdc555c9032fbc61c107 authored over 7 years ago by Al <[email protected]>
[fix] declaring is_common_script function as static

c5e2f89ee992b0647891f8f69d939f8f4e4a3370 authored over 7 years ago by Al <[email protected]>
[language_classification] Runtime language classifier can now use dense or sparse weights, with a different header signature for the sparse version (using old signature for the dense version, so backward-compatible)

5dfdd4b7ebe97c11db7c89fee8b720577d5541d7 authored over 7 years ago by Al <[email protected]>
[log] log the offending line if token count does not match in language_classifier_io

835d851310dacc908f08ca8544f2ea7bce20fbf1 authored over 7 years ago by Al <[email protected]>
[language_classification] adding options to language_classifier_train for using SGD with {L2, L1} regularization or FTRL-Proximal using both.

1. Creates sparse matrix for L1 SGD and FTRL
2. Uses the one standard-error rule during ...

964ac15e5169b37b89e8dc52650b0937d6651af4 authored over 7 years ago by Al <[email protected]>
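The one-standard-error rule is a common model-selection heuristic. Rather than taking the hyperparameter setting with the lowest mean cross-validation error, it takes the simplest (most regularized) setting whose mean error lies within one standard error of that minimum. The sketch below assumes the candidates are ordered from most to least regularized. The function name is hypothetical, and this is not necessarily how language_classifier_train applies the rule.

```c
#include <stddef.h>

/* Candidates must be ordered from most to least regularized, with n >= 1.
   mean_err[k] is candidate k's mean cross-validation error and std_err[k]
   is its standard error. Returns the index of the first (simplest)
   candidate whose mean error is within one standard error of the best. */
static size_t one_standard_error_choice(const double *mean_err,
                                        const double *std_err, size_t n) {
    size_t best = 0;
    for (size_t k = 1; k < n; k++) {
        if (mean_err[k] < mean_err[best]) best = k;
    }
    double threshold = mean_err[best] + std_err[best];
    for (size_t k = 0; k < best; k++) {
        if (mean_err[k] <= threshold) return k;
    }
    return best;
}
```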
[languages] adding replace_hyphens and split_alpha_from_numeric in language classifier input normalization

58661c9f2789bee5a7655e7908dc5438794acb91 authored over 7 years ago by Al <[email protected]>
[math] using new matrix methods in softmax

e4ed759f0daa4c7e0383f095de63be846ed5133a authored over 7 years ago by Al <[email protected]>
[math] adding mean, variance and standard deviation to generic vector functions

3aab15a0a099d39408ce55a78b46977ec8f21ef6 authored over 7 years ago by Al <[email protected]>
[utils] hash_get is no longer a string-only function, can be used for generic hashtables

3cb513a8f2cd12bc5b5b4d4611595c4b55f9b59a authored over 7 years ago by Al <[email protected]>
[utils] removing default chunk size from address_parser_train

95e39ad91c71f36b03f057e5b2e5a80695b761db authored over 7 years ago by Al <[email protected]>
[classification] removing regularization update from gradient computation in logistic regression, as that's now handled by the optimizer

a4431dbb27f383eecee344c7fab7b5c5a3bb1884 authored over 7 years ago by Al <[email protected]>
[classification] flexible logistic regression trainer that can use either SGD (with L1 or L2) or FTRL as the optimizer

64c049730a083272257e7c648cd59d912a749e7d authored over 7 years ago by Al <[email protected]>
[optimization] implemented Google's FTRL-Proximal, adapted for the multiclass/multinomial case. It is L1- and L2-regularized, so it should encourage sparsity through the L1 penalty while remaining robust to collinear features through the L2 penalty. Ref: https://research.google.com/pubs/archive/41159.pdf

cf88bc7f653818e2788ababaf1caee253d4402ac authored over 7 years ago by Al <[email protected]>
[utils] adding default chunk size to shuffle.h

ed05aaabb1c5d5b89e72e852934251c8768715f4 authored over 7 years ago by Al <[email protected]>
[utils] sparse_matrix_add_unique_columns_alias, adds the actual column indices to hashtable/array and aliases those in the table from 1 to N (where N is the number of unique columns in this batch). This way it's compatible with smaller matrices of batch weights.

96e1ca5e896c4a5913971f7a6e09115dc567c939 authored over 7 years ago by Al <[email protected]>
[optimization] new sgd_trainer struct to manage weights in stochastic gradient descent. Allows L1 or L2 regularization, using cumulative penalties instead of exponential decay. SGD with L1 regularization encourages sparsity and can produce a sparse matrix after training rather than a dense one

a2563a4dcdd2cb58c60db114cd63d08f8faec6ae authored over 7 years ago by Al <[email protected]>
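The commit does not spell out the exact "cumulative penalties" scheme. A well-known one is the cumulative L1 penalty of Tsuruoka, Tsujii and Ananiadou (ACL 2009), and assuming that is the technique meant, the per-weight step can be sketched as follows. The function name is hypothetical.

```c
/* Assumed cumulative L1 penalty, applied right after the gradient step
   for one weight.
   u: total L1 penalty any weight could have received so far, i.e. the sum
      of (learning rate * lambda) over all updates.
   q: total penalty actually applied to this weight so far.
   Clipping at zero is what lets weights land exactly on 0 (sparsity). */
static void apply_cumulative_l1_penalty(double *w, double *q, double u) {
    double before = *w;
    if (*w > 0.0) {
        double shrunk = *w - (u + *q);
        *w = shrunk > 0.0 ? shrunk : 0.0;
    } else if (*w < 0.0) {
        double shrunk = *w + (u - *q);
        *w = shrunk < 0.0 ? shrunk : 0.0;
    }
    *q += *w - before;
}
```

Tracking `q` per weight means a weight that was previously clipped is only charged the penalty it has not yet paid. This is the main difference from naively subtracting the per-step penalty every time.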
[utils] adding non-branching sign functions

19fe084974b9b40acf875f28499f158aba125c1a authored over 7 years ago by Al <[email protected]>
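A common way to write non-branching sign functions in C, shown as a sketch with hypothetical names that may differ from libpostal's. Each comparison evaluates to 0 or 1, so their difference gives the sign without an if/else.

```c
/* Returns -1, 0 or 1. Compilers typically turn the two comparisons into
   flag-setting instructions rather than a branch. */
static inline int sign_int(int x) {
    return (x > 0) - (x < 0);
}

/* Same idea for doubles. NaN makes both comparisons false, so it maps to 0. */
static inline double sign_double(double x) {
    return (double)((x > 0.0) - (x < 0.0));
}
```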
[dictionaries] more abbreviations for MLK

74a281e332ab75a950e8ea7f48f70968d6076d1f authored over 7 years ago by Al <[email protected]>
[openaddresses] add OSM boundaries to King, NC

7f30fb8e38b7d9f1eec3c471efdc81928803b675 authored over 7 years ago by Al <[email protected]>
[openaddresses] adding units to Chelan County, WA, adding Island County, WA

b52f137b5da3e9bf352949124974253bafca6172 authored over 7 years ago by Al <[email protected]>
[openaddresses] adding units to city of Columbia, MO

6ec4c1fdc98a79936e5cbd68a527cade2e06a2ea authored over 7 years ago by Al <[email protected]>
[openaddresses] adding units in Boone County, MO

f349607412bde9003b5df9f452fb11afec334f5f authored over 7 years ago by Al <[email protected]>
[openaddresses] OSM boundaries no longer needed in Alamance County, NC. Ignore city when it's {ALAMANCECOUNTY, COUNTY}

bd8de15886f8035129d5ac75053b73fc406a062b authored over 7 years ago by Al <[email protected]>
[data] 12-worker pool in data download instead of 10, to download the new parser in one shot

267be6c05cc5adc83edbe90e3a3a85f249d65678 authored over 7 years ago by Al <[email protected]>
[fix] remove bloom.c from libpostal sources

7f8c2f0ad349b7b1efdf57fc8a2c7e7510797e06 authored over 7 years ago by Al <[email protected]>
[data/models] updating libpostal download script to download new models. The simple data files are stored by libpostal major version, whereas the models are stored by the version of the training data they used. A file called "latest" is stored in S3 to indicate the latest version of the model and checked on make

a64c81b45b0f9dbf7975c58c494aee456b4dfa9f authored over 7 years ago by Al <[email protected]>
[api] doing this now since we're bumping a major version: using a libpostal prefix for all public header functions and definitions

6d4c7984dfc529643a3e92b0c309fdde84d32fac authored over 7 years ago by Al <[email protected]>
[build] defining libpostal .so version in configure.ac, removing dependency on mmap and sparkey

f8d7bdf3642ddc0d59a332b5fdd85cc6f64164e3 authored over 7 years ago by Al <[email protected]>
[build] add /usr/local/include as default include path for test Makefile as well

f7b695c642ca29023a1d955d8fcd1ef46cd1bfe0 authored over 7 years ago by Al <[email protected]>
[rm] removing ax_blas.m4

ace40bf0aa64be3f57a7418c712534e9fbd9f2b3 authored over 7 years ago by Al <[email protected]>
[build] wrap CBLAS check in a check for the cblas.h header

27426e90d01722d6020de236f75793ec5660a428 authored over 7 years ago by Al <[email protected]>
[openaddresses] OSM boundaries no longer needed in Allen County, IN, needed in Clark County, NV

db726d5ce116c6673795cce9e3e4251a2ed3f392 authored over 7 years ago by Al <[email protected]>