Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

libpostal

libpostal is a C library for parsing/normalizing street addresses around the world
Collective - Host: opensource - https://opencollective.com/libpostal - Code: https://github.com/openvenues/libpostal

[fix] Alpha-numeric splitting

github.com/openvenues/libpostal - 89d0fd571808c1a9be32c1708ff0211fad5d79a5 authored over 9 years ago by Al
[utils] cstring_array_cat

github.com/openvenues/libpostal - 6428c0ae2048f823060e5f4e527aedd37f1dedab authored over 9 years ago by Al
[osm] Adding dependencies so single street names are not valid without at least one of {house, number, suburb, city, postcode}

github.com/openvenues/libpostal - 5d2a24872a018a338069c079487578129881fd6b authored over 9 years ago by Al
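The dependency rule in the commit above can be pictured as a bitset check over the components present in a training example. This is a minimal sketch, not libpostal's code. The `component_t` ids, the `COMPONENT_BIT` macro and the `component_set`/`road_dependency_ok` helpers are hypothetical. The companion set is read from the commit message as house (venue name), house number, suburb, city and postcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component ids; each maps to one bit of a uint32_t bitset. */
typedef enum {
    COMPONENT_HOUSE,        /* venue or building name */
    COMPONENT_HOUSE_NUMBER,
    COMPONENT_ROAD,
    COMPONENT_SUBURB,
    COMPONENT_CITY,
    COMPONENT_POSTCODE,
    NUM_COMPONENTS
} component_t;

#define COMPONENT_BIT(c) (UINT32_C(1) << (c))

/* A street name only counts if at least one of these is also present. */
static const uint32_t ROAD_DEPENDENCIES =
    COMPONENT_BIT(COMPONENT_HOUSE) | COMPONENT_BIT(COMPONENT_HOUSE_NUMBER) |
    COMPONENT_BIT(COMPONENT_SUBURB) | COMPONENT_BIT(COMPONENT_CITY) |
    COMPONENT_BIT(COMPONENT_POSTCODE);

/* Build a bitset from a list of component ids (duplicates are harmless). */
static uint32_t component_set(const component_t *ids, size_t n) {
    uint32_t set = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ids[i] < NUM_COMPONENTS); /* only accept valid component keys */
        set |= COMPONENT_BIT(ids[i]);
    }
    return set;
}

/* True unless the set contains a road with none of its required companions. */
static bool road_dependency_ok(uint32_t set) {
    if (!(set & COMPONENT_BIT(COMPONENT_ROAD))) {
        return true; /* this rule only constrains sets that contain a street */
    }
    return (set & ROAD_DEPENDENCIES) != 0;
}
```

Under this sketch, a set holding only a road fails the check, while a road plus a city passes. Sets without a road are left unconstrained by this particular rule.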
[osm] Adjusting priors for country code expansion

github.com/openvenues/libpostal - 77be2fe43366d6e9422ec715f0950b17686a97b3 authored over 9 years ago by Al
[fix] keeping name tag in address components

github.com/openvenues/libpostal - 0b98a2642654681f01369567e3416c44b0c63676 authored over 9 years ago by Al
[osm] Doing initial formatting after replacing country/state

github.com/openvenues/libpostal - 0f9ad259dc28a1628b87307db18a12fd8f75d6e7 authored over 9 years ago by Al
[fix] import, initialization

github.com/openvenues/libpostal - 71233c9c02ff02ccc2e51dc36b98a0102f9f0dc5 authored over 9 years ago by Al
[fix] file encoding

github.com/openvenues/libpostal - 85b17d9b271bfd2f502c1abd0a0a095f066bfc70 authored over 9 years ago by Al
[osm/parsing] Randomly replacing country codes with local and foreign language expansions as well as randomly expanding state abbreviations to make parser more robust to different input

github.com/openvenues/libpostal - 22efce73374b2189fca1c0041dba87fc56616048 authored over 9 years ago by Al
[expansion] Adding state abbreviations for US, Canada and Australia for expansion while generating OSM training data

github.com/openvenues/libpostal - 89208120550cace14cee164464f3cff9a6f4faca authored over 9 years ago by Al
[languages] Function to sample a random language from a discrete distribution (e.g. languages on the Internet, languages in a country, etc.)

github.com/openvenues/libpostal - 7eb18f3538b5114dd8fe34844baf339baa9dfd32 authored over 9 years ago by Al
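A common way to sample from a discrete distribution, as this commit describes for languages, is to walk the cumulative weights. The sketch below assumes the weights (for example, the share of each language spoken in a country) are already loaded into an array. `sample_discrete` and `sample_discrete_rand` are hypothetical names, and this is not the project's own implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pick an index with probability proportional to weights[i], given a
   uniform draw u in [0, 1). Weights need not sum to 1. */
static size_t sample_discrete(const double *weights, size_t n, double u) {
    assert(n > 0 && u >= 0.0 && u < 1.0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += weights[i];
    }
    double target = u * total;
    double cumulative = 0.0;
    for (size_t i = 0; i < n; i++) {
        cumulative += weights[i];
        if (target < cumulative) {
            return i; /* zero-weight entries can never satisfy this */
        }
    }
    return n - 1; /* guard against floating-point round-off */
}

/* Convenience wrapper using the C library PRNG (fine for generating
   training data, not for anything security-sensitive). */
static size_t sample_discrete_rand(const double *weights, size_t n) {
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    return sample_discrete(weights, n, u);
}
```

Passing the uniform draw as a parameter keeps `sample_discrete` deterministic and easy to test. Seed `rand()` with `srand` once at startup if runs should differ.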
[fix] abbreviations

github.com/openvenues/libpostal - 0aa6950b6cba0f48a9fb6c9cab8ffd9321d1e79c authored over 9 years ago by Al
[fix] checking validity of component combination

github.com/openvenues/libpostal - db71b65412cdcbc90f984839af8b8be55fefb6d6 authored over 9 years ago by Al
[fix] bitset for address components, only looking at valid component keys

github.com/openvenues/libpostal - 521f33d892764656582fdd68cf338b50f8dd8878 authored over 9 years ago by Al
[fix] only OSM tagged addresses need extra logic

github.com/openvenues/libpostal - 528285f735dca548299042cd81b91c879d41dc55 authored over 9 years ago by Al
[osm/parsing] Making tagged training data for address parser more robust to the types of partial input we see in geocoding by randomly eliminating components subject to some constraints (e.g. house number cannot be used without a street name)

github.com/openvenues/libpostal - 83aecb9f2cf90edf19b84d3df9e0e37bdeaadfcc authored over 9 years ago by Al
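The commit above drops components at random while honoring the constraint that a house number cannot appear without a street name. That process can be sketched as a drop step followed by a repair step. Everything here is an assumption for illustration: the `BIT_*` masks, the per-component drop probability `p`, and the choice to repair by also removing an orphaned house number. Resampling until the constraint holds would be an equally valid design.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bit assignments for a few address components. */
#define BIT_HOUSE_NUMBER (UINT32_C(1) << 0)
#define BIT_ROAD         (UINT32_C(1) << 1)
#define BIT_CITY         (UINT32_C(1) << 2)
#define BIT_POSTCODE     (UINT32_C(1) << 3)

/* Choose components to drop: each present component independently with probability p. */
static uint32_t random_drop_mask(uint32_t present, double p) {
    assert(p >= 0.0 && p <= 1.0);
    uint32_t mask = 0;
    for (int bit = 0; bit < 32; bit++) {
        uint32_t b = UINT32_C(1) << bit;
        if ((present & b) && (double)rand() / ((double)RAND_MAX + 1.0) < p) {
            mask |= b;
        }
    }
    return mask;
}

/* Remove the chosen components, then enforce the constraint that a house
   number may not appear without a street name. */
static uint32_t apply_drop(uint32_t present, uint32_t drop_mask) {
    uint32_t kept = present & ~drop_mask;
    if ((kept & BIT_HOUSE_NUMBER) && !(kept & BIT_ROAD)) {
        kept &= ~BIT_HOUSE_NUMBER;
    }
    return kept;
}
```

A typical call is `apply_drop(present, random_drop_mask(present, 0.3))`. Keeping the two steps separate lets the constraint logic be tested without randomness.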
[fix] spoken/official

github.com/openvenues/libpostal - c790a2b87fc8f31a259ca65b46d2a057569f18df authored over 9 years ago by Al
[geonames] Using official country languages in GeoNames

github.com/openvenues/libpostal - db3364be3051a283987d2db1139c7400ad7137e4 authored over 9 years ago by Al
[tokenization] Regenerating scanner.c

github.com/openvenues/libpostal - 562aeb497d26d598c81708df642c0ed0513e37c7 authored over 9 years ago by Al
[tokenization] Acronym vs abbreviation

github.com/openvenues/libpostal - 689b830ad268723a8c1695a6c5ab8584371b817e authored over 9 years ago by Al
[languages] options for get_country_languages

github.com/openvenues/libpostal - 7dfbcce9ec4ea86024f187b241927e591df4935d authored over 9 years ago by Al
[doc] documentation for country_names module, fixing variable name

github.com/openvenues/libpostal - 86e9166ae880cbb1e4a317ccb47295d245a5dd06 authored over 9 years ago by Al
[countries] Making country official names align better with OSM/Wikipedia, plugging holes

github.com/openvenues/libpostal - 42e77cb57058debc6bb7df873ffcf6d0abe50e7e authored over 9 years ago by Al
[languages] Changing Arabic to default in North African countries with two official languages. Making Danish secondary in the US Virgin Islands

github.com/openvenues/libpostal - 0cedc68a97dfa101cda639bd7dca4670faf19ebc authored over 9 years ago by Al
[formatting] Constants for field names, a few options in format_address

github.com/openvenues/libpostal - 40cf24765582c3ede0ce1cc9a5166c86d19eecf6 authored over 9 years ago by Al
[countries] Adding module for getting official country names in every language from CLDR + a dictionary of local language names

github.com/openvenues/libpostal - 22e8178a979ffd5a7954b565378ef5284e3e5e59 authored over 9 years ago by Al
[geodb] Renaming geodb

github.com/openvenues/libpostal - c3c6a18df847cf92675873f39b1d83b95d6eced0 authored over 9 years ago by Al
[fix] labels in averaged perceptron trainer

github.com/openvenues/libpostal - 8ca22247f9336ce75b4f501e0df519b6faa839c2 authored over 9 years ago by Al
[fix] Labels in averaged perceptron tagger

github.com/openvenues/libpostal - 6666f0baf8d02c331993dd67317829169f4ab8a1 authored over 9 years ago by Al
[dictionaries] Adding commonly used colon form No: for Turkish addresses

github.com/openvenues/libpostal - 05da2ee6bd8c58747bcbb04579f34a663f33659e authored over 9 years ago by Al
[geonames] Removing alternate names from geonames data set which are digits-only (most are not legitimate)

github.com/openvenues/libpostal - daad1a13136224458be8d42a06c8c218007b52bc authored over 9 years ago by Al
[api] Setting global objects to NULL on teardown

github.com/openvenues/libpostal - 12816d0e9542423871435397a145a686fc4ffed1 authored over 9 years ago by Al
[build] Adding libpostal_data script for downloading data from S3, Makefile uses that now as part of the all-local target. Can be run periodically after install

github.com/openvenues/libpostal - abfa744d59bbb133265775572e61f3a46cbc211d authored over 9 years ago by Al
[fix] only commas and hyphens need to be eliminated at the end of phrases in untagged address formatting

github.com/openvenues/libpostal - 93b3110a491be2f3b9ca2073a180c16af9b36c6a authored over 9 years ago by Al
[osm/formatting] Fixing formatting tagged addresses with comma separated fields

github.com/openvenues/libpostal - d3bfaf6b43e7e2d2fef12ee7e6e0b57476041e57 authored over 9 years ago by Al
[fix] removing space from tokens in address formatting

github.com/openvenues/libpostal - d512201e2c7e93780426364f1c3005744c8847a2 authored over 9 years ago by Al
[readme] Readme fixes and additions

github.com/openvenues/libpostal - a3214b791461c4536015cb4ad707aee693e9f178 authored over 9 years ago by Al
[fix] blank values containing punctuation in formatting

github.com/openvenues/libpostal - 5b829cd5a789b39b88e641f2ef482494d9f1f703 authored over 9 years ago by Al
[dictionaries] Luxembourgish dictionaries

github.com/openvenues/libpostal - e255ae0e0988046f3ca8eabaf1b6f91e4de2f049 authored over 9 years ago by Al
[dictionaries] German Swiss dictionaries

github.com/openvenues/libpostal - 3fe56d029de610e5a2809876582b62f26bbbfa50 authored over 9 years ago by Al
[osm/formatting] Moving back to openvenues repo pending resolution of the Turkish address issue

github.com/openvenues/libpostal - ae93552455d48446a25e83ff4476eb24f244b8de authored over 9 years ago by Al
[osm/formatting] Changing the way the formatter eliminates inter-component separators, changing repo back to OpenCageData after pull request merge

github.com/openvenues/libpostal - 0c792a2cc37205cc8e8d677308ff5e82c15a0a30 authored over 9 years ago by Al
[tokenization] Regenerated scanner.c

github.com/openvenues/libpostal - 856198a3529ba222ca58a524924e31e2d3bcac06 authored over 9 years ago by Al
[transliteration] Regenerating transliteration data with new categories

github.com/openvenues/libpostal - 07f1f361e2d6f72cf03d8f8f076b012bffbe28e4 authored over 9 years ago by Al
[tokenization] Adding updated token classes to scanner.re

github.com/openvenues/libpostal - 172263af58412bca148c86120df8798728a9cce7 authored over 9 years ago by Al
[unicode] Downloading latest UnicodeData.txt instead of using builtin Python module (out of date) e.g. for getting unicode codepoint categories

github.com/openvenues/libpostal - 5417b4e602e2662fc216106e289ef0a6f2b4bf70 authored over 9 years ago by Al
[fix] ensure_dir in file downloads

github.com/openvenues/libpostal - 8fe791a14acab7264184b97464444067a90496ce authored over 9 years ago by Al
[osm/formatting] Continuing to use openvenues formatter for the India fix

github.com/openvenues/libpostal - 646b9f72488da0fc5ba4b80f37c38f243c5961ac authored over 9 years ago by Al
[api] Adding LIBPOSTAL_DEFAULT_OPTIONS to libpostal.h

github.com/openvenues/libpostal - 5a6b47d0fd44e1ee7a895fd676bde57c229f53b1 authored over 9 years ago by Al
[readme] missed a dictionary type

github.com/openvenues/libpostal - f5bb72c6f5bcb719ac9a7899e071943ab8f147c7 authored over 9 years ago by Al
[readme] Moving paragraph

github.com/openvenues/libpostal - cfef3059bbb99fe39187795882bf9c28733ef575 authored over 9 years ago by Al
[readme] README changes

github.com/openvenues/libpostal - f62cfb955144545b2ad2781f3572a4b1e40d2a2a authored over 9 years ago by Al
[readme] More informative README

github.com/openvenues/libpostal - 3e256404b9bcbde81aa063ed4a867ef9fb603c48 authored over 9 years ago by Al
[fix] Switching address formatter back to OpenCageData repo

github.com/openvenues/libpostal - 9901dd2aacdeecfd96d32368ee1d5ce7ed5483e9 authored over 9 years ago by Al
[expansion] Regenerating expansion data

github.com/openvenues/libpostal - accd8a57e7d62637966a14519294cb3b8acafa9a authored over 9 years ago by Al
[dictionaries] Afrikaans dictionaries for better disambiguation in South Africa

github.com/openvenues/libpostal - fa320defb739bd690e51468256d34c8b1a11da07 authored over 9 years ago by Al
[dictionaries] Dutch directionals, separating out the west vs westen forms

github.com/openvenues/libpostal - 050a850fb933b7ecc25fe59768401744b3816480 authored over 9 years ago by Al
[dictionaries] Arc in English needn't always expand to Arcade

github.com/openvenues/libpostal - fe5d66553396f094b9e5cff71cd37c09f098772d authored over 9 years ago by Al
[dictionaries] Separating out Austrian toponym abbreviations

github.com/openvenues/libpostal - bcac6a41be1304158fcbc43e5aa9bfcf19349746 authored over 9 years ago by Al
[osm/formatting] Tagging separators as well in tagged output of the address formatter

github.com/openvenues/libpostal - c85ce0b11d06508deb31dc0d2a4af87e68ccee4e authored over 9 years ago by Al
[normalize] New token normalization option for replacing digits with 'D' for masking numbers e.g. when learning patterns (so 1234 and 5678 both normalize to DDDD). Shouldn't be used by libpostal API, just by the feature extractors in the machine learning models. Also adding better possessive handling.

github.com/openvenues/libpostal - f6c30778bfa4492b9911b57fca6fb84ac218f96b authored over 9 years ago by Al
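The digit-masking option described in this commit maps every digit to 'D', so numbers with the same shape produce the same feature. This is a minimal sketch that assumes ASCII digits only; a Unicode-aware version would check the digit property of each codepoint. `mask_digits` is a hypothetical name, not the libpostal API, and the possessive handling mentioned in the commit is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Copy `in` to `out`, replacing every ASCII digit with 'D'.
   Output is truncated to out_size - 1 bytes and always NUL-terminated.
   Non-ASCII bytes (e.g. UTF-8 sequences) pass through unchanged. */
static void mask_digits(const char *in, char *out, size_t out_size) {
    assert(out_size > 0);
    size_t i = 0;
    for (; in[i] != '\0' && i + 1 < out_size; i++) {
        out[i] = (in[i] >= '0' && in[i] <= '9') ? 'D' : in[i];
    }
    out[i] = '\0';
}
```

Under this sketch, "1234" and "5678" both become "DDDD", and "Apt 12B" becomes "Apt DDB".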
[doc] Averaged perceptron tagger

github.com/openvenues/libpostal - a1d272077dbbe63876fa7beafd684b27e2868936 authored over 9 years ago by Al
[unicode] better segmentation on script breaks

github.com/openvenues/libpostal - 88bd0cd158d660796747a451a4d14507e06100f0 authored over 9 years ago by Al
[transliteration] Regenerating transliteration data files

github.com/openvenues/libpostal - 377c9475415c9f19da2624c62b3d62739af402ad authored over 9 years ago by Al
[transliteration] Wide char support in transliteration data generator

github.com/openvenues/libpostal - abfb1d4a60cc462670e5ff283340f374d0b147d3 authored over 9 years ago by Al
[utils] basic functions for wide char support for narrow Python builds (unichr, ord, unicode iteration)

github.com/openvenues/libpostal - 7e057b0fb87502ef093dacce34fd007514e1e4a8 authored over 9 years ago by Al
[unicode] Adding wide char support for language disambiguation (comes up in venue names), despite the likelihood of running on a narrow Python build. Rolling back common script chars at a script break, so in the case of e.g. Cyrillic name (Latin name), the segmentation is done at the space before the paren.

github.com/openvenues/libpostal - 8562c7a5cbcd2a199c887fc28d7fbdd135dc7646 authored over 9 years ago by Al
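The rollback rule in the commit above can be modeled on characters that have already been labeled with their Unicode script. In this simplified sketch, each character is represented by one label byte: 'C' for Cyrillic, 'L' for Latin, and '*' for Common-script characters such as spaces and parentheses. The labels and `first_script_break` are hypothetical. Common characters between two scripts are not attached to the earlier segment, so the break lands just after the last character of the first script.

```c
#include <assert.h>
#include <stddef.h>

#define SCRIPT_COMMON '*' /* hypothetical label for Common-script chars */

/* Return the index where the first script segment ends, or the string
   length if no second script appears. Trailing common chars before a
   script change are rolled back into the following segment. */
static size_t first_script_break(const char *labels) {
    assert(labels != NULL);
    char current = '\0';     /* script of the first segment; '\0' = not yet seen */
    size_t segment_end = 0;  /* index just past the last char of `current` */
    size_t i = 0;
    for (; labels[i] != '\0'; i++) {
        char script = labels[i];
        if (script == SCRIPT_COMMON) {
            continue; /* not yet assigned to either segment */
        }
        if (current == '\0' || script == current) {
            current = script;
            segment_end = i + 1;
        } else {
            return segment_end; /* script break: roll back common chars */
        }
    }
    return i;
}
```

For "Москва (Moscow)", labeled CCCCCC**LLLLLL*, the function returns 6. The break therefore falls at the space before the parenthesis, as the commit describes.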
[unicode] Regenerated unicode scripts data file, using simple integers instead of repeating the enum types for succinctness

github.com/openvenues/libpostal - 19e5457a0fda97700614ea524badbcacff43cc87 authored over 9 years ago by Al
[unicode] Regenerated unicode script types (ignore extraneous scripts, they're not used, just reside in the upper unicode planes)

github.com/openvenues/libpostal - 4ad3fac62761f2530e69ff53c64004a381e57117 authored over 9 years ago by Al
[unicode] Allowing wide chars in unicode properties

github.com/openvenues/libpostal - 13bcc35523000a52ef71c8988528ff65851392c1 authored over 9 years ago by Al
[tokenization] Regenerated scanner.c

github.com/openvenues/libpostal - f13e9fad90fe145b84a219df031e6e8d0e02946e authored over 9 years ago by Al
[unicode/tokenization] Using new character classes including wide chars in scanner

github.com/openvenues/libpostal - b4593b6f8839eff7bb264d0ba77c95560e8c685f authored over 9 years ago by Al
[unicode] Wide version of word breaks

github.com/openvenues/libpostal - a76831df7a35ded2bc34c2cf7f8f3736b15ac3bf authored over 9 years ago by Al
[fix] chars out of range in get_string_script Python version

github.com/openvenues/libpostal - b405a53fe151a0ae2c3f3fa0c56f1a4f3797e485 authored over 9 years ago by Al
[fix] Not writing empty fields in formatted addresses

github.com/openvenues/libpostal - ca25b486879c124a1dd0bd931507f333f1ae908b authored over 9 years ago by Al
[fix] Accounting for unknown scripts in disambiguation

github.com/openvenues/libpostal - 747de1944b623f55c6c9a2e050e235559f7bbcfe authored over 9 years ago by Al
[setup] fixing packaging

github.com/openvenues/libpostal - 3ac89d7ed94cfa3343a90231ff32cfe16fcafa77 authored over 9 years ago by Al
[tokenization/osm] Using utf8 encoded version of string for tokens in python tokenizer

github.com/openvenues/libpostal - 236737eab31af79c1d80be135d4239776be33544 authored over 9 years ago by Al
[osm] Using street for language disambiguation in training data

github.com/openvenues/libpostal - 134cf616d6c510a38e285fceab2e9fc11dfa7b2c authored over 9 years ago by Al
[fix] package directory

github.com/openvenues/libpostal - ccac4a5a90df0a5b94394004d19a8acb4b0b0397 authored over 9 years ago by Al
[fix] pytokenize compilation on Ubuntu/gcc

github.com/openvenues/libpostal - 5b2fd0be5038ba9a38c3caa570276dcd1eb92c18 authored over 9 years ago by Al
[fix] stdint include

github.com/openvenues/libpostal - cffa5a4a206e223b13998a955de34d1905b755af authored over 9 years ago by Al
[setup] setup.py for pypostal so it can be installed from the Github url

github.com/openvenues/libpostal - 25b3338600361aad644aa20e29bbbf12c10dcfdc authored over 9 years ago by Al
[osm] Separating address formatter into its own module, adding some documentation of the various training sets with examples

github.com/openvenues/libpostal - 84cf21df88969ecbce3c22482f5ad1ce06471974 authored over 9 years ago by Al
[python] Adding initial pypostal bindings for tokenize so we can remove address_normalizer dependency. Not tested on Python 3.

github.com/openvenues/libpostal - 5485ea21971d2d57f7b3c84d2a1264e8812a96cf authored over 9 years ago by Al
[fix] fixing some compiler warnings, using type-specific abs functions for vector_math

github.com/openvenues/libpostal - 3fab0f984f735ac74ab75ddb3c2e9039532ae9c8 authored over 9 years ago by Al
[osm] Separating tagged from untagged output

github.com/openvenues/libpostal - 6731395ca06ace8a27b87fc4d5d0bc44d4d39e8e authored over 9 years ago by Al
[fix] tokenized string destroy frees original string

github.com/openvenues/libpostal - 2940cc15b89d6ccd23354979786c19097d93ed5a authored over 9 years ago by Al