Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/cdli-gh/cdli-search

Search interface experiment
https://github.com/cdli-gh/cdli-search

Print the non-printable characters.

Output a python-escaped version of the field values so it's clear
what triggered the warning. Of...

71b34d775dccceb23103b73d21d438d93f47c29f authored about 5 years ago by Ralph Giles <[email protected]>
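
One way to produce a Python-escaped form of a field value is the built-in `repr()`, which renders tabs and control characters as escape sequences. The following is a minimal sketch under that assumption; the helper name `describe_value` and the sample string are hypothetical and not taken from the repository.

```python
def describe_value(value: str) -> str:
    """Return an escaped form of value so hidden characters become visible."""
    # repr() quotes the string and escapes tabs, newlines and control codes.
    return repr(value)


# Hypothetical field value containing a tab and a vertical tab.
print(describe_value("Ur III\tperiod\x0b"))
```

Printing the escaped form rather than the raw value makes it clear which character triggered a warning, even when the terminal would otherwise render it as blank space.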
Ignore __pycache__

Python caches compiled code in this directory. We don't want
this tracked in the source cod...

a883709876c574a2db10f0f7ea2494db95054a1d authored about 5 years ago by Ralph Giles <[email protected]>
travis: Run pycodestyle.

Mark commits with style nits as not passing.

595ae9ca06d34ca142dc8ce38436862ee8e96113 authored about 5 years ago by Ralph Giles <[email protected]>
travis: Add build status branch to README.

This only reflects the status of the current master branch on
github, not any local checkout whe...

5b8810b42696f8610a140d92091a0a4e0dacf330 authored about 5 years ago by Ralph Giles <[email protected]>
travis: Try running the upload as a test.

The continuous integration system at travis-ci.org offers some
docker support. Try instantiating...

f8d008b8098379eac1da10e102d9f92387769e4e authored about 5 years ago by Ralph Giles <[email protected]>
Update to Elasticsearch 7.3.0.

This is the current release. Seems to work fine.

e39ef89f96ea8943d96802cd9222583dfd448dfc authored about 5 years ago by Ralph Giles <[email protected]>
upload: Use argparse to accept data_path and --quiet.

Parse command-line arguments when invoked as an executable.
Require the path to the cdli data ex...

2ffe55e74894555bb1de076e6da8379723757835 authored about 5 years ago by Ralph Giles <[email protected]>
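
A command line of this shape could be sketched with `argparse` as follows. Only the names `data_path` and `--quiet` come from the commit title; the function name `parse_args`, the help strings and the choice of `store_true` for the flag are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a required data path and an optional --quiet flag."""
    parser = argparse.ArgumentParser(description="Upload catalogue data.")
    # Positional arguments are required by default.
    parser.add_argument("data_path", help="path to the data directory")
    # Assumed to be a simple boolean switch.
    parser.add_argument("--quiet", action="store_true",
                        help="reduce the amount of output")
    return parser.parse_args(argv)


# Passing an explicit list avoids reading sys.argv in this example.
args = parse_args(["/path/to/data", "--quiet"])
print(args.data_path, args.quiet)
```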
check: skip non-string entries.

When an extra column is present in a row, csv.DictReader returns
it as a list under the `None` k...

53a376157da665513f67ec8878fb4bb8cd5adce1 authored about 5 years ago by Ralph Giles <[email protected]>
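
The `csv.DictReader` behaviour described above can be reproduced with a short example. This is a sketch with made-up column names and data; the helper `string_fields` is hypothetical and only illustrates one way of skipping non-string entries.

```python
import csv
import io

# Hypothetical data: the last row has one more value than the header.
sample = io.StringIO("id,designation\n1,tablet A\n2,tablet B,stray\n")


def string_fields(row):
    """Yield (column, value) pairs, skipping values that are not strings."""
    for key, value in row.items():
        if isinstance(value, str):
            yield key, value


rows = list(csv.DictReader(sample))
# Extra values are collected in a list under the None key (the default restkey).
print(rows[1][None])
print(dict(string_fields(rows[1])))
```

Filtering on the value type also skips columns that are missing from a short row, since `DictReader` fills those with `None` by default.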
check: Fix long line.

Reformat the check_columns loop to fit in 80 characters.
Less nesting can be easier to read and ...

169d50a5189053aeb21bbd8d8753189f656bfaa8 authored about 5 years ago by Ralph Giles <[email protected]>
check: Share check.py DictReader creation.

Move the DictReader instance to a helper method so it can be
accessed directly by the checks whi...

6072f71388d8ff4068f1c723d238f6b488ea6c8b authored about 5 years ago by Ralph Giles <[email protected]>
Move common functions to cdli.py.

Both check.py and upload.py need to open and parse the catalogue
csv files. Move this and some ot...

7726c9eacda4f1a601ee00ec156bd348501920d8 authored about 5 years ago by Ralph Giles <[email protected]>
Accept path to data repo on the command line.

Allow invoking as `pipenv run python upload.py /path/to/data` so
it's possible to use an existin...

14a63411c47d24624a22947e15734c065de4b599 authored about 5 years ago by Ralph Giles <[email protected]>
Switch to logging output.

This is mostly to reduce the length of the output when running in
automation.

76a8424c1f17387ad828450e60789b88c1f25c06 authored about 5 years ago by Ralph Giles <[email protected]>
Look for the export data in a local `data` directory.

Matches what the travis test expects.

b1ef42153f3387599e8f5366386a5c9193debeff authored about 5 years ago by Ralph Giles <[email protected]>
Add a check for entries with whitespace or non-printable characters.

Unfortunately there are over a thousand of the non-printable character
corruptions. Many are tab...

5b2493cbc8d486201e5c80286f8f700957d25ab1 authored about 5 years ago by Ralph Giles <[email protected]>
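
A check of this kind might look like the sketch below. It assumes that "whitespace" means leading or trailing whitespace and that "non-printable" follows the definition used by `str.isprintable()`; both function names are hypothetical.

```python
def has_stray_whitespace(value: str) -> bool:
    """True if the value starts or ends with whitespace (assumed meaning)."""
    return value != value.strip()


def has_non_printable(value: str) -> bool:
    """True if any character is non-printable; tabs count, spaces do not."""
    return not value.isprintable()


for field in ["clean value", " padded", "tab\tinside"]:
    print(repr(field), has_stray_whitespace(field), has_non_printable(field))
```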
Print something about updating the index alias.

fb3a91b74eabf8a6d2ce51113306497dd4d0f31f authored about 5 years ago by Ralph Giles <[email protected]>
Fix README typo.

The upload script needs to be invoked with python explicitly.

ceec23fa1f9f717a14b7576c95098de73d2fd3ef authored about 5 years ago by Ralph Giles <[email protected]>
Link to the example interface page.

Nicer than having to paste the url, assuming the browser will
follow a localhost link.

27a3620da392a56410d6dbdbbb8f20146083a516 authored over 5 years ago by Ralph Giles <[email protected]>
Link to cdli from the readme.

Don't make people look it up.

6f49421a9914c96b1a924e4e32513e12ba3bd735 authored over 5 years ago by Ralph Giles <[email protected]>
Add README.

Give a basic quickstart introduction to help people try out
the demo.

340a3ee7069f41e04e14c412287d5c41e42f0a19 authored over 5 years ago by Ralph Giles <[email protected]>
Add a code of conduct.

Specify interaction standards for publication. Copied from
contributor-covenant.org by Coraline ...

4b10702cd911d0aef609485445417ec5f4212515 authored over 5 years ago by Ralph Giles <[email protected]>
Add GPLv3 license.

Set a license file for publication.

92fd361f98bcbba446b9a98e66db312d0c82037f authored over 5 years ago by Ralph Giles <[email protected]>
Some rough test scripts for investigating data consistency.

Invoke with `pipenv run python check.py`.

6a4352c633c76ebb96c1fe222cdcbcbb72caec1c authored over 5 years ago by Ralph Giles <[email protected]>
Add a utility function to clear the search index.

This is useful for testing when we want to start with a fresh upload.

36354d25a8580845fbec6d8ee54102cc5d6716d9 authored over 5 years ago by Ralph Giles <[email protected]>
Use an index alias for search.

Set or update a generic 'cdli-catalogue' index alias when the
upload is complete and use that i...

1cfbb0611d0d73d097ad0d6ce11cd08dc6521554 authored over 5 years ago by Ralph Giles <[email protected]>
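
Elasticsearch can switch an alias in one request through its update-aliases API, which takes a list of `remove` and `add` actions. The sketch below only builds that request body; the function name `alias_actions` and the dated index names are hypothetical, and the repository may set the alias differently.

```python
def alias_actions(alias, new_index, old_indices=()):
    """Build an update-aliases body pointing alias at new_index."""
    # Detach the alias from any previous indices first.
    actions = [{"remove": {"index": old, "alias": alias}}
               for old in old_indices]
    # Then attach it to the freshly uploaded index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Index names here are made up for illustration.
body = alias_actions("cdli-catalogue", "catalogue-new", ["catalogue-old"])
print(body)
```

Searching through the alias means clients never need to know the name of the concrete index behind it.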
Assert instead of clobbering metadata keys.

Complain instead of silently altering the catalogue data in
the unlikely possibility that column...

ad1c89d40ba3cb5df8388ded2acb8b2f3ae1c574 authored over 5 years ago by Ralph Giles <[email protected]>
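
The idea of asserting rather than overwriting can be sketched as below. The function name `add_metadata` and the example keys are hypothetical; the actual metadata keys used by the repository are not shown in this listing.

```python
def add_metadata(row: dict, metadata: dict) -> dict:
    """Add metadata keys to a catalogue row, refusing to overwrite columns."""
    for key in metadata:
        # Fail loudly instead of silently replacing catalogue data.
        assert key not in row, f"metadata key {key!r} collides with a column"
    row.update(metadata)
    return row


# Hypothetical row and metadata key.
print(add_metadata({"designation": "tablet A"}, {"_id": "1"}))
```

Note that `assert` statements are removed when Python runs with `-O`, so an explicit exception would be the safer choice if the check must always run.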
Restore document id generation.

Wrap read_catalogue in another function which adds metadata keys
to the row dictionary. This let...

1849ae84e8abf396a3ccb01c51cfca7968b83e68 authored over 5 years ago by Ralph Giles <[email protected]>
Use the bulk api to speed up import.

This is much faster than the previous document-at-a-time indexing
upload, but still takes severa...

4f85cb3db8173f79cf88a36c641e2789d03e962c authored over 5 years ago by Ralph Giles <[email protected]>
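
With the Python client, bulk indexing is usually driven by an iterable of action dictionaries passed to `elasticsearch.helpers.bulk`. The sketch below shows only the action generator, so it runs without a server; the function name `bulk_actions`, the index name and the sample rows are assumptions.

```python
def bulk_actions(rows, index_name):
    """Yield one bulk index action per catalogue row."""
    for row in rows:
        # Each action names the target index and carries the document body.
        yield {"_index": index_name, "_source": row}


# Hypothetical rows and index name.
rows = [{"designation": "tablet A"}, {"designation": "tablet B"}]
actions = list(bulk_actions(rows, "catalogue-new"))
print(actions)
```

Given a connected client, the generator could then be handed to the helper, for example `elasticsearch.helpers.bulk(client, bulk_actions(rows, index_name))`, which batches the documents into far fewer requests than indexing one document at a time.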
Remove debug logs.

These are no longer necessary now that it's working.

cf74f0df01983ae8c70795ea33e9106e8e4fc0ad authored over 5 years ago by Ralph Giles <[email protected]>
Fix typo.

Make div#results hide itself properly, and remove the obsolete
placeholder data.

185dae1ef2345689194dffd5c204310bc001e7b2 authored over 5 years ago by Ralph Giles <[email protected]>
Fix search results in the test client.

Correctly generate a list of results. It turns out `this` isn't
bound in then functions chained ...

288596231fa9ad729997916e2b8e0557183538a2 authored over 5 years ago by Ralph Giles <[email protected]>
Use a relative url in the test client.

This ensures queries are sent to the api instance without having
to pass the correct url into th...

4482e9ccf59ea40cb871c6d3b51c6a9ae3daf9c4 authored over 5 years ago by Ralph Giles <[email protected]>
Serve the test client under vue/index.html.

Use starlette's StaticFiles class to serve anything under the
vue directory.

99d29479d968adc670e3e0808b81311362fd8390 authored over 5 years ago by Ralph Giles <[email protected]>
Add a simple demo page.

Vue.js search page to make testing the api a little nicer.

1181d31902817b0f729b2132f9845b0a2a34a9b3 authored over 5 years ago by Ralph Giles <[email protected]>
Do a full-text query instead of a structured one.

I think this is the same as the _search?q=foo REST api.

2905c99df4eb8bb4644a0c67d1fd6ec204c35bdf authored over 5 years ago by Ralph Giles <[email protected]>
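
The `q` parameter of the `_search` REST endpoint is parsed with the query string syntax, and the request-body counterpart is the `query_string` query. The sketch below builds such a body; the function name `full_text_query` is hypothetical, and whether the repository uses exactly this query type is an assumption.

```python
def full_text_query(text: str) -> dict:
    """Build a search body that runs text as a query_string query."""
    # With no fields listed, Elasticsearch falls back to its default fields.
    return {"query": {"query_string": {"query": text}}}


print(full_text_query("Ishtar descent"))
```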
Add title and description to the query parameter.

I tried using pydantic.Schema to add the same documentation to
the paging parameters skip and lim...

7791793fc7d9a58a6286e8c249e14aa150e36b86 authored over 5 years ago by Ralph Giles <[email protected]>
api: Add a search endpoint.

Query some common fields. Finds 'Ishtar descent' but not 'K 162',
so not really doing better yet...

ef37c6b3287a5d1e8cd1e1501d35cc25ab8af261 authored over 5 years ago by Ralph Giles <[email protected]>
api: Add /catalogue/{id}.

Fetches the catalogue data for the given CDLI id (the numerical
part of the P-number).

b2810c6d60ffadd8ed7916de4e9bdb1df01d0878 authored over 5 years ago by Ralph Giles <[email protected]>
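
Extracting the numerical part of a P-number could be done as in the sketch below. It assumes P-numbers are written as the letter `P` followed by digits, often zero-padded; the function name `catalogue_id` is hypothetical.

```python
import re


def catalogue_id(p_number: str) -> int:
    """Return the numerical part of a P-number such as 'P000162'."""
    match = re.fullmatch(r"P(\d+)", p_number.strip())
    if match is None:
        raise ValueError(f"not a P-number: {p_number!r}")
    # int() drops any zero padding.
    return int(match.group(1))


print(catalogue_id("P000162"))
```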
Stub out an api service for searching.

FastAPI-based stub for querying the index and returning search
results.

Invoke with:

pipen...

a63280f82e971b657eac92d7e90232e642b9299d authored over 5 years ago by Ralph Giles <[email protected]>
Upload each row to the elasticsearch instance for indexing.

Simplest possible indexing with no tuned weights for different fields.

Generate an index name b...

23c4018ce5808a4f5dc36c9d5e3f2038b083f77e authored over 5 years ago by Ralph Giles <[email protected]>
Add elasticsearch import.

This is the Python package wrapping the elasticsearch REST api.

Invoke with

pipenv install...

e82096c05c848681ab65fda8d0f3c819402ddee6 authored over 5 years ago by Ralph Giles <[email protected]>
Move local path config into the test invocation.

This is only relevant when invoked in my local config, where
a checkout of https://github.com/cd...

65d4a3b0301f88f6b3889cacd50a9da2383116f6 authored over 5 years ago by Ralph Giles <[email protected]>
Iterate over the catalogue csv files.

Build a reader to logically concatenate the two file segments,
read and parse each row, and prin...

08f463498fbc91bc637533543f8aaedab78c3394 authored over 5 years ago by Ralph Giles <[email protected]>
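
Logically concatenating file segments before parsing can be sketched with `itertools.chain`, since `csv.DictReader` accepts any iterable of lines. The sketch assumes only the first segment carries the header row and that each segment ends with a newline; the function name `read_segments` and the sample data are hypothetical.

```python
import csv
import io
import itertools


def read_segments(*segments):
    """Parse several file-like segments as one csv stream with one header."""
    # chain() yields the lines of each segment in order, without copying.
    return csv.DictReader(itertools.chain(*segments))


# Hypothetical segments standing in for the split catalogue files.
first = io.StringIO("id,designation\n1,tablet A\n")
second = io.StringIO("2,tablet B\n")
parsed_rows = list(read_segments(first, second))
for row in parsed_rows:
    print(row)
```

A quoted field that spans the boundary between two segments would still parse, because the reader simply sees a continuous sequence of lines.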