Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/cdli-gh/atf2tei

TEI XML converter for CDLI ATF cuneiform tablet markup.
https://github.com/cdli-gh/atf2tei

gitlab: Ignore spurious RDFLib security warning.

`pipenv check` warns about CVE-2019-7653:

The CLI tools in RDFLib 4.2.2 can load Python modul...

234f270fb8a4cc1879b39f177f7e93e6190aeebe authored about 5 years ago by Ralph Giles <[email protected]>
Convert rulings to notes as well.

Also add a unit test to verify extraction of the $-line.

The other option here is to use a <mil...

a29a4c06f6ecb121a01d589850aa08bf66b38de9 authored about 5 years ago by Ralph Giles <[email protected]>
atf2tei: Convert docstrings to comments.

My editor started warning about comment strings which weren't
explicitly docstrings, so convert ...

7012744e87117fb7f4776a807a0fa9c6b17aca93 authored about 5 years ago by Ralph Giles <[email protected]>
Parametrize $-line parsing test.

Verify common state annotations propagate into the tei model
classes as notes. Unfortunately seve...

fb880c3076835a99054efce0c71231a52edab75c authored about 5 years ago by Ralph Giles <[email protected]>
Convert $-state objects.

Write out $ lines as TEI <note> elements, similar to what oracc
does, although they set addition...

e7d546b3c54ef1ce532bde3a12f6ca0b34da0b95 authored about 5 years ago by Ralph Giles <[email protected]>
pipenv: Add hooktest dev dependency.

Now that our output is good enough to avoid crashing HookTest,
add it as a development dependenc...

f2b66586abf139ed9845ee9479395f84d3e30053 authored about 5 years ago by Ralph Giles <[email protected]>
Write out CTS metadata for translations.

Instead of generating a ti:edition tag directly from
the Work object metadata, add CTS propertie...

592588e9330abcc9289a28916ab145df5ef400c4 authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Write separate files for each translation.

It makes sense from the point of view of cuneiform scholarship
to consider each transliteration, ...

7bb3a5c9eadcd60d7d1a3601c989c74baf73dc8b authored about 5 years ago by Ralph Giles <[email protected]>
cts: Remove object level refsDecl.

Don't include the object label (tablet, seal, etc.) in the declared
reference scheme. For the va...

40788ca8fbde0bcc89852828c527aa6ec6dd50ef authored about 5 years ago by Ralph Giles <[email protected]>
atf2tei: Don't write out empty translations.

We were always appending a translation part, just in case the
parser found a parallel translatio...

9cfb83cbc6bc16150a1497510e0bca003ab99484 authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Hoist os import.

This module is used in the convert function, but was only imported
by main, preventing atf2cts f...

d75681a44bd76e6c74d95fdbd4f84638d2744298 authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Don't print unserializable XML.

This was useful for testing, but sometimes the document itself
raises an exception when it's se...

28735e11d82efe44778c5aa477dfb24ce0778549 authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Fix log format nit.

Passing the line as a separate argument already inserts a space
so we don't need to do so manua...

a0ecc517594d94266c34a6bc4f212c3e6ba2f8af authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Remove unused variables.

These were left over from when we kept copies of all the processed
conversions.

ac9c17e3a83525f707490324535e68d4b408d2ef authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Correct failure reporting.

This wasn't updated properly after convert's return type changed.

4d005982b598d194e42c0c59c4c594601be232bf authored about 5 years ago by Ralph Giles <[email protected]>
Merge pull request #8 from rillian/tei2atf

Add tei2atf script

8eb1696d1913f1619fa8e4b24e4407a96a9b1600 authored about 5 years ago by Ralph Giles <[email protected]>
tei2atf: Add some basic tests.

Verify round-trip parseability of the SIL-034.atf example file.

Verify some features convertin...

6dd59960537f419afdd10c0ea0ce2dbcc983b529 authored about 5 years ago by Ralph Giles <[email protected]>
tei2atf: Restore CDLI idno extraction.

Explicitly check the idno element release against None. For some
reason the Element object on it...

ee7044ba35bb2206b9503dd98123d4212566ba0f authored about 5 years ago by Ralph Giles <[email protected]>
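The explicit `None` check in the idno commit above matters because an ElementTree `Element` without child elements has length zero, so a plain truth test can treat a found element as missing. A minimal sketch, assuming the idno is a direct child of the element being searched; `idno_text` and the sample markup are hypothetical and not taken from tei2atf:

```python
import xml.etree.ElementTree as ET


def idno_text(root):
    # find() returns None only when no matching element exists.
    idno = root.find('idno')
    # Compare against None explicitly: an element with text but no
    # child elements has len() == 0, so `if idno:` would skip it.
    if idno is not None:
        return idno.text
    return None


with_idno = ET.fromstring('<bibl><idno>P464358</idno></bibl>')
without_idno = ET.fromstring('<bibl/>')
print(idno_text(with_idno))
print(idno_text(without_idno))
```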
tei2atf: Handle the Iliad as well.

Serialize all the text inside each line, not just the immediate
element text. Also check for lin...

6a64834d19d099c08b814cdb23b54fcfa826f3f9 authored about 5 years ago by Ralph Giles <[email protected]>
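The difference between the immediate element text and all the text inside a line, as described in the commit above, can be illustrated with ElementTree. This is a sketch under assumptions: `line_text` is a hypothetical helper and the sample `<l>` element is invented for illustration.

```python
import xml.etree.ElementTree as ET


def line_text(line):
    # itertext() walks the element and all its descendants, yielding
    # every text and tail string in document order.
    return ''.join(line.itertext())


line = ET.fromstring('<l n="1">sing, <w>goddess</w>, the wrath</l>')
# line.text alone stops at the first child element.
print(repr(line.text))
print(repr(line_text(line)))
```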
tei2atf: Remove debug prints.

e3f4f5e2150d1adf67bdfd24491c6b7eb586ca0f authored about 5 years ago by Ralph Giles <[email protected]>
Add a rough tei2atf script.

Reverse the transformation, converting TEI XML into ATF.

- Works well enough to restore the SI...

ae5a5eca0e66f0b1e3d6e75e3dcbdc217e8a2d88 authored about 5 years ago by Ralph Giles <[email protected]>
Merge pull request #7 from rillian/work

Apply double-P-number urn scheme

1af7b5855c62a387c035e0b8866ff8355ec41399 authored about 5 years ago by Ralph Giles <[email protected]>
atf2tei: Propagate surface names from the parser.

Section objects may have an optional name, `Surface a` and so on.
We use the object name as the ...

6b2b62961bd88d37f8e917d112bff9c6e2b79ec2 authored about 5 years ago by Ralph Giles <[email protected]>
atf2tei: Remove scaffolding.

These docstrings and sketch code are no longer necessary.
Setting the urn and encodingDesc have ...

e13cf3b79ea536a8a91d20751cc77408c9fa1974 authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Use the P464358.P464358.cdli-akk.xml urn style.

Stop creating a test umbrella text group and let the converter
fall back to generating a textgro...

8e13e800794dbddd768be4fce64218f036504880 authored about 5 years ago by Ralph Giles <[email protected]>
atf2cts: Generate a textgroup if none is given.

For testing we've been placing all the converted documents in
a `test` textgroup. That's fine wh...

91439fc5bef0c5cc1463fdebf37f74a5e716b85e authored about 5 years ago by Ralph Giles <[email protected]>
Move file writing to the XML serialization mixin.

This allows better control over the formatting of the output.

d98b3fa30a4b975702c5327ca13baf48a91c98e8 authored about 5 years ago by Ralph Giles <[email protected]>
Move file writing to a shared function.

We don't need three copies of this code; it's clear enough from
the function name what's happening.

2202c9caaf8b9bf588e73abc687eb4cddeddabec authored about 5 years ago by Ralph Giles <[email protected]>
Set a cdli project component to edition urns.

Recommendations are for the edition urn to have a component
reflecting the project producing the...

aa12ca445e892da557bed7cff981adeffb824634 authored over 5 years ago by Ralph Giles <[email protected]>
Improve docstrings.

The cts serialization objects are about CTS elements.

4a3879cb806ab86c926c3b765fc9fbe790205342 authored over 5 years ago by Ralph Giles <[email protected]>
Improve test textgroup title.

e95b2e02e24d4a3fe2ee9e6d6432aa41a4fecf25 authored over 5 years ago by Ralph Giles <[email protected]>
Skip non-atf file lines.

Rather than asserting, this makes it more clear what's going on
if atf2cts is called on a non-at...

decc8e3b3a1e0c18a0f8d7ea176347f11362a2a5 authored over 5 years ago by Ralph Giles <[email protected]>
Log skipping unknown atf objects.

We're no longer writing these into document comments, and it's
easy to miss them as the number o...

2a7f11a225888c16543738fb23c291f99f02c00e authored over 5 years ago by Ralph Giles <[email protected]>
Fix invalid variable reference.

This comment string wasn't updated when I changed the name of
the loop variable so we were asser...

a44944e9a287258f9d728f8170704da8735aaa6d authored over 5 years ago by Ralph Giles <[email protected]>
Merge pull request #5 from rillian/tei2

Improve CTS compliance

4301b820d2c6e3f4504c85a6297d4f6c6f112c64 authored over 5 years ago by Ralph Giles <[email protected]>
Merge pull request #4 from rillian/tei

Use serialization classes to write out tei xml.

c85a960c965232eda675e181476d13fc5cfaa8cd authored over 5 years ago by Ralph Giles <[email protected]>
Append a project-language component to the edition urn.

We now pass the capitains_units.cts tests except for
Duplicate passages. This is perhaps confuse...

ca87c0327d7ed62275bc973e5e7370f4a7aae933 authored over 5 years ago by Ralph Giles <[email protected]>
Set the work urn on the Edition tag per EpiDoc.

Epidoc seems a better fit, and is what Perseus is moving to.

069ecf52f6bd1b450ec2034b20b129d158245ff9 authored over 5 years ago by Ralph Giles <[email protected]>
atf2cts: Restore language subpart of the work filename.

I don't think this is what we eventually want, but it restores
previous behaviour and matches th...

0c2370c39493fc5ff4993badda848b0c66b37405 authored over 5 years ago by Ralph Giles <[email protected]>
cts: Use literal xpath patterns.

The final element of the line pattern needs to look for an <l> not
a <div> so constructing them ...

42ba024e7214307bcd4cc75a9f93ea8c5519ae19 authored over 5 years ago by Ralph Giles <[email protected]>
atf2cts: Restore refsDecl markup.

CTS guidelines specify the header should contain patterns for
constructing xpath pointers based ...

df56c3b685d7bd54c010d29f495e8ea030d1a9b9 authored over 5 years ago by Ralph Giles <[email protected]>
Convert damage marks to half-brackets.

In atf these are marked by a trailing # on the sign, but in
presentation the sign should be in h...

682615a4579d3111e476baafe0231bca5355753d authored over 5 years ago by Ralph Giles <[email protected]>
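The damage-mark conversion described above (a trailing `#` on a sign becoming half-brackets around it) might look roughly like the following. This is a simplified sketch, not the project's implementation: the regular expression assumes a sign is any run of characters without whitespace, hyphens, or `#`, and `mark_damage` is a hypothetical name.

```python
import re

# Assumed simplification: a sign is a run of characters containing no
# whitespace, hyphen, or '#', immediately followed by the '#' flag.
DAMAGED = re.compile(r'([^\s#-]+)#')


def mark_damage(line):
    # U+2E22 and U+2E23 are the top left and top right half brackets.
    return DAMAGED.sub('\u2e22\\1\u2e23', line)


print(mark_damage('a-na# e2'))
```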
Remove determinative and logogram markup.

This doesn't survive the new xml generation and needs to be
replaced with code to generate an Element...

ab9015fd3049efc6e8317933a7bc42b61165f2ce authored over 5 years ago by Ralph Giles <[email protected]>
cts: Use the tei.XMLSerializer mixin.

Serialize CTS metadata objects the same way we do TEI objects.
Note this changes the indent on _...

4cd504fba48a9a02e818468508dc95625d4f776c authored over 5 years ago by Ralph Giles <[email protected]>
tei: indent xml output with spaces.

The default 8-space tabs are much too strong an indent for the
nesting of a typical tei document.

f2aed7848823d1dcef411ee5804c7ca8a83301da authored over 5 years ago by Ralph Giles <[email protected]>
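For comparison, the standard library's `xml.dom.minidom` pretty-printer indents with a tab by default and accepts an `indent` argument to use spaces instead. Whether the project serializes through minidom is an assumption here; `pretty` is a hypothetical helper and the two-space width is only illustrative.

```python
from xml.dom import minidom


def pretty(xml_text, indent='  '):
    # toprettyxml() defaults to indent='\t'; pass spaces to get a
    # shallower indent for deeply nested documents.
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


print(pretty('<TEI><text><body/></text></TEI>'))
```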
tei: Move xml serialization to a mixin.

Create a class for the shared __str__ implementation so the way
xml is serialized is all in one ...

3ba902f14f19725ca516b8c5eb2ca90d4cf829f1 authored over 5 years ago by Ralph Giles <[email protected]>
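A mixin that supplies a shared `__str__`, as the commit above describes, could be sketched like this. Only the name `XMLSerializer` appears in the log; the `xml` property, the `Line` class, and its constructor arguments are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


class XMLSerializer:
    # Shared serialization: subclasses provide an `xml` property
    # (a hypothetical name) that returns an ElementTree Element.
    def __str__(self):
        return ET.tostring(self.xml, encoding='unicode')


class Line(XMLSerializer):
    # Hypothetical model class for a single line of text.
    def __init__(self, n, text):
        self.n = n
        self.text = text

    @property
    def xml(self):
        element = ET.Element('l', n=str(self.n))
        element.text = self.text
        return element


print(str(Line(1, 'a-na')))
```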
atf2cts: Restore output path generation.

This brings the output back to near where it was before replacing
template-based xml generation ...

c344869e5ee2c76ab06bc5d0e8c3c2a91037a320 authored over 5 years ago by Ralph Giles <[email protected]>
cts: Re-order Work attributes.

Set these in a more logical order.

2d855090b362354cae5c159413e5ceb5fa29baad authored over 5 years ago by Ralph Giles <[email protected]>
tei: Move language setting into TextPart.

Although generic TextPart objects don't have a language, this
lets us inherit the code in both E...

8c7fa0515c431909df221eb4e820b2a05514e4c7 authored over 5 years ago by Ralph Giles <[email protected]>
tei: Set urn and xml:lang on the body tag.

This is general tei style, not epidoc, but add it now for
completeness. Attributes still need t...

aee9d0f9ceb0831b9df336dc117f569a5e851b89 authored over 5 years ago by Ralph Giles <[email protected]>
Pass the language through the Document object.

The parser gets this from the ATF and atf2cts needs it to construct
proper metadata files.

cc85890ab3f0c26d7a216961f1bc6cdc3bcd3cde authored over 5 years ago by Ralph Giles <[email protected]>
atf2cts: Return a flag tuple from convert.

Instead of the fancy class, just return a flag tuple. Since boolean
values are numbers these can...

f6027713b0177057b61032deafdaca72f5d29102 authored over 5 years ago by Ralph Giles <[email protected]>
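Because `bool` is a subclass of `int` in Python, tuples of flags can be totalled directly. The commit message above is truncated, so the following is only one plausible reading: the `(converted, failed)` layout and the `tally` helper are hypothetical.

```python
# Hypothetical per-block results as (converted, failed) flags.
results = [(True, False), (True, False), (False, True)]


def tally(results):
    # True counts as 1 and False as 0, so each column sums to a count.
    return tuple(sum(column) for column in zip(*results))


print(tally(results))
```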
atf2cts: Return Success and Failure classes from convert.

The convert function spawned in parallel to handle each atf
block individually was still return...

a26ed3c7cfa2216f7cc53354e25317e11a0963bc authored over 5 years ago by Ralph Giles <[email protected]>
Serialize document objects when exporting.

Now that atf2tei.convert returns a tei.Document instead of an
already-serialized string, we need...

d0fa49d4fb09cd6d523af150b5567f1b157f948e authored over 5 years ago by Ralph Giles <[email protected]>
Add append method to TextPart.

This is less confusing than remembering it's document.parts.append
vs textpart.children.append.

cc481683bee25141b5e2b3ecc81a2c09f96ba106 authored over 5 years ago by Ralph Giles <[email protected]>
Add setting the textpart n attribute at construction.

Simplifies the caller code since in epidoc we're pretty much
always setting this reference attri...

f8d497ed0743becc7f7a141a6ee1415efda34abe authored over 5 years ago by Ralph Giles <[email protected]>
Add more header information to tei.Header.

Support the same header information we include in the current
atf2tei template.

It would be nic...

8c2c7b76124638eb675b9c296453a2e68e12565e authored over 5 years ago by Ralph Giles <[email protected]>
Add a class for serializing lines of text.

144c87b0038be77f9db0a71c181f31affd80851d authored over 5 years ago by Ralph Giles <[email protected]>
atf2tei: Use single quotes for single lines.

These are single-line strings, so we don't need multi-line quotes.
I was trying to make it unifo...

e13cb9c02593b676d9558cb0ba6f37668ad2a39b authored over 5 years ago by Ralph Giles <[email protected]>
Correct <idno> tag environment.

This tag can't be used directly in a <sourceDesc> and needs to
be wrapped in something like a <b...

e59b9171e573529662b14b1584eb07ec3ee4e417 authored over 5 years ago by Ralph Giles <[email protected]>
Merge pull request #3 from rillian/parallel

Convert documents in parallel

9bd0d5c5cc935f5a1f6e981d01c096e2506f2f7f authored over 5 years ago by Ralph Giles <[email protected]>
Remove '--' logging prefix.

I added this to distinguish the progress message from other
debugging output, but there's less n...

e3e61f8328ec33e63dd3516f38343d21e8a45084 authored over 5 years ago by Ralph Giles <[email protected]>
Aggregate statistics for all converted files.

Move the failure/success report to the end, after all atf files
given on the command line have b...

d581029c3eefafdb06aa141eba0bdf53ad90f172 authored over 5 years ago by Ralph Giles <[email protected]>
Remove the conversion success bar.

It turns out we successfully convert around 95% of the current
cdli atf blocks, so the bar didn'...

f0cc67ba10fd94ffb68e813a01834079e2fc41b1 authored over 5 years ago by Ralph Giles <[email protected]>
Remove --workers switch.

I only needed this to benchmark, and the default (the number
of cores on the machine) works fine.

2ed29efdeba21141b955d0cc922f593a1ee1ab9e authored over 5 years ago by Ralph Giles <[email protected]>
Convert documents in parallel.

Dispatch the atf conversion work through ProcessPoolExecutor,
which speeds things up by about 4x...

0ae2380b87063bca9a7bf803230401d2b8d8c2d0 authored over 5 years ago by Ralph Giles <[email protected]>
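Dispatching conversion work through `ProcessPoolExecutor`, as in the commit above, follows a standard pattern. A minimal sketch, not the project's code: `convert_block` is a hypothetical stand-in for the real ATF conversion, and the sample strings are invented.

```python
from concurrent.futures import ProcessPoolExecutor


def convert_block(atf):
    # Hypothetical stand-in for the real ATF-to-TEI conversion.
    return atf.upper()


def convert_all(blocks):
    # With no max_workers argument the pool sizes itself from the
    # machine's processor count.
    with ProcessPoolExecutor() as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(convert_block, blocks))


if __name__ == '__main__':
    print(convert_all(['&P000001 = a', '&P000002 = b']))
```

The worker function is defined at module level so it can be pickled and sent to the worker processes, and the `__main__` guard keeps child processes from re-running the dispatch on platforms that start workers by importing the main module.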
Remove unnecessary include.

328763fcdf996ea1eb7d34d6ffdef25f1116d28d authored over 5 years ago by Ralph Giles <[email protected]>
Merge pull request #2 from rillian/travis

Update travis badge to use cdli-gh fork.

ae532424ff07b92ea7395eaa410630db0bf09208 authored over 5 years ago by Ralph Giles <[email protected]>
Merge pull request #1 from rillian/teitest

Verify tei attribute serialization.

c7c8cc4a13be64c616e2b33cd534e0532a956631 authored over 5 years ago by Ralph Giles <[email protected]>
Update travis badge to use cdli-gh fork.

Now that upstream is the cdli-gh repository, its status is the
more appropriate one to advertise.

decf1e0d7923fa46417a33a39673c6fe49e17602 authored over 5 years ago by Ralph Giles <[email protected]>
Verify tei attribute serialization.

Fix issues with the xml serialization class tests.

A Document sets an xml namespace attribute, ...

b69811fcec36f320f30e3d81760f8f7e9a69f035 authored over 5 years ago by Ralph Giles <[email protected]>
Move cts conversion into a function.

Cleans up a bit and makes it easier to experiment with the harness.

9c17ab45ae753621bf2633a783a1ef123a0e2f5d authored over 5 years ago by Ralph Giles <[email protected]>
Fix pycodestyle nit.

b826b4454ceb9eb86089cd892600687f37e784ad authored over 5 years ago by Ralph Giles <[email protected]>
Mark work metadata descriptions as English.

Hooktest requires there be a language marker. Default to English
since we don't have any languag...

8c78d9f5e561fd07cb79e62981cceef245850fb1 authored over 5 years ago by Ralph Giles <[email protected]>
Skip atf records which fail to convert.

Keep a list of failures and report the success ratio. I just wanted
to see how far along I was o...

4c82ffe95a5195028f228a7a9d114f9f94c73ad8 authored over 5 years ago by Ralph Giles <[email protected]>
Handle logogram markup on replaced signs.

Another possible punctuation sign was missing from the regex.

346afb5bfdb4c14b7ff8376c0a84e9d433679f16 authored over 5 years ago by Ralph Giles <[email protected]>
Close logogram markup on damaged and uncertain signs.

The regex was missing some possible trailing characters.
See for example P005984 and P292940.

4437782dba72dc6ff74812fc68e9f208f73ee6f8 authored over 5 years ago by Ralph Giles <[email protected]>
Start a tei module for serializing the XML text.

Add some classes to model the major pieces which can create an
ElementTree representation for s...

a7256f1d635ee1938c63ae5ce70046e20d62dbb6 authored over 5 years ago by Ralph Giles <[email protected]>
Merge branch 'master' of https://github.com/cdli-gh/atf2tei

3e6c649659fd4b3179079a08b3a6bef68a67ee3e authored over 5 years ago by Ralph Giles <[email protected]>
Set lang attributes on work title and label.

These are part of the Capitains Guidelines. Just default to
English for lack of a better option,...

ee34f3a04badeadfb1c79a83af12a493190cae02 authored over 5 years ago by Ralph Giles <[email protected]>
Revert <w> markup.

The _ atf markup sometimes spans multiple words, which causes
nesting problems if words are also...

b7fb93f880324fffd0d4e97a92e2322547e9e2bc authored over 5 years ago by Ralph Giles <[email protected]>
Wrap long lines.

Fix pycodestyle lint.

5db4097845cb5b89eee6bccecd4307ddbcd7f755 authored over 5 years ago by Ralph Giles <[email protected]>
Wrap transliteration words in <w>.

Better aligns with markup from the oracc adart conversions.

79116896c85a347930f28b5e875f6e9ffa286a6c authored over 5 years ago by Ralph Giles <[email protected]>
Use <c> for determinatives and logograms.

This aligns better with the oracc adart encodings.

39c77e399012f16822adbc673eabdf0a50e77c71 authored over 5 years ago by Ralph Giles <[email protected]>
Use line numbers from atf.

Previously we were generating sequential line numbers so the tei
labels were regular. But since ...

e3c33dca3b578694462e9a320a443d4c81089c97 authored over 5 years ago by Ralph Giles <[email protected]>
Remove a debug print.

This is working reliably now.

ece6a6706573bf42ff8d4df71f2238bf2a6dca69 authored over 5 years ago by Ralph Giles <[email protected]>
Serialize translations for Comment objects.

Currently pyoracc doesn't parse interlinear translations, so the
Translation object only shows u...

174a2c600a2fa5bbd3d0072ae4cd01f697f174f0 authored over 5 years ago by Ralph Giles <[email protected]>
Fix README formatting.

The triple-quote is redundant with the indent, and we don't
really need syntax highlighting, so ...

3f46b10c4906fb0aa8bf17809f49aee6afcb3502 authored over 5 years ago by Ralph Giles <[email protected]>
Move tests to a sub-module.

Reduces clutter in the top-level directory.

We seem to need the dummy __init__.py to make impor...

7f7e5eb35342450e250b2f8617802297d6305ce7 authored over 5 years ago by Ralph Giles <[email protected]>
Use classes to write out __cts__.xml.

Use cts.TextGroup and cts.Work to construct and write out the
corresponding metadata files.

Not...

2b826f5b7031ba1061e511784dc36cb1ff95b0b3 authored over 5 years ago by Ralph Giles <[email protected]>
Add classes to represent __cts__.xml index files.

This encapsulates the data needed for the Canonical Text
Services file hierarchy and makes it si...

26ef4ea1935df5d81e9f589dc64b7b24c171410a authored over 5 years ago by Ralph Giles <[email protected]>
Handle more atf markup cases.

Properly handle logogram markup next to punctuation.
Issues found with P464358 Codex Hammurapi c...

22f876b88cb84b784538a54e9fcbbae713d27b20 authored over 5 years ago by Ralph Giles <[email protected]>
Append lang to document filename.

This matches the lang suffix on the edition urn in the __cts__.xml
metadata file. I think this i...

09832c72368a890ddafaa947953d92cab5392a63 authored over 5 years ago by Ralph Giles <[email protected]>
Escape the title string in __cts__.xml.

The title can of course contain characters like '&'.

abfbe10e0baa1bf8273c6073484f313c0f7be66b authored over 5 years ago by Ralph Giles <[email protected]>
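Escaping a title before writing it into XML can be done with the standard library. This sketch assumes the title is inserted as element text; `title_element` and the sample title are hypothetical and not taken from the project.

```python
from xml.sax.saxutils import escape


def title_element(title):
    # escape() replaces '&', '<' and '>' with their XML entities.
    return '<ti:title>{}</ti:title>'.format(escape(title))


print(title_element('Gilgamesh & Enkidu'))
```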
test: Use itertools.repeat's built-in limit.

Calling islice is an unnecessary step here.

f87bb38680230287e0defe7f5754044bbb4f204b authored over 5 years ago by Ralph Giles <[email protected]>
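The simplification in the commit above relies on the optional second argument of `itertools.repeat`, which caps the number of repetitions. A short sketch with an invented sample value:

```python
from itertools import islice, repeat

block = '&P000001 = example'

# Both forms yield the same three copies; the second avoids islice.
via_islice = list(islice(repeat(block), 3))
via_limit = list(repeat(block, 3))

print(via_islice == via_limit)
```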
test: Add a parametrized test of the segmentor.

Verify multiple copies segment into the correct number of atf blocks.
This is redundant with `te...

030eeedb296187476a87fac31b36d6671d0cfc4c authored over 5 years ago by Ralph Giles <[email protected]>
test: Move the test filename to a variable.

Avoid repeating the name in each test.

669422e5ed5a43beae2ebc6ec4a4e7a6e7d28236 authored over 5 years ago by Ralph Giles <[email protected]>
Add a simple test for the atf segmentor.

Just verify that something comes out.

5c9f296eac09afa9052513e9f465c790a80394ad authored over 5 years ago by Ralph Giles <[email protected]>
Add atf2cts.py.

Wrapper around atf2tei which segments a multi-object atf file,
converts each segment to xml and ...

171d261da8f3b08736596ec7201e2521cbfc1a94 authored over 5 years ago by Ralph Giles <[email protected]>
Make atf2tei.convert take a string.

The pyoracc parser also takes a string, but this simplifies
passing in something from an externa...

bae374a2e5e3018d139f75fb2fd5f7e488a7f2b1 authored over 5 years ago by Ralph Giles <[email protected]>