Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/cdli-gh/atf2tei
TEI XML converter for CDLI ATF cuneiform tablet markup.
https://github.com/cdli-gh/atf2tei
`pipenv check` warns about CVE-2019-7653:
The CLI tools in RDFLib 4.2.2 can load Python modul...
234f270fb8a4cc1879b39f177f7e93e6190aeebe authored over 5 years ago by Ralph Giles <[email protected]>
Also add a unit test to verify extraction of the $-line.
The other option here is to use a <mil...
a29a4c06f6ecb121a01d589850aa08bf66b38de9 authored over 5 years ago by Ralph Giles <[email protected]>
My editor started warning about comment strings which weren't
explicitly docstrings, so convert ...
Verify common state annotations propagate into the tei model
classes as notes. Unfortunately seve...
Write out $ lines as TEI <note> elements, similar to what oracc
does, although they set addition...
Now that our output is good enough to avoid crashing HookTest,
add it as a development dependenc...
Instead of generating a ti:edition tag directly from
the Work object metadata, add CTS propertie...
It makes sense from the point of view of cuneiform scholarship
to consider each transliteration, ...
Don't include the object label (tablet, seal, etc.) in the declared
reference scheme. For the va...
We were always appending a translation part, just in case the
parser found a parallel translatio...
This module is used in the convert function, but was only imported
by main, preventing atf2cts f...
This was useful for testing, but sometimes the document itself
raises an exception when it's se...
Passing the line as a separate argument already inserts a space
so we don't need to do so manua...
These were left over from when we kept copies of all the processed
conversions.
This wasn't updated properly after convert's return type changed.
4d005982b598d194e42c0c59c4c594601be232bf authored over 5 years ago by Ralph Giles <[email protected]>
Add tei2atf script
8eb1696d1913f1619fa8e4b24e4407a96a9b1600 authored over 5 years ago by Ralph Giles <[email protected]>
Verify round-trip parseability of the SIL-034.atf example file.
Verify some features convertin...
6dd59960537f419afdd10c0ea0ce2dbcc983b529 authored over 5 years ago by Ralph Giles <[email protected]>
Explicitly check the idno element release against None. For some
reason the Element object on it...
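The message above is truncated, so the exact cause is not shown. One plausible explanation, assumed here, is that `xml.etree.ElementTree` elements without children have historically evaluated as false, so a bare `if element:` can skip an element that does exist. A minimal sketch, using a made-up header fragment and a hypothetical helper `idno_text`:

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragment; the real atf2tei output may differ.
HEADER = "<publicationStmt><idno>P000001</idno></publicationStmt>"


def idno_text(header_xml):
    """Return the text of the first <idno> element, or None if absent."""
    root = ET.fromstring(header_xml)
    idno = root.find("idno")
    # find() returns None when nothing matches, so compare against None
    # rather than relying on the element's truth value.
    if idno is not None:
        return idno.text
    return None
```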
Serialize all the text inside each line, not just the immediate
element text. Also check for lin...
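With `xml.etree.ElementTree`, an element's `.text` only covers the text before its first child, while `itertext()` walks every nested text node. The sketch below illustrates the difference; the sample line and the helper name `line_text` are illustrative assumptions, not taken from the repository.

```python
import xml.etree.ElementTree as ET


def line_text(line):
    """Join every text node inside the element, in document order."""
    return "".join(line.itertext())


# Hypothetical line with a nested highlight element.
line = ET.fromstring("<l>a-na <hi>lugal</hi> be-li2</l>")
immediate = line.text        # only the text before the first child
complete = line_text(line)   # includes child text and tail text
```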
e3f4f5e2150d1adf67bdfd24491c6b7eb586ca0f authored over 5 years ago by Ralph Giles <[email protected]>
Reverse the transformation, converting TEI XML into ATF.
- Works well enough to restore the SI...
ae5a5eca0e66f0b1e3d6e75e3dcbdc217e8a2d88 authored over 5 years ago by Ralph Giles <[email protected]>
Apply double-P-number urn scheme
1af7b5855c62a387c035e0b8866ff8355ec41399 authored over 5 years ago by Ralph Giles <[email protected]>
Section objects may have an optional name, `Surface a` and so on.
We use the object name as the ...
These docstrings and sketch code are no longer necessary.
Setting the urn and encodingDesc have ...
Stop creating a test umbrella text group and let the converter
fall back to generating a textgro...
For testing we've been placing all the converted documents in
a `test` textgroup. That's fine wh...
This allows better control over the formatting of the output.
d98b3fa30a4b975702c5327ca13baf48a91c98e8 authored over 5 years ago by Ralph Giles <[email protected]>
We don't need three copies of this code; it's clear enough from
the function name what's happening.
Recommendations are for the edition urn to have a component
reflecting the project producing the...
The cts serialization objects are about CTS elements.
4a3879cb806ab86c926c3b765fc9fbe790205342 authored over 5 years ago by Ralph Giles <[email protected]>
e95b2e02e24d4a3fe2ee9e6d6432aa41a4fecf25 authored over 5 years ago by Ralph Giles <[email protected]>
Rather than asserting, this makes it more clear what's going on
if atf2cts is called on a non-at...
We're no longer writing these into document comments, and it's
easy to miss them as the number o...
This comment string wasn't updated when I changed the name of
the loop variable so we were asser...
Improve CTS compliance
4301b820d2c6e3f4504c85a6297d4f6c6f112c64 authored over 5 years ago by Ralph Giles <[email protected]>
Use serialization classes to write out tei xml.
c85a960c965232eda675e181476d13fc5cfaa8cd authored over 5 years ago by Ralph Giles <[email protected]>
We now pass the capitains_units.cts tests except for
Duplicate passages. This is perhaps confuse...
Epidoc seems a better fit, and is what Perseus is moving to.
069ecf52f6bd1b450ec2034b20b129d158245ff9 authored over 5 years ago by Ralph Giles <[email protected]>
I don't think this is what we eventually want, but it restores
previous behaviour and matches th...
The final element of the line pattern needs to look for an <l> not
a <div> so constructing them ...
CTS guidelines specify the header should contain patterns for
constructing xpath pointers based ...
In atf these are marked by a trailing # on the sign, but in
presentation the sign should be in h...
This doesn't survive the new xml generation and needs to be
replaced with code to generate an Element...
Serialize CTS metadata objects the same way we do TEI objects.
Note this changes the indent on _...
The default 8-space tabs are much too strong an indent for the
nesting of a typical tei document.
Create a class for the shared __str__ implementation so the way
xml is serialized is all in one ...
This brings the output back to near where it was before replacing
template-based xml generation ...
Set these in a more logical order.
2d855090b362354cae5c159413e5ceb5fa29baad authored over 5 years ago by Ralph Giles <[email protected]>
Although generic TextPart objects don't have a language, this
lets us inherit the code in both E...
This is general tei style, not epidoc, but add it now for completeness.
Attributes still need t...
The parser gets this from the ATF and atf2cts needs it to construct
proper metadata files.
Instead of the fancy class, just return a flag tuple. Since boolean
values are numbers these can...
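The message is cut off before it says what the flags are used for. Assuming the point is counting, the sketch below relies on `bool` being a subclass of `int` in Python, so flag tuples can be summed column by column. The `convert_stub` function and the `(success, failure)` layout are hypothetical.

```python
def convert_stub(block):
    """Hypothetical stand-in for a converter returning (success, failure) flags."""
    ok = block.startswith("&P")
    return (ok, not ok)


blocks = ["&P000001 = A", "garbage", "&P000002 = B"]
flags = [convert_stub(block) for block in blocks]

# bool is a subclass of int, so each column of flags can be summed directly.
successes, failures = (sum(column) for column in zip(*flags))
```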
The convert function spawned in parallel to handle each atf
block individually was still return...
Now that atf2tei.convert returns a tei.Document instead of an
already-serialized string, we need...
This is less confusing than remembering it's document.parts.append
vs textpart.children.append.
Simplifies the caller code since in epidoc we're pretty much
always setting this reference attri...
Support the same header information we include in the current
atf2tei template.
It would be nic...
8c2c7b76124638eb675b9c296453a2e68e12565e authored over 5 years ago by Ralph Giles <[email protected]>
144c87b0038be77f9db0a71c181f31affd80851d authored over 5 years ago by Ralph Giles <[email protected]>
These are single-line strings, so we don't need multi-line quotes.
I was trying to make it unifo...
This tag can't be used directly in a <sourceDesc> and needs to
be wrapped in something like a <b...
Convert documents in parallel
9bd0d5c5cc935f5a1f6e981d01c096e2506f2f7f authored over 5 years ago by Ralph Giles <[email protected]>
I added this to distinguish the progress message from other
debugging output, but there's less n...
Move the failure/success report to the end, after all atf files
given on the command line have b...
It turns out we successfully convert around 95% of the current
cdli atf blocks, so the bar didn'...
I only needed this to benchmark, and the default (the number
of cores on the machine) works fine.
Dispatch the atf conversion work through ProcessPoolExecutor,
which speeds things up by about 4x...
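The following is a rough sketch of this kind of dispatch, not the repository's actual code: blocks are mapped over a `concurrent.futures.ProcessPoolExecutor`, each failure is captured rather than raised, and the success ratio is computed once everything has finished. The names `convert_block`, `try_convert` and `convert_all` are hypothetical, as is the stand-in conversion logic.

```python
from concurrent.futures import ProcessPoolExecutor


def convert_block(block):
    """Hypothetical stand-in for converting one atf block; raises on bad input."""
    if not block.startswith("&P"):
        raise ValueError("not an atf block")
    return "<TEI>" + block + "</TEI>"


def try_convert(block):
    """Return (block, result, error) so one failure doesn't abort the batch."""
    try:
        return block, convert_block(block), None
    except Exception as error:
        return block, None, error


def convert_all(blocks, executor_class=ProcessPoolExecutor):
    """Convert blocks in a worker pool, then summarise successes and failures."""
    # With no max_workers argument the pool sizes itself from the CPU count.
    with executor_class() as pool:
        results = list(pool.map(try_convert, blocks))
    failures = [(block, error) for block, _, error in results
                if error is not None]
    ratio = (len(results) - len(failures)) / len(results) if results else 0.0
    return results, failures, ratio
```

When run as a script, `convert_all` is best called under an `if __name__ == "__main__":` guard, because on some platforms the worker processes re-import the main module. The `executor_class` parameter is only there so a thread pool can be swapped in for quick experiments.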
328763fcdf996ea1eb7d34d6ffdef25f1116d28d authored over 5 years ago by Ralph Giles <[email protected]>
Update travis badge to use cdli-gh fork.
ae532424ff07b92ea7395eaa410630db0bf09208 authored over 5 years ago by Ralph Giles <[email protected]>
Verify tei attribute serialization.
c7c8cc4a13be64c616e2b33cd534e0532a956631 authored over 5 years ago by Ralph Giles <[email protected]>
Now that upstream is the cdli-gh repository, its status is the
more appropriate one to advertise.
Fix issues with the xml serialization class tests.
A Document sets an xml namespace attribute, ...
b69811fcec36f320f30e3d81760f8f7e9a69f035 authored over 5 years ago by Ralph Giles <[email protected]>
Cleans up a bit and makes it easier to experiment with the harness.
9c17ab45ae753621bf2633a783a1ef123a0e2f5d authored over 5 years ago by Ralph Giles <[email protected]>
b826b4454ceb9eb86089cd892600687f37e784ad authored over 5 years ago by Ralph Giles <[email protected]>
Hooktest requires there be a language marker. Default to English
since we don't have any languag...
Keep a list of failures and report the success ratio. I just wanted
to see how far along I was o...
Another possible punctuation sign was missing from the regex.
346afb5bfdb4c14b7ff8376c0a84e9d433679f16 authored over 5 years ago by Ralph Giles <[email protected]>
The regex was missing some possible trailing characters.
See for example P005984 and P292940.
Add some classes to model the major pieces which can create an
ElementTree representation for s...
3e6c649659fd4b3179079a08b3a6bef68a67ee3e authored over 5 years ago by Ralph Giles <[email protected]>
These are part of the Capitains Guidelines. Just default to
English for lack of a better option,...
The _ atf markup sometimes spans multiple words, which causes
nesting problems if words are also...
Fix pycodestyle lint.
5db4097845cb5b89eee6bccecd4307ddbcd7f755 authored over 5 years ago by Ralph Giles <[email protected]>
Better aligns with markup from the oracc adart conversions.
79116896c85a347930f28b5e875f6e9ffa286a6c authored over 5 years ago by Ralph Giles <[email protected]>
This aligns better with the oracc adart encodings.
39c77e399012f16822adbc673eabdf0a50e77c71 authored over 5 years ago by Ralph Giles <[email protected]>
Previously we were generating sequential line numbers so the tei
labels were regular. But since ...
This is working reliably now.
ece6a6706573bf42ff8d4df71f2238bf2a6dca69 authored over 5 years ago by Ralph Giles <[email protected]>
Currently pyoracc doesn't parse interlinear translations, so the
Translation object only shows u...
The triple-quote is redundant with the indent, and we don't
really need syntax highlighting, so ...
Reduces clutter in the top-level directory.
We seem to need the dummy __init__.py to make impor...
7f7e5eb35342450e250b2f8617802297d6305ce7 authored over 5 years ago by Ralph Giles <[email protected]>
Use cts.TextGroup and cts.Work to construct and write out the
corresponding metadata files.
Not...
2b826f5b7031ba1061e511784dc36cb1ff95b0b3 authored over 5 years ago by Ralph Giles <[email protected]>
This encapsulates the data needed for the Canonical Text
Services file hierarchy and makes it si...
Properly handle logogram markup next to punctuation.
Issues found with P464358 Codex Hammurapi c...
This matches the lang suffix on the edition urn in the __cts__.xml
metadata file. I think this i...
The title can of course contain characters like '&'.
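How the repository handles this is not shown here. As one illustration, the standard library's `xml.sax.saxutils.escape` replaces `&`, `<` and `>` before a string is placed into hand-built XML. The title below is a made-up example.

```python
from xml.sax.saxutils import escape

# Hypothetical title containing characters that are special in XML.
title = "Letters & Receipts <draft>"
safe = "<title>" + escape(title) + "</title>"
```

Building the document through `xml.etree.ElementTree` and assigning the title to an element's `.text` avoids manual escaping, since the serializer escapes text content itself.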
abfbe10e0baa1bf8273c6073484f313c0f7be66b authored over 5 years ago by Ralph Giles <[email protected]>
Calling islice is an unnecessary step here.
f87bb38680230287e0defe7f5754044bbb4f204b authored over 5 years ago by Ralph Giles <[email protected]>
Verify multiple copies segment into the correct number of atf blocks.
This is redundant with `te...
Avoid repeating the name in each test.
669422e5ed5a43beae2ebc6ec4a4e7a6e7d28236 authored over 5 years ago by Ralph Giles <[email protected]>
Just verify that something comes out.
5c9f296eac09afa9052513e9f465c790a80394ad authored over 5 years ago by Ralph Giles <[email protected]>
Wrapper around atf2tei which segments a multi-object atf file,
converts each segment to xml and ...
The pyoracc parser also takes a string, but this simplifies
passing in something from an externa...