django

Open Scriptures sandbox API server

For the past few months I've been hosting a "sandbox" server for the Open Scriptures API. The purpose is to provide a proof-of-concept for the code, and to ensure that we know how to make the code work in production. It has been an adventure, since the code and its requirements have changed many times.

Tonight I reached an important milestone with the API server: I am now serving the API with the full Tischendorf Greek New Testament text, using MySQL as the database. Previously I had been hosting only the book of Jude, using SQLite for the database. So now I have a fully functioning instance, with a full dataset, in a real production environment.

As an example, you can visit here: http://api.ossandbox.info/texts/passage/Bible.Tischendorf:2Thess.1.1-2Thess.1.3 (that's 2 Thessalonians 1:1-3 in an XML format).
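For anyone who wants to build such links from a script, here is a minimal sketch that assembles a passage URL in the same shape as the example above. The function name `passage_url` and its parameters are my own invention for illustration; only the base URL and the `Work:Start-End` pattern come from the example link, and I am assuming other passages follow the same pattern.

```python
from urllib.parse import quote

# Base URL copied from the example link; everything else here is hypothetical.
BASE_URL = "http://api.ossandbox.info/texts/passage/"


def passage_url(work, start, end):
    """Build a passage URL of the form <base><work>:<start>-<end>."""
    reference = f"{work}:{start}-{end}"
    # Leave ':' unescaped so the result matches the shape of the example link.
    return BASE_URL + quote(reference, safe=":")


url = passage_url("Bible.Tischendorf", "2Thess.1.1", "2Thess.1.3")
print(url)
```

The sketch only builds the string; it does not contact the server.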

If you are interested in the project and would like to know more, check out our mailing list, and visit us on IRC in #openscriptures on irc.freenode.net.

(For the technically inclined, Open Scriptures is a group of Django apps, bundled in Pinax. It is being served by Apache and mod_wsgi. Thanks especially to Patrick Altman and Brian Rosner for helping me get things up and running.)

First Open Scriptures Hackfest

Over the long weekend I got together with Weston, and we did some work on Open Scriptures. This resulted in a storm of commits and a much better code base. We have been fortunate enough to be joined by Patrick Altman, whose experience with Django led to some immediate improvements in the structure and functionality of the code. This in turn has inspired other work, and the project as a whole is moving at a decent pace at the moment.

We are hanging out in #openscriptures on irc.freenode.net. If you would like to drop in and learn more about the project, feel free to join the channel.

Derived classes: not so fast

For the Open Scriptures API I was trying to find an elegant way to store parsing information in a language-agnostic way in a Django model. To achieve that end, I chose to use derived classes. My thought was that we could use a generic class (TokenParsing) as an abstraction layer through which the language-specific parsing models could be connected to our metadata class (TokenMeta). Those parsings could then be retrieved using this bit of magic:

TokenParsing.objects.filter(tokenmeta=self)

I thought that this would return a list of all the objects subclassing TokenParsing that link to the metadata object. However, when a subclassed object is created in Python and stored through Django, Django actually creates two instances, one of the base class and one of the derived class. So this query returns only the base class objects, which have no useful information in them. In other words, the derived, language-specific parsing class (e.g. TokenParsing_grc) can have a link to the TokenMeta object, but there is no way to traverse that link backwards from the TokenMeta object. Without that, the API would not be able to query metadata properly.

I poked around at other possible methods for achieving this (e.g. abstract base classes), but none of them would achieve my goal. So I have decided to revert to the original method. As it turns out, that is not a bad thing. In my quest to find a way to express parsings in a language-agnostic way, I overlooked one glaring problem: a language-agnostic solution would be useless. This came home to me while I was searching for a solution to this problem.
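To make the "two instances" behaviour a little more concrete, here is a rough sketch using Python's built-in sqlite3 module instead of Django itself. It mimics the kind of layout that inheriting from a concrete model produces: one table for the base class and a second table for the derived class. The column names (`tense`, `voice`, `tokenparsing_ptr_id`) and the sample values are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tokenmeta (id INTEGER PRIMARY KEY);

    -- Base-class table: holds only the link to the metadata row.
    CREATE TABLE tokenparsing (
        id INTEGER PRIMARY KEY,
        tokenmeta_id INTEGER NOT NULL REFERENCES tokenmeta(id)
    );

    -- Derived-class table: the language-specific fields live here
    -- (hypothetical columns).
    CREATE TABLE tokenparsing_grc (
        tokenparsing_ptr_id INTEGER PRIMARY KEY REFERENCES tokenparsing(id),
        tense TEXT,
        voice TEXT
    );
""")

# Saving one derived object amounts to writing one row in each table.
conn.execute("INSERT INTO tokenmeta (id) VALUES (1)")
conn.execute("INSERT INTO tokenparsing (id, tokenmeta_id) VALUES (1, 1)")
conn.execute(
    "INSERT INTO tokenparsing_grc (tokenparsing_ptr_id, tense, voice) "
    "VALUES (1, 'aorist', 'middle')"
)

# Querying the base table by metadata id yields only the base columns.
base_rows = conn.execute(
    "SELECT * FROM tokenparsing WHERE tokenmeta_id = 1"
).fetchall()

# The parsing details sit in the derived table, which the query above never touches.
grc_rows = conn.execute("SELECT tense, voice FROM tokenparsing_grc").fetchall()

print(base_rows)
print(grc_rows)
```

The base-table query comes back with nothing but ids, which matches what I saw: the rows exist, but the useful fields are somewhere else.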

Morphological v. Semantic Parsing and Databases

I proposed an initial Django model for storing Greek parsing data on the Open Scriptures mailing list, and it has generated a good amount of discussion. The central question is whether we should follow the traditional yet problematic morphological parsing paradigm, or whether we should seek to implement a semantic paradigm. Mike Aubrey has written some good posts on the problems with the traditional paradigm (e.g. Robertson on the middle and passive voice). Luckily, with Django we can have an arbitrary number of parsing models for any given word. So from a technical standpoint, it is not a question of which model to choose, so long as each model can be sensibly reduced to database fields.

From a grammatical point of view, I have mixed feelings. I think that there are some real problems with the traditional system, especially in terms of its terminology and treatment of "tense" and voice. I think there is some value in purely morphological descriptions (especially insofar as they provide an objective description of the word), but that should not be the end-all of understanding a word. And I tend to agree that the introduction of a new technology paradigm (i.e. the Open Scriptures API) may be a good time to introduce new parsing paradigms. Still, most people who have learned Greek are rooted in the traditional paradigm, so Open Scriptures should contain parsing information they can understand. Also, there are many existing datasets using the traditional paradigm which we would like to import and utilize. So I think it best for Open Scriptures to be able to store the morphological parsings (though not to the exclusion of other paradigms).
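As a loose illustration of "an arbitrary number of parsing models for any given word", here is a small sketch in plain Python rather than Django. The class names (`Word`, `Parsing`), the `parsings_for` helper, and the field values are all hypothetical; they are not the models proposed on the mailing list.

```python
from dataclasses import dataclass, field


@dataclass
class Parsing:
    paradigm: str  # e.g. "morphological" or "semantic" (illustrative labels)
    fields: dict   # the values that would become database fields


@dataclass
class Word:
    text: str
    parsings: list = field(default_factory=list)

    def parsings_for(self, paradigm):
        """Return every parsing attached to this word under one paradigm."""
        return [p for p in self.parsings if p.paradigm == paradigm]


word = Word("λύεται")
# A traditional, form-based description (values are illustrative only).
word.parsings.append(
    Parsing("morphological",
            {"tense": "present", "voice": "middle/passive", "mood": "indicative"})
)
# A second description under a different paradigm can sit alongside it.
word.parsings.append(Parsing("semantic", {"aspect": "imperfective"}))

print([p.paradigm for p in word.parsings])
```

Nothing in this arrangement forces a choice between paradigms, since each parsing is just another record attached to the same word.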

Scripture and APIs

I've been having some correspondence on the Open Scriptures mailing list. Weston has been working on implementing database models for an API using Django. One of the most challenging aspects has been finding out how to attach structural information to the text: verses, chapters, title headings, etc. There's also been a desire to not rely on any particular structural marker in the database. So the base unit for storing the text is what is called a Token in the project. Each Token is one of the three atomic structures of a text: word, punctuation, or whitespace. Of course, there may be cases where even the basic Token can be split, but you've got to start somewhere.

To provide structure, Weston has proposed a Token linkage system, where you can record a certain structure (e.g. "Verse 12") and, using the features of a relational database, connect it to the tokens which should be included in that structure. There is even a feature for non-linear token linkages, if anyone finds a use for that.

I am optimistic about the potential of this particular project. Once the API is nailed down, there will be a lot of great opportunities for "client" apps, using whatever framework they wish. Until then, the API has to be finalized and garnished with built-in methods, and the models have to be tested with real data (which requires that the data be ported to the models in the first place). At any rate, it's a good time to be interested in the scriptures and open source software. My experience with databases is not the strongest, but I am pleased that the project is using a Python framework, since it is my best language. But it's also fun to bring my education to bear on technical problems - sort of a perfect storm of personal interest for me.
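The Token and linkage idea can be sketched in a few lines of plain Python. This is only my own illustration of the concept, not Weston's actual models: the names `Token`, `Structure`, `tokenize`, and `text_of`, the regular expression, and the sample sentence are all assumptions.

```python
import re
from dataclasses import dataclass, field

# One alternative per atomic token type; exactly one named group matches each time.
TOKEN_RE = re.compile(
    r"(?P<word>\w+)|(?P<whitespace>\s+)|(?P<punctuation>[^\w\s]+)"
)


@dataclass
class Token:
    position: int
    kind: str  # "word", "whitespace", or "punctuation"
    data: str


@dataclass
class Structure:
    label: str
    token_positions: list = field(default_factory=list)


def tokenize(text):
    """Split text into word, whitespace, and punctuation tokens."""
    return [
        Token(i, match.lastgroup, match.group())
        for i, match in enumerate(TOKEN_RE.finditer(text))
    ]


def text_of(structure, tokens):
    """Rebuild the text of a structure from the tokens it links to."""
    return "".join(tokens[p].data for p in structure.token_positions)


sample = "In the beginning, God created."
tokens = tokenize(sample)

# A linear structure covering every token, and a smaller one covering a phrase.
verse = Structure("Verse 1", list(range(len(tokens))))
phrase = Structure("Opening phrase", [0, 1, 2, 3, 4])

# A non-linear linkage simply lists positions that are not contiguous.
scattered = Structure("Non-linear example", [7, 9])

print(text_of(verse, tokens))
print(text_of(phrase, tokens))
print([tokens[p].data for p in scattered.token_positions])
```

Because a structure is just a label plus a set of links, the text itself never has to carry verse or chapter markers.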
