I proposed an initial Django model for storing Greek parsing data on the Open Scriptures mailing list, and it has generated a good amount of discussion. The central question is whether we should follow the traditional yet problematic morphological parsing paradigm, or whether we should seek to implement a semantic paradigm. Mike Aubrey has written some good posts on the problems with the traditional paradigm (e.g. Robertson on the middle and passive voice).
Luckily, with Django we can have an arbitrary number of parsing models for any given word. So from a technical standpoint, it is not a question of choosing one model over another; any model will work, so long as it can be sensibly reduced to database fields.
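To make the idea of several parsings per word concrete, here is a minimal sketch in plain Python rather than Django, so it does not commit to any particular storage design. The class names (`Word`, `Parsing`), the scheme names, and the second scheme's `aspect` field are all assumptions made up for illustration, not the model I proposed on the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Parsing:
    """One description of a word under a named scheme (hypothetical structure)."""
    scheme: str              # e.g. "traditional"; scheme names are illustrative
    fields: Dict[str, str]   # flat key/value pairs, i.e. reducible to database fields


@dataclass
class Word:
    """A word form that can carry any number of parsings side by side."""
    text: str
    parsings: List[Parsing] = field(default_factory=list)

    def parsing_for(self, scheme: str) -> Optional[Parsing]:
        """Return the first parsing stored under the given scheme, if any."""
        for parsing in self.parsings:
            if parsing.scheme == scheme:
                return parsing
        return None


# The traditional parsing of λύω: present active indicative, 1st person singular.
luo = Word("λύω")
luo.parsings.append(Parsing("traditional", {
    "tense": "present",
    "voice": "active",
    "mood": "indicative",
    "person": "1",
    "number": "singular",
}))

# A second, purely illustrative scheme stored alongside the first.
luo.parsings.append(Parsing("aspect-based", {"aspect": "imperfective"}))
```

The point is only that the word does not care how many schemes describe it: each parsing is a flat set of fields tagged with the scheme it belongs to.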
From a grammatical point of view, I have mixed feelings. I think that there are some real problems with the traditional system, especially in terms of its terminology and treatment of “tense” and voice. I think there is some value in purely morphological descriptions (especially insofar as they provide an objective description of the word), but that should not be the be-all and end-all of understanding a word. And I tend to agree that the introduction of a new technology paradigm (i.e. the Open Scriptures API) may be a good time to introduce new parsing paradigms.
Still, most people who have learned Greek are rooted in the traditional paradigm, so Open Scriptures should contain parsing information they can understand. Also, there are many existing datasets using the traditional paradigm which we would like to import and utilize. So I think it best for Open Scriptures to be able to store the morphological parsings (though not to the exclusion of other paradigms). It was suggested that we might be able to provide an automated mapping between different paradigms. Assuming there is a consistent correlation between the two schemes, that should not be overly difficult. If not, someone will need to generate a new database of parsings, and that will be no small task.
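As a rough sketch of what an automated mapping might look like, assume the traditional parsing is a flat dictionary of fields and that part of the correspondence can be written as a lookup table. The function name `map_parsing`, the table, and the target `aspect` field are hypothetical; the table is deliberately incomplete, because tenses without an agreed counterpart are exactly the cases that would need a hand-built database.

```python
from typing import Dict, Optional

# Illustrative, incomplete correspondence between traditional tense labels
# and an aspect label in a hypothetical second scheme. Tenses left out of
# the table are treated as having no consistent mapping.
TENSE_TO_ASPECT: Dict[str, str] = {
    "present": "imperfective",
    "imperfect": "imperfective",
    "aorist": "perfective",
}


def map_parsing(traditional: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Translate a traditional parsing into the second scheme.

    Returns None when the tense has no entry in the table, which signals
    that the word needs a manually created parsing instead.
    """
    aspect = TENSE_TO_ASPECT.get(traditional.get("tense", ""))
    if aspect is None:
        return None
    # Carry every other field over unchanged and swap tense for aspect.
    mapped = {key: value for key, value in traditional.items() if key != "tense"}
    mapped["aspect"] = aspect
    return mapped


automatic = map_parsing({"tense": "aorist", "voice": "active"})
needs_review = map_parsing({"tense": "perfect", "voice": "active"})
```

Here `automatic` comes back as a mapped dictionary, while `needs_review` is `None` because the table has no entry for the perfect. How many words end up in that second bucket is what decides whether the mapping is a small job or a large one.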
Update: It turns out that Django derived classes are not the answer to this problem. I’ll elaborate later.
