Archive for the ‘technology’ Category

Odds and ends

Sunday, March 7th, 2010
  • I decided to freshen up the look of my eeePC 900HD (running Ubuntu Netbook Remix). [Image: eeePC screenshot]
  • I really like Linux Mint. You should check it out, and possibly install it on your computer.
  • Star Trek: The Next Generation was the best television series of all time.
  • The Chinese word for “crisis” is composed of elements that signify “cliche” and “etymological fallacy.”
  • Eric Gagne is suiting up as a Dodger for spring training. Good luck to him; I’d love to see him back on the team.
  • I decided to finally and permanently retire my Apple Dual 1.0 GHz G4 Powermac (mirrored drive door). It served faithfully running Gentoo Linux for a long time, but it was time to move on.

Derived classes: not so fast

Tuesday, February 23rd, 2010

For the Open Scriptures API I was trying to find an elegant way to store parsing information in a language-agnostic way in a Django model. To achieve that end, I chose to use derived classes. My thought was that we could use a generic class (TokenParsing) as an abstraction layer through which the language-specific parsing models could be connected to our metadata class (TokenMeta). Those parsings could then be retrieved using this bit of magic:

TokenParsing.objects.get(tokenmeta = self)

I thought that this would return a list of all the objects of classes derived from TokenParsing that linked to the metadata object. However, when an object of a derived class is created in Python and stored through Django, Django actually creates two records: one for the base class and one for the derived class. So this query returns only the base class objects, which have no useful information in them. In other words, the derived, language-specific parsing class (e.g. TokenParsing_grc) can have a link to the TokenMeta object, but there is no way to traverse that link backwards from the TokenMeta object. Without that, the API would not be able to query metadata properly.
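To make the problem concrete, here is a minimal sketch that mimics the table layout with Python's built-in sqlite3 module rather than Django itself. The table and column names (tokenparsing_ptr_id, tense, voice) are assumptions chosen for illustration, not the project's actual schema.

```python
import sqlite3

# In-memory database standing in for the tables an ORM would create.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tokenmeta (id INTEGER PRIMARY KEY);

    -- Base-class table: only the fields declared on TokenParsing.
    CREATE TABLE tokenparsing (
        id INTEGER PRIMARY KEY,
        tokenmeta_id INTEGER REFERENCES tokenmeta(id)
    );

    -- Derived-class table: a one-to-one pointer back to the base row,
    -- plus the language-specific fields (hypothetical names).
    CREATE TABLE tokenparsing_grc (
        tokenparsing_ptr_id INTEGER PRIMARY KEY REFERENCES tokenparsing(id),
        tense TEXT,
        voice TEXT
    );
""")

# Saving one derived object writes a row to BOTH tables.
conn.execute("INSERT INTO tokenmeta (id) VALUES (1)")
conn.execute("INSERT INTO tokenparsing (id, tokenmeta_id) VALUES (1, 1)")
conn.execute("INSERT INTO tokenparsing_grc VALUES (1, 'aorist', 'middle')")

# Querying the base table alone yields no language-specific data.
base_rows = conn.execute(
    "SELECT * FROM tokenparsing WHERE tokenmeta_id = ?", (1,)
).fetchall()

# The useful fields only appear once the specific derived table is named.
joined_rows = conn.execute(
    """
    SELECT p.id, g.tense, g.voice
    FROM tokenparsing AS p
    JOIN tokenparsing_grc AS g ON g.tokenparsing_ptr_id = p.id
    WHERE p.tokenmeta_id = ?
    """,
    (1,),
).fetchall()

print(base_rows)
print(joined_rows)
```

The second query works only because it names tokenparsing_grc explicitly, which is exactly the language-specific knowledge the generic class was meant to hide.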

I poked around at other possible methods for achieving this (e.g. abstract base classes), but none of them would achieve my goal. So I have decided to revert to the original method.

But as it turns out, that is not a bad thing. In my quest to find a way to express parsings in a language-agnostic way, I overlooked one glaring problem: a language-agnostic solution would be useless. This came home when I was searching for a solution to this problem:

Why would this be useful? If you don’t know what class you want, you also won’t know which methods to call or which attributes can be inspected.

Of course! In order to make use of the real parsings (their methods, attributes, etc.), the API code would have to know the specifics of the models. “Language-agnostic” models would be useless. Since the API already has to be customized to some extent for each language, there is no point in having a generic TokenParsing class to clutter things up.

Fool me once.

Morphological v. Semantic Parsing and Databases

Monday, February 22nd, 2010

I proposed an initial Django model for storing Greek parsing data on the Open Scriptures mailing list, and it has generated a good amount of discussion. The central question is whether we should follow the traditional yet problematic morphological parsing paradigm, or whether we should seek to implement a semantic paradigm. Mike Aubrey has written some good posts on the problems with the traditional paradigm (e.g. Robertson on the middle and passive voice).

Luckily with Django we can have an arbitrary number of parsing models for any given word. So from a technical standpoint, it is not a question of which model, so long as that model can be sensibly reduced to database fields.

From a grammatical point of view, I have mixed feelings. I think that there are some real problems with the traditional system, especially in terms of its terminology and treatment of “tense” and voice. I think there is some value in purely morphological descriptions (especially insofar as they provide an objective description of the word), but that should not be the be-all and end-all of understanding a word. And I tend to agree that the introduction of a new technology paradigm (i.e. the Open Scriptures API) may be a good time to introduce new parsing paradigms.

Still, most people who have learned Greek are rooted in the traditional paradigm, so Open Scriptures should contain parsing information they can understand. Also, there are many existing datasets using the traditional paradigm which we would like to import and utilize. So I think it best for Open Scriptures to be able to store the morphological parsings (though not to the exclusion of other paradigms). It was suggested that we might be able to provide an automated mapping between different paradigms. Assuming there is a consistent correlation between the two schemes, that should not be overly difficult. If not, someone will need to generate a new database of parsings, and that will be no small task.
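The automated mapping idea can be sketched in a few lines of Python. Everything here is hypothetical: the label pairs are placeholders invented for illustration, not real parsing data or an agreed semantic scheme. The point is only that a mapping can be automated where each traditional label corresponds to exactly one label in the other scheme, and that the cases where it does not are easy to detect.

```python
from collections import defaultdict

# Hypothetical (traditional label, semantic label) pairs, as they might be
# collected from a sample of words tagged under both schemes.
observed = [
    ("active", "agent-subject"),
    ("passive", "patient-subject"),
    ("middle", "subject-affected"),
    ("middle", "reflexive"),
]


def build_mapping(pairs):
    """Split labels into those that map consistently and those that do not."""
    targets = defaultdict(set)
    for traditional, semantic in pairs:
        targets[traditional].add(semantic)

    # One target only: safe to convert automatically.
    mapping = {t: next(iter(s)) for t, s in targets.items() if len(s) == 1}
    # Several targets: no consistent correlation, needs human tagging.
    ambiguous = {t: sorted(s) for t, s in targets.items() if len(s) > 1}
    return mapping, ambiguous


mapping, ambiguous = build_mapping(observed)
print(mapping)
print(ambiguous)
```

Any label that lands in the ambiguous group is where the "no small task" of re-tagging by hand would begin.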

Update: It turns out that Django derived classes are not the answer to this problem. I’ll elaborate later.

Scripture and APIs

Thursday, February 18th, 2010

I’ve been having some correspondence on the Open Scriptures mailing list. Weston has been working on implementing database models for an API using Django. One of the most challenging aspects has been finding out how to provide structural information to the text: verses, chapters, title headings, etc. There’s also been a desire to not rely on any particular structural marker in the database. So the base unit for storing the text is what is called a Token in the project. It consists of one of the three atomic structures of a text: a word, punctuation, or whitespace. Of course, there may be cases where even the basic Token can be split, but you’ve got to start somewhere.

To provide structure, Weston has proposed a Token linkage system, where you can record a certain structure (e.g. “Verse 12”) and, using the features of a relational database, connect it to the tokens which should be included in that structure. There is even a feature for non-linear token linkages, if anyone finds a use for that.
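Here is a rough sketch of the token and linkage idea in plain Python, using dataclasses instead of Django models so that it runs on its own. The class and field names (TokenType, Token, Structure, token_positions) are my own assumptions for illustration and not the project's actual models; in a relational database the list of positions would be a many-to-many link table.

```python
from dataclasses import dataclass, field
from enum import Enum


class TokenType(Enum):
    WORD = "word"
    PUNCTUATION = "punctuation"
    WHITESPACE = "whitespace"


@dataclass(frozen=True)
class Token:
    position: int      # order of the token within the text
    type: TokenType
    data: str


@dataclass
class Structure:
    """A named structure (verse, chapter, heading) linked to tokens."""
    label: str
    token_positions: list = field(default_factory=list)

    def text(self, tokens):
        # Look tokens up by position so the linked tokens need not be adjacent.
        by_position = {t.position: t for t in tokens}
        return "".join(by_position[p].data for p in self.token_positions)


tokens = [
    Token(0, TokenType.WORD, "In"),
    Token(1, TokenType.WHITESPACE, " "),
    Token(2, TokenType.WORD, "the"),
    Token(3, TokenType.WHITESPACE, " "),
    Token(4, TokenType.WORD, "beginning"),
    Token(5, TokenType.PUNCTUATION, ","),
]

# A contiguous structure: every token in order.
verse = Structure("Verse 1", [0, 1, 2, 3, 4, 5])
# A non-linear linkage: tokens that are not next to each other.
nonlinear = Structure("Example non-linear link", [0, 4])

print(repr(verse.text(tokens)))
print(repr(nonlinear.text(tokens)))
```

Because the structure only stores references to tokens, the text itself never needs to contain verse or chapter markers.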

I am optimistic about the potential of this particular project. Once the API is nailed down, there will be a lot of great opportunities for “client” apps, using whatever framework they wish. Until then, the API has to be finalized and garnished with built-in methods, and the models have to be tested with real data (which requires that the data be ported to the models in the first place). At any rate, it’s a good time to be interested in the scriptures and open source software. My experience with databases is not the strongest, but I am pleased that the project is using a Python framework, since it is my best language. But it’s also fun to bring my education to bear on technical problems – sort of a perfect storm of personal interest for me.

Note: This post has been adapted and cross-posted on the Open Scriptures blog.

Grammar and the machine (links)

Monday, February 1st, 2010

From around the internet:

The last two are thanks to Jesus Radicals. There is of course some irony in blogging about an online video containing a critique of technology.

Browser wars heat up

Monday, January 11th, 2010

I just benchmarked Firefox against Chromium on the SunSpider JavaScript tests on my eeePC 900 running Ubuntu Netbook Remix:

  • Firefox: 6034.2ms +/- 1.6%
  • Chromium: 2537.8ms +/- 2.3%

Looks like I have a new browser. This actually follows my recent conversion from Safari to Chrome on Mac. Firefox has good extensions and nostalgia, but they better turn up the heat on development.
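For the curious, the gap between the two scores above can be worked out with a few lines of Python. SunSpider reports total time, so lower is better. The "conservative" figure is my own assumption about how to read the +/- margins: it takes Firefox at its fastest and Chromium at its slowest.

```python
# SunSpider totals quoted above (milliseconds, lower is better).
firefox_ms = 6034.2
chromium_ms = 2537.8

# Reported +/- margins converted from percentages to milliseconds.
firefox_margin_ms = firefox_ms * 1.6 / 100
chromium_margin_ms = chromium_ms * 2.3 / 100

# How many times faster Chromium finished the suite.
speedup = firefox_ms / chromium_ms

# Least favourable reading for Chromium within the stated margins.
conservative_speedup = (firefox_ms - firefox_margin_ms) / (
    chromium_ms + chromium_margin_ms
)

print(f"Chromium is about {speedup:.1f}x faster")
print(f"Conservative estimate: about {conservative_speedup:.1f}x")
```

Even on the least favourable reading, Chromium finishes the suite more than twice as fast on this machine.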

The truth about content management systems

Monday, January 11th, 2010

Drupal is an excellent CMS, but nothing can beat WordPress for blogging. So I’ve decided to move back to the familiar old friend WordPress for my blog. Any substantive content will still be posted on the Compositions section of my site. The blog will be reserved for questions, comments, concerns, gestures, and other ephemera.

The Berry Blogger's Dilemma

Wednesday, August 26th, 2009

There is something slightly embarrassing for those readers of Wendell Berry who first discovered his work on the internet. I myself fit into this category. It is a sure sign of being a Berry neophyte (note the agricultural metaphor), since someone who is initiated into his thought would know better than to approach his work through an electronic medium.

The reason for this is twofold. First, Berry himself has chosen not to use computers. He rejects the premise that computers increase the quality of writing. I believe that we can infer that his opinion of the internet and blogging would fare no better than the technology upon which they are based. There is something supremely ironic about reading about a man’s case against the computer on the internet.

A second reason can be derived from Berry’s thoughts on energy. Berry is a conservationist. He does not seem to mind, however, writing and purchasing works printed on the remains of trees. As an important conveyor of our culture (which he values quite highly), books are a worthy expenditure of natural resources. Another factor in favor of printed books as a medium is that they are durable. That is, one book, if cared for and shared liberally, might spread its value to many people over many years. I suspect that a calf-skin codex would be even better in Berry’s estimation, since it could even last 1,600 years and bless millions.

The internet as a storage medium is quite the opposite of books with respect to energy. A Wendell Berry article is in no way durable when conveyed electronically. Information online is ephemeral. Rather than requiring a fixed amount of resources at production like a book, an online article requires electricity each time it is accessed, even if by the same individual. This electricity is typically generated in an unsustainable and polluting manner (both are anathema to Berry). Therefore I must come to the uncomfortable conclusion that Berry himself might condemn the reading of his articles online. I would hazard to guess that he is blissfully ignorant, however.

So this is my formulation of the Berry bloggers’ dilemma: to blog about Wendell Berry is to contradict his writings. Indeed, if I myself become a full-fledged adherent of Berry’s thought, I would have no choice but to quit blogging and disconnect from the internet entirely. The internet is a terrible example of an increasing volume of decreasingly useful information being disseminated to an increasingly large audience, all at the expense of non-renewable energy. So if I one day disappear completely from the internet, blame first Wendell Berry.

Open Scriptures

Wednesday, April 8th, 2009

I think I have a new favorite software project. Open Scriptures is

a repository for Biblical manuscripts and their translations, and a system for storing the differences between manuscripts and their relationships to versions expressed by semantic links: it seeks to represent the textual transmission of the Bible and, on top of this foundation:

  1. supply an Open interface for querying interlinked scriptural data,
  2. store derived data in an internationalized (i18n) and translation-neutral manner, and
  3. provide an application platform for mashing this data up into scriptural applications with a framework for discussion and collaboration.

In addition to a software platform, I think serious consideration should be given to producing freely licensed texts and morphological databases. Unfortunately, most modern eclectic biblical texts and associated tagging databases have restrictive copyright licenses, which can cause problems. I will be posting anew my views on how the scriptures should be licensed, but in the meantime, there is the practical problem of availability. Therefore I am inquiring about assisting the Open Scriptures project.