technology

Off the binary bandwagon

I had thought I had settled comfortably into the Debian Linux ecosystem, but I was struck by a strange urge the other day.

So I installed Gentoo.

On my eeePC 900 HD.

I'm not sure what possessed me to install a source-based distro on a netbook with a Celeron M processor. Yeah, I've been compiling a lot.

But I've also been having a lot of fun. Gentoo was my first distro, so it's like coming home.

Open Scriptures sandbox API server

For the past few months I've been hosting a "sandbox" server for the Open Scriptures API. The purpose is to provide a proof-of-concept for the code, and to ensure that we know how to make the code work in production. It has been an adventure, since the code and its requirements have changed many times.

Tonight I reached an important milestone with the API server: I am now serving the API with the full Tischendorf Greek New Testament text, using MySQL as the database. Previously I had been hosting only the book of Jude, with SQLite as the database. So now I have a fully functioning instance, with a full dataset, in a real production environment.
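For Django projects, moving from SQLite to MySQL is mostly a change to the `DATABASES` setting in `settings.py` (Django 1.2 and later; older releases used separate `DATABASE_ENGINE`-style settings). The sketch below shows a before-and-after pair under assumed values: the database name, user, file name, and environment variable are all hypothetical, not the sandbox's real configuration. The MySQL backend also needs a MySQL driver installed, and copying the data across is a separate step.

```python
import os

# Before: a single SQLite file (file name is hypothetical).
SQLITE_DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "openscriptures.db",
    }
}

# After: MySQL. NAME, USER, and the environment variable are hypothetical;
# reading the password from the environment keeps it out of version control.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "openscriptures",
        "USER": "openscriptures",
        "PASSWORD": os.environ.get("OPENSCRIPTURES_DB_PASSWORD", ""),
        "HOST": "localhost",
        "PORT": "3306",
    }
}
```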

As an example, you can visit http://api.ossandbox.info/texts/passage/Bible.Tischendorf:2Thess.1.1-2Thess.1.3 (that's 2 Thessalonians 1:1-3 in XML format).
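The passage URL appears to follow the pattern `/texts/passage/<work>:<start>-<end>`. That pattern is inferred from this single example, not from any API documentation, and the 2010 sandbox host may no longer be online. Under those assumptions, a small client might build the URL and parse the response like this; the function names are hypothetical, and since the XML schema is not described here, `fetch_passage` only returns the root element.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Host taken from the example URL; it may no longer respond.
BASE_URL = "http://api.ossandbox.info"

def passage_url(work, start, end, base=BASE_URL):
    """Build a passage URL using the pattern inferred from the example above."""
    ref = f"{work}:{start}-{end}"
    # Letters, digits, '.', and '-' are never escaped by quote(); keep ':' too.
    return f"{base}/texts/passage/{quote(ref, safe=':')}"

def fetch_passage(work, start, end):
    """Download a passage and return the root element of the XML response."""
    with urllib.request.urlopen(passage_url(work, start, end), timeout=10) as resp:
        return ET.fromstring(resp.read())
```

For example, `passage_url("Bible.Tischendorf", "2Thess.1.1", "2Thess.1.3")` reproduces the URL above.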

If you are interested in the project and would like to know more, check out our mailing list, and visit us on IRC in #openscriptures on irc.freenode.net.

(For the technically inclined: Open Scriptures is a group of Django apps, bundled in Pinax, and served by Apache and mod_wsgi. Special thanks to Patrick Altman and Brian Rosner for helping me get things up and running.)
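mod_wsgi works by handing each request to a WSGI callable, which by default is named `application`. A Django project supplies that callable for you. The bare-bones stand-in below only illustrates the interface mod_wsgi expects and is not the sandbox's actual entry point. The response text is made up.

```python
# In a Django project, this callable would instead come from
# django.core.wsgi.get_wsgi_application() (Django 1.4 and later).
def application(environ, start_response):
    """Minimal WSGI app: mod_wsgi calls this once per request."""
    body = b"Open Scriptures sandbox is up\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI app returns an iterable of byte strings.
    return [body]
```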

Openscriptures Meet Up: The Prequel

In advance of the planned first-ever Open Scriptures Meetup (OSMU) in September (to coincide with DjangoCon), a few of us will be meeting at OSCON to take in the State of the Onion Address and then get some food. We don't want to pre-empt the primacy of the actual first-ever OSMU, so we're going to call this one the Pre-OSMU OSMU. It's a test run.

The Oregon Convention Center

Cryptographic hashes and RESTful URIs

In a recent post to the Open Scriptures mailing list, someone suggested using MD5 (or another cryptographic hash) to generate a unique ID for each token. (A "token" is the fundamental unit of text, most often a word, in our API database models.) Today we discussed the implementation on IRC, and it was a fairly stimulating conversation.

First of all, MD5 is broken and deprecated, because practical collision attacks exist (two different pieces of data can be made to produce the same hash). Since we will be dealing with millions of tokens, we decided not to push our luck, however unlikely a problem may be. SHA-256 has no known collisions, so we decided it was the best algorithm to use.
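How unlikely is an accidental collision? The birthday-bound approximation gives P ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n items hashed to b bits. The sketch below plugs in a hypothetical 10 million tokens. It suggests that accidental collisions are negligible even for MD5's 128 bits. MD5's real weakness is that collisions can be constructed deliberately, and that is what rules it out here.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision)."""
    exponent = n_items * (n_items - 1) / (2 * 2 ** hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

# 10 million tokens is an assumed figure, not a count from the project.
for name, bits in (("MD5", 128), ("SHA-256", 256)):
    print(f"{name}: {collision_probability(10_000_000, bits):.3e}")
# Roughly 1e-25 for MD5 and 4e-64 for SHA-256.
```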

SHA-256 is implemented in hashlib, part of Python's standard library, so that is good. For example:

>>> import hashlib
>>> hashlib.sha256(b"Hello world!").digest()
b'\xc0S^K\xe2\xb7\x9f\xfd\x93)\x13\x05Ck\xf8\x891NJ?\xae\xc0^\xcf\xfc\xbb}\xf3\x1a\xd9\xe5\x1a'

Needless to say, a raw binary digest like this is unusable in a RESTful URI scheme. Fortunately, hashlib also offers a hexadecimal representation:

>>> hashlib.sha256(b"Hello world!").hexdigest()
'c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'

That is still not ideal, since it makes for a long, 64-character string. Base64 encoding is more compact:

>>> import base64
>>> base64.b64encode(hashlib.sha256(b"Hello world!").digest())
b'wFNeS+K3n/2TKRMFQ2v4iTFOSj+uwF7P/Lt98xrZ5Ro='

That is shorter (44 characters), but it includes the "/" character, which is a no-no for URI design. Luckily, the base64 module includes a function for this exact purpose.
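One standard-library option is `base64.urlsafe_b64encode`, which uses "-" and "_" in place of "+" and "/". This may or may not be the function the discussion settled on. For a fixed 32-byte SHA-256 digest, the trailing "=" padding is always a single character and carries no information, so the sketch strips it and restores it when decoding. The helper names are hypothetical.

```python
import base64
import hashlib

def url_slug(data: bytes) -> str:
    """SHA-256 of data, URL-safe base64 encoded, with '=' padding removed."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def slug_to_digest(slug: str) -> bytes:
    """Invert url_slug(): re-pad to a multiple of 4 and decode to 32 bytes."""
    padded = slug + "=" * (-len(slug) % 4)
    return base64.urlsafe_b64decode(padded)

print(url_slug(b"Hello world!"))  # wFNeS-K3n_2TKRMFQ2v4iTFOSj-uwF7P_Lt98xrZ5Ro
```

The result is 43 characters, compared with 64 for the hex digest.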

Sunday Roundup

A few things of note:

  • A bunch of folks from the Open Scriptures project are hanging out in irc: #openscriptures on irc.freenode.net
  • The MorphGNT site is active and rumbling again.
  • James Tauber and Patrick Altman's οχλος is a tool for collaborative corpus linguistics. The demo task provides an interface for entering morphological parsings of the Gospel of John, and Tauber is working on an even cooler interface. I had wanted to launch something like this, but smarter people are taking care of it. It is official: this will be the coolest site on the web when it is done.
  • Speaking of active and rumbling, Kim and I visited Mt. St. Helens today.
  • OSCON is this week in Portland!

First Open Scriptures Hackfest

Over the long weekend I got together with Weston, and we did some work on Open Scriptures. This resulted in a storm of commits and a much better code base. We have been fortunate enough to be joined by Patrick Altman, whose experience with Django led to some immediate improvements in the structure and functionality of the code. This in turn has inspired other work, and the project as a whole is moving at a decent pace at the moment.

We are hanging out in #openscriptures on irc.freenode.net. If you are inclined to drop in and learn more about the project, feel free to join us.

Single monitor dogfood

I'm putting my money where my mouth is and going with a single monitor at work. Instead of two monitors, I'm relying on virtual desktops, provided by VirtuaWin on Windows 7. Wish me luck.

How I learned to stop worrying and love Drupal

I have all but finished the migration of my blog from WordPress to Drupal. Phew. All I have to do is recover a few more articles, and I will be done. At the end of it all, I am really pleased.

There was no sweat in actually importing my blog content. The WordPress Import module worked exactly as advertised, importing posts, categories, and tags. It even had an option to set up URL aliases so that the paths of my blog posts would match the previous site. That worked, but not all the way, as explained below. Then I just had to fix any posts that had images, and I was good to go.

The most challenging aspect has been the namespace. My blog was hosted at nathansmith.me/blog, while my existing Drupal site was in the root of that domain. So, when my posts were imported, I had to manually prepend "blog/" to all the paths in order not to break existing incoming links. Then I used the Pathauto module to generate aliases for my existing tag and category links in the "blog/" namespace. New posts and tags still go in the "blog/" space, to stay consistent.

Then, while browsing categories, I found that Drupal was not showing the child items in hierarchical categories (e.g. not showing posts tagged "liturgy" when viewing the "Christianity" category, even though the former is a subset of the latter). This was challenging to track down, especially since the module that fixed this had not been ported to Drupal 6. Then I discovered that the Views module could do exactly what I needed, and even had a template for this exact task. Trust me when I tell you: Views is magic.
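The behavior I wanted boils down to this: a category page should list posts tagged with that term or with any term beneath it in the hierarchy. The sketch below expresses that logic in plain Python as an illustration only. It is not how Drupal or the Views template implements it, and the term tree and posts are made-up example data.

```python
def descendants(term, children):
    """Return term plus every term below it (iterative depth-first walk)."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen

def posts_in_category(term, children, post_terms):
    """Posts tagged with term or any of its descendants, sorted by name."""
    wanted = descendants(term, children)
    return sorted(post for post, tags in post_terms.items() if wanted & set(tags))

# Hypothetical data: parent term -> child terms, and post -> tags.
CHILDREN = {"Christianity": ["liturgy", "theology"], "theology": ["soteriology"]}
POSTS = {"post-a": ["liturgy"], "post-b": ["technology"], "post-c": ["soteriology"]}

print(posts_in_category("Christianity", CHILDREN, POSTS))  # ['post-a', 'post-c']
```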

Swete LXX downloader

The Christian Classics Ethereal Library hosts scans of H.B. Swete's "Old Testament in Greek According to the Septuagint." It is a public domain LXX, including introduction and textual apparatus. If you find yourself desiring to store the images on your computer instead of viewing them through CCEL's website, you can use this Python script which I created. It will grab all of the PNG files of the text and apparatus and arrange them in order, by volume.
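This is not the script itself, but a minimal sketch of the same idea: generate one URL per page image and save each page into a per-volume directory. Zero-padded file names make a plain alphabetical sort match reading order. The URL template and page counts are hypothetical placeholders, since CCEL's actual paths are not given here, and the sketch ignores the split between text and apparatus pages.

```python
import os
import urllib.request

# HYPOTHETICAL URL template; substitute CCEL's real paths.
URL_TEMPLATE = "https://example.org/swete/vol{volume}/page{page:04d}.png"

def download_plan(pages_per_volume, dest="swete"):
    """Yield (url, local_path) pairs in volume order, then page order."""
    for volume, page_count in sorted(pages_per_volume.items()):
        for page in range(1, page_count + 1):
            url = URL_TEMPLATE.format(volume=volume, page=page)
            path = os.path.join(dest, f"vol{volume}", f"{page:04d}.png")
            yield url, path

def download_all(pages_per_volume, dest="swete"):
    """Fetch every page, skipping files already saved by an earlier run."""
    for url, path in download_plan(pages_per_volume, dest):
        if os.path.exists(path):
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(url, path)
```

For example, `download_all({1: 850, 2: 880, 3: 900})` would fetch three volumes, with made-up page counts.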

The script itself is not much, but I decided to license it under the GNU General Public License, version 3. It uses some Python 2 syntax, so I might convert it to Python 3 at some point (though that is not a big task). If CCEL changes the structure of its site, the script may break.

I considered hosting the finished product here, and may do so in the future, but for now I am going to preserve the bandwidth.

Where 1.0 > 2.0

Stories of privacy breaches on social networking websites are becoming pretty common. Facebook, Google Buzz, Twitter, and many others have had recent and frequent glitches or intentional changes that compromised their users' privacy. It probably should not be assumed that anything uploaded to the web will remain private, in spite of "privacy controls." Social networking is software, and software has bugs. Additionally, Facebook and some other networks place some pretty heavy requirements on content uploads: you grant Facebook a license to use your media (even if you "delete" it, since nothing is ever really deleted from a proper database). Little wonder that someone wrote to Slashdot, wondering about an "open, distributed alternative to Facebook." I especially loved one of the responses.