Software versioned in Git or Git-related posts.

rss2email

Available in a git repository.
Repository: rss2email
Browsable repository: rss2email
Author: W. Trevor King

Since November 2012 I've been maintaining rss2email, a package that converts RSS or Atom feeds to email so you can follow them with your mail user agent. Rss2email was created by the late Aaron Swartz and maintained for several years by Lindsey Smith. I've added a mailing list (hosted with mlmmj) and PyPI package and made the GitHub location the homepage.

Overall, setting up the standard project infrastructure has been fun, and it's nice to see interest in the newly streamlined code picking up. The timing also works out well, since the demise of Google Reader may push some talented folks in our direction. I'm not sure how visible rss2email is, especially the fresh development locations, hence this post ;). If you know anyone who might be interested in using (or contributing to!) rss2email, please pass the word.

catalyst

Available in a git repository.
Repository: catalyst-swc
Browsable repository: catalyst-swc
Author: W. Trevor King

Catalyst is a release-building tool for Gentoo. If you use Gentoo and want to roll your own live CD or bootable USB drive, this is the way to go. As I've been wrapping my head around catalyst, I've been pushing my notes upstream. This post builds on those notes to discuss the construction of a bootable ISO for Software Carpentry boot camps.

Getting a patched up catalyst

Catalyst has been around for a while, but the user base has been fairly small. If you try to do something that Gentoo's Release Engineering team doesn't do on a regular basis, built-in catalyst support can be spotty. There have been a fair number of patch submissions on gentoo-catalyst@ recently, but patch acceptance can be slow. For the SWC ISO, I applied versions of the following patches (or patch series) to 37540ff:

Configuring catalyst

The easiest way to run catalyst from a Git checkout is to set up a local config file. I didn't have enough hard drive space on my local system (~16 GB) for this build, so I set things up in a temporary directory on an external hard drive:

$ cat catalyst.conf | grep -v '^#\|^$'
digests="md5 sha1 sha512 whirlpool"
contents="auto"
distdir="/usr/portage/distfiles"
envscript="/etc/catalyst/catalystrc"
hash_function="crc32"
options="autoresume kerncache pkgcache seedcache snapcache"
portdir="/usr/portage"
sharedir="/home/wking/src/catalyst"
snapshot_cache="/mnt/d/tmp/catalyst/snapshot_cache"
storedir="/mnt/d/tmp/catalyst"

I used the default values for everything except sharedir, snapshot_cache, and storedir. Then I cloned the catalyst-swc repository into /mnt/d/tmp/catalyst.

Portage snapshot and a seed stage

Take a snapshot of the current Portage tree:

# catalyst -c catalyst.conf --snapshot 20130208

Download a seed stage3 from a Gentoo mirror:

# wget -O /mnt/d/tmp/catalyst/builds/default/stage3-i686-20121213.tar.bz2 \
>   http://distfiles.gentoo.org/releases/x86/current-stage3/stage3-i686-20121213.tar.bz2

Building the live CD

# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-stage1-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-stage2-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-stage3-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-livecd-stage1-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-livecd-stage2-i686-2013.1.spec
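
The five invocations above differ only in the stage name, so they can be generated instead of typed. The following is a minimal sketch, not part of catalyst: the `catalyst_commands` helper and its default arguments are hypothetical, and simply mirror the paths and spec names used in this post.

```python
import shlex

# Stage names, in the order the spec files are built above.
STAGES = ['stage1', 'stage2', 'stage3', 'livecd-stage1', 'livecd-stage2']


def catalyst_commands(spec_dir='/mnt/d/tmp/catalyst/spec',
                      config='catalyst.conf',
                      arch='i686', version='2013.1'):
    """Return one catalyst argument list per stage, in build order.

    The defaults are assumptions copied from the commands in this post.
    """
    commands = []
    for stage in STAGES:
        spec = '{}/default-{}-{}-{}.spec'.format(
            spec_dir, stage, arch, version)
        commands.append(['catalyst', '-c', config, '-f', spec])
    return commands


# Print the commands so they can be reviewed before running them as root.
for command in catalyst_commands():
    print(' '.join(shlex.quote(argument) for argument in command))
```

Passing each list to `subprocess.check_call` instead of printing it would stop at the first failing stage, which is probably what you want, since the later spec files are meant to be built after the earlier ones.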

isohybrid

To make the ISO bootable from a USB drive, I used isohybrid:

# cp swc-x86.iso swc-x86-isohybrid.iso
# isohybrid swc-x86-isohybrid.iso

You can install the resulting ISO on a USB drive with:

# dd if=swc-x86-isohybrid.iso of=/dev/sdX

replacing X with the appropriate drive letter for your USB drive.

With versions of catalyst after d1c2ba9, the isohybrid call is built into catalyst's ISO construction.

Mutt-LDAP

Available in a git repository.
Repository: mutt-ldap
Browsable repository: mutt-ldap
Author: W. Trevor King

I wrote this Python script to query an LDAP server for addresses from Mutt. In December 2012, I got some patches from Wade Berrier and Niels de Vos. Anything interesting enough for others to hack on deserves its own repository, so I pulled it out of my blog repository (linked above, and mirrored on GitHub).

The README is posted on the PyPI page.

Reading IGOR files from Python

Available in a git repository.
Repository: igor
Browsable repository: igor
Author: W. Trevor King

This is the home page for the igor package, Python modules for reading files written by WaveMetrics IGOR Pro. Note that if you're designing a system, HDF5 is almost certainly a better choice for your data file format than IBW or PXP. This package exists for those of you whose data is already stuck in an IGOR format.

History

When I joined Prof. Yang's lab, there was a good deal of data analysis code written in IGOR, and a bunch of old data saved in IGOR binary wave (IBW) and packed experiment (PXP) files. I don't use MS Windows, so I don't run IGOR, but I still needed a way to get at the data. Luckily, the WaveMetrics folks publish some useful notes which explain the fundamentals of these two file formats (TN003 for IBW and PTN003 for PXP). The file formats are in a goofy format, but strings pulls out enough meat to figure out what's going on.

For a while I used an IBW → ASCII reader that I coded up in C, but when I joined the Hooke project during the winter of 2009–2010, I translated the reader into Python to support the drivers for data from Asylum Research's MFP-* and related microscopes. This scratched my itch for a few years.

Fast forward to 2012, and for the first time I needed to extract data from a PXP file. Since my Python code only supported IBWs, I searched around and found igor.py by Paul Kienzle and Merlijn van Deen. They had a PXP reader, but no reader for stand-alone IBW files. I decided to merge the two projects, so I split my reader out of the Hooke repository and hacked up the Git repository referenced above. Now it's easy to get a hold of all that useful metadata in a hurry. No writing ability yet, but I don't know why you'd want to move data that direction anyway ;).

Parsing dynamic structures with Python

The IGOR file formats rely on lots of shenanigans with C structs. To meld all the structures together in a natural way, I've extended Python's standard struct library to support arbitrary nesting and dynamic fields. Take a look at igor.struct for some examples. This framework makes it easy to load data from structures like:

struct vector {
  unsigned int length;
  short data[length];
};

With the standard struct module, you'd read this using the functional approach:

>>> import struct
>>> buffer = b'\x00\x00\x00\x02\x01\x02\x03\x04'
>>> length_struct = struct.Struct('>I')
>>> length = length_struct.unpack_from(buffer)[0]
>>> data = struct.unpack_from('>' + 'h'*length, buffer, length_struct.size)
>>> print(data)
(258, 772)

This obviously works, but keeping track of the offsets, byte ordering, etc. can be tedious. My igor.struct package allows you to use a more object-oriented approach:

>>> from pprint import pprint
>>> from igor.struct import Field, DynamicField, DynamicStructure
>>> class DynamicLengthField (DynamicField):
...     def pre_pack(self, parents, data):
...         "Set the 'length' value to match the data before packing"
...         vector_structure = parents[-1]
...         vector_data = self._get_structure_data(
...             parents, data, vector_structure)
...         length = len(vector_data['data'])
...         vector_data['length'] = length
...         data_field = vector_structure.get_field('data')
...         data_field.count = length
...         data_field.setup()
...     def post_unpack(self, parents, data):
...         "Adjust the expected data count to match the 'length' value"
...         vector_structure = parents[-1]
...         vector_data = self._get_structure_data(
...             parents, data, vector_structure)
...         length = vector_data['length']
...         data_field = vector_structure.get_field('data')
...         data_field.count = length
...         data_field.setup()
>>> dynamic_length_vector = DynamicStructure('vector',
...     fields=[
...         DynamicLengthField('I', 'length'),
...         Field('h', 'data', count=0, array=True),
...         ],
...     byte_order='>')
>>> vector = dynamic_length_vector.unpack(buffer)
>>> pprint(vector)
{'data': array([258, 772]), 'length': 2}

While this is overkill for such a simple example, it scales much more cleanly than an approach using the standard struct module. The main benefit is that you can use Structure instances as format specifiers for Field instances. This means that you could specify a C structure like:

struct vectors {
  unsigned int length;
  struct vector data[length];
};

With:

>>> dynamic_length_vectors = DynamicStructure('vectors',
...     fields=[
...         DynamicLengthField('I', 'length'),
...         Field(dynamic_length_vector, 'data', count=0, array=True),
...         ],
...     byte_order='>')

The C code you're mimicking probably only uses a handful of dynamic approaches. Once you've written classes to handle each of them, it is easy to translate arbitrarily complex nested C structures into Python representations.

The pre-pack and post-unpack hooks also give you a convenient place to translate between some C struct's funky format and Python's native types. You take care of all that when you define the structure, and then any part of your software that uses the structure gets the native version automatically.
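
For comparison, here is roughly what reading the nested `struct vectors` layout looks like with only the standard struct module. This is a sketch for illustration, not code from the igor package: the `unpack_vector` and `unpack_vectors` helpers are hypothetical, and they assume big-endian data with no padding between fields, as in the earlier examples.

```python
import struct


def unpack_vector(buffer, offset=0):
    """Read one 'struct vector': a uint32 length, then that many int16s.

    Returns the values and the offset of the first byte after the vector.
    """
    (length,) = struct.unpack_from('>I', buffer, offset)
    offset += struct.calcsize('>I')
    data_format = '>{}h'.format(length)
    data = struct.unpack_from(data_format, buffer, offset)
    offset += struct.calcsize(data_format)
    return list(data), offset


def unpack_vectors(buffer, offset=0):
    """Read one 'struct vectors': a uint32 count, then that many vectors."""
    (count,) = struct.unpack_from('>I', buffer, offset)
    offset += struct.calcsize('>I')
    vectors = []
    for _ in range(count):
        # Each vector has its own length, so the offset has to be
        # threaded through every call by hand.
        vector, offset = unpack_vector(buffer, offset)
        vectors.append(vector)
    return vectors, offset


buffer = (b'\x00\x00\x00\x02'                      # two vectors follow
          b'\x00\x00\x00\x02\x01\x02\x03\x04'      # length 2: 258, 772
          b'\x00\x00\x00\x01\x00\x05')             # length 1: 5
print(unpack_vectors(buffer))
```

Every additional level of nesting needs another function that passes the offset along, and this is the bookkeeping that the declarative structure definitions are meant to hide.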

curses_check_for_keypress

Available in a git repository.
Repository: curses-check-for-keypress
Browsable repository: curses-check-for-keypress
Author: W. Trevor King

There are some points in my experiment control code where the program does something for an arbitrary length of time (e.g., waits while the operator manually adjusts a laser's alignment). For these situations, I wanted to be able to loop until the user pressed a key. This is a simple enough idea, but the implementation turned out to be complicated enough for me to spin it out as a stand-alone module.
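
The basic idea can be sketched with the standard curses module. This is not the curses_check_for_keypress implementation, just a minimal illustration: `loop_until_keypress` and `main` are hypothetical names, and the sketch leaves out the error handling and terminal cleanup details that make a real module worthwhile.

```python
import curses
import time


def loop_until_keypress(getch, work, poll_interval=0.1):
    """Call work() repeatedly until getch() reports a key.

    getch should behave like a curses window's getch() in no-delay
    mode, returning -1 when no key is waiting.  Returns the key code
    and the number of times work() ran.
    """
    iterations = 0
    while True:
        key = getch()
        if key != -1:
            return key, iterations
        work()
        iterations += 1
        time.sleep(poll_interval)


def main(stdscr):
    """Entry point for curses.wrapper(); needs a real terminal."""
    stdscr.nodelay(True)  # make getch() non-blocking
    stdscr.addstr('Adjust the laser, then press any key to continue\n')
    stdscr.refresh()
    return loop_until_keypress(stdscr.getch, lambda: None)
```

Running `curses.wrapper(main)` from a terminal starts the loop and restores the terminal state afterwards, even if the work function raises an exception. Passing `getch` in as a callable keeps the loop itself testable without a terminal.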

pyassuan

Available in a git repository.
Repository: pyassuan
Browsable repository: pyassuan
Author: W. Trevor King

I've been trying to come up with a clean way to verify detached PGP signatures from Python. There are a number of existing approaches to this problem. Many of them call gpg using Python's multiprocessing or subprocess modules, but to verify detached signatures, you need to send the signature in on a separate file descriptor, and handling that in a way safe from deadlocks is difficult. The other approach, taken by PyMe, is to wrap GPGME using SWIG, which is great as far as it goes, but development seems to have stalled, and I find the raw GPGME interface excessively complicated.

The GnuPG tools themselves often communicate over sockets using the Assuan protocol, and I'd already written an Assuan server to handle pinentry (originally for my gpg-agent post, not part of pyassuan). I thought it would be natural if there were a gpgme-agent which would handle cryptographic tasks over this protocol, which would make the pgp-mime implementation easier. It turns out that there already is such an agent (gpgme-tool), so I turned my pinentry script into the more general pyassuan package. Now using Assuan from Python should be as easy as (or easier than?) using it from C via libassuan.
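
To give a feel for the protocol, Assuan is line-based: the server answers with lines such as `OK`, `ERR`, `S` (status), and `D` (data), and data lines percent-escape the `%`, carriage return, and line feed characters. The sketch below is not the pyassuan API. The `parse_response` and `escape_data` helpers are hypothetical and cover only this basic framing.

```python
from urllib.parse import unquote_to_bytes


def escape_data(data):
    """Percent-escape the bytes that may not appear raw in a D line."""
    return (data.replace(b'%', b'%25')
                .replace(b'\r', b'%0D')
                .replace(b'\n', b'%0A'))


def parse_response(line):
    """Split one response line into (keyword, payload).

    The payload is unescaped bytes for D lines and text for
    everything else (OK, ERR, S, INQUIRE, END, comments).
    """
    line = line.rstrip(b'\r\n')
    keyword, _, rest = line.partition(b' ')
    keyword = keyword.decode('ascii')
    if keyword == 'D':
        return keyword, unquote_to_bytes(rest)
    return keyword, rest.decode('utf-8')


print(parse_response(b'OK Pleased to meet you\n'))
print(parse_response(b'D ' + escape_data(b'100%\n') + b'\n'))
```

A real client also has to handle inquiries from the server, line-length limits, and error codes, which is the sort of thing a dedicated package is for.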

The README is posted on the PyPI page.

pygrader

Available in a git repository.
Repository: pygrader
Browsable repository: pygrader
Author: W. Trevor King

The last two courses I've TAd at Drexel have been scientific computing courses where the students are writing code to solve homework problems. When they're done, they email the homework to me, and I grade it and email them back their grade and comments. I've played around with developing a few grading frameworks over the years (a few years back, one of the big intro courses kept the grades in an Excel file on a Samba share, and I wrote a script to automatically sync local comma-separated-value data with that spreadsheet. Yuck :p), so I figured this was my chance to polish up some old scripts into a sensible system to help me stay organized. This system is pygrader.

During the polishing phase, I was searching around looking for prior art ;), and found that Alex Heitzmann had already created pygrade, which is the name under which I had originally developed my own project. While they are both grade databases written in Python, Alex's project focuses on providing a more integrated grading environment.

Pygrader accepts assignment submissions from students through its mailpipe command, which you can run on your email inbox (or from procmail). Students submit assignments with an email subject like

[submit] <assignment name>

mailpipe automatically drops the submissions into a student/assignment/mail mailbox, extracts any MIME attachments into the student/assignment/ directory (without clobbers, with proper timestamps), and leaves you to get to work.
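
The attachment-extraction step can be sketched with Python's standard email package. This is an illustration of the idea, not pygrader's implementation: `extract_attachments` and `open_without_clobber` are hypothetical helpers, and the numbered-suffix naming scheme and the use of the Date header for the file timestamp are assumptions.

```python
import email
import email.policy
import os


def open_without_clobber(directory, filename):
    """Open a new file for writing, adding .1, .2, ... if the name is taken."""
    base, ext = os.path.splitext(filename)
    count = 0
    while True:
        suffix = '.{}'.format(count) if count else ''
        path = os.path.join(directory, base + suffix + ext)
        try:
            return open(path, 'xb')  # 'x' fails rather than overwriting
        except FileExistsError:
            count += 1


def extract_attachments(raw_message, directory):
    """Save each named attachment under directory; return the new paths."""
    message = email.message_from_bytes(
        raw_message, policy=email.policy.default)
    sent = getattr(message['Date'], 'datetime', None)
    os.makedirs(directory, exist_ok=True)
    saved = []
    for part in message.iter_attachments():
        # Drop any directory components the sender put in the filename.
        name = os.path.basename(part.get_filename() or '')
        payload = part.get_payload(decode=True)
        if not name or payload is None:
            continue
        with open_without_clobber(directory, name) as stream:
            stream.write(payload)
            path = stream.name
        if sent is not None:
            # Stamp the file with the time the message was sent.
            os.utime(path, (sent.timestamp(), sent.timestamp()))
        saved.append(path)
    return saved
```

Feeding the same message through twice leaves both copies on disk (for example `hw1.py` and `hw1.1.py`), so a resubmission never silently replaces earlier work.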

Pygrader also supports multiple graders through the mailpipe command. The other graders can request a student's submission(s) with an email subject like

[get] <student name>, <assignment name>

Then they can grade the submission and mail the grade back with an email subject like

[grade] <student name>, <assignment name>

The grade-altering messages are also stored in the student/assignment/mail mailbox, so you can peruse them later.
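
The subject-line conventions above are simple enough to parse with a regular expression. The sketch below is not how pygrader itself dispatches messages. `parse_subject` is a hypothetical helper, and it assumes that student names contain no commas.

```python
import re

# Matches "[submit] ...", "[get] ...", and "[grade] ..." subjects.
SUBJECT_RE = re.compile(
    r'^\s*\[(?P<command>submit|get|grade)\]\s*(?P<args>.*?)\s*$',
    re.IGNORECASE)


def parse_subject(subject):
    """Return (command, fields) for a recognized subject line.

    Raises ValueError for subjects that do not follow the convention.
    """
    match = SUBJECT_RE.match(subject)
    if match is None:
        raise ValueError('unrecognized subject: {!r}'.format(subject))
    command = match.group('command').lower()
    args = match.group('args')
    if command == 'submit':
        names = ('assignment',)
        values = [args]
    else:
        # "[get]" and "[grade]" take "<student name>, <assignment name>".
        names = ('student', 'assignment')
        values = [value.strip() for value in args.split(',', 1)]
    if len(values) != len(names) or not all(values):
        raise ValueError('missing fields in subject: {!r}'.format(subject))
    return command, dict(zip(names, values))


print(parse_subject('[submit] Homework 1'))
print(parse_subject('[grade] Jane Doe, Homework 1'))
```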

Pygrader doesn't spawn editors or GUIs to help you browse through submissions or assign grades. As far as I am concerned, this is a good thing.

When you're done grading, pygrader can email (email) your grades and comments back to the students, signing or encrypting with pgp-mime if either party has configured a PGP key. It can also email a tab-delimited table of grades to the professors to keep them up to speed. If you're running mailpipe via procmail, responses to grade requests are sent automatically.

While you're grading, pygrader can search for ungraded assignments, or for grades that have not yet been sent to students (todo). It can also check for resubmissions, where new submissions come in response to earlier grades.

The README is posted on the PyPI page.

update-copyright

Available in a git repository.
Repository: update-copyright
Browsable repository: update-copyright
Author: W. Trevor King

A few years ago I was getting tired of having missing or out-of-date copyright blurbs in packages that I was involved with (old license text, missing authors, etc.). This is important stuff, but not the kind of thing that is fun to maintain by hand. I wrote a script for Bugs Everywhere that automated the process, using the version control system to extract lists of authors and dates for each file. The script was great, so I ported it into a few other projects I was involved in.

This month I realized that it would be much easier to just break the script out into its own package, and only maintain a config file in each of the projects that use it. I don't know why this didn't occur to me years ago :p. Anyhow, here it is! Enjoy.
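
The core trick of asking the version control system who touched a file, and when, can be sketched for Git as follows. This is an illustration rather than update-copyright's actual code: the three helper functions are hypothetical, the blurb format is an assumption, and the `--date=format:` option needs a reasonably recent Git.

```python
import subprocess


def git_author_years(path):
    """Ask Git for one "author<TAB>year" line per commit touching path."""
    return subprocess.check_output(
        ['git', 'log', '--follow', '--format=%aN%x09%ad',
         '--date=format:%Y', '--', path],
        universal_newlines=True)


def parse_author_years(log_output):
    """Map each author to their (first, last) commit years."""
    years = {}
    for line in log_output.splitlines():
        if not line.strip():
            continue
        author, year = line.rsplit('\t', 1)
        year = int(year)
        first, last = years.get(author, (year, year))
        years[author] = (min(first, year), max(last, year))
    return years


def copyright_lines(years):
    """Format one copyright line per author, sorted by name."""
    lines = []
    for author in sorted(years):
        first, last = years[author]
        span = str(first) if first == last else '{}-{}'.format(first, last)
        lines.append('Copyright (C) {} {}'.format(span, author))
    return lines


sample = 'Alice\t2012\nBob\t2010\nAlice\t2009\n'
print('\n'.join(copyright_lines(parse_author_years(sample))))
```

In a real checkout, `copyright_lines(parse_author_years(git_author_years('some/file.py')))` would produce the blurb for that file. Splicing it into the file header is the remaining (and fiddlier) half of the job.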

The README, with usage details, is posted on the PyPI page.

Serving Git on Gentoo

Today I decided to host all my public Git repositories on my Gentoo server. Here's a quick summary of what I did.

Gitweb

Re-emerge git with the cgi USE flag enabled.

# echo "dev-util/git cgi" >> /etc/portage/package.use/webserver
# emerge -av git

Create a virtual host for running gitweb:

# cat > /etc/apache2/vhosts.d/20_git.example.net_vhost.conf << EOF
<VirtualHost *:80>
    ServerName git.example.net
    DocumentRoot /usr/share/gitweb
    <Directory /usr/share/gitweb>
        Allow from all
        AllowOverride all
        Order allow,deny
        Options ExecCGI
        <Files gitweb.cgi>
            SetHandler cgi-script
        </Files>
    </Directory>
    DirectoryIndex gitweb.cgi
    SetEnv  GITWEB_CONFIG  /etc/gitweb.conf
</VirtualHost>
EOF

Tell gitweb where you keep your repos:

# echo "\$projectroot = '/var/git';" > /etc/gitweb.conf

Tell gitweb where people can pull your repos from:

# echo "@git_base_url_list = ( 'git://example.net', );" >> /etc/gitweb.conf

Restart Apache:

# /etc/init.d/apache2 restart

Add the virtual host to your DNS server.

# emacs /etc/bind/pri/example.net.internal
...
git  A  192.168.0.2
...

Restart the DNS server.

# /etc/init.d/named restart

If names aren't showing up in the Owner column, you can add them to the user's /etc/passwd comment with

# usermod -c 'John Doe' jdoe

Thanks to Phil Sergi for his own summary, which I've borrowed from heavily.

Git daemon

Gitweb allows browsing repositories via HTTP, but if you will be pulling from your repositories using the git:// protocol, you'll also want to run git-daemon. On Gentoo, this is really easy: just edit /etc/conf.d/git-daemon as you see fit. I added --verbose, --base-path=/var/git and --export-all to GITDAEMON_OPTS. Start the daemon with

# /etc/init.d/git-daemon start

Add it to your default runlevel with

# rc-update add git-daemon default

If you're logging to syslog and running syslog-ng, you can configure the log location using the usual syslog tricks. See my syslog-ng post for details.

cookbook

Available in a git repository.
Repository: cookbook
Browsable repository: cookbook
Author: W. Trevor King

I've been running a home-rolled recipe webapp for a year now, and it worked fairly well in a bare-bones sort of way. However, I recently had to make some changes to my personal website (since EveryDNS and apparently most other free DNS providers were bought by Dyn), which prompted me to translate cookbook into a Django app. Thanks to the wonders of Django, Grappelli, and django-taggit, the code is now leaner, meaner, and prettier!

See the README for details.