Killing defunct processes

Allen Rueter points out that one way to kill a defunct process is to kill its parent or child:

# ps -ef | grep '<defunct>\|PPID'
UID   PID  PPID  C    STIME TTY      TIME CMD
zzz 13868     1  0                   0:00 <defunct>
# ps -ef | grep '13868\|PPID'
UID   PID  PPID  C    STIME TTY      TIME CMD
zzz 13868     1  0                   0:00 <defunct>
zzz 16596 13868  0                   0:00 a.out
# kill -9 16596
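For background, a defunct (zombie) process is one that has exited but whose exit status has not yet been collected. Here is a minimal, Linux-only sketch of how one comes and goes. It reads /proc, and the function name zombie_state is made up for illustration:

```python
import os
import time


def zombie_state():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately, leaving a zombie behind
    state = None
    for _ in range(100):
        with open('/proc/{}/stat'.format(pid)) as f:
            # The state letter is the first field after the "(command)" field
            state = f.read().rsplit(')', 1)[1].split()[0]
        if state == 'Z':
            break
        time.sleep(0.01)
    os.waitpid(pid, 0)  # the parent collects the exit status (reaps the child)
    reaped = not os.path.exists('/proc/{}'.format(pid))
    return state, reaped
```

Until the waitpid call, ps would list the child as <defunct>. Once the exit status is collected, the entry disappears.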
DVD Backup

I've been using abcde to rip our audio CD collection onto our fileserver for a few years now. Then I can play songs from across the collection using MPD without having to dig the original CDs out of the closet. I just picked up a large external hard drive and thought it might be time to take a look at ripping our DVD collection as well.

There is an excellent Quick-n-Dirty Guide that goes into more detail on all of this, but here's an executive summary.

Make sure your kernel understands the UDF file system:

$ grep CONFIG_UDF_FS /usr/src/linux/.config

If your kernel was compiled with CONFIG_IKCONFIG_PROC enabled, you could use

$ zcat /proc/config.gz | grep CONFIG_UDF_FS

instead, to make sure you're checking the configuration of the currently running kernel. If the udf driver was compiled as a module, make sure it's loaded.

$ sudo modprobe udf
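If you'd rather script the kernel-config check, here is a small Python sketch. The helper names config_value and running_kernel_config are my own inventions, and the second one assumes /proc/config.gz exists (i.e. CONFIG_IKCONFIG_PROC is enabled):

```python
import gzip


def config_value(config_text, option):
    """Return the value of `option` ('y', 'm', ...) or None if it is not set."""
    prefix = option + '='
    for line in config_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None  # includes the "# CONFIG_FOO is not set" case


def running_kernel_config(path='/proc/config.gz'):
    """Read the configuration of the currently running kernel."""
    with gzip.open(path, 'rt') as f:
        return f.read()
```

With these, config_value(running_kernel_config(), 'CONFIG_UDF_FS') returning 'm' would mean the driver is a module (so you need the modprobe), while 'y' would mean it is built in.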

Mount your DVD somewhere:

$ sudo mount /dev/dvd /mnt/dvd

Now you're ready to rip. You've got two options: you can copy the VOBs over directly, or rip the DVD into an alternative container format such as Matroska.

Vobcopy

Mirror the disc with vobcopy (media-video/vobcopy on Gentoo):

$ vobcopy -m -t "Awesome_Movie" -v -i /mnt/dvd -o ~/movies/

Play with MPlayer (media-video/mplayer on Gentoo):

$ mplayer -nosub -fs -dvd-device ~/movies/Awesome_Movie dvd://1

where -nosub and -fs are optional.

Matroska

Remux the disc (without reencoding) with mkvmerge (from MKVToolNix, media-video/mkvtoolnix on Gentoo):

$ mkvmerge -o ~/movies/Awesome_Movie.mkv /mnt/dvd/VIDEO_TS/VTS_01_1.VOB
(Processing the following files as well: "VTS_01_2.VOB", "VTS_01_3.VOB", "VTS_01_4.VOB", "VTS_01_5.VOB")

Then you can do all the usual tricks. Here's an example of extracting a slice of the Matroska file as silent video in an AVI container with mencoder (from MPlayer, media-video/mplayer on Gentoo):

$ mencoder -ss 00:29:20.3 -endpos 00:00:21.6 Awesome_Movie.mkv -nosound -of avi -ovc copy -o silent-clip.avi

Here's an example of extracting a slice of the Matroska file as audio in an AC3 container:

$ mencoder -ss 51.1 -endpos 160.9 Awesome_Movie.mkv -of rawaudio -ovc copy -oac copy -o audio-clip.ac3

You can also take a look through the Gentoo wiki and this Ubuntu thread for more ideas.

rsnapshot

rsnapshot is a backup utility that uses rsync to back up files over the network. Backups to my external Btrfs drive were dragging, with a lot of time and processor power spent running cp and rm. With a COW filesystem like Btrfs, this is wasted processor time and disk activity, so I migrated to using Btrfs snapshots instead. Walter Werther has a good write-up. The idea is to configure rsnapshot to use scripts that wrap btrfs instead of your system's cp and rm programs. Walter's put the relevant scripts on GitHub: cp and rm.

pyassuan

Available in a git repository.
Repository: pyassuan
Browsable repository: pyassuan
Author: W. Trevor King

I've been trying to come up with a clean way to verify detached PGP signatures from Python. There are a number of existing approaches to this problem. Many of them call gpg using Python's multiprocessing or subprocess modules, but to verify detached signatures, you need to send the signature in on a separate file descriptor, and handling that in a way that is safe from deadlocks is difficult. The other approach, taken by PyMe, is to wrap GPGME using SWIG, which is great as far as it goes, but development seems to have stalled, and I find the raw GPGME interface excessively complicated.

The GnuPG tools themselves often communicate over sockets using the Assuan protocol, and I'd already written an Assuan server to handle pinentry (originally for my gpg-agent post, not part of pyassuan). I thought it would be natural if there was a gpgme-agent which would handle cryptographic tasks over this protocol, which would make the pgp-mime implementation easier. It turns out that there already is such an agent (gpgme-tool), so I turned my pinentry script into the more general pyassuan package. Now using Assuan from Python should be as easy as (or easier than?) using it from C via libassuan.
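To give a feel for how simple the wire format is, here is a sketch of the escaping used for Assuan data ("D") lines, where %, CR, and LF are percent-escaped. This is not pyassuan's API; the function names are hypothetical, and the sketch ignores the protocol's line-length limit:

```python
import re


def encode_data(data):
    """Wrap raw bytes in a single Assuan "D" line."""
    escaped = (data.replace(b'%', b'%25')   # escape the escape character first
                   .replace(b'\r', b'%0D')
                   .replace(b'\n', b'%0A'))
    return b'D ' + escaped + b'\n'


def decode_data(line):
    """Recover the raw bytes from a single Assuan "D" line."""
    if not line.startswith(b'D '):
        raise ValueError('not a data line: {!r}'.format(line))
    body = line[2:].rstrip(b'\n')
    # Turn each %XX escape back into the byte it stands for
    return re.sub(rb'%([0-9A-Fa-f]{2})',
                  lambda m: bytes([int(m.group(1), 16)]), body)
```

Because the payload can never contain a raw newline, the peer can split the stream into lines before it looks at any of the content.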

The README is posted on the PyPI page.

pygrader

Available in a git repository.
Repository: pygrader
Browsable repository: pygrader
Author: W. Trevor King

The last two courses I've TAd at Drexel have been scientific computing courses where the students are writing code to solve homework problems. When they're done, they email the homework to me, and I grade it and email them back their grade and comments. I've played around with developing a few grading frameworks over the years (a few years back, one of the big intro courses kept the grades in an Excel file on a Samba share, and I wrote a script to automatically sync local comma-separated-value data with that spreadsheet. Yuck :p), so I figured this was my chance to polish up some old scripts into a sensible system to help me stay organized. This system is pygrader.

During the polishing phase, I was searching around looking for prior art ;), and found that Alex Heitzmann had already created pygrade, which is the name under which I had originally developed my own project. While they are both grade databases written in Python, Alex's project focuses on providing a more integrated grading environment and sharing grading responsibilities among multiple graders.

Pygrader doesn't support multiple graders (yet), and it doesn't spawn editors or GUIs to help you browse through submissions or assign grades. It does pull submissions from your email inbox (or from procmail), automatically drop them into a student/assignment/mail mailbox, extract any MIME attachments into the student/assignment/ directory (without clobbers, with proper timestamps), and leave you to get to work (all via the mailpipe command).
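As a rough illustration of the "extract attachments without clobbers" step, here is a sketch using only the standard library. It is not pygrader's actual code: the function name and the hw1.1.py-style renaming scheme are assumptions, and it skips the timestamp handling:

```python
import email
import email.policy
import os


def extract_attachments(raw_bytes, directory):
    """Save each attachment of a raw message under `directory`, never overwriting."""
    message = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    saved = []
    for part in message.iter_attachments():
        # Drop any directory components from the sender-supplied filename
        name = os.path.basename(part.get_filename() or 'attachment')
        base, ext = os.path.splitext(name)
        count = 0
        while True:
            candidate = name if count == 0 else '{}.{}{}'.format(base, count, ext)
            path = os.path.join(directory, candidate)
            try:
                # 'x' mode refuses to open a file that already exists
                with open(path, 'xb') as f:
                    f.write(part.get_payload(decode=True))
                break
            except FileExistsError:
                count += 1
        saved.append(path)
    return saved
```

A resubmitted hw1.py would land next to the original as hw1.1.py, so you can diff the two instead of losing the first attempt.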

When you're done grading, pygrader can email (email) your grades and comments back to the students, signing or encrypting with pgp-mime if either party has configured a PGP key. It can also email a tab-delimited table of grades to the professors to keep them up to speed.

While you're grading, pygrader can search for ungraded assignments, or for grades that have not yet been sent to students (todo). It can also check for resubmissions, where new submissions come in response to earlier grades.

The README is posted on the PyPI page.

Linking

This example shows the details of linking a simple program from three source files. There are three ways to link: directly from object files, statically from static libraries, or dynamically from shared libraries. If you're following along in my example source, you can compile the three flavors of the hello_world program with:

$ make

And then run them with:

$ make run

Compiling and linking

Here's the general compilation process:

  1. Write code in a human-readable language (C, C++, …).
  2. Compile the code to object files (*.o) using a compiler (gcc, g++, …).
  3. Link the code into executables or libraries using a linker (ld, gcc, g++, …).

Object files are binary files containing machine code versions of the human-readable code, along with some bookkeeping information for the linker (relocation information, stack unwinding information, program symbols, …). The machine code is specific to a particular processor architecture (e.g. x86-64).

Linking files resolves references to symbols defined in other translation units, because a single object file will rarely (never?) contain definitions for all the symbols it requires. It's easy to get confused about the difference between compiling and linking, because you often use the same program (e.g. gcc) for both steps. In reality, gcc is performing the compilation on its own, but is using external utilities like ld for the linking. To see this in action, add the -v (verbose) option to your gcc (or g++) calls. You can do this for all the rules in the Makefile with:

make CC="gcc -v" CXX="g++ -v"

On my system, that shows g++ using /lib64/ld-linux-x86-64.so.2 for dynamic linking. On my system, C++ seems to require at least some dynamic linking, but a simple C program like simple.c can be linked statically. For static linking, gcc uses collect2.

Symbols in object files

Sometimes you'll want to take a look at the symbols exported and imported by your code, since there can be subtle bugs if you link two sets of code that use the same symbol for different purposes. You can use nm to inspect the intermediate object files. I've saved the command line in the Makefile:

$ make inspect-object-files
nm -Pg hello_world.o print_hello_world.o hello_world_string.o
hello_world.o:
_Z17print_hello_worldv U         
main T 0000000000000000 0000000000000010
print_hello_world.o:
_Z17print_hello_worldv T 0000000000000000 0000000000000027
_ZNSolsEPFRSoS_E U         
_ZNSt8ios_base4InitC1Ev U         
_ZNSt8ios_base4InitD1Ev U         
_ZSt4cout U         
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U         
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U         
__cxa_atexit U         
__dso_handle U         
hello_world_string U         
hello_world_string.o:
hello_world_string R 0000000000000010 0000000000000008

The output format for nm is described in its man page. With the -g option, output is restricted to globally visible symbols. With the -P option, each symbol line is:

<symbol> <type> <offset-in-hex> <size-in-hex>

For example, we see that hello_world.o defines a global text symbol main at position 0 with a size of 0x10. This is where execution of our own code will start (once the startup code has run).

We also see that hello_world.o needs (i.e. “has an undefined symbol for”) _Z17print_hello_worldv. This means that, in order to run, hello_world.o must be linked against something else which provides that symbol. The symbol is for our print_hello_world function. The _Z17 prefix and v postfix are a result of name mangling, and depend on the compiler used and function signature. Moving on, we see that print_hello_world.o defines _Z17print_hello_worldv at position 0 with a size of 0x27. So linking print_hello_world.o with hello_world.o would resolve the symbols needed by hello_world.o.
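To demystify the mangled name a bit, here is a toy Python sketch of the scheme g++ uses on Linux (the Itanium C++ ABI). It only handles free functions in the global namespace taking a few builtin types by value; the function and table names are my own:

```python
# Itanium C++ ABI codes for a handful of builtin parameter types
BUILTIN_CODES = {'void': 'v', 'int': 'i', 'char': 'c', 'double': 'd'}


def mangle(name, parameters=()):
    """Mangle a global free function taking builtin types by value."""
    # An empty parameter list is encoded as a single 'v' (void)
    codes = ''.join(BUILTIN_CODES[p] for p in parameters) or 'v'
    # _Z marks a mangled name; the identifier is prefixed with its length
    return '_Z{}{}{}'.format(len(name), name, codes)


print(mangle('print_hello_world'))  # _Z17print_hello_worldv
```

So the 17 is just the length of "print_hello_world", and the trailing v says the function takes no arguments. To go the other way on real symbols, you can pipe them through c++filt.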

print_hello_world.o has undefined symbols of its own, so we can't stop yet. It needs hello_world_string (provided by hello_world_string.o), _ZSt4cout (provided by libstdc++), ….

The process of linking involves bundling up enough of these partial code chunks so that each of them has access to the symbols it needs.
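The bookkeeping described above can be sketched in a few lines of Python that read nm -Pg output and report which symbols are still missing from a set of object files. This is only an illustration of the idea, not how ld works internally. The function names are hypothetical, and weak symbols are simply skipped:

```python
def parse_nm(text):
    """Map each object file name to its (defined, undefined) symbol sets."""
    objects = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1 and fields[0].endswith(':'):
            current = fields[0][:-1]  # a "file.o:" header line
            objects[current] = (set(), set())
            continue
        symbol, kind = fields[0], fields[1]
        if kind in ('w', 'v', 'W', 'V'):
            continue  # simplification: ignore weak symbols
        defined, undefined = objects[current]
        (undefined if kind == 'U' else defined).add(symbol)
    return objects


def unresolved(objects):
    """Return the symbols that some object needs but none of them defines."""
    defined = set().union(*(d for d, _ in objects.values()))
    needed = set().union(*(u for _, u in objects.values()))
    return needed - defined
```

Fed the three object files from the listing above, unresolved would leave only the symbols that come from libstdc++ and the C++ runtime, which is exactly what the final link against those libraries has to supply.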

There are a number of other tools that will let you poke into the innards of object files. If nm doesn't scratch your itch, you may want to look at the more general objdump.

Storage classes

In the previous section I mentioned “globally visible symbols”. When you declare or define a symbol (variable, function, …), you can use storage classes to tell the compiler about your symbols' linkage and storage duration.

For more details, you can read through §6.2.2 Linkages of identifiers, §6.2.4 Storage durations of objects, and §6.7.1 Storage-class specifiers in WG14/N1570, the last public version of ISO/IEC 9899:2011 (i.e. the C11 standard).

Since we're just worried about linking, I'll leave the discussion of storage duration to others. With linkage, you're basically deciding which of the symbols you define in your translation unit should be visible from other translation units. For example, in print_hello_world.h, we declare that there is a function print_hello_world (with a particular signature). The extern means that it may be defined in another translation unit. For file-level symbols (i.e. things defined in the root level of your source file, not inside functions and the like), this is the default; writing extern just makes it explicit. When we define the function in print_hello_world.cpp, we also label it as extern (again, this is the default). This means that the defined symbol should be exported for use by other translation units.

By way of comparison, the string secret_string defined in hello_world_string.cpp is declared static. This means that the symbol should be restricted to that translation unit. In other words, you won't be able to access the value of secret_string from print_hello_world.cpp.

When you're writing a library, it is best to make any functions that you don't need to export static and to avoid global variables altogether.

Static libraries

You never want to code everything required by a program on your own. Because of this, people package related groups of functions into libraries. Programs can then use functions from the library, and avoid coding that functionality themselves. For example, you could consider print_hello_world.o and hello_world_string.o to be little libraries used by hello_world.o. Because the two object files are so tightly linked, it would be convenient to bundle them together in a single file. This is what static libraries are: bundles of object files. You can create them using ar (from “archive”; ar is the ancestor of tar, from “tape archive”).
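To show how little there is to the "bundle of object files" idea, here is a Python sketch that walks the common ar layout (an 8-byte magic string followed by 60-byte member headers) and lists the members. The function name is made up, and the sketch reports raw name fields. GNU ar, for instance, appends a / to short names and stores long names in a separate // member:

```python
def list_ar_members(data):
    """Return (name_field, size) for each member of an ar archive."""
    if not data.startswith(b'!<arch>\n'):
        raise ValueError('missing ar magic string')
    offset = 8
    members = []
    while offset + 60 <= len(data):
        header = data[offset:offset + 60]
        if header[58:60] != b'`\n':
            raise ValueError('bad member header at offset {}'.format(offset))
        name = header[0:16].decode('ascii').rstrip()   # space-padded name
        size = int(header[48:58].decode('ascii'))      # decimal byte count
        members.append((name, size))
        # Member data is padded to an even number of bytes
        offset += 60 + size + (size % 2)
    return members
```

In practice you would just run ar t libhello_world.a, but the point is that the archive is a plain concatenation with a little bookkeeping in front of each object file.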

You can use nm to list the symbols for static libraries exactly as you would for object files:

$ make inspect-static-library
nm -Pg libhello_world.a
libhello_world.a[print_hello_world.o]:
_Z17print_hello_worldv T 0000000000000000 0000000000000027
_ZNSolsEPFRSoS_E U         
_ZNSt8ios_base4InitC1Ev U         
_ZNSt8ios_base4InitD1Ev U         
_ZSt4cout U         
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U         
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U         
__cxa_atexit U         
__dso_handle U         
hello_world_string U         
libhello_world.a[hello_world_string.o]:
hello_world_string R 0000000000000010 0000000000000008

Notice that nothing has changed from the object file output, except that object file names like print_hello_world.o have been replaced by libhello_world.a[print_hello_world.o].

Shared libraries

Library code from static libraries (and object files) is built into your executable at link time. This means that when the library is updated in the future (bug fixes, extended functionality, …), you'll have to relink your program to take advantage of the new features. Because nobody wants to recompile an entire system when someone makes cout a bit more efficient, people developed shared libraries. The code from shared libraries is never built into your executable. Instead, instructions on how to find the dynamic libraries are built in. When you run your executable, a loader finds all the shared libraries your program needs and maps the parts you need from the libraries into your program's memory. This means that when a system programmer improves cout, your program will use the new version automatically. This is a Good Thing™.

You can use ldd to list the shared libraries your program needs:

$ make list-executable-shared-libraries
ldd hello_world
        linux-vdso.so.1 =>  (0x00007fff76fbb000)
        libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/libstdc++.so.6 (0x00007ff7467d8000)
        libm.so.6 => /lib64/libm.so.6 (0x00007ff746555000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff74633e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007ff745fb2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ff746ae7000)

The format is:

soname => path (load address)
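If you want to post-process that output, here is a Python sketch that parses the three line shapes shown above (soname with a path, soname without a path, and a bare path). The regular expression and function name are my own, and other shapes such as "not found" lines are not handled:

```python
import re

LDD_LINE = re.compile(
    r'^\s*(?:(?P<soname>\S+)\s+=>\s*)?'   # optional "soname =>"
    r'(?P<path>/\S+)?\s*'                 # optional absolute path
    r'\((?P<address>0x[0-9a-fA-F]+)\)\s*$')


def parse_ldd_line(line):
    """Return (soname, path, load_address); soname or path may be None."""
    match = LDD_LINE.match(line)
    if match is None:
        raise ValueError('unrecognized ldd line: {!r}'.format(line))
    return (match.group('soname'),
            match.group('path'),
            int(match.group('address'), 16))
```

The load addresses will usually differ from run to run, so they are mostly useful for seeing the order in which things were mapped.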

You can also use nm to list symbols for shared libraries:

$ make inspect-shared-libary | head
nm -Pg --dynamic libhello_world.so
_Jv_RegisterClasses w         
_Z17print_hello_worldv T 000000000000098c 0000000000000034
_ZNSolsEPFRSoS_E U         
_ZNSt8ios_base4InitC1Ev U         
_ZNSt8ios_base4InitD1Ev U         
_ZSt4cout U         
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U         
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U         
__bss_start A 0000000000201030 
__cxa_atexit U         
__cxa_finalize w         
__gmon_start__ w         
_edata A 0000000000201030 
_end A 0000000000201048 
_fini T 0000000000000a58 
_init T 0000000000000810 
hello_world_string D 0000000000200dc8 0000000000000008

You can see our hello_world_string and _Z17print_hello_worldv, along with the undefined symbols like _ZSt4cout that our code needs. There are also a number of symbols to help with the shared library mechanics (e.g. _init).

To illustrate the “link time” vs. “load time” distinction, run:

$ make run
./hello_world
Hello, World!
./hello_world-static
Hello, World!
LD_LIBRARY_PATH=. ./hello_world-dynamic
Hello, World!

Then switch to the Goodbye definition in hello_world_string.cpp:

//extern const char * const hello_world_string = "Hello, World!";
extern const char * const hello_world_string = "Goodbye!";

Recompile the libraries (but not the executables) and run again:

$ make libs
…
$ make run
./hello_world
Hello, World!
./hello_world-static
Hello, World!
LD_LIBRARY_PATH=. ./hello_world-dynamic
Goodbye!

Finally, relink the executables and run again:

$ make
…
$ make run
./hello_world
Goodbye!
./hello_world-static
Goodbye!
LD_LIBRARY_PATH=. ./hello_world-dynamic
Goodbye!

When you have many packages depending on the same low-level libraries, the savings from avoided rebuilding are large. However, shared libraries have another benefit over static libraries: shared memory.

Much of the machine code in shared libraries is static (i.e. it doesn't change as a program is run). Because of this, several programs may share the same in-memory version of a library without stepping on each other's toes. With statically linked code, each program has its own in-memory version:

Static                    Shared
Program A → Library B     Program A → Library B
Program C → Library B     Program C ⎯⎯⎯⎯⬏

Further reading

If you're curious about the details on loading and shared libraries, Eli Bendersky has a nice series of articles on load time relocation, PIC on x86, and PIC on x86-64.

Blinkenlights

Achtung Alles Lookenspeepers!

Das computermachine ist nicht für gefingerpoken und mittengrabben. Ist easy schnappen der springenwerk, blowenfusen und poppencorken mit spitzensparken. Ist nicht fuer gewerken bei das dumpkopfen. Das rubbernecken sichtseeren keepen das cotten-pickenen hans in das pockets muss; relaxen und watchen das blinkenlichten.

Screen

Screen is an ncurses-based terminal multiplexer. There are tons of useful things you can do with it, and innumerable blog posts describing them. I have two common use cases:

  • On my local host when I don't start X Windows, I log in to a virtual terminal and run screen. Then I can easily open several windows (e.g. for Emacs, Mutt, irssi, …) without having to log in on another virtual terminal.
  • On remote hosts when I'm doing anything serious, I start screen immediately after SSH-ing into the remote host. Then if my connection is dropped (or I need to disconnect while I take the train in to work), my remote work is waiting for me to pick up where I left off.

Treehouse X

Those are useful things, but they are well covered by others. A few days ago I thought of a cute trick for increasing security on my local host, which led me to finally write up a screen post. I call it “treehouse X”. Here's the problem:

You don't like waiting for X to start up when a virtual terminal is sufficient for your task at hand, so you've set your box up without a graphical login manager. However, sometimes you do need a graphical interface (e.g. to use fancy characters via Xmodmap or the Compose key), so you fire up X with startx, and get on with your life. But wait! You have to leave the terminal to do something else (e.g. teach a class, eat dinner, sleep?). Being a security-conscious bloke, you lock your screen with xlockmore (using your Fluxbox hotkeys). You leave to complete your task. While you're gone Mallory sneaks into your lab. You've locked your X server, so you think you're safe, but Mallory jumps to the virtual terminal from which you started X (using Ctrl-Alt-F1, or similar), and kills your startx process with Ctrl-c. Now Mallory can do evil things in your name, like adding export EDITOR=vim to your .bashrc.

So how do you protect yourself against this attack? Enter screen and treehouse X. If you run startx from within a screen session, you can jump back to the virtual terminal yourself, detach from the session, and log out of the virtual terminal. This is equivalent to climbing into your treehouse (X) and pulling up your rope ladder (startx) behind you, so that you are no longer vulnerable from the ground (the virtual terminal). For kicks, you can reattach to the screen session from an xterm, which leads to a fun chicken-and-egg picture:

startx → X → Xterm → Screen → startx cycle

Of course the whole situation makes sense when you realize that it's really:

$ pstree 14542
screen───bash───startx───xinit─┬─X
                               └─fluxbox───xterm───bash───screen

where the first screen is the server and the second screen is the client.

Cython

Cython is a Python-like language that makes it easy to write C-based extensions for Python. This is a Good Thing™, because people who will write good Python wrappers will be fluent in Python, but not necessarily in C. Alternatives like SWIG allow you to specify wrappers in a C-like language, which makes thin wrappers easy, but can lead to a less idiomatic wrapper API. I should also point out ctypes, which has the advantage of avoiding compiled wrappers altogether, at the expense of dealing with linking explicitly in the Python code.
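For comparison, here is what "dealing with linking explicitly in the Python code" looks like with ctypes. This is a minimal sketch that assumes a POSIX system where the C library can be found at run time; strlen is just a convenient function to call:

```python
import ctypes
import ctypes.util

# Locate and load the C library at run time. This is the "linking" step
# that ctypes makes you spell out in Python.
libc = ctypes.CDLL(ctypes.util.find_library('c'))

# Declare the C signature by hand: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b'Hello, World!')
print(length)  # 13
```

Nothing is compiled here, but nothing checks your declared signatures against the real headers either, which is the trade-off against Cython and SWIG.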

The Cython docs are fairly extensive, and I found them to be sufficient for writing my pycomedi wrapper around the Comedi library. One annoying thing was that Cython does not support __all__ (cython-users). I took a stab at fixing this, but got sidetracked cleaning up the Cython parser (cython-devel, later in cython-devel). I must have bitten off more than I could chew, since I eventually ran out of time to work on merging my code, and the Cython trunk moved off without me ;).

SWIG

SWIG is a Simplified Wrapper and Interface Generator. It makes it very easy to provide a quick-and-dirty wrapper so you can call code written in C or C++ from code written in another language (e.g. Python). I don't do much with SWIG, because while building an object oriented wrapper in SWIG is possible, I could never get it to feel natural (I like Cython better). Here are my notes from when I do have to interact with SWIG.

%array_class and memory management

%array_class (defined in carrays.i) lets you wrap a C array in a class-based interface. The example from the docs is nice and concise, but I was running into problems.

>>> import example
>>> n = 3
>>> data = example.sample_array(n)
>>> for i in range(n):
...     data[i] = 2*i + 3
>>> example.print_sample_pointer(n, data)
Traceback (most recent call last):
  ...
TypeError: in method 'print_sample_pointer', argument 2 of type 'sample_t *'

I just bumped into these errors again while trying to add an insn_array class to Comedi's wrapper:

%array_class(comedi_insn, insn_array);    

so I decided it was time to buckle down and figure out what was going on. All of the non-Comedi examples here are based on my example test code.

The basic problem is that while you and I realize that an array_class-based instance is interchangeable with the underlying pointer, SWIG does not. For example, I've defined a sample_vector_t struct:

typedef double sample_t;
typedef struct sample_vector_struct {
  size_t n;
  sample_t *data;
} sample_vector_t;

and a sample_array class:

%array_class(sample_t, sample_array);

A bare instance of the double array class has fancy SWIG additions for getting and setting attributes. The class that adds the extra goodies is SWIG's proxy class:

>>> print(data)  # doctest: +ELLIPSIS
<example.sample_array; proxy of <Swig Object of type 'sample_array *' at 0x...> >

However, C functions and structs interact with the bare pointer (i.e. without the proxy goodies). You can use the .cast() method to remove the goodies:

>>> data.cast()  # doctest: +ELLIPSIS
<Swig Object of type 'double *' at 0x...>
>>> example.print_sample_pointer(n, data.cast())
>>> vector = example.sample_vector_t()
>>> vector.n = n
>>> vector.data = data
Traceback (most recent call last):
  ...
TypeError: in method 'sample_vector_t_data_set', argument 2 of type 'sample_t *'
>>> vector.data = data.cast()
>>> vector.data  # doctest: +ELLIPSIS
<Swig Object of type 'double *' at 0x...>

So .cast() gets you from proxy of <Swig Object ...> to <Swig Object ...>. How do you go the other way? You'll need this if you want to do something extra fancy, like accessing the array members ;).

>>> vector.data[0]
Traceback (most recent call last):
  ...
TypeError: 'SwigPyObject' object is not subscriptable

The answer here is the .frompointer() method, which can function as a class method:

>>> reconst_data = example.sample_array.frompointer(vector.data)
>>> reconst_data[n-1]
7.0

Or as a single line:

>>> example.sample_array.frompointer(vector.data)[n-1]
7.0

I chose the somewhat awkward name of reconst_data for the reconstituted data, because if you use data, you clobber the earlier example.sample_array(n) definition. After the clobber, Python garbage collects the old data, and because the old data claims it owns the underlying memory, Python frees the memory. This leaves vector.data and reconst_data pointing to unallocated memory, which is probably not what you want. If keeping references to the original objects (like I did above with data) is too annoying, you have to manually tweak the ownership flag:

>>> data.thisown
True
>>> data.thisown = False
>>> data = example.sample_array.frompointer(vector.data)
>>> data[n-1]
7.0

This way, when data is clobbered, SWIG doesn't release the underlying array (because data no longer claims to own the array). However, vector doesn't own the array either, so you'll have to remember to reattach the array to something that will clean it up before vector goes out of scope to avoid leaking memory:

>>> data.thisown = True
>>> del vector, data

For deeply nested structures, this can be annoying, but it will work.
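If you find yourself doing the thisown shuffle a lot, you could fold it into a small helper. This is only a sketch of the pattern from the session above: rewrap is a hypothetical name, and it relies on nothing beyond the thisown attribute and frompointer method already shown:

```python
def rewrap(array_class, old_proxy, pointer):
    """Return a new proxy for `pointer`, taking over ownership from `old_proxy`."""
    owned = old_proxy.thisown
    # The old proxy must not free the array when it is garbage collected
    old_proxy.thisown = False
    new_proxy = array_class.frompointer(pointer)
    # Hand ownership (if the old proxy had it) to the new proxy, so the
    # array is still cleaned up exactly once
    new_proxy.thisown = owned
    return new_proxy
```

With the example module you would write data = rewrap(example.sample_array, data, vector.data). The old proxy is still alive while the arguments are evaluated, so it is safe to reuse the name on the left-hand side.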

