<h1>tags/linux (unfolding disasters, ikiwiki, updated 2013-09-20)</h1>
<h1><a href="http://blog.tremily.us//posts/Comcast/">Comcast</a> (2013-09-20)</h1>
<p>A while back I <a href="http://blog.tremily.us//tags/linux/../../posts/Comcast_rediculousness/">posted about Comcast blocking outgoing traffic on
port 25</a>. We've spent some time with
Verizon's DSL service, but after our recent move we're back with
Comcast. Luckily, Comcast now explicitly lists <a href="http://customer.comcast.com/help-and-support/internet/list-of-blocked-ports/">the ports they
block</a>. Nothing I care about, except for port 25 (incoming and
outgoing). For incoming mail, I use <a href="http://dyn.com/">Dyn</a> to forward mail to <a href="http://blog.tremily.us//tags/linux/../../posts/Postfix/">port
587</a>. For outgoing mail, I had been using <span class="createlink">stunnel</span>
through <code>outgoing.verizon.net</code> for my <a href="http://blog.tremily.us//tags/linux/../../posts/SMTP/">SMTP</a> connections. Comcast
takes a <a href="http://customer.comcast.com/help-and-support/internet/email-client-programs-with-xfinity-email/">similar approach</a>, forcing outgoing mail through port
465 on <code>smtp.comcast.net</code>.</p>
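<p>For reference, a minimal <code>stunnel</code> client configuration for this sort of relay might look like the following sketch (the local port 2525 and the service name are arbitrary illustrative choices, not Comcast requirements):</p>

```ini
; Run stunnel in client mode: accept plaintext connections locally
; and open TLS connections to the remote server.
client = yes

[comcast-smtp]
; Point your MTA's relayhost at 127.0.0.1:2525 (example port).
accept  = 127.0.0.1:2525
connect = smtp.comcast.net:465
```

<p>With something like that in place, a local MTA can relay through <code>127.0.0.1:2525</code> and authenticate to Comcast over the wrapped connection.</p>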
<h1><a href="http://blog.tremily.us//posts/One-off_Git_daemon/">One-off Git daemon</a> (published 2013-02-19, updated 2013-04-14)</h1>
<p>In my <a href="http://blog.tremily.us//tags/linux/../../posts/gitweb/">gitweb</a> post, I explain how to set up <code>git daemon</code> to serve
<code>git://</code> requests under <a href="http://blog.tremily.us//tags/linux/../../posts/Nginx/">Nginx</a> on <span class="createlink">Gentoo</span>. This post talks
about a different situation, where you want to toss up a Git daemon
for collaboration on your LAN. This is useful when you're teaching
Git to a room full of LAN-sharing students, and you don't want to
bother setting up public repositories more permanently.</p>
<h1>Serving a few repositories</h1>
<p>Say you have a repository that you want to serve:</p>
<pre><code>$ mkdir -p ~/src/my-project
$ cd ~/src/my-project
$ git init
$ …hack hack hack…
</code></pre>
<p>Fire up the daemon (probably in another terminal so you can keep
hacking in your original terminal) with:</p>
<pre><code>$ cd ~/src
$ git daemon --export-all --base-path=. --verbose ./my-project
</code></pre>
<p>Then you can clone with:</p>
<pre><code>$ git clone git://192.168.1.2/my-project
</code></pre>
<p>replacing <code>192.168.1.2</code> with your machine's LAN IP address (e.g. from <code>ip
addr show scope global</code>). Add additional repository paths to the <code>git
daemon</code> call to serve additional repositories.</p>
<h1>Serving a single repository</h1>
<p>If you don't want to bother listing <code>my-project</code> in your URLs, you can
base the daemon in the project itself (instead of in the parent
directory):</p>
<pre><code>$ cd
$ git daemon --export-all --base-path=src/my-project --verbose
</code></pre>
<p>Then you can clone with:</p>
<pre><code>$ git clone git://192.168.1.2/
</code></pre>
<p>This may be more convenient if you're only sharing a single
repository.</p>
<h1>Enabling pushes</h1>
<p>If you want your students to be able to push to your repository during
class, you can run:</p>
<pre><code>$ git daemon --enable=receive-pack …
</code></pre>
<p>Only do this on a trusted LAN with a junk test repository, because it
will allow <em>anybody</em> to push <em>anything</em> or remove references.</p>
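<p>As a concrete sketch of the whole cycle (the <code>/tmp</code> paths and repository name are throwaway examples), here's serving a junk bare repository writably, then cloning and pushing back from the student side:</p>

```shell
# Start clean; these are junk paths for demonstration only.
rm -rf /tmp/class-repo /tmp/student-clone

# A bare repository to share.
git init -q --bare /tmp/class-repo

# Serve it writably.  --enable=receive-pack lets *anybody* on the LAN
# push or delete refs, so only do this with junk repositories.
git daemon --export-all --base-path=/tmp --enable=receive-pack /tmp/class-repo &
GIT_DAEMON_PID=$!
sleep 1  # give the daemon a moment to start listening

# The student side (using localhost here instead of your LAN address):
git clone -q git://127.0.0.1/class-repo /tmp/student-clone
cd /tmp/student-clone
git -c user.name=Student -c user.email=student@example.com \
    commit -q --allow-empty -m 'student work'
git push -q origin HEAD

kill "$GIT_DAEMON_PID"
```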
<h1><a href="http://blog.tremily.us//posts/AVR/">AVR</a> (2012-07-28)</h1>
<p>I've been wanting to get into microcontroller programming for a while
now, and last week I broke down and ordered components for a
<a href="http://arduino.cc/en/Main/Standalone">breadboard Arduino</a> from <a href="http://www.mouser.com/">Mouser</a>. There's a
fair amount of buzz about the <a href="http://arduino.cc/">Arduino</a> platform, but I find the
whole <a href="http://arduino.cc/en/Guide/Environment">sketch infrastructure</a> confusing. I'm a big fan of
command line tools in general, so the whole IDE thing was a bit of a
turn off.</p>
<p>Because the <a href="http://www.atmel.com/devices/atmega328.aspx">ATMega328</a> doesn't have a USB controller, I also bought
a <a href="http://pjrc.com/teensy/">Teensy 2.0</a> from <a href="http://pjrc.com/">PJRC</a>. The Teensy is just an
<a href="http://www.atmel.com/devices/atmega32u4.aspx">ATMega32u4</a> on a board with supporting hardware (clock, reset
switch, LED, etc). I've packed the Teensy programmer and HID listener
in my <a href="http://blog.tremily.us//tags/linux/../../posts/Gentoo_overlay/">Gentoo overlay</a>, to make it easier to install them and stay
up to date.</p>
<p>Arduinos (and a number of similar projects) are based on <a href="http://en.wikipedia.org/wiki/Atmel_AVR">AVR</a>
microcontrollers like the ATMegas. Writing code for an AVR processor
is similar to writing code for any other processor. <a href="http://gcc.gnu.org/">GCC</a> will
cross-compile your code once you've set up a cross-compiling toolchain.
There's a good intro to the whole embedded approach in the <a href="http://www.gentoo.org/proj/en/base/embedded/handbook/">Gentoo
Embedded Handbook</a>.</p>
<p>For all the AVR-specific features you can use <a href="http://www.nongnu.org/avr-libc/">AVR-libc</a>, an open
source C library for AVR processors. It's hard to imagine doing
anything interesting without using this library, so you should at
least skim through the manual. They also have a few interesting
<a href="http://www.nongnu.org/avr-libc/user-manual/group__demos.html">demos</a> to get you going.</p>
<p>AVR-libc sorts chip-support code into AVR architecture subdirectories.
For example, object code specific to my <a href="http://www.atmel.com/devices/atmega32u4.aspx">ATMega32u4</a> is installed at
<code>/usr/avr/lib/avr5/crtm32u4.o</code>. <code>avr5</code> is the <a href="http://www.nongnu.org/avr-libc/user-manual/using_tools.html#using_avr_gcc_mach_opt">AVR architecture
version</a> of this chip.</p>
<h2>Crossdev</h2>
<p>Since you will probably not want to build a version of GCC that runs
on your AVR chip, you'll be building a cross-compiling toolchain. The
toolchain will allow you to use your development box to compile
programs for your AVR chip. On Gentoo, the recommended approach is to
use <a href="http://en.gentoo-wiki.com/wiki/Crossdev">crossdev</a> to build the toolchain (although <a href="https://bugs.gentoo.org/show_bug.cgi?id=147155">crossdev's AVR
support can be flaky</a>). They suggest you install it in a
<a href="http://www.gentoo.org/proj/en/base/embedded/handbook/?part=1&chap=2#doc_chap1">stage3 chroot to protect your native toolchain</a>, but I think
it's easier to just make <a href="http://blog.tremily.us//tags/linux/../../posts/btrfs/">btrfs</a> snapshots of my hard drive before
doing something crazy. I didn't have any trouble skipping the chroot
on my system, but your mileage may vary.</p>
<pre><code># emerge -av crossdev
</code></pre>
<p>Because it has per-arch libraries (like <code>avr5</code>), AVR-libc needs to be
built with <a href="http://www.gentoo.org/doc/en/gentoo-amd64-faq.xml#multilib">multilib</a> support. If you (like me) have avoided
multilib like the plague so far, you'll need to patch crossdev to turn
on multilib for the AVR tools. Do this by applying Jess'
<a href="https://bugs.gentoo.org/attachment.cgi?id=304037">patch</a> from <a href="https://bugs.gentoo.org/show_bug.cgi?id=377039">bug 377039</a>.</p>
<pre><code># wget -O crossdev-avr-multilib.patch 'https://bugs.gentoo.org/attachment.cgi?id=304037'
# patch /usr/bin/crossdev < crossdev-avr-multilib.patch
</code></pre>
<p>If you're using a profile where multilib is masked
(e.g. <code>default/linux/x86/10.0/desktop</code>) you should use Niklas'
<a href="https://bugs.gentoo.org/attachment.cgi?id=285901">extended version of the patch</a> from the duplicate <a href="https://bugs.gentoo.org/show_bug.cgi?id=378387">bug
378387</a>.</p>
<p>Despite claiming to use the last overlay in <code>PORTDIR_OVERLAY</code>,
crossdev currently <a href="https://bugs.gentoo.org/show_bug.cgi?id=428420">uses the first</a>, so if you use
<a href="http://layman.sourceforge.net/">layman</a> to manage your overlays (like <a href="http://blog.tremily.us//tags/linux/../../posts/Gentoo_overlay/">mine</a>),
you'll want to tweak your <code>make.conf</code> to look like:</p>
<pre><code>source /var/lib/layman/make.conf
PORTDIR_OVERLAY="/usr/local/portage ${PORTDIR_OVERLAY}"
</code></pre>
<p>Now you can install your toolchain following the <a href="http://en.gentoo-wiki.com/wiki/Crossdev#AVR_Architecture">Crossdev
wiki</a>. First install a minimal GCC (stage 1) using</p>
<pre><code># USE="-cxx -openmp" crossdev --binutils 9999 -s1 --without-headers --target avr
</code></pre>
<p>Then install a full featured GCC (stage 4) using</p>
<pre><code># USE="cxx -nocxx" crossdev --binutils 9999 -s4 --target avr
</code></pre>
<p>I use <code>binutils-9999</code> to install live from the <a href="http://sourceware.org/git/?p=binutils.git">git mirror</a>,
which avoids <a href="http://sourceware.org/bugzilla/show_bug.cgi?id=12161">a segfault bug in binutils 2.22</a>.</p>
<p>After the install, I was getting bit by <a href="https://bugs.gentoo.org/show_bug.cgi?id=147155">bug 147155</a>:</p>
<pre><code>cannot open linker script file ldscripts/avr5.x
</code></pre>
<p>Which I work around with:</p>
<pre><code># ln -s /usr/x86_64-pc-linux-gnu/avr/lib/ldscripts /usr/avr/lib/ldscripts
</code></pre>
<p>Now you're ready. Go forth and build!</p>
<h2>Cross compiler construction</h2>
<p>Why do several stages of GCC need to be built anyway? From <code>crossdev
--help</code>, here are the stages:</p>
<ol>
<li>Build just binutils</li>
<li>Also build a bare C compiler (no C library/C++/shared GCC libs/C++
exceptions/etc…)</li>
<li>Also build kernel headers</li>
<li>Also build the C library</li>
<li>Also build a full compiler</li>
</ol>
<h1><a href="http://blog.tremily.us//posts/curses-check-for-keypress/">curses_check_for_keypress</a> (2012-07-14)</h1>
<p><span class="infobox">
Available in a <a href="http://blog.tremily.us//tags/linux/../git/">git</a> repository.<br />
Repository: <a href="git://tremily.us/curses-check-for-keypress.git" rel="vcs-git" title="curses-check-for-keypress repository">curses-check-for-keypress</a><br />
Browsable repository: <a href="http://git.tremily.us/?p=curses-check-for-keypress.git" rel="vcs-git" title="curses-check-for-keypress repository">curses-check-for-keypress</a><br />
Author: W. Trevor King<br />
</span></p>
<p>There are some points in my experiment control code where the program
does something for an arbitrary length of time (e.g., waits while the
operator manually adjusts a laser's alignment). For these situations,
I wanted to be able to loop until the user pressed a key. This is a
simple enough idea, but the implementation turned out to be
complicated enough for me to spin it out as a stand-alone module.</p>
<h1><a href="http://blog.tremily.us//posts/DVD_Backup/">DVD Backup</a> (published 2012-04-27, updated 2013-09-20)</h1>
<p>I've been using <a href="http://code.google.com/p/abcde/">abcde</a> to rip our audio CD collection onto our
fileserver for a few years now. Then I can play songs from across the
collection using <a href="http://blog.tremily.us//tags/linux/../../posts/MPD/">MPD</a> without having to dig the original CDs out of
the closet. I just picked up a large external hard drive and thought
it might be time to take a look at ripping our DVD collection as well.</p>
<p>There is an excellent <a href="http://www.scottro.net/qnd/qnd-dvd-backup.html">Quick-n-Dirty Guide</a> that goes into more
detail on all of this, but here's an executive summary.</p>
<p>Make sure your kernel understands the <a href="http://en.wikipedia.org/wiki/Universal_Disk_Format">UDF file system</a>:</p>
<pre><code>$ grep CONFIG_UDF_FS /usr/src/linux/.config
</code></pre>
<p>If your kernel was compiled with <code>CONFIG_IKCONFIG_PROC</code> enabled, you
could use</p>
<pre><code>$ zcat /proc/config.gz | grep CONFIG_UDF_FS
</code></pre>
<p>instead, to make sure you're checking the configuration of the
currently running kernel. If the <code>udf</code> driver was compiled as a
module, make sure it's loaded.</p>
<pre><code>$ sudo modprobe udf
</code></pre>
<p>Mount your DVD somewhere:</p>
<pre><code>$ sudo mount /dev/dvd /mnt/dvd
</code></pre>
<p>Now you're ready to rip. You've got two options: you can copy the
VOBs over directly, or rip the DVD into an alternative container
format such as <a href="http://www.matroska.org/">Matroska</a>.</p>
<h2>Vobcopy</h2>
<p>Mirror the disc with <a href="http://lpn.rnbhq.org/">vobcopy</a> (<code>media-video/vobcopy</code> on <span class="createlink">Gentoo</span>):</p>
<pre><code>$ vobcopy -m -t "Awesome_Movie" -v -i /mnt/dvd -o ~/movies/
</code></pre>
<p>Play with <a href="http://www.mplayerhq.hu/">Mplayer</a> (<code>media-video/mplayer</code> on <span class="createlink">Gentoo</span>):</p>
<pre><code>$ mplayer -nosub -fs -dvd-device ~/movies/Awesome_Movie dvd://1
</code></pre>
<p>where <code>-nosub</code> and <code>-fs</code> are optional.</p>
<h2>Matroska</h2>
<p>Remux the disc (without reencoding) with <code>mkvmerge</code> (from
<a href="http://www.bunkus.org/videotools/mkvtoolnix">MKVToolNix</a>, <code>media-video/mkvtoolnix</code> on <span class="createlink">Gentoo</span>):</p>
<pre><code>$ mkvmerge -o ~/movies/Awesome_Movie.mkv /mnt/dvd/VIDEO_TS/VTS_01_1.VOB
(Processing the following files as well: "VTS_01_2.VOB", "VTS_01_3.VOB", "VTS_01_4.VOB", "VTS_01_5.VOB")
</code></pre>
<p>Then you can do all the usual tricks. Here's an example of extracting
a slice of the Matroska file as silent video in an AVI container with
<code>mencoder</code> (from <a href="http://www.mplayerhq.hu/">Mplayer</a>, <code>media-video/mplayer</code> on <span class="createlink">Gentoo</span>):</p>
<pre><code>$ mencoder -ss 00:29:20.3 -endpos 00:00:21.6 Awesome_Movie.mkv -nosound -of avi -ovc copy -o silent-clip.avi
</code></pre>
<p>Here's an example of extracting a slice of the Matroska file as audio
in an AC3 container:</p>
<pre><code>$ mencoder -ss 51.1 -endpos 160.9 Awesome_Movie.mkv -of rawaudio -ovc copy -oac copy -o audio-clip.ac3
</code></pre>
<p>You can also take a look through the <a href="http://en.gentoo-wiki.com/wiki/Ripping_DVD_to_Matroska_and_H.264">Gentoo wiki</a> and <a href="http://ubuntuforums.org/showthread.php?s=e45e01b671c1dd08351876fda432f04a&t=1400598&page=2">this
Ubuntu thread</a> for more ideas.</p>
<h1><a href="http://blog.tremily.us//posts/Killing_defunct_processes/">Killing defunct processes</a> (2012-04-27)</h1>
<p><a href="http://www.cts.wustl.edu/~allen/">Allen Rueter</a> <a href="http://www.cts.wustl.edu/~allen/kill-defunct-process.html">points out</a> that one way to kill a defunct
process is to kill its parent or child:</p>
<pre><code># ps -ef | grep '<defunct>\|PPID'
UID PID PPID C STIME TTY TIME CMD
zzz 13868 1 0 0:00 <defunct>
# ps -ef | grep '13868\|PPID'
UID PID PPID C STIME TTY TIME CMD
zzz 13868 1 0 0:00 <defunct>
zzz 16596 13868 0 0:00 a.out
# kill -9 16596
</code></pre>
<h1><a href="http://blog.tremily.us//posts/pyassuan/">pyassuan</a> (published 2012-03-24, updated 2012-05-12)</h1>
<p><span class="infobox">
Available in a <a href="http://blog.tremily.us//tags/linux/../git/">git</a> repository.<br />
Repository: <a href="git://tremily.us/pyassuan.git" rel="vcs-git" title="pyassuan repository">pyassuan</a><br />
Browsable repository: <a href="http://git.tremily.us/?p=pyassuan.git" rel="vcs-git" title="pyassuan repository">pyassuan</a><br />
Author: W. Trevor King<br />
</span></p>
<p>I've been trying to come up with a clean way to verify detached
<a href="http://blog.tremily.us//tags/linux/../../posts/PGP/">PGP</a> signatures from <a href="http://blog.tremily.us//tags/linux/../../posts/Python/">Python</a>. There are <a href="http://wiki.python.org/moin/GnuPrivacyGuard">a number of existing
approaches to this problem</a>. Many of them call <a href="http://www.gnupg.org/">gpg</a>
using Python's <a href="http://docs.python.org/library/multiprocessing.html">multiprocessing</a> or <a href="http://docs.python.org/library/subprocess.html">subprocess</a> modules, but to
verify detached signatures, you need to send the signature in <a href="http://lists.gnupg.org/pipermail/gnupg-devel/2002-November/019343.html">on a
separate file descriptor</a>, and handling that
in a way safe from deadlocks is difficult. The other approach, taken
by <a href="http://pyme.sourceforge.net/">PyMe</a> is to wrap <a href="http://www.gnupg.org/related_software/gpgme/">GPGME</a> using <a href="http://blog.tremily.us//tags/linux/../../posts/SWIG/">SWIG</a>, which is great as far
as it goes, but development seems to have stalled, and I find the raw
GPGME interface excessively complicated.</p>
<p>The GnuPG tools themselves often communicate over sockets using the
<a href="http://www.gnupg.org/documentation/manuals/assuan/">Assuan protocol</a>, and I'd already written an Assuan server to
handle <a href="http://git.tremily.us/?p=pyassuan.git;a=blob;f=bin/pinentry.py;hb=HEAD">pinentry</a> (originally for my <a href="http://blog.tremily.us//tags/linux/../../posts/gpg-agent/">gpg-agent</a> post, not part of
pyassuan). I thought it would be natural if there were a <code>gpgme-agent</code>
which would handle cryptographic tasks over this protocol, which would
make the <a href="http://blog.tremily.us//tags/linux/../../posts/pgp-mime/">pgp-mime</a> implementation easier. It turns out that there
already is such an agent (<a href="http://git.gnupg.org/cgi-bin/gitweb.cgi?p=gpgme.git;a=blob;f=src/gpgme-tool.c;hb=HEAD">gpgme-tool</a>), so I turned my pinentry
script into the more general pyassuan package. Now using Assuan from
Python should be as easy as (or easier than?) using it from C via
<a href="http://www.gnupg.org/related_software/libassuan/">libassuan</a>.</p>
<p>The <code>README</code> is posted on the <a href="http://pypi.python.org/pypi/pyassuan/">PyPI page</a>.</p>
<h1><a href="http://blog.tremily.us//posts/pygrader/">pygrader</a> (published 2012-03-23, updated 2012-09-02)</h1>
<p><span class="infobox">
Available in a <a href="http://blog.tremily.us//tags/linux/../git/">git</a> repository.<br />
Repository: <a href="git://tremily.us/pygrader.git" rel="vcs-git" title="pygrader repository">pygrader</a><br />
Browsable repository: <a href="http://git.tremily.us/?p=pygrader.git" rel="vcs-git" title="pygrader repository">pygrader</a><br />
Author: W. Trevor King<br />
</span></p>
<p>The last two courses I've TA'd at Drexel have been scientific computing
courses where the students are writing code to solve homework
problems. When they're done, they email the homework to me, and I
grade it and email them back their grade and comments. I've played
around with developing a few grading frameworks over the years (a few
years back, one of the big intro courses kept the grades in an Excel
file on a <a href="http://www.samba.org/">Samba</a> share, and I wrote a script to automatically sync
local comma-separated-value data with that spreadsheet. Yuck :p),
so I figured this was my chance to polish up some old scripts into a
sensible system to help me stay organized. This system is pygrader.</p>
<p>During the polishing phase, I was searching around looking for prior
art ;), and found that Alex Heitzmann had already created <a href="http://code.google.com/p/pygrade/">pygrade</a>,
which is the name under which I had originally developed my own
project. While they are both grade databases written in <a href="http://blog.tremily.us//tags/linux/../../posts/Python/">Python</a>,
Alex's project focuses on providing a more integrated grading
environment.</p>
<p>Pygrader accepts assignment submissions from students through its
<code>mailpipe</code> command, which you can run on your email inbox (or from
<a href="http://www.procmail.org/">procmail</a>). Students submit assignments with an email subject like</p>
<pre><code>[submit] <assignment name>
</code></pre>
<p><code>mailpipe</code> automatically drops the submissions into a
<code>student/assignment/mail</code> mailbox, extracts any <a href="http://en.wikipedia.org/wiki/MIME">MIME</a> attachments
into the <code>student/assignment/</code> directory (without clobbers, with
proper timestamps), and leaves you to get to work.</p>
<p>Pygrader also supports multiple graders through the <code>mailpipe</code>
command. The other graders can request a student's submission(s) with
an email subject like</p>
<pre><code>[get] <student name>, <assignment name>
</code></pre>
<p>Then they can grade the submission and mail the grade back with an
email subject like</p>
<pre><code>[grade] <student name>, <assignment name>
</code></pre>
<p>The grade-altering messages are also stored in the
<code>student/assignment/mail</code> mailbox, so you can peruse them later.</p>
<p>Pygrader <em>doesn't</em> spawn editors or GUIs to help you browse through
submissions or assign grades. As far as I am concerned, this is a
good thing.</p>
<p>When you're done grading, pygrader can email (<code>email</code>) your grades and
comments back to the students, signing or encrypting with <a href="http://blog.tremily.us//tags/linux/../../posts/pgp-mime/">pgp-mime</a>
if either party has configured a <a href="http://blog.tremily.us//tags/linux/../../posts/PGP/">PGP</a> key. It can also email a
tab-delimited table of grades to the professors to keep them up to
speed. If you're running <code>mailpipe</code> via procmail, responses to grade
requests are sent automatically.</p>
<p>While you're grading, pygrader can search for ungraded assignments, or
for grades that have not yet been sent to students (<code>todo</code>). It can
also check for resubmissions, where new submissions come in response
to earlier grades.</p>
<p>The <code>README</code> is posted on the <a href="http://pypi.python.org/pypi/pygrader/">PyPI page</a>.</p>
<h1><a href="http://blog.tremily.us//posts/Linking/">Linking</a> (2012-03-10)</h1>
<p>This example shows the details of linking a simple program from three
source files. There are three ways to link: directly from object
files, statically from static libraries, or dynamically from shared
libraries. If you're following along in my <a href="http://blog.tremily.us//tags/linux/../../posts/Linking/linking.tar.gz">example
source</a>, you can compile the three flavors of the
<code>hello_world</code> program with:</p>
<pre><code>$ make
</code></pre>
<p>And then run them with:</p>
<pre><code>$ make run
</code></pre>
<h1>Compiling and linking</h1>
<p>Here's the general compilation process:</p>
<ol>
<li>Write code in a human-readable language (C, C++, …).</li>
<li>Compile the code to <a href="http://en.wikipedia.org/wiki/Object_file">object files</a> (<code>*.o</code>) using a
compiler (<code>gcc</code>, <code>g++</code>, …).</li>
<li>Link the code into executables or libraries using a <a href="http://en.wikipedia.org/wiki/Linker_%28computing%29">linker</a>
(<code>ld</code>, <code>gcc</code>, <code>g++</code>, …).</li>
</ol>
<p>Object files are binary files containing <a href="http://en.wikipedia.org/wiki/Instruction_set_architecture">machine code</a>
versions of the human-readable code, along with some bookkeeping
information for the linker (relocation information, stack unwinding
information, program symbols, …). The machine code is specific to a
particular processor architecture (e.g. <a href="http://en.wikipedia.org/wiki/X86-64">x86-64</a>).</p>
<p>Linking resolves references to symbols defined in other translation
units, because a single object file will rarely (never?) contain
definitions for all the symbols it requires. It's easy to get
confused about the difference between compiling and linking, because
you often use the same program (e.g. <code>gcc</code>) for both steps. In
reality, <code>gcc</code> is performing the compilation on its own, but is
using external utilities like <code>ld</code> for the linking. To see this in
action, add the <code>-v</code> (verbose) option to your <code>gcc</code> (or <code>g++</code>)
calls. You can do this for all the rules in the <code>Makefile</code> with:</p>
<pre><code>make CC="gcc -v" CXX="g++ -v"
</code></pre>
<p>On my system, that shows <code>g++</code> using <code>/lib64/ld-linux-x86-64.so.2</code>
for dynamic linking. C++ seems to require at least some dynamic
linking, but a simple C program like <code>simple.c</code> can be linked
statically. For static linking, <code>gcc</code> uses <code>collect2</code>.</p>
<h1>Symbols in object files</h1>
<p>Sometimes you'll want to take a look at the symbols exported and
imported by your code, since there can be <a href="http://blog.flameeyes.eu/2008/02/09/flex-and-linking-conflicts-or-a-possible-reason-why-php-and-recode-are-so-crashy">subtle bugs</a> if you
link two sets of code that use the same symbol for different purposes.
You can use <code>nm</code> to inspect the intermediate object files. I've saved
the command line in the <code>Makefile</code>:</p>
<pre><code>$ make inspect-object-files
nm -Pg hello_world.o print_hello_world.o hello_world_string.o
hello_world.o:
_Z17print_hello_worldv U
main T 0000000000000000 0000000000000010
print_hello_world.o:
_Z17print_hello_worldv T 0000000000000000 0000000000000027
_ZNSolsEPFRSoS_E U
_ZNSt8ios_base4InitC1Ev U
_ZNSt8ios_base4InitD1Ev U
_ZSt4cout U
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U
__cxa_atexit U
__dso_handle U
hello_world_string U
hello_world_string.o:
hello_world_string R 0000000000000010 0000000000000008
</code></pre>
<p>The output format for <code>nm</code> is described in its man page. With the
<code>-g</code> option, output is restricted to globally visible symbols. With
the <code>-P</code> option, each symbol line is:</p>
<pre><code><symbol> <type> <offset-in-hex> <size-in-hex>
</code></pre>
<p>For example, we see that <code>hello_world.o</code> defines a global text
symbol <code>main</code> at position 0 with a size of 0x10. This is where
the loader will start execution.</p>
<p>We also see that <code>hello_world.o</code> needs (i.e. “has an undefined symbol
for”) <code>_Z17print_hello_worldv</code>. This means that, in order to run,
<code>hello_world.o</code> must be linked against something else which provides
that symbol. The symbol is for our <code>print_hello_world</code> function. The
<code>_Z17</code> prefix and <code>v</code> postfix are a result of <a href="http://en.wikipedia.org/wiki/Name_mangling">name
mangling</a>, and depend on the compiler used and function
signature. Moving on, we see that <code>print_hello_world.o</code> defines the
<code>_Z17print_hello_worldv</code> at position 0 with a size of 0x27. So
linking <code>print_hello_world.o</code> with <code>hello_world.o</code> would resolve the
symbols needed by <code>hello_world.o</code>.</p>
<p><code>print_hello_world.o</code> has undefined symbols of its own, so we can't
stop yet. It needs <code>hello_world_string</code> (provided by
<code>hello_world_string.o</code>), <code>_ZSt4cout</code> (provided by <code>libstdc++</code>),
….</p>
<p>The process of linking involves bundling up enough of these partial
code chunks so that each of them has access to the symbols it needs.</p>
<p>There are a number of other tools that will let you poke into the
innards of object files. If <code>nm</code> doesn't scratch your itch, you may
want to look at the more general <code>objdump</code>.</p>
<h1>Storage classes</h1>
<p>In the previous section I mentioned “globally visible symbols”. When
you declare or define a symbol (variable, function, …), you can use
<a href="http://ee.hawaii.edu/~tep/EE160/Book/chap14/section2.1.1.html">storage classes</a> to tell the compiler about your
symbols' <em>linkage</em> and <em>storage duration</em>.</p>
<p>For more details, you can read through <em>§6.2.2 Linkages of
identifiers</em>, <em>§6.2.4 Storage durations of objects</em>, and <em>§6.7.1
Storage-class specifiers</em> in <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf">WG14/N1570</a>, the last public
version of <a href="http://www.open-std.org/jtc1/sc22/wg14/">ISO/IEC 9899:2011</a> (i.e. the C11 standard).</p>
<p>Since we're just worried about linking, I'll leave the discussion of
storage duration to others. With linkage, you're basically deciding
which of the symbols you define in your translation unit should be
visible from other translation units. For example, in
<code>print_hello_world.h</code>, we declare that there is a function
<code>print_hello_world</code> (with a particular signature). The <code>extern</code>
means that it may be defined in another translation unit. For
file-scope symbols (i.e. things defined at the top level of your
source file, not inside functions and the like), this is the default;
writing <code>extern</code> just makes it explicit. When we define the
function in <code>print_hello_world.cpp</code>, we also label it as <code>extern</code>
(again, this is the default). This means that the defined symbol
should be exported for use by other translation units.</p>
<p>By way of comparison, the string <code>secret_string</code> defined in
<code>hello_world_string.cpp</code> is declared <code>static</code>. This means that
the symbol should be restricted to that translation unit. In other
words, you won't be able to access the value of <code>secret_string</code> from
<code>print_hello_world.cpp</code>.</p>
<p>When you're writing a library, it is best to make any functions that
you don't <em>need</em> to export <code>static</code> and to <a href="http://c2.com/cgi/wiki?GlobalVariablesAreBad">avoid global variables
altogether</a>.</p>
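<p>A quick way to see linkage in action is to compile a toy translation unit (names invented for illustration) and check which symbols <code>nm</code> reports as externally visible:</p>

```shell
mkdir -p /tmp/storagedemo && cd /tmp/storagedemo

cat > lib.c <<'EOF'
/* internal linkage: visible only inside this translation unit */
static const char secret_string[] = "hidden";

/* external linkage (the default): exported to other translation units */
const char public_string[] = "exported";

/* reference secret_string so the compiler keeps it around */
const char *get_secret(void) { return secret_string; }
EOF

gcc -c lib.c
nm -Pg lib.o   # -g: globally visible symbols only; secret_string is absent
```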
<h1>Static libraries</h1>
<p>You never want to code <em>everything</em> required by a program on your own.
Because of this, people package related groups of functions into
libraries. Programs can then use functions from the library, and
avoid coding that functionality themselves. For example, you could
consider <code>print_hello_world.o</code> and <code>hello_world_string.o</code> to be
little libraries used by <code>hello_world.o</code>. Because the two object
files are so tightly linked, it would be convenient to bundle them
together in a single file. This is what static libraries are: bundles
of object files. You can create them using <code>ar</code> (from “archive”;
<code>ar</code> is the ancestor of <code>tar</code>, from “tape archive”).</p>
<p>You can use <code>nm</code> to list the symbols for static libraries exactly as
you would for object files:</p>
<pre><code>$ make inspect-static-library
nm -Pg libhello_world.a
libhello_world.a[print_hello_world.o]:
_Z17print_hello_worldv T 0000000000000000 0000000000000027
_ZNSolsEPFRSoS_E U
_ZNSt8ios_base4InitC1Ev U
_ZNSt8ios_base4InitD1Ev U
_ZSt4cout U
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U
__cxa_atexit U
__dso_handle U
hello_world_string U
libhello_world.a[hello_world_string.o]:
hello_world_string R 0000000000000010 0000000000000008
</code></pre>
<p>Notice that nothing has changed from the object file output, except
that object file names like <code>print_hello_world.o</code> have been replaced
by <code>libhello_world.a[print_hello_world.o]</code>.</p>
<h1>Shared libraries</h1>
<p>Library code from static libraries (and object files) is built into
your executable at link time. This means that when the library is
updated in the future (bug fixes, extended functionality, …), you'll
have to relink your program to take advantage of the new features.
Because nobody wants to recompile an entire system when someone makes
<code>cout</code> a bit more efficient, people developed shared libraries. The
code from shared libraries is never built into your executable.
Instead, instructions on how to find the dynamic libraries are built
in. When you run your executable, a loader finds all the shared
libraries your program needs and copies the parts you need from the
libraries into your program's memory. This means that when a system
programmer improves <code>cout</code>, your program will use the new version
automatically. This is a Good Thing™.</p>
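<p>A minimal sketch of building and loading a shared library (throwaway
file names, not the post's Makefile targets; assumes a Linux <code>cc</code>
toolchain):</p>
<pre><code># Build a shared library and an executable that loads it at run time.
printf 'const char *greeting(void) { return "Hello, World!"; }\n' &gt; greet.c
printf '#include &lt;stdio.h&gt;\nconst char *greeting(void);\nint main(void) { puts(greeting()); return 0; }\n' &gt; greet_main.c
cc -c -fPIC greet.c -o greet.o     # position-independent code, required in shared objects
cc -shared -o libgreet.so greet.o  # link the object into a shared library
cc greet_main.c -o greet_main -L. -lgreet
LD_LIBRARY_PATH=. ./greet_main     # the loader finds libgreet.so at run time
</code></pre>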
<p>You can use <code>ldd</code> to list the shared libraries your program needs:</p>
<pre><code>$ make list-executable-shared-libraries
ldd hello_world
linux-vdso.so.1 => (0x00007fff76fbb000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.5.3/libstdc++.so.6 (0x00007ff7467d8000)
libm.so.6 => /lib64/libm.so.6 (0x00007ff746555000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff74633e000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff745fb2000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff746ae7000)
</code></pre>
<p>The format is:</p>
<pre><code>soname => path (load address)
</code></pre>
<p>You can also use <code>nm</code> to list symbols for shared libraries:</p>
<pre><code>$ make inspect-shared-libary | head
nm -Pg --dynamic libhello_world.so
_Jv_RegisterClasses w
_Z17print_hello_worldv T 000000000000098c 0000000000000034
_ZNSolsEPFRSoS_E U
_ZNSt8ios_base4InitC1Ev U
_ZNSt8ios_base4InitD1Ev U
_ZSt4cout U
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ U
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc U
__bss_start A 0000000000201030
__cxa_atexit U
__cxa_finalize w
__gmon_start__ w
_edata A 0000000000201030
_end A 0000000000201048
_fini T 0000000000000a58
_init T 0000000000000810
hello_world_string D 0000000000200dc8 0000000000000008
</code></pre>
<p>You can see our <code>hello_world_string</code> and <code>_Z17print_hello_worldv</code>,
along with the undefined symbols like <code>_ZSt4cout</code> that our code
needs. There are also a number of symbols to help with the shared
library mechanics (e.g. <code>_init</code>).</p>
<p>To illustrate the “link time” vs. “load time” distinction, run:</p>
<pre><code>$ make run
./hello_world
Hello, World!
./hello_world-static
Hello, World!
LD_LIBRARY_PATH=. ./hello_world-dynamic
Hello, World!
</code></pre>
<p>Then switch to the <code>Goodbye</code> definition in
<code>hello_world_string.cpp</code>:</p>
<pre><code>//extern const char * const hello_world_string = "Hello, World!";
extern const char * const hello_world_string = "Goodbye!";
</code></pre>
<p>Recompile the libraries (but not the executables) and run again:</p>
<pre><code>$ make libs
…
$ make run
./hello_world
Hello, World!
./hello_world-static
Hello, World!
LD_LIBRARY_PATH=. ./hello_world-dynamic
Goodbye!
</code></pre>
<p>Finally, relink the executables and run again:</p>
<pre><code>$ make
…
$ make run
./hello_world
Goodbye!
./hello_world-static
Goodbye!
LD_LIBRARY_PATH=. ./hello_world-dynamic
Goodbye!
</code></pre>
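<p>The same round trip can be reproduced without the post's Makefile; a
minimal sketch with throwaway file names, assuming a Linux <code>cc</code>
toolchain:</p>
<pre><code># Rebuild only the shared library; the unrelinked executable picks up the change.
printf 'const char *msg(void) { return "Hello, World!"; }\n' &gt; msg.c
printf '#include &lt;stdio.h&gt;\nconst char *msg(void);\nint main(void) { puts(msg()); return 0; }\n' &gt; msg_main.c
cc -fPIC -shared -o libmsg.so msg.c
cc msg_main.c -o msg_main -L. -lmsg
LD_LIBRARY_PATH=. ./msg_main          # prints "Hello, World!"
printf 'const char *msg(void) { return "Goodbye!"; }\n' &gt; msg.c
cc -fPIC -shared -o libmsg.so msg.c   # rebuild the library only
LD_LIBRARY_PATH=. ./msg_main          # prints "Goodbye!" without relinking
</code></pre>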
<p>When you have many packages depending on the same low-level libraries,
the savings from avoided rebuilds are large. However, shared libraries
have another benefit over static libraries: shared memory.</p>
<p>Much of the machine code in shared libraries is static (i.e. it
doesn't change as a program is run). Because of this, several
programs may share the same in-memory version of a library without
stepping on each other's toes. With statically linked code, each
program has its own in-memory version:</p>
<table class="center" style="border-spacing: 40px 0px">
<thead>
<tr><th>Static</th><th>Shared</th></tr>
</thead>
<tbody>
<tr><td>Program A → Library B</td><td>Program A → Library B</td></tr>
<tr><td>Program C → Library B</td><td>Program C ⎯⎯⎯⎯⬏</td></tr>
</tbody>
</table>
<!--
Unicode characters used in above figure:
U2192 RIGHTWARDS ARROW
U23AF HORIZONTAL LINE EXTENSION
U2B0F RIGHTWARDS ARROW WITH TIP UPWARDS
-->
<h1>Further reading</h1>
<p>If you're curious about the details on loading and shared libraries,
<a href="http://eli.thegreenplace.net/">Eli Bendersky</a> has a nice series of articles on <a href="http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/">load time
relocation</a>, <a href="http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/">PIC on x86</a>, and <a href="http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64/">PIC on
x86-64</a>.</p>
<!--
Wulf and Shaw, Global variable considered harmful
http://dx.doi.org/10.1145%2F953353.953355
-->
<p><a href="http://blog.tremily.us//posts/Screen/">Screen</a> (2012-03-10, updated 2012-07-08)</p>
<p><a href="http://www.gnu.org/software/screen/">Screen</a> is an <a href="http://www.gnu.org/software/ncurses/">ncurses</a>-based terminal multiplexer. There are
tons of useful things you can do with it, and innumerable blog posts
describing them. I have two common use cases:</p>
<ul>
<li>On my local host when I don't start X Windows, I login to a
virtual terminal and run <code>screen</code>. Then I can easily open several
windows (e.g. for <a href="http://blog.tremily.us//tags/linux/../../posts/Emacs/">Emacs</a>, <a href="http://blog.tremily.us//tags/linux/../../posts/Mutt/">Mutt</a>, <a href="http://blog.tremily.us//tags/linux/../../posts/irssi/">irssi</a>, …) without having
to log in on another virtual terminal.</li>
<li>On remote hosts when I'm doing anything serious, I start <code>screen</code>
immediately after <a href="http://blog.tremily.us//tags/linux/../../posts/SSH/">SSH</a>-ing into the remote host. Then if my
connection is dropped (or I need to disconnect while I take the
train in to work), my remote work is waiting for me to pick up where
I left off.</li>
</ul>
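<p>The detach/reattach cycle behind both use cases can be scripted for a
quick look; a sketch, with <code>demo</code> as an arbitrary session name chosen
for this example (interactively you would run plain <code>screen</code>, detach
with <code>C-a d</code>, and reattach with <code>screen -r</code>):</p>
<pre><code>screen -dmS demo sleep 60    # start a detached session running a command
screen -ls                   # list sessions; "demo" shows up as Detached
screen -S demo -X quit       # tear the session down again
</code></pre>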
<h1>Treehouse X</h1>
<p>Those are useful things, but they are well covered by others. A few
days ago I thought of a cute trick for increasing security on my local
host, which led me to finally write up a <code>screen</code> post. I call it
“treehouse X”. Here's the problem:</p>
<p>You don't like waiting for X to start up when a virtual terminal is
sufficient for your task at hand, so you've set your box up without a
graphical login manager. However, sometimes you <em>do</em> need a graphical
interface (e.g. to use fancy characters via <a href="http://blog.tremily.us//tags/linux/../../posts/Xmodmap/">Xmodmap</a> or the
<a href="http://blog.tremily.us//tags/linux/../../posts/Compose_key/">Compose key</a>), so you fire up X with <code>startx</code>, and get on with your
life. But wait! You have to leave the terminal to do something else
(e.g. teach a class, eat dinner, sleep?). Being a security-conscious
bloke, you lock your screen with <a href="http://www.tux.org/~bagleyd/xlockmore.html">xlockmore</a> (using your <a href="http://blog.tremily.us//tags/linux/../../posts/Fluxbox/">Fluxbox</a>
<a href="http://git.tremily.us/?p=dotfiles-public.git;a=blob;f=src/.fluxbox/keys;hb=HEAD">hotkeys</a>). You leave to complete your task. While you're gone
Mallory sneaks into your lab. You've locked your X server, so you
think you're safe, but Mallory jumps to the virtual terminal from
which you started X (using <code>Ctrl-Alt-F1</code>, or similar), and kills your
<code>startx</code> process with <code>Ctrl-c</code>. Now Mallory can do evil things in
your name, like adding <code>export EDITOR=vim</code> to your <code>.bashrc</code>.</p>
<p>So how do you protect yourself against this attack? Enter <code>screen</code>
and treehouse X. If you run <code>startx</code> from within a <code>screen</code> session,
you can jump back to the virtual terminal yourself, detach from the
session, and log out of the virtual terminal. This is equivalent to
climbing into your treehouse (X) and pulling up your rope ladder
(<code>startx</code>) behind you, so that you are no longer vulnerable from the
ground (the virtual terminal). For kicks, you can reattach to the
screen session from an <code>xterm</code>, which leads to a fun chicken-and-egg
picture:</p>
<table class="img"><caption>startx → X → Xterm → Screen → startx cycle</caption><tr><td><a href="http://blog.tremily.us//tags/linux/../../posts/Screen/treehouse-X.png"><img src="http://blog.tremily.us//tags/linux/../../posts/Screen/treehouse-X.png" width="884" height="645" alt="startx → X → Xterm → Screen → startx cycle" class="img" /></a></td></tr></table>
<p>Of course the whole situation makes sense when you realize that it's
really:</p>
<pre><code>$ pstree 14542
screen───bash───startx───xinit─┬─X
└─fluxbox───xterm───bash───screen
</code></pre>
<p>where the first <code>screen</code> is the server and the second <code>screen</code> is the
client.</p>