Caveat: I have much more Linux experience than I had when I wrote this HOWTO. If I had to do it again, I'd probably use Gentoo instead of Debian, because I find it easier to write custom packages for Gentoo. Anyhow, here's my original HOWTO, preserved for posterity.
Overview
This HOWTO details the procedure I used to set up the abax cluster for NFS-rooted network booting. The system is useful in this case because it centralizes the installation in the head node (server), which makes maintaining, upgrading, or altering the computational nodes (clients) easier.
This procedure mainly follows Tim Brom's Microwulf configuration notes, with two major differences.
- Microwulf uses Ubuntu (gutsy?), and I'm using Debian etch.
- Microwulf has a separate partition for each client's root, populated with an independent installation from CD. I'm using a single partition for all of my clients, with the base system created using debootstrap (no CD).
For guidance in my deviations, I'm indebted to Bart Trojanowski's pxeboot and nfsroot notes and Falko Timme's notes on kernel compilation in Debian Etch.
Physical setup
Our cluster has one server with eight clients. The server has two
network cards, `eth0` and `eth1`. `eth1` is connected to the outside
world (WAN). All of the clients have one network card, `eth0`. All
of the `eth0`s are connected together through a gigabit switch (LAN).
Notation
Throughout this HOWTO, I will use `#` as the prompt for root, `$` as
the prompt for an unprivileged user, and `chroot#` as the prompt for
root in a `chroot`ed environment. File contents will be listed with
the full path in the text introducing the listing. For example,
`path/to/file`:
Contents of file
All files are complete with the exception of lines containing `…`, in
which case the meaning of the example should be clear from the
context.
Basic server setup
Installing the OS
Boot the server with the Debian installation kernel following one of
the options in the Debian installation guide. I netbooted
my server from one of the client nodes following this procedure to set
up the DHCP and TFTP servers on the client and untarring
netboot.tar.gz in my `tftpboot` directory. After netbooting from
a client, don't forget to take that client down so you won't have
DHCP conflicts once you set up a DHCP server on your server.
Install Debian in whatever manner seems most appropriate to you. I partitioned my 160 GB drive manually according to
Mount point | Type | Size |
---|---|---|
`/` | ext3 | 280 MB |
`/usr` | ext3 | 20 GB |
`/var` | ext3 | 20 GB |
`/swap` | swap | 1 GB |
`/tmp` | ext3 | 5 GB |
`/diskless` | ext3 | 20 GB |
`/home` | ext3 | 93.7 GB |
I went with a highly partitioned drive to ease mounting, since I will be sharing some partitions with my clients. To understand why partitioning is useful, see the Partition HOWTO.
You can install whichever packages you like, but I went with just the standard set (no Desktop, Server, etc.). You can adjust your installation later with any of (not an exhaustive list)
- `tasksel`, command line, coarse-grained package control.
- `apt-get`, command line, fine-grained package control.
- `aptitude`, curses frontend for `apt-get`.
- `synaptic`, gtk+ frontend for `apt-get`.
- `dpkg`, command line, package management without dependency checking.
The base install is pretty bare, but I don't need a full blown desktop, so I flesh out my system with:
# apt-get install xserver-xorg fluxbox fluxconf iceweasel xterm xpdf
# apt-get install build-essential emacs21-nox
which gives me a bare-bones graphical system (fire it up with
`startx`) and a bunch of critical build tools (`make`, `gcc`, etc.).
Configuring networking
We need to set up our server so that `eth1` assumes its appropriate
static IP on the WAN, and `eth0` assumes its appropriate static IP on
the LAN. We achieve this by changing the default
`/etc/network/interfaces` to
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# static LAN interface (start on boot & when plugged in)
allow-hotplug eth0
iface eth0 inet static
    address 192.168.2.100
    netmask 255.255.255.0
    broadcast 192.168.2.255
# static WAN interface (start on boot & when plugged in)
allow-hotplug eth1
# WAN DHCP interface (not used)
#iface eth1 inet dhcp
iface eth1 inet static
    address XXX.XXX.YYY.YYY
    netmask 255.255.128.0
    broadcast XXX.XXX.127.255
    gateway XXX.XXX.ZZZ.ZZZ
where I've censored our external IPs for privacy. The netmask selects
which addresses belong to which networks. The way we've set it up,
all 192.168.2.xxx messages will be routed out `eth0`, and everything
else will go through `eth1` to its gateway. See the Net-HOWTO
for more details.
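To make the netmask rule concrete, here is a minimal sketch of the comparison in shell arithmetic. The function names `ip_to_int` and `same_network` are my own inventions for illustration, and the sketch assumes plain dotted-quad addresses without leading zeros.

```shell
# ip_to_int ADDR: print a dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1   # split the address into its four octets
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_network ADDR1 ADDR2 NETMASK: succeed if both addresses agree on
# every bit selected by the netmask, i.e. they are on the same network.
same_network() (
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
)
```

With the LAN netmask 255.255.255.0, `same_network 192.168.2.101 192.168.2.100 255.255.255.0` succeeds, so that traffic stays on `eth0`. An address such as 10.0.0.1 (a made-up example) fails the test and would be sent to the gateway instead.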
Remote booting
Server services
The clients will boot remotely using the Preboot eXecution Environment (PXE). The boot procedure is
- Client powers on.
- Client BIOS comes up, detects attached devices, and looks for a DHCP server for advice on network booting.
- DHCP server gives client an IP address, domain name, host name, the IP address of the TFTP server, and the location of the bootloader on the TFTP server.
- Client gets bootloader from TFTP server.
- BIOS hands over control to bootloader.
- Bootloader gets kernel and initial ramdisk from TFTP server.
- Bootloader hands over control to kernel.
- Kernel starts up the system, mounting root via NFS.
- … after this point, it's just like a normal boot process.
We can see that we need to set up DHCP, TFTP, and NFS servers (not necessarily on the same server, but they are in our case).
pxelinux
The PXE bootloader can be obtained with
# apt-get install syslinux
which installs it to `/usr/lib/syslinux/pxelinux.0` along with a
manual and some other `syslinux` tools.
DHCP
Install a server with
# apt-get install dhcp
Configure the server with /etc/dhcpd.conf
allow bootp;   # maybe?
allow booting; # maybe?
option domain-name "your.domain.com";
option domain-name-servers XXX.XXX.XXX.XXX,YYY.YYY.YYY.YYY;
subnet 192.168.2.0 netmask 255.255.255.0 {
  range 192.168.2.150 192.168.2.200;  # non-static IP range
  option broadcast-address 192.168.2.255;
  option routers 192.168.2.100;       # Gateway server
  next-server 192.168.2.100;          # TFTP server
  filename "pxelinux.0";              # bootloader
  host n1 {
    hardware ethernet ZZ:ZZ:ZZ:ZZ:ZZ:ZZ;
    fixed-address 192.168.2.101;
    option root-path "192.168.2.100:/diskless/n1";
    option host-name "n1";
  }
  … more hosts for other client nodes …
}
This assigns the client a static hostname, domain name, and IP address
according to its ethernet address (aka MAC address). It also tells
all the clients to ask the TFTP server on 192.168.2.100 for the
bootloader `pxelinux.0`. For extra fun, it tells the clients to send
packets to the router at 192.168.2.100 if they can't figure out where
they should go, and to use particular DNS servers to resolve domain
names to IP addresses. This gives them access to the outside WAN. I
don't know yet if the booting options are necessary, since I don't
know what they do.
We also need to ensure that the DHCP server only binds to `eth0`, since starting a DHCP server on your WAN will make you unpopular with your ISP. You should have the following `/etc/default/dhcp`:
INTERFACES="eth0"
Once the DHCP server is configured, you can start it with
# /etc/init.d/dhcp restart
Check that the server is actually up with
# ps -e | grep dhcp
and if it is not, look for error messages in
# grep -i dhcp /var/log/syslog
TFTP
There are several TFTP server packages. We use `atftpd` here, but
`tftp-hpa` is also popular. Install `atftpd` with
# apt-get install atftpd xinetd
where `xinetd` is a super-server (replacing `inetd`, see `man xinetd`
for details). Configure `atftpd` with `/etc/xinetd.d/atftpd`
service tftp
{
    disable     = no
    socket_type = dgram
    protocol    = udp
    wait        = yes
    user        = nobody
    server      = /usr/sbin/in.tftpd
    server_args = --tftpd-timeout 300 --retry-timeout 5 --bind-address 192.168.2.100 --mcast-port 1758 --mcast-addr 239.239.239.0-255 --mcast-ttl 1 --maxthread 100 --logfile /var/log/atftpd.log --verbose=10 /diskless/tftpboot
}
Note that the `server_args` should all be on a single, long line,
since I haven't been able to discover if `xinetd` recognizes escaped
newlines yet. This configuration tells `xinetd` to provide TFTP
services by running `in.tftpd` (the daemon form of `atftpd`) as user
`nobody`. Most of the options we pass to `in.tftpd` involve
multicasting, which I believe is only used for MTFTP (which
`pxelinux.0` doesn't use). `--logfile /var/log/atftpd.log --verbose=10`
logs lots of detail to `/var/log/atftpd.log` if it
exists. You can create it with
# touch /var/log/atftpd.log
# chown nobody:nogroup /var/log/atftpd.log
The most important argument is `/diskless/tftpboot`, which specifies
the root of the TFTP-served filesystem (feel free to pick another
location if you would like). This is where we'll put all the files
that the TFTP server will be serving. It needs to be readable and
writable by `nobody`, so create it with
# mkdir /diskless/tftpboot
# chmod 777 /diskless/tftpboot
(TODO: possibly set the sticky bit, remove writable?)
Finally, we need to restart the `xinetd` server so it notices the new
`atftpd` server.
# /etc/init.d/xinetd restart
Check that the `xinetd` server is up with
# ps -e | grep xinetd
and look for error messages with
# grep -i xinetd /var/log/syslog
Just having `xinetd` up cleanly doesn't prove that `atftpd` is working
though; it just shows that the `atftpd` configuration file wasn't too
bungled. To actually test `atftpd` we need to wait until the
Synthesis Section, when we actually have files to
test-transfer.
NFS
Install the NFS utilities on the server with
# apt-get install nfs-common nfs-kernel-server
We go with the kernel server because we want fast NFS, since we'll be
doing a lot of it. Set the NFS server up to export the root file
systems and the users' home directories with `/etc/exports`:
/diskless/n1 192.168.2.0/24(rw,no_root_squash,sync,no_subtree_check)
… other node root exports …
/diskless 192.168.2.0/24(rw,no_root_squash,sync,no_subtree_check) # unnecessary
/home 192.168.2.0/24(rw,no_root_squash,sync,no_subtree_check)
/usr 192.168.2.0/24(rw,no_root_squash,sync,no_subtree_check)
Then let the NFS server know we've changed the `exports` file with
# exportfs -av # TODO: -r?
Test that the NFS server is working properly by `ssh`ing onto one of
the clients and running
client# mkdir /mnt/n1
client# mount 192.168.2.100:/diskless/n1 /mnt/n1
client# ls /mnt/n1
… some reasonable contents …
client# umount /mnt/n1
client# rmdir /mnt/n1
Client setup
The only client setup that actually happens on the client is changing
the BIOS boot order to preferentially boot from the network. Consult
your motherboard manual for how to accomplish this. It should be
simple once you get into the BIOS menu, which you generally do by
pressing `del`, `F2`, `F12`, or some such early in your boot process.
Everything else happens on the server.
Root file system
We want to install a basic Debian setup on our clients. Since each
client doesn't have its own, private partition, we need to install
Debian using `debootstrap`.
# apt-get install debootstrap
# mkdir /diskless/n1
# debootstrap --verbose --resolve-deps etch /diskless/n1
# chroot /diskless/n1
chroot# tasksel install standard
chroot# dpkg-reconfigure locales
chroot# apt-get install kernel-image-2.6-686 openssh-server nfs-client
TODO: what gets installed with standard?
See `/usr/share/tasksel/debian-tasks.desc` for a list of possible
tasks and the Debian docs for details on how a full
installation from CD or netboot works.
We can also add a few utilities so we can work in our `chroot`ed environment
chroot# apt-get install emacs21-nox
Configuring /etc
Each client will be getting its hostname from the DHCP server, so remove the default
# rm /diskless/n1/etc/hostname
We also need to set up the `fstab` to mount `/home` and `/usr` from the
server. In `/diskless/n1/etc/fstab`:
# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
# automatically mount nfs root and proc through other means
192.168.2.100:/home /home nfs defaults,nolock 0 0
192.168.2.100:/usr /usr nfs defaults,nolock 0 0
# we're diskless so we don't need to mount the hard disk sda :)
#/dev/sda1 / ext3 defaults,errors=remount-ro 0 1
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
Kernel and initial ramdisk
The kernel version number shows up often in this section. You can
determine your kernel version number (in my case 2.6.18-6-686) with
`uname -r`. Because kernel versions change fairly frequently, I'll
use `KERNEL_VERSION` to denote the kernel version string.
Your kernel must be compiled with NFS root support if it's going to have an NFS root. You can determine whether your kernel supports NFS roots with
# grep 'ROOT_NFS' /diskless/n1/boot/config-KERNEL_VERSION
I didn't have it in my default Debian etch 2.6.18-6-686 kernel, so I
had to recompile my kernel (see the Kernel Appendix and
Falko's notes). My compiled kernel had a version string
`2.6.18-custom`.
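A bare `grep` will also match a commented-out `# CONFIG_ROOT_NFS is not set` line or a modular `=m` setting, so a stricter check can be handy. The following is a minimal sketch; the function name `kernel_has_builtin` is my own invention, and the option list in the usage note is taken from the Kernel Appendix.

```shell
# kernel_has_builtin CONFIG_FILE OPTION...: succeed only if every listed
# option is set to "y" (built in, not a module) in the kernel config file.
kernel_has_builtin() (
    config=$1
    shift
    status=0
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$config"; then
            echo "ok:      CONFIG_${opt}=y"
        else
            echo "missing: CONFIG_${opt}=y"
            status=1
        fi
    done
    return "$status"
)
```

For example, `kernel_has_builtin /diskless/n1/boot/config-KERNEL_VERSION ROOT_NFS NFS_FS IP_PNP IP_PNP_DHCP` should report every option as `ok` for a kernel that can mount an NFS root on its own.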
Most kernels boot using an initial ramdisk (a compressed
root filesystem that lives in RAM). This ramdisk contains the
necessary programs and scripts for booting the kernel. We need to
create a ramdisk that can handle an NFS root, so `chroot` into your
client filesystem and install some tools
chroot# apt-get install initramfs-tools
Configure future ramdisks for NFS mounting with
`/etc/initramfs-tools/initramfs.conf`:
# Configuration file for mkinitramfs(8). See initramfs.conf(5).
…
BOOT=nfs # was BOOT=local
…
Compile a new `initrd` with
chroot# update-initramfs -u
If you compiled your own kernel as in the Kernel Appendix
after setting up `initramfs.conf`, an appropriate ramdisk should have
been created automatically.
You can examine the contents of your ramdisk with
$ cp /diskless/n1/boot/initrd.img-2.6.18-6-686 initrd.img.gz
$ gunzip initrd.img.gz
$ mkdir initrd
$ cd initrd/
$ cpio -i --make-directories < ../initrd.img
Synthesis
To configure PXE, we need to bring `pxelinux.0` into our new
`tftpboot` directory
# cp /usr/lib/syslinux/pxelinux.0 /diskless/tftpboot/
We also need to bring in our kernel image and initial ramdisk
# cd /diskless/tftpboot
# ln -s /diskless/n1/boot/initrd.img-2.6.18-custom
# ln -s /diskless/n1/boot/vmlinuz-2.6.18-custom
`atftpd` handles the symbolic links, but if your TFTP server doesn't,
you'll have to copy the image and ramdisk over instead.
At this point you should test your TFTP server with test transfers. Install the atftp client
# apt-get install atftp
And attempt to transfer the important files.
$ atftp 192.168.2.100
tftp> status
Connected: 192.168.2.100 port 69
Mode: octet
Verbose: off
Trace: off
Options
tsize: disabled
blksize: disabled
timeout: disabled
multicast: disabled
mtftp variables
client-port: 76
mcast-ip: 0.0.0.0
listen-delay: 2
timeout-delay: 2
Last command: quit
tftp> get pxelinux.0
tftp> get initrd.img-2.6.18-custom
tftp> get vmlinuz-2.6.18-custom
tftp> quit
$ ls -l
…
-rw-r--r-- 1 sysadmin sysadmin 4297523 2008-05-30 09:27 initrd.img-2.6.18-custom
-rw-r--r-- 1 sysadmin sysadmin 13480 2008-05-30 09:26 pxelinux.0
-rw-r--r-- 1 sysadmin sysadmin 1423661 2008-05-30 09:27 vmlinuz-2.6.18-custom
…
If this doesn't work, look for errors in `/var/log/syslog` and
`/var/log/atftpd.log` and double check your typing in the `atftpd`
configuration file.
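The same transfers can be scripted instead of typed at the `tftp>` prompt. This is only a sketch: the function name `tftp_check` is hypothetical, and I'm assuming your `atftp` client accepts the non-interactive `--get`, `--remote-file`, and `--local-file` options (check `man atftp` on your system).

```shell
# tftp_check HOST FILE...: fetch each FILE from the TFTP server on HOST
# into a scratch directory and report whether a non-empty copy arrived.
tftp_check() (
    host=$1
    shift
    scratch=$(mktemp -d) || exit 1
    status=0
    for file in "$@"; do
        if atftp --get --remote-file "$file" \
                 --local-file "$scratch/$file" "$host" \
                && [ -s "$scratch/$file" ]; then
            echo "ok:     $file"
        else
            echo "FAILED: $file"
            status=1
        fi
    done
    rm -rf "$scratch"   # the copies were only needed for the check
    return "$status"
)
```

Running `tftp_check 192.168.2.100 pxelinux.0 initrd.img-2.6.18-custom vmlinuz-2.6.18-custom` should then print one `ok` line per file if the server is healthy.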
The last stage is to configure the `pxelinux.0` bootloader. Create a
configuration directory in `tftpboot` with
# mkdir /diskless/tftpboot/pxelinux.cfg
When each client loads `pxelinux.0` during the boot, it looks for a
configuration file in `pxelinux.cfg`. The loader runs through a
sequence of possible config file names, as described in
`pxelinux.doc`. We'll have different root directories for each of our
nodes, so we need a separate config for each of them. In order to
make our configs machine-specific, we'll use the ethernet (MAC)
address file-name scheme. That is, for a machine with MAC address
AA:BB:CC:DD:EE:FF, we make the file
`pxelinux.cfg/01-aa-bb-cc-dd-ee-ff`. TODO: base config on IP address.
In `/diskless/tftpboot/pxelinux.cfg/01-aa-bb-cc-dd-ee-ff`:
default linux
label linux
  kernel vmlinuz-2.6.18-custom
  append root=/dev/nfs initrd=initrd.img-2.6.18-custom nfsroot=192.168.2.100:/diskless/n1,tcp ip=dhcp rw
Note that the `append`ed args should all be on a single, long line,
since I haven't been able to discover if `pxelinux` recognizes escaped
newlines yet. This file is basically like a `grub` or `lilo` config
file, and you can get fancy with a whole menu, but since this is a
cluster and not a computer lab, we don't need to worry about that.
Note that this file was only for our first node (`n1`). You have to
make copies for each of your nodes, with the appropriate file names
and `nfsroot`s.
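Making the per-node copies by hand is error-prone, so here is a minimal sketch of how they could be generated. The function names `pxe_cfg_name` and `pxe_cfg_body` are hypothetical; the server address and paths are the ones used throughout this HOWTO.

```shell
# pxe_cfg_name MAC: print the pxelinux.cfg file name for an ethernet
# address, i.e. "01-" followed by the lower-case MAC with dashes.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# pxe_cfg_body NODE KERNEL_VERSION: print the config for one node, with
# all of the append arguments on a single line.
pxe_cfg_body() {
    cat <<EOF
default linux
label linux
  kernel vmlinuz-$2
  append root=/dev/nfs initrd=initrd.img-$2 nfsroot=192.168.2.100:/diskless/$1,tcp ip=dhcp rw
EOF
}
```

From inside `/diskless/tftpboot/pxelinux.cfg`, something like `pxe_cfg_body n2 2.6.18-custom > "$(pxe_cfg_name AA:BB:CC:DD:EE:FF)"` would then write the config for a second node, where the MAC address is a placeholder for the node's real one.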
The kernel options are fairly self-explanatory except for the `tcp`
in the `nfsroot` option, which says the client should mount the root
directory using TCP-based NFS. Traditional NFS uses UDP, which is
faster, but possibly less reliable for large transfers. (The kernel
and initrd themselves arrive over TFTP, so this option only affects
the NFS root.) However, I'm having trouble tracking down a reliable
source for this. For now, consider the `tcp` a voodoo incantation to
be attempted if the NFS booting isn't working.
You're done! Plug a monitor into one of the clients and power her up. Everything should boot smoothly off the server, without touching the client's hard drive.
Adding clients
To add a new client node `nX` to the cluster, we need to do the
following (which can be combined into an `add-client` script). First,
we need to create a root directory for the new client
# cd /diskless/
# cp -rp n1 nX
Now we need to export that directory
# echo '/diskless/nX 192.168.2.0/24(rw,no_root_squash,sync,no_subtree_check)' >> /etc/exports
# exportfs -av
Finally, we need to set up the booting and DHCP options
# cd /diskless/tftpboot/pxelinux.cfg
# sed 's/\/diskless\/n1/\/diskless\/nX/' 01-xx-xx-xx-xx-xx-xx > 01-yy-yy-yy-yy-yy-yy
# echo ' host nX {
hardware ethernet YY:YY:YY:YY:YY:YY;
fixed-address 192.168.2.10X;
option root-path "192.168.2.100:/diskless/nX/";
option host-name "nX";
}' >> /etc/dhcpd.conf
# /etc/init.d/dhcp restart
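As a starting point for the `add-client` script mentioned above, the text-generating steps can be wrapped in small functions. This is only a sketch under the addressing used in this HOWTO; the function names `exports_line` and `dhcp_host_stanza` are hypothetical, and nothing here touches system files until you redirect the output yourself.

```shell
# exports_line NODE: print the /etc/exports entry for a node's root.
exports_line() {
    printf '/diskless/%s 192.168.2.0/24(rw,no_root_squash,sync,no_subtree_check)\n' "$1"
}

# dhcp_host_stanza NODE MAC IP: print the dhcpd.conf host block for a node.
dhcp_host_stanza() {
    cat <<EOF
  host $1 {
    hardware ethernet $2;
    fixed-address $3;
    option root-path "192.168.2.100:/diskless/$1";
    option host-name "$1";
  }
EOF
}
```

For a made-up node `n9`, `exports_line n9 >> /etc/exports` followed by `dhcp_host_stanza n9 00:11:22:33:44:55 192.168.2.109 >> /etc/dhcpd.conf` would append the same entries as the commands above, after which `exportfs -av` and a DHCP restart are still needed.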
Appendix
Compiling a kernel
See Falko's notes for an excellent introduction, and the NFS-root mini-HOWTO for NFS root particulars.
First, grab a bunch of useful compilation tools
chroot# apt-get install wget bzip2 kernel-package
chroot# apt-get install libncurses5-dev fakeroot build-essential initramfs-tools
Some of these (e.g. `wget`) should already be installed, but `apt-get`
will realize this, so don't worry about it. Configure `initramfs` for
building NFS root-capable initial ramdisks by setting up
`/etc/initramfs-tools/initramfs.conf` as explained in the Kernel
Section. For NFS root, your kernel needs the following
options:
- `IP_PNP_DHCP`: Networking → Networking support (`NET [=y]`) → Networking options → TCP/IP networking (`INET [=y]`) → IP: kernel level autoconfiguration (`IP_PNP [=y]`)
- `ROOT_NFS` (depends on `NET && NFS_FS=y && IP_PNP`): File systems → Network File Systems
I also used the built-in NFS client instead of the module.
Here is a `diff` of the original Debian etch config vs. mine:
$ diff /diskless/n1/boot/config-2.6.18-6-686 .config
4c4
< # Sun Feb 10 22:04:18 2008
---
> # Thu May 29 23:59:47 2008
402c402,405
< # CONFIG_IP_PNP is not set
---
> CONFIG_IP_PNP=y
> CONFIG_IP_PNP_DHCP=y
> CONFIG_IP_PNP_BOOTP=y
> CONFIG_IP_PNP_RARP=y
3314c3317
< CONFIG_NFS_FS=m
---
> CONFIG_NFS_FS=y
3325c3328,3329
< CONFIG_LOCKD=m
---
> CONFIG_ROOT_NFS=y
> CONFIG_LOCKD=y
3328c3332
< CONFIG_NFS_ACL_SUPPORT=m
---
> CONFIG_NFS_ACL_SUPPORT=y
3330,3332c3334,3336
< CONFIG_SUNRPC=m
< CONFIG_SUNRPC_GSS=m
< CONFIG_RPCSEC_GSS_KRB5=m
---
> CONFIG_SUNRPC=y
> CONFIG_SUNRPC_GSS=y
> CONFIG_RPCSEC_GSS_KRB5=y
3485c3489
< CONFIG_CRYPTO_DES=m
---
> CONFIG_CRYPTO_DES=y
Compile your shiny, new kernel with
chroot# make-kpkg clean
chroot# fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers
The new kernel packages are in the `src` directory
chroot# cd /usr/src
chroot# ls -l
Install the packages with
chroot# dpkg -i linux-image-2.6.18-custom_2.6.18-custom-10.00.Custom_i386.deb
Troubleshooting
No network devices available
I was getting `IP-Config: No network devices available.` messages during the boot (after the kernel was successfully loaded!). According to this post, the problem is due to a missing kernel driver.
So I figured out what card I had:
# lspci
…
03:03.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
03:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
…
The ethernet HOWTO claimed that the `e1000` drivers were
required for Intel gigabit cards, and indeed I had the `e1000` module
loaded on my server:
# lsmod | less
…
e1000 108480 0
…
I reconfigured my kernel with (old vs new):
$ diff .config_mod_e1000 .config
3,4c3,4
< # Linux kernel version: 2.6.18-custom
< # Fri May 30 00:13:47 2008
---
> # Linux kernel version: 2.6.18
> # Fri May 30 22:21:29 2008
1542c1542
< CONFIG_E1000=m
---
> CONFIG_E1000=y
After which I recompiled and reinstalled the kernel as in the Kernel Appendix.
Waiting for /usr/
On booting a client, I noticed a `Waiting for /usr/: FAILED` message
just before entering runlevel 2. I attribute the error to a faulty
boot order on the client, which does not mount its fstab filesystems
before trying to run something in `/usr/`. There don't seem to be any
serious side effects though, since the wait times out, and by the time
I can log in to the node, `/usr/` is mounted as it should be.