Teaching pages (Physics and related topics).

Factor analysis

I've been trying to wrap my head around factor analysis as a theory for designing and understanding test and survey results. This has turned out to be another one of those fields where the going has been a bit rough. I think the key factors in making these older topics difficult are:

• “Everybody knows this, so we don't need to write up the details.”
• “Hey, I can do better than Bob if I just tweak this knob…”
• “I'll just publish this seminal paper behind a paywall…”

The resulting discussion ends up being overly complicated, and it's hard for newcomers to decide if people using similar terminology are in fact talking about the same thing.

Some of the better open sources for background have been Tucker and MacCallum's “Exploratory Factor Analysis” manuscript and Max Welling's notes. I'll use Welling's terminology for this discussion.

The basic idea of factor analysis is to model $d$ measurable attributes as generated by $k$ common factors and $d$ unique factors. With $d=4$ and $k=2$, each of the four measured attributes is fed by both common factors and by its own unique factor, corresponding to the equation (Welling's eq. 1):

(1)$x=Ay+\mu +\nu$

The independent random variables $y$ are distributed according to a Gaussian with zero mean and unit variance ${𝒢}_{y}\left[0,I\right]$ (zero mean because constant offsets are handled by $\mu$; unit variance because scaling is handled by $A$). The independent random variables $\nu$ are distributed according to ${𝒢}_{\nu }\left[0,\Sigma \right]$, with (Welling's eq. 2):

(2)$\Sigma \equiv \text{diag}\left[{\sigma }_{1}^{2},\dots ,{\sigma }_{d}^{2}\right]$
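To make the generative model concrete, here is a minimal NumPy sketch that draws samples from eq. 1 and compares their covariance with $AA^T+\Sigma$, which is what the model implies when $y$ and $\nu$ are independent. The values of `A`, `mu`, and `sigma2` are made up for illustration; they do not come from Welling's notes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, N = 4, 2, 100000  # attributes, common factors, samples (illustrative)

# Hypothetical model parameters, chosen only for this demonstration.
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0],
              [-0.5, 0.8]])                # factor loadings (d x k)
mu = np.array([2.0, 3.0, 4.0, 5.0])        # constant offsets
sigma2 = np.array([0.1, 0.2, 0.3, 0.4])    # diagonal of Sigma

y = rng.normal(size=(N, k))                     # common factors ~ G[0, I]
nu = rng.normal(size=(N, d)) * np.sqrt(sigma2)  # unique noise ~ G[0, Sigma]
x = y @ A.T + mu + nu                           # eq. 1, one row per sample

model_cov = A @ A.T + np.diag(sigma2)
sample_cov = np.cov(x, rowvar=False)
print(np.abs(sample_cov - model_cov).max())  # small for large N
```

The sample mean of `x` should also land close to `mu`, which is the averaging argument used to estimate $\mu$.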

The matrix $A$ (linking common factors with measured attributes $x$) is referred to as the factor weights or factor loadings. Because the only source of constant offset is $\mu$, we can calculate it by averaging out the random noise (Welling's eq. 6):

(3)$\mu =\frac{1}{N}\sum _{n=1}^{N}{x}_{n}$

where $N$ is the number of measurements (survey responders) and ${x}_{n}$ is the response vector for the ${n}^{\text{th}}$ responder.

How do we find $A$ and $\Sigma$? This is the tricky bit, and there are a number of possible approaches. Welling suggests using expectation maximization (EM), and there's an excellent example of the procedure with a colorblind experimenter drawing colored balls in his EM notes (to test my understanding, I wrote color-ball.py).

To simplify calculations, Welling defines (before eq. 15):

(4)$\begin{array}{rl}A\prime & \equiv \left[A,\mu \right]\\ y\prime & \equiv {\left[{y}^{T},1\right]}^{T}\end{array}$

which reduce the model to

(5)$x=A\prime y\prime +\nu$

After some manipulation Welling works out the maximizing updates (eq'ns 16 and 17):

(6)$\begin{array}{rl}A{\prime }^{\text{new}}& =\left(\sum _{n=1}^{N}{x}_{n}E{\left[y\prime \mid {x}_{n}\right]}^{T}\right){\left(\sum _{n=1}^{N}E\left[y\prime y{\prime }^{T}\mid {x}_{n}\right]\right)}^{-1}\\ {\Sigma }^{\text{new}}& =\frac{1}{N}\sum _{n=1}^{N}\text{diag}\left[{x}_{n}{x}_{n}^{T}-A{\prime }^{\text{new}}E\left[y\prime \mid {x}_{n}\right]{x}_{n}^{T}\right]\end{array}$

The expectation values used in these updates are given by (Welling's eq'ns 12 and 13):

(7)$\begin{array}{rl}E\left[y\mid {x}_{n}\right]& ={A}^{T}{\left(A{A}^{T}+\Sigma \right)}^{-1}\left({x}_{n}-\mu \right)\\ E\left[y{y}^{T}\mid {x}_{n}\right]& =I-{A}^{T}{\left(A{A}^{T}+\Sigma \right)}^{-1}A+E\left[y\mid {x}_{n}\right]E{\left[y\mid {x}_{n}\right]}^{T}\end{array}$

# Survey analysis

Enough abstraction! Let's look at an example: survey results:

>>> import numpy
>>> scores = numpy.genfromtxt('Factor_analysis/survey.data', delimiter='\t')
>>> scores
array([[ 1.,  3.,  4.,  6.,  7.,  2.,  4.,  5.],
       [ 2.,  3.,  4.,  3.,  4.,  6.,  7.,  6.],
       [ 4.,  5.,  6.,  7.,  7.,  2.,  3.,  4.],
       [ 3.,  4.,  5.,  6.,  7.,  3.,  5.,  4.],
       [ 2.,  5.,  5.,  5.,  6.,  2.,  4.,  5.],
       [ 3.,  4.,  6.,  7.,  7.,  4.,  3.,  5.],
       [ 2.,  3.,  6.,  4.,  5.,  4.,  4.,  4.],
       [ 1.,  3.,  4.,  5.,  6.,  3.,  3.,  4.],
       [ 3.,  3.,  5.,  6.,  6.,  4.,  4.,  3.],
       [ 4.,  4.,  5.,  6.,  7.,  4.,  3.,  4.],
       [ 2.,  3.,  6.,  7.,  5.,  4.,  4.,  4.],
       [ 2.,  3.,  5.,  7.,  6.,  3.,  3.,  3.]])

scores[i,j] is the answer the ith respondent gave for the jth question. We're looking for underlying factors that can explain covariance between the different questions. Do the question answers ($x$) represent some underlying factors ($y$)? Let's start off by calculating $\mu$:

>>> def print_row(row):
...     print('  '.join('{: 0.2f}'.format(x) for x in row))
>>> mu = scores.mean(axis=0)
>>> print_row(mu)
 2.42   3.58   5.08   5.75   6.08   3.42   3.92   4.25

Next we need priors for $A$ and $\Sigma$. MDP has an implementation for Python, and their FANode uses a Gaussian random matrix for $A$ and the diagonal of the score covariance for $\Sigma$. They also use the score covariance to avoid repeated summations over $n$.

>>> import mdp
>>> def print_matrix(matrix):
...     for row in matrix:
...         print_row(row)
>>> fa = mdp.nodes.FANode(output_dim=3)
>>> numpy.random.seed(1)  # for consistent doctest results
>>> responder_scores = fa(scores)  # common factors for each responder
>>> print_matrix(responder_scores)
-1.92  -0.45   0.00
 0.67   1.97   1.96
 0.70   0.03  -2.00
 0.29   0.03  -0.60
-1.02   1.79  -1.43
 0.82   0.27  -0.23
-0.07  -0.08   0.82
-1.38  -0.27   0.48
 0.79  -1.17   0.50
 1.59  -0.30  -0.41
 0.01  -0.48   0.73
-0.46  -1.34   0.18
>>> print_row(fa.mu.flat)
 2.42   3.58   5.08   5.75   6.08   3.42   3.92   4.25
>>> fa.mu.flat == mu  # MDP agrees with our earlier calculation
array([ True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
>>> print_matrix(fa.A)  # factor weights for each question
 0.80  -0.06  -0.45
 0.17   0.30  -0.65
 0.34  -0.13  -0.25
 0.13  -0.73  -0.64
 0.02  -0.32  -0.70
 0.61   0.23   0.86
 0.08   0.63   0.59
-0.09   0.67   0.13
>>> print_row(fa.sigma)  # unique noise for each question
 0.04   0.02   0.38   0.55   0.30   0.05   0.48   0.21

Because the covariance is unaffected by the rotation $A\to AR$, the estimated weights $A$ and responder scores $y$ can be quite sensitive to the seed priors. The width $\Sigma$ of the unique noise $\nu$ is more robust, because $\Sigma$ is unaffected by rotations on $A$.

# Related tidbits

## Communality

The communality ${h}_{i}^{2}$ of the ${i}^{\text{th}}$ measured attribute ${x}_{i}$ is the fraction of variance in the measured attribute which is explained by the set of common factors. Because the common factors $y$ have unit variance, the communality is given by:

(8)${h}_{i}^{2}=\frac{\sum _{j=1}^{k}{A}_{ij}^{2}}{\sum _{j=1}^{k}{A}_{ij}^{2}+{\sigma }_{i}^{2}}$

>>> factor_variance = numpy.array([sum(row**2) for row in fa.A])
>>> h = numpy.array(
...     [var/(var+sig) for var,sig in zip(factor_variance, fa.sigma)])
>>> print_row(h)
 0.95   0.97   0.34   0.64   0.66   0.96   0.61   0.69

There may be some scaling issues in the communality due to deviations between the estimated $A$ and $\Sigma$ and the variations contained in the measured scores (why?):

>>> print_row(factor_variance + fa.sigma)
0.89   0.56   0.57   1.51   0.89   1.21   1.23   0.69
>>> print_row(scores.var(axis=0, ddof=1))  # total variance for each question
0.99   0.63   0.63   1.66   0.99   1.36   1.36   0.75
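For comparison with MDP, the updates in eq'ns 6 and 7 can be sketched directly in NumPy. This is a minimal version written for illustration, not MDP's implementation: it centers the data on the sample mean instead of using the augmented $A\prime$, seeds $A$ with a Gaussian random matrix and $\Sigma$ with the data variances, and runs a fixed number of iterations. The function name `fa_em`, its arguments, and the synthetic test data are all hypothetical.

```python
import numpy as np

def fa_em(X, k, iterations=200, seed=0):
    """EM for factor analysis; returns mu, A, diag(Sigma), log-likelihoods."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                    # centering replaces the augmented A'
    S = Xc.T @ Xc / N              # maximum-likelihood (ddof=0) covariance
    A = rng.normal(size=(d, k))    # random seed loadings
    sigma = np.diag(S).copy()      # seed unique variances
    log_likelihoods = []
    for _ in range(iterations):
        C = A @ A.T + np.diag(sigma)           # model covariance
        _, logdet = np.linalg.slogdet(C)
        log_likelihoods.append(-0.5 * N * (
            d * np.log(2 * np.pi) + logdet
            + np.trace(np.linalg.solve(C, S))))
        # E step (eq. 7): beta = A^T (A A^T + Sigma)^-1
        beta = np.linalg.solve(C, A).T
        Ey = Xc @ beta.T                       # row n is E[y | x_n]
        sum_Eyy = N * (np.eye(k) - beta @ A) + Ey.T @ Ey
        # M step (eq. 6), written for centered data
        A = (Xc.T @ Ey) @ np.linalg.inv(sum_Eyy)
        sigma = np.diag(S - A @ (Ey.T @ Xc) / N).copy()
    return mu, A, sigma, log_likelihoods

# Synthetic data from a made-up two-factor model.
rng = np.random.default_rng(1)
true_A = np.array([[1.0, 0.0], [0.8, 0.3], [0.0, 1.0],
                   [0.2, 0.9], [0.5, 0.5], [-0.4, 0.7]])
true_sigma = np.array([0.3, 0.2, 0.4, 0.3, 0.5, 0.2])
X = (rng.normal(size=(2000, 2)) @ true_A.T
     + rng.normal(size=(2000, 6)) * np.sqrt(true_sigma))

mu, A, sigma, log_likelihoods = fa_em(X, k=2)
print(np.diag(A @ A.T) + sigma)  # model variance per attribute
print(X.var(axis=0))             # ddof=0 variance per attribute
```

Because EM never decreases the likelihood, the recorded log-likelihoods make a handy sanity check. At convergence a maximum-likelihood fit should also reproduce the `ddof=0` variances on the diagonal of $AA^T+\Sigma$, which is one thing worth checking when the totals above disagree with `scores.var(axis=0, ddof=1)`.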


The proportion of total variation explained by the common factors is given by:

(9)$\frac{\sum _{i=1}^{d}{h}_{i}^{2}}{d}$
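As a quick check, here is a small sketch that recomputes the communalities and their average from the loadings and unique variances printed by MDP above. It assumes eq. 9 is meant as the mean communality over the $d$ attributes, and the arrays are transcribed to two decimals, so the results only match the doctest output to within rounding.

```python
import numpy as np

# Loadings and unique variances copied (two decimals) from the MDP output.
A = np.array([[0.80, -0.06, -0.45],
              [0.17, 0.30, -0.65],
              [0.34, -0.13, -0.25],
              [0.13, -0.73, -0.64],
              [0.02, -0.32, -0.70],
              [0.61, 0.23, 0.86],
              [0.08, 0.63, 0.59],
              [-0.09, 0.67, 0.13]])
sigma = np.array([0.04, 0.02, 0.38, 0.55, 0.30, 0.05, 0.48, 0.21])

factor_variance = (A**2).sum(axis=1)           # variance from common factors
h = factor_variance / (factor_variance + sigma)  # communality per question
proportion = h.mean()                          # average over the d questions
print(np.round(h, 2))
print(round(proportion, 2))
```

With these rounded inputs the common factors account for a bit under three quarters of the modeled variation.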

## Varimax rotation

As mentioned earlier, factor analysis generates loadings $A$ that are unique up to an arbitrary rotation $R$ (as you'd expect for a $k$-dimensional Gaussian ball of factors $y$). A number of schemes have been proposed to simplify the initial loadings by rotating $A$ to reduce off-diagonal terms. One of the more popular approaches is Henry Kaiser's varimax rotation (unfortunately, I don't have access to either his thesis or the subsequent paper). I did find (via Wikipedia) Trevor Park's notes which have been very useful.

The idea is to iterate rotations to maximize the raw varimax criterion (Park's eq. 1):

(10)$V\left(A\right)=\sum _{j=1}^{k}\left(\frac{1}{d}\sum _{i=1}^{d}{A}_{ij}^{4}-{\left(\frac{1}{d}\sum _{i=1}^{d}{A}_{ij}^{2}\right)}^{2}\right)$

Rather than computing a $k$-dimensional rotation in one sweep, we'll iterate through 2-dimensional rotations (on successive column pairs) until convergence. For a particular column pair $\left(p,q\right)$, the rotation matrix ${R}^{*}$ is the usual rotation matrix:

(11)${R}^{*}=\left(\begin{array}{cc}\mathrm{cos}\left({\varphi }^{*}\right)& -\mathrm{sin}\left({\varphi }^{*}\right)\\ \mathrm{sin}\left({\varphi }^{*}\right)& \mathrm{cos}\left({\varphi }^{*}\right)\end{array}\right)$

where the optimum rotation angle ${\varphi }^{*}$ is (Park's eq. 3):

(12)${\varphi }^{*}=\frac{1}{4}\angle \left(\frac{1}{d}\sum _{j=1}^{d}{\left({A}_{jp}+i{A}_{jq}\right)}^{4}-{\left(\frac{1}{d}\sum _{j=1}^{d}{\left({A}_{jp}+i{A}_{jq}\right)}^{2}\right)}^{2}\right)$

where $i\equiv \sqrt{-1}$.
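The pairwise iteration can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function names, the stopping tolerance, and the sweep limit are assumptions, the criterion is taken as the variance of the squared loadings in each column, and the input is the two-decimal transcription of the MDP loadings from the survey example.

```python
import numpy as np

def varimax_criterion(A):
    """Raw varimax: summed variance of squared loadings in each column."""
    sq = np.asarray(A, dtype=float)**2
    return np.sum(np.mean(sq**2, axis=0) - np.mean(sq, axis=0)**2)

def varimax(A, tol=1e-10, max_sweeps=100):
    """Rotate column pairs by the optimal angle until the angles vanish."""
    A = np.array(A, dtype=float)   # work on a copy
    d, k = A.shape
    for _ in range(max_sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                z = A[:, p] + 1j * A[:, q]
                w = np.mean(z**4) - np.mean(z**2)**2
                phi = np.angle(w) / 4          # optimum angle (eq. 12)
                c, s = np.cos(phi), np.sin(phi)
                R = np.array([[c, -s],
                              [s, c]])         # rotation matrix (eq. 11)
                A[:, [p, q]] = A[:, [p, q]] @ R
                largest = max(largest, abs(phi))
        if largest < tol:                      # no pair wants to move
            break
    return A

A = np.array([[0.80, -0.06, -0.45],
              [0.17, 0.30, -0.65],
              [0.34, -0.13, -0.25],
              [0.13, -0.73, -0.64],
              [0.02, -0.32, -0.70],
              [0.61, 0.23, 0.86],
              [0.08, 0.63, 0.59],
              [-0.09, 0.67, 0.13]])
rotated = varimax(A)
print(varimax_criterion(A), varimax_criterion(rotated))
```

Since each step is an orthogonal rotation, $AA^T$ and the per-attribute communalities are unchanged; only the way the loadings are shared between factors moves.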

# Nomenclature

${A}_{\mathrm{ij}}$
The element from the ${i}^{\text{th}}$ row and ${j}^{\text{th}}$ column of a matrix $A$. For example, here is a 2-by-3 matrix in terms of components:
(13)$A=\left(\begin{array}{ccc}{A}_{11}& {A}_{12}& {A}_{13}\\ {A}_{21}& {A}_{22}& {A}_{23}\end{array}\right)$
${A}^{T}$
The transpose of a matrix (or vector) $A$. ${A}_{\mathrm{ij}}^{T}={A}_{\mathrm{ji}}$
${A}^{-1}$
The inverse of a matrix $A$. ${A}^{-1}\cdot A=I$
$\text{diag}\left[A\right]$
A matrix containing only the diagonal elements of $A$, with the off-diagonal values set to zero.
$E\left[f\left(x\right)\right]$
Expectation value for a function $f$ of a random variable $x$. If the probability density of $x$ is $p\left(x\right)$, then $E\left[f\left(x\right)\right]=\int dxp\left(x\right)f\left(x\right)$. For example, $E\left[1\right]=\int dxp\left(x\right)=1$.
$\mu$
The mean of a random variable $x$ is given by $\mu =E\left[x\right]$.
$\Sigma$
The covariance of a random variable $x$ is given by $\Sigma =E\left[\left(x-\mu \right)\left(x-\mu {\right)}^{T}\right]$. In the factor analysis model discussed above, $\Sigma$ is restricted to a diagonal matrix.
${𝒢}_{x}\left[\mu ,\Sigma \right]$
A Gaussian probability density for the random variables $x$ with a mean $\mu$ and a covariance $\Sigma$.
(14)${𝒢}_{x}\left[\mu ,\Sigma \right]=\frac{1}{\left(2\pi {\right)}^{\frac{D}{2}}\sqrt{\mathrm{det}\left[\Sigma \right]}}{e}^{-\frac{1}{2}\left(x-\mu {\right)}^{T}{\Sigma }^{-1}\left(x-\mu \right)}$
$p\left(y\mid x\right)$
Probability of $y$ occurring given that $x$ occurred. This is commonly used in Bayesian statistics.
$p\left(x,y\right)$
Probability of $y$ and $x$ occurring simultaneously (the joint density). $p\left(x,y\right)=p\left(x\mid y\right)p\left(y\right)$
$\angle \left(z\right)$
The angle of $z$ in the complex plane. $\angle \left(r{e}^{i\theta }\right)=\theta$.

Note: if you have trouble viewing some of the more obscure Unicode used in this post, you might want to install the STIX fonts.

Posted
catalyst

Available in a git repository.
Repository: catalyst-swc
Browsable repository: catalyst-swc
Author: W. Trevor King

Catalyst is a release-building tool for Gentoo. If you use Gentoo and want to roll your own live CD or bootable USB drive, this is the way to go. As I've been wrapping my head around catalyst, I've been pushing my notes upstream. This post builds on those notes to discuss the construction of a bootable ISO for Software Carpentry boot camps.

# Getting a patched up catalyst

Catalyst has been around for a while, but the user base has been fairly small. If you try to do something that Gentoo's Release Engineering team doesn't do on a regular basis, built-in catalyst support can be spotty. There have been a fair number of patch submissions on gentoo-catalyst@ recently, but patch acceptance can be slow. For the SWC ISO, I applied versions of the following patches (or patch series) to 37540ff:

# Configuring catalyst

The easiest way to run catalyst from a Git checkout is to set up a local config file. I didn't have enough hard drive space on my local system (~16 GB) for this build, so I set things up in a temporary directory on an external hard drive:

$ cat catalyst.conf | grep -v '^#\|^$'
digests="md5 sha1 sha512 whirlpool"
contents="auto"
distdir="/usr/portage/distfiles"
envscript="/etc/catalyst/catalystrc"
hash_function="crc32"
options="autoresume kerncache pkgcache seedcache snapcache"
portdir="/usr/portage"
sharedir="/home/wking/src/catalyst"
snapshot_cache="/mnt/d/tmp/catalyst/snapshot_cache"
storedir="/mnt/d/tmp/catalyst"


I used the default values for everything except sharedir, snapshot_cache, and storedir. Then I cloned the catalyst-swc repository into /mnt/d/tmp/catalyst.

# Portage snapshot and a seed stage

Take a snapshot of the current Portage tree:

# catalyst -c catalyst.conf --snapshot 20130208


Download a seed stage3 tarball from a Gentoo mirror:

# wget -O /mnt/d/tmp/catalyst/builds/default/stage3-i686-20121213.tar.bz2 \
>   http://distfiles.gentoo.org/releases/x86/current-stage3/stage3-i686-20121213.tar.bz2


# Building the live CD

# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-stage1-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-stage2-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-stage3-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-livecd-stage1-i686-2013.1.spec
# catalyst -c catalyst.conf -f /mnt/d/tmp/catalyst/spec/default-livecd-stage2-i686-2013.1.spec


# isohybrid

To make the ISO bootable from a USB drive, I used isohybrid:

# cp swc-x86.iso swc-x86-isohybrid.iso
# isohybrid swc-x86-isohybrid.iso


You can install the resulting ISO on a USB drive with:

# dd if=swc-x86-isohybrid.iso of=/dev/sdX


replacing X with the appropriate drive letter for your USB drive.

With versions of catalyst after d1c2ba9, the isohybrid call is built into catalyst's ISO construction.

Posted
SymPy

SymPy is a Python library for symbolic mathematics. To give you a feel for how it works, let's extrapolate the extremum location for $f\left(x\right)$ given a quadratic model:

(1)$f\left(x\right)=A{x}^{2}+Bx+C$

and three known values:

(2)$\begin{array}{rl}f\left(a\right)& =A{a}^{2}+Ba+C\\ f\left(b\right)& =A{b}^{2}+Bb+C\\ f\left(c\right)& =A{c}^{2}+Bc+C\end{array}$

Rephrase as a matrix equation:

(3)$\left(\begin{array}{c}f\left(a\right)\\ f\left(b\right)\\ f\left(c\right)\end{array}\right)=\left(\begin{array}{ccc}{a}^{2}& a& 1\\ {b}^{2}& b& 1\\ {c}^{2}& c& 1\end{array}\right)\cdot \left(\begin{array}{c}A\\ B\\ C\end{array}\right)$

So the solutions for $A$, $B$, and $C$ are:

(4)$\left(\begin{array}{c}A\\ B\\ C\end{array}\right)={\left(\begin{array}{ccc}{a}^{2}& a& 1\\ {b}^{2}& b& 1\\ {c}^{2}& c& 1\end{array}\right)}^{-1}\cdot \left(\begin{array}{c}f\left(a\right)\\ f\left(b\right)\\ f\left(c\right)\end{array}\right)=\left(\begin{array}{c}\text{long}\\ \text{complicated}\\ \text{stuff}\end{array}\right)$

Now that we've found the model parameters, we need to find the $x$ coordinate of the extremum.

(5)$\frac{\mathrm{d}f}{\mathrm{d}x}=2Ax+B\phantom{\rule{thickmathspace}{0ex}},$

which is zero when

(6)$\begin{array}{rl}2Ax& =-B\\ x& =\frac{-B}{2A}\end{array}$
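Before going symbolic, the same steps can be checked numerically. The sketch below solves the matrix equation (3) for a known parabola and applies eq. 6. The function name `quadratic_extremum` and the test parabola $2x^2-8x+3$ (vertex at $x=2$) are made up for illustration.

```python
import numpy as np

def quadratic_extremum(xs, fs):
    """Fit f(x) = A x^2 + B x + C through three points; return -B/(2A)."""
    M = np.vander(xs, 3)               # rows are [x^2, x, 1], as in eq. 3
    A, B, C = np.linalg.solve(M, fs)   # solve rather than invert M
    return -B / (2 * A)

xs = np.array([0.0, 1.0, 3.0])   # the sample points a, b, c
fs = 2 * xs**2 - 8 * xs + 3      # f(a), f(b), f(c)
print(quadratic_extremum(xs, fs))
```

Using `numpy.linalg.solve` avoids forming the explicit inverse in eq. 4, which is fine numerically but hides the closed form that SymPy can recover.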

Here's the solution in SymPy:

>>> from sympy import Symbol, Matrix, factor, expand, pprint, preview
>>> a = Symbol('a')
>>> b = Symbol('b')
>>> c = Symbol('c')
>>> fa = Symbol('fa')
>>> fb = Symbol('fb')
>>> fc = Symbol('fc')
>>> M = Matrix([[a**2, a, 1], [b**2, b, 1], [c**2, c, 1]])
>>> F = Matrix([[fa],[fb],[fc]])
>>> ABC = M.inv() * F
>>> A = ABC[0,0]
>>> B = ABC[1,0]
>>> x = -B/(2*A)
>>> x = factor(expand(x))
>>> pprint(x)
 2       2       2       2       2       2
a *fb - a *fc - b *fa + b *fc + c *fa - c *fb
---------------------------------------------
 2*(a*fb - a*fc - b*fa + b*fc + c*fa - c*fb)
>>> preview(x, viewer='pqiv')


Where pqiv is the executable for pqiv, my preferred image viewer. With a bit of additional factoring, that is:

(7)$x=\frac{{a}^{2}\left[f\left(b\right)-f\left(c\right)\right]+{b}^{2}\left[f\left(c\right)-f\left(a\right)\right]+{c}^{2}\left[f\left(a\right)-f\left(b\right)\right]}{2\cdot \left\{a\left[f\left(b\right)-f\left(c\right)\right]+b\left[f\left(c\right)-f\left(a\right)\right]+c\left[f\left(a\right)-f\left(b\right)\right]\right\}}$
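Eq. 7 is easy to sanity-check with exact arithmetic. Here is a small sketch using only the standard library; the function name `extremum` and the two test parabolas are hypothetical choices for illustration.

```python
from fractions import Fraction

def extremum(a, b, c, fa, fb, fc):
    """Extremum location from eq. 7, for integer sample points and values."""
    numerator = a**2 * (fb - fc) + b**2 * (fc - fa) + c**2 * (fa - fb)
    denominator = 2 * (a * (fb - fc) + b * (fc - fa) + c * (fa - fb))
    return Fraction(numerator, denominator)

def f(x):
    return 2 * x**2 - 8 * x + 3   # vertex at x = 2

def g(x):
    return x**2 - x               # vertex at x = 1/2

print(extremum(0, 1, 3, f(0), f(1), f(3)))
print(extremum(-1, 0, 2, g(-1), g(0), g(2)))
```

The denominator vanishes when the three points are collinear (no curvature), which matches $A=0$ in eq. 6.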
Posted
Open physics text

Since I love both teaching and open source development, I suppose it was only a matter of time before I attempted a survey of open source text books. Here are my notes on the projects I've come across so far:

# Light and Matter

The Light and Matter series is a set of six texts by Benjamin Crowell at Fullerton College in California. The series is aimed at the High School and Biology (i.e. low calc) audience. The source is distributed in LaTeX and versioned in Git. I love this guy!

Crowell also runs a book review site The Assayer, which reviews free books.

Radically Modern Introductory Physics is David J. Raymond's modern-physics-based approach to introductory physics. He posts the LaTeX source, but it does not seem to be version controlled.

# Calculus Based Physics

Calculus Based Physics, by Jeffrey W. Schnick at St. Anselm in New Hampshire. It is under the Creative Commons Attribution-ShareAlike 3.0 License, and the sources are free to alter. However, there is no official version control, and the sources are in MS Word format :(. On the other hand, I wholeheartedly agree with all the objectives Schnick lists in his motivational note.

# Textbook Revolution

Calculus Based Physics' Schnick linked to Textbook Revolution, which immediately gave off good tech vibes with an IRC node (#textbookrevolution). The site is basically a wiki with a browsable list of pointers to open textbooks. The list isn't huge, but it does prominently display copyright information, which makes it easier to separate the wheat from the chaff.

# College Open Textbooks

College Open Textbooks provides another registry of open textbooks with clearly listed license information. They're funded by The William and Flora Hewlett Foundation (of NPR underwriting fame).

# MERLOT's Open Textbook Initiative

The Multimedia Educational Resource for Learning and Online Teaching (MERLOT) is a California-based project that assembles educational resources. They have a large collection of open textbooks in a variety of fields. The Light and Matter series is well represented. Unfortunately, many of the texts seem to be "free as in beer" not "free as in freedom".

# Open Access Textbooks

The Open Access Textbooks project is run by a number of Florida-based groups and funded by the U.S. Department of Education. However, I have grave doubts about any open source project that opens their project discussion with

Numerous issues that impact open textbook implementation (such as creating sustainable review processes and institutional reward structures) have yet to be resolved. The ability to financially sustain a large scale open textbook effort is also in question.

There are plenty of academics with enough knowledge and a vested interest in developing an open source textbook. The resources (computers and personal websites) are generally already provided by academic institutions. Just pick a framework (LaTeX, HTML, ...), put the whole thing in Git, and start hacking. The community will take it from there.

# ArXiv

Finally, there are a number of textbooks on arXiv. For example, Siegel's Introduction to string field theory and Fields are posted source and all. The source will probably be good quality, but the licensing information may be unclear.

Posted
Parallel computing

Available in a git repository.
Repository: parallel_computing
Browsable repository: parallel_computing
Author: W. Trevor King

In contrast to my course website project, which is mostly about constructing a framework for automatically compiling and installing LaTeX problem sets, Prof. Vallières' Parallel Computing course is basically an online textbook with a large amount of example software. In order to balance between Prof. Vallières' original and my own aesthetic, I rolled a new solution from scratch. See my version of his Fall 2010 page for a live example.

Differences from my course website project:

• No PHP, since there is no dynamic content that cannot be handled with SSI.
• Less installation machinery. Only a few build/cleanup scripts to avoid versioning really tedious bits. The repository is designed to be dropped into your ~/public_html/ whole, while the course website project is designed to rsync the built components up as they go live.
• Less LaTeX, more XHTML. It's easier to edit XHTML than it is to edit and compile LaTeX, and PDFs are large and annoying. As a computing class, there are fewer graphics than there are in an intro-physics class, so the extra power of LaTeX is not as useful.
Posted
Course website

Available in a git repository.
Repository: course
Browsable repository: course
Author: W. Trevor King

Over a few years as a TA for assorted introductory physics classes, I've assembled a nice website framework with lots of problems using my LaTeX problempack package, along with some handy Makefiles, a bit of PHP, and SSI.

The result is the course package, which should make it very easy to whip up a course website, homeworks, etc. for an introductory mechanics or E&M class (431 problems implemented as of June 2012). With a bit of work to write up problems, the framework could easily be extended to other subjects.

The idea is that a course website consists of a small, static HTML framework, and a bunch of content that is gradually filled in as the semester/quarter progresses. I've put the HTML framework in the html/ directory, along with some of the write-once-per-course content (e.g. Prof & TA info). See html/README for more information on the layout of the HTML.

The rest of the directories contain the code for compiling material that is deployed as the course progresses. The announcements/ directory contains the atom feed for the course, and possibly a list of email addresses of people who would like to (or should) be notified when new announcements are posted. The latex/ directory contains LaTeX source for the course documents for which it is available, and the pdf/ directory contains PDFs for which no other source is available (e.g. scans, or PDFs sent in by Profs or TAs who neglected to include their source code).

Note that because this framework assumes the HTML content will be relatively static, it may not be appropriate for courses with large amounts of textbook-style content, which will undergo more frequent revision. It may also be excessive for courses that need less compiled content. For an example of another framework, see my branch of Prof. Vallières' Parallel Computing website.

Posted
problempack

Available in a git repository.
Repository: problempack
Browsable repository: problempack
Author: W. Trevor King

I've put together a LaTeX package problempack to make it easier to write up problem sets with solutions for the classes I TA.

## problempack.sty

The package takes care of a few details:

• Make it easy to compile one pdf with only the problems and another pdf with problems and solutions.
• Define nicely typeset environments for automatically or manually numbered problems.
• Save retyping a few of the parameters (course title, class title, etc), that show up in the note title and also need to go out to pdftitle and pdfsubject.
• Change the page layout to minimize margins (saves paper on printing).
• Set the spacing between problems (e.g. to tweak output to a single page, versions >= 0.2).
• Add section level entries to the table-of-contents and hyperref bookmarks (versions >= 0.3).

The basic idea is to make it easy to write up notes. Just install problempack.sty in your texmf tree, and then use it like I do in the example included in the package. The example produces a simple problem set (probs.pdf) and solution notes (sols.pdf).

For a real world example, look at my Phys 102 notes with and without solutions (source). Other notes produced in this fashion: Phys201 winter 2009, Phys201 spring 2009, and Phys102 summer 2009.

## wtk_cmmds.sty

A related package that defines some useful physics macros (\U, \E, \dg, \vect, \ihat, ...) is my wtk_cmmds.sty. This used to be a part of problempack.sty, but the commands are less general, so I split them out into their own package.

## wtk_format.sty

The final package in the problempack repository is wtk_format.sty, which adjusts the default LaTeX margins to pack more content into a single page.

Posted
Math

I've had a few students confused by this sort of "zooming and chunking" approach to analyzing functions specifically, and technical problems in general, so I'll pass the link on in case you're interested. Courtesy of Charles Wells.

Posted