You can use pdftk to fill out PDF forms (thanks for the inspiration, Joe Rothweiler). The syntax is simple:

$ pdftk input.pdf fill_form data.fdf output output.pdf

where input.pdf is the input PDF containing the form, data.fdf is an FDF or XFDF file containing your data, and output.pdf is the name of the PDF you're creating. The tricky part is figuring out what to put in data.fdf. There's a useful comparison of the Forms Data Format (FDF) and its XML version (XFDF) in the XFDF specification. XFDF only covers a subset of FDF, so I won't worry about it here. FDF is defined in section 12.7.7 of ISO 32000-1:2008, the PDF 1.7 specification, and it has been in PDF specifications since version 1.2.
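
If you want to script the fill step, the fill_form command above maps directly onto an argument list. Here is a minimal Python sketch. The helper names fill_form_command and fill_form are my own (they are not part of pdftk), and the sketch assumes pdftk is on your PATH.

```python
import subprocess


def fill_form_command(input_pdf, fdf, output_pdf):
    """Build the argv for: pdftk input.pdf fill_form data.fdf output output.pdf"""
    return ["pdftk", input_pdf, "fill_form", fdf, "output", output_pdf]


def fill_form(input_pdf, fdf, output_pdf):
    """Run pdftk; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(fill_form_command(input_pdf, fdf, output_pdf), check=True)


if __name__ == "__main__":
    # Only print the command here, so the sketch runs without pdftk installed.
    print(" ".join(fill_form_command("input.pdf", "data.fdf", "output.pdf")))
```

Passing the arguments as a list (rather than a shell string) avoids quoting trouble when file names contain spaces.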

Forms Data Format (FDF)

FDF files are basically stripped-down PDFs (§12.7.7.1). A simple FDF file will look something like:

%FDF-1.2
1 0 obj<</FDF<</Fields[
<</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
<</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
…
] >> >>
endobj
trailer
<</Root 1 0 R>>
%%EOF

Broken down into the lingo of ISO 32000, we have a header (§12.7.7.2.2):

%FDF-1.2

followed by a body with a single object (§12.7.7.2.3):

1 0 obj<</FDF<</Fields[
<</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
<</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
…
] >> >>
endobj

followed by a trailer (§12.7.7.2.4):

trailer
<</Root 1 0 R>>
%%EOF

Despite the claims in §12.7.7.2.1 that the trailer is optional, pdftk choked on files without it:

$ cat no-trailer.fdf
%FDF-1.2
1 0 obj<</FDF<</Fields[
<</T(Name)/V(Trevor)>>
<</T(Date)/V(2012-09-20)>>
] >> >>
endobj
$ pdftk input.pdf fill_form no-trailer.fdf output output.pdf
Error: Failed to open form data file: 
   data.fdf
   No output created.

Trailers are easy to add, since all they require is a reference to the root of the FDF catalog dictionary. If you only have one dictionary, you can always use the simple trailer I gave above.
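
The header, single-object body, and trailer described above are regular enough to generate from a dictionary of field names and values. The following is a rough Python sketch, not a complete FDF writer: build_fdf and escape_pdf_string are names I made up, and it assumes the names and values are plain ASCII (see the encoding discussion in the catalog section for anything else).

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields):
    """Return a minimal FDF document: header, one-object body, trailer."""
    lines = ["%FDF-1.2", "1 0 obj<</FDF<</Fields["]
    for name, value in fields.items():
        lines.append(
            "<</T({})/V({})>>".format(
                escape_pdf_string(name), escape_pdf_string(value)
            )
        )
    # Always emit the trailer, since pdftk rejected files without one.
    lines += ["] >> >>", "endobj", "trailer", "<</Root 1 0 R>>", "%%EOF"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_fdf({"Name": "Trevor", "Date": "2012-09-20"}), end="")
```

The escaping matters because an unbalanced parenthesis in a value would otherwise end the string early.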

FDF Catalog

The meat of the FDF file is the catalog (§12.7.7.3). Let's take a closer look at the catalog structure:

1 0 obj<</FDF<</Fields[
…
] >> >>

This defines a new object (the FDF catalog) which contains one key (/FDF) whose value is the FDF dictionary. The FDF dictionary contains one key (/Fields) and its associated array of fields. Then we close the /Fields array (]), close the FDF dictionary (>>) and close the FDF catalog (>>).

There are a number of interesting entries that you can add to the FDF dictionary (§12.7.7.3.1, table 243), some of which require a more advanced FDF version. You can add the /Version key to the FDF catalog (§12.7.7.3.1, table 242) to specify the version of the data in the dictionary:

1 0 obj<</Version/1.3/FDF<</Fields[…

Now you can extend the dictionary using table 244. Let's set things up to use UTF-8 for the field values (/V) or options (/Opt):

1 0 obj<</Version/1.3/FDF<</Encoding/utf_8/Fields[
<</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
<</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
…
] >> >>
endobj

pdftk understands raw text in the specified encoding ((…)), raw UTF-16 strings starting with a BOM ((\xFE\xFF…)), or UTF-16BE strings encoded as ASCII hex (<FEFF…>). You can use pdf-merge.py and its --unicode option to find the latter. Support for the /utf_8 encoding in pdftk is new. I mailed a patch to pdftk's Sid Steward and posted a patch request to the underlying iText library. Until those get accepted, you're stuck with the less convenient encodings.
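
If you need the ASCII hex form (<FEFF…>) and don't have a tool handy, it is easy to compute: prepend the UTF-16BE BOM, encode the text as UTF-16BE, and write the bytes out as hex. A small Python sketch follows. The function names are hypothetical, and fdf_field assumes the field name is plain ASCII without parentheses or backslashes.

```python
import codecs


def fdf_hex_string(text):
    """Encode text as a BOM-prefixed UTF-16BE hex string, e.g. <FEFF0048...>."""
    raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + raw.hex().upper() + ">"


def fdf_field(name, value):
    """Format one /Fields entry with a literal name and a hex-encoded value."""
    return "<</T({})/V{}>>".format(name, fdf_hex_string(value))


if __name__ == "__main__":
    print(fdf_field("Name", "Hi"))
```

The hex form is pure ASCII, so the resulting FDF file survives editors and tools that would mangle raw UTF-16 bytes.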

Fonts

Say you fill in some Unicode values, but your PDF reader is having trouble rendering some funky glyphs. Maybe it doesn't have access to the right font? You can see which fonts are embedded in a given PDF using pdffonts.

$ pdffonts input.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
MMXQDQ+UniversalStd-NewswithCommPi   CID Type 0C       yes yes yes   1738  0
MMXQDQ+ZapfDingbatsStd               CID Type 0C       yes yes yes   1749  0
MMXQDQ+HelveticaNeueLTStd-Roman      Type 1C           yes yes no    1737  0
CPZITK+HelveticaNeueLTStd-BlkCn      Type 1C           yes yes no    1739  0
…

If you don't have the right font for your new data, you can add it using current versions of iText. However, pdftk uses an older version, so I'm not sure how to translate this idea for pdftk.

FDF templates and field names

You can use pdftk itself to create an FDF template, which it does with embedded UTF-16BE (you can see the FE FF BOMs at the start of each string value).

$ pdftk input.pdf generate_fdf output template.fdf
$ hexdump -C template.fdf  | head
00000000  25 46 44 46 2d 31 2e 32  0a 25 e2 e3 cf d3 0a 31  |%FDF-1.2.%.....1|
00000010  20 30 20 6f 62 6a 20 0a  3c 3c 0a 2f 46 44 46 20  | 0 obj .<<./FDF |
00000020  0a 3c 3c 0a 2f 46 69 65  6c 64 73 20 5b 0a 3c 3c  |.<<./Fields [.<<|
00000030  0a 2f 56 20 28 fe ff 29  0a 2f 54 20 28 fe ff 00  |./V (..)./T (...|
00000040  50 00 6f 00 73 00 74 00  65 00 72 00 4f 00 72 00  |P.o.s.t.e.r.O.r.|
…
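
To read strings like the ones in that hexdump, check for the FE FF BOM and decode the rest as UTF-16BE. Here is a hedged Python sketch. decode_pdf_text is my own name, it takes the raw bytes from between the parentheses, and it does not handle backslash escapes inside PDF literal strings. I'm also assuming that anything without a BOM can be read as Latin-1, which is only an approximation of PDFDocEncoding.

```python
import codecs


def decode_pdf_text(raw):
    """Decode the bytes of a PDF text string (without the enclosing parens)."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Strip the 2-byte BOM, then decode the remainder as UTF-16BE.
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption: treat BOM-less strings as Latin-1 (approximate).
    return raw.decode("latin-1")


if __name__ == "__main__":
    print(decode_pdf_text(b"\xfe\xff\x00P\x00o\x00s\x00t\x00e\x00r"))
```

Note that a value consisting of just the BOM, like the /V entry in the dump above, decodes to an empty string.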

You can also dump a more human-friendly version of the PDF's fields (without any default data):

$ pdftk input.pdf dump_data_fields_utf8 output data.txt
$ cat data.txt
---
FieldType: Text
FieldName: Name
FieldNameAlt: Name:
FieldFlags: 0
FieldJustification: Left
---
FieldType: Text
FieldName: Date
FieldNameAlt: Date:
FieldFlags: 0
FieldJustification: Left
---
FieldType: Text
FieldName: Advisor
FieldNameAlt: Advisor:
FieldFlags: 0
FieldJustification: Left
---
…
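
That dump is simple enough to parse: records are separated by --- lines, and each record is a series of "Key: Value" lines. A minimal Python sketch, with a function name of my own choosing, might look like this. It keeps only the last value when a key repeats within a record, which may not be what you want for every field type.

```python
def parse_data_fields(text):
    """Parse pdftk dump_data_fields output into a list of dicts."""
    fields = []
    current = {}
    for line in text.splitlines():
        if line.strip() == "---":
            # A separator closes the record collected so far (if any).
            if current:
                fields.append(current)
            current = {}
        elif ": " in line:
            # Split on the first ": " only, so values may contain colons.
            key, value = line.split(": ", 1)
            current[key] = value
    if current:
        fields.append(current)
    return fields


if __name__ == "__main__":
    sample = "---\nFieldType: Text\nFieldName: Name\nFieldNameAlt: Name:\n---\n"
    print([f["FieldName"] for f in parse_data_fields(sample)])
```

Collecting the FieldName entries gives you the keys to use for /T in your FDF file.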

If the fields are poorly named, you may have to fill the entire form with unique values and then see which values appeared where in the output PDF (for an example, see codehero's identify_pdf_fields.js).
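
The same unique-value trick can be sketched in Python: give every field a short tag, write those tags into an FDF file, fill the form, and read the tags off the rendered page. This is my own rough take, not a port of identify_pdf_fields.js. The names probe_values and probe_fdf are hypothetical, and the sketch assumes field names are plain ASCII without parentheses or backslashes.

```python
def probe_values(field_names):
    """Map each field name to a short unique tag: F001, F002, ..."""
    return {
        name: "F{:03d}".format(index)
        for index, name in enumerate(field_names, start=1)
    }


def probe_fdf(field_names):
    """Build an FDF file that fills every field with its unique tag."""
    lines = ["%FDF-1.2", "1 0 obj<</FDF<</Fields["]
    for name, tag in probe_values(field_names).items():
        lines.append("<</T({})/V({})>>".format(name, tag))
    lines += ["] >> >>", "endobj", "trailer", "<</Root 1 0 R>>", "%%EOF"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(probe_values(["Name", "Date", "Advisor"]))
```

Short tags are more likely to fit in small form fields than the field names themselves.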

Conclusions

This would be so much easier if people just used YAML or JSON instead of bothering with PDFs ;).