Over the last few days I've been trying to teach myself enough genetics to reconstruct Carrion-Vazquez's poly-I27 synthesis procedure. I'm not quite there yet, but I feel like I've made enough progress that it's worth posting my notes somewhere public in case they are useful to others.
Overview
We buy our poly-I27 from AthenaES, who market it as I27O™. Their technical brief makes it clear that I27O™ corresponds to Carrion-Vazquez's I27RS₈. In their original paper, Carrion-Vazquez et al. describe the synthesis of both I27RS₈ and a variant, I27GLG₁₂. Their I27RS₈ procedure is:
- Human cardiac muscle used to generate a cDNA library (Rief 1997)
- cDNA library amplified with PCR
- The 5' primer contained a BamHI restriction site that permitted in-frame cloning of the monomer into the expression vector pQE30.
- The 3' primer contained a BglII restriction site, two Cys codons located 3' to the BglII site and in-frame with the I27 domain, and two in-frame stop codons.
- The PCR product was cloned into pUC19 linearized with BamHI and SmaI.
- The 8-domain synthetic gene was constructed by iterative cloning of monomer into monomer, dimer into dimer, and tetramer into tetramer.
- The final construct contained eight direct repeats of the I27 domain, an amino-terminal His tag for purification, and two carboxyl-terminal Cys codons used for covalent attachment to the gold-covered coverslips.
They also give the full-length sequence of I27RS₈:
Met-Arg-Gly-Ser-(His)₆-Gly-Ser-(I27-Arg-Ser)₇-I27-...-Cys-Cys
They point out the Arg-Ser (RS) amino acid sequence is the BglII/BamHI hybrid site, which makes sense.
Back on the Athena site, they have a page describing their procedure (they reference the Carrion-Vazquez paper). They claim to use the restriction enzyme KpnI in addition to BamHI, BglII, and SmaI.
Carrion-Vazquez points to the following references:
- Kempe et al. 1985 (CV16), the source of the multi-step cloning technique.
- Rief et al. (CV10), for I27 subcloning.
Rief
In their note 11, Rief et al. explain their synthesis procedure:
- λ cDNA library
- Titin fragments of interest were amplified by PCR
- cloned into pET 9d
- NH₂-terminal domain boundaries were as in Politou 1996.
- The clones were fused with an NH₂-terminal His₆ tag and a COOH-terminal Cys₂ tag for immobilization on solid surfaces.
which doesn't help me very much.
Kempe
The Kempe article is more informative, focusing entirely on the synthesis procedure (albeit for a different gene). Their figure 2 outlines the general approach and uses the following restriction enzymes: PstI, BamHI, PstI, and BglII. I'll walk through their procedure in detail below.
Genetic code
Wikipedia has a good page on the genetic code for converting between DNA/mRNA codons and amino acids. I've written up a little Python script, mRNAcode.py, to automate the conversion of various sequences, which helped me while I was writing this post. I'm sure there are tons of similar programs out there, so don't feel pressured to use mine ;).
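A minimal sketch of the kind of conversion such a script performs. This is not the mRNAcode.py mentioned above; the function name `translate` and the one-letter output are assumptions made for illustration. The table is the standard genetic code written out in TCAG order.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA or mRNA coding strand (5'->3') into one-letter amino acids."""
    # Drop whitespace and accept mRNA by mapping U to T.
    dna = "".join(dna.upper().replace("U", "T").split())
    # Ignore a trailing partial codon, if any.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Substance P coding sequence as used later in this post.
print(translate("ATG CGT CCG AAG CCG CAG CAG TTC TTC GGT CTC ATG"))
```

Run on the substance P coding sequence from the Kempe section, this prints MRPKPQQFFGLM.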
Restriction enzymes
We'll use the following restriction enzymes:
BamHI
5' G|GATC C 3'
3' C CTAG|G 5'
BglI (N is any nucleotide)
5' GCCN NNN|NGGC 3'
3' CGGN|NNN NCCG 5'
BglII
5' A|GATC T 3'
3' T CTAG|A 5'
HindIII
5' A|AGCT T 3'
3' T TCGA|A 5'
KpnI
5' G GTAC|C 3'
3' C|CATG G 5'
PstI
5' C TGCA|G 3'
3' G|ACGT C 5'
SmaI
5' CCC|GGG 3'
3' GGG|CCC 5'
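The cut marks above can also be written down as data. The following is a sketch only: the names `ENZYMES` and `find_sites` are hypothetical and not taken from any library. Each entry stores the top-strand recognition sequence and the top-strand cut offset, and N is treated as a wildcard. All seven sites are palindromic, so scanning the top strand alone is enough.

```python
import re

# name: (recognition sequence 5'->3', number of bases left of the top-strand cut)
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "BglI": ("GCCNNNNNGGC", 7),
    "BglII": ("AGATCT", 1),
    "HindIII": ("AAGCTT", 1),
    "KpnI": ("GGTACC", 5),
    "PstI": ("CTGCAG", 5),
    "SmaI": ("CCCGGG", 3),
}


def find_sites(dna, enzyme):
    """Return the 0-based start index of every (possibly overlapping) site."""
    site, _cut_offset = ENZYMES[enzyme]
    # A lookahead keeps overlapping matches; N matches any nucleotide.
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    return [m.start() for m in re.finditer(pattern, dna.upper())]


# The pHK414 linker as drawn in the Kempe section further down.
linker = "AGATCTAAGCTTCCCGGGGATCCAAGATCC"
results = {name: find_sites(linker, name) for name in ENZYMES}
print(results)
```

For that linker the sketch reports BglII at offset 0, HindIII at 6, SmaI at 12, and BamHI at 17, with no BglI, KpnI, or PstI sites.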
Details
Here's my attempt to reconstruct the details of the polymer-cloning reactions, where they splice several copies of I27 into the expression plasmid.
Kempe procedure
Inserted their poly-SP into pHK414 (I haven't been able to find any online sources for pHK414. Kempe cites R.J. Watson et al. Expression of Herpes simplex virus type 1 and type 2 glyco-protein D genes using the Escherichia coli lac promoter. Y. Becker (Ed.), Recombinant DNA Research and Viruses. Nijhoff, The Hague, 1985, pp. 327-352.)
Synthetic SP
    HindIII.                                                ,BamHI_.
    |      |  Met Arg Pro Lys Pro Gln Gln Phe Phe Gly Leu Met      |
5' GA AGC TTC ATG CGT CCG AAG CCG CAG CAG TTC TTC GGT CTC ATG GAT CCG
   CT TCG AAG TAC GCA GGC TTC GGC GTC GTC AAG AAG CCA GAG TAC CTA GGC 5'
pHK414
          _______Linker_sequence______
         /                            \
               HindIII     BamHI
,PstI.   BglII.|    |,SmaI.    |
CTGCAG...AGATCTAAGCTTCCCGGGGATCCAAGATCC
GACGTC...TCTAGATTCGAAGGGCCCCTAGGTTCTAGG
.                                     .
.......................................
Synthesizing pSP4-1
pHK414 + HindIII + BamHI
They cut a hole in the plasmid…
               HindIII    BamHI.
(PstI)   BglII,|               |
CTGCAG...AGATCTA           GATCCAAGATCC
GACGTC...TCTAGATTCGA           GTTCTAGG
.                                     .
.......................................
SP + HindIII + BamHI
… and cut matching snips off their SP gene.
    HindIII.                                                ,BamHI_.
    |      |  Met Arg Pro Lys Pro Gln Gln Phe Phe Gly Leu Met      |
      AGC TTC ATG CGT CCG AAG CCG CAG CAG TTC TTC GGT CTC ATG
           AG TAC GCA GGC TTC GGC GTC GTC AAG AAG CCA GAG TAC CTA G
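The staggered ends in this drawing can be reproduced with a few lines of string slicing. This is a sketch under simple assumptions: the duplex is linear, each site occurs once, and the helper name `cut_points` is made up for illustration. For a palindromic site, the bottom-strand cut is the mirror image of the top-strand cut.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cut_points(top, site, top_offset):
    """Return (top_cut, bottom_cut) indices for a palindromic site found in top."""
    start = top.index(site)
    # The bottom strand is cut the same distance in from the other end of the site.
    return start + top_offset, start + len(site) - top_offset


# Synthetic SP duplex. The top strand reads 5'->3'; the bottom strand is kept
# in the same left-to-right order (3'->5') so the strands line up when printed.
top = "GAAGCTTCATGCGTCCGAAGCCGCAGCAGTTCTTCGGTCTCATGGATCCG"
bottom = top.translate(COMPLEMENT)

h_top, h_bottom = cut_points(top, "AAGCTT", 1)  # HindIII, A|AGCTT
b_top, b_bottom = cut_points(top, "GGATCC", 1)  # BamHI, G|GATCC

top_fragment = top[h_top:b_top]
bottom_fragment = bottom[h_bottom:b_bottom]
print(top_fragment)
print(" " * (h_bottom - h_top) + bottom_fragment)
```

The two printed lines are the top and bottom strands of the excised SP fragment, with the bottom strand indented by the four-base HindIII overhang.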
pSP4-1
Mixing the snips together gives the plasmid with a single SP.
               HindIII                                   BamHI.
,PstI.   BglII.|    | MetArgProLysProGlnGlnPhePheGlyLeuMet    |
CTGCAG...AGATCTAAGCTTCATGCGTCCGAAGCCGCAGCAGTTCTTCGGTCTCATGGATCCAAGATCC
GACGTC...TCTAGATTCGAAGTACGCAGGCTTCGGCGTCGTCAAGAAGCCAGAGTACCTAGGTTCTAGG
.                                                                    .
......................................................................
Using -SP- to abbreviate the HindIII→Met→Met portion (less the terminal G, which is part of the BamHI match sequence):
,PstI.   BglII.    BamHI.
CTGCAG...AGATCT-SP-GGATCC
GACGTC...TCTAGA-SP-CCTAGG
.                       .
.........................
Synthesizing pSP4-2
The single-SP plasmid, pSP4-1, is split in two parallel reactions.
PstI + BamHI
     G...AGATCT-SP-G
 ACGTC...TCTAGA-SP-CCTAG
PstI + BglII
CTGCA     GATCT-SP-GGATCC
G             A-SP-CCTAGG
.                       .
.........................
pSP4-2
Then the SP-containing fragments (shown above) are isolated and mixed together to form pSP4-2.
,PstI.   BglII.    other.    BamHI.
CTGCAG...AGATCT-SP-GGATCT-SP-GGATCC
GACGTC...TCTAGA-SP-CCTAGA-SP-CCTAGG
.                                 .
...................................
where the "other" sequence is the result of the BamHI/BglII splice.
Expanding the -SP- abbreviation around the SP joint:
....SP,other_.HindIII. SP.....
Leu Met Asp Leu Ser Phe Met Arg
CTC ATG GAT CTA AGC TTC ATG CGT
GAG TAC CTA GAT TCG AAG TAC GCA
So the resulting poly-SP will have Asp-Leu-Ser-Phe linking amino acids.
By repeating the PstI + BamHI / PstI + BglII split-and-join, you can synthesize plasmids with any number of SP repeats.
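The split-and-join step can be modelled on the top strand alone. The sketch below assumes only what the diagrams above show: BamHI leaves a piece ending in G, BglII leaves a piece starting with GATCT, and the two ligate into the hybrid GGATCT. The function name `double` is hypothetical.

```python
# -SP- as abbreviated above: HindIII site through the final Met, less its last G.
SP = "AAGCTTCATGCGTCCGAAGCCGCAGCAGTTCTTCGGTCTCAT"
BGLII, BAMHI = "AGATCT", "GGATCC"


def double(region):
    """Join the PstI+BamHI fragment to the PstI+BglII fragment (top strand only)."""
    assert region.startswith(BGLII) and region.endswith(BAMHI)
    left = region[:-5]   # BamHI cuts G|GATCC, so this piece ends in ...G
    right = region[1:]   # BglII cuts A|GATCT, so this piece starts GATCT...
    return left + right  # the sticky ends pair up as the hybrid site GGATCT


region = BGLII + SP + BAMHI  # pSP4-1 between its BglII and BamHI sites
history = []
for _ in range(3):
    region = double(region)
    history.append((region.count(SP), region.count(BGLII), region.count(BAMHI)))
print(history)
```

Each round doubles the SP count (2, 4, 8) while the construct keeps exactly one BglII site and one BamHI site, because the hybrid GGATCT joint is recognized by neither enzyme. That is what allows the same pair of digests to be repeated.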
I27RS₈ procedure
Like Kempe, Carrion-Vazquez et al. flank the I27 gene with BglII and BamHI, but they reverse the order. Here's the output of their PCR:
BamHI-I27-BglII-Cys-Cys-STOP-STOP
From the PDB entry for I27 (1TIT), the amino acid sequence is:
,leader_.
MHHHHHHSSLIEVEKPLYGVEVFVGETAHFEIELSEPDVHGQWKLKGQPLTASPDCEIIEDGKKHILI
LHNCQLGMTGEVSFQAANAKSAANLKVKEL
To translate this into cDNA, I've scanned through the sequence of NM_003319.4 and found a close match from nucleotides 15982 through 16248.
15982 CTAATAAAAG TGGAAAAGCC TCTGTACGGA GTAGAGGTGT TTGTTGGTGA
16032 AACAGCCCAC TTTGAAATTG AACTTTCTGA ACCTGATGTT CACGGCCAGT
16082 GGAAGCTGAA AGGACAGCCT TTGACAGCTT CCCCTGACTG TGAAATCATT
16132 GAGGATGGAA AGAAGCATAT TCTGATCCTT CATAACTGTC AGCTGGGTAT
16182 GACAGGAGAG GTTTCCTTCC AGGCTGCTAA TGCCAAATCT GCAGCCAATC
16232 TGAAAGTGAA AGAATTG
This cDNA match generates an amino acid sequence starting with LIKVEK instead of the expected LIEVEK, but the LIKVEK version matches amino acids 12677-12765 in Q8WZ42 (canonical titin), and there is a natural variant listed for 12679 K→E.
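The comparison can be checked mechanically. The sketch below translates the cDNA excerpt listed above and compares it, residue by residue, with the 1TIT sequence after dropping the nine-residue MHHHHHHSS leader. The helper name `translate` is an assumption for illustration, and both sequences are copied from this post rather than fetched from a database.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA coding strand into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# cDNA excerpt copied from the listing above (whitespace removed).
cdna = "".join("""
    CTAATAAAAG TGGAAAAGCC TCTGTACGGA GTAGAGGTGT TTGTTGGTGA
    AACAGCCCAC TTTGAAATTG AACTTTCTGA ACCTGATGTT CACGGCCAGT
    GGAAGCTGAA AGGACAGCCT TTGACAGCTT CCCCTGACTG TGAAATCATT
    GAGGATGGAA AGAAGCATAT TCTGATCCTT CATAACTGTC AGCTGGGTAT
    GACAGGAGAG GTTTCCTTCC AGGCTGCTAA TGCCAAATCT GCAGCCAATC
    TGAAAGTGAA AGAATTG
""".split())

# 1TIT sequence from above, minus the 9-residue leader.
pdb = ("MHHHHHHSSLIEVEKPLYGVEVFVGETAHFEIELSEPDVHGQWKLKGQPLTASPDCEIIEDGKKHILI"
       "LHNCQLGMTGEVSFQAANAKSAANLKVKEL")[9:]

protein = translate(cdna)
# (1-based position, cDNA residue, 1TIT residue) for every disagreement.
mismatches = [(i + 1, a, b) for i, (a, b) in enumerate(zip(protein, pdb)) if a != b]
print(len(cdna), len(protein), len(pdb), mismatches)
```

The excerpt is 267 nucleotides, or 89 codons, and the only disagreement is at the third residue (K in the cDNA, E in 1TIT).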
Interestingly, this sequence contains a PstI site at nucleotides 16220 through 16225. None of our other restriction enzymes have sites in the I27 sequence.
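A scan along these lines reproduces that result. It is a sketch: `SITES` and `find_sites` are illustrative names, and the coordinates assume that 15982, the label on the first listing line, is the position of the first base shown.

```python
import re

# Recognition sequences for the enzymes listed earlier (N = any nucleotide).
SITES = {
    "BamHI": "GGATCC", "BglI": "GCCNNNNNGGC", "BglII": "AGATCT",
    "HindIII": "AAGCTT", "KpnI": "GGTACC", "PstI": "CTGCAG", "SmaI": "CCCGGG",
}

# cDNA excerpt copied from the listing above (whitespace removed).
cdna = "".join("""
    CTAATAAAAG TGGAAAAGCC TCTGTACGGA GTAGAGGTGT TTGTTGGTGA
    AACAGCCCAC TTTGAAATTG AACTTTCTGA ACCTGATGTT CACGGCCAGT
    GGAAGCTGAA AGGACAGCCT TTGACAGCTT CCCCTGACTG TGAAATCATT
    GAGGATGGAA AGAAGCATAT TCTGATCCTT CATAACTGTC AGCTGGGTAT
    GACAGGAGAG GTTTCCTTCC AGGCTGCTAA TGCCAAATCT GCAGCCAATC
    TGAAAGTGAA AGAATTG
""".split())
FIRST = 15982  # assumed coordinate of the first base in the listing


def find_sites(dna, site):
    """Return 0-based start indices of every (possibly overlapping) match."""
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    return [m.start() for m in re.finditer(pattern, dna)]


# Report each hit as (first nucleotide, last nucleotide) in listing coordinates.
results = {
    name: [(FIRST + i, FIRST + i + len(site) - 1) for i in find_sites(cdna, site)]
    for name, site in SITES.items()
}
for name, hits in results.items():
    print(name, hits)
```

Only PstI reports a hit, spanning 16220 through 16225. Since all of these sites are palindromic, scanning the listed strand also covers the complementary strand.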
Carrion-Vazquez et al. list two vectors in their procedure, but I'm not sure about their respective roles.
pQE30
pQE30 (sequence) is listed as the "expression vector", but I'm not sure why they would need a non-expression vector, as they don't reference cross-vector subcloning after inserting their I27 monomer into the plasmid.
From the Qiagen site, the section around the linker nucleotides 115 through 203 is:
    ,RGS-His epitope__________________. ,BamHI.
Met Arg Gly Ser His His His His His His Gly Ser Ala Cys Glu Leu
ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC GCA TGC GAG CTC
TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG CGT ACG CTC GAG
      ,SmaI__.
,KpnI_.                         HindIII
Gly Thr Pro Gly Arg Pro Ala Ala Lys Leu Asn STOP
GGT ACC CCG GGT CGA CCT GCA GCC AAG CTT AAT TAG CTG AG
CCA TGG GGC CCA GCT GGA CGT CGG TTC GAA TTA ATC GAC TC
However, there is no BglII site in this linker. In fact, there is no BglII site in the entire pQE30 plasmid, so they'd need to use a third restriction enzyme to insert their I27 (which does contain a trailing BglII site).
pUC19
From BCCM/LMBP and GenBank, the section around the linker nucleotides 233 through 289 is:
                                               ,SmaI_.
HindIII.        ,PstI__.                ,BamHI_.    ,KpnI__.
           Met                     STOP
AA GCT TGC ATG CCT GCA GGT CGA CTC TAG AGG ATC CCC GGG TAC CGA
GCT CGA ATT C
However, there is no BglII site in the entire pUC19 plasmid either, so they'd need to use a third restriction enzyme to insert their I27.
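How the BamHI and SmaI sites sit relative to each other in this linker can be read off with a short sketch. It covers only the polylinker fragment quoted above, not the whole plasmid, and the helper name `span` is made up for illustration.

```python
# pUC19 polylinker fragment as quoted above (whitespace removed).
mcs = "AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTC"


def span(dna, site):
    """Return the (first, last) 0-based indices covered by the first match of site."""
    start = dna.index(site)
    return start, start + len(site) - 1


bamhi = span(mcs, "GGATCC")
smai = span(mcs, "CCCGGG")
# Indices that belong to both recognition sequences.
shared = list(range(max(bamhi[0], smai[0]), min(bamhi[1], smai[1]) + 1))
print(bamhi, smai, shared)

# Top-strand cut positions: BamHI cuts G|GATCC, SmaI cuts CCC|GGG.
bamhi_cut = bamhi[0] + 1
smai_cut = smai[0] + 3
between = mcs[bamhi_cut:smai_cut]  # top-strand bases between the two cuts
print(between)
```

In this fragment the two recognition sequences share a single base, and the two top-strand cuts are seven bases apart (GATCCCC). Whether both enzymes can act efficiently on sites this close together is a separate question that the sketch does not address.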
Questions
- Why do Carrion-Vazquez et al. list two different plasmids?
- What is the 3'-side restriction enzyme that Carrion-Vazquez et al. use to insert their I27 into their plasmid?
- What is the remote restriction enzyme that Carrion-Vazquez et al. use to break their opened plasmids (the equivalent of Kempe's PstI)?
- The BamHI and SmaI sites in pUC19 overlap, so it is unclear how you could use both to "linearize" pUC19. It would seem that either one would open the plasmid on its own, although I'm not sure you could "heal" the blunt-ended SmaI cut.
Since the Arg-Ser joint is formed by a BglII/BamHI overlap, why are there no BglII-coded amino acids after the last I27 in the I27RS₈ sequence? If there are, why do Carrion-Vazquez et al. not acknowledge them when they write [3]:
The full-length construct, I27RS₈, results in the following amino acid additions: (i) the amino-terminal sequence is Met-Arg-Gly-Ser-(His)6-Gly-Ser-I27 codons; (ii) the junction between the domains (BamHI-BglII hybrid site) is Arg-Ser; and (iii) the protein terminates in Cys-Cys.
Since they don't acknowledge an I27-Arg-Ser-Cys-Cys ending, might there be more amino acids in the C terminal addition?
Working backward
Since I'm stuck trying to get I27 into either plasmid, let's try and work backward from
Met-Arg-Gly-Ser-(His)₆-Gly-Ser-(I27-Arg-Ser)₇-I27-...-Cys-Cys
BglII/BamHI joint
The BglII/BamHI overlap would produce the expected Arg-Ser joint.
BglII   BamHI
    A + GATCC = AGATCC = Arg-Ser
TCTAG       G   TCTAGG
Final plasmid (pI27-8)
The beginning of this sequence looks like the start of pQE30's linker, so we'll assume the final plasmid was:
remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
       ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...
........I27 joint_. I27 ... final I27 ,BglII. continuation of pQE30?
... Glu Leu Arg Ser Leu ...       Leu Arg Ser Cys Cys STOPSTOP...
... GAA TTG AGA TCC CTA ...       TTG AGA TCT TGC TGC TAG TAG ...
... CTT AAC TCT AGG GAT ...       AAC TCT AGA ACG ACG ATC ATC ...
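To see what protein this hypothesized layout would encode, the open reading frame can be assembled and translated. Everything here is an assumption taken from the reconstruction above, not a confirmed sequence: the pQE30 start of the linker, the LIKVEK version of the I27 cDNA, AGATCC hybrid joints, and a trailing BglII site followed by two Cys and two stop codons.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA coding strand into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# I27 cDNA excerpt from the NM_003319.4 listing earlier in this post.
I27 = "".join("""
    CTAATAAAAG TGGAAAAGCC TCTGTACGGA GTAGAGGTGT TTGTTGGTGA
    AACAGCCCAC TTTGAAATTG AACTTTCTGA ACCTGATGTT CACGGCCAGT
    GGAAGCTGAA AGGACAGCCT TTGACAGCTT CCCCTGACTG TGAAATCATT
    GAGGATGGAA AGAAGCATAT TCTGATCCTT CATAACTGTC AGCTGGGTAT
    GACAGGAGAG GTTTCCTTCC AGGCTGCTAA TGCCAAATCT GCAGCCAATC
    TGAAAGTGAA AGAATTG
""".split())

HEAD = "ATGAGAGGATCGCATCACCATCACCATCAC"  # Met-Arg-Gly-Ser-(His)6 from pQE30
BAMHI, BGLII, HYBRID = "GGATCC", "AGATCT", "AGATCC"
TAIL = "TGCTGC" + "TAGTAG"  # two Cys codons, then two stop codons

# Eight I27 copies separated by seven hybrid joints, closed by the BglII site.
orf = HEAD + BAMHI + HYBRID.join([I27] * 8) + BGLII + TAIL
protein = translate(orf)
domain = translate(I27)
expected = "MRGSHHHHHH" + "GS" + (domain + "RS") * 8 + "CC**"
print(protein == expected, len(protein), protein[-11:])
```

Under these assumptions the translation has the form Met-Arg-Gly-Ser-(His)₆-Gly-Ser-(I27-Arg-Ser)₈-Cys-Cys, so the last I27 would be followed by an Arg-Ser pair coded by the trailing BglII site before the two Cys residues.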
Penultimate plasmid (pI27-4)
remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...
....... I27 joint_. I27 ... fourth I27 ,BglII. continuation of pQE30?
... Glu Leu Arg Ser Leu ...        Leu Arg Ser Cys Cys STOPSTOP...
... GAA TTG AGA TCC CTA ...        TTG AGA TCT TGC TGC TAG TAG ...
... CTT AAC TCT AGG GAT ...        AAC TCT AGA ACG ACG ATC ATC ...
pI27-4 + BamHI + remote
remote ,BamHI. I27...
               Leu Ile ...
?       GA TCC CTA ATA ...
??           G GAT TAT ...
....... I27 joint_. I27 ... fourth I27 ,BglII. continuation of pQE30?
... Glu Leu Arg Ser Leu ...        Leu Arg Ser Cys Cys STOPSTOP...
... GAA TTG AGA TCC CTA ...        TTG AGA TCT TGC TGC TAG TAG ...
... CTT AAC TCT AGG GAT ...        AAC TCT AGA ACG ACG ATC ATC ...
pI27-4 + BglII + remote
remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
??     ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
?      ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...
....... I27 joint_. I27 ... fourth I27 ,BglII.
... Glu Leu Arg Ser Leu ...        Leu
... GAA TTG AGA TCC CTA ...        TTG A
... CTT AAC TCT AGG GAT ...        AAC TCT AG
pI27-8
remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...
....... I27 joint_. I27 ... fourth I27 ,other. I27...
... Glu Leu Arg Ser Leu ...        Leu Arg Ser Leu Ile ...
... GAA TTG AGA TCC CTA ...        TTG AGA TCC CTA ATA ...
... CTT AAC TCT AGG GAT ...        AAC TCT AGG GAT TAT ...
....... I27 joint_. I27 ... eighth I27 ,BglII. continuation of pQE30?
... Glu Leu Arg Ser Leu ...        Leu Arg Ser Cys Cys STOPSTOP...
... GAA TTG AGA TCC CTA ...        TTG AGA TCT TGC TGC TAG TAG ...
... CTT AAC TCT AGG GAT ...        AAC TCT AGA ACG ACG ATC ATC ...
Continuing to the first plasmid, pI27-1 must have been
remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
       ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...
........I27 ,BglII. continuation of pQE30?
... Glu Leu Arg Ser Cys Cys STOPSTOP...
... GAA TTG AGA TCT TGC TGC TAG TAG ...
... CTT AAC TCT AGA ACG ACG ATC ATC ...
Potential pQE30 insertion points
- KpnI (present after BamHI in both plasmids)
Potential remote restriction enzymes
- BglI (pQE30 nucleotides 2583-2593 (GCCGGAAGGGC), Amp-resistance 3256-2396; pUC19 has two BglI sites (bad idea))