Acta Chim. Slov. 2008, 55, 385–395
385
Scientific paper
Molecular Simulations Find Stable Structures in Fragments of Protein G
Tja{a Urbi~,1 Toma` Urbi~,2 Franc Avbelj1 and Ken A. Dill3
1National Institute of Chemistry Slovenia, Ljubljana, Slovenia
2Faculty of Chemistry and Chemical Technology, University of Ljubljana, Slovenia
3Department of Pharmaceutical Chemistry, University of California, San Francisco
* Corresponding author: E-mail: tomaz.urbic@fkkt.uni-lj.si
Received: 26-01-2008
Abstract
We perform all-atom computer simulations on nearly one hundred 6-, 8-, 10-, and 12-mer peptide fragments of protein G, and look for stable states. We simulated by replica-exchange molecular dynamics using Amber7 with the parm96 force-field and a GB/SA (generalized-Born/solvent accessible) implicit solvent model. We find that useful diagnostics for identifying stable converged structures are the conformational entropy and free energy of each state. A large gap in the ground-state free-energy, and a low entropy indicate convergence to a single preferred peptide conformation. We find that a non-negligible fraction of such structures have some native-like character. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
1. Introduction
A long-standing goal of computational biology has been to devise a computer algorithm that takes, as input, an amino acid sequence and gives, as output, the three-dimensional native structure of a protein. A main motivation is to make drug discovery faster and more efficient by replacing slow expensive structural biology experiments with fast cheap computer simulations. There are many successful bioinformatics approaches to this problem.1–5 Those methods draw heavily from knowledge bases of known protein native structures. Our interest here is in whether purely physical all-atom force field models are capable of identifying native-like starting points from small peptide fragments.
Short peptide fragments of proteins often have intrinsic propensities for the formation of their native conformations. NMR experiments6 show that long peptide fragments have native-like conformations.7–11 Some short peptides in solution have also been shown to adopt their native secondary structures: ? helices12, 13 and ß hair-pins.14–18 As a consequence, peptide conformational propensities that are taken from the protein databank (PDB) are now widely used in protein-structure prediction algorithms. Most current protein structure prediction methods make some use of database-derived conforma-
tional preferences. A popular set of peptide fragment conformations is the I-sites library of David Baker and his co-workers19, 20 From the recent CASP (Critical Assessment of Techniques for Structure Prediction) protein structure prediction competition, it was noted that most of the successful de novo methods start with small fragments21, 22 which are then combined together into a predicted tertiary structure.
However, it would be desirable to achieve high-resolution protein structure prediction in models that are purely physics-based, for various reasons. Such predictions would not rely on information contained in protein structure databases. First, it would put our understanding of protein structures and driving forces on a deeper and more physical foundation. For example, such methods could elucidate the physical routes of protein folding. Second, it would allow the prediction of non-native states, too, those that are of interest for protein folding kinetics and stability. Third, it would allow us to treat induced fit binding of ligands, or other conformational changes.
There is good evidence that physical models can capture these conformational propensities of peptides. Simple physical models can reproduce the structural biases of certain peptide fragments.23–26 To date, however, such studies have largely focused on selected peptides that are expected to fold. Moreover, several models of protein
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
386
Acta Chim. Slov. 2008, 55, 385–395
folding kinetics are premised on the idea that folding routes begin with metastable structures of small peptide fragments.27–30 In recent work Ho et al31 simulated 133 peptide 8-mer fragments from six different proteins. In that work, it was found that more than 30 percent of the peptide fragments converge to a preferred structure, some of which are native-like. In the present study, we extend beyond that work, through a systematic exploration of different fragment chain lengths
The present study provides a test of physics-based all-atom simulations – the quality of the force field and solvation model, and the adequacy of current typical levels of sampling. There are well known problems with commonly used molecular mechanics force fields. Yoda et al.32 conducted multicanonical simulations of several small peptides (the ?-helical C-peptide of ribonuclease A and the C-terminal ß-hairpin of protein G) by using six common force fields (AMBER94, AMBER96, AM-BER99, CHARMM22, OPLS/AA/L, and GROM0S96) in explicit solvent and concluded that all of these force fields have different propensities to form secondary structures because of the differences in backbone torsional energies. Also, it is still largely an open question as to how well these implicit solvent models can predict the thermodynamics as well as the kinetics of protein folding. Zhou33 tested different combinations of force fields and solvation models on the C-terminal ß-hairpin of protein G. He found the best balanced combination to be AMBER96/ GBSA, so we use that here.
Here, we study 26 6-mer peptide fragments from protein G, 25 8-mers, 24 10-mers and 23 12-mers. We use replica-exchange molecular dynamics sampling34 in Amber735, with the parm96 parameters36 and the GB/SA implicit solvent model of Tsui and Case37. We are interested in whether this physical model can identify native-like secondary structures in peptide fragments. We systematically generate a series of peptide fragments with overlapping sequences from the original protein sequence. Neighboring fragments have a 4–10 residue overlap (and two-residue gap). We simulate each peptide using 16 replicas for 5 ns/replica.
2. Computational Details
2. 1. Simulation Details
We utilized replica-exchange molecular dynamics (REMD)34 using a custom PERL script wrapper (http:// www.dillgroup.ucsf.edu/?jchodera/code/rex) around the SANDER molecular dynamics program for the Amber7 molecular-modeling package.35 REMD periodically attempts to exchange conformations between independent molecular dynamics simulations running in parallel at different temperatures, based on a Metropolis-like criterion. This allows individual replicas to heat up to overcome barriers and then cool back down to temperatures of interests.
REMD has two advantages. It explores more conformational space then conventional molecular dynamics techniques.38 And, REMD samples from the canonical ensemble at each temperature. This sampling gives proper estimates of free energies, not just energies. We used 16 replicas exponentially spaced between 270 K and 690 K. The probability of exchange-acceptance was approximately 50%. Exchanges were attempted every 1 ps. Between exchange attempts energy-conserving molecular dynamics was used with a 2 fs time step. After each exchange attempt the velocities were randomized from the appropriate Maxwell-Boltzmann distributions to ensure sampling from the canonical ensemble at the appropriate temperature.
The protein is chopped into overlapping fragments 6 to 12 residues in length. Fragments were modeled with the Amber Parm96 force field36 with GB implicit solvent model and surface area penalty term of 5 cal · mol –1 · A–2 of Tsui and Case.37 All fragments were capped at N and C termini with acetyl and N-methylamine blocking groups to avoid undue influence from the zwitterionic termini. The fragments were initialized in extended state. The bonds to hydrogens were constrained with the SHAKE.39 Simulations were run for 5 ns per replica. Configurations were stored every 1 ps for analysis.
2. 2. Thermodynamic Properties
The results were analyzed by the weighted histogram analysis (WHAM).40, 41 In order to extract thermodynamic observables for a target temperature T we must reweight the configurations taken from each temperature Tk in order to combine them into a representative ensemble.41
We first calculate the dimensionless free energy fk for each replica k. We start with an approximate value for fk and calculate the density of states Q.kE with energy E in replica k as
(1)
where NkE is number of sampled configurations with energy E from replica k. &i = j—^ inverse temperature, kB Boltzmann constant and Nkl number of configurations of replica k at temperature Tl . From the updated density of states we can calculate an updated estimate of dimension-less free energy by
(2)
We iterate equations (1) and (2) until the free energy converges. Then we can use the these data to reweight the relative free energy profile F of state i to the inverse target temperature or we can calculate the estimator of the expectation for any observable.41
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
Acta Chim. Slov. 2008, 55, 385–395
387
2. 3. Mesostates
To analyze our date, we form discrete bins of the backbone degrees of freedom31. This process has a long history, dating back to the original work of Ramachandran et al,42 who divided the backbone (j> - y/ angles into three district regions, which are known as a, aL and ß. We describe the conformation of the peptide backbone as a string of discrete mesostates that we call a mesostring. Each mesostring describes a state that is separated by an energy barrier from other mesostrings. This means that each mesostring corresponds to local minima in the conformation space of the peptide backbone. When we know all mesostrings it is easy to find the one with lowest energy and get structure from the lowest energy basin. This partitioning in the terms of discrete regions in the backbone angles has been observed in a molecular dynamics simulation of an a-helical peptide.43 For mesostring analysis we cannot use the database analysis to define the boundaries of the backbone mesostates because current force fields cannot replicate the database distribution of (j> - y/angles.31 Ho and coworkers31 define the boundaries of the backbone mesostates in the terms of the same force field we use here. They ran replica-exchange simulations of the alanine dipeptide and the glycine dipeptide to define the boundaries of different mesostates. They break up the Ramachandran plot in terms of the following mesostates:
[b] : (-180° <(j)< 0°, 45° <(j>< 180°)
U(-180° < (j) < 0°,-180° < (j) < -135°)
U(120° < (j) < 180°, 45° < (j) < 180°)
U(120° < (j) < 180°,-180° < (j) < -135°)
[a]  : (-180° < (j)< 0°,-135° <(j>< 45°) U(120° < (j) < 180°,-135° < (j) < 45°)
[l] : (0° < (j)< 120°,-180° < (j)< 180°) U(120° < (j) < 180°,-135° < (j) < 45°)
and for glycin
[b]  : (-180° <(j)< 0°, 45° <(j>< 180°) U(-180° < (j) < 0°,-180° < (j) < -135°)
U(0° <(j)< 180°, 135° < (j)< 180°) U(0° <(j)< 180°,-180° < (j)< -45°)
[a] : (-180° < (j)< 0°,-135° <(j>< 45°)
[l] : (0° < (j)< 180°,–45° < (j)< 135°)
With the use of dimensionless free energies fk (Eq. (2)) we reweight the free energy profile Fi of mesostring i at the inverse target temperature ß as
(3)
After using WHAM to calculate the relative free energies Fi of the mesostrings i we calculate the probabilities pi of this mesostring by
(4)
When each simulation is completed, each fragment will have different populations of the various mesostrings and also different free energies. The ground mesostring is the mesostate with the highest population and the lowest free energy. We then determine whether each peptide finds a metastable structure. We define the existence of the structure in the fragment in terms of two properties of mesostring. First, we use the probability pi of the ground mesostring. Second, we use the free energy gap AF between the ground mesostring and the next mesostring. When the ground mesostring is nearly identical to next mesostring, it indicates a lack of preference of the force field for a particular structure. If the most populated (ground mesostring) differs by only one mesostate, we group them into a consensus mesostring, which contains one indefinite mesostate signified by [-]. When we group similar mesostrings into consensus mesostring we calculate the free energy difference to another mesostring j by
(5)
The backbone entropy is calculated using the Boltzmann formula
(6)
The backbone entropy is useful for measuring for a given fragment the sharpness of the distribution of probabilities of the mesostring. The more peaked the distribution is and, thus the more favored the mesostring is, the lower is the backone entropy. In this way, the backbone entropy indicates whether any one conformation is substantially favored over the others for given fragment. The fragments having low mesostring entropy can be considered as early folding nuclei to initiate folding.
2. 4. Contact Maps
The three-dimensional structure of a protein may be compactly represented as a symmetrical, square matrix of pairwise, inter-residue contacts, called the contact map.44,45 The contact map provides useful information about the protein’s secondary structure and it also captures non-local interactions giving clues to its tertiary structure.
We define that two residues xi and x in a protein are in a contact if the distance dij between a-carbon atoms of the residues xi and x is lower then some threshold value.
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
388
Acta Chim. Slov. 2008, 55, 385–395
In our case we chose 7 A. Distance dij is defined as
(7)
where ri and r are the coordinates of a-carbon atoms. A contact map for protein with N residues is an N × N binary matrix S in which the elements Sij are defined as
f 1   dij<7Ā,\i-j\>
1   !J          Otherwise
(8)
Note that we require a minimum sequence distance of 4 to call it a contact.
3 Results and Discussion
In this study, we chopped protein G in 26 peptide 6-mer fragments, 25 8-mers, 24 10-mers and 23 12-mers. We systematically generate a series of peptide fragments with overlapping
them using mesostring analysis. We looked for structural biases in each peptide. We used two criteria. First, we use the free energy gap ?F between the ground mesostring and the next mesostring with higher free energy. Second,
Figure 1: Contact map for different 6-mers. Contact map of native protein G is shown with diamonds, for different structured 6-mers with crosses.
Table 1: Sequences of 6, 8, 10 and 12-mers fragments used in calculations.
Fragment	Sequence of 6-mers	Sequence of 8-mers	Sequence of 10-mers	Sequence of 12-mers
fragl	1-MTYKLI	1-MTYKLILN	1-MTYKLILNGK	1-MTYKLILNGKTL
frag2	3-YKLILN	3-YKLILNGK	3-YKLILNGKTL	3-YKLILNGKTLKG
frag3	5-LILNGK	5-LILNGKTL	5-LILNGKTLKG	5-LILNGKTLKGET
frag4	7-LNGKTL	7-LNGKTLKG	7-LNGKTLKGET	7-LNGKTLKGETTT
frag5	9-GKTLKG	9-GKTLKGET	9-GKTLKGETTT	9-GKTLKGETTTEA
frag6	11-TLKGET	11-TLKGETTT	11-TLKGETTTEA	11-TLKGETTTEAVD
frag7	13-KGETTT	13-KGETTTEA	13-KGETTTEAVD	13-KGETTTEAVDAA
frag8	15-ETTTEA	15-ETTTEAVD	15-ETTTEAVD AA	15-ETTTEAVD AATA
frag9	17-TTEAVD	17-TTEAVDAA	17-TTEAVD AATA	17-TTEAVD AATAEK
frag10	19-EAVDAA	19-EAVDAATA	19-EAVDAATAEK	19-EAVD AATAEKVF
fragil	21-VDAATA	21-VDAATAEK	21-VDAATAEKVF	21-VDAATAEKVFKQ
frag12	23-AATAEK	23-AATAEKVF	23-AATAEKVFKQ	23-AATAEKVFKQYA
frag13	25-TAEKVF	25-TAEKVFKQ	25-TAEKVFKQYA	25-TAEKVFKQYAND
frag14	27-EKVFKQ	27-EKVFKQYA	27-EKVFKQYAND	27-EKVFKQYANDNG
frag15	29-VFKQYA	29-VFKQYAND	29-VFKQYANDNG	29-VFKQYANDNGVD
frag16	31-KQYAND	31-KQYANDNG	31-KQYANDNGVD	31-KQYANDNGVDGE
frag17	33-YANDNG	33-YANDNGVD	33-YANDNGVDGE	33-YANDNGVDGEWT
frag18	35-NDNGVD	35-NDNGVDGE	35-NDNGVDGEWT	35-NDNGVDGEWTYD
frag19	37-NGVDGE	37-NGVDGEWT	37-NGVDGEWTYD	37-NGVDGEWTYDDA
frag20	39-VDGEWT	39-VDGEWTYD	39-VDGEWTYDDA	39-VDGEWTYDDATK
frag21	41-GEWTYD	41-GEWTYDD A	41-GEWTYDDATK	41-GEWTYDDATKTF
frag22	43-WTYDDA	43-WTYDDATK	43-WTYDD ATKTF	43-WTYDDATKTFTV
frag23	45-YDDATK	45-YDDATKTF	45-YDDATKTFTV	45-YDDATKTFTVTE
frag24	47-DATKTF	47-DATKTFTV	47-DATKTFTVTE	
frag25	49-TKTFTV	49-TKTFTVTE		
frag26	51-TFTVTE			
sequences from the original protein sequence. Neighboring fragments have a 4–10 residue overlap (and a two-residue gap). Table 1 shows the sequences of the fragments. We simulated each peptide using 16 replicas for 5 ns per replica. We clustered our conformations analyzed
we use the probability p1 to determine the population of the ground mesostring. We identified a stable structure if ?F > 0.6 kcal/mol and p1 > 38%, which is slightly differently then was used previously by Ho et al.31 We used a 1 ns window for analyzing the mesostrings. Tables 2–5
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
Acta Chim. Slov. 2008, 55, 385–395
389
Table 2: Native and ground mesostring (see the text for definition on state a, b, and l), probability of ground mesostring, free energy difference, entropy and RMSD of 6-mers in 1ns time window after 4 ns equalibration. Bold indicates stable fragments.
Fragment	RMSD A	Native mesostring	Ground mesostring	p1 %	AF	TS
					kcal/mol	kcal/mol
fragl	3.7	bbbbbb	-aaaaa	50	0.71	1.20
frag2	3.9	bbbbba	baaaaa	42	0.37	1.01
frag3	4.1	bbbaba	-aaabb	37	0.82	1.51
frag4	2.6	babaab	bb-aab	64	0.65	0.92
frag5	2.6	baabbb	baaaa-	49	0.92	1.38
frag6	4.1	abbbbb	bbb-aa	20	0.39	1.87
frag7	4.3	bbbbbb	blaabb	20	0.28	1.56
frag8	3.6	bbbbbb	baaaaa	10	0.00	1.82
frag9	3.5	bbbbab	aaaaa-	36	0.43	1.41
frag10	3.1	bbabaa	aaaaa-	34	0.56	1.58
fragil	0.5	abaaaa	bbaaa-	51	1.10	1.34
frag12	3.0	aaaaaa	bbaaa-	23	0.45	1.63
frag13	3.1	aaaaaa	b-aabb	25	0.68	1.67
fragl4	0.4	aaaaaa	-aaaaa	73	1.21	0.80
frag15	3.1	aaaaaa	aaaaaa	23	0.43	1.59
frag16	3.5	aaaaaa	babaab	15	0.06	1.53
frag17	3.3	aaaaal	bbbbbb	18	0.40	1.75
frag18	1.5	aaalbb	ba-bbb	33	0.97	1.70
frag19	4.2	albbab	bbbbbb	13	0.03	1.85
frag20	3.0	bbabbb	bbbabb	16	0.01	1.62
frag21	4.7	abbbbb	baaaab	36	0.99	1.56
frag22	1.9	bbbbaa	baaaaa	11	0.14	1.68
frag23	2.1	bbaaal	ba-aaa	59	0.87	1.06
frag24	3.0	aaalbb	baaabb	18	0.18	1.62
frag25	3.8	albbbb	aaaaab	24	0.20	1.45
frag26	3.9	bbbbbb	aaaaa-	60	1.41	1.20
Table 3: Same as table 2, but for 8-mers.						
Fragment	RMSD A	Native mesostring	Ground mesostring	p1 %	AF	TS
					kcal/mol	kcal/mol
fragl	4.6	bbbbbbba	aaaaaaaa	30	0.43	1.34
frag2	5.5	bbbbbaba	baaaaabb	21	0.01	1.51
frag3	3.8	bbbabaab	bbbbbaab	28	0.31	1.38
frag4	3.0	babaabbb	bb-aabbb	57	1.00	1.25
frag5	4.6	baabbbbb	bblaabbb	11	0.00	1.69
frag6	6.1	abbbbbbb	bbaaabba	12	0.02	1.73
frag7	5.6	bbbbbbbb	bbaaaaaa	25	0.03	1.26
frag8	4.3	bbbbbbab	-aaaaaaa	55	0.68	1.18
frag9	3.8	bbbbabaa	abaaaabb	43	0.48	1.13
frag10	2.5	bbabaaaa	b-baaaab	63	1.29	1.22
fragil	4.0	abaaaaaa	bblaabbb	14	0.08	1.65
frag12	2.3	aaaaaaaa	baaaaaaa	20	0.17	1.48
frag13	4.3	aaaaaaaa	aaaaaaaa	30	0.41	1.34
frag14	3.6	aaaaaaaa	baaaaaaa	44	0.37	1.01
frag15	4.5	aaaaaaaa	lbaaaabb	35	0.47	1.23
frag16	3.7	aaaaaaal	bbaaabbb	21	0.27	1.71
frag17	3.4	aaaaalbb	bbbaabbb	28	0.42	1.46
frag18	3.4	aaalbbab	baababbb	12	0.11	2.08
frag19	4.4	albbabbb	bbaa-bbb	23	0.53	1.77
frag20	4.4	bbabbbbb	babaaaab	41	0.68	1.35
frag21	6.9	abbbbbaa	bbaaaabb	17	0.01	1.51
frag22	3.4	bbbbaaal	aaaaaaaa	30	0.10	1.24
frag23	3.0	bbaaalbb	babaaa-b	78	1.10	0.73
frag24	4.1	aaalbbbb	baaaaaaa	38	0.46	1.31
frag25	6.1	albbbbbb	aaababab	25	0.14	1.42
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
390
Acta Chim. Slov. 2008, 55, 385–395
Table 4: Same as table 2, but for 10-mers.						
Fragment	RMSD A	Native mesostring	Ground mesostring	p1 %	AF	TS
					kcal/mol	kcal/mol
fragl	6.2	bbbbbbbaba	aaaaaaaab-	43	0.82	1.43
frag2	6.1	bbbbbabaab	bbababbbab	31	0.24	1.22
frag3	4.3	bbbabaabbb	aaabbaabbb	36	0.66	1.42
frag4	5.3	babaabbbbb	bblaaabbab	19	0.15	1.49
frag5	5.4	baabbbbbbb	baaaalbba-	21	0.44	1.93
fragó	7.7	abbbbbbbbb	babbbaaaaa	8	0.02	1.84
frag7	5.7	bbbbbbbbab	bbaaaaaaaa	13	0.10	1.67
frag8	4.6	bbbbbbabaa	baaaaaaaaa	30	0.23	1.30
frag9	4.7	bbbbabaaaa	bbaaaaaaaa	18	0.49	1.86
frag10	3.9	bbabaaaaaa	bbbaabbabb	26	0.39	1.57
fragil	4.3	abaaaaaaaa	aaaaaaaabb	27	0.56	1.61
frag12	3.0	aaaaaaaaaa	baaaaaabbb	35	0.42	1.23
frag13	5.3	aaaaaaaaaa	bbbblbbabb	30	0.38	1.37
fragl4	0.6	aaaaaaaaaa	aaaaaaaaaa	51	0.84	1.10
frag15	5.3	aaaaaaaaal	bbabblabbb	18	0.06	1.57
frag16	3.7	aaaaaaalbb	bbbbaaalbb	19	0.37	1.61
frag17	3.3	aaaaalbbab	abaabbabbb	21	0.75	1.94
frag18	5.4	aaalbbabbb	bbbbaabbbb	16	0.15	1.94
fragl9	5.8	albbabbbbb	bbbbbabbbb	38	0.77	1.59
frag20	5.4	bbabbbbbaa	babaaaaaaa	60	0.87	0.95
frag21	4.1	abbbbbaaal	baaaaaaaaa	41	0.35	1.14
frag22	3.2	bbbbaaalbb	bababaaaaa	37	0.37	1.14
frag23	3.8	bbaaalbbbb	baaaaaaaaa	39	0.50	1.29
frag24	5.6	aaalbbbbbb	baaaaaabba	27	0.06	1.30
Table 5: Same as table 2, but for 12-mers.
Fragment	RMSD A	Native mesostring	Ground mesostring	p1 %	AF	TS
					kcal/mol	kcal/mol
fragl	8.4	bbbbbbbabaab	bababaaaabba	32	0.63	1.49
frag2	5.0	bbbbbabaabbb	baaabbbbabbb	30	0.32	1.49
frag3	4.3	bbbabaabbbbb	aaabbaabbbbb	18	0.37	1.76
frag4	6.0	babaabbbbbbb	bblaaabbabbb	38	0.63	1.39
frag5	7.3	baabbbbbbbbb	bbbbbbbbbbbb	17	0.03	1.89
frag6	8.9	abbbbbbbbbab	bbb-bbaaabbb	42	0.68	1.45
frag7	5.8	bbbbbbbbabaa	babaaaaabbbb	33	0.78	1.62
frag8	5.1	bbbbbbabaaaa	aaaaaaabbabb	33	0.50	1.38
frag9	4.7	bbbbabaaaaaa	baaaaaabbaab	23	0.13	1.57
frag10	4.4	bbabaaaaaaaa	babbaaaabbbb	42	0.13	0.91
fragil	5.8	abaaaaaaaaaa	bbbaaabbbbab	15	0.36	1.68
fragl2	5.7	aaaaaaaaaaaa	bbabaaababb-	42	0.66	1.42
frag13	4.2	aaaaaaaaaaaa	baaaabaaabba	12	0.02	1.66
fragl4	5.6	aaaaaaaaaaal	bbb-aaabbbbb	38	0.68	1.52
frag15	5.5	aaaaaaaaalbb	bbbbbaabbbbb	15	0.02	1.67
frag16	4.1	aaaaaaalbbab	bbbaaaababb-	45	0.95	1.43
frag17	5.6	aaaaalbbabbb	bbbabbbb-aab	22	0.75	2.02
frag18	6.3	aaalbbabbbbb	babbbbbbbaba	19	0.41	1.87
frag19	6.6	albbabbbbbaa	bbbababbbbaa	13	0.25	1.93
frag20	5.6	bbabbbbbaaal	babaaaaaaabb	24	0.28	1.49
frag21	5.5	abbbbbaaalbb	bbbaaabaaabb	21	0.06	1.53
frag22	4.4	bbbbaaalbbbb	babaaaabbaaa	40	0.17	0.94
frag23	5.0	bbaaalbbbbbb	aabaaaaaaaaa	53	0.60	0.95
show data for different fragments at 4 ns from start of simulation in 1 ns window. The lines in bold indicate stable structures. A key finding, consistent with that of Ho et
al is the persistence of substantial stable structure even in such short peptides, studied here over a more extensive set of peptides and over a systematic series of different
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
Acta Chim. Slov. 2008, 55, 385–395
391
chain lengths: in 27% of 6-mer fragments, 20% of 8-mers, 17% of 10-mers and 30% of 12-mers. We classified the structures of stable fragments using the definitions of Ho et al.31
Out of 98 simulated fragments, we found 22 that have structural preferences. We found six structured 6-mers each form a helical-turn, and one other forms a reverse turn. Similarly, for 8-mers, three fragments for a helical turn and one other forms a reverse turn. For 10-mers, all but one is in a helical turn structure. The 12-mers can form more complex structures, often a com-
Figure 2: Same as figure 1, but for 8-mers.
Figure 3: Same as figure 1, but for 10-mers.
bination of a helical-turn and reverse-turn. Only one of the 12-mers has a stable helical structure. While it has been generally supposed that peptides this short are unlikely to have stable structure, our work is consistent with the conclusions from the previous more limited studies of Ho et al.
Figure 4: Same as figure 1, but for 12-mers.
In general there are 3 places in protein G where we find structured fragments: the N-terminal ß-hairpin, the helix, and the C-terminal ß-hairpin. At the N-terminal, the turn in the ß-hairpin is predicted by the ground state of the 8-mer called frag4. For several other fragments, this turn is among the highly populated structures. Within the helix, we find two 6-mers and one 10-mer with very nativelike structure. Two 12-mers are also stable, but with nonnative structures. In the C-terminal ß-hairpin, all our stable fragments adopt a helical-turn, the 8-mer of which is the most stable. The isolated C-terminal ß-hairpin has been found experimentally to be stable,14 where this stability is reflected in the structural bias found in the peptide fragments of the hairpin-turn. These structured fragments also are consistent with a study of Minor and Kim,46 who replaced the ?-helix sequence with a sequence based on the C-terminal hairpin. The mutant was able to fold into the same topology, showing that there is a helical propensity in the C-terminal hairpin. In this study we find helical-turns in both the ?-helix and the turn of the C-terminal hairpin, which demonstrates the interchangeability of these two sequences in our simulations. Our 6-mers frag10 and frag11 and our 10-mer frag14 all show highly native-like helical structures, with very low RMSD to native and low entropy.
On figures 1–4 we plotted contact maps for native protein G and stable fragments of different sizes. We found a substantial number of stable fragments in the right places and they predict at least some of the contacts correctly. We also see some stable fragments in the places where native protein G does not have contacts. These tend to be locations where the backbone is transitioning from one structural element to another.
We also tested the proposition that keeping the five best mesostrings for a given peptide, based on our entropy and free energy analysis, might capture native-like structures in the fragments. Results are shown in figures 5–8. Figure 5 shows a 10-mer helix for which all the top conformations are native-like. In figure 6 we show struc-
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
392
Acta Chim. Slov. 2008, 55, 385–395
Figure 5: Structures of 10-mer fragment 14. a) native structure in protein G, b) ground mesostring, c) 1st higher mesostring, d) 2nd higher mesostring, e) 3rd higher mesostring and f) 4th higher mesostring. For each mesostring, we show the free energy and population.
Figure 6: Same as figure 5, only structures of 12-mer fragment frag22. This fragment is not stable.
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
Acta Chim. Slov. 2008, 55, 385–395
393
tures of a 12-mer in the C-terminal ß-hairpin. Here, we find a competition between helical-turn structures and the correct hairpin. The helix population is about 40% of all mesostrings, the hairpin is about 50% and there is a small population of other conformations. In short, in this case, by keeping the top 2 or 3 of our best conformers, they have the potential to ultimately grow into the native structure. Figures 7 and 8 show an 8-mer and 12-mer from the N-terminal hairpin. In this case, all the top conformations form the correct hairpin. Out of our total of 98 simulated peptides, we find both the correct native topology and structures that are less than 3 Angstroms RMSD from the native in 3 6-mers (frag4, frag10 and frag14), one 8-mer (frag4) and one 10-mer (frag14). Other fragments meet one criterion or the other, but not both. Hence, we find that these short physical simula-
tions have a significant ability to identify useful nativelike structures.
4. Conclusions
In this work, we have applied replica-exchange molecular dynamics using AMBER96 force field with the GB/SA solvent model to simulate protein fragments 6–12 residues in length. Out of simulated 98 fragments, 22 were structured. Despite the fact that the simulations are short in time and are on peptides that are very short in length, nevertheless, these physics-based computations are capable of picking out some native-like structures. This may be useful for simulations of protein folding kinetics, or for physics-based native protein structure predictions.
e) bbbaabbbaaaa, -2.21 kcal/mol, 6 %
Figure 7: Same as figure 5, only structures of 12-mer fragment frag4.
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
394
Acta Chim. Slov. 2008, 55, 385–395
a) native, babaabbb
^Qfe
b) bbbaabbb, -3.26 kcal/mol, 42 %
e) bbbaabbt, -1.98 kcal/mol, 4 %
Figure 8: Same as figure 5, only structures of 8-mer fragment frag4.
5. Acknowledgments
This work was supported by the Slovenian Research Agency (Research Program P1–0201 and P1–0010) and NIH grant GM34993. T. U. was supported through the Young Researcher Program.
6. References
1. D. Baker and A. Sali, Science 2001, 294, 93–96.
2. A. Liwo, M. Khalili and H. A. Scheraga, Proc. Natl. Acad. Sci. USA 2005, 102, 2362–2367.
3. S. Oldziej, C. Czaplewski, H. D. Schafroth, R. Kazmierkie-wicz, D. R. Ripoll, J. Pillardy, J. A. Saunders, Y. K. Kang, K. D. Gibson and H. A. Scheraga, Proc. Natl. Acad. Sci. USA 2005, 102, 7547–7552.
4. I. A. Hubner, E. J. Deeds and E. I. Shakhnovich, Proc. Natl. Acad. Sci. USA 2005, 102, 18914–18918.
5. P. Bradley, K. M. Misura and D. Baker, Science 2005, 309, 1868–1871.
6. H. J. Dyson and P. E. Wright, Nat. Struct. Biol. 1998, 5, 499–503.
7. H. J. Dyson, G. Merutka, J. P. Waltho, R. A. Lerner and P. E. Wright, J. Mol. Biol. 1992, 226, 795–817.
8. J. P. Waltho, V. A. Feher, G. Merutka, H. J. Dyson and P. E. Wright, Biochem. 1993, 32, 6337–6347.
9. M. Ramirez-Alvarado, L. Serrano and F. J. Blanco, Protein Sci. 1997, 6, 162–174.
10. D. Eliezer, J. Chung, H. J. Dyson and P. E. Wright, Biochem. 2000, 39, 2894–2901.
11. R. Mohana-Borges, N. K. Goto, G. J. Kroon, H. J. Dyson and P. E. Wright, J. Mol. Biol. 2004, 340, 1131–1142.
12. S. Marqusee, V. H. Robbins and R. L. Baldwin, Proc. Natl. Acad. Sci.USA 1989, 86, 5286–5290.
13. V. Munoz and L. Serrano, Nat. Struct. Biol. 1994, 1, 399–409.
14. F. J. Blanco, G. Rivas and L. Serrano, Nat. Struct. Biol. 1994, 1, 584–590.
15. M. S. Searle, D. H. Williams and L. C. Packman, Nat. Struct. Biol. 1995, 2, 999–1006.
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G
Acta Chim. Slov. 2008, 55, 385–395
395
16. R. Zerella, P. A. Evans, J. M. Ionides, L. C. Packman, B. W. Trotter, J. P Mackay and D. H. Williams, Protein Sci. 1999, 8, 1320–1331.
17. J. F. Espinosa, V. Munoz and S. H. Gellman, J. Mol. Biol. 2001, 306, 397–402.
18. K. S. Rotondi and L. M. Gierasch, Biochem. 2003, 42 7976–7985.
19. C. Bystroff, K. T. Simons, K. F. Han and D. Baker, Curr. Opin. Biotechnol. 1996, 7, 417–421.
20. C. Bystroff and D. Baker, J. Mol. Biol. 1998, 281, 565–577.
21. P. Aloy, A. Stark, C. Hadley and R. B. Russell, Proteins 2003, 53, 436–456.
22. J. Moult, Curr. Opin. Struct. Biol. 2005, 15, 285–289.
23. F. Avbelj and J. Moult, Proteins 1995, 23, 129–141.
24. R. Srinivasan and G. D. Rose, Proteins 1995, 22, 81–99.
25. N. Gibbs, A. R. Clarke and R. B. Sessions, Proteins 2001, 43, 186–202.
26. J. L. Klepeis and C. A. Floudas, J. Comput. Chem. 2002, 23, 245–266.
27. P. S. Kim and R. L. Baldwin, Annu. Rev. Biochem. 1982, 51, 459–489.
28. K. A. Dill, K. M. Fiebig and H. S. Chan, Proc. Natl. Acad. Sci. USA 1993, 90, 1942–1946.
29. R. L. Baldwin and G. D. Rose, Trends Biochem. Sci. 1999, 24, 26–33.
30. S. Banu Ozkan, G. A. Wu, J. D. Chodera and K. A. Dill, Proc. Natl. Acad. Sci. USA 2007, 104,11987–11992.
31. B. K. Ho and K. A. Dill, Plos. Comp. Biol. 2006, 2, 1–10.
32. Y. T. Yoda, Y. Sugita and Y. Okamoto, Chem. Phys. 2004, 307, 269–283.
33. R. Zhou, Proteins 2003, 53, 148–161.
34. Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 1999, 314, 141–151.
35. D. A. Pearlman, D. A. Case, J. W. Caldwell, W. S. Ross, T. E. Cheatham III, S. DeBolt, D. Ferguson, G. Seibel and P. Kollman, Comput. Phys. Comm. 1995, 91, 1–41.
36. P. Kollman, R. Dixon, W. Cornell, T. Vox, C. Chipot and A. Pohorille, The development/application of a ’minimalist’ organic/biochemical molecular mechanic force field using a combination of ab initio calculations and experimental data. In A. Wilkinson, P. Weiner, W. Van Gunsteren (Eds.), Computer Simulation of Biomolecular Systems, Vol. 3, (Elsevier, 1997), pp. 83–96
37. V. Tsui and D. A. Case, Biopolymers 2000, 56, 275–291.
38. W. Zhang, C. Wu and Y. Duan, J. Chem. Phys. 2005, 123, 154105–154113.
39. J. P. Ryckaert, G. Ciccotti and H. J. C. Berendsen, J. Comp. Phys. 1977, 23, 327–341.
40. S. Kumar, D. Bouzida, R.H. Swendsen, P. A. Kollman, and J.M. Rosenberg, J. Comp. Chem. 1992, 13, 1011–1021.
41. J. D. Chodera, W. C. Swope, J. W. Pitera, C. Seok, and K. A. Dill, J. Chem. Theory Comput. 2007, 3, 26–41.
42. G. N. Ramachandran, C. Ramakrishnan and V. Sasisekharan, J. Mol. Biol. 1963, 7, 95–99.
43. Y. Mu, P. H. Nguyen and G. Stock, Proteins 2005, 58, 45–52.
44. J. Hu, X. Shen, Y. Shao, C. Bystroff and M. J. Zaki, Mining protein contact maps. In M. Zaki, J. Wang and H. Toivonen (Eds.), Proceedings of BIOKDD02: Workshop on Data Mining in Bioinformatics, with SIGKDD02 Conference, AMC press, Edmonton, 2002, 3–10.
45. N. Gupta, N. Mangel and S. Biswas, Proteins 2005, 59 196–204.
46. D. L. Minor Jr and P. S. Kim, Nature 1996, 380, 730–734.
Povzetek
V delu smo naredili ra~unalni{ke simulacije 98 majhnih peptidov dol`ine 6, 8, 10 in 12 aminokislinskih preostankov proteina G. Iskali smo stabilne fragmente. Molekulsko dinamiko zamenjave replik smo izvajali s programom Amber 7. Uporabili smo polje sil parm96 s posplo{enim Bornovim modelom topila (GB/SA). Ugotovili smo, da je dober kriterij za dolo~evanje strukturiranih fragmentov konformacijska entropija in prosta energija posameznega stanja. Visoka razlika proste energije med osnovnim in prvim vi{jim stanjem ter nizka entropija sta zna~ilna za strukturirane peptide. Veliko strukturiranih peptidov ima strukture podobne nativni. Simulacije kraj{ih peptidov so uporabne za ugotavljanje mest, kjer se protein pri~ne zvijati, in lahko pripomorejo k bolj{emu napovedovanju strukture proteinov.
Urbi~ et al.:  Molecular simulations find stable structures in fragments of Protein G