Mechanisms of protein folding
Viara Grantcharova*, Eric J Alm†, David Baker† and Arthur L Horwich‡

The strong correlation between protein folding rates and the contact order suggests that folding rates are largely determined by the topology of the native structure. However, for a given topology, there may be several possible low free energy paths to the native state and the path that is chosen (the lowest free energy path) may depend on differences in interaction energies and local free energies of ordering in different parts of the structure. For larger proteins whose folding is assisted by chaperones, such as the Escherichia coli chaperonin GroEL, advances have been made in understanding both the aspects of an unfolded protein that GroEL recognizes and the mode of binding to the chaperonin. The possibility that GroEL can remove non-native proteins from kinetic traps by unfolding them either during polypeptide binding to the chaperonin or during the subsequent ATP-dependent formation of folding-active complexes with the co-chaperonin GroES has also been explored.

Addresses
*Center for Genomics Research, Harvard University, Cambridge, MA 02138, USA
†Department of Biochemistry and Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
‡Department of Genetics and Howard Hughes Medical Institute, Yale School of Medicine, New Haven, CT 06510, USA

Current Opinion in Structural Biology 2001, 11:70–82
0959-440X/01/$, see front matter
© 2001 Elsevier Science Ltd. All rights reserved.

Abbreviations
AcP      acylphosphatase
Ada2h    activation domain of procarboxypeptidase
CO       contact order
EDTA     ethylenediamine tetra-acetic acid
GFP      green fluorescent protein
MDH      malate dehydrogenase
Rubisco  ribulose-1,5-bisphosphate carboxylase-oxygenase
SH       Src homology
TFE      trifluoroethanol

Introduction
Two aspects of protein folding mechanisms are considered in this review: recent insights into the folding behavior of small two-state folding proteins and the action of the chaperonin GroEL in assisting the folding of larger proteins.

Folding of small proteins
The past several years have witnessed a rapid increase in the amount of experimental data on the folding of small single-domain proteins. Comparison of results on sets of both homologous and unrelated proteins has provided considerable insight into the determinants of the folding process. In this part of the review, we present simple models that incorporate recent experimental findings and appear to capture the broad outlines of the folding process. An important feature of these models is that the folding free energy landscape is dominated by the trade-off between the unfavorable loss in configurational entropy upon folding and the gain in attractive native interactions; non-native interactions are assumed not to play a significant role. As will be discussed first, recent results suggest a picture in which several different routes through the free energy landscape with roughly equivalent free energy barriers can be consistent with the overall topology (low-resolution structure) of a protein, and sequence changes can, by lowering or raising one barrier relative to another, produce significant changes in the transition-state ensemble without large changes in the folding rate. Because our recent articles have probably overemphasized the role of native state topology [1–3], we shall subsequently focus our attention on several examples that illustrate how variations in local free energies of ordering can modulate the folding process.

We begin by considering a zeroth order model in which all native interactions in a protein are equally favorable (i.e. a homogeneous contact model). In such a model, the free energy cost of forming different contacts in a protein depends solely on the entropic cost of restricting the chain to allow the contact. This entropic cost increases with increasing sequence separation between the interacting residues, as more of the chain must be constrained in order to form the contact. When many of the contacts in a protein are between residues distant in the primary sequence, a large portion of the chain must be ordered before even a few favorable contacts can form, leading to a large folding free energy barrier. Conversely, when interacting residues are close in the protein sequence, the entropic cost of chain ordering is partially compensated by the formation of contacts earlier in the folding process, leading to a smaller folding free energy barrier. Therefore, in this very simple model, one expects proteins with most of their contacts between residues close in the sequence to fold faster than proteins with contacts between residues distant in the sequence.

Several years ago, we found such a relationship between folding rate and the average sequence separation between contacting residues (the contact order, CO) [1]. A considerable number of proteins have been studied in the interim period and an updated version of the plot, encompassing all the two-state folding proteins that have been kinetically characterized (Table 1), shows an even stronger correlation between CO and rate of folding (Figure 1a). The correlation is particularly remarkable because of the very wide variation in the folds and functions of these proteins. It suggests that the low-resolution structure or topology of a protein is a major determinant of the trade-off between configurational entropy loss and formation of attractive interactions, as suggested by the simple model described in the previous paragraph. The correlation also supports the assumption that non-native interactions play a relatively minor role in shaping the folding process as, unlike native interactions, they are not expected to be related to the native structure.

Table 1. Rates of folding for two-state folding proteins.

Protein*          Log(k_f)†  CO‡ (%)  ∆G_u (kcal/mol)  Length§ (residues)  Temperature (°C)
Cyt-b562 [62]     5.30       7.47     10.0             106                 20
Myoglobin         4.83#      8.50     8.4              154                 25
λ-repressor [63]  4.78       9.37     5.6              80                  20
PSBD [64]         4.20       11.20    2.2              41                  41
Cyt-c [65]        3.80#      11.22    8.2              104                 23
Im9 [66]          3.16       12.07    6.6              85                  10
ACBP [67]         2.85       13.99    8.2              86                  25#
Villin 14T [68]   3.25       12.31    9.8              126                 25
N-term L9 [69]    2.87       12.74    4.5              56                  25
Ubiquitin [70]    3.19       15.11    7.2              76                  25
CI2 [71]          1.75       16.40    7.6              64                  25
U1A [72]          2.53       16.91    9.9              102                 25
Ada2h [73]        2.88       16.96    4.1              79                  25
Protein G [74]    2.46       17.30    4.6              56                  25
Protein L [75]    1.78       17.62    4.6              62                  22
FKBP [76]         0.60       17.70    5.5              107                 25
HPr [77]          1.17       18.35    4.7              85                  20
MerP [78]         0.26#      18.90    3.4              72                  25
mAcP [79]         –0.64      21.20    4.5              98                  28
CspB [4]          2.84       16.40    2.7              67                  25
TNfn3 [80]        0.46       17.35    5.3              92                  20
TI I27 [80]       1.51       17.82    7.5              89                  25
Fyn SH3 [5]       1.97       18.28    6.0              59                  20
Twitchin [80]     0.18       19.70    4.0              93                  20
PsaE(a)           0.51       17.01    1.57             69                  22
Sso7d(b)          3.02       9.54     5.93             63                  25

*A nonhomologous set of simple, single-domain, non-disulfide-bonded proteins that have been reported to fold via two-state kinetics under at least some conditions. Reported data and representative members of homologous families selected as previously described [1].
†Extrapolated folding rates in water. May differ from true folding rate in water (e.g. cyt-c, protein G, ubiquitin and others) due to 'roll-over' at low denaturant concentrations.
‡Calculated as previously described [1].
§Length of protein in residues from first structured residue to last. May differ from number of residues in construct characterized.
#As reported previously in [2].
(a) P Bowers, D Baker, unpublished data.
(b) L Serrano, personal communication.

In the simple zeroth order model discussed above, uniformly increasing the strength of all interactions clearly reduces the free energy barrier to folding (the unfavorable entropy of ordering is better compensated by the formation of the more favorable interactions) and the folding rate increases. Thus, for a given protein, reducing the strength of the favorable interactions (i.e. reducing stability) is expected to reduce the folding rate. Indeed, there is a nearly linear correlation between folding rate and stability for a given protein upon changes in solution conditions, most notably upon the addition of denaturant. Also, within a protein family, more stable proteins generally fold more rapidly than less stable proteins [4,5]. However, the correlation between stability and folding rate for proteins with different folds is much weaker than that between CO and folding rate, consistent with the dominant role of native state topology in determining folding rates [2].

Interestingly, there is a better correlation between the folding rate and the relative CO (average sequence separation divided by chain length) than between the folding rate and the absolute (unnormalized) CO (compare Figure 1a,b). This is somewhat unexpected as the entropic cost of contact formation is a function of the absolute CO, rather than of the relative CO, and simple models of the sort discussed above predict relationships with the absolute CO. If the improved correlation with the relative CO is borne out by further experimental data over the next several years, it may be necessary to consider models in which there is a renormalization that removes the dependence on the absolute length of the protein. An alternative possibility is that, for the proteins in this set, the stability increases with increasing length and dividing by the length accounts for the effect of stability on the folding rate, albeit in a somewhat indirect way.

We frequently encounter two questions about the contact order/folding rate correlation. First, given that the entropic cost of closing a loop in a protein is proportional to the logarithm of the loop length, shouldn’t folding rates be more closely correlated to the logarithm of the CO? As shown in Figure 1c, because of the limited range of the CO values, the relationship between folding rates and log CO is nearly indistinguishable from that between folding rates and CO. Second, as the magnitude of the entropic barrier to folding depends on the CO of the folding transition-state ensemble, why is there a correlation between folding rates and the CO of the native structure? The correlation suggests that the CO of the native structure is, in turn, correlated with that of the transition-state ensemble; this is not surprising given that a reasonable fraction of the native structure is usually formed in the transition-state ensemble and that contact lengths tend to be relatively consistent within particular protein structures (in an all-helical protein, the contact lengths are consistently shorter than in a parallel β-sheet protein, for example).

Figure 1. Correlation between the logarithm of the folding rate, log(k), and (a) relative CO (%), (b) absolute CO and (c) log(relative CO).

In the simple zeroth order model, protein topology is the single most important determinant of the folding process because it determines the sequence separation and spatial arrangement of the contacting residues. Indeed, simple computational models based on the homogeneous contact picture have done reasonably well at capturing many of the overall features of protein folding rates and mechanisms [6–9]. However, there are now a number of examples in which differences in local free energies of ordering have a significant influence on the folding mechanism, particularly in cases in which several different pathways are equally consistent with the structure because of symmetry (see below). These differences may arise, for example, from particularly unfavorable local conformations that either are important for functional reasons or are compensated in the final folded structure by particularly favorable nonlocal interactions. Incorporation of these differences leads to a model in which the order of events in folding depends both on the overall topology and on the relative free energy of ordering different parts of the chain. Given two possible routes to the native state, which involve forming contacts between residues equally distant along the chain, the lowest free energy route is that involving the formation of the lowest free energy substructures. Such a model produces considerably better predictions of the folding rate and of the dominant features of the structure of the folding transition-state ensemble than the simple zeroth order model (see Figures 2 and 3; E Alm, A Morozov, D Baker, unpublished data).

Experimentally, the distribution of structure in the folding transition state can be determined by measuring the effect of mutations throughout the protein on the folding and unfolding rate [10]. Fersht’s Φ value notation is a convenient way to summarize such data; a Φ value of one indicates that the interactions made by a residue are as ordered in the transition state as in the native state, whereas a Φ value of zero indicates that the interactions are not formed in the transition state [11]. Table 2 summarizes the general properties of the folding transition states studied so far using this kind of analysis. The following focuses on several recent examples that highlight the interplay between the native state topology and variations in local free energies of ordering in determining the folding mechanism (this is not a comprehensive summary of recent advances in protein folding studies).

Table 2. Folding transition states characterized by mutational analysis.

Protein                  Fold             Number of residues  Number of mutants
    Transition state (TS) characteristics

λ repressor              α helix          80                  8
    Some helices are more structured in the TS than others; multiple folding pathways were postulated because of the dramatic effect of single mutations and temperature on TS structure [14,15]

ACBP                     α helix          86                  26
    Terminal helices come together in the TS, while the rest of the protein is involved in non-native interactions; conserved hydrophobic residues are important in the TS [67]

GCN4 coiled coil         α helix
  Monomer                                 72                  3
  Dimer                                   36/36               3
    TS for coiled-coil formation is different when the two helices are cross-linked and when they form a dimer [12,13]

src SH3 domain           β barrel         57                  57
α-Spectrin SH3 domain                     62                  17
    TS is structurally polarized, with part of the protein fully formed and the rest fully disordered; TS is conserved among distant sequence homologs [3,22]

PsaE                     β barrel         69                  18
Sso7d                                     63                  24
Simplified SH3                            56                  5
    These proteins are structural homologs of the SH3 domain, but do not exhibit the same TS (P Bowers, D Baker, unpublished data; L Serrano, personal communication; Q Yi, D Baker, unpublished data)

src SH3 circ             β barrel         57                  14
src SH3 cross                             57                  9
    Circularization (circ) makes the TS more delocalized, whereas cross-linking (cross) of the distal hairpin leaves it unchanged [25]

Spectrin SH3 perm1       β barrel         62                  7
Spectrin SH3 perm2                        62                  8
    Permutation at the distal hairpin, but not at the RT loop, causes a shift in the structure of the TS [81]

TNfn3                    β sandwich       92                  48
    Structurally polarized: a ring of core residues from the central β strands forms the folding nucleus, while the terminal strands are disordered [82]

Ada2h                    α/β (βαββαββ)    81                  15
AcP                                       98                  26
U1A                                       102                 13
S6                                        101                 ?
    The topology of this fold allows several different TSs, depending on which helix is more structured [19–21,73]

Protein L                α/β (ββαββ)      62                  70
Protein G                                 57                  19
Protein G_Nu                              57                  4
    The symmetric topology of the protein allows for two possible TSs, depending on which hairpin is more stable; stabilizing the opposite hairpin leads to a switch in the transition state (protein G_Nu) ([16,17]; S Nauli, B Kuhlman, D Baker, unpublished data)

CI2                      α/β              64                  150
    Delocalized TS, with most of the interactions only partially formed [71]

CI2 circ                 α/β              64                  11
CI2 perm                                  64                  11
CI2 frag                                  40/24               23
    Circularization (circ), circular permutation (perm) and fragmentation (frag) do not change the delocalized TS [83]

FKBP                     α/β              107                 34
    [76,84]

CheY                     α/β              129                 34
    [85]

p13suc1                  α/β              113                 57
    [86]

Arc repressor            α/β              53                  44
    Delocalized TS [87]

GCN4 and λ repressor
The GCN4-p1 coiled coil is a particularly simple system for the detailed examination of the effects of topology and local structural propensity on the distribution of structure in the transition-state ensemble. The rate-limiting step in folding involves the association of two monomers to form a dimer in which hydrophobic residues are partially buried, but the helices are not completely formed. The C-terminal region of the helix exhibits higher helix propensity and mutations in that region have larger effects on the folding rate than mutations in the N terminus [12,13]. Interestingly, the effect of mutations on the folding rate can be altered by manipulating the helix propensity throughout the helix with the help of additional mutations. For example, once the N terminus of the helix is stabilized by two alanine substitutions, a subsequent mutation at the C terminus has a relatively small effect on folding, and when the C terminus is destabilized by a glycine substitution, a subsequent mutation at the N terminus has a much larger effect on folding than in the wild-type protein [12]. Thus, whereas in the wild-type protein the rate-limiting step appears to involve primarily the association of C-terminal portions of the two helices [13], association of the N-terminal regions can nucleate folding if the N terminus is stabilized or the C terminus is destabilized. Such malleability is expected given the symmetry of the helix: it appears that the rate-limiting step involves the pairing of helical regions of the two monomers, but whether these are C-terminal or N-terminal depends on the details of the sequence and can be perturbed by mutations that alter the helix propensity. However, when the symmetry is broken by connecting the N termini of the helices with a covalent cross-link, the portions of the helices adjacent to the (N-terminal) cross-link are largely formed and the C-terminal regions are largely disrupted in the transition state, regardless of the intrinsic helical propensities [12]. Therefore, in this system, local structural biases have some influence on the transition state when multiple folding routes are equally consistent with the overall topology because of symmetry (the dimeric case). However, when the topology strongly favors one particular route to the native state because of the reduced entropic cost of forming more local interactions (the monomeric case), secondary structure propensities are of little consequence.

The λ repressor, another all α-helical protein, has also been postulated to fold by a number of pathways, depending on the intrinsic stability of each helix. Both point mutations [14] and temperature [15] have been shown to significantly change structure in the transition state.

Protein L and protein G
Protein L and protein G are structural homologs, but have little detectable sequence similarity. Both proteins consist of an α helix packed across a four-stranded sheet formed by two symmetrically disposed β hairpins. Remarkably, the symmetry of the fold is almost completely broken during folding: in protein L, the first hairpin is formed and the second disrupted at the rate-limiting step in folding, whereas in protein G, the second hairpin is formed and the first is disrupted [16,17] (Figure 2). Thus, despite the small size (~60 residues) of the two proteins and their topological symmetry, there is a definite hierarchy to structure formation. The characterization of the two transition states suggests that the lowest free energy route to the native state for this fold involves formation of one of the two β hairpins; however, the choice of hairpin is determined by factors beyond native state topology. Interestingly, with the addition of hydrogen bonding and sequence- and structure-dependent local free energies of ordering, the simple computational model described above [6] recapitulates the experimentally observed symmetry breaking (Figure 2). The correspondence between the predicted and experimentally determined phi values suggests that the hairpin formed at the rate-limiting step is the one with the lowest free energy of formation.

Figure 2. Folding transition states of (a) protein G and (b) protein L. Left, predicted phi values; right, experimental phi values. The color scheme is continuous from red (Φ=1; structured in the transition state) to blue (Φ=0; unstructured in the transition state). Sites not probed experimentally are indicated in white. Graphics were generated with MOLSCRIPT [88]. Predicted phi value distributions were obtained from the highest free energy configurations along the lowest free energy paths between the unfolded and native states, as described in [6], except that additional terms representing hydrogen bonding and local sequence/structure preferences were included in the free energy function. The second β hairpin is favored by the computational model for protein G, because of an extensive hydrogen-bond network, and the first hairpin is favored by the model for protein L, because the second β turn has considerable torsional strain (three consecutive residues with positive phi angles).

To test this hypothesis, computational protein design methods [18] have recently been used to specifically stabilize the first β hairpin of protein G, which, as noted above, is not formed in the transition state in the wild-type protein. A redesigned protein G variant with a more optimal backbone conformation and sequence in the first hairpin folds 100-fold faster than the wild-type protein. Subsequent mutational analysis shows that the first β hairpin, rather than the second β hairpin (as in the wild-type), is formed in the transition state in the redesigned protein (S Nauli, B Kuhlman, D Baker, unpublished data). Likewise, following stabilization by redesign of the second hairpin of protein L, which contains three consecutive residues with positive phi angles in the wild-type structure, and destabilization of the first hairpin, the second hairpin was found to be better formed in the folding transition-state ensemble than the first turn (D Kim, B Kuhlman, D Baker, unpublished data). These switches in folding mechanism highlight the difference that local free energies of ordering can make when the overall topology has considerable symmetry.

AcP, Ada2h, U1A and S6
The folding transition states of four proteins with the ferredoxin-like fold (two helices packed against one side of a five-stranded β sheet) have been characterized. The folding transition states of Ada2h (activation domain of procarboxypeptidase) and AcP (acylphosphatase) are similar, despite the low sequence similarity (13%) between the two proteins and variations in the length of the secondary structural elements [19,20]. In both cases, the overall topology of the protein appears to be already specified in the transition state, but the second α helix and the inside strands of the β sheet with which it interacts appear to be more ordered than the rest of the polypeptide chain. The characterization of two other members of this structural family, however, revealed an alternative nucleus with preferential structure around helix 1: U1A nucleates in helix 1 and S6 nucleates in both helices [21]. The topology appears to allow several roughly equivalent folding pathways: the choice of the dominant pathway may be determined by the detailed packing and orientation of structural elements. Proteins with this fold also exhibit a pronounced movement of the transition state from 20% to 80% native (as judged by the burial of surface area) with increasing concentration of denaturant. Remarkably, given the variation in the transition-state structure, the folding rates of these proteins are highly correlated with the CO over an approximately 4000-fold range of folding rates. Furthermore, changing the CO can significantly change the folding rate: a circular permutant of U1A with CO lower than that of the wild-type protein folds considerably faster (M Oliveberg, personal communication).

SH3 domain fold

SH3 family
The homologous src and α-spectrin SH3 domains exhibit very similar transition states [3,22–24], despite the low sequence identity (36%) (Figure 3a,b). Stabilizing mutations [23] and changes in pH [22] do not seem to affect the structure of the transition state of the α-spectrin SH3 domain. In the case of the src SH3 domain, stabilization of local structure by hairpin cross-linking and global stabilization by sodium sulfate do not alter the placement of the transition state along the reaction coordinate [25]. It appears, then, that SH3 domains allow quite large variations in sequence and experimental conditions with no change to the transition state, probably because there are no alternative structural elements that can be sufficiently stabilized to become folding nuclei.
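Two quantities recur throughout this part of the review: the contact order (average sequence separation between contacting residues, optionally divided by chain length) and the Φ value derived from the effect of a mutation on folding and unfolding rates. The following is a minimal sketch of both, not the authors' code. It assumes contacts are supplied as a list of residue-index pairs (how contacts are detected, and whether they are counted per atom or per residue as in [1], is left open), and it uses the standard two-state definition Φ = ΔΔG(TS−U)/ΔΔG(N−U), with free energies obtained from rate constants. The function names `contact_orders` and `phi_value` are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def contact_orders(contacts, chain_length):
    """Return (absolute CO, relative CO in percent).

    contacts: iterable of (i, j) residue-index pairs that are in contact
              (assumption: the caller has already decided what a contact is).
    chain_length: number of residues used for normalization.
    """
    separations = [abs(j - i) for i, j in contacts]
    if not separations:
        raise ValueError("at least one contact is required")
    absolute = sum(separations) / len(separations)   # average sequence separation
    relative_percent = 100.0 * absolute / chain_length
    return absolute, relative_percent


def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temperature_k=298.15):
    """Phi value of a mutant, assuming two-state kinetics.

    kf_*: folding rate constants, ku_*: unfolding rate constants
    (wild type and mutant, same units, same conditions).
    """
    rt = R_KCAL * temperature_k
    # Change in the unfolded-to-transition-state barrier caused by the mutation
    ddg_ts = -rt * math.log(kf_mut / kf_wt)
    # Change in native-state stability (K = kf/ku) caused by the mutation
    ddg_eq = -rt * math.log((kf_mut / ku_mut) / (kf_wt / ku_wt))
    if ddg_eq == 0.0:
        raise ValueError("mutation does not change stability; phi is undefined")
    return ddg_ts / ddg_eq


if __name__ == "__main__":
    # Toy contact list for a hypothetical 20-residue chain
    print(contact_orders([(1, 5), (2, 10), (3, 4)], chain_length=20))
    # A mutation that slows folding 10-fold and leaves unfolding unchanged
    print(phi_value(kf_wt=100.0, ku_wt=1.0, kf_mut=10.0, ku_mut=1.0))
```

In this sketch a mutation that changes only the folding rate gives Φ close to one and a mutation that changes only the unfolding rate gives Φ close to zero, matching the interpretation of Φ given in the text. Values computed this way from residue-level contacts need not match the CO column of Table 1, which was calculated as described in [1].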
On the other hand, modifying the topology of the protein can significantly change the free energy landscape to favor alternative routes for folding. Circularization of the src SH3 domain causes the delocalization of structure in the transition state [25]. Circular permutation experiments on the α-spectrin SH3 domain also changed the transition state [26]. Connecting the wild-type termini with a small peptide linker and introducing a cut in the distal hairpin resulted in a shift in the structure of the transition state towards the n-src loop and the hairpin formed by the old termini; the former distal hairpin was completely disordered at the rate- limiting step. Therefore, shifts in transition-state structure can occur when formerly distant elements are covalently linked to reduce the entropic cost of their interaction. Drastic mutagenesis, which weakens the interaction ener- gies throughout the protein, can also change the transition state. For example, a sequence-simplified mutant of the src SH3 domain made predominantly of five amino acids (isoleucine, lysine, glutamic acid, alanine and glycine) was found to have a more delocalized transition state (distal hairpin is not fully formed); the interactions stabilizing the wild-type SH3 transition state may not be strong enough in the simplified mutant to overcome the loss in entropy and residues from other parts of the protein may have to participate (Q Yi, D Baker, unpublished data). SH3 structural analogs The characterization of SH3 structural analogs has shown that transition-state structure is not always conserved in proteins with similar topologies. PsaE [27], a photosystem protein from cyanobacteria, has a large loop insertion at the distal hairpin (13 amino acids), making it entropically Folding transition states of proteins with the SH3 fold: (a)src SH3 more costly to form stabilizing interactions. As a result, its domain, (b)spectrin SH3 domain, (c) Sso7d and (d) PsaE. 
Left, transition state is more delocalized than that of the src SH3 predicted phi values (see legend to Figure 2); right, experimental phi domain, with well-ordered residues found in the distal values. The color scheme is described in the legend to Figure 2. The hairpin, as well as in the N and C termini (P Bowers, distal loop is favored over the n-src loop by the computational model for the src SH3 domain because it has more extensive hydrogen D Baker, unpublished data) (Figure 3d). Sso7d, a DNA- bonding, whereas the equivalent of the distal loop is disfavored by the binding protein from Sulfolobus solfataricus [28], has a model for Sso7d because it contains five glycine residues that are significantly different transition state from that of the src costly to order. and α-spectrin SH3 domains. The n-src loop and the Cterminus (which is a helix in Sso7d, instead of a βstrand) are the most structured in the transition state, whereas the domains and in Sso7d, the contiguous three-stranded distal hairpin is only weakly ordered (R Guerois, sheet is formed but, in one case, the diverging turn inter- LSerrano, personal communication) (Figure3c). This is in acts with it, whereas in the other case, it is the C-terminal contrast to the src and α-spectrin SH3 transition states, in helix. This difference may reflect variations in the free which the distal hairpin is completely ordered. In the SH3 energies of forming the structural elements: in the SH3 76 Folding and binding domains, the distal loop hairpin is well packed and the contact order regions may tilt the kinetic competition n-src loop is irregular, whereas in Sso7d, the opposite is the between on- and off -pathway reactions in favor of the lat- case — the equivalent of the distal hairpin contains five ter. It should be emphasized, however, that non-native consecutive glycine residues (which are likely to be func- interactions are likely to play a greater role in the folding tionally important). 
With the inclusion of hydrogen bonding and sequence- and structure-dependent local free energies of ordering, the simple computational model described above [6] produces phi values very similar to those observed experimentally for the SH3 domains and Sso7d. Similar results were very recently published by Guerois and Serrano (R Guerois, L Serrano, unpublished data; see Now published).

In summary, folding transition-state structure is conserved more highly within the SH3 sequence superfamily than among SH3 analogs. The SH3 topology, then, although not as obviously symmetric as the protein L/protein G topology, still allows several alternative folding routes. The prevalence of one route over the other depends on the details of the structure. This may, in part, be due to the fact that functional constraints lead to the conservation within, but not between, superfamilies of portions of protein structures with unusual local features (the irregular n-src and RT loops in the SH3 domain, for example, are involved in proline-rich peptide binding) with higher free energies of formation. These features partially determine which of the pathways consistent with the native state topology is actually chosen.

The GCN4 and protein G experiments, together with the comparisons of transition-state structures in the AcP and SH3 families, suggest a picture in which several different ‘pathways’ with roughly equivalent free energy barriers can be consistent with the overall topology. Sequence changes can, by lowering or raising one barrier relative to another, produce significant changes in the transition-state ensemble without large changes in folding rate. Consistent with this picture, our most recent models of the folding process produce considerably more accurate predictions of folding transition-state structures when local free energies of ordering based on sequence-dependent backbone torsion angles and local hydrogen bonding terms are included. We anticipate considerable synergy between theory and experiment, and an important role for computational protein design methods in the further elucidation of the mechanisms of protein folding during the next few years.

GroEL–GroES-assisted folding

How do the foregoing simple concepts apply to chaperone-assisted folding? In small proteins, the largest free energy barriers to folding involve the formation of particularly nonlocal portions of protein structures and regions with particularly unfavorable local energetics. It seems possible, therefore, that larger proteins containing such features may be particularly dependent on chaperones for suppressing alternative off-pathway misfolding/aggregation. Kinetic bottlenecks caused by unfavorable local structures or high

of larger proteins simply because the increased size of the protein increases the probability of low free energy non-native interactions. Chaperones act on such non-native states in the first instance by binding the hydrophobic surfaces that are exposed, preventing these surfaces from ‘wrongful interactions’ that lead to multimolecular aggregation. Binding may, in some cases, be associated also with at least partial unfolding, as discussed below for GroEL. Release from the chaperones, in many cases driven by ATP binding (not hydrolysis), then allows the substrate polypeptide a chance to fold. Uniquely, in the case of the chaperonin ring class of chaperones, polypeptide is released into an encapsulated chamber where folding proceeds in isolation. In the case of the bacterial chaperonin, GroEL, this is mediated by ATP/GroES binding, which is associated with rigid-body movements of the GroEL intermediate and peptide-binding apical domains of the bound ring [29] (see Figure 4). The 60° elevation and 90° twisting of the apical domains act to remove the hydrophobic peptide-binding sites away from the central cavity, releasing polypeptide into this GroES-encapsulated space. Because the character of the wall of the cavity is switched from hydrophobic to hydrophilic as the result of the rigid-body movements, it may influence the released polypeptide to fold in this space because burial of exposed hydrophobic surfaces and exposure of hydrophilic surfaces, features of the native state, will be energetically favored.

Both cryo-EM reconstructions [30] and high-resolution crystal structures have resolved the rigid-body domain movements of the GroEL–GroES machinery itself during the reaction cycle [29,31] (see Figure 4). In addition, there are dynamic fluorescence and kinetic studies indicating, respectively, rapid release of bound polypeptide into the central cavity upon ATP/GroES binding (t½ ~1 s) and productive folding inside the GroEL–GroES cavity [32–34]. However, the exact effects of the various states and transitions of the GroEL–GroES machinery during the reaction cycle on the conformation of polypeptide substrates are not well understood because, as ensembles of unstable non-native states, the substrates are much less accessible to structural study, particularly in the presence of the megadalton GroEL ring structure. Thus, our ‘view’ of what is happening to substrate proteins themselves during the GroEL–GroES reaction is poorly resolved. At this point, the study of stringent substrates, which are dependent on the complete system to reach their native form and are unable to productively fold without it, seems valuable for identifying and characterizing the full range of steps in the reaction that are critical to producing the native state. Nevertheless, there can also be value to studying nonstringent substrates, particularly those whose nonchaperoned folding is well described, because folding behavior can be compared in the presence and absence of
chaperonin. Even small peptides may, to some extent, simulate the behavior of a region of polypeptide chain, at least in binding to GroEL.

Figure 4
Rigid-body movements of a GroEL subunit attendant to ATP/GroES binding. Rigid-body rotations about the top and bottom of the intermediate domain redirect the peptide-binding surface of the apical domain, composed of helices H and I and an underlying extended segment, from a position facing the central cavity (lying to the right of the subunit) to a new position facing out of the page. The binding of peptides in the groove between helices H and I, through contacts with resident hydrophobic sidechains, has been observed (see text). Although the involvement of the extended segment of the apical domain in polypeptide binding has been indicated by mutational studies, a structural basis for such interaction remains undefined (adapted from [29]).
(Figure labels: apical, intermediate and equatorial domains; helix H; helix I; extended segment.)

Binding to GroEL: potential unfolding action

There are definable points in the GroEL–GroES reaction cycle (Figure 5) at which major actions on polypeptide substrates have been considered likely to occur. One is the step of polypeptide binding to an open GroEL ring (which, under physiological conditions, would be the open ring of a GroEL–GroES–ADP asymmetric complex) [35] (see Figure 5). Binding may be associated with at least partial unfolding of a substrate protein, which is potentially a means for removing a non-native form from a kinetic trap. This could occur through either or both of two mechanisms, one catalytic, in which GroEL lowers the energy barriers between various non-native states, the other thermodynamic, in which GroEL preferentially binds less-folded states without affecting the transition states between the various conformations. The best evidence to date for a catalytic unfolding action associated with binding comes from a hydrogen-deuterium exchange experiment showing that GroEL in catalytic amounts can globally unfold the 6 kDa protein barnase [36]. Whether GroEL can exert similar effects on larger proteins, including those that form stable binary complexes with it, remains unclear. A number of exchange studies carried out with stable binary complexes of such proteins as α-lactalbumin [37], human dihydrofolate reductase [38,39] and Rubisco (ribulose-1,5-bisphosphate carboxylase-oxygenase) [40•] indicate that these proteins do not become globally exchanged while bound to GroEL, exhibiting modest levels of amide proton protection that are, in some cases, localized (but see, however, [41,42], which showed that cyclophilin and a chemically denatured β-lactamase, respectively, were completely exchanged while bound). In the case of Rubisco, it was possible to examine the protein both while in a metastable intermediate state in solution and after becoming bound to GroEL [40•]. In this case, a high degree of protection from exchange was observed for a small number of amide protons both in the metastable intermediate in solution and in the binary complex with GroEL. Thus, whatever the nature of this secondary structure(s), it appears to be resistant to the unfolding action associated with GroEL binding. Some proteins, however, may nevertheless be subject to catalyzed unfolding at a local level during the process of binding to GroEL.

The thermodynamic mechanism for unfolding in the presence of GroEL involves the greater affinity of GroEL for less-folded states among an ensemble of conformers that are in equilibrium with each other [43]. This would effectively shift the equilibrium by mass action toward the less-folded states. Perhaps the best evidence supporting an action of this sort comes from study of an RNase T1 mutant that populates two non-native states, one more structured than the other [44]. In the presence of GroEL, the less-folded state became more populated, without alteration of the microscopic rate constants between the two states, arguing for a thermodynamic effect (see also [42,45,46] for descriptions of such effects on β-lactamase, dihydrofolate reductase and barstar). Such partitioning between non-native states has yet to be demonstrated for stringent substrates, although the ability of GroEL to inhibit the production of off-pathway aggregates of malate dehydrogenase (MDH) has been kinetically modeled to such a mechanism. In the model, GroEL favors binding of MDH monomers and shifts an equilibrium of low-order aggregates of MDH toward this state [47]. Clearly, the ability to resolve different conformational states within an ensemble of substrate proteins, both unbound and GroEL-bound, using spectroscopic techniques, for example, will be necessary to better characterize the behavior of an open GroEL ring toward its substrates. Both catalytic and thermodynamic mechanisms could be operative, depending on the particular substrate and its position on the landscape. Finally, although the binding of substrate proteins is usually thought of as redirecting off-pathway states, there seems no reason to exclude that, in at least some cases, GroEL could recognize on-pathway intermediates, which could also receive kinetic assistance as a result of recruitment to the GroEL–GroES cavity.

Figure 5
GroEL–GroES reaction cycle. Non-native polypeptide is bound in the open (trans) ring of an asymmetric GroEL–GroES–ADP (D) complex via hydrophobic interactions with the surrounding apical domains (panel i). Binding of ATP (T) and GroES to the same ring as the polypeptide produces large rigid-body movements in the subunits of the ring, elevating and twisting the hydrophobic binding surface away from the bound polypeptide, releasing it into the encapsulated and now hydrophilic cis chamber where folding commences (panel ii). After 8–10 s, ATP hydrolysis occurs in the seven subunits of the folding-active ring, relaxing the affinity of the ring for GroES and ‘priming’ it for release (panel iii). At the same time, cis hydrolysis produces an allosteric adjustment of the trans ring that allows rapid entry of ATP and non-native polypeptide (panel iv). The arrival of ATP triggers allosteric dissociation of the cis ligands (panel v); the binding of non-native polypeptide serves to accelerate the rate of this departure by 30–50-fold. Note that the polypeptide can be ejected in either a native form (N), a form committed to reaching the native state in the bulk solution (Ic) or an uncommitted non-native state (Iuc) that can be rebound by chaperonin. The relatively slow binding of GroES to the new ATP/polypeptide-bound ring orders the formation of the next folding-active GroEL–GroES complex (panel v). Thus, GroEL alternates rings back and forth as folding-active, expending the ATP of one ring to simultaneously initiate a new folding reaction, while dissociating the previous one from the opposite ring. As discussed in the text, polypeptide binding in an open GroEL ring (panels i and iv) may be associated with an action of unfolding. The step of ATP/GroES binding may also produce forced mechanical unfolding (panels ii and v).
(Panel titles, in order: polypeptide binding; folding triggered; ES release primed; discharge triggered; new cis.)

Both catalytic and thermodynamic unfolding mechanisms could be enabled by the ability of the multiple surrounding GroEL apical domains to interact with a substrate protein. Such multivalent binding was recently indicated by an experiment with covalent GroEL rings bearing various numbers and arrangements of binding-proficient and binding-incompetent apical domains [48•]. A minimum of three consecutive proficient domains was required for efficient binding of a stringent substrate protein. In agreement, an accompanying experiment employing cysteine cross-linking between a bound substrate protein and a GroEL ring observed cross-links with multiple GroEL apical domains.

Translating binding action back to structure: what does GroEL recognize?

Ultimately, it would be desirable to translate the foregoing actions associated with chaperonin binding into structural terms. Lacking, however, any high-resolution information on the structure of a substrate protein bound to GroEL, we can only extrapolate from a variety of different types of experimental information, which, in the past year, has been derived from proteomic, biochemical, spectroscopic and crystallographic studies. At the level of binding to individual apical domains, a crystallographic study observed that a dodecamer peptide, selected for its high affinity for an isolated apical domain, associated with it as a β hairpin, both in a co-crystal with an isolated apical domain and in one with full occupancy of the apical domains of the GroEL tetradecamer [49•]. In these structures, one strand of the hairpin contacted the apical domain at a position between the two α helices (H and I) facing the central cavity (see Figure 4). A host of hydrophobic contacts were formed between tryptophan and phenylalanine residues in the peptide and hydrophobic sidechains in the two α helices; these helices had been previously implicated in polypeptide binding by a mutagenesis study [50] and by a previous crystallographic study of an apical domain [51]. In the latter study, similar topology and contacts were observed between an extended N-terminal tag segment of one monomer found lying in the groove between these two α helices in a neighboring monomer in the asymmetric unit. In the dodecamer study, it was additionally noted that, compared with the unoccupied isolated apical domain crystal structure, in which a number of regions, including the channel-facing ones, were found to differ somewhat in positioning between monomers in the asymmetric unit, the conformations of the isolated domains with peptide bound became virtually identical. This suggests that there is a structural plasticity to the apical binding surface that accommodates the variety of substrates and that, upon contact with a particular substrate, optimizes contacts with it.

Lest it seem that only β strands can associate with the GroEL apical domain, two different NMR studies re-examined an N-terminal 13-residue peptide from the substrate rhodanese that is known to form an α helix in the intact native protein. This peptide had been observed through transfer NOE effects to adopt an α-helical structure upon association with intact GroEL [52]. In the first of the new studies, the same transfer NOE effects were observed when the peptide was incubated with an isolated GroEL apical domain, and chemical shift changes could be observed that localized to the same two cavity-facing α helices (H and I) [53]. In the second study, carried out with intact GroEL, D and D,L chiral forms of the same peptide were observed to bind as well as the original L form [54]. Whereas the D form could form a left-handed helix in TFE, the D,L form did not form α helix. This suggested that the hydrophobic content of the peptides was more critical to binding than adoption of a particular secondary structure. Two dodecameric α-helical peptides with the same composition were also compared, observing that one with hydrophobic sidechains clustered on one side of the predicted helix opposite hydrophilic sidechains (amphiphilic character) bound more strongly than another peptide interspersing hydrophobic sidechains with hydrophilic sidechains. This suggested that a contiguous hydrophobic surface is the feature in a substrate favoring its recruitment to GroEL. In a third study, a series of 14-residue peptides that exhibited α-helical character in solution was examined [55]. In this case also, those peptides with amphiphilic character were found to bind most strongly to GroEL, some with submicromolar affinity.

Thus, GroEL appears able to recognize both major secondary structural elements, so long as hydrophobic surface is presented. It remains curious, however, that, where examined, recognition appears to occur through the same two apical α helices without recognizable participation of an underlying extended segment (amino acids 199–209; see Figure 4) that also bears hydrophobic residues, mutation of which abolishes polypeptide binding. Thus, the question remains as to whether this segment participates directly in binding. Notably, the H and I α helices also form the major contacts with the GroES mobile loop (itself in an extended state), also through hydrophobic interactions, after elevation and twisting of the apical domains [29] (see Figure 4). Thus, binding through these two α helices may be an energetically favored mode, although polypeptide and GroES binding occur at two very different points in space.

Both major secondary structural elements figure together in a proteomic study identifying several dozen proteins from Escherichia coli that could be co-immunoprecipitated with anti-GroEL antiserum upon cell lysis in EDTA (to inhibit nucleotide-driven dissociation) [56]. Whether any of these are stringent substrates, that is, dependent on GroEL–GroES for proper folding, remains to be seen, but of this collective of bound species, where a structure of the native form was available, the topology favored was αβ, with two or more domains. Thus, it seems plausible that GroEL multivalently binds individual α and β units through exposed hydrophobic aspects that will be buried together in the native state. This potentially stabilizes the individual domains against inappropriate intermolecular or even intramolecular interactions until ATP/GroES-driven release directs an optimal chance for correct association within the molecule, while it is confined to the cis cavity. A direct illustration of such putative action comes from a study of the folding of four-disulfide hen lysozyme, composed of an α and β domain, in the presence of GroEL [57]. The open GroEL ring accelerated the rate of acquisition of the native state by 1.3-fold, without affecting the rate or mechanism of domain folding. Rather, GroEL accelerated the slower step of proper docking of the two domains, presumably by binding one or both individual domains and disfavoring or reversing non-native contacts.

ATP/GroES-driven release of GroEL-bound substrate into the central cavity: potential unfolding action

The action of ATP/GroES binding on polypeptide conformation, associated with release into the GroEL–GroES cavity, has been of major interest. An earlier study of the substrate Rubisco, examining its tryptophan fluorescence anisotropy, observed a rapid drop (t½ ~1 s), followed by a slow rise correlating with production of the native state [32]. The nature of the fast phase had been a mystery, but an exchange experiment with tritium-labeled Rubisco has begun to address this [40•]. A metastable intermediate of this protein exhibited 12 highly protected amide tritiums both in solution and while bound to GroEL. When ATP and GroES were added, all but two of the tritiums were exchanged by 5 s, the earliest time examined. The elevation and twisting of the apical domains, driven by ATP/GroES binding to a polypeptide-bound ring, were proposed to produce a stretching of substrate between the apical domains before complete release into the cavity. Such a mechanism would couple the energy of ATP/GroES binding to a forced unfolding action. But the deprotection observed does not seem fully accountable only by a stretching action exerted on molecules becoming encapsulated in the cis ring. Consider the experimental observation that GroES binds randomly to either of the two GroEL rings of a Rubisco–GroEL binary complex to form two different asymmetric complexes: approximately 50% cis ternary complexes and approximately 50% trans ternary complexes, the latter with GroES on the ring opposite the polypeptide-bound one. Thus, one would expect