Nucleic Acids Res. 2016 Sep 30;44(17):8020-40.

An integrated, structure- and energy-based view of the genetic code 

Henri Grosjean1* and Eric Westhof 2*

1Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, 91198 Gif-sur-Yvette, France and

2Architecture et Réactivité de l’ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 rue René Descartes, 67084 Strasbourg, France

Received May 08, 2016; Revised June 11, 2016; Accepted June 17, 2016



The principles of mRNA decoding are conserved among all extant life forms. We present an integrative view of all the interaction networks between mRNA, tRNA and rRNA: the intrinsic stability of codon-anticodon duplex, the conformation of the anticodon hairpin, the presence of modified nucleotides, the occurrence of non-Watson-Crick pairs in the codon-anticodon helix and the interactions with bases of rRNA at the A-site decoding site. We derive a more information-rich, alternative representation of the genetic code, that is circular with an unsymmetrical distribution of codons leading to a clear segregation between GC-rich 4-codon boxes and AU-rich 2:2-codon and 3:1-codon boxes. All tRNA sequence variations can be visualized, within an internal structural and energy framework, for each organism, and each anticodon of the sense codons. The multiplicity and complexity of nucleotide modifications at positions 34 and 37 of the anticodon loop segregate meaningfully, and correlate well with the necessity to stabilize AU-rich codon-anticodon pairs and to avoid miscoding in split codon boxes. The evolution and expansion of the genetic code is viewed as being originally based on GC content with progressive introduction of A/U together with tRNA modifications. The representation we present should help the engineering of the genetic code to include non-natural amino acids.

doi: 10.1093/nar/gkw608



The molecular processes at the ribosomal decoding site

Although the genetic code (with 61 sense codons for 20 amino acids and three stop codons for termination of translation) was long considered “universal”, in reality it is not. Several organisms and organelles have now been shown to use variants in which a given codon is assigned to a different amino acid than in the “standard”, nowadays called “quasi-universal”, genetic code. However, the basic principles of mRNA decoding by the complex ribosomal machinery are conserved among all extant life forms. In our paper (Grosjean and Westhof, 2016), we attempted to extract the energetic and structural constraints acting at the ribosomal decoding A-site in order to delineate the degrees of freedom available for variations in decoding and for evolution of the code. This led us to suggest an alternative representation of the codon table that integrates multiple energetic and structural observations. This representation highlights the central roles of tRNA modifications, not only in the accurate reading of the triplet code, but also in the regulation and control of protein homeostasis, which ultimately bear on certain forms of human disease.


Figure 1 illustrates, in a simplified scheme, the main parameters that come into play at the ribosomal decoding site (also called the A-site, for aminoacyl-tRNA accepting site). They are: (i) the energy of the base-paired mini-helix formed between the three bases of the codon and those of the anticodon; (ii) the molecular interactions between the binary complex formed by the tRNA bound to its codon on the mRNA and the ribosomal components of the A-site (“the ribosomal grip”); and (iii) the impact of the anticodon loop closed by the terminal base pair of the anticodon helix (the proximal “extended” anticodon).


On the basis of published crystallographic structures (Demeshkina et al., 2012, Jenner et al., 2010, Ogle and Ramakrishnan, 2005, Rozov et al., 2016, Weixlbaumer et al., 2007, Yusupova et al., 2001), a detailed analysis of the contacts formed within the ternary complex (rRNA-mRNA-tRNA) led to the following conclusions: (i) all the energetic and structural interactions act in concert and compensate each other (i.e. reinforcing weak interactions or weakening strong ones); (ii) these mutual compensations help to guarantee a smooth and uniform decoding of the mRNA and folding of the ensuing polypeptide; (iii) the interactions between the three types of RNA combine to restrict recognition of the three codon-anticodon base pairs to those that adopt a Watson-Crick geometry. Strikingly, the so-called wobble pair [3-34] (see Figure 1 for numbering) is held by the ribosome in an asymmetric fashion, the codon nucleotide 3 being held more tightly by the ribosome than the anticodon nucleotide 34. This structural observation has important consequences when the base pair is “wobbling” (forming, for example, a G/U base opposition) (Westhof, 2014). Indeed, a GoU pair is not structurally equivalent to its reversal, a UoG pair. In the structural context of the ribosome, a G34oU3 pair can easily be accommodated but a U34oG3 pair cannot, and this impaired recognition would lead to inefficient translation. To overcome this detrimental effect, cells adopted an apparently complex and intricate strategy: posttranscriptional modification of nucleotide U34 (symbolized below by an asterisk next to the parent nucleotide). Although the type of chemical adduct varies and depends on the origin of the cell (bacteria, archaea, or eukaryotes) (Grosjean et al., 2010), the modifications facilitate the formation of non-wobbling U34*-G3 pairs (and other types of unusual pairs), while maintaining the possibility of forming a standard U34*-A3 pair.
The chemical structures of the U34*-G3 pairs display particular states that include base tautomerism, protonation, altered hydrophobicity or altered electronic charge distribution (Rozov et al., 2015, Weixlbaumer et al., 2007). In addition, the anticodon-loop nucleotide 37, always a purine, does not base pair but stacks upon the pair [1-36], contributing to the stability of the complex. The modifications of nucleotide 37 are complex and have two main roles: (i) to enhance the stacking potential and (ii) to maintain the reading frame within the comma-free series of codons of the mRNA by preventing the formation of an additional or alternative base pair (Agris, 2008, Björk and Hagervall, 2014, Jenner et al., 2010, Marck and Grosjean, 2002).
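
The asymmetry of the wobble pair [3-34] described above can be restated as a small lookup. This is only an illustrative sketch: the function name `wobble_pair_status`, its `u34_modified` flag and the returned labels are hypothetical, and the function covers only the cases named in the text (Watson-Crick pairs, G34oU3, U34oG3 and modified U34*), not inosine or any other pairing.

```python
# Toy restatement of the [3-34] pairing rules named in the text.
# n34 is the anticodon base at position 34, n3 the third codon base.
# All names and labels here are illustrative assumptions.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def wobble_pair_status(n34: str, n3: str, u34_modified: bool = False) -> str:
    """Classify the pair between anticodon base 34 and codon base 3."""
    if (n34, n3) in WATSON_CRICK:
        # Includes the standard U34*-A3 pair when U34 is modified.
        return "Watson-Crick"
    if (n34, n3) == ("G", "U"):
        # G34oU3 is easily accommodated at the A-site.
        return "wobble, accommodated"
    if (n34, n3) == ("U", "G"):
        # U34oG3 is not equivalent to G34oU3; modification of U34 rescues it.
        if u34_modified:
            return "U34*-G3, accommodated via modification"
        return "wobble, poorly accommodated"
    return "not covered by this sketch"


if __name__ == "__main__":
    for n34, n3, modified in [("G", "U", False), ("U", "G", False), ("U", "G", True)]:
        print(n34, n3, modified, "->", wobble_pair_status(n34, n3, modified))
```

The point of the sketch is that the outcome depends on which strand carries the G and which the U, so the rule cannot be written as a symmetric set of allowed pairs.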


The wheel representation of the genetic code

The new circular representation of the 64 codons of the decoding system and the main conclusions are summarized in Figure 2. This organization of the 64 codons is asymmetric, with a distribution of codons that leads to a clear segregation between GC-rich 4-codon boxes (at the top of the wheel, the northern section) and AU-rich 2:2-codon and 3:1-codon boxes, together with the stop codons (at the bottom, the southern section). The advantage of integrating data in this circular decoding system is that it allows any parameter, for example variations in tRNA sequences or nucleotide modifications, to be mapped within a structurally meaningful and coherent context, for each organism and anticodon. In other words, the circular representation of the genetic code emphasizes the inherent regularities present in the decoding recognition processes.


In Figure 2, the codons that form only G=C pairs at the first two positions are in the blue part of the wheel (those are energetically calculated as “strong”), while those that form only A–U pairs are in the red part of the wheel (those are energetically calculated as “weak”). The codons with a mix of G=C and A–U pairs at the first and second positions of the codon/anticodon helix are located in the white parts of the wheel (those are energetically calculated as “intermediate”). The blue vertical arrows on the left of the wheel show (i) the increase in the strength of the networking interactions (which originate from the ribosomal grip and the proximal anticodon, see Figure 1) and (ii) the concomitant increase, in the opposite direction, in base modifications at positions 34 and 37 in the tRNA anticodon loop. On the right part of the wheel, the blue vertical arrow indicates how a primordial, highly biased G/C-rich code, encoding fewer than 20 amino acids, probably evolved by introducing more A/U, leading to the present-day code with 61 codons for 20 amino acids. That evolution is coupled with the progressive introduction of base modifications, especially at positions 34 and 37 of the anticodon loop of tRNAs, each catalyzed by one of a large array of RNA modification enzymes with appropriate cofactors that originate from the general metabolism of the cell. In other words, usage of the full set of 61 codons for the 20 universal proteinogenic amino acids correlates well with the multiplicity and complexity of nucleotide modifications. Modifications at positions 34 and 37 of the anticodon loop stabilize AU-rich codon-anticodon pairs within a comma-free mRNA and avoid miscoding in split codon boxes.
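
The strong/intermediate/weak partition described above can be reproduced from the standard codon table alone. The following is a minimal sketch, not the energy calculation of the paper: it simply counts G or C at the first two codon positions, and the names `CODE`, `box_strength` and `is_split` are introduced here for illustration.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first base varies slowest); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))


def box_strength(prefix: str) -> str:
    """Label a codon box by the number of G/C at its first two positions."""
    gc = sum(base in "GC" for base in prefix)
    return {2: "strong", 1: "intermediate", 0: "weak"}[gc]


def is_split(prefix: str) -> bool:
    """A box is split when its four codons do not all share one meaning."""
    return len({CODE[prefix + b3] for b3 in BASES}) > 1


boxes = defaultdict(list)
for b1, b2 in product(BASES, repeat=2):
    prefix = b1 + b2
    boxes[box_strength(prefix)].append((prefix, is_split(prefix)))

if __name__ == "__main__":
    for strength in ("strong", "intermediate", "weak"):
        for prefix, split in boxes[strength]:
            kind = "split" if split else "unsplit 4-codon"
            print(f"{strength:12s} {prefix}N  {kind}")
```

For the standard code, the four strong boxes are all unsplit, the four weak boxes are all split, and the eight intermediate boxes divide evenly between the two, which is the segregation the wheel makes visible.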


The mRNA codon usage and the tRNA pool are strongly interdependent

In cells, the number of tRNAs with different anticodons is never equal to the number of sense codons (i.e. 64 – 3 stop = 61). With a genomic GC content in the range 40-60%, the number of tRNAs is generally around 45. In such cases, there are between 2 and 3 tRNAs per 4-codon box and 1-2 tRNAs per 2-codon box. In extreme situations (high genomic GC or AT content), the number of tRNAs decreases sharply, sometimes down to the minimum requirement of 23 tRNAs (1 tRNA per box, not counting the initiator tRNA). In one extreme but very important case, that of mammalian mitochondria (including human), the total number of encoded mitochondrial tRNAs is 22 (one of the two arginine boxes is no longer assigned and a single tRNA serves as both initiator and elongator) (Suzuki and Suzuki, 2014).
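
The counts quoted above (61 sense codons, a minimum of 23 tRNAs, and 22 in mammalian mitochondria) can be checked with a short sketch. It assumes that "1 tRNA per box" means one tRNA for each distinct amino acid within each group of four codons sharing their first two bases. The vertebrate mitochondrial table is not given in the text; the string used here follows NCBI translation table 2, in which AGA and AGG are listed as stops, and the function name `min_trna_count` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Codon tables in UCAG order (first base varies slowest); "*" = stop/unassigned.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Assumption: vertebrate mitochondrial code as in NCBI translation table 2
# (AUA = Met, UGA = Trp, AGA/AGG not assigned to an amino acid).
VERTEBRATE_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def min_trna_count(amino_acids: str) -> int:
    """Count one tRNA per distinct amino acid within each codon box."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    code = dict(zip(codons, amino_acids))
    total = 0
    for b1, b2 in product(BASES, repeat=2):
        decoded = {code[b1 + b2 + b3] for b3 in BASES} - {"*"}
        total += len(decoded)
    return total


if __name__ == "__main__":
    print("sense codons (standard):", sum(aa != "*" for aa in STANDARD))
    print("minimal tRNA set (standard):", min_trna_count(STANDARD))
    print("minimal tRNA set (vertebrate mitochondria):", min_trna_count(VERTEBRATE_MITO))
```

Under these assumptions the standard code gives 8 tRNAs for the unsplit boxes plus 15 for the split boxes, and the mitochondrial table loses one because the AGR codons no longer need a tRNA.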


We have discussed above how structural constraints impose modifications of U34 to decode purine-ending codons in split codon boxes. Strikingly, independently of the genomic context, modifications of U34 in purine-ending 2-codon boxes (and the necessary enzymatic machineries) are maintained and preserved in all types of cells (including mitochondria and parasitic mycoplasmas). This apparently complex strategy allows both diversity in codon usage (different frequencies for “synonymous” codons), whatever the genomic content (for example, very low G in human mitochondria), and minimization of the number of different tRNAs. Importantly, beyond ensuring economical and efficient translation, the sophisticated arrays of tRNA modifications that cells developed forced the integration and anchoring of ribosomal translation within the cellular metabolic enzymatic pathways. With a medium-range GC content (as in E. coli or humans), cells can fine-tune the codon usage of certain mRNAs to tRNA sets requiring specific modifications, thereby linking expression of these mRNAs to properly modified tRNAs. For example, under stress, only those cells with the specific panoply of tRNA modification enzymes will express stress proteins and survive (Chan et al., 2015, Chionh et al., 2016, Endres et al., 2015, Nedialkova and Leidel, 2015, Novoa and Ribas de Pouplana, 2012). In short, cells transformed a structural weakness in the translation apparatus (at the third codon position) into a control and regulation mechanism through a complex integration within metabolic networks (Figure 3).


The translation hub and human diseases

How are these points related to disease? Accuracy in decoding is certainly a prerequisite for correct protein function, but appropriate speed and rhythm during decoding are also a necessity, because the native state of the product of translation, a protein, implies a proper folding in three-dimensional space that is dynamically acquired. In other words, native protein folding is tightly linked to accurate, efficient and speed-controlled translation. The rate of protein synthesis and the types of proteins synthesized vary widely between cell types and environments, as do, accordingly, the levels of nutrients, metabolites and energy required for proper protein folding. Besides properly assembled ribosomes and cofactors, adequate levels of fully modified tRNAs are required for smooth translation. The accumulation of unfolded or misfolded proteins in the lumen of the endoplasmic reticulum (ER stress) activates a dedicated signaling pathway, the unfolded protein response (UPR) (Wang and Kaufman, 2016). In several cell types, it has been observed that the lack of tRNA modifications leads to protein misfolding and activation of the UPR (Nedialkova and Leidel, 2015). One should also keep in mind that tRNA mutations far away from the interaction sites with the ribosome can lead to misfolding of the tRNA itself, which may prevent recognition of the tRNA fold and the subsequent action of maturation and modification enzymes. Several recent publications describe pathological cases associated with the absence of tRNA modifications, especially at position 34. The elongator complex, made of several proteins, is entirely dedicated to modifications of U34 in mammals.
Several mutations in tRNAs or modification enzymes that result in the absence of tRNA modifications have been linked to, for example, neurological or mitochondrial disorders (Agris et al., 2017, Bohnsack and Sloan, 2017, Karlsborn et al., 2016, Karlsborn et al., 2014, Powell et al., 2015, Ranjan and Rodnina, 2016). In brief, within each organism, there is a very strong connectivity between the elements responsible for the reliability and efficiency of the decoding process of the genetic code. The multiplicity of these highly interconnected elements and the integration of the various biological information flows ultimately allow for the maintenance of subtle cellular homeostasis and place the processes of translation at the center of cellular activities.




Figure 1: The decoding of mRNAs into polypeptides on the ribosome results from an ensemble of complex, multistep molecular recognition processes. Here we focus on those that occur at the A-site of the small ribosomal subunit. Before entering the decoding site, a potential incoming tRNA is first properly recognized and correctly aminoacylated by its cognate aminoacyl-tRNA synthetase. After it enters the ribosomal A-site, multiple molecular interactions occur between the tRNA, the mRNA, and the rRNA (Ogle and Ramakrishnan, 2005, Yusupova et al., 2001). The recognition of “synonymous” codons is made possible by structural accommodations at the third base pair (third codon base with residue 34 of the tRNA), the wobble position.



Figure 2: The circular representation of the genetic code and the coevolution of the genetic code with metabolic pathways (after Grosjean and Westhof, 2016).

The energetics and the evolutionary history of the code are mapped on the wheel organization. See text for a complete description. In the north or top part of the wheel, an attenuation of the binding parameters is required to avoid miscoding (due to the additive favorable binding energies). In the south or bottom part of the wheel, a boost of the binding parameters is required to allow regular coding (through the anticodon loop sequence conservation and the addition of anticodon modifications). The requirement for nucleotide modifications in order to decode AU-rich codons smoothly led to a coevolution between the use of the genetic code and metabolic pathways. The amino acids coded by unsplit 4-codon boxes are indicated in red and those coded by split 2:2- and 3:1-codon boxes, together with the usual stop codons, are indicated in black.



Figure 3: The translation hub integrates ribosomal translation and cellular protein homeostasis. Information in biology is chemically encoded and processed. The code is molecular, with physico-chemical and structural constraints, and driven by biological evolution.



References

1. Agris, P.F. (2008) Bringing order to translation: the contributions of transfer RNA anticodon-domain modifications. EMBO Rep 9, 629-635.
2. Agris, P.F., Narendran, A., Sarachan, K., Vare, V.Y.P., and Eruysal, E. (2017) The Importance of Being Modified: The Role of RNA Modifications in Translational Fidelity. Enzymes 41, 1-50.
3. Björk, G.R. and Hagervall, T.G. (2014) Transfer RNA Modification: Presence, Synthesis, and Function. EcoSal Plus 6.
4. Bohnsack, M.T. and Sloan, K.E. (2017) The mitochondrial epitranscriptome: the roles of RNA modifications in mitochondrial translation and human disease. Cell Mol Life Sci.
5. Chan, C.T., Deng, W., Li, F., DeMott, M.S., Babu, I.R., Begley, T.J., and Dedon, P.C. (2015) Highly Predictive Reprogramming of tRNA Modifications Is Linked to Selective Expression of Codon-Biased Genes. Chem Res Toxicol 28, 978-988.
6. Chionh, Y.H., McBee, M., Babu, I.R., Hia, F., Lin, W., Zhao, W., . . . Dedon, P.C. (2016) tRNA-mediated codon-biased translation in mycobacterial hypoxic persistence. Nat Commun 7, 13302.
7. Demeshkina, N., Jenner, L., Westhof, E., Yusupov, M., and Yusupova, G. (2012) A new understanding of the decoding principle on the ribosome. Nature 484, 256-259.
8. Endres, L., Dedon, P.C., and Begley, T.J. (2015) Codon-biased translation can be regulated by wobble-base tRNA modification systems during cellular stress responses. RNA Biol 12, 603-614.
9. Grosjean, H., de Crecy-Lagard, V., and Marck, C. (2010) Deciphering synonymous codons in the three domains of life: co-evolution with specific tRNA modification enzymes. FEBS Lett 584, 252-264.
10. Grosjean, H. and Westhof, E. (2016) An integrated, structure- and energy-based view of the genetic code. Nucleic Acids Res 44, 8020-8040.
11. Jenner, L.B., Demeshkina, N., Yusupova, G., and Yusupov, M. (2010) Structural aspects of messenger RNA reading frame maintenance by the ribosome. Nat Struct Mol Biol 17, 555-560.
12. Karlsborn, T., Mahmud, A., Tukenmez, H., and Bystrom, A.S. (2016) Loss of ncm5 and mcm5 wobble uridine side chains results in an altered metabolic profile. Metabolomics 12, 177.
13. Karlsborn, T., Tukenmez, H., Mahmud, A.K., Xu, F., Xu, H., and Bystrom, A.S. (2014) Elongator, a conserved complex required for wobble uridine modifications in eukaryotes. RNA Biol 11, 1519-1528.
14. Marck, C. and Grosjean, H. (2002) tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features. RNA 8, 1189-1232.
15. Nedialkova, D.D. and Leidel, S.A. (2015) Optimization of Codon Translation Rates via tRNA Modifications Maintains Proteome Integrity. Cell 161, 1606-1618.
16. Novoa, E.M. and Ribas de Pouplana, L. (2012) Speeding with control: codon usage, tRNAs, and ribosomes. Trends Genet 28, 574-581.
17. Ogle, J.M. and Ramakrishnan, V. (2005) Structural insights into translational fidelity. Annu Rev Biochem 74, 129-177.
18. Powell, C.A., Nicholls, T.J., and Minczuk, M. (2015) Nuclear-encoded factors involved in post-transcriptional processing and modification of mitochondrial tRNAs in human disease. Front Genet 6, 79.
19. Ranjan, N. and Rodnina, M.V. (2016) tRNA wobble modifications and protein homeostasis. Translation (Austin) 4, e1143076.
20. Rozov, A., Demeshkina, N., Westhof, E., Yusupov, M., and Yusupova, G. (2015) Structural insights into the translational infidelity mechanism. Nat Commun 6, 7251.
21. Rozov, A., Demeshkina, N., Westhof, E., Yusupov, M., and Yusupova, G. (2016) New Structural Insights into Translational Miscoding. Trends Biochem Sci 41, 798-814.
22. Suzuki, T. and Suzuki, T. (2014) A complete landscape of post-transcriptional modifications in mammalian mitochondrial tRNAs. Nucleic Acids Res 42, 7346-7357.
23. Wang, M. and Kaufman, R.J. (2016) Protein misfolding in the endoplasmic reticulum as a conduit to human disease. Nature 529, 326-335.
24. Weixlbaumer, A., Murphy, F.V., IV, Dziergowska, A., Malkiewicz, A., Vendeix, F.A., Agris, P.F., and Ramakrishnan, V. (2007) Mechanism for expanding the decoding capacity of transfer RNAs by modification of uridines. Nat Struct Mol Biol 14, 498-502.
25. Westhof, E. (2014) Isostericity and tautomerism of base pairs in nucleic acids. FEBS Lett 588, 2464-2469.
26. Yusupova, G.Z., Yusupov, M.M., Cate, J.H., and Noller, H.F. (2001) The path of messenger RNA through the ribosome. Cell 106, 233-241.