ACS Chem. Neurosci. 2017, 8, 578-591

Structure and Dynamics of the DNA and RNA Double Helices Obtained from the GGGGCC and CCCCGG Hexanucleotide Repeats That Are the Hallmark of C9FTD/ALS Diseases

Yuan Zhang, Christopher Roland and Celeste Sagui

From the Department of Physics, North Carolina State University, Raleigh 27695-8202

Running title: Structure of double helices from GGGGCC and CCCCGG hexanucleotide repeats

To whom correspondence should be addressed: Prof. Celeste Sagui, Department of Physics, 319 Riddick Hall, North Carolina State University, Raleigh, North Carolina 27695-8202, Telephone: (919) 515 3111; E-mail:



A (GGGGCC) hexanucleotide repeat (HR) expansion in the C9ORF72 gene and its associated antisense (CCCCGG) expansion are considered the major cause behind frontotemporal dementia and amyotrophic lateral sclerosis. We have performed molecular dynamics simulations to characterize the conformation and dynamics of the 12 duplexes that result from the three different reading frames in sense and antisense HRs for both DNA and RNA. These duplexes display atypical structures relevant not only for a molecular-level understanding of these diseases but also for enlarging the repertoire of nucleic-acid structural motifs. G-rich helices share common features. The inner G-G mismatches stay inside the helix in Gsyn-Ganti conformations and form two hydrogen bonds (HBs) between the Watson-Crick edge of Ganti and the Hoogsteen edge of Gsyn. In addition, G in RNA forms a base-phosphate HB. Inner G-G mismatches cause local unwinding of the helix. G-rich double helices are more stable than C-rich helices due to better stacking and HBs of G-G mismatches. C-rich helix conformations vary wildly. C mismatches flip out of the helix in DNA but not in RNA. Least (most) stable C-rich RNA and DNA helices have single (double) mismatches separated by two (four) Watson-Crick basepairs. The most stable DNA structure displays an “e-motif” where mismatched bases flip towards the minor groove and point in the 5′ direction. There are two RNA conformations where the orientation and HB pattern of the mismatches are coupled to bending of the helix.

Keywords: nucleotide repeat disorder, hexanucleotide repeat, C-C and G-G mismatches, C9FTD, ALS, e-motif

PMID: 27933757 ; DOI: 10.1021/acschemneuro.6b00348



Simple sequence repeats (SSRs) consist of units of 1 to 6 (and even 12) nucleotides that are repeated up to 30 times (and even more in pathological cases). They represent approximately 3% of the entire sequence of the human genome. Trinucleotide repeats (TRs) constitute the most common type of SSRs in the exome of all known eukaryotic genomes. TRs may be selectively neutral sequences, or play important functional roles. They are characterized by a high mutation rate associated with the variation of the repeat number. It has been estimated that the rate of repeat number mutations in some TRs is about 10,000 times higher than that of a point mutation. This leads to frequent polymorphism in the coding regions of genes, with a correspondingly rapid expansion of the amino acid repeats. This aids natural selection by rapidly generating new alleles.


TRs also exhibit “dynamic mutations” that do not follow Mendelian inheritance. About a century ago, it was observed that a neurological disorder (myotonic dystrophy) was an inherited disease whose age of onset decreased and whose severity increased with successive generations. Additionally, the penetrance (i.e., the probability that this kind of mutation results in the disease) also increased with successive generations. Many other similar disorders have subsequently joined the family of so-called “genetic anticipation” diseases, but it took until the 1990s for scientists to realize that these diseases were caused by the intergenerational expansion of SSRs. After a certain threshold in repeat number, the probability of further expansion and the severity of the disease increase with repeat number. To date, about 30 DNA expandable SSR diseases have been identified and the list is expected to grow (see Figure 1) [1]. The expansion is believed to be primarily caused by some sort of slippage during DNA replication, repair, recombination, or transcription. Subsequent cell toxicity and death have been linked to the atypical DNA conformation and functional changes of the transcripts and, when TRs are present in exons, of the translated proteins.


Figure 1: Schematic illustrating occurrence of SSRs and abbreviations of the most common diseases that they lead to.


Although the mechanisms underlying these SSR-based diseases are quite complex, some simple trends are remarkably robust, such as the correlation between repeat number beyond the repeat threshold and the probability of further expansion and repeat pathology. A particularly important breakthrough has been the recognition that stable non-B-DNA secondary structure in the expanded repeats is “a common and causative factor for expansion in human disease” [2]. Indeed, expandable repeats have been shown to display atypical structural characteristics including single-stranded hairpins, Z-DNA, triplexes, G-quartets, and slipped-stranded duplexes. A primary goal of our research has been the characterization of these atypical structures, their relative stability, and the way both structure and stability vary with repeat number.


Turning specifically to frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS), these are two neurodegenerative diseases with similar genetic and neurological pathways. FTD is the most common cause of early onset dementia due to degeneration of the frontal and anterior temporal lobes, while ALS is characterized by progressive muscle weakness and paralysis due to loss of motor neurons in the brain and spinal cord. These diseases are believed to be part of the same general spectrum, and are associated with a (GGGGCC) hexanucleotide repeat (HR) expansion in the first intron of the C9ORF72 gene. Generally, the unaffected population carries fewer than 20 repeats, while expansions greater than 70 and usually encompassing 250-1600 repeats have been found in all C9FTD and ALS patients.


These repeats can cause toxicity through different nonexclusive mechanisms. The transcribed introns containing these large expansions seem to contribute to neuropathology through a loss of function, as mRNA levels of C9ORF72 are decreased, and through a gain of function, as RNA transcripts containing the (GGGGCC) HRs accumulate in nuclear foci in the frontal cortex and spinal cord, leading to a sequestration of RNA-binding proteins. Complicating matters, there is evidence that the antisense transcripts of the CCCCGG expanded repeats also form nuclear RNA foci. Translated repeats can also cause toxicity in the corresponding protein and its interaction partners. Even though the HR expansions reside in the noncoding regions of the C9ORF72 gene, these expansions can trigger protein translation even in the absence of a start codon, giving rise to unconventional repeat-associated non-ATG translation. These so-called C9RAN proteins have been detected in FTD/ALS patients.


As already noted, atypical stable DNA structures are associated with SSR-based diseases. For FTD/ALS diseases, chemical and enzymatic probing of the HRs points to a general scenario where the repeat expansion adopts a hairpin structure with G-G mismatches, which is in equilibrium with a quadruplex structure, as shown in Figure 2. Generally speaking, for reasons that are not yet understood, lower annealing temperatures favor hairpin structures while higher temperatures favor quadruplexes. Also, the ion environment seems to play an important role with Na+ cations favoring hairpins and K+ ions quadruplexes.


Figure 2: Schematic of two common atypical DNA structures associated with HRs: (a) hairpin; (b) quadruplex.


As a first step towards investigating the characteristics of HR-based hairpins, our work has focused on the structural and dynamical aspects of double helices associated with the hairpin stems [3]. We have therefore carried out extensive molecular dynamics simulations of all possible DNA and RNA duplexes that can be formed from the HRs. Our results indicate that G-rich helices are stable, with G-G mismatches remaining inside the helix in a Gsyn-Ganti conformation. By contrast, C-rich helices show a wide variety of conformations. For example, in the C-rich DNA duplexes, the C mismatches tend to form a stable “e-motif” (see Figure 3), where mismatched bases flip towards the minor groove and point in the 5′ direction. RNA, on the other hand, forms two stable structures (Figure 4) which are coupled to the bending of the helix near the mismatches.
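The mismatch patterns quoted in the abstract (single mismatches separated by two Watson-Crick pairs, double mismatches separated by four) can be illustrated with a small enumeration. The following is only a sketch under stated assumptions, not the procedure used in the paper: it assumes each duplex is an antiparallel pairing of a repeat strand with a copy of itself, counts only canonical G-C pairs as Watson-Crick, and uses a hypothetical helper name, `register_patterns`.

```python
# Illustrative sketch only (not the authors' method): enumerate the possible
# antiparallel self-pairing registers of a hexanucleotide repeat unit and mark
# which positions form canonical G-C pairs. Function and variable names are
# hypothetical and chosen for this example.

WATSON_CRICK = {("G", "C"), ("C", "G")}


def register_patterns(unit):
    """Return {shift: pattern} for one repeat unit paired with itself.

    In an infinite antiparallel duplex built from the same repeat, base i on
    one strand faces base (shift - i) mod n on the other strand. In the
    pattern, '|' marks a Watson-Crick pair and 'x' marks a mismatch.
    """
    n = len(unit)
    patterns = {}
    for shift in range(n):
        patterns[shift] = "".join(
            "|" if (unit[i], unit[(shift - i) % n]) in WATSON_CRICK else "x"
            for i in range(n)
        )
    return patterns


if __name__ == "__main__":
    for unit in ("GGGGCC", "CCCCGG"):
        print(unit)
        for shift, pattern in register_patterns(unit).items():
            print(f"  shift {shift}: {pattern}  ({pattern.count('|')} WC pairs)")
```

Under these assumptions, three registers of each repeat retain four Watson-Crick pairs per unit: one with single mismatches separated by two pairs and two with double mismatches separated by four pairs. Three registers for each of the sense and antisense repeats, in both DNA and RNA, gives the 12 duplexes mentioned in the abstract.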


In summary, our paper provides a comprehensive description of the wide variety of conformations characteristic of SSRs based on (GGGGCC) and (CCCCGG) repeats. It is important to note that there are currently no experimental data with atomic resolution for any of these HR-based duplexes or hairpins. Thus, we believe our study to be timely and hope that this characterization of the HR structures will ultimately lead to a new understanding of the origins and mechanisms behind nucleotide repeat disorders. In that spirit, we have similarly investigated the structure and dynamics of DNA and RNA double helices based on CAG and GAC repeats [4], and GGC, CGG, CCG, and GCC repeats [5].


Figure 3: Sample simulation results of (GGGGCC) hexanucleotide repeat expansions. Shown here is the e-motif for DC-1 in two forms. Plotted on the left schematic is the neutralizing ion distribution.

Figure 4: Two stable RNA structures as seen in the molecular dynamics simulations. These structures are coupled to the bending of the helix. Shown are the conformation of the C-C mismatches and the structure of the helix. On top of the latter, the distribution of the neutralizing ion cloud is also plotted.



  1. See for example: S. Mirkin, Expandable DNA repeats and human disease, Nature 447, 932 (2007).
  2. C. McMurray, DNA secondary structure: A common and causative factor for the expansion in human disease, Proc. Natl. Acad. Sci. USA 96, 1822 (1999).
  3. Y. Zhang, C. Roland, and C. Sagui, Structure and dynamics of DNA and RNA double helices obtained from the GGGGCC and CCCCGG hexanucleotide repeats that are the hallmark of C9FTD/ALS diseases, ACS Chem. Neurosci. 8, 578-591 (2017).
  4. F. Pan, V.H. Man, C. Roland, and C. Sagui, Structure and dynamics of DNA and RNA double helices obtained from CAG and GAC trinucleotide repeats, Biophys. J., in press 2017.
  5. F. Pan, V.H. Man, C. Roland, and C. Sagui, Structure and dynamics of DNA and RNA double helices obtained from CCG and GGC trinucleotide repeats, manuscript in preparation.