Biochimie. 2017 Apr;135:54-62.

The expanding repertoire of G4 DNA structures.

Supplement: 

The publication concerns the conformational polymorphism of nucleic acids. Guanine-rich DNA/RNA fragments can fold into G-quadruplexes (G4s), non-canonical four-stranded secondary structures. The literature on the biological roles of putative quadruplex sequences (PQS) and their distribution in the human genome indicates a growing interest in the field [1-4]. Quadruplexes are known to affect chromosomal integrity, represent ‘hot spots’ of recombination and act as ‘switches’ of various cellular processes. Formation of such structures has implications for oncological and neurodegenerative diseases.

Adequate bioinformatics tools are required to predict which polynucleotide fragments are prone to unusual folding. A growing body of data suggests that the secondary structures adopted by G-rich polynucleotides may be more diverse than previously thought and that the definition of G-quadruplex-forming sequences should be broadened. Hence, the available tools must be improved. Novel structural diversity arises from the accommodation of mixed tetrads, bulges and vacancies (Fig. 1). We called such structures “imperfect” quadruplexes (imGQs) because of defects in their sequences: they contain at least one truncated or interrupted G-run. Most G4-predicting software programs rely on the consensus sequence rule, i.e., the G3-5N1-7G3-5N1-7G3-5N1-7G3-5 formula (four runs of 3-5 guanines separated by loops of 1-7 nucleotides). We studied the solution structures of a series of naturally occurring and model single-stranded DNA fragments that defy the formula and would be missed by almost all current quadruplex search algorithms. The results confirmed the G4-forming potential of such sequences.
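
The consensus rule can be illustrated with a short regular-expression search. The following is a minimal sketch in Python, not the code of any particular G4-predicting program; the names CONSENSUS and find_consensus_g4 and the two test sequences are chosen here for illustration only. It shows that a sequence with one interrupted G-run is not reported by a consensus-based search.

```python
import re

# Consensus rule: four runs of 3-5 guanines separated by loops of 1-7 nt.
# Loops may contain any base, including G (U is allowed for RNA input).
CONSENSUS = re.compile(r"G{3,5}(?:[ACGTU]{1,7}G{3,5}){3}")


def find_consensus_g4(seq):
    """Return (start, end, motif) for non-overlapping consensus matches."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in CONSENSUS.finditer(seq)]


# Illustrative sequences (assumptions, not taken from the publication).
perfect = "TTGGGTTAGGGTTAGGGTTAGGGAA"
imperfect = "TTGGGTTAGTGTTAGGGTTAGGGAA"  # second G-run interrupted by a T

print(find_consensus_g4(perfect))    # one hit spanning the four G-runs
print(find_consensus_g4(imperfect))  # no hit: only three intact G-runs
```

Coordinates are 0-based and half-open, as is usual for Python slices.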

We developed a new quadruplex search algorithm, imGQfinder (http://imgqfinder.niifhm.ru/), that considers recent findings in the area of non-canonical DNA/RNA geometries and takes into account “imperfect” structures. It searches for all G4 motifs (classic and imGQ) in a given sequence. The input parameters are the query nucleotide sequence in raw or FASTA format, the number of tetrads, the number of defects and the maximum loop length. imGQfinder searches for G-runs and selects those that are consistent with the input parameters.
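
To make the role of the input parameters concrete, here is a minimal Python sketch of a defect-tolerant search. It is not the imGQfinder implementation: the function names run_patterns and find_imgq, the regular-expression approach and the exact way defects are modelled (mismatch, bulge or vacancy within one G-run) are assumptions made for illustration.

```python
import re
from itertools import combinations


def run_patterns(n):
    """Regex fragments for a perfect and a defective G-run of n tetrads."""
    perfect = "G" * n
    # Mismatch: one internal G replaced by a non-G base (GTG for n = 3).
    mismatch = [perfect[:i] + "[ACTU]" + perfect[i + 1:] for i in range(1, n - 1)]
    # Bulge: one non-G base inserted inside the run (GTGG, GGTG for n = 3).
    bulge = [perfect[:i] + "[ACTU]" + perfect[i:] for i in range(1, n)]
    # Vacancy: truncated run with one G missing (GG for n = 3).
    vacancy = ["G" * (n - 1)]
    return perfect, "(?:" + "|".join(mismatch + bulge + vacancy) + ")"


def find_imgq(seq, tetrads=3, max_defects=1, max_loop=7):
    """Return sorted (start, end, defective_runs) hits, overlaps included."""
    seq = seq.upper()
    perfect, defective = run_patterns(tetrads)
    loop = "[ACGTU]{1,%d}?" % max_loop  # lazy: prefer the shortest loops
    hits = set()
    for k in range(max_defects + 1):
        # Try every choice of which k of the four G-runs carry a defect.
        for bad in combinations(range(4), k):
            runs = [defective if i in bad else perfect for i in range(4)]
            # Lookahead so that hits starting inside another hit are kept.
            pattern = re.compile("(?=(" + loop.join(runs) + "))")
            for m in pattern.finditer(seq):
                hits.add((m.start(1), m.end(1), k))
    return sorted(hits)


# Illustrative sequence (an assumption): the second G-run is GTG.
imperfect = "TTGGGTTAGTGTTAGGGTTAGGGAA"
print(find_imgq(imperfect, max_defects=0))  # classic search finds nothing
print(find_imgq(imperfect, max_defects=1))  # one hit with a single defect
```

The sketch reports one hit per start position and defect layout; a real tool would also need FASTA parsing, both-strand search and a policy for overlapping motifs.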

Analysis of the human genome with imGQfinder revealed that quadruplex-forming sites may be much more frequent than previously thought. Such sites in functionally important genome regions represent potential targets for therapeutic intervention. A reassessment of the abundance of putative quadruplex sites in the human genome with the new algorithm showed that the maximum number of four-tetrad G4 structures that could be realized simultaneously had been underestimated approximately fivefold. Putative canonical and non-canonical G4 sites have broadly similar distributions within the genome (Fig. 2).
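
Counting structures that can be "simultaneously realized" presumably means counting motifs that do not share nucleotides. Under that assumption, the maximum number can be obtained from a list of candidate motif coordinates with the standard greedy earliest-end selection, sketched below in Python. The function name max_nonoverlapping and the example coordinates are hypothetical and are not taken from the publication.

```python
def max_nonoverlapping(hits):
    """Select a maximum set of non-overlapping half-open (start, end) hits.

    Greedy rule: always keep the hit that ends first among those that
    do not overlap the hits already kept.
    """
    chosen = []
    last_end = 0
    for start, end in sorted(hits, key=lambda h: h[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Four overlapping candidate motifs (made-up coordinates).
candidates = [(0, 21), (6, 27), (12, 33), (21, 42)]
print(max_nonoverlapping(candidates))  # two motifs can coexist
```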

We offer a simple classification of diverse quadruplex structures: all quadruplexes with truncated or interrupted G-runs are referred to as “imperfect” (imGQs); the rest, including those with long loops or two tetrads, are referred to as “perfect” (GQs). Recently, the ratios of different quadruplex structures (classical 3- and 2-tetrad ones, as well as quadruplexes with long loops and bulges) in the transcriptome have been estimated using the rG4-seq method [1], and the effects of the quadruplex-stabilizing ligand pyridostatin have been assessed. According to our classification, the proportion of GQs and imGQs does not depend on the presence of pyridostatin (Fig. 3). This supports our assumption that the ligand affects both perfect and imperfect structures, agrees with the results of our detailed studies of imGQ-ligand interactions [5], and stresses the importance of considering imGQs as drug targets.

To gain insight into imGQ core dynamics and the favorable positions of the G-run-interrupting nucleotides, we performed molecular modeling of one of the naturally occurring single-stranded DNA fragments. The molecular dynamics simulation results suggest that a mismatching base in the internal tetrad of a single-defect imGQ can bulge out and initiate “shifting” of the defect to the external tetrad. A 3-tetrad quadruplex with a mismatch can therefore transform into a 2-tetrad one with a bulge, and both structures would be stable under physiological conditions.

Overall, our results contribute to fundamental studies of DNA/RNA spatial organization and, in the long run, may provide a basis for the development of new drugs. The broadened algorithm provides new opportunities in the prediction of DNA/RNA structure. Furthermore, our findings are particularly important for molecular medicine. Quadruplexes are known as promising genetic as well as epigenetic targets for therapeutics, nutrition and xenobiotics [6-7]. Our results reveal imGQs as equally important targets. Quadruplexes with truncated G-runs might account for some of the side effects of G4-targeted therapeutics and should be considered when studying the regulatory effects of G4-stabilizing endogenous or xenobiotic ligands and when selecting individual drugs based on single-nucleotide polymorphisms (SNPs). This work was supported by the Russian Science Foundation (grant No. 14-25-00013).

 

 

Figure 1. Schematic representations of “perfect” (A) and “imperfect” (B-D) quadruplex structures. A: classical G4; B: G4 with a vacancy; C: G4 with a mismatch; D: G4 with a bulge.
Figure 2. GQ/imGQ motif distribution within genes. GQ and imGQ motif frequencies in the proximity of RefSeq TSS and exon/intron boundaries.
Figure 3. Prevalence of RTS (reverse transcriptase stalling) sites by rG4 category in K+ and K+-PDS (PDS, pyridostatin). G ≥ 40%: sequences with at least 40% G-content that do not fall into any other category. Data from [1]. The authors applied another classification: only G3L1-7 sequences were designated as canonical G4. This figure is colored according to our classification: imGQ, dark green; classical G4, violet.

 

References

1. Kwok C.K., Marsico G., Sahakyan A.B., Chambers V.S., Balasubramanian S. rG4-seq reveals widespread formation of G-quadruplex structures in the human transcriptome // Nat Meth. – 2016; 13 (10): 841-844.
2. Guo J.U., Bartel D.P. RNA G-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria // Science. – 2016; 353 (6306).
3. Fay M.M., Lyons S.M., Ivanov P. RNA G-Quadruplexes in Biology: Principles and Molecular Mechanisms // Journal of Molecular Biology. – 2017; 429 (14): 2127-2147.
4. Hon J., Martínek T., Zendulka J., Lexa M. pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R // Bioinformatics. – 2017; btx413.
5. Vlasenok M., Varizhuk A., Kaluzhny D., Smirnov I., Pozmogova G. Data on secondary structures and ligand interactions of G-rich oligonucleotides that defy the classical formula for G4 motifs // Data in Brief. – 2017; 11: 258-265.
6. Porru M., Zizza P., Franceschin M., Leonetti C., Biroccio A. EMICORON: A multi-targeting G4 ligand with a promising preclinical profile // Biochimica et Biophysica Acta (BBA) – General Subjects. – 2017; 1861 (5): 1362-1370.
7. Francois M., Leifert W., Tellam R., Fenech M. G-quadruplexes: A possible epigenetic target for nutrition // Mutat Res Rev Mutat Res. – 2015; 764: 101-107.