J. Phys. Soc. Jpn. 2017 Jan; 86: 014802. DOI: 10.7566/JPSJ.86.014802.

Simulated Annealing-Extended Sampling for Multicomponent Decomposition of Spectral Data of DNA Complexed with Peptide.

Jiyoung Kang¹, Kazuhiko Yamasaki², Kuniaki Sano³, Ken Tsutsui³, Kimiko M. Tsutsui³, and Masaru Tateno¹

¹Graduate School of Life Science, University of Hyogo, Kamigori, Hyogo 678-1297, Japan

²Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8566, Japan

³Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama 700-8558, Japan

Abstract

Theoretical analyses of multivariate data have become increasingly important in various scientific disciplines, such as genome biology and bioinformatics. The multivariate curve resolution alternating least-squares (MCR-ALS) method is an integrated, systematic tool for decomposing various types of spectral data into several pure spectra corresponding to distinct species. However, in the present study, MCR-ALS yielded only unreasonable solutions when applied to the circular dichroism spectra of double-stranded DNA (228 bp) complexed with a DNA-binding peptide at various concentrations. To resolve this problem, we developed an algorithm that incorporates a simulated annealing (SA) protocol (the SA-MCR-ALS method) to expand the sampling space. The analysis successfully decomposed the aforementioned data into three reasonable pure spectra. Thus, our SA-MCR-ALS scheme provides a useful tool for effective extended sampling, enabling detailed investigation of the properties of various forms of multivariate data that are difficult to analyze because of their many degrees of freedom.

Supplement:

Several methodologies, such as principal component analysis (PCA) (1) and independent component analysis (ICA) (2), have been employed to decompose multivariate data (Fig. 1). By combining several constraints to search for reasonable decomposition solutions, MCR-ALS analysis can yield more comprehensive and interpretable decompositions than PCA and ICA (3).
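The conventional MCR-ALS iteration can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code, and it rests on assumptions of our own: D has one row per sample (e.g., per concentration) and one column per wavelength; the helper names `mcr_als`, `solve`, `gram` and `residual` are hypothetical; a tiny ridge term is added for numerical safety; and only the concentration profiles C are constrained (non-negativity by clipping), because CD spectra can legitimately be negative.

```python
"""Minimal MCR-ALS sketch in pure Python (hypothetical helper names).

Assumptions, not taken from the paper: D has one row per sample and one
column per wavelength; only the concentration profiles C are constrained
(non-negativity by clipping), because CD spectra in S can be negative; a
tiny ridge term keeps each k x k system invertible.
"""


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(M, B):
    """Solve M X = B for a small square M (Gauss-Jordan, partial pivoting)."""
    k = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def gram(A, ridge):
    """Return A^T A + ridge * I."""
    G = matmul(transpose(A), A)
    for i in range(len(G)):
        G[i][i] += ridge
    return G


def residual(D, C, S):
    """Sum of squared entries of E = D - C S^T (the lack of fit)."""
    R = matmul(C, transpose(S))
    return sum((d - r) ** 2 for drow, rrow in zip(D, R) for d, r in zip(drow, rrow))


def mcr_als(D, S0, n_iter=200, ridge=1e-10):
    """Alternate constrained least-squares updates of C and S from initial spectra S0."""
    S = [list(row) for row in S0]
    C = []
    for _ in range(n_iter):
        # C-step: solve (S^T S) C^T = S^T D^T, then enforce C >= 0.
        Ct = solve(gram(S, ridge), matmul(transpose(S), transpose(D)))
        C = [[max(0.0, v) for v in row] for row in transpose(Ct)]
        # S-step: solve (C^T C) S^T = C^T D; spectra stay unconstrained.
        S = transpose(solve(gram(C, ridge), matmul(transpose(C), D)))
    return C, S


if __name__ == "__main__":
    # Synthetic two-component example (made-up numbers, for illustration only).
    S_true = [[1.0, -0.5], [2.0, 0.3], [0.5, 1.0], [-1.0, 0.8], [0.2, -0.2]]
    C_true = [[1.0, 0.0], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
    D = matmul(C_true, transpose(S_true))
    C, S = mcr_als(D, S_true)
    print("lack of fit:", residual(D, C, S))
```

Started from the true spectra, the loop reproduces the synthetic data essentially exactly; started elsewhere, the same loop may settle on a different constrained solution, which is the situation the SA extension is meant to address.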

 

Nevertheless, for our circular dichroism (CD) spectra of double-stranded DNA (228 bp) complexed with a DNA-binding peptide (see Note 1) at various concentrations, the conventional MCR-ALS method failed to find a reasonable solution. This failure may originate from the nonlinearity of the system; it is an instance of what is well known in data science as the local minimum problem (Fig. 2).

To resolve this issue by extending the sampling space, we developed an algorithm that incorporates a simulated annealing (SA) protocol into the conventional MCR-ALS method (i.e., the SA-MCR-ALS method) (Fig. 2). This analysis successfully decomposed the aforementioned data into three reasonable pure spectra: one B-DNA spectrum and two ψ-DNA spectra, the latter two exhibiting features characteristic of ψ-DNA (see Note 2).
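The paper does not describe its SA protocol in detail, so the following is only a generic sketch of how SA can be wrapped around MCR-ALS, under assumptions of our own: the annealed variables are the initial spectra handed to ALS, the cost is the ALS lack of fit, moves are Gaussian perturbations of width `step`, acceptance follows the Metropolis criterion, and the temperature decreases geometrically. The function `sa_mcr_als` and all parameter values are hypothetical, and the ALS helpers are repeated so that the block runs on its own.

```python
"""Hypothetical SA-MCR-ALS sketch: simulated annealing over ALS starting spectra.

Assumptions (not the published protocol): the annealed variable is the
initial spectra S0, the cost is the ALS lack of fit ||D - C S^T||^2,
moves are Gaussian perturbations, and cooling is geometric.
"""
import math
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(M, B):
    """Gauss-Jordan solve of M X = B for a small square M."""
    k = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def gram(A, ridge=1e-10):
    G = matmul(transpose(A), A)
    for i in range(len(G)):
        G[i][i] += ridge  # small ridge keeps the k x k system invertible
    return G


def residual(D, C, S):
    R = matmul(C, transpose(S))
    return sum((d - r) ** 2 for drow, rrow in zip(D, R) for d, r in zip(drow, rrow))


def mcr_als(D, S, n_iter):
    """Conventional ALS (n_iter >= 1): C >= 0 by clipping, spectra S unconstrained."""
    for _ in range(n_iter):
        Ct = solve(gram(S), matmul(transpose(S), transpose(D)))
        C = [[max(0.0, v) for v in row] for row in transpose(Ct)]
        S = transpose(solve(gram(C), matmul(transpose(C), D)))
    return C, S


def sa_mcr_als(D, S0, n_steps=100, T0=1.0, cooling=0.95, step=0.3, n_als=30, seed=0):
    """Anneal the starting spectra; return (best_cost, C, S) over all visited states."""
    rng = random.Random(seed)
    cur_S0 = [list(row) for row in S0]
    C, S = mcr_als(D, cur_S0, n_als)
    cur_cost = residual(D, C, S)
    best = (cur_cost, C, S)
    T = T0
    for _ in range(n_steps):
        # Propose a new starting point by Gaussian perturbation of the current one.
        cand_S0 = [[v + rng.gauss(0.0, step) for v in row] for row in cur_S0]
        C, S = mcr_als(D, cand_S0, n_als)
        cost = residual(D, C, S)
        delta = cost - cur_cost
        # Metropolis criterion: accept downhill moves, uphill ones with prob exp(-delta/T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / T):
            cur_S0, cur_cost = cand_S0, cost
        if cost < best[0]:
            best = (cost, C, S)
        T *= cooling  # geometric cooling schedule
    return best
```

Because the lack of fit is unchanged by any invertible k x k matrix R that keeps C·R non-negative (C·S^T = (C·R)(R^-1·S^T)), a practical SA cost would probably need additional terms, such as constraint penalties, to discriminate reasonable from unreasonable solutions; T0 should also be scaled to typical cost differences.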

 

Thus, our SA-MCR-ALS analysis scheme could be an effective tool for locating a reasonable solution (i.e., the global minimum), and it could also be applied to a wide range of multivariate data analyses (4-13). We therefore recommend SA-MCR-ALS analysis as a standard scheme for decomposing complicated multivariate data, such as microarray image data in bioinformatics and infrared absorption, CD, and NMR spectra in structural analyses.

Note:

  1. The DNA-binding peptide analyzed in the present study, (Lys)₉(Glu)₉(Lys)₉, is derived from the DNA-binding domain of lens epithelium-derived growth factor (LEDGF), which selectively binds to negatively supercoiled DNA. LEDGF contains a conserved cluster of lysine and glutamic/aspartic acid residues. Interestingly, even this polypeptide segment alone, extracted from the protein (e.g., (Lys)₉(Glu)₉(Lys)₉), exhibits comparable selectivity for negatively supercoiled DNA (14).
  2. Building on the present analysis, further spectroscopic studies are in progress in our group to investigate the binding modes between the DNA and the peptide. These experimental data will enable structural modeling of the DNA-peptide complex in combination with advanced computational techniques, such as molecular dynamics simulations with generalized-ensemble methods.

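Fig. 1 Decomposition of a data matrix D into the product C·S^T (i.e., D = C·S^T + E), where C and S^T contain the concentration profiles and the pure spectra of the components, respectively, and E is the residual matrix.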
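For reference, the bilinear model of Fig. 1 and the two unconstrained least-squares updates that an ALS iteration alternates between can be written as below. The dimension symbols m (samples), n (wavelengths) and k (components) and the matrix R are our own notation, not the paper's; in MCR-ALS, constraints such as non-negativity of C are applied after each update. The last line states the well-known rotational ambiguity of the bilinear model, which is one reason constraints are needed to single out a physically reasonable solution.

```latex
\begin{aligned}
\mathbf{D} &= \mathbf{C}\,\mathbf{S}^{\mathsf{T}} + \mathbf{E},
  \qquad \mathbf{D}\in\mathbb{R}^{m\times n},\ \mathbf{C}\in\mathbb{R}^{m\times k},\ \mathbf{S}\in\mathbb{R}^{n\times k},\\
\mathbf{C} &\leftarrow \mathbf{D}\,\mathbf{S}\,\bigl(\mathbf{S}^{\mathsf{T}}\mathbf{S}\bigr)^{-1},\\
\mathbf{S}^{\mathsf{T}} &\leftarrow \bigl(\mathbf{C}^{\mathsf{T}}\mathbf{C}\bigr)^{-1}\mathbf{C}^{\mathsf{T}}\mathbf{D},\\
\mathbf{C}\,\mathbf{S}^{\mathsf{T}} &= (\mathbf{C}\mathbf{R})\,\bigl(\mathbf{R}^{-1}\mathbf{S}^{\mathsf{T}}\bigr)
  \quad\text{for any invertible } \mathbf{R}\in\mathbb{R}^{k\times k}.
\end{aligned}
```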

 

 

Fig. 2 Comparison of the conventional MCR-ALS and SA-MCR-ALS methods. (a) An unreasonable solution (i.e., a local minimum) obtained with the conventional MCR-ALS method. The two resulting pure spectra, colored red and yellow, are complementary to each other. (b) In contrast, the SA-MCR-ALS method successfully identifies a reasonable solution (i.e., the global minimum); the complementarity seen in (a) is absent from the pure spectra it yields.
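One hypothetical way to flag the complementarity described in Fig. 2(a) automatically is to compute the Pearson correlation between each pair of resolved spectra. This sketch assumes that "complementary" means one spectrum roughly mirrors the other, so that the correlation approaches -1; the function names and the threshold of -0.9 are our own choices, not part of the published method.

```python
"""Hypothetical diagnostic for complementary pure spectra (cf. Fig. 2(a)).

Assumption: "complementary" is read as one spectrum roughly mirroring the
other, which shows up as a Pearson correlation close to -1.
"""
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def complementary_pairs(spectra, threshold=-0.9):
    """Index pairs (i, j), i < j, whose spectra correlate below `threshold`."""
    return [
        (i, j)
        for i in range(len(spectra))
        for j in range(i + 1, len(spectra))
        if pearson(spectra[i], spectra[j]) < threshold
    ]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 2.0, 1.0]   # made-up intensities
    b = [-v for v in a]             # exact mirror image of a
    c = [0.5, 0.1, -0.2, 0.4, 0.0]
    print(complementary_pairs([a, b, c]))  # prints [(0, 1)]
```

A correlation test of this kind only screens for mirror-image pairs; it says nothing by itself about whether a solution is the global minimum.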

 

References:

1. Jolliffe IT (2002) Principal Component Analysis (Springer-Verlag, New York) 2nd Ed.
2. Jutten C & Herault J (1991) Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing 24(1):1-10.
3. Parastar H, Jalali-Heravi M, & Tauler R (2012) Is independent component analysis appropriate for multivariate resolution in analytical chemistry? TrAC-Trend. Anal. Chem. 31:134-143.
4. Jaumot J, Gargallo R, de Juan A, & Tauler R (2005) A graphical user-friendly interface for MCR-ALS: a new tool for multivariate curve resolution in MATLAB. Chemometr. Intell. Lab. 76(1):101-110.
5. Jaumot J & Tauler R (2010) MCR-BANDS: A user friendly MATLAB program for the evaluation of rotation ambiguities in Multivariate Curve Resolution. Chemometr. Intell. Lab. 103(2):96-107.
6. Garrido M, Rius FX, & Larrechi MS (2008) Multivariate curve resolution–alternating least squares (MCR-ALS) applied to spectroscopic data from monitoring chemical reactions processes. Anal. Bioanal. Chem. 390(8):2059-2066.
7. Wentzell PD, et al. (2006) Multivariate curve resolution of time course microarray data. BMC Bioinformatics 7:343.
8. Meshki M, Behpour M, & Masoum S (2015) Application of multivariate curve resolution alternating least squares method for determination of caffeic acid in the presence of catechin interference. Anal. Biochem. 473:80-88.
9. Ruckebusch C & Blanchet L (2013) Multivariate curve resolution: A review of advanced and tailored applications and challenges. Anal. Chim. Acta 765:28-36.
10. Felten J, et al. (2015) Vibrational spectroscopic image analysis of biological material using multivariate curve resolution-alternating least squares (MCR-ALS). Nat. Protoc. 10(2):217-240.
11. Wolffe AP (2001) Chromatin remodeling: why it is important in cancer. Oncogene 20(24):2988-2990.
12. Haeusler AR, et al. (2014) C9orf72 nucleotide repeat structures initiate molecular cascades of disease. Nature 507(7491):195-200.
13. Maizels N (2015) G4‐associated human diseases. EMBO Rep. 16(8):910-922.
14. Tsutsui KM, Sano K, Hosoya O, Miyamoto T, & Tsutsui K (2011) Nuclear protein LEDGF/p75 recognizes supercoiled DNA by a novel DNA-binding domain. Nucleic Acids Res. 39(12):5067-5081.