iRNA-3typeA: Identifying Three Types of Modification at RNA's Adenosine Sites.
Journal: 2018/November - Molecular Therapy - Nucleic Acids
ISSN: 2162-2531
Abstract:
RNA modifications are additions of chemical groups to nucleotides or their local structural changes. Knowledge about the occurrence sites of these modifications is essential for in-depth understanding of the biological functions and mechanisms and for treating some genomic diseases as well. With the avalanche of RNA sequences generated in the post-genomic age, many computational methods have been proposed for identifying various types of RNA modifications one by one. However, so far no method whatsoever has been developed for simultaneously identifying several different types of RNA modifications. To address such a challenge, we developed a predictor called "iRNA-3typeA," by which we can simultaneously identify the occurrence sites of the following three most frequently observed modifications in RNA: (1) N1-methyladenosine (m1A), (2) N6-methyladenosine (m6A), and (3) adenosine to inosine (A-to-I). It has been shown via rigorous cross-validations for the RNA sequences from Homo sapiens and Mus musculus transcriptomes that the success rates achieved by the powerful new predictor are quite high. For the convenience of broad experimental scientists, a user-friendly web server for iRNA-3typeA has been established at http://lin-group.cn/server/iRNA-3typeA/. It is anticipated that iRNA-3typeA may become a useful high throughput tool for genome analysis.
Molecular Therapy. Nucleic Acids. May/31/2018; 11: 468-474
Published online Mar/29/2018


Introduction

RNA modification means the addition of chemical groups to its constitutional nucleotides or structural changes therein.1 So far, more than 100 types of RNA modifications have been observed in cellular RNAs of all living organisms.2 Because they are involved in a series of crucial biological activities,3 such as mRNA splicing, mRNA nuclear processing, mRNA export, and mRNA decay,3, 4, 5, 6 and are particularly linked with human diseases, RNA modifications have drawn great attention in the scientific community.

With the development of high-throughput experimental techniques,7, 8, 9 a large amount of RNA modification data has been acquired; these data are very helpful for revealing the novel functions of RNA modifications. As indicated in a recent review,10 however, most of these techniques are unable to discriminate among the different RNA modifications that may simultaneously occur in the same RNA molecule. For example, adenosine usually undergoes N1-methyladenosine (m1A), N6-methyladenosine (m6A), and adenosine to inosine (A-to-I or AI) modifications7 (Figure 1). Unfortunately, using the aforementioned techniques, one cannot detect whether different types of RNA modifications take place at the same time, let alone analyze their combinational biological functions.11

Figure 1

The Three Common Types of Modifications in RNA

(1) N1-methyladenosine (m1A), (2) N6-methyladenosine (m6A), and (3) adenosine to inosine (A-to-I).

Therefore, computational methods that address this problem are urgently needed. As excellent complements to experimental techniques, computational methods have been developed to identify RNA modifications12, 13, 14, 15, 16, 17, 18 by using machine learning to train models on the large datasets yielded by high-throughput experiments. However, they are rarely able to identify multiple RNA modifications simultaneously.

The present study was devoted to developing a bioinformatics tool that can identify the RNA modification types for m1A, m6A, and AI that may simultaneously occur on adenosine in both Homo sapiens and Mus musculus transcriptomes.

As shown in a series of recent publications,19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 in developing a bioinformatics tool, complying with the five-step rules yields the following advantages:32 (1) clearer in logic deduction, (2) better illumination in stimulating other relevant tools, and (3) more usefulness in practical application.

In view of this, we elaborate the following procedures required in the five-step rules: (1) benchmark dataset, (2) sample formulation, (3) operative machine, (4) cross-validation, and (5) web server, and they are embedded into the rubrics according to the journal’s format.

Results and Discussion

Performance Report

Listed in Table 1 are the jackknife test results obtained by the proposed predictor on the benchmark datasets (Supplemental Information S1 and Supplemental Information S2 available at http://lin-group.cn/server/iRNA3typeA/data.htm) for H. sapiens and M. musculus, respectively. As we can see from the table, the rates for both overall accuracy (Acc) and stability (MCC) are quite high for all three types of modifications investigated, indicating that the predictor is not only high in overall success rate but also quite stable. Therefore, the potential is quite high for iRNA-3typeA to become a high-throughput tool in both basic research and drug development.

Table 1

The Success Rates Achieved by iRNA-3typeA via Jackknife Tests on the Benchmark Datasets for H. sapiens and M. musculus, Respectively

Species       Type of Modification   Sn (%)   Sp (%)   Acc (%)   MCC
H. sapiens    m1A (a)                98.38    99.89    99.13     0.98
H. sapiens    m6A (b)                81.68    99.11    90.38     0.82
H. sapiens    AI (c)                 86.18    95.23    90.71     0.82
M. musculus   m1A (d)                97.46    100.00   98.73     0.97
M. musculus   m6A (e)                77.79    100.00   88.39     0.80
M. musculus   AI (f)                 96.75    100.00   98.38     0.96

(a) The parameters used for SVM are C = 8 and γ = 0.0078125.

(b) The parameters used for SVM are C = 128 and γ = 3.05158e-5.

(c) The parameters used for SVM are C = 8 and γ = 0.0078125.

(d) The parameters used for SVM are C = 2 and γ = 0.0078125.

(e) The parameters used for SVM are C = 32 and γ = 0.00012207.

(f) The parameters used for SVM are C = 512 and γ = 0.000488281.

It is instructive to point out that, although the current predictor is limited to identifying m1A, m6A, and AI sites in RNA sequences from H. sapiens and M. musculus, we can easily extend our model to cover more types of modifications and more species as more experimental data become available in the future. Therefore, the current predictor is just a good start; it will be subjected to updates with the aim of continuously enhancing its power and coverage scope.

Comparison with Other Classifiers

The proposed predictor iRNA-3typeA is the first predictor ever constructed for identifying the three types of RNA modifications (m1A; m6A; AI) simultaneously. It is not possible to show its power via a conventional comparison since there is no other predictor whatsoever that can do the same. Nevertheless, below we can carry out a special comparison to further demonstrate its superiority.

As mentioned above, the operative machine used for iRNA-3typeA is a support vector machine (SVM) classifier. What would happen if we use other classifiers instead? Listed in Table 2 are the results when the SVM classifier was substituted with the other classifiers, respectively.

Table 2

The Comparative Results of the Proposed Predictor When Its Operating Algorithm32 Was Replaced from SVM to Other Classifiers

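Classifier        Species       Modification Type   Sn (%)   Sp (%)   Acc (%)   MCC
BayesNet (a)      H. sapiens    m1A                 98.81    98.85    98.83     0.98
BayesNet (a)      H. sapiens    m6A                 82.04    100.00   91.02     0.83
BayesNet (a)      H. sapiens    AI                  88.50    89.57    89.03     0.78
BayesNet (a)      M. musculus   m1A                 97.18    98.78    97.98     0.96
BayesNet (a)      M. musculus   m6A                 77.79    100.00   88.90     0.80
BayesNet (a)      M. musculus   AI                  96.51    99.88    98.20     0.96
Naive Bayes (a)   H. sapiens    m1A                 98.16    98.30    98.23     0.96
Naive Bayes (a)   H. sapiens    m6A                 82.04    99.73    90.88     0.83
Naive Bayes (a)   H. sapiens    AI                  89.40    87.04    88.22     0.76
Naive Bayes (a)   M. musculus   m1A                 96.43    97.75    97.09     0.94
Naive Bayes (a)   M. musculus   m6A                 77.79    98.62    88.22     0.78
Naive Bayes (a)   M. musculus   AI                  95.91    97.95    96.93     0.94
J48 Tree (a)      H. sapiens    m1A                 98.77    99.40    99.09     0.98
J48 Tree (a)      H. sapiens    m6A                 82.48    84.35    83.41     0.67
J48 Tree (a)      H. sapiens    AI                  88.18    89.04    88.60     0.77
J48 Tree (a)      M. musculus   m1A                 96.71    98.68    97.70     0.95
J48 Tree (a)      M. musculus   m6A                 83.03    82.21    82.62     0.65
J48 Tree (a)      M. musculus   AI                  96.27    99.04    97.65     0.95
SVM (b)           H. sapiens    m1A                 98.46    99.89    99.18     0.98
SVM (b)           H. sapiens    m6A                 80.44    100.00   90.23     0.82
SVM (b)           H. sapiens    AI                  86.73    95.40    91.07     0.82
SVM (b)           M. musculus   m1A                 97.46    100.00   98.73     0.97
SVM (b)           M. musculus   m6A                 77.79    100.00   88.90     0.80
SVM (b)           M. musculus   AI                  97.35    100.00   98.67     0.97

All the rates in this table were obtained by 10-fold cross-validation on the same benchmark datasets (Supplemental Information S1 and Supplemental Information S2 available at http://lin-group.cn/server/iRNA3typeA/data.htm).

(a) Taken from the WEKA package.91

(b) Proposed in this paper.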

From the table, we can see the following: (1) The SVM classifier achieves higher Acc and Sp than J48 Tree for every modification type in both species, and an equal or higher MCC, although J48 Tree attains a slightly higher Sn in several cases. (2) Although the Acc of the SVM classifier is slightly lower than that of the BayesNet and Naive Bayes classifiers in identifying the m6A sites for H. sapiens, its Acc for all the other modification types in both H. sapiens and M. musculus is equal to or higher than theirs. These results further indicate that the SVM classifier is a sound choice for the iRNA-3typeA predictor.

Web Server and User Guide

The last step of the five-step rules32 is about the web server. It is indeed important because user-friendly and publicly accessible web servers represent the future direction for developing practically more useful predictors.33 Actually, it has been demonstrated by a series of recent publications (see, e.g., Cheng et al.,25, 34, 35, 36 Liu et al.,28 Lin et al.,37 Jia et al.,38, 39 and Cheng and Xiao40) that a new prediction method with its web server available would significantly enhance its impacts.41, 42 In view of this, the web server for iRNA-3typeA has been established. Furthermore, to maximize the convenience of broad experimental scientists, a step-by-step guide is given below:

  • Step 1. Open the iRNA-3typeA web server at http://lin-group.cn/server/iRNA-3typeA; you will see the top page of the web server as shown in Figure 2A.

    Figure 2

    The Semi-screenshot for the Top Page of the iRNA-3typeA Web Server and the Prediction Result of the Two Example Query Sequences

    The Semi-screenshot for the top page of the iRNA-3typeA Web Server (top panel) and the Prediction Result of the two example query sequences (bottom panel).

  • Step 2. Either type or copy/paste the query RNA sequences (in FASTA format) into the input box. Example sequences can be found by clicking on the Example button.
  • Step 3. Click the open circle for the species concerned (H. sapiens or M. musculus), then click the Submit button. For example, if you use the query RNA sequences in the Example window as the input and choose H. sapiens, after submission you will see the predicted results summarized in a table (Figure 2B), indicating that (1) the adenosine at position 21 of sequence #1 is a potential site for m1A or A-to-I editing modification, and (2) the adenosine at position 21 of sequence #2 is a potential site for m6A modification only. All these predicted results are fully consistent with experimental observations.

Materials and Methods

Benchmark Datasets

The benchmark datasets for m1A, m6A, and A-to-I editing sites in H. sapiens and M. musculus genomes were derived from previous works.12, 14, 43 Listed in Table 3 are the numbers of positive and negative samples for each of the benchmark datasets. It has been found by similar approaches12, 14 that the optimal length of the sequence samples in the benchmark datasets is 41 nt, with the modified site (m1A, m6A, or AI editing site) at the center. For readers’ convenience, the benchmark dataset thus obtained for H. sapiens is given in Supplemental Information S1, while that for M. musculus is given in Supplemental Information S2; both can be downloaded from the link at http://lin-group.cn/server/iRNA3typeA/data.htm.

Table 3
A Breakdown of the Benchmark Dataset
                          Number of Samples
Species       Attribute   m1A     m6A     AI
H. sapiens    positive    6,366   1,130   3,000
H. sapiens    negative    6,366   1,130   3,000
M. musculus   positive    1,064   725     831
M. musculus   negative    1,064   725     831
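
As an illustration of the 41-nt sample layout described above (modified adenosine at the center, i.e., position 21), the following minimal Python sketch cuts candidate windows out of a longer RNA sequence. The function name `adenosine_windows`, the conversion of T to U, and the choice to skip adenosines that lie too close to either end are assumptions made for this example; the source does not describe how such edge cases are handled.

```python
def adenosine_windows(rna, flank=20):
    """Yield (position, window) for each adenosine that has `flank` nt on both sides.

    `position` is 1-based. Each window is 2 * flank + 1 nt long (41 nt for the
    default flank of 20), with the adenosine at its center.
    Assumption: adenosines too close to either end are skipped, not padded.
    """
    rna = rna.upper().replace("T", "U")  # assumption: accept DNA-style input
    for i, base in enumerate(rna):
        if base == "A" and i >= flank and i + flank < len(rna):
            yield i + 1, rna[i - flank : i + flank + 1]


# Toy sequence (not from the benchmark datasets) with two adenosines.
toy = "G" * 20 + "A" + "C" * 20 + "A" + "U" * 25
windows = list(adenosine_windows(toy))
for position, window in windows:
    print(position, len(window), window[20])
```

Each yielded window has the same shape as a benchmark sample, so it could be passed to whatever feature encoding is applied to the 41-nt samples.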

Sample Formulation

An RNA sample with 41 nt is usually sequentially formulated by

(Equation 1)
$$R = N_1 N_2 N_3 \cdots N_i \cdots N_{41},$$

where

(Equation 2)
$$N_i \in \{A\ (\text{adenine}),\ C\ (\text{cytosine}),\ G\ (\text{guanine}),\ U\ (\text{uracil})\}$$

denotes the nucleotide at the i-th sequence position, and $\in$ is a symbol in set theory meaning “member of.”

To enable the existing machine-learning algorithms to handle the RNA sample,41 the first thing we need to do is to convert its sequential formulation into a vector. But a vector in a discrete framework might totally miss all the sequence-order information or pattern feature. To deal with this problem, the PseAAC (pseudo amino acid composition) was introduced.44 Ever since the concept of PseAAC was proposed, it has swiftly penetrated into many biomedicine and drug development areas45, 46 and nearly all the areas of computational proteomics (see, e.g., Esmaeili et al.,47 Mohabatkar et al.,48 Nanni et al.,49 Pacharawongsakda and Theeramunkong,50 Mondal and Pai,51 Ahmad et al.,52 Kabir and Hayat,53 Yu et al.,54 Zhang and Duan,55 Muthu Krishnan,56 and a long list of references cited in two review papers42, 57). Encouraged by the successes of using PseAAC to deal with protein/peptide sequences, this idea has been extended to deal with DNA/RNA sequences21, 28, 37, 58, 59, 60 in computational genomics via PseKNC (pseudo K-tuple nucleotide composition).61, 62 According to Chen et al.,63 the general form of PseKNC can be formulated as

(Equation 3)
$$\mathbf{R} = [\phi_1\ \phi_2\ \cdots\ \phi_u\ \cdots\ \phi_\Gamma]^{\mathbf{T}},$$

where T is the transposing operator, the subscript Γ is an integer, and its value and the components $\phi_u$ (u = 1, 2, …, Γ) will depend on how to extract the desired features and properties from the RNA sequence (cf. Equation 1). In this study, their definitions are described below.

The four bases (A, C, G, and U) of RNA have different chemical properties and structures.64, 65 Therefore, based on their different chemical properties and structures,64, 65 A, C, G, and U can be represented by (1, 1, 1), (0, 0, 1), (1, 0, 0), and (0, 1, 0), respectively.20, 27 For instance, the RNA sequence with six nucleotides “GUGCAG” can be expressed by the vector of (3×6)=18 components; i.e., [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0].
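
The chemical-property encoding in the preceding paragraph can be sketched in a few lines of Python. The three-component codes are taken from the text; the names `CHEMICAL_CODE` and `encode_chemical` are hypothetical and used only for this illustration.

```python
# Three-component chemical-property codes for each base, as given in the text.
CHEMICAL_CODE = {
    "A": (1, 1, 1),
    "C": (0, 0, 1),
    "G": (1, 0, 0),
    "U": (0, 1, 0),
}


def encode_chemical(rna):
    """Concatenate the three-component code of every nucleotide in `rna`."""
    vector = []
    for base in rna.upper():
        vector.extend(CHEMICAL_CODE[base])
    return vector


gugcag = encode_chemical("GUGCAG")
print(len(gugcag), gugcag)
```

For “GUGCAG” this reproduces the 18-component vector quoted in the text.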

Moreover, to incorporate into Equation 3 the sequence-coupled information66 for the nucleotides around the modification sites, we adopt the lingering density as defined below

(Equation 4)
$$D_i = \frac{1}{L_i} \sum_{j=1}^{L_i} f(N_j),$$

where $D_i$ is the density of the nucleotide $N_i$ at the site i of an RNA sequence, $L_i$ is the length of the sliding substring concerned (the first i nucleotides), j denotes each of the site locations counted in the substring, and

(Equation 5)
$$f(N_j) = \begin{cases} 1, & \text{if } N_j = \text{the nucleotide concerned} \\ 0, & \text{otherwise.} \end{cases}$$

For example, the RNA sequence “GUGCAG” can be represented by the vector [1, 0.5, 0.66, 0.25, 0.2, 0.5].

Thus, by using both nucleotide chemical properties and the lingering density (cf. Equation 4), each nucleotide can be defined by four variables. Accordingly, the RNA sequence of Equation 1 can be defined by a vector with (41×4)=164 components; namely, Γ=164 for Equation 3 now.
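
To make the four-variables-per-nucleotide formulation concrete, here is a minimal Python sketch that computes the lingering density of Equation 4 and combines it with the chemical-property codes. The function names and the ordering of the four variables within each nucleotide (three chemical components followed by the density) are assumptions; the source states only that each nucleotide contributes four variables.

```python
CHEMICAL_CODE = {"A": (1, 1, 1), "C": (0, 0, 1), "G": (1, 0, 0), "U": (0, 1, 0)}


def lingering_density(rna):
    """D_i = (occurrences of N_i among the first i nucleotides) / i."""
    counts = {}
    densities = []
    for i, base in enumerate(rna.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        densities.append(counts[base] / i)
    return densities


def encode_sample(rna):
    """Return the feature vector: per nucleotide, 3 chemical codes + 1 density.

    Assumption: the four variables of each nucleotide are stored contiguously.
    """
    features = []
    for base, density in zip(rna.upper(), lingering_density(rna)):
        features.extend(CHEMICAL_CODE[base])
        features.append(density)
    return features


densities = [round(d, 2) for d in lingering_density("GUGCAG")]
print(densities)

sample = "ACGU" * 10 + "A"  # toy 41-nt sequence, not a benchmark sample
print(len(encode_sample(sample)))
```

The third density of “GUGCAG” is 2/3, which rounds to 0.67; the 0.66 quoted in the text is the same value truncated to two decimals. A 41-nt sample yields 41 × 4 = 164 components, matching Γ = 164.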

Operative Machine

In this study, the SVM was chosen as the operative machine. The SVM has been widely used in computational genomics and proteomics (see, e.g., Ehsan et al.,26 Feng et al.,20, 27, 67, 68, 69 Chen et al.,70, 71, 72 Lin et al.,73 Lai et al.,74 Zhao et al.,75 and Yang et al.76). The implementation of the SVM was conducted by using the LibSVM package 3.18 available at https://www.csie.ntu.edu.tw/~cjlin/libsvm/. The radial basis kernel function (RBF) was used to obtain the classification hyperplane, and the grid search method was applied to optimize the regularization parameter C and kernel parameter γ.
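
The grid search over C and γ can be illustrated with a short, library-free Python sketch. The search ranges below (log2 C from -5 to 15 and log2 γ from 3 to -15, both in steps of 2) are an assumption: they are the defaults of LibSVM's `grid.py` helper, and the paper does not state which ranges were used. The scoring function is a placeholder; in practice it would be the cross-validated accuracy of an RBF-kernel SVM, which is not implemented here.

```python
import math

# Assumed search ranges (defaults of LibSVM's grid.py); not stated in the paper.
LOG2_C = range(-5, 16, 2)
LOG2_GAMMA = range(3, -16, -2)


def make_grid():
    """Return every candidate (C, gamma) pair on the power-of-two grid."""
    return [(2.0 ** c, 2.0 ** g) for c in LOG2_C for g in LOG2_GAMMA]


def grid_search(score):
    """Return the (C, gamma) pair that maximizes score(C, gamma)."""
    return max(make_grid(), key=lambda pair: score(*pair))


def on_grid(c_value, gamma, tol=1e-3):
    """Check whether a reported (C, gamma) pair lies on the assumed grid."""
    log_c, log_g = math.log2(c_value), math.log2(gamma)
    return (
        abs(log_c - round(log_c)) < tol and round(log_c) in LOG2_C
        and abs(log_g - round(log_g)) < tol and round(log_g) in LOG2_GAMMA
    )


# Parameter pairs reported in the footnotes of Table 1.
REPORTED = {
    ("H. sapiens", "m1A"): (8, 0.0078125),
    ("H. sapiens", "m6A"): (128, 3.05158e-5),
    ("H. sapiens", "AI"): (8, 0.0078125),
    ("M. musculus", "m1A"): (2, 0.0078125),
    ("M. musculus", "m6A"): (32, 0.00012207),
    ("M. musculus", "AI"): (512, 0.000488281),
}

for key, (c_value, gamma) in REPORTED.items():
    print(key, on_grid(c_value, gamma))


# Placeholder score that peaks at C = 2^3, gamma = 2^-7 (purely illustrative).
def toy_score(c_value, gamma):
    return -abs(math.log2(c_value) - 3) - abs(math.log2(gamma) + 7)


best = grid_search(toy_score)
print(best)
```

All six reported parameter pairs are powers of two with odd exponents, so they are at least consistent with a search over this kind of grid.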

The predictor obtained via the above procedures is called “iRNA-3typeA,” where “i” stands for “identify,” and “3typeA” means RNA’s “three types of modifications at adenosine sites.” Illustrated in Figure 3 is a flowchart to show the process of how the iRNA-3typeA predictor is working.

Figure 3
A Flowchart to Show How the iRNA-3typeA Predictor Is Working

Cross-Validation

To evaluate the quality of a new predictor, we need to consider the following two problems. What metrics should be used to quantitatively display its performance? And what concrete procedure should be followed to derive the metrics’ values?

  • (1) A set of four metrics. In literature, the following four conventional metrics are generally used to evaluate a predictor’s quality:77 (1) Acc, (2) MCC, (3) sensitivity (Sn), and (4) specificity (Sp). But the conventional expressions copied directly from math books are not intuitive and are hard to understand for most biological scientists. Fortunately, by using the symbols introduced by Chou in studying signal peptides,78 the four metrics can be converted to a set of intuitive ones58, 79 as given below:

(Equation 6)
$$\begin{cases}
\mathrm{Sn} = 1 - \dfrac{N^{+}_{-}}{N^{+}}, & 0 \le \mathrm{Sn} \le 1 \\[2ex]
\mathrm{Sp} = 1 - \dfrac{N^{-}_{+}}{N^{-}}, & 0 \le \mathrm{Sp} \le 1 \\[2ex]
\mathrm{Acc} = 1 - \dfrac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}}, & 0 \le \mathrm{Acc} \le 1 \\[2ex]
\mathrm{MCC} = \dfrac{1 - \left(\dfrac{N^{+}_{-}}{N^{+}} + \dfrac{N^{-}_{+}}{N^{-}}\right)}{\sqrt{\left(1 + \dfrac{N^{-}_{+} - N^{+}_{-}}{N^{+}}\right)\left(1 + \dfrac{N^{+}_{-} - N^{-}_{+}}{N^{-}}\right)}}, & -1 \le \mathrm{MCC} \le 1
\end{cases}$$

where $N^{+}$ represents the total number of positive samples investigated, while $N^{+}_{-}$ is the number of positive samples incorrectly predicted to be negative, and $N^{-}$ represents the total number of negative samples investigated, while $N^{-}_{+}$ is the number of negative samples incorrectly predicted to be positive. With the set of formulations in Equation 6, the meanings of Sn, Sp, Acc, and MCC have become much more intuitive and easier to understand, as discussed in a series of recent studies in various biological areas (see, e.g., Liu et al.,21, 24, 28, 60 Ehsan et al.,26 Feng et al.,20, 27 Song et al.,31 Lin et al.,37 and Xu et al.80, 81).
  • (2)Jackknife test. Now the next problem is how to test the values of these metrics in an objective way. As is well known, the independent dataset test, subsampling (or K-fold cross-validation) test, and jackknife test are the three cross-validation methods widely used for testing a prediction method.82 Of the three test methods, however, the jackknife test is deemed the least arbitrary and most objective one.32 Accordingly, the jackknife test has been widely recognized and increasingly adopted by investigators to examine the quality of various predictors (see, e.g., Ahmad et al.,52, 83 Lin et al.,84 Tang et al.,85 Tripathi and Pandey,86 and Dao et al.87). In view of this, the jackknife test was also adopted in the current study to examine the proposed predictor. During the jackknife test, each sample in the benchmark dataset is in turn singled out as an independent test sample and all the rule-parameters are calculated without including the one being identified. One more advantage of using the jackknife test is that there is no need to artificially separate the benchmark dataset into two subsets, one for training the model and one for testing it. This is because the outcome obtained by the jackknife test is actually a combination from many different independent dataset tests.88, 89, 90
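
The two items above can be sketched together in Python: a function that evaluates the four metrics of Equation 6 from the four counts, and a generic jackknife (leave-one-out) driver. The names `chou_metrics`, `jackknife`, and `nearest_mean` are hypothetical. The nearest-mean classifier is only a stand-in so that the example runs without external libraries; it is not the SVM used by iRNA-3typeA, and the toy data are invented for illustration.

```python
import math


def chou_metrics(n_pos, n_pos_missed, n_neg, n_neg_missed):
    """Return (Sn, Sp, Acc, MCC) as formulated in Equation 6.

    n_pos / n_neg: total positive / negative samples.
    n_pos_missed: positives incorrectly predicted to be negative.
    n_neg_missed: negatives incorrectly predicted to be positive.
    """
    sn = 1 - n_pos_missed / n_pos
    sp = 1 - n_neg_missed / n_neg
    acc = 1 - (n_pos_missed + n_neg_missed) / (n_pos + n_neg)
    numerator = 1 - (n_pos_missed / n_pos + n_neg_missed / n_neg)
    denominator = math.sqrt(
        (1 + (n_neg_missed - n_pos_missed) / n_pos)
        * (1 + (n_pos_missed - n_neg_missed) / n_neg)
    )
    return sn, sp, acc, numerator / denominator


def jackknife(samples, labels, fit_predict):
    """Leave each sample out in turn, train on the rest, and score the result.

    fit_predict(train_x, train_y, test_x) must return the predicted label (1 or 0).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_missed = neg_missed = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = fit_predict(train_x, train_y, x)
        if y == 1 and predicted != 1:
            pos_missed += 1
        elif y != 1 and predicted == 1:
            neg_missed += 1
    return chou_metrics(n_pos, pos_missed, n_neg, neg_missed)


def nearest_mean(train_x, train_y, x):
    """Stand-in classifier (not an SVM): assign x to the class with the closer mean."""
    pos = [v for v, y in zip(train_x, train_y) if y == 1]
    neg = [v for v, y in zip(train_x, train_y) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return 1 if abs(x - mean_pos) < abs(x - mean_neg) else 0


# Invented one-dimensional toy data: four positives, four negatives.
xs = [1.0, 1.2, 0.9, 0.1, 0.0, 0.2, 0.1, 1.1]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
sn, sp, acc, mcc = jackknife(xs, ys, nearest_mean)
print(sn, sp, acc, mcc)
```

In this toy run one positive and one negative are misclassified, so Sn, Sp, and Acc are each 0.75 and MCC is 0.5. Because every sample is held out exactly once, the result does not depend on any random split, which is the property the text highlights for the jackknife test.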

Author Contributions

W.C. and H.L. designed the study; P.F., H.Y., and H.D. conducted the experiments; W.C., H.L., and K.-C.C. analyzed the results; W.C., H.L., and K.-C.C. wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

The authors wish to thank the three anonymous reviewers, whose constructive comments were very helpful for further strengthening the presentation of this paper. This work was supported by the Natural Science Foundation of China (No. 31771471 and 61772119), the Natural Science Foundation for Distinguished Young Scholar of Hebei Province (No. C2017209244), the Program for the Top Young Innovative Talents of Higher Learning Institutions of Hebei Province (No. BJ2014028), and the Applied Basic Research Program of Sichuan Province (No. 2015JY0100).

References

  • 1. Gilbert W.V., Bell T.A., Schaening C. Messenger RNA modifications: form, distribution, and function. Science. 2016; 352: 1408-1412.
  • 2. Machnicka M.A., Milanowska K., Osman Oglou O., Purta E., Kurkowska M., Olchowik A., Januszewski W., Kalinowski S., Dunin-Horkawicz S., Rother K.M. MODOMICS: a database of RNA modification pathways—2013 update. Nucleic Acids Res. 2013; 41: D262-D267.
  • 3. Roundtree I.A., Evans M.E., Pan T., He C. Dynamic RNA modifications in gene expression regulation. Cell. 2017; 169: 1187-1200.
  • 4. Jia G., Fu Y., Zhao X., Dai Q., Zheng G., Yang Y., Yi C., Lindahl T., Pan T., Yang Y.G., He C. N6-methyladenosine in nuclear RNA is a major substrate of the obesity-associated FTO. Nat. Chem. Biol. 2011; 7: 885-887.
  • 5. Wang X., Lu Z., Gomez A., Hon G.C., Yue Y., Han D., Fu Y., Parisien M., Dai Q., Jia G. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature. 2014; 505: 117-120.
  • 6. Zhao B.S., Roundtree I.A., He C. Post-transcriptional gene regulation by mRNA modifications. Nat. Rev. Mol. Cell Biol. 2017; 18: 31-42.
  • 7. Li X., Xiong X., Wang K., Wang L., Shu X., Ma S., Yi C. Transcriptome-wide mapping reveals reversible and dynamic N(1)-methyladenosine methylome. Nat. Chem. Biol. 2016; 12: 311-316.
  • 8. Chen K., Lu Z., Wang X., Fu Y., Luo G.Z., Liu N., Han D., Dominissini D., Dai Q., Pan T., He C. High-resolution N(6)-methyladenosine (m(6)A) map using photo-crosslinking-assisted m(6)A sequencing. Angew. Chem. Int. Ed. Engl. 2015; 54: 1587-1590.
  • 9. Helm M., Motorin Y. Detecting RNA modifications in the epitranscriptome: predict and validate. Nat. Rev. Genet. 2017; 18: 275-291.
  • 10. Esteller M., Pandolfi P.P. The epitranscriptome of noncoding RNAs in cancer. Cancer Discov. 2017; 7: 359-368.
  • 11. Nachtergaele S., He C. The emerging biology of RNA post-transcriptional modifications. RNA Biol. 2017; 14: 156-163.
  • 12. Chen W., Tang H., Lin H. MethyRNA: a web server for identification of N6-methyladenosine sites. J. Biomol. Struct. Dyn. 2017; 35: 683-687.
  • 13. Qiu W.R., Jiang S.Y., Xu Z.C., Xiao X., Chou K.C. iRNAm5C-PseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition. Oncotarget. 2017; 8: 41178-41188.
  • 14. Chen W., Feng P., Yang H., Ding H., Lin H., Chou K.C. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget. 2017; 8: 4208-4217.
  • 15. Chen W., Feng P., Ding H., Lin H. PAI: predicting adenosine to inosine editing sites by using pseudo nucleotide compositions. Sci. Rep. 2016; 6: 35123.
  • 16. Qiu W.R., Jiang S.Y., Sun B.Q., Xiao X., Cheng X., Chou K.C. iRNA-2methyl: identify RNA 2′-O-methylation sites by incorporating sequence-coupled effects into general PseKNC and ensemble classifier. Med. Chem. 2017; 13: 734-743.
  • 17. Chen W., Tang H., Ye J., Lin H., Chou K.C. iRNA-PseU: identifying RNA pseudouridine sites. Mol. Ther. Nucleic Acids. 2016; 5: e332.
  • 18. Feng P., Ding H., Chen W., Lin H. Identifying RNA 5-methylcytosine sites via pseudo nucleotide compositions. Mol. Biosyst. 2016; 12: 3307-3311.
  • 19. Cheng X., Xiao X., Chou K.C. pLoc-mPlant: predict subcellular localization of multi-location plant proteins by incorporating the optimal GO information into general PseAAC. Mol. Biosyst. 2017; 13: 1722-1727.
  • 20. Feng P., Ding H., Yang H., Chen W., Lin H., Chou K.C. iRNA-PseColl: identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC. Mol. Ther. Nucleic Acids. 2017; 7: 155-163.
  • 21. Liu B., Wang S., Long R., Chou K.C. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics. 2017; 33: 35-41.
  • 22. Qiu W.R., Sun B.Q., Xiao X., Xu Z.C., Jia J.H., Chou K.C. iKcr-PseEns: identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics. 2017. Published online November 16, 2017.
  • 23. Xiao X., Cheng X., Su S., Nao Q. pLoc-mGpos: Incorporate key gene ontology information into general PseAAC for predicting subcellular localization of Gram-positive bacterial proteins. Nat. Sci. 2017; 9: 331-349.
  • 24. Liu L.M., Xu Y., Chou K.C. iPGK-PseAAC: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general PseAAC. Med. Chem. 2017; 13: 552-559.
  • 25. Cheng X., Xiao X., Chou K.C. pLoc-mEuk: predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC. Genomics. 2018; 110: 50-58.
  • 26. Ehsan A., Mahmood K., Khan Y.D., Khan S.A., Chou K.C. A novel modeling in mathematical biology for classification of signal peptides. Sci. Rep. 2018; 8: 1039.
  • 27. Feng P., Yang H., Ding H., Lin H., Chen W., Chou K.C. iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics. 2018. Published online January 31, 2018.
  • 28. Liu B., Yang F., Huang D.S., Chou K.C. iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC. Bioinformatics. 2018; 34: 33-40.
  • 29. Song J., Li F., Takemoto K., Haffari G., Akutsu T., Chou K.C., Webb G.I. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J. Theor. Biol. 2018; 443: 125-137.
  • 30. Yang H., Qiu W.R., Liu G., Guo F.B., Lin H. iRSpot-Pse6NC: identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC. Int. J. Biol. Sci. 2018.
  • 31. Song J., Wang Y., Li F., Akutsu T., Rawlings N.D. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief. Bioinform. 2018.
  • 32. Chou K.C. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 2011; 273: 236-247.
  • 33. Shen H.B. Recent advances in developing web-servers for predicting protein attributes. Nat. Sci. 2009; 1: 63-92.
  • 34. Cheng X., Zhao S.G., Lin W.Z., Xiao X., Chou K.C. pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites. Bioinformatics. 2017; 33: 3524-3531.
  • 35. Cheng X., Xiao X., Chou K.C. pLoc-mGneg: predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics. 2017. Published online October 6, 2017.
  • 36. Cheng X., Xiao X., Chou K.C. pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information. Bioinformatics. 2017. Published online November 2, 2017.
  • 37. Lin H., Deng E.Z., Ding H., Chen W., Chou K.C. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Res. 2014; 42: 12961-12972.
  • 38. Jia J., Liu Z., Xiao X., Liu B., Chou K.C. iPPI-Esml: an ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J. Theor. Biol. 2015; 377: 47-56.
  • 39. Jia J., Zhang L., Liu Z., Xiao X., Chou K.C. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics. 2016; 32: 3133-3141.
  • 40. Cheng X., Xiao X. pLoc-mVirus: predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC. Gene. 2017; 628: 315-321.
  • 41. Chou K.C. Impacts of bioinformatics to medicinal chemistry. Med. Chem. 2015; 11: 218-234.
  • 42. Chou K.C. An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr. Top. Med. Chem. 2017; 17: 2337-2358.
  • 43. Chen W., Feng P., Tang H., Ding H., Lin H. RAMPred: identifying the N(1)-methyladenosine sites in eukaryotic transcriptomes. Sci. Rep. 2016; 6: 31080.
  • 44. Chou K.C. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins. 2001; 43: 246-255.
  • 45. Zhong W.Z., Zhou S.F. Molecular science for drug development and biomedicine. Int. J. Mol. Sci. 2014; 15: 20072-20078.
  • 46. Zhou G.P., Zhong W.Z. Perspectives in medicinal chemistry. Curr. Top. Med. Chem. 2016; 16: 381-382.
  • 47. Esmaeili M., Mohabatkar H., Mohsenzadeh S. Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses. J. Theor. Biol. 2010; 263: 203-209.
  • 48. Mohabatkar H., Mohammad Beigi M., Esmaeili A. Prediction of GABAA receptor proteins using the concept of Chou’s pseudo-amino acid composition and support vector machine. J. Theor. Biol. 2011; 281: 18-23.
  • 49. Nanni L., Lumini A., Gupta D., Garg A. Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou’s pseudo amino acid composition and on evolutionary information. IEEE/ACM Trans. Comput. Biol. Bioinformatics. 2012; 9: 467-475.
  • 50. Pacharawongsakda E., Theeramunkong T. Predict subcellular locations of singleplex and multiplex proteins by semi-supervised learning and dimension-reducing general mode of Chou’s PseAAC. IEEE Trans. Nanobioscience. 2013; 12: 311-320.
  • 51. Mondal S., Pai P.P. Chou’s pseudo amino acid composition improves sequence-based antifreeze protein prediction. J. Theor. Biol. 2014; 356: 30-35.
  • 52. Ahmad S., Kabir M., Hayat M. Identification of heat shock protein families and J-protein types by incorporating dipeptide composition into Chou’s general PseAAC. Comput. Methods Programs Biomed. 2015; 122: 165-174.
  • 53. Kabir M., Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol. Genet. Genomics. 2016; 291: 285-296.
  • 54. Yu B., Li S., Qiu W.Y., Chen C., Chen R.X., Wang L., Wang M.H., Zhang Y. Accurate prediction of subcellular location of apoptosis proteins combining Chou’s PseAAC and PsePSSM based on wavelet denoising. Oncotarget. 2017; 8: 107640-107665.
  • 55. Zhang S., Duan X. Prediction of protein subcellular localization with oversampling approach and Chou’s general PseAAC. J. Theor. Biol. 2018; 437: 239-250.
  • 56. Muthu Krishnan S. Using Chou’s general PseAAC to analyze the evolutionary relationship of receptor associated proteins (RAP) with various folding patterns of protein domains. J. Theor. Biol. 2018; 445: 62-74.
  • 57. Chou K.C. Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Curr. Proteomics. 2009; 6: 262-274.
  • 58. Chen W., Feng P.M., Lin H., Chou K.C. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res. 2013; 41: e68.
  • 59. Qiu W.R., Xiao X., Chou K.C. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. Int. J. Mol. Sci. 2014; 15: 1746-1766.
  • 60. Liu B., Yang F., Chou K.C. 2L-piRNA: a two-layer ensemble classifier for identifying piwi-interacting RNAs and their function. Mol. Ther. Nucleic Acids. 2017; 7: 267-277.
  • 61. Chen W., Lei T.Y., Jin D.C., Lin H., Chou K.C. PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition. Anal. Biochem. 2014; 456: 53-60.
  • 62. Chen W., Zhang X., Brooker J., Lin H., Zhang L., Chou K.C. PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions. Bioinformatics. 2015; 31: 119-120.
  • 63. Chen W., Lin H., Chou K.C. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol. Biosyst. 2015; 11: 2620-2634.
  • 64. Chen W., Feng P., Tang H., Ding H., Lin H. Identifying 2′-O-methylationation sites by integrating nucleotide chemical properties and nucleotide compositions. Genomics. 2016; 107: 255-258.
  • 65. Li W.C., Deng E.Z., Ding H., Chen W., Lin H. iORI-PseKNC: a predictor for identifying origin of replication with pseudo k-tuple nucleotide composition. Chemometr. Intell. Lab. Syst. 2015; 141: 100-106.
  • 66. Chou K.C. A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins. J. Biol. Chem. 1993; 268: 16938-16948.
  • 67. Feng P.M., Chen W., Lin H., Chou K.C. iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal. Biochem. 2013; 442: 118-125.
  • 68. Feng P.M., Lin H., Chen W. Identification of antioxidants from sequence information using naïve Bayes. Comput. Math. Methods Med. 2013; 2013: 567529.
  • 69. Feng P.M., Ding H., Chen W., Lin H. Naïve Bayes classifier with feature selection to identify phage virion proteins. Comput. Math. Methods Med. 2013; 2013: 530696.
  • 70. ChenW.FengP.M.LinH.ChouK.C.iSS-PseDNC: identifying splicing sites using pseudo dinucleotide compositionBioMed Res. Int.20142014623149[PubMed][Google Scholar]
  • 71. ChenW.YangH.FengP.DingH.LinH.iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical propertiesBioinformatics33201735183523[PubMed][Google Scholar]
  • 72. ChenX.X.TangH.LiW.C.WuH.ChenW.DingH.LinH.Identification of bacterial cell wall lyases via pseudo amino acid compositionBioMed Res. Int.201620161654623[PubMed][Google Scholar]
  • 73. LinH.LiangZ.Y.TangH.ChenW.Identifying sigma70 promoters with novel pseudo nucleotide compositionIEEE/ACM Trans. Comput. Biol. Bioinformatics2017Published online February 8, 2017[Google Scholar]
  • 74. LaiH.Y.ChenX.X.ChenW.TangH.LinH.Sequence-based predictive modeling to identify cancerlectinsOncotarget820172816928175[PubMed][Google Scholar]
  • 75. ZhaoY.W.LaiH.Y.TangH.ChenW.LinH.Prediction of phosphothreonine sites in human proteins by fusing different featuresSci. Rep.6201634817[PubMed][Google Scholar]
  • 76. YangH.TangH.ChenX.X.ZhangC.J.ZhuP.P.DingH.ChenW.LinH.Identification of secretory proteins in Mycobacterium tuberculosis using pseudo amino acid compositionBioMed Res. Int.201620165413903[PubMed][Google Scholar]
  • 77. ChenJ.LiuH.YangJ.ChouK.C.Prediction of linear B-cell epitopes using amino acid pair antigenicity scaleAmino Acids332007423428[PubMed][Google Scholar]
  • 78. ChouK.C.Prediction of signal peptides using scaled windowPeptides22200119731979[PubMed][Google Scholar]
  • 79. XuY.DingJ.WuL.Y.ChouK.C.iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid compositionPLoS ONE82013e55844[PubMed][Google Scholar]
  • 80. XuY.ShaoX.J.WuL.Y.DengN.Y.ChouK.C.iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteinsPeerJ12013e171[PubMed][Google Scholar]
  • 81. XuY.WangZ.LiC.ChouK.C.iPreny-PseAAC: identify C-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAACMed. Chem.132017544551[PubMed][Google Scholar]
  • 82. ChouK.C.ZhangC.T.Prediction of protein structural classesCrit. Rev. Biochem. Mol. Biol.301995275349[PubMed][Google Scholar]
  • 83. AhmadK.WarisM.HayatM.Prediction of protein submitochondrial locations by incorporating dipeptide composition into Chou’s general pseudo amino acid compositionJ. Membr. Biol.2492016293304[PubMed][Google Scholar]
  • 84. LinH.LiuW.X.HeJ.LiuX.H.DingH.ChenW.Predicting cancerlectins by the optimal g-gap dipeptidesSci. Rep.5201516964[PubMed][Google Scholar]
  • 85. TangH.ZouP.ZhangC.ChenR.ChenW.LinH.Identification of apolipoprotein using feature selection techniqueSci. Rep.6201630441[PubMed][Google Scholar]
  • 86. TripathiP.PandeyP.N.A novel alignment-free method to classify protein folding types by combining spectral graph clustering with Chou’s pseudo amino acid compositionJ. Theor. Biol.42420174954[PubMed][Google Scholar]
  • 87. DaoF.Y.YangH.SuZ.D.YangW.WuY.HuiD.ChenW.TangH.LinH.Recent advances in conotoxin classification by using machine learning methodsMolecules222017e1057[PubMed][Google Scholar]
  • 88. ChouK.C.ShenH.B.Recent progress in protein subcellular location predictionAnal. Biochem.3702007116[PubMed][Google Scholar]
  • 89. ChouK.C.ShenH.B.Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organismsNat. Protoc.32008153162[PubMed][Google Scholar]
  • 90. ShenH.B.Cell-PLoc 2.0: An improved package of web-servers for predicting subcellular localization of proteins in various organismsNat. Sci.2201010901103[Google Scholar]
  • 91. FrankE.HallM.TriggL.HolmesG.WittenI.H.Data mining in bioinformatics using WekaBioinformatics20200424792481[PubMed][Google Scholar]