AnimalTFDB: a comprehensive animal transcription factor database.
Journal: 2012/July - Nucleic Acids Research
ISSN: 1362-4962
Abstract:
Transcription factors (TFs) are proteins that bind to specific DNA sequences and thereby play crucial roles in gene-expression regulation by controlling the transcription of genetic information from DNA to RNA. Transcription cofactors and chromatin remodeling factors are also essential in transcriptional regulation. Identifying and annotating all TFs are primary and crucial steps for elucidating their functions and understanding transcriptional regulation. In this study, based on manual literature reviews, we collected and curated 72 TF families for animals, which is currently the most complete list of TF families in animals. We then systematically characterized all the TFs in 50 animal species and constructed a comprehensive animal TF database, AnimalTFDB. To better serve the community, we provide detailed annotations for each TF, including basic information, gene structure, functional domain, 3D structure hit, Gene Ontology, pathway, protein-protein interaction, paralogs, orthologs, potential TF-binding sites and targets. In addition, we collected and annotated transcription cofactors and chromatin remodeling factors. AnimalTFDB has a user-friendly web interface with multiple browse and search functions, as well as data downloading. It is freely available at http://www.bioguo.org/AnimalTFDB/.
Nucleic Acids Research. Dec/31/2011; 40(Database issue): D144-D149
Published online Nov/11/2011


INTRODUCTION

Regulation of gene expression controls the spatial and temporal expression pattern and influences all biological processes in organisms. In this regulation, the transcriptional regulatory system plays a key role and involves diverse proteins, including RNA polymerase, basal and sequence-specific DNA-binding transcription factors (TFs), transcription cofactors and chromatin remodeling proteins (1). Among them, TFs are of particular interest owing to their complex regulatory functions. Here we use the common definition of TFs: proteins that contain a sequence-specific DNA-binding domain (DBD) and regulate target gene transcription. Based on their DBDs, TFs can be classified into different TF families. About half of the TF families in plants and animals are reported to be plant or animal specific (2). TF families in plants have been well characterized, and several databases for plant TFs have been developed (3–5). However, until now there has been no comprehensive animal TF family list, nor a database that characterizes all TFs by family across the sequenced animal genomes.

To date, there are several TF databases for particular animals, such as TFdb for mouse (6), FlyTF for fruit fly (7), TFCat for human and mouse (8), TFCONES for human, mouse and fugu (9) and ITFP for human, mouse and rat (10). Each of these databases covers only one or a few genomes. Although TRANSFAC collects abundant information about TFs for several kinds of animals (11), it is a commercial database and collects only experimentally verified TFs. DBD is a comprehensive TF database for more than 900 genomes across the three superkingdoms of life (Bacteria, Archaea and Eukaryotes) and includes dozens of animals (12). However, its TF family classification and TF annotation for animals could be improved to better serve the community. Thus, as more and more animal genomes are sequenced, an integrated animal TF database with higher coverage, higher accuracy and full annotation is required.

With this in mind, we collected and curated a comprehensive list of animal TF families by manual literature review. We then predicted TFs for all these families in 50 sequenced animal genomes and constructed a comprehensive animal TF database, AnimalTFDB (http://www.bioguo.org/AnimalTFDB/). Moreover, we predicted transcription cofactors and chromatin remodeling factors for these 50 genomes. The database has a user-friendly interface to display and search the detailed annotations. We hope that AnimalTFDB will become a useful resource for the research community, especially in studies of comparative genomics and transcriptional regulation.

METHODS

Data sources

Currently, AnimalTFDB contains TFs, transcription cofactors and chromatin remodeling factors identified in 50 animals (Table 1). All genome data were downloaded from the Ensembl database (release 60, http://www.ensembl.org/).

Table 1. Numbers of TFs, transcription cofactors and chromatin remodeling factors of 50 species in current AnimalTFDB

Group | Species | Common name | TFs | CoFs | CRFs | Total
Primates | Homo sapiens | Human | 1544 | 302 | 150 | 1996
 | Macaca mulatta | Macaque | 1440 | 266 | 119 | 1825
 | Pan troglodytes | Chimpanzee | 1429 | 272 | 135 | 1836
 | Gorilla gorilla | Gorilla | 1429 | 264 | 130 | 1823
 | Callithrix jacchus | Marmoset | 1397 | 277 | 132 | 1806
 | Pongo pygmaeus | Orangutan | 1331 | 263 | 118 | 1712
 | Microcebus murinus | Mouse Lemur | 1037 | 180 | 77 | 1294
 | Otolemur garnettii | Bushbaby | 894 | 129 | 72 | 1095
 | Tarsius syrichta | Tarsier | 842 | 151 | 64 | 1057
Rodents | Mus musculus | Mouse | 1457 | 279 | 130 | 1866
 | Rattus norvegicus | Rat | 1371 | 257 | 119 | 1747
 | Cavia porcellus | Guinea Pig | 1054 | 253 | 117 | 1424
 | Oryctolagus cuniculus | Rabbit | 1047 | 252 | 117 | 1416
 | Ochotona princeps | Pika | 903 | 173 | 75 | 1151
 | Dipodomys ordii | Kangaroo rat | 862 | 170 | 78 | 1110
 | Tupaia belangeri | Tree Shrew | 815 | 138 | 64 | 1017
 | Spermophilus tridecemlineatus | Squirrel | 810 | 128 | 52 | 990
Laurasiatheria | Bos taurus | Cow | 1313 | 257 | 123 | 1693
 | Equus caballus | Horse | 1240 | 258 | 123 | 1621
 | Ailuropoda melanoleuca | Giant Panda | 1199 | 258 | 127 | 1584
 | Tursiops truncatus | Dolphin | 1167 | 234 | 110 | 1511
 | Pteropus vampyrus | Megabat | 1119 | 236 | 111 | 1466
 | Canis familiaris | Dog | 1062 | 257 | 129 | 1448
 | Sus scrofa | Pig | 1038 | 195 | 90 | 1323
 | Myotis lucifugus | Microbat | 970 | 156 | 69 | 1195
 | Felis catus | Cat | 887 | 139 | 62 | 1088
 | Erinaceus europaeus | Hedgehog | 744 | 118 | 63 | 925
 | Sorex araneus | Shrew | 630 | 126 | 61 | 817
 | Vicugna pacos | Alpaca | 646 | 118 | 58 | 822
Afrotheria | Loxodonta africana | Elephant | 1096 | 261 | 119 | 1476
 | Procavia capensis | Hyrax | 983 | 177 | 74 | 1234
 | Echinops telfairi | Lesser hedgehog tenrec | 985 | 155 | 59 | 1199
Xenarthra | Dasypus novemcinctus | Armadillo | 868 | 132 | 61 | 1061
 | Choloepus hoffmanni | Sloth | 725 | 107 | 48 | 880
Other mammals | Monodelphis domestica | Opossum | 1454 | 241 | 97 | 1792
 | Macropus eugeni | Wallaby | 897 | 150 | 53 | 1100
 | Ornithorhynchus anatinus | Platypus | 814 | 149 | 60 | 1023
Birds and reptiles | Taeniopygia guttata | Zebra Finch | 1185 | 181 | 82 | 1448
 | Gallus gallus | Chicken | 775 | 192 | 83 | 1050
 | Anolis carolinensis | Lizard | 1211 | 197 | 82 | 1490
Amphibia | Xenopus tropicalis | Frog | 1038 | 168 | 67 | 1273
Fishes | Danio rerio | Zebrafish | 1916 | 160 | 77 | 2153
 | Takifugu rubripes | Fugu | 1274 | 162 | 73 | 1509
 | Tetraodon nigroviridis | Tetraodon | 1292 | 151 | 63 | 1506
 | Gasterosteus aculeatus | Stickleback | 1227 | 153 | 69 | 1449
 | Oryzias latipes | Medaka | 1187 | 138 | 63 | 1388
Other chordates | Ciona savignyi | Sea squirt | 409 | 32 | 19 | 460
 | Ciona intestinalis | Sea squirt | 428 | 40 | 16 | 484
Other Eukaryotes | Drosophila melanogaster | Fruitfly | 627 | 38 | 20 | 685
 | Caenorhabditis elegans | Worm | 657 | 21 | 9 | 687
Total | | | 52725 | 9111 | 4169 | 66005

CoFs, transcription cofactors; CRFs, chromatin remodeling factors.

Animal TF family list and their HMM profiles

We characterized and classified TFs by their sequence-specific DBDs. After reviewing the literature, we collected and curated 71 animal TF families plus a group named ‘others’ that contains some orphan TFs (http://www.bioguo.org/AnimalTFDB/help.php); this is currently the most complete TF family list for animals. Among the 71 families, 59 had Hidden Markov Model (HMM) profiles for their DBDs in the Pfam database (v25.0) (13), while no HMM profiles were available for the other 12 TF families. For these 12 families, we built HMM profiles from multiple sequence alignments of their DBDs using the hmmbuild program in the HMMER package.

TF identification

We applied the hmmsearch program in the HMMER package to search all protein sequences against the DBD HMM profiles to predict TFs. Based on manual checking of the predicted human and mouse TFs, we took an E-value of 0.0001 as the cutoff, which balances accuracy and sensitivity. TFs with more than one DBD were assigned to a family based on their true DBD, that is, the domain that actually binds DNA in those proteins.
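The E-value filtering step can be illustrated with a minimal Python sketch. It assumes the hits are available in the whitespace-separated layout of `hmmsearch --tblout` (target name, target accession, query name, query accession, full-sequence E-value, ...); the protein names and scores in `SAMPLE_TBLOUT` are hypothetical, and treating the cutoff as inclusive (`<=`) is an assumption. Choosing the true DBD for multi-domain proteins is a manual judgment in the text and is not automated here.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-4  # cutoff stated in the text (0.0001)

# Hypothetical rows in the column order of `hmmsearch --tblout`:
# target, accession, query (HMM) name, accession, full-sequence E-value, score, bias
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
protA - Homeobox - 1.2e-25 88.1 0.1
protA - zf-C2H2 - 3.0e-03 12.4 0.0
protB - HMG_box - 5.0e-05 20.3 0.2
protC - Homeobox - 0.5 3.1 0.0
"""


def filter_hits(tblout_text, cutoff=E_VALUE_CUTOFF):
    """Return {protein: [(family, e_value), ...]} for hits passing the cutoff."""
    hits = defaultdict(list)
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        protein, family, e_value = fields[0], fields[2], float(fields[4])
        if e_value <= cutoff:  # inclusive comparison is an assumption
            hits[protein].append((family, e_value))
    # Order each protein's surviving hits from most to least significant.
    return {p: sorted(h, key=lambda x: x[1]) for p, h in hits.items()}


candidates = filter_hits(SAMPLE_TBLOUT)
for protein, families in candidates.items():
    print(protein, families)
```

Proteins that keep more than one family after filtering would still need the manual true-DBD assignment described above.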

Identification of transcription cofactors and chromatin remodeling factors

In AnimalTFDB, transcription cofactors are defined as proteins that interact with TFs in the transcription apparatus but cannot bind DNA directly. Chromatin remodeling factors are defined as proteins that regulate transcription by modifying chromatin structure. To identify them, we first obtained the human transcription cofactors and chromatin remodeling factors from the TFCONES (9) and Gene Ontology (GO) (14) databases according to the GO terms ‘transcription cofactor activity’ and ‘chromatin remodeling’, respectively. We then used the human sequences to perform BLAST searches and took the best BLAST hits as the transcription cofactors or chromatin remodeling factors of the searched species.
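Selecting the best BLAST hit per human query could be sketched as follows. This is an illustration only: it assumes BLAST tabular output with the 12 standard columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), ranks hits by bit score (the text does not state the ranking criterion), and uses made-up sequence names.

```python
# Hypothetical tab-separated BLAST hits (12 standard tabular columns).
SAMPLE_BLAST = """\
humanCoF1\tmouseP1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-90\t350
humanCoF1\tmouseP2\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-40\t160
humanCRF1\tmouseP3\t92.0\t300\t24\t0\t1\t300\t1\t300\t1e-150\t560
"""


def best_hits(blast_text):
    """Map each query to the subject with the highest bit score."""
    best = {}
    for line in blast_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep the hit only if it beats the best score seen so far.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


print(best_hits(SAMPLE_BLAST))
```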

DATABASE CONTENT

Annotations of the identified factors

The numbers of TFs, transcription cofactors and chromatin remodeling factors identified in the 50 animals are shown in Table 1. To provide more useful information, we annotated them extensively. We obtained basic gene information and GO annotation from the NCBI and Ensembl databases. Putative functional domains and 3D structure hits are given for the longest protein of each gene. Protein–protein interaction information was parsed from the BioGRID (15) and HPRD (16) databases and from an atlas of human and mouse TF interactions (17). Pathway annotations from the BioCarta (http://www.biocarta.com/) and KEGG (18) databases are available in AnimalTFDB. TF binding sites and target genes were extracted from the TRED (19) and JASPAR (20) databases. In addition, we provide links to GenBank, UniGene and many species-specific databases such as MGI, HGNC and FlyBase.

Putative ortholog and paralog annotation

To predict putative orthologs of these factors among different species, we used the reciprocal best hit (RBH) method (21). We performed an all-against-all BLASTP search between the proteins of two genomes with strict cutoffs (E-value ≤ 1e-20, coverage ≥ 70%, identity ≥ 50%) and took the reciprocal best hit pairs as orthologs. To predict paralogs, we applied the BLAST score ratio (BSR) approach (22). A BLASTP search was run within each genome using the same cutoffs as in ortholog finding. After comparing the results obtained with different BSR values, we chose a BSR value of 0.4 as the cutoff for paralogs.
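The two procedures can be sketched in Python under stated assumptions. Hits are represented as plain dictionaries with hypothetical values; the best hit is taken to be the one with the highest bit score; BSR is computed as the bit score of a query against a subject divided by the bit score of the query against itself; and a pair is kept when its BSR is at least 0.4 (whether the cutoff is inclusive is not stated in the text).

```python
def hit(query, subject, evalue, coverage, identity, bitscore):
    """Build one BLASTP hit record (all values below are hypothetical)."""
    return {"query": query, "subject": subject, "evalue": evalue,
            "coverage": coverage, "identity": identity, "bitscore": bitscore}


def passes(h):
    """Cutoffs from the text: E-value <= 1e-20, coverage >= 70%, identity >= 50%."""
    return h["evalue"] <= 1e-20 and h["coverage"] >= 70.0 and h["identity"] >= 50.0


def best_hit_map(hits):
    """Map each query to its highest-scoring subject among hits that pass."""
    best = {}
    for h in hits:
        if passes(h) and (h["query"] not in best
                          or h["bitscore"] > best[h["query"]]["bitscore"]):
            best[h["query"]] = h
    return {q: h["subject"] for q, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward, reverse = best_hit_map(a_vs_b), best_hit_map(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def bsr_paralogs(self_hits, cutoff=0.4):
    """Return within-genome pairs whose BLAST score ratio reaches the cutoff."""
    self_score = {h["query"]: h["bitscore"]
                  for h in self_hits if h["query"] == h["subject"]}
    pairs = []
    for h in self_hits:
        q, s = h["query"], h["subject"]
        if q == s or q not in self_score or not passes(h):
            continue
        if h["bitscore"] / self_score[q] >= cutoff:
            pairs.append((q, s))
    return sorted(pairs)


human_vs_mouse = [hit("h1", "m1", 1e-100, 95, 90, 400),
                  hit("h1", "m2", 1e-50, 80, 60, 200),
                  hit("h2", "m2", 1e-30, 75, 55, 150),
                  hit("h3", "m3", 1e-10, 90, 80, 60)]   # fails the E-value cutoff
mouse_vs_human = [hit("m1", "h1", 1e-100, 95, 90, 400),
                  hit("m2", "h1", 1e-50, 80, 60, 200),
                  hit("m2", "h2", 1e-30, 75, 55, 150),
                  hit("m3", "h3", 1e-10, 90, 80, 60)]
human_vs_human = [hit("h1", "h1", 0.0, 100, 100, 500),
                  hit("h1", "h4", 1e-60, 80, 55, 250),   # BSR 0.5
                  hit("h1", "h5", 1e-40, 75, 52, 150)]   # BSR 0.3

orthologs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
paralogs = bsr_paralogs(human_vs_human)
print(orthologs, paralogs)
```

In this toy data, h2's best hit is m2, but m2's best hit is h1, so the pair is not reciprocal and is not reported as an ortholog pair.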

WEB INTERFACE

Database organization

Because MySQL is a free database management system widely used in bioinformatics, we stored all the information of AnimalTFDB in a MySQL database. Since the TF annotations vary in content and format, we organized the data into 30 separate tables. The Ensembl ID and Gene ID are used as the main keys to organize and link all the tables.
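How annotation tables linked by a shared gene key can be queried is sketched below. This is not the AnimalTFDB schema: Python's built-in `sqlite3` module stands in for MySQL so the example is self-contained, and the table names, column names, identifiers and rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene_basic (
    ensembl_id TEXT PRIMARY KEY,   -- main key shared by all tables
    entrez_id  INTEGER,
    symbol     TEXT,
    species    TEXT,
    family     TEXT
);
CREATE TABLE gene_go (
    ensembl_id TEXT REFERENCES gene_basic(ensembl_id),
    go_term    TEXT
);
""")

# Hypothetical placeholder rows, not real database content.
conn.executemany(
    "INSERT INTO gene_basic VALUES (?, ?, ?, ?, ?)",
    [("ENS_EXAMPLE_0001", 1, "GENE_A", "Homo sapiens", "TF_Otx"),
     ("ENS_EXAMPLE_0002", 2, "GENE_B", "Mus musculus", "Homeobox")],
)
conn.executemany(
    "INSERT INTO gene_go VALUES (?, ?)",
    [("ENS_EXAMPLE_0001", "regulation of transcription"),
     ("ENS_EXAMPLE_0001", "DNA binding"),
     ("ENS_EXAMPLE_0002", "DNA binding")],
)


def annotations_for(symbol):
    """Join the basic and GO tables on the shared Ensembl ID key."""
    query = """
        SELECT b.ensembl_id, b.family, g.go_term
        FROM gene_basic AS b
        JOIN gene_go AS g ON g.ensembl_id = b.ensembl_id
        WHERE b.symbol = ?
        ORDER BY g.go_term
    """
    return conn.execute(query, (symbol,)).fetchall()


print(annotations_for("GENE_A"))
```

Keeping each annotation type in its own table and joining on the gene key lets annotations of different shapes be added without altering the core gene table.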

Data browse

To help users browse the data conveniently and clearly, AnimalTFDB provides two ways to browse the data: (i) browse by species; (ii) browse by family. On the browse family page, all TF families are merged into six groups based on the TRANSFAC classification: helix–turn–helix, other α-helix, zinc-coordinating, basic domains, β-scaffold and unclassified structure. The TF family list of each group is shown in the treeview on the left of this page, and 3D structure images of the TF DBDs are used as family logos on the right. On the browse species page, the 50 species are classified into 11 categories according to the Ensembl taxonomy: primates, rodents, laurasiatheria, afrotheria, xenarthra, other mammals, birds & reptiles, amphibians, fishes, other chordates and other eukaryotes. An image from Ensembl shows the phylogeny of the 50 animals, and an equivalent treeview is built on the left. Users can browse data by clicking the family and species logos or by clicking a name in the left treeview of the browse pages. Browsing in AnimalTFDB is cascading and follows the steps species->families->family gene list->single gene annotation or families->species->family gene list->single gene annotation (Figure 1).

Figure 1.

An overview and gene annotation page in AnimalTFDB. (A) Species in AnimalTFDB. (B) Three kinds of factors in human: TFs, transcription cofactors and chromatin remodeling factors. (C) A list of human TFs in the TF_Otx family. (D) An example of gene annotation page.

Data search

AnimalTFDB provides two ways to search the data: quick search and advanced search. A quick search box at the top right of each page accepts Ensembl IDs for genes, transcripts and proteins, Entrez gene IDs and gene symbols. The advanced search page offers multiple ways to search by the different annotations and keywords of each gene. In addition, users can restrict a search to specific families and species.

DISCUSSION

Comparison with other databases and evaluation of TF identification

We compared our predicted human and mouse TFs with those published in the DBD (12) and TFCat (8) databases. DBD is a comprehensive database of predicted TFs for bacteria, archaea and eukaryotes, while TFCat is a curated catalog of human and mouse TFs. For the DBD database, after converting protein IDs into gene IDs, we obtained 1383 and 1386 Ensembl gene IDs for human and mouse TF genes, respectively. AnimalTFDB includes 93.7% of the human TFs and 93.6% of the mouse TFs from the DBD database. For the TFCat database, after ID conversion, we obtained 521 and 543 Ensembl gene IDs for human and mouse TFs, respectively. The comparison showed that 97.1% of the human TFs and 96.3% of the mouse TFs from TFCat are present in our database.
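The percentages above are set overlaps: the share of a reference database's gene IDs that also occur in AnimalTFDB. A minimal sketch of that arithmetic, using a hypothetical helper name and toy identifiers rather than the real gene lists, might look like this:

```python
def coverage_percent(reference_ids, our_ids):
    """Percentage of reference gene IDs that are also present in our list."""
    reference, ours = set(reference_ids), set(our_ids)
    if not reference:
        raise ValueError("reference list is empty")
    return 100.0 * len(reference & ours) / len(reference)


# Toy example: three of the four reference IDs are recovered.
reference_tfs = ["g1", "g2", "g3", "g4"]
predicted_tfs = ["g1", "g2", "g3", "g9"]
print(coverage_percent(reference_tfs, predicted_tfs))
```

The denominator is the reference list, so the value measures how much of the other database is recovered, not how much of AnimalTFDB is confirmed by it.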

We carefully examined the differences between AnimalTFDB and the two other databases. The TFs present in those databases but absent from ours fall into two cases. First, some are not true TFs and were predicted by false TF DBD models, such as zf-A20, RNA_pol_Rpb2 and SART-1. Second, some are better regarded as transcription cofactors or chromatin remodeling factors and appear in the corresponding lists of AnimalTFDB. We also examined the approximately 300 AnimalTFDB-specific TFs for human and mouse. Some of them were predicted by TF families unique to our list, such as THAP, CBF, TSC22, Nrf1 and COE. Proteins in these families are true TFs, as evidenced by publications or by the presence of a typical DBD. About half of the AnimalTFDB-specific TFs belong to the zf-C2H2, Homeobox, HMG and MYB families, which are all large TF families and account for ∼60% of the TFs in the genome. Although most of the specific TFs in these large families are uncharacterized proteins containing typical DBDs, a few of them (e.g. KLF6, KLF8, PBX2, TCF7L1 and HBP1) have been shown experimentally to be TFs in published studies. We therefore keep them in the database.

Furthermore, we used GO annotations to evaluate the reliability and accuracy of our TF list. We found that 96.3% of our identified human TFs are annotated with TF-related GO terms, such as ‘TF activity’, ‘transcription activator/repressor/regulator activity’ and ‘DNA binding’. These results suggest that our TF prediction approach performs reliably.

Compared with other databases, AnimalTFDB has a more complete and accurate TF family list, and thus a more accurate TF gene list with higher sensitivity and specificity. In addition, the website is intuitive and easy to browse and search. Finally, the database provides comprehensive annotations, as described above. We therefore think that AnimalTFDB will be helpful for the community.

FUTURE PERSPECTIVES

AnimalTFDB is a comprehensive animal TF database that characterizes genome-wide TFs, transcription cofactors and chromatin remodeling factors in 50 sequenced animal genomes. According to their DBDs, all the TFs were classified into 72 families, which is currently the most complete animal TF family list. Now that our TF prediction pipeline is in place, it will be much easier to update the data regularly as more animal genome data become available. In the future, we will pay more attention to transcription cofactors and chromatin remodeling factors and try to classify them into families. We plan to maintain AnimalTFDB as a comprehensive animal TF database that provides a solid foundation for studies of transcriptional regulation and comparative genomics.

AVAILABILITY

The AnimalTFDB database is freely available at http://www.bioguo.org/AnimalTFDB/.

FUNDING

Starting Fund from Huazhong University of Science and Technology (to A.Y.G.); Fundamental Research Funds for the Central Universities (2010MS045); and National Natural Science Foundation of China (31171271). Funding for open access charge: National Natural Science Foundation of China (31171271).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors would like to thank Zhaowu Ma, Huashan Ye, Mi Zhou, Jun Yan, Shuzhen Kuang, Yifang Liao and Yuliang Wu for their valuable advice on improving the database.

References

  • 1. Lemon B, Tjian R. Orchestrated response: a symphony of transcription factors for gene control. Genes Dev. 2000;14:2551-2569.
  • 2. Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 2000;290:2105-2110.
  • 3. Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J. DATF: a database of Arabidopsis transcription factors. Bioinformatics 2005;21:2568-2569.
  • 4. Riano-Pachon DM, Ruzicic S, Dreyer I, Mueller-Roeber B. PlnTFDB: an integrative plant transcription factor database. BMC Bioinformatics 2007;8:42.
  • 5. Guo AY, Chen X, Gao G, Zhang H, Zhu QH, Liu XC, Zhong YF, Gu X, He K, Luo J. PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res. 2008;36:D966-D969.
  • 6. Kanamori M, Konno H, Osato N, Kawai J, Hayashizaki Y, Suzuki H. A genome-wide and nonredundant mouse transcription factor database. Biochem. Biophys. Res. Commun. 2004;322:787-793.
  • 7. Adryan B, Teichmann SA. FlyTF: a systematic review of site-specific transcription factors in the fruit fly Drosophila melanogaster. Bioinformatics 2006;22:1532-1533.
  • 8. Fulton DL, Sundararajan S, Badis G, Hughes TR, Wasserman WW, Roach JC, Sladek R. TFCat: the curated catalog of mouse and human transcription factors. Genome Biol. 2009;10:R29.
  • 9. Lee AP, Yang Y, Brenner S, Venkatesh B. TFCONES: a database of vertebrate transcription factor-encoding genes and their associated conserved noncoding elements. BMC Genomics 2007;8:441.
  • 10. Zheng G, Tu K, Yang Q, Xiong Y, Wei C, Xie L, Zhu Y, Li Y. ITFP: an integrated platform of mammalian transcription factors. Bioinformatics 2008;24:2416-2417.
  • 11. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006;34:D108-D110.
  • 12. Kummerfeld SK, Teichmann SA. DBD: a transcription factor prediction database. Nucleic Acids Res. 2006;34:D74-D81.
  • 13. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211-D222.
  • 14. Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R. The GOA database in 2009–an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009;37:D396-D403.
  • 15. Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X. The BioGRID Interaction Database: 2011 update. Nucleic Acids Res. 2011;39:D698-D704.
  • 16. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A. Human Protein Reference Database–2009 update. Nucleic Acids Res. 2009;37:D767-D772.
  • 17. Ravasi T, Suzuki H, Cannistraci CV, Katayama S, Bajic VB, Tan K, Akalin A, Schmeier S, Kanamori-Katayama M, Bertin N. An atlas of combinatorial transcriptional regulation in mouse and man. Cell 2010;140:744-752.
  • 18. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;32:D277-D280.
  • 19. Zhao F, Xuan Z, Liu L, Zhang MQ. TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies. Nucleic Acids Res. 2005;33:D103-D107.
  • 20. Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38:D105-D110.
  • 21. Moreno-Hagelsieb G, Latimer K. Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics 2008;24:319-324.
  • 22. Rasko DA, Myers GS, Ravel J. Visualization of comparative genomic analyses by BLAST score ratio. BMC Bioinformatics 2005;6:2.