NucleaRDB: information system for nuclear receptors

Bas Vroling

David Thorne

Philip McDermott

Henk-Jan Joosten

Teresa K. Attwood

Steve Pettifer

Gert Vriend

Abstract

The NucleaRDB is a Molecular Class-Specific Information System that collects, combines, validates and disseminates large amounts of heterogeneous data on nuclear hormone receptors. It contains both experimental and computationally derived data. The data and knowledge present in the NucleaRDB can be accessed using a number of different interactive and programmatic methods and query systems. A nuclear hormone receptor-specific PDF reader interface is available that can integrate the contents of the NucleaRDB with full-text scientific articles. The NucleaRDB is freely available at http://www.receptors.org/nucleardb.

INTRODUCTION

Nuclear receptors (NRs) are ligand-inducible transcription factors that regulate processes such as homeostasis, differentiation, embryonic development and organ physiology. A total of 49 human NRs have been identified (1). Their ligands are lipophilic compounds such as steroids, thyroid hormone, vitamin D3 and retinoids (2). For 30% of the NRs, the endogenous ligands are not yet known (3). Because nuclear receptors are involved in almost all aspects of human physiology and are implicated in many important diseases, including cancer, diabetes and osteoporosis, understanding these receptors has major implications for human biology and for the development of new drug treatments. For the pharmaceutical industry, nuclear receptors are drug targets of similar importance to G protein-coupled receptors (GPCRs), ion channels and kinases (4).

Because ever-growing amounts of experimental and computational data are buried in numerous databases and scientific articles, extracting, combining and validating these data is becoming an increasingly large hurdle for the individual scientist. Databases that revolve around a single protein family can help researchers use all the data needed for their research, while relieving them of the onerous task of retrieving data from many different sources (5).

The NucleaRDB is a data source that holds many different data types (Table 1) in a well-organized and easily accessible form (6). The data are validated, internally consistent and updated regularly. The NucleaRDB provides access to the data via various interfaces which, depending on the user's needs, are suited either for automated access or for interactive use.

DATA CONTENTS

Primary data

The NucleaRDB contains three primary data types: sequences, structures and mutations. Sequences and structures were updated as described previously (7). Mutation data were obtained from the Nuclear Receptor Mutation Database (8) and fully integrated into the NucleaRDB. In addition, a large body of mutations was extracted from the literature by the software package MuteXt (9).
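Literature-based mutation extraction starts by recognizing point mutations written in the usual wild-type/position/mutant notation (for example R274A). The Python sketch below shows only that first recognition step with a regular expression; it is a hypothetical illustration, not MuteXt's actual grammar or pipeline, and the function name is made up.

```python
import re

# Hypothetical pattern for single-point mutation mentions written as
# wild-type residue, sequence position, mutant residue (e.g. "R274A").
# Illustration only; this is NOT the grammar used by MuteXt.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples found in text."""
    hits = []
    for match in MUTATION_RE.finditer(text):
        wild_type, position, mutant = match.groups()
        if wild_type != mutant:  # skip silent notations such as "A12A"
            hits.append((wild_type, int(position), mutant))
    return hits
```

A pattern this simple also matches names such as "E2F", so a real extraction system needs additional checks, for instance that the position is valid for the protein discussed and that the wild-type residue agrees with its sequence.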

Computational data

A large and diverse collection of computationally generated data is present in the NucleaRDB. Multiple sequence alignments (MSAs) form the heart of the system and allow users to transfer information easily between different proteins. MSAs are available for all families and subfamilies; they can be viewed using Jalview (10) or downloaded directly in a number of formats. MSAs were created as described previously (7).
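Transferring information between proteins through an MSA amounts to mapping residue numbers of one sequence onto the aligned residue numbers of another. The following minimal Python sketch assumes two rows of the same alignment with '-' as the gap character; the function name and the toy sequences are hypothetical.

```python
def residue_map(aligned_a, aligned_b, start_a=1, start_b=1):
    """Map residue numbers in sequence A to equivalent residue numbers in B.

    aligned_a and aligned_b are two rows of the same MSA (equal length,
    '-' marks a gap). Residues aligned to a gap in the other sequence
    have no equivalent and are left out of the mapping.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    num_a, num_b = start_a, start_b
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-" and res_b != "-":
            mapping[num_a] = num_b  # both columns hold a residue
        if res_a != "-":
            num_a += 1
        if res_b != "-":
            num_b += 1
    return mapping
```

With such a map, a mutation reported at residue 3 of the first protein can be looked up at residue mapping[3] of the second; the start_a and start_b arguments handle sequences whose numbering does not begin at 1.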

Correlated mutation analyses (CMA) identify groups of residues that mutate in tandem. Residues that show correlated mutation behavior are likely to be functionally related, and networks of such correlating residues indicate functional units (11). Correlation scores are available for all families and subfamilies.
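To make the idea of correlated mutation behavior concrete, the Python sketch below scores pairs of alignment columns by mutual information, a widely used covariation measure. This is a swapped-in technique chosen for illustration; the scoring actually used for the NucleaRDB correlation scores (11) may differ, and the function names, the threshold and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column_i, column_j):
    """Mutual information (in bits) between two MSA columns.

    Each column is a string with one character per sequence, in the same
    sequence order. Gaps are simply treated as one more symbol.
    """
    n = len(column_i)
    if n == 0 or n != len(column_j):
        raise ValueError("columns must be non-empty and equally long")
    p_i = Counter(column_i)
    p_j = Counter(column_j)
    p_ij = Counter(zip(column_i, column_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def correlated_pairs(msa, threshold):
    """Return (i, j, score) for column pairs scoring at or above threshold."""
    columns = ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            score = mutual_information(columns[i], columns[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

In the toy alignment ["AKL", "AKV", "GEL", "GEV"], columns 0 and 1 change together (A with K, G with E) and score 1 bit, whereas column 2 varies independently and scores 0 against both.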

The entropy and variability at a position in an MSA can indicate the evolutionary pressure exerted at that position (12). Entropy and variability scores are available in tabular form and via an interactive page that integrates plots, tables and structure models.
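As a rough sketch of what per-position scores of this kind measure, the Python code below computes, for each alignment column, the Shannon entropy of the residue distribution and the variability, taken here as the number of distinct residue types observed. Ignoring gaps and using the natural logarithm are assumptions of this sketch, not necessarily the conventions behind the NucleaRDB scores; all names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residue distribution; gaps ignored."""
    counts = Counter(res for res in column if res != "-")
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def column_variability(column):
    """Number of distinct residue types observed in a column; gaps ignored."""
    return len({res for res in column if res != "-"})


def entropy_variability_table(msa):
    """Return (position, entropy, variability) rows, positions counted from 1."""
    columns = ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]
    return [(k + 1, column_entropy(col), column_variability(col))
            for k, col in enumerate(columns)]
```

A fully conserved column has entropy 0 and variability 1, while a column in which four residue types occur equally often has entropy ln 4 and variability 4.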

In addition to the already large amount of experimental structural information in the NucleaRDB, homology models based on multiple template structures have been built for all NRs. All models were built using YASARA (13) and are available for download or can be viewed directly using Jmol (14).

INFORMATION RETRIEVAL

All data in the NucleaRDB web interface are extensively interlinked, allowing easy navigation between different data types. The main entry point to the NucleaRDB’s contents is the hierarchical family tree. For each family, users can access the individual receptors, multiple sequence alignments (with all derived data and analyses, such as correlation scores and protein distance networks), mutations, structures and models (Figure 1). Every page links to all related data and information. Extensive search facilities allow users to search for proteins, sequences, structures, families and mutations using various criteria and filters. A BLAST service lets users search the NucleaRDB with their own sequences.
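A hierarchical family tree maps naturally onto a recursive data structure. The minimal Python sketch below, with hypothetical class and field names and a toy tree that does not reproduce the NucleaRDB's actual hierarchy, collects every receptor at or below a selected family node, as needed when listing all members of a family and its subfamilies.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyNode:
    """One node of a hypothetical receptor family tree."""
    name: str
    receptors: list = field(default_factory=list)    # receptors assigned directly here
    subfamilies: list = field(default_factory=list)  # child FamilyNode objects


def all_receptors(node):
    """Depth-first collection of every receptor at or below this node."""
    found = list(node.receptors)
    for child in node.subfamilies:
        found.extend(all_receptors(child))
    return found


# Toy example using NR nomenclature subfamilies and gene symbols.
tree = FamilyNode("NR1", subfamilies=[
    FamilyNode("NR1A", receptors=["THRA", "THRB"]),
    FamilyNode("NR1B", receptors=["RARA", "RARB", "RARG"]),
])
```

Here all_receptors(tree) returns the five receptors of both subfamilies, while calling it on the NR1A node returns only the two thyroid hormone receptors.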

All data types and search facilities are accessible from the web pages as well as from the web service endpoints, allowing users to write workflows or in-house software that uses the NucleaRDB.
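A workflow built on the REST endpoints would typically compose query URLs and decode the responses. The minimal sketch below uses only the Python standard library. The base URL is the NucleaRDB address given in the abstract, but the resource path, the parameter names and the JSON response format are assumptions for illustration, not the documented NucleaRDB interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The base URL is the NucleaRDB address; resource paths and parameter names
# used with it below are HYPOTHETICAL placeholders, not the real REST API.
BASE_URL = "http://www.receptors.org/nucleardb"


def build_query_url(resource, **params):
    """Compose a REST-style query URL; parameters are sorted for stable URLs."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{resource}?{query}" if query else f"{BASE_URL}/{resource}"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode its body as JSON (assumes a JSON response;
    a real service may return XML instead)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, build_query_url("mutations", protein="VDR", position=274) yields a URL ending in mutations?position=274&protein=VDR; separating URL construction from fetching keeps the workflow testable without network access.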

Annotating scientific literature

Utopia Documents (15,16) is a new PDF reader that offers unique opportunities to place information and knowledge in the context of the scientific literature. We have integrated the NucleaRDB with Utopia Documents so that scientists are presented, in a non-intrusive way, with all NR-relevant data and information related to the topics discussed in the article at hand. Annotations are provided for proteins, residues and mutations mentioned in the PDF. For each of these concepts, the annotations contain carefully selected information, as well as pointers to relevant web pages and related scientific literature. An example is shown in Figure 2. This alleviates the burden of navigating the many links between existing data and the information available from the many articles in this field. The scientist neither struggles to access information related to topics within an article nor is swamped by unnecessary information that still needs disambiguation; only data and information relevant to the topic of the article are made available.
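How Utopia Documents actually locates concepts is not described here. As one simple, hypothetical way to find the character spans a reader could highlight, the Python sketch below performs dictionary-based term spotting against a lexicon that maps names to identifiers; the lexicon and function name are made up for illustration.

```python
import re


def find_terms(text, lexicon):
    """Locate lexicon terms in text for highlighting.

    lexicon maps a surface form (e.g. a receptor name) to an identifier.
    Returns (start, end, term, identifier) tuples in order of position.
    Longer terms are tried first, so a long name wins over a shorter
    name it contains.
    """
    if not lexicon:
        return []
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    return [(m.start(), m.end(), m.group(1), lexicon[m.group(1)])
            for m in pattern.finditer(text)]


# Toy lexicon; NR1I1 is the NR nomenclature name of the vitamin D receptor.
lexicon = {"vitamin D receptor": "NR1I1", "VDR": "NR1I1", "vitamin D": "ligand"}
```

On "The vitamin D receptor (VDR) binds vitamin D." this finds "vitamin D receptor", "VDR" and the final "vitamin D". Word boundaries (\b) behave poorly for names that start or end with non-word characters, so a production annotator would need a more careful tokenizer.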

Figure 1.

Screenshot of the NucleaRDB family page. The family tree is shown on the left with the thyroid hormone family expanded. On the right-hand side, the data for the selected family is shown.

Figure 2.

An impression of the Utopia Documents PDF reader interface to the NucleaRDB data. On the left-hand side, part of a scientific paper (17) annotated by the NucleaRDB is shown; annotations are available for all highlighted words. On the right-hand side, an example of such an annotation (the mutation R274A) is displayed.

Table 1.

Contents of the NucleaRDB

Data type	Number of entries
Proteins	3764
Families	123
Mutations	1543
Protein structures	613
Structure models	3764
Residues	2 012 651
Species	339

IMPLEMENTATION

The data in the NucleaRDB are stored in a PostgreSQL (www.postgresql.org) relational database, which is accessed via a Hibernate (www.hibernate.org) object-relational mapping layer. The web service interface is developed with the Apache CXF (cxf.apache.org) web services framework; we offer both Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) endpoints. The web interface is built using the Apache Wicket (wicket.apache.org) web application framework. The application runs within Sun’s GlassFish (www.glassfish.org) application server.
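As a rough illustration of how families, proteins and mutations can be related in such a relational database, the Python sketch below defines a toy schema and a join query. sqlite3 from the standard library stands in for PostgreSQL so the example runs without a server, and plain SQL replaces the Hibernate mapping layer; table and column names are hypothetical and do not describe the actual NucleaRDB schema.

```python
import sqlite3

# Toy schema for illustration only: table and column names are hypothetical,
# and sqlite3 stands in for PostgreSQL so the example runs without a server.
SCHEMA = """
CREATE TABLE family (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES family(id)   -- NULL for a top-level family
);
CREATE TABLE protein (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    family_id INTEGER NOT NULL REFERENCES family(id)
);
CREATE TABLE mutation (
    id         INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    wild_type  TEXT NOT NULL,
    position   INTEGER NOT NULL,
    mutant     TEXT NOT NULL
);
"""


def mutations_for_protein(conn, protein_name):
    """Return the mutations of one protein as strings such as 'R274A'."""
    rows = conn.execute(
        """SELECT m.wild_type, m.position, m.mutant
           FROM mutation AS m JOIN protein AS p ON p.id = m.protein_id
           WHERE p.name = ?
           ORDER BY m.position""",
        (protein_name,),
    )
    return [f"{wt}{pos}{mt}" for wt, pos, mt in rows]
```

The parent_id column lets the family table hold a tree of families and subfamilies; in a setup like the one described, an object-relational mapper would generate comparable joins from mapped entity classes instead of hand-written SQL.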

CONCLUSION

The NucleaRDB provides researchers with a single point of access to nuclear receptor-related data. Not only does it hold a large amount of information, it also provides a broad range of tools and dissemination facilities, relieving scientists of many of the tasks involved in collecting, validating and integrating diverse data.

FUNDING

BioRange program of the Netherlands Bioinformatics Centre (NBIC); BSIK grant through the Netherlands Genomics Initiative (NGI); EMBRACE project, funded by the European Commission within its FP6 Programme under the thematic area ‘Life sciences, genomics and biotechnology for health’ (contract number LHSG-CT-2004-512092); and TIPharma. Funding for open access charge: RUNMC.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Maarten Hekkelman, Wilmar Teunissen and Tim te Beek for their support with computer science issues. We thank TIPharma for financial support.

References

1. Robinson-Rechavi M, Carpentier AS, Duffraisse M, Laudet V. How many nuclear hormone receptors are there in the human genome? Trends Genet. 2001;17:554-556.
2. Mangelsdorf DJ, Thummel C, Beato M, Herrlich P, Schütz G, Umesono K, Blumberg B, Kastner P, Mark M, Chambon P. The nuclear receptor superfamily: the second decade. Cell 1995;83:835-839.
3. Kliewer SA, Lehmann JM, Willson TM. Orphan nuclear receptors: shifting endocrinology into reverse. Science 1999;284:757-760.
4. Hopkins AL, Groom CR. The druggable genome. Nat. Rev. Drug Discov. 2002;1:727-730.
5. Folkertsma S, van Noort P, Van Durme J, Joosten H-J, Bettler E, Fleuren W, Oliveira L, Horn F, de Vlieg J, Vriend G. A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain. J. Mol. Biol. 2004;341:321-335.
6. Horn F, Vriend G, Cohen FE. Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res. 2001;29:346-349.
7. Vroling B, Sanders M, Baakman C, Borrmann A, Verhoeven S, Klomp J, Oliveira L, de Vlieg J, Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res. 2011;39:D309-D319.
8. Van Durme JJJ, Bettler E, Folkertsma S, Horn F, Vriend G. NRMD: Nuclear Receptor Mutation Database. Nucleic Acids Res. 2003;31:331-333.
9. Horn F, Lau AL, Cohen FE. Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors. Bioinformatics 2004;20:557-568.
10. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009;25:1189-1191.
11. Oliveira L, Paiva ACM, Vriend G. Correlated mutation analyses on very large sequence families. Chembiochem 2002;3:1010-1017.
12. Ye K, Lameijer E-WM, Beukers MW, Ijzerman AP. A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors. Proteins 2006;63:1018-1030.
13. Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus K. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: four approaches that performed well in CASP8. Proteins 2009;77(Suppl. 9):114-122.
14. Herráez A. Biomolecules in the computer: Jmol to the rescue. Biochem. Mol. Biol. Educ. 2006;34:255-261.
15. Attwood TK, Kell DB, McDermott P, Marsh J, Pettifer SR, Thorne D. Calling international rescue: knowledge lost in literature and data landslide! Biochem. J. 2009;424:317-333.
16. Attwood TK, Kell DB, McDermott P, Marsh J, Pettifer SR, Thorne D. Utopia documents: linking scholarly literature with research data. Bioinformatics 2010;26:i568-i574.
17. Choi M, Yamamoto K, Masuno H, Nakashima K, Taga T, Yamada S. Ligand recognition by the vitamin D receptor. Bioorg. Med. Chem. 2001;9:1721-1730.