Homotypic regulatory clusters in Drosophila.
Journal: 2003/May - Genome Research
ISSN: 1088-9051
Abstract:
Cis-regulatory modules (CRMs) are transcription regulatory DNA segments (approximately 1 Kb range) that control the expression of developmental genes in higher eukaryotes. We analyzed clustering of known binding motifs for transcription factors (TFs) in over 60 known CRMs from 20 Drosophila developmental genes, and we present evidence that each type of recognition motif forms significant clusters within the regulatory regions regulated by the corresponding TF. We demonstrate how a search with a single binding motif can be applied to explore gene regulatory networks and to discover coregulated genes in the genome. We also discuss the potential of the clustering method in interpreting the differential response of genes to various levels of transcriptional regulators.
Genome Res 13(4): 579-588

Institute of Chemical Physics, Moscow, 117421 Russia; Scientific Center “Genetika,” Moscow, 113545 Russia; Department of Biology, New York University, New York, New York 10003-6688, USA
Received 2002 Jul 26; Accepted 2003 Jan 22.

One of the most intriguing questions for understanding protein-DNA recognition is how a low-abundance transcription factor (TF) quickly finds a short recognition motif at the correct place in the genome (Berg and von Hippel 1987, 1988). This important problem of TF recruitment to its functional binding site can be considered from the informational and the molecular points of view. From the informational point of view, regulatory regions must differ strikingly from the rest of the genome to facilitate the recruitment. The amount of regulatory information encoded by a single binding site is much smaller than that encoded by an array of similar binding sites, that is, a ‘homotypic’ binding site cluster. This informational advantage of the homotypic clusters might be utilized by molecular mechanisms such as high-affinity cooperative binding (Hertel et al. 1997) or lateral diffusion of a TF along the regulatory region from low- to high-affinity binding sites (Kim et al. 1987; Khory et al. 1990; Coleman and Pugh 1995). Indeed, the presence of multiple copies of binding sites of the same type in promoter and enhancer regions is a widespread phenomenon (Stanojevic et al. 1991; Arnone and Davidson 1997; Papatsenko et al. 2002). Most transcription regulatory regions, however, contain different types of binding motifs; therefore, the major efforts in exploring binding-site clustering have thus far focused on the extraction of ‘heterotypic’ clusters (clusters containing different binding motifs).
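The informational argument above can be made concrete with a small sketch. It assumes, purely for illustration, that chance motif matches occur independently with a fixed per-position probability, so that the number of matches in a window is approximately Poisson distributed; this null model, the function names, and the example numbers are assumptions and are not taken from the study.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); assumed null model for chance matches."""
    cumulative = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cumulative)

def surprise_bits(k, site_probability, window):
    """Information (in bits) carried by seeing at least k matches in a window.

    site_probability: hypothetical chance of a match starting at any position.
    window: window length in bp.
    """
    lam = site_probability * window  # expected number of chance matches
    tail = poisson_tail(k, lam)
    if tail <= 0.0:
        return math.inf
    return -math.log2(tail)

# Hypothetical numbers: one chance match per 1000 bp, 500-bp window.
single_site = surprise_bits(1, 0.001, 500)
four_sites = surprise_bits(4, 0.001, 500)
```

Under these assumed numbers, four matches in one window are far less likely by chance than a single match, which is the sense in which an array of similar sites carries more information than one site.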

Several conceptually different strategies have been employed for evaluation of clustered signals in regulatory regions. The majority of these studies describe a cluster as a series of closely spaced binding sites for a number of different TFs. The ‘fuzzy clustering’ (Pickert et al. 1998) and related algorithms (Kondrakhin et al. 1995; Wasserman and Fickett 1998) estimate the quality of the binding motifs (weaker and stronger sites) and cluster matches with similar position weight matrix (PWM) scores in a fixed window. Hidden Markov model (HMM)-based methods require parameter settings for the window size and the number of expected motifs (Crowley et al. 1997; Frith et al. 2001). R-scan algorithms assess distance distribution between PWM (or consensus) matches to binding motifs and compare cluster significance in all possible window sizes (Wagner 1997, 1999; Su et al. 2001).
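As a rough illustration of the fixed-window idea described above, the following sketch scores a sequence with a log-odds PWM, keeps matches above a score cutoff, and counts matches per window. It is a generic simplification, not a reimplementation of any of the cited algorithms; the count matrix, cutoff, pseudocount, uniform background, and function names are all hypothetical.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, cutoff):
    """Return start positions whose PWM score is at least `cutoff`.

    Assumes an uppercase sequence containing only A, C, G, and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if score >= cutoff:
            hits.append(start)
    return hits

def window_counts(hits, seq_len, window):
    """Count hits starting inside each window [pos, pos + window)."""
    n_windows = max(seq_len - window + 1, 1)
    return [sum(1 for h in hits if pos <= h < pos + window)
            for pos in range(n_windows)]

# Invented counts for a toy 4-bp motif (TAAT); they describe no real factor.
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
toy_pwm = log_odds_pwm(TOY_COUNTS)
toy_hits = scan("GGTAATCCTAATGG", toy_pwm, cutoff=6.0)
toy_profile = window_counts(toy_hits, seq_len=14, window=8)
```

Raising the cutoff keeps only stronger sites, while the window size sets the resolution at which neighboring matches are treated as one cluster; these are the two parameters that a fixed-window method has to choose in advance.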

Using R-scan algorithms, it was demonstrated that upstream segments of yeast genes contain homotypic clusters of binding sites for transcriptional regulators (Wagner 1997). In higher eukaryotes, however, many genes are regulated by distant regulatory elements, which can be well separated from the coding regions. It was recently reported by Berman et al. (2002) that the clustering of binding motifs provides sufficient information for localization of these distant cis-regulatory modules (CRMs) within the genome of Drosophila. It was also shown that even clustering of a single binding motif might provide a significant basis for evaluation of CRM sequences in developmental genes of Drosophila (Markstein et al. 2002; Papatsenko et al. 2002).

The early Drosophila developmental genes encode TFs that control pattern formation of the developing fly embryo. They form a spatiotemporal cascade of direct transcriptional interactions (Kassis 1990; Small et al. 1992; Nasiadka and Krause 1999). CRMs control the expression of these developmental genes and represent regions containing binding sites for multiple upstream factors, which are themselves the products of other developmental genes in the cascade. For many of these modules (e.g., stripe enhancers from even-skipped) (Stanojevic et al. 1991; Small et al. 1992), an extensive characterization is available at the genetic, biochemical, and evolutionary (comparison between species) levels, which makes them one of the best model systems in higher eukaryotes. Another advantage of the developmental cascade is that all genes are directly connected in this regulatory network: (1) they all encode TFs, and (2) they act as gradients in the cell-free environment of the developing fly embryo (syncytium). Therefore, these genes also represent a unique opportunity to study differential gene responses to the concentration of TFs (Driever and Nusslein-Volhard 1988a,b; Driever et al. 1989).

The main goals of the present study were to analyze the clustering of individual binding motifs in CRMs of Drosophila developmental genes and to confirm that the homotypic clusters are statistically significant in this model system. Another objective was to explore the dependence of the cluster significance and the fidelity of CRM recognition in the genome on the relative site affinity and the size of the resolution window. We also demonstrate here that the analysis of clustering can be applied for quantitative description of transcriptional regions through the relative site density and the relative site affinity.

Transcription regulatory modules and promoter sequences of higher eukaryotes contain more than one type of recognition motif. In this respect, CRM identification in the context of genome annotation appears to be most successful using heterotypic clustering models (Berman et al. 2002; Halfon et al. 2002). However, construction of biologically relevant, complex heterotypic cluster models represents a challenging problem. Here we demonstrate that homotypic clustering models, the simplest of all possible combinatorial models, can also produce very impressive results and may serve to complement the heterotypic approach. Moreover, description of each individual binding motif in the context of a homotypic cluster seems to be very convenient for biological analysis of sophisticated regulatory units.

Optimal parameters, correlation values, and the absolute cluster significance are shown for nine motifs at the point of the global correlation maximum. All maxima fall within a narrow range of optimal window sizes, with a high correlation value and very high cluster significance. In some cases we also observed the presence of local maxima (CAD, HB, FTZ, TLL) with similar correlation values, but their corresponding window sizes do not agree with the range found for the rest of the data (500–600 bp). We estimated cluster frequency (the probability of finding a cluster at any given position of the genome) as the product of the cluster E-value cutoff (conditional probability, see Methods) and the site E-value cutoff. The last column shows the estimated cluster significance for a locus sequence (25 Kb) and accounts for multiple independent statistical tests performed simultaneously (Bonferroni correction). The cluster significance reflects the probability that a given locus sequence (25 Kb) will contain a cluster by chance.
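The arithmetic behind the last two columns can be sketched as follows. The product of the two cutoffs follows the description above; treating every position of the 25-Kb locus as one independent test for the Bonferroni correction is an assumption made here for illustration, as are the function names and the example cutoff values.

```python
import math

def cluster_frequency(site_cutoff, cluster_cutoff):
    """Chance of a cluster at any one position: product of the two cutoffs."""
    return site_cutoff * cluster_cutoff

def locus_significance(frequency, n_tests):
    """Bonferroni-corrected chance of at least one cluster among n_tests tests.

    The correction multiplies the per-test probability by the number of
    tests and caps the result at 1.
    """
    return min(1.0, frequency * n_tests)

# Hypothetical cutoffs, not values from the table.
frequency = cluster_frequency(site_cutoff=1e-3, cluster_cutoff=1e-6)
significance = locus_significance(frequency, n_tests=25_000)  # 25-Kb locus
```

The cap at 1 matters only for permissive cutoffs, where the corrected value would otherwise exceed a valid probability.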

Correlation coefficients (CC) are shown for some of the tested motif/locus combinations. High CC reflects the presence of a binding motif cluster in its cognate CRM region. A moderate CC value indicates the presence of additional clusters of the motif in the locus or shift of binding site clusters relative to the regulatory regions. Motifs tested and the optimal parameter settings are shown in the top rows; gene names are given in the first column. In all tests, the size of the resolution window was set to 575 bp, according to the results of the global optimization procedure (see Table 1). All combinations tested correspond to known regulatory interactions.
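One way to read these correlation values is sketched below. The excerpt does not define how CC is computed, so the sketch assumes a per-position Pearson correlation between two binary tracks over the locus, one marking predicted cluster positions and one marking annotated CRM positions. The interval coordinates and function names are hypothetical.

```python
import math

def binary_track(length, intervals):
    """Mark positions covered by half-open intervals [start, end) with 1."""
    track = [0] * length
    for start, end in intervals:
        for i in range(max(start, 0), min(end, length)):
            track[i] = 1
    return track

def correlation(x, y):
    """Pearson correlation of two equal-length tracks (0.0 if either is flat)."""
    if len(x) != len(y):
        raise ValueError("tracks must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Toy 10-position locus with one annotated CRM at [2, 6).
crm = binary_track(10, [(2, 6)])
exact = correlation(crm, binary_track(10, [(2, 6)]))    # cluster on the CRM
shifted = correlation(crm, binary_track(10, [(4, 8)]))  # cluster shifted by 2
```

In this toy setting a cluster that coincides with the CRM gives the maximal value, while a shifted cluster or an extra cluster elsewhere in the locus lowers it, which matches the qualitative reading of high and moderate CC given above.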

Acknowledgments

We thank Ben Berman for sharing his unpublished results and for stimulating discussion. We thank Stephen Small for his ideas regarding differential gene response and his suggestion to explore this issue using the example of the even-skipped stripe 4+6 and 3+7 enhancers. We also thank Claude Desplan and Bud Mishra for careful reading of the manuscript and helpful discussion. This work was supported by a grant from the NIH/National Eye Institute (EY13010) to Claude Desplan. V.M. and A.L. were also supported in part by grants from the Ludwig Institute for Cancer Research, HHMI East Europe (#55000309), and RFBR (#02-04-49111).

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Notes

E-MAIL dap5@nyu.edu; FAX (212) 995-4710.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.668403.


REFERENCES


  • 1. Arnone, M.I. and Davidson, E.H. 1997. The hardwiring of development: Organization and function of genomic regulatory systems. Development 124: 1851-1864.
  • 2. Berg, O.G. and von Hippel, P.H. 1987. Selection of DNA binding sites by regulatory proteins: Statistical mechanical theory and application to operators and promoters. J. Mol. Biol. 193: 723-750.
  • 3. Berg, O.G. and von Hippel, P.H. 1988. Selection of DNA binding sites by regulatory proteins [published erratum appears in Trends Biochem. Sci. 1988 Aug;13(8):301]. Trends Biochem. Sci. 13: 207-211.
  • 4. Berman, B.P., Nibu, Y., Pfeiffer, B.D., Tomancak, P., Celniker, S.E., Levine, M., Rubin, G.M., and Eisen, M.B. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. 99: 757-762.
  • 5. Coleman, R.A. and Pugh, B.F. 1995. Evidence for functional binding and stable sliding of the TATA binding protein on nonspecific DNA. J. Biol. Chem. 270: 13850-13859.
  • 6. Crowley, E.M., Roeder, K., and Bina, M. 1997. A statistical model for locating regulatory regions in genomic DNA. J. Mol. Biol. 268: 8-14.
  • 7. Driever, W. and Nusslein-Volhard, C. 1988a. The bicoid protein determines position in the Drosophila embryo in a concentration-dependent manner. Cell 54: 95-104.
  • 8. Driever, W. and Nusslein-Volhard, C. 1988b. A gradient of bicoid protein in Drosophila embryos. Cell 54: 83-93.
  • 9. Driever, W., Thoma, G., and Nusslein-Volhard, C. 1989. Determination of spatial domains of zygotic gene expression in the Drosophila embryo by the affinity of binding sites for the bicoid morphogen. Nature 340: 363-367.
  • 10. Eldon, E.D. and Pirrotta, V. 1991. Interactions of the Drosophila gap gene giant with maternal and zygotic pattern-forming genes. Development 111: 367-378.
  • 11. Fortini, M.E. and Rubin, G.M. 1990. Analysis of cis-acting requirements of the Rh3 and Rh4 genes reveals a bipartite organization to rhodopsin promoters in Drosophila melanogaster. Genes & Dev. 4: 444-463.
  • 12. Frith, M.C., Hansen, U., and Weng, Z. 2001. Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics 17: 878-889.
  • 13. Fujioka, M., Emi-Sarker, Y., Yusibova, G.L., Goto, T., and Jaynes, J.B. 1999. Analysis of an even-skipped rescue transgene reveals both composite and discrete neuronal and early blastoderm enhancers, and multistripe positioning by gap gene repressor gradients. Development 126: 2527-2538.
  • 14. Gao, Q. and Finkelstein, R. 1998. Targeting gene expression to the head: The Drosophila orthodenticle gene is a direct target of the Bicoid morphogen. Development 125: 4185-4193.
  • 15. Gao, Q., Wang, Y., and Finkelstein, R. 1996. Orthodenticle regulation during embryonic head development in Drosophila. Mech. Dev. 56: 3-15.
  • 16. Guigó, R., Agarwal, P., Abril, J.F., Burset, M., and Fickett, J.W. 2000. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10: 1631-1642.
  • 17. Halfon, M.S., Grad, Y., Church, G.M., and Michelson, A.M. 2002. Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12: 1019-1028.
  • 18. Hertel, K.J., Lynch, K.W., and Maniatis, T. 1997. Common themes in the function of transcription and splicing enhancers. Curr. Opin. Cell Biol. 9: 350-357.
  • 19. Kassis, J.A. 1990. Spatial and temporal control elements of the Drosophila engrailed gene. Genes & Dev. 4: 433-443.
  • 20. Khory, A.M., Lee, H.J., Lillis, M., and Lu, P. 1990. Lac repressor-operator interaction: DNA length dependence. Biochim. Biophys. Acta 1087: 55-60.
  • 21. Kim, J.G., Takeda, Y., Matthews, B.W., and Anderson, W.F. 1987. Kinetic studies on Cro repressor-operator DNA interaction. J. Mol. Biol. 196: 149-158.
  • 22. Kolpakov, F.A., Ananko, E.A., Kolesov, G.B., and Kolchanov, N.A. 1998. GeneNet: A gene network database and its automated visualization. Bioinformatics 14: 529-537.
  • 23. Kondrakhin, Y.V., Kel, A.E., Kolchanov, N.A., Romashchenko, A.G., and Milanesi, L. 1995. Eukaryotic promoter recognition by binding sites for transcription factors. Comput. Appl. Biosci. 11: 477-488.
  • 24. Kosman, D. and Small, S. 1997. Concentration-dependent patterning by an ectopic expression domain of the Drosophila gap gene knirps. Development 124: 1343-1354.
  • 25. Kraut, R. and Levine, M. 1991. Spatial regulation of the gap gene giant during Drosophila development. Development 111: 601-609.
  • 26. Ludwig, M.Z., Patel, N.H., and Kreitman, M. 1998. Functional analysis of eve stripe 2 enhancer evolution in Drosophila: Rules governing conservation and change. Development 125: 949-958.
  • 27. Ludwig, M.Z., Bergman, C., Patel, N.H., and Kreitman, M. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403: 564-567.
  • 28. Markstein, M., Markstein, P., Markstein, V., and Levine, M.S. 2002. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. 99: 763-768.
  • 29. Mismer, D. and Rubin, G.M. 1989. Definition of cis-acting elements regulating expression of the Drosophila melanogaster ninaE opsin gene by oligonucleotide-directed mutagenesis. Genetics 121: 77-87.
  • 30. Nasiadka, A. and Krause, H.M. 1999. Kinetic analysis of segmentation gene interactions in Drosophila embryos. Development 126: 1515-1526.
  • 31. Papatsenko, D., Nazina, A., and Desplan, C. 2001. A conserved regulatory element present in all Drosophila rhodopsin genes mediates Pax6 functions and participates in the fine-tuning of cell-specific expression. Mech. Dev. 101: 143-153.
  • 32. Papatsenko, D.A., Makeev, V.J., Lifanov, A.P., Regnier, M., Nazina, A.G., and Desplan, C. 2002. Extraction of functional binding sites from unique regulatory regions: The Drosophila early developmental enhancers. Genome Res. 12: 470-481.
  • 33. Pickert, L., Reuter, I., Klawonn, F., and Wingender, E. 1998. Transcription regulatory region analysis using signal detection and fuzzy clustering. Bioinformatics 14: 244-251.
  • 34. Rubin, G.M. and Lewis, E.B. 2000. A brief history of Drosophila's contributions to genome research. Science 287: 2216-2218.
  • 35. Sackerson, C., Fujioka, M., and Goto, T. 1999. The even-skipped locus is contained in a 16-kb chromatin domain. Dev. Biol. 211: 39-52.
  • 36. Serov, V.N., Spirov, A.V., and Samsonova, M.G. 1998. Graphical interface to the genetic network database GeNet. Bioinformatics 14: 546-547.
  • 37. Small, S., Blair, A., and Levine, M. 1992. Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 11: 4047-4057.
  • 38. Small, S., Blair, A., and Levine, M. 1996. Regulation of two pair-rule stripes by a single enhancer in the Drosophila embryo. Dev. Biol. 175: 314-324.
  • 39. Stanojevic, D., Small, S., and Levine, M. 1991. Regulation of a segmentation stripe by overlapping activators and repressors in the Drosophila embryo. Science 254: 1385-1387.
  • 40. Su, X., Wallenstein, S., and Bishop, D. 2001. Nonoverlapping clusters: Approximate distribution and application to molecular biology. Biometrics 57: 420-426.
  • 41. Wagner, A. 1997. A computational genomics approach to the identification of gene networks. Nucleic Acids Res. 25: 3594-3604.
  • 42. Wagner, A. 1999. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 15: 776-784.
  • 43. Wasserman, W.W. and Fickett, J.W. 1998. Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278: 167-181.
  • 44. Waterman, M.S. 1995. Introduction to computational biology. Chapman & Hall, CRC Press LLC, Boca Raton, FL.