Molecular evolution and population genetics of two Drosophila mettleri cytochrome P450 genes involved in host plant utilization.
Journal: 2008/September - Molecular Ecology
ISSN: 1365-294X
Abstract:
Understanding the genetic basis of adaptation is one of the primary goals of evolutionary biology. The evolution of xenobiotic resistance in insects has proven to be an especially suitable arena for studying the genetics of adaptation, and resistant phenotypes are known to result from both coding and regulatory changes. In this study, we examine the evolutionary history and population genetics of two Drosophila mettleri cytochrome P450 genes that are putatively involved in the detoxification of alkaloids present in two of its cactus hosts: saguaro (Carnegiea gigantea) and senita (Lophocereus schottii). Previous studies demonstrated that Cyp28A1 was highly up-regulated following exposure to rotting senita tissue while Cyp4D10 was highly up-regulated following exposure to rotting saguaro tissue. Here, we show that a subset of sites in Cyp28A1 experienced adaptive evolution specifically in the D. mettleri lineage. Moreover, neutrality tests in several populations were also consistent with a history of selection on Cyp28A1. In contrast, we did not find evidence for positive selection on Cyp4D10, although this certainly does not preclude its involvement in host plant use. A surprising result that emerged from our population genetic analyses was the presence of significant genetic differentiation between flies collected from different host plant species (saguaro and senita) at Organ Pipe National Monument, Arizona, USA. This preliminary evidence suggests that D. mettleri may have evolved into distinctive host races that specialize on different hosts, a possibility that warrants further investigation.
Relations:
Content
Citations
(8)
References
(40)
Drugs
(1)
Chemicals
(3)
Organisms
(3)
Processes
(5)
Affiliates
(1)
Similar articles
Articles by the same authors
Discussion board
Mol Ecol 17(13): 3211-3221

Molecular evolution and population genetics of two <em>Drosophila mettleri</em> cytochrome P450 genes involved in host plant utilization

Introduction

Understanding the genetic basis of adaptation is one of the primary goals of evolutionary biology. Available evidence suggests that adaptive evolution is driven by changes in both the regulation and structure of genes, though the relative importance of these two mechanisms is an unresolved issue (Hoekstra &amp; Coyne, 2007; Wray, 2007). One of the difficulties associated with identifying the genetic basis of an adaptation is that it requires not only ecological information on how a particular phenotype is adaptive, but also on the genetic mechanisms, either structural or regulatory, that underlie the phenotype. In many systems this link between genetics and phenotype is elusive, usually because of challenges associated with identifying specific genes responsible for a particular phenotype (Vasemagi &amp; Primmer, 2005).

The evolution of xenobiotic resistance in insects, which includes resistance to plant alleleochemicals and insecticides, has proven to be an excellent system for investigating the genetic basis of adaptation. Insect genes involved in detoxification typically belong to three gene superfamilies: esterases, cytochrome P450 monooxygenases (P450s), and glutathione S-transferases (Despres et al., 2007; Li et al., 2007). The identification of candidate genes involved in resistance is further facilitated by the fact that detoxification genes are sometimes transcriptionally induced by xenobiotics or other chemicals, thus making the connection between an adaptive phenotype and the genetic mechanisms underlying it more apparent. Indeed, a number of genes involved in xenobiotic resistance have been identified, and resistant phenotypes have been attributed to several genetic mechanisms including sequence amplifications, regulatory changes, and mutations in coding regions (Li et al., 2007).

Several species of Drosophila that are distributed throughout the deserts of southwestern North America and northern Mexico use rotting cacti as their main breeding substrate. The chemical composition of cactus rots includes a number of highly toxic compounds at concentrations that are intolerable to nonresident fly species (Kircher et al., 1967; Frank &amp; Fogleman, 1992; Fogleman, 2000), implying that flies obligately associated with necrotic cactus have evolved mechanisms of resistance. The range of cactus hosts and their unique chemical compositions, coupled with diverse patterns of host specialization among cactophilic flies, makes the Sonoran Desert Drosophila system especially suitable for studying the genetic basis of adaptation (Fogleman et al., 1998; Fogleman &amp; Danielson, 2001). In fact, several recent studies on D. mojavensis have implicated a number of detoxification genes in the ability of this species to tolerate the toxic environments of its hosts (Matzkin &amp; Eanes, 2003; Matzkin, 2004; Matzkin, 2005; Matzkin et al., 2006; Matzkin, 2008).

Drosophila mettleri is a cactophilic fly that has an unusual preference for breeding in soil soaked with cactus rot exudates rather than in the rotting tissue itself. This behavioral preference means that, due to water loss through high evaporation rates in the desert, larvae of this species are exposed to concentration of toxic compounds that can be an order of magnitude higher than that experienced by other cactophilic species (Fogleman et al., 1982; Meyer &amp; Fogleman, 1987). Although D. mettleri is tolerant of the toxic compounds present in all the major cactus hosts (Fogleman et al., 1998), in nature the main hosts are saguaro (Carnegiea gigantea), cardón (Pachycereus pringlei), and to a lesser extent senita (Lophocereus schottii). Saguaro is present only on the mainland, while cardón and senita are found on the mainland and Baja peninsula. A population on Santa Catalina Island (off the coast of southern California), where none of the mainland columnar hosts occur, uses prickly pear (Opuntia littoralis) exclusively. The primary toxins in all the major hosts of D. mettleri are isoquinoline alkaloids, though the specific alkaloids and their concentrations vary for the different plant species (Kircher, 1982). Most notably, the senita alkaloids lophocerine and pilocereine constitute up to 20% of dry tissue weight (compared to 1-2% for the alkaloids cargenine and gigantine in saguaro tissue), making senita especially toxic (Kircher, 1969; Kircher, 1982).

Several lines of evidence demonstrate the involvement of cytochrome P450s in the ability of D. mettleri to detoxify alkaloids present in saguaro and senita (Fogleman et al., 1998). First, experiments using the P450 inhibitor piperonyl butoxide (PBO) showed a complete loss of larval viability in substrates with alkaloids and PBO (Frank &amp; Fogleman, 1992). Subsequent in vitro experiments revealed significant induction of alkaloid detoxification metabolism upon exposure to senita, which was reduced in the presence of PBO (Frank &amp; Fogleman, 1992; Danielson et al., 1994). Most convincingly, Danielson et al. (1997) and Danielson et al. (1998) demonstrated that a small number of D. mettleri P450s were upregulated in response to saguaro and senita alkaloids. Of these five genes, three were only moderately induced, one of which was cross-induced by both saguaro and senita alkaloids. In contrast, one gene (Cyp4D10) was strongly induced only by saguaro alkaloids (Danielson et al., 1998) and another (Cyp28A1) was strongly induced by senita but not saguaro alkaloids (Danielson et al., 1997). Together, these studies suggest that alkaloid tolerance in D. mettleri may result from the collective actions of multiple P450s including a few “generalist” genes that respond to a variety of compounds, and, perhaps more importantly, “specialists” that respond only to a specific compound. Danielson et al. (1998) argued that these narrowly responsive genes represent the best candidates for direct involvement in the detoxification of cactus compounds.

These studies clearly demonstrate the important role of gene regulation in adaptation to cactus hosts. In this study, we ask whether changes in the coding sequence of the putative “specialist” P450s (Cyp4D10 and Cyp28A1) may also play a role in cactus host detoxification. Specifically, we examined whether these two genes show signatures of positive selection in their protein coding regions, as might be expected given their apparent specialized role in detoxification of alkaloids from specific host plants, and given that D. mettleri's closest relatives, D. micromettleri and D. eremophila, use different host plant species.

In order to investigate potential population specific patterns of selection, we also collected sequence data for both genes from five populations spanning the mainland, Baja peninsula, and Santa Catalina Island. These populations are not only geographically separated, but also differ in the species of host plants that are available, which could have important consequences for patterns of selection on these two putative “specialist” P450 genes. Most notably, given that saguaro and senita are not available on Santa Catalina Island and that this population is genetically distinct from all other populations, (Markow et al., 2002; Hurtado et al., 2004) it is possible that Cyp4D10 and Cyp28A1 have experienced a unique evolutionary history in this location that might be reflected in current patterns of variation. In addition, the absence of saguaro on the Baja peninsula (where cardón is the main host) could result in divergent selection pressures on Cyp4D10 (the gene induced by saguaro), though this potential might be mitigated by the chemical similarity of saguaro and cardón (Frank &amp; Fogleman, 1992), and by substantial gene flow between some mainland and Baja populations (Hurtado et al., 2004). Finally, in one population (Organ Pipe National Monument, Arizona, USA), we collected flies directly from senita and saguaro necroses, which allowed us to test for genetic differentiation between individuals collected from different hosts in the same location.

Materials and Methods

Locations sampled

We used D. mettleri samples collected from five localities: Guaymas (mainland Sonora, Mexico), Bahia Concepción (Baja peninsula, Mexico), Punta Conejo (Baja peninsula, Mexico), Santa Catalina Island (California, USA), and Organ Pipe National Monument (OPNM, Arizona, USA) (Figure 1). Samples from the first four localities were a subset of those from an earlier mtCOI study (Hurtado et al., 2004), but the OPNM samples were not included in the previous study. We collected flies using traps with rotting cardón cactus (Guaymas, Bahia Concepción, Punta Conejo) or fermented bananas (Santa Catalina island), or, in the case of OPNM, flies were collected directly from cactus necroses. At OPNM several flies were collected from four different senita necroses located in the area known as Senita Basin, while other flies were collected from a saguaro necrosis approximately 3.5 km away.

An external file that holds a picture, illustration, etc.
Object name is nihms65258f1.jpg

Map of populations sampled and host plant species present at each location. (1) Santa Catalina Island, (2) Organ Pipe National Monument, (3) Guaymas, (4) Bahia Concepción, (5) Punta Conejo. (A) senita (L. schottii), (B) saguaro (C. gigantea), (C) cardón (P. pringlei), (D) Prickly pear (Opuntia littoralis).

DNA extractions, PCR, and sequencing

We used the DNEasy kit (Qiagen, Inc., Valencia, CA), following manufacturer protocols, to extract total DNA from whole flies. Using sequences posted on GenBank, we designed primers that amplified multiple overlapping fragments to obtain nearly complete sequences for Cyp4D10 and Cyp28A1 using the polymerase chain reaction (PCR). The products were then sequenced with an ABI 3700 analyzer. Information on primers used for PCR and sequencing is given in supplementary table 1, and the numbers of flies sequenced for each population and each gene are reported in table 1. We also obtained sequences for one individual from each of two outgroup species, D. micromettleri (Drosophila Tucson stock center id: 15081-1346.02) and D. eremophila (Drosophila Tucson stock center id: 15081-1292.00). Drosophila mettleri, D. micromettleri, and D. eremophila form a monophyletic group within the repleta group, but the relationships among these species are somewhat uncertain. Data from Durando et al. (2000) suggest that D. mettleri and D. micromettleri are sister species, with D. eremophila as the outgroup. However, the sister relationship between D. mettleri and D. micromettleri is only moderately well-supported (bootstrap support = 75), and data from our study also produced ambiguous results (data not shown). Whatever the true relationships among these species, it is clear that speciation events were not particularly recent. Averaging the average number of nucleotide substitutions per site (Nei, 1987) for the two genes used in our study revealed a considerable amount of divergence between all three species (Dxy = 0.163 for D. mettleri and D. micromettleri; Dxy = 0.172 for D. mettleri and D. eremophila; and Dxy = 0.157 for D. micromettleri and D. eremophila). In order to amplify Cyp4D10 and Cyp28A1 in the D. micromettleri and D. eremophila, we had to place primers within the coding region, thus resulting in a small amount of sequence loss. Again, we used primers that amplified multiple overlapping fragments, some of which were cloned (see supplementary table 1) using the TOPO TA® cloning kit (Invitrogen), in order to obtain clean sequence.

Table 1

Number of individuals sequenced from each locality for two genes: Cyp4D10 and Cyp28A1.

LocalityCyp4D10Cyp28A1
Guaymas1111
OPNM saguaro109
OPNM senita1010
Bahia Concepción1211
Punta Conejo1315
Santa Catalina1212

By querying D. mettleri Cyp4D10 and Cyp28A1 sequences against the D. mojavensis genome database, we determined the probable chromosomal locations of the two P450 genes. From this we inferred that Cyp4D10 is X-linked, which is consistent with the fact that none of the males in our dataset were heterozygotes. In contrast, Cyp28A1 is autosomal, being found on the third chromosome in D. mojavensis (Muller element B). We used Sequencher ver. 4.1 (Gene Codes Corp.) to initially align and edit sequences. All Cyp4D10 sequences were truncated to 2073 bp (entire sequence: 2142 bp), which included two introns (596 bp and 68 bp in length). All Cyp28A1 sequences were truncated to 1774 bp (entire sequence: 1817 bp), which included five introns (60 bp, 59 bp, 58 bp, 58 bp, 57 bp in length). Truncated sequences for both genes corresponded to the length of sequences obtained for the outgroups (no polymorphisms in either gene were observed in the regions cut from D. mettleri). Because we had heterozygotes in our dataset and some of the subsequent analyses required that linkage phase be known (frequency spectrum tests), we used the ELB algorithm (Excoffier et al., 2003) implemented in Arlequin ver. 3.01 (Excoffier et al., 2005) to reconstruct haplotypes. We randomly selected one reconstructed haplotype for each individual and used these in all subsequent analyses. While all tests comparing polymorphism and divergence would be robust to errors in haplotype reconstruction, tests based on the frequency spectrum would most likely be sensitive to errors, but it is unclear what sort of bias, if any, this might introduce into these analyses. A file with the reconstructed haplotypes is found in the supplementary material. All sequences have been deposited in GenBank under accession numbers {"type":"entrez-nucleotide-range","attrs":{"text":"EU658939-EU659078","start_term":"EU658939","end_term":"EU659078","start_term_id":"193503632","end_term_id":"193503910"}}EU658939-EU659078.

Population structure

We used Arlequin ver. 3.01 (Excoffier et al., 2005) to calculate pairwise FST for all pairs of populations separately for each locus. Significance was assessed using the permutation procedure available in Arlequin (1000 permutations). We calculated FST in three ways: (1) with introns only, (2) with synonymous sites only (in case of selective constraint on introns), and (3) with nonsynonymous sites only. The rationale behind this was that discrepancies in inferred structure between noncoding/synonymous and nonsynonymous sites could help to determine whether population structure resulted from selection for different protein haplotypes in different populations, or whether population structure resulted from neutral changes. For example, strong structure at nonsynonymous, but not at noncoding or synonymous sites might indicate divergent selection in different populations, while structure at noncoding or synonymous, but not nonsynonymous sites might indicate selective constraint on protein structure.

Neutrality tests based on the frequency spectrum

We used population data to conduct three different tests of selective neutrality for each locus: Tajima's D (Tajima, 1989), Fu and Li's D, and Fu and Li's F (Fu &amp; Li, 1993). These tests analyze the allele frequency spectrum within populations in order to detect departures from expectations under selective neutrality. All tests were conducted with the computer program DnaSP v. 4.10 (Rozas et al., 2003). Given the significant structure we observed between populations in this study and in the Hurtado et al. (2004) study, we conducted separate analyses for each population. We used D. micromettleri as an outgroup for calculation of Fu and Li's D and Fu and Li's, F. Significance was assessed using 10,000 coalescent simulations using the Hudson's recombination parameter (Hudson, 1997) estimated in DnaSP. Significant tests provide support for departures from a standard neutral model, but are unable to differentiate between the effects of selection and demography. Thus, while significantly negative values may be the result of a recent selective sweep, they may also reflect a recent population expansion following a bottleneck. Likewise, significantly positive values may indicate balancing selection, but could also result from a decrease in population size. The presence of population structure could skew the frequency spectrum in either direction depending on how the subpopulations are sampled. The effects of demographic processes such as bottlenecks are expected to be particularly evident in X-linked genes (Cyp4D10 in this study), because of the lower effective populations size for such loci (Wall et al., 2002).

To further investigate what processes might underlie significant departures from neutral expectations detected by these tests, we performed the three tests separately on synonymous and nonsynonymous sites. The impetus for this approach is that events such as population expansion or a selective sweep should have similar effects on both synonymous and nonsynonymous variation, while a process such as weak purifying selection would be expected to result in a negatively skewed frequency spectrum for nonsynonymous but not synonymous sites (Hahn et al., 2002). We used a heterogeneity test (Hahn et al., 2002) to formally evaluate whether the frequency spectra of nonsynonymous and synonymous sites differed (using 5,000 coalescent simulations).

Neutrality tests based on polymorphism and divergence

We also used our population data to conduct the Hudson, Kreitman, Aguade (HKA) test (Hudson et al., 1987), which compares the ratio of intraspecific polymorphism to interspecific divergence for at least two loci. Under neutrality this ratio is expected to be the same for different genes, and thus any departures from this pattern can provide evidence for selection. By comparing multiple loci, the HKA test is able to better differentiate between the effects of selection and demography because population-level processes should affect all genes equally, while selection is expected to be locus-specific. We used the HKA program (http://lifesci.rutgers.edu/∼heylab/HeylabSoftware.htm) to conduct the test separately for Cyp4D10 and Cyp28A1, in both cases using mtCOI as the second locus (assumed to be evolving neutrally; but see Ballard &amp; Rand, 2005), and D. micromettleri as the outgroup. Sequences for mtCOI were obtained from the Hurtado et al. (2004) dataset (666 bp), with the exception of the OPNM and Punta Conejo populations, for which we sequenced 594 bp and 630 bp respectively (GenBank Accession numbers: EU659079-659106) using primers reported in Folmer et al. (1994). We designated Cyp4D10 as X-linked and Cyp28A1 as autosomal in this analysis.

To further evaluate whether the two P450 genes showed evidence of selection, we also conducted the McDonald-Kreitman (MK) test (McDonald &amp; Kreitman, 1991), which uses population data from the focal species and at least one outgroup to compare the ratio of nonsynonymous to synonymous polymorphisms within a species to that same ratio for fixed differences between species. An excess of nonsynonymous fixed differences between species provides evidence for positive selection, while an excess of nonsynonymous polymorphism is consistent with balancing selection. This test is less sensitive to demographic assumptions than tests based on the frequency spectrum (Nielsen, 2005), although the presence of slightly deleterious mutations in an expanding population can produce a pattern that is similar to that seen for positive selection (Eyre-Walker, 2002). Because few stocks of D. micromettleri and D. eremophila were available, we were unable to gather polymorphism data for either outgroup species. We thus conducted a lineage-specific test using both outgroup sequences to identify changes that occurred specifically in the D. mettleri lineage. Given this strategy, uncertainties in the relationships among the focal species would only affect our analysis if D. mettleri is, in fact, the outgroup to D. micromettleri and D. eremophila. Considering this possibility, we also report the results of the standard MK tests using D. mettleri and each of the two other species as separate tests, which would not rely on the tree topology being correct.

Branch-site test of positive selection

We also tested for evidence of positive selection using a comparative approach based on comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dn/ds or ω). Specifically, we conducted the improved branch-site test for positive selection (Zhang et al., 2005) using the codeml program available in the Phylogenetic Analysis by Maximum Likelihood Package ver. 3.14 (Yang, 1997). This analysis tests the a priori hypothesis that some sites in a designated lineage of a multispecies alignment have been under positive selection (i.e. ω > 1). In our analysis, we tested the a priori hypothesis that some sites in the D. mettleri lineage had experienced positive selection using branch-site model A as the alternative hypothesis and branch-site model A with ω2 = 1 as the null hypothesis.

We used the Clustal platform in the computer program BioEdit to generate multispecies alignments for each gene including sequences from as many Drosophila species with full genome sequence (Drosophila 12 Genome Consortium, 2007) as we could confidently align. For Cyp4D10 this included sequences from D. mettleri, D. micromettleri, D. eremophila, D. mojavensis, D. virilis and D. grimshawi (sequences from other species were difficult to align and therefore excluded). For D. mojavensis, our BLAST produced hits to two genes, so both were included in the alignment (phylogenetic analysis indicated that these two genes were most closely related to each other and thus probably represent a duplication event specific to the D. mojavensis lineage). For Cyp28A1, our alignment included: D. mettleri, D. micromettleri, D. eremophila, D. mojavensis, D. virilis, D. grimshawi (two apparent paralogs), D. willistoni, D. pseudoobscura, D. erecta, D. yakuba, D. sechellia, D. simulans, and D. melanogaster. In order to account for the uncertainty in the phylogenetic relationships among D. mettleri, D. micromettleri, and D. eremophila we performed the analysis with this group coded as a tritomy and also separately with each of the three alternative groupings. All analyses were conducted using the “clean data” setting, which excludes from the analysis sites with gaps in the alignment.

Locations sampled

We used D. mettleri samples collected from five localities: Guaymas (mainland Sonora, Mexico), Bahia Concepción (Baja peninsula, Mexico), Punta Conejo (Baja peninsula, Mexico), Santa Catalina Island (California, USA), and Organ Pipe National Monument (OPNM, Arizona, USA) (Figure 1). Samples from the first four localities were a subset of those from an earlier mtCOI study (Hurtado et al., 2004), but the OPNM samples were not included in the previous study. We collected flies using traps with rotting cardón cactus (Guaymas, Bahia Concepción, Punta Conejo) or fermented bananas (Santa Catalina island), or, in the case of OPNM, flies were collected directly from cactus necroses. At OPNM several flies were collected from four different senita necroses located in the area known as Senita Basin, while other flies were collected from a saguaro necrosis approximately 3.5 km away.

An external file that holds a picture, illustration, etc.
Object name is nihms65258f1.jpg

Map of populations sampled and host plant species present at each location. (1) Santa Catalina Island, (2) Organ Pipe National Monument, (3) Guaymas, (4) Bahia Concepción, (5) Punta Conejo. (A) senita (L. schottii), (B) saguaro (C. gigantea), (C) cardón (P. pringlei), (D) Prickly pear (Opuntia littoralis).

DNA extractions, PCR, and sequencing

We used the DNEasy kit (Qiagen, Inc., Valencia, CA), following manufacturer protocols, to extract total DNA from whole flies. Using sequences posted on GenBank, we designed primers that amplified multiple overlapping fragments to obtain nearly complete sequences for Cyp4D10 and Cyp28A1 using the polymerase chain reaction (PCR). The products were then sequenced with an ABI 3700 analyzer. Information on primers used for PCR and sequencing is given in supplementary table 1, and the numbers of flies sequenced for each population and each gene are reported in table 1. We also obtained sequences for one individual from each of two outgroup species, D. micromettleri (Drosophila Tucson stock center id: 15081-1346.02) and D. eremophila (Drosophila Tucson stock center id: 15081-1292.00). Drosophila mettleri, D. micromettleri, and D. eremophila form a monophyletic group within the repleta group, but the relationships among these species are somewhat uncertain. Data from Durando et al. (2000) suggest that D. mettleri and D. micromettleri are sister species, with D. eremophila as the outgroup. However, the sister relationship between D. mettleri and D. micromettleri is only moderately well-supported (bootstrap support = 75), and data from our study also produced ambiguous results (data not shown). Whatever the true relationships among these species, it is clear that speciation events were not particularly recent. Averaging the average number of nucleotide substitutions per site (Nei, 1987) for the two genes used in our study revealed a considerable amount of divergence between all three species (Dxy = 0.163 for D. mettleri and D. micromettleri; Dxy = 0.172 for D. mettleri and D. eremophila; and Dxy = 0.157 for D. micromettleri and D. eremophila). In order to amplify Cyp4D10 and Cyp28A1 in the D. micromettleri and D. eremophila, we had to place primers within the coding region, thus resulting in a small amount of sequence loss. Again, we used primers that amplified multiple overlapping fragments, some of which were cloned (see supplementary table 1) using the TOPO TA® cloning kit (Invitrogen), in order to obtain clean sequence.

Table 1

Number of individuals sequenced from each locality for two genes: Cyp4D10 and Cyp28A1.

LocalityCyp4D10Cyp28A1
Guaymas1111
OPNM saguaro109
OPNM senita1010
Bahia Concepción1211
Punta Conejo1315
Santa Catalina1212

By querying D. mettleri Cyp4D10 and Cyp28A1 sequences against the D. mojavensis genome database, we determined the probable chromosomal locations of the two P450 genes. From this we inferred that Cyp4D10 is X-linked, which is consistent with the fact that none of the males in our dataset were heterozygotes. In contrast, Cyp28A1 is autosomal, being found on the third chromosome in D. mojavensis (Muller element B). We used Sequencher ver. 4.1 (Gene Codes Corp.) to initially align and edit sequences. All Cyp4D10 sequences were truncated to 2073 bp (entire sequence: 2142 bp), which included two introns (596 bp and 68 bp in length). All Cyp28A1 sequences were truncated to 1774 bp (entire sequence: 1817 bp), which included five introns (60 bp, 59 bp, 58 bp, 58 bp, 57 bp in length). Truncated sequences for both genes corresponded to the length of sequences obtained for the outgroups (no polymorphisms in either gene were observed in the regions cut from D. mettleri). Because we had heterozygotes in our dataset and some of the subsequent analyses required that linkage phase be known (frequency spectrum tests), we used the ELB algorithm (Excoffier et al., 2003) implemented in Arlequin ver. 3.01 (Excoffier et al., 2005) to reconstruct haplotypes. We randomly selected one reconstructed haplotype for each individual and used these in all subsequent analyses. While all tests comparing polymorphism and divergence would be robust to errors in haplotype reconstruction, tests based on the frequency spectrum would most likely be sensitive to errors, but it is unclear what sort of bias, if any, this might introduce into these analyses. A file with the reconstructed haplotypes is found in the supplementary material. All sequences have been deposited in GenBank under accession numbers {"type":"entrez-nucleotide-range","attrs":{"text":"EU658939-EU659078","start_term":"EU658939","end_term":"EU659078","start_term_id":"193503632","end_term_id":"193503910"}}EU658939-EU659078.

Population structure

We used Arlequin ver. 3.01 (Excoffier et al., 2005) to calculate pairwise FST for all pairs of populations separately for each locus. Significance was assessed using the permutation procedure available in Arlequin (1000 permutations). We calculated FST in three ways: (1) with introns only, (2) with synonymous sites only (in case of selective constraint on introns), and (3) with nonsynonymous sites only. The rationale behind this was that discrepancies in inferred structure between noncoding/synonymous and nonsynonymous sites could help to determine whether population structure resulted from selection for different protein haplotypes in different populations, or whether population structure resulted from neutral changes. For example, strong structure at nonsynonymous, but not at noncoding or synonymous sites might indicate divergent selection in different populations, while structure at noncoding or synonymous, but not nonsynonymous sites might indicate selective constraint on protein structure.

Neutrality tests based on the frequency spectrum

We used population data to conduct three different tests of selective neutrality for each locus: Tajima's D (Tajima, 1989), Fu and Li's D, and Fu and Li's F (Fu &amp; Li, 1993). These tests analyze the allele frequency spectrum within populations in order to detect departures from expectations under selective neutrality. All tests were conducted with the computer program DnaSP v. 4.10 (Rozas et al., 2003). Given the significant structure we observed between populations in this study and in the Hurtado et al. (2004) study, we conducted separate analyses for each population. We used D. micromettleri as an outgroup for calculation of Fu and Li's D and Fu and Li's, F. Significance was assessed using 10,000 coalescent simulations using the Hudson's recombination parameter (Hudson, 1997) estimated in DnaSP. Significant tests provide support for departures from a standard neutral model, but are unable to differentiate between the effects of selection and demography. Thus, while significantly negative values may be the result of a recent selective sweep, they may also reflect a recent population expansion following a bottleneck. Likewise, significantly positive values may indicate balancing selection, but could also result from a decrease in population size. The presence of population structure could skew the frequency spectrum in either direction depending on how the subpopulations are sampled. The effects of demographic processes such as bottlenecks are expected to be particularly evident in X-linked genes (Cyp4D10 in this study), because of the lower effective populations size for such loci (Wall et al., 2002).

To further investigate what processes might underlie significant departures from neutral expectations detected by these tests, we performed the three tests separately on synonymous and nonsynonymous sites. The impetus for this approach is that events such as population expansion or a selective sweep should have similar effects on both synonymous and nonsynonymous variation, while a process such as weak purifying selection would be expected to result in a negatively skewed frequency spectrum for nonsynonymous but not synonymous sites (Hahn et al., 2002). We used a heterogeneity test (Hahn et al., 2002) to formally evaluate whether the frequency spectra of nonsynonymous and synonymous sites differed (using 5,000 coalescent simulations).

Neutrality tests based on polymorphism and divergence

We also used our population data to conduct the Hudson, Kreitman, Aguade (HKA) test (Hudson et al., 1987), which compares the ratio of intraspecific polymorphism to interspecific divergence for at least two loci. Under neutrality this ratio is expected to be the same for different genes, and thus any departures from this pattern can provide evidence for selection. By comparing multiple loci, the HKA test is able to better differentiate between the effects of selection and demography because population-level processes should affect all genes equally, while selection is expected to be locus-specific. We used the HKA program (http://lifesci.rutgers.edu/∼heylab/HeylabSoftware.htm) to conduct the test separately for Cyp4D10 and Cyp28A1, in both cases using mtCOI as the second locus (assumed to be evolving neutrally; but see Ballard &amp; Rand, 2005), and D. micromettleri as the outgroup. Sequences for mtCOI were obtained from the Hurtado et al. (2004) dataset (666 bp), with the exception of the OPNM and Punta Conejo populations, for which we sequenced 594 bp and 630 bp respectively (GenBank Accession numbers: EU659079-659106) using primers reported in Folmer et al. (1994). We designated Cyp4D10 as X-linked and Cyp28A1 as autosomal in this analysis.

To further evaluate whether the two P450 genes showed evidence of selection, we also conducted the McDonald-Kreitman (MK) test (McDonald &amp; Kreitman, 1991), which uses population data from the focal species and at least one outgroup to compare the ratio of nonsynonymous to synonymous polymorphisms within a species to that same ratio for fixed differences between species. An excess of nonsynonymous fixed differences between species provides evidence for positive selection, while an excess of nonsynonymous polymorphism is consistent with balancing selection. This test is less sensitive to demographic assumptions than tests based on the frequency spectrum (Nielsen, 2005), although the presence of slightly deleterious mutations in an expanding population can produce a pattern that is similar to that seen for positive selection (Eyre-Walker, 2002). Because few stocks of D. micromettleri and D. eremophila were available, we were unable to gather polymorphism data for either outgroup species. We thus conducted a lineage-specific test using both outgroup sequences to identify changes that occurred specifically in the D. mettleri lineage. Given this strategy, uncertainties in the relationships among the focal species would only affect our analysis if D. mettleri is, in fact, the outgroup to D. micromettleri and D. eremophila. Considering this possibility, we also report the results of the standard MK tests using D. mettleri and each of the two other species as separate tests, which would not rely on the tree topology being correct.

Branch-site test of positive selection

We also tested for evidence of positive selection using a comparative approach based on comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dn/ds or ω). Specifically, we conducted the improved branch-site test for positive selection (Zhang et al., 2005) using the codeml program available in the Phylogenetic Analysis by Maximum Likelihood Package ver. 3.14 (Yang, 1997). This analysis tests the a priori hypothesis that some sites in a designated lineage of a multispecies alignment have been under positive selection (i.e. ω > 1). In our analysis, we tested the a priori hypothesis that some sites in the D. mettleri lineage had experienced positive selection using branch-site model A as the alternative hypothesis and branch-site model A with ω2 = 1 as the null hypothesis.

We used the Clustal platform in the computer program BioEdit to generate multispecies alignments for each gene including sequences from as many Drosophila species with full genome sequence (Drosophila 12 Genome Consortium, 2007) as we could confidently align. For Cyp4D10 this included sequences from D. mettleri, D. micromettleri, D. eremophila, D. mojavensis, D. virilis and D. grimshawi (sequences from other species were difficult to align and therefore excluded). For D. mojavensis, our BLAST produced hits to two genes, so both were included in the alignment (phylogenetic analysis indicated that these two genes were most closely related to each other and thus probably represent a duplication event specific to the D. mojavensis lineage). For Cyp28A1, our alignment included: D. mettleri, D. micromettleri, D. eremophila, D. mojavensis, D. virilis, D. grimshawi (two apparent paralogs), D. willistoni, D. pseudoobscura, D. erecta, D. yakuba, D. sechellia, D. simulans, and D. melanogaster. In order to account for the uncertainty in the phylogenetic relationships among D. mettleri, D. micromettleri, and D. eremophila we performed the analysis with this group coded as a tritomy and also separately with each of the three alternative groupings. All analyses were conducted using the “clean data” setting, which excludes from the analysis sites with gaps in the alignment.

Results

Across all populations, 33 haplotypes (including coding and noncoding regions) representing 9 distinct protein haplotypes were found for Cyp4D10, while only 23 haplotypes (including coding and noncoding) representing 9 distinct protein haplotypes were found for Cyp28A1. In nearly all populations the number of segregating sites (S), nucleotide diversity (π) and θ (per site), were substantially higher for Cyp4D10 than for Cyp28A1, indicating greater within population variation at this locus (Table 2). Separate estimates of S, π, θ for nonsynonymous, synonymous, and intronic sites are presented in supplementary tables 2 and 3.

Table 2

Number of segregating sites (S), Nucleotide diversity (π), and θ (per site) for Cyp4D10 and Cyp28A1. Estimates of π and θ for Cyp4D10 were obtained by multiplying by 4/3 to account for the fact that this gene is X-linked.

PopulationSegregating sites (S)Nucleotide diversity (π)θ (per site)

Cyp4D10Cyp28A1Cyp4D10Cyp28A1Cyp4D10Cyp28A1






Guaymas47220.00880.00260.01050.0042
OPNM all2680.00650.00180.00470.0013
OPNM saguaro1850.00250.00100.00410.0010
OPNM senita0400.000800.0008
Bahia Concepción40190.00840.00250.00850.0037
Punta Conejo3650.00520.00130.00760.0009
Catalina0100.000100.0002

Population structure

The most striking result from the population structure analysis was the significant genetic subdivision between OPNM flies collected from different hosts (senita and saguaro), which was evident in coding and noncoding portions of both loci. The flies collected from senita all had a divergent haplotype at Cyp4D10 that was distinguished by the presence of four unique substitutions, including a three base pair indel in the coding region and another nonsynonymous substitution. Moreover, we examined both reconstructed haplotypes for each senita fly at Cyp28A1 and found that all shared at least one copy of a unique haplotype at this locus, though the protein haplotype was not unique. Three individuals were heterozygous with the second allele being found in the flies collected from saguaro and also in other populations. In contrast, flies collected from saguaro had only one unique synonymous substitution at Cyp4D10, and no unique substitutions at Cyp28A1. To further investigate the apparent host-associated genetic differentiation in this population, we also computed FST using mtCOI. The results from this locus provided further evidence for genetic subdivision between flies collected from the different hosts (FST=0.88; P < 0.05). Indeed, all flies collected from senita shared the same haplotype, which was rare in the Hurtado et al. (2004) dataset. In contrast, eight out of the nine individuals collected from saguaro shared the main D. mettleri haplotype from the Hurtado et al. (2004) study, while one individual had the same haplotype as those collected from senita.

Pairwise FST estimates using noncoding regions and synonymous sites of both genes indicated significant genetic structure among nearly all populations, with the exception of the Bahia Concepción/Guaymas population pair and the Punta Conejo/Guaymas pair, which were not significantly differentiated at either locus, and the OPNM saguaro/Guaymas and OPNM saguaro/Bahia Concepción pairs, which were not differentiated at Cyp28A1 (Tables 3 and and4;4; data for synonymous sites not shown because results were similar to noncoding). For Cyp4D10, FST estimates for nonsynonymous sites revealed a similar pattern, with nearly all estimates being significant. In contrast, for Cyp28A1, five of the 11 population pairs that were significantly differentiated at noncoding sites were not differentiated at nonsynonymous sites (Table 4).

Table 3

Pairwise FST for all pairs of populations using either noncoding/or nonsynonymous sites from Cyp4D10. Significant differences (P< 0.05) are indicated in bold.

GuaymasOPNM sag.OPNM sen.Bahia Conc.Punta Conejo
Guaymas
OPNM sag.0.32/0.37
OPNM sen.0.29/0.890.67/0.98
Bahia Conc.-0.05/-0.030.36/0.540.33/0.92
Punta Conejo0.08/.080.42/0.820.41/0.980.13/0.04
Catalina0.91/0.690.98/0.951.00/1.000.92/0.760.88/0.93

Table 4

Pairwise FST for all pairs of populations using either noncoding/or nonsynonymous sites from Cyp28A1. Significant differences (P < 0.05) are indicated in bold.

GuaymasOPNM sag.OPNM sen.Bahia Conc.Punta Conejo
Guaymas
OPNM sag.0.04/0.14
OPNM sen.0.53/0.200.57/0.63
Bahia Conc.0.01/0.030.03/0.100.36/0.15
Punta Conejo0.05/-0.020.20/0.190.70/0.160.08/-0.01
Catalina0.60/0.230.71/0.660.88/0.000.51/0.140.80/0.19

Neutrality tests based on the frequency spectrum

Significant departures from expectations under neutrality were detected for several populations for both Cyp4D10 and Cyp28A1 (Table 5). In nearly all cases where significant deviations were found, the values were negative, indicating an excess of rare alleles in the population. Heterogeneity tests on differences between the frequency spectra of nonsynonymous and synonymous site classes revealed only a few cases where these differed (Cyp4D10 Bahia Concepcion Tajima's D and Fu and Li's F; Cyp4D10 Organ Pipe all samples, Fu and Li's F; supplementary table 4). In the case of the Bahia Concepción population the detected heterogeneity resulted from an excess of low frequency variants at nonsynonymous sites, which is consistent with weak purifying selection leading to mildly deleterious mutations segregating in the population. This was not, however, a general pattern for Cyp4D10 across all populations, which makes it difficult to generalize about the possible effects of purifying selection on this gene overall. We do note, however, that because Cyp4D10 is sex-linked, and thus the effective population size is smaller, a higher frequency of slightly deleterious mutations segregating in populations is expected relative to autosomal genes if weak purifying selection is occurring. The frequency spectrum observed for the OPNM population as a whole (including flies collected from both saguaro and senita) was unique, since it was the only population that contained positive values for all statistics at both loci, some of which were significant (Table 5). This likely resulted from the population subdivision between flies collected from the two host species. We therefore report values for this population as a whole and also for flies collected from each host species separately, though some tests were not possible due to a lack of polymorphism.

Table 5

Results of Tajima's D, Fu and Li's D, Fu and Li's F, and HKA tests on Cyp4D10 and Cyp28A1 for the five populations used in this study. HKA tests used mtCOI as the second locus.

Tajima's DFu &amp; Li's DFu &amp; Li's FHKA (X)

Cyp4D10Cyp28A1Cyp4D10Cyp28A1Cyp4D10Cyp28A1Cyp4D10Cyp28A1








Guaymas-0.78**-1.76**-0.64-2.89**-0.82*-3.08**0.016.35**
OPNM (all)1.51**1.300.141.320.811.58*1.470.39
OPNM sag.-1.83**-0.14-0.420.42-0.970.310.72--
OPNM sen.--0.02--1.15--1.03----
Bahia Conc.-0.06-1.40**0.09-2.11**0.08-2.38**0.022.53*
Punta Conejo-1.37*1.47-1.77**1.18-1.96**1.440.366.12**
Catalina---1.14---1.42---1.57----
P< 0.10;
P< 0.05

-- polymorphism data insufficient to conduct test

Neutrality tests based on polymorphism and divergence

None of the HKA tests with Cyp4D10 detected any significant departures from the neutral expectation (Table 5; HKA table shown in supplementary table 5). In contrast, for Cyp28A1, significant departures from neutrality were detected in all populations for which the test could be conducted with the exception of the OPNM population as a whole (Table 5; HKA table shown in supplementary table 5). In this analysis we assumed that mtCOI was evolving neutrally; any violations of this assumption could have implications for the interpretation of our results. In fact, data from Hurtado et al. (2004) indicated that mtCOI was not evolving neutrally in a few D. mettleri populations, as some values of Tajima's D were significantly negative. A negative Tajima's D could be the result of a number of underlying processes, each having different implications for the interpretation of our HKA results. First, a negative Tajima's D could result from a recent population expansion. Because demographic processes are expected to affect all loci in a similar manner, this is unlikely to have any significant consequences for the interpretation of our results. Alternatively, a negative Tajima's D could indicate a history of positive selection on mtCOI. If true, this would make our HKA test conservative. Most problematic would be the possibility that the negative values are the result of mildly deleterious alleles segregating in these populations, something that has been shown previously for mitochondrial genes (Ballard &amp; Rand, 2005). To further investigate this possibility, we reanalyzed the Hurtado et al. (2004)mtCOI dataset this time partitioning nonsynonymous and synonymous variation and performing the heterogeneity test described previously (Hahn et al., 2002). Although this analysis was not possible for all populations due to a lack of polymorphism in one of the site classes, the results for three populations (Guaymas, Bahia Concepcion, and OPNM as a whole) gave no indication of excess nonsynonymous polymorphism (data not shown). We also note that similar results were obtained with an HKA test comparing mtCOI, Cyp4D10, and Cyp28a1 (data not shown). None of the lineage specific or standard MK tests were significant for either gene (supplementary table 6; data for standard tests not shown).

Branch-site tests of positive selection

The branch-site test for Cyp28A1 indicated that some sites in the D. mettleri branch have experienced positive selection (Table 6). The Bayes Empirical Bayes (BEB) follow-up analysis identified two sites that had high probabilities of belonging to the positively selected site class (residue 141, P = 0.99; residue 479, P = 0.99; Table 6). These two amino acid substitutions were unique to D. mettleri, with the first involving a change from asparagine to glutamine, and the second from threonine to glycine. Furthermore, both sites were fixed in all D. mettleri populations. Although we report only the results of the analysis that coded the relationships among D. mettleri, D. micromettleri, and D. eremophila as a tritomy, the other analyses that considered all possible relationships among these species produced similar results with the same two sites being significant (data not shown). In contrast, the branch-site test for Cyp4D10 was not significant for any of the possible groupings (Table 6).

Table 6

Log likelihood values and log ratio tests for the branch-site test of positive selection, with D. mettleri designated as the foreground branch.

GeneModelLog likelihoodParameter estimatesLRTPositively selected sites1
Cyp4D10Branch-site model A, ω2=1 fixed-5949.80p0=0.80, p1=0.16, ω0=0.06, ω12=1
Branch-site model A, ω2>1-5949.55p0=0.82, p1=0.16, ω0=0.06, ω2=3.370.49None
Cyp28A1Branch-site model A, ω2=1 fixed-9523.28p0=0.82, p1=0.12, ω0=0.08, ω12=1
Branch-site model A, ω2>1-9521.28p0=0.86, p1=0.13, ω0=0.08, ω2=6.263.99*141Q, 479G
P<0.05
From the Bayes Empirical Bayes (BEB) analysis

Population structure

The most striking result from the population structure analysis was the significant genetic subdivision between OPNM flies collected from different hosts (senita and saguaro), which was evident in coding and noncoding portions of both loci. The flies collected from senita all had a divergent haplotype at Cyp4D10 that was distinguished by the presence of four unique substitutions, including a three base pair indel in the coding region and another nonsynonymous substitution. Moreover, we examined both reconstructed haplotypes for each senita fly at Cyp28A1 and found that all shared at least one copy of a unique haplotype at this locus, though the protein haplotype was not unique. Three individuals were heterozygous with the second allele being found in the flies collected from saguaro and also in other populations. In contrast, flies collected from saguaro had only one unique synonymous substitution at Cyp4D10, and no unique substitutions at Cyp28A1. To further investigate the apparent host-associated genetic differentiation in this population, we also computed FST using mtCOI. The results from this locus provided further evidence for genetic subdivision between flies collected from the different hosts (FST=0.88; P < 0.05). Indeed, all flies collected from senita shared the same haplotype, which was rare in the Hurtado et al. (2004) dataset. In contrast, eight out of the nine individuals collected from saguaro shared the main D. mettleri haplotype from the Hurtado et al. (2004) study, while one individual had the same haplotype as those collected from senita.

Pairwise FST estimates using noncoding regions and synonymous sites of both genes indicated significant genetic structure among nearly all populations, with the exception of the Bahia Concepción/Guaymas population pair and the Punta Conejo/Guaymas pair, which were not significantly differentiated at either locus, and the OPNM saguaro/Guaymas and OPNM saguaro/Bahia Concepción pairs, which were not differentiated at Cyp28A1 (Tables 3 and and4;4; data for synonymous sites not shown because results were similar to noncoding). For Cyp4D10, FST estimates for nonsynonymous sites revealed a similar pattern, with nearly all estimates being significant. In contrast, for Cyp28A1, five of the 11 population pairs that were significantly differentiated at noncoding sites were not differentiated at nonsynonymous sites (Table 4).

Table 3

Pairwise FST for all pairs of populations using either noncoding/or nonsynonymous sites from Cyp4D10. Significant differences (P< 0.05) are indicated in bold.

GuaymasOPNM sag.OPNM sen.Bahia Conc.Punta Conejo
Guaymas
OPNM sag.0.32/0.37
OPNM sen.0.29/0.890.67/0.98
Bahia Conc.-0.05/-0.030.36/0.540.33/0.92
Punta Conejo0.08/.080.42/0.820.41/0.980.13/0.04
Catalina0.91/0.690.98/0.951.00/1.000.92/0.760.88/0.93

Table 4

Pairwise FST for all pairs of populations using either noncoding/or nonsynonymous sites from Cyp28A1. Significant differences (P < 0.05) are indicated in bold.

GuaymasOPNM sag.OPNM sen.Bahia Conc.Punta Conejo
Guaymas
OPNM sag.0.04/0.14
OPNM sen.0.53/0.200.57/0.63
Bahia Conc.0.01/0.030.03/0.100.36/0.15
Punta Conejo0.05/-0.020.20/0.190.70/0.160.08/-0.01
Catalina0.60/0.230.71/0.660.88/0.000.51/0.140.80/0.19

Neutrality tests based on the frequency spectrum

Significant departures from expectations under neutrality were detected for several populations for both Cyp4D10 and Cyp28A1 (Table 5). In nearly all cases where significant deviations were found, the values were negative, indicating an excess of rare alleles in the population. Heterogeneity tests on differences between the frequency spectra of nonsynonymous and synonymous site classes revealed only a few cases where these differed (Cyp4D10 Bahia Concepcion Tajima's D and Fu and Li's F; Cyp4D10 Organ Pipe all samples, Fu and Li's F; supplementary table 4). In the case of the Bahia Concepción population the detected heterogeneity resulted from an excess of low frequency variants at nonsynonymous sites, which is consistent with weak purifying selection leading to mildly deleterious mutations segregating in the population. This was not, however, a general pattern for Cyp4D10 across all populations, which makes it difficult to generalize about the possible effects of purifying selection on this gene overall. We do note, however, that because Cyp4D10 is sex-linked, and thus the effective population size is smaller, a higher frequency of slightly deleterious mutations segregating in populations is expected relative to autosomal genes if weak purifying selection is occurring. The frequency spectrum observed for the OPNM population as a whole (including flies collected from both saguaro and senita) was unique, since it was the only population that contained positive values for all statistics at both loci, some of which were significant (Table 5). This likely resulted from the population subdivision between flies collected from the two host species. We therefore report values for this population as a whole and also for flies collected from each host species separately, though some tests were not possible due to a lack of polymorphism.

Table 5

Results of Tajima's D, Fu and Li's D, Fu and Li's F, and HKA tests on Cyp4D10 and Cyp28A1 for the five populations used in this study. HKA tests used mtCOI as the second locus.

Tajima's DFu &amp; Li's DFu &amp; Li's FHKA (X)

Cyp4D10Cyp28A1Cyp4D10Cyp28A1Cyp4D10Cyp28A1Cyp4D10Cyp28A1








Guaymas-0.78**-1.76**-0.64-2.89**-0.82*-3.08**0.016.35**
OPNM (all)1.51**1.300.141.320.811.58*1.470.39
OPNM sag.-1.83**-0.14-0.420.42-0.970.310.72--
OPNM sen.--0.02--1.15--1.03----
Bahia Conc.-0.06-1.40**0.09-2.11**0.08-2.38**0.022.53*
Punta Conejo-1.37*1.47-1.77**1.18-1.96**1.440.366.12**
Catalina---1.14---1.42---1.57----
P< 0.10;
P< 0.05

-- polymorphism data insufficient to conduct test

Neutrality tests based on polymorphism and divergence

None of the HKA tests with Cyp4D10 detected any significant departures from the neutral expectation (Table 5; HKA table shown in supplementary table 5). In contrast, for Cyp28A1, significant departures from neutrality were detected in all populations for which the test could be conducted with the exception of the OPNM population as a whole (Table 5; HKA table shown in supplementary table 5). In this analysis we assumed that mtCOI was evolving neutrally; any violations of this assumption could have implications for the interpretation of our results. In fact, data from Hurtado et al. (2004) indicated that mtCOI was not evolving neutrally in a few D. mettleri populations, as some values of Tajima's D were significantly negative. A negative Tajima's D could be the result of a number of underlying processes, each having different implications for the interpretation of our HKA results. First, a negative Tajima's D could result from a recent population expansion. Because demographic processes are expected to affect all loci in a similar manner, this is unlikely to have any significant consequences for the interpretation of our results. Alternatively, a negative Tajima's D could indicate a history of positive selection on mtCOI. If true, this would make our HKA test conservative. Most problematic would be the possibility that the negative values are the result of mildly deleterious alleles segregating in these populations, something that has been shown previously for mitochondrial genes (Ballard &amp; Rand, 2005). To further investigate this possibility, we reanalyzed the Hurtado et al. (2004)mtCOI dataset this time partitioning nonsynonymous and synonymous variation and performing the heterogeneity test described previously (Hahn et al., 2002). Although this analysis was not possible for all populations due to a lack of polymorphism in one of the site classes, the results for three populations (Guaymas, Bahia Concepcion, and OPNM as a whole) gave no indication of excess nonsynonymous polymorphism (data not shown). We also note that similar results were obtained with an HKA test comparing mtCOI, Cyp4D10, and Cyp28a1 (data not shown). None of the lineage specific or standard MK tests were significant for either gene (supplementary table 6; data for standard tests not shown).

Branch-site tests of positive selection

The branch-site test for Cyp28A1 indicated that some sites in the D. mettleri branch have experienced positive selection (Table 6). The Bayes Empirical Bayes (BEB) follow-up analysis identified two sites that had high probabilities of belonging to the positively selected site class (residue 141, P = 0.99; residue 479, P = 0.99; Table 6). These two amino acid substitutions were unique to D. mettleri, with the first involving a change from asparagine to glutamine, and the second from threonine to glycine. Furthermore, both sites were fixed in all D. mettleri populations. Although we report only the results of the analysis that coded the relationships among D. mettleri, D. micromettleri, and D. eremophila as a tritomy, the other analyses that considered all possible relationships among these species produced similar results with the same two sites being significant (data not shown). In contrast, the branch-site test for Cyp4D10 was not significant for any of the possible groupings (Table 6).

Table 6

Log likelihood values and log ratio tests for the branch-site test of positive selection, with D. mettleri designated as the foreground branch.

GeneModelLog likelihoodParameter estimatesLRTPositively selected sites1
Cyp4D10Branch-site model A, ω2=1 fixed-5949.80p0=0.80, p1=0.16, ω0=0.06, ω12=1
Branch-site model A, ω2>1-5949.55p0=0.82, p1=0.16, ω0=0.06, ω2=3.370.49None
Cyp28A1Branch-site model A, ω2=1 fixed-9523.28p0=0.82, p1=0.12, ω0=0.08, ω12=1
Branch-site model A, ω2>1-9521.28p0=0.86, p1=0.13, ω0=0.08, ω2=6.263.99*141Q, 479G
P<0.05
From the Bayes Empirical Bayes (BEB) analysis

Discussion

In this study we demonstrate that the D. mettleri gene Cyp28A1, which previously has been shown to be upregulated in response to host plant (senita) alkaloids (Danielson et al., 1997), also has been under positive selection in the D. mettleri lineage. Our results are thus consistent with the growing body of evidence that both coding and regulatory changes play pivotal roles in adaptive evolution (Hoekstra &amp; Coyne, 2007; Wray, 2007). Although our analysis identified only a small number of sites in this gene that have been under positive selection, the amino acid change at one of these (threonine to glycine) involves the substitution of a non-polar for a polar amino acid, which could alter the three dimensional structure of the protein. This, coupled with the fact that as little as a single amino acid change can drastically alter the substrate specificity of P450s (Von Wachenfeldt &amp; Johnson, 1995), suggests that these positively selected changes could have important consequences for protein functionality. Moreover, other nonsynonymous substitutions that have occurred in the D. mettleri lineage since the divergence from D. micromettleri/D. eremophila (18 total) also may be functionally important, even if they were not specifically identified as being positively selected.

Several lines of evidence from our population genetic data also suggest a history of selection on Cyp28A1. First, we found significantly negative values for Tajima's D, Fu and Li's D, and Fu and Li's F in two populations (Bahia Concepción and Guaymas). While an excess of low frequency variants would be expected for a relatively short period of time following a selective sweep, differentiating between this and possible alternative explanations such as population expansion or weak purifying selection, is challenging. Nevertheless, the fact that the excess polymorphism was not found exclusively at nonsynonymous sites is inconsistent with weak purifying selection being the primary explanation. Moreover, though it is difficult to disentangle the effects of selection and demography, examining multiple loci can aid in this process because the demographic history of a population should have a similar effect on all loci, while the effects of selection should be locus-specific. Given that some neutrality tests using Cyp4D10, and mtCOI (Hurtado et al. 2004), also indicated a significantly skewed frequency spectrum toward low frequency variants, it seems reasonable to conclude that demographic processes have at least partially shaped the pattern of allele frequency variation in D. mettleri populations, though this could also result from pervasive positive selection and/or hitchhiking throughout the genome (Hahn, 2008). Despite this, HKA test results were still consistent with a scenario of selection on Cyp28A1 (or a gene linked to it) in all but one population. In contrast, all MK tests, which would provide the most direct evidence for selection, were non significant. One possible explanation for this discrepancy is that the MK test would have little power to detect positive selection when the number of fixations, both synonymous and nonsynonymous, is high (e.g. due to a long divergence time), especially if only a small number of nonsynonymous substitutions provided the selective advantage.

Taken together, the results of the battery of tests for selection we performed suggest a history of positive selection on Cyp28A1. The strongest support for this inference comes from tests that have more power to detect selection episodes that occurred over longer time scales (branch-site test and HKA test), while tests aimed at detecting more recent selection provided either modest support or no support at all for a scenario of selection on this gene (frequency spectrum statistics and MK tests). This, coupled with the fact that the two amino acid sites that were identified as being positively selected are fixed in all populations, leads us to conclude that the signatures of positive selection that we detected are probably the result of events that occurred in the distant past.

Subsequent to apparent historical positive selection, it does not appear that Cyp28A1 has been under divergent selection pressures in different populations. Indeed, while analysis of noncoding regions (introns) and synonymous sites revealed strong population structure between most population pairs, this structure was reduced or disappeared altogether when only amino acid coding sites were considered. This result is not particularly surprising given that senita (the host associated with Cyp28A1) is present on both the mainland and Baja peninsula. Interestingly, flies from Santa Catalina Island, where senita is not available, were still significantly genetically differentiated from almost all other populations, even when only amino acid changing sites were considered. This structure does not, however, result from nonsynonymous fixed differences between the Santa Catalina Island population and the others, but rather from the fact that several nonsynonymous sites that are polymorphic in the other populations are fixed in the Santa Catalina Island population. Given this, it does not appear that the observed pattern of structure reflects the action of divergent selection between these populations, or relaxed constraint on Santa Catalina Island.

In contrast to an apparent history of selection on Cyp28A1, we found no strong evidence for positive selection on Cyp4D10 using either comparative or population genetic approaches. Although some tests based on the frequency spectrum produced patterns consistent with positive selection, subsequent analyses did not allow us to rule out demographic effects as the cause for violations of neutral expectations. This, of course, in no way precludes an important role for this gene in the detoxification of host plant compounds, especially considering the fact that the gene was upregulated in response to saguaro alkaloids in an earlier experiment (Danielson et al., 1998). Furthermore, even though Cyp4D10 did not show a signature of adaptive amino acid evolution, eight amino acid substitutions have occurred and become fixed in the D. mettleri lineage, any of which could be important functionally.

A surprising result from this study was the significant level of genetic differentiation between OPNM flies collected from two different host plant species (saguaro and senita). Host switching and subsequent specialization has long been identified as one of the primary mechanisms for generating the high levels of diversity observed in plant-feeding arthropods (Bernays &amp; Graham, 1988; Mitter et al., 1988; Berlocher &amp; Feder, 2002; Funk et al., 2002). Until now it has been assumed that D. mettleri is a generalist capable of using several different host species. Our results, however, suggest that flies breeding in saguaro and senita may actually represent genetically differentiated host races. Our inference is limited to some degree by small sample sizes, but, even so, the fact that flies collected from senita, in addition to being genetically differentiated from flies collected from saguaro, also had a unique Cyp4D10 protein haplotype that was quite divergent, suggests that our inference is probably not a sampling artifact. The divergent Cyp4D10 haplotype (induced by saguaro) is particularly interesting because it could be the result of relaxed selective constraint or selection for a novel function if, in fact, these flies have become specialized on senita. Without data from more locations we cannot say whether the pattern of genetic differentiation we observed is a general one across the entire range of D. mettleri, or whether the results from OPNM are an anomaly. More extensive geographic sampling along with genetic, behavioral, and ecological studies on D. mettleri collected from different host species are necessary in order to fully evaluate the possible association between host plant use and genetic differentiation.

Overall patterns of genetic differentiation among populations at the two P450 loci largely corroborate the findings reported by Hurtado et al. (2004) for mtCOI. Most notably, in that study, and in an earlier allozyme study (Markow et al., 2002), the population on Santa Catalina Island was highly differentiated from all other populations. Our data for both coding and noncoding regions of Cyp4D10 and Cyp28A1 show this same pattern, which is not unexpected given that this is an island population and suitable cactus hosts are not present on neighboring stretches of the mainland. Nevertheless, mating experiments between D. mettleri from Santa Catalina Island and flies from various Baja and mainland populations however, show little, if any, sexual isolation (Markow et al., 2002). This situation is paralleled in D. mojavensis, another cactophilic species, where a highly genetically differentiated host race on Santa Catalina Island is not sexually isolated from other populations of this species (Markow et al., 1983; Machado et al., 2007; Reed et al., 2007).

The results of our study, together with previous work on D. mettleri (Frank &amp; Fogleman, 1992; Danielson et al., 1994; Danielson et al., 1995; Danielson et al., 1997; Danielson et al., 1998) and more recent work on D. mojavensis (Matzkin &amp; Eanes, 2003; Matzkin, 2004; Matzkin, 2005; Matzkin et al., 2006; Matzkin, 2008), suggest a prominent role for both regulatory and coding sequence changes in the adaptation of Sonoran Desert Drosophila to their cactus hosts. Differences in host specialization among these, and other, cactophilic flies indicate that the Sonoran Desert Drosophila system holds great promise as a system for investigating the genetic basis of adaptation.

Acknowledgments

We thank members of the Markow lab for useful discussions and four anonymous reviewers for constructive comments on the manuscript. Funding for this study came from an NIH Postdoctoral Excellence in Research and Teaching (PERT) Fellowship from the Center for Insect Science awarded to J.M.B.

Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721
Drosophila Tucson Stock Center, University of Arizona, Tucson, AZ 85721
Corresponding author: Jeremy M. Bono, Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721, ude.anozira.liame@onobj, Fax: 520 626 3522

Abstract

Understanding the genetic basis of adaptation is one of the primary goals of evolutionary biology. The evolution of xenobiotic resistance in insects has proven to be an especially suitable arena for studying the genetics of adaptation, and resistant phenotypes are known to result from both coding and regulatory changes. In this study, we examine the evolutionary history and population genetics of two Drosophila mettleri cytochrome P450 genes that are putatively involved in the detoxification of alkaloids present in two of its cactus hosts: saguaro (Carnegiea gigantea) and senita (Lophocereus schottii). Previous studies demonstrated that Cyp28A1 was highly upregulated following exposure to rotting senita tissue while Cyp4D10 was highly upregulated following exposure to rotting saguaro tissue. Here, we show that a subset of sites in Cyp28A1 experienced adaptive evolution specifically in the D. mettleri lineage. Moreover, neutrality tests in several populations were also consistent with a history of selection on Cyp28A1. In contrast, we did not find evidence for positive selection on Cyp4D10, though this certainly does not preclude its involvement in host plant use. A surprising result that emerged from our population genetic analyses was the presence of significant genetic differentiation between flies collected from different host plant species (saguaro and senita) at Organ Pipe National Monument, Arizona, USA. This preliminary evidence suggests that D. mettleri may have evolved into distinctive host races that specialize on different hosts, a possibility that warrants further investigation.

Keywords: cytochrome P450, adaptation, host plant, positive selection, saguaro cactus, senita cactus
Abstract
Collaboration tool especially designed for Life Science professionals.Drag-and-drop any entity to your messages.