Genetics 178(2): 1073-1083

The Molecular Basis of Host Adaptation in Cactophilic Drosophila: Molecular Evolution of a Glutathione S-Transferase Gene (GstD1) in Drosophila mojavensis

MATERIALS AND METHODS

Sampling and sequencing:

All fly samples used in this study, with the exception of D. navojoa (outgroup) and one D. arizonae line, were collected from the field and maintained as isofemale lines on Banana/Opuntia media. The four D. navojoa lines sampled were obtained from the Tucson Drosophila Species Stock Center [stocks 15081-1374.01 (Tehuantepec, Mexico), 15081-1374.10 (El Dorado, Mexico), 15081-1374.11, and 15081-1374.12 (both from Jalisco, Mexico)]. The collection sites of all samples used in this study are shown in Figure 1. The collection of D. arizonae included 20 lines from the northern population (7 from Navojoa, Mexico, 6 from Riverside, CA, and 8 from Tucson, AZ) and 5 from the southern population [4 from Hidalgo, Mexico (1 was from the Stock Center, 15081-1271.05) and 1 from Chiapas, Mexico]. A total of 63 lines were used for D. mojavensis: 15 from the Mojave population (7 from Grand Canyon and 8 from Anza-Borrego), 8 from the Catalina Island population, 17 from the Baja California population (La Paz, Mexico), and 23 from the mainland Sonora population (3 from Desemboque, Mexico, 4 from Hermosillo, Mexico, and 16 from Guaymas, Mexico).

An external file that holds a picture, illustration, etc.
Object name is GEN17821073f1.jpg

Figure 1.—

Collecting locations for D. mojavensis (squares), D. arizonae (circles), and D. navojoa (triangles): (1) Catalina Island, California; (2) Anza-Borrego, California; (3) Grand Canyon, Arizona; (4) Hermosillo, Mexico; (5) Desemboque, Mexico; (6) Guaymas, Mexico; (7) La Paz, Mexico; (8) Riverside, California; (9) Tucson, Arizona; (10) Navojoa, Mexico; (11) Hidalgo, Mexico; (12) Chiapas, Mexico; (13) El Dorado, Mexico; (14) Jalisco, Mexico; (15) Tehuantepec, Mexico.

The complete nucleotide sequence of the D. mojavensis GstD1 locus was obtained by querying the sequenced clone from our previous study (Matzkinet al. 2006) to the D. mojavensis genome sequence (http://rana.lbl.gov/drosophila/). The complete coding sequence of GstD1 is 630 bp long and like other D-class Gst's it is intronless. The entire GstD1 coding region plus 99 bp upstream from the initiation codon was sequenced in all the 93 lines mentioned above. Genomic DNA was extracted from one individual per isofemale line using DNeasy columns (QIAGEN, Valencia, CA). PCR amplification of the GstD1 gene region was done using the following forward and reverse primers: 5′ CATGAGCCCGATATTTAATT 3′ and 5′ TGGAGCTCAATCGAAGTATT 3′. Amplifications were done at 54° annealing temperature in an Eppendorf MasterCycler, using Invitrogen (San Diego) Taq DNA polymerase. Sequencing of both strands using the above primers was done in an Applied Biosystems (Foster City, CA) 3730XL DNA Analyzer housed at the Genomic Analysis and Technology Core Facility at the University of Arizona. All sequences are stored under GenBank accession nos. {"type":"entrez-nucleotide","attrs":{"text":"EU079380","term_id":"158634713"}}EU079380–{"type":"entrez-nucleotide","attrs":{"text":"EU079471","term_id":"158634895"}}EU079471.

Data analysis:

All chromatographs were aligned and sequence contigs were assembled using Sequencher v4.6 (Gene Codes, Ann Arbor, MI) software. As stated above, most sequences were derived from individuals from isofemale lines, and therefore it is expected that there will be segregating polymorphisms within the sequence strains. Knowledge of the linkage phase is not necessary, however, for certain analyses that test departure from neutrality, such as the McDonald–Kreitman (MK) test (McDonald and Kreitman 1991). Given that polymorphism can lead to fixations, under neutrality the ratio of synonymous to nonsynonymous polymorphisms and fixations should be similar. The MK test determines via a G-test if there is a difference between these two ratios (i.e., departure from neutrality); hence only knowledge of the number of fixations and polymorphisms is needed. Conversely, linkage phase is needed to implement analysis on the basis of pairwise nucleotide differences, as well for the purpose of examining phylogenetic relationships. Therefore, the linkage phase had to be inferred from the data set. Several methods exist for haplotype reconstruction. The earliest method created was based on parsimony (Clark 1990). Although it was simple, it had a high error rate. Recently both maximum-likelihood and Bayesian methods have been devised. Two such a methods (HAP and ELB) were utilized in this study. Halperin and Eskin (2004) devised a maximum-likelihood algorithm (HAP) that breaks up the sequence into a block of limited diversity. This algorithm was implemented using the HAP website (http://research.calit2.net/hap/). Although, HAP appears to be faster and more accurate than previous methods (Halperin and Eskin 2004), its performance using data with known recombination is still to be determined. One method that has proved to have the lowest error rate using markers with recombination is the pseudo-Bayesian ELB algorithm (Excoffieret al. 2003). The ELB method was implemented from the Arlequin (ver. 3.01) software package (Excoffieret al. 2005). After both haplotype reconstructions were reconciled (see results), one haplotype was randomly chosen per individual. This data set of reconstructed haplotypes was used in all subsequent analyses except when otherwise noted.

Evolutionary analyses of the sequence variation at GstD1 were done mostly using the DnaSP ver. 4.10 (Rozaset al. 2003) and Arlequin ver. 3.01 (Excoffieret al. 2005) software packages. Historical selection and/or demographic changes can leave their mark in the frequency spectrum and number of sequence polymorphisms (Nielsen 2001); Tajima's D (Tajima 1989), Fu and Li's D (Fu and Li 1993), Fu's F_S (Fu 1997), and Fay and Wu's H (Fay and Wu 2000) test statistics were used to examine this pattern. Significant negative values are due to large numbers of low-frequency alleles, suggesting that a sweep or a bottleneck has occurred in the past. Positive values are a result of few alleles being maintained at relative high frequency, suggesting balancing selection. Conversely, selective neutrality tests that are independent of genealogy or those in which genealogy can be removed, such as the MK test, are robust to demographic changes (Nielsen 2001). The evolutionary history of GstD1 was examined using the MK test, and additionally an outgroup (D. navojoa) was used to polarize the fixed differences. Therefore lineage-specific MK tests were performed for each species as well as for each of the D. mojavensis populations. Statistical significance for the frequency-distribution tests and the 95% confidence intervals for θ was estimated by performing 10,000 coalescent simulations incorporating the estimated recombination rate for the particular population tested (DnaSP and Arlequin were used for these analyses). Recombination rates were calculated using the maximum-likelihood method of Hudson (2001). The coalescent simulations of θ for D. arizonae and the four D. mojavensis populations were used to examine for the presence of significant differences in the level of variation between populations and/or species.

Phylogenetic analysis was performed using both a maximum-likelihood (ML) and a Bayesian approach. Individuals with identical haplotypes were combined. The ML analysis was done using the program PAUP* ver. 4.0b10 (Swofford 2003) and for the Bayesian analysis the program MrBayes ver. 3.1.2 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003) was used. Evolutionary models were chosen using the programs Modeltest ver. 3.7 (Posada and Crandall 1998) and MrModeltest ver. 2.2 (Nylander 2004). Phylogenetic robustness of the ML analysis was determined by performing 1000 bootstrap replicates. The Bayesian analysis was done using four chains (three heated and one cold) of 10,000,000 generations, sampling every 100 generations and discarding the first 1000 trees. Since GstD1 is a nuclear recombining gene, the phylogenetic analyses were used to visually illustrate the relationships between the populations and species and not of the individuals sampled.

Sampling and sequencing:

Figure 1.—

Data analysis:

RESULTS

Intra- and interspecific variation at GstD1:

Of the 93 individuals sequenced, only 16 contained ambiguous sites and only 7 contained more than two ambiguous sites. For the most part the HAP and ELB algorithms behaved similarly. In eight cases the haplotype predictions between the HAP and ELB algorithms differed, and five of those differed only due to the placement of singletons. Given that GstD1 can potentially recombine, the ELB algorithm seems most appropriate and its results were thus used for the analysis. The reconstructed haplotype data set used in the analysis is provided as a supplemental data file (at http://www.genetics.org/supplemental/).

Overall there was a very small amount of linkage disequilibrium within each of the D. mojavensis populations and in D. arizonae. Only 3 of 36 pairwise comparisons were significant using Fisher's exact test in the Sonora population and only 1 of 10 comparisons was significant in the Baja population. There were no significant pairwise comparisons in the Mojave and Catalina Island populations. Within the northern D. arizonae population only 1 of 153 comparisons showed a departure from independence. Tables of pairwise comparisons and P-values are provided as a supplemental file (supplemental Table 1 at http://www.genetics.org/supplemental/). The level of intragenic recombination in GstD1 is shown in supplemental Table 2 (http://www.genetics.org/supplemental/).

The levels of synonymous, nonsynonymous, and noncoding sequence variation for all three species are shown in Table 1. Overall D. arizonae contained the greatest number of segregating synonymous sites (θ_s). Using 10,000 coalescent simulations the 95% confidence interval of θ_s for the northern D. arizonae population was estimated (0.0208–0.0605) and compared with that of D. mojavensis. For every comparison the level of synonymous variation in the northern D. arizonae population was significantly greater (Mojave, P < 0.001; Catalina Island, P < 0.001; Sonora, P < 0.001; Baja, P = 0.001). A similar pattern between northern D. arizonae and the four D. mojavensis populations was observed comparing both synonymous and noncoding sequence variation (Mojave, P < 0.001; Catalina Island, P < 0.001; Sonora, P < 0.001; Baja, P = 0.011). Within D. mojavensis, the Baja California population contains double the synonymous variation of Sonora and Catalina Island populations and more than seven times that of the Mojave population. Although it may appear that there is a greater amount of nonsynonymous variation when all D. mojavensis populations are combined, this is because the four populations are genetically isolated (see below). When comparing each D. mojavensis population separately, with the exception of the Catalina Island population (P = 0.021), the level of nonsynonymous variation (θ_n) is not significantly different from that of the northern D. arizonae population (95% confidence interval 0.0006–0.0047 based on 10,000 coalescent simulations).

TABLE 1

Synonymous, nonsynonymous, and noncoding sequence variation for GstD1 in D. mojavensis, D. arizonae, and D. navojoa

		Synonymous			Nonsynonymous			Synonymous and noncoding
	N	S_s	π_s	θ_s	S_n	π_n	θ_n	S_s+nc	π_s+nc	θ_s+nc
D. mojavensis
All	63	17	0.0168	0.0241	15	0.0110	0.0067	28	0.0210	0.0237
Mojave	15	1	0.0023	0.0021	2	0.0012	0.0013	1	0.0014	0.0012
Catalina Island	8	3	0.0081	0.0077	0	0.0000	0.0000	3	0.0048	0.0046
Sonora	23	4	0.0083	0.0073	4	0.0016	0.0023	9	0.0104	0.0097
Baja	17	8	0.0068	0.0158	3	0.0019	0.0019	12	0.0075	0.0142
D. arizonae
All	25	26	0.0350	0.0461	6	0.0029	0.0033	31	0.0265	0.0332
Northern	20	21	0.0310	0.0396	4	0.0019	0.0024	24	0.0211	0.0273
Southern	5	2	0.0054	0.0064	0	0.0000	0.0000	3	0.0048	0.0058
D. navojoa
All	4	7	0.0233	0.0255	1	0.0011	0.0011	11	0.0232	0.0246

N, number of lines sampled; S, number of segregating sites; π, mean pairwise difference per base pair; θ, nucleotide diversity per base pair.

Population structure:

Population structure within D. mojavensis and D. arizonae was estimated by calculating the F_ST parameter. Only synonymous and noncoding variation was used in the calculation of F_ST to remove any possible bias due to selection at nonsynonymous sites. To be certain there is no structure within each of the D. mojavensis populations, pairwise F_ST's were calculated for each of the collection sites separately (supplemental Table 3 at http://www.genetics.org/supplemental/). Most of the comparisons were significant with the exception of the two collection sites within the Mojave population and between the three collection sites within the Sonora population. When pooling all collection sites within a population, there is significant genetic isolation between all four D. mojavensis populations (Table 2). With the exception of the comparison between the Catalina Island and the Mojave population (P = 0.99), a similar pattern was observed when using nonsynonymous sites (supplemental Table 4 at http://www.genetics.org/supplemental/). This result was expected since the Catalina Island and Mojave populations share the same protein haplotype. The northern D. arizonae population is composed of collections within the Sonoran Desert and Riverside, California. There was no significant F_ST between those two collection sites (F_ST = 0.068, P = 0.09). When pooling all the northern D. arizonae collections and comparing them to the southern D. arizonae, significant genetic isolation was observed (F_ST = 0.59, P < 0.001).

TABLE 2

Pairwise F_ST between the four D. mojavensis populations

	Mojave	Catalina Island	Sonora
Catalina Island	0.654***
Sonora	0.842***	0.788***
Baja	0.902***	0.855***	0.252***

The data were permuted 10,000 times to calculate significance. ***P < 0.001.

Phylogenetic analysis:

Both ML and Bayesian methods of phylogenetics analysis were used in this study. With very minor exceptions, both methods resulted in the same phylogenetic structure. Figure 2 shows the result of the Bayesian analysis (results of the ML analysis not shown). For the Bayesian analysis the evolutionary model chosen was K80 + I + Γ (I = 0.573, Γ = 0.883) with codon partitioning (Kimura 1980). Both D. mojavensis and D. arizonae are reciprocally monophyletic. The D. mojavensis clade is split into two well-supported groups, Baja/Sonora and Catalina Island/Mojave. The southern D. arizonae population forms a monophyletic group within the larger polyphyletic clade of D. arizonae.

An external file that holds a picture, illustration, etc.
Object name is GEN17821073f2.jpg

Figure 2.—

Bayesian tree analysis for all GstD1 sequences using the K80 + I + Γ model (see text for more details). The support for branches is shown. The species (solid brackets) and populations (dashed brackets) are shown. Sample identifications for D. mojavensis: DE, Desemboque, Mexico; HE, Hermosillo, Mexico; MJ, Guaymas, Mexico; MJBC, La Paz, Mexico; CI, Catalina Island, California; ANZA, Anza-Borrego, California; WC, Grand Canyon, Arizona. Sample identifications for D. arizonae: ARNA, Navojoa, Mexico; ARTU, Tucson, Arizona; RVSD, Riverside, California; CHI, Chiapas, Mexico; HID and MXT, Hidalgo, Mexico. Sample identifications for D. navojoa: NAV1, Tehuantepec, Mexico; NAV10, El Dorado, Mexico; NAV11 and NAV12, Jalisco, Mexico.

Frequency and test of neutrality analysis:

Given the above determined population structure within the two species, demographic changes within each population could be determined with less bias by analyzing each population separately. Table 3 contains the summary statistics for several frequency-distribution tests done on each of the D. mojavensis and D. arizonae populations (analysis of D. navojoa is not shown given low statistical power due to the small sample size). In the Baja D. mojavensis population the above statistics were all negative and the majority significantly so. The only other D. mojavensis population with negative statistics was the Sonora population, although none were significant. In both D. arizonae populations the majority of the statistics were negative (see Table 3).

TABLE 3

Summary statistics of frequency distribution tests

Species	Population	Tajima's D^{^a}	Fu's F_S^{^b}	Fu and Li's D^{^c}	Fay and Wu's H^{^d}
D. mojavensis	Mojave	0.035	−0.488	1.062	0.724
	Catalina Island	0.204	0.562	0.165	0.714
	Sonora	−0.146	−9.732	−0.221	−0.431
	Baja	−1.426**	−5.194*	−2.138**	−1.412
D. arizonae	Northern	−0.878**	−8.391	−0.752	3.600
	Southern	−1.049	−0.186	−1.569	0.900

*P < 0.1, **P < 0.05.

^{Tajima (1989)}.

^{Fu (1997)}.

^{Fu and Li (1993)}.

^{Fay and Wu (2000)}.

The distribution of sequence divergence across GstD1, especially that of nonsynonymous divergence, appears to be heterogeneous in the Baja California and mainland populations, while variation is distributed homogenously within the Catalina Island and Mojave populations (Figure 3). Another method of comparing the level fixed differences and polymorphisms is the MK test (Table 4). This test was implemented in two ways (standard and lineage specific): the first compared each population against the combined data set of the sister species (e.g., Baja D. mojavensis vs. all D. arizonae) and the second used the sequence of the outgroup D. navojoa to polarize the fixed differences to be able to examine lineage-specific departures from neutrality. Although the MK test is robust to demographic changes (Nielsen 2001), the effect of such changes on these analyses cannot be fully determined until a robust knowledge of the demographic histories of the D. mojavensis populations and D. arizonae is ascertained (a large-scale multilocus analysis of the demographic histories of D. mojavensis and D. arizonae is in progress). Another factor that can affect the interpretation of the MK test is if the gene occurs in a region of low recombination (Shapiroet al. 2007). Genes in regions of low recombination, such as those located at either the centromeric or the telomeric ends of chromosomes, are more drastically affected by hitchhiking (Smith and Haigh 1974; Andolfatto and Przeworski 2001) and background selection (Charlesworthet al. 1993) at nearby genes. Recombination rates across chromosomes in D. mojavensis have not been determined, but given that GstD1 is located in the middle of Muller element E (Drosophila 12 Genomes Consortium 2007) it is not likely to be in a low-recombining region of the genome. An assumption of the MK test is that all synonymous positions be neutral; this is not the case in the presence of codon usage bias (Kliman and Hey 1994; Akashi 1999). The effective number of codons (ENC) is a measure of codon usage bias that ranges from 20 (extreme bias) to 61 (equal use of all codons) (Wright 1990). In Gst, the ENC in D. mojavensis (32.4) and D. arizonae (32.8) are similar and do not show a very high level of bias; therefore codon usage bias is not expected to greatly affect the analysis.

An external file that holds a picture, illustration, etc.
Object name is GEN17821073f3.jpg

Figure 3.—

Distribution of the nonsynonymous to synonymous divergence (K_a/K_s), using a sliding-window analysis (window size, 50 bp; step size, 10 bp). The divergence between each of the four D. mojavensis populations and D. arizonae is shown. The circles indicate the seven nonsynonymous changes that occur between the Baja and Sonora populations of D. mojavensis and D. arizonae. The solid circles are those changes that occur in the active site of GSTD1.

TABLE 4

Fixed and polymorphic variation at GstD1 and tests for departure from neutrality

			Standard MK test			Lineage-specific MK test
Species	Population		Fixed	Polymorphic	Test	Fixed	Polymorphic	Test
D. mojavensis	Mojave	Syn.	7	32	Fisher's	5	1	Fisher's
		Nonsyn.	0	7	NS	0	2	NS
	Catalina	Syn.	6	34	Fisher's	4	3	NA
	Island	Nonsyn.	0	6	NS	0	0
	Sonora	Syn.	6	40	G-test	4	9	G-test
		Nonsyn.	6	10	*	6	4	NS
	Baja	Syn.	6	43	G-test	4	12	G-test
		Nonsyn.	7	9	**	7	3	*
D. arizonae	Northern	Syn.	3	52	Fisher's	2	24	Fisher's
		Nonsyn.	0	18	NS	0	4	NS
	Southern	Syn.	9	31	G-test	8	3	Fisher's
		Nonsyn.	1	15	NS	1	0	NS

Synonymous sites include changes within both the coding and the noncoding regions. Syn., synonymous; nonsyn., nonsynonymous. *P < 0.05, **P < 0.01.

Results for both Baja and Sonora populations of D. mojavensis significantly reject the assumptions of neutrality of the MK test in the direction of a greater fixation of nonsynonymous substitutions (Table 4). For example, in the Baja population the ratio of nonsynonymous to synonymous polymorphisms is 0.25 while the same ratio for fixed differences that occur in the lineage leading to the Baja D. mojavensis populations is 1.75. A significant departure from the neutral expectation was not found in the other two D. mojavensis populations or in D. arizonae (Table 4).

Given the relatively recent divergence of D. mojavensis and D. arizonae, a large number of fixations between the species would not be expected. If the divergence time between the species is ≪4N_e generations ago the incomplete assortment of the ancestral variation would be expected. To better estimate the divergence between the two species all the population data on D. mojavensis and D. arizonae have been compiled, including data from studies of the following loci: from Adh-1, Adh-2, and G6pd (Matzkin and Eanes 2003; Matzkin 2004); the 10 random nuclear loci from Machadoet al. (2007); and GstD1 (this study). These data were used to perform 10,000 coalescent simulations using the HKA program (J. Hey; http://lifesci.rutgers.edu/∼heylab/). The divergence time between D. mojavensis and D. arizonae was estimated to have occurred 3.3N_e generations ago. According to Clark (1997) this would suggest that the majority of the once shared variation would have been fixed or lost in one or the other lineage. Any polymorphism that has not fixed in one or possibly both species would increase the number of polymorphisms within species and decrease the amount of fixed differences between the species. This essentially will make it difficult to observe a pattern of variation suggesting adaptive protein evolution, as was seen for GstD1 in this study.

To better determine the effect of the divergence time between D. mojavensis and D. arizonae on the ability to detect a significant pattern of adaptive protein evolution a series of coalescent simulations were performed. The simulations were performed only for the two populations that show a significant departure from neutrality, Baja California and Sonora (Table 4). Using the ms program (Hudson 2002), 10,000 coalescent simulations were performed of a D. mojavensis population splitting from D. arizonae 3.3N_e generations ago. For the Baja California and Sonora populations θ was set to 4.48 and 3.54, respectively (θ-values are from GstD1). Since the focus was on the strict neutral model the observed level of variation at GstD1 was used to calculate the probability that a given mutation from a simulated replicate would be assigned to either a synonymous or a nonsynonymous site class (0.12 for Baja California and 0.19 for Sonora). For each replicate, a MK test was performed and significance was calculated using a χ^{-test. For the Baja California population the probability of observing a significant MK test suggesting adaptive protein evolution under the neutral model is}P = 0.015 and for the Sonora population is P = 0.049. Therefore, it is unlikely that the significant results observed in this study are a result of the relatively short divergence time between D. mojavensis and D. arizonae.

Intra- and interspecific variation at GstD1:

TABLE 1

Synonymous, nonsynonymous, and noncoding sequence variation for GstD1 in D. mojavensis, D. arizonae, and D. navojoa

		Synonymous			Nonsynonymous			Synonymous and noncoding
	N	S_s	π_s	θ_s	S_n	π_n	θ_n	S_s+nc	π_s+nc	θ_s+nc
D. mojavensis
All	63	17	0.0168	0.0241	15	0.0110	0.0067	28	0.0210	0.0237
Mojave	15	1	0.0023	0.0021	2	0.0012	0.0013	1	0.0014	0.0012
Catalina Island	8	3	0.0081	0.0077	0	0.0000	0.0000	3	0.0048	0.0046
Sonora	23	4	0.0083	0.0073	4	0.0016	0.0023	9	0.0104	0.0097
Baja	17	8	0.0068	0.0158	3	0.0019	0.0019	12	0.0075	0.0142
D. arizonae
All	25	26	0.0350	0.0461	6	0.0029	0.0033	31	0.0265	0.0332
Northern	20	21	0.0310	0.0396	4	0.0019	0.0024	24	0.0211	0.0273
Southern	5	2	0.0054	0.0064	0	0.0000	0.0000	3	0.0048	0.0058
D. navojoa
All	4	7	0.0233	0.0255	1	0.0011	0.0011	11	0.0232	0.0246

N, number of lines sampled; S, number of segregating sites; π, mean pairwise difference per base pair; θ, nucleotide diversity per base pair.

Population structure:

TABLE 2

Pairwise F_ST between the four D. mojavensis populations

	Mojave	Catalina Island	Sonora
Catalina Island	0.654***
Sonora	0.842***	0.788***
Baja	0.902***	0.855***	0.252***

The data were permuted 10,000 times to calculate significance. ***P < 0.001.

Phylogenetic analysis:

Figure 2.—

Frequency and test of neutrality analysis:

TABLE 3

Summary statistics of frequency distribution tests

Species	Population	Tajima's D^{^a}	Fu's F_S^{^b}	Fu and Li's D^{^c}	Fay and Wu's H^{^d}
D. mojavensis	Mojave	0.035	−0.488	1.062	0.724
	Catalina Island	0.204	0.562	0.165	0.714
	Sonora	−0.146	−9.732	−0.221	−0.431
	Baja	−1.426**	−5.194*	−2.138**	−1.412
D. arizonae	Northern	−0.878**	−8.391	−0.752	3.600
	Southern	−1.049	−0.186	−1.569	0.900

*P < 0.1, **P < 0.05.

^{Tajima (1989)}.

^{Fu (1997)}.

^{Fu and Li (1993)}.

^{Fay and Wu (2000)}.

Figure 3.—

TABLE 4

Fixed and polymorphic variation at GstD1 and tests for departure from neutrality

			Standard MK test			Lineage-specific MK test
Species	Population		Fixed	Polymorphic	Test	Fixed	Polymorphic	Test
D. mojavensis	Mojave	Syn.	7	32	Fisher's	5	1	Fisher's
		Nonsyn.	0	7	NS	0	2	NS
	Catalina	Syn.	6	34	Fisher's	4	3	NA
	Island	Nonsyn.	0	6	NS	0	0
	Sonora	Syn.	6	40	G-test	4	9	G-test
		Nonsyn.	6	10	*	6	4	NS
	Baja	Syn.	6	43	G-test	4	12	G-test
		Nonsyn.	7	9	**	7	3	*
D. arizonae	Northern	Syn.	3	52	Fisher's	2	24	Fisher's
		Nonsyn.	0	18	NS	0	4	NS
	Southern	Syn.	9	31	G-test	8	3	Fisher's
		Nonsyn.	1	15	NS	1	0	NS

Synonymous sites include changes within both the coding and the noncoding regions. Syn., synonymous; nonsyn., nonsynonymous. *P < 0.05, **P < 0.01.

DISCUSSION

Population history of D.mojavensis and D. arizonae:

The pattern of variation at GstD1 strongly supports the existence of four genetically isolated populations for D. mojavensis (Table 2). This observation supports previously studies using both nuclear and mitochondrial genes (Ross and Markow 2006; Machadoet al. 2007; Reedet al. 2007). A point of disagreement between studies on nuclear genes (Machadoet al. 2007), including this study, and mitochondrial genes (Reedet al. 2007) is the relationships between the four D. mojavensis populations. As seen in GstD1, nuclear genes show that the Baja and mainland populations form one cluster and the Catalina Island and Mojave populations form another (Figure 2). Mitochondrial genes suggest that the Catalina Island population is embedded within the Baja/mainland cluster and that the Mojave population is most basal. The mitochondrial pattern contradicts the previous proposed model of the speciation of D. mojavensis and D. arizonae. Ruizet al. (1990) proposed that D. mojavensis diverged from D. arizonae in the Baja California peninsula by an allopatric event, and then a small subpopulation migrated north to the Mojave Desert and eventually became isolated. A further colonization event from this isolated Mojave population led to the Catalina Island population and in a similar fashion the Sonora population was later formed by colonizers from the Baja peninsula. According to this hypothesis it is expected that the greatest diversity will be observed within the Baja population and that the Catalina Island and Sonora populations will be clustered within the Mojave and Baja populations, respectively. Evidence from GstD1 (this study), Adh-1/Adh-2 (Matzkin and Eanes 2003; Matzkin 2004), and a random set of 10 nuclear markers (Machadoet al. 2007) shows that there is a greater level of nucleotide variation within the Baja population (Table 1). Although the phylogeographic pattern at GstD1 supports a Baja/Sonora and Catalina Island/Mojave clustering (Figure 2), the Catalina Island population did not appear to be nested within Mojave and relationships between Baja and Sonora could not be fully resolved.

The level of synonymous variation in northern D. arizonae is significantly greater than that in all four of the D. mojavensis populations, suggesting a greater effective population size in northern D. arizonae (Table 1). The level of variation in the southern D. arizonae population samples appears to be similar to that of D. mojavensis, but only few individuals were available from that population. Similarly to D. mojavensis, disagreements between nuclear and mitochondrial markers are observed in D. arizonae. A greater effective population size of D. arizonae relative to D. mojavensis has previously been observed for nuclear genes (Matzkin and Eanes 2003; Matzkin 2004; Machadoet al. 2007), while the opposite was observed for the mitochondria (Reedet al. 2007). A different pattern of variation between nuclear (diparentally inherited) and mitochondrial (maternally inherited) markers could be due to differences in dispersal rates between the sexes, the mating system (Chesser and Baker 1996), selection on the mitochondria (Bazinet al. 2006), or possibly genetic draft associated with endosymbionts (Meiklejohnet al. 2007). No sex-specific dispersal rates were previously observed in D. mojavensis, but this was examined only in one of the four populations (Markow and Castrezana 2000). Spiroplasma endosymbionts have been found in D. mojavensis (Mateoset al. 2006), although the nature of this interaction has not been resolved. These discrepancies in results between nuclear and mitochondrial genes warrant future examination.

Evolutionary history of GstD1:

The cactophilic lifestyle of D. mojavensis and D. arizonae, as well as other members of the Repleta group, subjects them to a wide array of chemical, sometimes toxic, environments (Kircher 1982). This chemical diversity is dependent on both the cactus host and the bacterial and yeast communities associated with the necrosis (Starmer 1982; Starmeret al. 1986). For example, differences in alcohols between agria and organpipe were proposed to have driven the molecular and functional divergence of Adh between the Baja and Sonora populations (Matzkin 2004, 2005). In addition to alcohols, a variety of other compounds are experienced by populations of D. mojavensis, from the alkaloids of Opuntia in Catalina Island to the triterpenes and glycosides of the columnar cacti (agria and organpipe) of both Baja and Sonora (Kircher 1977, 1980, 1982). Although chemical differences occur between agria and organpipe necrosis, these two columnar cacti share many chemical properties. Agria and organpipe are closely related, both members of the same genus, Stenocereus, and are distantly related and more derived relative to barrel cactus and even further to prickly pear (Cota and Wallace 1997; Nyffeler 2002). The phylogenetic relationships, and thus the chemical relatedness between the cactus hosts (agria/organpipe vs. barrel/prickly pear), appear to associate closely with the pattern of variation at GstD1 (Baja/Sonora vs. Catalina Island/Mojave).

In the Baja and Sonora populations, a large level of interspecific amino acid divergence in the 5′ and 3′ ends of the GstD1 locus was observed (Figure 3). Although the Baja and Sonora populations appear significantly diverged when examining synonymous positions (Table 2), they appear to have preserved a very similar protein haplotype for GSTD1. Similarly, the populations in Mojave and Catalina Island share the same protein haplotype, but there are no fixed amino acid differences at GSTD1 between these populations and D. arizonae (Table 4). Under a neutral scenario these strikingly different patterns of variation between the Baja/Sonora and Catalina Island/Mojave populations could reflect a reduction in population size in the Baja/Sonora lineage that allowed for the fixation of slightly deleterious amino acid polymorphisms and a subsequent maintenance of a large population size in Catalina Island/Mojave that would deter the fixation of slightly deleterious amino acid polymorphisms. This scenario, which excludes selection as the driver for the pattern of variation seen between populations, is not supported by the observations made in this study. A significant shift in the frequency spectrum was observed for the Baja population and a similar pattern was seen for the Sonora population (Table 3). This shift in the frequency spectrum can be a product of either demographic changes, such as population expansion, or positive selection (Nielsen 2001). The effect of a population expansion will be genomewide and therefore a similar signature at many loci should be expected. Recently, a combined analysis of 10 random regions across the genome of D. mojavensis observed that for all four populations there was an overall signature of population expansion (Machadoet al. 2007). In that study as well as in this one, it is clear that the level of variation, and hence population size, is greatest in the Baja and Sonora populations relative to the Catalina Island and Mojave populations. Furthermore, as the level of variation in Catalina Island and Mojave is much lower than that for D. arizonae, it seems unlikely that the lack of amino acid differences is associated with a large population size in Catalina Island and Mojave. Therefore it appears that the pattern of variation observed could have been formed as a result of positive selection in the Baja/Sonora lineage and purifying selection in the Catalina Island/Mojave lineage.

A significant departure from the neutral expectation is observed in the Baja and Sonora populations (Table 4). Furthermore, the lineage-specific MK test (Table 4) suggests that more amino acid changes occurred in the lineage leading to the Baja and Sonora populations of D. mojavensis than is expected by chance. Both Baja and Sonora share the same amino acid differences with D. arizonae, with the exception of residue 178, which is fixed in Baja and found in all but one individual sampled from Sonora. The location of the seven amino acids provides further evidence that this locus has gone through a period of adaptive protein evolution in the past. Two of the seven fixed differences between D. arizonae and D. mojavensis from Baja and Sonora (Leu-7-Gln and His-39-Gln) occur in the active-site pocket of the GSTD1 enzyme. The crystallographic structure and active site have been determined previously for the A. gambiae homolog of the D. mojavensis GSTD1 (Chenet al. 2003). The active-site pocket is composed of two subsites, the G-site and the H-site. The G-site, mostly composed of hydrophilic and polar residues, is the location where glutathione binds, while the H-site is a hydrophobic pocket where the substrate binds (Chenet al. 2003). The histidine (positively charged) to glutamine (polar) change at residue 39 occurs in the G-site. Although this substitution might appear to cause a charge change in the molecule, given the isoelectric point of histidine (7.6) at a physiological pH (nearly neutral) the side chain will not be charged. The leucine (hydrophobic) to glutamine (polar) change occurs in the H-site. The location of these two changes, plus the fact that they also affect the polarity of the active site, could have significant consequences for the activity and substrate specificity of GSTD1.

It has been proposed (Heed 1982) that the host of the D. mojavensis/D. arizonae ancestor was prickly pear. Evidence from this study and previous surveys (Machadoet al. 2007) suggests that the Baja/Sonora and Catalina Island/Mojave clusters are well supported. Therefore it can be hypothesized that following the divergence of D. mojavensis, a population settled in Baja or Sonora and shifted to columnar cacti, while a population migrated north remaining in prickly pear and subsequently moved to barrel cactus. The shift to columnar cacti by the ancestral Baja/Sonora population exposed them to alternative compounds. It is at this moment in its recent evolutionary history that it appears that selection at GstD1 took place. Given the pattern of sequence divergence observed in this study, its transcriptional variation (Matzkinet al. 2006), and the known role of GstD1 in detoxification, this strongly suggests that GstD1 played a role in the columnar cactus host adaptation of Baja and Sonora populations of D. mojavensis.

Transcriptional and coding sequence evolution:

Cactophilic Drosophila, as well as many other phytophagous insects, have adapted to the chemical composition of their host (Becerra 1997). The effects of and response to such chemical diversity are especially great in the presence of a host shift. It is at this point where the ability to modulate enzyme activity and pathway fluxes, especially those involved in detoxification, is of great importance.

Enzymes tend to have as their substrate a set of related compounds; when faced with a nonoptimal compound the velocity (i.e., enzyme activity) of the reaction will be reduced. To maintain wild-type activity, changes at the coding sequence (e.g., modification of the active site) or in the enzyme titer (e.g., increased expression) will be needed. When faced with an environmental shift (e.g., cactus host shift), four outcomes can be predicted for an enzyme, termed here as (1) neutral, (2) coding sequence, (3) transcriptional, and (4) coding sequence and transcriptional.

In the first case neutral changes in the coding sequence and in the expression of the enzyme will accumulate because the enzyme has no involvement in the detoxification or because of constraint. The second outcome involves the fixation of new nonsynonymous mutations that modify the active site, and hence increase the enzyme activity, although new mutations are not necessary for this outcome (Hartlet al. 1985). For the last two outcomes changes in the ability to modulate gene expression are needed. Given the structure of transcription binding sites it is probable that mutations can create a potential binding domain (Stone and Wray 2001). As a response to an environmental shift the titer and expression of transcription factors may be affected such that the previously nonfunctional binding domain will be activated and might subsequently increase the expression of the enzyme and hence increase activity. Therefore, it is these genetic backgrounds that have the ability to modulate gene expression that will increase in frequency (third outcome). The pattern of variation in GstD1 highlights the fourth outcome, which in addition to transcriptional modulation shows significant changes at the protein level. Evolutionary change, as observed in D. mojavensis, can therefore involve a combination of transcriptional and coding sequence changes.

Population history of D.mojavensis and D. arizonae:

Evolutionary history of GstD1:

Transcriptional and coding sequence evolution:

Department of Ecology and Evolutionary Biology and Center for Insect Science, University of Arizona, Tucson, Arizona 85721

^{Address for correspondence:} Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell St., Tucson, AZ 85721-0088. E-mail: ude.anozira.liame@nikztaml

Communicating editor: L. Harshman

Received 2007 Oct 11; Accepted 2007 Dec 6.

Abstract

Drosophila mojavensis is a cactophilic fly endemic to the northwestern deserts of North America. This species includes four genetically isolated cactus host races each individually specializing on the necrotic tissues of a different cactus species. The necrosis of each cactus species provides the resident D. mojavensis populations with a distinct chemical environment. A previous investigation of the role of transcriptional variation in the adaptation of D. mojavensis to its hosts produced a set of candidate loci that are differentially expressed in response to host shifts, and among them was glutathione S-transferase D1 (GstD1). In both D. melanogaster and Anopheles gambiae, GstD1 has been implicated in the resistance of these species to the insecticide dichloro-diphenyl-trichloroethane (DDT). The pattern of sequence variation of the GstD1 locus from all four D. mojavensis populations, D. arizonae (sister species), and D. navojoa (outgroup) has been examined. The data suggest that in two populations of D. mojavensis GstD1 has gone through a period of adaptive amino acid evolution. Further analyses indicate that of the seven amino acid fixations that occurred in the D. mojavensis lineage, two of them occur in the active site pocket, potentially having a significant effect on substrate specificity and in the adaptation to alternative cactus hosts.

Abstract

THE concept of “evolution at two levels” proposed by King and Wilson (1975) predicts that evolutionary change is a function of both coding sequence and transcriptional variation. There are myriads of examples highlighting the role of coding sequence variation in evolution and recently this pattern has been observed at the genomic level (Fayet al. 2002; Bierne and Eyre-Walker 2004; Bustamanteet al. 2005; Voightet al. 2006). As well, it is becoming apparent that natural selection plays a large role in interspecific transcriptional variation (Rifkinet al. 2003; Nuzhdinet al. 2004; Giladet al. 2006). What is sometimes lacking in many studies, and often the least tractable, is an understanding of how the transcriptional and sequence variation relates to the ecology of the organism.

Drosophila mojavensis offers a unique opportunity to incorporate knowledge of its ecology with its transcriptional and sequence variation. D. mojavensis and its sister species D. arizonae are cactophilic flies that diverged ∼1.5 million years ago (Matzkin and Eanes 2003; Matzkin 2004; Reedet al. 2007). These species utilize the necrotic tissues of several cactus species as their host. The range of D. mojavensis is composed of four geographically and genetically isolated host races (Ross and Markow 2006; Machadoet al. 2007; Reedet al. 2007). Each host race, mainland Sonora Desert, Baja California, Catalina Island, and Mojave Desert, utilizes a different species of cactus: organpipe (Stenocereus thurberi), agria (S. gummosus), prickly pear (Opuntia spp.), and barrel (Ferocactus cylindraceus), respectively (Fellows and Heed 1972; Ruiz and Heed 1988). D. mojavensis has been proposed (Ruizet al. 1990) to have originated in Baja California, utilizing a Stenocereus cactus (possibly agria), and then migrated up the peninsula and colonized Catalina Island and the Mojave Desert, shifting cactus hosts in the process. A subsequent colonization (and host shift) from Baja to Sonora established the present-day mainland Sonora Desert population.

The differences in the chemical composition of the cacti, in addition to the microflora associated with the necrosis (Starmer 1982; Starmeret al. 1986), produce very distinct chemical environments to which each population of D. mojavensis must adapt (Heed 1978; Vacek 1979; Kircher 1982). Some of the chemical differences include such compounds as alcohols, alkaloids (in Opuntia), triterpenes, and glycosides. Previous studies in D. mojavensis have shown that this chemical variation can drive the molecular and functional evolution of metabolic genes (Matzkin and Eanes 2003; Matzkin 2004, 2005). For example, alleles of alcohol dehydrogenase-2 (ADH-2) with the greatest activity on 2-propanol are found at highest frequency in a population (Baja California) that experiences greater 2-propanol concentration within the cactus necrosis (agria). Additionally, a previous study examining transcriptional variation in D. mojavensis identified ADH-2 as one of a series of genes whose expression is induced as a response to a cactus host shift (Matzkinet al. 2006). This suggests that both transcriptional and coding sequence changes are involved in the adaptation process.

Among the list of candidate genes previously shown to differentially regulate with host utilization was a gene with high sequence identity to the D. melanogaster glutathione S-transferase D1 (GstD1) gene (Matzkinet al. 2006). In D. melanogaster (Tang and Tu 1994), as well as in the mosquito Anopheles gambiae (Ransonet al. 2001), GstD1 has a high activity and is presumed to be involved in the resistance to the insecticide dichloro-diphenyl-trichloroethane (DDT). In general several members of the Gst gene family have been known to play a role in detoxification in many taxa, including insects (Enayatiet al. 2005). GSTs generally function on hydrophobic organic compounds, altering their hydrophobicity (Atkinset al. 1993).

Given the large quantities of organic compounds within the cactus necrosis, some of which are toxic, there is constant selection pressure on D. mojavensis to evolve resistance to these compounds, especially in the presence of a cactus host shift. Therefore, it can be predicted that as a consequence of a cactus host shift, detoxification enzymes might be under selection (at both the transcriptional and the coding sequence level). This study investigates the pattern of variation among the different cactus host populations of D. mojavensis and D. arizonae at GstD1, a gene previously suggested to play a role in host adaptation (Matzkinet al. 2006). Glutathione S-transferase D1 appears to have been through a period of adaptive protein evolution associated with the cactus host shifts of two D. mojavensis populations. The fixed amino acid changes that occurred in D. mojavensis have potentially large functional consequence given their location on the enzyme's structure. Conversely, strong purifying selection appears to have occurred in the other two D. mojavensis populations as well as in D. arizonae, possibly as a result of commonalities in cactus host utilization. Additionally, the pattern of population structure at this locus suggests an alternative scenario to the relationship and history of the four D. mojavensis host races.

N, number of lines sampled; S, number of segregating sites; π, mean pairwise difference per base pair; θ, nucleotide diversity per base pair.

The data were permuted 10,000 times to calculate significance. ***P < 0.001.

*P < 0.1, **P < 0.05.

Synonymous sites include changes within both the coding and the noncoding regions. Syn., synonymous; nonsyn., nonsynonymous. *P < 0.05, **P < 0.01.

Acknowledgments

I thank Kristin Sweetser for all her hard work on this project and Therese Markow for assistance and support. Therese Markow, Carlos Machado, Jeremy Bono, and two anonymous reviewers provided helpful suggestions and comments on the manuscript. This study was supported by the National Institute of General Medical Sciences 2K12 G000798-06 Postdoctoral Excellence in Research and Teaching fellowship through the Center for Insect Science at the University of Arizona to L.M.M.

Acknowledgments

Notes

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. EU079380–EU079471.

Notes

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. EU079380–EU079471.

The Molecular Basis of Host Adaptation in Cactophilic Drosophila: Molecular Evolution of a Glutathione <em>S</em>-Transferase Gene (<em>GstD1</em>) in <em>Drosophila mojavensis</em>

MATERIALS AND METHODS

Sampling and sequencing:

Data analysis:

Sampling and sequencing:

Data analysis:

RESULTS

Intra- and interspecific variation at GstD1:

TABLE 1

Population structure:

TABLE 2

Phylogenetic analysis:

Frequency and test of neutrality analysis:

TABLE 3

TABLE 4

Intra- and interspecific variation at GstD1:

TABLE 1

Population structure:

TABLE 2

Phylogenetic analysis:

Frequency and test of neutrality analysis:

TABLE 3

TABLE 4

DISCUSSION

Population history of D.mojavensis and D. arizonae:

Evolutionary history of GstD1:

Transcriptional and coding sequence evolution:

Population history of D.mojavensis and D. arizonae:

Evolutionary history of GstD1:

Transcriptional and coding sequence evolution:

Abstract

Acknowledgments

Notes