BMC Plant Biology. Dec/31/2018; 19

Published online Jan/14/2019

PMID: 30646861

PMC: 6332543

doi: 10.1186/s12870-018-1601-1

Transcriptomic signature reveals mechanism of flower bud distortion in witches’-broom disease of soybean (Glycine max)

Amit G. Deshmukh+9 authors

Abstract

Background

Soybean (Glycine max L. Merril) crop is major source of edible oil and protein for human and animals besides its various industrial uses including biofuels. Phytoplasma induced floral bud distortion syndrome (FBD), also known as witches’ broom syndrome (WBS) has been one of the major biotic stresses adversely affecting its productivity. Transcriptomic approach can be used for knowledge discovery of this disease manifestation by morpho-physiological key pathways.

Results

We report transcriptomic study using Illumina HiSeq NGS data of FBD in soybean, revealing 17,454 differentially expressed genes, 5561 transcription factors, 139 pathways and 176,029 genic region putative markers single sequence repeats, single nucleotide polymorphism and Insertion Deletion. Roles of PmbA, Zn-dependent protease, SAP family and auxin responsive system are described revealing mechanism of flower bud distortion having abnormalities in pollen, stigma development. Validation of 10 randomly selected genes was done by qPCR. Our findings describe the basic mechanism of FBD disease, right from sensing of phytoplasma infection by host plant triggering molecular signalling leading to mobilization of carbohydrate and protein, phyllody, abnormal pollen development, improved colonization of insect in host plants to spread the disease. Study reveals how phytoplasma hijacks metabolic machinery of soybean manifesting FBD.

Conclusions

This is the first report of transcriptomic signature of FBD or WBS disease of soybean revealing morphological and metabolic changes which attracts insect for spread of disease. All the genic region putative markers may be used as genomic resource for variety improvement and new agro-chemical development for disease control to enhance soybean productivity.

Electronic supplementary material

The online version of this article (10.1186/s12870-018-1601-1) contains supplementary material, which is available to authorized users.

Background

Soybean (Glycine max L. Merril) caters the global need of edible oil (25%) and protein concentrate (67%) in animal feeds for livestock, poultry and fish. It contributes in meat, milk, cheese, bread, and oil. Per unit area, protein produced by this crop is highest, thus this crop has various epithets like “Cow of the field” or “Gold from soil”, “yellow jewel”, “great treasure”, “nature’s miracle protein” [1]. The top five global soybean producers which contribute more than 92% are, USA, Brazil, Argentina, China and India [2]. It is also grown in four major continents covering more than 20 countries including India. Soybean is a promising crop due to its potential beneficial roles in lowering of cholesterol, anti-carcinogenic effects, and protective effects against obesity, diabetes, irritants of the digestive tract, bone, and kidney diseases [3]. Soybean is also much more relevant as bioenergy crop for biofuel production along with their co-products as livestock feed [4]. Productivity of soybean is adversely affected by various biotic and abiotic stresses like salinity, pathogens, heat, drought, heat, soil heavy metals and compaction [5]. Among the biotic stresses, phytoplasma and viruses are major impediments causing productivity loss from 2 to 90% [6]. According to the earlier reports, several biotic factors have been identified in the host leading to reduced pollen vigour [7], seed mass variants [8], floral abnormalities, sepal hypertrophy, virescence, phyllody, aborted reproductive organs [9], enhanced vegetative growth [10], premature flower abortion [11], etc. Cytological examination of plant reproductive organs revealed distorted pollen grains in the symptomatic plants hampering pollination and fertilization [12].

Symptom and transmission of witches’ broom disease of soybean is well documented in India [13]. This disorder leads to failure of pod formation and lack the senescence maturity till the end of season. The symptoms are also not predictable. Abnormal development in the floral bud formation in witches’-Broom Disease of soybean causes, even acute loss in productivity [12]. It is referred in India as ‘floral bud distortion’ caused by phytoplasma [14, 15]. In USA and Iran, it is called bud proliferation syndrome [16], and pod set failure [17, 18], respectively.

Earlier gene expression studies on floral tissues are confines to microarray based limited differential gene expression [19]. Though RNA-Seq atlas of soybean has 14 diversified tissues including flower but no expression data is available for any abiotic and biotic stresses [20, 21]. Cytological and limited molecular study on FBD has been carried out revealing 27 DEG having ARF9 transcriptional factor and a regulatory protein FHA along with their protein-protein interaction network [12]. Though whole genome sequence of soybean has been done in 2010 [22] but without RNA-Seq approach detailed mechanism of FBD cannot be deciphered. Such knowledge discovery is not only required to understand basic mechanism of disease but is also relevant in designing of disease combating strategies in the endeavour of soybean productivity. By RNA-Seq approach stress responsive major morphological and physiological pathways in a single crop genotype has been successfully deciphered in crop like tea [23], tomato [24] and cassava [25]. In case of agricultural crop like soybean, stress response has been found specific to varieties/ accessions [26].

Present work aims to decipher key candidate genes and molecular mechanism in manifestation of flower bud distortion in witches’ Broom Disease of soybean. This study also aims to describe biochemical pathways along with genic region marker discovery (SSR, SNP and InDels) and prediction of TFs associate with FBD.

Results and discussion

Identification of symptomatic and asymptomatic plant of FBD

Discrimination of symptomatic and asymptomatic plants was done successfully by cytological studies of floral reproductive organs by pollen viability, morphology, stigma receptivity, anther morphology and germination assay (Fig. 1). SEM of floral organ, pollen and anther were also obtained to confirm the same (Fig. 2). Subsequently, they were further successfully confirmed by PCR based test by amplicon size as well as positive and negative control (Fig. 3). Since mycoplasma infection in asymptomatic plant is not easy to rule out, hence PCR based diagnostics is necessary [27].

Fig. 1

Cytological studies of floral reproductive organs in FBD symptomatic and asymptomatic soybean plant. a Stigma receptivity, b Anther morphology, c Percent pollen viability, d Pollen germination, e Percent pollen viability in FBD found reduced than asymptomatic, f Pollen germination ability in FBD found reduced than asymptomatic

Fig. 2

Scanning electron micrographs of floral organs of symptomatic and asymptomatic soybean plant a & b: Pollen on anthers of asymptomatic plant; c Pollen grain in asymptomatic plant; d Anthers with no/less number of pollens in symptomatic plant; E&F: Distorted pollen grains in the symptomatic plant

Fig. 3

PCR based screening of symptomatic and asymptomatic soybean plants for phytoplasma using P1-P7/P1-P3 primers. Lane 1: Ladder; 2: Positive control; 3, 5 and 7: asymptomatic soybean plants; 4 and 6: symptomatic soybean plants; 8: Negative control

RNA-Seq data generation

A total of 30,669,583 and 36,967,121 paired end reads for both the sets, i.e., infected and control, respectively were generated. A total of 156,054 and 198,931 low quality reads from both the samples, were removed and remaining high quality reads were used for transcriptome analysis. Trinity assembler generated de novo assembly having 258,427 transcripts with default k-mer value 25 (Table 1). This was followed by removal of redundant sequences using CAP3 assembler, a total of 211,343 transcripts were obtained which were used for further analysis. Final assembly had GC content 41.96% with N50 of 1081 bp. Minimum and maximum transcript lengths were 190 bp and 50,081 bp, respectively.

Table 1

Reads’ statistics before and after trimming

Sample name	Raw paired-end reads pairs	Both Surviving	Forward Only Surviving	Reverse Only Surviving	Dropped
Infected	30,669,583	27,345,451 (89.16%)	2,814,040 (9.18%)	354,038 (1.15%)	156,054 (0.51%)
Control	36,967,121	32,541,568 (88.03%)	3,787,558 (10.25%)	439,064 (1.19%)	198,931 (0.53)

Though soybean genome is having 66,210 predicted gene model [20] but constructed transcriptome assembly appears to be represented by relatively higher number of transcripts (211343). Potential reasons for relatively higher number of observed transcripts could be (1) paleopolyploid origin of soybean genome having twice duplication event (59 MYA and 13 MYA), where variant transcripts are expected due to differential half-life/ retention time of homologues across genome having variation among them. These variations are due to recombination, structural variation due to insertion, deletion, inversion, unequal crossing-over, pseudogenization and background mutation [22, 28], (2) stress induced isoforms due to alternative splicing of immune associated genes [29] and (3) higher number of genes (> 66 K) and average number of exons per gene which varies from 3 to 5 is also expected to generate higher number of transcripts (196 to 330 thousand) [20].

Identification of differentially expressed genes

Analysis of DEG were carried out by transcriptome assembly based approach because it is expected to discover “extra genes” [30] and isoforms [31] also retaining phytoplasma transcripts. However, mapping of transcripts over soybean reference genome assembly was done to evaluate the obtained DEG. Comparison of three different thresholds of mapped reads at 75, 90, 95 of % similarity was observed with < 5% variation (Table 2). This indicates accuracy of transcriptome assembly as well as uniformity of coverage and depth as there is < 5% variation.

Table 2

Mapping of transcripts over soybean reference genome assembly

Total number of transcripts	Total number of mapped transcripts over reference genome	Percentage mapped transcripts	Percentage similarity threshold
211,343	191,864	90.78	75
211,343	189,997	89.90	90
211,343	183,538	86.84	95

By transcriptome assembly approach, we found 35,725, 14,487 and 4490 DEGs at three different q value parameters i.e. q = 0.9, q = 0.95 and q = 0.99, respectively. DEGs were discovered by both the methods, i.e., edgeR and NOISeq. DEG discovered by NOISeq was compared with edgeR. Out of 17,454 DEGs obtained by edgeR, a total of 12,182 and 12,053 were found to be common in both at q = 0.9 and 0.95, respectively. At q = 0.99, comparison of both the methods showed 3879 common DEGs. Our comparative analyses by NOISeq validated that DEG obtained. Its comparison with edgeR at different Q values is presented in Table 3. At much higher stringency (q = 0.99), genes having higher expression (> ± 8 fold change value) more than 85% of the DEGs were found common. Graphical representation of upregulated and downregulated genes of these sets are depicted in heatmap (Additional file 1).

Table 3

Comparison of DEGs obtained by NOISeq and edgeR

Q values	NOISeq DEGs	Common DEGs in NOISeq and edgeR
q = 0.9	35,725	12,182
q = 0.95	14,487	12,053
q = 0.99	4490	3879

Annotation and functional characterization

Homology search of soybean differential expressed genes revealed a total of 12,900, out of 17,454 unigenes having similarity with other known gene in the database. Maximum number (6701) of hits matched with Glycine max, followed by Glycine soja and Phaseolus vulgaris i.e. 3590 and 280, respectively (Fig. 4, Additional file 2). There were only 411 phytoplasma specific transcripts, which is 2.35% of the total DEGs. Gene ontology analysis was performed to categorized genes into three sub-division of molecular, biological and cellular functions (Fig. 5).

Fig. 4

Top 20 species distribution of differential expressed genes of soybean

Fig. 5

Gene ontology of differential expressed genes of soybean, Green color lines represented biological process, while blue and yellow color lines showed molecular functions and cellular components

From the total 17,454 differential expressed genes, 5561 unigenes showed similarity with transcriptional factors using blast against PlantTFDB v4.0. Maximum number of unigenes showed similarity with MYB i.e. 449, followed by bHLH, ERF, NAC and FAR1 in 439, 363, 339 and 296, respectively (Additional file 3). KEGG pathway analysis showed that 6041 unigenes were involved in 139 pathways (Additional file 4).

Among the DEGs hits of the flower bud tissue transcriptomic data of soybean, 296 hits were found with phytoplasma species. FBD transcriptome is mixture of transcripts of host and parasite both. This is due intra cellular presence of phytoplasma in symptomatic soybean samples. Many of the transcripts of host plants are in response to the requirement of phytoplasma. Phytoplasma is dependent on protein synthetic machinery of hosts for survival [32] as it contains very limited set of genes due to its smaller genome size (500-1500Kbp) [33]. Similar higher abundance of phytoplasma transcripts are already reported in other crops witches’ broom disease also like Paulownia [34]. Phytoplasma lowers the host plant systemic resistance by supressing salicylic acid pathways, enhancing its growth for its further invasion, proliferation and dispersion [35].

Among the DEGs we found up regulation of gene PmbA which is modulator of DNA gyrase. This is required for faster bacterial replication fork bidirectional DNA unwinding along with helicase [36]. Multiple sets of transcripts with hit as “retron-type reverse transcriptase” was found which is obviously expected in phytoplasma due to its role in genome size plasticity [37]. Its coding gene known as intron II reverse transcriptase is scattered over phytoplasma genome for lateral transfer of the genes as an adoptive mechanism of parasite to survive by creating variability. Different strains of same phytoplasma species varies in their size due to such genes [37].

We found differential expression of Zn-dependent protease transcripts which is known for virulence triggering gene in crop apple. The polymorphism of this gene has been found associated with moderate to severe disease condition. This Zn-dependent protease gene is also known as hflB (synonym ftsH) which codes for membrane associated ATP- and Zn2 + −dependent proteases controlling assembly, degradation and stability of protein affecting membrane and cytoplasm [38].

Up regulated virulence protein SAP54 which is also a type of effector, it mediates degradation of MTFs (MADS-box transcription factors) through 26S proteasome shuttle protein RAD23 resulting into floral bud distortion (FBD) along with development of phyllody (leafy flower). This phyllody is and adaptive response of phytoplasma by extending the vegetative tissue (leaf like flowers) to attract insect vector. This further enhances the insect colonization which is a strategic adaptation by phytoplasma for wider dispersal through insect vector [39].

Various transcripts of auxin responsive factor, auxin transport protein, auxin induced protein were found differentially expressed. They are known to play role in pollen wall-patterning and pollen development [40]. Auxin-responsive promoter elements are known to be involved in switching of vegetative to reproductive phase along with environmental signalling. Phytoplasma perturbs this normal reproductive development of floral tissue leading to FBD [12].

In witches’ broom syndrome (WBS), there is increase of carbohydrate protein and chlorophyll by > 2 fold [41]. This is due to impaired hormonal balance leading to impaired amino acid and carbohydrate translocation along with senescence. The vegetative period prolongs as phyllody but there is no proper development of pod.

Such flower virulence and distortion with abnormal shoot branching and stunted growth is also reported in brassica [42]. Sugar transport and accumulation leads to distant signalling for manifestation of FBD [9].

Upregulation was observed in SAP11 gene which encodes for phytoplasma protein effector is known to enhance insect vector reproduction by attenuating plant development and immunity [43]. Phytoplasma effectors are reported to alter development of flower, thus induces witches’ broom and also modify shape of the leaf to facilitate plant-insect interactions. These effectors facilitates advantage in lifecycle of phytoplasma into two different kingdoms namely, Plantae (soybean) and Animalia (insect) [44]. These effectors can be used in further research for development of new agrochemicals required in disease combating strategies [45].

Mining of putative molecular markers

A total of 27,925 SSR markers were mined from 211,344 transcripts obtained by de novo transcriptome assembly of Glycine max. Table 4 provides the information of all repeat units found in mononucleotides, dinucleotides, trinucleotides, tetranucleotides, pentanucleotides and hexanucleotides markers. There is no polymorphism discovery as these SSRs are mined from a single genotype, however, the information generated (Additional file 5) having details of loci along with their computed primer for genotyping. Future use of these genomic resources were evaluated by in silico genotyping of SSR loci by e-PCR. For this, mined SSR loci having di and tri repeats were used for e-PCR validation. A total of 1874 transcripts were (size of > 1000 bps) were used for e-PCR. After exclusion of “non-specific” e-PCR products (having more than one hit with 100% similarity), a total of 193 specific SSR loci were found which can be used for future genotyping (Additional file 6). Non-specific e-PCR products represented by multiple hits were expected in soybean genome as it contains > 75% duplicate genes due to duplication events [22] and also due to isoforms. Our e-PCR results clearly demonstrates that for paleodiplod genome species like soybean, such approach can be successfully used before wet-lab validation of SSR loci to reduce the time and cost by exclusion of the multiple copy transcripts in genotyping. These final set of SSR loci can be used for genotyping required in diversity, QTL and association mapping.

Table 4

Information of mined simple sequence repeats and all repeating units

	De novo Transcriptome assembly
Sequences examined	211,344
Identified SSRs	29,964
SSR containing sequences	24,856
Sequences containing more than 1 SSR	4111
SSRs present in compound formation	2039
Mono	15,068
Di	7131
Tri	7348
Tetra	284
Penta	59
Hexa	74

Though the transcriptomic data belongs to single genotype, JS- 335 Indian soybean variety but mining of SNP and InDels has been done successfully by aligning it over available reference genome which represents Williams 82 variety of USA. A total of 146,065 variants were found in soybean RNA-seq which contains 139,461 SNPs and 6604 InDels. Maximum variants were found in chromosome 18 i.e. 9582, followed by 8748 and 7489 in chromosome 6 and 2, respectively. Minimum number of variants were found in chromosome 11 i.e. 4739. The Ts/Tv rations of SNP was 1.53 and A-G and T-C were maximum predicted variants in transition whereas T-A and A-T were maximum predicted variants in transversion (Fig. 6a). Also, the annotation was performed against Glycine max genome where maximum variants (37.48%) were found in exonic region, followed by downstream (20.32%) and upstream (16.88%) regions, respectively (Fig. 6b; Additional file 7).

Fig. 6

a Distribution of SNPs (Ts/Tv) in Soybean sequences; b Annotation and classification of Variants on the basis of locations

All the 176,029 putative markers (SSRs, SNPs and InDels) can be used as a genomic resource for future research especially for QTL, gene mapping and linkage analysis. Such use of DNA markers for phytoplasma resistance QTL mapping are already reported in crop like bitter-berry (Prunus virginiana) [46]. Phytoplasma resistance trait introgression in varietal improvement program using DNA markers has been successfully reported in apple [47]. In coconut, lethal yellowing disease (LYD) which is caused by phytoplasma DNA markers have been used in hybrid variety development program for improvement of LYD resistance [48].

Validation and expression analysis by qPCR

Relative gene expression values obtained by qPCR analysis of all the 10 genes having up and down regulated, were in correspondence with computed log fold change value of DEGs (Fig. 7). Details of Gene and primers used for qPCR are listed in Table 5.

Fig. 7

Comparison of gene expression between RNA-seq and qPCR for selected transcripts (genes)

Table 5

Gene and primes sequence details of qPCR

Transcript ID: TRINITY_DN54263_c0_g1_i14Transcript ID: TRINITY_DN56761_c0_g1_i4Transcript ID: TRINITY_DN61766_c0_g1_i1Transcript ID: TRINITY_DN56363_c0_g1_i13Transcript ID:TRINITY_DN55029_c2_g4_i10Transcript ID: TRINITY_DN8658_c0_g1_i1Transcript ID: TRINITY_DN51118_c0_g1_i8Transcript ID: TRINITY_DN22465_c0_g1_i1Transcript ID: TRINITY_DN21034_c0_g1_i1Transcript ID: TRINITY_DN9541_c0_g1_i1

Gene Name	Gene ID	Forward primer	Tm	Reverse primer	Tm
Receptor like kinase isoform X1	RLKI-X1	ACCAAACTCGGAACTTGATGGAGC	63.37	GTAGGCCTAAGTGTTGGAGAAGCA	62.37
	RLKI-X1	ACCAAACTCGGAACTTGATGGAGC	63.37	GTAGGCCTAAGTGTTGGAGAAGCA	62.37
La-related protein	LRP	GGCCCATTCCATTTCCAGCACG	64.61	TGCACATGAGGAAGAAGATGGGGA	63.71
	LRP	GGCCCATTCCATTTCCAGCACG	64.61	TGCACATGAGGAAGAAGATGGGGA	63.71
Probable peroxygenase 4 (Glycine max)73,458,878 (putative peroxygenase 4	PPER4	GCTTCCATCATAAACACTTCGG	62.13	AGGAAGGATTGGTGGCTTGGTT	64.61
	PPER4	GCTTCCATCATAAACACTTCGG	62.13	AGGAAGGATTGGTGGCTTGGTT	64.61
Zinc finger CCH domain containing protein 69 –like isoform X1	ZF	GAGCCTGTCTGAAAGGGGAGCA	64.24	TGCAGCGACTACCATAAGCACA	62.3
	ZF	GAGCCTGTCTGAAAGGGGAGCA	64.24	TGCAGCGACTACCATAAGCACA	62.3
AP2 like ethylene responsive transcription factor ANT like	AP2TF	ACTGTGGGGTGTGGAGAGTTGCA	66.09	GCCCTCTCTTCTTTGCATCCACAGC	66.04


Glyceraldehyde 3 phosphate dehyadrogenase 2,cytolistic	G3PDC	GCCCTCTGACTCCTCCTTGATAGCA	65.42	GGCATTCCGTGTCCCTACTGTGGA	66.28
	G3PDC	GCCCTCTGACTCCTCCTTGATAGCA	65.42	GGCATTCCGTGTCCCTACTGTGGA	66.28
Cellulose synthase A catalytic subunit	CS	AACTCACCAGACATCGGTTGCCC	65.11	AAGTCGGGGATGCTGTGGGAAGA	65.72
	CS	AACTCACCAGACATCGGTTGCCC	65.11	AAGTCGGGGATGCTGTGGGAAGA	65.72
Auxin response Factor 3	ARF3	CCCATTTCATGTGACTCTTCTG	63.63	TGGATTCCAAAGAGCTGAACCT	61.5
	ARF3	CCCATTTCATGTGACTCTTCTG	63.63	TGGATTCCAAAGAGCTGAACCT	61.5
Sucrose phosphate Synthase 3	SPS3	AGCCACATTGAGTCCCCAACGG	65.07	TGCGAGGCCTACGTTGTCATCCT	65.98
	SPS3	AGCCACATTGAGTCCCCAACGG	65.07	TGCGAGGCCTACGTTGTCATCCT	65.98
Ehtylene Respoonsive Trnscription Factor ERF012	ERTF	TCAGCAGAAACTCCACAAGCGA	62.43	TGGTATGACTTGGAGGGGTTGCA	63.77
	ERTF	TCAGCAGAAACTCCACAAGCGA	62.43	TGGTATGACTTGGAGGGGTTGCA	63.77

Data submission

The sequencing dataset used in the study is available in the NCBI repository with BioProject: PRJNA472133, BioSamples: SAMN09227257 (Asymptomatic flower bud tissue of soybean control) and SAMN09227258 (Symptomatic flower bud tissue of soybean infected with phytoplasma).

Conclusions

This is the first transcriptomic study of FBD or WBS disease in soybean. We report transcriptomic signature of 17,454 DEGs involving 139 pathways in manifestation of the disease. The study reveals abnormal flower development at molecular level in terms of signalling, mobilization of carbohydrate and protein, phyllody, abnormal pollen development, improved colonization of insect in host plants. A total of 176,029 genic region putative markers (SSRs, SNPs and InDels) are reported which can be used as a genomic resource for future molecular breeding program for transfer of phytoplasma resistance in soybean variety improvement. Such more studies are warranted not only for understanding the disease at molecular level but they are also required for germplasm improvement in the endeavour of soybean productivity.

Methods

Identification of symptomatic and asymptomatic plant of FBD

For present investigation, the most popular widely grown soybean variety, JS- 335 of India was selected where recent FBD outbreak has adversely affected its productivity. Experimental data and tissue were collected from the experimental field of Department of Agricultural Botany, Dr. Panjabrao Deshmukh Krishi Vidyapeeth, Akola located in tropical belt at 307.4 m above mean sea level. The geographical situation is 20.42°N latitude and 77.02°E longitude with medium black, with clay, fairly levelled and uniform in topography with appropriate drainage. To discriminate symptomatic and asymptomatic plants, pollen viability, morphology by simple and scanning electron microscope (SEM), stigma receptivity assay, anther morphology and germination assay was done at R2 reproductive stage as previously described [12].

PCR based test for diagnosis of phytoplasma

Symptomatic and asymptomatic plants were diagnosed by PCR based test from plant leaf tissue samples collected. DNA isolation was done using MO BIO PowerPlant Pro DNA Isolation Kit (Carlsbad, CA) as per manufacturer’s protocol. PCR reaction was carried out with 1X PCR buffer, 1.5 mM MgCl2, 50 ng genomic DNA, 200 μM of dNTPs, 1 unit of Taq enzyme (Invitrogen, Germany), 0.5% DMSO, 2 μg/μl BSA, 1 μM of P1 and P7 primers [49]. Subsequently nested PCR was performed using primer R16F2n and R16R2 [50] where 10 fold dilution of initial PCR product were used as template.

Tissue collection and RNA extraction

Three sets of symptomatic and asymptomatic plant tissues (frozen bud and node tissues) at R5 stage (1/8 in. long seed in the pod at one of the four uppermost nodes on the main stem) were collected and treated with RNAlater and stored at -80 °C.

For RNA extraction, sample pooling of 10 biological replicates of symptomatic and asymptomatic plant tissues were done to minimize the across sample variability in each set [51]. PureLink_RNA Mini Kit (Invitrogen, San Diego, USA) was used as per manufacturer’s protocol.

Quality of RNA was checked on the Bioanalyzer 2100 (Agilent Technologies) using Agilent RNA chip. Samples with > = 8 RNA Integrity Number (RIN) value were used for further analysis. RNA-seq library was prepared by TruSeq RNA Sample Prep Kits (Illumina) using 4 microgram of total RNA. Poly-A containing mRNA molecules were separated by poly-T oligo-attached magnetic beads as per manufacturer’s protocol. Fragmentation of mRNA was done using divalent cations at elevated temperature. cDNA synthesis was done using reverse transcriptase and random primers and second strand cDNA synthesis was done using DNA Pol I and RNase H. By an end repair process single ‘A’ gets added to ligate the adapters. Final cDNA library was created after purification and enrichment. To assess mRNA quality, enrichment, fragment size, and library size Bioanalyzer was used. Before sequencing, quantity of library was measured using Qubit and qPCR. Constructed libraries of single set were sequenced on Illumina HiSeq 2000 platform and 60 million reads (6GB) data were generated (2 x100bp PE reads).

Pre-processing and de novo transcriptome assembly

Paired-end reads of Glycine max (Soybean) were generated using Illumina HiSeq2000 technology. Raw reads were assessed and visualized using FastQC tool [52]. For trimming and removal of low quality reads (bases from 3′ and 5′ end; phred-score ≤ 20, trimmomatic tool version 0.38 [53] was used. De novo transcriptome assembly was done using Trinity [54] and redundant sequences were removed by CAP3 [55]. Being interactome data, having transcripts from two genomes, namely, soybean and phytoplasma, standard approach of aligning/filtering the reads on a reference genome was not followed. This was done in order to get “extra genes” along with isoforms and also to retain phytoplasma transcripts in subsequent analysis.

Identification of differentially expressed genes

De novo transcriptome assembly was used for identification of DEGs. Mapping and alignment of raw reads over de novo transcriptome assembly was done using Bowtie [56]. RNA-Seq by Expectation-Maximization (RSEM) tool [57] was used to calculate expression values of each transcript in terms of Fragments per kilobase of exon per million mapped reads. Subsequently, for identification of differential expressed genes, edgeR (Empirical analysis of Digital Gene Expression in R) [58] was used which normalizes the data using TMM (Trimmed Mean of M-values) method [59]. To reduce the noise and computational reproducibility, NOIseq tool was used which is having both non-parametric and data-adaptive approach for the identification of differential expressed genes is based on count matrix [60]. For the expression analysis, threshold variance and P-value was set to 0.05 for identification of significant genes [61, 62]. A comparative approach for identification of DEG by both these tools were done at three different q values, i.e. q = 0.9, q = 0.95 and q = 0.99.

Annotation and functional characterization

Differential expressed genes were taken for homology search against NCBI non-redundant database using Blastx algorithm (standalone local ncbi-blast-2.2.31+; E-value 1e-5) [63]. Blast2Go Pro version 3.1 software [64] was employed for annotation and functional characterization of the DEGs. Transcriptional factors prediction was done using PlantTFDB v4.0 database [65].

Mining of putative molecular markers

Putative molecular markers, namely, simple sequence repeats (SSRs), single nucleotide polymorphism (SNPs) and InDels were mined from transcriptomic data. MISA-MIcroSAtellite identification tool [66] was used to mine the putative SSR markers using PERL script. Repeat units used were 10 and 6 for mono- and dinucleotides, respectively. For tri-, tetra-, penta- and hexa-nucleotides, 5 repeat units were used. This was followed by generation of primers of SSR loci using PRIMER3 tool [67]. Mined SSRs having di and tri repeats were used for in silico PCR validation. Transcripts with more than 1000 bp were selected and primers were designed for e-PCR based genotyping. e-PCR products having more than one hit in the entire soybean genome were excluded.

Since transcriptomic data are generated from single Indian genotype (JS- 335) variety thus reference genome of USA variety, Williams 82 can be used for SNP mining. In order to discover SNP using reference based mapping. SNP (Single Nucleotide Polymorphism) calling was performed against reference genome of Glycine max v2.0 using several filters like minimum read depth coverage > = 15 [68], Quality score > 30 [69, 70] and considering all the variants with minimum 50 bp flaking region from both sides. Reference soybean genome assembly (Glycine max v2.0 under accession number GCA_000004515.3) was retrieved from NCBI for mining of SNPs and InDels from soybean transcriptome using Burrows-Wheeler Aligner [71] and Samtools [72]. Further, annotations of identified variants were performed by using SnpEff tool [73].

Validation and expression analysis by qPCR

cDNA synthesis was done for each RNA sample using RevertAid First Strand cDNA Synthesis Kit (Genetix, USA) to obtain template for qPCR. From DEG, 10 transcripts were randomly selected (5 up and 5 down-regulated) for designing of primer by Primer3 [67]. PCR was carried out in triplicate using QuantiFast SYBR Green PCR Master Mix kit (Genetix, USA) on real time PCR machine (ABI-7300, Applied Biosystem) for 40 cycles with melt curve step. For normalization, housekeeping gene, actin was used as reference. PCR optimization was done to obtain linear relationship for each set of primer pair. Finally, ΔΔCT fold change value [74] was calculated to obtain magnitude of differential gene expression. It was compared with logFC value for validation.

Additional file

Additional file 1:

Heatmap for the graphical representation of differential expressed genes found in Infected vs. control sample (DOCX 50 kb)

Additional file 2:

Expression profile and blast results of differential expressed genes. (XLSX 2004 kb)

Additional file 3:

Blast results of identified transcriptional factors found in DEGs. (XLSX 599 kb)

Additional file 4:

Pathways identified in differential expressed genes of soybean. (XLSX 55 kb)

Additional file 5:

List of identified SSR and three sets of designed three sets of primers. (XLSX 5310 kb)

Additional file 6:

List of 193 SSR loci by e-PCR. (XLSX 24 kb)

Additional file 7:

Variants identified from soybean samples against reference genome of Glycine max and also annotation of variants. (XLSX 31287 kb)

Acknowledgements

We sincerely thank the Department of Atomic Energy (DAE), Board of Research in Nuclear Sciences (BRNS), Government of India, Mumbai for providing financial support (grant no. 2013/37B/44/BRNS/1904). The authors would like to express their sincere thanks and gratitude to the Dr. Panjabrao Deshmukh Krishi Vidyapeeth (State Agriculture University), Akola-444104, Maharashtra State, INDIA for providing required facility to undertake the investigation. Authors are thankful to Indian Council of Agricultural Research, Ministry of Agriculture and Farmers’ Welfare, Govt. of India for providing financial assistance in the form of CABin grant (F. no. Agril. Edn.4-1/2013-A&P) as well as Advanced Super Computing Hub for Omics Knowledge in Agriculture (ASHOKA) facility at ICAR-IASRI, New Delhi, India.

Funding

Department of Atomic Energy (DAE), Board of Research in Nuclear Sciences (BRNS), Government of India, Mumbai (grant no. 2013/37B/44/BRNS/1904) and CABin grant, Indian Council of Agricultural Research, Ministry of Agriculture and Farmers’ Welfare, Govt. of India (F. no. Agril. Edn.4–1/2013-A&P).

Availability of data and materials

Abbreviations

cDNAcomplementary DNA

DEGDifferential Expressed Genes

DNADeoxyribonucleic Acid

edgeREmpirical analysis of Digital Gene Expression in R

FBDFloral Bud Distortion

InDelInsertion-Deletion

LYDLethal Yellowing Disease

NCBINational Center for Biotechnology Information

PCRPolymerase Chain Reaction

PERLPractical Extraction and Report Language

PlantTFDBPlant Transcription Factor Database

RINRNA Integrity Number

RNARibonucleic Acid

RSEMRNA-Seq by Expectation-Maximization

SEMScanning Electron Microscope

SNPSingle Nucleotide Polymorphism

SSRSimple Sequence Repeats

TFTranscription Factor

WBSWitches’ Broom Syndrome

Authors’ contributions

PVJ, SJ, AGD, RSN, SP, JGM, SSM, and DK: Conceived the theme of the study, SJ, RSJ, MAI, MPM and MSD: did the computational work; PBK, PVJ and SKK: Sample collection and wet lab wok; JK, RST, PGK and MPM: did validation work; SJ, PVJ, MAI, RSJ and DK: Drafted the manuscript; AR, JGM, RST and DK Edited the manuscript; All authors read and finalized the manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. HorvathAAChanges in the blood composition of rabbits fed on raw soy beansJ Biol Chem192668343355[Google Scholar]
2. MasudaTGoldsmithPDWorld soybean production: area harvested, yield, and long-term projectionsInternational food and agribusiness management review2009124143162[Google Scholar]
3. FriedmanMBrandonDLNutritional and health benefits of soy proteinsJ Agric Food Chem200149310691086[PubMed][Google Scholar]
4. PoppJHarangi-RákosMGabnaiZBaloghPAntalGBaiABiofuels and their co-products as livestock feed: global economic and environmental implicationsMolecules2016213285[PubMed][Google Scholar]
6. Jadhav PravinVManeSSNandanwarRSKalePBDudhareMSMoharilMPFloral bud distortion in soybean and incidence in Central IndiaEgypt J Biol2013155965[Google Scholar]
7. SubektiNAThe effects of disease on plant reproduction as basis for breeding for disease resistanceIn Proceedings of the scientific seminar and annual meeting of the regional commissioner PEI PFI XIX South Sulawesi20085167180[Google Scholar]
8. MarrDLMarshallMLThe role of fungal pathogens in flower size and seed mass variation in three species of Hydrophyllum (Hydrophyllaceae)Am J Bot2006933389398[PubMed][Google Scholar]
9. PracrosPRenaudinJEveillardSMourasAHernouldMTomato flower abnormalities induced by stolbur phytoplasma infection are associated with changes of expression of floral development genesMol Plant-Microbe Interact20061916268[PubMed][Google Scholar]
10. SinghAKBhattBPOccurrence of phytoplasma phyllody and witches' broom disease of faba bean in BiharJ Environ Biol2013345837[PubMed][Google Scholar]
12. KalePBJadhavPVWakekarRSMoharilMPDeshmukhAGDudhareMSCytological behaviour of floral organs and in silico characterization of differentially expressed transcript-derived fragments associated with ‘floral bud distortion’in soybeanJ Genet2016954787799[PubMed][Google Scholar]
13. DhingraKLChenuluVVSymptomatology and transmission of witches' broom disease of soybean in IndiaCurrent science (India)1983[Google Scholar]
14. ThoratVMoreVJadhavPManeSSNandanwarRSSurayavanshiMFirst report of a 16SrII-D group Phytoplasma associated with witches’-broom disease of soybean (Glycine max) in Maharashtra, IndiaPlant Dis2016100122521[Google Scholar]
15. KumarSSharmaPSharmaSRaoGPMixed infection and natural spread of ‘Candidatus Phytoplasma asteris’ and Mungbean yellow mosaic India virus affecting soya bean crop in IndiaJ Phytopathol20151635395406[Google Scholar]
16. K. Subramanya Sastry. Seed-borne plant virus diseases. Springer India 2013, Springer Science & Business Media. ISBN 978-81-322-0812-9. http://www.soybeancheckoffresearch.org/DetailsbyPaperid.php?id_Paper=1144. Accessed August 2014.
17. GolnaraghiARShahraeenNPourrahimRFarzadfarSGhasemiAOccurrence and relative incidence of viruses infecting soybeans in IranPlant Dis2004881010691074[Google Scholar]
19. WongCESinghMBBhallaPLMolecular processes underlying the floral transition in the soybean shoot apical meristemPlant J2009575832845[PubMed][Google Scholar]
20. SeverinAJWoodyJLBolonYTJosephBDiersBWFarmerADRNA-Seq atlas of Glycine max: a guide to the soybean transcriptomeBMC Plant Biol2010101160[PubMed][Google Scholar]
21. JungCHWongCESinghMBBhallaPLComparative genomic analysis of soybean flowering genesPLoS One201276e38250[PubMed][Google Scholar]
22. SchmutzJCannonSBSchlueterJMaJMitrosTNelsonWGenome sequence of the palaeopolyploid soybeanNature20104637278178[PubMed][Google Scholar]
23. LiuSCJinJQMaJQYaoMZMaCLLiCFTranscriptomic analysis of tea plant responding to drought stress and recoveryPLoS One2016111e0147306[PubMed][Google Scholar]
24. IovienoPPunzoPGuidaGMistrettaCVan OostenMJNurcatoRTranscriptomic changes drive physiological responses to progressive drought stress and rehydration in tomatoFront Plant Sci20167371[PubMed][Google Scholar]
25. FuLDingZHanBHuWLiYZhangJPhysiological investigation and transcriptome analysis of polyethylene glycol (PEG)-induced dehydration stress in cassavaInt J Mol Sci2016173283[PubMed][Google Scholar]
26. ShinJHVaughnJNAbdel-HaleemHChavarroCAbernathyBDo KimKTranscriptomic changes due to water deficit define a general soybean response and accession-specific pathways for drought avoidanceBMC Plant Biol201515126[PubMed][Google Scholar]
27. BoveJMMycoplasma infections of plantsIsr J Med Sci1981177572585[PubMed][Google Scholar]
28. PanchyNLehti-ShiuMShiuSHEvolution of gene duplication in plantsPlant Physiol2016171422942316[PubMed][Google Scholar]
29. YangSTangFZhuHAlternative splicing in plant immunityInt J Mol Sci20141561042410445[PubMed][Google Scholar]
30. KazemianMRenMLinJXLiaoWSpolskiRLeonardWJComprehensive assembly of novel transcripts from unmapped human RNA-Seq data and their association with cancerMol Syst Biol2015118826[PubMed][Google Scholar]
31. PalmieriNNolteVSuvorovAKosiolCSchlöttererCEvaluation of different reference based annotation strategies using RNA-Seq–a case study in Drososphila pseudoobscuraPLoS One2012710e46415[PubMed][Google Scholar]
32. BaiXZhangJEwingAMillerSARadekAJShevchenkoDVLiving with genome instability: the adaptation of phytoplasmas to diverse environments of their insect and plant hostsJ Bacteriol20061881036823696[PubMed][Google Scholar]
33. MarconeCNeimarkHRagozzinoALauerUSeemüllerEChromosome sizes of phytoplasmas composing major phylogenetic groups and subgroupsPhytopathology1999899805810[PubMed][Google Scholar]
34. MouHQLuJZhuSFLinCLTianGZXuXTranscriptomic analysis of paulownia infected by paulownia witches'-broom phytoplasmaPLoS One2013810e77217[PubMed][Google Scholar]
35. LuYTLiMYChengKTTanCMSuLWLinWYTransgenic plants that express the phytoplasma effector SAP11 show altered phosphate starvation and defense responsesPlant Physiol2014164314561469[PubMed][Google Scholar]
37. AndersenMTLieftingLWHavukkalaIBeeverREComparison of the complete genome sequence of two closely related isolates of ‘Candidatus Phytoplasma australiense’reveals genome plasticityBMC Genomics2013141529[PubMed][Google Scholar]
38. SeemüllerESuleSKubeMJelkmannWSchneiderBThe AAA+ ATPases and HflB/FtsH proteases of ‘Candidatus Phytoplasma Mali’: phylogenetic diversity, membrane topology, and relationship to strain virulenceMol Plant-Microbe Interact2013263367376[PubMed][Google Scholar]
40. YangJTianLSunMXHuangXYZhuJGuanYFAUXIN RESPONSE FACTOR17 is essential for pollen wall pattern formation in ArabidopsisPlant Physiol20131622720731[PubMed][Google Scholar]
41. WakekarRSJadhavPVKalePBMoharilMPNandanwarRSManeSSPollen dysfunction causes ‘floral bud Distortion’in Indian soybean (Glycine max)Agricultural Research2018711024[Google Scholar]
42. KaminskaMBerniakHKaminskiPFailure of flower bud formation in Brassica plants associated with phytoplasma infectionJ Agric Sci201242219[Google Scholar]
43. SugioAMacLeanAMGrieveVMHogenhoutSAPhytoplasma protein effector SAP11 enhances insect vector reproduction by manipulating plant development and defense hormone biosynthesisProc Natl Acad Sci201110848E1254E1263[PubMed][Google Scholar]
44. SugioAMacLeanAMKingdomHNGrieveVMManimekalaiRHogenhoutSADiverse targets of phytoplasma effectors: from plant development to defense against insectsAnnu Rev Phytopathol201149175195[PubMed][Google Scholar]
45. AnthouardRDiRitaVJChemical biology applied to the study of bacterial pathogensInfect Immun2015832456469[PubMed][Google Scholar]
47. JarauschWBisogninCSchneiderBGrandoMSVelascoRSeemüllerEBreeding of apple rootstocks resistant to Candidatus phytoplasma MaliBulletin of Insectology2007602299300[Google Scholar]
48. GurrGMJohnsonACAshGJWilsonBAEroMMPilottiCACoconut lethal yellowing diseases: a phytoplasma threat to palms of global economic and social significanceFront Plant Sci201671521[PubMed][Google Scholar]
49. SmartCDSchneiderBBlomquistCLGuerraLJHarrisonNAAhrensUPhytoplasma-specific PCR primers based on sequences of the 16S-23S rRNA spacer regionAppl Environ Microbiol199662829882993[PubMed][Google Scholar]
50. GundersenDELeeIMSchaffDAHarrisonNAChangCJDavisREGenomic diversity and differentiation among phytoplasma strains in 16S rRNA groups I (aster yellows and related phytoplasmas) and III (X-disease and related phytoplasmas)Int J Syst Evol Microbiol19964616475[Google Scholar]
51. ZouCWangPXuYBulked sample analysis in genetics, genomics and crop improvementPlant Biotechnol J20161419411955[PubMed][Google Scholar]
52. AndrewSFastQC: a quality control tool for high throughput sequence data2010
53. BolgerAMLohseMUsadelBTrimmomatic: a flexible trimmer for Illumina sequence dataBioinformatics2014301521142120[PubMed][Google Scholar]
54. HaasBJPapanicolaouAYassourMGrabherrMBloodPDBowdenJDe novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysisNat Protoc2013881494[PubMed][Google Scholar]
55. HuangXMadanACAP3: A DNA sequence assembly programGenome Res199999868877[PubMed][Google Scholar]
56. LangmeadBTrapnellCPopMSalzbergSLUltrafast and memory-efficient alignment of short DNA sequences to the human genomeGenome Biol2009103R25[PubMed][Google Scholar]
57. LiBDeweyCNRSEM: accurate transcript quantification from RNA-Seq data with or without a reference genomeBMC bioinformatics.2011121323[PubMed][Google Scholar]
58. RobinsonMDMcCarthyDJSmythGKedgeR: a Bioconductor package for differential expression analysis of digital gene expression dataBioinformatics2010261139140[PubMed][Google Scholar]
59. RobinsonMDOshlackAA scaling normalization method for differential expression analysis of RNA-seq dataGenome Biol2010113R25[PubMed][Google Scholar]
60. TarazonaSGarcía-AlcaldeFDopazoJFerrerAConesaADifferential expression in RNA-seq: a matter of depthGenome Res2011211222132223[PubMed][Google Scholar]
61. SotakMCzerankováOKleinDJurčackováZLiLČellárováEComparative transcriptome reconstruction of four Hypericum species focused on Hypericin biosynthesisFront Plant Sci201671039[PubMed][Google Scholar]
62. WangYGuoZYSunXLuSBXuWJZhaoQIdentification of changes in gene expression of rats after sensory and motor nerves injurySci Rep2016626579[PubMed][Google Scholar]
63. CamachoCCoulourisGAvagyanVMaNPapadopoulosJBealerKBLAST+: architecture and applicationsBMC bioinformatics2009101421[PubMed][Google Scholar]
64. ConesaAGötzSGarcía-GómezJMTerolJTalónMRoblesMBlast2GO: a universal tool for annotation, visualization and analysis in functional genomics researchBioinformatics2005211836743676[PubMed][Google Scholar]
66. ThielTMichalekWVarshneyRGranerAExploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.)Theor Appl Genet20031063411422[PubMed][Google Scholar]
67. UntergasserACutcutacheIKoressaarTYeJFairclothBCRemmMPrimer3—new capabilities and interfacesNucleic Acids Res20124015e115[PubMed][Google Scholar]
68. UitdewilligenJGWoltersAMBjornBBormTJVisserRGvan EckHJA next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potatoPLoS One201385e62355[PubMed][Google Scholar]
69. LiuJMcClelandMStawiskiEWGnadFMaybaOHavertyPMIntegrated exome and transcriptome sequencing reveals ZAK isoform usage in gastric cancerNat Commun201453830[PubMed][Google Scholar]
70. YuXSunSComparing a few SNP calling algorithms using low-coverage sequencing dataBMC bioinformatics.2013141274[PubMed][Google Scholar]
71. LiHDurbinRFast and accurate short read alignment with burrows–wheeler transformBioinformatics2009251417541760[PubMed][Google Scholar]
72. LiHHandsakerBWysokerAFennellTRuanJHomerNThe sequence alignment/map format and SAMtoolsBioinformatics2009251620782079[PubMed][Google Scholar]
73. CingolaniPPlattsAWangLLCoonMNguyenTWangLA program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3Fly (Austin)2012628092[PubMed][Google Scholar]
74. LivakKJSchmittgenTDAnalysis of relative gene expression data using real-time quantitative PCR and the 2− ΔΔCT methodMethods2001254402[PubMed][Google Scholar]