ebook img

Rapid transcriptome sequencing of an invasive pest, the brown marmorated stink bug ... PDF

22 Pages·2014·1.66 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Rapid transcriptome sequencing of an invasive pest, the brown marmorated stink bug ...

Ioannidisetal.BMCGenomics2014,15:738 http://www.biomedcentral.com/1471-2164/15/738 RESEARCH ARTICLE Open Access Rapid transcriptome sequencing of an invasive pest, the brown marmorated stink bug Halyomorpha halys Panagiotis Ioannidis1,4, Yong Lu2, Nikhil Kumar1, Todd Creasy1,5, Sean Daugherty1, Marcus C Chibucos1,3, Joshua Orvis1, Amol Shetty1, Sandra Ott1, Melissa Flowers1, Naomi Sengamalay1, Luke J Tallon1, Leslie Pick2 and Julie C Dunning Hotopp1,3* Abstract Background: Halyomorphahalys(Stål)(Insecta:Hemiptera;Pentatomidae),commonlyknownastheBrownMarmorated StinkBug(BMSB),isaninvasivepestofthemid-AtlanticregionoftheUnitedStates,causingeconomicallyimportant damagetoawiderangeofcrops.NativetoAsia,BMSBwasfirstobservedinAllentown,PA,USA,in1996,andthispest isnowwell-establishedthroughouttheUSmid-Atlanticregionandbeyond.InadditiontotheseriousthreatBMSB posestoagriculture,BMSBhasbecomeanuisancetohomeowners,invadinghomegardensandcongregatinginlarge numbersinhuman-madestructures,includinghomes,tooverwinter.Despiteitssignificanceasanagriculturalpestwith limitedcontroloptions,only100bpofBMSBsequencedatawasavailableinpublicdatabaseswhenthisprojectbegan. Results:Transcriptomesequencingwasundertakentoprovideamolecularresourcetotheresearchcommunityto informthedevelopmentofpestcontrolstrategiesandtoprovidemoleculardataforpopulationgeneticsstudiesof BMSB.Usingnormalized,strand-specificlibraries,wesequencedpoolsofallBMSBlifestagesontheIlluminaHiSeq. Trinitywasusedtoassemble200,000putativetranscriptsin>100,000components.Anovelbioinformaticmethodthat analyzedthestrand-specificityofthedatareducedthisto53,071putativetranscriptsfrom18,573components.By integratingmultipleotherdatatypes,wenarrowedthisfurtherto13,211representativetranscripts. Conclusions:Bacterialendosymbiontgeneswereidentifiedinthisdataset,someofwhichhaveacopynumber consistentwithbeinglateralgenetransfersbetweenendosymbiontgenomesandHemiptera,includingankyrin-repeat relatedproteins,lysozyme,andmannanase.SuchgenesandendosymbiontsmayprovidenoveltargetsforBMSB-specific biocontrol.Thisstudydemonstratestheutilityofstrand-specificsequencingingeneratingshotguntranscriptomesand thatrapidsequencingshotguntranscriptomesispossiblewithouttheneedforextensiveinbreedingtogenerate homozygouslines.Suchsequencingcanprovidearapidresponsetopestinvasionssimilartothatalreadydescribedfor diseaseepidemiology. Keywords:Brownmarmoratedstinkbug,Halyomorphahalys,Transcriptome,Hemiptera,Invasivespecies,Lateralgene transfer,Horizontalgenetransfer,Mannanase,Lysozyme *Correspondence:[email protected] 1InstituteforGenomeSciences,UniversityofMarylandSchoolofMedicine, Baltimore,MD,USA 3DepartmentofMicrobiologyandImmunology,UniversityofMaryland SchoolofMedicine,Baltimore,MD,USA Fulllistofauthorinformationisavailableattheendofthearticle ©2014Ioannidisetal.;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsoftheCreative CommonsAttributionLicense(http://creativecommons.org/licenses/by/4.0),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedtheoriginalworkisproperlycredited.TheCreativeCommonsPublicDomain Dedicationwaiver(http://creativecommons.org/publicdomain/zero/1.0/)appliestothedatamadeavailableinthisarticle, unlessotherwisestated. Ioannidisetal.BMCGenomics2014,15:738 Page2of22 http://www.biomedcentral.com/1471-2164/15/738 Background spread, making it a particularly difficult invasive pest to Halyomorpha halys (Stål) (Insecta: Hemiptera: Pentato- manage. Over sixty researchers funded by the USDA and midae), otherwise known as the Brown Marmorated commodity organizations are conducting experiments Stink Bug (BMSB), is an invasive pest that has ravaged aimed at better understanding the biology and ecology of farms and distressed homeowners in the mid-Atlantic thispestandtofindmanagementsolutions.Whenthispro- region of the US in recent years. It is a polyphagous in- ject began, only 100 bp of sequence data was available in sect pest that causes economically important damage to the public database. Transcriptome sequencing wasunder- many crops including vegetables, tree fruit, field crops, takentoprovideamolecularresourcetotheresearchcom- andornamentals[1]. munity for basic and applied research purposes. The data Native to Asia, BMSB was first observed in North can be used to examine the population genetics of BMSB, AmericainAllentown,PA,USAin1996[1].BMSBisnow including gaining a better understanding of the original well-establishedinthemid-AtlanticregionsoftheUSand invasion in Allentown, as well as subsequent invasions hasspreadto41different statesandDC[2].Thereisalso throughouttheUSandtheworld.Understandingthesein- evidence for established populations in Canada [3] and vasionswillincreaseourknowledge,preventingfutureinva- Switzerland [4-6]. It is a pest of tree fruit; grapes; other sions. Our study demonstrates that the transcriptomes of small fruit; row crops including vegetables, legumes and invasive species can be rapidly sequenced, providing a cotton; shade trees; ornamentals; and nursery crops [7-9]. resource to the research community without extensive BMSB crop damage reached economically significant breedingtocreatehomozygouslines(e.g.generating8gen- levels during the 2010 growing season. Growers consid- erationsofanisofemaleline).Suchsequencingcanprovide ered BMSB to be the single most important pest in the arapidresponsetopestinvasionssimilartothatalreadyde- mid-Atlantic region, leading to ‘desperate measures’ in- scribedforepidemiologyofinfectiousdisease(e.g.[15])and cluding increased use of broad-spectrum pesticides to jumpstartmolecularbiologyandgenetic-basedstudies. controlBMSB,detractingfromsustainableIntegratedPest Management practices. In keeping with this, the EPA Results granted emergency approval for the use of several pesti- Sequencingresults cides, including one approved for use in organic farming, Whole transcriptome sequencing was undertaken in order to prevent further economic loss in the mid-Atlantic re- to obtain gene sequences and jumpstart molecular biology gion [10]. In addition to being an agricultural pest, many studies focused on BMSB. RNA was collected for sequen- mid-Atlantic homeowners are troubled by BMSB. Unlike cing from 10 different stages and conditions including: (a) nativestinkbugs,BMSBaggregatesinhuman-madestruc- eggs, (b) 1st instar nymphs, (c) 2nd instar nymphs, (d) 3rd tures, including houses, to overwinter [11]. One home instarnymphs,(e)4thinstarnymphs,(f)5thinstarnymphs, owner reported removing 26,000 BMSB from his resi- (g) an active adult male, (h) an active adult female, (i) an denceinthefirsthalfof2011[12]. adult male in diapause, and (j) an adult female in diapause BMSB has a long history of hitchhiking to new areas. (Figure 1). Two pools were created with equimolar The successful introduction in Allentown, PA, USA, was amountsofeachRNAsamplewithpoolAcontainingallof preceded by the successful interception of BMSB by the thepre-adultstagesandpoolBcontainingtheadults. USDA preventing such invasions including an intercep- Each pool was used to generate a strand-specific library tion of BMSB on a 1983 flight from Japan, an intercep- that was subsequently normalized with DSN (Figure 2). tion in baggage from Korea in 1984, and eight further Strand-specific sequencing allows for assignment of the interceptions from China, Korea, and Japan from 1989– strand of transcription while normalization decreases the 1998 [8]. BMSB has been unintentionally shipped from sequencing of the most abundant transcripts allowing for Japan to New Zealand in a used vehicle [13]. In 2005, sequencing of more rare variants. Sequencing of these li- over a dozen BMSB were recovered from a storage unit braries on the Illumina HiSeq2000 resulted in 196,233,912 in Vallejo, Solano County, CA, that was rented by an in- readstotaling>19billionbasesforPoolAand170,455,294 dividual who had relocated from Pennsylvania [14]. Eco- reads totaling >17 billion bases for pool B. All of the reads logical niche modelling and climate data predict that were pooled and assembled with Trinity [16], which uses areas between 30–50 degrees latitude are at risk of inva- Inchworm to produce contigs via greedy k-mer extension, sion including Northern Europe, north-eastern North Chrysalis to produce components containing related con- America, the northern portions of the North American tigs, and Butterfly to generate transcripts using the various west coast, southern Australia, the North Island of New de Bruijn graph paths in the components. We will refer to Zealand as well as Angola in Africa and Uruguay in theassemblageofsequencesgeneratedbyButterflyasputa- South America [8]. tive transcripts, given that some of the sequences that As summarized above, BMSB are significant agricultural Butterflyassemblesarenotactuallytranscripts,asdescribed pests with limited treatment options and an ability to below. Ioannidisetal.BMCGenomics2014,15:738 Page3of22 http://www.biomedcentral.com/1471-2164/15/738 mappingsthatallowedformultiplematches,whichwillbe referred to as the SSLR for strand-specific log ratio. We wereabletoshowthatoverallthelibraryisstrand-specific with the plus strand preferred to the minus strand 3.40:1 (Figure4A).WhiletheSSLRdemonstratestheplusstrand was preferred, there was a significant amount of leakage that results in two putative transcripts that are reverse complements to one another. Leakage transcripts were identified as those that (a) matched another putative transcriptinthealternateorientationusingBLASTN,and (b) had a positive SSLR as described above. Using this approach,wediscarded32,308putativetranscriptsorigin- ating from 24,507 components. Of those putative tran- scripts, there were 20,795 that had an ORF predicted on only one of the strands, and thus in only one direction. Notunexpectedly,20,350hadtheirORFspredictedonthe minus strand compared to only 445 that had their ORFs in the plus strand. Overall, those with a SSLR<−5 had >10-fold increase in ORFs on the plus strand relative to Figure1LifestagesofBMSB.ThelifestagesofBMSBareshown the minus strand and those with a SSLR >5 had >10-fold startingwitheggsfollowedby1stinstarnymphs,2ndinstarnymphs,3rd increase in ORFs on the minus strand relative to the plus instarnymphs,4thinstarnymphs,5thinstarnymphs,andanadultina strand (Additional file 1: Figure S1). This indicates that counter-clockwisespiraloutwardsandfromlargesttosmallest.Thebar inthelowleftrepresents1cm. usingaSSLRcancorrectlypredicttherightstrand. Upon mapping the distribution of putative transcripts by strand specificity, one observes a bimodal distribution. The resulting assembly contained 194,729 putative The bimodal distribution becomes more symmetrical transcripts with 50,599 that were >1,000 bp and 20,076 upon applying a coverage threshold. In particular, there having a top BLASTX match (e≤10−10 and over >80% arenumerousputativetranscriptswithastrandspecificity of the reference) to 8,354 unique Uniref90 protein se- slightlylessthanzerothatdisappearedcompletelyasmore quences. For comparison, Trinity produced 196,000 pu- coverage was required to support the underlying putative tative transcripts with 14,522 that were >1,000 bp with transcript (Figure 4, Additional file 2: Figure S2). One ex- 4,323 having a top BLASTX match (e≤10−10) to 2,880 planation for this would be that these putative transcripts unique Uniref90 proteinsequences over >80% ofthe ref- arise from contaminating genomic DNA that lacks strand erence protein length for an outbred Bemisia tabaci specificity.Thiswouldmanifestasshortcontigssupported white fly population (Insecta: Hemiptera: Aleyrodidae) by few reads and little strand specificity. Consistent with [17]. The longest assembled putative transcript from genomic DNA being present in the samples, applying a BMSB is 30 kbp in length; most components contained coverage cutoff of only two removed >10,000 putative a single putative transcript; and the distribution of the % transcripts that are <300 bp. In other projects, we have GCpeaked around 37%(Figure 3). observed contaminating E. coli DNA that we suspect ar- rives in the sample through molecular biology reagents Strand-specificityandcoverage that may allow for an examination of our ability todetect One problem with assigning a function to transcripts is genomic DNA contamination. In BMSB, bacterial DNA determining which open reading frame is the correct may also come from the microbiome, for example from one when there are open reading frames on both strands an endosymbiont. However, regardless of the source, E. that lack evidence of function. This can be particularly coli DNA could serve as a proxy for identifying genomic true for organisms where good reference annotation is DNA in this sample. A coverage cutoff of two removed lacking, like insects. Sequencing a strand-specific library 3,501 putative transcripts that matched E. coli. If we plot should assist in identifying the transcribed strand the mean length, number of putative transcripts of size (Figure 2), reducing the number of possible ORFs by <300bp,andnumberofreadswithamatchtoE.coli,the half. However, strand-specific libraries can have leakage, greatestimprovementsareobtainedwithacoveragecutoff where experimentally the wrong strand is sequenced, of 20 (Figure 5). Three putative transcripts still persist thatneedstobeaddressed. with homology to E. coli. One of these (comp3244_c38_- To measure strand specificity, we calculated the log ra- seq1) has little strand specificity (SSLR=−0.45) and tio of sense reads to antisense reads using Bowtie2 matchesPhiX,whichisaddedtoIlluminasequencingruns Ioannidisetal.BMCGenomics2014,15:738 Page4of22 http://www.biomedcentral.com/1471-2164/15/738 Random primed first strand cDNA synthesis Second strand synthesis & removal of unincorporated nucleotides with dUTP mix & RNase H Sense cDNA Sense mRNA with dUTP Sense mRNA 5’ 3’ 5’ 3’ 555’’’ 3’ 3’ 5’ 3’ 5’ End repair, addition of A-tail, adapter ligation with index Digestion of 2nd strand PCR amplification DNA with UNG 5’ Sense cDNA 3’ Sense DNA with dUTP 5’ 3’ orange adaptor contains the index sequence 3’ 5’ 3’ 5’ 3’ 5’ DSN normalization & an additional PCR amplification 3’ 3’ 3’ 3’ Sequence read 1 Sequence read 2 A A A will be antisense A Flipping and will be sense N First strand N N second strand e D sequencing e D Indexing e D sequencing s s s n n n e e e S S S Index read 5’ 5’ 5’ 5’ Flow Cell Flow Cell Flow Cell Flow Cell Figure2Strand-specificsequencingwithDSNnormalization.Instrand-specificsequencing,thefinalsequencingreadscanbeassignedtoa particularstrandofDNA.Thisisaccomplishedbyasecond-strandsynthesiswithdUTP,ligationofadaptorstodouble-strandedDNA,and degradationofthesecondstrandwithUNG.Theresultisafragmenttobesequencedthatisdifferentiallylabeledonthetwoendsinamanner thatdictateshowtheyareloadedonthesequencerandthustheorderinwhichtheendsaresequenced. Figure3Featuresofthetrinityassembly.(A)Thesizedistributionoftheputativetranscriptsonalogarithmicscalerevealsthatthevast majorityofassembledtranscriptsare<1kbpbuttranscripts>25kbpcanbeassembled.(B)Forthevastmajorityofgenesthereisonlyasingle transcriptassembled.(C)The%GCcontentoftheputativetranscriptspeaksaround35%. Ioannidisetal.BMCGenomics2014,15:738 Page5of22 http://www.biomedcentral.com/1471-2164/15/738 and is not always completely removed through standard 25000 AA >=1 B >=20 filtering. The remaining two, comp1098_c0_seq1 and 20000 comp1100_c0_seq1, are strand-specific, unique, and have n:p = 3.85 n:p = 2.26 BLASTNmatchestothe16Sand23SrRNA,respectively, 15000 with best matches to the genome of an endosymbiont of 10000 the BMSB (GenBank: AP012554). This highlights that combining the results of the SSLR, a coverage threshold, 5000 and the BLASTN search significantly improves the tran- 0 scriptome assembly and aids in identifying transcripts 15 10 −5 0 5 10 1515 10 −5 0 5 10 15 from both the host and its endosymbiont(s). The SSLR − − − − would be an easy and useful addition to implement into Figure4Assessmentofthestrand-specificityoftheRNA-Seq library.Thestrandspecificityofthesequencinglibrarywasbased Trinityandotherassemblersthatwouldprovidethegreat- oncomparingthenumberoffirst-in-pairreadsmappedtotheplus estgains. strand,tothecorrespondingnumberofsecond-in-pairreads.Ina strand-specificlibrary,thestrand-specificlogratio(SSLR)shouldbe Filteringtoasinglegenepercomponent awayfromzero.Intheprotocolusedinthisstudy,wherethe WhileTrinity can generate almost200,000putativetran- second-in-pairreadcomesfromthesensestrand,anegativeSSLR isexpected.(A)ThedistributionofSSLRsforputativetranscripts scripts in >100,000 components, there are thought to be havingatleastoneproperreadpairmappingtothemdemonstrates only 20,000-40,000 genes in Metazoan genomes. Apply- apreferenceforthosewithanegativeSSLR.(B)Whentheplotis ing the thresholds above reduces this to 53,071 putative limitedtothosewithtwentyreadpairsmappingtothem,the transcripts from 18,573 components. However, analyses bimodaldistributionbecomesmoredistinct,likelyfromtheremoval related to function, particularly those that result in stat- ofcontigsarisingfromcontaminatinggenomicDNA. istical analyses, need the data to be reduced to a repre- sentative number of genes, thereby removing splice variants. Using an integrated approach, we were able to 12000 40000 35000 10000 30000 8000 25000 6000 20000 15000 4000 10000 2000 5000 0 0 0 10 20 30 40 50 Coverage Figure5Transcriptpropertiesrelativetotranscriptcoverage.Themeanlength(orange),numberofputativetranscripts<300bp(gray),and homologytoanE.colisequence(yellow)wereplottedontheleftaxiswhilethenumberofputativetranscripts(blue)wereplottedontheright axisafterapplyingacoveragethresholdtothetranscript(x-axis).Acoveragecutoffof20maximizesthemeanlengthandminimizesthenumber oftranscriptswithhomologytoE.coli. Ioannidisetal.BMCGenomics2014,15:738 Page6of22 http://www.biomedcentral.com/1471-2164/15/738 score each putative transcript based on two types of evi- substitution mutation. These positions were identified dence including similarity to known sequences and the in 11,462 different putative transcripts, which is 86.8% intrinsic features of each contig. We calculated a score of the total number of putative transcripts. The genetic for each of the remaining 53,071 putative transcripts of heterogeneity observed may be significantly less than theTrinity assemblybased on(a) similarityto knownse- that found in the native Asian population, since the quences, (b) features of any predicted ORFs, and (c) se- founder population in the US was likely small, repre- quencing coverage along the putative transcript. The senting a significant bottleneck, and an additional exactmetricsusedforcalculatingthescorearedescribed bottleneck occurred when founding the laboratory in Table 1 and Methods. This filtering resulted in 13,211 population. However, these polymorphisms may be a putative transcripts representing a 14-fold reduction in useful resource for population studies examining the the number of putative transcripts and a 9-fold reduc- spreadofBMSBintheUS. tion in the number of components (Table 2). This num- ber of putative transcripts is consistent with sequenced Differentialexpressionbetweenadultsandnon-adults hemipteran genomes such as the 34,604 predicted genes In order to compare the expression of juveniles and inA.pisum[18]. adults, the normalized log Ratio of the read count was 2 calculated between the two pools (Additional file 3: COGClassification Table S1). A total of 22,707 putative transcripts had The NCBI Cluster of Orthologous Groups (COG) data- more than a 2-fold change in ratio, and 6,590 had more base was used to classify the predicted proteins in these than a 16-fold change in ratio. The most highly 13,211 transcripts. Only 5,315 functions were assigned expressed genesin the adult pool B that have functional using NCBI COGs. Of those, almost a quarter (1,277) are annotation were vitellogenin genes, which is expected functions that are not well categorized, namely general given that they encode the yolk protein produced in functional prediction and unknown function (Figure 6). adult females. Similar levels of differential expression The remaining categorizations are almost evenly divided were found in a large number of hypothetical proteins betweentheotherthreecategoriesof(a)informationstor- and a protein with homology to an ankyrin protein age and processing, (b) cellular processes and signaling, found in bacterial endosymbiont Wolbachia strains. In and (c) metabolism. The top five sub-categories were (a) the juvenile stages, the most abundantly expressed general function prediction only (22.2%); (b) replication, genes were the larval cuticle proteins, which is also recombination and repair (8.2%); (c) posttranslational expected. This, combined with the qRT-PCR validation modification, proteinturnover, and chaperones(7.9%);(d) of specific transcripts of interest described below, dem- transcription(7.5%);and(e)aminoacidtransportandme- onstrates that DSN normalized libraries can be used for tabolism(7.1%).Thethreecategoriescomprising(a)extra- a gene expression analysis that successfully identifies cellular structures, (b) nuclear structure, and (c) cell genes to be targeted for further characterization with motility were the least abundant COGs represented, with other methods. The dynamic range of the expression 0, 2, and 4 matches (0.0%, <0.1%, and 0.1%), respectively. changes is expected to be muted by the normalization This is very similar to the distribution of COGs in other procedure, and this is demonstrated for specific tran- transcriptomesofeukaryotes,includinginsects[19-23]. scriptsbelow. Juvenile BMSB are more susceptible to insecticides Polymorphisms than adults ([24,25], and G. Dively, personal communi- The laboratory colony used to generate RNA for tran- cation). Therefore, we sought to identify transcripts scriptome sequencing was established recently from with functions related to pesticide detoxification that field-caught BMSB. While there was at least one bottle- wereup-regulatedinBMSB adults.Therewere25genes neck in founding the colony, the assembled sequences encoding proteins related to cytochrome P450 and were expected to have heterogeneity that reflects some glutathione S-transferase that were more abundant in of the genetic heterogeneity of the US BMSB popula- adults (Table 3). There were nine such proteins that tion. Of the 26.6 million positions in the 13,211 tran- were more abundant in the pre-adult RNA pool scripts that were examined with MPILEUP, 23.7 million (Table3). positions were supported by >20 reads when all reads were aligned with Bowtie2. Of those positions, 212,422 Taxonomicprofile had a substitution mutation that differed from the con- The putative transcripts were searched against NR using sensus base call that was supported by >5% of the BLASTX and a lowest common ancestor (LCA) assign- underlyingreads,whichrepresents~1%ofthepositions. ment [26] was made by aggregating all of the matches If more stringent requirements are applied, 109,920 of with the best e-value. Prior to filtering the putative tran- these positions had >20% of the underlying reads with a scriptstoremoveoneswesuspectedarosefromgenomic Ioannidisetal.BMCGenomics2014,15:738 Page7of22 http://www.biomedcentral.com/1471-2164/15/738 Table1Variablesandtheirweightsusedtofilterputativetranscripts Attribute Weight Variable Reason A)Basedontheputativetranscript sequence 1.Whatproportionofthedatabase 10 Proportioncovered Rewardtheputativetranscriptbasedontheproportionofthe proteiniscoveredinthefirst databaseproteinthatiscoveredinthebestBLASTXhit Uniref100hit? 2.Whatproportionoftheputative 8 Proportioncovered Rewardtheputativetranscriptbasedontheproportionofthe transcriptiscoveredbythefirst queryputativetranscriptthatiscoveredinthebestBLASTXhit Uniref100hit? 3.Whatisthelengthcoveredonthe 7 Databasehitlength/longest Rewardbasedontheabsolutedatabaseproteinlengthcoveredin databaseproteininthefirstUniref100 databasehitlength thebestBLASTXhit,comparedtothelongesthitlengthinthe hit? component 4.Whatisthelengthcoveredonthe 5 Putativetranscripthitlength/ Rewardbasedontheabsolutequeryputativetranscriptlength putativetranscriptinthefirst longestputativetranscripthit coveredinthebestBLASTXhit,comparedtothelongesthit Uniref100hit? length lengthinthecomponent 5.IsthestrandoftheUniref100match, 4 Matchstrand* RewardmatchesinplusstrandifSSLR<0ormatchesinminus theexpectedone(basedonSSLR)? (−SSLR/max|SSLR|) strandifSSLR>0.Incontrast,penalizematchesinplusstrandif SSLR>0ormatchesinminusstrandifSSLR<0 6.Whatproportionofthedatabase 9 Proportioncovered SameasthecorrespondingmetricforUniref100 proteiniscoveredinthefirstNRhit? 7.Whatproportionoftheputative 7 Proportioncovered SameasthecorrespondingmetricforUniref100 transcriptiscoveredbythebestNR hit? 8.Whatistherelativelengthcovered 6 Databasehitlength/longest SameasthecorrespondingmetricforUniref100 onthedatabaseproteininthefirst databasehitlength NRhit? 9.Whatistherelativelengthcovered 4 Putativetranscripthitlength/ SameasthecorrespondingmetricforUniref100 ontheTrinityputativetranscriptin longestputativetranscripthit thefirstNRhit? length 10.IsthestrandoftheNRmatch,the 3 Matchstrand* SameasthecorrespondingmetricforUniref100 expectedone(basedonSSLR)? (−SSLR/max|SSLR|) 11.IstheSSLRnegative(i.e.the 7 -SSLR/max|SSLR| Rewardputativetranscriptswiththenormal,negativeSSLR expected)? 12.Howlongistheputativetranscript 7 Putativetranscriptlength/ Rewardlongerputativetranscripts comparedtothelongestinthe longestputativetranscript component? length B)BasedontheORFs 1.IsthebestmatchforeachORFthe 10 (1-Numberofbestmatches)/ PenalizeputativetranscriptshavingORFsthathavedifferenthits. same? numberofbestmatches 2.ArethereORFsinbothstrandswith 10 -NumberofORFsinstrand"A"/ MaximumpenaltyifbothORFshaveaNRhit bothhavinganNRhit? numberofORFsinstrand"B" 3.ArethereORFsinbothstrandswith 8 -NumberofORFsinstrand"A"/ IntermediatepenaltyifonlyoneoftheORFshasaNRhit onlyonehavinganNRhit? numberofORFsinstrand"B" 4.ArethereORFsinbothstrandswith 3 -NumberofORFsinstrand"A"/ SmallpenaltyifnoneoftheORFshaveaNRhit noneofthetwohavinganNRhit? numberofORFsinstrand"B" 5.HowmanyORFsarecalled? 8 (1-numberofORFs)/number Penalizeputativetranscriptshaving>1ORFs ofORFs 6.AretheORFsfoundonlyinthe 8 ORFstrand* RewardputativetranscriptshavingORFscalledinonlythe expectedstrand(SSLR)? (−SSLR/max|SSLR|) expectedstrand C)Sequencingcoveragedips 1.Howmanysequencingcoverage 10 -Numberofdips/max Penalizeputativetranscriptswithsequencingcoveragedips dips? numberofdipsinthe component Ioannidisetal.BMCGenomics2014,15:738 Page8of22 http://www.biomedcentral.com/1471-2164/15/738 Table2Summaryofassemblyandannotation Characteristic Beforefiltering Afterfiltering Numberofreads(bothpools) 366,689,206 N/A Numberofputativetranscripts(bothpools) 194,729 13,211 Averagetranscriptlength(bp) 1,005 2,026 Standarddeviation(bp) 1,474 1,592 Mediantranscriptlength(bp) 439 1,649 Maximumtranscriptlength(bp) 27,655 24,046 Transcripts>1000bp 50,599 9,657 TranscriptswithaUniref100hit(e-value<1e-10) 80,536 11,513 TranscriptsmatchinguniqueUniref100proteins 37,160 9,993 TranscriptswithaNRhit(e-value<1e-10) 80,262 11,497 TranscriptsmatchinguniqueNRproteins 37,346 10,007 NumberofTrinitycomponents 123,175 13,211 NumberofORFs 89,684 13,210 NumberofORFs>450bp 61,569 11,141 NumberofORFswithafunctionassigned 57,197 9,811 DNA as described above, only 78,489 (89%) putative filtering, 11,359 (96%) putative transcripts had a transcripts had aeukaryotic LCAof atotal of88,630 pu- eukaryotic LCA of a total of 11,854 putative transcripts tative transcripts with BLAST results (Additional file 4: with a BLAST result (Additional file 6: Figure S5). Of Figure S3). Of the 7,111 (8%) putative transcripts with a the 120 (1%) putative transcripts with a bacterial LCA, bacterial LCA, 6,366 had an LCA to γ-Proteobacteria only 18 had an LCA to γ-Proteobacteria, and E. coli was with Escherichia coli as the most common species level no longer the most common species level assignment assignment (Additional file 5: Figure S4). Following (Additional file 7: Figure S6). This suggests that the 1400 1200 1000 Information storage and processing 800 Cellular processes and signalling 600 Metabolism Poorly characterized 400 200 0 Translation, ribosomal structure and biogeRnNeAs ipsrocessing and modification TrRaensplcirciapttiioonn, recombination and rCehrpoairCmeallt icn ysctlre ucctounrtre oal, ncd elld ydinvaisimiocns, chromosome partitioning Nuclear structureDefense mechaniSisgmnsal transduction Cemlle cwhaalln/ismemsmbrane/envelope biogenesis Cell motility CytoskeleItnotnracEelxltrulaacr eltlrualaffir csktirnugc, tsuercPersoesttitoran, naslnatdi voensail culmaor dtirfiacnastipoonr,t protein turnover, chapEernoerngesy production andC acrobnovherysdiroante transport and metAabmiolnios amcid transport and metNaucbloeliostimde transport and mCeoteanbzolyismem transport and metabolisLimpid transport andI nmoeSrtegacabonionlci disaormny trmaentsapboorlit taesn db iomseytnatbhoelisiss, mtransport and catabolisGmeneral function prediction onlyFunction unknown Figure6Transcriptfunctionalcategories.TheNCBIClusterofOrthologousGroups(COG)databasewasusedtoclassifythepredictedproteins inthe13,211representativetranscripts.AssignmentofCOGcategoriesshowedthatalargenumberofORFsbelongedtocategoriesofproteins whosefunctionsarepoorlycharacterized,namelythosethathavegeneralfunctionpredictiononlyandthosewithunknownfunction. Ioannidisetal.BMCGenomics2014,15:738 Page9of22 http://www.biomedcentral.com/1471-2164/15/738 SSLR-based filtering described in the prior section was endosymbionts in this population of BMSB. This is con- effective at removing contaminating genomic DNA in sistentwiththerecentidentificationofPantoeaagglomer- thelibrary. ans as a BMSB endosymbiont [28]. The highest number Intriguedbythepresenceofatranscriptionallyregulated of bacterial LCAs (53 putative transcripts or 44%) were gene of Wolbachia ancestry in the BSMB transcriptome, from Amoebophilus asiaticus in the Bacteroidetes, which we sought to investigate the ancestry of these bacterial isanendosymbiontoffree-livingamoebae[29].A.asiaticus transcripts further. The remaining bacterial putative tran- is related to insect endosymbionts including Blattabacter- scriptshavehomologytonumerousendosymbionts.LCAs iumspp.incockroaches[30],Sulciamuelleriincicada[31], for the endosymbiont of the brown-winged green stink and Cardinium hertigii in parasitoid wasps [32]. While the bug Plautia stali [27] and Pantoea spp. were detected, BLASTXreturnedanLCAofAmoebophilus,theBLASTN suggesting relatives of these bacteria may be present as searches did not return any significant matches for any of Table3Differentiallyexpresseddetoxificationgenes Name Functionalannotation Stageup-regulated Comp25785_c0_seq1 CytochromeP450gene Adults Comp26191_c0_seq1 GlutathioneS-transferase Adults Comp15122_c0_seq1 CytochromeP450gene Adults Comp16607_c1_seq2 CytochromeP450gene Adults Comp15049_c0_seq1 CytochromeP450gene Adults Comp15912_c1_seq2 CytochromeP450gene Adults Comp20672_c0_seq4 CytochromeP450gene Adults Comp13685_c1_seq2 CytochromeP450gene Adults Comp40610_c0_seq2 CytochromeP450gene Adults Comp8954_c0_seq1 GlutathioneS-transferase Adults Comp20241_c0_seq1 CytochromeP450gene Adults Comp18070_c0_seq2 CytochromeP450(2ORFs) Adults Comp11443_c0_seq1 CytochromeP450gene Adults Comp18881_c0_seq1 CytochromeP450gene Adults Comp7095_c0_seq2 CytochromeP450gene Adults Comp14891_c0_seq2 CytochromeP450gene Adults Comp8170_c0_seq1 CytochromeP450gene Adults Comp4236_c0_seq1 ProbablecytochromeP450gene Adults Comp2339_c0_seq1 CytochromeP450gene Adults Comp6322_c0_seq2 CytochromeP450gene Adults Comp17381_c0_seq6 CytochromeP450gene Adults Comp23582_c1_seq1 CytochromeP450gene Adults Comp4238_c0_seq1 CytochromeP450gene Adults Comp3991_c0_seq3 Glutathioneperoxidase(3ORFs) Adults Comp8540_c0_seq1 ProbablecytochromeP450gene Adults Comp6146_c0_seq1 GlutathioneS-transferase Pre-adults Comp11026_c2_seq1 CytochromeP450gene Pre-adults Comp21713_c0_seq1 CytochromeP450gene Pre-adults Comp3892_c0_seq1 CytochromeP450gene Pre-adults Comp18921_c0_seq3 Catalase Pre-adults Comp25932_c0_seq1 CytochromeP450gene Pre-adults Comp10873_c0_seq1 ProbablecytochromeP450gene Pre-adults Comp12303_c0_seq1 CytochromeP450gene Pre-adults Comp8344_c0_seq1 ProbablecytochromeP450gene Pre-adults Ioannidisetal.BMCGenomics2014,15:738 Page10of22 http://www.biomedcentral.com/1471-2164/15/738 these, indicating a high level of divergence for the nucleo- amino acids at the N-terminus have little homology for tide sequences. Most of the proteins with a bacterial LCA comp549_c15_seq3 and comp2753_c5_seq1, respect- had homology to ankyrin repeat containing proteins found ively. No changes in sequencing coverage around these indiverseendosymbiontandinvertebrategenomes,includ- regions were identified in either transcript that would ing all of the proteins with an Amoebophilus LCA. Add- suggest a misassembly. The N-terminus of the ORF itionaltranscriptswithbacterialLCAsencodeproteinswith encoded in comp549_c15_seq3 contains a JNK_SAPK- homologytomannanases,amylase,C4-dicarboxylatetrans- associatedprotein-1domain. porter,andthreedifferenthypotheticalproteins. Examination of the phylogenetic relationship of comp549_c15_seq3 and comp2753_c5_seq1 with hom- Ankyrinrepeatcontainingproteinswithsignificant ologous sequences in GenBank (<e-15), reveals that bacterialhomology while they have homology to Wolbachia endosymbionts Therewere29LCAstoα-Proteobacteria,allofwhichwere (α-Proteobacteria), collectively they also have homology in the Rickettsiales, including 27 that were to Wolbachia to coding sequences (CDSs) in Rickettsiella grylii (γ- endosymbionts.SinceWolbachiaendosymbiontsarefound Proteobacteria) and Diplorickettsia massiliensis (α-Pro- in many insects, we tested for the presence of Wolbachia teobacteria) (Additional files 8 and 9: Figures S7-S8). using a robust set of 40 primers that collectively amplify The phylogenetic diversity of the bacterial taxa suggests genes from a diverse array of Wolbachia endosymbionts as that there have been lateral gene transfers involving di- wellasothermicroorganismsinthefamilyAnaplasmataceae verse arthropod-associated bacterial lineages and the [33].WedidnotdetectanyamplificationproductsforWol- precise donors and recipients of these bacteria-bacteria bachia endosymbionts using BMSB genomic DNA, while lateral gene transfers are not clear. While we suspect the primers amplified products using control DNA. This that this may also be a lateral gene transfer involving suggests that this BMSB colony is not colonized by a BMSB, we cannot rule out the presence of these genes Wolbachiaendosymbiont. arisingfromanendosymbiontofBMSB.ByqPCR,these Threeputativetranscriptswereidentifiedthatwerediffer- geneswereatthesamerelativeabundanceasBMSBnu- entiallyexpressedandhadhomologytoWolbachiaankyrin clear genes, but we were not successful at placing these proteinsincludingcomp549_c15_seq1,comp2753_c5_seq1, genes in the BMSB genome using inverse PCR. The on- and comp18511_c0_seq1, which had SSLRs of −4.8, −5.4, going BMSB genome sequencing project will likely clar- and −6.2, respectively. Differential expression analysis of ify this issue. Should this prove to be a lateral gene thetranscriptomedatareveals28-foldand776-foldoverex- transfer, it is not clear whether the gene moved from pression of comp549_c15_seq1 and comp2753_c5_seq1 in bacteriaintoinsects,or viceversa. the adult pool while comp18511_c0_seq1 was 104-fold The third transcript (comp18511_c0_seq1) contains an overexpressedinthejuvenilepool.Theseresultswere con- ORFwithfulllengthmatchestoWolbachiaproteinsthat firmed by qRT-PCR using the original RNA samples, prior contain ANK repeats and a PRANC domain. PRANC to pooling. For qRT-PCR we examined comp549_c0_seq3, domains were at one time described only in Pox viruses, instead of comp549_c0_seq1, as it is slightly longer and where PRANC-containing proteins inhibit NF-κB activa- contains part of the latrotoxin domain discussed below. tion by TNF-α, thereby altering the innate immune re- These qRT-PCR experiments demonstrate a specific sponse of the mammalian host. More recently PRANC 20,000-fold and 17,000-fold increase in expression of domainswereidentifiedinthehumanpathogenOrientia comp549_c15_seq3 and comp2753_c5_seq1 in active tsutsugamushi (Bacteria: Rickettsiales), multiple Wolba- adultfemaleswhencomparedtoallotherstages,andaspe- chia endosymbionts (Bacteria: Rickettsiales), Wolbachia cific170-foldincreaseinexpressionofcomp18511_c0_seq1 phage,andNasoniavitripennis(Eukaryota:Hymenoptera). in5thinstarnymphs. Previous phylogenetic analysis suggests that the Nasonia While all three putative transcripts have homology to lineage acquired one or more of these proteins from a knownankyrin-repeatcontainingproteins,comp2753_c5_- Wolbachia endosymbiont via lateral gene transfer [34]. seq1 and comp549_c15_seq3 do not contain ankyrin ExaminationofthephylogeneticrelationshipoftheBMSB (ANK)repeats.Bothhavehomologytoalargemultipledo- sequence with homologous sequences in GenBank (<e- mainproteininWolbachiaendosymbiontsofCulexpipiens 15), reveals that while it has homology to Wolbachia en- (e.g., WP_007302981.1) that has a latrotoxin domain at its dosymbionts (α-Proteobacteria), it is distinct from the C-terminus and numerous ankyrin repeats in the N- Wolbachia proteins. It is not clear if the BMSB protein terminal half. The homology for comp549_c15_seq3 spans represents an ANK-PRANC protein in another bacterial a portion of the latrotoxin domain and the adjacent ANK lineage or arthropod lineage. As with the genes described repeats, while the homology for comp2753_c5_seq3 spans above,qPCRresultsrevealedthesamerelativeabundance only part of this region. This region of homology is at asBMSBnucleargenesandisconsistentwiththesegenes the C-terminus of BMSB ORFs while the 560 and 223 being in the BMSB nuclear genome. However, in both

Description:
Transcriptome sequencing was under- taken to provide a molecular resource to the research com- munity for basic and applied research purposes. The data can be used to examine the population genetics of BMSB, including gaining a better understanding of the original invasion in Allentown, as well
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.