Mon.Not.R.Astron.Soc.000,1–9(2013) Printed19January2015 (MNLATEXstylefilev2.2) SoFiA: a flexible source finder for 3D spectral line data Paolo Serra,1(cid:63) Tobias Westmeier,2 Nadine Giese,3 Russell Jurek,1 Lars Flo¨er,4 At- tila Popping,2,5 Benjamin Winkel,6 Thijs van der Hulst,3 Martin Meyer,2 Ba¨rbel S. Koribalski,1 Lister Staveley-Smith2,5 and He´le`ne Courtois7 1CSIROAstronomyandSpaceScience,AustraliaTelescopeNationalFacility,POBox76,Epping,NSW1710,Australia 2ICRAR,M468,TheUniversityofWesternAustralia,35StirlingHighway,Crawley,WA6009,Australia 3UniversityofGroningen,KapteynAstronomicalInstitute,Landleven12,NL-9747AD,Groningen,theNetherlands 5 4Argelander-Institutfu¨rAstronomie,AufdemHu¨gel71,53121Bonn,Germany 1 5ARCCentreofExcellenceforAll-skyAstrophysics(CAASTRO) 0 6Max-Planck-Institutfu¨rRadioastronomie,AufdemHu¨gel69,53121Bonn,Germany 2 7Universite´Lyon1,CNRS/IN2P3,InstitutdePhysiqueNucle´aire,Lyon,France n a J Accepted2014December30.Received2014December22;inoriginalform2014July28 6 1 ABSTRACT ] M We introduce SoFiA, a flexible software application for the detection and parameterization ofsourcesin3Dspectral-linedatasets.SoFiAcombinesforthefirsttimeinasinglepieceof I softwareasetofnewsource-findingandparameterizationalgorithmsdevelopedonthewayto . h futureHIsurveyswithASKAP(WALLABY,DINGO)andAPERTIF.Itisdesignedtoenable p thegeneraluseofthesenewalgorithmsbythecommunityonabroadrangeofdatasets.The - keyadvantagesofSoFiAaretheabilityto:searchforlineemissiononmultiplescalestode- o r tect3Dsourcesinacompleteandreliableway,takingintoaccountnoiselevelvariationsand t thepresenceofartefactsinadatacube;estimatethereliabilityofindividualdetections;look s a forsignalinarbitrarilylargedatacubesusingacatalogueof3Dcoordinatesasaprior;provide [ awiderangeofsourceparametersandoutputproductswhichfacilitatefurtheranalysisbythe user.WehighlightthemodularityofSoFiA,whichmakesitaflexiblepackageallowingusers 1 toselectandapplyonlythealgorithmsusefulfortheirdataandsciencequestions.Thismod- v 6 ularitymakesitalsopossibletoeasilyexpandSoFiAinordertoincludeadditionalmethods 0 astheybecomeavailable.ThefullSoFiAdistribution,includingadedicatedgraphicaluser 9 interface,ispubliclyavailablefordownload. 3 0 Keywords: methods:dataanalysis. . 1 0 5 1 1 INTRODUCTION studied. We illustrate this point in Fig. 1, where we show a data v: Thedetectionofastronomicalsignalaboveinstrumentalnoiseisa cubeoftheATLAS3D HIsurvey(Serraetal.2012).Inthisfigure, i the central object is bright (and therefore easy to detect) but has X crucial aspect of all astronomy observations. The techniques em- acomplex3Dstructure,includingalowsurfacebrightnessexten- ployedtodetectandcharacterisethissignaldependonthetypeof r siontowardslargeRA.Onthecontrary,thetopobjectisbrightand a databeinganalysed(seeMasiasetal.2012forareview).Standard relatively simple as emission is confined within a small range of methodsandtoolshaveemergedinfieldswithalargecommunity RAandDec.Finally,thebottomobjectisthetypicalcaseofare- basesuchas2Dimaging(e.g.,SExtractor;Bertin&Arnouts solved,edge-ongalaxywherethetwopeaksofthedouble-hornve- 1996)and1Dspectroscopy(e.g.,GANDALF;Sarzietal.2006).In locityprofileareclearlyvisible,anddetectionofthefaintemission other fieldswith relatively fewer users,detection algorithms vary between the two peaks is challenging. An ideal 3D source finder significantlybetweenprojects.Thisisthecaseforstudiesbasedon shouldbeabletodetectandparameterizeallthesedifferentsources 3D spectral line data (for brevity, data cubes), where the flux of inacompleteandreliableway. aspectrallineismappedasafunctionofpositionontheskyand line-of-sightvelocityoftheemittingmatter. Radio single dishes and interferometers have traditionally The diversity of source finding methods for data cubes is at beenthemostcommontelescopesusedtoconstructdatacubes(al- leastpartlyduetothediversityof3Dstructureofthesourcesbeing thoughopticalintegral-fieldspectrographsarenowalsogenerating largenumbersofsuchcubes–e.g.,Cappellarietal.2011;Croom etal.2012;Sa´nchezetal.2012).Theupgradeandcontinuingoper- (cid:63) E-mail:[email protected] ation of existing radio telescopes, as well as the construction of (cid:13)c 2013RAS 2 PaoloSerraetal. Input data cube Select sub-cube Flagbad voxels Input Apply weights weights cube S R E Noisenor- LT Convolution 2D-1D malisation FI Figure1.Volumerenderingofan HI datacubeshowingthatindividual sourceshaveacomplexanddiverse3Dstructure.Thismakestheirdetection andaccurateparameterizationchallenging. RCEERS Threshold CNHI OUND finder S+Cfinder finder SFI the Square Kilometre Array and its precursors, are leading to a rapidincreaseinthenumberandsizeofdatacubes.Standardand sufficientlygeneralsource-findingtoolswillbenecessarytoanal- Merge Input yse these data, and recent work has started addressing this need detected mask (see,e.g.,DuchampbyWhiting2012).Inthispaperweintroduce voxels cube SoFiA,anew,flexibleSourceFindingApplicationfordatacubes whichcombinesdetectionalgorithmsandtechniquesfromseveral sourcefinders. Rejectfalse detections SoFiA is designed to work on any data cube independent of telescope or observed spectral line. However, its development is part of preparatory work for a few specific, upcoming HI sur- Optimize veys: WALLABY, a blind HI survey of 3/4 of the entire sky out mask toz∼0.25tobecarriedoutwiththeAustralianSquareKilometre ArrayPathfinder(ASKAP;seeKoribalski2012a);DINGO,adeep HIsurveyouttoz ∼ 0.4(alsotobecarriedoutwithASKAP;see Parame- terize Meyer2009);andtheHIsurveysplannedforAPERTIF(Verheijen etal.2008).Thispreparatoryworkhasresultedinthedevelopment ofanumberofnewsource-findingalgorithms,whicharedescribed Filter inaseriesofpapersreferredtointhenextSection(forasummary output seeKoribalski2012b).SoFiAputsthesedifferentalgorithmsto- getherforthefirsttimeinacoherent,flexibleandpubliclyavailable pieceofsoftware. SoFiA can be obtained from https://github.com/ SoFiA-Admin/SoFiA.Onthesamewebpageweprovidealist Source Output Moment Single- mask source of requirements, installation instructions and a user manual. The catalogue cube maps products aimofthispaperistodescribehowSoFiAoperatesondatacubes andtherebyprovideareferenceforcurrentandfutureusers. Figure2.SoFiAflowchart.Wehighlightthe“Filteroutput”modulewith adashedboxasthiswillbecomeavailableinfuturereleasesofSoFiA. 2 DESCRIPTIONOFSoFiA cube(orasub-cubeselectedbytheuser)isloaded,thesemodules allowusersto: SoFiAisamodularapplicationwhoseaimistodetectandparam- eterize sources in a data cube. The flowchart in Fig. 2 shows the • modifytheinputcubebyapplyingflags,weights,orasetof variousmodulesthatuserscanchoosetouse(ornottouse),inthe filters; order in which they are executed by SoFiA. Once an input data • detectthespectrallinesignal; (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 3 switchedoff.Alternatively,itcouldbegivenanumberofrelatively morecomplextaskssuchas,forexample,applyingawaveletfil- teringalgorithm,rejectingfalsedetectionsorfittingmodelstothe spectrumofthedetectedsources. WhileSoFiAwillcontinuetobeimproved,thisbasicprinci- pleofmodularitywillnotchange.Therefore,althoughthispaper describes the software as it is at the time of writing and new al- gorithms may be introduced in the future, the main workings of SoFiAwillremainasillustratedhere. 2.1 Datacube,weightscube,maskcubesandfilters Four different types of input and/or output cubes are relevant at differentstagesofSoFiA. • Datacube,whichincludessignalfromastronomicalsources superimposedoninstrumentalnoise(anderrors). • Weights cube, which allows users to weight voxel values to takeintoaccount,e.g.,noiselevelvariationsacrossthecubeorthe presenceofimagingartefactsincertainregionsofthedatacube. • Binary mask, where detected and non-detected voxels have valuesof1and0,respectively. • Objectmask,wherenon-detectedvoxelshaveavalueof0and Figure3.ScreenshotoftheSoFiAGUI.TheGUIadoptsautomatically detected voxels have an integer value corresponding to the ID of thenativestyleofthewindowmanagerusedonthesystemwhereSoFiA theobjecttheybelongto. isinstalled.InthisfigureweshowtheGUIasitappearsonaKubuntuLinux system.TheGUIalsoofferstheoptionofdisplayingthesourcecatalogue Allsource-findingalgorithmsimplementedinSoFiAandde- generatedbySoFiAandincludesahelpbrowserthatexplainstheavailable scribed in Sec. 2.2 below assume that the noise level is uniform parametersettings. across the data cube. Therefore, noise variations caused by, e.g., mosaicking or frequency-dependent flagging need to be removed first.ThiscanbedonewithinSoFiAbymeansofaweightscube • identifysourcesbymergingdetectedvoxelstogether; inversely proportional to the noise level. SoFiA removes noise • rejectfalsedetections; variationsbymultiplyingthedatacubebytheweightscube.Once • optimizethemaskofindividualsources; sourcedetectioniscompleted,SoFiAwillundothisoperationbe- • measuresourceparameters; foremeasuringsourceparameters.Theweightscubecouldalsobe • filtertheoutputbyselectingaregionofinterestinsourcepa- usefultodown-weightregionsofadatacubeaffectedbyimaging rameterspace; artefacts(e.g.,cleaningorcontinuum-subtractionresiduals). • and produce output catalogues as well as cubes, moment Theweightscubecanbeprovidedbytheuser.Alternatively, maps,position-velocitydiagramsandintegratedspectra. userscanprovideananalyticdescriptionoftheweightsvariation Individualmodulesaredescribedinmoredetailintherestofthis across thecube. Finally, a weightscube inversely proportional to Section. They are written in either Python or C++ and rely on a thelocalnoiselevelcanbederivedbySoFiAandappliedtothe rangeofexternallibraries,includingNumPyandSciPy(Jonesetal. datacube.Theevaluationofthelocalnoiseleveliscarriedoutin- 2001; Walt, Colbert & Varoquaux 2011), Cython (Behnel et al. dependently along any or all of the three axes of the data cube. 2011),Astropy(AstropyCollaborationetal.2013),theGNUSci- Forexample,ausermaywishtoremovenoisevariationsalongthe entificLibrary1and,optionally,matplotlib(Hunter2007).Provided frequencyaxisalone,undertheassumptionthatthenoisedoesnot thattheselibrariesareavailable,SoFiAcanrunonallmachines varywithineachfrequencyplane. withaUnixorLinuxoperatingsystem(including,e.g.,MacOSX WenotethatSoFiAmeasuresthenoisewithinadatacubeat and Ubuntu). We refer to the SoFiA webpage for up-to-date de- variousotherstagesoftheprocessing.Differentmethodsofnoise tails. measurementareimplementedanduserscandecidewhichoneis SoFiA can be executed from the command line or using a moreappropriatefortheirpurpose.Possiblechoicesare:i)standard dedicated graphical user interface (GUI) based on the Qt library deviation;ii)medianabsolutedeviation;andiii)standarddeviation (seeFig.3).Bothmethodsallowuserstoselectwhichcombination ofazero-centredGaussianfittothenegativesideofthefluxhis- oftheabovemodulesandwhichsourcefindingandparameteriza- togram. tionalgorithmstouse.ThisselectionisdoneusingeithertheGUI The calculation and application of the inverse-noise weights or a plain text parameter file (if running SoFiA from the com- cube described above is part of a more general SoFiA module mandline),allowingthesource-findingstrategyanditscomplexity which allows users to apply a filter to the data cube before run- to be optimized for the type of data and sources of interest. For ningtheselectedsource-findingandparameterizationalgorithms. example, SoFiA could be asked the simple question of creating AsindicatedinFig.2,thismoduleincludestwoadditionalfiltering amoment-0imageofallvoxelsaboveagiventhresholdinadata methods:firstly,theconvolutionwitha3Dkernelwhoseshapecan cube – in which case most of SoFiA’s functionalities would be be chosen among a few options and whose size can be specified bytheuser;andsecondly,the2D-1Dwaveletde-noisingalgorithm developedbyFlo¨er&Winkel(2012).Thisalgorithmprocessesthe 1 http://www.gnu.org/software/gsl/ twospatialdimensionsandthespectraldimensionofthedatacube (cid:13)c 2013RAS,MNRAS000,1–9 4 PaoloSerraetal. separately,andreturnsanoise-freedatacubereconstructedusing todeliverhighercompletenessandreliabilitythanS+CandCNHI only wavelet coefficients above a specified threshold. Additional forsourcesunresolvedonthesky,especiallyatnarrowlinewidths. filteringoptionsmaybeprovidedinfuturereleases. Incontrast,theS+Cmethodisbyconstructionwellsuitedtofind- AsindicatedbytheflowchartinFig.2,portionsofthecube ingsourcesonavarietyofscalesandPoppingetal.(2012)deemit canbeblankedout(flagged)priortosourcefinding.Thismaybe thebestchoiceforextendedobjects. necessaryatthelocationofverybrightcontinuumsourceswhose Wenotethatmanyofthealgorithmshavebeenimprovedsince spectrumwasnotsubtractedproperlyfromthedata,oratchannels thecomparativestudyofPoppingetal.(2012).Additionaltesting dominatedbylineemissionfromtheGalaxyoraffectedbystrong cannowbecarriedoutwithinSoFiAandwillbeusedtoinvesti- radiofrequencyinterference. gatehowtofurtherimprovetheirperformance.Untilthen,werefer Finally, mask cubes are generally calculated within SoFiA to the aforementioned papers for a complete discussion of these (seebelow)butcanalsobeprovidedbytheuser.Thelattercould methods. bedesirableifauser,followinganinitialsource-findingrun,wishes Alltheabovealgorithmsreturnabinarymaskofdetectedvox- tolookforadditionalsourceswithadifferentsearchalgorithmor els(andanyadditionalsource-findingalgorithmcouldbeaddedto parameters. In this case the new sources are added to the initial, SoFiAaslongastheysatisfythiscondition).Asanexample,the inputmask.Alternatively,aninputmaskcouldbeusedifsources toppanelsofFig.4showfivechannelsextractedfromthedatacube havealreadybeenidentifiedandonlysubsequentparameterization inFig.1and,withblackcontours,theregionsincludedinthebinary stepsarerequired. mask.InthiscasetheS+Cfinderwasemployedusing12different smoothingkernels.Therelativelylowadoptedthreshold(3.5σ)re- sultsinanumberofnoisepeaksbeingincludedinthemask.We 2.2 Detectionofspectrallinesignal comebacktothispointinSec.2.4. SoFiA is meant to offer a number of detection algorithms that userscanchoosefrom.Acommonadvantageofthesealgorithms 2.3 Mergingdetectedvoxelsintosources istheabilitytolookforemissiononmultiplescales,whichises- sentialtodetectsourcesin3D(seeFig1).Anexceptionisthesim- Theaforementionedbinarymaskisthebasisforidentifyingindi- plethresholdmethod(seebelow),unlessusedincombinationwith vidualsourcesorobjects.InSoFiA,thiscomputationallyexpen- someofthefilteringmethodsdescribedabove(e.g.,2D-1Dwavelet siveoperationisperformedusingtheC++implementationofthe denoising).ThefollowingalgorithmsareimplementedinSoFiA. Lutz(1980)one-passalgorithmbyJurek(2012),combinedwitha sparse representation of 3D objects. We refer to Jurek (2012) for • Simplethreshold.Thisisthesimplestpossiblealgorithm(and detailsonthisimplementation.Hereitissufficienttosaythatthis theonlyonenotoperatingonmultiplescales):onlyvoxelswhose algorithmproducesthesameresultasafriends-of-friendsmethod absolute value is above a specified threshold are detected. Users withlinkingelementequaltoanellipticcylinder.Userscanspecify canspecifythethresholdinfluxunitsorrelativetothenoiselevel. thecylindersize.ThisstepofSoFiAalsoreturnsbasicsourcepa- • S+C.Thisisthesmooth+clipalgorithmdevelopedbySerra rameterssuchastotalflux,peakflux(bothnormalisedbythenoise et al. (2012) on the basis of techniques traditionally used within level)andsize. the HI community. It consists of searching for emission at mul- ThebottompanelsofFig.4showtheobjectscreatedfromthe tipleangularandvelocityresolutionsbysmoothingthedatacube binarymaskusingamergingcylinderwitharadiusof3pixelsand with 3D kernels specified by the user. At each resolution, voxels a height of 7 channels (we show only objects with positive total are detected if their absolute value is above a threshold given by flux).Thesepanelsshowfourrealdetectionsaswellasanumber theuser(innoiseunits).Thefinalmaskistheunionofthemasks ofpositivenoise-peakobjects.Itisworthhighlightingthesuccess- constructedatthevariousresolutions. fuldetectionofafaint,extendedHItaileastofthebrightestgalaxy • CNHI.ThisalgorithmwasdevelopedbyJurek(2012).Indi- (second panel from the left). This detection is made possible by vidual 1D spectra (or bundles of adjacent spectra) are extracted thefactthatSoFiAlooksforemissiononmultiplescales.Further- from the data cube. For each of them, the Kuiper test is used to more, SoFiA correctly identifies as a single source the resolved, identifyregionsofthespectrumwhicharenotconsistentwithcon- edge-ongalaxylocatedinthesouthernpartofthecube(visiblein tainingonlynoise.Inpractice,usersneedtoprovideaprobability allpanelsbutthefirst)despitethelowlevelemissionatchannels thresholdabovewhichaspectralregionisconsidereddetectedand closetothesystemicvelocity. isaddedtothefinalbinarymask. Thenumerouspossiblecombinationsofthesesource-findingmeth- 2.4 Reliabilityandrejectionoffalsedetections odstogetherwiththefilteringalgorithmsdescribedinSec.2.1al- lowuserstodesignanumberofdifferentstrategiestodetectsignal Alldetectionalgorithmslistedaboverequireuserstospecifyade- intheirdatacube.Forexample,theCNHIfindercouldberunfol- tectionthreshold.Thecloserthisthresholdistothenoise,themore lowing convolution with a 3D kernel appropriate for the type of noise peaks will be included in the resulting binary mask. Some sources being searched. Alternatively, a simple threshold method of these noise peaks may be identified as separate objects if they couldbeusedafterthenoisehasbeenremovedfromthecubeby aresufficientlyfarfromarealobject(seebottompanelsofFig.4). the2D-1Dwaveletfilter. SoFiA offers two ways of removing these false detections from Popping et al. (2012) discuss strengths and weaknesses of thefinaloutput. thesealgorithmsandcomparetheirperformance.Animportantrec- Thefirstmethodisasimplesizefilterandisbasedonthefact ommendationofthatworkisthatallsourcefindersshouldincorpo- thatallrealdetectionsareatleastaslargeasthedatacube’sresolu- ratesomeformof3Dsmoothinginordertoincreasecompleteness. tion.Inpractice,userscanspecifytheminimumacceptablesource Inthisrespect,thesimplethresholdalgorithmisoflimiteduseun- size along each axis of the cube independently. The downside of less coupled with a filtering methods such as the 2D-1D wavelet thismethodisthatitmaypotentiallyremoverelativelybrightbut de-noising.Poppingetal.(2012)findthisparticularcombination unresolvedsourcesfromthefinalobjectmask. (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 5 1110.3 km/s 1234.0 km/s 1357.7 km/s 1481.3 km/s 1605.0 km/s +62.3° +62.2° Dec (J2000)+62.1° +62.0° +61.9° 1110.3 km/s 1234.0 km/s 1357.7 km/s 1481.3 km/s 1605.0 km/s +62.3° +62.2° Dec (J2000)+62.1° +62.0° +61.9° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° RA (J2000) RA (J2000) RA (J2000) RA (J2000) RA (J2000) Figure4.IllustrationofthedetectionofsignalandidentificationofindividualsourcesinSoFiA.Toppanels.Channelmapsextractedfromthedatacube showninFig.1.Theline-of-sightvelocityofeachchannelisindicatedinthetop-leftcorner(notethatthesearenotadjacentchannelsintheoriginalcube). Thebeamisshowninthebottom-leftcorner.Blackcontoursshowregionsincludedinthebinarymask(Sec.2.2).Bottompanels.Samechannelmapsasin thetoppanelsbutnowshowingtheindividualobjectsformedonthebasisofthebinarymask(Sec.2.3).Weshowonlyobjectswithpositivetotalflux.Each objectisindicatedwithadifferentrandomcolour.Blackcontoursindicatethefourobjectswhosereliabilityishigherthan99percent(Sec.2.4). The second method is illustrated by Serra, Jurek & Flo¨er However, experience shows that masks can miss the faint, outer (2012)andestimatesthereliabilityofindividualobjectsbycom- edge of objects, in particular if obtained with a high detection paringthedistributionofpositiveandnegativesources(i.e.,sources threshold.Thiswouldintroducesystematiceffectsinthemeasured with positive and negative total flux, respectively) in parameter parameters(e.g.,thetotalfluxwouldbeunderestimated;seeWest- space.Thesimpleideaisthatthedistributionofpositiveandneg- meier,Popping&Serra2012).Topreventthis,SoFiAofferstwo ativenoisepeaksshouldbeidenticalwhilepositive,realdetections maskoptimizationmethodswhichmodifytheobjectmaskcubeby should not have a negative counterpart in parameter space. It is growingthemaskswhichdefineindividualobjects.Inbothmeth- basedontheassumptionsthatthenoiseissymmetricandthatreal ods,themaskisgrownindependentlyforeachobject. sourceshavepositivetotalflux(i.e.,absorptionlinesourceshave beenmasked). Thefirstmethodismostlyappropriateforsourcesthatareun- WithinSoFiA,thereliabilitycanbecalculatedfollowingthe resolvedontheskyor,ifresolved,face-onandsymmetric.Itstarts runofanysource-findingalgorithmchosenbytheuseraslongas byfittinganellipsetothemoment-0imageoftheobject.Theel- both positive and negative noise peaks are included in the binary lipseisthenusedasamaskforallvelocitychannelsoccupiedby mask,andafterthedetectedvoxelsaremergedintosources(Sec. theobject–i.e.,theinitialmask,whichgenerallyhasanarbitrary 2.3).Thereliabilitycalculationalsorequiresthatasufficientnum- 3Dshape,isconvertedintoanellipticcylinder.Finally,thesizeof berofnegativenoisepeaksisincludedinthemasksuchthattheir theellipseisincreaseduntilamaximumintotalfluxisreached(a distributioninparameterspacecanbestudiedmeaningfully.Users similarmethodisdescribedbyBardenetal.2012inthecontextof canselecttoproducediagnosticplotsonthereliabilitycalculation 2Dimaging). similartothoseshowninSerra,Jurek&Flo¨er(2012).Theblack The above method should in principle be applied only to contoursinthebottompanelsofFig.4highlightobjectswhosere- sourceswhichfillmostofthecylindricalmaskinallchannels(see liabilityishigherthan99percent. above),whileforobjectswithamorecomplex3Dstructureitcan Insummary,userscandecidetorunSoFiAwithahighdetec- resultinadecreaseoftheintegratedsignal-to-noiseratio.Forthis tionthreshold,resultinginareliablebutpossiblyincompletecata- reasonweprovideasecondmaskgrowthmethod.Thisconsistsin logueofdetections;buttheycanalsodecidetodigdeeperintothe performingabinarydilationoftheinitialmaskalongthetwospa- noiseusingalowerthreshold,andsuccessivelyremovefalsedetec- tialaxesofthedatacubeusinga2Ddilationstructuringelement tions. In the latter case, a reliability value can be returned for all whoseshapeapproximatesacircle.Thesizeofthestructuringele- positivedetections. mentisincreasediterativelyuntilthetotalfluxconverges(i.e.,until therelativefluxgrowthbetweensuccessiveiterationsislowerthan a threshold specified by the user). This method preserves the 3D 2.5 Maskoptimization shapeoftheinitialmask.Inadditiontogrowthalongthetwospa- SoFiAmeasurestheparametersofallsources(e.g.,totalflux,size, tialaxes,thisalgorithmcanalsogrowallmasksbyafixednumber line width) considering only voxels included in the mask cube. ofchannels(selectedbytheuser)alongthefrequencyaxis. (cid:13)c 2013RAS,MNRAS000,1–9 6 PaoloSerraetal. +62.2° 400 0.20 1 Dec (J2000)+62.1° velocity (km/s) 0 1200 1300 F(Jy)000...011505 0 1 1 0.00 180.7° 180.6° 180.5° 180.4° 180.7° 180.6° 180.5° 180.4° –0.1 –0.05 0 0.05 0.1 1000 1100 1200 1300 1400 RA (J2000) RA (J2000) offset (deg) velocity(km/s) Figure5.DataproductsforthebrightestgalaxyinthecubeshowninFig.1.Fromlefttoright:moment-0image;moment-1velocityfield;position-velocity diagramalongthemorphologicalmajoraxis(PA=32◦);andintegratedspectrum(filledcircles)withthebest-fittingbusyfunctionoverlaid(solidline).Inthe latterpanelthedottedanddashedlinesindicatethelinewidthsW50andW20estimatedat50and20percentofthepeakflux,respectively,onthebasisofthe busyfunctionfit. 2.6 Sourceparameterization 2.8 PerformanceofSoFiA As mentioned in Sec. 2.3, basic source parameters are measured InthecurrentimplementationofSoFiAtheentireinputdatacube whendetectedvoxelsaremergedintoobjects.Thesecanbeused (or the selected sub-cube; see Fig. 2) is loaded into memory and toestimatethereliabilityofeachsourceandrejectfalsedetections processed on a single core. Additional cubes will also need to inparameterspace.AftermaskoptimizationSoFiAre-computes be stored in memory at various stages of processing, such as the thoseparametersandmeasuresadditionalones.Theseinclude:po- weightscube,thebinaryorobjectmaskcube,andasmoothedver- sition (both geometric and centre of mass); total flux; minimum sion of the data cube if required by the source-finding algorithm and maximum voxel value; size and bounding region along each beingused(e.g.,S+C),plusapotentiallylargearrayofsourcepa- axis; line width measured using different methods (including the rameters.Itisthereforeinterestingtodiscusshowthememoryre- one proposed by Courtois et al. 2009); results of an ellipse fit to quirementandexecutiontimeofSoFiAvarywithcubesize.For themoment-0image;resultsofabusyfunctionfittotheintegrated thispurposewemakeuseoftwocubes.Thesmallercubeistheone spectrum(foradescriptionofthebusyfunctionseeWestmeieretal. usedforillustrationpurposeinthispaper(Figs.1and4).Ithas360 2014).Theseparametersareprovidedbothina“raw”format(i.e., pixelsalongbothspatialaxesand150channelsalongthefrequency coordinatesinpixelunits,fluxesindataunits,line-widthinchan- axis,resultinginafilesizeof78MB.Thesecondcubeistheone nels) as well as converted into more useful units (e.g., WCS co- usedforthesource-findingtestsofSerra,Jurek&Flo¨er(2012)and ordinatesandstandardfluxandvelocityunits).Someofthesepa- Westmeier, Popping & Serra (2012). It too has 360 pixels along rameterizationstepsareoptionalandimplementationofadditional bothspatialaxesbutconsistsof1464channelsalongthefrequency parametersisstraightforwardwithinthecode. axis.Therefore,itssizeis∼10timesthatofthefirstcube. We process the two cubes with identical settings employing a representative combination of the algorithms described in this paper: noise normalisation along the frequency axis; S+C source finding with 12 smoothing kernels; merging of detected voxels 2.7 Outputproducts into sources; calculation of reliability and removal of unreliable Users can decide what output SoFiA should produce. Available sources; optimization of the mask of individual objects using the outputproductsinclude: dilation method; source parameterization including busy function fit;creationofoutputproductsforthecubesasawholeandforthe • Catalog of objects and their parameters, both in ASCII and individualdetections;creationofASCIIandXMLcatalogues.The VO-compliantXMLformat. tworunsarecarriedoutonamachinerunningLinuxMint17with • Finalobjectmask. amemoryof16GBanda2.9GHzIntelCoreprocessor. • Moment 0 and 1 images of the sky area covered by the full Figure6showsthememoryusageofSoFiAasafunctionof datacubedeterminedfromthedatawithinthemask. time for the two cubes. Both axes of the plot are normalised by • Cut-out data cubes containing individual objects as well as thecubesize.Thetimebehaviourofthetwocurvesappearsvery their corresponding mask, moment 0, 1 and 2 images, integrated similar,indicatingthattheexecutiontimescalesapproximatelylin- spectrum and position-velocity diagram along the morphological earlywithcubesizewithintherangeexploredhere.Thememory majoraxis.AnexampleoftheseproductsisshowninFig.5. offsetbetweenthetwocurvesisduetotheloadingintomemoryof anumberoflibrariesusedbySoFiA.Thesecomewithamemory Infuturereleasesitwillbepossibletoproducetheseproductsfor overheadoftheorderofafewtensofmegabytes,whichismore just a subset of the detections by selecting a region of interest in noticeableinthecaseofasmallerdatacube.Fordatacubesmuch sourceparameterspace. larger than this overhead the memory usage is between 2 and 3 This output is designed to not only give useful information timesthesizeofthecube,withoccasionalpeaksbetween3and4 aboutthedetectedsourcesbutalsotoenablefurther,higher-level timesthecubesize. analysisbytheuser.Forexample,thecut-outcubesofindividual Figure6allowsustoinvestigatethememoryandprocessing objectsandthecorrespondingmaskscouldbeusedtomeasuread- timetakenbythevariousalgorithms.Inthiscase,mostofthetime ditional source parameters not included in SoFiA or to produce istakenbytheS+Cfinder.Thebeginningandendofitsexecution Gauss-Hermitevelocityfieldstoenablekinematicalstudies. aremarkedbyopenandfilledblackcirclesforthesmallandlarge (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 7 5 sizecube=79MB sizecube=759MB 4 e b u ec3 z si / y r o 2 m e m 1 0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 time/size (s/MB) cube Figure6.MemoryusageasafunctionoftimefortworunsofSoFiAontwocubeswhosesizesdifferbyafactor∼10(seelegend).Asexplainedinthetext, bothrunsincludetheuseoftheS+Csourcefinder,andopenandfilledblackcirclesindicatethebeginningandtheendoftheS+Cexecutionforthetwocubes, respectively.SeeSec.2.8foramoredetaileddiscussionofthetimetakenbyotheralgorithmsduringthetwoSoFiAruns. datacube,respectively.ThepeaksinmemoryusageofS+Ccorre- thatitallowsuserstosearchforemissioninanynumberofsmall spondtothesmoothingoperations,whiletheplateausinbetween sub-cubescentredatasetof3Dcoordinateswithinanarbitrarily peakscorrespondtothenoiselevelcalculation.Eachadditionalfil- large data cube. For example, in an era of large HI and optical ter would contribute another ∼ 0.05 s/MB to the execution time redshiftsurveys,thismodecouldbeusedtolookforemissionin inthiscase.Thenoisecalculationappearstobeparticularlytime alarge HI cubeatthelocationofgalaxiesincludedinanoptical consuming.Inthiscase,itiscarriedoutusingtheaforementioned spectroscopiccatalogue. Gaussianfittothenegativesideoffluxhistogram(Sec.2.1).This ThismodeisfullyintegratedinSoFiAandinterestedusers calculationusesthefullcube,andanobviouswaytoincreaseits needtosimplyprovidetheinputdatacubeandacatalogueof3D speedwouldbetocalculatethenoiselevelonasub-cube.Thisop- coordinates.SoFiAwillprocessthevariouspositionssequentially, tionmaybecomeavailableinthefuture. eachtimeloadingintomemoryonlythesub-cubeofinterest.The ThetimebeforethebeginningoftheS+CfinderinFig.6is 3Dsizeofthesub-cubescanbesetbytheuserandisthesamefor takenbythenoisenormalisationalongthefrequencyaxisandby allpositions.Userscanalsorequestthecreationofasingleoutput an initial measurement of the noise level in the normalised cube. catalogueofsources,whichisgeneratedbymergingthecatalogues The time after S+C is taken by all other algorithms listed above, obtainedateachposition. andthesearetypicallymuchfaster.Thememorypeakrightatthe endofS+Ccorrespondstothemergingofvoxelsintosources.The heightofthispeakdependsonthenumberofsourcesdetected.This 2.10 Comparisontoothersourcefinders isfollowedbythecalculationofthereliabilityandtherejectionof unreliablesources,whicharebothrelativelyinexpensiveinterms Anumberofestablishedsoftwarepackagesforthereductionand of memory but can be time consuming. The final memory peaks analysis of interferometric data allow some source finding to be correspondtothecreationofmomentimages. carriedoutondatacubes(e.g.,GIPSY,Miriad).However,this approachrequiresuserstodevelopcustomcodeswhichmakeuse of (and are limited by) the tasks available within those general- purposepackages.Incontrast,themorespecialisedSoFiAoffers 2.9 Source-findingbasedonacatalogueof3Dcoordinates awiderangeofready-to-usesource-findingalgorithms,whichare The above discussion makes it clear that SoFiA is currently not alreadyintegratedwithoneanotherandcanbecombinedinaflex- able to process arbitrarily large data cubes but is limited by the iblewaytoproduceavarietyofoutputproducts. memoryofthesystemonwhichitisrun.Thisproblemispartially Theother3Dapplicationforspectrallinedatawhichshares alleviatedbythefactthatSoFiAisabletolimittheprocessingto some of these characteristics is Duchamp (Whiting 2012). This asub-cubewhoseboundariesarespecifiedbytheuser.Therefore, applicationdetectssourcesusingasimplethresholdmethod(sim- users could choose to run SoFiA multiple times on sufficiently ilartotheonedescribedinSec.2.2)andthengrowsthemusinga small portions of a large input cube, obtaining individual output secondarythreshold.Thisalgorithmdiffersfromthoseavailablein productsforeachofthem.Theycouldthencombinetheseproducts, SoFiAand,inthisrespect,thetwopackagescouldbeseenascom- creatingforexampleasinglemaskorcatalogue.Inthefuturewe plementary(Poppingetal.2012showsthatDuchamphasthebest may be able to offer such breaking up of a large input cube into performance for unresolved sources but does not reach the com- sub-cubes–andthecreationoffinaldataproductsforthefullcube pleteness of S+C for resolved sources). With respect to memory –asaprocessingmodefullyintegratedwiththeothermodulesof requirements, Duchamp is similar to SoFiA in that it loads and SoFiA. processesinonecorethefullinputdatacube.Therefore,ittoois Inthiscontext,ausefulfeaturealreadyavailableinSoFiAis limitedbythememoryofthesystemonwhichitisrun. (cid:13)c 2013RAS,MNRAS000,1–9 8 PaoloSerraetal. AsignificantadvantageofSoFiAcomparedtoDuchampis sources.Thisoutputisdesignedtobothprovideausefuldescrip- that it offers a larger number of algorithms for both source find- tionofthesourcesaswellasfacilitatesubsequentanalysis. ingandparameterization.ThisincludestheS+CandCNHIfinders, WehighlightthemodularityofSoFiA,whichallowsusersto the2D-1Dwaveletde-noising(denoisingisavailableinDuchamp optimize the source-finding and parameterization strategy for the butitusesisotropicwavelets,whicharenotidealforspectralline data and sources of interest. This modularity also enables future sources whose size along the spectral axis is decoupled from the expansionsofSoFiAtoincludenewsource-findingandparame- sizealongthetwospatialaxes),thecalculationofthereliabilityof terisationalgorithms. individualdetections,themaskoptimizationbybinarydilation,the SoFiA is publicly available at the website indicated in Sec. possibilityofsearchingforsignalonthebasisofaninputcatalogue 1togetherwithtechnicalinformationonhowtousethesoftware. of3Dcoordinatesandthebusyfunctionfit.Thecreationofcubelets Softwareupdates,improvementsandbugfixesarepostedregularly andPVdiagramsforindividualdetectionsisalsonotincludedin at this webpage. SoFiA is registered at the Astrophysics Source DuchampbutisavailableinSelavy,asourcefinderbuiltupon CodeLibrarywithIDascl:1412.001. Duchamp for distributed processing of large cubes (Whiting & Humphreys2012).Whilefuturedevelopmentmayreducethedif- ference between Duchamp/Selavy and our package, all above ACKNOWLEDGMENTS methodsareatthemomentuniquetoSoFiA. Finally,itisworthmentioningthatSoFiAdoesnotofferat TheauthorsacknowledgefinancialsupportfromaResearchCol- themomentafullanalysisofthesources’morphology.Forexam- laborationAwardoftheUniversityofWesternAustralia.TvdHand ple,agroupofnearbydetectedvoxelsismergedintoasinglesource NGweresupportedbytheEuropeanResearchCouncilunderunder regardlessofthesizeofthesourceandonlybasedonthemerging theEuropeanUnion’sSeventhFrameworkProgramme(FP/2007- elementchosenbytheuser(Sec.2.3).Thismeansthatnoinforma- 2013)/ERCGrantAgreementnr.291531.LFacknowledgessup- tionisgivenaboutwhetherthesource,whichcouldbeverylarge, portbytheDeutscheForschungsgemeinschaft(DFG)undergrant iscomposedofdistinctandeasilyrecognisablecomponents.Dif- numbers KE757/7-1, KE757/7-2, KE757/7-3 and KE757/9-1. LF ferentandmorespecialisedsourcefindersareabletoprovidesuch isamemberoftheInternationalMaxPlanckResearchSchool(IM- characterisation(e.g.,ClumpfindbyWilliams,deGeus&Blitz PRS)forAstronomyandAstrophysicsattheUniversitiesofBonn 1994;BLOBCATbyHalesetal.2012).Wenotehoweverthatthe andCologne. objectmaskcubereturnedbySoFiAcouldbeusedasastarting pointforfurthermorphologicalanalysisofthedetections. REFERENCES AstropyCollaborationetal.,2013,A&A,558,A33 BardenM.,Ha¨ußlerB.,PengC.Y.,McIntoshD.H.,GuoY.,2012, 3 SUMMARY MNRAS,422,449 We provide a high-level description of SoFiA, a flexible source Behnel S., Bradshaw R., Citro C., Dalcin L., Seljebotn D. S., finderfor3Dspectrallinedata.SoFiAputstogetherforthefirst SmithK.,2011,ComputinginScience&Engineering,13,31 timeinasinglepackageanumberofnewsource-findingandpa- BertinE.,ArnoutsS.,1996,A&AS,117,393 rameterization algorithms developed in preparation of upcoming CappellariM.etal.,2011,MNRAS,413,813 HIsurveyswithASKAP(WALLABY,DINGO)andAPERTIF.It CourtoisH.M.,TullyR.B.,FisherJ.R.,BonhommeN.,Zavodny is,however,designedtoenabletheuseofthesenewalgorithmson M.,BarnesA.,2009,AJ,138,1938 anydatacubeindependentofemissionlineortelescopeused. CroomS.M.etal.,2012,MNRAS,421,872 Wedescribethevariousmethodsandalgorithmsavailablein Flo¨erL.,WinkelB.,2012,PublicationsoftheAstronomicalSoci- SoFiA as well as planned developments. One key advantage of etyofAustralia,29,244 SoFiAisthatitallowsuserstosearchforspectrallinesignalon Hales C. A., Murphy T., Curran J. R., Middelberg E., Gaensler multiple scales on the sky and in frequency (using. e.g., the S+C B.M.,NorrisR.P.,2012,MNRAS,425,979 finderorthe2D-1Dwaveletfilter),whichiscrucialtodetectand HunterJ.D.,2007,ComputingInScience&Engineering,9,90 parameterize3Dsourcesinacompleteandreliableway.Further- JonesE.,OliphantT.,PetersonP.,etal.,2001,SciPy:Opensource more,withinSoFiAitispossibletotakeintoaccountnoiselevel scientifictoolsforPython.[Online;accessed2014-07-02] variationsacrossthecubeandthepresenceoferrorsandartefacts. JurekR.,2012,PublicationsoftheAstronomicalSocietyofAus- Moreover, SoFiA is able to estimate the reliability of individual tralia,29,251 detections,whichshouldbeparticularlyusefulforsurveysexpected KoribalskiB.S.,2012a,PublicationsoftheAstronomicalSociety todetectalargenumberofsources.Itcanalsoproduceavariety ofAustralia,29,359 of output products, including moment images, cut-out cubes and KoribalskiB.S.,2012b,PublicationsoftheAstronomicalSociety images,integratedspectraandcataloguesofsourceparameters.Fi- ofAustralia,29,213 nally,SoFiAisabletosearchforlineemissioninarbitrarilylarge LutzR.K.,1980,TheComputerJournal,23,262 datacubesonthebasisofacatalogueof3Dcoordinates.Mostof MasiasM.,FreixenetJ.,Llado´ X.,PeracaulaM.,2012,MNRAS, thesemethodsarenotavailableinothersourcefindersandarecur- 422,1674 rentlyuniquetoSoFiA. MeyerM.,2009,inPanoramicRadioAstronomy:Wide-field1-2 WeprovideafewvisualexamplesofhowSoFiAworksin- GHzResearchonGalaxyEvolution cluding a view of the dedicated graphical user interface. We de- PoppingA.,JurekR.,WestmeierT.,SerraP.,Flo¨erL.,MeyerM., scribetheavailableparameterizationandthewiderangeofoutput KoribalskiB.,2012,PublicationsoftheAstronomicalSocietyof products, which include mask cubes, moment images, position- Australia,29,318 velocity diagrams and busy function spectral fits of individual Sa´nchezS.F.etal.,2012,A&A,538,A8 (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 9 SarziM.etal.,2006,MNRAS,366,1151 SerraP.,JurekR.,Flo¨erL.,2012,PublicationsoftheAstronomi- calSocietyofAustralia,29,296 SerraP.etal.,2012,MNRAS,422,1835 VerheijenM.A.W.,OosterlooT.A.,vanCappellenW.A.,Bakker L.,IvashinaM.V.,vanderHulstJ.M.,2008,inAmericanInsti- tuteofPhysicsConferenceSeries,Vol.1035,TheEvolutionof Galaxies Through the Neutral Hydrogen Window, Minchin R., MomjianE.,eds.,pp.265–271 Walt S. v. d., Colbert S. C., Varoquaux G., 2011, Computing in Science&Engineering,13,22 Westmeier T., Jurek R., Obreschkow D., Koribalski B. S., Staveley-SmithL.,2014,MNRAS,438,1176 WestmeierT.,PoppingA.,SerraP.,2012,PublicationsoftheAs- tronomicalSocietyofAustralia,29,276 WhitingM.,HumphreysB.,2012,PublicationsoftheAstronom- icalSocietyofAustralia,29,371 WhitingM.T.,2012,MNRAS,421,3242 WilliamsJ.P.,deGeusE.J.,BlitzL.,1994,ApJ,428,693 (cid:13)c 2013RAS,MNRAS000,1–9