arXiv:astro-ph/0003162v3

mkxk.com  时间:2021-04-10  阅读:()
31May2001GADGET:AcodeforcollisionlessandgasdynamicalcosmologicalsimulationsVolkerSpringel1,2,NaokiYoshida1andSimonD.
M.
White11Max-Planck-Institutf¨urAstrophysik,Karl-Schwarzschild-Strae1,85740GarchingbeiM¨unchen,Germany2Harvard-SmithsonianCenterforAstrophysics,60GardenStreet,Cambridge,MA02138,USAAbstractWedescribethenewlywrittencodeGADGETwhichissuitablebothforcosmologicalsimulationsofstructureformationandforthesimulationofinteractinggalaxies.
GADGETevolvesself-gravitatingcollisionlessuidswiththetraditionalN-bodyapproach,andacollisionalgasbysmoothedparticlehydrodynamics.
Alongwiththeserialversionofthecode,wediscussaparallelversionthathasbeendesignedtorunonmassivelyparallelsupercomputerswithdistributedmemory.
Whilebothversionsuseatreealgorithmtocomputegravitationalforces,theserialversionofGADGETcanoptionallyemploythespecial-purposehardwareGRAPEinsteadofthetree.
PeriodicboundaryconditionsaresupportedbymeansofanEwaldsummationtechnique.
Thecodeusesindividualandadaptivetimestepsforallparticles,anditcombinesthiswithaschemefordynamictreeupdates.
DuetoitsLagrangiannature,GADGETthusallowsaverylargedynamicrangetobebridged,bothinspaceandtime.
Sofar,GADGEThasbeensuccessfullyusedtorunsimulationswithupto7.
5*107particles,includingcosmologicalstudiesoflarge-scalestructureformation,high-resolutionsimulationsoftheformationofclustersofgalaxies,aswellasworkstation-sizedproblemsofinteractinggalaxies.
Inthisstudy,wedetailthenumericalalgorithmsemployed,andshowvarioustestsofthecode.
Wepublicallyreleaseboththeserialandthemassivelyparallelversionofthecode.
Keywords:methods:numerical–galaxies:interactions–cosmology:darkmatter.
1.
IntroductionNumericalsimulationsofthree-dimensionalself-gravitatinguidshavebecomeanindispensabletoolincosmology.
Theyarenowroutinelyusedtostudythenon-lineargravitationalclusteringofdarkmatter,theformationofclustersofgalax-ies,theinteractionsofisolatedgalaxies,andtheevolutionoftheintergalacticgas.
Withoutnumericaltechniquestheim-menseprogressmadeintheseeldswouldhavebeennearlyimpossible,sinceanalyticcalculationsareoftenrestrictedtoidealizedproblemsofhighsymmetry,ortoapproximatetreatmentsofinherentlynonlinearproblems.
Theadvancesinnumericalsimulationshavebecomepos-siblebothbytherapidgrowthofcomputerperformanceandbytheimplementationofevermoresophisticatednumericalalgorithms.
Thedevelopmentofpowerfulsimulationcodesstillremainsaprimarytaskifonewantstotakefulladvan-tageofnewcomputertechnologies.
Earlysimulations(Holmberg1941;Peebles1970;Press&Schechter1974;White1976,1978;Aarsethetal.
1979,amongothers)largelyemployedthedirectsummationmethodforthegravitationalN-bodyproblem,whichremainsusefulincollisionalstellardynamicalsystems,butitisinef-cientforlargeNduetotherapidincreaseofitscomputa-tionalcostwithN.
AlargenumberofgroupshavethereforedevelopedN-bodycodesforcollisionlessdynamicsthatcom-putethelarge-scalegravitationaleldbymeansofFouriertechniques.
ThesearethePM,P3M,andAP3Mcodes(East-wood&Hockney1974;Hohl1978;Hockney&Eastwood1981;Efstathiouetal.
1985;Couchman1991;Bertschinger&Gelb1991;MacFarlandetal.
1998).
Modernversionsofthesecodessupplementtheforcecomputationonscalesbelowthemeshsizewithadirectsummation,and/ortheyplacemeshrenementsonhighlyclusteredregions.
Pois-son'sequationcanalsobesolvedonahierarchicallyrenedmeshbymeansofnite-differencerelaxationmethods,anap-proachtakenintheARTcodebyKravtsovetal.
(1997).
Analternativetotheseschemesaretheso-calledtreeal-gorithms,pioneeredbyAppel(1981,1985).
Treealgorithmsarrangeparticlesinahierarchyofgroups,andcomputethegravitationaleldatagivenpointbysummingovermulti-poleexpansionsofthesegroups.
Inthiswaythecomputa-tionalcostofacompleteforceevaluationcanbereducedtoaO(NlogN)scaling.
Thegroupingitselfcanbeachieved1V.
Springel,N.
YoshidaandS.
D.
M.
Whiteinvariousways,forexamplewithEuleriansubdivisionsofspace(Barnes&Hut1986),orwithnearest-neighbourpair-ings(Press1986;Jernigan&Porter1989).
Atechniquere-latedtoordinarytreealgorithmsisthefastmultipole-method(e.
g.
Greengard&Rokhlin1987),wheremultipoleexpan-sionsarecarriedoutforthegravitationaleldinaregionofspace.
Whilemesh-basedcodesaregenerallymuchfasterforclose-to-homogeneousparticledistributions,treecodescanadaptexiblytoanyclusteringstatewithoutsignicantlossesinspeed.
ThisLagrangiannatureisagreatadvantageifalargedynamicrangeindensityneedstobecovered.
Heretreecodescanoutperformmeshbasedalgorithms.
Inaddi-tion,treecodesarebasicallyfreefromanygeometricalre-strictions,andtheycanbeeasilycombinedwithintegrationschemesthatadvanceparticlesonindividualtimesteps.
Recently,PMandtreesolvershavebeencombinedintohybridTree-PMcodes(Xu1995;Bagla1999;Bodeetal.
2000).
Inthisapproach,thespeedandaccuracyofthePMmethodforthelong-rangepartofthegravitationalforcearecombinedwithatree-computationoftheshort-rangeforce.
ThismaybeseenasareplacementofthedirectsummationPPpartinP3Mcodeswithatreealgorithm.
TheTree-PMtechniqueisclearlyapromisingnewmethod,especiallyiflargecosmologicalvolumeswithstrongclusteringonsmallscalesarestudied.
YetanotherapproachtotheN-bodyproblemisprovidedbyspecial-purposehardwareliketheGRAPEboard(Makino1990;Itoetal.
1991;Fukushigeetal.
1991;Makino&Funato1993;Ebisuzakietal.
1993;Okumuraetal.
1993;Fukushigeetal.
1996;Makinoetal.
1997;Kawaietal.
2000).
Itcon-sistsofcustomchipsthatcomputegravitationalforcesbythedirectsummationtechnique.
Bymeansoftheirenormouscomputationalspeedtheycanconsiderablyextendtherangewheredirectsummationremainscompetitivewithpuresoft-waresolutions.
ArecentoverviewofthefamilyofGRAPE-boardsisgivenbyHut&Makino(1999).
Thenewestgen-erationofGRAPEtechnology,theGRAPE-6,willachieveapeakperformanceofupto100TFlops(Makino2000),allow-ingdirectsimulationsofdensestellarsystemswithparticlenumbersapproaching106.
Usingsophisticatedalgorithms,GRAPEmayalsobecombinedwithP3M(Brieuetal.
1995)ortreealgorithms(Fukushigeetal.
1991;Makino1991a;Athanassoulaetal.
1998)tomaintainitshighcomputationalspeedevenformuchlargerparticlenumbers.
Inrecentyears,collisionlessdynamicshasalsobeencou-pledtogasdynamics,allowingamoredirectlinktoobserv-ablequantities.
Traditionally,hydrodynamicalsimulationshaveusuallyemployedsomekindofmeshtorepresentthedynamicalquantitiesoftheuid.
Whileaparticularstrengthofthesecodesistheirabilitytoaccuratelyresolveshocks,themeshalsoimposesrestrictionsonthegeometryoftheprob-lem,andontothedynamicrangeofspatialscalesthatcanbesimulated.
Newadaptivemeshrenementcodes(Norman&Bryan1998;Kleinetal.
1998)havebeendevelopedtopro-videasolutiontothisproblem.
Incosmologicalapplications,itisoftensufcienttode-scribethegasbysmoothedparticlehydrodynamics(SPH),asinventedbyLucy(1977)andGingold&Monaghan(1977).
Theparticle-basedSPHisextremelyexibleinitsabilitytoadapttoanygivengeometry.
Moreover,itsLagrangiannatureallowsalocallychangingresolutionthat'automatically'fol-lowsthelocalmassdensity.
Thisconvenientfeaturehelpstosavecomputingtimebyfocusingthecomputationaleffortonthoseregionsthathavethelargestgasconcentrations.
Fur-thermore,SPHtiesnaturallyintotheN-bodyapproachforself-gravity,andcanbeeasilyimplementedinthreedimen-sions.
TheseadvantageshaveledanumberofauthorstodevelopSPHcodesforapplicationsincosmology.
AmongthemareTREESPH(Hernquist&Katz1989;Katzetal.
1996),GRAPE-SPH(Steinmetz1996),HYDRA(Couchmanetal.
1995;Pearce&Couchman1997),andcodesbyEvrard(1988);Navarro&White(1993);Hultman&K¨allander(1997);Daveetal.
(1997);Carraroetal.
(1998).
SeeKangetal.
(1994)andFrenketal.
(1999)foracomparisonofmanyofthesecosmo-logicalhydrodynamiccodes.
InthispaperwedescribeoursimulationcodeGADGET(GAlaxieswithDarkmatterandGasintEracT),whichcanbeusedbothforstudiesofisolatedself-gravitatingsystemsincludinggas,orforcosmologicalN-body/SPHsimulations.
Wehavedevelopedtwoversionsofthiscode,aserialwork-stationversion,andaversionformassivelyparallelsuper-computerswithdistributedmemory.
Theworkstationcodeuseseitheratreealgorithmfortheself-gravity,orthespecial-purposehardwareGRAPE,ifavailable.
Theparallelversionworkswithatreeonly.
NotethatinprincipleseveralGRAPEboards,eachconnectedtoaseparatehostcomputer,canbecombinedtoworkasalargeparallelmachine,butthispossi-bilityisnotimplementedintheparallelcodeyet.
Whiletheserialcodelargelyfollowsknownalgorithmictechniques,weemployanovelparallelizationstrategyintheparallelversion.
Aparticularemphasisofourworkhasbeenontheuseofatimeintegrationschemewithindividualandadaptiveparticletimesteps,andontheeliminationofsourcesofoverheadbothintheserialandparallelcodeunderconditionsoflargedy-namicrangeintimestep.
Suchconditionsoccurindissipativegas-dynamicalsimulationsofgalaxyformation,butalsoinhigh-resolutionsimulationsofcolddarkmatter.
Thecodeal-lowstheusageofdifferenttimestepcriteriaandcell-openingcriteria,anditcanbecomfortablyappliedtoawiderangeofapplications,includingcosmologicalsimulations(withorwithoutperiodicboundaries),simulationsofisolatedorinter-actinggalaxies,andstudiesoftheintergalacticmedium.
WethusthinkthatGADGETisaveryexiblecodethatavoidsobviousintrinsicrestrictionsforthedynamicrangeoftheproblemsthatcanbeaddressedwithit.
Inthismethods-paper,wedescribethealgorithmicchoicesmadeinGADGET2GADGET:Acodeforcollisionlessandgasdynamicalcosmologicalsimulationswhichwereleaseinitsparallelandserialversionsonthein-ternet1,hopingthatitwillbeusefulforpeopleworkingoncosmologicalsimulations,andthatitwillstimulatecodede-velopmenteffortsandfurthercode-sharinginthecommunity.
Thispaperisstructuredasfollows.
InSection2,wegiveabriefsummaryoftheimplementedphysics.
InSection3,wediscussthecomputationofthegravitationalforcebothwithatreealgorithm,andwithGRAPE.
WethendescribeourspecicimplementationofSPHinSection4,andwediscussourtimeintegrationschemeinSection5.
TheparallelizationofthecodeisdescribedinSection6,andtestsofthecodearepresentedinSection7.
Finally,wesummarizeinSection8.
2.
Implementedphysics2.
1.
CollisionlessdynamicsandgravityDarkmatterandstarsaremodeledasself-gravitatingcol-lisionlessuids,i.
e.
theyfulllthecollisionlessBoltzmannequation(CBE)dfdt≡ft+vfxΦrfv=0,(1)wheretheself-consistentpotentialΦisthesolutionofPois-son'sequation2Φ(r,t)=4πGf(r,v,t)dv,(2)andf(r,v,t)isthemassdensityinsingle-particlephase-space.
Itisverydifculttosolvethiscoupledsystemofequa-tionsdirectlywithnitedifferencemethods.
Instead,wewillfollowthecommonN-bodyapproach,wherethephaseuidisrepresentedbyNparticleswhichareintegratedalongthecharacteristiccurvesoftheCBE.
Inessence,thisisaMonteCarloapproachwhoseaccuracydependscruciallyonasuf-cientlyhighnumberofparticles.
TheN-bodyproblemisthusthetaskoffollowingNewton'sequationsofmotionforalargenumberofparticlesundertheirownself-gravity.
Notethatwewillintroduceasoften-ingintothegravitationalpotentialatsmallseparations.
Thisisnecessarytosuppresslarge-anglescatteringintwo-bodycollisionsandeffectivelyintroducesalowerspatialresolutioncut-off.
Foragivensofteninglength,itisimportanttochoosetheparticlenumberlargeenoughsuchthatrelaxationeffectsduetotwo-bodyencountersaresuppressedsufciently,oth-erwisetheN-bodysystemprovidesnofaithfulmodelforacollisionlesssystem.
Notethattheoptimumchoiceofsoft-eninglengthasafunctionofparticledensityisanissuethatisstillactivelydiscussedintheliterature(e.
g.
Splinteretal.
1998;Romeo1998;Athanassoulaetal.
2000).
1GADGET'sweb-siteis:http://www.
mpa-garching.
mpg.
de/gadget2.
2.
GasdynamicsAsimpledescriptionoftheintergalacticmedium(IGM),ortheinterstellarmedium(ISM),maybeobtainedbymodelingitasanideal,inviscidgas.
Thegasisthengovernedbythecontinuityequationdρdt+ρ·v=0,(3)andtheEulerequationdvdt=PρΦ.
(4)Further,thethermalenergyuperunitmassevolvesaccordingtotherstlawofthermodynamics,viz.
dudt=Pρ·vΛ(u,ρ)ρ.
(5)HereweusedLagrangiantimederivatives,i.
e.
ddt=t+v·,(6)andweallowedforapieceof'extra'physicsinformofthecoolingfunctionΛ(u,ρ),describingexternalsinksorsourcesofheatforthegas.
Forasimpleidealgas,theequationofstateisP=(γ1)ρu,(7)whereγistheadiabaticexponent.
Weusuallytakeγ=5/3,appropriateforamono-atomicidealgas.
Theadiabaticsoundspeedcofthisgasisc2=γP/ρ.
3.
Gravitationalforces3.
1.
TreealgorithmAnalternativetoFouriertechniques,ortodirectsummation,aretheso-calledtreemethods.
Intheseschemes,theparticlesarearrangedinahierarchyofgroups.
Whentheforceonaparticularparticleiscomputed,theforceexertedbydistantgroupsisapproximatedbytheirlowestmultipolemoments.
Inthisway,thecomputationalcostforacompleteforceeval-uationcanbereducedtoorderO(NlogN)(Appel1985).
Theforcesbecomemoreaccurateifthemultipoleexpansioniscarriedouttohigherorder,buteventuallytheincreasingcostofevaluatinghighermomentsmakesitmoreefcienttoterminatethemultipoleexpansionandratherusealargernumberofsmallertreenodestoachieveadesiredforceaccu-racy(McMillan&Aarseth1993).
Wewillfollowthecom-moncompromisetoterminatetheexpansionafterquadrupolemomentshavebeenincluded.
3V.
Springel,N.
YoshidaandS.
D.
M.
WhiteFigure1:SchematicillustrationoftheBarnes&Hutoct-treeintwodimensions.
Theparticlesarerstenclosedinasquare(rootnode).
Thissquareistheniterativelysubdividedinfoursquaresofhalfthesize,untilexactlyoneparticleisleftineachnalsquare(leavesofthetree).
Intheresultingtreestructure,eachsquarecanbeprogenitorofuptofoursiblings.
Notethatemptysquaresneednottobestored.
WeemploytheBarnes&Hut(1986,henceforthBH)treeconstructioninthiswork.
Inthisscheme,thecomputationaldomainishierarchicallypartitionedintoasequenceofcubes,whereeachcubecontainseightsiblings,eachwithhalftheside-lengthoftheparentcube.
Thesecubesformthenodesofanoct-treestructure.
Thetreeisconstructedsuchthateachnode(cube)containseitherexactlyoneparticle,orispro-genitortofurthernodes,inwhichcasethenodecarriesthemonopoleandquadrupolemomentsofalltheparticlesthatlieinsideitscube.
AschematicillustrationoftheBHtreeisshowninFigure1.
Aforcecomputationthenproceedsbywalkingthetree,andsummingupappropriateforcecontributionsfromtreenodes.
InthestandardBHtreewalk,themultipoleexpan-sionofanodeofsizelisusedonlyifr>lθ,(8)whereristhedistanceofthepointofreferencetothecenter-of-massofthecellandθisaprescribedaccuracyparameter.
Ifanodefulllsthecriterion(8),thetreewalkalongthisbranchcanbeterminated,otherwiseitis'opened',andthewalkiscontinuedwithallitssiblings.
Forsmallervaluesoftheopeningangle,theforceswillingeneralbecomemoreaccurate,butalsomorecostlytocompute.
Onecantrytomodifytheopeningcriterion(8)toobtainhigherefciency,i.
e.
higheraccuracyatagivenlengthoftheinteractionlist,somethingthatwewilldiscussinmoredetailinSection3.
3.
Atechnicaldifcultyariseswhenthegravityissoftened.
Inregionsofhighparticledensity(e.
g.
centersofdarkhaloes,orcolddensegasknotsindissipativesimulations),itcanhap-penthatnodesfulllequation(8),andsimultaneouslyonehasrh,whilesomeotherpossibleforcelaws,likethePlummersoftening,con-vergerelativelyslowlytoNewton'slaw.
Letsbethecenter-of-mass,andMthetotalmassoftheparticles.
Furtherwedeney≡rs.
Thepotentialmaythenbeexpandedinamultipoleseriesassuming|y||xks|.
Uptoquadrupoleorder,thisresultsinΦ(r)=GMg(y)(11)+12yTg′′(y)y2Q+g′(y)y3(PQ)y.
HerewehaveintroducedthetensorsQ=kmk(xks)(xks)T=kmkxkxTkMssT,(12)4GADGET:AcodeforcollisionlessandgasdynamicalcosmologicalsimulationsandP=Ikmk(xks)2=Ikmkx2kMs2,(13)whereIistheunitmatrix.
NotethatforNewtoniangravity,equation(11)reducestothemorefamiliarformΦ(r)=GMy+12yT3QPy5y.
(14)Finally,thequadrupoleapproximationofthesoftenedgravi-tationaleldisgivenbyf(r)=Φ=GMg1(y)y+g2(y)Qy+12g3(y)yTQyy+12g4(y)Py.
(15)Hereweintroducedthefunctionsg1(y),g2(y),g3(y),andg4(y)asconvenientabbreviations.
TheirdenitionisgivenintheAppendix.
IntheNewtoniancase,thissimpliestof(r)=GMy3y+3Qy5y152yTQyy7y+32Py5y.
(16)Notethatalthoughequation(15)looksrathercumbersome,itsactualnumericalcomputationisonlymarginallymorecostlythanthatoftheNewtonianform(16)becauseallfac-torsinvolvingg(y)andderivativesthereofcanbetabulatedforlateruseinrepeatedforcecalculations.
3.
2.
TreeconstructionandtreewalksThetreeconstructioncanbedonebyinsertingtheparticlesoneaftertheotherinthetree.
Oncethegroupingiscom-pleted,themultipolemomentsofeachnodecanberecur-sivelycomputedfromthemomentsofitsdaughternodes(McMillan&Aarseth1993).
Inordertoreducethestoragerequirementsfortreenodes,weusesingle-precisionoatingpointnumberstostorenodeproperties.
Theprecisionoftheresultingforcesisstillfullysufcientforcollisionlessdynamicsaslongasthenodeprop-ertiesarecalculatedaccuratelyenough.
Intherecursivecal-culation,nodepropertieswillbecomputedfromnodesthatarealreadystoredinsingleprecision.
Whentheparticlenum-berbecomesverylarge(notethatmorethan10millionpar-ticlescanbeusedinsingleobjectslikeclustersthesedays),lossofsufcientprecisioncanthenresultforcertainparticledistributions.
Inordertoavoidthisproblem,GADGETop-tionallyusesanalternativemethodtocomputethenodeprop-erties.
Inthismethod,alink-liststructureisusedtoaccessalloftheparticlesrepresentedbyeachtreenode,allowingacomputationofthenodepropertiesindouble-precisionandastorageoftheresultsinsingle-precision.
Whilethistech-niqueguaranteesthatnodepropertiesareaccurateuptoarel-ativeerrorofabout106,itisalsoslowerthantherecursivecomputation,becauseitrequiresoforderO(NlogN)oper-ations,whiletherecursivemethodisonlyoforderO(N).
Thetree-constructioncanbeconsideredveryfastinbothcases,becausethetimespentforitisnegligiblecomparedtoacompleteforcewalkforallparticles.
However,inthein-dividualtimeintegrationschemeonlyasmallfractionofallparticlesmayrequireaforcewalkateachgiventimestep.
Ifthisfractiondropsbelow1percent,afullreconstructionofthetreecantakeasmuchtimeastheforcewalkitself.
For-tunately,mostofthistreeconstructiontimecanbeeliminatedbydynamictreeupdates(McMillan&Aarseth1993),whichwediscussinmoredetailinSection5.
Themosttimecon-sumingroutineinthecodewillthenalwaysremainthetreewalk,andoptimizingitcanconsiderablyspeeduptreecodes.
Interestingly,inthegroupingtechniqueofBarnes(1990),thespeedofthegravitationalforcecomputationcanbeincreasedbyperformingacommontree-walkforalocalizedgroupofparticles.
Eventhoughtheaveragelengthoftheinteractionlistforeachparticlesbecomeslargerinthisway,thiscanbeoffsetbysavingsomeofthetree-walkoverhead,andbyimprovedcacheutilization.
Unfortunately,thisadvantageisnoteasilykeptifindividualtimestepsareused,whereonlyasmallfractionoftheparticlesareactive,sowedonotusegrouping.
GADGETallowsdifferentgravitationalsofteningsforparti-clesofdifferent'type'.
Inordertoguaranteemomentumcon-servation,thisrequiresasymmetrizationoftheforcewhenparticleswithdifferentsofteninglengthsinteract.
Wesym-metrizethesofteningsintheformh=max(hi,hj).
(17)However,theusageofdifferentsofteninglengthsleadstocomplicationsforsoftenedtreenodes,becausestrictlyspeak-ing,themultipoleexpansionisonlyvalidifalltheparticlesinthenodehavethesamesoftening.
GADGETsolvesthisproblembyconstructingseparatetreesforeachspeciesofparticleswithdifferentsoftening.
Aslongasthesespeciesaremoreorlessspatiallyseparated(e.
g.
darkhalo,stellardisk,andstellarbulgeinsimulationsofinteractinggalaxies),nosevereperformancepenaltyresults.
However,thisisdif-ferentiftheuidsarespatiallywell'mixed'.
Hereasingletreewouldresultinhigherperformanceofthegravitycom-putation,soitisadvisabletochooseasinglesofteninginthiscase.
NotethatforSPHparticlesweneverthelessalwayscre-ateaseparatetreetoallowitsuseforafastneighboursearch,aswillbediscussedbelow.
3.
3.
Cell-openingcriterionTheaccuracyoftheforceresultingfromatreewalkdependssensitivelyonthecriterionusedtodecidewhetherthemulti-poleapproximationforagivennodeisacceptable,orwhetherthenodehastobe'opened'forfurtherrenement.
Thestan-dardBHopeningcriteriontriestolimittherelativeerrorof5V.
Springel,N.
YoshidaandS.
D.
M.
Whiteeveryparticle-nodeinteractionbycomparingaroughesti-mateofthesizeofthequadrupoleterm,Ml2/r4,withthesizeofthemonopoleterm,M/r2.
Theresultisthepurelygeometricalcriterionofequation(8).
However,asSalmon&Warren(1994)havepointedout,theworst-casebehaviouroftheBHcriterionforcommonlyemployedopeninganglesissomewhatworrying.
Althoughtypicallyveryrareinrealastrophysicalsimulations,thege-ometricalcriterion(8)canthensometimesleadtoverylargeforceerrors.
Inordertocurethisproblem,anumberofmod-icationsofthecell-openingcriterionhavebeenproposed.
Forexample,Dubinskietal.
(1996)haveusedthesimplemodicationr>l/θ+δ,wherethequantityδgivesthedis-tanceofthegeometriccenterofthecelltoitscenter-of-mass.
Thisprovidesprotectionagainstpathologicalcaseswherethecenter-of-massliesclosetoanedgeofacell.
Suchmodicationscanhelptoreducetherateatwhichlargeforceerrorsoccur,buttheyusuallydonothelptodealwithanotherproblemthatarisesforgeometricopeningcri-teriainthecontextofcosmologicalsimulationsathighred-shift.
Here,thedensityeldisveryclosetobeinghomoge-neousandthepeculiaraccelerationsaresmall.
Foratreeal-gorithmthisisasurprisinglytoughproblem,becausethetreecodealwayshastosumuppartialforcesfromallthemassinasimulation.
Smallnetforcesathighzthenariseinadelicatecancellationprocessbetweenrelativelylargepartialforces.
Ifapartialforceisindeedmuchlargerthanthenetforce,evenasmallrelativeerrorinitisenoughtoresultinalargerelativeerrorofthenetforce.
Foranunclusteredpar-ticledistribution,theBHcriterionthereforerequiresamuchsmallervalueoftheopeninganglethanforaclusteredoneinordertoachieveasimilarlevelofforceaccuracy.
Alsonotethatinacosmologicalsimulationtheabsolutesizesofforcesbetweenagivenparticleandtree-nodesofacertainopeninganglecanvarybymanyordersofmagnitude.
Inthissituation,thepurelygeometricalBHcriterionmayendupin-vestingalotofcomputationaleffortfortheevaluationofallpartialforcestothesamerelativeaccuracy,irrespectiveoftheactualsizeofeachpartialforceandthesizeoftheabsoluteerrorthusinduced.
Itwouldbebettertoinvestmorecompu-tationaleffortinregionsthatprovidemostoftheforceontheparticleandlessinregionswhosemasscontentisunimpor-tantforthetotalforce.
AssuggestedbySalmon&Warren(1994),onemaythere-foretrytodeviseacell-openingcriterionthatlimitstheabso-luteerrorineverycell-particleinteraction.
Inprinciple,onecanuseanalyticerrorbounds(Salmon&Warren1994)toobtainasuitablecell-openingcriterion,buttheevaluationoftherelevantexpressionscanconsumesignicantamountsofCPUtime.
Ourapproachtoanewopeningcriterionislessstringent.
Assumetheabsolutesizeofthetruetotalforceisalreadyknownbeforethetreewalk.
Inthepresentcode,wewillusetheaccelerationoftheprevioustimestepasahandyapprox-imatevalueforthat.
Wewillnowrequirethattheestimatederrorofanacceptablemultipoleapproximationissomesmallfractionofthistotalforce.
Sincewetruncatethemultipoleexpansionatquadrupoleorder,theoctupolemomentwillingeneralbethelargesttermintheneglectedpartofthese-ries,exceptwhenthemassdistributioninthecubicalcellisclosetobeinghomogeneous.
Forahomogeneouscubetheoctupolemomentvanishesbysymmetry(Barnes&Hut1989),suchthatthehexadecapolemomentformstheleadingterm.
WemayveryroughlyestimatethesizeofthesetermsasM/r2(l/r)3,orM/r2(l/r)4,respectively,andtakethisasaroughestimateofthesizeofthetruncationerror.
Wecanthenrequirethatthiserrorshouldnotexceedsomefractionαofthetotalforceontheparticle,wherethelatterisestimatedfromtheprevioustimestep.
Assumingtheoctupolescaling,atree-nodehasthentobeopenedifMl3>α|aold|r5.
How-ever,wehavefoundthatinpracticetheopeningcriterionMl4>α|aold|r6(18)providesstillbetterperformanceinthesensethatitproducesforcesthataremoreaccurateatagivencomputationalex-pense.
Itisalsosomewhatcheapertoevaluateduringthetree-walk,becauser6issimplertocomputethanr5,whichrequirestheevaluationofarootofthesquarednodedis-tance.
Thecriterion(18)doesnotsufferfromthehigh-zprob-lemdiscussedabove,becausethesamevalueofαproducesacomparableforceaccuracy,independentoftheclusteringstateofthematerial.
However,westillneedtocomputetheveryrstforceusingtheBHcriterion.
InSection7.
2,wewillshowsomequantitativemeasurementsoftherelativeper-formanceofthetwocriteria,andcompareittotheoptimumcell-openingstrategy.
Notethatthecriterion(18)isnotcompletelysafefromworst-caseforceerrorseither.
Inparticular,sucherrorscanoccurforopeninganglessolargethatthepointofforceeval-uationfallsintothenodeitself.
Ifthishappens,noupperboundontheforceerrorcanbeguaranteed(Salmon&War-ren1994).
Asanoptiontothecode,wethereforecombinetheopeningcriterion(18)withtherequirementthatthepointofreferencemaynotlieinsidethenodeitself.
Weformulatethisadditionalconstraintintermsofr>bmax,wherebmaxisthemaximumdistanceofthecenter-of-massfromanypointinthecell.
Thisadditionalgeometricalconstraintprovidesaveryconservativecontrolofforceerrorsifthisisneeded,butincreasesthenumberofopenedcells.
3.
4.
SpecialpurposehardwareAnalternativetosoftwaresolutionstotheN2-bottleneckofself-gravityisprovidedbytheGRAPE(GRAvityPipE)special-purposehardware.
Itisdesignedtosolvethegrav-itationalN-bodyprobleminadirectsummationapproachbymeansofitssuperiorcomputationalspeed.
Thelatterisachievedwithcustomchipsthatcomputethegravitational6GADGET:AcodeforcollisionlessandgasdynamicalcosmologicalsimulationsforcewithahardwiredPlummerforcelaw.
ThePlummer-potentialofGRAPEtakestheformΦ(r)=Gjmj(|rrj|2+2)12.
(19)Asanexample,theGRAPE-3AboardsinstalledattheMPAin1998have40N-bodyintegratorchipsintotalwithanap-proximatepeakperformanceof25GFlops.
Recently,newergenerationsofGRAPEboardshaveachievedevenhighercomputationalspeeds.
Infact,withtheGRAPE-4the1TFlopbarrierwasbroken(Makinoetal.
1997),andevenfasterspecial-purposemachinesareinpreparation(Hut&Makino1999;Makino2000).
Themostrecentgeneration,GRAPE-6,cannotonlycomputeaccelerations,butalsoitsrstandsec-ondtimederivatives.
Togetherwiththecapabilitytoperformparticlepredictions,thesemachinesareidealforhigh-orderHermiteintegrationschemesappliedinsimulationsofcolli-sionalsystemslikestarclusters.
However,ourpresentcodeisonlyadaptedtothesomewhatolderGRAPE-3(Okumuraetal.
1993),andthefollowingdiscussionislimitedtoit.
TheGRAPE-3Aboardsareconnectedtoanordinarywork-stationviaaVMEorPCIinterface.
Theboardsconsistofmemorychipsthatcanholdupto131072particlecoordi-nates,andofintegratorchipsthatcancomputetheforcesex-ertedbytheseparticlesfor40positionsinparallel.
Higherparticlenumberscanbeprocessedbysplittingthemupinsuf-cientlysmallgroups.
Inadditiontothegravitationalforce,theGRAPEboardreturnsthepotential,andalistofneigh-boursforthe40positionswithinsearchradiihispeciedbytheuser.
ThislatterfeaturemakesGRAPEattractivealsoforSPHcalculations.
ThepartsofourcodethatuseGRAPEhavebenetedfromthecodeGRAPESPHbySteinmetz(1996),andaresimilartoit.
Inshort,theusageofGRAPEproceedsasfollows.
Fortheforcecomputation,theparticlecoordinatesarerstloadedontotheGRAPEboard,thenGADGETcallsGRAPErepeat-edlytocomputetheforceforupto40positionsinparallel.
ThecommunicationwithGRAPEisdonebymeansofacon-venientsoftwareinterfaceinC.
GRAPEcanalsoprovidelistsofnearestneighbours.
ForSPH-particles,GADGETcomputesthegravitationalforceandtheinteractionlistinjustonecallofGRAPE.
Thehostcomputerthenstilldoestherestofthework,i.
e.
itadvancestheparticles,andcomputesthehydro-dynamicalforces.
Inpractice,therearesometechnicalcomplicationswhenoneworkswithGRAPE-3.
Inordertoachievehighcomputa-tionalspeed,theGRAPE-3hardwareworksinternallywithspecialxed-pointformatsforpositions,accelerationsandmasses.
ThisresultsinareduceddynamicrangecomparedtostandardIEEEoatingpointarithmetic.
Inparticular,oneneedstospecifyaminimumlengthscaledminandaminimummassscalemminwhenworkingwithGRAPE.
Thespatialdy-namicrangeisthengivenbydmin[218;218]andthemassrangeismmin[1;64/dmin](Steinmetz1996).
WhilethecommunicationtimewithGRAPEscalespropor-tionaltotheparticlenumberN,theactualforcecomputationofGRAPEisstillanO(N2)-algorithm,becausetheGRAPEboardimplementsadirectsummationapproachtothegrav-itationalN-bodyproblem.
ThisimpliesthatforverylargeparticlenumberatreecoderunningontheworkstationalonewilleventuallycatchupandoutperformthecombinationofworkstationandGRAPE.
Forourcurrentset-upatMPAthisbreak-evenpointisaboutat300000particles.
However,itisalsopossibletocombineGRAPEwithatreealgorithm(Fukushigeetal.
1991;Makino1991a;Athanas-soulaetal.
1998;Kawaietal.
2000),forexamplebyexport-ingtreenodesinsteadofparticlesinanappropriateway.
Suchacombinationoftree+GRAPEscalesasO(NlogN)andisabletooutperformpuresoftwaresolutionsevenforlargeN.
4.
SmoothedparticlehydrodynamicsSPHisapowerfulLagrangiantechniquetosolvehydrody-namicalproblemswithaneasethatisunmatchedbygridbaseduidsolvers(seeMonaghan1992,foranexcellentre-view).
Inparticular,SPHisverywellsuitedforthree-dimensionalastrophysicalproblemsthatdonotcruciallyrelyonaccuratelyresolvedshockfronts.
Unlikeothernumericalapproachesforhydrodynamics,theSPHequationsdonottakeauniqueform.
Instead,manyfor-mallydifferentversionsofthemcanbederived.
Furthermore,alargevarietyofrecipesforspecicimplementationsofforcesymmetrization,determinationsofsmoothinglengths,andar-ticialviscosity,havebeendescribed.
SomeofthesechoicesarecrucialfortheaccuracyandefciencyoftheSPHimple-mentation,othersareonlyofminorimportance.
Seethere-centworkbyThackeretal.
(2000)andLombardietal.
(1999)foradiscussionoftherelativeperformanceofsomeofthesepossibilities.
BelowwegiveasummaryofthespecicSPHimplementationweuse.
4.
1.
BasicequationsThecomputationofthehydrodynamicforceandtherateofchangeofinternalenergyproceedsintwophases.
Intherstphase,newsmoothinglengthshiaredeterminedfortheac-tiveparticles(thesearetheonesthatneedaforceupdateatthecurrenttimestep,seebelow),andforeachofthem,theneigh-bouringparticlesinsidetheirrespectivesmoothingradiiarefound.
TheLagrangiannatureofSPHariseswhenthisnum-berofneighboursiskepteitherexactly,oratleastroughly,constant.
Thisisachievedbyvaryingthesmoothinglengthhiofeachparticleaccordingly.
Thehithusadjusttothelocalparticledensityadaptively,leadingtoaconstantmassresolutionindependentofthedensityoftheow.
Nelson&Papaloizou(1994)arguethatitisactuallybesttokeepthenumberofneighboursexactlyconstant,resultinginthelow-estlevelofnoiseinSPHestimatesofuidquantities,andin7V.
Springel,N.
YoshidaandS.
D.
M.
Whitethebestconservationofenergy.
Inpractice,similarlygoodresultsareobtainediftheuctuationsinneighbournumberremainverysmall.
IntheserialversionofGADGETwekeepthenumberofneighboursxed,whereasitisallowedtovaryinasmallbandintheparallelcode.
Havingfoundtheneighbours,wecomputethedensityoftheactiveparticlesasρi=Nj=1mjW(rij;hi),(20)whererij≡rirj,andwecomputeanewestimateofdivergenceandvorticityasρi(·v)i=jmj(vjvi)iW(rij;hi),(21)ρi(*v)i=jmj(vivj)*iW(rij;hi).
(22)Hereweemploythegatherformulationforadaptivesmooth-ing(Hernquist&Katz1989).
Forthepassiveparticles,valuesfordensity,internalen-ergy,andsmoothinglengtharepredictedatthecurrenttimebasedonthevaluesofthelastupdateofthoseparticles(seeSection5).
Finally,thepressureoftheparticlesissettoPi=(γ1)ρiui.
Inthesecondphase,theactualforcesarecomputed.
HerewesymmetrizethekernelsofgatherandscatterformulationsasinHernquist&Katz(1989).
Wecomputethegasdynami-calaccelerationsasagasi=Pρi+avisci=jmjPiρ2i+Pjρ2j+Πij12iW(rij;hi)+12iW(rij;hj),(23)andthechangeoftheinternalenergyasduidt=12jmjPiρ2i+Pjρ2j+Πij(vivj)12iW(rij;hi)+12iW(rij;hj).
(24)Insteadofsymmetrizingthepressuretermswithanarithmeticmean,thecodecanalsobeusedwithageometricmeanac-cordingtoPiρ2i+Pjρ2j→2PiPjρiρj.
(25)Thismaybeslightlymorerobustincertainsituations(Hern-quist&Katz1989).
ThearticialviscosityΠijistakentobeΠij=12(fi+fj)Πij,(26)withΠij=αcijij+2α2ij/ρijifvij·rijhiisslightlymoretricky.
Onesolutiontothisproblemistosimplyndallneighboursofiinsidehi,andtoconsidertheforcecomponentsfij=mimjPiρ2i+Pjρ2j+Πij12iW(rij;hi).
(30)Ifweaddfijtotheforceoni,andfijtotheforceonj,thesumofequation(23)isreproduced,andthemomentumcon-servationismanifest.
Thisalsoholdsfortheinternalenergy.
Unfortunately,thisonlyworksifallparticlesareactive.
Inanindividualtimestepscheme,wethereforeneedanefcientwaytondalltheneighboursofparticleiintheabovesense,andwediscussouralgorithmfordoingthisbelow.
4.
2.
NeighboursearchInSPH,abasictaskistondthenearestneighboursofeachSPHparticletoconstructitsinteractionlist.
Specically,intheimplementationwehavechosenweneedtondallpar-ticlescloserthanasearchradiushiinordertoestimatethedensity,andoneneedsallparticleswith|rij|101001000Rε=0.
0004100100010000101001000Rε=0.
004100100010000101001000Rε=0.
04Figure5:Radiienclosingaxedmass(7.
1*105,3.
6*104,1.
8*103,7.
1*103,3.
6*102,2.
1*101,and7.
1*101inunitsofthevirialmass)afterintegratingtheparticlepopulationofaNFWhaloinaxedpotentialforoneHubbletimewithvarioustimestepcriteria.
Solidlinesareforxedtimestepsforallparticles,dashedlinesfortimestepsbasedont∝|a|1,dottedlinesfort∝|a|1/2,anddot-dashedfortimestepsbasedonthelocaldynamicaltime,i.
e.
t∝ρ1/2.
ThethreepanelsareforsoftenedformsoftheNFWpotentialaccordingtoequation(66).
Notethattheuctuationsofthelowestradialshellarecausedbythelowsamplingofthehalointhecoreandarenotsignicant.
ThehorizontalaxisgivesthemeannumberofforceevaluationperparticlerequiredtoadvancetheparticlesetforoneHubbletime.
causethevelocitydispersiondeclinestowardsthecenterinthecoreregionofdarkmatterhaloes.
However,thisisnotenoughtocompletelysuppressnon-monotonicbehaviour,aswehavefoundinfurthersetsoftestcalculations.
Wethusthinkthatinparticularforthecoresoflargehaloesthelocaldynamicaltimeprovidesthebestofthetimestepcri-teriatested.
Itleadstohardlyanysecularevolutionofthedensityproledowntorathercoarsetimestepping.
Ontheotherhand,asQuinnetal.
(1997)havepointedout,estimat-ingthedensityinarealsimulationposesadditionalproblems.
Sincekernelestimatesarecarriedoutoveranumberofneigh-bours,thedensityinsmallhaloescanbeunderestimated,re-sultingin'evaporation'ofhaloes.
Thesehaloesarebettertreatedwithacriterionbasedonlocalacceleration,whichismuchmoreaccuratelyknowninthiscase.
Wethusthinkthataconservativeandaccuratewaytochoosetimestepsincosmologicalsimulationsisobtainedbytakingtheminimumof(37)and(39).
Thisrequiresacompu-tationofthelocalmatterdensityandlocalvelocitydispersionofcollisionlessmaterial.
Bothofthesequantitiescanbeob-tainedforthedarkmatterjustasitisdonefortheSPHparti-cles.
Ofcourse,thisrequiresadditionalworkinordertondneighboursandtodotheappropriatesummationsandkernelestimates,whichis,however,typicallynotmorethan30%ofthecostthegravitationalforcecomputation.
However,asFigure5shows,thisisingeneralworthinvesting,becausethealternativetimestepcriteriarequiremoretimesteps,andhencelargercomputationalcost,byalargerfactorinordertoachievethesameaccuracy.
20GADGET:Acodeforcollisionlessandgasdynamicalcosmologicalsimulations10-510-410-310-210-1100δa/a020406080100percentilez=25BHopeningcriterionθ=0.
40.
60.
81.
01.
210-510-410-310-210-1100δa/a020406080100percentilez=0BHopeningcriterionθ=0.
40.
60.
81.
01.
210-510-410-310-210-1100δa/a020406080100percentilez=25Newcriterionα=0.
0002α=0.
0210-510-410-310-210-1100δa/a020406080100percentilez=0Newcriterionα=0.
0002α=0.
0210-510-410-310-210-1100δa/a020406080100percentilez=25Optimalopeningδ=0.
0001δ=0.
0110-510-410-310-210-1100δa/a020406080100percentilez=0Optimalopeningδ=0.
0001δ=0.
01Figure6:Cumulativedistributionoftherelativeforceerrorobtainedasafunctionofthetoleranceparameterforthreeopeningcriteria,andfortwodifferentparticledistribution.
Thepanelsintheleftcolumnshowresultsfortheparticledistributionintheinitialconditionsofa323simulationatz=25,whilethepanelsontherightgivetheforceerrorsfortheevolvedandclustereddistributionatz=0.
21V.
Springel,N.
YoshidaandS.
D.
M.
White100100010000interactions/particle0.
00010.
00100.
01000.
10001.
0000δa/a(95%percentile)BHopeningcriterionNewopeningcriterionOptimalopeningz=25100100010000interactions/particle0.
00010.
00100.
01000.
10001.
0000δa/a(95%percentile)BHopeningcriterionNewopeningcriterionOptimalopeningz=0Figure7:Computationalefciencyofthreedifferentcell-openingcriteria,parameterizedbytheirtoleranceparameters.
Thehorizontalaxesisproportionaltothecomputationalcost,andtheverticalaxesgivesthe95%percentileofthecumulativedistributionofrelativeforceerrorsina323cosmologicalsimulation.
Theleftpanelshowsresultsfortheinitialconditionsatz=25,therightpanelfortheclustereddistributionatredshiftz=0.
7.
2.
ForceaccuracyandopeningcriterionInSection3.
3,wehaveoutlinedthatstandardvaluesoftheBH-openingcriterioncanresultinveryhighforceerrorsfortheinitialconditionsofcosmologicalN-bodysimulations.
Here,thedensityeldisveryclosetobeinghomogeneous,sothatsmallpeculiaraccelerationsarisefromacancellationofrelativelylargepartialforces.
Wenowinvestigatethisprob-lemfurtherusingacosmologicalsimulationwith323parti-clesinaperiodicbox.
Weconsidertheparticledistributionoftheinitialconditionsatredshiftz=25,andtheclusteredoneoftheevolvedsimulationatz=0.
RomeelDavehaskindlymadetheseparticledumpsavailabletous,togetherwithexactforceshecalculatedforallparticleswithadirectsummationtechnique,properlyincludingtheperiodicimagesofthepar-ticles.
Inthefollowingwewillcomparehisforces(whichwetaketobetheexactresult)totheonescomputedbytheparallelversionofGADGETusingtwoprocessorsonaLinuxPC.
Werstexaminethedistributionoftherelativeforceer-rorasafunctionofthetoleranceparameterusedinthetwodifferentcell-openingcriteria(8)and(18).
TheseresultsareshowninFigure6,wherewealsocontrastthesedistributionswiththeonesobtainedwithanoptimumcell-openingstrat-egy.
Thelattermaybeoperationallydenedasfollows:Anycompletetreewalkresultsinaninteractionlistthatcontainsanumberofinternaltreenodes(whosemultipoleexpansionsareused)andanumberofsingleparticles(whichgiverisetoexactpartialforces).
Obviously,theshortestpos-sibleinteractionlististhatofjustoneentry,therootnodeofthetreeitself.
Supposewestartwiththisinteractionlist,andthenopenalwaystheonenodeofthecurrentinteractionlistthatgivesrisetothelargestabsoluteforceerror.
Thisisthenodewiththelargestdifferencebetweenmultipoleexpansionandexactforce,asresultingfromdirectsummationoveralltheparticlesrepresentedbythenode.
Withsuchanopeningstrategy,theshortestpossibleinteractionlistcanbefoundforanydesiredlevelofaccuracy.
Thelattermaybesetbyrequir-ingthatthelargestforceerrorofanynodeintheinteractionlisthasfallenbelowafractionδ|a|ofthetrueforceeld.
Ofcourse,thisoptimumstrategyisnotpracticalinarealsimulation,afterallitrequirestheknowledgeofalltheex-actforcesforallinternalnodesofthetree.
Ifthesewereknown,wewouldn'thavetobotherwithtreestobeginwith.
However,itisveryinterestingtondoutwhatsuchadu-cialoptimumopeningstrategywouldultimatelydeliver.
Anyotheranalyticorheuristiccell-openingcriterionisboundtobeworse.
Knowinghowmuchworsesuchcriteriaarewillthusinformusaboutthemaximumperformanceimprove-mentpossiblebyadoptingalternativecell-openingcriteria.
Wethereforecomputedforeachparticletheexactdirectsum-mationforcesexertedbyeachoftheinternalnodes,henceal-lowingustoperformanoptimumtreewalkintheaboveway.
TheresultingcumulativeerrordistributionsasafunctionofδareshowninFigure6.
LookingatthisFigure,itisclearthattheBHerrordistri-butionhasamorepronouncedtailoflargeforceerrorscom-paredtotheopeningcriterion(18).
WhatismoststrikinghoweveristhatathighredshifttheBHforceerrorscanbe-comeverylargeforvaluesoftheopeningangleθthattendto22GADGET:Acodeforcollisionlessandgasdynamicalcosmologicalsimulationsgiveperfectlyacceptableaccuracyforaclusteredmassdis-tribution.
Thisisclearlycausedbyitspurelygeometricalnature,whichdoesnotknowaboutthesmallnessofthenetforces,andthusdoesnotinvestsufcientefforttoevaluatepartialforcestohighenoughaccuracy.
Theerrordistributionsalonedonottellyetaboutthecom-putationalefciencyofthecell-openingstrategies.
Toassessthese,wedenetheerrorforagivencell-openingcriterionasthe95%percentileoftheerrordistribution,andweplotthisversusthemeanlengthoftheinteractionlistperparticle.
Thislengthisessentiallylinearlyrelatedtothecomputationalcostoftheforceevaluation.
InFigure7,wecomparetheperformanceofthecell-openingcriteriainthisway.
Athighredshift,weseethatthelargeerrorsoftheBHcriterionaremainlycausedbecausethemeanlengthoftheinteractionlistremainssmallanddoesnotadapttothemoredemandingrequirementsoftheforcecom-putationinthisregime.
Our'new'cell-openingcriteriondoesmuchbetterinthisrespect;itserrorsremainwellcontrolledwithouthavingtoadjustthetoleranceparameter.
AnotheradvantageofthenewcriterionisthatitisinfactmoreefcientthantheBHcriterion,i.
e.
itachieveshigherforceaccuracyforagivencomputationalexpense.
AsFigure7shows,foraclusteredparticledistributioninacosmolog-icalsimulation,theimpliedsavingcaneasilyreachafactor2-3,speedingupthesimulationbythesamefactor.
Similarperformanceimprovementsareobtainedforisolatedgalax-ies,astheexampleinFigure8demonstrates.
Thiscanbeun-derstoodasfollows:ThegeometricalBHcriteriondoesnottakethedynamicalsignicanceofthemassdistributionintoaccount.
Forexample,itwillinvestalargenumberofcell-particleinteractionstocomputetheforceexertedbyalargevoidtoahighrelativeaccuracy,whileactuallythisforceisofsmallabsolutesize,anditwouldbebettertoconcentratemoreonthoseregionsthatprovidemostoftheforceonthecurrentparticle.
Thenewopeningcriterionfollowsthislatterstrategy,improvingtheforceaccuracyatagivennumberofparticle-cellinteractions.
Wenotethattheimprovementobtainedbycriterion(18)bringsusabouthalfwaytowhatmightultimatelybepossiblewithanoptimumcriterion.
Itthusappearsthatthereisstillroomforfurtherimprovementofthecell-openingcriterion,eventhoughitisclearthattheoptimumwilllikelynotbereachableinpractice.
7.
3.
CollidingdiskgalaxiesAsatestoftheperformanceandaccuracyoftheintegrationofcollisionlesssystems,wehereconsiderapairofmergingdiskgalaxies.
Eachgalaxyhasamassivedarkhaloconsistingof30000particles,andanembeddedstellardisk,representedby20000particles.
ThedarkhaloismodeledaccordingtotheNFW-prole,adiabaticallymodiedbythecentralexponen-tialdisk,whichcontributes5%ofthetotalmass.
Thehalo050010001500interactions/particle0.
00010.
00100.
01000.
1000δa/aθ=1.
2θ=1.
0θ=0.
8θ=0.
6α=0.
0025α=0.
0100α=0.
0400Figure8:Forceerrorsforanisolatedhalo/diskgalaxywiththeBH-criterion(boxes),andthenewopeningcriterion(triangles).
ThedarkhaloofthegalaxyismodeledwithaNFWproleandistruncatedatthevirialradius.
Theplotshowsthe90%percentileoftheerrordis-tribution,i.
e.
90%oftheparticleshaveforceerrorsbelowthecitedvalues.
Thehorizontalaxesmeasuresthecomputationalexpense.
hasacircularvelocityv200=160kms1,aconcentrationc=5,andspinparameterλ=0.
05.
Theradialexponen-tialscalelengthofthediskisRd=4.
5h1kpc,whiletheverticalstructureisthatofanisothermalsheetwiththicknessz0=0.
2Rd.
Thegravitationalsofteningofthehaloparticlesissetto0.
4h1kpc,andthatofthediskto0.
1h1kpc.
Initially,thetwogalaxiesareset-uponaparabolicorbit,withseparationsuchthattheirdarkhaloesjusttouch.
Bothofthegalaxieshaveaprogradeorientation,butareinclinedwithrespecttotheorbitalplane.
Infact,thetestconsideredherecorrespondsexactlytothesimulation'C1'computedbySpringel(2000),wheremoreinformationabouttheconstruc-tionoftheinitialconditionscanbefound(seealsoSpringel&White1999).
Werstconsiderarunofthismodelwithasetofparam-etersequaltothecoarsestvalueswewouldtypicallyemployforasimulationofthiskind.
Forthetimeintegration,weusedtheparameterαtolσ=25kms1,andfortheforcecomputationwiththetreealgorithm,thenewopeningcri-terionwithα=0.
04.
Thesimulationwasthenrunfromt=0tot=2.
8ininternalunitsoftime(correspondingto2.
85h1Gyr).
Duringthistimethegalaxieshavetheirrstcloseencounterataroundt1.
0,wheretidaltailsareejectedoutofthestellardisks.
Duetothebrakingbydynam-icalfriction,thegalaxieseventuallyfalltogetherforasecondtime,afterwhichtheyquicklymergeandviolentlyrelaxtoformasinglemergerremnant.
Att=2.
8,theinnerpartsofthemergerremnantarewellrelaxed.
Thissimulationrequiredatotalnumberof4684stepsand27795733forcecomputations,i.
e.
acomputationallyequallyexpensivecomputationwithxedtimestepscouldjustmake280fulltimesteps.
Therelativeerrorinthetotalenergywas23V.
Springel,N.
YoshidaandS.
D.
M.
White110r[h-1kpc]105106107108109ρ(r)[h2MOkpc-3]Figure9:Sphericallyaverageddensityproleofthestellarcompo-nentinthemergerremnantoftwocollidingdiskgalaxies.
Trianglesshowtheresultsobtainedusingourvariabletimestepintegration,whileboxesanddiamondsareforxedtimestepintegrationswitht=0.
01(10h1Myr)andt=0.
0025(2.
5h1Myr),respec-tively.
Notethatthesimulationusingtheadaptivetimestepisaboutasexpensiveastheonewitht=0.
01.
Ineachcase,thecenteroftheremnantwasdenedasthepositionoftheparticlewiththeminimumgravitationalpotential.
3.
0*103,andaSunUltrasparc-IIworkstation(sparcv9pro-cessor,296MHzclockspeed)didthesimulationin4hours(usingtheserialcode).
Therawforcespeedwas2800forcecomputationspersecond,olderworkstationswillachievesomewhatsmallervalues,ofcourse.
Also,higherforceac-curacysettingswillslowdownthecodesomewhat.
Wenowconsiderasimulationofthesamesystemusingaxedtimestep.
Fort=0.
01,therunneeds280fullsteps,i.
e.
itconsumesthesameamountofCPUtimeasabove.
However,inthissimulation,theerrorinthetotalenergyis2.
2%,substantiallylargerthanbefore.
Therearealsodif-ferencesinthedensityproleofthemergerremnants.
InFigure9,wecomparetheinnerdensityproleofthesimula-tionwithadaptivetimesteps(triangles)totheonewithaxedtimestepoft=0.
01(boxes).
Wehereshowthesphericallyaveragedproleofthestellarcomponent,withthecenteroftheremnantsdenedasthepositionoftheparticlewiththeminimumgravitationalpotential.
Itcanbeseenthatintheinnermost1h1kpc,thedensityobtainedwiththexedtimestepfallsshortoftheadaptivetimestepintegration.
Togetanideahowsmallthexedtimestephastobetoachievesimilaraccuracyaswiththeadaptivetimestep,wehavesimulatedthistestasecondtime,withaxedtimestepsoft=0.
0025.
Wealsoshowthecorrespondingprole(diamonds)inFigure9.
Itcanbeseenthatforsmallert,theagreementwiththevariabletimestepresultimproves.
At2*0.
4h1kpc,thesofteningofthedarkmatterstartstobecomeimportant.
Foragreementdowntothisscale,thexedtimestepneedstobeslightlysmallerthant=0.
0025,sotheoverallsavingduetotheuseofindividualparticletimestepsisatleastafactorof45inthisexample.
7.
4.
CollapseofacoldgassphereTheself-gravitatingcollapseofaninitiallyisothermal,coolgasspherehasbeenacommontestproblemofSPHcodes(Evrard1988;Hernquist&Katz1989;Steinmetz&M¨uller1993;Thackeretal.
2000;Carraroetal.
1998,andothers).
Followingtheseauthors,weconsiderasphericallysymmetricgascloudoftotalmassM,radiusR,andinitialdensityproleρ(r)=M2πR21r.
(67)Wetakethegastobeofconstanttemperatureinitially,withaninternalenergyperunitmassofu=0.
05GMR.
(68)Atthestartofthesimulation,thegasparticlesareatrest.
Weobtaintheirinitialcoordinatesfromadistortedregulargridthatreproducesthedensityprole(67),andweuseasystemofunitswithG=M=R=1.
InFigure10,weshowtheevolutionofthepotential,thethermal,andthekineticenergyofthesystemfromthestartofthesimulationatt=0tot=3.
Weplotresultsfortwosimulations,onewith30976particles(solid),andonewith4224particles(dashed).
Duringthecentralbouncebetweent≈0.
8andt≈1.
2mostofthekineticenergyisconvertedintoheat,andastrongshockwavetravelsoutward.
Later,thesystemslowlysettlestovirialequilibrium.
FortheserunsNswassetto40,thegravitationalsoften-ingto=0.
02,andtimeintegrationwascontrolledbytheparametersαtolσ=0.
05andαcour=0.
12,resultinginverygoodenergyconservation.
Theabsoluteerrorinthetotalen-ergyislessthan1.
1*103atalltimesduringthesimula-tion,translatingtoarelativeerrorof0.
23%.
Sinceweuseatimeintegrationschemewithindividualtimestepsofarbi-trarysize,thissmallerrorisreassuring.
Thetotalnumberofsmallstepstakenbythe4224particlesimulationwas3855,withatotalof2192421forcecomputations,i.
e.
theequiva-lentnumberof'full'timestepswas519.
ASunUltrasparc-IIworkstationneeded2300secondsforthesimulation.
Thelarger30976particleruntook10668steps,withanequiva-lentof1086'full'steps,and12hoursofCPUtime.
Notethatbyreducingthetimeintegrationaccuracybyafactorof2,withacorrespondingreductionoftheCPUtimeconsump-tionbythesamefactor,theresultsdopracticallynotchange,however,themaximumerrorintheenergygoesupto1.
2%inthiscase.
2Notethatourdenitionofthesmoothinglengthhdiffersbyafactorof2frommostpreviousSPHimplementations.
Asaconsequence,corre-spondingvaluesofαcouraredifferentbyafactorof2,too.
24GADGET:Acodeforcollisionlessandgasdynamicalcosmologicalsimulations0123time-2-101energythermaltotalkineticpotentialFigure10:Timeevolutionofthethermal,kinetic,potential,andtotalenergyforthecollapseofaninitiallyisothermalgassphere.
Solidlinesshowresultsforasimulationswith30976particles,dashedlinesarefora4224particlerun.
TheresultsofFigure10agreeverywellwiththoseofSteinmetz&M¨uller(1993),andwithThackeretal.
(2000)ifwecomparetotheir'best'implementationofarticialvis-cosity(theirversion12).
Steinmetz&M¨uller(1993)havealsocomputedasolutionofthisproblemwithaveryaccurate,one-dimensional,sphericallysymmetricpiecewiseparabolicmethod(PPM).
Forparticlenumbersabove10000,ourSPHresultsbecomeveryclosetothenitedifferenceresult.
How-ever,evenforverysmallparticlenumbers,SPHiscapableofreproducingthegeneralfeaturesofthesolutionverywell.
Wealsoexpectthatathree-dimensionalPPMcalculationofthecollapsewouldlikelyrequireatleastthesameamountofCPUtimeasourSPHcalculation.
7.
5.
PerformanceandscalabilityoftheparallelgravityWehereshowasimpletestoftheperformanceoftheparallelversionofthecodeunderconditionsrelevantforrealtargetapplications.
Forthistest,wehaveuseda'stripped-down'versionoftheinitialconditionsoriginallyconstructedforahigh-resolutionsimulationofaclusterofgalaxies.
Theorig-inalsetofinitialconditionswasset-uptofollowthecosmo-logicalevolutionofalargesphericalregionwithcomovingradius70h1Mpc,withinaΛCDMcosmogonycorrespond-ingto0=0.
3,Λ=0.
7,zstart=50,andh=0.
7.
Inthecenterofthesimulationsphere,2millionhigh-resolutionparticleswereplacedinthesomewhatenlargedLagrangianregionofthecluster.
Therestofthevolumewaslledwithanextendedshellofboundaryparticlesoflargermassandlargersoftening;theyareneededforaproperrepresentationofthegravitationaltidaleld.
Tokeepourtestsimple,wehavecutacentredsphereofco-movingradius12.
25h1Mpcfromtheseinitialconditions,andweonlysimulatedthe500000high-resolutionparticleswithmassmp=1.
36*109h1M⊙foundwithinthisregion.
Suchasimulationwillnotbeusefulfordirectscienticanal-ysisbecauseitdoesnotmodelthetidaleldproperly.
How-25V.
Springel,N.
YoshidaandS.
D.
M.
WhiteTable1:ConsumptionofCPU-timeinvariouspartsofthecodeforacosmologicalrunfromz=50toz=4.
3.
Thetablegivestimingsforrunswith4,8and16processorsontheGarchingCrayT3E.
Thecomputationofthegravitationalforcesisbyfarthedominantcomputationaltask.
Wehavefurthersplitupthattimeintotheactualtree-walktime,thetree-constructiontime,thetimeforcommunicationandsummationofforcecontributions,andintothetimelostbywork-loadimbalance.
Thepotentialcomputationisdoneonlyonceinthistest(itcanoptionallybedoneinregularintervalstocheckenergyconservationofthecode).
'Miscellaneous'referstotimespentinadvancingandpredictingparticles,andinmanagingthebinarytreeforthetimeline.
I/Otimeforwritingasnapshotle(groupsofprocessorscanwriteinparallel)isonly1-2seconds,andthereforenotlistedhere.
4processors8processors16processorscumulativetime[sec]relativetimecumulativetime[sec]relativetimecumulativetime[sec]relativetimetotal8269.
0100.
0%4074.
0100.
0%2887.
5100.
0%gravity7574.
691.
6%3643.
089.
4%2530.
387.
6%treewalks5086.
361.
5%2258.
455.
4%1322.
445.
8%treeconstruction1518.
418.
4%773.
719.
0%588.
420.
4%communication&summation24.
40.
3%35.
40.
9%54.
11.
9%work-loadimbalance901.
510.
9%535.
113.
1%537.
418.
6%domaindecomposition209.
92.
5%158.
13.
9%172.
26.
0%potentialcomputation(optional)46.
30.
2%18.
10.
4%11.
40.
4%miscellaneous438.
35.
3%254.
86.
3%173.
66.
0%05101520Np05.
01031.
01041.
51042.
01042.
5104rawgravityspeed[part/sec]05101520Np02.
01034.
01036.
01038.
01031.
01041.
2104overallcodespeed[part/sec]Figure11:Codeperformanceandscalabilityforacosmologicalintegrationwithvacuumboundaries.
Theleftpanelshowsthespeedofthegravitationalforcecomputationasafunctionofprocessornumber(inparticlespersecond).
Thisisbasedonthetreewalktimealone.
Inthesimulation,additionaltimeisneededfortreeconstruction,work-loadimbalance,communication,domaindecomposition,predictionofparticles,timeline,etc.
Thisreducesthe'effective'speedofthecode,asshownintherightpanel.
Thiseffectivespeedgivesthenumberofparticlesadvancedbyonetimesteppersecond.
Notethattheonlysignicantsourceofwork-loadimbalanceinthecodeoccursinthegravitycomputation,wheresomesmallfractionoftimeislostwhenprocessorsidlywaitforotherstonishtheirtree-walks.
ever,thistestwillshowrealisticclusteringandtime-steppingbehaviour,andthusallowsareasonableassessmentoftheex-pectedcomputationalcostandscalingofthefullproblem.
WehaverunthetestproblemwithGADGETontheGarch-ingT3Efromredshiftz=50toredshiftz=4.
3.
Werepeatedtheidenticalrunonpartitionsofsize4,8,and16processors.
Inthistest,weincludedquadrupolemomentsinthetreecomputation,weusedaBHopeningcriterionwithθ=1.
0,andagravitationalsofteninglengthof15h1kpc.
InTable1welistindetailtheelapsedwall-clocktimeforvariouspartsofthecodeforthethreesimulations.
Thedom-inantsinkofCPUtimeisthecomputationofgravitationalforcesfortheparticles.
Toadvancethetestsimulationfromz=50toz=4.
3,GADGETneeded30.
0*106forcecom-putationsinatotalof3350timesteps.
Notethatonaverageonly1.
8%oftheparticlesareadvancedinasingletimestep.
Undertheseconditionsitischallengingtoeliminatesourcesofoverheadincurredbythetime-steppingandtomaintainwork-loadbalancing.
GADGETsolvesthistasksatisfactorily.
Ifweusedaxedtimestep,thework-loadbalancingwouldbe26GADGET:Acodeforcollisionlessandgasdynamicalcosmologicalsimulationsessentiallyperfect.
Notethattheuseofouradaptivetimestepintegratorresultsinasavingofaboutafactorof3-5comparedtoaxedtimestepschemewiththesameaccuracy.
WethinkthattheoverallperformanceofGADGETisgoodinthistest.
Therawgravitationalspeedishigh,andthealgo-rithmusedtoparallelizetheforcecomputationscaleswell,asisseenintheleftpanelofFigure11.
Notethattheforce-speedoftheNp=8runisevenhigherthanthatoftheNp=4run.
Thisisbecausethedomaindecompositiondoesexactlyonesplitinthex-,y-,andz-directionsintheNp=8case.
Thedomainsarethenclosetocubes,whichreducesthedepthofthetreeandspeedsupthetree-walks.
Also,theforcecommunicationdoesnotinvolveasigni-cantcommunicationoverhead,andthetimespentinmiscella-neoustasksofthesimulationcodescalescloselywithproces-sornumber.
MostlossesinGADGEToccurduetowork-loadimbalanceintheforcecomputation.
Whilewethinktheselossesareacceptableintheabovetest,oneshouldkeepinmindthatweherekepttheproblemsizexed,andjustin-creasedtheprocessornumber.
Ifwealsoscaleuptheprob-lemsize,work-loadbalancingwillbesignicantlyeasiertoachieve,andtheefciencyofGADGETwillbenearlyashighasforsmallprocessornumber.
Nevertheless,thecomputationalspeedofGADGETmayseemdisappointingwhencomparedwiththetheoreticalpeakperformanceofmodernmicroprocessors.
Forexample,theprocessorsoftheT3Eusedforthetimingshaveanominalpeakperformanceof600MFlops,butGADGETfallsfarshortofreachingthis.
However,thepeakperformancecanonlybereachedunderthemostfavourableofcircumstances,andtyp-icalcodesoperateinarangewheretheyareafactorof5-10slower.
GADGETisnoexceptionhere.
Whilewethinkthatthecodedoesareasonablejobinavoidingunnecessaryoat-ingpointoperations,thereiscertainlyroomforfurthertuningmeasures,includingprocessor-speciconeswhichhaven'tbeentriedatallsofar.
Alsonotethatouralgorithmofindivid-ualtreewalksproducesessentiallyrandomaccessofmemorylocations,asituationthatcouldhardlybeworseforthecachepipelineofcurrentmicroprocessors.
7.
6.
ParallelSPHinaperiodicvolumeAsafurthertestofthescalingandperformanceoftheparal-lelversionofGADGETintypicalcosmologicalapplicationsweconsiderasimulationofstructureformationinaperiodicboxofsize(50h1Mpc)3,includingadiabaticgasphysics.
Weuse323darkmatterparticles,and323SPHparticlesinaΛCDMcosmologywith0=0.
3,Λ=0.
7andh=0.
7,normalizedtoσ8=0.
9.
Forsimplicity,initialconditionsareconstructedbydisplacingtheparticlesfromagrid,withtheSPHparticlesplacedontopofthedarkmatterparticles.
Wehaveevolvedtheseinitialconditionsfromz=10toz=1,using2,4,8,16,and32processorsontheCrayT3EinGarching.
Thenalparticledistributionsofalltheseruns010203040Np02000400060008000speed[part/sec]gravityspeed010203040Np05.
01031.
01041.
51042.
01042.
5104speed[part/sec]SPHspeed010203040Np0100020003000400050006000speed[part/sec]overallcodespeedFigure12:SpeedofthecodeinagasdynamicalsimulationofaperiodicΛCDMuniverse,runfromz=10toz=1,asafunc-tionofthenumberofT3Eprocessorsemployed.
Thetoppanelshowsthespeedofthegravitationalforcecomputation(includingtreeconstructiontimes).
Wedenethespeedhereasthenumberofforcecomputationsperelapsedwall-clocktime.
Themiddlepanelgivesthespeedinthecomputationofhydrodynamicalforces,whilethebottompanelshowstheresultingoverallcodespeedintermsofparticlesadvancedbyonetimesteppersecond.
Thiseffectivecodespeedincludesallothercodeoverhead,whichislessthan5%ofthetotalcputimeinallruns.
areinexcellentagreementwitheachother.
InthebottompanelofFigure12,weshowthecodespeed27V.
Springel,N.
YoshidaandS.
D.
M.
Whiteasafunctionofthenumberofprocessorsemployed.
Weheredenethespeedasthetotalnumberofforcecomputationsdividedbythetotalelapsedwall-clocktimeduringthistest.
Notethatthescalingofthecodeisalmostperfectlylinearinthisexample,evenbetterthanfortheset-upusedinthepre-vioussection.
Infact,thisislargelycausedbythesimplergeometryoftheperiodicboxascomparedtothesphericalvolumeusedearlier.
Theveryabsenceofsuchboundaryef-fectsmakestheperiodicboxeasiertodomain-decompose,andtowork-balance.
ThetopandmiddlepanelsofFigure12showthespeedofthegravitationalforcecomputationandthatoftheSPHpartseparately.
WhatwendencouragingisthattheSPHalgo-rithmscalesreallyverywell,whichispromisingforfuturelarge-scaleapplications.
Itisinterestingthatinthistest(whereNs=40SPHneigh-bourshavebeenused)thehydrodynamicalpartactuallycon-sumesonly25%oftheoverallCPUtime.
PartlythisisduetotheslowergravitationalspeedinthistestcomparedtotheresultsshowninFigure11,whichinturniscausedbytheEwaldsummationneededtotreattheperiodicboundarycon-ditions,andbylongerinteractionlistsinthepresenttest(wehereusedournewopeningcriterion).
AlsonotethatonlyhalfoftheparticlesinthissimulationareSPHparticles.
Weremarkthatthegravitationalforcecomputationwillusuallybemoreexpensiveathigherredshiftthanatlowerredshift,whiletheSPHpartdoesnothavesuchadependence.
ThefractionoftimeconsumedbytheSPHpartthustendstoincreasewhenthematerialbecomesmoreclustered.
Indissi-pativesimulationsonewilltypicallyformclumpsofcoldgaswithveryhighdensity-objectsthatwethinkwillformstarsandturnintogalaxies.
Suchcoldknotsofgascanslowdownthecomputationsubstantially,becausetheyrequiresmallhy-drodynamicaltimesteps,andifalowerspatialresolutioncut-offforSPHisimposed,thehydrodynamicalsmoothingmaystarttoinvolvemoreneighboursthanNs.
8.
DiscussionWehavepresentedthenumericalalgorithmsofourcodeGADGET,designedasaexibletooltostudyawiderangeofproblemsincosmology.
TypicalapplicationsofGADGETcanincludeinteractingandcollidinggalaxies,starformationandfeedbackintheinterstellarmedium,formationofclus-tersofgalaxies,ortheformationoflarge-scalestructureintheuniverse.
Infact,GADGEThasalreadybeenusedsuccessfullyinalloftheseareas.
Usingourcode,Springel&White(1999)havestudiedtheformationoftidaltailsincollidinggalaxies,andSpringel(2000)hasmodeledstarformationandfeedbackinisolatedandcollidinggas-richspirals.
Forthesesimulations,theserialversionofthecodewasemployed,bothwithandwithoutsupportbytheGRAPEspecial-purposehardware.
TheparallelversionofGADGEThasbeenusedtocomputehigh-resolutionN-bodysimulationsofclustersofgalaxies(Springel1999;Springeletal.
2000;Yoshidaetal.
2000a,b).
Inthelargestsimulationofthiskind,69millionparticleshavebeenemployed,with20millionofthemendingupinthevirializedregionofasingleobject.
Theparticlemassinthehigh-resolutionzonewasjust1010ofthetotalsimulatedmass,andthegravitationalsofteninglengthwas0.
7h1kpcinasimulationvolumeofdiameter140h1Mpc,translatingtoanimpressivespatialdynamicrangeof2*105inthreedimensions.
WehavealsosuccessfullyemployedGADGETfortwo'constrained-realization'(CR)simulationsoftheLocalUni-verse(Mathisetal.
2000,inpreparation).
Inthesesimu-lations,theobserveddensityeldasseenbyIRASgalaxieshasbeenusedtoconstrainthephasesofthewavesoftheini-tialuctuationspectrum.
ForeachofthetwoCRsimulations,weemployed75millionparticlesintotal,with53millionhigh-resolutionparticlesofmass3.
6*109h1M⊙(ΛCDM)or1.
2*1010h1M⊙(τCDM)inthelow-densityandcritical-densitymodels,respectively.
ThemaintechnicalfeaturesofGADGETareasfollows.
GravitationalforcesarecomputedwithaBarnes&Hutoct-tree,usingmultipoleexpansionsuptoquadrupoleorder.
Pe-riodicboundaryconditionscanoptionallybeusedandareim-plementedbymeansofEwaldsummation.
Thecell-openingcriterionmaybechoseneitherasthestandardBH-criterion,oranewcriterionwhichwehaveshowntobecomputation-allymoreefcientandbettersuitedtocosmologicalsimula-tionsstartingathighredshift.
Asanalternativetothetree-algorithm,theserialcodecanusethespecial-purposehard-wareGRAPEbothtocomputegravitationalforcesandforthesearchforSPHneighbours.
InourSPHimplementation,thenumberofsmoothingneighborsiskeptexactlyconstantintheserialcode,andisallowedtouctuateinasmallbandintheparallelcode.
Forcesymmetryisachievedbyusingthekernelaveragingtechnique,andasuitableneighboursearchingalgorithmisusedtoguaranteethatallinteractingpairsofSPHparticlesarealwaysfound.
Weuseashear-reducedarticialviscos-itythathasemergedasagoodparameterizationinrecentsystematicstudiesthatcomparedseveralalternativeformu-lations(Thackeretal.
2000;Lombardietal.
1999).
Parallelizationofthecodeformassivelyparallelsuper-computersisachievedinanexplicitmessagepassingap-proach,usingtheMPIstandardcommunicationlibrary.
Thesimulationvolumeisspatiallysplitusingarecursiveor-thogonalbisection,andeachoftheresultingdomainsismappedontooneprocessor.
Dynamicwork-loadbalancingisachievedbymeasuringthecomputationalexpenseincurredbyeachparticle,andbalancingthesumoftheseweightsinthedomaindecomposition.
Thecodeallowsfullyadaptive,individualparticletimesteps,bothforcollisionlessparticlesandforSPHpar-28GADGET:Acodeforcollisionlessandgasdynamicalcosmologicalsimulationsticles.
Thespeed-upobtainedbytheuseofindividualtimestepsdependsonthedynamicrangeofthetimescalespresentintheproblem,andontherelativepopulationofthesetimescaleswithparticles.
Foracollisionlesscosmologicalsimulationwithagravitationalsofteninglengthlargerthan30h1kpctheoverallsavingistypicallyafactorof35.
However,ifsmallersofteninglengthsaredesired,theuseofindividualparticletimestepsresultsinlargersavings.
Inthehydrodynamicalpart,thesavingscanbestilllarger,espe-ciallyifdissipativephysicsisincluded.
Inthiscase,adap-tivetimestepsmayberequiredtomakeasimulationfeasibletobeginwith.
GADGETcanbeusedtorunsimulationsbothinphysicalandincomovingcoordinates.
Thelatterisusedforcosmologicalsimulationsonly.
Here,thecodeemploysanintegrationschemethatcandealwitharbitrarycosmolog-icalbackgroundmodels,andwhichisexactinlineartheory,i.
e.
thelinearregimecanbetraversedwithmaximumef-ciency.
GADGETisanintrinsicallyLagrangiancode.
Boththegravityandthehydrodynamicalpartsimposenorestrictiononthegeometryoftheproblem,noranyhardlimitontheallowabledynamicrange.
Currentandfuturesimulationsofstructureformationthataimtoresolvegalaxiesintheircor-rectcosmologicalsettingwillhavetoresolvelengthscalesofsize0.
11h1kpcinvolumesofsize100h1Mpc.
Thisrangeofscalesisaccompaniedbyasimilarlylargedynamicrangeinmassandtimescales.
Ournewcodeisessentiallyfreetoadapttothesescalesnaturally,anditinvestscomputa-tionalworkonlywhereitisneeded.
Itisthereforeatoolthatshouldbewellsuitedtoworkontheseproblems.
SinceGADGETiswritteninstandardANSI-C,andthepar-allelizationformassivelyparallelsupercomputersisachievedwiththestandardMPIlibrary,thecoderunsonalargevari-etyofplatforms,withoutrequiringanychange.
Havingelim-inatedthedependenceonproprietarycompilersoftwareandoperatingsystemswehopethatthecodewillremainusablefortheforeseeablefuture.
Wereleasetheparallelandthese-rialversionofGADGETpublicallyinthehopethattheywillbeusefulforothersasascientictoolandasabasisforfur-thernumericaldevelopments.
AcknowledgementsWearegratefultoBarbaraLanzoni,BepiTormen,andSi-moneMarrifortheirpatienceinworkingwithearlierver-sionsofGADGET.
WethankLarsHernquist,MartinWhite,CharlesColdwell,JasjeetBaglaandMatthiasSteinmetzformanyhelpfuldiscussionsonvariousalgorithmicandnumer-icalissues.
WealsothankRomeelDaveformakingsomeofhistestparticlecongurationsavailabletous.
Wearein-debtedtotheRechenzentrumoftheMax-Planck-SocietyinGarchingforprovidingexcellentsupportfortheirT3E,onwhichalargepartofthecomputationsofthisworkhavebeen0.
00.
51.
01.
5u-3.
0-2.
5-2.
0-1.
5-1.
0-0.
5ΦFigure13:Comparisonofspline-softened(solid)andPlummer-softened(dotted)potentialofapointmasswiththeNewtonianpo-tential(dashed).
Hereh=1.
0,and=h/2.
8.
carriedout.
Wewanttothankthereferee,JunichiroMakino,forveryinsightfulcommentsonthemanuscript.
Appendix:SoftenedtreenodesThesmoothingkernelweuseforSPHcalculationsisasplineoftheform(Monaghan&Lattanzio1985)W(r;h)=8πh316rh2+6rh3,0≤rh≤12,21rh3,121.
(69)Notethatwedenethesmoothingkernelontheinterval[0,h]andnoton[0,2h]asitisfrequentlydoneinotherSPHcalcu-lations.
Wederivethespline-softenedgravitationalforcefromthiskernelbytakingtheforcefromapointmassmtobetheoneresultingfromadensitydistributionρ(r)=mW(r;h).
ThisleadstoapotentialΦ(r)=GmhW2rh(70)withakernelW2(u)=163u2485u4+325u5145,0≤u1,(76)g2(y)=1h5384596u,u≤12,384515u5+48u+32u,121,(77)g3(y)=1h796u,u≤12,32u+1u748u3,121,(78)g4(y)=1h5965(5u4),u≤12,48u15u53845+32u,121.
(79)InFigures13and14,weshowthespline-softenedandPlummer-softenedforceandpotentialofapointmass.
Foragivensplinesofteninglengthh,wedenethe'equivalent'Plummersofteninglengthas=h/2.
8.
Forthischoice,theminimumofthepotentialatu=0hasthesamedepth.
ReferencesAarsethS.
J.
,1963,MNRAS,126,223AarsethS.
J.
,TurnerE.
L.
,GottJ.
R.
,1979,ApJ,228,664AppelA.
W.
,1981,UndergraduateThesis,PrincetonUniver-sityAppelA.
W.
,1985,SIAM,J.
Sci.
Stat.
Comp,6,85AthanassoulaE.
,BosmaA.
,LambertJ.
C.
,MakinoJ.
,1998,MNRAS,293,369AthanassoulaE.
,FadyE.
,LambertJ.
C.
,BosmaA.
,2000,MNRAS,314,475BaglaJ.
S.
,1999,preprint,astro-ph/9911025BalsaraD.
W.
,1995,J.
Comp.
Phys.
,121,357BarnesJ.
,1986,inTheUseofSupercomputersinStel-larDynamics,editedbyP.
Hut,S.
L.
J.
McMillan,175,Springer,NewYorkBarnesJ.
,HutP.
,1986,Nature,324,446BarnesJ.
E.
,1990,J.
Comp.
Phys.
,87,161BarnesJ.
E.
,HutP.
,1989,ApJS,70,389BertschingerE.
,GelbJ.
M.
,1991,ComputersinPhysics,5,164BodeP.
,OstrikerJ.
P.
,XuG.
,2000,ApJS,128,561BrieuP.
P.
,SummersF.
J.
,OstrikerJ.
P.
,1995,ApJ,453,566CarraroG.
,LiaC.
,ChiosiC.
,1998,MNRAS,297,1021CouchmanH.
M.
P.
,1991,ApJ,368,23CouchmanH.
M.
P.
,ThomasP.
,PearceF.
,1995,ApJ,452,797DaveR.
,DubinskiJ.
,HernquistL.
,1997,NewAstronomy,2,277DikaiakosM.
D.
,StadelJ.
,1996,ICSconferenceproceed-ingsDubinskiJ.
,1996,NewAstronomy,1,133DubinskiJ.
,MihosJ.
C.
,HernquistL.
,1996,ApJ,462,576EastwoodJ.
W.
,HockneyR.
W.
,1974,J.
Comp.
Phys.
,16,342EbisuzakiT.
,MakinoJ.
,FukushigeT.
,etal.
,1993,PASJ,45,269EfstathiouG.
,DavisM.
,FrenkC.
S.
,WhiteS.
D.
M.
,1985,ApJS,57,241EvrardA.
E.
,1988,MNRAS,235,911FrenkC.
S.
,WhiteS.
D.
M.
,BodeP.
,etal.
,1999,ApJ,525,554FukushigeT.
,ItoT.
,MakinoJ.
,EbisuzakiT.
,SugimotoD.
,UmemuraM.
,1991,PASJ,43,841FukushigeT.
,TaijiM.
,MakinoJ.
,EbisuzakiT.
,SugimotoD.
,1996,ApJ,468,51GingoldR.
A.
,MonaghanJ.
J.
,1977,MNRAS,181,37530GADGET:AcodeforcollisionlessandgasdynamicalcosmologicalsimulationsGreengardL.
,RokhlinV.
,1987,J.
Comp.
Phys.
,73,325GroomW.
,1997,Ph.
D.
thesis,UniversityofCambridgeHernquistL.
,BouchetF.
R.
,SutoY.
,1991,ApJS,75,231HernquistL.
,KatzN.
,1989,ApJ,70,419HiotelisN.
,VoglisN.
,1991,A&A,243,333HockneyR.
W.
,EastwoodJ.
W.
,1981,ComputerSimulationUsingParticles,McGraw-Hill,NewWorkHohlF.
,1978,ApJ,83,768HolmbergE.
,1941,ApJ,94,385HultmanJ.
,K¨allanderD.
,1997,A&A,324,534HutP.
,MakinoJ.
,1999,Science,283,501HutP.
,MakinoJ.
,McMillanS.
,1995,ApJ,443,93ItoT.
,EbisuzakiT.
,MakinoJ.
,SugimotoD.
,1991,PASJ,43,547JerniganJ.
G.
,PorterD.
H.
,1989,ApJS,71,871KangH.
,OstrikerJ.
P.
,CenR.
,etal.
,1994,ApJ,430,83KatzN.
,GunnJ.
E.
,1991,ApJ,377,365KatzN.
,WeinbergD.
H.
,HernquistL.
,1996,ApJS,105,19KawaiA.
,FukushigeT.
,MakinoJ.
,TaijiM.
,2000,PASJ,52,659KleinR.
I.
,FisherR.
T.
,McKeeC.
F.
,TrueloveJ.
K.
,1998,inProceedingsoftheInternationalConferenceonNumeri-calAstrophysics1998(NAP98),editedbyS.
M.
Miyama,K.
Tomisaka,T.
Hanawa,p.
131,KluwerAcademic,Boston,Mass.
KravtsovA.
V.
,KlypinA.
A.
,KhokhlovA.
M.
J.
,1997,ApJS,111,73LiaC.
,CarraroG.
,2000,MNRAS,314,145LombardiJ.
C.
,SillsA.
,RasioF.
A.
,ShapiroS.
L.
,1999,J.
ofComp.
Phys.
,152,687LucyL.
B.
,1977,AJ,82,1013MacFarlandT.
,CouchmanH.
M.
P.
,PearceF.
R.
,PichlmeierJ.
,1998,NewAstronomy,3,687MakinoJ.
,1990,PASJ,42,717MakinoJ.
,1991a,PASJ,43,621MakinoJ.
,1991b,ApJ,369,200MakinoJ.
,2000,inDynamicsofStarClustersandtheMilkyWay,editedbyS.
Deiters,B.
Fuchs,A.
Just,R.
Spurzem,R.
Wielen,astro-ph/0007084MakinoJ.
,AarsethS.
J.
,1992,PASJ,44,141MakinoJ.
,FunatoY.
,1993,PASJ,45,279MakinoJ.
,HutP.
,1989,ComputerPhysicsReports,9,199MakinoJ.
,TaijiM.
,EbisuzakiT.
,SugimotoD.
,1997,ApJ,480,432McMillanS.
L.
W.
,AarsethS.
J.
,1993,ApJ,414,200MonaghanJ.
J.
,1992,Ann.
Rev.
Astron.
Astrophys.
,30,543MonaghanJ.
J.
,GingoldR.
A.
,1983,J.
Comp.
Phys.
,52,374MonaghanJ.
J.
,LattanzioJ.
C.
,1985,A&A,149,135NavarroJ.
F.
,FrenkC.
S.
,WhiteS.
D.
M.
,1996,ApJ,462,563NavarroJ.
F.
,FrenkC.
S.
,WhiteS.
D.
M.
,1997,ApJ,490,493NavarroJ.
F.
,WhiteS.
D.
M.
,1993,MNRAS,265,271NelsonR.
P.
,PapaloizouJ.
C.
B.
,1994,MNRAS,270,1NormanM.
L.
,BryanG.
L.
,1998,inProceedingsoftheInternationalConferenceonNumericalAstrophysics1998(NAP98),editedbyS.
M.
Miyama,K.
Tomisaka,T.
Hanawa,p.
19,KluwerAcademic,Boston,Mass.
OkumuraS.
K.
,MakinoJ.
,EbisuzakiT.
,etal.
,1993,PASJ,45,329PachecoP.
S.
,1997,ParallelProgrammingwithMPI,MorganKaufmannPublishers,SanFranciscoPearceF.
R.
,CouchmanH.
M.
P.
,1997,NewAstronomy,2,411PeeblesP.
J.
E.
,1970,Astron.
J.
,75,13PressW.
H.
,1986,inTheUseofSupercomputersinStellarDynamics,editedbyP.
Hut,S.
McMillan,184,Springer-Verlag,BerlinHeidelbergNewYorkPressW.
H.
,SchechterP.
,1974,ApJ,187,425PressW.
H.
,TeukolskyS.
A.
,VetterlingW.
T.
,FlanneryB.
P.
,1995,NumericalrecipiesinC,CambridgeUniver-sityPress,CambridgeQuinnT.
,KatzN.
,StadelJ.
,LakeG.
,1997,preprint,astro-ph/9710043RomeoA.
B.
,1998,A&A,335,922SalmonJ.
K.
,WarrenM.
S.
,1994,J.
Comp.
Phys.
,111,136SnirM.
,OttoS.
W.
,Huss-LedermanS.
,WalkerD.
W.
,Don-garraJ.
,1995,MPI:Thecompletereference,MITPress,CambridgeSplinterR.
J.
,MelottA.
L.
,ShandarinS.
F.
,SutoY.
,1998,ApJ,497,38SpringelV.
,1999,Ph.
D.
thesis,Ludwig-Maximilians-Universit¨atM¨unchenSpringelV.
,2000,MNRAS,312,859SpringelV.
,WhiteS.
D.
M.
,1999,MNRAS,307,162SpringelV.
,WhiteS.
D.
M.
,TormenB.
,KauffmannG.
,2000,preprint,astro-ph/0012055SteinmetzM.
,1996,MNRAS,278,1005SteinmetzM.
,M¨ullerE.
,1993,A&A,268,391ThackerR.
J.
,TittleyE.
R.
,PearceF.
R.
,CouchmanH.
M.
P.
,ThomasP.
A.
,2000,MNRAS,319,619TheunsT.
,RathsackM.
E.
,1993,Comp.
Phys.
Comm.
,76,141ViturroH.
R.
,CarpinteroD.
D.
,2000,A&AS,142,157WarrenM.
S.
,QuinnP.
J.
,SalmonJ.
K.
,ZurekW.
H.
,1992,ApJ,399,405WhiteS.
D.
M.
,1976,MNRAS,177,717WhiteS.
D.
M.
,1978,MNRAS,184,185WirthN.
,1986,Algorithmsanddatastructures,Prentice-Hall,LondonXuG.
,1995,ApJS,98,35531V.
Springel,N.
YoshidaandS.
D.
M.
WhiteYahagiH.
,MoriM.
,YoshiiY.
,1999,ApJS,124,1YoshidaN.
,SpringelV.
,WhiteS.
D.
M.
,TormenG.
,2000a,ApJ,535,L103YoshidaN.
,SpringelV.
,WhiteS.
D.
M.
,TormenG.
,2000b,ApJ,544,L8732

欧路云(22元) 新增美国Cera线路VPS主机且可全场8折

欧路云(oulucloud) 商家在前面的文章中也有陆续介绍过几次,这不今天有看到商家新增加美国Cera线路的VPS主机,而且有提供全场八折优惠。按照最低套餐最低配置的折扣,月付VPS主机低至22元,还是比较便宜的。不过我们需要注意的是,欧路云是一家2021年新成立的国人主机商,据说是由深圳和香港的几名大佬创建。如果我们有介意新商家的话,选择的时候谨慎且月付即可,注意数据备份。商家目前主营高防VP...

优林云(53元)哈尔滨电信2核2G

优林怎么样?优林好不好?优林 是一家国人VPS主机商,成立于2016年,主营国内外服务器产品。云服务器基于hyper-v和kvm虚拟架构,国内速度还不错。今天优林给我们带来促销的是国内东北地区哈尔滨云服务器!全部是独享带宽!首月5折 续费5折续费!地区CPU内存硬盘带宽价格购买哈尔滨电信2核2G50G1M53元直达链接哈尔滨电信4核4G50G1M83元直达链接哈尔滨电信8核8G50G1M131元直...

数脉科技:六月优惠促销,免备案香港物理服务器,E3-1230v2处理器16G内存,350元/月

数脉科技六月优惠促销发布了!数脉科技对香港自营机房的香港服务器进行超低价促销,可选择30M、50M、100Mbps的优质bgp网络。更大带宽可在选购时选择同样享受优惠,目前仅提供HKBGP、阿里云产品,香港CN2、产品优惠码续费有效,仅限新购,每个客户可使用于一个订单。新客户可以立减400元,或者选择对应的机器用相应的优惠码,有需要的朋友可以尝试一下。点击进入:数脉科技官方网站地址数脉科技是一家成...

mkxk.com为你推荐
vc组合天然维生素c和合成维生素c有区别吗冯媛甑夏如芝是康熙来了的第几期?网站检测请问,对网站进行监控检测的工具有哪些?99nets.com99nets网游模拟娱乐社区怎么打不开了?????????谁能告诉我 ???、www.zhiboba.com看NBA直播的网站哪个知道175qq.comhttp://www.qq10008.com/这个网页是真的吗?长房娇谁知道以下几种都是什么花?花期多长?网站检测工具.请介绍至少五种软件测试工具采采风荷莫言春度芳菲尽,别有中流采麦荷 啥意思恋战千年“一眼千年”是什么意思
中文域名申请 国外服务器 60g硬盘 patcha 777te 共享主机 广州服务器 国外视频网站有哪些 百度云空间 美国迈阿密 阿里dns 阵亡将士纪念日 时间服务器 带宽测速 cdn免备案空间 studentmain vim vi命令 crontab 招聘瓦工 更多