typestargetframework

targetframework  时间:2021-03-02  阅读:()
EarlyExperiencesinUsingaDomain-SpecicLanguageforLarge-ScaleGraphAnalysisSungpackHongOracleLabssungpack.
hong@oracle.
comJanVanDerLugtOracleLabsjan.
van.
der.
lugt@oracle.
comAdamWelcOracleLabsadam.
welc@oracle.
comRaghavanRamanOracleLabsraghavan.
raman@oracle.
comHassanChaOracleLabshassan.
cha@oracle.
comABSTRACTLarge-scalegraphanalysishasrecentlybeendrawinglotsofatten-tionfrombothindustryandacademia.
Althoughtherearealreadyseveralframeworksdesignedforscalablegraphanalysis,e.
g.
Gi-raph[1],alltheseframeworksadoptnon-traditionalprogrammingmodelsandAPIs.
Thiscansignicantlylowertheproductivityoftheframeworkuser.
ThispaperdiscussesthefeasibilityofusinganintuitiveDomain-SpecicLanguage(DSL)forgraphanalysis.
Specically,weuseacompilertotranslateGreen-Marl[5]pro-gramsintoanequivalentGiraphapplication,automaticallybridg-ingbetweenverydifferentprogrammingmodels.
WeobservethattheDSLprogramsareconciseandintuitive,andthatthecompilergeneratedGiraphimplementationsexhibitperformanceonparwiththatofhand-writtenones.
However,theDSLcompilationcannotbutfailifthealgorithmisfundamentallynotcompatiblewiththetargetframework.
Overall,webelievethattheDSL-basedapproachwillprovidegreatproductivitybenetsonceitmatures.
1.
INTRODUCTIONAgraphisafundamentaldatarepresentationthatcapturesrela-tionships(edges)betweendataentities(vertices).
Graphanalysisisaprocedurewhichexaminessuchrelationshipsandextractscertaininformationthatisnotimmediatelyavailablefromagivendata-set.
Examplesofgraphanalysisincludeassigningweightstothedataentitiesbasedontheirrelativeimportance,predictingfuturerela-tionshipbetweendataentities,andidentifyinggroupsofentitiesthataremorecloselyrelatedthanothers.
Furthermore,graphanal-ysisalgorithmscantakeasinputtheresultsofotheranalyses,e.
g.
countingnumberof(un-)closedtriadsorsamplingtheverticesinalargegraph.
Large-scalegraphanalysisisoftenconductedinanoff-line(batch)mannerusingthroughput-orientedsystems.
Thisisduetothefactthatsuchananalysistypicallyinvolves(repeated)inspectionofnearlytheentiregraphinstance,andthusnaturallytakesalotoftime.
Consequently,theusagemodelofgraphanalysisissome-Permissiontomakedigitalorhardcopiesofallorpartofthisworkforpersonalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesarenotmadeordistributedforprotorcommercialadvantageandthatcopiesbearthisnoticeandthefullcitationontherstpage.
Tocopyotherwise,torepublish,topostonserversortoredistributetolists,requirespriorspecicpermissionand/orafee.
ProceedingsoftheFirstInternationalWorkshoponGraphDataManage-mentExperienceandSystems(GRADES2013)Jun23,2013,NewYork,NY,USACopyright2013ACM978-1-4503-2188-4.
.
.
$15.
00.
whatdifferentfromthatofsimplegraphqueryingwhichmainlyaimstoidentifythespeciedsubsetofthegraph,usingthemostup-to-dateinformation,withinacertainlatency.
ThisisanalogoustothedifferencebetweenOLAPsystemsandOLTPsystemsinclassicdatabasedesign.
Severaldifferentoff-linesystemshavebeenusedforlarge-scalegraphanalysis.
Hadoop[4],themostpopularMapReduceengine,hasbeenwidelyusedtosolvegraphanalysisproblems[14]eventhoughitsperformanceforgraphanalysishasbeenputinques-tion[9].
Therearealsosystemsthatarespeciallydesignedforgraphanalysis.
Forexample,Pregel[10],anditsopen-sourceim-plementationGiraph[1],provideagraph-specicmessage-passingAPIontopofabulk-synchronous[17]computationengine.
Graph-Lab[9]performsdistributedasynchronousvertex-wisecomputa-tiontriggeredbymessagesexchangedbetweenvertices.
Trinity[16]performsgraphcomputationusingadistributedin-memorykey-valuestore.
Noticeably,allthegraphanalysissystemsaboveadoptdiffer-entcomputationmodels,nottomentiondifferentAPIs.
Conse-quently,itisuptotheusertoimplementagraphalgorithmthatiscompatiblewitheachsystem'scomputationmodelandAPI.
Suchanimplementation,however,canimposeanon-trivialprogram-mingoverheadtotheuserandthuslowersproductivityassub-stantialmodicationstotheoriginalgraphalgorithmareoftenre-quired.
(Section2)Inordertoaddressthisissue,weintroduceahigh-leveldomain-speciclanguage(DSL)forgraphanalysis.
Thatis,welettheusersprogramthegraphalgorithmsintuitivelyusingtheDSL,andthenletthecompilertranslateitintothetargetsystemwithitsdifferentprogrammingmodelandAPI,inanefcientway.
NotethatitisnotanewideatouseaDSLforthispurpose.
SQLisaclassicexampleofanintuitiveinterfacelanguageforaccessingarelationaldatabase.
Pig[11]andHive[15]arealsogoodexamplesthatgivesignicantproductivitybenetsforMapReduceenvironments.
ThispaperreportsourearlyexperienceofusingaDSLforlarge-scalegraphanalysis.
Specically,weencodeafewpopulargraphalgorithmswiththeGreen-Marl[5]DSL.
WethenusetheGreen-MarlcompilertogenerateanequivalentGiraphapplicationoutofthegivenGreen-Marlprogram.
Wediscussthefollowingtopicsregardingthisapproach:Productivity:HowintuitiveorhowhardisittoprogramgraphalgorithmswithGreen-Marl(Section3)Performance:Howisthequalityofthecompiler-generatedcodeHowmuchperformanceoverheaddoesitinduce(Section4)Otherissues:WhatareotherpracticalbenetsandissuesinusingaDSLEspecially,howcantheDSLapproachbeseamlesslyintegratedintoaconventionaldataanalysissys-tem(Section5)WedrawsomeconclusionsinSection6whileweusethenextsection(Section2)toprovidesomebackgroundinformation.
2.
BACKGROUND2.
1Pregel,GiraphandTheirProgrammingModelPregel[10]isadistributed,in-memorygraphanalysisframe-work,inwhichtheverticesofthegrapharedistributedacrossmul-tipleworkermachines.
Theoriginalpublicationshowedthattheframeworkishighlyscalable.
ThePregelframeworkadoptsaspecialprogrammingmodel:vertex-centric:Theuserimplementsasinglemethod(vertex.
compute())whichdescribesthebehaviorofeachvertex.
Themethodisappliedtoeveryvertexinthegraphinparallel.
statefulanditerative:Thesamevertex.
compute()func-tionisappliedovermultipletime-steps.
Vertex-privatedataismaintainedbetweentime-steps.
message-passing:Forinter-vertexcommunication,aver-texcanexplicitlysendmessagestoothervertices.
GloballyshareddatacanbeimplementedviaaspecialaggregatorAPI.
bulk-synchronous[17]:Conceptually,everyvertexcompu-tationhappensinparallelateachtime-step,whileglobalbar-riersynchronizationisenforcedattheendofthetime-step.
Allthemessagesgeneratedinatimestepareinstantlydeliv-eredatthebeginningofthenexttime-step.
Giraph[1]isanopen-sourceimplementationofPregel,withafewenhancements.
First,GiraphworkersareimplementedasHadoopMappers;therefore,GiraphtaskscanbetriviallyintegratedwithalltheotherHadoopinfrastructurecomponents,suchasHDFS(HadoopDistributedFileSystem),Pig,andHive.
Notethatthisisveryusefulwhenalargegraphinstanceisgeneratedfromtheevenlargerrawdata-setinHDFSasthereisnoneedtomovealargeamountofdataonlyforgraphanalysis.
Second,Giraphadoptstheconceptofmaster.
compute()[13],whichallowsmaster-side,se-quentialcomputationbetweeneachparallelvertexcomputation.
2.
2CompilingGreen-MarlintoGiraphInthisstudy,weuseGreen-Marl[5],aDSLfordescribinggraphanalysisalgorithms.
IncontrasttoPregel'sprogrammingmodel,Green-Marladoptsanimperative,shared-memorystyleprogram-mingmodel.
Italsoprovidesvariousbuilt-indatatypes,operators,andfunctionswhichallowsforintuitiveprogrammingbyusers,whilestillexposingimportantsemanticinformationtothecom-piler.
Weomitprovidingdetailsofthelanguagehere,butthelan-guagespecicationispubliclyavailable[2].
Inordertoaccommodatelarge-scalegraphanalysis,weextendtheexistingGreen-MarlcompilersuchthatittransformsthegivenGreen-MarlprogramintoanequivalentGiraphprogram.
Notethatsuchatransformationisnottrivialbecausethesetwoprogramsas-sumeverydifferentprogrammingmodels;Green-Marlprogramsarewritteninashared-memoryimperativestyle,whilePregelpro-gramsarewritteninabulk-synchronous,message-passing,vertex-centricstyle.
Herewepresentanoverviewofthiscompilertrans-Figure1:Compilationsteps:Initially,thecompilerusesanannotatedabstractsyntaxtree(AST)astheinternalrepresentation(IR).
Aftertransforma-tionsteps,thecompilerusesanotherIRthatiscomposedofbothanitestatemachine(FSM)andanAST.
formation.
Thedetailsaboutthistransformationareoutsidethescopeofthispaper,butavailableinanothermanuscript[6].
Figure1depictstheoverallowofourcompilationsteps.
OncetheinputGreen-Marlprogramisparsedandtype-checked,thepro-gramisrstinternallyrepresentedasanannotatedsyntaxtree.
Thenthecompilerappliesmultipletransformationrules,whiletry-ingtoturnthegivenprogramintoPregel-compatibleform.
Forinstance,thefollowingprogramisnotPregel-compatible,becauseeachvertexnisreadingitsneighborst'sbarvaluewhilePregeldoesnotallowremotedatareading:Foreach(n:G.
Nodes)Foreach(t:n.
Nbrs)n.
foo+=t.
barThecompilertransformstheaboveprogramintothefollowingform,whichisPregel-compatible;noweachvertextpushesitsownbarvaluetoincomingneighbors,whichcanbeimplementedasmes-sages.
Foreach(t:G.
Nodes)Foreach(n:t.
InNbrs)n.
foo+=t.
barNotethat,however,somegraphalgorithmsareinherentlyse-quential(e.
g.
Tarjan'sstronglyconnectedalgorithm)andthusnotcompatiblewithPregelinanyrealisticway.
IfthecompilerfailstotransformthegivenprogramintoPregel-compatibleform,itsimplyreportsanerrorandstops.
Ontheotherhand,inthecaseofaPregel-compatibleGreen-Marlprogram,Thecompileridentiesparallelandsequentialex-ecutionphasesofthealgorithmandanalyzesthecontrolstructurebetweenthem.
Fromthisinformation,thecompilercreatesanitestatemachine(FSM)inwhicheachstatecorrespondstoatime-stepinPregelexecution.
Thecompilerexaminesinformationthatisexchangedbetweenstatesandconvertsthemintomessages.
Thecompileralsoappliesseveraloptimizationrulestominimizethenumberoftime-stepsandthesizeofthemessages.
Finally,thecompilercreatestheresultingGiraphprogramswiththeappropri-ateAPIcalls,includingalltherequiredboilerplatecode.
2.
3ExampleGraphAlgorithmsInthisstudy,weperformanearlyevaluationofusingGreen-Marlforlargescalegraphanalysis,usingafewpopular,importantgraphalgorithmsasourexamplealgorithms.
Thealgorithmsusedinourstudyareasfollows:1.
PageRank[12]:PageRankisaveryfamousalgorithmtocomputeaweightofavertexbasedonthelinkstructureofFigure2:TheCountedTrianglePatterninDirectedGraphsthegraph.
ThePageRankvalueofeachvertexisdeterminedbytheweightedsumofPageRankvalueofitsneighborhoodvertices;thecomputationisrepeateduntilallthePageRankvalueshaveconverged.
2.
TriangleCounting[14]:Countingthenumberoftriangles(orclosedtriads)isacriticalstepforcomputingclusteringco-efcientanddetectingcommunitystructures.
Inthisexperi-ment,weuseavariantofthealgorithmwhichcountsonlyaspecicpatternoftriangles(Fig.
2)assumingthattheinputgraphisdirected.
3.
RandomWalkSamplingwithRandomJumps[8]:Samplingisusedtoobtainarepresentative(vertex)setofalargegraph.
Here,weimplementasamplingalgorithmthatperformsran-domwalkingonthegraphwithprobabilisticrandomjump-ingforthesakeofescapingfromlocalclusters.
3.
PROGRAMMABILITYANDPRODUCTIVITYInthissection,wediscussthebenetsofusingaDSLanditsimpactonprogrammabilityandproductivity.
Forthispurpose,weimplementtheexamplealgorithmsinSection2bothwithGreen-MarlaswellasdirectlywiththeGiraphAPI.
Figure3showstheresultingGreen-Marlprograms.
Notethatthegureshowstheen-tireprogramsandnotexcerpts.
3.
1CurrentBenetsIntuitiveProgrammingModelGraphanalysisalgorithmsimplementedinGreen-Marlarequiteintuitive,ascanbeseenfromFigure3.
Mostnoticeably,Green-Marlprogramsarewritteninanimperative,shared-memorystyle,whichissimilartohowtheoriginalgraphalgorithmswerespec-ied.
Indeed,theGreen-MarlimplementationofPageRankandTriangleCountingarealmostidenticaltotheabstractalgorithmde-scriptionintheiroriginalpublications[12,14].
Tothecontrary,thePregelprogrammingmodelmayrequiremod-icationofgraphalgorithms.
Forinstance,intheoriginalPageR-ankalgorithm,eachvertexreadsvaluesfromitsincomingneigh-bors(line10).
However,inthePregelimplementation,eachvertexsendsoutitsPageRankvalue(dividedbyitsdegree)totheoutgoingneighbors://vertexclasspublicvoidcompute(Iterablemsgs){//receivedmessages;for(DoubleMsgm:msgs)sum+=m.
get();doublenewV=((1.
0-d)/N+d*sum);//newPageRank//sendmessagesto'out-nbrs'sendMsgToNbrs(newDoubleMsg(newV/numEdges()));.
.
.
}Inotherwords,theoriginalalgorithmusesdata-pullingwhilethePregelAPIonlyallowsdata-pushing;thereforethealgorithmhastobemodiedaccordingly.
TheGreen-Marlcompilerautomati-callyhandlessuchtransformationbetweendata-pullinganddata-pushing1ProcedurePageRank(G:Graph,e,d:Double,max:Int2;pg_rank:Node_Prop){3Doublediff;4DoubleN=G.
NumNodes();5Intcnt=0;6Do{7diff=0;8Foreach(t:G.
Nodes){9Doubleval=(1-d)/N+10d*Sum(w:t.
InNbrs){w.
pg_rank/w.
Degree()};11diff+=|val-t.
pg_rank|;12//pg_rankupdatedsynchronouslyaftert-loop13t.
pg_ranke)&&(cntu){23If(w.
HasEdgeFrom(u)||w.
HasEdgeTo(u))24T++;25}}}26ReturnT;27}(b)TriangleCounting28ProcedureRandom_Walk_Sampling(G:Graph,29p_sample:Float,p_jump:Float,30num_tokens:Int;Selected:Node_Prop){3132//temporaryproperties33Node_PropToken,TokenNxt;3435//Initialize36G.
Token=0;37G.
TokenNxt=0;38G.
Selected=False;39LongN=G.
NumNodes()*p_sample;//samplesize4041//Chooseinitialnodesandgivethemtokens42Node_SetS;43While(S.
Size()0){53//Increasesamplesizeatfirsttokenreception54If(!
n.
Selected){55n.
Selected=True;56size++;57}58While(n.
Token>0){//randomlyshootouttokens59n.
Token--;60If((n.
Degree()==0)||(Uniform()N)halt();}}Asimilarstatemachinehastobeimplementedforthevertexclass.
Suchstatemanagementcodegrowsevenlongerifthetargetalgo-rithmiscomposedofmultiplecomputationkernels.
Finally,intuitiveprogrammingisalsoveryvaluableforthesakeofsoftwaremaintenance.
Green-MarlprogramslikeinFigure3areshortandintuitivesothattheycanbeeasilyunderstoodbypeoplewhodidn'toriginallywritetheprogramandthuscanbemaintainedoveralongperiodoftime.
AutomaticApplicationofOptimizationsWhilegeneratingtheGiraphprograms,theGreen-Marlcompilerautomaticallyappliesasetofoptimizationsthat(1)enableexpress-ingcertainfunctionalitieswiththeGiraphAPIor(2)enhancetheperformanceofthegeneratedcode.
Forexample,theTriangleCountingalgorithmrequiresthein-comingneighborsaswellastheoutgoingneighbors(Line23inFigure3).
However,theoriginalPregel(Giraph)APIonlyprovidesoutgoingneighbors.
Inthiscase,thecompilerautomaticallyinsertsextrastatesintheFSMthatcomputeincomingneighbors.
Thatis,atthersttimestepeachvertexsendsitsownIDtoitsoutgoingneighbors,andatthenextstepeachvertexconstructtheincom-ingneighborlistfromthosemessages.
Notethatourcompilerin-sertstheseextrastatesonlyifthegivenprogramrequiresincomingneighbors.
Asanotherexample,forPageRankandRandomWalkSampling,thecompileridentiesthattherearemessagepassingcomputationkernelsinsideawhileloop.
Insuchacase,thecompilerautomat-icallycombinesthemessagereceivingofthelastkernelwiththemessagesendingofrstkernelinthewhile-loop.
Thisoptimiza-tionreducesthenumberoftime-stepsinthegeneratedprogram.
Certainly,theprogrammerscouldhaveappliedthesamepro-grammingtricksmanuallywhentheyhand-codetheGiraphappli-cation.
However,sincethesetechniquesareautomaticallyappliedbyourcompiler,eventheprogrammerswhoarenotawareofthesetechniquescanhavethesamebenets.
FreeofBoilerplateCodeTable1comparesthesizeofGreen-Marlprogramsandhand-writtenGiraphprogramsusingbothLines-of-Codeandnumberofbytes.
Noticeably,GiraphprogramsarealotlongerthanGreen-Marlprograms,eventhoughtheyimplementthesamealgorithm.
MostofthesizedifferencecomesfromboilerplatecoderequiredbytheGiraphAPI.
Forinstance,theprogramhastodeneamessageandvertex/edgedataclass,implementtheirserializationmethods,registeraggregators,declareinputgraphloadersandoutputwrit-ers,andspecifyjobcongurations.
Inourapproach,theGreen-Marlcompilerautomaticallygeneratesallthislengthyboilerplate,therebyprovidingadditionalproductivitybenets.
3.
2TobeImprovedTheLearningCurveGreen-Marl|ManualGiraphAlgorithmLOC#BytesLOC#BytesPageRank195161886633TriangleCounting143001686272Random-WalkSampling53116844416766Table1:ComparisonofLineofCodes(LOC)andnumberofbytesbetweenGreen-MarlandmanualGiraphimplementa-tionofthesamealgorithmsBeingaDSL,Green-Marlrequirestheusertoundergoacertainamountoflearning.
Thelearningcurveisnotunreasonablysteep;thelanguagelookslikejustanotherdescendantofCanddoesn'taddtoomanynewconceptsotherthangraph-specicbuilt-intypesandoperatorsHowever,sinceGreen-Marlisaresearch-levellanguageunderdevelopment,itcertainlylacksadetaileddocumentationorausercommunity,whichhampersthelearningexperiencequiteabit.
Non-Giraph-CompatibleAlgorithmsWhenthegivenalgorithmisaninherentlysequentialone,how-ever,theGreen-MarlcompilercannotturnitintoaGiraphprogram.
ThecompilersimplyreportsanerrorwhenitfailstotransformthegivenprogramintoaPregel-compatibleform(Section2).
Insuchacase,itisuptotheprogrammertocreateanewalgorithmthatiscompatiblewithPregel.
Amongourexamples,theoriginalRandomWalkSamplingal-gorithm[8]isinherentlysequentialandthuscannotbecompiledintoGiraph.
Wecreatedaparallelversionofthisalgorithmbyin-troducingtheconceptoftokens;weletmultipleagents,asmanyasnumTokens,walkthegraphinparallel,whereanyvertexhavingatokenbecomesanagent.
NotethattheresultingGreen-Marlpro-gram(Figure3.
(c))isstillwritteninanimperative,shared-memorystylewithoutanyboilerplatecode.
AlthoughauserinterventionisdisappointingasitdiminishesthebenetoftheDSL,itisinevitableforthosealgorithmsthatareinherentlynotPregel-compatible.
Theuserhastocreateanewal-gorithmwhichcanbetargetedtotheunderlyingcomputationalsys-tem.
Indeed,thetwosamplingalgorithms(theoriginalalgorithmandourparallelizedone)arenotstrictlyequivalent;theoriginalal-gorithmproducesasampleoftheexactsize,whiletheparalleloneproducesasamplethatislarger(bythenumberoftokens)thantherequestedsize.
Currently,wearetakingapracticalapproachtothisissue.
(1)Thecompilerrecognizesfrequent(non-Pregel-compatible)patternsingraphalgorithmsandtransformsthemintoPregel-compatibleonesautomatically.
(2)WedeneasubsetofGreen-Marlprogram-mingpatternsthatareguaranteedtobecompilableintoGiraph.
Therefore,thecompilerpointsoutnon-Pregel-compatibleportionsoftheprogramsothatuserscanre-writetheseportionsusingthePregel-compatibleGreen-Marlsubset;thusthecompilerishelpingtheuserwiththeinevitablealgorithmmodication.
CompilerMaturityConversely,onecanaskaboutthecompletenessofourapproach:caneveryPregel-compatiblealgorithmbecompiledfromitsGreen-MarldescriptionFundamentally,webelievethattheanswerisyes1;however,thecurrentcompilerneedsmoreimprovementasitdoesnotsupportallthefeaturesrequiredforthePregel-compatiblesubset.
Specically,thecurrentcompileronlyprovideslimitedsup-portforcollectiondatatypesandaggregationoncollectionsforGiraphcompilation.
User-deneddatatypesarealsonotsupportedyet.
Nevertheless,thesearenotfundamentalobstaclesbutmostly1wecanmakeamappingfromeverymessage-passingcallinPregeltoawritestatementinGreen-MarlFigure4:RuntimeComparisonBetweenHand-codedandCompiler-GeneratedGiraphImplementations:PR,TCandRWstandsforPagerank,TriangleCountingandRandomWalkSampling,respectivelyengineeringissuesthatcanbeimproveduponovertime.
Overall,webelievethatprogramminginGreen-Marlcanprovideconsiderableproductivitybenetsforgraphanalysisasitmatures.
Easeofprogrammingisespeciallyusefulwhentheuserwantstoexploredifferentgraphalgorithmsinashorttimeandquicklytryoutnewgraphalgorithms,orwhentheywishtocustomizeanex-istinggraphalgorithmfortheirownpurpose.
4.
QUALITYOFTHECOMPILERGENER-ATEDIMPLEMENTATIONInthissection,wediscussthequalityofthecompiler-generatedGiraphimplementations,focusingontheirperformance.
Forthispurpose,weevaluateboththehand-codedandcompiler-generatedGiraphimplementationofthethreeexamplealgorithmsinSec-tion2.
32onanx86clusterandexaminetheirperformancechar-acteristics.
Ourclusteriscomposedof8machines,witheachmachinehav-ing16cores(32HWthreads)and256GBofmemory,runningCen-tOS6.
Weusedthe1.
0.
4releaseofHadoopandasnapshotofGiraph0.
2.
203takenonFeb28th,2013.
AllapplicationswerecompiledwithOraclejdk1.
7.
017andranwiththecorrespond-ingOracleHotSpot64-bitJVM.
Weonlyinstantiate10workerspermachine,i.
e.
80workersintotal;sinceeachworkerprocessalreadyconsistsofafewthreads(forcomputationandcommuni-cation),addingmoreworkerspermachineimpactedperformancenegatively.
Weranthealgorithmsontwopublicgraphinstances;oneisawebgraphinstance(LiveJournal)fromtheSNAPgraphreposi-tory[3],andtheotherisasocialgraphextractedfromTwitteruserrelationships[7].
LiveJournalismoderateinsize(4millionverticesand34millionedges),whileTwitterislarger(40millionverticesand1.
8billionedges).
Figure4depictstheresultingexecutiontimeofeachalgorithmonthesegraphinstanceswhilethey-axisshowstherelativeexecutiontime,i.
e.
normalizedbyexecutiontimeofthehand-codedgiraphapplication.
4.
1CurrentBenetsDecentPerformanceofCompiler-GeneratedImplementationsAscanbeseeninFigure4,thecompiler-generatedGiraphim-plementationsperformfairlyclosetothehand-writtenones.
Eventhoughthecompiler-generatedimplementationswereslowerthanmanualonesforPagerankandTriangleCounting,thedifference2Wemodiedthetrianglecountingalgorithmforthesakeofthisexper-iment,sincethereisafundamentalissuerelatedtotheBSPcomputationmodelforthisalgorithm.
SeeSection4.
2fordetails.
wasonlyabout1015%.
InthecaseofRandomSampling,therewasvirtuallynodifferenceatall.
Thesmalldifferenceistheresultofseveralcompileroptimiza-tionsappliedduringcodegeneration.
Forinstance,thecompilermergesmultiplestatesintooneaslongasthereisnodatadepen-dencybetweenthem,reducingthenumberofglobalbarriers.
Wereferinterestedreaderstoamoredetailedreportthatexplainshowourcompilerworks[6].
Reasonsfortheremainingperformancedifferencewillbediscussedinthenextsubsection.
ThecaseoftheTriangleCountingalgorithmisworthacloserlookasthersthand-codedversionwasactuallyslowerthanthecompiler-generatedone.
Infact,itdidnotevennishfortheTwit-tergraphinstanceinareasonabletime.
Thiswasbecausethepro-grammerused(fromhabit)EdgeListVertexasbaseclass,whichhasasmallmemoryfootprintandprovidesfastiterateoverneigh-bors,butisveryslowwhencheckingwhetherthereisneighborhoodrelationship.
Thismistakewasnotevidentduringtheinitialtestingphaseassmallgraphinstancesaretypicallyused.
NotethatwithDSL-apporach,theuserdoesn'thavetoworryaboutsuchissuesbutcanrelyonthecompilertohandlethem.
ReadabilityofGeneratedCodeWeremindthereadersthatthecompiler-generatedcodeisjustaplainJavaprogramthatusestheGiraphAPI.
Infact,thegeneratedcodeisfairlyreadable.
Thecodeisappropriatelyindentedandvari-ablenamesintheoriginalGreen-Marlprogramsarepreservedasmuchaspossible.
Moreover,eachgeneratedJavacodeblockcon-tainsthecommentsoftheoriginalGreen-MarlsourcefromwhichtheJavacodeisgenerated.
Forpracticalreasons,westrivetoensurethattheprogrammerisabletoreadthegeneratedcodeandeventoeditthegeneratedcode;forinstanceifGiraphintroducesanewfeatureorAPI,theprogrammercanmanuallyapplyhot-xestothegeneratedcode,beforethecompilergetsupdated.
4.
2ToBeImprovedSub-optimalPerformanceStill,thereisacertainperformancedifferencebetweenthehand-codedGiraphimplementationsandthecompiler-generatedones,forPageRankandTriangleCountingalgorithminFigure4.
Inbothcases,theperformanceoverheadcamefromthemoregenericap-proachtakenbythecompiler.
ForthecaseofPageRank,thecompilerisnotutilizingthemes-sagecombinerAPIatall,sinceingeneraltherecanbemultiplemessagetypesthatareused(eveninsideonekernel),whiletheGi-raphAPIonlyallowsonecombinerclasswhichwillbeappliedtoeverymessage.
However,itispossibletoextendthecompilersuchthatitcanhandlespecialcasesdifferently.
Inthisparticularcase,thecompliercanrecognizethatthereisonlyonetypeofmessagebeingusedforsummingvaluesatthedestination,andthusitcanusethesummationcombiner.
ForthecaseofTriangleCounting,themanualimplementationusesHashMapVertex,whichhasalargememoryfootprint,isslowforiteration,butfastincheckingneighborhoodrelationships.
Ontheotherhand,thecompiler-generatedimplementation,asatrade-off,usesEdgeListVertexaccompaniedwithamemory-efcientsidestructure(i.
e.
asortedprimitivearray),whichisgoodforbothiterationandneighborhoodchecking.
Again,thecompilercanrecognizethatthegivenprogramisaspecialcasewhichhasonlyonekernelthatdoesnothingbutneighborhoodchecking,anditcandecidetouseHashMapVertexinstead.
Fundamentally,theGreen-MarlDSLexposesenoughse-manticinformationforthecompilertheapplythesespecializationtechniques.
LimitationoftheTargetRuntimeSystemTheGreen-MarlgeneratedimplementationsarestillboundbythefundamentallimitationsoftheunderlyingGiraphruntime.
Forinstance,aGiraphimplementationoftheoriginalTriangleCount-ingalgorithminFigure3whenusedwiththeTwittergraphin-stance,(orevenwiththeLiveJournalinstance),simplyfailstoexe-cute.
Thisisbecausethesegraphsshowahighlyskeweddegreedistri-butionsuchthatthereexistasmallnumberofhigh-degreenodes,i.
e.
nodesthathavemillionsofneighbors.
NotethattheGiraphim-plementationoftheoriginalalgorithmproducesO(degree2)num-berofmessagespereachvertex.
Forverticeswithamillionsofneighbors,thisnumberreachesthetrillions.
Workerprocessesrunoutofmemory,sinceGiraphkeepsthesemessagesin-memoryun-tilthenexttime-stepbegins.
Inourexperiment,weonlycountedtrianglesthatoriginatedfromsmalldegreenodes(i.
e.
smallerthan100)sothatwecanstillevaluateourapproach.
Itwouldalsobepossible,though,forthecompilertowarntheusersinadvance,ifitseesapatternthatmightresultinalow-performingimplementationduetomessagecountexplosion.
5.
OTHERBENEFITSANDISSUESDebuggingNewAlgorithmsGreen-Marlcanprovideadditionalbenetsfordebuggingnewgraphalgorithms.
Firstofall,webelievethatprogrammingusingGreen-Marlwouldbelesserror-pronethandirectlyprogrammingGiraph,sinceitismoreintuitiveandmoresuccinct.
Moreover,sincetheGreen-Marlcompilercangenerateafastrunning(parallel)shared-memoryC++programfromthesamein-put[5],thealgorithmdevelopercantesttheiralgorithminafastwaywithsmall-sizedgraphsonasinglemachine,withouthavingtouseaHadoopclusteratall.
Thisallowsformuchshorterturn-aroundtimeintestinganewalgorithm.
However,theusermaystillsuspectthatthecompilerinjectser-rorsincodegeneration.
Currently,weallowtheuserstoinspectthegeneratedcodesinceitisveryreadable.
Thecompilercanalsobeaskedtoprintthecodeateachintermediatecompilationstep.
FutureBenetsforPortabilityThiseffortfocusedonthetranslationofabstractGreen-Marlpro-gramsintoonespecicanalysisframework:Giraph.
However,Green-MarlisnotdenedonlyfortheGiraphframework,butforgeneralgraphalgorithms.
ThereforeonecanimagineaGreen-MarlcompilerthattranslatesthesameGreen-Marlprogramintootherdifferentgraphprocessingframeworks(e.
g.
GraphLab[9]andTrinity[16])aswellasGiraph.
Sincethesesystemsgenerallyshowdifferentperformancecharac-teristicsdependingonthealgorithmorsize/shapeofinputgraph,usersmaychoosetheappropriateruntimefortheirpurpose.
Indeed,asofnow,theusercangenerateashared-memoryimple-mentationandGiraphimplementationfromthesameGreen-Marlprogram.
Ifthesizeofthetargetgraphtsinasinglemachine'smemoryspace,onecanusetheshared-memoryimplementationwhichisgenerallyfasterthanGiraphexecutionasitdoesnotsufferfromcommunicationcost.
SystemIntegrationAnotherpracticalissueishowtointegratetheGreen-MarlDSLwiththerestofacompletedataanalysissystem,inwhichgraphanalysisisonlyonephaseintheanalysisow.
Acompletedataanalysissystemmayincludepersistentstorageofgraphinstancesorthematerializationofatemporarygraphoutofrawdata.
Itmayalsoincludeotheroff-lineanalysisengines(e.
g.
Hadoopjobsforlteringoutuninterestingdata)aswellasaconnectionmechanismtoon-linesystemsthatmakeuseoftheresultofoff-lineanalysis.
Currently,ourcompiler-generatedprograms(bothGiraphandshared-memoryapplications)loaddataleseitherfromthelocallesystemorfromHDFSassumingthatthegraphisavailableasoneofafewpopularformatssuchasanadjacencylist.
Supportingmoreformatsoruser-denedloaderswouldmakeitmoreexible.
Ontheotherhand,wecanalsothinkofextendingtheDSLsuchthatitcandeclareinputsandoutputsofagraphanalysisinamoreabstractanddeclarativeway.
Forinstance,theinputcanbeloadedfromadatale,databaseorevendirectlypipelinedfromotheranal-ysis.
SimilarapproachesofothersuccessfulDSLS(e.
g.
PigandLINQ)canbeappliedforextendingGreen-Marlaswell.
6.
CONCLUSIONThispaperreportedonourearlyexperiencewithusingGreen-Marlasafront-endlanguageforlarge-scalegraphanalysis,bycompilingGreen-MarlprogramsintoGiraphapplications.
Weob-servedthatGreen-Marlprogramsareconciseandintuitivewhiletheperformanceofthecompiler-generatedcodecloselymatcheshand-tunedapplications.
Overall,wethinkthisapproachcouldprovideconsiderableproductivitybenetsasthecompilermatures.
AcknowledgementsWeappreciateSamShah,RoshanSumbalyandEvionKimatLinkedInfortheircollaborationinthisstudy.
7.
REFERENCES[1]ApacheGiraphProject.
http://giraph.
apache.
org.
[2]Green-MarlDSL.
http://github.
com/stanford-ppl/Green-Marl.
[3]Stanfordnetworkanalysislibrary.
http://snap.
stanford.
edu/snap.
[4]ApacheHadoop.
http://hadoop.
apache.
org/.
[5]S.
Hong,H.
Cha,E.
Sedlar,andK.
Olukotun.
Green-Marl:ADSLforEasyandFfcientGraphAnalysis.
InASPLOS,2012.
[6]S.
Hong,S.
Salihoglu,J.
Widom,andK.
Olukotun.
TechReport:CompilingGreen-MarlintoGPS.
http://ppl.
stanford.
edu/papers/tr_gm_gps.
pdf.
[7]H.
Kwak,C.
Lee,H.
Park,andS.
Moon.
WhatisTwitter,asocialnetworkoranewsmediaWWW'10,2010.
[8]J.
LeskovecandC.
Faloutsos.
Samplingfromlargegraphs.
InKDD,2006.
[9]Y.
Low,D.
Bickson,J.
Gonzalez,C.
Guestrin,A.
Kyrola,andJ.
Hellerstein.
DistributedGraphLab:AFrameworkforMachineLearningandDataMiningintheCloud.
ProceedingsoftheVLDBEndowment,5(8):716–727,2012.
[10]G.
Malewicz,M.
H.
Austern,A.
J.
Bik,J.
C.
Dehnert,I.
Horn,N.
Leiser,andG.
Czajkowski.
Pregel:ASystemforLarge-scaleGraphProcessing.
InSIGMOD'10.
ACM.
[11]C.
Olston,B.
Reed,U.
Srivastava,R.
Kumar,andA.
Tomkins.
PigLatin:ANot-so-foreignLanguageforDataProcessing.
InSIGMOD.
ACM,2008.
[12]L.
Page.
Methodfornoderankinginalinkeddatabase,Sept.
42001.
USPatent6,285,999.
[13]S.
SalihogluandJ.
Widom.
GPS:GraphProcessingSystem.
http://infolab.
stanford.
edu/gps.
[14]S.
SuriandS.
Vassilvitskii.
Countingtrianglesandthecurseofthelastreducer.
InWWW.
ACM,2011.
[15]A.
Thusoo,J.
Sarma,N.
Jain,Z.
Shao,P.
Chakka,S.
Anthony,H.
Liu,P.
Wyckoff,andR.
Murthy.
Hive:AWarehousingSolutionOveraMap-ReduceFramework.
ProceedingsoftheVLDBEndowment,2(2):1626–1629,2009.
[16]Trinity.
http://research.
microsoft.
com/en-us/projects/trinity/default.
aspx.
[17]L.
Valiant.
Abridgingmodelforparallelcomputation.
CommunicationsoftheACM,33(8):103–111,1990.

亚洲云-浙江高防BGP.提供自助防火墙高防各种offer高防BGP!

 亚洲云Asiayun怎么样?亚洲云Asiayun好不好?亚洲云成立于2021年,隶属于上海玥悠悠云计算有限公司(Yyyisp),是一家新国人IDC商家,且正规持证IDC/ISP/CDN,商家主要提供数据中心基础服务、互联网业务解决方案,及专属服务器租用、云服务器、云虚拟主机、专属服务器托管、带宽租用等产品和服务。Asiayun提供源自大陆、香港、韩国和美国等地骨干级机房优质资源,包括B...

A400互联(49元/月)洛杉矶CN2 GIA+BGP、1Gbps带宽,全场独服永久5折优惠

a400互联是一家成立于2020年商家,主营美国机房的产品,包括BGP线路、CN2 GIA线路的云服务器、独立服务器、高防服务器,接入线路优质,延迟低,稳定性高,额外也还有香港云服务器业务。当前,全场服务器5折,香港VPS7折,洛杉矶VPS5折,限时促销!A400互联官网:https://a400.net/优惠活动全场独服永久5折优惠(续费同价):0722香港VPS七折优惠:0711洛杉矶VPS五...

艾云年付125元圣何塞GTT,洛杉矶vps年付85元

艾云怎么样?艾云是一家去年年底成立的国人主机商家,商家主要销售基于KVM虚拟架构的VPS服务,机房目前有美国洛杉矶、圣何塞和英国伦敦,目前商家推出了一些年付特价套餐,性价比非常高,洛杉矶套餐低至85元每年,给500M带宽,可解奈飞,另外圣何塞也有特价机器;1核/1G/20G SSD/3T/2.5Gbps,有需要的朋友以入手。点击进入:艾云官方网站艾云vps促销套餐:KVM虚拟架构,自带20G的防御...

targetframework为你推荐
photoimpact教程PhotoImpact 10的下载地址。。无毒的。谢谢~深圳公交车路线深圳公交车路线中国论坛大全天涯论坛的网址?工信部备案怎样在工信部进行域名备案?要详细硬盘人500G的硬盘容量是多少啊?如何快速收录如何让百度快速收录云挂机有免费的云挂机软件吗?网站优化方案网站优化方案如何写?blogcn远目是什么意思?网络虚拟机虚拟机网络设置
vps安全设置 息壤备案 themeforest 空间打开慢 云主机51web 镇江联通宽带 国外在线代理 100x100头像 谁的qq空间最好看 169邮箱 idc是什么 域名接入 空间登录首页 1元域名 空间登陆首页 免费asp空间 韩国代理ip 服务器硬件配置 上海联通 phpinfo 更多