typestargetframework

targetframework  时间:2021-03-02  阅读:()
EarlyExperiencesinUsingaDomain-SpecicLanguageforLarge-ScaleGraphAnalysisSungpackHongOracleLabssungpack.
hong@oracle.
comJanVanDerLugtOracleLabsjan.
van.
der.
lugt@oracle.
comAdamWelcOracleLabsadam.
welc@oracle.
comRaghavanRamanOracleLabsraghavan.
raman@oracle.
comHassanChaOracleLabshassan.
cha@oracle.
comABSTRACTLarge-scalegraphanalysishasrecentlybeendrawinglotsofatten-tionfrombothindustryandacademia.
Althoughtherearealreadyseveralframeworksdesignedforscalablegraphanalysis,e.
g.
Gi-raph[1],alltheseframeworksadoptnon-traditionalprogrammingmodelsandAPIs.
Thiscansignicantlylowertheproductivityoftheframeworkuser.
ThispaperdiscussesthefeasibilityofusinganintuitiveDomain-SpecicLanguage(DSL)forgraphanalysis.
Specically,weuseacompilertotranslateGreen-Marl[5]pro-gramsintoanequivalentGiraphapplication,automaticallybridg-ingbetweenverydifferentprogrammingmodels.
WeobservethattheDSLprogramsareconciseandintuitive,andthatthecompilergeneratedGiraphimplementationsexhibitperformanceonparwiththatofhand-writtenones.
However,theDSLcompilationcannotbutfailifthealgorithmisfundamentallynotcompatiblewiththetargetframework.
Overall,webelievethattheDSL-basedapproachwillprovidegreatproductivitybenetsonceitmatures.
1.
INTRODUCTIONAgraphisafundamentaldatarepresentationthatcapturesrela-tionships(edges)betweendataentities(vertices).
Graphanalysisisaprocedurewhichexaminessuchrelationshipsandextractscertaininformationthatisnotimmediatelyavailablefromagivendata-set.
Examplesofgraphanalysisincludeassigningweightstothedataentitiesbasedontheirrelativeimportance,predictingfuturerela-tionshipbetweendataentities,andidentifyinggroupsofentitiesthataremorecloselyrelatedthanothers.
Furthermore,graphanal-ysisalgorithmscantakeasinputtheresultsofotheranalyses,e.
g.
countingnumberof(un-)closedtriadsorsamplingtheverticesinalargegraph.
Large-scalegraphanalysisisoftenconductedinanoff-line(batch)mannerusingthroughput-orientedsystems.
Thisisduetothefactthatsuchananalysistypicallyinvolves(repeated)inspectionofnearlytheentiregraphinstance,andthusnaturallytakesalotoftime.
Consequently,theusagemodelofgraphanalysisissome-Permissiontomakedigitalorhardcopiesofallorpartofthisworkforpersonalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesarenotmadeordistributedforprotorcommercialadvantageandthatcopiesbearthisnoticeandthefullcitationontherstpage.
Tocopyotherwise,torepublish,topostonserversortoredistributetolists,requirespriorspecicpermissionand/orafee.
ProceedingsoftheFirstInternationalWorkshoponGraphDataManage-mentExperienceandSystems(GRADES2013)Jun23,2013,NewYork,NY,USACopyright2013ACM978-1-4503-2188-4.
.
.
$15.
00.
whatdifferentfromthatofsimplegraphqueryingwhichmainlyaimstoidentifythespeciedsubsetofthegraph,usingthemostup-to-dateinformation,withinacertainlatency.
ThisisanalogoustothedifferencebetweenOLAPsystemsandOLTPsystemsinclassicdatabasedesign.
Severaldifferentoff-linesystemshavebeenusedforlarge-scalegraphanalysis.
Hadoop[4],themostpopularMapReduceengine,hasbeenwidelyusedtosolvegraphanalysisproblems[14]eventhoughitsperformanceforgraphanalysishasbeenputinques-tion[9].
Therearealsosystemsthatarespeciallydesignedforgraphanalysis.
Forexample,Pregel[10],anditsopen-sourceim-plementationGiraph[1],provideagraph-specicmessage-passingAPIontopofabulk-synchronous[17]computationengine.
Graph-Lab[9]performsdistributedasynchronousvertex-wisecomputa-tiontriggeredbymessagesexchangedbetweenvertices.
Trinity[16]performsgraphcomputationusingadistributedin-memorykey-valuestore.
Noticeably,allthegraphanalysissystemsaboveadoptdiffer-entcomputationmodels,nottomentiondifferentAPIs.
Conse-quently,itisuptotheusertoimplementagraphalgorithmthatiscompatiblewitheachsystem'scomputationmodelandAPI.
Suchanimplementation,however,canimposeanon-trivialprogram-mingoverheadtotheuserandthuslowersproductivityassub-stantialmodicationstotheoriginalgraphalgorithmareoftenre-quired.
(Section2)Inordertoaddressthisissue,weintroduceahigh-leveldomain-speciclanguage(DSL)forgraphanalysis.
Thatis,welettheusersprogramthegraphalgorithmsintuitivelyusingtheDSL,andthenletthecompilertranslateitintothetargetsystemwithitsdifferentprogrammingmodelandAPI,inanefcientway.
NotethatitisnotanewideatouseaDSLforthispurpose.
SQLisaclassicexampleofanintuitiveinterfacelanguageforaccessingarelationaldatabase.
Pig[11]andHive[15]arealsogoodexamplesthatgivesignicantproductivitybenetsforMapReduceenvironments.
ThispaperreportsourearlyexperienceofusingaDSLforlarge-scalegraphanalysis.
Specically,weencodeafewpopulargraphalgorithmswiththeGreen-Marl[5]DSL.
WethenusetheGreen-MarlcompilertogenerateanequivalentGiraphapplicationoutofthegivenGreen-Marlprogram.
Wediscussthefollowingtopicsregardingthisapproach:Productivity:HowintuitiveorhowhardisittoprogramgraphalgorithmswithGreen-Marl(Section3)Performance:Howisthequalityofthecompiler-generatedcodeHowmuchperformanceoverheaddoesitinduce(Section4)Otherissues:WhatareotherpracticalbenetsandissuesinusingaDSLEspecially,howcantheDSLapproachbeseamlesslyintegratedintoaconventionaldataanalysissys-tem(Section5)WedrawsomeconclusionsinSection6whileweusethenextsection(Section2)toprovidesomebackgroundinformation.
2.
BACKGROUND2.
1Pregel,GiraphandTheirProgrammingModelPregel[10]isadistributed,in-memorygraphanalysisframe-work,inwhichtheverticesofthegrapharedistributedacrossmul-tipleworkermachines.
Theoriginalpublicationshowedthattheframeworkishighlyscalable.
ThePregelframeworkadoptsaspecialprogrammingmodel:vertex-centric:Theuserimplementsasinglemethod(vertex.
compute())whichdescribesthebehaviorofeachvertex.
Themethodisappliedtoeveryvertexinthegraphinparallel.
statefulanditerative:Thesamevertex.
compute()func-tionisappliedovermultipletime-steps.
Vertex-privatedataismaintainedbetweentime-steps.
message-passing:Forinter-vertexcommunication,aver-texcanexplicitlysendmessagestoothervertices.
GloballyshareddatacanbeimplementedviaaspecialaggregatorAPI.
bulk-synchronous[17]:Conceptually,everyvertexcompu-tationhappensinparallelateachtime-step,whileglobalbar-riersynchronizationisenforcedattheendofthetime-step.
Allthemessagesgeneratedinatimestepareinstantlydeliv-eredatthebeginningofthenexttime-step.
Giraph[1]isanopen-sourceimplementationofPregel,withafewenhancements.
First,GiraphworkersareimplementedasHadoopMappers;therefore,GiraphtaskscanbetriviallyintegratedwithalltheotherHadoopinfrastructurecomponents,suchasHDFS(HadoopDistributedFileSystem),Pig,andHive.
Notethatthisisveryusefulwhenalargegraphinstanceisgeneratedfromtheevenlargerrawdata-setinHDFSasthereisnoneedtomovealargeamountofdataonlyforgraphanalysis.
Second,Giraphadoptstheconceptofmaster.
compute()[13],whichallowsmaster-side,se-quentialcomputationbetweeneachparallelvertexcomputation.
2.
2CompilingGreen-MarlintoGiraphInthisstudy,weuseGreen-Marl[5],aDSLfordescribinggraphanalysisalgorithms.
IncontrasttoPregel'sprogrammingmodel,Green-Marladoptsanimperative,shared-memorystyleprogram-mingmodel.
Italsoprovidesvariousbuilt-indatatypes,operators,andfunctionswhichallowsforintuitiveprogrammingbyusers,whilestillexposingimportantsemanticinformationtothecom-piler.
Weomitprovidingdetailsofthelanguagehere,butthelan-guagespecicationispubliclyavailable[2].
Inordertoaccommodatelarge-scalegraphanalysis,weextendtheexistingGreen-MarlcompilersuchthatittransformsthegivenGreen-MarlprogramintoanequivalentGiraphprogram.
Notethatsuchatransformationisnottrivialbecausethesetwoprogramsas-sumeverydifferentprogrammingmodels;Green-Marlprogramsarewritteninashared-memoryimperativestyle,whilePregelpro-gramsarewritteninabulk-synchronous,message-passing,vertex-centricstyle.
Herewepresentanoverviewofthiscompilertrans-Figure1:Compilationsteps:Initially,thecompilerusesanannotatedabstractsyntaxtree(AST)astheinternalrepresentation(IR).
Aftertransforma-tionsteps,thecompilerusesanotherIRthatiscomposedofbothanitestatemachine(FSM)andanAST.
formation.
Thedetailsaboutthistransformationareoutsidethescopeofthispaper,butavailableinanothermanuscript[6].
Figure1depictstheoverallowofourcompilationsteps.
OncetheinputGreen-Marlprogramisparsedandtype-checked,thepro-gramisrstinternallyrepresentedasanannotatedsyntaxtree.
Thenthecompilerappliesmultipletransformationrules,whiletry-ingtoturnthegivenprogramintoPregel-compatibleform.
Forinstance,thefollowingprogramisnotPregel-compatible,becauseeachvertexnisreadingitsneighborst'sbarvaluewhilePregeldoesnotallowremotedatareading:Foreach(n:G.
Nodes)Foreach(t:n.
Nbrs)n.
foo+=t.
barThecompilertransformstheaboveprogramintothefollowingform,whichisPregel-compatible;noweachvertextpushesitsownbarvaluetoincomingneighbors,whichcanbeimplementedasmes-sages.
Foreach(t:G.
Nodes)Foreach(n:t.
InNbrs)n.
foo+=t.
barNotethat,however,somegraphalgorithmsareinherentlyse-quential(e.
g.
Tarjan'sstronglyconnectedalgorithm)andthusnotcompatiblewithPregelinanyrealisticway.
IfthecompilerfailstotransformthegivenprogramintoPregel-compatibleform,itsimplyreportsanerrorandstops.
Ontheotherhand,inthecaseofaPregel-compatibleGreen-Marlprogram,Thecompileridentiesparallelandsequentialex-ecutionphasesofthealgorithmandanalyzesthecontrolstructurebetweenthem.
Fromthisinformation,thecompilercreatesanitestatemachine(FSM)inwhicheachstatecorrespondstoatime-stepinPregelexecution.
Thecompilerexaminesinformationthatisexchangedbetweenstatesandconvertsthemintomessages.
Thecompileralsoappliesseveraloptimizationrulestominimizethenumberoftime-stepsandthesizeofthemessages.
Finally,thecompilercreatestheresultingGiraphprogramswiththeappropri-ateAPIcalls,includingalltherequiredboilerplatecode.
2.
3ExampleGraphAlgorithmsInthisstudy,weperformanearlyevaluationofusingGreen-Marlforlargescalegraphanalysis,usingafewpopular,importantgraphalgorithmsasourexamplealgorithms.
Thealgorithmsusedinourstudyareasfollows:1.
PageRank[12]:PageRankisaveryfamousalgorithmtocomputeaweightofavertexbasedonthelinkstructureofFigure2:TheCountedTrianglePatterninDirectedGraphsthegraph.
ThePageRankvalueofeachvertexisdeterminedbytheweightedsumofPageRankvalueofitsneighborhoodvertices;thecomputationisrepeateduntilallthePageRankvalueshaveconverged.
2.
TriangleCounting[14]:Countingthenumberoftriangles(orclosedtriads)isacriticalstepforcomputingclusteringco-efcientanddetectingcommunitystructures.
Inthisexperi-ment,weuseavariantofthealgorithmwhichcountsonlyaspecicpatternoftriangles(Fig.
2)assumingthattheinputgraphisdirected.
3.
RandomWalkSamplingwithRandomJumps[8]:Samplingisusedtoobtainarepresentative(vertex)setofalargegraph.
Here,weimplementasamplingalgorithmthatperformsran-domwalkingonthegraphwithprobabilisticrandomjump-ingforthesakeofescapingfromlocalclusters.
3.
PROGRAMMABILITYANDPRODUCTIVITYInthissection,wediscussthebenetsofusingaDSLanditsimpactonprogrammabilityandproductivity.
Forthispurpose,weimplementtheexamplealgorithmsinSection2bothwithGreen-MarlaswellasdirectlywiththeGiraphAPI.
Figure3showstheresultingGreen-Marlprograms.
Notethatthegureshowstheen-tireprogramsandnotexcerpts.
3.
1CurrentBenetsIntuitiveProgrammingModelGraphanalysisalgorithmsimplementedinGreen-Marlarequiteintuitive,ascanbeseenfromFigure3.
Mostnoticeably,Green-Marlprogramsarewritteninanimperative,shared-memorystyle,whichissimilartohowtheoriginalgraphalgorithmswerespec-ied.
Indeed,theGreen-MarlimplementationofPageRankandTriangleCountingarealmostidenticaltotheabstractalgorithmde-scriptionintheiroriginalpublications[12,14].
Tothecontrary,thePregelprogrammingmodelmayrequiremod-icationofgraphalgorithms.
Forinstance,intheoriginalPageR-ankalgorithm,eachvertexreadsvaluesfromitsincomingneigh-bors(line10).
However,inthePregelimplementation,eachvertexsendsoutitsPageRankvalue(dividedbyitsdegree)totheoutgoingneighbors://vertexclasspublicvoidcompute(Iterablemsgs){//receivedmessages;for(DoubleMsgm:msgs)sum+=m.
get();doublenewV=((1.
0-d)/N+d*sum);//newPageRank//sendmessagesto'out-nbrs'sendMsgToNbrs(newDoubleMsg(newV/numEdges()));.
.
.
}Inotherwords,theoriginalalgorithmusesdata-pullingwhilethePregelAPIonlyallowsdata-pushing;thereforethealgorithmhastobemodiedaccordingly.
TheGreen-Marlcompilerautomati-callyhandlessuchtransformationbetweendata-pullinganddata-pushing1ProcedurePageRank(G:Graph,e,d:Double,max:Int2;pg_rank:Node_Prop){3Doublediff;4DoubleN=G.
NumNodes();5Intcnt=0;6Do{7diff=0;8Foreach(t:G.
Nodes){9Doubleval=(1-d)/N+10d*Sum(w:t.
InNbrs){w.
pg_rank/w.
Degree()};11diff+=|val-t.
pg_rank|;12//pg_rankupdatedsynchronouslyaftert-loop13t.
pg_ranke)&&(cntu){23If(w.
HasEdgeFrom(u)||w.
HasEdgeTo(u))24T++;25}}}26ReturnT;27}(b)TriangleCounting28ProcedureRandom_Walk_Sampling(G:Graph,29p_sample:Float,p_jump:Float,30num_tokens:Int;Selected:Node_Prop){3132//temporaryproperties33Node_PropToken,TokenNxt;3435//Initialize36G.
Token=0;37G.
TokenNxt=0;38G.
Selected=False;39LongN=G.
NumNodes()*p_sample;//samplesize4041//Chooseinitialnodesandgivethemtokens42Node_SetS;43While(S.
Size()0){53//Increasesamplesizeatfirsttokenreception54If(!
n.
Selected){55n.
Selected=True;56size++;57}58While(n.
Token>0){//randomlyshootouttokens59n.
Token--;60If((n.
Degree()==0)||(Uniform()N)halt();}}Asimilarstatemachinehastobeimplementedforthevertexclass.
Suchstatemanagementcodegrowsevenlongerifthetargetalgo-rithmiscomposedofmultiplecomputationkernels.
Finally,intuitiveprogrammingisalsoveryvaluableforthesakeofsoftwaremaintenance.
Green-MarlprogramslikeinFigure3areshortandintuitivesothattheycanbeeasilyunderstoodbypeoplewhodidn'toriginallywritetheprogramandthuscanbemaintainedoveralongperiodoftime.
AutomaticApplicationofOptimizationsWhilegeneratingtheGiraphprograms,theGreen-Marlcompilerautomaticallyappliesasetofoptimizationsthat(1)enableexpress-ingcertainfunctionalitieswiththeGiraphAPIor(2)enhancetheperformanceofthegeneratedcode.
Forexample,theTriangleCountingalgorithmrequiresthein-comingneighborsaswellastheoutgoingneighbors(Line23inFigure3).
However,theoriginalPregel(Giraph)APIonlyprovidesoutgoingneighbors.
Inthiscase,thecompilerautomaticallyinsertsextrastatesintheFSMthatcomputeincomingneighbors.
Thatis,atthersttimestepeachvertexsendsitsownIDtoitsoutgoingneighbors,andatthenextstepeachvertexconstructtheincom-ingneighborlistfromthosemessages.
Notethatourcompilerin-sertstheseextrastatesonlyifthegivenprogramrequiresincomingneighbors.
Asanotherexample,forPageRankandRandomWalkSampling,thecompileridentiesthattherearemessagepassingcomputationkernelsinsideawhileloop.
Insuchacase,thecompilerautomat-icallycombinesthemessagereceivingofthelastkernelwiththemessagesendingofrstkernelinthewhile-loop.
Thisoptimiza-tionreducesthenumberoftime-stepsinthegeneratedprogram.
Certainly,theprogrammerscouldhaveappliedthesamepro-grammingtricksmanuallywhentheyhand-codetheGiraphappli-cation.
However,sincethesetechniquesareautomaticallyappliedbyourcompiler,eventheprogrammerswhoarenotawareofthesetechniquescanhavethesamebenets.
FreeofBoilerplateCodeTable1comparesthesizeofGreen-Marlprogramsandhand-writtenGiraphprogramsusingbothLines-of-Codeandnumberofbytes.
Noticeably,GiraphprogramsarealotlongerthanGreen-Marlprograms,eventhoughtheyimplementthesamealgorithm.
MostofthesizedifferencecomesfromboilerplatecoderequiredbytheGiraphAPI.
Forinstance,theprogramhastodeneamessageandvertex/edgedataclass,implementtheirserializationmethods,registeraggregators,declareinputgraphloadersandoutputwrit-ers,andspecifyjobcongurations.
Inourapproach,theGreen-Marlcompilerautomaticallygeneratesallthislengthyboilerplate,therebyprovidingadditionalproductivitybenets.
3.
2TobeImprovedTheLearningCurveGreen-Marl|ManualGiraphAlgorithmLOC#BytesLOC#BytesPageRank195161886633TriangleCounting143001686272Random-WalkSampling53116844416766Table1:ComparisonofLineofCodes(LOC)andnumberofbytesbetweenGreen-MarlandmanualGiraphimplementa-tionofthesamealgorithmsBeingaDSL,Green-Marlrequirestheusertoundergoacertainamountoflearning.
Thelearningcurveisnotunreasonablysteep;thelanguagelookslikejustanotherdescendantofCanddoesn'taddtoomanynewconceptsotherthangraph-specicbuilt-intypesandoperatorsHowever,sinceGreen-Marlisaresearch-levellanguageunderdevelopment,itcertainlylacksadetaileddocumentationorausercommunity,whichhampersthelearningexperiencequiteabit.
Non-Giraph-CompatibleAlgorithmsWhenthegivenalgorithmisaninherentlysequentialone,how-ever,theGreen-MarlcompilercannotturnitintoaGiraphprogram.
ThecompilersimplyreportsanerrorwhenitfailstotransformthegivenprogramintoaPregel-compatibleform(Section2).
Insuchacase,itisuptotheprogrammertocreateanewalgorithmthatiscompatiblewithPregel.
Amongourexamples,theoriginalRandomWalkSamplingal-gorithm[8]isinherentlysequentialandthuscannotbecompiledintoGiraph.
Wecreatedaparallelversionofthisalgorithmbyin-troducingtheconceptoftokens;weletmultipleagents,asmanyasnumTokens,walkthegraphinparallel,whereanyvertexhavingatokenbecomesanagent.
NotethattheresultingGreen-Marlpro-gram(Figure3.
(c))isstillwritteninanimperative,shared-memorystylewithoutanyboilerplatecode.
AlthoughauserinterventionisdisappointingasitdiminishesthebenetoftheDSL,itisinevitableforthosealgorithmsthatareinherentlynotPregel-compatible.
Theuserhastocreateanewal-gorithmwhichcanbetargetedtotheunderlyingcomputationalsys-tem.
Indeed,thetwosamplingalgorithms(theoriginalalgorithmandourparallelizedone)arenotstrictlyequivalent;theoriginalal-gorithmproducesasampleoftheexactsize,whiletheparalleloneproducesasamplethatislarger(bythenumberoftokens)thantherequestedsize.
Currently,wearetakingapracticalapproachtothisissue.
(1)Thecompilerrecognizesfrequent(non-Pregel-compatible)patternsingraphalgorithmsandtransformsthemintoPregel-compatibleonesautomatically.
(2)WedeneasubsetofGreen-Marlprogram-mingpatternsthatareguaranteedtobecompilableintoGiraph.
Therefore,thecompilerpointsoutnon-Pregel-compatibleportionsoftheprogramsothatuserscanre-writetheseportionsusingthePregel-compatibleGreen-Marlsubset;thusthecompilerishelpingtheuserwiththeinevitablealgorithmmodication.
CompilerMaturityConversely,onecanaskaboutthecompletenessofourapproach:caneveryPregel-compatiblealgorithmbecompiledfromitsGreen-MarldescriptionFundamentally,webelievethattheanswerisyes1;however,thecurrentcompilerneedsmoreimprovementasitdoesnotsupportallthefeaturesrequiredforthePregel-compatiblesubset.
Specically,thecurrentcompileronlyprovideslimitedsup-portforcollectiondatatypesandaggregationoncollectionsforGiraphcompilation.
User-deneddatatypesarealsonotsupportedyet.
Nevertheless,thesearenotfundamentalobstaclesbutmostly1wecanmakeamappingfromeverymessage-passingcallinPregeltoawritestatementinGreen-MarlFigure4:RuntimeComparisonBetweenHand-codedandCompiler-GeneratedGiraphImplementations:PR,TCandRWstandsforPagerank,TriangleCountingandRandomWalkSampling,respectivelyengineeringissuesthatcanbeimproveduponovertime.
Overall,webelievethatprogramminginGreen-Marlcanprovideconsiderableproductivitybenetsforgraphanalysisasitmatures.
Easeofprogrammingisespeciallyusefulwhentheuserwantstoexploredifferentgraphalgorithmsinashorttimeandquicklytryoutnewgraphalgorithms,orwhentheywishtocustomizeanex-istinggraphalgorithmfortheirownpurpose.
4.
QUALITYOFTHECOMPILERGENER-ATEDIMPLEMENTATIONInthissection,wediscussthequalityofthecompiler-generatedGiraphimplementations,focusingontheirperformance.
Forthispurpose,weevaluateboththehand-codedandcompiler-generatedGiraphimplementationofthethreeexamplealgorithmsinSec-tion2.
32onanx86clusterandexaminetheirperformancechar-acteristics.
Ourclusteriscomposedof8machines,witheachmachinehav-ing16cores(32HWthreads)and256GBofmemory,runningCen-tOS6.
Weusedthe1.
0.
4releaseofHadoopandasnapshotofGiraph0.
2.
203takenonFeb28th,2013.
AllapplicationswerecompiledwithOraclejdk1.
7.
017andranwiththecorrespond-ingOracleHotSpot64-bitJVM.
Weonlyinstantiate10workerspermachine,i.
e.
80workersintotal;sinceeachworkerprocessalreadyconsistsofafewthreads(forcomputationandcommuni-cation),addingmoreworkerspermachineimpactedperformancenegatively.
Weranthealgorithmsontwopublicgraphinstances;oneisawebgraphinstance(LiveJournal)fromtheSNAPgraphreposi-tory[3],andtheotherisasocialgraphextractedfromTwitteruserrelationships[7].
LiveJournalismoderateinsize(4millionverticesand34millionedges),whileTwitterislarger(40millionverticesand1.
8billionedges).
Figure4depictstheresultingexecutiontimeofeachalgorithmonthesegraphinstanceswhilethey-axisshowstherelativeexecutiontime,i.
e.
normalizedbyexecutiontimeofthehand-codedgiraphapplication.
4.
1CurrentBenetsDecentPerformanceofCompiler-GeneratedImplementationsAscanbeseeninFigure4,thecompiler-generatedGiraphim-plementationsperformfairlyclosetothehand-writtenones.
Eventhoughthecompiler-generatedimplementationswereslowerthanmanualonesforPagerankandTriangleCounting,thedifference2Wemodiedthetrianglecountingalgorithmforthesakeofthisexper-iment,sincethereisafundamentalissuerelatedtotheBSPcomputationmodelforthisalgorithm.
SeeSection4.
2fordetails.
wasonlyabout1015%.
InthecaseofRandomSampling,therewasvirtuallynodifferenceatall.
Thesmalldifferenceistheresultofseveralcompileroptimiza-tionsappliedduringcodegeneration.
Forinstance,thecompilermergesmultiplestatesintooneaslongasthereisnodatadepen-dencybetweenthem,reducingthenumberofglobalbarriers.
Wereferinterestedreaderstoamoredetailedreportthatexplainshowourcompilerworks[6].
Reasonsfortheremainingperformancedifferencewillbediscussedinthenextsubsection.
ThecaseoftheTriangleCountingalgorithmisworthacloserlookasthersthand-codedversionwasactuallyslowerthanthecompiler-generatedone.
Infact,itdidnotevennishfortheTwit-tergraphinstanceinareasonabletime.
Thiswasbecausethepro-grammerused(fromhabit)EdgeListVertexasbaseclass,whichhasasmallmemoryfootprintandprovidesfastiterateoverneigh-bors,butisveryslowwhencheckingwhetherthereisneighborhoodrelationship.
Thismistakewasnotevidentduringtheinitialtestingphaseassmallgraphinstancesaretypicallyused.
NotethatwithDSL-apporach,theuserdoesn'thavetoworryaboutsuchissuesbutcanrelyonthecompilertohandlethem.
ReadabilityofGeneratedCodeWeremindthereadersthatthecompiler-generatedcodeisjustaplainJavaprogramthatusestheGiraphAPI.
Infact,thegeneratedcodeisfairlyreadable.
Thecodeisappropriatelyindentedandvari-ablenamesintheoriginalGreen-Marlprogramsarepreservedasmuchaspossible.
Moreover,eachgeneratedJavacodeblockcon-tainsthecommentsoftheoriginalGreen-MarlsourcefromwhichtheJavacodeisgenerated.
Forpracticalreasons,westrivetoensurethattheprogrammerisabletoreadthegeneratedcodeandeventoeditthegeneratedcode;forinstanceifGiraphintroducesanewfeatureorAPI,theprogrammercanmanuallyapplyhot-xestothegeneratedcode,beforethecompilergetsupdated.
4.
2ToBeImprovedSub-optimalPerformanceStill,thereisacertainperformancedifferencebetweenthehand-codedGiraphimplementationsandthecompiler-generatedones,forPageRankandTriangleCountingalgorithminFigure4.
Inbothcases,theperformanceoverheadcamefromthemoregenericap-proachtakenbythecompiler.
ForthecaseofPageRank,thecompilerisnotutilizingthemes-sagecombinerAPIatall,sinceingeneraltherecanbemultiplemessagetypesthatareused(eveninsideonekernel),whiletheGi-raphAPIonlyallowsonecombinerclasswhichwillbeappliedtoeverymessage.
However,itispossibletoextendthecompilersuchthatitcanhandlespecialcasesdifferently.
Inthisparticularcase,thecompliercanrecognizethatthereisonlyonetypeofmessagebeingusedforsummingvaluesatthedestination,andthusitcanusethesummationcombiner.
ForthecaseofTriangleCounting,themanualimplementationusesHashMapVertex,whichhasalargememoryfootprint,isslowforiteration,butfastincheckingneighborhoodrelationships.
Ontheotherhand,thecompiler-generatedimplementation,asatrade-off,usesEdgeListVertexaccompaniedwithamemory-efcientsidestructure(i.
e.
asortedprimitivearray),whichisgoodforbothiterationandneighborhoodchecking.
Again,thecompilercanrecognizethatthegivenprogramisaspecialcasewhichhasonlyonekernelthatdoesnothingbutneighborhoodchecking,anditcandecidetouseHashMapVertexinstead.
Fundamentally,theGreen-MarlDSLexposesenoughse-manticinformationforthecompilertheapplythesespecializationtechniques.
LimitationoftheTargetRuntimeSystemTheGreen-MarlgeneratedimplementationsarestillboundbythefundamentallimitationsoftheunderlyingGiraphruntime.
Forinstance,aGiraphimplementationoftheoriginalTriangleCount-ingalgorithminFigure3whenusedwiththeTwittergraphin-stance,(orevenwiththeLiveJournalinstance),simplyfailstoexe-cute.
Thisisbecausethesegraphsshowahighlyskeweddegreedistri-butionsuchthatthereexistasmallnumberofhigh-degreenodes,i.
e.
nodesthathavemillionsofneighbors.
NotethattheGiraphim-plementationoftheoriginalalgorithmproducesO(degree2)num-berofmessagespereachvertex.
Forverticeswithamillionsofneighbors,thisnumberreachesthetrillions.
Workerprocessesrunoutofmemory,sinceGiraphkeepsthesemessagesin-memoryun-tilthenexttime-stepbegins.
Inourexperiment,weonlycountedtrianglesthatoriginatedfromsmalldegreenodes(i.
e.
smallerthan100)sothatwecanstillevaluateourapproach.
Itwouldalsobepossible,though,forthecompilertowarntheusersinadvance,ifitseesapatternthatmightresultinalow-performingimplementationduetomessagecountexplosion.
5.
OTHERBENEFITSANDISSUESDebuggingNewAlgorithmsGreen-Marlcanprovideadditionalbenetsfordebuggingnewgraphalgorithms.
Firstofall,webelievethatprogrammingusingGreen-Marlwouldbelesserror-pronethandirectlyprogrammingGiraph,sinceitismoreintuitiveandmoresuccinct.
Moreover,sincetheGreen-Marlcompilercangenerateafastrunning(parallel)shared-memoryC++programfromthesamein-put[5],thealgorithmdevelopercantesttheiralgorithminafastwaywithsmall-sizedgraphsonasinglemachine,withouthavingtouseaHadoopclusteratall.
Thisallowsformuchshorterturn-aroundtimeintestinganewalgorithm.
However,theusermaystillsuspectthatthecompilerinjectser-rorsincodegeneration.
Currently,weallowtheuserstoinspectthegeneratedcodesinceitisveryreadable.
Thecompilercanalsobeaskedtoprintthecodeateachintermediatecompilationstep.
FutureBenetsforPortabilityThiseffortfocusedonthetranslationofabstractGreen-Marlpro-gramsintoonespecicanalysisframework:Giraph.
However,Green-MarlisnotdenedonlyfortheGiraphframework,butforgeneralgraphalgorithms.
ThereforeonecanimagineaGreen-MarlcompilerthattranslatesthesameGreen-Marlprogramintootherdifferentgraphprocessingframeworks(e.
g.
GraphLab[9]andTrinity[16])aswellasGiraph.
Sincethesesystemsgenerallyshowdifferentperformancecharac-teristicsdependingonthealgorithmorsize/shapeofinputgraph,usersmaychoosetheappropriateruntimefortheirpurpose.
Indeed,asofnow,theusercangenerateashared-memoryimple-mentationandGiraphimplementationfromthesameGreen-Marlprogram.
Ifthesizeofthetargetgraphtsinasinglemachine'smemoryspace,onecanusetheshared-memoryimplementationwhichisgenerallyfasterthanGiraphexecutionasitdoesnotsufferfromcommunicationcost.
SystemIntegrationAnotherpracticalissueishowtointegratetheGreen-MarlDSLwiththerestofacompletedataanalysissystem,inwhichgraphanalysisisonlyonephaseintheanalysisow.
Acompletedataanalysissystemmayincludepersistentstorageofgraphinstancesorthematerializationofatemporarygraphoutofrawdata.
Itmayalsoincludeotheroff-lineanalysisengines(e.
g.
Hadoopjobsforlteringoutuninterestingdata)aswellasaconnectionmechanismtoon-linesystemsthatmakeuseoftheresultofoff-lineanalysis.
Currently,ourcompiler-generatedprograms(bothGiraphandshared-memoryapplications)loaddataleseitherfromthelocallesystemorfromHDFSassumingthatthegraphisavailableasoneofafewpopularformatssuchasanadjacencylist.
Supportingmoreformatsoruser-denedloaderswouldmakeitmoreexible.
Ontheotherhand,wecanalsothinkofextendingtheDSLsuchthatitcandeclareinputsandoutputsofagraphanalysisinamoreabstractanddeclarativeway.
Forinstance,theinputcanbeloadedfromadatale,databaseorevendirectlypipelinedfromotheranal-ysis.
SimilarapproachesofothersuccessfulDSLS(e.
g.
PigandLINQ)canbeappliedforextendingGreen-Marlaswell.
6.
CONCLUSIONThispaperreportedonourearlyexperiencewithusingGreen-Marlasafront-endlanguageforlarge-scalegraphanalysis,bycompilingGreen-MarlprogramsintoGiraphapplications.
Weob-servedthatGreen-Marlprogramsareconciseandintuitivewhiletheperformanceofthecompiler-generatedcodecloselymatcheshand-tunedapplications.
Overall,wethinkthisapproachcouldprovideconsiderableproductivitybenetsasthecompilermatures.
AcknowledgementsWeappreciateSamShah,RoshanSumbalyandEvionKimatLinkedInfortheircollaborationinthisstudy.
7.
REFERENCES[1]ApacheGiraphProject.
http://giraph.
apache.
org.
[2]Green-MarlDSL.
http://github.
com/stanford-ppl/Green-Marl.
[3]Stanfordnetworkanalysislibrary.
http://snap.
stanford.
edu/snap.
[4]ApacheHadoop.
http://hadoop.
apache.
org/.
[5]S.
Hong,H.
Cha,E.
Sedlar,andK.
Olukotun.
Green-Marl:ADSLforEasyandFfcientGraphAnalysis.
InASPLOS,2012.
[6]S.
Hong,S.
Salihoglu,J.
Widom,andK.
Olukotun.
TechReport:CompilingGreen-MarlintoGPS.
http://ppl.
stanford.
edu/papers/tr_gm_gps.
pdf.
[7]H.
Kwak,C.
Lee,H.
Park,andS.
Moon.
WhatisTwitter,asocialnetworkoranewsmediaWWW'10,2010.
[8]J.
LeskovecandC.
Faloutsos.
Samplingfromlargegraphs.
InKDD,2006.
[9]Y.
Low,D.
Bickson,J.
Gonzalez,C.
Guestrin,A.
Kyrola,andJ.
Hellerstein.
DistributedGraphLab:AFrameworkforMachineLearningandDataMiningintheCloud.
ProceedingsoftheVLDBEndowment,5(8):716–727,2012.
[10]G.
Malewicz,M.
H.
Austern,A.
J.
Bik,J.
C.
Dehnert,I.
Horn,N.
Leiser,andG.
Czajkowski.
Pregel:ASystemforLarge-scaleGraphProcessing.
InSIGMOD'10.
ACM.
[11]C.
Olston,B.
Reed,U.
Srivastava,R.
Kumar,andA.
Tomkins.
PigLatin:ANot-so-foreignLanguageforDataProcessing.
InSIGMOD.
ACM,2008.
[12]L.
Page.
Methodfornoderankinginalinkeddatabase,Sept.
42001.
USPatent6,285,999.
[13]S.
SalihogluandJ.
Widom.
GPS:GraphProcessingSystem.
http://infolab.
stanford.
edu/gps.
[14]S.
SuriandS.
Vassilvitskii.
Countingtrianglesandthecurseofthelastreducer.
InWWW.
ACM,2011.
[15]A.
Thusoo,J.
Sarma,N.
Jain,Z.
Shao,P.
Chakka,S.
Anthony,H.
Liu,P.
Wyckoff,andR.
Murthy.
Hive:AWarehousingSolutionOveraMap-ReduceFramework.
ProceedingsoftheVLDBEndowment,2(2):1626–1629,2009.
[16]Trinity.
http://research.
microsoft.
com/en-us/projects/trinity/default.
aspx.
[17]L.
Valiant.
Abridgingmodelforparallelcomputation.
CommunicationsoftheACM,33(8):103–111,1990.

易探云香港云服务器价格多少钱1个月/1年?

易探云怎么样?易探云是目前国内少数优质的香港云服务器服务商家,目前推出多个香港机房的香港云服务器,有新界、九龙、沙田、葵湾等机房,还提供CN2、BGP及CN2三网直连香港云服务器。近年来,许多企业外贸出海会选择香港云服务器来部署自己的外贸网站,使得越来越多的用户会选择易探云作为网站服务提供平台。今天,云服务器网(yuntue.com)小编来谈谈易探云和易探云服务器怎么样?具体香港云服务器多少钱1个...

Linode十八周年及未来展望

这两天Linode发布了十八周年的博文和邮件,回顾了过去取得的成绩和对未来的展望。作为一家运营18年的VPS主机商,Linode无疑是有一些可取之处的,商家提供基于KVM架构的VPS主机,支持随时删除(按小时计费),可选包括美国、英国、新加坡、日本、印度、加拿大、德国等全球十多个数据中心,所有机器提供高出入网带宽,最低仅$5/月($0.0075/小时)。This month marks Linod...

标准互联(450元)襄阳电信100G防御服务器 10M独立带宽

目前在标准互联这边有两台香港云服务器产品,这不看到有通知到期提醒才关注到。平时我还是很少去登录这个服务商的,这个服务商最近一年的促销信息比较少,这个和他们的运营策略有关系。已经从开始的倾向低价和个人用户云服务器市场,开始转型到中高端个人和企业用户的独立服务器。在这篇文章中,有看到标准互联有推出襄阳电信高防服务器100GB防御。有三款促销方案我们有需要可以看看。我们看看几款方案配置。型号内存硬盘IP...

targetframework为你推荐
伪装微信地理位置微信地理位置伪装软件怎么定位到微信万网核心代理万网代理商?中国万网认证核心分销商?渗透测试渗透测试的专业服务公章制作如何用photoshop制作公章伪静态什么是伪静态伪静态有何作用ps抠图技巧ps抠图多种技巧,越详细越好,急~~~~~~~虚拟机软件下载谁有虚拟机软件的网址要好用的服务器连接异常手机服务器连接异常网站地图制作我想给网站做网站地图不知道怎么做的,请教高手!域名库想自己买一个域名,然后自己做一个网站,挂上去。请问基本流程是什么样的?
国内vps 网站域名备案 工信部域名备案系统 云网数据 免费主机 香港主机 美国主机论坛 申请空间 网通代理服务器 亚洲小于500m 网通ip 佛山高防服务器 上海服务器 国外ip加速器 创建邮箱 lamp兄弟连 阿里云手机官网 重庆联通服务器托管 windowsserver2012r2 web服务器有哪些 更多