RelationalApproachforShortestPathDiscoveryoverLargeGraphsJunGaoRuomingJin§JiashuaiZhouJeffreyXuYuXiaoJiangTengjiaoWangKeyLaboratoryofHighCondenceSoftwareTechnologies,EECS,PekingUniversity§DepartmentofComputerScience,KentStateUniversityDepartmentofSystemsEngineering&EngineeringManagement,ChineseUniversityofHongKong{gaojun,zjiash,jiangxiao,tjwang}@pku.
edu.
cn,jin@cs.
kent.
edu,yu@se.
cuhk.
edu.
hkABSTRACTWiththerapidgrowthoflargegraphs,wecannotassumethatgraphscanstillbefullyloadedintomemory,thusthedisk-basedgraphoperationisinevitable.
Inthispaper,wetaketheshortestpathdiscoveryasanexampletoinvestigatethetechniqueissueswhenleveragingexistinginfrastructureofrelationaldatabase(RDB)inthegraphdatamanagement.
Basedontheobservationthatavarietyofgraphsearchqueriescanbeimplementedbyiterativeoperationsincludingselectingfron-tiernodesfromvisitednodes,makingexpansionfromtheselectedfrontiernodes,andmergingtheexpandednodesintothevisitedones,weintroducearelationalFEMframeworkwiththreecorre-spondingoperatorstoimplementgraphsearchtasksintheRDBcontext.
WeshownewfeaturessuchaswindowfunctionandmergestatementintroducedbyrecentSQLstandardscannotonlysim-plifytheexpressionbutalsoimprovetheperformanceoftheFEMframework.
Inaddition,weproposetwooptimizationstrategiesspecictoshortestpathdiscoveryinsidetheFEMframework.
First,wetakeabi-directionalsetDijkstra'salgorithminthepathnding.
Thebi-directionalstrategycanreducethesearchspace,andsetDi-jkstra'salgorithmndstheshortestpathinaset-at-a-timefashion.
Second,weintroduceanindexnamedSegTabletopreservethelo-calshortestsegments,andexploitSegTabletofurtherimprovetheperformance.
Thenalextensiveexperimentalresultsillustrateourrelationalapproachwiththeoptimizationstrategiesachieveshighscalabilityandperformance.
1.
INTRODUCTIONTherapidgrowthofgraphdataraisessignicantchallengestographmanagement.
Nowadays,thegraphdataareusedinnu-merousapplications,e.
g.
,webgraphs,socialnetworks,ontologygraphs,transportationnetworks.
Thesegraphsarealwaysexceed-inglylargeandkeepgrowingatafastrate.
Whenthegraphcannottintothememory,theexistingin-memorygraphoperationshavetobere-examined.
Weneedthebuffermechanismtoloadgraphblocksondemand,inwhichcasetheI/Ocostbecomesthekeyfac-torintheevaluationcost.
Wealsorequireexibleapproachestoaccessnodesandedges.
Moreover,thestabilityofgraphmanage-mentsystemisanotherconcern.
Graphsearchisanimportantandprimitiverequirementonthegraphmanagement,whichseeksasub-graph(s)meetingthespe-cicpurposes,suchastheshortestpathbetweentwonodes[12],theminimalspanningtree[20],thepathfortravelingsalesman[9],andthelike.
Thispaperfocusesontheshortestpathdiscoveryproblemduetotworeasons.
First,theshortestpathdiscoveryplaysakeyroleinmanyapplications.
Forexample,theshortestpathdiscov-eryinasocialnetworkbetweentwoindividualsrevealshowtheirrelationshipisbuilt[21].
Second,theshortestpathdiscoveryisarepresentativegraphsearchquery,whichhasasimilarevaluationpatterntootherqueries[20,9].
Thecurrentdisk-basedmethodsfacelimitationstosupportthegeneralgraphsearchqueries.
Theindexontheexternalmem-oryforshortestpathdiscoveryhasbeendesigned,butthemethodisrestrictedtoplanargraphs[8].
WenoticethattheMapReduceframework[16,3]anditsopensourceimplementationHadoop[1]canprocesslargegraphsstoredinthedistributedlesystemoveraclusterofcomputers.
However,duetothelackofschemaandin-dexmechanism,itisdifculttoaccessgraphsinaexibleway.
Inaddition,itisexpensivetosupportthedynamicchangesoftheorig-inalgraph,atleastinthecurrentHadoopdistributedlesystem[1].
Someothergraphoperations,suchasminimum-cut[5]andcliquecomputation[15],canbeevaluatedonthepartiallyloadedgraphwhenthequalityofresultsisassuredtheoreticallyortheapproxi-materesultsareallowed[15,5].
However,itisdifculttoextendthesemethodstoothergraphoperations.
RDBprovidesapromisinginfrastructuretomanagelargegraphs.
Aftermorethan40years'development,RDBismatureenoughandplaysakeyroleintheinformationsystem.
WecannoticethatRDBandthegraphdatamanagementhavemanyoverlappingfunctional-ities,includingthedatastorage,thedatabuffer,theaccessmethods,andthelike.
Indeed,RDBprovidesabasicsupporttothegraphstorageandtheexibleaccesstothegraph.
Inaddition,RDBhasalreadyshownitsexibilityinmanagingothercomplexdatatypesandsupportingnovelapplications.
Forexample,RDBcansupporttheXMLdatamanagement[17,13],andcanbeusedinreachabil-ityqueryandbreadth-rst-search(BFS)inthegraphmanagement[23,22].
ThestatisticaldataanalysisanddataminingoverRDBhavealsobeenstudiedin[6,7].
However,itisachallengingtasktotranslatethegraphsearchmethodsandimprovetheperformanceintheRDBcontextduetothesemanticmismatchbetweenrelationaloperationandgraphop-eration.
Thedataprocessinginthegraphsearchisalwayscomplex.
Wemayneeddologicandarithmeticcomputation,makechoicesbasedonaggregatevalues,andrecordnecessaryinformationinthesearching[12,20,9].
Atthesametime,thegraphandrunningtime358datahavetobestoredinthetablesinRDBrst,andonlyrestrictedoperations,including,projection,selection,join,aggregation,etc,areallowedtoperformontables[14].
Inthispaper,weinvestigatethetechniquesusedintheshortestpathdiscoveryintheRDBcontext.
Wewishourmethodcannotonlydeliverbenetstotheshortestpathcomputation,butalsoshedlightsontheSQLimplementationforothergraphsearchqueriesandgraphanalysis.
Specically,thecontributionsofthispapercanbesummarizedasfollows:Basedontheobservationthatmanygraphsearchqueriescanbeevaluatedbyiterativeoperationsincludingselectingfron-tiernodesfromvisitednodes,makingexpansionfromthefrontiernodes,andmergingtheexpandednodesintothevis-itednodes,weintroduceagenericgraphprocessingframe-workFEMwiththreekeyoperatorsF,EandM-operator.
Wendnewfeatures,suchaswindowfunctionandmergestatementintroducedbyrecentSQLstandardscanbeusedtosimplifytheexpressionandimprovetheperformance.
WealsoshowhowtheshortestpathisdiscoveredusingtheFEMframework.
(Section3)WeproposetwooptimizationsspecictotheshortestpathdiscoveryinsidetheFEMframework.
First,wetakeabi-directionalsetDijkstra'smethod.
Thebi-directionalsearch-ingcanndthepathwithlesssearchspace.
SetDijkstra'smethodallowsallnodeswiththesameminimaldistancetobeexpandedinoneoperation,whichismoresuitableinRDBcontext.
Second,weintroduceanindexnamedSegTabletopreservelocalshortestsegments,andexploitSegTabletofurtherimprovetheperformancebybalancingreductionofsearchspaceandpromotionofset-at-a-timequeryevalua-tion.
(Section4)Weconductextensiveexperimentsoversyntheticandreal-lifedata.
Theresultsshowthatourrelationalapproachhashighscalabilityinhandlinglargegraphs.
WealsoseethatthenewfeaturesofSQLstandard,aswellasouroptimizationscanimprovetheperformancesignicantly.
(Section5)2.
PRELIMINARYInthissection,werstgivetherelatednotationsandrelationalrepresentationofgraph,andthenshownewSQLfeaturesusedinourpathndingmethod.
2.
1GraphNotationsThispaperstudiesweighted(directedorundirected)graphs.
LetG=(V,E)beagraph,whereVisanodesetandEisanedgeset.
Eachnodev∈Vhasauniquenodeidentifer.
Eachedgee∈Eisrepresentedbye=(u,v),u,v∈V.
Foranedgee=(u,v),eisu′soutgoingedgeandv′sincomingedge.
Eachedge(u,v)∈Ehasitsnon-negativeweightw(u,v).
Theminimaledgeweightinagraphisdenotedbywmin.
Apathp=u0;uxfromu0touxisasequenceofedges(u0,u1),(u1,u2)ux1,ux),whereei=(ui,ui+1)∈E(0≤iAggregation>Over([PartitionBy[OrderBy])2:MergestatementMergeIntotablenameUsingtableOn(condition)WhenMatchedThenUpdateSetcol1=val1[,col2=val2.
.
.
]WhenNotMatchedThenInsert(col1[,col2Values(val1[,val2Thewindowfunction1,introducedbySQL2003,returnsag-gregateresults.
Comparedtotraditionalaggregatefunctionswhichreturnonevalueforonetupleset,windowfunctionobtainstheag-gregateresultforeachtupleintheset,andthenthenon-aggregateattributesareallowedtobealongwiththeaggregateonesintheselectclause,eveniftheyarenotinthegroupbyclause.
Inaddi-tion,windowfunctionsupportstheaggregatefunctionsrelatedtothetuplepositioninasortedtuplesub-set,suchasrank,rownum,andthelike.
ThesyntaxforwindowfunctioncanbedescribedinListing1(1).
Themergestatement2addsnewtuplesorupdatestheexistingonesinatargettablefromasource(SQLresults,avieworatable).
ItisofciallyintroducedbySQL2008,butissupportedearlierbydifferentdatabasevendorsduetotheneedinloadingdataintodatawarehouse.
Themainpartofmergestatementspeciesactions,likeinsert,deleteandupdateunderdifferentrelationshipsbetweenthesourceandtargettable.
Comparedtothegeneral-purposeoneup-datestatementfollowedbyaninsertstatement,themergestatementismorespecic,whichindicatesthatitcanbeevaluatedfasterthantwostatements.
ThesyntaxformergestatementcanbereferredinListing1(2).
3.
RELATIONALFEMFRAMEWORKFORSHORTESTPATHDISCOVERYInthissection,weintroduceagenericgraphsearchframeworkFEMandleverageittorealizetheclassicalDijkstra'sshortestpathdiscoveryalgorithmonRDB.
Inthenextsection,wewillexploreseveralnewtechniquestospeeduptheshortestpathdiscoveryun-dertherelationalFEMframework.
1http://en.
wikipedia.
org/wiki/SQL:20032http://en.
wikipedia.
org/wiki/Merge(SQL)3593.
1AGenericProcessingFrameworkforGraphSearchGraphsearchqueriesseekthesub-graph(s)meetingthespecicrequirements.
Forexample,reachabilityqueryanswerswhetherthereexistsapathbetweentwogivennodes[11].
Theshortestpathquerylocatestheshortestpathbetweentwogivennodes[19,2].
Theminimalspanningqueryreturnsatreecoveringallnodeswiththeminimalsumofedgeweights[20].
Thesalesmanpathquerygetsashortestpossibletourthatvisitseachcityexactlyonce[9].
Thegraphpatternmatchretrievesallsub-graphs,eachofwhichisisomorphictothegivengraphpattern[25].
Manygraphsearchalgorithmsshowacommonpattern.
Duetothelargesearchspace,theyalwaysutilizegreedyideas.
Inaddi-tion,weobservethatmostofthesegreedyalgorithmscantintoagenericiterativeprocessingstructure.
AvisitednodesetA1isinitializedrstaccordingtodifferentpurposes.
Thenaniterativesearchingstarts.
WeillustratethekthiterationinFigure2.
LetthevisitednodesAkrecordallthenodesencounteredinthegraphsearchsofar;thefrontiernodesFkareselectedfromthevisitednodeswiththecertaincriteria(applicationdependent,FkAk);theexpandednodesEkarethenextvisitednodesfromthefrontiernodes(generallyneighboringsubsetofFk);andthenextvisitedonesAk+1areobtainedbymergingthenewlyexpandednodesEkandAk.
Theiterationscontinueuntilthetargetsub-graph(s)canbediscovered.
WerefertosuchagenericprocessingstructureasaFEMframework.
FSelectExpandEdgesEk+1kkMergeVisitednodesAExpandednodesAkVisitednodesAkVisitednodesFrontiernodesFigure2:ConversionBetweenNodesinthekthIterationBeforeweproceedtothedetailsofimplementingtherelationaloperatorsforFEMframework,werstexploitittobrieydescribethreerepresentativegraphsearchalgorithms:Dijkstra'sshortestpathalgorithm,Prim'sminimalspanningtreealgorithm,andgraphpatternmatching.
IntheDijkstra'salgorithmforshortestpathdis-covery[12]usingtheFEMframework,eachnodeisannotatedwithd2storecordthedistancefromthesourcenodes,andaagfin-dicatingwhetherthenodeisnalizedornot.
Weinitiallyaddthesourcenodesintovisitednodeswiths.
d2s=0ands.
f=false.
Wethenstartaniterativepathexpansionforthetargetnode.
Ineachiteration,weselectanon-nalizednodeu(u.
f=false)fromallvisitednodeswiththeminimald2sasafrontiernode,nalizeu,expandu(visitingitsneighbors),andmergethenewlyexpandednodesintothevisitednodes.
AnothercaseistoconstructaminimalspanningtreeTbyPrim'salgorithm[20].
Eachnodeuisrepresentedbyu(p2s,w,f).
Herep2sistheparentofu,wistheedgeweighfromutop2s,ifuisinT.
fisaagforwhetheruisinT.
Thevisitednodesareinitializedbyanynodeuwithu.
f=falseandu.
w=0.
IneachiterationinbuildingT,weselectanodeuwithu.
f=falseandtheminimaledgeweight,adduintoTbychangingu.
f=true,makefurtherexpansionfromu,andmergetheexpandednodesintothevisitednodes.
Inthemergeoperation,theexpandednodescanbediscardeddirectlyiftheyhavebeenincluded(f=true)inT.
TheiterationsrepeatuntilallnodeshavebeeninT.
Graphpatternmatching[25]isamorecomplexcase.
Letq=(Vq,Eq)beagraphpatternoveragraphG=(V,E).
EachquerynodeinVqanddatanodeinVhavelabels.
Initially,westartfromanyquerynodeq0u,andobtainthevisitednodes{(d0u)|d0u∈Vandq0u∈Vqhavethesamelabel}.
WethenhandletheotherquerynodesinVqiteratively.
Duringprocessingthequerynodeqku,letthevisitednodes{(d0u,dk1u)|d0u∈V,dk1u∈Vhavethesamelabelstoq0u∈Vq,qk1u∈Vqcorrespondingly}.
Weexpandthevisitednodesinto{(d0u,dku)|d0u∈V,dku∈Vhavethesamelabelstoq0u∈Vq,qku∈Vqcorrespondingly,andtherelationshipsamongd0u,dk1uanddkumeetthere-quirementsamongq0u,qk1uandqku}.
Afterallquerynodeshavebeenhandled,eachelementinthevisitednodesetrepresentsasub-graphmeetingtherequirement.
Notethatthereareotheroperationsbesidesthethreebasicopera-tions(select,expand,andmerge)inthegraphsearch.
Forexample,therecoveryoftheshortestpathortheminimalspanningtree,andtheterminationdetection,arealsoneededintheFEMframework.
However,theseoperationsaregenerallyauxiliaryundertheFEMframeworkandtheircomputationalcostsarequiteminimalcom-paredtothethreemainoperations.
Theexplorationontheshortestpathsearchinthereminderofthepaperwillillustratethedetailsoftheseadditionaloperations,andalsodemonstratethekeyfunction-alityofthethreeoperations.
3.
2RelationalFEMOperatorsforShortestPathDiscoveryTorealizeagraphsearchqueryusingFEMframeworkonrela-tionaldatabase,weneedthreeoperatorswhichcanbeexpressedbyrelationalalgebra:i)F-operatortoselectfrontiernodesfromthevisitednodes;ii)E-operatortoexpandthefrontiernodes;andiii)M-operatortomergethenewlyexpandednodesintothevisitedones.
Asdiscussedabove,theattributesonnodesandtheoper-ationsmaybedifferentfordifferentgraphsearchtasksundertheFEMframework.
Inthereminderofthepaper,wewillfocusonefcientlyrealizingtheshortestpathdiscoveryunderFEMframe-work.
ThenewtechniqueswedevelopedforshortestpathsintheFEMframeworkcanbeingeneralappliedorextendedtodealwithothergraphsearchalgorithms.
Now,westartwithDijkstra'salgorithmfortheshortestpathdis-coveryasanexampletodescribeitsF,EandM-operator.
ThevisitednodesfortheshortestpathdiscoverycanberepresentedbyAk,wherekisthenumberofiterations.
Letsbethesourcenode.
Foreachnodeu∈Ak,uisrepresentedby(nid,d2s,p2s,f),wherenidisforthenodeidentier,d2sisforthedistancefromstou,p2sisforthepredecessornodeofuinthepathfromstou,andfforthesignindicatingwhetheruisnalizedornot.
TaketheshortestpathdiscoveryfromstotinthegraphinFig-ure1asanexample.
WeaddstoA1rst.
d,c,bwillbeaddedintoA2next.
Figure3showsthevisitednodes,frontiernodes,andexpandednodesinthe2-nditerationoftheshortestpathdiscovery.
d2snidp2s0ss6ds1cs2bsf1000Visited2d2snidp2s1csFrontier2d2snidp2s2dc4ecExpanded2d2snidp2s0ss2dc1cs2bsf1010Visited3F4ec0EMTEdgesVisited2f00f1Figure3:F,EandM-operatorinthe2-ndIterationBelowweformallydescribetheseoperatorsbasedontherela-tionalalgebra.
DEFINITION1.
F-operatorreturnsfrontiernodesFkfromvis-itednodesAkinthekthexpansion.
Fk←σnid=midAk.
360InDijkstra'salgorithm,weselectthenodewiththeminimald2samongallnon-nalizednodes,andassignitsidentiertomid.
TakeFigure3asanexample,midisfornodecwiththemini-mald2s=1.
ThecomputationofmidcanbedonebyanauxiliaryoperationbeforeF-operator.
Inordertoenhancetheutilityofourframework,weassumethattheremaybemultiplefrontiernodesaftertheF-operator.
Forexample,theoptimizationstrategyusedinthenextsectionmayproducemultiplefrontiernodeswiththerevisedpredicateintheF-operator.
AftertheF-operator,weuseanotherauxiliaryoperationtoadjustf=1forthenodeidentiedbymid.
DEFINITION2.
E-operatorreturnstheexpandednodesEkbasedonfrontiernodesFkandTEdgestableinthekthexpansion.
(1)minCost(x,c)←xGmin(d2s+w)Π(x,d2s,w)(Fk(r,d2s,p2s,f)TEdges(r,x,w));(2)Ek←Π(x,d2s+w,r,0)σc=d2s+wminCost(x,c)Fk(r,d2s,p2s,f)TEdges(r,x,w);The(r,x,w)inTEdgestableisfortheedgefromrtoxwiththeweightw.
IntheE-operator,minCost(x,c)preservesthenewlyvisitednodeswiththeminimaldistances.
Sincetheexpandednodesmaybereachedbydifferentpaths,andonlytheoneswiththeminimaldistance(d2s+w)areneeded,weuseanaggregatefunctiontondthem.
NotethatminCost(x,c)containsthenewlyexpandednodesalongwiththeircosts,butlackstheirparentsp2s,whicharerequiredintherecoveryofthefullpath.
However,wecannotsimplyputthenon-aggregateattributep2sintotheselectclauseinminCost(x,c),duetotheconstraintontheaggregationfunctioninrelationalalgebra.
Thus,anotherjoinoperationisrequiredtondtheparentnodep2sfromTEdgesunderwhichtheminimaldistancecanbeachieved.
TakeFigure3asanexample,wemaketheexpansionfromthefrontiernodes{c},andgetthenewlyex-pandednodesdande.
DEFINITION3.
M-operatorreturnsvisitednodesAk+1basedontheexpandednodesEkandexistingvisitednodesAk.
(1)Ak←AkΠx,d2s1,p2s1,f1(σd2s0d2s1(Ek(x,d2s0,p2s0,f0)Ak(x,d2s1,p2s1,f1));(3)Ak+1←Ak∪Ek;IntheM-operator,weremovethevisitednodesfromAk,whosedistancesarelargerthanthoseofthecorrespondingnewlyexpandednodesinEk.
ThenweremovethenewlyexpandednodesfromEk,whosedistancesarelargerthanthoseofthecorrespondingnodesinAk.
Finally,weunionAkandEktogetthenextvisitednodesetAk+1.
StilluseFigure3asanexample,nodedcanbereplacedbythenewlyexpandedone,andnodeeisnewlyaddedintothevisitednodes.
3.
3SQLImplementationfortheOperatorsNow,wediscusstheSQLimplementationfortheseoperators.
WeuseatableTVisitedtostoreallvisitednodes.
Theattributesnid,d2s,p2s,finTVisitedhavethesamemeaningsabove.
SinceF,EandM-operatorarebasedonrelationalalgebra,wecanuseSQLtoexpressthem.
However,thedirecttranslationwillresultinapoorperformance,especiallyforEandM-operator.
Forex-ample,E-operatorisimplementedbyanaggregatefunctionoverthejoinresultswithgroupbyclause.
Inaddition,thelocationoftheparentnodep2sofeachexpandednodestillneedsanotherjoinoperation.
Moreover,whentherearemultiplepathswiththesameminimald2stotheexpandednodex,wehavetokeeponlyoneduetotheprimarykeyconstraintonnidinTVisitedtable.
AsforM-operator,wehavetwodifferentactionsaccordingtowhethertheexpandednodeshavebeeninthevisitednodesornot.
IntheSQLimplementation,wemightuseanupdatestatementfollowedbyaninsertstatementwithanotexistssub-query.
Suchakindofexpressionisnotonlyverbosebutalsoinefcient.
Listing2:SQLinPathFinding1://InitializeTVisitedwithsourcenodeInsertintoTVisited(nid,d2s,p2s,f)values(s,0,s,0);2://LocatethenextnodetobeexpandedSelecttop1nidfromTVisitedwheref=0andd2s=(selectmin(d2s)fromTVisitedwheref=0);3://Eoperatorinthekthforwardexpansioncreateviewekasselectnid,p2s,costfrom(selectout.
tid,out.
fid,out.
cost+q.
d2s,row_number()over(partitionbyout.
tidorderbyout.
cost+q.
d2sasc)asrownumfromTVisitedq,TEdgesoutwhereq.
nid=out.
fidandq.
nid=mid)tmp(nid,p2s,cost,rownum)whererownum=14://MoperatorinthekthforwardexpansionMergeTVisitedastargetusingekassourceonsource.
nid=target.
nidwhenmatchedandtarget.
d2s>source.
costthenupdatesetd2s=source.
cost,p2s=source.
p2s,f=0whennotmatchedbytargettheninsert(nid,d2s,p2s,f)values(source.
nid,source.
cost,source.
p2s,0);Fortunately,wendthenewfeaturesincludingwindowfunctionandmergestatementintroducedbyrecentSQLstandardscansim-plifytheexpressionaswellasimprovetheperformance.
Windowfunctioncanreturnnon-aggregateattributesalongwithaggregateresultsforthesametuple.
Asforourcase,wecanpartitionalloc-currencesofexpandednodesbythesameidentifernid,sortthemwiththeird2sandselectthetuplewiththeminimald2sbyusinganaggregatefunctionrownumber.
Weseethatthewindowfunc-tioncanavoidanotherextrajoinoperationtolocatetheparentnodep2softhecurrentlyexpandedone.
Italsocanhandlethecasewhenthenodecanbereachedbymultiplepathswiththesameminimaldistances.
AsfortheM-operator,weuseonemergestatementtocombinetwoseparatedinsertandupdatestatements.
TheSQLstatementusedinthepathexpansionisillustratedinListing2.
The1-ststatementaddsthesourcenodesintoTVisitednodesinitially,wherefissetto0(non-nalized).
TheE-operatorcanbeexpressedbythe3-rdstatement.
Itmakesajoinoperationbetweenthefrontiernodesspeciedbynid=mid(F-operator)andTEdgestable,wheremidistheidentiferoftheto-be-nalizednodewhichisdiscoveredbythe2-ndstatement.
Thetupleswiththeminimald2s+camongmultipleoccurrencesforthesamenodecanbelocatedbyawindowfunctionrownumber=1overthesortedtuples.
Inthe4-thSQLstatementforM-operator,weuseonestatementtomergetheexpandednodesintoTVisitedtable.
WhenthenewlyexpandednodesarenewtoexistingvisitednodesinTVisited,wedirectlyaddthemintoTVisited.
Whenthenewlyexpandednodeshavesmallerdistancesfromthesourcenode,thenodesintheTVisitedarereplacedbythenewlyexpandedones.
3613.
4ShortestPathDiscoveryusingFEMFrameworkHere,wepresentDijkstra'salgorithmforshortestpathdiscov-eryinAlgorithm1asacasetoshowthefunctionalityoftheFEMframework.
BesidesthekeySQLsinthepathexpansion,wealsoneedauxiliarySQLsinListing3.
Suchanalgorithmcanrunontheclientside,whichconnectstotheunderlyingRDBviaJDBCorODBC.
Intherunningtime,onlyfewvariablesarekeptontheclientside,andtheRDBcarriesouttime-consumingtasks.
Algorithm1:ShortestPathDiscoveryinFEMFrameworkInput:sourcenodesandtargetnodet,GraphG=(V,E).
Output:Theshortestpathbetweensandt.
InitializeTvisitedwiththeSQLinListing2(1);1whiletruedo2Locatemidfornodeu′sidwiththeSQLinListing2(2);3ExpandpathwiththeSQLsinListing2(3,4)withmid;4ifthenumberofaffectedtuplesis0then5Break;6u.
f←1forthenalizednodewiththeSQLinListing73(2)withmid;ifthereexistsresultfortheSQLinListing3(1)then8Break;9Iterativelyndedgesintheshortestpathpalongp2slink10withtheSQLinListing3(3);Returnp;11WeinitializeTVisitedtablerstwiththesourcenode.
Wethenstartaniterationtondtheshortestpathfromline2toline9.
midinline3isthenodeidfortheto-be-nalizednode,whichcanbelocatedbyanSQLonTVisitedtable.
Wethenusemidtocom-posetheSQLforF,EandM-operatorsinpathexpansioninline4.
Afterthat,wedetectthenumberofaffectedtuplesfromSQLcommunicationareaofdatabase(SQLCA)inline5,andterminateiterationswhenTVisitedisnotupdated.
Otherwise,wenalizethenodebyitsidentiermid.
Wethendetectwhetherthetargetnodehasbeennalizedornot.
Oncetheiterationsareterminated,werecoverthefullpathfromthesourcenodetothetargetnodewithiterativeSQLsalongthep2slink.
Listing3:AuxiliarySQLsinPathFinding1://DetectterminationSelectfromTVisitedwheref=1andnid=t;2://FinializethefrontiernodeUpdateTVisitedsetf=1wherenid=mid3://LocatethepredecessornodeSelectp2sfromTVisitedwherenid=xid;ThereareatmostniterationsforthepathndinginAlgorithm1intheworstcase,wherenisthenumberofnodesinthegraph.
Ineachiteration,wehave4separateSQLs.
Thus,weatmostissue4nSQLsintheshortestpathdiscovery.
4.
OPTIMIZATIONSFORSHORTESTPATHDISCOVERYInthelastsection,wedescribetheDijkstra'sshortestpathalgo-rithmusingthegenericFEMframeworkonRDB.
Inthissection,weproposeseveraltechniquestofurtheroptimizetheshortestpathdiscoveryintheFEMframework.
Inthetypicalshortestpathdiscovery,themainoptimizationistoreducethesearchspace[2,24].
IntheFEMframework,thiscorre-spondstominimizethenumberoftotalvisitednodes.
Toaddressthisissue,thebi-directionalsearchstrategyisoftenemployed[10]andweshowitcanbeadoptedintheFEMframeworkeasily(inSection4.
1).
Suchanoptimizationcanreducethetotalcomputa-tionalcostcontributedbyrelationaloperators,suchasF,E,andM-operator,asfewernodesneedtobevisited.
Anotherimportantanduniqueaspectofrelationalshortestpathdiscoveryistooptimizethequeryevaluation:intheRDBcon-text,theset-at-a-timeevaluationismoresuitablethanthenode-at-a-timefashionforthesamesearchspace,sincetheformercanenabledatabasetofullyadopttheintelligentschedulingtoaccessdiskandexploitthedataloadedinthebuffer,andthustomakeabetterevaluationplan[14,18].
Forexample,letusconsidertheE-operator(the3-rdSQLinListing2).
IfwehaveniterationsinAlgorithm1,wewillissuenSQLstatementsforloadingtheedgesofnnodesseparately.
Suchanode-at-a-timeoperationisveryinef-cientduetotheredundantI/Ocostforaccessingedgesofmultiplenodeswhentheyarestoredinonedatablock.
Thus,anaturalstrat-egyistoadoptbatchprocessingforedgeaccess,e.
g.
,loadingtheedgesofmultiplenodesaltogether(set-at-a-time)inasingleSQLstatement.
Inotherwords,wewouldliketoreducethenumberofSQLstatementstobeissuedinashortestpathdiscovery.
How-ever,themainissueisthattheset-at-a-timefashioncaneasilyleadtoincreasesearchspace.
Forinstance,theBFSstrategy,althoughrequiresfewerSQLstatements,canleadtoamuchlargersearchspacecomparedtoDijkstra'salgorithm.
WestudyoptimizationstrategiestoimprovetheperformanceforshortestpathdiscoveryintheFEMframework.
Specically,two(often-conicting)goalsneedtobesimultaneouslyachievedandbalanced:i)reducingsearchspace,whichisagenericoptimizationcriteriaforshortestpathdiscovery(andmanyothergraphsearchtasks);andii)promotingset-at-a-timequeryevaluation(batchpro-cessing)fortherelationaldatabaseoperators.
Twoefcienttech-niques,bi-directionalsetDijkstra'salgorithmandSegTableindex,havebeenintroducedtoimprovetheperformance.
4.
1Bi-directionalSetDijkstra'sAlgorithmThebi-directionalsetDijkstra'salgorithmattemptstoaddressbothbatchdataaccessandsearchspacereductioninpathnding.
Thepathcanbefoundbybothforwardexpansionsfromthesourcenodeandbackwardexpansionsfromthetargetnode[10].
Inaddi-tion,insteadofselectingonlyonefrontiernodeineachexpansion,weselectallnon-nalizednodeswiththesameminimaldistanceasthefrontiernodes.
SuchastrategyisRDB-friendlysincemorenodeshavebeenprocessedinoneoperationandthustheredundantI/Ocostfordifferentnodescanbeloweredbyabetterevaluationplan.
Thebi-directionalsetDijkstra'ssearchcanalsobeexpressedin-sidetheFEMframeworkwithextensionstotheexistingapproachinAlgorithm1.
First,werevisetheF-operatortolocateasetoffrontiernodes.
Supposethatwecanobtainmind2sforthemini-mald2sofallnon-nalizednodes(withtheirf=0)withanaux-iliarySQLoperation.
ThepredicatefortheF-operatorcanthenbechangedtod2s=mind2s.
Thus,thenodeswiththesamemini-mald2swillbeselectedbytheF-operator.
WealsoneedanotherauxiliarySQLoperationtonalizeallfrontiernodeswiththepred-icated2s=mind2saftertheF-operator.
Itiseasytoknowthatsuchabatchnodeprocessingdoesnotimpactthecorrectnessofpathnding.
Second,theTVisitedtableshouldbeextendedandtheselectionofexpansiondirectionisneededintheiterations.
Besidesp2s,d2s,362fusedintheforwardsearching,wealsokeepp2tforthesuccessornodetothetargetnode,d2tforthedistancetothetargetnode,andbforthenodenalizationinthebackwardexpansionsimilarly.
Asfortheselectionofexpansiondirection,wetakethedirectionwithfewerfrontiernodestoreducetheintermediatenodes.
ThenumberoffrontiernodescanbecomputedbyanextraSQLonTVisited,orapproximatelyrepresentedbythenumberofaffectedtuplesfromSQLCAaftertheSQLstatementforM-operatorisevaluated.
Third,thebi-directionalsearchingposesadifferentterminationcondition.
Recallthebi-directionalDijkstra'salgorithm[10].
Letsandtbesourcenodeandtargetnoderespectively,lfandlbbetheminimaldistancediscoveredinthelatestforwardandback-wardexpansionrespectively,minCostbetheminimaldistancebetweensandtseensofar,minCostistheshortestdistancewhenminCost≤lf+lb.
Asforourcase,wecancomputeminCostastheminimalsumofd2sandd2tfornodesinTVisitedtable,lfastheminimald2sforthelatestforwardexpansion,andlbastheminimald2tforthelatestbackwardexpansion.
Withthesevaluescollected,wecandeterminewhethertheiterationscanbetermi-natedornot.
Thebi-directionalsearchingalsobringsnewoptimizationstrat-egytoprunesearchspace,whichcanbedepictedinthefollow-ing.
Suchapruningrulebeginstoworkonceonepathbetweenthesourceandthetargetnodehasbeenfound.
THEOREM1.
Letlfandlbbetheminimaldistancediscov-eredinthelatestforwardandbackwardexpansionrespectively,minCostbetheminimaldistancebetweensandtseensofar.
Taketheforwardexpansionasanexample.
Forafrontiernodev,weneednotexpandvtonodex,ifv.
d2s+w(v,x)+lb>minCost.
PROOF.
Weproveitbycontradiction.
Supposethatthereexistsapathp′=s;twithlen(p′)minCost,thedistancefromxtothetargetnodetislessthanlbinp′toachievelen(p′)minCost,wherelfandlbaredistancenalizedinthelatestforwardandbackwardexpansion,andminCostisδ(s,t).
Thus,thenumberofiterationsisnomorethanmin(δ(s,t)/wmin,n).
24.
2SelectivePathExpansionviaSegTableThebi-directionalsetDijkstra'salgorithmndstheshortestpathinaset-at-a-timefashion.
AlthoughitrequiresfarfeweroperationsonRDBthanthebasicmethod,thetotalnumberofoperationsisstillverylarge,asshowninTheorem2.
Then,canwescanmorenodesinthepathexpansiontofurtherlowerthenumberofopera-tionsonRDBIfso,howtoselectthesenodesAnextremeapproachistotakeBFSinndingtheshortestpathtoreducethenumberofoperations.
BFSexpandsallpossiblenodesincludingnewlyexpandednodesorthenodeswiththereduceddis-tanceineachiteration.
Forashortestpathp,BFScanndpwithe(p)iterations,wheree(p)isthetotalnumberofedgesinp.
Cer-tainly,BFSrequiresnomoreoperationsthanothermethods.
How-ever,BFSisnotalwayseffectivesinceitscanslargersearchspacethanDijkstra'salgorithm.
Wecanknowthatthenodeswhichhavebeenexpandedstillhavehighchancestobere-expandedinthefol-lowingiterationsinBFS.
Inordertoaddresstheseissues,weintroduceanindexnamedSegTabletopreservepre-computedshortestsegments.
Wethencanexploitthesesegmentstoselectthepartialnodesforfurtherexpan-sion.
Wewishtoreducethetotalnumberofiterationswhilenotenlargingsearchspaceseriously.
SegTable.
Intuitively,theedgeswithsmallerweightshavemorechancestobeinshortestpaths.
Then,wecanpre-computetheshortestsegmentswiththeirdistanceslessthanagiventhreshold,andstoretheresultsintoSegTable.
Roughlyspeaking,SegTablecanbeviewedasashortestpathindex.
DEFINITION4.
SegTable.
LetG=(V,E)beagraph,lthdbeathreshold.
SegTableincludesTOutSegsandTInSegstables.
TOutSegspreservesthepre-computedsegmentsintheoutgoingdi-rection.
Eachtuple(d,tid,pid,cost)inTOutSegscanbecom-posedby:(1)(u,v,pre(v),δ(u,v)),foranynodepairu,v∈Vwithδ(u,v)≤lthd.
Hereδ(u,v)istheshortestdistancebetweenuandv,andpre(v)isthepredecessorofvintheshortestpathfromutov.
(2)(u,v,u,w(u,v)),forany(u,v)∈Eandδ(u,v)>lthd.
Herew(u,v)istheweightoftheedge(u,v).
TInSegspreservesthepre-computedsegmentsintheincomingdi-rectioninasimilarway.
lthdiscalledtheindexthreshold.
SegTableisactuallytherepresentationforagraphG′contain-ingpre-computedsegmentsandoriginaledgesingraphG.
Figure4(a)showsthegraphG′withsegmentsontheoriginalgraphGinFigure1withlthd=6.
Theedge(s,e)withtheweight4isapre-computedsegment.
Ifwestartpathsearchingfroms,ecanbefoundinoneexpansioninsteadoftwoexpansions.
Theedge(s,d)withtheweight2isalsoapre-computedsegment.
There-nededgeweightcanavoidunnecessaryre-expandingfromdontheoriginalgraph.
Theedge(e,h)istheoriginaledgeinGwithδ(e,h)>lthd.
TherelationaltablesforSegTablegraphareshowninFigure4(b).
SelectivePathExpansion.
SinceSegTablecontainsthesegmentswhichhavemorechancestobeinshortestpaths,weattempttose-lectthepartialnodesfromSegTableintheexpansions.
Letlthdbetheindexthreshold.
Takethekthforwardexpansionasanex-ample.
Anodeuisselectedasafrontiernodefromvisitednodes,ifu.
d2sisnomorethanklthdoru.
d2sistheminimalamongallnodestobeexpanded.
Inotherwords,wepreferselectingnodes363(a)GraphwithSegmentsfidtidcostTInSegsfidtidpidTOutSegspidcosteb2ees4bec3e(b)SegmentsTable.
.
.
eg3eef7eeh8e.
.
.
21sbcdefghijt22374943582117846553wmin1-stexpansionwmin2-ndexpansionlthd3-rdexpansionlthdwminwmin.
.
.
.
.
.
.
(c)MinimalDistanceafterExpansions6Figure4:SegTablewithIndexThresholdlthd=6withsmalleru.
d2ssincealargerd2sindicatesahigherchanceinunnecessaryre-expansions.
Weextendthetwo-valuesignffornodenalizationintothree-valuesigntosupportthecomplexruleinselectingfrontiernodes.
AsthatinListing2,f=0onanodeuindicatesuisacandi-datefrontiernode,andf=1onanodeumeansuhasbeenex-pandedbefore.
Here,weintroducef=2fortheselectedfrontiernodes.
Asdiscussedabove,anodeuisselectedasafrontiernodewhenu.
f=0,andu.
d2s≤klthdoru.
d2sistheminimalamongallnodeswithf=0.
InordertosimplifytheexpressionofF-operator,wecanselectthesefrontiernodesandchangetheirsignfto2withanauxiliarySQLbeforeF-operator.
Hence,f=2istheonlypredicateintheF-operator.
Inaddition,afterthepathexpansion,westillusef=2todistinguishthefrontiernodesjustnow,andchangestheirfto1indicatingthatthesenodeshavebeenexpanded.
TheselectivepathexpansionoverSegTablecanndtheshort-estpathwithnomoreiterationsthanthesetDijkstra'salgorithmontheoriginalgraph.
WerstpresentanintuitiveideainFig-ure4(c)andwillgiveaformaldescriptioninthefollowing.
Letlthdbetheindexthreshold,wminfortheminimaledgeweight.
Theminimaldistancelffoundinthe1-stforwardexpansioniswminintheworstcase.
lfwillbelthd+wmininthe2-ndex-pansion,sincethedistancelessthanlthd+wminhasbeendis-coveredinthe1-stexpansion.
Similarly,lfinthe3-rdexpansionwillbelthd+2wmin.
Inductively,lfinthekthfexpansionwillbekf/2lthd+kf/2wmin.
Recallthatlfinthekthfexpan-sionwillbekfwmininthesetDijkstra'salgorithmontheoriginalgraphintheworstcase.
lfincreasesfasterintermsofthenumberofexpansionsthanbefore,whichmakestheterminationconditionlb+lf>minCostsatisedearlier.
ConstructionofSegTable.
GivenanoriginalgraphGandanin-dexthresholdlthd,theconstructionofitsSegTableneedstolocateallshortestsegmentswiththeirdistancesnomorethanlthd,andtondtheotherremainingnecessaryedges.
WecanexploittheFEMframeworktolocateallrequiredshortestsegmentsintheSegTable.
TaketheconstructionofTOutSegsasanexample.
WecanputallnodesinGintoavisitednodesetinitially.
JustlikeF-operatorintheselectivepathsearchingoverSegTable,inthekthiteration,anodeuisselectedintofrontiernodeswhenuisacandidatefrontiernode,andu.
d2s0&&nb>0do6ifnf≤nbthen7UpdatesignsforthefrontiernodeswiththeSQLin8Listing4(1);ExpandpathswiththeSQLinListing4(2);9nf←thenumberofaffectedtuplesfromSQLCA;10ResetsignsforthefrontiernodeswiththeSQLin11Listing4(3);LocatelfwiththeSQLinListing4(4);12fwd←fwd+1;13else14Similaractionsfromline8toline13forthe15backwardexpansion;LocateminCostwiththeSQLinListing4(5);16LocateanodexidintheshortestpathwiththeSQLinListing174(6);Findthesub-pathp0fromstoxidalongp2slinks;18Findthesub-pathp1fromxidtotalongp2tlinks;19Returnp0+p1;20Wemakethepathexpansionintheiterationsfromline6toline16.
Weselectanexpansiondirectionaccordingtothenumberoffrontiernodes,andshowtheactionsintheforwardexpansionfromline8toline13.
First,wesetf=2ontheselectedfrontiernodesfromallcandidatefrontiernodes(f=0).
WethenimplementF,E,andM-operatorwiththe2-ndSQLinListing4.
Comparedtothe3-rdand4-thSQLinListing2,the2-ndSQLinListing4usespre-computedsegmentsTOutSegsinsteadofTEdges,speci-esfrontiernodeswithf=2,andtakesbi-directionalpruningruleout.
cost+q.
d2s+lbsource.
costthenupdatesetd2s=source.
cost,p2s=source.
p2s,f=0whennotmatchedbytargettheninsert(nid,d2s,d2t,p2s,f)values(source.
nid,cost,Max,source.
p2s,0);3://Resetthesignthatnodehasbeenexpanded.
UpdateTVisitedsetf=1wheref=2;4://LocatetheminimaldistanceinforwardsearchSelectmin(d2s)fromTVisitedwheref=0;5://LocatetheminimaldistancediscoveredSelectmin(d2s+d2t)fromTVisited;6://LocateanodeintheshortestpathSelectnidfromTVisitedwhered2s+d2t=minCost;TheshortestpathndinginAlgorithm2requiresnomoreitera-tionsthantheformermethods,whichcanbeillustratedasfollows:THEOREM3.
Givenasourcenodesandatargetnodet,theSegTablewiththeindexthresholdlthd,thenumberofiterationsinAlgorithm2islessthanmin(n,2(δ(u,v)+lthd)lthd+wmin).
PROOF.
Algorithm2willterminatewhenlf+lb≥minCost,wherelfandlbaretheminimaldistancefoundinthelatestfor-wardandbackwardexpansionsrespectively,andminCostistheminimaldistancebetweensandtseensofar.
Sinceeachitera-tionwillnalizeatleastonenode,thetotaliterationsarenomorethannintheworstcase.
Letkfandkbbethenumberoffor-wardandbackwardexpansionsrespectively.
AsillustratedinFig-ure4(c),lfisnolessthankf/2lthd+kf/2wmin,andlbisnolessthankb/2lthd+kb/2wmin.
lb+lf>((kb+kf)/2)(lthd+wmin)lthd.
Thus,whenthetotalnumberofiterations,kb+kf,reaches2(δ(u,v)+lthd)lthd+wmin,theiterationscanbeterminatedwithlb+lf≥δ(u,v).
2sbcditgfeh1-stforwardexpansion2-ndforwardexpansion1-stbackwardexpansion22148753j932f11110100nidbsdcfehgjitp2ssssseseed2s2021114127b0001d2t3920p2tttttFigure5:TVisitedinBi-directionalSearchingonSegTableThepathndingviaSegTablecanbeillustratedinFigure5.
Letsbethesourcenodeandtbethetargetnode.
Wemake2for-wardexpansionsand1backwardexpansion.
TVisitedtablecon-tainsthecurrentintermediateresults.
ThecurrentshortestdistanceminCostis15,achievedonnodeh.
Wehavetocontinuesearch-ing,sincelf=12,lb=2,andlf+lb6002529.
27783.
2060k>60031311.
5823.
4380k>60035613.
2883.
75100k>60041415.
1853.
62Table2:Exps(#Expansions),Time(Time:s)onPowerGraphs5.
2QueryEvaluationWestudytheissuesrelatedtotheFEMframeworkanditsopti-mizations.
Theexperimentswillbedividedinto3sub-parts.
WerstverifytheeffectivenessofFEMframeworkandset-at-a-timeoptimization.
WethenstudyoptimizationstrategieswithSegTable.
Finally,wemakeextensivestudiesonourmethodwithallopti-mizations.
Inthefollowingexperiments,werandomlygenerate100shortestpathqueries,andreporttheaveragetimecost.
FEMFrameworkandSet-at-a-timeFashion.
Inthissub-part,werstcomparethetimecostusedbythesingledirectional(DJ),thebi-directional(BDJ)andbi-directionalset(BSDJ)Dijkstra'sap-proach,thenstudythecostusedbydifferentphasesandoperators,andshowtheeffectivenessofnewfeaturesofSQL.
Figure6(a)reportstheresultsofDJ,BDJ,andBSDJonPowergraphsvaryingsizefrom20kto100k.
Taketheshortestpathdis-coveryinagraphwith20knodesasanexample,DJconsumesabout7minutes,whileBDJconsumes6.
75seconds,andBSDJcosts2.
25seconds.
Infact,wecannotuselargegraphsinthistest,sinceDJshowsapoorperformanceintheshortestpathdiscovery.
AnotherobservationinFigure6(a)isthatthetimecostofBSDJisabout1/3ofthatinBDJ.
Inordertondthereasonsbehindtheperformanceimprove-ment,wecollectthetotalnumberofexpansionsinpathndingandthetimecostconsumedbyDJ,BDJ,andBSDJinTable2.
ItclearlyshowsthatBSDJtakesthefewestexpansions.
Thenumberofex-pansionsinDJisabout50timesbiggerthanthatinBDJ,and140timesbiggerthanthatinBSDJ.
Thus,theSQLsusedbyBSDJisfewestamongthree.
Theresultsverifyourclaimbefore.
Theset-at-a-timeevaluationfashion,whichenablestheRDBoptimizertoproduceabetterevaluationplan,canbeatthenode-at-a-timeeval-uationfashioneasily,whentheyhavethesamesearchspace.
Figure6(b)plotsthetimecostusedindifferentphasesinthepathndinginAlgorithm1,includingthepathexpansion(denotedbyPE),thestatisticscollection(denotedbySC),andthefullpathrecovery(denotedbyFPR).
Wecanseethatthepathexpansionwiththreeoperatorsconsumesmostoftime.
Next,wegodeepintothepathexpansion.
WedirectlytranslateF,E,andM-operatorsintoseparateSQLs,andcollecttheirtimecost.
TheresultsarepresentedinFigure6(c).
WendE-operatortakesabout75percenttimeintheshortestpathdiscovery.
ItisbecauseE-operatormakesajoinwiththegraphtabletondnewlyexpandednodes.
204060801000246810121416Dataset=PowerxkN3dDatabase=DBMS-xTimeCost(s)#nodes(k)BDJBSDJ(a)Querytimevs.
graphscale2040608010002468Dataset=PowerxkN3dMethod=BSDJDatabase=DBMS-xTimeCost(s)#nodes(k)PESCFPR(b)Querytimevs.
phases2040608010002468Dataset=PowerxkN3dMethod=BSDJDatabase=DBMS-xTimeCost(s)#nodes(k)F-operatorE-operatorM-operator(c)Querytimevs.
operators20406080100345678Dataset=PowerxkN3dMethod=BSDJDatabase=DBMS-xTimeCost(s)#nodes(k)TSQLNSQL(d)Querytimevs.
SQLFigure6:ExperimentalResultsonFEMFrameworkFigure6(d)studiesthequerytimecostaffectedbydifferentSQLfeatures.
Weimplementpathdiscoverywiththenewfeaturesin-cludingwindowfunctionandmergestatement(denotedbyNSQL)andthetraditionalfeaturesincludingaggregatefunctionsandin-sert/updateformerge(denotedbyTSQL).
TheresultsclearlyshowthattheNSQLmethodoutperformstheTSQLmethodsignicantly.
WebelievethattheadvanceofRDBmakesitmorepowerfulinhan-dlingtheshortestpathdiscoveryandothercomplexoperations.
DuetothefactthatBSDJoutperformsDJandBDJthoroughly,wereportonlythecurveforBSDJ,andomitthecurvesfortheothertwointhefollowing.
WewillusenewfeaturesofSQLinpathndingunlessexplicitlymentioned.
OptimizationsusingSegTable.
Next,weanalyzetheperformanceoftheselectivepathexpansiononSegTable(BSEG).
ThetermBSEG(x)isforBSEGwiththeindexthresholdlthd=x.
Inor-dertoseethetrade-offbetweenthesearchspaceandtheevalu-ationfashionclearly,wealsoimplementBBFSforbi-directionalbreadth-rst-search.
Figure7(a)andFigure7(b)comparethetimecostconsumedbyBSDJ,BBFSandBSEGontwosetsoflargegraphs.
BSEGisfastestamongthree.
ThetimecostinBSEGisnearly1/7ofthatinBBFSand1/3inBSDJacrossLiveJournal4mgraph,andthepro-portionisabout1/2or1/3onRandomgraphs.
Wemakeadeepanalysisonthetimecost,thenumberofexpansionsandthesizeofvisitednodesinBSDJ,BBFSandBSEG.
TheresultsarelistedinTable3.
WecanseethatalthoughthevisitednodesinBSEGarealittlemorethanthoseinBSDJ,thenumberofexpansionsinBSEGisjust1/3ofthatinBSDJ.
Thereductioninthecorrespond-ingSQLsatthecostofslightlyenlargedsearchspacemakesBSEGworkbest.
AsforBBFS,ittakesfewestexpansionsinthetests.
However,BBFSisslowerthanothertwomethodsinsomecasessinceBBFSincursmorevisitednodes.
Bycombiningtheseex-perimentresults,weknowthatwecannotconsidertheevaluationfashion(thenumberofoperations)orthesearchspace(thesizeofintermediatenodes)separately,butstrivetoachieveabalancebe-tweenthem.
Figure7(c)andFigure7(d)illustratethequeryevaluationtimecostvaryingtheindexthresholdlthdofSegTableonbothPowergraphandrealdatarespectively.
Weobservethattheperformance3660.
00.
51.
01.
52.
02.
53.
03.
54.
005101520253035Dataset=LiveJournalxmnDatabase=DBMS-xTimeCost(s)#nodes(m)BSDJBBFSBSEG(3)(a)Querytimevs.
graphscale5m10m15m20m40m010203040506070Dataset=RandomxmN3dDatabase=DBMS-xTimeCost(s)DataSetBBFSBSDJBSEG(3)BSEG(5)BSEG(7)(b)Querytimevs.
graphscale1002003004005000.
00.
51.
01.
52.
02.
5Dataset=PowerxkN3dMethod=BSEGDatabase=DBMS-xTimeCost(s)#nodes(k)lthd=10lthd=30lthd=40lthd=50(c)Querytimevs.
lthdGoogleWebDBLP0123456Method=BSEGDatabase=DBMS-xTimeCost(s)DataSetlthd=2lthd=4lthd=6lthd=8lthd=10(d)Querytimevs.
lthdFigure7:ExperimentalResultsonSegTableBSDJBBFSBSEG(5)|V|TimeExpsVstTimeExpsVstTimeExpsVst5M15.
31743.
6k13.
530129k8.
2574.
4k10M27.
81844.
9k24.
732222k13.
4607.
4k15M33.
41915.
9k44.
033283k12.
4619.
3k20M41.
71977.
4k62.
434358k21.
76210.
1kTable3:Time(Time:s),Exps(#expansions),andVst(#visitednodes)onRandomGraphsisimprovedrstandthendeclineswiththeincreaseoftheindexthresholdlthd.
Weexplainthereasonsasfollows:ontheonehand,accordingtoourshortestpathdiscoveryalgorithm,alargerlthdre-sultsinmoresegmentsinSegTableandfewerexpansionstolocatetheshortestpath.
Ontheotherhand,morepre-computedsegmentsenlargethesearchspace,whichcutsdownthequeryperformance.
Inaddition,asshowninFigure7(c)andFigure7(d),arelativelylargelthd(e.
g.
,lthd=30)isappropriateforPowergraphwhileasmallerlthd(e.
g.
,lthd=6or8)ismoresuitableonourrealgraphs.
HowtondanoptimallthdforSegTableoverdifferentgraphswillbeafutureworkofthispaper.
ExtensiveStudies.
Inthissub-part,westudytheimpactsofdif-ferentdatabaseengines,differentbuffersizesandindexstrategiesonthequeryperformance.
Finally,wecompareourRDBapproachwiththein-memoryones.
Figure8(a)comparesthequerytimeconsumedbyBBFSandBSEGonPostgreSQL.
SincePostgreSQLsupportsthewindowfunctionbutcannotprovidethemergestatement,weuseinsertandupdatestatementfortheM-operatorinstead.
TheresultsonPost-greSQLaresimilartothoseonthecommercialdatabasesystem,whichalsoshowsthatourmethodhasgoodapplicability.
Figure8(b)illustratestheimpactofdifferentbuffersizesinRDBonthequeryperformance.
ItisnotsurprisedthattheincreaseofthebuffersizedecreasesthenumberofI/Ooperations,andthenimprovestheperformance.
Forourcase,thedecreaseofcostisnearlylineartotheincreaseofthebuffersizes.
Wealsonoticethatoncethebuffersizereachesathreshold,forexample,6Gfor1002003004005001.
01.
52.
02.
53.
03.
5Dataset=PowerxkN3dDatabase=PostgreSQLTimeCost(s)#nodes(k)BBFSBSEG(20)(a)Querytimevs.
database1234567468101214Dataset=LiveJournalDatabase=DBMS-xTimeCost(s)BufferSize(G)BSEG(3)(b)Querytimevs.
buffersize10020030040050012345Dataset=PowerxkN3dMethod=BSEG(20)Database=DBMS-xTimeCost(s)#nodes(k)NoIndexIndexCluIndex(c)Querytimevs.
index100200300400500012345678Dataset=PowerxkN3dDatabase=DBMS-xBuffer=1.
5GTimeCost(s)#nodes(k)MDJBSEG(20)MBDJ(d)Querytimevs.
in-memorymethodsFigure8:ExperimentalResultsonExtensiveStudytheshortestpathdiscoveryoverLiveJournal,theevaluationtimestaysalmoststable.
Insuchasetting,thegraphcanbefullyloadedintomemory,andtheformerkeyfactorinI/Ooperationsisnotasimportantasbefore.
InFigure8(c),weconductstudiesondifferentindexstrategies,includingtheclusteredanduniqueindex(denotedbyCluIndex),non-clustereduniqueindex(denotedbyIndex),andnoindex(de-notedbyNoIndex)onTOutSegs(fid)(onTInSegssimilarly),aswellasonTVisited(nid).
WecanseethatCluIndexachievesthebestperformance.
WethinkthattheuniqueindexmayimprovetheperformanceofE-operator,whichcansupporttheindex-joinbe-tweentheSegTableandTVisitedtable,andtheclusteredindexcanreduceI/Ocostinaccessingedgesforgivennodes.
Last,wecompareourrelationalapproachwiththein-memoryones,includingDijkstra'salgorithm(MDJ)andbi-directionalDi-jkstra'salgorithm(MBDJ).
WesetboththebuffersizeofRDBandthemaximalmemoryusedinin-memoryapproachesto1.
5G.
Inor-dertomakethecomparisonfair,thetimereportedforin-memoryapproachesdoesnotincludethetimeinloadinggraphintomemory,andthetimeforrelationalapproachesiscollectedafterthedatabasebufferbecomeshot.
WecanseeourBSEGalgorithmisnotasfastasMBDJinFigure8(d).
ItisexpectedasRDBisageneralinfras-tructuretodifferentinformationsystems,andisnottailoredforthegraphdatamanagement.
Inaddition,thebufferofRDBalsocon-tainsotherinformationsuchassystemtables.
WealsonoticethatBSEGcanoutperformsMDJandshowsbetterscalability.
TheseresultsalsorevealthattheproperevaluationstrategyintheRDBcontextstillcangainasatisfactoryperformance.
WestressherethatthemainadvantageofRDBliesinitsscalability,stabilityandeasyprogramminginthegraphmanagement.
5.
3SegTableConstructionInthispart,westudytheconstructionofSegTablefortworea-sons.
Onetheonehand,SegTableisanimportantoptimizationstrategyinthepathdiscovery.
Ontheotherhand,theconstruc-tionofSegTablecanbeviewedasanotherapplicationoftheFEMframework.
Thesetestscanbethesupplementtothequeryevalua-tionones.
3671002003004005000123456Dataset=PowerxkN3dDatabase=DBMS-xEncodingNumber(m)#nodes(k)lthd=10lthd=20lthd=30lthd=40(a)Indexsizevs.
lthdGoogleWebDBLP04812162024Database=DBMS-xEncodingNumber(m)DataSetlthd=2lthd=4lthd=6lthd=8lthd=10(b)Indexsizevs.
lthd1002003004005000100200300400500600Dataset=PowerxkN3dDatabase=DBMS-xTimeCost(s)#nodes(k)lthd=10lthd=20lthd=30lthd=40(c)Constructiontimevs.
lthdGoogleWebDBLP02004006008001000Database=DBMS-xTimeCost(s)DataSetlthd=2lthd=4lthd=6lthd=8(d)Constructiontimevs.
lthd1002003004005001234567Dataset=PowerxkN3dDatabase=PostgreSQLTimeCost(ks)#nodes(k)lthd=10lthd=20lthd=30(e)Constructiontimevs.
lthd10020030040050050100150200250300350Dataset=PowerxkN3dlthd=20Database=DBMS-xTimeCost(s)#nodes(k)NSQLTSQL(f)Constructiontimevs.
SQL60080010001200140016004567891011Dataset=LiveJournalDatabase=DBMS-xTimeCost(ks)BufferSize(M)lthd=3(g)Constructiontimevs.
buffersize0.
00.
51.
01.
52.
02.
53.
03.
54.
00.
51.
01.
52.
02.
53.
03.
54.
0Dataset=LiveJournalxmnDatabase=DBMS-xTimeCost(ks)#nodes(m)lthd=3(h)Constructiontimevs.
graphscaleFigure9:ExperimentalResultsonSegTableConstructionIndexSize.
Figure9(a)andFigure9(b)plottheindexsizevaryingtheindexthresholdlthdonPowergraphsandrealdatasetsrespec-tively.
Wecanknowthatalargerlthdrequirespre-computingmoresegments,whichresultsinalargerSegTableindex.
Wealsoob-servethat,fordifferentgraphs,theimpactoflthdonindexsizevaries.
Forexample,GoogleWebismoresensitivetolthdthanDBLP.
ItispartiallyduetothatGoogleWebhasaskeweddegreedistribution.
ConstructionTime.
Figure9(c)andFigure9(d)summarizetheconstructiontimecostvaryingtheindexthresholdlthdacrossbothsyntheticandrealdatasetsoncommercialdatabase.
Theresultsandreasonaresimilartothoseinthetestsontheindexsize.
Alargerlthdcorrespondstolongershortestsegmentsandthenincursmoreindexingtime.
Figure9(e)showsthetimeresultsonopensourcedatabasePost-greSQLacrossPowergraphs.
ThebehavioronPostgreSQLissim-ilartothatonthecommercialdatabaseDBMS-x,suchasFigure9(c)andFigure9(d).
ThisprovesourSegTablebasedmethodcanbeappliedondifferentrelationaldatabaseplatforms.
Figure9(f)comparestheconstructiontimecostusingthenewandtraditionalfeaturesofSQLonPowergraphs.
Sincetheinter-mediatenodesintheindexconstructionarealsorestrictedbytheindexthresholdlthd,theadvantageofthemethodusingNSQLoverthemethodusingTSQLisnotassignicantasthatinthepathnd-ing.
However,wecanseethemethodusingNSQLstilloutperformstheoneusingTSQL.
Figure9(g)illustratestheimpactofdifferentbuffersizesusedbyRDBontheindexconstructiontimecost.
Theresultsaresimilartothoseinthequeryevaluation.
Theincreaseofbuffersizewillimprovetheindexingperformance.
Forexample,theindexingtimecostwithamemorysetto0.
6Gistwotimesofthatwithmemoryxedto1.
6G.
Moreover,whenthebuffersizeexceeds1.
2G,themaximalrequirementonmemoryforindexing,thetimecurveisalmosthorizonal.
Figure9(h)showstheindexconstructiontimevaryingthesizeofgraphs.
Alargergraphneedsmorevisitednodesintheconstruc-tion,whichconsumesmoreindexingtime.
Additionally,weob-servethatourconstructionmethodhashighscalabilityasgraphsgrowlarger,astherelationshipsbetweentheindexconstructiontimeandgraphsizearealmostlinear.
ItisduetothatourSegTableindexonlyencodeslocalshortestsegments.
5.
4SummaryTosumup,fromtheexperimentalresults,wecandrawthefol-lowingconclusions:i)OurFEMframeworkcanbeusedtoanswerthegraphsearchqueriessuchastheshortestpathdiscovery.
AndthenewfeaturesintroducedbyrecentSQLstandardscanimprovetheperformanceofFEMgreatly.
ii)Theset-at-a-timeevaluationmethod,suchasBSDJ,canoutperformthenode-at-a-timeevalua-tionmethod,suchasBDJ,signicantly.
iii)Theindex,SegTable,canfurtherimprovetheperformancebyreducingthenumberofexpansionsatthecostofslightlyenlargedsearchspace,ifthein-dexthresholdisproperlyset.
iv)Ourrelationalapproachshowshighscalabilityintermsofgraphsizes,buffersizes,anddifferentdatabasemanagementsystems.
6.
RELATEDWORKGraphsearchqueriesareverybasicandimportantgraphopera-tions.
Duetothelargesearchspace,manyproposedmethodstakegreedyapproaches,includingDijkstra'salgorithmfortheshortestpath[12],Prim'salgorithmfortheminimalspanningtree[20],thegreedysalesmanpathdiscovery[9],andthelike.
Shortestpathdiscoveryisarepresentativegraphsearchopera-tion.
Dijkstra'salgorithmisawell-knownonlinealgorithm[12]tosolveasingle-sourceshortestpathprobleminO(n2)timeongeneralgraphs.
Thebi-directionalsearchstrategy[10]isanimpor-tantextensiontoreducethesearchspace.
Theshortestpathscanbepre-computedandstoredbydifferentforms,suchastheland-markindex[19,2],2-HOPrelatedindex[11],TEDIindex[24],etc.
Theseindicescanimprovetheperformanceintherunningtime.
However,theseonlineshortestpathdiscoveryandshortestindexbuildingmethodsdonotconsiderthecasewhenthegraphcannotbefullyloadedintothememory.
Theexternalgraphoperationsareneededwiththerapidgrowthofgraphs.
Anexternalshortestpathindexisproposedby[8]onpla-nargraphs.
MapReduceframework[16,3]anditsopensourceim-plementationHadoop[1]achievehighscalabilityinhandlinglargegraphs.
ThecurrentlimitationofMapReduceframeworkliesin368itsweaksupporttoonlinequeryandexpensivedynamicupdateofgraphs.
Wealsonoticespecicgraphoperationsintheexternalmemory.
Forexample,theapproximateminimum-cut[5]canbecomputedonthesampledgraph.
Thecliquescanbefoundonthepartiallyloadedsub-graphs[15].
However,itishardtoextendthemtosupportgeneralgraphsearchqueries.
TheextensionofRDBasascalableplatformtomanagecom-plexdatatypesorsupportsophisticatedapplicationshasbeenahottopicrecently.
Thedataminingtasks[4]andstatisticdataanaly-sis[6]canbeevaluatedonthetopofRDB.
WhenusingRDBtomanageXMLdata[17,13,18],anXMLqueryisalwaystranslatedintomultipleSQLs,duringwhichanimportantoptimizationistoproducefewerSQLsandthustheoptimizerinRDBcangeneratebetterevaluationplans[18].
RDBbasedgraphqueryprocessingiscloselyrelatedtoourwork.
Thereachabilityqueryisevaluatedbyastoredprocedure[23].
TheSQLbasedgraphdatamininghasbeenstudiedin[22,7].
Differ-entfromtheexistingmethods,weattempttodesignagenericgraphsearchframeworkinRDB,andimproveitsperformancebyexploit-ingnewfeaturesofSQL.
WealsodesignoptimizationstofurtherimprovetheperformancebybalancingthesearchspacereductionandRDB-friendlyevaluationfashion.
7.
CONCLUSIONSANDFUTUREWORKThispaperfocusesontherelationalapproachtodiscovertheshortestpathonlargegraphs.
WeabstractarelationalgenericgraphsearchframeworkFEMwiththreenewoperators,andem-ploythenewfeaturesofSQLsuchaswindowfunctionandmergestatementtoimprovetheperformanceoftheframework.
Addition-ally,wedesigntwooptimizationsforshortestpathdiscoveryinsidetheframework,includingthebi-directionalsetDijkstra'ssearchingandselectivepathexpansiononSegTablecontainingpre-computedshortestsegments.
Thenalexperimentalresultsshowthescala-bilityandefciencyofourrelationalapproach.
Thisworkcanbeextendedinseveralinterestingdirections.
First,wewillstudytheevaluationofothergraphsearchqueries,suchasgraphpatternmatch,usingtheFEMframework.
Second,wewillexploitthedistributeddatabasetoachievehigherscalabilityintermsofgraphsizes.
Thepartitionoftherelationaltablesforgraphsandintermediateresultsamongdistributeddatabaseisaninterestingissue.
Third,therelationalapproachtographmanage-mentneedstoconsiderthedynamicchangesoftheoriginalgraph.
Thepre-computedresults,suchasSegTableinthispaper,shouldbemaintainedincrementally.
ACKNOWLEDGMENTSWewouldliketothanktheanonymousreviewersfortheirhelpfulcomments.
TheNSFCsupportedGaovia60873062and61073018,andsupportedJinvia61003167.
TheresearchgrantsCounciloftheHongKongSARsupportedYuvia419008and419109.
TheNSFsupportedJinviaIIS-0953950.
NationalscienceandtechnologymajorprogramsupportedWangvia2010ZX01042-001-003-05and2010ZX01042-002-002-02.
8.
REFERENCES[1]ApacheHadoop.
http://hadoop.
apache.
org.
[2]A.
GoldbergandC.
Harrelson.
Computingtheshortestpath:searchmeetsgraphtheory.
InSODA,pages156–165,2005.
[3]B.
Bahmani,K.
Chakrabarti,andD.
Xin.
Fastpersonalizedpagerankonmapreduce.
InSIGMOD,pages973–984,2011.
[4]B.
Zou,X.
Ma,B.
Kemme,G.
Newton,andD.
Precup.
Dataminingusingrelationaldatabasemanagementsystems.
InPAKDD,pages657–667,2006.
[5]C.
Aggarwal,Y.
Xie,andP.
Yu.
Gconnect:Aconnectivityindexformassivedisk-residentgraphs.
PVLDB,2(1):862–873,2009.
[6]C.
Mayeld,J.
Neville,andS.
Prabhakar.
Eracer:adatabaseapproachforstatisticalinferenceanddatacleaning.
InSIGMOD,pages75–86,2010.
[7]C.
Wang,W.
Wang,J.
Pei,Y.
Zhu,andB.
Shi.
Scalableminingoflargedisk-basedgraphdatabases.
InSIGKDD,pages316–325,2004.
[8]D.
Hutchinson,A.
Maheshwari,andN.
Zeh.
Anexternalmemorydatastructureforshortestpathqueries.
DiscreteAppliedMathematics,126(1):55–82,2003.
[9]D.
JohnsonandL.
McGeoch.
Thetravelingsalesmanproblem:Acasestudyinlocaloptimization.
Localsearchincombinatorialoptimization,215-310,1997.
[10]D.
WagnerandT.
Willhalm.
Speed-uptechniquesforshortest-pathcomputations.
InSTACS,pages23–36,2007.
[11]E.
Cohen,E.
Halperin,H.
Kaplan,andU.
Zwick.
Reachabilityanddistancequeriesvia2-hoplabels.
InSODA,pages937–946,2002.
[12]E.
Dijkstra.
Anoteontwoproblemsinconnexionwithgraphs.
NumerischeMathematik,pages269–271,1959.
[13]F.
Tian,B.
Reinwald,H.
Pirahesh,T.
Mayr,andJ.
Myllymaki.
Implementingascalablexmlpublish/subscribesystemusingarelationaldatabasesystem.
InSIGMOD,pages479–490,2004.
[14]H.
Garcia-Molina,J.
Ullman,andJ.
Widom.
DatabaseSystems:TheCompleteBook.
PrenticeHallPress,2008.
[15]J.
Cheng,Y.
Ke,A.
W.
Fu,J.
X.
Yu,andL.
Zhu.
Findingmaximalcliquesinmassivenetworksbyh*-graph.
InSIGMOD,pages447–458,2010.
[16]J.
DeanandS.
Ghemawat.
Mapreduce:Simplieddataprocessingonlargeclusters.
InOSDI,pages137–150,2004.
[17]J.
Shanmugasundaram,K.
Tufte,C.
Zhang,G.
He,D.
DeWitt,andJ.
Naughton.
Relationaldatabasesforqueryingxmldocuments:Limitationsandopportunities.
InVLDB,pages302–314,1999.
[18]M.
Benedikt,C.
Chan,W.
Fan,R.
Rastogi,S.
Zheng,andA.
Zhou.
Dtd-directedpublishingwithattributetranslationgrammars.
InVLDB,pages838–849,2002.
[19]M.
Potamias,F.
Bonchi,C.
Castillo,andA.
Gionis.
Fastshortestpathdistanceestimationinlargenetworks.
InCIKM,pages453–470,2009.
[20]R.
Prim.
Shortestconnectionnetworksandsomegeneralizations.
BellSystemTechnicalJournal,36:1389–1401,1957.
[21]R.
RonenandO.
Shmueli.
Soql:Alanguageforqueryingandcreatingdatainsocialnetworks.
InICDE,pages1595–1602,2009.
[22]S.
Srihari,S.
Chandrashekar,andS.
Parthasarathy.
Aframeworkforsql-basedminingoflargegraphsonrelationaldatabases.
InPAKDD,pages160–167,2010.
[23]S.
TrilandU.
Leser.
Fastandpracticalindexingandqueryingofverylargegraphs.
InSIGMOD,pages845–856,2007.
[24]F.
Wei.
Tedi:efcientshortestpathqueryansweringongraphs.
InSIGMOD,pages99–110,2010.
[25]W.
Fan,J.
Li,S.
Ma,N.
Tang,Y.
Wu,andY.
Wu.
Graphpatternmatching:Fromintractabletopolynomialtime.
PVLDB,3(1):264-275,2010.
369
妮妮云的来历妮妮云是 789 陈总 张总 三方共同投资建立的网站 本着“良心 便宜 稳定”的初衷 为小白用户避免被坑妮妮云的市场定位妮妮云主要代理市场稳定速度的云服务器产品,避免新手购买云服务器的时候众多商家不知道如何选择,妮妮云就帮你选择好了产品,无需承担购买风险,不用担心出现被跑路 被诈骗的情况。妮妮云的售后保证妮妮云退款 通过于合作商的友好协商,云服务器提供2天内全额退款到网站余额,超过2天...
创梦网络怎么样,创梦网络公司位于四川省达州市,属于四川本地企业,资质齐全,IDC/ISP均有,从创梦网络这边租的服务器均可以****,属于一手资源,高防机柜、大带宽、高防IP业务,另外创梦网络近期还会上线四川眉山联通、广东优化线路高防机柜,CN2专线相关业务。广东电信大带宽近期可以预约机柜了,成都优化线路,机柜租用、服务器云服务器租用,适合建站做游戏,不须要在套CDN,全国访问快,直连省骨干,大网...
[六一云迎国庆]转盘活动实物礼品美国G口/香港CTG/美国T级超防云/物理机/CDN大促销六一云 成立于2018年,归属于西安六一网络科技有限公司,是一家国内正规持有IDC ISP CDN IRCS电信经营许可证书的老牌商家。大陆持证公司受大陆各部门监管不好用支持退款退现,再也不怕被割韭菜了!主要业务有:国内高防云,美国高防云,美国cera大带宽,香港CTG,香港沙田CN2,海外站群服务,物理机,...
postgresql9.0为你推荐
leavemealone女孩说leave me alone还有戏吗?求帮助neworiental天津新东方总部地址在哪里?安徽汽车网想在合肥买辆二手车,想问在哪里买比较放心?嘀动网手机一键通用来干嘛呢?kb123.net股市里的STAQ、NET市场是什么?www.zhiboba.com网上看nbawww.ca800.com西门子plc仿真软件有什么功能www4399com4399是什么网站啊???汴京清谈求好看的鼠猫文~月风随笔关于春夏秋冬的散文
已备案未注册域名 北京vps主机 荷兰服务器 外贸主机 vmsnap3 网页背景图片 镇江联通宽带 vip购优汇 徐正曦 西安服务器托管 学生服务器 阿里云邮箱申请 hostease umax 中国电信宽带测速 网站防护 apachetomcat windowsserver2012 so域名 免费服务器 更多