Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?

Gabriela Montoya¹, Maria-Esther Vidal¹, Oscar Corcho², Edna Ruckhaus¹, and Carlos Buil-Aranda³

¹ Universidad Simón Bolívar, Venezuela
  {gmontoya,mvidal,ruckhaus}@ldc.usb.ve
² Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
  ocorcho@fi.upm.es
³ Department of Computer Science, Pontificia Universidad Católica, Chile
  cbuil@ing.puc.cl

Abstract. Testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems still have some limitations. Some variables and configurations that may have an impact on the behavior of these systems (e.g., network latency, data partitioning and query properties) are not sufficiently defined; this affects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed to date, and empirically demonstrate the need to consider additional dimensions and variables. The evaluation has been conducted on three SPARQL query federation systems, and the analysis of these results has uncovered properties of these systems that would normally remain hidden with the original testbeds.

1 Introduction

The number of RDF datasets made publicly available through SPARQL endpoints has exploded in recent years. This fact, together with the potential added value that can be obtained from the combination of such distributed data sources, has motivated the development of systems that allow executing queries over federated SPARQL endpoints (e.g., SPARQL-DQP [2], Jena's ARQ¹, RDF::Query², ANAPSID [1], FedX [10], ADERIS [3]). Some systems use SPARQL 1.0 or ad-hoc extensions, while others rely on the query federation extensions that are being proposed as part of the upcoming SPARQL 1.1 specification [6].

In parallel to the development of federated SPARQL query evaluation systems, several testbeds have been created (e.g., as described in [2,7,8]), which complement those already used for single-endpoint query evaluation. The role of these testbeds is to allow evaluating and comparing the main characteristics of these systems, so as to provide enough information to improve them. Among the features evaluated by these testbeds we can cite: i) functional requirements supported, ii) efficiency of the implementations with different configurations of datasets and with different types of queries, or iii) resilience to changes in the configurations of these systems and the underlying datasets. The most recent and complete testbed is FedBench [8], which proposes a variety of queries in different domains and with different characteristics, including star-shaped, chain-like and hybrid queries, and complex query forms using an adaptation of SP2Bench [9].

These testbeds are steps forward towards establishing a continuous benchmarking process of federated SPARQL query engines. However, they are still far from effectively supporting such benchmarking objectives. In fact, they do not specify completely or even consider some of the dependent and independent variables and configuration setups that characterize the types of problems to be tackled in federated SPARQL query processing, and that clearly affect the performance and quality of different solutions. This may lead to incorrect characterizations when these testbeds are used to select the most appropriate systems in a given scenario, or to decide the next steps in their development.

¹ http://jena.apache.org/
² http://search.cpan.org/~gwilliams/RDF-Query/

P. Cudré-Mauroux et al. (Eds.): ISWC 2012, Part II, LNCS 7650, pp. 313–324, 2012. © Springer-Verlag Berlin Heidelberg 2012

For example, testbeds like the one in [2] have limitations. First, queries are executed directly on live SPARQL endpoints; this means that experiments are not reproducible, as the load of endpoints and network latency vary over time. Second, queries were constructed for the data available in the selected endpoints at the time of generating the testbed, but the structure of these underlying RDF data sources changes, which may result in queries that return different answers or no answers at all. In cases like FedBench [8], the level of reproducibility is improved by using datasets that can be handled locally. However, as shown in Section 2, there are variables that are not yet considered in this benchmark (e.g., network latency, dataset configurations) and that are important in order to obtain more accurate and informative results.

The objective of this paper is to describe first the characteristics exhibited by these testbeds (mainly focusing on FedBench) and reflect on their current limitations. Additional variables and configuration setups (e.g., new queries, new configurations of network latency details, new dataset distribution parameters) are proposed in order to provide more accurate and well-informed overviews of the current status of each of the evaluated systems, so that the experiments to be executed can offer more accurate information about the behavior of the evaluated systems, and hence can be used in continuous improvement processes. Finally, we briefly describe the results of our evaluation of this extended testbed using three different federated query engines: ARQ, ANAPSID, and FedX.

2 Some Limitations of Existing Testbeds

There is no unique "one-size-fits-all" testbed to measure every characteristic needed by an application that requires some form of federated query processing [8]. Nevertheless, existing testbeds can still be improved so that they can fulfill their role in continuous benchmarking processes.

We first illustrate why existing testbeds, particularly FedBench, need to be improved by describing a scenario where the use of the testbed in its current form may lead to wrong decisions. We have executed the FedBench testbed with three systems (ANAPSID, ARQ, and FedX) on the three sets of queries proposed (Life Science, Cross Domain, and Linked Data) [4]. We have used different simulated configurations for network latencies and different data distributions of the datasets used in the experiments. As a result, we observe interesting results that suggest the need for improvements. For instance, for the Cross-Domain query CD1, all systems behave well in a perfect network (as shown in Table 1). However, their behavior changes dramatically when network latencies are considered. ARQ is not able to handle this query for medium-fast and fast networks, given the timeout considered; the time needed to execute the query in the case of FedX grows from 0.72 secs. (perfect network) to 2.23 secs. (fast network) and 16.93 secs. (medium-fast network); and for ANAPSID the results are similar for perfect and fast networks, and grow more slowly in medium-fast networks.

Table 1. Evaluation of FedBench query CD1: number of results and execution time (secs.) under different network latency conditions. Timeout was set to 30 minutes. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution (α=1, β=0.3)); Medium-Fast Network (delays follow a Gamma distribution (α=3, β=1.0)).

              Number of results            Execution time (secs.)       Execution time (secs.)
                                           (first tuple)                (all tuples)
Query Engine  Medium-Fast  Fast  Perfect   Medium-Fast  Fast  Perfect   Medium-Fast  Fast  Perfect
ANAPSID       61           61    61        0.98         0.17  0.16      0.98         0.17  0.16
FedX          61           61    61        16.93        2.23  0.72      16.93        2.23  0.72
ARQ           –            –     63        –            –     0.98      –            –     0.98

This is also the case for other FedBench queries (e.g., LD10, LD11, LS7, CD2), where different behaviors can be observed depending not only on network latency, but also on additional parameters, e.g., data distribution. What these examples show is that those parameters are also important when considering federated query processing approaches, and should be configurable in a testbed, so as to provide sufficient information for decision makers to select the right tool for the type of problem being handled, or for tool developers to better understand the weaknesses of their systems and improve them accordingly, if possible.
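
The simulated network conditions named in Table 1 can be sketched as follows. This is only an illustration, not the mechanism used in the experiments: the function and dictionary names are hypothetical, β is assumed to be the scale parameter of the Gamma distribution (which is how Python's random.gammavariate reads its second argument), and the unit of the sampled delay is not stated in the text.

```python
import random

# (alpha, beta) pairs taken from the caption of Table 1.
# None stands for the perfect network, where no delay is injected.
NETWORK_PROFILES = {
    "perfect": None,
    "fast": (1.0, 0.3),
    "medium-fast": (3.0, 1.0),
}


def sample_delays(profile, n_messages, seed=0):
    """Return one simulated delay per message for the given profile.

    Assumption: beta is the scale parameter of the Gamma distribution,
    so the expected delay is alpha * beta.
    """
    params = NETWORK_PROFILES[profile]
    if params is None:
        return [0.0] * n_messages
    alpha, beta = params
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [rng.gammavariate(alpha, beta) for _ in range(n_messages)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for name in NETWORK_PROFILES:
        delays = sample_delays(name, 10_000)
        print(f"{name:12s} mean simulated delay: {mean(delays):.3f}")
```

Fixing the seed is a deliberate choice here: a testbed that injects random delays is only repeatable if the delay sequence itself can be reproduced.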

Finally, another aspect that matters for the quality of existing testbeds is that the purpose of each configurable parameter is sometimes insufficiently explained. For example, in the case of FedBench there are several parameters that are considered when describing queries, as presented in [8], such as whether the query uses operators like conjunctions, unions, filters or optionals, modifiers like DISTINCT, LIMIT, OFFSET or ORDER BY, and structures like star-shaped queries, chains or hybrid cases. While this is quite a comprehensive set of features to characterize a SPARQL query, there are no clear reasons why each of the 36 queries from the testbed is included. Only some examples are provided in [8], explaining that LS4 "includes a star-shaped group of triple patterns of drugs which is connected via owl:sameAs link to DBpedia drug entities", or that CD5 is a "chain-like query for finding film entities linked via owl:sameAs and restricted on genre and director". However, there are no explanations in the paper or on the corresponding benchmark website about the reasons for including each of them. Furthermore, there are parameters that are not adequately represented (e.g., common query operators like optionals and filters do not appear in cross domain or linked data queries), and characteristics that are not sufficiently discussed (e.g., the number of triple patterns in each basic graph pattern appearing in the query, the selectivity of each part of the query, etc.), which makes the testbed not complete enough.

In summary, while we acknowledge the importance of these testbeds in the state of the art of federated query processing evaluation, we can identify some of their shortcomings, which we illustrate and describe in different scenarios.

3 Benchmark Design

In this section we describe some of the variables that have an impact on federated SPARQL query engines. There are two groups of variables: independent and dependent. Independent variables are those characteristics that need to be minimally specified in the benchmark in order to ensure that evaluation scenarios are replicable. Independent variables have been grouped into four dimensions: Query, Data, Platform, and Endpoint. Dependent (or observed) variables are those characteristics that are normally influenced by independent variables, as described in Table 2, and that will be measured during the evaluation:

– Endpoint Selection Time. Elapsed time between query submission and the generation of the SPARQL 1.1 federated query annotated with the endpoints where sub-queries will be executed³.
– Execution Time. This variable in turn comprises: i) time for the first tuple, i.e., elapsed time between query submission and first answer, ii) time distribution of the reception of query answers, and iii) total execution time.
– Answer Completeness. Number of answers received in relation to the data available in the selected endpoints.
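
As a rough illustration of how the execution-time and completeness variables listed above could be recorded, the following sketch consumes answers from an iterator and timestamps each arrival. The function name, the dictionary keys and the reading of completeness as a ratio against a known expected count are assumptions made for this example; endpoint selection time is not covered, since it depends on the internals of each engine.

```python
import time


def measure_execution(answers, expected_answers=None):
    """Consume an iterable of query answers and record observed variables.

    `answers` is assumed to yield answers as they arrive from the engine.
    `expected_answers` is the number of answers available in the selected
    endpoints, when that number is known (hypothetical parameter).
    """
    start = time.perf_counter()
    arrival_times = []  # time distribution of the reception of answers
    for _ in answers:
        arrival_times.append(time.perf_counter() - start)
    total_time = time.perf_counter() - start

    received = len(arrival_times)
    return {
        "time_first_tuple": arrival_times[0] if arrival_times else None,
        "arrival_times": arrival_times,
        "total_time": total_time,
        "answers": received,
        # Completeness as a ratio; None when the expected count is unknown.
        "completeness": received / expected_answers if expected_answers else None,
    }


if __name__ == "__main__":
    def slow_answers():
        # Stand-in for an engine that streams three answers.
        for i in range(3):
            time.sleep(0.01)
            yield i

    print(measure_execution(slow_answers(), expected_answers=3))
```

Keeping the full list of arrival times, rather than only the first and last, is what makes it possible to tell an engine that streams answers steadily from one that blocks and then delivers everything at once.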

In the following sections we describe independent variables in more detail.

3.1 Query Dimension

This dimension groups variables that characterize the queries in terms of their structure, evaluation, and query language expressivity. Regarding the structure of the query, we focus on three main aspects: i) the query plan shape, ii) the number of basic triple patterns in the query, and iii) the instantiations of subject, object and/or predicates in the query.

³ This variable is applicable only in cases where the system handles SPARQL 1.0 queries and no endpoints are specified in the query; hence, these queries have to be translated into SPARQL 1.1 or into an equivalent internal representation.

Table 2. Variables that impact the behavior of SPARQL federated engines. Observed variables: Endpoint Selection Time, Execution Time, Answer Completeness.

Dimension  Independent variables
Query      query plan shape; # basic triple patterns; # instantiations and their position; join selectivity; # intermediate results; answer size; usage of query language expressivity; # general predicates
Data       dataset size; data frequency distribution; type of partitioning; data endpoint distribution
Platform   cache on/off; RAM available; # processors
Endpoint   # endpoints; endpoint type; relation graph/endpoint/instance; network latency; initial delay; message size; transfer distribution; answer size limit; timeout

Shape. Query plans may be star-shaped, chain-shaped or a combination of them, as described in [8]. In general, the shape of the input queries and of the query plans generated by the systems has an important impact on the three dependent variables identified in our evaluation (endpoint selection time, if applicable, execution time and answer completeness). The shape of the query plans will in turn be affected by the number of basic triple patterns in the query, since this number will influence the final query shape. Query evaluation systems can apply different techniques when generating query plans for a specific type of input query, and this will normally yield different selection and execution times, and completeness results. For example, a query plan generator may or may not group together all graph patterns related to one endpoint.

Instantiations and their position in triple patterns. This is related to whether any of the elements of the triple patterns in the query (subject, object or predicate) are already instantiated, i.e., bound to some URI. Together with join selectivity, instantiation has an important impact on the potential number of intermediate results that may be generated throughout query execution. For instance, the absence of instantiations (e.g., presence of variables) in the predicate position of a triple pattern may have an important impact on query execution time, because several endpoints may be able to provide answers for the pattern.

Answer size and number of intermediate results. If the number of answers or intermediate results involved in a query execution is large, it may take a long time to transfer them across the network, and hence this may affect the query execution time.

Usage of query language expressivity. The use of specific SPARQL operators may affect the execution time and the completeness of the final result set. For example, the OPTIONAL operator is one of the most complex operators in SPARQL [5] and may add a good number of intermediate results, while the FILTER operator may restrict the intermediate results and answer size.

General predicates (e.g., rdf:type, owl:sameAs) are commonly used in SPARQL queries. However, as they normally appear in most datasets, it is not always clear to which endpoint the corresponding subquery should be submitted, and this may have an impact on both endpoint selection and query execution time.
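
The effect of general predicates and of uninstantiated predicate positions on endpoint selection can be illustrated with a toy predicate index. This is a simplified sketch under stated assumptions, not the selection strategy of any of the evaluated engines: the endpoint names, the predicate sets in EXAMPLE_INDEX and the function name are all hypothetical.

```python
# Hypothetical index: endpoint name -> predicates that occur in its data.
EXAMPLE_INDEX = {
    "dbpedia": {"rdf:type", "owl:sameAs", "rdfs:label"},
    "drugbank": {"rdf:type", "owl:sameAs", "drugbank:target"},
    "nytimes": {"owl:sameAs", "nytimes:latest_use"},
}


def candidate_endpoints(predicate, predicate_index):
    """Return the endpoints that may answer a triple pattern.

    `predicate` is None when the predicate position holds a variable;
    in that case no endpoint can be ruled out by this index.
    """
    if predicate is None:
        return sorted(predicate_index)
    return sorted(
        name for name, predicates in predicate_index.items()
        if predicate in predicates
    )


if __name__ == "__main__":
    # A general predicate matches every endpoint in the example index.
    print(candidate_endpoints("owl:sameAs", EXAMPLE_INDEX))
    # A domain-specific predicate narrows the choice to one endpoint.
    print(candidate_endpoints("drugbank:target", EXAMPLE_INDEX))
    # An unbound predicate cannot be narrowed down at all.
    print(candidate_endpoints(None, EXAMPLE_INDEX))
```

In this toy setting a pattern over owl:sameAs or with a variable predicate has to be sent to every endpoint, which is the situation the text describes as affecting both selection and execution time.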

3.2 Data Dimension

We now describe the independent variables related to the characteristics of the RDF datasets that are being accessed. An RDF dataset can be defined in terms of its size and its structural characteristics, like the number of subjects, predicates and objects, and the in- and out-degree of properties. These characteristics impact the number of triples that are transferred, and hence the total execution time. Additionally, they may affect the performance of the individual endpoints.

Partitioning and data distribution are two of the most important variables that need to be specified in the context of queries against federations of endpoints. Partitioning refers to the way that the RDF dataset is fragmented. Data distribution is the way partitions are allocated to the different endpoints. Data may be fully centralized, fully distributed, or somewhere in between. A dataset may be fragmented into disjoint partitions; the partitioning may be done horizontally, vertically or as a combination of both. Horizontal partitioning splits the triples into fragments, each of which may contain different properties. Vertical partitioning produces fragments which contain all the triples of at least one of the properties in the dataset. Horizontal partitioning impacts the completeness of the answer, whereas vertical partitioning affects the execution time. Partitions may be replicated in several endpoints, even in all of the endpoints, i.e., fully replicated, so that the availability of the system increases in case of endpoint failure or endpoint delay.
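
The distinction between partitioning and distribution can be made concrete with a small sketch over triples represented as (subject, predicate, object) tuples. The function names are hypothetical, the round-robin split is just one possible way to obtain a horizontal partitioning, and the allocation map that assigns fragments to endpoints is an assumption of this example.

```python
from collections import defaultdict


def vertical_partition(triples):
    """One fragment per predicate, holding all triples of that predicate."""
    fragments = defaultdict(list)
    for subject, predicate, obj in triples:
        fragments[predicate].append((subject, predicate, obj))
    return dict(fragments)


def horizontal_partition(triples, n_fragments):
    """Split triples round-robin; a fragment may mix several predicates."""
    fragments = [[] for _ in range(n_fragments)]
    for position, triple in enumerate(triples):
        fragments[position % n_fragments].append(triple)
    return fragments


def distribute(fragments, allocation):
    """Allocate fragments to endpoints.

    `fragments` maps a fragment key to its triples; `allocation` maps the
    same key to a list of endpoint names. Listing more than one endpoint
    for a fragment replicates it.
    """
    endpoints = defaultdict(list)
    for key, fragment in fragments.items():
        for endpoint in allocation[key]:
            endpoints[endpoint].extend(fragment)
    return dict(endpoints)


if __name__ == "__main__":
    sample = [
        ("ex:a", "owl:sameAs", "ex:b"),
        ("ex:a", "skos:subject", "ex:c"),
        ("ex:d", "owl:sameAs", "ex:e"),
    ]
    by_predicate = vertical_partition(sample)
    # owl:sameAs is replicated in two endpoints, skos:subject is not.
    print(distribute(by_predicate, {
        "owl:sameAs": ["endpoint1", "endpoint2"],
        "skos:subject": ["endpoint1"],
    }))
```

Separating the fragmentation step from the allocation step mirrors the text: the same set of fragments can be placed fully centralized, fully distributed or replicated simply by changing the allocation map.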

Table 3 compares the behavior of ANAPSID and FedX with different configurations. The two engines behave similarly when there is one dataset per endpoint and in horizontal partitioning without replication. For vertical partitioning without replication, FedX is superior to ANAPSID (0.69 vs. 14.25 secs. for all tuples). When partitioning with replication, FedX outperforms ANAPSID in vertical partitioning (0.85 vs. 14.48 secs.), and the inverse behavior occurs with horizontal partitioning (0.91 vs. 0.06 secs.).

Table 4 shows another example of the effect of data distribution on the query execution time, again for ANAPSID and FedX. We can observe that when there are multiple endpoints, results are similar, while with a network with no delay (perfect network) and all datasets in a single endpoint, ANAPSID clearly outperforms FedX by one order of magnitude (0.046 vs. 0.51 secs.). Results in Tables 3 and 4 support the claim that data partitioning, data distribution and network delays need to be explicitly configurable in testbeds.

Table 3. Impact of Data Partitioning and Distribution on FedBench query LD10 (Perfect Network). Vertical Partitioning: triples of predicates skos:subject, owl:sameAs, and nytimes:latest_use were stored in fragments. Vertical Partitioning Without Replication: three endpoints, each fragment in a different endpoint. Vertical Partitioning With Replication: four endpoints; one of the three fragments is stored in all four endpoints, another fragment in two endpoints, and the last fragment in one endpoint. Horizontal Partitioning: triples of the three predicates were partitioned into two fragments; each fragment has data to produce at least one answer. Horizontal Partitioning Without Replication: two endpoints, each fragment in a different endpoint. Horizontal Partitioning With Replication: four endpoints; one fragment is replicated in each endpoint, the other fragment in only one endpoint.

Configuration                                Query Engine  Execution time          Execution time         Number of
                                                           First Tuple (secs.)     All Tuples (secs.)     Results
One Dataset per Endpoint                     FedX          1.06                    1.06                   3
                                             ANAPSID       1.08                    1.28                   3
Vertical Partitioning Without Replication    FedX          0.69                    0.69                   3
                                             ANAPSID       3.88                    14.25                  3
Horizontal Partitioning Without Replication  FedX          0.72                    0.72                   3
                                             ANAPSID       0.03                    0.03                   1
Vertical Partitioning With Replication       FedX          0.85                    0.85                   14
                                             ANAPSID       4.06                    14.48                  3
Horizontal Partitioning With Replication     FedX          0.91                    0.91                   25
                                             ANAPSID       0.06                    0.06                   1

Table 4. Impact of Data Distribution on FedBench query CD1 (Perfect Network). All datasets in one endpoint versus datasets distributed in different endpoints.

Configuration                    Query Engine  Execution time          Execution time         Number of
                                               First Tuple (secs.)     All Tuples (secs.)     Results
Single Endpoint (All Datasets)   FedX          0.51                    0.51                   61
                                 ANAPSID       0.045                   0.046                  61
Multiple Endpoints               FedX          0.72                    0.72                   61
                                 ANAPSID       0.17                    0.17                   61

3.3 Platform Dimension

The Platform dimension groups variables that are related to the computing infrastructure used in the evaluation. Here we include a minimum set of parameters, related to the system's cache, available RAM and number of processors; this dimension may contain many more parameters that are relevant in this context, and these should in any case be explicitly specified in any evaluation setup when using this testbed.

Turning the cache management function in the system on or off, together with the available RAM, may greatly affect the query execution time. The meaning of dropping and warming up the cache needs to be clearly specified, as well as the number of iterations where an experiment is run in warm cache, and when cache contents are dropped. In the context of federations of endpoints, information on endpoint capabilities may be stored in cache.

The number of processors is also a relevant variable in the context of federated queries. If the infrastructure offers several processors, operators may parallelize their execution, and the execution time may be affected positively.

3.4 Endpoint Dimension

This dimension comprises variables that are related to the number and capabilities of the endpoints used in the testbed.

The first variable to be considered is the number of SPARQL endpoints where the query will be submitted and the type of endpoints that are used for the evaluation. This variable affects all three observed variables, especially result completeness, because different endpoints may produce different answers.

The relationship between the number of instances, graphs and endpoints of the systems used during the evaluation is also an important aspect that needs to be specified. Different configurations of these relationships may impact the three dependent variables.

There are several variables that have an important impact on the execution time, such as the transfer distribution, which is the time distribution of the transmission of packets by the endpoints, the network latency, which defines the delay in sending packets through the network, and the initial endpoint delay. An example of the impact of different network delays is illustrated in Table 5. Two queries from the Linked Data collection of FedBench were executed (LD10 and LD11). Note that ANAPSID and FedX behave similarly in LD10 when there is no delay; however, when delays are considered, FedX outperforms ANAPSID. On the other hand, in LD11 ANAPSID outperforms FedX when delays are present. In fact, in LD11 ANAPSID is able to produce the first tuple after roughly the same amount of time (0.05 to 0.09 secs.), independently of the delay.

Finally, SPARQL endpoints normally allow configuring a limit on the answer size of the queries and a timeout, so as to prevent users from querying the entire dataset. This may generate empty result sets or incomplete results, particularly when endpoint sub-queries are complex.

4 Some Experimental Results

In this section we illustrate how the testbed extension can be used to better understand the behavior of some of the existing federated query engines. The extended testbed has been executed on three systems (ANAPSID, ARQ and FedX) with several configurations for the independent variables identified in Section 3. The complete result set generated by these executions can be browsed at the DEFENDER portal⁴.

⁴ http://159.90.11.58/

Table 5. Impact of network latency on FedBench queries LD10 and LD11. Timeout was set to 30 minutes and message size is 16KB. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution (α=1, β=0.3)); Medium-Fast (delays follow a Gamma distribution (α=3, β=1.0)); Medium-Slow (delays follow a Gamma distribution (α=3, β=1.5)); Slow (delays follow a Gamma distribution (α=5, β=2.0)).

Network       Query Engine  Query  Execution time         Execution time        Number of
                                   First Tuple (secs.)    All Tuples (secs.)    Results
Perfect       ANAPSID       LD10   1.08                   1.29                  3
                            LD11   0.06                   0.09                  376
              FedX          LD10   1.06                   1.06                  3
                            LD11   5.44                   5.44                  376
Fast          ANAPSID       LD10   18.13                  22.89                 3
                            LD11   0.06                   2.80                  376
              FedX          LD10   3.45                   3.45                  3
                            LD11   14.21                  14.22                 376
Medium-Fast   ANAPSID       LD10   191.78                 241.58                3
                            LD11   0.07                   27.86                 376
              FedX          LD10   27.27                  27.27                 3
                            LD11   108.93                 108.93                376
Medium-Slow   ANAPSID       LD10   287.88                 362.59                3
                            LD11   0.05                   41.74                 376
              FedX          LD10   41.42                  41.42                 3
                            LD11   162.45                 162.45                376
Slow          ANAPSID       LD10   653.44                 819.72                3
                            LD11   0.09                   92.52                 376
              FedX          LD10   87.19                  87.19                 3
                            LD11   347.93                 347.93                376

Now we will focus on one of the analyses that a system developer may be interested in, in the context of the continuous benchmarking process that we have referred to in this paper. That is, we are not analyzing the whole set of results obtained from the execution, but only a subset of it. Specifically, let us assume that we are interested in understanding the performance of the three evaluated systems under different data distributions in an ideal scenario, with no or negligible connection latency. Our hypothesis is that existing query engines are sensitive to the way data is distributed across different endpoints, even when the network is perfect. Therefore, these results may be useful to validate that hypothesis and to understand whether a set of federated datasets for which we have the corresponding RDF dumps would be better stored in a single endpoint or in different endpoints to offer answers more efficiently.

Based on the set of variables identified in our study, the following experimental setup is used:

Datasets and Query Benchmarks. We ran 36 queries against the FedBench dataset collections [8]: DBPedia, NY Times, Geonames, KEGG, ChEBI, Drugbank, Jamendo, LinkedMDB, and SW Dog Food. These queries include 25 FedBench queries and eleven complex queries⁵. The latter are added to cover some of the missing elements in the former group of queries. They comprise between 6 and 48 triple patterns, and can be decomposed into up to 8 sub-queries; and they cover different SPARQL operators. Virtuoso⁶ was used to implement endpoints, and the timeout was set to 240 secs. or 71,000 tuples. Experiments were executed on a Linux Mint machine with an Intel Pentium Core2 Duo E7500 2.93GHz, 8GB RAM, 1333MHz DDR3.

Network Latency. We configured a perfect network with no delays. The size of the message corresponded to 16KB.

Data Distribution. We considered two different distributions of the data: i) Complete: the FedBench collections were stored in a single graph and made accessible through one single SPARQL endpoint, and ii) Federated: the FedBench collections were stored in nine Virtuoso endpoints.

⁵ http://www.ldc.usb.ve/~mvidal/FedBench/queries/ComplexQueries

Therefore, we consider the queries in four groups and six configurations: Configuration 1: ANAPSID Complete Distribution, Configuration 2: ANAPSID Federated Distribution, Configuration 3: ARQ Complete Distribution, Configuration 4: ARQ Federated Distribution, Configuration 5: FedX Complete Distribution, Configuration 6: FedX Federated Distribution. In each configuration, the corresponding queries were ordered according to the total execution time consumed by the corresponding engines. For example, for ANAPSID in a Complete Distribution, i.e., Configuration 1, the Cross-Domain queries were ordered as follows: CD2, CD3, CD4, CD5, CD1, CD7, and CD6. Queries of each configuration were compared using Spearman's Rho correlation. A high positive correlation value between two configurations indicates that the corresponding engines had a similar behavior, i.e., the trends of execution time of the two engines are similar. Thus, when Configuration 1 is compared to itself, the Spearman's Rho correlation reaches the highest value (1.0). On the other hand, a negative value indicates an inverse correlation; for example, this happened with Complex Queries for ARQ in a Complete Distribution (Configuration 3) when compared to FedX Federated Distribution (Configuration 6); its value is -0.757. Finally, a value of 0.0 indicates that there is no correlation between the two configurations, e.g., for Life Science queries Configuration 4 and Configuration 6.

Figure 1 illustrates the results of this specific study (again, the data used for this study is available through the DEFENDER portal). White circles represent the highest value of correlation; red ones correspond to inverse correlations, while blue ones indicate a positive correlation. The size of the circles is proportional to the absolute value of the correlation. Given a group of queries, a low value of correlation of one engine in two different distributions suggests that the distribution affects the engine behavior, e.g., FedX and ARQ in Complex Queries with different data distributions have correlation values of 0.143 and 0.045, respectively. Furthermore, the number of small blue circles between configurations of different data distributions of the same engine indicates that this parameter affects the behavior of the studied engine. Because there are several of these points in the Complex Queries plot, we can conclude that these two parameters (query complexity and data distribution) allow uncovering engines' behavior that could not be observed before. This illustrates the need for the extensions proposed in this paper.

⁶ http://virtuoso.openlinksw.com/

[Figure 1 panels: (a) Cross Domain (CD), (b) Linked Data (LD), (c) Life Science (LS), (d) New Complex Queries (C)]

Fig. 1. Spearman's Rho Correlation of Queries in three FedBench sets of queries (a) Cross-Domain (CD), (b) Life Science (LS), (c) Linked Data (LD) and (d) New Complex Queries. Six configurations: (1) ANAPSID Complete Distribution; (2) ANAPSID Federated Distribution; (3) ARQ Complete Distribution; (4) ARQ Federated Distribution; (5) FedX Complete Distribution; (6) FedX Federated Distribution. White circles correspond to a correlation value of 1.0; blue circles indicate a positive correlation (Fig. 1(d) points (3,4) and (5,6), correlation values 0.045 and 0.143, respectively); red circles indicate a negative correlation (Fig. 1(d) points (2,6) and (6,3), correlation values -0.5 and -0.757, respectively). Circles' diameters indicate absolute correlation values.

5 Conclusion and Future Work

In this paper we have shown that there is a need to extend current federated SPARQL query testbeds with additional variables and configuration setups (e.g., data partitioning and distribution, network latency, and query complexity), so as to provide more accurate details of the behavior of existing engines, which can then be used to provide better comparisons and as input for improvement proposals. Taking those additional variables into account, we have extensively evaluated three of the existing engines (ANAPSID, ARQ and FedX), and have made those results available for public consumption in the DEFENDER portal, which we plan to keep up to date on a regular basis. We have also shown how the generated result dataset can be used to validate hypotheses about the systems' behavior.

Our future work will focus on continuing the evaluation of additional federated SPARQL query engines, and on the inclusion of additional parameters in the benchmark that may still be needed to provide more accurate and well-informed results.

Acknowledgements. This work has been funded by the project myBigData (TIN2010-17060), and DID-USB. We thank Maribel Acosta, Cosmin Basca, and Raúl García-Castro for fruitful discussions.

References

1. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011)
2. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the SPARQL 1.1 Federation Extension. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part II. LNCS, vol. 6644, pp. 1–15. Springer, Heidelberg (2011)
3. Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: An Adaptive Query Processor for Joining Federated SPARQL Endpoints. In: Meersman, R., Dillon, T., Herrero, P., Kumar, A., Reichert, M., Qing, L., Ooi, B.-C., Damiani, E., Schmidt, D.C., White, J., Hauswirth, M., Hitzler, P., Mohania, M. (eds.) OTM 2011, Part II. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011)
4. Montoya, G., Vidal, M.-E., Acosta, M.: DEFENDER: a DEcomposer for quEries against feDERations of endpoints. In: Extended Semantic Web Conference, ESWC Workshop and Demo 2012 (to appear)
5. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. TODS 34(3) (2009)
6. Prud'hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 federated query (November 2011)
7. Quilitz, B., Leser, U.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
8. Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011)
9. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2bench: A SPARQL performance benchmark. In: ICDT, pp. 4–33 (2010)
10. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization Techniques for Federated Query Processing on Linked Data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)