Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?

Gabriela Montoya (1), Maria-Esther Vidal (1), Oscar Corcho (2), Edna Ruckhaus (1), and Carlos Buil-Aranda (3)

(1) Universidad Simón Bolívar, Venezuela
{gmontoya,mvidal,ruckhaus}@ldc.usb.ve
(2) Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
ocorcho@fi.upm.es
(3) Department of Computer Science, Pontificia Universidad Católica, Chile
cbuil@ing.puc.cl

Abstract. Testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems still have some limitations. Some variables and configurations that may have an impact on the behavior of these systems (e.g., network latency, data partitioning and query properties) are not sufficiently defined; this affects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed up to now, and empirically probe the need to consider additional dimensions and variables. The evaluation has been conducted on three SPARQL query federation systems, and the analysis of these results has uncovered properties of these systems that would normally be hidden with the original testbeds.

1 Introduction

The number of RDF datasets made publicly available through SPARQL endpoints has exploded in recent years. This fact, together with the potential added value that can be obtained from the combination of such distributed data sources, has motivated the development of systems that allow executing queries over federated SPARQL endpoints (e.g., SPARQL-DQP [2], Jena's ARQ (footnote 1), RDF::Query (footnote 2), ANAPSID [1], FedX [10], ADERIS [3]). Some systems use SPARQL 1.0 or ad-hoc extensions, while others rely on the query federation extensions that are being proposed as part of the upcoming SPARQL 1.1 specification [6].

In parallel to the development of federated SPARQL query evaluation systems, several testbeds have been created (e.g., as described in [2,7,8]), which complement those already used for single-endpoint query evaluation. The role of these testbeds is to allow evaluating and comparing the main characteristics of these systems, so as to provide enough information to improve them. Among the features evaluated by these testbeds we can cite: i) functional requirements supported, ii) efficiency of the implementations with different configurations of datasets and with different types of queries, or iii) resilience to changes in the configurations of these systems and the underlying datasets. The most recent and complete testbed is FedBench [8], which proposes a variety of queries in different domains and with different characteristics, including star-shaped, chain-like and hybrid queries, and complex query forms using an adaptation of SP2Bench [9].

These testbeds are steps forward towards establishing a continuous benchmarking process of federated SPARQL query engines. However, they are still far from effectively supporting such benchmarking objectives. In fact, they do not specify completely or even consider some of the dependent and independent variables and configuration setups that characterize the types of problems to be tackled in federated SPARQL query processing, and that clearly affect the performance and quality of different solutions. This may lead to incorrect characterizations when these testbeds are used to select the most appropriate systems in a given scenario, or to decide the next steps in their development.

Footnote 1: http://jena.apache.org/
Footnote 2: http://search.cpan.org/~gwilliams/RDF-Query/

P. Cudré-Mauroux et al. (Eds.): ISWC 2012, Part II, LNCS 7650, pp. 313–324, 2012. © Springer-Verlag Berlin Heidelberg 2012

For example, testbeds like the one in [2] have limitations. First, queries are executed directly on live SPARQL endpoints; this means that experiments are not reproducible, as the load of endpoints and the network latency vary over time. Second, queries were constructed for the data available in the selected endpoints at the time of generating the testbed, but the structure of these underlying RDF data sources changes, and may result in queries that return different answers or that do not return any answer at all. In cases like FedBench [8], the level of reproducibility is improved by using datasets that can be handled locally. However, as shown in Section 2, there are variables that are not yet considered in this benchmark (e.g., network latency, dataset configurations) and that are important in order to obtain more accurate and informative results.

The objective of this paper is to describe first the characteristics exhibited by these testbeds (mainly focusing on FedBench) and reflect on their current limitations. Additional variables and configuration setups (e.g., new queries, new configurations of network latency details, new dataset distribution parameters) are proposed in order to provide more accurate and well-informed overviews of the current status of each of the evaluated systems, so that the experiments to be executed can offer more accurate information about the behavior of the evaluated systems, and hence they can be used in continuous improvement processes. Finally, we describe briefly the results of our evaluation of this extended testbed using three different federated query engines: ARQ, ANAPSID, and FedX.

2 Some Limitations of Existing Testbeds

There is no unique "one-size-fits-all" testbed to measure every characteristic needed by an application that requires some form of federated query processing [8]. Nevertheless, existing testbeds can still be improved so that they can fulfill their role in continuous benchmarking processes.

We will first illustrate why we need to improve existing testbeds, particularly FedBench, by describing a scenario where the use of the testbed in its current form may lead to wrong decisions. We have executed the FedBench testbed with three systems (ANAPSID, ARQ, and FedX) on the three sets of queries proposed (Life Science, Cross Domain, and Linked Data) [4]. We have used different simulated configurations for network latencies and different data distributions of the datasets used in the experiments. The results suggest the need for improvements. For instance, for the Cross-Domain query CD1, all systems behave well in a perfect network (as shown in Table 1). However, their behavior changes dramatically when network latencies are considered. ARQ is not able to handle this query for medium-fast and fast networks, given the timeout considered; the time needed to execute the query in the case of FedX grows from 0.72 secs. (perfect network) to 2.23 secs. (fast network) and 16.93 secs. (medium-fast network); and for ANAPSID the results are similar for perfect and fast networks (0.16 and 0.17 secs.), and the time grows more slowly in medium-fast networks (0.98 secs.).

Table 1. Evaluation of FedBench query CD1: number of results and execution time (secs.) under different network latency conditions. The timeout was set to 30 minutes. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution with α=1, β=0.3); Medium-Fast Network (delays follow a Gamma distribution with α=3, β=1.0).

Query Engine | Network | Number of results | Execution time, first tuple (secs.) | Execution time, all tuples (secs.)
ANAPSID | Medium-Fast | 61 | 0.98 | 0.98
ANAPSID | Fast | 61 | 0.17 | 0.17
ANAPSID | Perfect | 61 | 0.16 | 0.16
FedX | Medium-Fast | 61 | 16.93 | 16.93
FedX | Fast | 61 | 2.23 | 2.23
FedX | Perfect | 61 | 0.72 | 0.72
ARQ | Medium-Fast | – | – | –
ARQ | Fast | – | – | –
ARQ | Perfect | 63 | 0.98 | 0.98

This is also the case for other FedBench queries (e.g., LD10, LD11, LS7, CD2), where different behaviors can be observed depending not only on network latency, but also on additional parameters, e.g., data distribution. What these examples show is that those parameters are also important when considering federated query processing approaches, and should be configured in a testbed, so as to provide sufficient information for decision makers to select the right tool for the type of problem being handled, or for tool developers to understand better the weaknesses of their systems and improve them accordingly, if possible.

Finally, another aspect is important when considering the quality of existing testbeds: sometimes there are not sufficient explanations about the purpose of each of the parameters that can be configured. For example, in the case of FedBench there are several parameters that are considered when describing queries, as presented in [8], such as whether the query uses operators like conjunctions, unions, filters or optionals, modifiers like DISTINCT, LIMIT, OFFSET or ORDER BY, and structures like star-shaped queries, chains or hybrid cases. While this is quite a comprehensive set of features to characterize a SPARQL query, there are no clear reasons why each of the 36 queries from the testbed is included. Only some examples are provided in [8], explaining that LS4 "includes a star-shaped group of triple patterns of drugs which is connected via owl:sameAs link to DBpedia drug entities", or that CD5 is a "chain-like query for finding film entities linked via owl:sameAs and restricted on genre and director". However, there are no explanations in the paper or in the corresponding benchmark website about the reasons for including each of them. Furthermore, there are parameters that are not adequately represented (e.g., common query operators like optionals and filters do not appear in cross domain or linked data queries), and characteristics that are not sufficiently discussed (e.g., the number of triple patterns in each basic graph pattern appearing in the query, the selectivity of each part of the query, etc.), which makes the testbed not complete enough.

In summary, while we acknowledge the importance of these testbeds in the state of the art of federated query processing evaluation, we can identify some of their shortcomings, which we illustrate and describe in different scenarios.

3 Benchmark Design

In this section we describe some of the variables that have an impact on federated SPARQL query engines. There are two groups of variables: independent and dependent. Independent variables are those characteristics that need to be minimally specified in the benchmark in order to ensure that evaluation scenarios are replicable. Independent variables have been grouped into four dimensions: Query, Data, Platform, and Endpoint. Dependent (or observed) variables are those characteristics that are normally influenced by independent variables, as described in Table 2, and that will be measured during the evaluation:

– Endpoint Selection Time. Elapsed time between query submission and the generation of the SPARQL 1.1 federated query annotated with the endpoints where sub-queries will be executed (footnote 3).
– Execution Time. This variable is in turn comprised of: i) time for the first tuple, or elapsed time between query submission and first answer, ii) time distribution of the reception of query answers, and iii) total execution time.
– Answer Completeness. Number of answers received in relation to the data available in the selected endpoints.

In the following sections we describe independent variables in more detail.
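
As an illustration of how the execution-time and completeness variables listed above could be recorded, the following is a minimal sketch and not part of the testbed itself. It assumes the engine under test is wrapped as a callable that yields answers one by one; the names `observe`, `run_query`, `expected_answers` and `clock` are hypothetical, and the number of expected answers is assumed to be known in advance (for example from a centralized run over the same data).

```python
import time
from typing import Callable, Iterable, List, NamedTuple, Optional


class Observation(NamedTuple):
    first_tuple_secs: Optional[float]  # None when no answer arrived
    arrival_secs: List[float]          # elapsed time at which each answer arrived
    total_secs: float                  # elapsed time until the engine finished
    completeness: float                # answers received / answers expected


def observe(run_query: Callable[[], Iterable],
            expected_answers: int,
            clock: Callable[[], float] = time.perf_counter) -> Observation:
    """Run one query and record the observed variables.

    run_query is a hypothetical wrapper around the engine under test;
    it must return an iterable that yields answers as they are produced.
    """
    start = clock()
    arrivals = []
    for _ in run_query():
        arrivals.append(clock() - start)
    total = clock() - start
    first = arrivals[0] if arrivals else None
    # Assumption: a query with no expected answers counts as complete.
    completeness = len(arrivals) / expected_answers if expected_answers else 1.0
    return Observation(first, arrivals, total, completeness)
```

The list of arrival times gives the time distribution of answer reception; the first entry is the time for the first tuple. Endpoint selection time is not covered, since measuring it requires instrumentation inside the engine.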

3.1 Query Dimension

This dimension groups variables that characterize the queries in terms of their structure, evaluation, and query language expressivity. Regarding the structure of the query, we focus on three main aspects: i) the query plan shape, ii) the number of basic triple patterns in the query, and iii) the instantiations of subject, object and/or predicates in the query.

Footnote 3: This variable is applicable only in cases where the system handles SPARQL 1.0 queries and no endpoints are specified in the query; hence, these queries have to be translated into SPARQL 1.1 or into an equivalent internal representation.

Table 2. Variables that impact the behavior of SPARQL federated engines.

Observed variables (columns): Endpoint Selection Time | Execution Time | Answer Completeness

Independent variables (rows), by dimension:
Query: query plan shape; # basic triple patterns; # instantiations and their position; join selectivity; # intermediate results; answer size; usage of query language expressivity; # general predicates
Data: dataset size; data frequency distribution; type of partitioning; data endpoint distribution
Platform: cache on/off; RAM available; # processors
Endpoint: # endpoints; endpoint type; relation graph/endpoint/instance; network latency; initial delay; message size; transfer distribution; answer size limit; timeout

(The marks showing which observed variables each independent variable influences are not reproduced here.)

Shape. Query plans may be star-shaped, chain-shaped or a combination of them, as described in [8]. In general, the shape of the input queries and of the query plans generated by the systems has an important impact on the three dependent variables identified in our evaluation (endpoint selection time, if applicable, execution time and answer completeness). The shape of the query plans will be in turn affected by the number of basic triple patterns in the query since this number will influence the final query shape. Query evaluation systems can apply different techniques when generating query plans for a specific type of input query, and this will normally yield different selection and execution times, and completeness results. For example, a query plan generator may or may not group together all graph patterns related to one endpoint.

Instantiations and their position in triple patterns. This is related to whether any of the elements of the triple patterns in the query (subject, object or predicate) are already instantiated, i.e., bound to some URI. Together with join selectivity, instantiation has an important impact on the potential number of intermediate results that may be generated throughout query execution. For instance, the absence of instantiations (e.g., presence of variables) in the predicate position of a triple pattern may have an important impact on query execution time, because several endpoints may be able to provide answers for the pattern.

Answer size and number of intermediate results. If the number of answers or intermediate results involved in a query execution is large, it may take a long time to transfer them across the network, and hence this may affect the query execution time.

Usage of query language expressivity. The use of specific SPARQL operators may affect the execution time and the completeness of the final result set. For example, the OPTIONAL operator is one of the most complex operators in SPARQL [5] and may add a good number of intermediate results, while the FILTER operator may restrict the intermediate results and answer size.

General predicates (e.g., rdf:type, owl:sameAs) are commonly used in SPARQL queries. However, as they normally appear in most datasets, it is not always clear to which endpoint the corresponding subquery should be submitted, and this may have an impact on both endpoint selection and query execution time.
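
The structural variables of the Query dimension can be computed mechanically from a basic graph pattern. The sketch below is one possible operationalization and not a definition taken from the benchmark: it assumes triple patterns are given as (subject, predicate, object) string tuples with variables prefixed by "?", treats a pattern group as a star when all patterns share one subject and as a chain when objects link to subjects in a single path, and calls everything else hybrid. The function names and the `GENERAL_PREDICATES` set (limited to the two predicates named in the text) are illustrative.

```python
GENERAL_PREDICATES = {"rdf:type", "owl:sameAs"}  # the two examples named in the text


def is_var(term):
    return term.startswith("?")


def _is_chain(patterns):
    # A chain has pairwise distinct subjects, and following object -> subject
    # links from its unique starting pattern visits every pattern exactly once.
    by_subject = {s: (s, p, o) for s, p, o in patterns}
    if len(by_subject) != len(patterns):
        return False
    objects = {o for _, _, o in patterns}
    starts = [s for s in by_subject if s not in objects]
    if len(starts) != 1:
        return False
    visited, node = 0, starts[0]
    while node in by_subject and visited < len(patterns):
        node = by_subject[node][2]
        visited += 1
    return visited == len(patterns)


def classify_shape(patterns):
    if len(patterns) < 2:
        return "single"
    if len({s for s, _, _ in patterns}) == 1:
        return "star"
    if _is_chain(patterns):
        return "chain"
    return "hybrid"


def profile(patterns):
    """Summarize the structural query variables for one basic graph pattern."""
    return {
        "triple_patterns": len(patterns),
        "shape": classify_shape(patterns),
        "instantiations": {
            "subject": sum(not is_var(s) for s, _, _ in patterns),
            "predicate": sum(not is_var(p) for _, p, _ in patterns),
            "object": sum(not is_var(o) for _, _, o in patterns),
        },
        "general_predicates": sum(p in GENERAL_PREDICATES for _, p, _ in patterns),
    }
```

Join selectivity, intermediate results and answer size depend on the data, so they cannot be derived from the query text alone and are left out of this sketch.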

3.2 Data Dimension

We now describe the independent variables related to the characteristics of the RDF datasets that are being accessed. An RDF dataset can be defined in terms of its size and its structural characteristics like the number of subjects, predicates and objects, and the in and out degree of properties. These characteristics impact the number of triples that are transferred, and hence the total execution time. Additionally, they may affect the performance of the individual endpoints.

Partitioning and data distribution are two of the most important variables that need to be specified in the context of queries against federations of endpoints. Partitioning refers to the way that the RDF dataset is fragmented. Data distribution is the way partitions are allocated to the different endpoints. Data may be fully centralized, fully distributed, or somewhere in between. A dataset may be fragmented into disjoint partitions; the partitioning may be done horizontally, vertically or a combination of both. Horizontal partitioning fragments triples so that they may contain different properties. Vertical partitioning produces fragments which contain all the triples of at least one of the properties in the dataset. Horizontal partitioning impacts the completeness of the answer, whereas vertical partitioning affects the execution time. Partitions may be replicated in several endpoints, even in all of the endpoints, i.e., fully replicated, so that the availability of the system increases in case of endpoint failure or endpoint delay.

Table 3 compares the behavior of ANAPSID and FedX with different configurations. The two engines behave similarly when there is one dataset per endpoint and in horizontal partitioning without replication. For vertical partitioning without replication, FedX is superior to ANAPSID (0.69 vs. 14.25 secs. to retrieve all tuples). When partitioning with replication, FedX outperforms ANAPSID in vertical partitioning (0.85 vs. 14.48 secs.), and the inverse behavior occurs with horizontal partitioning (0.91 vs. 0.06 secs.). Table 4 shows another example of the effect of data distribution on the query execution time, again for ANAPSID and FedX. We can observe that when there are multiple endpoints, results are similar, while with a network with no delay (perfect network) and all datasets in a single endpoint, ANAPSID clearly outperforms FedX by one order of magnitude (0.046 vs. 0.51 secs.). Results in Tables 3 and 4 support the claim that data partitioning, data distribution and network delays need to be explicitly configurable in testbeds.
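
The partitioning and distribution notions of this section can be made concrete with a small sketch. It is an illustration under stated assumptions, not the procedure used to build the experimental fragments: triples are plain (subject, predicate, object) tuples, the vertical split creates one fragment per predicate, the horizontal split is a simple round-robin (the text does not say how the horizontal fragments were chosen), and the endpoint names in a placement are hypothetical.

```python
from collections import defaultdict


def vertical_partition(triples):
    """One fragment per predicate: each fragment holds all triples of that predicate."""
    fragments = defaultdict(list)
    for s, p, o in triples:
        fragments[p].append((s, p, o))
    return dict(fragments)


def horizontal_partition(triples, n_fragments):
    """Round-robin split (an assumption): triples of one predicate may end up in several fragments."""
    fragments = [[] for _ in range(n_fragments)]
    for i, triple in enumerate(triples):
        fragments[i % n_fragments].append(triple)
    return fragments


def distribute(fragments, placement):
    """Allocate fragments to endpoints.

    placement maps an endpoint name to the fragment keys (predicates or
    list indexes) stored there; listing a key under several endpoints
    models replication.
    """
    return {
        endpoint: [t for key in keys for t in fragments[key]]
        for endpoint, keys in placement.items()
    }
```

For example, `distribute(vertical_partition(triples), {"ep1": ["owl:sameAs"], "ep2": ["owl:sameAs", "skos:subject"]})` replicates the owl:sameAs fragment in two endpoints while keeping the skos:subject fragment in one.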

Table 3. Impact of Data Partitioning and Distribution on FedBench query LD10 (Perfect Network). Vertical Partitioning: triples of predicates skos:subject, owl:sameAs, and nytimes:latestuse were stored in fragments. Vertical Partitioning Without Replication: three endpoints, each fragment in a different endpoint. Vertical Partitioning With Replication: four endpoints; one of the three fragments is stored in all four endpoints, another fragment in two endpoints, and the last fragment in one endpoint. Horizontal Partitioning: triples of the three predicates were partitioned in two fragments; each fragment has data to produce at least one answer. Horizontal Partitioning Without Replication: two endpoints, each fragment in a different endpoint. Horizontal Partitioning With Replication: four endpoints; one fragment is replicated in each endpoint, the other fragment in only one endpoint.

Configuration | Query Engine | Execution time, First Tuple (secs.) | Execution time, All Tuples (secs.) | Number of Results
One Dataset per Endpoint | FedX | 1.06 | 1.06 | 3
One Dataset per Endpoint | ANAPSID | 1.08 | 1.28 | 3
Vertical Partitioning Without Replication | FedX | 0.69 | 0.69 | 3
Vertical Partitioning Without Replication | ANAPSID | 3.88 | 14.25 | 3
Horizontal Partitioning Without Replication | FedX | 0.72 | 0.72 | 3
Horizontal Partitioning Without Replication | ANAPSID | 0.03 | 0.03 | 1
Vertical Partitioning With Replication | FedX | 0.85 | 0.85 | 14
Vertical Partitioning With Replication | ANAPSID | 4.06 | 14.48 | 3
Horizontal Partitioning With Replication | FedX | 0.91 | 0.91 | 25
Horizontal Partitioning With Replication | ANAPSID | 0.06 | 0.06 | 1

3.3 Platform Dimension

The Platform dimension groups variables that are related to the computing infrastructure used in the evaluation. Here we include a minimum set of parameters, related to the system's cache, available RAM and number of processors; this dimension may contain many more parameters that are relevant in this context, and they should in any case be explicitly specified in any evaluation setup when using this testbed.

Turning the cache management function of the system on or off, together with the available RAM, may greatly affect the query execution time. The meaning of dropping and warming up the cache needs to be clearly specified, as well as the number of iterations where an experiment is run in warm cache, and when cache contents are dropped. In the context of federations of endpoints, information on endpoint capabilities may be stored in cache. The number of processors is also a relevant variable in the context of federated queries. If the infrastructure offers several processors, operators may parallelize their execution, and the execution time may be affected positively.

Table 4. Impact of Data Distribution on FedBench query CD1 (Perfect Network). All datasets in one endpoint versus datasets distributed in different endpoints.

Configuration | Query Engine | Execution time, First Tuple (secs.) | Execution time, All Tuples (secs.) | Number of Results
Single Endpoint, All Datasets | FedX | 0.51 | 0.51 | 61
Single Endpoint, All Datasets | ANAPSID | 0.045 | 0.046 | 61
Multiple Endpoints | FedX | 0.72 | 0.72 | 61
Multiple Endpoints | ANAPSID | 0.17 | 0.17 | 61

3.4 Endpoint Dimension

This dimension comprises variables that are related to the number and capabilities of the endpoints used in the testbed.

The first variables to be considered are the number of SPARQL endpoints where the query will be submitted and the type of endpoints that are used for the evaluation. The number of endpoints affects all three observed variables, especially the result completeness, because different endpoints may produce different answers. The relationship between the number of instances, graphs and endpoints of the systems used during the evaluation is also an important aspect that needs to be specified. Different configurations of these relationships may impact the three dependent variables.

There are several variables that have an important impact on the execution time, such as the transfer distribution, which is the time distribution of the transmission of packets by the endpoints, the network latency, which defines the delay in sending packets through the network, and the initial endpoint delay. An example of the impact of different network delays is illustrated in Table 5. Two queries from the Linked Data collection of FedBench were executed (LD10 and LD11). Note that ANAPSID and FedX behave similarly in LD10 when there is no delay; however, when delays are considered, FedX outperforms ANAPSID. On the other hand, in LD11 ANAPSID outperforms FedX when delays are present. In fact, ANAPSID is able to produce the first tuple of LD11 after roughly the same amount of time (between 0.05 and 0.09 secs.), independently of the delay.
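
The simulated network conditions are described only by the parameters of a Gamma distribution, so a short sketch may help to show what such a configuration amounts to. The (α, β) pairs below are the ones given in the captions of Tables 1 and 5; everything else is an assumption made for illustration: β is interpreted as a scale parameter (which is how Python's `random.gammavariate` defines it), the unit of the sampled delay is left unspecified, and the function names are hypothetical.

```python
import random

# (alpha, beta) per network profile, as listed in the captions of Tables 1 and 5.
NETWORKS = {
    "perfect": None,             # no delays
    "fast": (1.0, 0.3),
    "medium-fast": (3.0, 1.0),
    "medium-slow": (3.0, 1.5),
    "slow": (5.0, 2.0),
}


def sample_delay(network, rng):
    """Draw one delay for the given profile (assumption: beta is a scale parameter)."""
    params = NETWORKS[network]
    if params is None:
        return 0.0
    alpha, beta = params
    return rng.gammavariate(alpha, beta)


def expected_delay(network):
    """Mean of a Gamma distribution with shape alpha and scale beta is alpha * beta."""
    params = NETWORKS[network]
    return 0.0 if params is None else params[0] * params[1]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for name in NETWORKS:
        draws = [sample_delay(name, rng) for _ in range(10_000)]
        mean = sum(draws) / len(draws)
        print(f"{name:12s} sample mean={mean:.2f} expected={expected_delay(name):.2f}")
```

Under the scale interpretation the mean delay grows from 0.3 (fast) to 10 (slow). Seeding the generator keeps the delay sequence identical across runs, which supports the repeatability that the testbed extensions aim for.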

Finally, SPARQL endpoints normally allow configuring a limit on the answer size of the queries and a timeout, so as to prevent users from querying the entire dataset. This may generate empty result sets or incomplete results, particularly when endpoint sub-queries are complex.

4 Some Experimental Results

In this section we illustrate how the testbed extension can be used to better understand the behavior of some of the existing federated query engines. The extended testbed has been executed on three systems (ANAPSID, ARQ and FedX) with several configurations for the independent variables identified in Section 3. The complete result set generated by these executions can be browsed at the DEFENDER portal (footnote 4).

Footnote 4: http://159.90.11.58/

Table 5. Impact of Network latency on FedBench queries LD10 and LD11. The timeout was set to 30 minutes and the Message Size is 16KB. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution with α=1, β=0.3); Medium-Fast (α=3, β=1.0); Medium-Slow (α=3, β=1.5); Slow (α=5, β=2.0).

Network | Query Engine | Query | Execution time, First Tuple (secs.) | Execution time, All Tuples (secs.) | Number of Results
Perfect | ANAPSID | LD10 | 1.08 | 1.29 | 3
Perfect | ANAPSID | LD11 | 0.06 | 0.09 | 376
Perfect | FedX | LD10 | 1.06 | 1.06 | 3
Perfect | FedX | LD11 | 5.44 | 5.44 | 376
Fast | ANAPSID | LD10 | 18.13 | 22.89 | 3
Fast | ANAPSID | LD11 | 0.06 | 2.80 | 376
Fast | FedX | LD10 | 3.45 | 3.45 | 3
Fast | FedX | LD11 | 14.21 | 14.22 | 376
Medium-Fast | ANAPSID | LD10 | 191.78 | 241.58 | 3
Medium-Fast | ANAPSID | LD11 | 0.07 | 27.86 | 376
Medium-Fast | FedX | LD10 | 27.27 | 27.27 | 3
Medium-Fast | FedX | LD11 | 108.93 | 108.93 | 376
Medium-Slow | ANAPSID | LD10 | 287.88 | 362.59 | 3
Medium-Slow | ANAPSID | LD11 | 0.05 | 41.74 | 376
Medium-Slow | FedX | LD10 | 41.42 | 41.42 | 3
Medium-Slow | FedX | LD11 | 162.45 | 162.45 | 376
Slow | ANAPSID | LD10 | 653.44 | 819.72 | 3
Slow | ANAPSID | LD11 | 0.09 | 92.52 | 376
Slow | FedX | LD10 | 87.19 | 87.19 | 3
Slow | FedX | LD11 | 347.93 | 347.93 | 376

Now we will focus on one of the analyses that a system developer may be interested in, in the context of the continuous benchmarking process that we have referred to in this paper.
93376Nowwewillfocusononeoftheanalysesthatasystemdevelopermaybeinterestedin,inthecontextofthecontinuousbenchmarkingprocessthatwehavereferredtointhispaper.
Thatis,wearenotanalyzingthewholesetofresultsobtainedfromtheexecution,butonlyasubsetofit.
Specically,let'sassumethatweareinterestedinunderstandingtheperformanceofthethreeevaluatedsystemsunderdierentdatadistributionsinanidealscenario,withnoornegligibleconnectionlatency.
Ourhypothesisisthatexistingqueryenginesaresensibletothewaydataisdistributedalongdierentendpoints,evenwhenthenetworkisperfect.
Therefore,theseresultsmaybeusefultovalidatethathypothesisandtounderstandwhetherasetoffederateddatasetsforwhichwehavethecorrespondingRDFdumpsshouldbebetterstoredinasingleendpointorindierentendpointstooeranswersmoreeciently.
Basedonthesetofvariablesidentiedinourstudy,thefollowingexperimentalsetupisused:DatasetsandQueryBenchmarks.
Weran36queriesagainsttheFedBenchdatasetcollections[8]:DBPedia,NYTimes,Geonames,KEGG,ChEBI,Drugbank,Jamendo,LinkedMDB,andSWDogFood.
Thesequeriesinclude25FedBenchqueriesandelevencomplexqueries5.
Thelatterareaddedto5http://www.
ldc.
usb.
ve/~mvidal/FedBench/queries/ComplexQueries322G.
Montoyaetal.
coversomeofthemissingelementsintheformergroupofqueries.
Theyarecomprisedofbetween6and48triplepatterns,andcanbedecomposedintoupto8sub-queries;andtheycoverdierentSPARQLoperators.
Virtuoso6wasusedtoimplementendpoints,andthetimeoutwassetupto240secs.
or71,000tuples.
ExperimentswereexecutedonaLinuxMintmachinewithanIntelPentiumCore2DuoE75002.
93GHz8GBRAM1333MHzDDR3.
NetworkLatency.
Weconguredaperfectnetworkwithnodelays.
Thesizeofthemessagecorrespondedto16KB.
DataDistribution.
Weconsideredtwodierentdistributionsofthedata:i)Complete:theFedBenchcollectionswerestoredintoasinglegraphandmadeaccessiblethroughonesingleSPARQLendpoint,andii)Federated:theFedBenchcollectionswerestoredinnineVirtuosoendpoints.

Therefore, we consider the queries in four groups and six configurations: Configuration 1: ANAPSID Complete Distribution; Configuration 2: ANAPSID Federated Distribution; Configuration 3: ARQ Complete Distribution; Configuration 4: ARQ Federated Distribution; Configuration 5: FedX Complete Distribution; Configuration 6: FedX Federated Distribution. In each configuration, the corresponding queries were ordered according to the total execution time consumed by the corresponding engine. For example, for ANAPSID in a Complete Distribution, i.e., Configuration 1, the Cross-Domain queries were ordered as follows: CD2, CD3, CD4, CD5, CD1, CD7, and CD6. Queries of each configuration were compared using Spearman's Rho correlation. A high positive correlation value between two configurations indicates that the corresponding engines had a similar behavior, i.e., the trends of execution time of the two engines are similar. Thus, when Configuration 1 is compared to itself, the Spearman's Rho correlation reaches the highest value (1.0). On the other hand, a negative value indicates an inverse correlation; for example, this happened with Complex Queries for ARQ in a Complete Distribution (Configuration 3) when compared to FedX Federated Distribution (Configuration 6); its value is -0.757. Finally, a value of 0.0 represents that there is no correlation between the two configurations, e.g., for Life Science queries Configuration 4 and Configuration 6.

Figure 1 illustrates the results of this specific study (again, the data used for this study is available through the DEFENDER portal). White circles represent the highest value of correlation; red ones correspond to inverse correlations, while blue ones indicate a positive correlation. The size of the circles is proportional to the value of the correlation. Given a group of queries, a low value of correlation of one engine in two different distributions suggests that the distribution affects the engine behavior, e.g., FedX and ARQ in Complex Queries with different data distributions have correlation values of 0.143 and 0.045, respectively. Furthermore, the number of small blue circles between configurations of different data distributions of the same engine indicates that this parameter affects the behavior of the studied engine. Because there are several of these points in the Complex Queries plot, we can conclude that these two parameters (query complexity and data distribution) allow uncovering engines' behavior that could not be observed before. This illustrates the need for the extensions proposed in this paper.

[Figure 1: four correlation plots; panel titles as printed: (a) Cross Domain (CD), (b) Linked Data (LD), (c) Life Science (LS), (d) New Complex Queries (C)]

Fig. 1. Spearman's Rho Correlation of Queries in three FedBench sets of queries (a) Cross-Domain (CD), (b) Life Science (LS), (c) Linked Data (LD) and (d) New Complex Queries. Six configurations: (1) ANAPSID Complete Distribution; (2) ANAPSID Federated Distribution; (3) ARQ Complete Distribution; (4) ARQ Federated Distribution; (5) FedX Complete Distribution; (6) FedX Federated Distribution. White circles correspond to a correlation value of 1.0; blue circles indicate a positive correlation (Fig. 1(d) points (3,4) and (5,6), correlation values 0.045 and 0.143, respectively); red circles indicate a negative correlation (Fig. 1(d) points (2,6) and (6,3), correlation values -0.5 and -0.757, respectively). Circles' diameters indicate absolute correlation values.

Footnote 6: http://virtuoso.openlinksw.com/
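
The comparison between two configurations used in this section can be reproduced with a few lines of code. The sketch below computes Spearman's Rho as the Pearson correlation of the ranks, with tied values receiving their average rank; it is a generic textbook formulation, not the script used to produce Figure 1, and the execution times in the usage note are made up for illustration.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's Rho of two equally long sequences of execution times.

    x[i] and y[i] must refer to the same query in the two configurations.
    Raises ZeroDivisionError if all values in one sequence are equal.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)
```

For instance, with hypothetical times `[0.2, 1.5, 0.9]` for three queries in one configuration and `[0.3, 0.8, 2.0]` in another, the second and third queries swap places in the ordering and `spearman_rho` returns 0.5; identical orderings give 1.0 and fully reversed orderings give -1.0.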

5 Conclusion and Future Work

In this paper we have shown that there is a need to extend current federated SPARQL query testbeds with additional variables and configuration setups (e.g., data partitioning and distribution, network latency, and query complexity), so as to provide more accurate details of the behavior of existing engines, which can then be used to provide better comparisons and as input for improvement proposals. Taking those additional variables into account, we have extensively evaluated three of the existing engines (ANAPSID, ARQ and FedX), and have made those results available for public consumption in the DEFENDER portal, which we plan to keep up-to-date on a regular basis. We have also shown how the generated result dataset can be used to validate hypotheses about the systems' behavior.

Our future work will focus on continuing with the evaluation of additional federated SPARQL query engines, and on the inclusion of additional parameters in the benchmark that may still be needed to provide more accurate and well-informed results.

Acknowledgements. This work has been funded by the project myBigData (TIN2010-17060), and DID-USB. We thank Maribel Acosta, Cosmin Basca, and Raúl García-Castro for fruitful discussions.

References

1. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011)
2. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the SPARQL 1.1 Federation Extension. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part II. LNCS, vol. 6644, pp. 1–15. Springer, Heidelberg (2011)
3. Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: An Adaptive Query Processor for Joining Federated SPARQL Endpoints. In: Meersman, R., Dillon, T., Herrero, P., Kumar, A., Reichert, M., Qing, L., Ooi, B.-C., Damiani, E., Schmidt, D.C., White, J., Hauswirth, M., Hitzler, P., Mohania, M. (eds.) OTM 2011, Part II. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011)
4. Montoya, G., Vidal, M.-E., Acosta, M.: DEFENDER: a DEcomposer for quEries against feDERations of endpoints. In: Extended Semantic Web Conference, ESWC Workshop and Demo 2012 (to appear)
5. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. TODS 34(3) (2009)
6. Prud'hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 federated query (November 2011)
7. Quilitz, B., Leser, U.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
8. Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011)
9. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2bench: A SPARQL performance benchmark. In: ICDT, pp. 4–33 (2010)
10. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization Techniques for Federated Query Processing on Linked Data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)