Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?

Gabriela Montoya¹, Maria-Esther Vidal¹, Oscar Corcho², Edna Ruckhaus¹, and Carlos Buil-Aranda³

¹ Universidad Simón Bolívar, Venezuela
{gmontoya,mvidal,ruckhaus}@ldc.usb.ve
² Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
ocorcho@fi.upm.es
³ Department of Computer Science, Pontificia Universidad Católica, Chile
cbuil@ing.puc.cl

Abstract.
Testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems still have some limitations. Some variables and configurations that may have an impact on the behavior of these systems (e.g., network latency, data partitioning and query properties) are not sufficiently defined; this affects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed up to now, and empirically probe the need of considering additional dimensions and variables. The evaluation has been conducted on three SPARQL query federation systems, and the analysis of these results has allowed us to uncover properties of these systems that would normally be hidden with the original testbeds.

1 Introduction

The number of RDF datasets made publicly available through SPARQL endpoints has exploded in recent years. This fact, together with the potential added value that can be obtained from the combination of such distributed data sources, has motivated the development of systems that allow executing queries over federated SPARQL endpoints (e.g., SPARQL-DQP [2], Jena's ARQ¹, RDF::Query², ANAPSID [1], FedX [10], ADERIS [3]). Some systems use SPARQL 1.0 or ad-hoc extensions, while others rely on the query federation extensions that are being proposed as part of the upcoming SPARQL 1.1 specification [6].

In parallel to the development of federated SPARQL query evaluation systems, several testbeds have been created (e.g., as described in [2,7,8]), which complement those already used for single-endpoint query evaluation. The role of these testbeds is to allow evaluating and comparing the main characteristics of these systems, so as to provide enough information to improve them. Among the features evaluated by these testbeds we can cite: i) functional requirements supported, ii) efficiency of the implementations with different configurations of datasets and with different types of queries, or iii) resilience to changes in the configurations of these systems and the underlying datasets. The most recent and complete testbed is FedBench [8], which proposes a variety of queries in different domains and with different characteristics, including star-shaped, chain-like and hybrid queries, and complex query forms using an adaptation of SP2Bench [9].

These testbeds are steps forward towards establishing a continuous benchmarking process of federated SPARQL query engines. However, they are still far from effectively supporting such benchmarking objectives. In fact, they do not specify completely or even consider some of the dependent and independent variables and configuration setups that characterize the types of problems to be tackled in federated SPARQL query processing, and that clearly affect the performance and quality of different solutions. This may lead to incorrect characterizations when these testbeds are used to select the most appropriate systems in a given scenario, or to decide the next steps in their development.

¹ http://jena.apache.org/
² http://search.cpan.org/~gwilliams/RDF-Query/

P. Cudré-Mauroux et al. (Eds.): ISWC 2012, Part II, LNCS 7650, pp. 313–324, 2012. © Springer-Verlag Berlin Heidelberg 2012

For example, testbeds like the one in [2] have limitations. First, queries are executed directly on live SPARQL endpoints; this means that experiments are not reproducible, as the load of endpoints and network latency vary over time. Second, queries were constructed for the data available in the selected endpoints at the time of generating the testbed, but the structure of these underlying RDF data sources changes, which may result in queries that return different answers or that do not return any answer at all. In cases like FedBench [8], the level of reproducibility is improved by using datasets that can be handled locally. However, as shown in Section 2, there are variables that are not yet considered in this benchmark (e.g., network latency, dataset configurations) and that are important in order to obtain more accurate and informative results.

The objective of this paper is to describe first the characteristics exhibited by these testbeds (mainly focusing on FedBench) and reflect on their current limitations. Additional variables and configuration setups (e.g., new queries, new configurations of network latency details, new dataset distribution parameters) are proposed in order to provide more accurate and well-informed overviews of the current status of each of the evaluated systems, so that the experiments to be executed can offer more accurate information about the behavior of the evaluated systems, and hence they can be used in continuous improvement processes. Finally, we describe briefly the results of our evaluation of this extended testbed using three different federated query engines: ARQ, ANAPSID, and FedX.

2 Some Limitations of Existing Testbeds

There is no unique "one-size-fits-all" testbed to measure every characteristic needed by an application that requires some form of federated query processing [8]. Regardless, existing testbeds can still be improved so that they can fulfill their role in continuous benchmarking processes.

We will first illustrate why we need to improve existing testbeds, particularly FedBench, by describing a scenario where the use of the testbed in its current form may lead to wrong decisions. We have executed the FedBench testbed with three systems (ANAPSID, ARQ, and FedX) on the three sets of queries proposed (Life Science, Cross Domain, and Linked Data) [4]. We have used different simulated configurations for network latencies and different data distributions of the datasets used in the experiments. As a result, we observe interesting results that suggest the need for improvements. For instance, for the Cross-Domain query CD1, all systems behave well in a perfect network (as shown in Table 1). However, their behavior changes dramatically when network latencies are considered. For instance, ARQ is not able to handle this query for medium-fast and fast networks, given the timeout considered; the time needed to execute the query in the case of FedX grows from 0.72 secs. (perfect network) to 2.23 secs. (fast network) and 16.93 secs. (medium-fast network); and for ANAPSID the results are similar for perfect and fast networks, and grow more slowly in medium-fast networks.

Table 1. Evaluation of FedBench query CD1: number of results and execution time (secs.) under different network latency conditions. Timeout was set to 30 minutes. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution (α=1, β=0.3)); Medium-Fast Network (delays follow a Gamma distribution (α=3, β=1.0)).

              Number of results        Execution time (secs.)   Execution time (secs.)
                                       (first tuple)            (all tuples)
Query Engine  Medium  Fast  Perfect    Medium  Fast  Perfect    Medium  Fast  Perfect
ANAPSID       61      61    61         0.98    0.17  0.16       0.98    0.17  0.16
FedX          61      61    61         16.93   2.23  0.72       16.93   2.23  0.72
ARQ           –       –     63         –       –     0.98       –       –     0.98

This is also the case for other FedBench queries (e.g., LD10, LD11, LS7, CD2), where different behaviors can be observed depending not only on network latency, but also on additional parameters, e.g., data distribution. What these examples show is that those parameters are also important when considering federated query processing approaches, and should be configured in a testbed, so as to provide sufficient information for decision makers to select the right tool for the type of problem being handled, or for tool developers to understand better the weaknesses of their systems and improve them accordingly, if possible.
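
As an illustration of how such latency conditions could be made an explicit testbed parameter, the following minimal sketch samples per-message delays from the Gamma distributions named in the captions of Tables 1 and 5. The names NETWORK_PROFILES and sample_delays are hypothetical, and two assumptions are made that the text does not settle: β is treated as a scale parameter (as Python's random.gammavariate does), and the unit of the sampled delays is left unspecified.

```python
import random

# (alpha, beta) pairs taken from the captions of Tables 1 and 5.
# None stands for the perfect network, i.e., no delays at all.
NETWORK_PROFILES = {
    "perfect": None,
    "fast": (1.0, 0.3),
    "medium-fast": (3.0, 1.0),
    "medium-slow": (3.0, 1.5),
    "slow": (5.0, 2.0),
}


def sample_delays(profile, n_messages, seed=0):
    """Return one simulated delay per message for the given profile."""
    params = NETWORK_PROFILES[profile]
    if params is None:
        return [0.0] * n_messages
    alpha, beta = params
    rng = random.Random(seed)  # seeded so that runs are repeatable
    # Assumption: beta is a scale parameter, so the mean delay is alpha * beta.
    return [rng.gammavariate(alpha, beta) for _ in range(n_messages)]


if __name__ == "__main__":
    for name in NETWORK_PROFILES:
        delays = sample_delays(name, 10_000)
        print(name, "mean delay:", round(sum(delays) / len(delays), 3))
```

Fixing the seed is a design choice that keeps the simulated network repeatable across runs, which is the property the live-endpoint testbeds lack.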

Finally, there is also another aspect that is important when considering the quality of existing testbeds: sometimes there are not sufficient explanations about the purpose of each of the parameters that can be configured. For example, in the case of FedBench there are several parameters that are considered when describing queries, as presented in [8], such as whether the query uses operators like conjunctions, unions, filters or optionals, modifiers like DISTINCT, LIMIT, OFFSET or ORDER BY, and structures like star-shaped queries, chains or hybrid cases. While this is quite a comprehensive set of features to characterize a SPARQL query, there are no clear reasons about why each of the 36 queries from the testbed is included. Only some examples are provided in [8], explaining that LS4 "includes a star-shaped group of triple patterns of drugs which is connected via owl:sameAs link to DBpedia drug entities", or that CD5 is a "chain-like query for finding film entities linked via owl:sameAs and restricted on genre and director". However, there are no explanations in the paper or in the corresponding benchmark website about the reasons for including each of them. Furthermore, there are parameters that are not adequately represented (e.g., common query operators like optionals and filters do not appear in cross domain or linked data queries), and characteristics that are not sufficiently discussed (e.g., the number of triple patterns in each basic graph pattern appearing in the query, the selectivity of each part of the query, etc.), which makes the testbed not complete enough.

In summary, while we acknowledge the importance of these testbeds in the state of the art of federated query processing evaluation, we can identify some of their shortcomings, which we illustrate and describe in different scenarios.

3 Benchmark Design

In this section we describe some of the variables that have an impact on federated SPARQL query engines. There are two groups of variables: independent and dependent. Independent variables are those characteristics that need to be minimally specified in the benchmark in order to ensure that evaluation scenarios are replicable. Independent variables have been grouped into four dimensions: Query, Data, Platform, and Endpoint. Dependent (or observed) variables are those characteristics that are normally influenced by independent variables, as described in Table 2, and that will be measured during the evaluation:

– Endpoint Selection Time. Elapsed time between query submission and the generation of the SPARQL 1.1 federated query annotated with the endpoints where sub-queries will be executed³.
– Execution Time. This variable is in turn comprised of: i) time for the first tuple, or elapsed time between query submission and first answer, ii) time distribution of the reception of query answers, and iii) total execution time.
– Answer Completeness. Number of answers received in relation to the data available in the selected endpoints.

In the following sections we describe independent variables in more detail.
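
The execution time and answer completeness variables listed above can be recorded with very little machinery. The sketch below is only illustrative: the function name measure and its parameters are hypothetical, it assumes the engine exposes its answers as a lazy iterator (so that the clock starts at query submission), and it approximates completeness as the ratio between received answers and an expected total supplied by the experimenter.

```python
import time


def measure(answers, expected_total=None):
    """Consume an iterator of answers and record the observed variables.

    answers: lazy iterator yielding one query answer at a time (assumption).
    expected_total: number of answers derivable from the data available in
        the selected endpoints, if known (assumption; None otherwise).
    """
    start = time.perf_counter()
    arrival_times = []  # elapsed time at which each answer was received
    for _ in answers:
        arrival_times.append(time.perf_counter() - start)
    total_time = time.perf_counter() - start

    completeness = None
    if expected_total:
        completeness = len(arrival_times) / expected_total

    return {
        "first_tuple_time": arrival_times[0] if arrival_times else None,
        "arrival_times": arrival_times,  # time distribution of the answers
        "total_time": total_time,
        "num_answers": len(arrival_times),
        "completeness": completeness,
    }
```

For example, measure(iter(results), expected_total=61) would report how many of 61 expected answers arrived and when; endpoint selection time would need a separate hook inside the engine and is not covered here.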

3.1 Query Dimension

This dimension groups variables that characterize the queries in terms of their structure, evaluation, and query language expressivity. Regarding the structure of the query, we focus on three main aspects: i) the query plan shape, ii) the number of basic triple patterns in the query, and iii) the instantiations of subject, object and/or predicates in the query.

³ This variable is applicable only in cases where the system handles SPARQL 1.0 queries and no endpoints are specified in the query; hence, these queries have to be translated into SPARQL 1.1 or into an equivalent internal representation.

Table 2. Variables that impact the behavior of SPARQL federated engines

Observed variables: Endpoint Selection Time, Execution Time, Answer Completeness

Independent variables, by dimension:
Query     query plan shape; # basic triple patterns; # instantiations and their position; join selectivity; # intermediate results; answer size; usage of query language expressivity; # general predicates
Data      dataset size; data frequency distribution; type of partitioning; data endpoint distribution
Platform  cache on/off; RAM available; # processors
Endpoint  # endpoints; endpoint type; relation graph/endpoint/instance; network latency; initial delay; message size; transfer distribution; answer size limit; timeout

Shape. Query plans may be star-shaped, chain-shaped or a combination of them, as described in [8]. In general, the shape of the input queries and of the query plans generated by the systems has an important impact on the three dependent variables identified in our evaluation (endpoint selection time, if applicable, execution time and answer completeness). The shape of the query plans will in turn be affected by the number of basic triple patterns in the query, since this number will influence the final query shape. Query evaluation systems can apply different techniques when generating query plans for a specific type of input query, and this will normally yield different selection and execution times, and completeness results. For example, a query plan generator may or may not group together all graph patterns related to one endpoint.

Instantiations and their position in triple patterns. This is related to whether any of the elements of the triple patterns in the query (subject, object or predicate) are already instantiated, i.e., bound to some URI. Together with join selectivity, instantiation has an important impact on the potential number of intermediate results that may be generated throughout query execution. For instance, the absence of instantiations (e.g., presence of variables) in the predicate position of a triple pattern may have an important impact on query execution time, because several endpoints may be able to provide answers for the pattern.

Answer size and number of intermediate results. If the number of answers or intermediate results involved in a query execution is large, it may take a long time to transfer them across the network, and hence this may affect the query execution time.

Usage of query language expressivity. The use of specific SPARQL operators may affect the execution time and the completeness of the final result set. For example, the OPTIONAL operator is one of the most complex operators in SPARQL [5] and may add a good number of intermediate results, while the FILTER operator may restrict the intermediate results and answer size.

General predicates (e.g., rdf:type, owl:sameAs) are commonly used in SPARQL queries. However, as they normally appear in most datasets it is not always clear to which endpoint the corresponding subquery should be submitted, and this may have an impact on both endpoint selection and query execution time.

3.2 Data Dimension

We now describe the independent variables related to the characteristics of the RDF datasets that are being accessed. An RDF dataset can be defined in terms of its size and its structural characteristics like the number of subjects, predicates and objects, and the in and out degree of properties. These characteristics impact the number of triples that are transferred, and hence the total execution time. Additionally, they may affect the performance of the individual endpoints.

Partitioning and data distribution are two of the most important variables that need to be specified in the context of queries against federations of endpoints. Partitioning refers to the way that the RDF dataset is fragmented. Data distribution is the way partitions are allocated to the different endpoints. Data may be fully centralized, fully distributed, or somewhere in between.

A dataset may be fragmented into disjoint partitions; the partitioning may be done horizontally, vertically or as a combination of both. Horizontal partitioning fragments triples so that they may contain different properties. Vertical partitioning produces fragments which contain all the triples of at least one of the properties in the dataset. Horizontal partitioning impacts the completeness of the answer whereas vertical partitioning affects the execution time. Partitions may be replicated in several endpoints, even in all of the endpoints, i.e., fully replicated, so that the availability of the system increases in case of endpoint failure or endpoint delay.
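
The partitioning and distribution notions above can be sketched as follows. This is one possible reading, not the procedure used to build the experiments: vertical partitioning is implemented by grouping triples by predicate, horizontal partitioning by a simple round-robin split (the text does not prescribe a splitting rule), and replication by an explicit placement map. All function names are hypothetical.

```python
from collections import defaultdict


def vertical_partition(triples, n_fragments):
    """Each fragment holds all the triples of the predicates assigned to it."""
    by_predicate = defaultdict(list)
    for triple in triples:
        by_predicate[triple[1]].append(triple)
    fragments = [[] for _ in range(n_fragments)]
    for i, predicate in enumerate(sorted(by_predicate)):
        fragments[i % n_fragments].extend(by_predicate[predicate])
    return fragments


def horizontal_partition(triples, n_fragments):
    """Round-robin split (assumption): a fragment may mix several predicates."""
    fragments = [[] for _ in range(n_fragments)]
    for i, triple in enumerate(triples):
        fragments[i % n_fragments].append(triple)
    return fragments


def distribute(fragments, placement):
    """Allocate fragments to endpoints.

    placement maps an endpoint name to the indexes of the fragments it
    stores; listing an index under several endpoints models replication.
    """
    return {
        endpoint: [t for index in indexes for t in fragments[index]]
        for endpoint, indexes in placement.items()
    }
```

For instance, distribute(fragments, {"e1": [0], "e2": [0, 1]}) replicates fragment 0 in two endpoints, while a placement with one distinct fragment per endpoint corresponds to partitioning without replication.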

Table 3 compares the behavior of ANAPSID and FedX with different configurations. The two engines behave similarly when there is one dataset per endpoint and in horizontal partitioning without replication. For vertical partitioning without replication, FedX is superior to ANAPSID (0.69 versus 14.25 secs. for all tuples). When partitioning with replication, FedX outperforms ANAPSID in vertical partitioning, and the inverse behavior occurs with horizontal partitioning. Table 4 shows another example of the effect of data distribution on the query execution time, again for ANAPSID and FedX. We can observe that when there are multiple endpoints, results are similar, while with a network with no delay (perfect network) and all datasets in a single endpoint, ANAPSID clearly outperforms FedX by one order of magnitude. Results in Tables 3 and 4 support the claim that data partitioning, data distribution and network delays need to be explicitly configurable in testbeds.

Table 3. Impact of Data Partitioning and Distribution on FedBench query LD10 (Perfect Network). Vertical Partitioning: triples of predicates skos:subject, owl:sameAs, and nytimes:latest_use were stored in fragments. Vertical Partitioning Without Replication: three endpoints, each fragment in a different endpoint. Vertical Partitioning With Replication: four endpoints; one of the three fragments is stored in the four endpoints, another fragment in two endpoints, and the last fragment in one endpoint. Horizontal Partitioning: triples of the three predicates were partitioned into two fragments; each fragment has data to produce at least one answer. Horizontal Partitioning Without Replication: two endpoints; each fragment in a different endpoint. Horizontal Partitioning With Replication: four endpoints; one fragment is replicated in each endpoint, the other fragment in only one endpoint.

Query Engine  Execution time         Execution time        Number of
              First Tuple (secs.)    All Tuples (secs.)    Results
One Dataset per Endpoint
FedX          1.06                   1.06                  3
ANAPSID       1.08                   1.28                  3
Vertical Partitioning Without Replication
FedX          0.69                   0.69                  3
ANAPSID       3.88                   14.25                 3
Horizontal Partitioning Without Replication
FedX          0.72                   0.72                  3
ANAPSID       0.03                   0.03                  1
Vertical Partitioning With Replication
FedX          0.85                   0.85                  14
ANAPSID       4.06                   14.48                 3
Horizontal Partitioning With Replication
FedX          0.91                   0.91                  25
ANAPSID       0.06                   0.06                  1

Table 4. Impact of Data Distribution on FedBench query CD1 (Perfect Network). All datasets in one endpoint versus datasets distributed in different endpoints.

Query Engine  Execution time         Execution time        Number of
              First Tuple (secs.)    All Tuples (secs.)    Results
Single Endpoint - All Datasets
FedX          0.51                   0.51                  61
ANAPSID       0.045                  0.046                 61
Multiple Endpoints
FedX          0.72                   0.72                  61
ANAPSID       0.17                   0.17                  61

3.3 Platform Dimension

The Platform dimension groups variables that are related to the computing infrastructure used in the evaluation. Here we include a minimum set of parameters, related to the system's cache, available RAM memory and number of processors, since this dimension may contain many more parameters that are relevant in this context, and that should anyway be explicitly specified in any evaluation setup when using this testbed. Turning the cache management function of the system on or off, together with the available RAM, may greatly affect the query execution time. The meaning of dropping and warming up cache needs to be clearly specified, as well as the number of iterations where an experiment is run in warm cache, and when cache contents are dropped off. In the context of federations of endpoints, information on endpoint capabilities may be stored in cache. The number of processors is also a relevant variable in the context of federated queries. If the infrastructure offers several processors, operators may parallelize their execution, and the execution time may be affected positively.
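
One way to make the cache protocol explicit is to encode it in the experiment driver. The following sketch is hypothetical: run_query and drop_cache stand for engine-specific callables that are not described here, and the policy shown (one cold run after dropping the cache, followed by a fixed number of warm runs) is just one possible specification.

```python
import time


def run_with_cache_policy(run_query, drop_cache, warm_iterations=3):
    """Time one cold-cache run followed by warm-cache runs.

    run_query: callable that executes the query once (hypothetical hook).
    drop_cache: callable that empties the engine's cache (hypothetical hook).
    warm_iterations: number of runs executed without dropping the cache.
    """
    drop_cache()  # the cold run starts from an empty cache
    start = time.perf_counter()
    run_query()
    timings = {"cold": time.perf_counter() - start, "warm": []}

    for _ in range(warm_iterations):  # cache contents are kept between runs
        start = time.perf_counter()
        run_query()
        timings["warm"].append(time.perf_counter() - start)
    return timings
```

Reporting the cold timing separately from the warm ones keeps the effect of cached endpoint capabilities visible instead of averaging it away.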

3.4 Endpoint Dimension

This dimension comprises variables that are related to the number and capabilities of the endpoints used in the testbed.

The first variable to be considered is the number of SPARQL endpoints where the query will be submitted and the type of endpoints that are used for the evaluation. The first variable affects all three observed variables, especially the result completeness, because different endpoints may produce different answers. The relationship between the number of instances, graphs and endpoints of the systems used during the evaluation is also an important aspect that needs to be specified. Different configurations of these relationships may impact the three dependent variables.

There are several variables that have an important impact on the execution time, such as the transfer distribution, which is the time distribution of the transmission of packets by the endpoints, the network latency, which defines the delay in sending packets through the network, and the initial endpoint delay. An example of the impact of different network delays is illustrated in Table 5. Two queries from the Linked Data collection of FedBench were executed (LD10 and LD11). Note that ANAPSID and FedX behave similarly in LD10 when there is no delay; however, when delays are considered, FedX outperforms ANAPSID. On the other hand, in LD11 ANAPSID outperforms FedX when delays are present. In fact, ANAPSID is able to produce the first tuple after the same amount of time, independently of the delay.

Finally, SPARQL endpoints normally allow configuring a limit on the answer size of the queries and a timeout, so as to prevent users from querying the entire dataset. This may generate empty result sets or incomplete results, particularly when endpoint sub-queries are complex.

4 Some Experimental Results

In this section we illustrate how the testbed extension can be used to better understand the behavior of some of the existing federated query engines. The extended testbed has been executed on three systems (ANAPSID, ARQ and FedX) with several configurations for the independent variables identified in Section 3. The complete result set generated by these executions can be browsed at the DEFENDER portal⁴.

⁴ http://159.90.11.58/

Table 5. Impact of network latency on FedBench queries LD10 and LD11. Timeout was set to 30 minutes and Message Size is 16KB. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution (α=1, β=0.3)); Medium-Fast (delays follow a Gamma distribution (α=3, β=1.0)); Medium-Slow (delays follow a Gamma distribution (α=3, β=1.5)); Slow (delays follow a Gamma distribution (α=5, β=2.0)).

Query Engine  Query  Execution time         Execution time        Number of
                     First Tuple (secs.)    All Tuples (secs.)    Results
Perfect Network
ANAPSID       LD10   1.08                   1.29                  3
ANAPSID       LD11   0.06                   0.09                  376
FedX          LD10   1.06                   1.06                  3
FedX          LD11   5.44                   5.44                  376
Fast Network
ANAPSID       LD10   18.13                  22.89                 3
ANAPSID       LD11   0.06                   2.80                  376
FedX          LD10   3.45                   3.45                  3
FedX          LD11   14.21                  14.22                 376
Medium Fast Network
ANAPSID       LD10   191.78                 241.58                3
ANAPSID       LD11   0.07                   27.86                 376
FedX          LD10   27.27                  27.27                 3
FedX          LD11   108.93                 108.93                376
Medium Slow Network
ANAPSID       LD10   287.88                 362.59                3
ANAPSID       LD11   0.05                   41.74                 376
FedX          LD10   41.42                  41.42                 3
FedX          LD11   162.45                 162.45                376
Slow Network
ANAPSID       LD10   653.44                 819.72                3
ANAPSID       LD11   0.09                   92.52                 376
FedX          LD10   87.19                  87.19                 3
FedX          LD11   347.93                 347.93                376

Now we will focus on one of the analyses that a system developer may be interested in, in the context of the continuous benchmarking process that we have referred to in this paper. That is, we are not analyzing the whole set of results obtained from the execution, but only a subset of it. Specifically, let us assume that we are interested in understanding the performance of the three evaluated systems under different data distributions in an ideal scenario, with no or negligible connection latency. Our hypothesis is that existing query engines are sensitive to the way data is distributed along different endpoints, even when the network is perfect. Therefore, these results may be useful to validate that hypothesis and to understand whether a set of federated datasets for which we have the corresponding RDF dumps should be better stored in a single endpoint or in different endpoints to offer answers more efficiently.

Based on the set of variables identified in our study, the following experimental setup is used:

Datasets and Query Benchmarks. We ran 36 queries against the FedBench dataset collections [8]: DBPedia, NYTimes, Geonames, KEGG, ChEBI, Drugbank, Jamendo, LinkedMDB, and SW Dog Food. These queries include 25 FedBench queries and eleven complex queries⁵. The latter are added to cover some of the missing elements in the former group of queries. They are comprised of between 6 and 48 triple patterns, and can be decomposed into up to 8 sub-queries; and they cover different SPARQL operators. Virtuoso⁶ was used to implement endpoints, and the timeout was set to 240 secs. or 71,000 tuples. Experiments were executed on a Linux Mint machine with an Intel Pentium Core 2 Duo E7500 2.93GHz 8GB RAM 1333MHz DDR3.

⁵ http://www.ldc.usb.ve/~mvidal/FedBench/queries/ComplexQueries

Network Latency. We configured a perfect network with no delays. The size of the message corresponded to 16KB.

Data Distribution. We considered two different distributions of the data: i) Complete: the FedBench collections were stored into a single graph and made accessible through one single SPARQL endpoint, and ii) Federated: the FedBench collections were stored in nine Virtuoso endpoints.

Therefore, we consider the queries in four groups and six configurations: Configuration 1: ANAPSID Complete Distribution, Configuration 2: ANAPSID Federated Distribution, Configuration 3: ARQ Complete Distribution, Configuration 4: ARQ Federated Distribution, Configuration 5: FedX Complete Distribution, Configuration 6: FedX Federated Distribution. In each configuration, the corresponding queries were ordered according to the total execution time consumed by the corresponding engines. For example, for ANAPSID in a Complete Distribution, i.e., Configuration 1, the Cross-Domain queries were ordered as follows: CD2, CD3, CD4, CD5, CD1, CD7, and CD6. Queries of each configuration were compared using Spearman's Rho correlation. A high positive correlation value between two configurations indicates that the corresponding engines had a similar behavior, i.e., the trends of execution time of the two engines are similar. Thus, when Configuration 1 is compared to itself, the Spearman's Rho correlation reaches the highest value (1.0). On the other hand, a negative value indicates an inverse correlation; for example, this happened with Complex Queries for ARQ in a Complete Distribution (Configuration 3) when compared to FedX Federated Distribution (Configuration 6); its value is -0.757. Finally, a value of 0.0 indicates that there is no correlation between the two configurations, e.g., for Life Science queries Configuration 4 and Configuration 6.

Figure 1 illustrates the results of this specific study (again, the data used for this study is available through the DEFENDER portal). White circles represent the highest value of correlation; red ones correspond to inverse correlations, while blue ones indicate a positive correlation. The size of the circles is proportional to the value of the correlation. Given a group of queries, a low value of correlation of one engine in two different distributions suggests that the distribution affects the engine behavior, e.g., FedX and ARQ in Complex Queries with different data distributions have correlation values of 0.143 and 0.045, respectively.
Furthermore, the number of small blue circles between configurations of different data distributions of the same engine indicates that this parameter affects the behavior of the studied engine. Because there are several of these points in the Complex Queries plot, we can conclude that these two parameters (query complexity and data distribution) allow uncovering engines' behavior that could not be observed before. This illustrates the need for the extensions proposed in this paper.

⁶ http://virtuoso.openlinksw.com/

[Figure 1 panels: (a) Cross Domain (CD), (b) Linked Data (LD), (c) Life Science (LS), (d) New Complex Queries (C)]

Fig. 1. Spearman's Rho Correlation of Queries in three FedBench sets of queries (a) Cross-Domain (CD), (b) Life Science (LS), (c) Linked Data (LD) and (d) New Complex Queries. Six configurations: (1) ANAPSID Complete Distribution; (2) ANAPSID Federated Distribution; (3) ARQ Complete Distribution; (4) ARQ Federated Distribution; (5) FedX Complete Distribution; (6) FedX Federated Distribution. White circles correspond to a correlation value of 1.0; blue circles indicate a positive correlation (Fig. 1(d) points (3,4) and (5,6), correlation values 0.045 and 0.143, respectively); red circles indicate a negative correlation (Fig. 1(d) points (2,6) and (6,3), correlation values -0.5 and -0.757, respectively). Circles' diameters indicate absolute correlation values.

5 Conclusion and Future Work

In this paper we have shown that there is a need to extend current federated SPARQL query testbeds with additional variables and configuration setups (e.g., data partitioning and distribution, network latency, and query complexity), so as to provide more accurate details of the behavior of existing engines, which can then be used to provide better comparisons and as input for improvement proposals. Taking those additional variables into account, we have extensively evaluated three of the existing engines (ANAPSID, ARQ and FedX), and have made those results available for public consumption in the DEFENDER portal, which we plan to keep up-to-date on a regular basis. We have also shown how the generated result dataset can be used to validate hypotheses about the systems' behavior.

Our future work will focus on continuing with the evaluation of additional federated SPARQL query engines, and on the inclusion of additional parameters in the benchmark that may still be needed to provide more accurate and well-informed results.

Acknowledgements. This work has been funded by the project myBigData (TIN2010-17060), and DID-USB. We thank Maribel Acosta, Cosmin Basca, and Raúl García-Castro for fruitful discussions.

References

1. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011)
2. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the SPARQL 1.1 Federation Extension. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part II. LNCS, vol. 6644, pp. 1–15. Springer, Heidelberg (2011)
3. Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: An Adaptive Query Processor for Joining Federated SPARQL Endpoints. In: Meersman, R., Dillon, T., Herrero, P., Kumar, A., Reichert, M., Qing, L., Ooi, B.-C., Damiani, E., Schmidt, D.C., White, J., Hauswirth, M., Hitzler, P., Mohania, M. (eds.) OTM 2011, Part II. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011)
4. Montoya, G., Vidal, M.-E., Acosta, M.: DEFENDER: a DEcomposer for quEries against feDERations of endpoints. In: Extended Semantic Web Conference, ESWC Workshop and Demo 2012 (to appear)
5. Perez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. TODS 34(3) (2009)
6. Prud'hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 federated query (November 2011)
7. Quilitz, B., Leser, U.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
8. Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011)
9. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2bench: A SPARQL performance benchmark. In: ICDT, pp. 4–33 (2010)
10. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization Techniques for Federated Query Processing on Linked Data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)