Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?

Gabriela Montoya (1), Maria-Esther Vidal (1), Oscar Corcho (2), Edna Ruckhaus (1), and Carlos Buil-Aranda (3)

(1) Universidad Simón Bolívar, Venezuela, {gmontoya,mvidal,ruckhaus}@ldc.usb.ve
(2) Ontology Engineering Group, Universidad Politécnica de Madrid, Spain, ocorcho@fi.upm.es
(3) Department of Computer Science, Pontificia Universidad Católica, Chile, cbuil@ing.puc.cl

Abstract. Testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems still have some limitations. Some variables and configurations that may have an impact on the behavior of these systems (e.g., network latency, data partitioning and query properties) are not sufficiently defined; this affects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed to date, and empirically show the need to consider additional dimensions and variables. The evaluation has been conducted on three SPARQL query federation systems, and the analysis of these results has allowed us to uncover properties of these systems that would normally be hidden with the original testbeds.

P. Cudré-Mauroux et al. (Eds.): ISWC 2012, Part II, LNCS 7650, pp. 313–324, 2012. © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

The number of RDF datasets made publicly available through SPARQL endpoints has exploded in recent years. This fact, together with the potential added value that can be obtained from the combination of such distributed data sources, has motivated the development of systems that allow executing queries over federated SPARQL endpoints (e.g., SPARQL-DQP [2], Jena's ARQ (footnote 1), RDF::Query (footnote 2), ANAPSID [1], FedX [10], ADERIS [3]). Some systems use SPARQL 1.0 or ad-hoc extensions, while others rely on the query federation extensions that are being proposed as part of the upcoming SPARQL 1.1 specification [6].

In parallel to the development of federated SPARQL query evaluation systems, several testbeds have been created (e.g., as described in [2,7,8]), which complement those already used for single-endpoint query evaluation. The role of these testbeds is to allow evaluating and comparing the main characteristics of these systems, so as to provide enough information to improve them. Among the features evaluated by these testbeds we can cite: i) functional requirements supported, ii) efficiency of the implementations with different configurations of datasets and with different types of queries, or iii) resilience to changes in the configurations of these systems and the underlying datasets. The most recent and complete testbed is FedBench [8], which proposes a variety of queries in different domains and with different characteristics, including star-shaped, chain-like and hybrid queries, and complex query forms using an adaptation of SP2Bench [9].

Footnote 1: http://jena.apache.org/
Footnote 2: http://search.cpan.org/~gwilliams/RDF-Query/

These testbeds are steps forward towards establishing a continuous benchmarking process of federated SPARQL query engines. However, they are still far from effectively supporting such benchmarking objectives. In fact, they do not completely specify, or even consider, some of the dependent and independent variables and configuration setups that characterize the types of problems to be tackled in federated SPARQL query processing, and that clearly affect the performance and quality of different solutions. This may lead to incorrect characterizations when these testbeds are used to select the most appropriate systems in a given scenario, or to decide the next steps in their development.

For example, testbeds like the one in [2] have limitations. First, queries are executed directly on live SPARQL endpoints; this means that experiments are not reproducible, as the load of the endpoints and the network latency vary over time. Second, queries were constructed for the data available in the selected endpoints at the time the testbed was generated, but the structure of these underlying RDF data sources changes, which may result in queries that return different answers or no answer at all. In cases like FedBench [8], the level of reproducibility is improved by using datasets that can be handled locally. However, as shown in Section 2, there are variables that are not yet considered in this benchmark (e.g., network latency, dataset configurations) and that are important in order to obtain more accurate and informative results.

The objective of this paper is first to describe the characteristics exhibited by these testbeds (mainly focusing on FedBench) and to reflect on their current limitations. Additional variables and configuration setups (e.g., new queries, new configurations of network latency details, new dataset distribution parameters) are proposed in order to provide more accurate and well-informed overviews of the current status of each of the evaluated systems, so that the experiments can offer more accurate information about the behavior of the evaluated systems, and hence can be used in continuous improvement processes. Finally, we briefly describe the results of our evaluation of this extended testbed using three different federated query engines: ARQ, ANAPSID, and FedX.

2 Some Limitations of Existing Testbeds

There is no unique "one-size-fits-all" testbed to measure every characteristic needed by an application that requires some form of federated query processing [8]. Regardless, existing testbeds can still be improved so that they can fulfill their role in continuous benchmarking processes.

We will first illustrate why we need to improve existing testbeds, particularly FedBench, by describing a scenario where the use of the testbed in its current form may lead to wrong decisions. We have executed the FedBench testbed with three systems (ANAPSID, ARQ, and FedX) on the three sets of queries proposed (Life Science, Cross Domain, and Linked Data) [4]. We have used different simulated configurations for network latencies and different data distributions of the datasets used in the experiments. As a result, we observe interesting results that suggest the need for improvements. For instance, for the Cross-Domain query CD1, all systems behave well in a perfect network (as shown in Table 1). However, their behavior changes dramatically when network latencies are considered. ARQ is not able to handle this query for medium-fast and fast networks, given the timeout considered; the time needed to execute the query in the case of FedX grows from 0.72 secs. (perfect network) to 2.23 secs. (fast network) and 16.93 secs. (medium-fast network); and for ANAPSID the results are similar for perfect and fast networks, and grow more slowly in medium-fast networks.

Table 1. Evaluation of FedBench query CD1: number of results and execution time (secs.) under different network latency conditions. Timeout was set to 30 minutes. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution with α=1, β=0.3); Medium-Fast Network (delays follow a Gamma distribution with α=3, β=1.0).

| Query Engine | Number of results (Medium-Fast / Fast / Perfect) | Execution time, first tuple, secs. (Medium-Fast / Fast / Perfect) | Execution time, all tuples, secs. (Medium-Fast / Fast / Perfect) |
| ANAPSID | 61 / 61 / 61 | 0.98 / 0.17 / 0.16 | 0.98 / 0.17 / 0.16 |
| FedX | 61 / 61 / 61 | 16.93 / 2.23 / 0.72 | 16.93 / 2.23 / 0.72 |
| ARQ | – / – / 63 | – / – / 0.98 | – / – / 0.98 |

This is also the case for other FedBench queries (e.g., LD10, LD11, LS7, CD2), where different behaviors can be observed depending not only on network latency, but also on additional parameters, e.g., data distribution. What these examples show is that those parameters are also important when considering federated query processing approaches, and should be configured in a testbed, so as to provide sufficient information for decision makers to select the right tool for the type of problem being handled, or for tool developers to better understand the weaknesses of their systems and improve them accordingly, if possible.
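
To make the network conditions concrete, the following minimal sketch samples per-message delays from the Gamma distributions named in the captions of Tables 1 and 5. It is only an illustration under stated assumptions: β is read as a scale parameter (so the mean delay is α·β), the delay unit is left unspecified, and the names `PROFILES`, `sample_delays` and `expected_delay` are hypothetical, not part of FedBench or of the evaluated engines.

```python
import random

# (alpha, beta) pairs taken from the captions of Tables 1 and 5.
# Assumption: beta is the scale parameter of the Gamma distribution.
PROFILES = {
    "fast": (1.0, 0.3),
    "medium-fast": (3.0, 1.0),
    "medium-slow": (3.0, 1.5),
    "slow": (5.0, 2.0),
}


def sample_delays(profile, n_messages, seed=0):
    """Draw one delay per message; the 'perfect' profile has no delay."""
    if profile == "perfect":
        return [0.0] * n_messages
    alpha, beta = PROFILES[profile]
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [rng.gammavariate(alpha, beta) for _ in range(n_messages)]


def expected_delay(profile):
    """Mean of Gamma(alpha, scale=beta), i.e. alpha * beta."""
    if profile == "perfect":
        return 0.0
    alpha, beta = PROFILES[profile]
    return alpha * beta
```

Fixing the seed is a deliberate choice here: it keeps a simulated latency profile repeatable across runs, which is the property the paragraph above asks of a testbed.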

Finally, another aspect that matters when considering the quality of existing testbeds is that the purpose of each configurable parameter is sometimes not sufficiently explained. For example, in the case of FedBench there are several parameters that are considered when describing queries, as presented in [8], such as whether the query uses operators like conjunctions, unions, filters or optionals, modifiers like DISTINCT, LIMIT, OFFSET or ORDER BY, and structures like star-shaped queries, chains or hybrid cases. While this is quite a comprehensive set of features to characterize a SPARQL query, no clear rationale is given for why each of the 36 queries in the testbed is included. Only some examples are provided in [8], explaining that LS4 "includes a star-shaped group of triple patterns of drugs which is connected via owl:sameAs link to DBpedia drug entities", or that CD5 is a "chain-like query for finding film entities linked via owl:sameAs and restricted on genre and director". However, there are no explanations in the paper or on the corresponding benchmark website about the reasons for including each of them. Furthermore, there are parameters that are not adequately represented (e.g., common query operators like optionals and filters do not appear in cross domain or linked data queries), and characteristics that are not sufficiently discussed (e.g., the number of triple patterns in each basic graph pattern appearing in the query, the selectivity of each part of the query, etc.), which makes the testbed not complete enough.

In summary, while we acknowledge the importance of these testbeds in the state of the art of federated query processing evaluation, we can identify some of their shortcomings, which we illustrate and describe in different scenarios.

3 Benchmark Design

In this section we describe some of the variables that have an impact on federated SPARQL query engines. There are two groups of variables: independent and dependent. Independent variables are those characteristics that need to be minimally specified in the benchmark in order to ensure that evaluation scenarios are replicable. Independent variables have been grouped into four dimensions: Query, Data, Platform, and Endpoint. Dependent (or observed) variables are those characteristics that are normally influenced by independent variables, as described in Table 2, and that will be measured during the evaluation:

– Endpoint Selection Time. Elapsed time between query submission and the generation of the SPARQL 1.1 federated query annotated with the endpoints where sub-queries will be executed (footnote 3).
– Execution Time. This variable is in turn comprised of: i) time for the first tuple, i.e., the elapsed time between query submission and the first answer, ii) time distribution of the reception of query answers, and iii) total execution time.
– Answer Completeness. Number of answers received in relation to the data available in the selected endpoints.
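
The observed variables related to execution can be collected by timing the consumption of a result stream. The sketch below is one possible way to do so, not the instrumentation used in this study: `measure` and `fake_results` are hypothetical names, the engine is stood in for by a plain Python generator, and answer completeness is approximated as a ratio against an expected answer count that the experimenter is assumed to know.

```python
import time


def measure(result_iter, expected_count):
    """Consume a stream of answers and return the observed variables."""
    start = time.perf_counter()
    arrival_times = []  # time distribution of the reception of answers
    for _ in result_iter:
        arrival_times.append(time.perf_counter() - start)
    total_time = time.perf_counter() - start
    return {
        "first_tuple_time": arrival_times[0] if arrival_times else None,
        "arrival_times": arrival_times,
        "total_time": total_time,
        # Assumption: completeness = answers received / answers expected.
        "completeness": len(arrival_times) / expected_count if expected_count else None,
    }


def fake_results(n, delay_s=0.01):
    """Stand-in for a federated engine: yields n answers with a fixed pause."""
    for i in range(n):
        time.sleep(delay_s)
        yield {"row": i}


observed = measure(fake_results(3), expected_count=4)
```

Keeping the full list of arrival times, rather than only the first and last, is what allows the time distribution of answers to be inspected afterwards.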

In the following sections we describe independent variables in more detail.

Footnote 3: This variable is applicable only in cases where the system handles SPARQL 1.0 queries and no endpoints are specified in the query; hence, these queries have to be translated into SPARQL 1.1 or into an equivalent internal representation.

3.1 Query Dimension

This dimension groups variables that characterize the queries in terms of their structure, evaluation, and query language expressivity. Regarding the structure of the query, we focus on three main aspects: i) the query plan shape, ii) the number of basic triple patterns in the query, and iii) the instantiations of subject, object and/or predicates in the query.

Table 2. Variables that impact the behavior of SPARQL federated engines. Observed variables: Endpoint Selection Time, Execution Time, Answer Completeness. Independent variables, by dimension:

| Dimension | Independent variables |
| Query | query plan shape; # basic triple patterns; # instantiations and their position; join selectivity; # intermediate results; answer size; usage of query language expressivity; # general predicates |
| Data | dataset size; data frequency distribution; type of partitioning; data endpoint distribution |
| Platform | cache on/off; RAM available; # processors |
| Endpoint | # endpoints; endpoint type; relation graph/endpoint/instance; network latency; initial delay; message size; transfer distribution; answer size limit; timeout |

Shape. Query plans may be star-shaped, chain-shaped or a combination of them, as described in [8]. In general, the shape of the input queries and of the query plans generated by the systems has an important impact on the three dependent variables identified in our evaluation (endpoint selection time, if applicable, execution time and answer completeness). The shape of the query plans will in turn be affected by the number of basic triple patterns in the query, since this number will influence the final query shape. Query evaluation systems can apply different techniques when generating query plans for a specific type of input query, and this will normally yield different selection and execution times, and completeness results. For example, a query plan generator may or may not group together all graph patterns related to one endpoint.
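
As an illustration of the shape variable, the sketch below labels a basic graph pattern as star, chain or hybrid. It relies on deliberately simplified definitions that are assumptions of this example, not the definitions of [8]: a star is a set of triple patterns sharing one subject, a chain is a single path in which the object of one pattern is the subject of the next, and everything else is reported as hybrid. The function names and the sample patterns are hypothetical.

```python
def is_chain(patterns):
    """True if the patterns form one path: s1 -> o1 = s2 -> o2 = s3 ..."""
    next_of = {}
    for s, _, o in patterns:
        if s in next_of:
            return False  # two patterns leave the same subject: a branch
        next_of[s] = o
    objects = set(next_of.values())
    starts = [s for s in next_of if s not in objects]
    if len(starts) != 1:
        return False
    node, visited = starts[0], 0
    while node in next_of:
        node = next_of[node]
        visited += 1
        if visited > len(patterns):
            return False  # guard against cycles
    return visited == len(patterns)


def classify_shape(patterns):
    """patterns: list of (subject, predicate, object) strings."""
    subjects = {s for s, _, _ in patterns}
    if len(subjects) == 1:
        return "star"
    if is_chain(patterns):
        return "chain"
    return "hybrid"


star = [("?drug", "rdf:type", "ex:Drug"), ("?drug", "owl:sameAs", "?dbp")]
chain = [("?film", "owl:sameAs", "?x"), ("?x", "ex:director", "?d")]
hybrid = star + [("?dbp", "ex:label", "?l")]
```

A real classifier would work on the query plan produced by each engine rather than on the input patterns alone, since the text above notes that different plan generators may group patterns differently.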

Instantiations and their position in triple patterns. This is related to whether any of the elements of the triple patterns in the query (subject, object or predicate) are already instantiated, i.e., bound to some URI. Together with join selectivity, instantiation has an important impact on the potential number of intermediate results that may be generated throughout query execution. For instance, the absence of instantiations (e.g., presence of variables) in the predicate position of a triple pattern may have an important impact on query execution time, because several endpoints may be able to provide answers for the pattern.
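
The number and position of instantiations can be read directly from the triple patterns. The following sketch counts bound terms per position and lists the patterns whose predicate is a variable; it assumes that variables are written with a leading "?" and that every other term counts as instantiated. The helper names and the example patterns are hypothetical.

```python
def is_variable(term):
    """SPARQL-style variables are assumed to start with '?'."""
    return term.startswith("?")


def count_instantiations(patterns):
    """Count bound (non-variable) terms per position over all patterns."""
    counts = {"subject": 0, "predicate": 0, "object": 0}
    for s, p, o in patterns:
        for position, term in (("subject", s), ("predicate", p), ("object", o)):
            if not is_variable(term):
                counts[position] += 1
    return counts


def unbound_predicates(patterns):
    """Patterns with a variable predicate, which any endpoint might answer."""
    return [tp for tp in patterns if is_variable(tp[1])]


query = [
    ("?film", "owl:sameAs", "?x"),
    ("?x", "?p", "ex:Drama"),
    ("ex:film42", "ex:director", "?d"),
]
```

Here `count_instantiations(query)` reports one bound subject, two bound predicates and one bound object, and `unbound_predicates(query)` returns the second pattern.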

Answer size and number of intermediate results. If the number of answers or intermediate results involved in a query execution is large, it may take a long time to transfer them across the network, and hence this may affect the query execution time.

Usage of query language expressivity. The use of specific SPARQL operators may affect the execution time and the completeness of the final result set. For example, the OPTIONAL operator is one of the most complex operators in SPARQL [5] and may add a good number of intermediate results, while the FILTER operator may restrict the intermediate results and answer size.

General predicates (e.g., rdf:type, owl:sameAs) are commonly used in SPARQL queries. However, as they normally appear in most datasets, it is not always clear to which endpoint the corresponding subquery should be submitted, and this may have an impact on both endpoint selection and query execution time.

3.2 Data Dimension

We now describe the independent variables related to the characteristics of the RDF datasets that are being accessed. An RDF dataset can be defined in terms of its size and its structural characteristics, like the number of subjects, predicates and objects, and the in and out degree of properties. These characteristics impact the number of triples that are transferred, and hence the total execution time. Additionally, they may affect the performance of the individual endpoints.

Partitioning and data distribution are two of the most important variables that need to be specified in the context of queries against federations of endpoints. Partitioning refers to the way that the RDF dataset is fragmented. Data distribution is the way partitions are allocated to the different endpoints. Data may be fully centralized, fully distributed, or somewhere in between. A dataset may be fragmented into disjoint partitions; the partitioning may be done horizontally, vertically or as a combination of both. Horizontal partitioning splits the triples into fragments that may each contain different properties. Vertical partitioning produces fragments which contain all the triples of at least one of the properties in the dataset. Horizontal partitioning impacts the completeness of the answer, whereas vertical partitioning affects the execution time. Partitions may be replicated in several endpoints, even in all of the endpoints, i.e., fully replicated, so that the availability of the system increases in case of endpoint failure or endpoint delay.
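
The difference between the two partitioning schemes, and between partitioning and distribution, can be sketched as follows. This is an illustrative toy, not the procedure used to build the experimental datasets: horizontal partitioning is approximated by a round-robin split, the function names are hypothetical, and the triples and endpoint names (`e1` to `e4`) are made up. The replicated placement at the end mirrors the pattern of one fragment in four endpoints, another in two and the last in one.

```python
from collections import defaultdict


def vertical_partition(triples, predicate_groups):
    """One fragment per predicate group, holding ALL triples of its predicates."""
    return [[t for t in triples if t[1] in group] for group in predicate_groups]


def horizontal_partition(triples, n_fragments):
    """Round-robin split: a fragment may mix predicates and hold only part of each."""
    fragments = [[] for _ in range(n_fragments)]
    for i, triple in enumerate(triples):
        fragments[i % n_fragments].append(triple)
    return fragments


def allocate(fragments, placement):
    """Data distribution: placement[i] lists the endpoints that store fragment i."""
    endpoints = defaultdict(list)
    for fragment, names in zip(fragments, placement):
        for name in names:
            endpoints[name].extend(fragment)
    return dict(endpoints)


triples = [
    ("ex:a", "skos:subject", "ex:c1"), ("ex:b", "skos:subject", "ex:c1"),
    ("ex:n1", "owl:sameAs", "ex:a"), ("ex:n2", "owl:sameAs", "ex:b"),
    ("ex:n1", "ex:latestUse", "2012"), ("ex:n2", "ex:latestUse", "2011"),
]
vertical = vertical_partition(
    triples, [{"skos:subject"}, {"owl:sameAs"}, {"ex:latestUse"}]
)
horizontal = horizontal_partition(triples, 2)
replicated = allocate(vertical, [["e1", "e2", "e3", "e4"], ["e1", "e2"], ["e1"]])
```

With this placement only `e1` holds every fragment, so a query that joins all three predicates can be answered by a single endpoint or by combining several, which is the kind of choice that distinguishes the engines compared in this section.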

Table 3 compares the behavior of ANAPSID and FedX with different configurations. The two engines behave similarly when there is one dataset per endpoint and in horizontal partitioning without replication. For vertical partitioning without replication, one engine is superior to the other. When partitioning with replication, one engine outperforms the other in vertical partitioning, and the inverse behavior occurs with horizontal partitioning. Table 4 shows another example of the effect of data distribution on the query execution time, again for ANAPSID and FedX. We can observe that when there are multiple endpoints, results are similar, while with a network with no delay (perfect network) and all datasets in a single endpoint, one of the engines clearly outperforms the other by one order of magnitude. The results in Tables 3 and 4 support the claim that data partitioning, data distribution and network delays need to be explicitly configurable in testbeds.

Table 3. Impact of Data Partitioning and Distribution on FedBench query LD10 (Perfect Network). Vertical Partitioning: triples of predicates skos:subject, owl:sameAs, and nytimes:latest_use were stored in fragments. Vertical Partitioning Without Replication: three endpoints, each fragment in a different endpoint. Vertical Partitioning With Replication: four endpoints; one of the three fragments is stored in all four endpoints, another fragment in two endpoints, and the last fragment in one endpoint. Horizontal Partitioning: triples of the three predicates were partitioned into two fragments; each fragment has data to produce at least one answer. Horizontal Partitioning Without Replication: two endpoints, each fragment in a different endpoint. Horizontal Partitioning With Replication: four endpoints; one fragment is replicated in each endpoint, the other fragment in only one endpoint.

| Configuration | Query Engine | Execution time, first tuple (secs.) | Execution time, all tuples (secs.) | Number of results |
| One Dataset per Endpoint | FedX | 1.06 | 1.06 | 3 |
| One Dataset per Endpoint | ANAPSID | 1.08 | 1.28 | 3 |
| Vertical Partitioning Without Replication | FedX | 0.69 | 0.69 | 3 |
| Vertical Partitioning Without Replication | ANAPSID | 3.88 | 14.25 | 3 |
| Horizontal Partitioning Without Replication | FedX | 0.72 | 0.72 | 3 |
| Horizontal Partitioning Without Replication | ANAPSID | 0.03 | 0.03 | 1 |
| Vertical Partitioning With Replication | FedX | 0.85 | 0.85 | 14 |
| Vertical Partitioning With Replication | ANAPSID | 4.06 | 14.48 | 3 |
| Horizontal Partitioning With Replication | FedX | 0.91 | 0.91 | 25 |
| Horizontal Partitioning With Replication | ANAPSID | 0.06 | 0.06 | 1 |

Table 4. Impact of Data Distribution on FedBench query CD1 (Perfect Network). All datasets in one endpoint versus datasets distributed in different endpoints.

| Configuration | Query Engine | Execution time, first tuple (secs.) | Execution time, all tuples (secs.) | Number of results |
| Single Endpoint, All Datasets | FedX | 0.51 | 0.51 | 61 |
| Single Endpoint, All Datasets | ANAPSID | 0.045 | 0.046 | 61 |
| Multiple Endpoints | FedX | 0.72 | 0.72 | 61 |
| Multiple Endpoints | ANAPSID | 0.17 | 0.17 | 61 |

3.3 Platform Dimension

The Platform dimension groups variables that are related to the computing infrastructure used in the evaluation. Here we include only a minimum set of parameters, related to the system's cache, available RAM and number of processors; this dimension may contain many more parameters that are relevant in this context, and they should in any case be explicitly specified in any evaluation setup that uses this testbed.

Turning the system's cache management function on or off, together with the available RAM, may greatly affect the query execution time. The meaning of dropping and warming up the cache needs to be clearly specified, as well as the number of iterations where an experiment is run in warm cache, and when cache contents are dropped. In the context of federations of endpoints, information on endpoint capabilities may be stored in cache. The number of processors is also a relevant variable in the context of federated queries. If the infrastructure offers several processors, operators may parallelize their execution, and the execution time may be affected positively.

3.4 Endpoint Dimension

This dimension comprises variables that are related to the number and capabilities of the endpoints used in the testbed. The first variables to be considered are the number of SPARQL endpoints where the query will be submitted and the type of endpoints that are used for the evaluation. The number of endpoints affects all three observed variables, especially the result completeness, because different endpoints may produce different answers. The relationship between the number of instances, graphs and endpoints of the systems used during the evaluation is also an important aspect that needs to be specified. Different configurations of these relationships may impact the three dependent variables.

There are several variables that have an important impact on the execution time, such as the transfer distribution, which is the time distribution of the transmission of packets by the endpoints, the network latency, which defines the delay in sending packets through the network, and the initial endpoint delay. An example of the impact of different network delays is illustrated in Table 5. Two queries from the Linked Data collection of FedBench were executed (LD10 and LD11). Note that ANAPSID and FedX behave similarly in LD10 when there is no delay; however, when delays are considered, FedX outperforms ANAPSID. On the other hand, in LD11 ANAPSID outperforms FedX when delays are present. In fact, ANAPSID is able to produce the first tuple after the same amount of time, independently of the delay.

Finally, SPARQL endpoints normally allow configuring a limit on the answer size of the queries and a timeout, so as to prevent users from querying the entire dataset. This may generate empty result sets or incomplete results, particularly when endpoint sub-queries are complex.
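
Table 5. Impact of network latency on FedBench queries LD10 and LD11. Timeout was set to 30 minutes and message size is 16KB. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution with α=1, β=0.3); Medium-Fast (α=3, β=1.0); Medium-Slow (α=3, β=1.5); Slow (α=5, β=2.0).

| Network | Query Engine | Query | Execution time, first tuple (secs.) | Execution time, all tuples (secs.) | Number of results |
| Perfect | ANAPSID | LD10 | 1.08 | 1.29 | 3 |
| Perfect | ANAPSID | LD11 | 0.06 | 0.09 | 376 |
| Perfect | FedX | LD10 | 1.06 | 1.06 | 3 |
| Perfect | FedX | LD11 | 5.44 | 5.44 | 376 |
| Fast | ANAPSID | LD10 | 18.13 | 22.89 | 3 |
| Fast | ANAPSID | LD11 | 0.06 | 2.80 | 376 |
| Fast | FedX | LD10 | 3.45 | 3.45 | 3 |
| Fast | FedX | LD11 | 14.21 | 14.22 | 376 |
| Medium-Fast | ANAPSID | LD10 | 191.78 | 241.58 | 3 |
| Medium-Fast | ANAPSID | LD11 | 0.07 | 27.86 | 376 |
| Medium-Fast | FedX | LD10 | 27.27 | 27.27 | 3 |
| Medium-Fast | FedX | LD11 | 108.93 | 108.93 | 376 |
| Medium-Slow | ANAPSID | LD10 | 287.88 | 362.59 | 3 |
| Medium-Slow | ANAPSID | LD11 | 0.05 | 41.74 | 376 |
| Medium-Slow | FedX | LD10 | 41.42 | 41.42 | 3 |
| Medium-Slow | FedX | LD11 | 162.45 | 162.45 | 376 |
| Slow | ANAPSID | LD10 | 653.44 | 819.72 | 3 |
| Slow | ANAPSID | LD11 | 0.09 | 92.52 | 376 |
| Slow | FedX | LD10 | 87.19 | 87.19 | 3 |
| Slow | FedX | LD11 | 347.93 | 347.93 | 376 |

4 Some Experimental Results

In this section we illustrate how the testbed extension can be used to better understand the behavior of some of the existing federated query engines. The extended testbed has been executed on three systems (ANAPSID, ARQ and FedX) with several configurations for the independent variables identified in Section 3. The complete result set generated by these executions can be browsed at the DEFENDER portal (footnote 4).

Footnote 4: http://159.90.11.58/

Now we will focus on one of the analyses that a system developer may be interested in, in the context of the continuous benchmarking process that we have referred to in this paper.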
93376Nowwewillfocusononeoftheanalysesthatasystemdevelopermaybeinterestedin,inthecontextofthecontinuousbenchmarkingprocessthatwehavereferredtointhispaper.
Thatis,wearenotanalyzingthewholesetofresultsobtainedfromtheexecution,butonlyasubsetofit.
Specically,let'sassumethatweareinterestedinunderstandingtheperformanceofthethreeevaluatedsystemsunderdierentdatadistributionsinanidealscenario,withnoornegligibleconnectionlatency.
Ourhypothesisisthatexistingqueryenginesaresensibletothewaydataisdistributedalongdierentendpoints,evenwhenthenetworkisperfect.
Therefore,theseresultsmaybeusefultovalidatethathypothesisandtounderstandwhetherasetoffederateddatasetsforwhichwehavethecorrespondingRDFdumpsshouldbebetterstoredinasingleendpointorindierentendpointstooeranswersmoreeciently.
Basedonthesetofvariablesidentiedinourstudy,thefollowingexperimentalsetupisused:DatasetsandQueryBenchmarks.
Weran36queriesagainsttheFedBenchdatasetcollections[8]:DBPedia,NYTimes,Geonames,KEGG,ChEBI,Drugbank,Jamendo,LinkedMDB,andSWDogFood.
Thesequeriesinclude25FedBenchqueriesandelevencomplexqueries5.
Thelatterareaddedto5http://www.
ldc.
usb.
ve/~mvidal/FedBench/queries/ComplexQueries322G.
Montoyaetal.
coversomeofthemissingelementsintheformergroupofqueries.
Theyarecomprisedofbetween6and48triplepatterns,andcanbedecomposedintoupto8sub-queries;andtheycoverdierentSPARQLoperators.
Virtuoso6wasusedtoimplementendpoints,andthetimeoutwassetupto240secs.
or71,000tuples.
ExperimentswereexecutedonaLinuxMintmachinewithanIntelPentiumCore2DuoE75002.
93GHz8GBRAM1333MHzDDR3.
NetworkLatency.
Weconguredaperfectnetworkwithnodelays.
Thesizeofthemessagecorrespondedto16KB.

Data Distribution. We considered two different distributions of the data: i) Complete: the FedBench collections were stored into a single graph and made accessible through one single SPARQL endpoint, and ii) Federated: the FedBench collections were stored in nine Virtuoso endpoints.

Therefore, we consider the queries in four groups and six configurations: Configuration 1: ANAPSID Complete Distribution, Configuration 2: ANAPSID Federated Distribution, Configuration 3: ARQ Complete Distribution, Configuration 4: ARQ Federated Distribution, Configuration 5: FedX Complete Distribution, Configuration 6: FedX Federated Distribution. In each configuration, the corresponding queries were ordered according to the total execution time consumed by the corresponding engines. For example, for ANAPSID in a Complete Distribution, i.e., Configuration 1, the Cross-Domain queries were ordered as follows: CD2, CD3, CD4, CD5, CD1, CD7, and CD6. Queries of each configuration were compared using Spearman's Rho correlation. A high positive correlation value between two configurations indicates that the corresponding engines had a similar behavior, i.e., the trends of execution time of the two engines are similar. Thus, when Configuration 1 is compared to itself, the Spearman's Rho correlation reaches the highest value (1.0). On the other hand, a negative value indicates an inverse correlation; for example, this happened with Complex Queries for ARQ in a Complete Distribution (Configuration 3) when compared to FedX Federated Distribution (Configuration 6); its value is -0.757. Finally, a value of 0.0 indicates that there is no correlation between the two configurations, e.g., for Life Science queries, Configuration 4 and Configuration 6.
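
The comparison of two configurations can be sketched with the textbook rank-difference formula for Spearman's Rho, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid when there are no ties. The code below is an illustration only: `spearman_rho` is a hypothetical helper, the first ordering is the Configuration 1 ordering quoted above, and the second ordering is invented for the example rather than taken from the experiments.

```python
def spearman_rho(order_a, order_b):
    """Spearman's Rho between two orderings of the same queries (no ties)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same queries")
    n = len(order_a)
    if n < 2:
        raise ValueError("at least two queries are needed")
    rank_b = {query: rank for rank, query in enumerate(order_b)}
    # d is the difference between the ranks a query gets in the two orderings.
    d_squared = sum(
        (rank_a - rank_b[query]) ** 2 for rank_a, query in enumerate(order_a)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Ordering reported in the text for Configuration 1 (Cross-Domain queries).
config_1 = ["CD2", "CD3", "CD4", "CD5", "CD1", "CD7", "CD6"]
# Hypothetical ordering for another configuration, made up for this example.
other = ["CD3", "CD2", "CD4", "CD5", "CD1", "CD7", "CD6"]

same = spearman_rho(config_1, config_1)
inverse = spearman_rho(config_1, list(reversed(config_1)))
close = spearman_rho(config_1, other)
```

Comparing an ordering with itself gives 1.0 and comparing it with its reverse gives -1.0, which matches the reading of the extreme values described above.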

Figure 1 illustrates the results of this specific study (again, the data used for this study is available through the DEFENDER portal). White circles represent the highest value of correlation; red ones correspond to inverse correlations, while blue ones indicate a positive correlation. The size of the circles is proportional to the value of the correlation. Given a group of queries, a low value of correlation of one engine in two different distributions suggests that the distribution affects the engine behavior, e.g., FedX and ARQ in Complex Queries with different data distributions have correlation values of 0.143 and 0.045, respectively. Furthermore, the number of small blue circles between configurations of different data distributions of the same engine indicates that this parameter affects the behavior of the studied engine. Because there are several of these points in the Complex Queries plot, we can conclude that these two parameters (query complexity and data distribution) allow uncovering engines' behavior that could not be observed before. This illustrates the need for the extensions proposed in this paper.

Footnote 6: http://virtuoso.openlinksw.com/

[Figure 1, panel titles: (a) Cross Domain (CD), (b) Linked Data (LD), (c) Life Science (LS), (d) New Complex Queries (C)]

Fig. 1. Spearman's Rho Correlation of Queries in three FedBench sets of queries (a) Cross-Domain (CD), (b) Life Science (LS), (c) Linked Data (LD) and (d) New Complex Queries. Six configurations: (1) ANAPSID Complete Distribution; (2) ANAPSID Federated Distribution; (3) ARQ Complete Distribution; (4) ARQ Federated Distribution; (5) FedX Complete Distribution; (6) FedX Federated Distribution. White circles correspond to a correlation value of 1.0; blue circles indicate a positive correlation (Fig. 1(d) points (3,4) and (5,6), correlation values 0.045 and 0.143, respectively); red circles indicate a negative correlation (Fig. 1(d) points (2,6) and (6,3), correlation values -0.5 and -0.757, respectively). Circles' diameters indicate absolute correlation values.

5 Conclusion and Future Work

In this paper we have shown that there is a need to extend current federated SPARQL query testbeds with additional variables and configuration setups (e.g., data partitioning and distribution, network latency, and query complexity), so as to provide more accurate details of the behavior of existing engines, which can then be used to provide better comparisons and as input for improvement proposals. Taking those additional variables into account, we have extensively evaluated three of the existing engines (ANAPSID, ARQ and FedX), and have made those results available for public consumption in the DEFENDER portal, which we plan to keep up to date on a regular basis. We have also shown how the generated result dataset can be used to validate hypotheses about the systems' behavior.

Our future work will focus on continuing the evaluation of additional federated SPARQL query engines, and on including additional parameters in the benchmark that may still be needed to provide more accurate and well-informed results.

Acknowledgements. This work has been funded by the project myBigData (TIN2010-17060), and DID-USB. We thank Maribel Acosta, Cosmin Basca, and Raúl García-Castro for fruitful discussions.

References

1. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011)
2. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the SPARQL 1.1 Federation Extension. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part II. LNCS, vol. 6644, pp. 1–15. Springer, Heidelberg (2011)
3. Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: An Adaptive Query Processor for Joining Federated SPARQL Endpoints. In: Meersman, R., Dillon, T., Herrero, P., Kumar, A., Reichert, M., Qing, L., Ooi, B.-C., Damiani, E., Schmidt, D.C., White, J., Hauswirth, M., Hitzler, P., Mohania, M. (eds.) OTM 2011, Part II. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011)
4. Montoya, G., Vidal, M.-E., Acosta, M.: DEFENDER: a DEcomposer for quEries against feDERations of endpoints. In: Extended Semantic Web Conference, ESWC Workshop and Demo 2012 (to appear)
5. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. TODS 34(3) (2009)
6. Prud'hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 federated query (November 2011)
7. Quilitz, B., Leser, U.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
8. Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011)
9. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2Bench: A SPARQL performance benchmark. In: ICDT, pp. 4–33 (2010)
10. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization Techniques for Federated Query Processing on Linked Data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)
