Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?

Gabriela Montoya¹, Maria-Esther Vidal¹, Oscar Corcho², Edna Ruckhaus¹, and Carlos Buil-Aranda³

¹ Universidad Simón Bolívar, Venezuela
{gmontoya,mvidal,ruckhaus}@ldc.usb.ve
² Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
ocorcho@fi.upm.es
³ Department of Computer Science, Pontificia Universidad Católica, Chile
cbuil@ing.puc.cl

Abstract. Testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems still have some limitations. Some variables and configurations that may have an impact on the behavior of these systems (e.g., network latency, data partitioning and query properties) are not sufficiently defined; this affects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed up to now, and empirically probe the need to consider additional dimensions and variables. The evaluation has been conducted on three SPARQL query federation systems, and the analysis of these results has allowed us to uncover properties of these systems that would normally be hidden with the original testbeds.
P. Cudré-Mauroux et al. (Eds.): ISWC 2012, Part II, LNCS 7650, pp. 313–324, 2012. © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

The number of RDF datasets made publicly available through SPARQL endpoints has exploded in recent years. This fact, together with the potential added value that can be obtained from the combination of such distributed data sources, has motivated the development of systems that allow executing queries over federated SPARQL endpoints (e.g., SPARQL-DQP [2], Jena's ARQ¹, RDF::Query², ANAPSID [1], FedX [10], ADERIS [3]). Some systems use SPARQL 1.0 or ad-hoc extensions, while others rely on the query federation extensions that are being proposed as part of the upcoming SPARQL 1.1 specification [6].

¹ http://jena.apache.org/
² http://search.cpan.org/~gwilliams/RDF-Query/

In parallel to the development of federated SPARQL query evaluation systems, several testbeds have been created (e.g., as described in [2,7,8]), which complement those already used for single-endpoint query evaluation. The role of these testbeds is to allow evaluating and comparing the main characteristics of these systems, so as to provide enough information to improve them. Among the features evaluated by these testbeds we can cite: i) functional requirements supported, ii) efficiency of the implementations with different configurations of datasets and with different types of queries, or iii) resilience to changes in the configurations of these systems and the underlying datasets. The most recent and complete testbed is FedBench [8], which proposes a variety of queries in different domains and with different characteristics, including star-shaped, chain-like and hybrid queries, and complex query forms using an adaptation of SP2Bench [9].

These testbeds are steps forward towards establishing a continuous benchmarking process of federated SPARQL query engines. However, they are still far from effectively supporting such benchmarking objectives. In fact, they do not specify completely or even consider some of the dependent and independent variables and configuration setups that characterize the types of problems to be tackled in federated SPARQL query processing, and that clearly affect the performance and quality of different solutions. This may lead to incorrect characterizations when these testbeds are used to select the most appropriate systems in a given scenario, or to decide the next steps in their development.

For example, testbeds like the one in [2] have limitations. First, queries are executed directly on live SPARQL endpoints; this means that experiments are not reproducible, as the load of endpoints and network latency vary over time. Second, queries were constructed for the data available in the selected endpoints at the time of generating the testbed, but the structure of these underlying RDF data sources changes, and may result in queries that return different answers or that do not return any answer at all. In cases like FedBench [8], the level of reproducibility is improved by using datasets that can be handled locally. However, as shown in Section 2, there are variables that are not yet considered in this benchmark (e.g., network latency, dataset configurations) and that are important in order to obtain more accurate and informative results.

The objective of this paper is to describe first the characteristics exhibited by these testbeds (mainly focusing on FedBench) and reflect on their current limitations. Additional variables and configuration setups (e.g., new queries, new configurations of network latency details, new dataset distribution parameters) are proposed in order to provide more accurate and well-informed overviews of the current status of each of the evaluated systems, so that the experiments to be executed can offer more accurate information about the behavior of the evaluated systems, and hence they can be used in continuous improvement processes. Finally, we describe briefly the results of our evaluation of this extended testbed using three different federated query engines: ARQ, ANAPSID, and FedX.

2 Some Limitations of Existing Testbeds

There is no unique "one-size-fits-all" testbed to measure every characteristic needed by an application that requires some form of federated query processing [8]. Regardless, existing testbeds can still be improved so that they can fulfill their role in continuous benchmarking processes.

We will first illustrate why we need to improve existing testbeds, particularly FedBench, by describing a scenario where the use of the testbed in its current form may lead to wrong decisions. We have executed the FedBench testbed with three systems (ANAPSID, ARQ, and FedX) on the three sets of queries proposed (Life Science, Cross Domain, and Linked Data) [4]. We have used different simulated configurations for network latencies and different data distributions of the datasets used in the experiments. The results suggest the need for improvements. For instance, for the Cross-Domain query CD1, all systems behave well in a perfect network (as shown in Table 1). However, their behavior changes dramatically when network latencies are considered. ARQ is not able to handle this query for medium-fast and fast networks, given the timeout considered; the time needed to execute the query in the case of FedX grows from 0.72 secs. (perfect network) to 2.23 secs. (fast network) and 16.93 secs. (medium-fast network); and for ANAPSID the results are similar for perfect and fast networks, and grow more slowly in medium-fast networks.

Table 1. Evaluation of FedBench query CD1: number of results and execution time (secs.) under different network latency conditions. Timeout was set to 30 minutes. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution (α=1, β=0.3)); Medium-Fast Network (delays follow a Gamma distribution (α=3, β=1.0)).

Query     Number of results        Execution time (secs.)    Execution time (secs.)
Engine                             (first tuple)             (all tuples)
          Medium  Fast  Perfect    Medium  Fast  Perfect     Medium  Fast  Perfect
ANAPSID   61      61    61         0.98    0.17  0.16        0.98    0.17  0.16
FedX      61      61    61         16.93   2.23  0.72        16.93   2.23  0.72
ARQ       –       –     63         –       –     0.98        –       –     0.98

This is also the case for other FedBench queries (e.g., LD10, LD11, LS7, CD2), where different behaviors can be observed depending not only on network latency, but also on additional parameters, e.g., data distribution. What these examples show is that those parameters are also important when considering federated query processing approaches, and should be configured in a testbed, so as to provide sufficient information for decision makers to select the right tool for the type of problem being handled, or for tool developers to understand better the weaknesses of their systems and improve them accordingly, if possible.
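
To make the latency effect reported in Table 1 concrete, the following Python sketch turns the all-tuple execution times into slowdown factors relative to the perfect network. The times are copied from Table 1; the dictionary layout and the function name `slowdown` are illustrative assumptions and are not part of FedBench or of any evaluated engine.

```python
# Execution times in secs. (all tuples) for query CD1, copied from Table 1.
# ARQ is omitted because it timed out under the fast and medium-fast networks.
TABLE1_ALL_TUPLES = {
    "ANAPSID": {"perfect": 0.16, "fast": 0.17, "medium-fast": 0.98},
    "FedX": {"perfect": 0.72, "fast": 2.23, "medium-fast": 16.93},
}


def slowdown(times, baseline="perfect"):
    """Return execution time divided by the baseline time for every other network."""
    base = times[baseline]
    return {net: t / base for net, t in times.items() if net != baseline}


if __name__ == "__main__":
    for engine, times in TABLE1_ALL_TUPLES.items():
        for network, factor in slowdown(times).items():
            print(f"{engine:8s} {network:12s} slowdown x{factor:.1f}")
```

Expressed this way, the same latency change can be compared across engines independently of their absolute execution times.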

Finally, there is another aspect that is important when considering the quality of existing testbeds: sometimes there are not sufficient explanations about the purpose of each of the parameters that can be configured. For example, in the case of FedBench there are several parameters that are considered when describing queries, as presented in [8], such as whether the query uses operators like conjunctions, unions, filters or optionals, modifiers like DISTINCT, LIMIT, OFFSET or ORDER BY, and structures like star-shaped queries, chains or hybrid cases. While this is quite a comprehensive set of features to characterize a SPARQL query, there are no clear reasons why each of the 36 queries from the testbed is included. Only some examples are provided in [8], explaining that LS4 "includes a star-shaped group of triple patterns of drugs which is connected via owl:sameAs link to DBpedia drug entities", or that CD5 is a "chain-like query for finding film entities linked via owl:sameAs and restricted on genre and director". However, there are no explanations in the paper or in the corresponding benchmark website about the reasons for including each of them. Furthermore, there are parameters that are not adequately represented (e.g., common query operators like optionals and filters do not appear in cross domain or linked data queries), and characteristics that are not sufficiently discussed (e.g., the number of triple patterns in each basic graph pattern appearing in the query, the selectivity of each part of the query, etc.), which makes the testbed not complete enough.

In summary, while we acknowledge the importance of these testbeds in the state of the art of federated query processing evaluation, we can identify some of their shortcomings, which we illustrate and describe in different scenarios.

3 Benchmark Design

In this section we describe some of the variables that have an impact on federated SPARQL query engines. There are two groups of variables: independent and dependent. Independent variables are those characteristics that need to be minimally specified in the benchmark in order to ensure that evaluation scenarios are replicable. Independent variables have been grouped into four dimensions: Query, Data, Platform, and Endpoint. Dependent (or observed) variables are those characteristics that are normally influenced by independent variables, as described in Table 2, and that will be measured during the evaluation:

– Endpoint Selection Time. Elapsed time between query submission and the generation of the SPARQL 1.1 federated query annotated with the endpoints where sub-queries will be executed³.
– Execution Time. This variable is in turn comprised of: i) time for the first tuple, or elapsed time between query submission and first answer, ii) time distribution of the reception of query answers, and iii) total execution time.
– Answer Completeness. Number of answers received in relation to the data available in the selected endpoints.

In the following sections we describe independent variables in more detail.
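
As an illustration of how two of the dependent variables listed above could be recorded, the following Python sketch times an answer stream. It is a minimal sketch under stated assumptions: `run_query` stands for any callable that returns an iterator of answers, `fake_engine` is a hypothetical stand-in for a real engine, and `expected_answers` is a ground-truth answer count that the experimenter must supply. Endpoint selection time is internal to each engine and is not covered here.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def measure_query(run_query: Callable[[], Iterable[object]],
                  expected_answers: Optional[int] = None) -> Dict[str, object]:
    """Record the arrival time of every answer produced by run_query()."""
    start = time.perf_counter()
    arrival_times: List[float] = []
    for _answer in run_query():
        arrival_times.append(time.perf_counter() - start)
    total_time = time.perf_counter() - start

    completeness = None
    if expected_answers:
        # Answers received relative to the answers known to be available.
        completeness = len(arrival_times) / expected_answers

    return {
        "first_tuple_time": arrival_times[0] if arrival_times else None,
        "arrival_times": arrival_times,  # time distribution of the answers
        "total_time": total_time,
        "answers": len(arrival_times),
        "completeness": completeness,
    }


def fake_engine():
    """Hypothetical stand-in for a federated engine: yields three answers."""
    for i in range(3):
        time.sleep(0.01)
        yield {"x": i}


if __name__ == "__main__":
    print(measure_query(fake_engine, expected_answers=4))
```

Keeping the full list of arrival times, rather than only the first and last, is what allows the time distribution of answers to be reported.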

3.1 Query Dimension

This dimension groups variables that characterize the queries in terms of their structure, evaluation, and query language expressivity. Regarding the structure of the query, we focus on three main aspects: i) the query plan shape, ii) the number of basic triple patterns in the query, and iii) the instantiations of subject, object and/or predicates in the query.

³ This variable is applicable only in cases where the system handles SPARQL 1.0 queries and no endpoints are specified in the query; hence, these queries have to be translated into SPARQL 1.1 or into an equivalent internal representation.

Table 2. Variables that impact the behavior of SPARQL federated engines

Observed variables (columns): Endpoint Selection Time; Execution Time; Answer Completeness

Independent variables (rows), by dimension:
  Query:     query plan shape; # basic triple patterns; # instantiations and their position; join selectivity; # intermediate results; answer size; usage of query language expressivity; # general predicates
  Data:      dataset size; data frequency distribution; type of partitioning; data endpoint distribution
  Platform:  cache on/off; RAM available; # processors
  Endpoint:  # endpoints; endpoint type; relation graph/endpoint/instance; network latency; initial delay; message size; transfer distribution; answer size limit; timeout

Shape. Query plans may be star-shaped, chain-shaped or a combination of them, as described in [8]. In general, the shape of the input queries and of the query plans generated by the systems has an important impact on the three dependent variables identified in our evaluation (endpoint selection time, if applicable, execution time and answer completeness). The shape of the query plans will be in turn affected by the number of basic triple patterns in the query since this number will influence the final query shape. Query evaluation systems can apply different techniques when generating query plans for a specific type of input query, and this will normally yield different selection and execution times, and completeness results. For example, a query plan generator may or may not group together all graph patterns related to one endpoint.
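
The star and chain shapes mentioned above can be illustrated with a small Python sketch that classifies a list of triple patterns. The definitions used here are simplifying assumptions made for illustration only: a "star" is a group of patterns sharing one subject, and a "chain" is a sequence where the object of each pattern is the subject of the next. Neither the function names nor these exact definitions come from FedBench [8].

```python
from typing import List, Tuple

# (subject, predicate, object); terms starting with "?" are variables.
Triple = Tuple[str, str, str]


def _is_chain(patterns: List[Triple]) -> bool:
    """True if the patterns form one path: object of one is subject of the next."""
    subjects = [s for s, _, _ in patterns]
    objects = [o for _, _, o in patterns]
    if len(set(subjects)) != len(patterns):
        return False  # some subject is used twice, so this is not a simple path
    starts = [s for s in subjects if s not in objects]
    if len(starts) != 1:
        return False
    next_of = {s: o for s, _, o in patterns}
    current, steps = starts[0], 0
    while current in next_of:
        current = next_of[current]
        steps += 1
        if steps > len(patterns):
            return False  # guard against cycles
    return steps == len(patterns)


def classify_shape(patterns: List[Triple]) -> str:
    """Classify a basic graph pattern as 'single', 'star', 'chain' or 'hybrid'."""
    if len(patterns) < 2:
        return "single"
    if len({s for s, _, _ in patterns}) == 1:
        return "star"
    if _is_chain(patterns):
        return "chain"
    return "hybrid"


if __name__ == "__main__":
    star = [("?d", "rdf:type", "ex:Drug"), ("?d", "ex:name", "?n")]
    chain = [("?f", "owl:sameAs", "?x"), ("?x", "ex:director", "?y")]
    print(classify_shape(star), classify_shape(chain))
```

The `ex:` predicates in the usage example are made up; only `rdf:type` and `owl:sameAs` appear in the text.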

Instantiations and their position in triple patterns. This is related to whether any of the elements of the triple patterns in the query (subject, object or predicate) are already instantiated, i.e., bound to some URI. Together with join selectivity, instantiation has an important impact on the potential number of intermediate results that may be generated throughout query execution. For instance, the absence of instantiations (e.g., presence of variables) in the predicate position of a triple pattern may have an important impact on query execution time, because several endpoints may be able to provide answers for the pattern.
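
A testbed could report this variable per query with a few lines of code. The following Python sketch counts instantiated terms by position and lists the patterns whose predicate is a variable. The tuple representation of triple patterns, the `?` prefix convention for variables, and both function names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
POSITIONS = ("subject", "predicate", "object")


def is_variable(term: str) -> bool:
    """Variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def instantiation_profile(patterns: List[Triple]) -> Counter:
    """Count, per position, how many terms are bound to a constant."""
    profile = Counter()
    for triple in patterns:
        for position, term in zip(POSITIONS, triple):
            if not is_variable(term):
                profile[position] += 1
    return profile


def unbound_predicates(patterns: List[Triple]) -> List[Triple]:
    """Patterns with a variable predicate, which many endpoints may answer."""
    return [t for t in patterns if is_variable(t[1])]


if __name__ == "__main__":
    query = [("?s", "owl:sameAs", "?o"), ("?o", "?p", "ex:Thing")]
    print(dict(instantiation_profile(query)))
    print(unbound_predicates(query))
```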

Answer size and number of intermediate results. If the number of answers or intermediate results involved in a query execution is large, it may take a long time to transfer them across the network, and hence this may affect the query execution time.

Usage of query language expressivity. The use of specific SPARQL operators may affect the execution time and the completeness of the final result set. For example, the OPTIONAL operator is one of the most complex operators in SPARQL [5] and may add a good number of intermediate results, while the FILTER operator may restrict the intermediate results and answer size.

General predicates (e.g., rdf:type, owl:sameAs) are commonly used in SPARQL queries. However, as they normally appear in most datasets, it is not always clear to which endpoint the corresponding subquery should be submitted, and this may have an impact on both endpoint selection and query execution time.

3.2 Data Dimension

We now describe the independent variables related to the characteristics of the RDF datasets that are being accessed. An RDF dataset can be defined in terms of its size and its structural characteristics like the number of subjects, predicates and objects, and the in and out degree of properties. These characteristics impact the number of triples that are transferred, and hence the total execution time. Additionally, they may affect the performance of the individual endpoints.

Partitioning and data distribution are two of the most important variables that need to be specified in the context of queries against federations of endpoints. Partitioning refers to the way that the RDF dataset is fragmented. Data distribution is the way partitions are allocated to the different endpoints. Data may be fully centralized, fully distributed, or somewhere in between. A dataset may be fragmented into disjoint partitions; the partitioning may be done horizontally, vertically or a combination of both. Horizontal partitioning splits the triples into fragments, each of which may contain triples of different properties. Vertical partitioning produces fragments which contain all the triples of at least one of the properties in the dataset. Horizontal partitioning impacts the completeness of the answer, whereas vertical partitioning affects the execution time. Partitions may be replicated in several endpoints, even in all of the endpoints, i.e., fully replicated, so that the availability of the system increases in case of endpoint failure or endpoint delay.
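
The difference between the two partitioning schemes, and the separate step of allocating fragments to endpoints, can be sketched in Python as follows. This is an illustration under stated assumptions: the round-robin rule used for horizontal partitioning, the endpoint names, and all function names are hypothetical, and the text does not say how the fragments in the experiments were actually produced.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def vertical_partition(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """One fragment per predicate: each fragment holds all triples of that property."""
    fragments: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        fragments[triple[1]].append(triple)
    return dict(fragments)


def horizontal_partition(triples: List[Triple], n: int) -> List[List[Triple]]:
    """Split triples into n fragments regardless of predicate (round-robin assumed)."""
    fragments: List[List[Triple]] = [[] for _ in range(n)]
    for i, triple in enumerate(triples):
        fragments[i % n].append(triple)
    return fragments


def distribute(fragments: Dict[object, List[Triple]],
               placement: Dict[object, List[str]]) -> Dict[str, List[Triple]]:
    """Allocate fragments to endpoints; listing several endpoints replicates a fragment."""
    endpoints: Dict[str, List[Triple]] = defaultdict(list)
    for fragment_id, endpoint_names in placement.items():
        for name in endpoint_names:
            endpoints[name].extend(fragments[fragment_id])
    return dict(endpoints)


if __name__ == "__main__":
    data = [("a", "owl:sameAs", "b"), ("a", "skos:subject", "c"),
            ("d", "owl:sameAs", "e"), ("d", "skos:subject", "f")]
    by_predicate = vertical_partition(data)
    # Replicate the owl:sameAs fragment on two hypothetical endpoints.
    print(distribute(by_predicate, {"owl:sameAs": ["ep1", "ep2"],
                                    "skos:subject": ["ep1"]}))
```

Keeping partitioning and distribution as separate functions mirrors the distinction drawn in the text: the same fragments can be placed with or without replication.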

Table 3 compares the behavior of ANAPSID and FedX with different configurations. The two engines behave similarly when there is one dataset per endpoint and in horizontal partitioning without replication. For vertical partitioning without replication, one engine is superior to the other. When partitioning with replication, one engine outperforms the other in vertical partitioning, and the inverse behavior occurs with horizontal partitioning. Table 4 shows another example of the effect of data distribution on the query execution time, again for ANAPSID and FedX. We can observe that when there are multiple endpoints, results are similar, while with a network with no delay (perfect network) and all datasets in a single endpoint, one of the engines clearly outperforms the other by one order of magnitude. Results in Tables 3 and 4 support the claim that data partitioning, data distribution and network delays need to be explicitly configurable in testbeds.

Table 3. Impact of Data Partitioning and Distribution on FedBench query LD10 (Perfect Network). Vertical Partitioning: triples of predicates skos:subject, owl:sameAs, and nytimes:latestuse were stored in fragments. Vertical Partitioning Without Replication: three endpoints, each fragment in a different endpoint. Vertical Partitioning With Replication: four endpoints; one of the three fragments is stored in the four endpoints, another fragment in two endpoints, and the last fragment in one endpoint. Horizontal Partitioning: triples of the three predicates were partitioned in two fragments; each fragment has data to produce at least one answer. Horizontal Partitioning Without Replication: two endpoints; each fragment in a different endpoint. Horizontal Partitioning With Replication: four endpoints; one fragment is replicated in each endpoint, the other fragment in only one endpoint.

Configuration                                  Query Engine   Execution time,        Execution time,       Number of
                                                              First Tuple (secs.)    All Tuples (secs.)    Results
One Dataset per Endpoint                       FedX           1.06                   1.06                  3
                                               ANAPSID        1.08                   1.28                  3
Vertical Partitioning Without Replication      FedX           0.69                   0.69                  3
                                               ANAPSID        3.88                   14.25                 3
Horizontal Partitioning Without Replication    FedX           0.72                   0.72                  3
                                               ANAPSID        0.03                   0.03                  1
Vertical Partitioning With Replication         FedX           0.85                   0.85                  14
                                               ANAPSID        4.06                   14.48                 3
Horizontal Partitioning With Replication       FedX           0.91                   0.91                  25
                                               ANAPSID        0.06                   0.06                  1

Table 4. Impact of Data Distribution on FedBench query CD1 (Perfect Network). All datasets in one endpoint versus datasets distributed in different endpoints.

Configuration                     Query Engine   Execution time,        Execution time,       Number of
                                                 First Tuple (secs.)    All Tuples (secs.)    Results
Single Endpoint - All Datasets    FedX           0.51                   0.51                  61
                                  ANAPSID        0.045                  0.046                 61
Multiple Endpoints                FedX           0.72                   0.72                  61
                                  ANAPSID        0.17                   0.17                  61

3.3 Platform Dimension

The Platform dimension groups variables that are related to the computing infrastructure used in the evaluation. Here we include a minimum set of parameters, related to the system's cache, available RAM and number of processors, since this dimension may contain many more parameters that are relevant in this context, and that should anyway be explicitly specified in any evaluation setup when using this testbed. Turning the cache management function of the system on or off, together with the available RAM, may greatly affect the query execution time. The meaning of dropping and warming up the cache needs to be clearly specified, as well as the number of iterations where an experiment is run in warm cache, and when cache contents are dropped. In the context of federations of endpoints, information on endpoint capabilities may be stored in cache. The number of processors is also a relevant variable in the context of federated queries. If the infrastructure offers several processors, operators may parallelize their execution, and the execution time may be affected positively.

3.4 Endpoint Dimension

This dimension comprises variables that are related to the number and capabilities of the endpoints used in the testbed.

The first variable to be considered is the number of SPARQL endpoints where the query will be submitted and the type of endpoints that are used for the evaluation. The first variable affects all three observed variables, especially the result completeness, because different endpoints may produce different answers. The relationship between the number of instances, graphs and endpoints of the systems used during the evaluation is also an important aspect that needs to be specified. Different configurations of these relationships may impact the three dependent variables.

There are several variables that have an important impact on the execution time, such as the transfer distribution, which is the time distribution of the transmission of packets by the endpoints, the network latency, which defines the delay in sending packets through the network, and the initial endpoint delay. An example of the impact of different network delays is illustrated in Table 5. Two queries from the Linked Data collection of FedBench were executed (LD10 and LD11). Note that ANAPSID and FedX behave similarly in LD10 when there is no delay; however, when delays are considered, FedX outperforms ANAPSID. On the other hand, in LD11 ANAPSID outperforms FedX when delays are present. In fact, ANAPSID is able to produce the first tuple after the same amount of time, independently of the delay.
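
The network profiles used in Tables 1 and 5 are described only by the parameters of a Gamma distribution. The following Python sketch shows one way such delays could be sampled with the standard library. It rests on two assumptions that the text does not settle: that β is the scale parameter (so the mean delay is α·β), which is the convention of Python's `random.gammavariate`, and that one delay is drawn per message. The profile names and the function `sample_delay` are illustrative.

```python
import random
import statistics

# (alpha, beta) pairs copied from the captions of Tables 1 and 5.
NETWORK_PROFILES = {
    "perfect": None,  # no delays
    "fast": (1.0, 0.3),
    "medium-fast": (3.0, 1.0),
    "medium-slow": (3.0, 1.5),
    "slow": (5.0, 2.0),
}


def sample_delay(profile: str, rng: random.Random) -> float:
    """Draw one delay for the given network profile."""
    params = NETWORK_PROFILES[profile]
    if params is None:
        return 0.0
    alpha, beta = params
    # ASSUMPTION: beta is the scale parameter, so the expected delay is alpha * beta.
    return rng.gammavariate(alpha, beta)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for name in NETWORK_PROFILES:
        delays = [sample_delay(name, rng) for _ in range(10_000)]
        print(f"{name:12s} mean delay = {statistics.mean(delays):.2f}")
```

Fixing the seed of the random generator is one simple way to keep a simulated-latency experiment repeatable across runs.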

Finally, SPARQL endpoints normally allow configuring a limit on the answer size of the queries and a timeout, so as to prevent users from querying the entire dataset. This may generate empty result sets or incomplete results, particularly when endpoint sub-queries are complex.

4 Some Experimental Results

In this section we illustrate how the testbed extension can be used to better understand the behavior of some of the existing federated query engines. The extended testbed has been executed on three systems (ANAPSID, ARQ and FedX) with several configurations for the independent variables identified in Section 3. The complete result set generated by these executions can be browsed at the DEFENDER portal⁴.

⁴ http://159.90.11.58/

Table 5. Impact of network latency on FedBench queries LD10 and LD11. Timeout was set to 30 minutes and Message Size is 16KB. Perfect Network (no delays); Fast Network (delays follow a Gamma distribution (α=1, β=0.3)); Medium-Fast (delays follow a Gamma distribution (α=3, β=1.0)); Medium-Slow (delays follow a Gamma distribution (α=3, β=1.5)); Slow (delays follow a Gamma distribution (α=5, β=2.0)).

Network          Query Engine   Query   Execution time,        Execution time,       Number of
                                        First Tuple (secs.)    All Tuples (secs.)    Results
Perfect          ANAPSID        LD10    1.08                   1.29                  3
                                LD11    0.06                   0.09                  376
                 FedX           LD10    1.06                   1.06                  3
                                LD11    5.44                   5.44                  376
Fast             ANAPSID        LD10    18.13                  22.89                 3
                                LD11    0.06                   2.80                  376
                 FedX           LD10    3.45                   3.45                  3
                                LD11    14.21                  14.22                 376
Medium-Fast      ANAPSID        LD10    191.78                 241.58                3
                                LD11    0.07                   27.86                 376
                 FedX           LD10    27.27                  27.27                 3
                                LD11    108.93                 108.93                376
Medium-Slow      ANAPSID        LD10    287.88                 362.59                3
                                LD11    0.05                   41.74                 376
                 FedX           LD10    41.42                  41.42                 3
                                LD11    162.45                 162.45                376
Slow             ANAPSID        LD10    653.44                 819.72                3
                                LD11    0.09                   92.52                 376
                 FedX           LD10    87.19                  87.19                 3
                                LD11    347.93                 347.93                376

Now we will focus on one of the analyses that a system developer may be interested in, in the context of the continuous benchmarking process that we have referred to in this paper. That is, we are not analyzing the whole set of results obtained from the execution, but only a subset of it. Specifically, let us assume that we are interested in understanding the performance of the three evaluated systems under different data distributions in an ideal scenario, with no or negligible connection latency. Our hypothesis is that existing query engines are sensitive to the way data is distributed along different endpoints, even when the network is perfect. Therefore, these results may be useful to validate that hypothesis and to understand whether a set of federated datasets for which we have the corresponding RDF dumps should be better stored in a single endpoint or in different endpoints to offer answers more efficiently.

Based on the set of variables identified in our study, the following experimental setup is used:

Datasets and Query Benchmarks. We ran 36 queries against the FedBench dataset collections [8]: DBPedia, NY Times, Geonames, KEGG, ChEBI, Drugbank, Jamendo, LinkedMDB, and SW Dog Food. These queries include 25 FedBench queries and eleven complex queries⁵. The latter are added to cover some of the missing elements in the former group of queries. They are comprised of between 6 and 48 triple patterns, and can be decomposed into up to 8 sub-queries; and they cover different SPARQL operators. Virtuoso⁶ was used to implement endpoints, and the timeout was set to 240 secs. or 71,000 tuples. Experiments were executed on a Linux Mint machine with an Intel Pentium Core 2 Duo E7500 2.93GHz, 8GB RAM 1333MHz DDR3.

⁵ http://www.ldc.usb.ve/~mvidal/FedBench/queries/ComplexQueries

Network Latency. We configured a perfect network with no delays. The message size was 16KB.

Data Distribution. We considered two different distributions of the data: i) Complete: the FedBench collections were stored into a single graph and made accessible through one single SPARQL endpoint, and ii) Federated: the FedBench collections were stored in nine Virtuoso endpoints.

Therefore, we consider the queries in four groups and six configurations: Configuration 1: ANAPSID Complete Distribution, Configuration 2: ANAPSID Federated Distribution, Configuration 3: ARQ Complete Distribution, Configuration 4: ARQ Federated Distribution, Configuration 5: FedX Complete Distribution, Configuration 6: FedX Federated Distribution. In each configuration, the corresponding queries were ordered according to the total execution time consumed by the corresponding engines. For example, for ANAPSID in a Complete Distribution, i.e., Configuration 1, the Cross-Domain queries were ordered as follows: CD2, CD3, CD4, CD5, CD1, CD7, and CD6. Queries of each configuration were compared using Spearman's Rho correlation. A high positive correlation value between two configurations indicates that the corresponding engines had a similar behavior, i.e., the trends of execution time of the two engines are similar. Thus, when Configuration 1 is compared to itself, the Spearman's Rho correlation reaches the highest value (1.0). On the other hand, a negative value indicates an inverse correlation; for example, this happened with Complex Queries for ARQ in a Complete Distribution (Configuration 3) when compared to FedX Federated Distribution (Configuration 6); its value is -0.757. Finally, a value of 0.0 represents that there is no correlation between the two configurations, e.g., for Life Science queries Configuration 4 and Configuration 6.

Figure 1 illustrates the results of this specific study (again, the data used for this study is available through the DEFENDER portal). White circles represent the highest value of correlation; red ones correspond to inverse correlations, while blue ones indicate a positive correlation. The size of the circles is proportional to the value of the correlation. Given a group of queries, a low value of correlation of one engine in two different distributions suggests that the distribution affects the engine behavior, e.g., FedX and ARQ in Complex Queries with different data distributions have correlation values of 0.143 and 0.045, respectively. Furthermore, the number of small blue circles between configurations of different data distributions of the same engine indicates that this parameter affects the behavior of the studied engine. Because there are several of these points in the Complex Queries plot, we can conclude that these two parameters (query complexity and data distribution) allow uncovering engines' behavior that could not be observed before. This illustrates the need for the extensions proposed in this paper.

⁶ http://virtuoso.openlinksw.com/

[Figure 1 panels: (a) Cross Domain (CD); (b) Linked Data (LD); (c) Life Science (LS); (d) New Complex Queries (C)]

Fig. 1. Spearman's Rho correlation of queries in three FedBench sets of queries (a) Cross-Domain (CD), (b) Life Science (LS), (c) Linked Data (LD) and (d) New Complex Queries. Six configurations: (1) ANAPSID Complete Distribution; (2) ANAPSID Federated Distribution; (3) ARQ Complete Distribution; (4) ARQ Federated Distribution; (5) FedX Complete Distribution; (6) FedX Federated Distribution. White circles correspond to a correlation value of 1.0; blue circles indicate a positive correlation (Fig. 1(d) points (3,4) and (5,6), correlation values 0.045 and 0.143, respectively); red circles indicate a negative correlation (Fig. 1(d) points (2,6) and (6,3), correlation values -0.5 and -0.757, respectively). Circles' diameters indicate absolute correlation values.

5 Conclusion and Future Work

In this paper we have shown that there is a need to extend current federated SPARQL query testbeds with additional variables and configuration setups (e.g., data partitioning and distribution, network latency, and query complexity), so as to provide more accurate details of the behavior of existing engines, which can then be used to provide better comparisons and as input for improvement proposals. Taking those additional variables into account, we have extensively evaluated three of the existing engines (ANAPSID, ARQ and FedX), and have made those results available for public consumption in the DEFENDER portal, which we plan to keep up to date on a regular basis. We have also shown how the generated result dataset can be used to validate hypotheses about the systems' behavior.

Our future work will focus on continuing the evaluation of additional federated SPARQL query engines, and on the inclusion of additional parameters in the benchmark that may still be needed to provide more accurate and well-informed results.

Acknowledgements. This work has been funded by the project myBigData (TIN2010-17060), and DID-USB. We thank Maribel Acosta, Cosmin Basca, and Raúl García-Castro for fruitful discussions.

References

1. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011)
2. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the SPARQL 1.1 Federation Extension. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D., De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part II. LNCS, vol. 6644, pp. 1–15. Springer, Heidelberg (2011)
3. Lynden, S., Kojima, I., Matono, A., Tanimura, Y.: ADERIS: An Adaptive Query Processor for Joining Federated SPARQL Endpoints. In: Meersman, R., Dillon, T., Herrero, P., Kumar, A., Reichert, M., Qing, L., Ooi, B.-C., Damiani, E., Schmidt, D.C., White, J., Hauswirth, M., Hitzler, P., Mohania, M. (eds.) OTM 2011, Part II. LNCS, vol. 7045, pp. 808–817. Springer, Heidelberg (2011)
4. Montoya, G., Vidal, M.-E., Acosta, M.: DEFENDER: a DEcomposer for quEries against feDERations of endpoints. In: Extended Semantic Web Conference, ESWC Workshop and Demo 2012 (to appear)
5. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. TODS 34(3) (2009)
6. Prud'hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 federated query (November 2011)
7. Quilitz, B., Leser, U.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
8. Schmidt, M., Görlitz, O., Haase, P., Ladwig, G., Schwarte, A., Tran, T.: FedBench: A Benchmark Suite for Federated Semantic Data Query Processing. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 585–600. Springer, Heidelberg (2011)
9. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2bench: A SPARQL performance benchmark. In: ICDT, pp. 4–33 (2010)
10. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization Techniques for Federated Query Processing on Linked Data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)
