Alibaba Hologres: A Cloud-Native Service for Hybrid Serving/Analytical Processing

Xiaowei Jiang, Yuejun Hu, Yu Xiang, Guangran Jiang, Xiaojun Jin, Chen Xia, Weihua Jiang, Jun Yu, Haitao Wang, Yuan Jiang, Jihong Ma, Li Su, Kai Zeng
Alibaba Group
{xiaowei.jxw, yuejun.huyj, yu.xiangy, guangran.jianggr, xiaojun.jinxj, chen.xiac, guobei.jwh, bob.yj, haitao.w, yuan.jiang, jihong.ma, lisu.sl, zengkai.zk}@alibaba-inc.com
ABSTRACT

In existing big data stacks, the processes of analytical processing and knowledge serving are usually separated in different systems. In Alibaba, we observed a new trend where these two processes are fused: knowledge serving incurs generation of new data, and these data are fed into the process of analytical processing, which further fine-tunes the knowledge base used in the serving process. Splitting this fused processing paradigm into separate systems incurs overhead such as extra data duplication, discrepant application development and expensive system maintenance.

In this work, we propose Hologres, a cloud-native service for hybrid serving and analytical processing (HSAP). Hologres decouples the computation and storage layers, allowing flexible scaling in each layer. Tables are partitioned into self-managed shards. Each shard processes its read and write requests concurrently and independently of the others. Hologres leverages hybrid row/column storage to optimize operations such as point lookup, column scan and data ingestion used in HSAP. We propose execution context as a resource abstraction between system threads and user tasks. Execution contexts can be cooperatively scheduled with little context switching overhead. Queries are parallelized and mapped to execution contexts for concurrent execution. The scheduling framework enforces resource isolation among different queries and supports customizable scheduling policies. We conducted experiments comparing Hologres with existing systems specifically designed for analytical processing and serving workloads. The results show that Hologres consistently outperforms other systems in both system throughput and end-to-end query latency.
PVLDB Reference Format:
Xiaowei Jiang, Yuejun Hu, Yu Xiang, Guangran Jiang, Xiaojun Jin, Chen Xia, Weihua Jiang, Jun Yu, Haitao Wang, Yuan Jiang, Jihong Ma, Li Su, Kai Zeng. Alibaba Hologres: A Cloud-Native Service for Hybrid Serving/Analytical Processing. PVLDB, 13(12). DOI: https://doi.org/10.14778/3415478.3415550

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 13, No. 12. ISSN 2150-8097. DOI: https://doi.org/10.14778/3415478.3415550

1. INTRODUCTION

Modern business is pervasively driven by deriving business insights from huge amounts of data. From the experience of running Alibaba internal big data service stacks as well as public cloud offerings, we have observed new patterns in how modern business uses big data.
For instance, to support real-time learning and decision making, the big data stack behind modern e-commerce services usually aggregates real-time signals like purchase transactions and user click logs to continuously derive fresh product and user statistics. These statistics are heavily used in both online and offline manners, e.g.: (1) They are served immediately online as important features. Incoming user events are joined with these features to generate samples for real-time model training in search and recommendation systems. (2) They are also used by data scientists in complex interactive analysis to derive insights for model tuning and marketing operations. These usage patterns clearly demonstrate a host of new trends which the traditional concept of Online Analytical Processing (OLAP) can no longer accurately cover:

Fusion of Analytical Processing and Serving. Traditional OLAP systems usually play a rather static role in the whole business stack. They analyze large quantities of data and derive knowledge (e.g., precomputed views, learned models, etc.) offline, but hand over the derived knowledge to another system for serving online applications. Differently, modern business decision-making is a constantly-tuned online process. The derived knowledge is not only served but also participates in complex analytics. The need for analytical processing and serving on big data is fused together.
Fusion of Online and Offline Analysis. Modern business needs to quickly transform freshly obtained data into insights. Written data has to be available to read within seconds. A lengthy offline ETL process is no longer tolerable. Furthermore, among all the data collected, data synchronized from an OLTP system in the traditional way only accounts for a very small portion. Orders of magnitude more data comes from less transactional scenarios such as user click logs. The systems have to handle high-volume data ingestion with very low latency while processing queries.
Existing big data solutions usually host the hybrid serving and analytical processing workloads using a combination of different systems. For instance, the ingested data is pre-aggregated in real time using systems like Flink, populated into systems like Druid that handle multi-dimensional analytics, and served in systems like Cassandra. This inevitably causes excessive data duplication and complex data synchronization across systems, inhibits an application's ability to act on data immediately, and incurs non-trivial development and administrative overheads.

In this paper, we argue that hybrid serving/analytical processing (HSAP) should be unified and handled in a single system. In Alibaba, we built a cloud-native HSAP service called Hologres. As a new service paradigm, HSAP has challenges that are very different from existing big data stacks (see Section 2.2 for a detailed discussion): (1) The system needs to handle query workloads much higher than traditional OLAP systems. These workloads are hybrid, with very different latency and throughput trade-offs. (2) While handling high-concurrency query workloads, the system also needs to keep up with high-throughput data ingestion. The ingested data needs to be available to reads within seconds, in order to meet the stringent freshness requirements of serving and analysis jobs. (3) The mixed workloads are highly dynamic, usually subject to sudden bursts. The system has to be highly elastic and scalable, reacting to these bursts promptly.
In order to tackle these challenges, Hologres is built with a complete rethinking of the system design:

Storage Design. Hologres adopts an architecture that decouples storage from computation. Data is remotely persisted in cloud storage. Hologres manages tables in table groups, and partitions a table group into multiple shards. Each shard is self-contained, and manages reads and writes independently. Decoupled from the physical worker nodes, data shards can be flexibly migrated between workers. With the data shard as the basic data management unit in Hologres, processes such as failure recovery, load balancing and cluster scaling out can be implemented efficiently using shard migration. To support low-latency queries with high-throughput writes at the same time, shards are designed to be versioned. The critical paths of reads and writes on each table group shard are separated. Hologres uses a tablet structure to uniformly store tables. Tablets can be in row or columnar formats, and both are managed in an LSM-like way to maximize the write throughput and minimize the freshness delay of data ingestion.
Concurrent Query Execution. We build a service-oriented resource management and scheduling framework, named HOS. HOS uses execution contexts as the resource abstraction on top of system threads. Execution contexts are cooperatively scheduled with little context switching overhead. HOS parallelizes query execution by dividing queries into fine-grained work units and mapping work units to execution contexts. This architecture can fully exploit the potential of high hardware parallelism, allowing us to multiplex a huge number of queries concurrently. Execution contexts also facilitate the enforcement of resource isolation, such that low-latency serving workloads can coexist with analytical workloads in the same system without being stalled. HOS makes the system easily scalable according to the actual workload.
In retrospect, we make the following contributions:

1. We introduce a new paradigm of big data service for hybrid serving/analytical processing (HSAP), and identify the new challenges under this new paradigm.

2. We design and implement a cloud-native HSAP service called Hologres. Hologres has a novel storage design, and a highly efficient resource management and scheduling layer named HOS. These novel designs in combination help Hologres achieve real-time ingestion, low-latency serving and interactive analytical processing, and also support federated query execution with other systems such as PostgreSQL.

3. We have deployed Hologres in Alibaba's internal big data stack as well as in public cloud offerings, and conducted a thorough performance study under real-life workloads. Our results show that Hologres achieves superior performance even compared with specialized serving systems and OLAP engines.
The paper is organized as follows: the key design considerations and system overview of Hologres are presented in Section 2. In Section 3, we explain the data model and storage framework. Next, we introduce the scheduling mechanism and the details of query processing in Section 4. Experimental results are presented and discussed in Section 5. Lastly, we discuss the related research in Section 6 and conclude this work.
Figure 1: An example HSAP scenario: the big data stack behind a recommendation service

2. KEY DESIGN CONSIDERATIONS

Big data systems in modern enterprises are facing an increasing demand for hybrid serving and analytical processing. In this section, we use the recommendation service in Alibaba to demonstrate a typical HSAP scenario, and summarize the new challenges posed by HSAP to system design. Then we provide a system overview of how Hologres addresses these challenges.
2.1 HSAP in Action

Modern recommendation services put great emphasis on reflecting real-time user trends and providing personalized recommendations. In order to achieve these goals, the backend big data stack has evolved into a state of extreme complexity with diverse data processing patterns. Figure 1 presents an illustrative picture of the big data stack backing the recommendation service in Alibaba e-commerce platforms.
To capture personalized real-time behaviors, the recommendation service heavily relies on real-time features and continuously updated models. There are usually two types of real-time features:

1. The platform aggressively collects a large number of real-time events, including log events (e.g., page views, user clicks) as well as transactions (e.g., payments synced from the OLTP databases). As we observed from production, these events are of extremely high volume, and the majority of them are less transactional log data. These events are immediately ingested into the data stack (a) for future use, but more importantly they are joined with various dimension data on the fly to derive useful features, and these features are fed into the recommendation system in real time. This real-time join requires point lookups of dimension data with extremely low latency and high throughput, in order to keep up with the ingestion.

2. The platform also derives many features by aggregating the real-time events in sliding windows, along a variety of dimensions and time granularities, e.g., item clicks over recent minutes, page views over recent days, and turnover over longer windows. These aggregations are carried out in either batch or streaming fashion depending on the sliding window granularity, and are ingested into the data stack (b). These real-time data are also used to generate training data that continuously updates the recommendation models, through both online and offline training.
Despite its importance, the above process is only a small portion of the entire pipeline. There is a whole stack of monitoring, validation, analysis and refinement processes supporting a recommendation system. These include, but are not limited to, continuous dashboard queries on the collected events to monitor the key performance metrics and conduct A/B testing, and periodic batch queries to generate BI reports. Besides, data scientists constantly perform complex interactive analysis over the collected data to derive real-time insights for business decisions, and to do causal analysis and refinement of the models. For instance, during the Double-11 shopping festival, the incoming OLAP query requests can go up to hundreds of queries per second.

The above demonstrates a highly complex HSAP scenario, ranging from real-time ingestion (a) to bulk load (b), from serving workloads and continuous aggregation to interactive analysis, all the way to batch analysis. Without a unified system, this scenario has to be jointly served by multiple isolated systems, e.g., batch analysis by systems like Hive, serving workloads by systems like Cassandra, continuous aggregation by systems like Druid, and interactive analysis by systems like Impala or Greenplum.
2.2 Challenges of an HSAP Service

As a new big data service paradigm, an HSAP service poses challenges that were not as prominent just a few years ago.

High-Concurrency Hybrid Query Workload. HSAP systems usually face query concurrency that is unprecedented in traditional OLAP systems. In practice, compared to OLAP query workloads, the concurrency of serving query workloads is usually much higher. For instance, we have observed in real-life applications that serving queries can arrive at a rate (queries per second, QPS) that is five orders of magnitude higher than the QPS of OLAP queries. Furthermore, serving queries have a much more stringent latency requirement than OLAP queries. How to fulfill these different query SLOs while multiplexing the queries to fully utilize the computation resources is really challenging.

Existing OLAP systems generally use a process/thread-based concurrency model, i.e., they use a separate process or thread to handle a query, and rely on the operating system to schedule concurrent queries. The expensive context switching caused by this design puts a hard limit on the system concurrency, and thus is no longer suitable for HSAP systems. It also prevents the system from having enough scheduling control to meet different query SLOs.
High-Throughput Real-Time Data Ingestion. While handling high-concurrency query workloads, HSAP systems also need to handle high-throughput data ingestion. Among all the data ingested, data synchronized from an OLTP system in the traditional way only accounts for a very small portion, while the majority comes from various data sources such as real-time log data that do not have strong transaction semantics. The ingestion volume can be much higher than that observed in a hybrid transaction-analytical processing (HTAP) system. For instance, in the above scenario the ingestion rate goes up to tens of millions of tuples per second. What is more, different from traditional OLAP systems, HSAP systems require real-time data ingestion (written data has to be visible within subseconds) to guarantee the data freshness of analysis.

High Elasticity and Scalability. The ingestion and query workloads can undergo sudden bursts, and thus require the system to be elastic and scalable, and to react promptly. We have observed in real-world applications that the peak ingestion and query throughputs reach several times their averages. Also, the bursts in ingestion and query workloads do not necessarily coincide, which requires the system to scale the storage and computation independently.
2.3 Data Storage

In this subsection, we discuss the high-level design of data storage in Hologres.

Decoupling of Storage/Computation. Hologres takes a cloud-native design where the computation and storage layers are decoupled. All the data files and logs of Hologres are persisted by default in Pangu, a high-performance distributed file system in Alibaba Cloud. We also support open-source distributed file systems such as HDFS. With this design, both the computation and storage layers can be independently scaled out according to the workload and resource availability.

Tablet-based Data Layout. In Hologres, both tables and indexes are partitioned into fine-grained tablets. A write request is decomposed into many small tasks, each of which handles the updates to a single tablet. Tablets of correlated tables and indexes are further grouped into shards, to provide efficient consistency guarantees. To reduce contention, we use a latch-free design in which each tablet is managed by a single writer, but can have an arbitrary number of readers. We can configure a very high read parallelism for query workloads, which hides the latency incurred by reading from remote storage.

Separation of Reads/Writes. Hologres separates the read and write paths, to support both high-concurrency reads and high-throughput writes at the same time. The writer of a tablet uses an LSM-like approach to maintain the tablet image, where the records are properly versioned. Fresh writes are visible to reads with subsecond-level latency. Concurrent reads can request a specific version of the tablet image, and thus are not blocked by the writes.
2.4 Concurrent Query Execution

In this subsection, we discuss the high-level design of the scheduling mechanism used by Hologres.

Execution Context. Hologres builds a scheduling framework, referred to as HOS, which provides a user-space thread called execution context to abstract the system thread. Execution contexts are super lightweight and can be created and destroyed with negligible cost. HOS cooperatively schedules execution contexts on top of the system thread pools with little context switching overhead. An execution context provides an asynchronous task interface. HOS divides users' write and read queries into fine-grained work units, and maps the work units onto execution contexts for scheduling. This design also enables Hologres to promptly react to sudden workload bursts. The system can be elastically scaled up and down at runtime.

Customizable Scheduling Policy. HOS decouples the scheduling policy from the execution-context-based scheduling mechanism. HOS groups execution contexts from different queries into scheduling groups, each with its own resource share. HOS is in charge of monitoring the consumed share of each scheduling group, and enforcing resource isolation and fairness between scheduling groups.
2.5 System Overview

Figure 2 presents the system overview of Hologres. The front-end nodes (FEs) receive queries submitted from clients and return the query results. For each query, the query optimizer in the FE node generates a query plan, which is parallelized into a DAG of fragment instances. The coordinator dispatches the fragment instances of a query plan to the worker nodes, each of which maps the fragment instances into work units (Section 4.1). A worker node is a unit of physical resources, i.e., CPU cores and memory. Each worker node can hold the memory tables of multiple table group shards (Section 3.2) of a database. In a worker node, work units are executed as execution contexts in the EC pool (Section 4.2). The HOS scheduler schedules the EC pool on top of the system threads (Section 4.3), following the pre-configured scheduling policy (Section 4.5).

Figure 2: Architecture of Hologres

The resource manager manages the distribution of table group shards among worker nodes: resources in a worker node are logically split into slots, each of which can only be assigned to one table group shard. The resource manager is also responsible for the addition/removal of worker nodes in a Hologres cluster. Worker nodes periodically send heartbeats to the resource manager. Upon a worker node failure or a workload burst in the cluster, the resource manager dynamically adds new worker nodes into the cluster.

The storage manager maintains a directory of table group shards (see Section 3.2), and their metadata such as the physical locations and key ranges. Each coordinator caches a local copy of this metadata to facilitate the dispatching of query requests.

Hologres allows the execution of a single query to span Hologres and other query engines (Section 4.2.3). For instance, when fragment instances need to access data not stored in Hologres, the coordinator distributes them to the systems storing the required data. We designed and implemented a set of unified APIs for query processing, such that work units executed in Hologres can communicate with other execution engines such as PostgreSQL. Non-Hologres execution engines have their own query processing and scheduling mechanisms, independent of Hologres.
3. STORAGE

Hologres supports a hybrid row-column storage layout tailored for HSAP scenarios. The row storage is optimized for low-latency point lookups, and the column storage is designed to perform high-throughput column scans. In this section, we present the detailed design of the hybrid storage in Hologres. We start by introducing the data model and defining some preliminary concepts. Next, we introduce the internal structure of table group shards, and explain in detail how writes and reads are performed. Lastly, we present the layouts of the row and column storage, followed by a brief introduction to the caching mechanism in Hologres.
1DataModelInHologres,eachtablehasauser-speciedclusteringkey(emptyifnotspecied),andauniquerowlocator.
Iftheclusteringkeyisunique,itisdirectlyusedastherowlocator;otherwise,auniqui-erisappendedtotheclusteringkeytomakearowlocator,i.
e.
,clusteringkey,uniquifier.
Allthetablesofadatabasearegroupedintotablegroups.
Atablegroupisshardedintoanumberoftablegroupshards(TGSs),whereeachTGScontainsforeachtableapartitionofthebasedataandapartitionofalltherelatedindexes.
Wetreatthebase-datapartitionaswellasanindexpartitionuniformlyasatablet.
Tabletshavetwostorageformats:rowtabletandcolumntablet,optimizedforpointlookupandsequentialscanrespectively.
ebasedataandindexescanbestoredinarowtablet,acolumntablet,orboth.
Atabletisrequiredtohaveauniquekey.
erefore,thekeyofabase-datatabletistherowlocator.
Whereasfortabletsofsecondaryindexes,iftheindexisunique,theindexedcolumnsareusedasthekeyofthetablet;otherwise,thekeyisdenedbyaddingtherowlocatortotheindexedcolumns.
Forinstance,consideraTGSwithasingletableandtwosecondaryindexes—auniquesecondaryindex(k→v)andanon-uniquesecondaryindex(k→v)—andthebasedataisstoredinbothrowandcolumntablets.
Asexplainedabove,thekeyofthebase-data(rowandcolumn)tabletsarerowlocator,thekeyoftheunique-indextabletiskandthekeyofthenon-unique-indextabletisk,rowlocator.
Weobservedthatmajoritiesofwritesinadatabaseaccessafewclosely-relatedtables,alsowritestoasingletableupdatethebasedataandrelatedindexessimultaneously.
Bygroupingtablesintotablegroups,wecantreatrelatedwritestodierenttabletsinaTGSasanatomicwriteoperation,andonlypersistonelogentryinthelesystem.
ismechanismhelpsimprovethewriteeciencybyreducingthenumberoflogushes.
Besides,groupingtableswhicharefrequentlyjoinedhelpseliminateunnecessarydatashuing.
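The key-construction rules above can be summarized compactly in code. The following is a minimal sketch under the stated rules; the struct and function names are illustrative stand-ins, not Hologres' internal API.

```cpp
// Sketch of row-locator construction: a unique clustering key is the locator;
// otherwise a uniquifier is appended, i.e., <clustering key, uniquifier>.
#include <cstdint>
#include <iostream>
#include <string>

struct RowLocator {
    std::string clustering_key;   // user-specified clustering key (may be empty)
    uint64_t uniquifier = 0;      // only meaningful when has_uniquifier is true
    bool has_uniquifier = false;
};

RowLocator MakeRowLocator(const std::string& clustering_key,
                          bool clustering_key_is_unique,
                          uint64_t next_uniquifier) {
    RowLocator loc;
    loc.clustering_key = clustering_key;
    if (!clustering_key_is_unique) {
        loc.uniquifier = next_uniquifier;   // append a uniquifier to keep the locator unique
        loc.has_uniquifier = true;
    }
    return loc;
}

int main() {
    RowLocator a = MakeRowLocator("user_42", /*clustering_key_is_unique=*/true, 0);
    RowLocator b = MakeRowLocator("2021-04-06", /*clustering_key_is_unique=*/false, 7);
    std::cout << a.clustering_key << " | " << b.clustering_key << ":" << b.uniquifier << "\n";
}
```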
3.2 Table Group Shard

The TGS is the basic unit of data management in Hologres. A TGS mainly comprises a WAL manager and multiple tablets belonging to the table shards in this TGS, as illustrated in Figure 3.

Tablets are uniformly managed as an LSM tree: each tablet consists of a memory table in the memory of the worker node, and a set of immutable shard files persisted in the distributed file system. The memory table is periodically flushed as a shard file. The shard files are organized into multiple levels, Level 0, Level 1, ..., Level N. In Level 0, each shard file corresponds to a flushed memory table. Starting from Level 1, all the records in a level are sorted and partitioned into different shard files by the key, and thus the key ranges of different shard files at the same level are non-overlapping. Level i+1 can hold K times more shard files than Level i, and each shard file has a maximum size M. More details of the row and column tablets are given in Sections 3.3 and 3.4, respectively.

A tablet also maintains a metadata file storing the status of its shard files. The metadata file is maintained following an approach similar to RocksDB, and is persisted in the file system.

As records are versioned, reads and writes in TGSs are completely decoupled. On top of that, we take a lock-free approach by only allowing a single writer for the WAL, but any number of concurrent readers, on a TGS. As HSAP scenarios have a weaker consistency requirement than HTAP, Hologres chooses to only support atomic writes and read-your-writes reads, to achieve high throughput and low latency for both reads and writes. Next, we explain in detail how reads and writes are performed.
3.2.1 Writes in TGSs

Hologres supports two types of writes: single-shard write and distributed batch write. Both types of writes are atomic, i.e., writes either commit or roll back. A single-shard write updates one TGS at a time, and can be performed at an extremely high rate. On the other hand, a distributed batch write is used to dump a large amount of data into multiple TGSs as a single transaction, and is usually performed with a much lower frequency.

Figure 3: Internals of a TGS

Single-shard Write. As illustrated in Figure 3, on receiving a single-shard ingestion, the WAL manager (1) assigns the write request an LSN, which consists of a timestamp and an increasing sequence number, and (2) creates a new log entry and persists it in the file system. The log entry contains the necessary information to replay the logged write. The write is committed after its log entry is completely persisted. After that, (3) the operations in the write request are applied in the memory tables of the corresponding tablets and made visible to new read requests. It is worth noting that updates on different tablets can be parallelized (see Section 4.2). Once a memory table is full, (4) it is flushed as a shard file in the file system and a new one is initialized. Lastly, (5) shard files are asynchronously compacted in the background. At the end of a compaction or memory table flush, the metadata file of the tablet is updated accordingly.
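The steps of the single-shard write path can be condensed into a small sketch. The types and helpers below (LogEntry, the in-memory WAL vector, FlushMemTable) are hypothetical stand-ins used only to illustrate the flow of steps (1)-(4); background compaction (5) is omitted.

```cpp
// Condensed sketch of a single-shard write: assign an LSN, persist the WAL
// entry, apply to the memory table, and flush when the memory table is full.
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct LSN { uint64_t timestamp; uint64_t seq; };
struct LogEntry { LSN lsn; std::vector<std::pair<std::string, std::string>> updates; };

class TableGroupShard {
public:
    void SingleShardWrite(const std::vector<std::pair<std::string, std::string>>& updates,
                          uint64_t now) {
        LogEntry entry{{now, ++seq_}, updates};
        wal_.push_back(entry);                      // (1)+(2) assign LSN, persist log entry (simplified)
        for (const auto& kv : updates) {            // (3) apply to the memory table
            mem_table_[kv.first] = kv.second;
        }
        if (mem_table_.size() >= kMemTableLimit) {  // (4) flush a full memory table as a Level-0 file
            FlushMemTable();
        }
        // (5) asynchronous background compaction is not modeled here.
    }

private:
    void FlushMemTable() { flushed_.push_back(mem_table_); mem_table_.clear(); }

    static constexpr size_t kMemTableLimit = 4;
    uint64_t seq_ = 0;
    std::vector<LogEntry> wal_;                                 // stand-in for the persisted WAL
    std::map<std::string, std::string> mem_table_;              // current memory table
    std::vector<std::map<std::string, std::string>> flushed_;   // stand-in for shard files
};

int main() {
    TableGroupShard tgs;
    tgs.SingleShardWrite({{"item:1", "clicked"}, {"item:2", "paid"}}, /*now=*/1);
}
```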
Distributed Batch Write. We adopt a two-phase commit mechanism to guarantee write atomicity for distributed batch writes. The FE node which receives the batch write request locks all the accessed tablets in the involved TGSs. Then each TGS (1) assigns an LSN for this batch write, (2) flushes the memory tables of the involved tablets, and (3) loads the data as in the process of single-shard ingestion and flushes it as shard files. Note that step (3) can be further optimized by building multiple memory tables and flushing them into the file system in parallel. Once finished, each TGS votes to the FE node. When the FE node has collected the votes from all participating TGSs, it acknowledges to them the final commit or abort decision. On receiving the commit decision, each TGS persists a log entry indicating that this batch write is committed; otherwise, all the files newly generated during this batch write are removed. When the two-phase commit is done, the locks on the involved tablets are released.
3.2.2 Reads in TGSs

Hologres supports multi-version reads in both row and column tablets. The consistency level of read requests is read-your-writes, i.e., a client always sees the latest write committed by itself. Each read request contains a read timestamp, which is used to construct an LSN_read. This LSN_read is used to filter out records invisible to this read, i.e., records whose LSNs are larger than LSN_read. To facilitate multi-version reads, a TGS maintains for each table an LSN_ref, which stores the LSN of the oldest version maintained for tablets in this table. LSN_ref is periodically updated according to a user-specified retention period. During memory table flush and file compaction, for a given key: (1) records whose LSNs are equal to or smaller than LSN_ref are merged; (2) records whose LSNs are larger than LSN_ref are kept intact.
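To make the retention rule concrete, the following sketch applies it to the version list of one key during a flush or compaction. It is illustrative only, and assumes one interpretation of "merged": only the newest version at or below LSN_ref survives (and disappears if that version is a delete), while newer versions are kept intact.

```cpp
// Sketch of the LSN_ref retention rule for the versions of a single key.
#include <cstdint>
#include <string>
#include <vector>

struct VersionedRecord { uint64_t lsn; std::string value; bool deleted; };

// `versions` holds all records of ONE key, sorted by ascending LSN.
std::vector<VersionedRecord> CompactKey(const std::vector<VersionedRecord>& versions,
                                        uint64_t lsn_ref) {
    std::vector<VersionedRecord> result;
    VersionedRecord merged{0, "", true};
    bool has_merged = false;
    for (const auto& rec : versions) {
        if (rec.lsn <= lsn_ref) { merged = rec; has_merged = true; }  // collapse old versions
        else result.push_back(rec);                                   // keep versions newer than LSN_ref
    }
    if (has_merged && !merged.deleted) {
        result.insert(result.begin(), merged);  // at most one merged survivor per key
    }
    return result;
}

int main() {
    auto out = CompactKey({{5, "a", false}, {9, "b", false}, {12, "c", false}}, /*lsn_ref=*/10);
    // Keeps {9,"b"} (merged survivor at LSN_ref) and {12,"c"} (newer than LSN_ref).
    return out.size() == 2 ? 0 : 1;
}
```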
3.2.3 Distributed TGS Management

In our current implementation, the writer and all the readers of a TGS are co-located in the same worker node to share the memory tables of this TGS. If a worker node is undergoing workload bursts, Hologres supports migrating some TGSs off the overloaded worker node (see Section 4.4).

We are working on a solution that maintains, for a TGS, read-only replicas remote from the corresponding writer, to further balance concurrent reads. We plan to support two types of read-only replicas: (1) a fully-synced replica maintains an up-to-date copy of both the memory table and the metadata file of the TGS, and can serve all read requests; (2) a partially-synced replica only maintains an up-to-date copy of the metadata file, and can only serve reads over the data already flushed into the file system. Reads to a TGS can be dispatched to different replicas according to their read versions. Note that neither type of read-only replica needs to replicate the shard files, which are loaded from the distributed file system on request.

If a TGS fails, the storage manager requests an available slot from the resource manager, and at the same time broadcasts a TGS-fail message to all the coordinators. When recovering a TGS, we replay the WAL logs from the latest flushed LSN to rebuild its memory tables. The recovery is done once all the memory tables are completely rebuilt. After that, the storage manager is acknowledged and then broadcasts a TGS-recovery message containing the new location to all the coordinators. The coordinators temporarily hold requests to the failed TGS until it is recovered.
3.3 Row Tablet

Row tablets are optimized to support efficient point lookups for given keys. Figure 4(a) illustrates the structure of a row tablet: we maintain the memory table as a Masstree, within which we sort the records by their keys. Differently, the shard files have a block-wise structure. A shard file consists of two types of blocks: data blocks and index blocks. Records in a shard file are sorted by the key. Consecutive records are grouped into a data block. To help look up records by key, we further keep track of the starting key of each data block and its offset in the shard file as a pair <key, block offset> in the index block. To support multi-versioned data, the value stored in a row tablet is extended as <value cols, del bit, LSN>: (1) the value cols are the non-key column values; (2) the del bit indicates whether this is a delete record; (3) the LSN is the corresponding write LSN. Given a key, both the memory table and the shard files can contain multiple records with different LSNs.

Reads in Row Tablets. Every read in a row tablet consists of a key and an LSN_read. The result is obtained by searching the memory table and the shard files of the tablet in parallel. Only the shard files whose key ranges overlap with the given key are searched. During the search, a record is marked as a candidate if it contains the given key and has an LSN equal to or smaller than LSN_read. The candidate records are merged in the order of their LSNs into the result record. If the del bit in the result record is set, or no candidate record is found, no record exists for the given key at the version of LSN_read. Otherwise, the result record is returned.
Writes in Row Tablets. In row tablets, an insert or update consists of the key, the column values and an LSN_write. A delete contains a key, a special deletion mark and an LSN_write. Each write is transformed into a key-value pair of the row tablet. For insert and update, the del bit is set to 0. For delete, the column fields are empty and the del bit is set to 1. The key-value pairs are first appended into the memory table. Once the memory table is full, it is flushed into the file system as a shard file in Level 0. This can further trigger a cascading compaction from Level i to Level i+1 if Level i is full.
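The point-lookup procedure above can be illustrated with a short sketch: collect candidates whose LSN is at or below LSN_read from the memory table and the overlapping shard files, take the newest candidate, and honor the delete bit. The data structures are simplified stand-ins; in the real system the searches run in parallel.

```cpp
// Sketch of a point lookup in a row tablet at version LSN_read.
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct RowValue { std::string value_cols; bool del_bit; uint64_t lsn; };
using SortedRun = std::map<std::string, std::vector<RowValue>>;  // key -> versions

std::optional<std::string> PointLookup(const std::string& key, uint64_t lsn_read,
                                       const SortedRun& mem_table,
                                       const std::vector<SortedRun>& shard_files) {
    const RowValue* best = nullptr;
    auto scan = [&](const SortedRun& run) {
        auto it = run.find(key);
        if (it == run.end()) return;
        for (const auto& v : it->second) {
            // candidate: matches the key and is visible at LSN_read; keep the newest
            if (v.lsn <= lsn_read && (best == nullptr || v.lsn > best->lsn)) best = &v;
        }
    };
    scan(mem_table);
    for (const auto& f : shard_files) scan(f);
    if (best == nullptr || best->del_bit) return std::nullopt;  // not found, or deleted at this version
    return best->value_cols;
}

int main() {
    SortedRun mem;
    mem["k1"].push_back({"v2", false, 20});
    std::vector<SortedRun> files(1);
    files[0]["k1"].push_back({"v1", false, 10});
    auto r = PointLookup("k1", 15, mem, files);   // LSN 20 is invisible at LSN_read = 15
    return (r && *r == "v1") ? 0 : 1;
}
```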
3.4 Column Tablet

Figure 4: (a) The structure of a row tablet, and (b) the structure of a column tablet

Column tablets are designed to facilitate column scans. As depicted in Figure 4(b), different from row tablets, a column tablet consists of two components: a column LSM tree and a delete map. The value stored in a column LSM tree is extended in the format <value cols, LSN>, where value cols are the non-key columns and LSN is the corresponding write LSN. In a column LSM tree, the memory table stores the records in the Apache Arrow format. Records are continuously appended into the memory table in their arrival order. In a shard file, records are sorted by the key and logically split into row groups. Each column in a row group is stored as a separate data block. Data blocks of the same column are stored contiguously in the shard file to facilitate sequential scans. We maintain the metadata for each column and for the entire shard file in the meta block, to speed up large-scale data retrieval. The meta block stores: (1) for each column, the offsets of its data blocks, the value ranges of each data block and the encoding scheme, and (2) for the shard file, the compression scheme, the total row count, the LSN and the key range. To quickly locate the row for a given key, we store the sorted first keys of the row groups in the index block.

The delete map is a row tablet, where the key is the ID of a shard file (with the memory table treated as a special shard file) in the column LSM tree, and the value is a bitmap indicating which records are newly deleted at the corresponding LSN in that shard file. With the help of the delete map, column tablets can massively parallelize sequential scans as explained below.
Reads in Column Tablets. A read operation on a column tablet comprises the target columns and an LSN_read. The read results are obtained by scanning the memory table and all the shard files. Before scanning a shard file, we compare its LSN range with LSN_read: (1) if its minimum LSN is larger than LSN_read, this file is skipped; (2) if its maximum LSN is equal to or smaller than LSN_read, the entire shard file is visible in the read version; (3) otherwise, only a subset of records in this file is visible in the read version. In the third case, we scan the LSN column of this file and generate an LSN bitmap indicating which rows are visible in the read version. To filter out the deleted rows in a shard file, we perform a read in the delete map (as explained in Section 3.3) with the ID of the shard file as the key at version LSN_read, where the merge operation unions all the candidate bitmaps. The obtained bitmap is intersected with the LSN bitmap, and joined with the target data blocks to filter out the deleted and invisible rows at the read version. Note that, different from row tablets, in a column tablet each shard file can be read independently without consolidating with shard files in other levels, as the delete map can efficiently tell all the rows deleted up to LSN_read in a shard file.

Writes in Column Tablets. In column tablets, an insert operation consists of a key, a set of column values and an LSN_write. A delete operation specifies the key of the row to be deleted, with which we can quickly find the ID of the file containing this row and its row number in that file. We then perform an insert at version LSN_write in the delete map, where the key is the file ID and the value is the row number of the deleted row. The update operation is implemented as a delete followed by an insert. Insertions to the column LSM tree and the delete map can trigger memory table flushes and shard file compactions.
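The per-shard-file visibility test used by a column-tablet scan can be sketched as follows. The types are illustrative; the delete bitmap is assumed to be the result of reading the delete map with this file's ID at version LSN_read.

```cpp
// Sketch of column-tablet scan visibility: skip files that are too new, build a
// per-row LSN check for partially visible files, and mask out deleted rows.
#include <cstdint>
#include <vector>

struct ColumnShardFile {
    uint64_t min_lsn, max_lsn;
    std::vector<uint64_t> row_lsns;   // the per-row LSN column
};

std::vector<bool> VisibleRows(const ColumnShardFile& file, uint64_t lsn_read,
                              const std::vector<bool>& deleted_up_to_lsn_read) {
    std::vector<bool> visible(file.row_lsns.size(), false);
    if (file.min_lsn > lsn_read) return visible;              // case (1): skip the whole file
    for (size_t i = 0; i < file.row_lsns.size(); ++i) {
        bool in_version = (file.max_lsn <= lsn_read)           // case (2): whole file visible
                          || (file.row_lsns[i] <= lsn_read);   // case (3): per-row LSN bitmap
        visible[i] = in_version && !deleted_up_to_lsn_read[i]; // intersect with the delete bitmap
    }
    return visible;
}

int main() {
    ColumnShardFile f{5, 30, {5, 12, 30}};
    std::vector<bool> deleted = {false, true, false};
    auto v = VisibleRows(f, /*lsn_read=*/20, deleted);  // row 0 visible, row 1 deleted, row 2 too new
    return (v[0] && !v[1] && !v[2]) ? 0 : 1;
}
```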
3.5 Hierarchical Cache

Hologres adopts a hierarchical caching mechanism to reduce both I/O and computation costs. There are in total three layers of caches: the local disk cache, the block cache and the row cache. Every tablet corresponds to a set of shard files stored in the distributed file system. The local disk cache caches shard files on local disks (SSDs) to reduce the frequency of expensive I/O operations in the file system. On top of the SSD cache, an in-memory block cache stores the blocks recently read from the shard files. As the serving and analytic workloads have very different data access patterns, we physically isolate the block caches of row and column tablets. On top of the block cache, we further maintain an in-memory row cache that stores the merged results of recent point lookups in row tablets.
4. QUERY PROCESSING & SCHEDULING

In this section, we present the parallel query execution paradigm of Hologres and the HOS scheduling framework.

4.1 Highly Parallel Query Execution

Figure 5 illustrates the query-processing workflow in Hologres. On receiving a query, the query optimizer in the FE node generates a query plan represented as a DAG, and divides the DAG at shuffle boundaries into fragments. There are three types of fragments: read, write and query fragments. A read/write fragment contains a read/write operator accessing a table, whereas a query fragment only contains non-read/write operators. Each fragment is then parallelized into multiple fragment instances in a data-parallel way, e.g., each read/write fragment instance processes one TGS.

The FE node forwards the query plan to a coordinator. The coordinator then dispatches the fragment instances to worker nodes. Read/write fragment instances are always dispatched to the worker nodes hosting the accessed TGSs. Query fragment instances can be executed on any worker node, and are dispatched taking into account the existing workloads of worker nodes to achieve load balancing. The locality and workload information are synced with the storage manager and resource manager, respectively.

In a worker node, fragment instances are mapped into work units (WUs), which are the basic units of query execution in Hologres. A WU can dynamically spawn WUs at runtime.
Figure 5: Workflow of Query Parallelization

The mapping is described as follows:

- A read fragment instance is initially mapped to a read-sync WU, which fetches the current version of the tablet from the metadata file, including a read-only snapshot of the memory table and a list of shard files. Next, the read-sync WU spawns multiple read-apply WUs to read the memory table and shard files in parallel, as well as to execute downstream operators on the read data. This mechanism exploits high intra-operator parallelism to make better use of the network and I/O bandwidth.
- A write fragment instance maps all non-write operators into a query WU, followed by a write-sync WU persisting the log entry in the WAL for the written data. The write-sync WU then spawns multiple write-apply WUs, each updating one tablet in parallel.
- A query fragment instance is mapped to a query WU.
4.2 Execution Context

As an HSAP service, Hologres is designed to execute multiple queries submitted by different users concurrently. The overhead of context switching among WUs of concurrent queries could become a bottleneck for concurrency. To solve this problem, Hologres proposes a user-space thread, named execution context (EC), as the resource abstraction for WUs. Different from threads, which are preemptively scheduled, ECs are cooperatively scheduled without using any system call or synchronization primitive. Thus the cost of switching between ECs is almost negligible. HOS uses the EC as the basic scheduling unit. Computation resources are allocated at the granularity of ECs, and each EC further schedules its internal tasks. An EC is executed on the thread which it is assigned to.
4.2.1 EC Pools

In a worker node, we group ECs into different pools to allow isolation and prioritization. EC pools are categorized into three types: data-bound EC pools, query EC pools and background EC pools.

A data-bound EC pool has two types of ECs: the WAL EC and tablet ECs. Within a TGS, there is one WAL EC and multiple tablet ECs, one for each tablet. The WAL EC executes the write-sync WUs, while a tablet EC executes the write-apply WUs and read-sync WUs on the corresponding tablet. The WAL/tablet ECs process WUs in a single-threaded way, which eliminates the need for synchronization between concurrent WUs. In a query EC pool, each query WU or read-apply WU is mapped to a query EC.

In a background EC pool, ECs are used to offload expensive work from data-bound ECs and improve the write throughput. This includes memory table flushes, shard file compactions, etc. With this design, the data-bound ECs are reserved mainly for operations on the WAL and writes to memory tables, and thus the system can achieve a very high write throughput without the overhead of locking. To limit the resource consumption of background ECs, we physically isolate background ECs from the data-bound and query ECs in different thread pools, and execute the background ECs in a thread pool with lower priority.
4.2.2 Internals of Execution Context

Next, we introduce the internal structure of an EC.

Task Queue. There are two task queues in an EC: (1) a lock-free internal queue which stores tasks submitted by the EC itself, and (2) a thread-safe submit queue which stores tasks submitted by other ECs. Once the EC is scheduled, tasks in the submit queue are relocated to the internal queue to facilitate lock-free scheduling. Tasks in the internal queue are scheduled in FIFO order.

State. During its lifetime, an EC switches between three states: runnable, blocking and suspended. Being suspended means the EC cannot be scheduled, as its task queues are empty. Submitting a task to an EC switches its state to runnable, which indicates the EC can be scheduled. If all the tasks in an EC are blocked, e.g., by an I/O stall, the EC switches out and its state is set to blocking. On receiving a new task, or when a blocked task returns, a blocking EC becomes runnable again. ECs can be externally cancelled or joined. Cancelling an EC fails its incomplete tasks and suspends it. After an EC is joined, it cannot receive new tasks and suspends itself after its current tasks are completed. ECs are cooperatively scheduled on top of the system thread pools, and thus the overhead of context switching is almost negligible.
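The two-queue structure and the cooperative hand-off between them can be sketched as below. This is an illustration of the concept only, not the HOS implementation; state transitions are simplified and a single scheduler thread per EC is assumed.

```cpp
// Sketch of an execution context: a thread-safe submit queue for tasks posted
// by other ECs, an internal queue drained FIFO by the owning scheduler thread,
// and a simple runnable/suspended state.
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>

class ExecutionContext {
public:
    enum class State { Runnable, Suspended };

    // Called from other ECs/threads; only this path needs synchronization.
    void Submit(std::function<void()> task) {
        std::lock_guard<std::mutex> g(submit_mu_);
        submit_queue_.push_back(std::move(task));
        state_ = State::Runnable;
    }

    // Cooperatively invoked by the scheduler thread the EC is assigned to.
    void RunOnce() {
        {   // Relocate submitted tasks so the hot loop below stays lock-free.
            std::lock_guard<std::mutex> g(submit_mu_);
            for (auto& t : submit_queue_) internal_queue_.push_back(std::move(t));
            submit_queue_.clear();
        }
        while (!internal_queue_.empty()) {           // FIFO order
            auto task = std::move(internal_queue_.front());
            internal_queue_.pop_front();
            task();
        }
        state_ = State::Suspended;                   // empty queues: cannot be scheduled
    }

    State state() const { return state_; }

private:
    std::mutex submit_mu_;
    std::deque<std::function<void()>> submit_queue_;
    std::deque<std::function<void()>> internal_queue_;  // touched only by the owner thread
    State state_ = State::Suspended;
};

int main() {
    ExecutionContext ec;
    ec.Submit([] { std::cout << "work unit 1\n"; });
    ec.Submit([] { std::cout << "work unit 2\n"; });
    ec.RunOnce();   // runs both tasks, then the EC suspends itself
}
```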
4.2.3 Federated Query Execution

Hologres supports federated query execution to interact with the rich services available from the open-source world (e.g., Hive and HBase). We allow a single query to span Hologres and other query systems which are physically isolated in different processes. During query compilation, operators to be executed in different systems are compiled as separate fragments, which are then dispatched to their destination systems by the coordinators in Hologres. Other systems interacting with Hologres are abstracted as special stub WUs, each of which is mapped to an EC uniformly managed in Hologres. A stub WU handles pull requests submitted by WUs in Hologres. Besides functionality considerations such as accessing data in other systems, this abstraction also serves as an isolation sandbox for system security reasons. For instance, users can submit queries with possibly insecure user-defined functions. Hologres disseminates the execution of these functions to PostgreSQL processes, which execute them in a context physically isolated from other users in Hologres.
3SchedulingMechanismInthissubsection,weintroducedetailsabouthowWUsofaqueryarescheduledtoproducethequeryoutputs.
AsynchronousPull-basedQueryExecution.
Queriesareexecutedasynchronouslyfollowingapull-basedparadigminHologres.
Inaqueryplan,theleaffragmentsconsumeexternalinputs,i.
e.
,shardles,andthesinkfragmentproducesqueryoutputs.
epull-basedqueryexecutionstartsfromthecoordinator,whichsendspullre-queststotheWUsofthesinkfragments.
Whenprocessingapullre-quest,thereceiverWUfurthersendspullrequeststoitsdependentWUs.
OncetheWUofareadoperator,i.
e.
,columnscan,receivesapullrequest,itreadsabatchofdatafromthecorrespondingshardleandreturnstheresultsintheformatofrecordbatch,EOS,whererecordbatchisabatchoftheresultrecordsandEOSisaboolindicatingiftheproducerWUhascompleteditswork.
On3278receivingresultsforthepreviouspullrequest,thecoordinatorde-terminesifthequeryhascompletedbycheckingthereturnedEOS.
Ifthequeryhasnotcompleted,itsendsoutanotherroundofpullrequests.
AWUdependingonmultipleupstreamWUsneedstopullfrommultipleinputsconcurrentlytoimprovetheparallelismofqueryexecutionandtheutilizationofcomputation/networkre-source.
Hologressupportsconcurrentpullsbysendingmultipleasynchronouspullrequests.
isapproachismorenaturalande-cientcomparedwithtraditionalconcurrencymodelwhichrequiresmultiplethreadstocooperate.
Intra-workerpullrequestisimplementedasafunctioncall,whichinsertsapulltaskintothetaskqueueofEChostingthereceiverWU.
Aninter-workerpullrequestisencapsulatedasanRPCcallbetweenthesourceanddestinationworkernodes.
AnRPCcallcontainsIDofthereceiverWU,accordingtowhichthedestinationworkernodeinsertsapulltaskintothetaskqueueofthecorrespondingEC.
Backpressure.
Basedontheaboveparadigm,weimplementedapull-basedbackpressuremechanismtopreventaWUfrombeingoverwhelmedbyreceivingtoomanypullrequests.
Firstofall,weconstrainthenumberofconcurrentpullrequeststhataWUcanissueatatime.
Secondly,inaWUwhichproducesoutputsformul-tipledownstreamWUs,processingapullrequestmayresultsintheproductionofnewoutputsformultipledownstreamWUs.
eseoutputsarebueredwaitingforthepullrequestsfromthecorre-spondingWUs.
TopreventtheoutputbuerinaWUgrowingtoofast,thedownstreamWUthatpullsmorefrequentlythanotherswilltemporarilyslowdownsendingnewpullrequeststothisWU.
Prefetch.
HOSsupportsprefetchingresultsforfuturepullrequeststoreducethequerylatency.
Insuchcases,asetofprefetchtasksareenqueued.
eresultsofprefetchtasksarequeuedinaprefetchbuer.
Whenprocessingapullrequest,resultsintheprefetchbuercanbeimmediatelyreturnedandanewprefetchtaskiscreated.
4.
4LoadBalancingeloadbalancingmechanisminHologresareoftwofolds:()migratingTGSsacrossworkernodes,and()redistributingECsamongintra-workerthreads.
MigrationofTGSs:Inourcurrentimplementation,read/writefrag-mentinstancesarealwaysdispatchedtotheworkernodeshostingtheTGS.
IfoneTGSbecomesahotspot,oraworkernodeisover-loaded,HologressupportsmigratingsomeTGSsfromtheover-loadedworkernodestootherswithmoreavailableresources.
TomigrateaTGS,wemarktheTGSasfailedinthestoragemanager,andthenrecoveritinanewworkernodefollowingthestandardTGSrecoveryprocedure(seeSection.
.
).
AsdiscussedinSec-tion.
.
,weareimplementingread-onlyreplicasforTGSs,whichenablesbalancingthereadfragmentinstancestoaTGS'sread-onlyreplicaslocatedinmultipleworkernodes.
RedistributionofECs:Inaworkernode,HOSredistributesECsamongthreadswithineachECpooltobalancetheworkload.
HOSperformsthreetypesofredistribution:()anewlycreatedECisal-waysassignedtothethreadwithminimumnumberofECsinthethreadpool;()HOSperiodicallyreassignsECsbetweenthreadssuchthatthedierenceofthenumbersofECsamongthreadsismini-mized;()HOSalsosupportsworkloadstealing.
OnceathreadhasnoECtoschedule,it"steals"onefromthethreadwhichhasthemax-imumnumberofECsinthesamethreadpool.
ereassignmentofanECisconductedonlywhenitisnotrunninganytask.
4.
5SchedulingPolicyAcriticalchallengeforHOSistoguaranteethequery-levelSLOinmulti-tenantscenarios,e.
g.
,large-scaleanalyticqueriesshouldnotblockthelatency-sensitiveservingqueries.
Tosolvethisproblem,weproposeSchedulingGroup(SG)asavirtualresourceabstractionforthedata-boundandqueryECsinaworkernode.
Morespeci-cally,HOSassignseachSGashare,whosevalueisproportionaltotheamountofresourcesassignedtothisSG.
eresourcesofanSGarefurthersplitamongitsECs,andanECcanonlyconsumeresourcesallocatedtoitsownSG.
Inordertoseparatetheingestionworkloadsfromquerywork-loads,weisolatedata-boundECsandqueryECsintodierentSGs.
Data-boundECshandlecriticaloperationsthatneedsynchroniza-tionsharedbyallqueries,andaremainlydedicatedtoingestionworkload(read-syncWUareusuallyverylight-weight),wegroupallthedata-boundECsinasingledata-boundSG.
Onthecontrary,weputqueryECsofdierentqueriesintoseparatequerySGs.
Weassignthedata-boundSGalargeenoughsharetohandleallinges-tionworkload.
Bydefault,allthequerySGsareassignedofthesamesharetoenforcefairresourceallocation.
SGsharesarecongurable.
GivenaSG,theamountofCPUtimeassignedtoitsECsinatimeintervalisimpactedbytwofactors:()itsshare,()theamountofCPUtimeithasoccupiedinthelasttimeinterval.
eshareofanSGisadjustedaccordingtothestatusofitsECsinthelasttimeinterval,asexplainedbelow:AnECcanonlybescheduledwhenitisrunnable.
DenotingtheshareofECiasECsharei,wecalculateECshareavgitorepresentthepracticalshareofECiinatimeinterval,whilethepracticalshareofSGiisthesumofthesharesofitsECs:ECshareavgi=ECshareiTrunTrun+Tspd+TblkSGshareavgi=Nj=ECshareavgjTrun,TspdandTblkrepresentthetimeintervalswhileECiisinthestatusofrunnable,suspendandblocking.
ForECiinSGj,wemaintainaVirtualRuntimereectingthestateofitshistoricalresourceallocation.
DenotingtheCPUtimethatECiisassignedofduringthelasttimeintervalasCPUtimei,theincrementonECi'sVirtualRuntime,vruntimei,duringthelasttimeintervaliscalculatedasfollows:ECvsharei=ECshareiSGsharejSGshareavgj;vruntimei=CPUtimeiECvshareiWhenselectingthenextECtobescheduled,thethreadscheduleralwaysselectstheonewiththeminimumvruntime.
5.
EXPERIMENTSInthissection,weconductexperimentstoevaluatetheperfor-manceofHologres.
WerststudytheperformanceofHologresonOLAPworkloadsandservingworkloadsrespectively,bycom-paringitwithstate-of-the-artOLAPsystemsandservingsystems(Section.
).
WeshowthatHologreshassuperiorperformanceevencomparedwiththesespecializedsystems.
enwepresentex-perimentresultsonvariousperformanceaspectsofHologreshan-dlinghybridservingandanalyticalprocessingworkloads:WestudyinisolationhowwellthedesignofHologrescanpar-allelizeandscalewhenhandlinganalyticalworkloadsorservingworkloadsalone.
Weexperimentwithincreasingtheworkloadandthecomputationresource(Section.
).
3279(a)(b)(c)Figure:(a)AnalyticalquerylatenciesofHologresandGreenplumontheTPC-Hbenchmark.
(b)AbreakdownstudyontheeectsofHologres'sperformance-criticalfeatures.
(c)Servingquerythroughputs/latenciesofHologresandHBaseontheYCSBbenchmark.
WestudytwoaspectsofHOS'sperformance:()whetherHOScanenforceresourceisolationandfairschedulingwhenhandlinghy-bridservingandanalyticalworkloads;()whetherHOScanreactinapromptwaytosuddenworkloadbursts(Section.
).
WestudytheeciencyofHologres'sstoragedesign:()theim-pactofhigh-speeddataingestiononreadperformance,and()thewritelatencyandwritethroughputunderthemaintenanceofmultipleindexes(Section.
).
5.
1ExperimentSetupWorkloads.
WeusetheTPC-Hbenchmark[](TB)tosimulateatypicalanalyticalworkload,andtheYCSBbenchmark[]tosim-ulateatypicalservingworkload,whichcontainsatableofmil-lionrecords,eachrecordhaselds,andeacheldisbytes.
Whentestingonahybridservingandanalyticalworkloadonthesamedata(Sectioin.
.
),weusetheTPC-HdatasetandmixtheTPC-Hquerieswithsyntheticservingqueries(pointlookup)onthelineitemtable.
Tostudyundermixedread/writerequests,wesim-ulateaproductionworkloadinAlibaba,referredtoasPW.
PWhasashoppingcarttablethatconsistsofmillionrows,andhasupdatespersecond.
Eachrecordhaselds,andthesizeofarecordisbytes.
Wereplaytheupdatesduringtheexperiment.
SystemCongurations.
Weuseaclusterconsistingofphysicalmachines,eachwithvirtualcores(viahyper-threading),GBmemoryandTSSD.
Unlessexplicitlyspecied,weusethisdefaultsettingintheexperimentsontheTPC-HandYCSBbenchmarks.
Tothebestofourknowledge,thereisnoexistingHSAPsystem.
InordertostudytheperformanceofHologres,wecompareditwithspecializedsystemsforanalyticalprocessingandservingre-spectively.
Foranalyticalprocessing,wecomparedagainstGreen-plumforserving,wecomparedagainstHBase.
.
[].
edetailedcongurationsofeachsystemareexplainedasfollows:()eGreenplumclusterhasintotalsegments,whichareevenlyallocatedamongphysicalmachines.
Eachsegmentisassignedcores.
isistherecommendedsettingfromGreenplum'socialdocumentation[],inconsiderationofbothintra-query(multi-pleplanfragmentsinaquery)andinter-queryconcurrencyduringqueryexecution.
Greenplumusesthelocaldiskstostorethedatales,andthedataisstoredincolumnformat.
()eHBaseclusterhasregionservers,eachofwhichisdeployedonaphysicalma-chine.
HBasestoresthedatalesinHDFS,conguredusingthelocaldisks.
HBasestoresthedatainrowformat.
()eHologresclusterhasworkernodes,eachworkernodeoccupyingonephysi-calmachineexclusively.
TomakeafaircomparisonwithGreenplumandHBase,Hologresisalsoconguredtousethelocaldisks.
edataisstoredinbothrowandcolumnformatsinHologres.
eexperimentsonthePWworkloadareconductedinacloudenvironmentwith,coresand,GBmemory.
WeusePangu—theremotedistributedlesysteminAlibabaCloudtostorethedata.
ebasedataoftheshoppingcarttableisstoredincolumnformat.
istablealsohasanindexstoredinrowformat.
ExperimentMethodology.
Alltheexperimentsstartwithawarm-upperiodofminutes.
Foreveryreporteddatapoint,werepeattheexperimentfortimesandreporttheaveragevalue.
Intheexperiments,weusethestandardYCSBclientforalltheexperimentsontheYCSBdata.
ForexperimentsonTPC-HandPWdata,weimplementedaclientsimilartoYCSB.
Morespecically,theclientconnectionssubmitqueryrequestsasynchronously.
Wecancongurethemaximalnumberofconcurrentqueriesasingleconnectioncansubmit(denotedasW).
Multipleclientconnectionssubmitqueryrequestsconcurrently.
Unlessexplicitlyspecied,wesetW=throughouttheexperiments.
5.
2OverallSystemPerformanceInthissetofexperiments,westudytheperformanceofHologresonanalyticalworkloadsandservingworkloadsrespectively,com-paredagainstspecializedOLAPandservingsystems.
AnalyticalWorkloads.
Inthisexperiment,wecompareHologresandGreenplumusingtheTPC-Hdataset.
Toaccuratelymeasurethequerylatency,weuseasingleclientandsetWto.
Figure(a)reportstheaverageend-to-endlatencyofthequeries.
Asshowninthegure,HologresoutperformsGreenplumonalltheTPC-Hqueries:thequerylatencyinHologresisonaverageonly.
ofthatinGreenplum.
ForQ,HologresisXfasterthanGreenplum.
ereasonsareasfollows:()HOSenablesex-iblehighintra-operatorparallelismforqueryexecution.
ereadparallelismcangoashighasthenumberofshardlesinthetables.
eexibilityallowsHologrestohavetherightparallelismforallqueries.
Ontheotherhand,GreenPlum'sparallelismisdeterminedbythenumberofsegmentsandcannotmakefulluseofCPUforallqueries(e.
g.
,Q).
()elayoutofcolumntabletssupporte-cientencodingandindexes.
esestoragelayoutoptimizationscangreatlyimprovetheperformance,ifthequeryhasltersthatcanbepusheddowntodatascan(e.
g.
,Q).
()Hologresadoptse-cientvectorizedexecution,andcansupporttheAVX-instructionset[],whichcanfurtherspeedupqueriesthatbenetfromvector-izedexecution(e.
g.
,Q).
()Hologrescangeneratebetterplans,makinguseofoptimizationssuchasdynamicltersforjoins(e.
g.
,Q).
eseoptimizationstogethercontributetotheimprovedper-formanceofanalyticprocessinginHologres.
Toverifytheeectoftheaboveperformance-criticaltechniquesweconductabreakdownexperimentusingtheTPC-Hbench-WeuseAVX-mainlyin:()arithmeticexpressions(e.
g.
,addi-tion,subtraction,multiplication,division,equals,not-equals);()ltering;()bitmapoperations;()hashvaluecomputation;and()batchcopy.
3280(a)(b)(c)(d)(e)Core=(f)Core=(g)Core=(h)Core=Figure:ethroughputandlatencyofanalyticalworkloadsunder(a)(b)dierentnumbersofconcurrentqueriesand(c)(d)dierentnumbersofcores.
(e)(f)(g)(h)ethroughput/latencycurvesofservingworkloadsunderdierentnumbersofcores.
mark.
Foreachtechniquewechoosearepresentativequery,andcomparethequerylatencyinHologreswiththetechniqueturnedonando.
Specically:For(),weuseQ,andtoturnthefeatureowesettheparallelismtothenumberofsegmentsinGreenplum.
For(),weuseQ,andtoturnthefeatureowedisablethedictionaryencoding.
For(),weuseQ,andtoturnthefeatureoweuseabuildwithoutAVX-.
For(),weuseQ,andtoturnthefea-tureowedisablethedynamiclteroptimization.
eresultsarereportedinFigure(b),wherearedenotedasQ-DOP,Q-Storage,Q-AVX,andQ-Planrespectively.
Aswecansee,thesetechniquesbringsaperformanceboostfrom.
Xto.
X.
Wealsoconductamicro-benchmarkonthesingle-machineper-formancebycomparingHologreswithVectorwise(ActianVector.
[])usingtheTPC-Hbenchmark(GB).
eexperimentisconductedonasinglemachinewithcoresandGBmemory.
IttakesHologresstorunalltheTPC-Hqueries,whilesforVectorwise.
isresultshowsthatHologresstillhasroomforper-formanceimprovements.
However,theoptimizationtechniquesinVectorwiseareapplicabletoHologresandinfutureworkwewillintegratethemintoHologres.
ServingWorkloads.
Inthisexperiment,wecompareHologresandHBaseintermsofthethroughputandlatencyusingtheYCSBbenchmark.
WegraduallyincreasethequerythroughputfromKQPStoKQPS.
Foreachthroughput,wereportthecorrespond-ingaverage,andpercentileofquerylatenciesofbothsys-temsinFigure(c).
WesetthelatencySLOtoms,anddonotreportthedatapointsexceedingtheSLO.
Firsttonotethat,HBasedoesnotscaletothroughputslargerthanKQPS,asthequerylatencyexceedsthelatencySLO.
Whereas,evenatKQPS,thelatencyofHologresisstillunderms,andthelatencyisevenbelow.
ms.
ForthroughputsunderKQPS,theaverage,andlatenciesofHologresonav-eragearebetterthanHBasebyX,XandXrespectively.
isisbecausethethread-basedconcurrencymodelinHBaseincurssig-nicantcontextswitchingoverheadwhenfacinghighlyconcurrentservingworkload.
Onthecontrary,executioncontextsinHologresareverylight-weightandcanbecooperativelyscheduledwithlittlecontextswitchingoverhead.
isdesignalsomakestheschedulingwellundercontrol,guaranteeingthestabilityofquerylatencies.
Forinstance,atthroughput=KQPS,thelatencyofHBaseis.
Xhigherthanitsaveragelatency;onthecontrary,thisdier-enceinHologresisonly.
X.
eaboveexperimentsclearlydemonstratethatwiththenewstor-ageandschedulingdesign,Hologresconsistentlyoutperformsstate-of-the-artspecializedanalyticalsystemsandservingsystems.
5.
3ParallelismandScalabilityofHologresNext,westudytheparallelismandscalabilityofHologreswhenhandlinganalyticalworkloadsandservingworkloadsrespectively.
AnalyticalWorkloads.
Foranalyticalworkloads,westudytwoas-pects:()howwellHologrescanparallelizeanalyticalqueries,and()howscalableHologresiswithmorecomputationresources.
WechooseTPC-HQasarepresentativeOLAPqueryofsequentialscansoveralargeamountofdata.
Intherstexperiment,weusethedefaultclustersetting(workernodeseachwithcores).
Weuseasingleclienttosubmitthequeries,butgraduallyincreasethenumberofconcurrentqueriesWfromto.
eresultsarereportedinFigure(a)and(b).
Aswecansee,withthenumberofconcurrentqueriesincreasing,thethroughputkeepsstable.
isresultclearlyshowsthatevenwithasingleanalyticalquery,Hologrescanfullyutilizetheparallelisminthehardware.
elatencyincreaseslinearlyastheresourcesareevenlysharedbyalltheconcurrentqueries.
Inthesecondexperiment,wexthenumberofconcurrentqueriesW=,butscaleouttheresources.
Specically,weuseworkernodes,andgradullyincreasethenumberofcoresineachworkernodefromto.
eresultsarepresentedinFigure(c)and(d),whichshowthatthethroughputincreaseslinearly,andmeanwhilethequerylatencydecreasesasthenumberofcoresincreases.
Again,thisshowsthatthehighintra-operatorparallelismmechanismofHologrescanautomaticallysaturatethehardwareparallelism.
ServingWorkloads.
Inthissetofexperiments,weevaluatethethroughputandlatencyofHologresonservingworkloadsbyvary-ingtheamountofresources.
Again,weuseworkernodes,andgraduallyincreasethenumberofcoresfromtoineachworkernode.
Foreachclustersetting,weincreasethethroughputuntilthelatencyexceedsalatencySLOofms.
Weuseclientstocon-tinuouslysubmitthequeries.
Wereportthecorrespondingquerylatenciesforeachthroughput.
eresultsarepresentedinFigure(e)-(h)respectively.
Wehavetwoobservationsfromthesegures.
First,themaximumthrough-3281(a)(b)Figure:(a)Hybridworkload:thelatencyCDFoftheforegroundservingqueriesunderdierentbackgroundanalyticalworkloads.
(b)edynamicsharesofCPUtimeHOSassignedtoconcurrentqueries.
(a)(b)(c)(d)Figure:(a)eforegroundlatencyofreadqueriesunderdierentbackgroundwriteworkloads.
(b)edistributionofper-TGSwritethroughputovertimeinthePWworkload.
(c)(d)ewritelatency/throughputwhenmaintainingvariednumbersofsecondaryindexes.
putthatHologrescanachieveincreaseslinearlyasthenumberofcoresincreases.
Forinstance,wecanseethatthemaximumthrough-putatcore=istimesofthemaximumthroughputatcore=.
Second,beforethesystemreachesitsmaximumthroughput,thequerylatenciesremainatastablelevel.
Takingcore=asanex-ample,theaverage,andlatenciesincreaseveryslowlyasthethroughputgrows.
isisduetothefactthatHologrescanfullycontroltheschedulingofexecutioncontextsinuserspace.
5.
4PerformanceofHOSInthissubsection,westudytwoperformanceaspectsofHOS:()resourceisolationunderhybridservingandanalyticalworkloads,and()schedulingelasticityundersuddenworkloadbursts.
5.
4.
1ResourceIsolationunderHybridWorkloadsAkeyschedulingrequirementinHSAPservicesisthatthelatency-sensitiveservingqueriesarenotaectedbyresource-consumingan-alyticalqueries.
Tostudythis,wegenerateahybridserving/analyt-icalworkloadthathastwoparts:()background:Wecontinuouslysubmitanalyticalqueries(TPC-HQwithdierentpredicates)inthebackground.
WevarythebackgroundworkloadsbyincreasingthenumberofconcurrentqueriesWfromto.
()foreground:Wesubmitservingqueriesintheforegoundandmeasurethequerylatency.
Toaccuratelytestthelatency,wesetthenumberofconcur-rentqueriesW=.
Foreachsettingofthebackgroundworkloads,wecollectKdatapoints,andplottheirCDF.
Figure(a)presentstheresults.
Wecanseethat:byincreasingthenumberofbackgroundqueriesfromto,thereisasmallin-crementonthelatencyofservingquery;butfurtherincreasingthebackgroundworklaods(fromto)bringsnoincrement.
ItclearlyshowsthatresourcesallocatedtodierentqueriesarewellisolatedbyHOS,becauseexecutioncontextsofdierentqueriesaregroupedintoseparateschedulinggroups.
erefore,analyticalqueriesandservingqueriescancoexistinthesamesystemwhileboththeirlatencySLOscanstillbefullled.
5.
4.
2SchedulingElasticityunderSuddenBurstsInthisexperiment,wedemonstratehowwellHOScanreacttosuddernworkloadbursts.
eexperimentisstartedbyconcurrentlyissuingQandQattime.
Attime,weissuenewqueries(Q-Q).
Q-Qnishroughlyattime.
Attime,queryQentersthesystem.
QandQnishroughlyattime.
Intheend,attime,wesubmitQandQ,andleaveQ-Qruntocompletion.
Allthequeriesareassignedwithequalpriorities.
Figure(b)showsthefractionofCPUusedbyeachqueryalongthetimeline.
Notethatattime,HOSquicklyadjuststheresourceassignmentsothatallthesevenquerieshaveanequalshareofCPU.
Attime,aerQ-Qnishexecution,HOSimmediatelyadjuststheschedul-ingandreassignsCPUequallybetweenQandQthatarestillrun-ning.
Similarbehaviorscanbeobservedattime,and.
isexperimenthighlightsthatHOScandynamicallyandpromptlyad-justitsschedulingbehaviorsaccordingtothereal-timeconcurrentworkloadsinthesystem,alwaysguaranteeingfairsharing.
5.
5PerformanceofHologresStorageInthissetofexperiments,weevaluatetheeectsofread/writeseparationonquerylatencyandstudythewriteperformanceunderindexmaintenanceinHologres.
5.
5.
1SeparatingRead/WriteOperationsTostudytheimpactsofwritesonquerylatency,wegenerateamixedread/writeworkloadsonthePWworkloadconsistingoftwoparts:()background:WereplaythetuplewritesinPWtosimulatea-minutebackgroundworkloads.
We vary the write throughput by increasing the number of write clients. The writes are uniformly distributed across TGSs. E.g., for one setting of the number of write clients, we sample the write throughput every few seconds, and report the average/min/max write throughputs among all the TGSs in Figure (b). (2) foreground: we use clients to submit OLAP queries as the foreground workloads. To accurately measure the query latency, each client has its W fixed. We report the average query latency at each throughput setting in Figure (a). As shown, the latency of the OLAP queries remains stable despite the increase in write throughput. This result evidences that high-throughput writes have little impact on query latencies. This is because of the read/write separation in Hologres. The versioned tablets guarantee that reads are not blocked by writes.
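As a rough illustration of how versioned tablets let reads proceed without being blocked by writes, the sketch below (Python; a deliberately simplified model, not the actual tablet implementation) has writers publish a new immutable snapshot while readers simply pin the latest published version.

```python
# A minimal sketch of version-based read/write separation (illustrative only).
import threading

class VersionedTablet:
    def __init__(self):
        self._versions = [{}]          # list of immutable snapshots
        self._lock = threading.Lock()  # guards publishing only, never reads

    def write(self, key, value):
        with self._lock:               # writers serialize among themselves
            snapshot = dict(self._versions[-1])
            snapshot[key] = value
            self._versions.append(snapshot)   # publish a new version

    def read(self, key):
        snapshot = self._versions[-1]  # pin the current version; no lock needed
        return snapshot.get(key)

t = VersionedTablet()
t.write("pv_count", 10)
print(t.read("pv_count"))   # reads always see the latest published version
```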
5.5.2 Write Performance
Next, we study the write performance of Hologres under index maintenance using the YCSB benchmark, where we create a number of secondary indexes for the YCSB table.
We vary the number of secondary indexes. For each setting, we push the system to its maximum write throughput and report tail percentiles of the write latencies. As shown in Figure (c) and (d), as the number of indexes increases, the write latency and the write throughput stay rather stable and only change slightly. Compared to the case with no secondary index, maintaining secondary indexes only incurs a small increment on the write latency and a small decrement on the write throughput. This result shows that index maintenance in Hologres is very efficient and has very limited impact on write performance. The main reasons are threefold: (1) Hologres optimizes the write performance by sharing a WAL among all the index tablets in a TGS. Therefore, adding more indexes does not incur additional log flushes. (2) For each write to a TGS, each index is updated by a separate write-apply WU in parallel. (3) Hologres aggressively parallelizes operations such as memory table flushes and file compactions by offloading them to the background EC pool. With enough computation resources, this design removes the performance bottleneck.
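A minimal sketch of this write path is shown below (Python; the class and index names are hypothetical). It captures the first two points in simplified form: a single shared WAL append per write, followed by parallel application to every index tablet via a worker pool; background flushes and compactions would be offloaded to a similar pool.

```python
# A minimal sketch of a shared-WAL write path with parallel index application.
from concurrent.futures import ThreadPoolExecutor

class TableGroupShard:
    """Simplified model of a TGS: one shared WAL plus several index tablets."""
    def __init__(self, index_tablets):
        self.wal = []                          # shared write-ahead log
        self.index_tablets = index_tablets     # name -> dict standing in for a tablet
        self.pool = ThreadPoolExecutor(max_workers=len(index_tablets))

    def write(self, key, row):
        self.wal.append((key, row))            # one log append, however many indexes
        # In a real system each index would derive its own key from the row;
        # here every tablet is keyed the same way to keep the sketch short.
        futures = [self.pool.submit(tablet.__setitem__, key, row)
                   for tablet in self.index_tablets.values()]
        for f in futures:                      # wait until all tablets are updated
            f.result()

shard = TableGroupShard({"primary": {}, "idx_user": {}, "idx_time": {}})
shard.write("k1", {"user": "u1", "time": 42})
print(len(shard.wal), shard.index_tablets["idx_user"]["k1"])
```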
6. RELATED WORK
OLTP and OLAP Systems.
OLTP systems [,,] adopt row stores to support quick transactions which frequently perform point lookups over a small number of rows. OLAP systems [,,] utilize column stores to achieve efficient column scans, which is the typical data access pattern in analytic queries. Unlike the above OLTP/OLAP systems, Hologres supports hybrid row-column storage. A table can be stored in both the row and column storage formats to efficiently support both the point lookups and the column scans required by HSAP workloads. MPP databases like Greenplum [] usually partition data into large segments, and co-locate the data segments with the computing nodes. When scaling the system, MPP databases usually need to reshard the data. Conversely, Hologres manages data in TGSs, which are much smaller units than segments. Hologres maps TGSs dynamically to worker nodes, and TGSs can flexibly migrate between worker nodes without resharding the data. Also, the worker nodes only need to keep the memory tables of the hosted TGSs in memory, and fetch a TGS's shard files from the remote file system on demand. In terms of multi-tenant scheduling, [] handles different requests in different processes and relies on the OS to schedule concurrent queries, easily putting a hard limit on the query concurrency. Instead, Hologres multiplexes concurrent queries on a set of user-space threads, achieving much better query concurrency. [,] study highly parallel query processing mechanisms for analytical workloads. They decompose query execution into small tasks and schedule the tasks across a set of threads pinned to physical cores. Hologres takes a similarly highly parallel approach, but Hologres uses a hierarchical scheduling framework, and the abstraction of work units reduces the complexity and overheads when scheduling a large number of tasks in a multi-tenant scenario. Execution contexts and scheduling groups provide a powerful mechanism to ensure resource isolation across different tenants. [] discusses a CPU sharing technique for performance isolation in multi-tenant databases. It emphasizes the absolute CPU reservation that is required in Database-as-a-Service environments. In contrast, Hologres only requires relative CPU reservation, which is enough to prevent analytical queries from delaying serving queries.
HTAP Systems. In recent years, with the fast increasing needs for more real-time analysis, we have seen a lot of research interest in providing Hybrid Transactional/Analytical Processing (HTAP) solutions over big data sets. [] studies how a hybrid row and column format helps improve databases' performance for queries with various data access patterns. Follow-up systems such as SAP HANA [], MemSQL [], HyPer [], Oracle Database [] and SQL Server [,] support both transactional and analytical processing. They usually use row formats for OLTP and column formats for OLAP, but require converting the data between row and column formats. Due to these conversions, newly committed data might not be reflected in the column stores immediately. On the contrary, Hologres can store tables in both row and column tablets, and each write into a table updates both types of tablets at the same time. Hologres parallelizes writes to all tablets at the same time to achieve high write throughput. In addition, HSAP scenarios have much higher ingestion rates than the transaction rates in HTAP scenarios (e.g., users usually generate tens of page view events before making a purchase transaction), but usually with a weaker consistency requirement. Hologres deliberately supports only atomic writes and read-your-writes reads, which achieves a much higher read/write throughput by avoiding complex concurrency control. [] studies task scheduling for highly concurrent workloads in HTAP systems. For OLTP workloads, it adapts the concurrency level to saturate the CPU, as OLTP tasks include heavy usage of synchronization; Hologres, however, adopts a latch-free approach and avoids frequent blocking. For OLAP workloads, it uses a concurrency hint to adjust the task granularity, which could be integrated into Hologres to schedule execution contexts.
NewSQL. The sharding mechanism adopted in Hologres is similar to BigTable [] and Spanner []. BigTable uses the abstraction of a tablet to facilitate range search over sorted data. Spanner is a globally-distributed key-value store supporting strong consistency. The data shard in Spanner is used as the basic unit for maintaining data consistency in the presence of distributed data replication. Unlike Spanner, which is mainly used as an OLTP solution, Hologres deliberately chooses to support a weaker consistency model for HSAP scenarios in pursuit of better performance.
7. CONCLUSION & FUTURE WORK
There are a host of new trends towards a fusion of serving and analytical processing (HSAP) in modern big data processing.
At Alibaba, we design and implement Hologres, a cloud-native HSAP service. Hologres adopts a novel tablet-based storage design, an execution context-based scheduling mechanism, as well as a clear decoupling of storage/computation and reads/writes. This enables Hologres to deliver high-throughput real-time data ingestion and superior query performance for hybrid serving and analytical processing. We present a comprehensive experimental study of Hologres and a number of big data systems. Our results show that Hologres outperforms even state-of-the-art systems that are specialized for analytical or serving scenarios. There are a number of open challenges for even higher performance in HSAP. These challenges include better scale-out mechanisms for read-heavy hotspots, better resource isolation of the memory subsystem and network bandwidth, and absolute resource reservation in distributed environments. We plan on exploring these issues as part of future work.
8. REFERENCES
[] Actian Vector. https://www.actian.com.
[] Apache Arrow. https://arrow.apache.org.
[] Apache HDFS. https://hadoop.apache.org.
[] Flink. https://flink.apache.org.
[] Greenplum. https://greenplum.org.
[] HBase. https://hbase.apache.org.
[] Hive. https://hive.apache.org.
[] Intel AVX-512 instruction set. https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html.
[] MemSQL. http://www.memsql.com/.
[] MySQL. https://www.mysql.com.
[] Pivotal Greenplum. https://gpdb.docs.pivotal.io/6-0/admin_guide/workload_mgmt.html.
[] PostgreSQL. https://www.postgresql.org.
[] RocksDB. https://github.com/facebook/rocksdb/wiki.
[] Teradata. http://www.teradata.com.
[] TPC-H benchmark. http://www.tpc.org/tpch.
[] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst.
[] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SoCC), New York, NY, USA. Association for Computing Machinery.
[] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's globally distributed database. ACM Trans. Comput. Syst.
[] S. Das, V. R. Narasayya, F. Li, and M. Syamala. CPU sharing techniques for performance isolation in multitenant relational database-as-a-service. PVLDB.
[] C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: SQL Server's memory-optimized OLTP engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
[] F. Färber, N. May, W. Lehner, P. Große, I. Müller, H. Rauhe, and J. Dees. The SAP HANA database – an architecture overview. IEEE Data Eng. Bull.
[] J.-F. Im, K. Gopalakrishna, S. Subramaniam, M. Shrivastava, A. Tumbde, X. Jiang, J. Dai, S. Lee, N. Pawar, J. Li, et al. Pinot: Realtime OLAP for 530 million users. In Proceedings of the International Conference on Management of Data, SIGMOD, New York, NY, USA. Association for Computing Machinery.
[] A. Kemper and T. Neumann. HyPer: A hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. In IEEE International Conference on Data Engineering. IEEE.
[] M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for Hadoop. In CIDR, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA. www.cidrdb.org.
[] T. Lahiri, S. Chavan, M. Colgan, D. Das, A. Ganesh, M. Gleeson, S. Hase, A. Holloway, J. Kamp, T. Lee, J. Loaiza, N. Macnaughton, V. Marwah, N. Mukherjee, A. Mullick, S. Muthulingam, V. Raja, M. Roth, E. Soylemez, and M. Zait. Oracle Database In-Memory: A dual format in-memory database. In IEEE International Conference on Data Engineering.
[] A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. SIGOPS Oper. Syst. Rev.
[] A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandiver, L. Doshi, and C. Bear. The Vertica analytic database: C-Store 7 years later. PVLDB.
[] P.-A. Larson, A. Birka, E. N. Hanson, W. Huang, M. Nowakiewicz, and V. Papadimos. Real-time analytical processing with SQL Server. PVLDB.
[] V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD, New York, NY, USA. Association for Computing Machinery.
[] Y. Mao, E. Kohler, and R. T. Morris. Cache craftiness for fast multicore key-value storage. In Proceedings of the ACM European Conference on Computer Systems, EuroSys, New York, NY, USA. Association for Computing Machinery.
[] J. M. Patel, H. Deshmukh, J. Zhu, N. Potti, Z. Zhang, M. Spehlmann, H. Memisoglu, and S. Saurabh. Quickstep: A data platform based on the scaling-up approach. PVLDB.
[] I. Psaroudakis, T. Scheuer, N. May, and A. Ailamaki. Task scheduling for highly concurrent analytical and transactional main-memory workloads. In Proceedings of the Fourth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS).
[] R. Ramamurthy, D. J. DeWitt, and Q. Su. A case for fractured mirrors. In Proceedings of the International Conference on Very Large Data Bases. VLDB Endowment.
[] V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB.
[] M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull.
[] F. Yang, E. Tschetter, X. Léauté, N. Ray, G. Merlino, and D. Ganguli. Druid: A real-time analytical data store. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD, New York, NY, USA. Association for Computing Machinery.
[] M. Zukowski and P. A. Boncz. Vectorwise: Beyond column stores. IEEE Data Eng. Bull.