proposedto

destoon 时间:2021-02-05 阅读:()

LearningBayesianNetworkswithThousandsofVariablesMauroScanagattaIDSIA,SUPSI,USILugano,Switzerlandmauro@idsia.
chCassioP.
deCamposQueen'sUniversityBelfastNorthernIreland,UKc.
decampos@qub.
ac.
ukGiorgioCoraniIDSIA,SUPSI,USILugano,Switzerlandgiorgio@idsia.
chMarcoZaffalonIDSIALugano,Switzerlandzaffalon@idsia.
chAbstractWepresentamethodforlearningBayesiannetworksfromdatasetscontainingthousandsofvariableswithouttheneedforstructureconstraints.
Ourapproachismadeoftwoparts.
Therstisanovelalgorithmthateffectivelyexploresthespaceofpossibleparentsetsofanode.
Itguidestheexplorationtowardsthemostpromisingparentsetsonthebasisofanapproximatedscorefunctionthatiscomputedinconstanttime.
Thesecondpartisanimprovementofanexistingordering-basedalgorithmforstructureoptimization.
Thenewalgorithmprovablyachievesahigherscorecomparedtoitsoriginalformulation.
Ournovelapproachconsistentlyoutperformsthestateoftheartonverylargedatasets.
1IntroductionLearningthestructureofaBayesiannetworkfromdataisNP-hard[2].
Wefocusonscore-basedlearning,namelyndingthestructurewhichmaximizesascorethatdependsonthedata[9].
Severalexactalgorithmshavebeendevelopedbasedondynamicprogramming[12,17],branchandbound[7],linearandintegerprogramming[4,10],shortest-pathheuristic[19,20].
Usuallystructurallearningisaccomplishedintwosteps:parentsetidenticationandstructureoptimization.
Parentsetidenticationproducesalistofsuitablecandidateparentsetsforeachvariable.
Structureoptimizationassignsaparentsettoeachnode,maximizingthescoreoftheresultingstructurewithoutintroducingcycles.
Theproblemofparentsetidenticationisunlikelytoadmitapolynomial-timealgorithmwithagoodqualityguarantee[11].
Thismotivatesthedevelopmentofeffectivesearchheuristics.
Usuallyhoweveronedecidesthemaximumin-degree(numberofparentspernode)kandthensimplycom-putesthescoreofallparentsets.
Atthatpointoneperformsstructuraloptimization.
AnexceptionisthegreedysearchoftheK2algorithm[3],whichhashoweverbeensupersededbythemoremodernapproachesmentionedabove.
Ahigherin-degreeimpliesalargersearchspaceandallowsachievingahigherscore;howeveritalsorequireshighercomputationaltime.
Whenchoosingthein-degreetheusermakesatrade-offbetweenthesetwoobjectives.
Howeverwhenthenumberofvariablesislarge,thein-degreeisIstitutoDalleMolledistudisull'IntelligenzaArticiale(IDSIA)ScuolauniversitariaprofessionaledellaSvizzeraitaliana(SUPSI)Universit`adellaSvizzeraitaliana(USI)1generallysettoasmallvalue,toallowtheoptimizationtobefeasible.
Thelargestdatasetanalyzedin[1]withtheGobnilp1softwarecontains413variables;itisanalyzedsettingk=2.
In[5]Gobnilpisusedforstructurallearningwith1614variables,settingk=2.
Theseareamongthelargestexamplesofscore-basedstructurallearningintheliterature.
Inthispaperweproposeanalgorithmthatperformsapproximatedstructurelearningwiththousandsofvariableswithoutconstraintsonthein-degree.
Itisconstitutedbyanovelapproachforparentsetidenticationandanovelapproachforstructureoptimization.
Asforparentsetidenticationweproposeananytimealgorithmthateffectivelyexploresthespaceofpossibleparentsets.
Itguidestheexplorationtowardsthemostpromisingparentsets,exploitinganapproximatedscorefunctionthatiscomputedinconstanttime.
Asforstructureoptimization,weextendtheordering-basedalgorithmof[18],whichprovidesaneffectiveapproachformodelselectionwithreducedcomputationalcost.
Ouralgorithmisguaranteedtondasolutionbetterthanorequaltothatof[18].
Wetestourapproachondatasetscontaininguptotenthousandvariables.
Asaperformanceindica-torweconsiderthescoreofthenetworkfound.
Ourparentsetidenticationapproachoutperformsconsistentlytheusualapproachofsettingthemaximumin-degreeandthencomputingthescoreofallparentsets.
OurstructureoptimizationapproachoutperformsGobnilpwhenlearningwithmorethan500nodes.
Allthesoftwareanddatasetsusedintheexperimentsareavailableonline.
2.
2StructureLearningofBayesianNetworksConsidertheproblemoflearningthestructureofaBayesianNetworkfromacompletedatasetofNinstancesD={D1,.
.
.
,DN}.
ThesetofncategoricalrandomvariablesisX={X1,.
.
.
,Xn}.
ThegoalistondthebestDAGG=(V,E),whereVisthecollectionofnodesandEisthecollectionofarcs.
EcanbedenedasthesetofparentsΠ1,.
.
.
,Πnofeachvariable.
DifferentscorescanbeusedtoassessthetofaDAG.
WeadopttheBIC,whichasymptoticallyapproximatestheposteriorprobabilityoftheDAG.
TheBICscoreisdecomposable,namelyitisconstitutedbythesumofthescoresoftheindividualvariables:BIC(G)==ni=1BIC(Xi,Πi)=ni=1π∈|Πi|x∈|Xi|Nx,πlogθx|πlogN2(|Xi|1)(|Πi|),whereθx|πisthemaximumlikelihoodestimateoftheconditionalprobabilityP(Xi=x|Πi=π),andNx,πrepresentsthenumberoftimes(X=x∧Πi=π)appearsinthedataset,and|·|indicatesthesizeoftheCartesianproductspaceofthevariablesgivenasarguments(insteadofthenumberofvariables)suchthat|Xi|isthenumberofstatesofXiand||=1.
Exploitingdecomposability,werstidentifyindependentlyforeachvariablealistofcandidateparentsets(parentsetidentication).
Thenbystructureoptimizationweselectforeachnodetheparentsetthatyieldsthehighestscorewithoutintroducingcycles.
3ParentsetidenticationForparentsetidenticationusuallyoneexploresallthepossibleparentsets,whosenumberhoweverincreasesasO(nk),wherekdenotesthemaximumin-degree.
Pruningrules[7]donotconsiderablyreducethesizeofthisspace.
Usuallytheparentsetsareexploredinsequentialorder:rstalltheparentsizeofsizeone,thenalltheparentsetsofsizetwo,andsoon,uptosizek.
Werefertothisapproachassequentialordering.
Ifthesolveradoptedforstructuraloptimizationisexact,thisstrategyallowstondthegloballyoptimumgraphgiventhechosenvalueofk.
Inordertodealwithalargenumberofvariablesitishowevernecessarysettingalowin-degreek.
Forinstance[1]adoptsk=2whendealingwiththelargestdataset(diabetes),whichcontains413variables.
In[5]Gobnilpisusedforstructurallearningwith1614variables,againsettingk=2.
Ahighervalueofkwouldmakethestructural1http://www.
cs.
york.
ac.
uk/aig/sw/gobnilp/2http://blip.
idsia.
ch2learningnotfeasible.
Yetalowkimpliesdroppingalltheparentsetswithsizelargerthank.
Someofthempossiblyhaveahighscore.
In[18]itisproposedtoadoptthesubsetΠcorrofthemostcorrelatedvariableswiththechildrenvariable.
Then[18]consideronlyparentsetswhicharesubsetsofΠcorr.
Howeverthisapproachisnotcommonlyadopted,possiblybecauseitrequiresspecifyingthesizeofΠcorr.
Indeed[18]acknowledgestheneedforfurtherinnovativeapproachesinordertoeffectivelyexplorethespaceoftheparentsets.
Weproposetwoanytimealgorithmstoaddressthisproblem.
Therstisthesimplest;wecallitgreedyselection.
Itstartsbyexploringalltheparentsetsofsizeoneandaddingthemtoalist.
Thenitrepeatsthefollowinguntiltimeisexpired:popsthebestscoringparentsetΠfromthelist,exploresallthesupersetsobtainedbyaddingonevariabletoΠ,andaddsthemtothelist.
Notethatingeneraltheparentsetschosenattwoadjoiningsteparenotrelatedtoeachother.
Thesecondapproach(independenceselection)adoptsamoresophisticatedstrategy,asexplainedinthefollowing.
3.
1ParentsetidenticationbyindependenceselectionIndependenceselectionusesanapproximationoftheactualBICscoreofaparentsetΠ,whichwedenoteasBIC,toguidetheexplorationofthespaceoftheparentsets.
TheBICofaparentsetconstitutedbytheunionoftwonon-emptyparentsetsΠ1andΠ2isdenedasfollows:BIC(X,Π1,Π2)=BIC(X,Π1)+BIC(X,Π2)+inter(X,Π1,Π2),(1)withΠ1∪Π2=Πandinter(X,Π1,Π2)=logN2(|X|1)(|Π1|+|Π2||Π1||Π2|1)BIC(X,).
IfwealreadyknowBIC(X,Π1)andBIC(X,Π2)frompreviouscalculations(andweknowBIC(X,)),thenBICcanbecomputedinconstanttime(withrespecttodataaccesses).
WethusexploitBICtoquicklyestimatethescoreofalargenumberofcandidateparentsetsandtodecidetheordertoexplorethem.
WeprovideaboundforthedifferencebetweenBIC(X,Π1,Π2)andBIC(X,Π1∪Π2).
Tothisend,wedenotebyiitheInteractionInformation[14]:ii(X;Y;Z)=I(X;Y|Z)I(X;Y),namelythedifferencebetweenthemutualinformationofXandYconditionalonZandtheunconditionalmutualinformationofXandY.
Theorem1.
LetXbeanodeofGandΠ=Π1∪Π2beaparentsetforXwithΠ1∩Π2=andΠ1,Π2non-empty.
ThenBIC(X,Π)=BIC(X,Π1,Π2)+N·ii(Π1;Π2;X),whereiiistheInteractionInformationestimatedfromdata.
Proof.
BIC(X,Π1∪Π2)BIC(X,Π1,Π2)=BIC(X,Π1∪Π2)BIC(X,Π1)BIC(X,Π2)inter(X,Π1,Π2)=x,π1,π2Nx,π1,π2logθx|π1,π2log(θx|π1θx|π2)+xNxlogθx=x,π1,π2Nx,π1,π2logθx|π1,π2logθx|π1θx|π2θx=x,π1,π2Nx,π1,π2logθx|π1,π2θxθx|π1θx|π2=x,π1,π2N·θx,π1,π2logθπ1,π2|xθπ1θπ2θπ1|xθπ2|xθπ1,π2=Nx,π1,π2θx,π1,π2logθπ1,π2|xθπ1|xθπ2|xπ1,π2θπ1,π2logθπ1,π2θπ1θπ2=N·(I(Π1;Π2|X)I(Π1;Π2))=N·ii(Π1;Π2;X),whereI(·)denotesthe(conditional)mutualinformationestimatedfromdata.
Corollary1.
LetXbeanodeofG,andΠ=Π1∪Π2beaparentsetofXsuchthatΠ1∩Π2=andΠ1,Π2non-empty.
Then|BIC(X,Π)BIC(X,Π1,Π2)|≤Nmin{H(X),H(Π1),H(Π2)}.
Proof.
Theorem1statesthatBIC(X,Π)=BIC(X,Π1,Π2)+N·ii(Π1;Π2;X).
Wenowdeviseboundsforinteractioninformation,recallingthatmutualinformationandconditionalmutualinfor-mationarealwaysnon-negativeandachievetheirmaximumvalueatthesmallestentropyHoftheir3argument:H(Π2)≤I(Π1;Π2)≤ii(Π1;Π2;X)≤I(Π1;Π2|X)≤H(Π2).
ThetheoremisprovenbysimplypermutingthevaluesΠ1;Π2;Xintheiiofsuchequation.
Sinceii(Π1;Π2;X)=I(Π1;Π2|X)I(Π1;Π2)=I(X;Π1|Π2)I(X;Π1)=I(Π2;X|Π1)I(Π2;X),theboundsforiiarevalid.
Weknowthat0≤H(Π)≤log(|Π|)foranysetofnodesΠ,hencetheresultofCorollary1couldbefurthermanipulatedtoachieveaboundforthedifferencebetweenBICandBICofatmostNlog(min{|X|,|Π1|,|Π2|}).
However,Corollary1isstrongerandcanstillbecomputedefcientlyasfollows.
WhencomputingBIC(X,Π1,Π2),weassumedthatBIC(X,Π1)andBIC(X,Π2)hadbeenprecomputed.
Assuch,wecanalsohaveprecomputedthevaluesH(Π1)andH(Π2)atthesametimeastheBICscoreswerecomputed,withoutanysignicantincreaseofcomplexity(whencomputingBIC(X,Π)foragivenΠ,justusethesameloopoverthedatatocomputeH(Π)).
Corollary2.
LetXbeanodeofG,andΠ=Π1∪Π2beaparentsetforthatnodewithΠ1∩Π2=andΠ1,Π2non-empty.
IfΠ1⊥⊥Π2,thenBIC(X,Π1∪Π2)≥BIC(X,Π1∪Π2).
IfΠ1⊥⊥Π2|X,thenBIC(X,Π1∪Π2)≤BIC(X,Π1∪Π2).
Iftheinteractioninformationii(Π1;Π2;X)=0,thenBIC(X,Π1∪Π2)=BIC(X,Π1,Π2).
Proof.
ItfollowsfromTheorem1consideringthatmutualinformationI(Π1,Π2)=0ifΠ1andΠ2areindependent,whileI(Π1,Π2|X)=0ifΠ1andΠ2areconditionallyindependent.
WenowdeviseanovelpruningstrategyforBICbasedontheboundsofCorollaries1and2.
Theorem2.
LetXbeanodeofG,andΠ=Π1∪Π2beaparentsetforthatnodewithΠ1∩Π2=andΠ1,Π2non-empty.
LetΠΠ.
IfBIC(X,Π1,Π2)+logN2(|X|1)|Π|>Nmin{H(X),H(Π1),H(Π2)},thenΠanditssupersetsarenotoptimalandcanbeignored.
Proof.
BIC(X,Π1,Π2)Nmin{H(X),H(Π1),H(Π2)}+logN2(|X|1)|Π|>0impliesBIC(Π)+logN2(|X|1)|Π|>0,andTheorem4of[6]prunesΠandallitssupersets.
Thuswecanefcientlycheckwhetherlargepartsofthesearchspacecanbediscardedbasedontheseresults.
WenotethatCorollary1andhenceTheorem2areverygenericinthechoiceofΠ1andΠ2,eventhoughusuallyoneofthemistakenasasingleton.
3.
2IndependenceselectionalgorithmWenowdescribethealgorithmthatexploitstheBICscoreinordertoeffectivelyexplorethespaceoftheparentsets.
Itusestwolists:(1)open:alistfortheparentsetstobeexplored,orderedbytheirBICscore;(2)closed:alistofalreadyexploredparentsets,alongwiththeiractualBICscore.
ThealgorithmstartswiththeBICoftheemptysetcomputed.
FirstitexploresalltheparentsetsofsizeoneandsavestheirBICscoreintheclosedlist.
Thenitaddstotheopenlisteveryparentsetofsizetwo,computingtheirBICscoresinconstanttimeonthebasisofthescoresavailablefromtheclosedlist.
Itthenproceedsasfollowsuntilallelementsinopenhavebeenprocessed,orthetimeisexpired.
ItextractsfromopentheparentsetΠwiththebestBICscore;itcomputesitsBICscoreandaddsittotheclosedlist.
ItthenlooksforallthepossibleexpansionsofΠobtainedbyaddingasinglevariableY,suchthatΠ∪Yisnotpresentinopenorclosed.
ItaddsthemtoopenwiththeirBIC(X,Π,Y)scores.
EventuallyitalsoconsidersalltheexploredsubsetsofΠ.
Itsafely[7]prunesΠifanyofitssubsetsyieldsahigherBICscorethanΠ.
Thealgorithmreturnsthecontentoftheclosedlist,prunedandorderedbytheBICscore.
Suchlistbecomesthecontentoftheso-calledcacheofscoresforX.
Theprocedureisrepeatedforeveryvariableandcanbeeasilyparallelized.
Figure1comparessequentialorderingandindependenceselection.
Itshowsthatindependenceselectionismoreeffectivethansequentialorderingbecauseitbiasesthesearchtowardsthehighest-scoringparentsets.
4StructureoptimizationThegoalofstructureoptimizationistochoosetheoverallhighestscoringparentsets(measuredbythesumofthelocalscores)withoutintroducingdirectedcyclesinthegraph.
Westartfromtheapproachproposedin[18](whichwecallordering-basedsearchorOBS),whichexploitsthefact45001,0002,0001,8001,6001,400IterationBIC(a)Sequentialordering.
5001,0002,0001,8001,6001,400IterationBIC(b)Indep.
selectionordering.
Figure1:Explorationoftheparentsetsspaceforagivenvariableperformedbysequentialorderingandindependenceselection.
Eachpointreferstoadistinctparentset.
thattheoptimalnetworkcanbefoundintimeO(Ck),whereC=ni=1ciandciisthenumberofelementsinthecacheofscoresofXi,ifanorderingoverthevariablesisgiven.
3Θ(k)isneededtocheckwhetherallthevariablesinaparentsetforXcomebeforeXintheordering(asimplearraycanbeusedasdatastructureforthischecking).
Thisimpliesworkingonthesearchspaceofthepossibleorderings,whichisconvenientasitissmallerthanthespaceofnetworkstructures.
Multipleorderingsaresampledandevaluated(differenttechniquescanbeusedforguidingthesampling).
ForeachsampledtotalorderingovervariablesX1,Xn,thenetworkisconsistentwiththeorderifXi:X∈Πi:XXi.
Anetworkconsistentwithagivenorderingautomaticallysatisestheacyclicityconstraint.
Thisallowsustochooseindependentlythebestparentsetofeachnode.
Moreover,foragiventotalorderingV1,Vnofthevariables,thealgorithmtriestoimprovethenetworkbyagreedysearchswappingprocedure:ifthereisapairVj,Vj+1suchthattheswappedorderingwithVjinplaceofVj+1(andviceversa)yieldsbetterscoreforthenetwork,thenthesenodesareswappedandthesearchcontinues.
Oneadvantageofthisswappingoverextrarandomorderingsisthatsearchingforitandupdatingthenetwork(ifagoodswapisfound)onlytakestimeO((cj+cj+1)·kn)(whichcanbespedupascjonlyisinspectedforparentssetscontainingVj+1,andcj+1isonlyprocessedifVj+1hasVjasparentinthecurrentnetwork),whileanewsampledorderingwouldtakeO(n+Ck)(theswappingapproachisusuallyfavourableifciis(n),whichisaplausibleassumption).
Weemphasizethattheuseofkhereissolewiththepurposeofanalyzingthecomplexityofthemethods,sinceourparentsetidenticationapproachdoesnotrelyonaxedvaluefork.
However,theconsistencyruleofOBSisquiterestricting.
Whileitsurelyrefusesallcyclicstructures,italsorulesoutsomeacycliconeswhichcouldbecapturedbyinterpretingtheorderinginaslightlydifferentmanner.
WeproposeanovelconsistencyruleforagivenorderingwhichprocessesthenodesinV1,VnfromVntoV1(OBScandoitinanyorder,asthelocalparentsetscanbechosenindependently)andwedenetheparentsetofVjsuchthatitdoesnotintroduceacycleinthecurrentpartialnetwork.
Thisallowsback-arcsintheorderingfromanodeVjtoitssuccessors,aslongasthisdoesnotintroduceacycle.
WecallthisideaacyclicselectionOBS(orsimplyASOBS).
Becauseweneedtocheckforcyclesateachstepofconstructingthenetworkforagivenordering,atarstglancethealgorithmseemstobeslower(timecomplexityofO(Cn)againstO(Ck)forOBS;notethisdifferenceisonlyrelevantasweintendtoworkwithlargevaluesn).
Surprisingly,wecanimplementitinthesameoveralltimecomplexityofO(Ck)asfollows.
1.
BuildandkeepaBooleansquarematrixmtomarkwhicharethedescendantsofnodes(m(X,Y)tellswhetherYisdescendantofX).
Startitallfalse.
2.
ForeachnodeVjintheorder,withj=n,1:(a)Gothroughtheparentsetsandpickthebestscoringoneforwhichallcontainedpar-entsarenotdescendantsofVj(thistakestimeO(cik)ifparentsetsarekeptaslists).
(b)BuildatodolistwiththedescendantsofVjfromthematrixrepresentationandasso-ciateanemptytodolisttoallancestorsofVj.
(c)StartthetodolistsoftheparentsofVjwiththedescendantsofVj.
(d)ForeachancestorXofVj(ancestorswillbeiterativelyvisitedbyfollowingadepth-rstgraphsearchprocedureusingthenetworkbuiltsofar;weprocessanodeafter3O(andΘ(·)shallbeunderstoodasusualasymptoticnotationfunctions.
5itschildrenwithnon-emptytodolistshavebeenalreadyprocessed;thesearchstopswhenallancestorsarevisited):i.
ForeachelementYinthetodolistofX,ifm(X,Y)istrue,thenignoreYandmoveon;otherwisesetm(X,Y)totrueandaddYtothetodoofparentsofX.
Letusanalyzethecomplexityofthemethod.
Step2atakesoveralltimeO(Ck)(alreadyconsideringtheouterloop).
Step2btakesoveralltimeO(n2)(alreadyconsideringtheouterloop).
Steps2cand2(d)iwillbeanalyzedbasedonthenumberofelementsonthetodolistsandthetimetoprocesstheminanamortizedway.
Notethatthetimecomplexityisdirectlyrelatedtothenumberofelementsthatareprocessedfromthetodolists(wecansimplylooktothemomentthattheyleavealist,astheirinclusioninthelistswillbeinequalnumber).
Wewillnowcountthenumberoftimesweprocessanelementfromatodolist.
Thisnumberisoverallbounded(overallexternalloopcycles)bythenumberoftimeswecanmakeacellofmatrixmturnfromfalsetotrue(whichisO(n2))plusthenumberoftimesweignoreanelementbecausethematrixcellwasalreadysettotrue(whichisatmostO(n)pereachVj,asthisisthemaximumnumberofdescendantsofVjandeachofthemcanfallintothiscategoryonlyonce,soagainthereareO(n2)timesintotal).
Inotherwords,eachelementbeingremovedfromatodolistiseitherignored(matrixalreadysettotrue)oranentryinthematrixofdescendantsischangedfromfalsetotrue,andthiscanonlyhappenO(n2)times.
HencethetotaltimecomplexityisO(Ck+n2),whichisO(Ck)foranyCgreaterthann2/k(averyplausiblescenario,aseachlocalcacheofavariableusuallyhasmorethann/kelements).
Moreover,wehavethefollowinginterestingpropertiesofthisnewmethod.
Theorem3.
Foragivenordering,thenetworkobtainedbyASOBShasscoreequalthanorgreatertothatobtainedbyOBS.
Proof.
ItfollowsimmediatelyfromthefactthattheconsistencyruleofASOBSgeneralizesthatofOBS,thatis,foreachnodeVjwithj=n,1,ASOBSallowsallparentsetsallowedbyOBSandalsoothers(containingback-arcs).
Theorem4.
ForagivenorderingdenedbyV1,VnandacurrentgraphGconsistentwith,ifOBSconsistencyruleallowstheswappingofVj,Vj+1andleadstoimprovingthescoreofG,thentheconsistencyruleofASOBSallowsthesameswappingandachievesthesameimprovementinscore.
Proof.
ItfollowsimmediatelyfromthefactthattheconsistencyruleofASOBSgeneralizesthatofOBS,sofromagivengraphG,ifaswappingispossibleunderOBSrules,thenitisalsopossibleunderASOBSrules.
5ExperimentsWecomparethreedifferentapproachesforparentsetidentication(sequential,greedyselectionandindependenceselection)andthreedifferentapproaches(Gobnilp,OBSandASOBS)forstructureoptimization.
Thisyieldsninedifferentapproachesforstructurallearning,obtainedbycombiningallthemethodsforparentsetidenticationandstructureoptimization.
NotethatOBShasbeenshownin[18]tooutperformothergreedy-tabusearchoverstructures,suchasgreedyhill-climbingandoptimal-reinsertion-searchmethods[15].
Weallowoneminutepervariabletoeachapproachforparentsetidentication.
Wesetthemaximumin-degreetok=6,ahighvaluethatallowslearningevencomplexstructures.
Noticethatournovelapproachdoesnotneedamaximumin-degree.
Wesetamaximumin-degreetoputourapproachanditscompetitorsonthesameground.
Oncecomputedthescoresoftheparentsetsweruneachsolver(Gobnilp,OBS,ASOBS)for24hours.
Foragivendatasetthecomputationisperformedonthesamemachine.
TheexplicitgoalofeachapproachforbothparentsetidenticationandstructureoptimizationistomaximizetheBICscore.
WethenmeasuretheBICscoreoftheBayesiannetworkseventuallyobtainedasperformanceindicator.
ThedifferenceintheBICscorebetweentwoalternativenetworksisanasymptoticapproximationofthelogarithmoftheBayesfactor.
TheBayesfactoristheratiooftheposteriorprobabilitiesoftwocompetingmodels.
LetusdenotebyBIC1,2=BIC1-BIC2thedifferencebetweentheBICscoreofnetwork1andnetwork2.
PositivevaluesofBIC1,2imply6DatasetnDatasetnDatasetnDatasetnAudio100Retail135MSWeb294Reuters-52889Jester100Pumsb-star163Book500C20NG910Netix100DNA180EachMovie500BBC1058Accidents111Kosarek190WebKB839Ad1556Table1:Datasetssortedaccordingtothenumbernofvariables.
evidenceinfavorofnetwork1.
Theevidenceinfavorofnetwork1isrespectively[16]{weak,positive,strong,verystrong}ifBIC1,2isbetween{0and2;2and6;6and10;beyond10}.
5.
1LearningfromdatasetsWeconsider16datasetsalreadyusedintheliteratureofstructurelearning,rstlyintroducedin[13]and[8].
Werandomlyspliteachdatasetintothreesubsetsofinstances.
Thisyields48datasets.
TheapproachesforparentsetidenticationarecomparedinTable2.
Foreachxedstructureop-timizationapproach,welearnthenetworkstartingfromthelistofparentsetscomputedbyinde-pendenceselection(IS),greedyselection(GS)andsequentialselection(SQ).
InturnweanalyzeBICIS,GSandBICIS,SQ.
ApositiveBICmeansthatindependenceselectionyieldsanetworkwithhigherBICscorethanthenetworkobtainedusinganalternativeapproachforparentsetiden-tication;viceversafornegativevaluesofBIC.
Inmostcases(seeTable2)BIC>10,implyingverystrongsupportforthenetworklearnedusingindependenceselection.
Wefurtheranalyzetheresultsthroughasign-test.
ThenullhypothesisofthetestisthattheBICscoreofthenetworklearnedunderindependenceselectionissmallerthanorequivalenttotheBICscoreofthenetworklearnedusingthealternativeapproach(greedyselectionorsequentialselectiondependingonthecase).
IfadatasetyieldsaBICwhichis{verynegative,stronglynegative,negative,neutral},itsupportsthenullhypothesis.
IfadatasetsyieldsaBICscorewhichis{positive,stronglypositive,extremelypositive},itsupportsthealternativehypothesis.
Underanyxedstructuresolver,thesigntestrejectsthenullhypothesis,providingsignicantevidenceinfavorofindependenceselection.
Inthefollowingwhenwefurthercitethesigntestwerefertosametypeofanalysis:thesigntestanalyzesthecountsoftheBICwhichareinfavorandagainstagivenmethod.
Asforstructureoptimization,ASOBSachieveshigherBICscorethanOBSinallthe48datasets,undereverychosenapproachforparentsetidentication.
TheseresultsconrmtheimprovementofASOBSoverOBS,theoreticallyproveninSection4.
InmostcasestheBICinfavorofASOBSislargerthan10.
ThedifferenceinfavorofASOBSissignicant(signtest,p10)443844304432Stronglypositive(610)212120211921Stronglypositive(610inmostcases.
TakeforinstanceGobnilpforstructureopti-mization.
ThenindependenceselectionyieldsaBIC>10in18/20caseswhencomparedtoGSandBIC>10in19/20caseswhencomparedtoSQ.
Similarresultsareobtainedusingtheothersolversforstructureoptimization.
StrongresultssupportalsoASOBSagainstOBSandGobnilp.
Undereveryapproachforparentsetidentication,BIC>10isobtainedin20/20caseswhencomparingASOBSandOBS.
ThenumberofcasesinwhichASOBSobtainsBIC>10whencomparedagainstGobnilprangesbetween17/20and19/20dependingontheapproachadoptedforparentsetselection.
ThesuperiorityofASOBSoverbothOBSandGobnilpissignicant(signtest,ptothou-sandsofnodeswithoutconstraintsonthemaximumin-degree.
ThecurrentresultsrefertotheBICscore,butinfuturethemethodologycouldbeextendedtootherscoringfunctions.
AcknowledgmentsWorkpartiallysupportedbytheSwissNSFgrantn.
200021146606/1.
4http://www.
bnlearn.
com/bnrepository/8References[1]M.
BartlettandJ.
Cussens.
IntegerlinearprogrammingfortheBayesiannetworkstructurelearningproblem.
ArticialIntelligence,2015.
inpress.
[2]D.
M.
Chickering,C.
Meek,andD.
Heckerman.
Large-samplelearningofBayesiannetworksishard.
InProceedingsofthe19stConferenceonUncertaintyinArticialIntelligence,UAI-03,pages124–133.
MorganKaufmann,2003.
[3]G.
F.
CooperandE.
Herskovits.
ABayesianmethodfortheinductionofprobabilisticnetworksfromdata.
MachineLearning,9(4):309–347,1992.
[4]J.
Cussens.
Bayesiannetworklearningwithcuttingplanes.
InProceedingsofthe27stCon-ferenceAnnualConferenceonUncertaintyinArticialIntelligence,UAI-11,pages153–160.
AUAIPress,2011.
[5]J.
Cussens,B.
Malone,andC.
Yuan.
IJCAI2013tutorialonoptimalalgorithmsforlearningBayesiannetworks(https://sites.
google.
com/site/ijcai2013bns/slides),2013.
[6]C.
P.
deCamposandQ.
Ji.
EfcientstructurelearningofBayesiannetworksusingconstraints.
JournalofMachineLearningResearch,12:663–689,2011.
[7]C.
P.
deCampos,Z.
Zeng,andQ.
Ji.
StructurelearningofBayesiannetworksusingconstraints.
InProceedingsofthe26stAnnualInternationalConferenceonMachineLearning,ICML-09,pages113–120,2009.
[8]J.
V.
HaarenandJ.
Davis.
Markovnetworkstructurelearning:Arandomizedfeaturegenerationapproach.
InProceedingsofthe26stAAAIConferenceonArticialIntelligence,2012.
[9]D.
Heckerman,D.
Geiger,andD.
M.
Chickering.
LearningBayesiannetworks:Thecombina-tionofknowledgeandstatisticaldata.
MachineLearning,20:197–243,1995.
[10]T.
Jaakkola,D.
Sontag,A.
Globerson,andM.
Meila.
LearningBayesianNetworkStructureusingLPRelaxations.
InProceedingsofthe13stInternationalConferenceonArticialIntel-ligenceandStatistics,AISTATS-10,pages358–365,2010.
[11]M.
Koivisto.
ParentassignmentishardfortheMDL,AIC,andNMLcosts.
InProceedingsofthe19stannualconferenceonLearningTheory,pages289–303.
Springer-Verlag,2006.
[12]M.
KoivistoandK.
Sood.
ExactBayesianStructureDiscoveryinBayesianNetworks.
JournalofMachineLearningResearch,5:549–573,2004.
[13]D.
LowdandJ.
Davis.
LearningMarkovnetworkstructurewithdecisiontrees.
InGeoffreyI.
Webb,BingLiu0001,ChengqiZhang,DimitriosGunopulos,andXindongWu,editors,Pro-ceedingsofthe10stInt.
ConferenceonDataMining(ICDM2010),pages334–343,2010.
[14]W.
J.
McGill.
Multivariateinformationtransmission.
Psychometrika,19(2):97–116,1954.
[15]A.
MooreandW.
Wong.
Optimalreinsertion:AnewsearchoperatorforacceleratedandmoreaccurateBayesiannetworkstructurelearning.
InT.
FawcettandN.
Mishra,editors,Proceedingsofthe20stInternationalConferenceonMachineLearning,ICML-03,pages552–559,MenloPark,California,August2003.
AAAIPress.
[16]A.
E.
Raftery.
Bayesianmodelselectioninsocialresearch.
Sociologicalmethodology,25:111–164,1995.
[17]T.
SilanderandP.
Myllymaki.
AsimpleapproachforndingthegloballyoptimalBayesiannetworkstructure.
InProceedingsofthe22ndConferenceonUncertaintyinArticialIntelli-gence,UAI-06,pages445–452,2006.
[18]M.
TeyssierandD.
Koller.
Ordering-basedsearch:Asimpleandeffectivealgorithmforlearn-ingBayesiannetworks.
InProceedingsofthe21stConferenceonUncertaintyinArticialIntelligence,UAI-05,pages584–590,2005.
[19]C.
YuanandB.
Malone.
AnimprovedadmissibleheuristicforlearningoptimalBayesiannetworks.
InProceedingsofthe28stConferenceonUncertaintyinArticialIntelligence,UAI-12,2012.
[20]C.
YuanandB.
Malone.
LearningoptimalBayesiannetworks:Ashortestpathperspective.
JournalofArticialIntelligenceResearch,48:23–65,2013.
9

展开全文

proposedto相关文档

模板destoon二次开发模板和调用语法参考

分类destoon调用当前栏目分类及子分类和三级分类的方法

destoondestoon多少人是从PHP168过来的啊？Destoon的B2B很好，强烈支持你们中国企业在线一般都在哪里找企业信息啊？支付宝调整还款日蚂蚁借呗还款日能改吗重庆杨家坪猪肉摊主杀人重庆九龙坡区治安好么 asp.net网页制作如何用DREAMWEAVER ASP.NET 做网页 360免费建站我用的360免费建站，但自己买的一级域名要先备案，360不提供备案，我要怎么做才能把我的域名绑定网站啊？35互联在中国哪家服务商提供的企业邮箱好呢?可信网站可信网站认证怎么做？贵不？价格大概是多少？400电话查询400电话号码可以查询归属地吗？办理400电话是不是很贵？无忧登陆无忧登陆这个软件有毒吗武汉域名注册 warez zpanel linkcloud 火车票抢票攻略 tk域名 xen 网页背景图片 gg广告蜗牛魔方 php空间推荐泉州电信双11秒杀共享主机 cdn加速是什么新睿云彩虹云架设邮件服务器 789 免费asp空间申请更多

proposedto

华纳云，3折低至优惠云服务器，独立服务器/高防御服务器低至6折，免备案香港云服务器CN2 GIA三网直连线路月付18元起，10Mbps带宽不限流量

腾讯云CVM云服务器大硬盘方案400GB和800GB数据盘方案

GeorgeDatacenter：洛杉矶/达拉斯/芝加哥/纽约vps云服务器;2核/8GB/250GB/2TB流量/1Gbps端口,$84/年