VirginiaCommonwealthUniversityVCUScholarsCompassComputerSciencePublicationsDept.
ofComputerScience2017Ensemblelearningfordatastreamanalysis:AsurveyBartoszKrawczykVirginiaCommonwealthUniversity,bkrawczyk@vcu.
eduLeandroL.
MinkuUniversityofLeicesterJooGamaUniversityofPortoJerzyStefanowskiPoznańUniversityofTechnologyMichaWozniakWrocawUniversityofScienceandTechnologyFollowthisandadditionalworksat:http://scholarscompass.
vcu.
edu/cmsc_pubsPartoftheComputerEngineeringCommonsThisArticleisbroughttoyouforfreeandopenaccessbytheDept.
ofComputerScienceatVCUScholarsCompass.
IthasbeenacceptedforinclusioninComputerSciencePublicationsbyanauthorizedadministratorofVCUScholarsCompass.
Formoreinformation,pleasecontactlibcompass@vcu.
edu.
Downloadedfromhttp://scholarscompass.
vcu.
edu/cmsc_pubs/39InformationFusion37(2017)132–156ContentslistsavailableatScienceDirectInformationFusionjournalhomepage:www.
elsevier.
com/locate/inffusEnsemblelearningfordatastreamanalysis:AsurveyBartoszKrawczyka,,LeandroL.
Minkub,JooGamac,JerzyStefanowskid,MichaWozniakeaDepartmentofComputerScience,VirginiaCommonwealthUniversity,Richmond,VA23284,USAbDepartmentofComputerScience,UniversityofLeicester,Leicester,UKcLaboratoryofArticialIntelligenceandDecisionSupport,UniversityofPorto,Porto,PortugaldInstituteofComputingScience,PoznanUniversityofTechnology,60-965Poznan,PolandeDepartmentofSystemsandComputerNetworks,WrocawUniversityofScienceandTechnology,Wyb.
Wyspianskiego27,50-370Wrocaw,PolandarticleinfoArticlehistory:Received15December2016Revised31January2017Accepted1February2017Availableonline3February2017Keywords:EnsemblelearningDatastreamsConceptdriftOnlinelearningNon-stationaryenvironmentsabstractInmanyapplicationsofinformationsystemslearningalgorithmshavetoactindynamicenvironmentswheredataarecollectedintheformoftransientdatastreams.
Comparedtostaticdatamining,process-ingstreamsimposesnewcomputationalrequirementsforalgorithmstoincrementallyprocessincomingexampleswhileusinglimitedmemoryandtime.
Furthermore,duetothenon-stationarycharacteristicsofstreamingdata,predictionmodelsareoftenalsorequiredtoadapttoconceptdrifts.
Outofseveralnewproposedstreamalgorithms,ensemblesplayanimportantrole,inparticularfornon-stationaryen-vironments.
Thispapersurveysresearchonensemblesfordatastreamclassicationaswellasregressiontasks.
Besidespresentingacomprehensivespectrumofensembleapproachesfordatastreams,wealsodiscussadvancedlearningconceptssuchasimbalanceddatastreams,noveltydetection,activeandsemi-supervisedlearning,complexdatarepresentationsandstructuredoutputs.
Thepaperconcludeswithadiscussionofopenresearchproblemsandlinesoffutureresearch.
PublishedbyElsevierB.
V.
1.
IntroductionTheanalysisofhugevolumesofdataisrecentlythefocusofintenseresearch,becausesuchmethodscouldgiveacompetitiveadvantageforagivencompany.
Forcontemporaryenterprises,thepossibilityofmakingappropriatebusinessdecisionsonthebasisofknowledgehiddeninstoreddataisoneofthecriticalsuccessfactors.
Similarinterestsinexploringnewtypesofdataarepresentinmanyotherareasofhumanactivity.
Inmanyoftheseapplications,oneshouldalsotakeintocon-siderationthatdatausuallycomescontinuouslyintheformofdatastreams.
Representativeexamplesincludenetworkanaly-sis,nancialdataprediction,traccontrol,sensormeasurementprocessing,ubiquitouscomputing,GPSandmobiledevicetrack-ing,user'sclicklogmining,sentimentanalysis,andmanyothers[19,59,60,203,208].
DatastreamsposenewchallengesformachinelearninganddataminingasthetraditionalmethodshavebeendesignedforstaticdatasetsandarenotcapableofecientlyanalyzingfastCorrespondingauthor.
E-mailaddresses:bkrawczyk@vcu.
edu(B.
Krawczyk),leandro.
minku@leicester.
ac.
uk(L.
L.
Minku),jgama@fep.
up.
pt(J.
Gama),jerzy.
stefanowski@cs.
put.
poznan.
pl(J.
Stefanowski),michal.
wozniak@pwr.
edu.
pl(M.
Wozniak).
growingamountsofdataandtakingintoconsiderationcharacter-isticssuchas:Limitedcomputationalresourcesasmemoryandtime,aswellastightneedstomakepredictionsinreasonabletime.
Thephenomenoncalledconceptdrift,i.
e.
,changesindistribu-tionofdatawhichoccurinthestreamovertime.
Thiscoulddramaticallydeteriorateperformanceoftheusedmodel.
Datamaycomesoquicklyinsomeapplicationsthatlabelingallitemsmaybedelayedorsometimesevenimpossible.
Outofseveraltasksstudiedindatastreams[60],supervisedclassicationhasreceivedthemostresearchattention.
Itisoftenappliedtosolvemanyreallifeproblemssuchasdiscoveringclientpreferencechanges,spamltering,frauddetection,andmedicaldi-agnosistoenumerateonlyafew.
Theaforementionedspeed,sizeandevolvingnatureofdatastreamsposetheneedfordevelop-ingnewalgorithmicsolutions.
Inparticular,classiersdedicatedtodatastreamshavetopresentadaptationabilities,becausethedis-tributionofthedatainmotioncanchange.
Totacklethesechal-lenges,severalnewalgorithms,suchasVFDT[44],specializedslid-ingwindows,samplingmethods,driftdetectorsandadaptiveen-sembleshavebeenintroducedinthelastdecade.
Inouropinion,ensemblemethodsareoneofthemostpromis-ingresearchdirections[188].
Anensemble,alsocalledamultipleclassierorcommittee,isasetofindividualcomponentclassi-http://dx.
doi.
org/10.
1016/j.
inffus.
2017.
02.
0041566-2535/PublishedbyElsevierB.
V.
B.
Krawczyketal.
/InformationFusion37(2017)132–156133Fig.
1.
Adiagramoftheclassierensemble.
erswhosepredictionsarecombinedtopredictnewincomingin-stances.
Ensembleshavebeenshowntobeanecientwayofim-provingpredictiveaccuracyor/anddecomposingacomplex,di-cultlearningproblemintoeasiersub-problems.
ThemainmotivationforusingclassierensemblesisthenofreelunchtheoremformulatedbyWolpert[185].
Accordingtoit,thereisnotasingleclassierthatisappropriateforallthetasks,sinceeachalgorithmhasitsowndomainofcompetence.
Usually,wehaveapoolofclassiersatourdisposaltosolveagivenproblem.
Turner[176]showedthataveragingoutputsofaninnitenum-berofunbiasedandindependentclassiersmayleadtothesameresponseastheoptimalBayesclassier[48].
Ho[75]underlinedthatadecisioncombinationfunctionmustreceiveusefulrepresen-tationofeachindividualdecision.
Specically,theyconsideredsev-eralmethodsbasedondecisionranks,suchasBordacount.
WealsohavetomentionanotherofHo'swork[74],whodis-tinguishedtwomainapproachestodesignaclassierensemble:Coverageoptimizationfocusesonthegenerationofasetofmu-tuallycomplementaryclassiers,whichmaybecombinedtoachieveoptimalaccuracyusingaxeddecisioncombinationfunction.
Decisionoptimizationconcentratesondesigningandtraininganappropriatedecisioncombinationfunction,whileasetofindi-vidualmodelsisgiveninadvance[151].
Otherimportantissuesthathavebetakenintoconsiderationwhenbuildingclassierensemblesarethefollowing:Proposinginterconnectionsamongindividualclassiersintheensemble.
Selectingapoolofdiverseandcomplementaryindividualclas-siersfortheensemble.
Proposingacombinationrule,responsibleforthenaldeci-sionoftheensemble,whichshouldexploitthestrengthsofthecomponentclassiers.
ThegeneraldiagramofaclassierensembleisdepictedinFig.
1.
Theselectionofclassiersfortheensembleisakeyfactor.
Anidealensembleincludesmutuallycomplementaryindividualclassi-erswhicharecharacterizedbyhighdiversityandaccuracy[106].
Itisgenerallyagreedthatnotonlytheaccuracy,butalsothedi-versityoftheclassiersisakeyingredientforincreasingtheen-semble'saccuracy[195].
Classiersmustbeselectedtoobtainpos-itiveresultsfromtheircombination.
Sharkleyetal.
[159]proposedfourlevelsofdiversitybasedonthemajorityvoterule,coincidenterror,andthepossibilityofatleastonecorrectanswerofensem-blemembers.
Brownetal.
[24]reectedthatitisinappropriateforthecasewherediversityofanensembleisdifferentinvarioussubspacesofthefeaturespace.
Forcomprehensivereviewsonen-semblemethodsdevelopedforstaticdatasetssee,e.
g.
,[108].
Classierensemblesareanattractiveapproachtoconstructdatastreamclassiers,becausetheyfacilitateadaptationtochangesinthedatadistribution.
Theiradaptationcouldbedonebychangingtheline-upoftheensemble,e.
g.
,byaddingcompo-nentsclassierstrainedonthemostrecentdataand/orremovingtheoutdatedclassiers,orbyretrainingtheensemblecomponents.
Thereareseveralinterestingbooksorsurveysonthedatastreamanalysisandclassication,butmostofthemfocusongen-eralmethodsofdatastreamanalysis,notdedicatingtoomuchspacetoensembleapproaches[43,60,64,114,131],andsomehavebeenwrittenseveralyearsago[59,107,109].
Therefore,thereisstillagapinthisliteraturewithrespecttopresentthedevelopmentinlearningensemblesfromdatastreams.
Thissurveyaimstollthisgap.
Itisalsoworthmentioningthework[105,207],wheredatastreamminingchallengeshavebeendiscussed.
Wewilldiscussopenresearchproblemsandlinesoffutureresearchinthespecicareaofensembleapproachesfordatastreams.
Wewillpaythemostattentiontoclassierensembles,giventhatmostexistingliteratureisinthisarea.
However,wewillalsodiscussresearchonregression(orpredictionmodel)ensembles.
Furthermore,wewillreviewrecentensembleapproachesdedi-catedtovariousmorecomplexdatarepresentationsinstreams.
Thissurveyisorganizedasfollows.
Section2focusesonthemaincharacteristicsofdatastreamsandmethodsdedicatedtotheiranalysis,aswellasonthetypeofdatastreamsanddriftde-tectionmethods.
Section3presentsmethodsforevaluatingclas-siersoverstreamingdata.
InSection4,acomprehensivesurveyonensembletechniquesforclassicationandregressionproblemsispresented.
Section5enumeratesadvancedproblemsfordatastreammining,suchasimbalanceddata,noveltydetection,one-classclassication,andactivelearning,aswellasfocusesonnon-standardandcomplexdatarepresentationsorclassstructures.
Thenalsectiondrawsopenchallengesinthiseldforfutureresearch.
2.
DatastreamcharacteristicsInthissectionwewillprovideageneraloverviewofthedatastreamdomain,discussingdifferenttypesofstreamingdata,learn-ingframeworksusedforitsanalysis,andtheissueofchangesinthedatastreamdistribution,knownasconceptdrift.
2.
1.
GeneralissuesAdatastreamisapotentiallyunbounded,orderedsequenceofdataitemswhicharriveovertime.
Thetimeintervalsbetweenthearrivalofeachdataitemmayvary.
Thesedataitemscanbesimpleattribute-valuepairslikerelationaldatabasetuples,ormorecom-plexstructuressuchasgraphs.
Themaindifferencesbetweendatastreamsandconventionalstaticdatasetsinclude[11,60,169]:dataitemsinthestreamappearsequentiallyovertime,thereisnocontrolovertheorderinwhichdataitemsarriveandtheprocessingsystemshouldbereadytoreactatanytime,thesizeofthedatamaybehuge(streamsarepossiblyofin-nitelength);itisusuallyimpossibletostoreallthedatafromthedatastreaminmemory,usuallyonlyonescanofitemsfromadatastreamispossible;whentheitemisprocesseditisdiscardedorsometimesstoredifnecessary,oraggregatedstatisticsorsynopsesarecalculated,thedataitemsarrivalrateisrapid(relativelyhighwithrespecttotheprocessingpowerofthesystem),datastreamsaresusceptibletochange(datadistributionsgen-eratingexamplesmaychangeonthey),thedatalabelingmaybeverycostly(orevenimpossibleinsomecases),andmaynotbeimmediate.
Thesedatastreamcharacteristicsposetheneedforotheralgo-rithmsthanonespreviouslydevelopedforbatchlearning,where134B.
Krawczyketal.
/InformationFusion37(2017)132–156Fig.
2.
Differencebetweenincrementalandblockbaseclassierupdating.
dataarestoredinnite,persistentdatarepositories.
Typicalbatchlearningalgorithmsarenotcapableoffulllingallofthedatastreamrequirementssuchasconstraintsofmemoryusage,re-strictedprocessingtime,andonescanofincomingexamples[25].
Notethatsomealgorithms,likeNaveBayes,instancebasedlearn-ingorneuralnetworksarenaturallyincrementalones.
However,simpleincrementallearningistypicallyinsucient,asitdoesnotmeettightcomputationaldemandsanddoesnottackleevolvingnatureofdatasources[60].
Constraintsonmemoryandtimehaveresultedinthedevelop-mentofdifferentkindsofwindowingtechniques,sampling(e.
g.
reservoirsampling)andothersummarizationapproaches.
How-ever,thedistributioninthedatasourcegeneratingthestreamdataitemsmaychangeovertime.
Thus,incaseofnon-stationarydatastreams,datafromthepastcanbecomeirrelevantorevenharm-fulforthecurrentsituation,deterioratingpredictionsoftheclas-siers.
Datamanagementapproachescanplaytheroleofaforget-tingmechanismwhereolddatainstancesarediscarded.
2.
2.
TypesofdatastreamsandlearningframeworksIfacompletelysupervisedlearningframeworkisconsidered,itisassumedthataftersometimethetruetargetoutputvalueytoftheexampleisavailable.
Thus,datastreamSisasequenceoflabeledexampleszt=(xt,yt)fort=1,2,T.
Usually,xisavectorofat-tributevalues,andyiseitheradiscreteclasslabel(y∈{K1,Kl})forclassicationproblemsornumericoutput(independent)val-uesforregressionproblems.
Thegeneraltaskistolearnfromthepastdata(atrainingsetofexamples)therelationshipbetweenthesetofattributesandthetargetoutput.
Inthecaseofclassi-cation,thisrelationshipcorrespondstodiscoveredclassicationknowledgeanditisoftenusedasclassierCtodeterminetheclasslabelforthenewcomingexamplext.
Inthecaseofregres-sion,thelearnedmodelisusedtopredictanumericvalue.
Notethattheclassierortheregressionmodelissupposedtoprovideitspredictionatanytimebasedonwhatithaslearnedfromthedataitems{z1,z2,zt}seensofar.
Thispredictionytandtruetargetvalueytcanbeusedbythelearningalgorithmasadditionallearninginformation.
Asmostofthecurrentresearchondatastreamensemblescon-cernsclassication,wewillpresenttheremainingofthissectionusingtheclassicationterminology.
However,nearlyalloftheseissuesarealsovalidforregressioncases.
Themajorityofproposedalgorithmsforlearningstreamclas-siersfollowthesupervisedframework(i.
e.
withacompleteandimmediateaccesstoclasslabelsforallprocessedexamples).
How-ever,insomeapplicationstheassumptionofacompletelabelingoflearningexamplesmaybeunrealisticorimpractical,astheclasslabelsofnewlycomingexamplesindatastreamsarenotimmedi-atelyavailable.
Forinstance,inthenancialfrauddetection,infor-mationonfraudtransactionsisusuallyknownafteralongdelay(e.
g.
whenanaccountholderreceivesthemonthlyreport[52]),whileforacreditapprovalproblemthetruelabelisoftenavail-ableafter2–3years.
Moreover,theacquiringoflabelsfromex-pertsiscostlyandneedssubstantialefforts[204].
Thereforesomeresearchersconsiderotherframeworkssuchas:learningwithdelayedlabelingwhenanaccesstotrueclassla-belsisavailablemuchlaterthanitisexpected;theclassiermayadapttothestreamearlierwithoutknowingit[104],semi-supervisedlearningwherelabelsarenotavailableforallincomingexamples;Theyareprovidedinlimitedportionsfromtimetotime;alternatively,thesystememploysanactivelearn-ingtechnique,whichselectsunlabeledexamplesforacquiringtheirlabels[52,97,110,204],unsupervisedframeworkorlearningfrominitiallylabeledex-amples;Aninitialclassierislearnedfromalimitednumberoflabeledtrainingexamples,andthenitprocessestheupcom-ingstreamofunlabeledexampleswithoutanyaccesstotheirlabels[49].
WewillcometotheseissuesinSection5.
3.
Examplesfromthedatastreamareprovidedeitheronline,i.
e.
,instancebyinstance,orintheformofdatachunks(portions,blocks).
Intherstapproach,algorithmsprocesssingleexam-plesappearingonebyoneinconsecutivemomentsintime,whileintheotherapproach,examplesareavailableonlyinlargersetscalleddatablocks(ordatachunks)S=B1∪B2Bn.
Blocksareusuallyofequalsizeandtheconstruction,evaluation,orupdat-ingofclassiersisdonewhenallexamplesfromanewblockareavailable.
Thisdistinctionmaybeconnectedwithsupervisedorsemi-supervisedframeworks.
Forinstance,insomeproblemsdataitemsaremorenaturallyaccumulatedforsometimeandlabeledinblockswhileanaccesstoclasslabelsinanonlinesetupismoredemanding.
Moreover,thesetypesofprocessingexamplesalsoin-uencetheevaluationofclassiers.
Bothdiscussedmodesarede-pictedinFig.
2.
2.
3.
Stationaryandnon-stationary(drifting)datastreamsTwobasicmodelsofdatastreamsareconsidered:stationary,whereexamplesaredrawnfromaxed,albeitunknown,proba-bilitydistribution,andnon-stationary,wheredatacanevolveovertime.
Inthesecondcase,targetconcepts(classesofexamples)and/orattributedistributionschange.
Inotherwords,theconceptfromwhichthedatastreamisgeneratedshiftsafteraminimumstabilityperiod[60].
Thisphenomenoniscalledconceptdrift,a.
k.
a,covariantshift.
Conceptdriftsarereectedintheincomingin-stancesanddeterioratetheaccuracyofclassiers/regressionmod-elslearnedfrompasttraininginstances.
Typicalreallifestreamsaffectedbyconceptdriftcouldinclude[200]:B.
Krawczyketal.
/InformationFusion37(2017)132–156135Fig.
3.
Typeofdrifts.
computerortelecommunicationsystems,whereattackerslookfornewwaysofovercomingsecuritysystems,tracmonitoring,wheretracpatternsmaychangeovertime,Weatherpredictions,whereclimatechangesandnaturalanomaliesmayinuencetheforecast,systemfollowingpersonalinterests,likepersonaladvertise-ment,whereusersmaychangetheirpreferences,andmedicaldecisionaiding,wherediseaseprogressionmaybein-uencedandchangedinresponsetoapplieddrugsornaturalresistanceofthepatients.
Otherexamplesofreallifeconceptdriftsincludespamcatego-rization,objectpositioning,industrialmonitoringsystems,nancialfrauddetection,androbotics;andtheyarereviewedintherecentsurvey[208].
Conceptdriftcanbedenedfromtheperspectiveofhid-dendatacontexts,whichareunknowntothelearningalgorithm.
Zliobaitealsocallsitanunforeseenchangeasthechangeisun-expectedwithrespecttothecurrentdomainknowledgeorprevi-ouslearningexamples[200].
However,amoreprobabilisticviewonthismatterisusuallypresented,e.
g.
[60,183].
Ineachpointintimet,everyexampleisgeneratedbyasourcewithajointprobabilitydistributionPt(x,y).
Conceptsindataarestableorstationaryifallexamplesaregeneratedbythesamedis-tribution.
If,fortwodistinctpointsintimetandt+,thereexitsxsuchthatPt(x,y)=Pt+(x,y),thenconceptdrifthasoccurred.
DifferentcomponentsofPt(x,y)maychange[60].
Inparticu-lar,whenconceptdriftoccurs,eitheroneorbothofthefollowingchanges:priorprobabilitiesofclassesP(y),classconditionalprobabilitiesP(x|y).
Asaresult,posteriorprobabilitiesoftheclassesP(y|x)may(ormaynot)change.
Basedonthecauseandeffectofthesechanges,twotypesofdriftaredistinguished:realdriftandvirtualdrift.
ArealdriftisdenedasachangeinP(y|x).
Itisworthnot-ingthatsuchchangescanoccurwithorwithoutchangesinP(x).
Therefore,theymayormaynotbevisiblefromthedatadistri-butionwithoutknowingthetrueclasslabels.
Suchadistinctioniscrucial,assomemethodsattempttodetectconceptdriftsusingsolelyinputattributevalues.
Realdrifthasalsobeenreferredtoasconceptshiftandconditionalchange[64].
Avirtualdriftisusuallydenedasachangeintheattribute-valueP(x),orclassdistributionsP(y)thatdoesnotaffectdecisionboundaries.
Insomeworkvirtualdriftisdenedasachangethatdoesnotaffecttheposteriorprobabilities,butitishardtoimag-inethatP(x)ischangedwithoutchangingP(y|x)=P(y)P(x|y)P(x)inrealworldapplications.
However,thesourceandthereforethein-terpretationofsuchchangesdiffersamongauthors.
WidmerandKubat[184]attributedvirtualdrifttoincompletedatarepresen-tationratherthantotruechangesinconcepts.
Tsymbal[175]ontheotherhanddenedvirtualdriftaschangesinthedatadis-tributionthatdonotmodifythedecisionboundary,whileDelany[40]describeditasadriftthatdoesnotaffectthetargetconcept.
Furthermore,virtualdriftshavealsobeencalledtemporarydrifts,samplingshiftsorfeaturechanges[25].
Mostcurrentresearchonlearningclassiersfromevolvingstreamsconcentratesonrealdrifts.
However,itisworthmention-ingthatevenifthetrueclassboundariesdonotchangeinvirtualdrifts,thistypeofdriftmaystillresultinthelearntclassbound-ariestobecomeinadequate.
Therefore,techniquesforhandlingrealdriftsmaystillworkforcertaintypesofvirtualdrifts.
Ifposteriorprobabilitiesdonotchange,itisworthlesstorebuildthemodel,becausethedecisionboundariesarestillthesame.
Virtualdriftde-tectionisalsoimportant,becauseeventhoughitdoesnoteffectthedecisionboundariesoftheclassier,itswronginterpretation(i.
e.
,detectingandclassifyingasrealdrift)couldprovidewrongdecisionaboutclassierretraining.
Apartfromdifferencesinthecauseandeffectofconceptchanges,researchersdistinguishbetweenseveralwaysofhowsuchchangesoccur.
Conceptdriftscanbefurthercharacterized,forex-ample,bytheirpermanence,severity,predictability,andfrequency.
ThereaderisalsoreferredtotherecentpaperbyHydeetal.
[183],whichistherstattempttoprovidethemoreformalframeworkforcomparingdifferenttypesofdriftsandtheirmainproperties.
Theseauthorsalsoproposedanew,quitecomprehensivetaxonomyofconceptdrifttypes.
Themostpopularcategorizationsincludesudden(abrupt)andgradualdrifts[175].
Thersttypeofdriftoccurswhen,atamo-mentintimet,thesourcedistributioninStissuddenlyreplacedbyadifferentdistributioninSt+1.
Gradualdriftsarenotsorad-icalandareconnectedwithaslowerrateofchanges,whichcanbenoticedwhileobservingadatastreamforalongerperiodoftime.
Additionally,someauthorsdistinguishtwotypesofgradualdrift[126].
Thersttypeofgradualdriftreferstothetransitionphasewheretheprobabilityofsamplingfromtherstdistribu-tionPjdecreaseswhiletheprobabilityofgettingexamplesfromthenextdistributionPj+1increases.
Theothertype,calledincre-mental(stepwise)drift,consistsofasequenceofsmall(i.
e.
,notsevere)changes.
Aseachchangeissmall,thedriftmaybenoticedonlyafteralongperiodoftime,evenifeachsmallchangeoccurssuddenly.
Insomedomains,situationswhenpreviousconceptsreappearaftersometimeareseparatelytreatedandanalyzedasrecurrentdrifts.
Thisre-occurrenceofdriftscouldbecyclic(conceptsreoccurinaspecicorder)ornot[175].
Moreover,datastreamsmaycon-tainblips(rareevents/outliers)andnoise,butthesearenotconsid-eredasconceptdriftsanddatastreamclassiersshouldberobusttothem.
ThedifferencesamongthedriftsaredepictedinFig.
3.
Someotherdriftcharacteristicsarealsoconsideredintheliter-ature.
Typically,realconceptdriftconcernschangesforallexam-plesbutitcouldbealsoasub-conceptchangewheredriftislim-136B.
Krawczyketal.
/InformationFusion37(2017)132–156Fig.
4.
Theideaofthemodelrestorationtime.
Fig.
5.
Theideaofdriftdetectionbasedontrackingclassiererrors.
itedtoasubspaceofadomain–seediscussionsonthedriftsever-ityin[126].
Moreover,inreallifesituations,conceptdriftsmaybeacomplexcombinationofmanytypesofbasicdrifts.
Formoreinformationontheseandotherchangesinunderly-ingdatadistributions,thereaderisreferredto[60,64,114,175,183].
Thesestudies,andmoreapplicationorientedpapers,suchas[208],demonstratethattheproblemofconceptdrifthasalsobeenrecognizedandaddressedinmultipleapplicationareas.
Thisshowsthestrongrequirementforstreamingclassierstobecapableofpredicting,detecting,andadaptingtoconceptdrifts.
2.
4.
DriftdetectionmethodsConceptdriftdetectorsaremethods,whichonthebasisofinformationaboutclassier'sperformanceortheincomingdataitemsthemselves,cansignalthatdatastreamdistributionsarechanging.
Suchsignalsusuallytriggerupdating/retrainingofthemodel,orsubstitutingtheoutdatedmodelbythenewone.
Ouraimisontheonehandtoreducethemaximumperformancede-teriorationandontheotherhandtominimizeso-calledrestorationtime(seeFig.
4).
Thedetectorsmayreturnnotonlysignalsaboutdriftdetection,butalsowarningsignals,whichareusuallytreatedasamomentwhenachangeissuspectedandanewtrainingsetrepresentingthenewconceptshouldstartbeinggathered.
Theideaofdriftde-tectionispresentedinFig.
5.
Driftdetectionisnotatrivialtask,becauseontheonehandwerequiresucientlyfastdriftdetectiontoquicklyreplaceoutdatedmodelandtoreducetherestorationtime.
Ontheotherhandwedonotwanttoomanyfalsealarms[69].
Therefore,toassessacon-ceptdriftdetector'sperformance,thefollowingmetricsareusuallyconsidered:numberoftruepositivedriftdetections,numberoffalsealarms,i.
e.
,falsepositivedriftdetections,driftdetectiondelay,i.
e.
,timebetweenrealdriftappearanceanditsdetection.
Onedicultyarisesbecausethereistypicallyatrade-offbe-tweendifferentmetrics.
Forinstance,adriftdetectorcantypicallybetunedtodecreasethedetectiondelay,butthismayleadtoahighernumberoffalsealarms.
Inviewofthat,Alippietal.
[7]haverecentlyusedthefollowingproceduretoevaluatetheirdriftdetec-tionmethodwhenusingarticialdatastreams.
Theygeneratesastreamthatcontainsenoughinstancesafteradriftsothatdriftsarealwaysdetectedbyalldriftdetectionmethodsbeingevaluated.
Theythenplottedthenumberoffalsealarmsversusthedriftde-tectiondelayforalldriftdetectors,usingseveraldifferentparame-tercongurations.
ThisleadtoacurvethatresemblestheReceiverOperatingCharacteristicscurve,butusedtoevaluatedriftdetectionmethodsratherthanclassiers.
Inafewpapersaggregatedmeasures,whichtakeintoconsid-erationtheaforementionedmetrics,arealsoproposed.
ItisworthmentioningtheworkofPesaranghaderandVictor[141],wheretheacceptabledelaylengthwasdenedtodeterminehowfarthede-tecteddriftcouldbefromthetruelocationofdrift,forbeingcon-sideredasatruepositive.
Arecentexperimentalframeworkforthedriftdetectionevaluationcanbefoundin[89].
Theauthorsof[64]proposetocategorizethedriftdetectorsintothefollowingfourmaingroups:1.
DetectorsbasedonStatisticalProcessControl.
2.
Detectorsbasedonthesequentialanalysis.
3.
Methodsmonitoringdistributionsoftwodifferenttimewin-dows.
4.
Contextualapproaches.
Inthenextparagraphs,webrieydescribeafewdriftdetectionmethods.
DDM(DriftDetectionMethod)[62]isthemostwellknownrep-resentativeoftherstcategory.
Itestimatesclassiererror(anditsstandarddeviation),which(assumingtheconvergenceoftheclassiertrainingmethod)hastodecreaseasmoretrainingexam-plesarereceived[147].
Iftheclassiererrorisincreasingwiththenumberoftrainingexamples,thenthissuggestsaconceptdrift,andthecurrentmodelshouldberebuilt.
Moretechnically,DDMgeneratesawarningsignaliftheestimatederrorplustwiceitsde-viationreachesawarninglevel.
Ifthewarninglevelisreached,newincomingexamplesarerememberedinaspecialwindow.
Ifafterwardstheerrorfallsbelowthewarningthreshold,thiswarn-ingistreatedasafalsealarmandthisspecialwindowisdropped.
However,ittheerrorincreaseswithtimeandreachesthedriftlevel,thecurrentclassierisdiscardedandanewoneislearnedfromtherecentlabeledexamplesstoredinthewindow.
Notethatthisdetectionideamaybealsousedtoestimatetimeintervalbe-tweenthewarninganddriftdetection,whereshortertimesindi-cateahigherrateofchanges.
EDDM(EarlyDriftDetectionMethod)isamodicationofDDMtoimprovethedetectionofgradualdrifts[10].
Thesameideaofwarninganddriftlevelsisrealizedwithanewproposalofcom-paringdistancesoferrorrates.
YetanotherdetectorECDDemploystheideaofobservingchangesintheexponentiallyweightedmov-ingaverage[152].
Thesequentialprobabilityratiotests,suchastheWaldtest,arethebasisfordetectorsbelongingtothesecondcategory.
Thecumulativesumapproach(CUSUM)[138]detectsachangeofagivenparametervalueofaprobabilitydistributionandindicatesB.
Krawczyketal.
/InformationFusion37(2017)132–156137whenthechangeissignicant.
Astheparametertheexpectedvalueoftheclassicationerrorcouldbeconsidered,whichmaybeestimatedonthebasisoflabelsofincomingexamplesfromdatastream.
AcomprehensiveanalysisoftherelationshipbetweenCUSUM'sparametersanditsperformancewaspresentedin[64].
PageHinkleyismodicationoftheCUSUMalgorithm,wherethecumulativedifferencebetweenobservedclassiererroranditsav-erageistakenintoconsideration[156].
Yetotherdriftdetectorsbasedonnon-parametricestimationofclassiererroremployingHoeffding'sandMcDiarmid'sinequalitieswereproposedin[22].
ADWINisthebestknownrepresentativeofmethodscomparingtwoslidingwindows.
Inthisalgorithm[14]awindowofincomingexamplesgrowsuntilidentifyingachangeintheaveragevaluein-sidethewindow.
Whenthealgorithmsucceedsatndingtwodis-tinctsub-windows,theirsplitpointisconsideredasanindicationofconceptdrift.
Besidestheuseofparametrictestsforconceptdriftdetection,somenon-parametrictestshavealsobeeninvestigated,suchasthecomputationalintelligencecumulativesumtest[8]andtheinter-sectionofcondenceintervals-basedchangedetectiontest[6].
Alippipresentsaninterestingcomparisonofdifferenttrigger-ingmechanismsforconceptdriftdetection[5].
Itisworthnotingthatdriftdetectorsfrequentlyrelyoncontinuousaccesstoclasslabels,whichusuallycannotbegrantedfromthepracticalpointofview.
Therefore,duringconstructingtheconceptdriftdetectorswehavetotakeintoconsiderationthecostofdatalabeling,whichisusuallypassedover.
Averyinterestingwaytodesigndetectorsistoemploytheactivelearningparadigm[68]orunlabeledexamplesonly.
Unsuperviseddetectionofvirtualconceptdriftismostoftenperformedwithstatisticaltests[120],whichcheckwhetheracur-rentdataportioncomesfromthesamedistributionastherefer-encedata.
Obviously,notallstatisticaltestsaresuitedforthistask,e.
g.
,two-sampleparametrictestssuchasaT2statistic[79]assumeaspecicdistribution,whichmightnotbeacorrectapproachintherealdatacase.
Also,thedistributionsmaynotbesimilartoanystandarddistribution,whatmoreoversuggestsnon-parametrictestsforthetaskofunsupervisedconceptdriftdetection.
Examplesofsuchtestsinclude[164]:CNFDensityEstimationtestintroducedin[45],describesthedatabyvectorsofbinaryfeatures,assignedbydiscretizingat-tributesintosetsofbins.
Then,itcreatesasetofBooleanattributes,whichcoversalloftheexamplesinthereferencedataset,meaningthateachtruefeatureinattributesetisthesameasinatleastoneofthevectorsdescribingthedatapointsinthereferenceset.
Next,anothersetofdataisdrawnfromthesamedistributionasthedatainthereferenceset,representedasbinaryvectors,andcomparedtotheattributesetbyapplyingaMatt–Whitneytest.
Ifthedifferenceisinsignicant,alldataisconsideredtocomefromthesamedistribution,otherwiseadifferenceindistributionsisdetected.
ThemultivariateversionoftheWald–Wolfowitztest[57]con-structsacompletegraph,withexamplesasverticesanddis-tancesbetweenthemasedges.
Thisgraphisthentransformedintoaforestandateststatisticiscomputedbasingontheamountoftrees.
Furthermore,non-parametricunivariatestatisticaltestsareof-tenusedfordetectingconceptdriftindatadistribution[160]:Two-sampleKolmogorov–Smirnovtest,Wilcoxonranksumtest,Two-samplet-test.
Unfortunately,itiseasytoshowthatwithoutaccesstoclasslabelstherealdriftcouldbeundetected[163]iftheyarenotasso-ciatedtochangesinP(x).
Asyetnotsomanypapersdealwithcombineddriftdetectors.
Bifetetal.
[21]proposedthesimplecombinationrulesbasedontheappearanceofdriftonceignoringsignalsaboutwarninglevel.
ItisworthmentioningDriftDetectionEnsemble[119],whereasmallensembleofdetectorsisusedtomakeadecisionaboutthedriftandSelectiveDetectorEnsemble[46]basedonaselectivede-tectorensembletodetectbothabruptandgradualdrifts.
Someex-perimentalstudiesshowedthatsimpledetectorensemblesdonotperformbetterthansimpledriftdetectionmethods[191].
3.
EvaluationindatastreamanalysisProperevaluationofclassiersorregressionmodelsisakeyissueinmachinelearning.
Manyevaluationmeasures,techniquesfortheirexperimentalestimationandapproachestocompareal-gorithmshavealreadybeenproposedforstaticdatascenarios.
Acomprehensivereviewispresentedin[88].
Inthecontextofdatastreammining,especiallyinnon-stationaryenvironments,newsolutionsareneeded.
Whileevalu-atingpredictiveability,itisnecessarytoconsiderbothincremen-talprocessingaswellasevolvingdatacharacteristicsandtheclas-sierreactionstochanges.
Newclassesmayappear,featurespacechangesanddecisionrulesloserelevanceovertime.
Moreover,oneshouldtakeintoaccountcomputationalaspectssuchasprocessingtime,recoveryofthemodelafterthechange,andmemoryusage.
Fastupdatingofalearningmodelandgradualrecoveryisoftenmorereasonablethangatheringdataforalongerperiodoftimeandtryingtorebuildthemodelinasingletimeconsumingstep.
Insteadofexaminingpointoraveragepredictionmeasuresoftheclassier,oneisusuallymoreinterestedintrackingitsworkingcharacteristicsoverthecourseofstreamprogression.
Theauthorsofseveralpapersoftenpresentgraphicalplotsforagivendatasetpresentingthealgorithms'functioningintermsofthechosenevaluationmeasure,suchase.
g.
trainingtime,testingtime,memoryusage,andclassicationaccuracyovertime.
Bypre-sentingthemeasurescalculatedaftereachdatachunkorsingleexampleonthey-axisandthenumberofprocessedtrainingex-amplesonthex-axis,onecanexaminethedynamicsofagivenclassier,inparticular,itsreactionstoconceptdrift.
Suchplotsalsonicelysupportacomparativeanalysisofseveralalgorithms.
Additionally,onemustalsoconsidertheavailabilityofinforma-tionregardingthetruetargetvaluesofincomingexamples.
Themajorityofcurrentmeasuresandevaluationtechniquesassumeimmediateornottoomuchdelayedaccesstotheselabels.
How-ever,insomereallifeproblems,thisassumptionisunrealistic.
Itisalsoworthmentioningthatathoroughevaluationofpre-dictivemodelsinnon-stationaryenvironmentstypicallyrequirestheuseofnotonlyrealworlddatastreams,butalsodatastreamswitharticiallygeneratedconceptdrifts.
Realworlddatastreamsenableustoevaluatehowhelpfulapredictivemodelisinrealworldsituations.
However,theyusuallydonotallowustoknowwhenexactlyadriftoccurs,oreveniftherearereallydrifts.
Thismakesitdiculttoprovideanindepthunderstandingofthebehaviourofpredictivemodelsordriftdetectionmethods.
Datastreamswitharticiallyinduceddriftsenableamoredetailedanal-ysis.
Therefore,bothrealworlddatastreamsanddatastreamswitharticiallyinduceddriftsareimportantwhenevaluatingpredic-tivemodelsandconceptdriftdetectorsinnon-stationaryenviron-ments.
Thecomparisonofalgorithmsproposedintheliteratureisnotaneasytask,asauthorsdonotalwaysfollowthesamerecom-mendations,experimentalevaluationproceduresand/ordatasets.
138B.
Krawczyketal.
/InformationFusion37(2017)132–156Below,wediscussthemostpopularevaluationmeasuresandthentheirexperimentalestimationprocedures.
3.
1.
EvaluationmeasuresThepredictiveabilityofclassiersorregressionmodelsisusu-allyevaluatedwiththesamemeasureasproposedforstatic,non-onlinelearningwhicharealsotheleastcomputationallydemand-ingones.
Belowwelistthemostpopularones:Accuracy:theproportionofallcorrectpredictionstothetotalnumberofexamples,oritscorrespondingmeasureclassica-tionerror,arethemostcommonlyusedforclassication.
Meansquareerrororabsoluteerrorisatypicalmeasureforregression.
Sensitivityoftheclassofinterest(alsocalledRecallorTruePositiveRate)isaccuracyofagivenclass.
G-Mean:thegeometricmeanofsensitivityandspecicityisoftenappliedonclass-imbalanceddatastreamstoavoidthebiasoftheoverallaccuracy.
KappaStatistic:K=p0pc1pc,wherep0isaccuracyoftheclas-sierandpcistheprobabilityofarandomclassiermakingacorrectprediction.
GeneralizedKappaStatisticssuchasKappaMproposedin[20],whichshouldbemoreappropriatethanthestandardKappaStatisticsfordealingwithimbalanceddatastreams.
Furthermore,inthecaseofstaticdatatheareaundertheRe-ceiverOperatingCharacteristicscurve,orsimplyAUC,isapopu-larmeasureforevaluatingclassiersbothonbalancedandimbal-ancedclassdistributions[54].
However,inordertocalculateAUConeneedstosortscoresoftheclassiersonagivendatasetanditeratethrougheachexample.
Thismeansthatthetraditionalver-sionofAUCcannotbedirectlycomputedonlargedatastreams.
ThecurrentuseofAUCfordatastreamshasbeenlimitedonlytoestimationsonperiodicalholdoutsets[76]orentirestreamsofalimitedlength[42].
Aquiterecentstudy[30]introducesane-cientalgorithmforcalculatingPrequentialAUC,suitableforassess-ingclassiersonevolvingdatastreams.
Itsstatisticalpropertiesandcomparisonagainstsimplerpointmeasures,suchasG-meanorKappastatistics,hasbeenexaminedin[33].
Whenanalyzingtheperformanceofclassiersdedicatedtodrifteddata,weshouldalsotakeintoconsiderationtheiradapta-tionabilities,i.
e.
,evaluatingthemaximumperformancedeteriora-tionandrestorationtime,asmentionedinSection2.
4.
Apartfromthepredictiveaccuracyorerror,thefollowingper-formancemetricsshouldbemonitoredandtakenintoaccountdur-ingproperlyexecutedevaluationofstreamingalgorithms:Memoryconsumption:itisnecessarytomonitornotonlytheaveragememoryrequirementsofeachalgorithm,butalsotheirchangeovertimewithrespecttoactionsbeingtaken.
Updatetime:hereoneisinterestedintheamountoftimethatanalgorithmrequirestoupdateitsstructureandaccommodatenewdatafromthestream.
Inanidealsituation,theupdatetimeshouldbelowerthanthearrivaltimeofanewexample(orchunkofdata).
Decisiontime:amountoftimethatamodelneedstomakeadecisionregardingnewinstancesfromthestream.
Thisphaseusuallycomesbeforetheupdatingproceduretakesplace.
So,anydecisionlatencymayresultincreatingabottleneckinthestreamprocessing.
Thisisespeciallycrucialforalgorithmsthatcannotupdateandmakepredictionsregardingnewinstancesatthesametime.
Nevertheless,inordertocalculatereactiontimesandotheradaptabilitymeasures,usuallyahumanexpertneedstodeterminemomentswhenadriftstartsandwhenaclassierrecoversfromit.
Alternately,suchevaluationsarecarriedoutwithsyntheticdatagenerators.
Morecomplexmeasureshavealsobeenproposedtoevaluateotherpropertiesofalgorithms.
ShakerandHüllermeier[158]pro-posedacompleteframeworkforevaluatingtherecoveryrateofthealgorithmonceachangehasoccurredinthestream.
Theycon-sidernotonlyhowwellthemodelreduceditserrorinthenewdecisionspace,butalsowhatwasthetimenecessarytoachievethis.
Zliobaiteetal.
[207]introducedthenotionofcost-sensitiveupdateinordertoevaluatethepotentialgainfromthecost(un-derstoodastimeandcomputationalresources)putintoadaptingthemodeltothecurrentchange.
Theauthorsarguethatthisal-lowstocheckiftheactualupdateofthemodelwasaworthwhileinvestment.
Hassanietal.
[71]proposedanewmeasureforeval-uatingclusteringalgorithmsfordriftingdatastreams,withspecialattentionbeingpaidtothebehaviorofmicro-clusters.
3.
2.
EstimationtechniquesInthecontextofstaticandbatchlearningthemostoftenusedscenarioforestimatingpredictionmeasuresiscrossvalidation.
However,inthecontextofonlinelearningwithcomputationallystrictrequirementsandconceptdrifts,itisnotdirectlyapplicable.
Othertechniquesareconsidered.
Twomainapproachesareuseddependingwhetherthestreamisstationaryornot,asshownbe-low.
Holdoutevaluation:Inthiscasetwosub-setsofdataareneed:thetrainingdataset(tolearnthemodel)andtheindependentholdoutsettotestit.
Itisarrangedthat,atanygivenmomentoftimewhenwewanttoconductmodelevaluation,wehaveatourdisposalaholdoutsetnotpreviouslyusedbyourmodel.
Bytestingthelearningmodelonsuchacontinuouslyupdatedset(itmustbechangedaftereachusagetoensurethatitrep-resentsthecurrentconceptwell),weobtainanunbiasedesti-matorofthemodelerror.
Whenconductedinagiventimeorinstanceinterval,itallowsustomonitortheprogressofthemodel.
Prequentialevaluationisasequentialanalysis[177]wherethesamplesizeisnotxedinadvance.
Instead,dataareevaluatedastheyarecollected.
Predictivesequentialevaluation,orpre-quential,alsoreferredtoasinterleavetrainandtest,followstheonlinelearningprotocol.
Wheneveranexampleisobserved,thecurrentmodelmakesaprediction;whenthesystemreceivesfeedbackfromtheenvironment,wecancomputethelossfunc-tion.
Prequentialmeasurescanbecalculatedonlyforselectedin-stances,thusallowingtoaccommodatetheassumptionoflim-itedlabelavailability.
Ontheotherhand,simplycalculatingacumulativemeasureovertheentirestreammayleadtostronglybiasedresults.
Onemayeasilyimagineasituationinwhichtheoverallcumulativeevaluationisstronglyinuencedbyacer-taintimeperiod,when,e.
g.
,accesstotrainingdatawaslimited,thedecisionproblemwasmuchmoresimple,ordriftwasnotpresent.
Thus,tomaketheerrorestimationmorerobusttosuchcases,aproperforgettingmechanismmustbeimplemented–slidingwindowsorfadingfactors.
Withthis,anemphasisisputonerrorcalculationfromthemostrecentexamples.
Indeedthetermprequential(combinationofwordspredictiveandsequen-tial)stemsfromonlinelearningandisusedintheliteraturetodenotealgorithmsthatbasetheirfunctioningonlyonthemostrecentdata.
Prequentialaccuracy[63]ispopularlyusedwithsupervisedlearning,butalsoaprequentialversionofAUCmetricwasproposedbyBrzezinskiandStefanowski[30],beingsuitableforstreamswithskeweddistributions.
ThisissuewasB.
Krawczyketal.
/InformationFusion37(2017)132–156139EnsemblelearningfromdatastreamsSupervisedlearningforclassicationChunk-basedensemblesforstationarystreamsOnlineensemblesforstationarystreamsChunk-basedensemblesfornon-stationarystreams1.
typical2.
alternativeOnlineensemblesfornon-stationarystreams1.
active2.
passiveSupervisedlearningforregressionAdvancedissuesImbalancedclassicationNoveltydetectionandone-classclassicationActiveandsemi-supervisedlearningComplexdataandstructuredoutputstaxonomyFig.
6.
Thetaxonomyofensemblelearningmethodsfordatastreamsdiscussedthoroughthissurvey.
alsoaddressedbyBifetandFrank[12],whoalsoproposedaprequentialmodicationofkappastatisticsuitableforstreams.
Amoreelaboratedapproachtoevaluateandcomparealgo-rithmsinstreamingscenarioshavebeenintroducedrecently.
ShakerandHüllermeier[158]proposedanapproach,calledre-coveryanalysis,whichusessyntheticdatasetstocalculateclassi-erreactiontimes.
Theauthorsproposedtodivideadatasetwithasingledriftintotwosetswithoutdrifts.
Afterwards,theypro-posetoplottheaccuracyofthetestedclassieroneachofthesedatasetsseparately.
Thecombinationofthesetwoplotsiscalledtheoptimalperformancecurveandservesasareferencethatcanbecomparedwiththeaccuracyplotoftheclassierontheoriginaldataset.
Zliobaiteproposedtousemodifyarealstreambycontrolledpermutationstobetterstudythereactionofclassi-erstodrifts[201].
RecentlyBifetatal.
consideredaprequentialandparallelevaluationstrategyinspiredbycross-validation,whichswitchesnewincomingexamplesbetweencopiesofclassiers–someofthemuseitforupdatingwhileothersfortesting[12].
Statisticaltestshavegainedasignicantpopularityinthema-chinelearningcommunity[66].
Intheareaofdatastreamstherewerefewapproachestousingthesetools[20].
However,theyusu-allyconcentratedonapplyingstandardtestsovertheaveragedre-sultsorbyusingslidingwindowtechnique.
Onemaybecriticaltosuchapproaches,astheyeithertrytotransformadynamicprob-lemintoastaticone,ortakeunderconsiderationonlylocalcharac-teristics.
Sofar,therehasbeennouniedstatisticaltestingframe-workproposedfordatastreamsthatwouldseemfullyappropriate.
4.
EnsemblelearningfromdatastreamsThissectiondiscussessuperviseddatastreamensemblelearn-ingapproachesforclassicationandregressionproblems.
Toorga-nizethesubjectsdiscussedinthissurveyandtoofferanavigationtoolforthereader,wesummarizetheproposedtaxonomyofen-semblelearningapproachesfordatastreamsinFig.
6.
Contentpre-sentedtherewillbediscussedindetailinSections4and5,within-depthpresentationofadvancesintherespectiveareas.
Here,wewouldliketoexplainadisproportioninthesubcategoriesbetweensupervisedlearninginclassicationandregressionproblems.
The-oretically,thesametaxonomyusedfortheclassicationensemblescouldbeusedfortheregressionones.
However,astherearestillveryfewmethodsdevelopedinthisarea,wehaveoptedfornotproposingaseparatetaxonomyforthestreamingregressionen-semblesyet.
4.
1.
SupervisedlearningforclassicationproblemsEnsemblesarethemostoftenstudiednewclassiersinthedatastreamcommunity,seee.
g.
listsofmethodsin[43,60].
Theproposedstreamclassierscanbecategorizedwithrespecttodif-ferentpointsofview.
Themostcommoncategorizationsarethefollowing:stationaryvs.
non-stationarystreamclassiers,activevs.
passiveapproaches,chunkbasedvs.
on-linelearningmodes,distinguishingdifferenttechniquesforupdatingcomponentclassiersandaggregatingtheirpredictions.
Approachesforstationaryenvironmentsdonotcontainanymechanismtoaccelerateadaptationwhenconceptdriftoccurs.
Ap-proachesfornon-stationaryenvironmentsareapproachesspeci-callydesignedtotacklepotentialconceptdrifts.
Whenstudyingapproachestotackleconceptdrift,researchersusuallydistinguishbetweenactivevs.
passive(alsocalledtriggervs.
adaptive)approaches,seee.
g.
adiscussionin[43,169,200].
Activealgorithmsusespecialtechniquestodetectconceptdriftwhichtriggerchangesoradaptationsinclassiers(e.
g.
,rebuildingitfromtherecentexamples)–seethediscussioninearlierSection2.
4.
Passiveapproachesdonotcontainanydriftdetectorandcontinu-ouslyupdatetheclassiereverytimethatanewdataitemispre-sented(regardlesswhetherrealdriftispresentinthedatastream140B.
Krawczyketal.
/InformationFusion37(2017)132–156ornot).
Themajorityofcurrentensemblesfollowapassiveschemaofadaptation,whiletriggersareusuallyusedmainlywithsingleonlineclassiers.
Afewrarecasesofintegratingthemwithen-sembles,suchasACE[133],BWE[38]orDDD[127],willbefurtherdiscussed.
Then,withrespecttothewayofprocessingexamples,theclas-sierscanbecategorizedintochunk-basedapproachesandon-linelearningapproaches.
Chunk-basedapproachesprocessincom-ingdatainchunks,whereeachchunkcontainsaxednumberoftrainingexamples.
Thelearningalgorithmmayiterateoverthetrainingexamplesineachchunkseveraltimes.
Itallowstoexploitbatchalgorithmstolearncomponentclassiers.
Onlinelearningap-proaches,ontheotherhand,processeachtrainingexamplessepa-rately,uponarrival.
Thistypeofapproachisintendedforappli-cationswithstricttimeandmemoryconstraints,orapplicationswherewecannotaffordprocessingeachtrainingexamplemorethanonce,e.
g.
,applicationswheretheamountofincomingdataisverylarge.
Itisworthnotingthattheabovecategorizationdoesnotmeanthatchunk-basedapproachesmustbeusedonlyforsituationswherenewtrainingexamplesarriveinchunks.
Theycanalsobeusedtolearntrainingexamplesthatarriveseparately,becauseeachnewtrainingexamplecanbestoredinabufferuntilthesizeofthisbufferreachesthesizeofthechunk.
Then,chunk-basedap-proachesmayprocessalltheseexamplesstoredinthebuffer.
Sim-ilarly,thiscategorizationdoesnotmeanthatonlinelearningap-proachesmustbeusedonlyforsituationswherenewtrainingex-amplesarriveseparately,one-by-one.
Onlinelearningapproachescanprocesseachtrainingexampleofachunkseparately.
Theycanbeusedforapplicationswheretrainingexamplesarriveinchunks.
Finally,consideringdifferentstrategiesforre-constructingen-semblecomponentclassiersandaggregatingtheirpredictions,onecanrecallKuncheva'scategorization[107],whereshehasdis-tinguishedthefollowingfourbasicstrategies:Dynamiccombiners–componentclassiersarelearntinad-vanceandarenotfurtherupdated;theensembleadaptsbychangingthecombinationphase(usuallybytuningtheclassi-erweightsinsidethevotingrule,e.
g.
,thelevelofcontribu-tiontothenaldecisionisdirectlyproportionaltotherele-vance[86,117]).
Thedrawbackofthisapproachisthatallcon-textsmustbeavailableinadvance;emergenceofnewunknowncontextsmayresultinalackofexperts.
Updatingtrainingdata–recenttrainingexamplesareusedtoonline-updatecomponentclassiers(e.
g.
inon-linebagging[137]oritsfurthergeneralizations[16,180]).
Updatingensemblemembers–updatingonlineorretraininginbatchmode(usingchunks)[15,55,100,136,150].
Structuralchangesoftheensemble–replacingtheworstper-formingclassiersintheensembleandaddinganewcompo-nent,e.
g.
,individualclassiersareevaluateddynamicallyandtheworstoneisreplacedbyanewonetrainedonthemostrecentdata[84,98]Inthispaper,themaincriterionusedtocategorizeclassicationensembleapproachesisthedataprocessingmethod,i.
e.
,whetherexamplesareprocessedinchunksorone-by-one.
Then,asthesec-ondcriterionweuseinformationonwhethertheapproachesaredesignedtodealwithstationaryornon-stationarydatastreams.
Weconsiderthesetwocriteriarstbecauseapproacheswithineachofthesecategoriestackledifferenttypesofdatastreamappli-cations.
Withineachofthesecategories,wewillthenusefurthercriteriatodistinguishamongexistingapproaches.
Section4.
1.
1presentschunk-basedensembleapproachesforstationaryenvironments,Section4.
1.
2presentsonlinelearningapproachesforstationaryenvironments,Section4.
1.
3presentschunk-basedensembleapproachesfornon-stationaryenviron-Table1Chunk-basedensemblesforstationarydatastreams.
AlgorithmDescriptionLearn++[143]IncrementalneuralnetworkensembleAda.
BoostRAN-LTM[92]CombinationofAdaBoost.
M1andRAN-LTMclassierGrowingNCL[124]IncrementalversionoftheNegativeCorrelationLearningBagging++[197]TrainingclassierswithBaggingfromincomingchunksofdataments,andSection4.
1.
4presentsonlinelearningapproachesfornon-stationaryenvironments.
4.
1.
1.
Chunk-basedensemblesforstationarystreamsChunk-basedensemblesforstationarydatastreamsarenotsowelldevelopedasonlineversionsanddidnotreceivesosignicantattentionfromtheresearchcommunity.
Theyarealsorelatedtotheissueofbatchprocessingoflargersetsofdata,andoftendonotexplicitlyrefertothisasstreammining.
Thissectionreviewsthemostpopularmethodsinthisarea.
TheyaresummarizedinTable1.
Learn++isoneofthemostwellrecognizedapproachestosta-tionarystreams[143].
Thisensembleconstructsnewneuralnet-workmodelsoneachincomingchunkofdata,andthencombinestheiroutputsusingmajorityvoting.
Thisallowstoaccommodatenewincominginstancesintotheensemble.
Thisapproachhoweverretainsallpreviouslylearnedclassiers,thusbeinginecientforhandlingmassivedatasetsasthesizeoftheensemblecontinuouslygrows.
Kideraetal.
[92]proposedacombinationofAdaBoost.
M1andResourceAllocatingNetworkwithLong-TermMemory,astableneuralnetworkclassierforincrementallearning.
Theyusedapre-determinednumberofbaseclassiersfortheentirestreampro-cessingandincrementallyupdatedthemwithnewchunks.
Theysuppressedtheforgettingfactorintheseclassiersinordertoal-lowanecientweightapproximationforweightedvotingcombi-nation.
Thishoweverlimitstheusabilityofthisapproachforpo-tentiallyunboundedstreams.
Minkuetal.
[124]introducedanincrementalversionofNega-tiveCorrelationLearningthataimedatco-traininganensembleofmutuallydiverseandindividuallyaccurateneuralnetworks.
Atthesametimetheirproposedlearningschemeallowedtomaintainatrade-offbetweentheforgettingrateandadaptingtonewincom-ingdata.
Twomodelswerediscussed:xedsizeandgrowingsize,differingintheirapproachtomaintainingtheensembleset-up.
Ex-perimentalresultsshowedthatthexedsizeapproachhasbettergeneralizationability,whilethegrowingsizemayeasilyovercometheimpactoftoostrongforgetting.
Bagging++[197]wasdevelopedasanimprovementoverLearn++byutilizingBaggingtoconstructnewmodelsfromincom-ingchunksofdata.
Additionally,theensembleconsistedofhetero-geneousclassiersselectedfromasetoffourdifferentbaseclassi-ers.
AuthorsshowedthattheirapproachgivescomparableresultstoLearn++andNegativeCorrelationLearning,whilebeingsigni-cantlyfaster.
4.
1.
2.
OnlineensemblesforstationarystreamsOnlineensemblesforstationarydatastreamshavegainedsig-nicantlymoreattentionthantheirchunk-basedcounterparts.
Thiswascausedbyageneralpopularityofonlinelearninganditsappli-cationtovariousreal-lifescenarios,notonlylimitedtostreamingdata.
Letusreviewthemostrepresentativeproposalsinthisarea.
TheyaresummarizedinTable2.
OzaandRussel[137]introducedOnlineBagging,whichalle-viatesthelimitationsofstandardBaggingofrequiringtheentireB.
Krawczyketal.
/InformationFusion37(2017)132–156141Table2Onlineensemblesforstationarydatastreams.
AlgorithmDescriptionBagging-basedOzaBag[137]OnlineBaggingASHT[17]Ensembleofadaptive-sizeHoeffdingtreesLevBag[16]LeveragingBaggingwithincreasedresamplingandoutputdetectioncodesORF[41,153]OnlineRandomForestMF[111]OnlineMondrianForestBoosting-basedOzaBoost[137]OnlineBoostingOthersUFFT[61]UltrafastforestofbinarytreesHOT[60]HoeffdingOptionTreesEOS-ELM[112]Ensembleofonlineextremelearningmachinestrainingsetavailablebeforehandforlearning.
Theyassumedthat,inonlinelearning,eachnewincominginstancemaybereplicatedzero,oneormanytimesduringtheupdateprocessofeachbaseclassier.
Thuseachclassierintheensembleisupdatedwithkcopiesofthenewlyarrivedinstance.
ThevalueofkisselectedonthebasisofPoissondistribution,wherekPoisson(1).
Thiscomesfromthefactthatforpotentiallyunboundeddatastreamsthebi-nominaldistributionofkinstandardBaggingtendstothisspe-cicPoissondistribution.
TheoreticalfoundationsofthisapproachwerefurtherdevelopedbyLeeandClyde[113].
TheyproposedaBayesianOnlineBaggingthatwasequivalenttothebatchBayesianversion.
Bycombiningitwithalosslesslearningalgorithm,theyobtainedalosslessonlinebaggingapproach.
Bifetetal.
introducedtwomodicationsofOza'salgorithmcalledAdaptive-SizeHoeffdingTrees(ASHT)[17]andLeveragingBagging[16],whichaimataddingmorerandomizationtothein-putandoutputofthebaseclassiers.
ASHTsynchronouslygrowstreesofdifferentsizes,whereasLeveragingBaggingincreasesre-samplingfromPoisson(1)toPoisson(λ)(whereλisauser-denedparameter)andusesoutputdetectioncodes[16].
AnotheronlineensembledevelopedbyOzaandRusselisOnlineBoosting[137].
Thisensemblemaintainsaxedsizesetofclassi-erstrainedontheexamplesreceivedsofar.
Eachnewexampleisusedtoupdateeachoftheclassiersinasequentialmanner.
Examplesmisclassiedbytheformerclassiersinthesequencehavetheirweightsupdatedsoastobeemphasizedbythelatterclassiers.
Thisisdoneinthefollowingway.
Foreachnewincom-ingexample,oneinitiallyassignsthehighestpossibleweightλ=1toit.
Therstclassierinthepoolisupdatedwiththisexamplek=Poisson(λ)times.
Aftertheupdate,thisclassierisusedtopre-dictthisexample,andtheweightedoverallfractionofexamplesthatitmisclassiedisupdated.
Iftheexampleiscorrectlyclassi-estheexample,theexample'sweightλismultipliedby12(1).
Ifthisclassiermisclassiedtheexample,wemultiplytheweightassociatedtothisexampleby12.
Thisprocedureisthenrepeatedforthenextclassierinthepool,butusingthenewweightλ.
Severalresearchersdevelopedensemblesbasedonacombina-tionofdecisiontrees.
HoeffdingOptionTrees(HOT)canbeseenasanextensionofKirkby'sOptionTree[142].
Itallowseachtrainingexampletoupdateasetofoptionnodesratherthanjustasin-gleleaf.
Itprovidesacompactstructurethatworkslikeasetofweightedclassiers,andjustlikeregularHoeffdingTrees,theyarebuiltinanincrementalway–foramoredetailedalgorithmrefertoitsdescriptionin[25,60].
UltraFastForestofTrees,developedbyGamaandMedas[61],usesanensembleofHoeffdingtreesforonlinelearning.
Theirsplitcriterionisapplicableonlytobinaryclassicationtasks.
Tohandlemulti-classproblems,abinarydecompositionisapplied.
Abinarytreeisconstructedforeachpossiblepairofclasses.
Whenanewinstancearrives,eachclassierisupdatedonlyifthetrueclassla-belforthisinstanceisusedbythebinarybaseclassier.
EnsembleofOnlineExtremeLearningMachines[112]wasde-velopedbyLanetal.
Itisasimplecombinationofonlinerandom-izedneuralnetworks,whereinitialdiversityofthepoolisachievedbyarandomizedtrainingprocedure.
Basemodelsarecombinedusingaveragingofindividualoutputs.
Eachbasemodelisupdatedwiththeincominginstances,butnodiscussionofvericationofhowthediversityintheensembleismaintainedduringthecourseofstreamprocessingwasgiven.
SomeotherresearchersfocusedtheirworkonproposingonlineversionsofthepopularRandomForestalgorithm[41,153].
TheyintroducedonlineRandomTreesthatgeneratetestfunctionsandthresholdsatrandomandselectthebestoneaccordingtoaqual-itymeasure.
Theironlineupdatemethodologyisbasedontheideaofgeneratinganewtreehavingonlyonerootnodewithasetofrandomlyselectedtests.
Twostatisticsarecalculatedonline:min-imumnumberofinstancesbeforesplitandminimumgaintobeachieved.
Whenasplitoccursstatisticsregardingtheinstancesthatwillfallintoleftandrightnodesplitsarepropagatedintochildrennodes,thustheystartalreadywiththeknowledgeoftheirparentnode.
AlthoughtheauthorsacknowledgetheexistenceoftheHoeffdingbound,theyarguethatusingonlineupdatedgainisclosertotherealideabehinddecisiontrees.
Additionally,aforget-tingmechanismviatemporalknowledgeweightingisappliedtoreducetheinuenceofoldinstances.
Thisisrealizedaspruningrandomtrees,whereaclassierisdiscardedfromtheensemblebasedonitsout-of-bagerrorandthetimeitsage(timespendintheensemble).
ThisideawasfurtherdevelopedbyLakshminarayananetal.
intoonlineMondrianForestalgorithm[111].
TheyusedMondrianprocessesfortheirtreeinductionscheme,whichareafamilyofrandombinarypartitions.
Astheywereoriginallyintroducedasin-nitestructures,theauthorsmodiedthemintoniteMondriantrees.
Themaindifferencesbetweenthisapproachandstandarddecisiontreesaretheindependenceofsplitsfromclasslabels,us-ageofsplittimeateverynode,introductionofparametercontrol-lingdynamicallythenumberofnodesandthattheslitisboundedbythetrainingdataandisnotgeneralizedovertheentirefea-turespace.
TheensembleisconstructedidenticallyasinstandardRandomForest,butanotherdifferenceliesinonlineupdatepro-cedure.
Mondriantreescanaccommodatenewinstancesbycreat-inganewsplitthatwillbeonhigherleveloftreehierarchythanexistingones,extendingtheexistingsplit,orsplittingtheexist-ingleafintochildrennodes.
PleasenotethatstandardonlineRan-domForestcanonlyupdatetheirstructureusingthethirdofmen-tionedmethods.
ThismakesMondrianForestsmuchmoreadapt-abletostreamingdata,allowingformorein-depthmodicationsinensemblestructure.
Theauthorsreportthattheirmethodout-performsexistingonlineRandomForests,achievesaccuracysimilartobatchversionsandisatleastanorderofmagnitudefasterthanreferenceensembles.
4.
1.
3.
Chunk-basedensemblesfornon-stationarystreamsChunk-basedapproachesfornon-stationaryenvironmentsusu-allyadapttoconceptdriftsbycreatingnewcomponent(a.
k.
a.
base)classiersfromnewchunks(blocksorbatches)oftrainingexamples.
Ingeneral,componentclassiersoftheensembleareconstructedfromchunkswhichcorrespondtodifferentpartsofthestream.
Therefore,theensemblemayrepresentamixtureofdifferentdistributions(concepts)thathavebeenpresentinthedatastream.
Learninganewcomponentfromthemostrecentchunkisalsoanaturalwayofadaptatingtodrifts[200].
Addition-ally,somechunk-basedensemblesmaintainanadditionalbufferforstoringoldclassiersthatcanbereusedwhenneeded,offer-ingapotentialtohandlerecurringconcepts.
142B.
Krawczyketal.
/InformationFusion37(2017)132–156Table3Chunk-basedensemblesfornon-stationarydatastreams.
AlgorithmDescriptionTypicalapproachesSEA[170]StreamingEnsembleAlgorithmAWE[178]AccuracyWeightedEnsembleAboost[36]Adaptive,fastandlightBoostingLearn++.
NSE[50]Learn++fornon-stationaryenvironmentsAlternativeapproachesKBS[154]Boosting-likemethodusingknowledge-basedsamplingAUE[31]AccuracyUpdatedEnsembleWAE[189]WeightedAgingEnsembleBWE[38]BatchWeightedEnsembleET[146]EnsembletrackingforrecurringconceptsLearningcomponentclassiersfromcompletechunksenablesapplyingstandard,batchlearningalgorithms.
Forgetting of old classification knowledge can be done by eliminating components that perform too poorly. This offers a way to limit the amount of memory required to store the ensemble, even though it prevents the ensemble from recovering deleted classifiers if and when their corresponding concept reoccurs.
Most of the chunk-based ensembles periodically evaluate their components with the newest chunk. The results of this evaluation are used to update the weights associated with each component classifier. These weights can be used to emphasise the classifiers that best reflect the most recent data distribution when making an ensemble prediction, or to decide which unhelpful classifiers should be discarded.
One of the main features distinguishing different chunk-based ensembles for non-stationary environments is whether or not they always create new classifiers for each new chunk of data in order to deal with concept drift. We therefore discuss these approaches under this perspective below. The presented algorithms are summarized in Table 3.
Typical Chunk-based Approaches.
Typically, chunk-based ensembles are constructed according to the following schema (a minimal sketch of this loop is given after the list):
1. For each new chunk B_i ∈ S, evaluate the component classifiers C_j in the ensemble with respect to a given evaluation measure Q(C_j);
2. Learn a new candidate classifier C_c using B_i;
3. Add C_c to the ensemble if the ensemble size is not exceeded; otherwise replace one of the existing components of the ensemble.
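To make this schema concrete, the following minimal Python sketch implements the three steps. It assumes scikit-learn-style batch learners (fit/predict) and non-negative integer class labels; the quality measure Q, the maximum ensemble size, and the replacement rule are placeholders that concrete algorithms such as SEA or AWE instantiate differently.

```python
import numpy as np

class ChunkEnsemble:
    """Generic chunk-based ensemble loop (illustrative sketch only)."""

    def __init__(self, make_classifier, max_size=10):
        self.make_classifier = make_classifier
        self.max_size = max_size
        self.members = []                      # component classifiers

    def _quality(self, clf, X, y):
        # Placeholder Q(C_j): plain accuracy on the newest chunk.
        return np.mean(clf.predict(X) == y)

    def process_chunk(self, X, y):
        # 1. Evaluate existing components on the new chunk B_i.
        scores = [self._quality(c, X, y) for c in self.members]
        # 2. Learn a candidate classifier C_c from B_i.
        candidate = self.make_classifier().fit(X, y)
        # 3. Add the candidate, or replace the weakest component.
        if len(self.members) < self.max_size:
            self.members.append(candidate)
        else:
            self.members[int(np.argmin(scores))] = candidate

    def predict(self, X):
        # Simple majority vote (assumes non-negative integer labels).
        votes = np.stack([c.predict(X) for c in self.members]).astype(int)
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), 0, votes)
```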
Each of these approaches implements a different strategy to restrict the ensemble size and to weight the different classifiers in the ensemble. As a new classifier is always created to learn each new data chunk, the size of the chunk plays a particularly important role. A too large chunk size would result in slow adaptation to drifts. On the other hand, a too small chunk size would not be enough to learn an entire stable concept well, would increase computational costs, and may result in poor classification performance [178].
One of the earliest well-known approaches in this category is the Streaming Ensemble Algorithm (SEA), proposed by Street and Kim [170]. This approach creates a new classifier to learn each new chunk of training data. If the maximum ensemble size has not been reached yet, this new classifier is simply added to the ensemble. Otherwise, the quality of the new classifier is first evaluated based on the next incoming training chunk. Then, the new classifier replaces an existing classifier whose quality is worse than the quality of the new classifier on this training chunk. One of the key features for the success of this approach is its quality measure. It favours the classifiers which correctly classify examples that are nearly undecided by the ensemble. In this way, this approach can avoid overfitting and maintain diversity. The predictions given by the ensemble are based on majority voting. This approach has been shown to recover faster from concept drift than single classifiers. One of its potential problems is that old classifiers can outweigh the new classifier, potentially slowing down adaptation to new concepts. How fast the ensemble can recover from drifts depends not only on the chunk size, but also on the ensemble size.
A similar way of restructuring an ensemble was proposed by Wang et al. as the algorithm called Accuracy Weighted Ensemble (AWE) [178]. The key idea of AWE is to assign weights to each classifier of the ensemble based on their prediction error on the newest training chunk. A special variant of the mean square error (which can deal with the probabilities of a component classifier's predictions) is used for that purpose. The assumption made by this approach is that the newest training chunk is likely to represent the current test examples better. Classifiers that have equal or worse performance than a random classifier (in terms of their mean square errors) are discarded. Pruning can also be applied to maintain only the K classifiers with the highest weights. In this way, it is possible to remove classifiers that would hinder the predictions and include new classifiers that can learn the new concepts. For cost-sensitive applications, it is also possible to use instance-based dynamic ensemble pruning [51]. This approach was shown to be successful in achieving better accuracy than single classifiers when the ensemble size becomes large enough (i.e., after enough data chunks are received). However, as noticed in [27], AWE's pruning strategy may sometimes delete too many component classifiers in the case of certain sudden drifts and decrease AWE's classification accuracy too much. Another problem concerns the evaluation of the new candidate classifier – it requires k-fold cross-validation inside the latest chunk, which increases computational time.
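As an illustration of AWE-style weighting, the sketch below computes w_i = MSE_r − MSE_i, the commonly reported form of this weight. It assumes probabilistic base classifiers with a scikit-learn-style predict_proba whose columns follow the sorted class labels passed in `classes`; this is a sketch, not the authors' exact implementation.

```python
import numpy as np

def awe_weight(clf, X, y, classes):
    """AWE-style weight for one component: w = MSE_r - MSE_i.

    MSE_i penalizes the probability mass the classifier fails to put
    on the true class; MSE_r is the error of a random classifier that
    predicts according to the class distribution of the chunk.
    `classes` must be the sorted class labels matching the columns of
    predict_proba (e.g., clf.classes_ in scikit-learn).
    """
    proba = clf.predict_proba(X)                       # shape (n, n_classes)
    idx = np.searchsorted(classes, y)                  # column of true class
    mse_i = np.mean((1.0 - proba[np.arange(len(y)), idx]) ** 2)
    p = np.bincount(idx, minlength=len(classes)) / len(y)
    mse_r = np.sum(p * (1.0 - p) ** 2)                 # random baseline
    return mse_r - mse_i                               # <= 0 means "discard"
```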
Chu and Zaniolo [36] proposed a chunk-based approach inspired by the boosting framework. When a training chunk is received, the ensemble error is calculated. After that, a mechanism based on statistical tests is used to detect concept drifts. If a concept drift is detected, all the classifiers composing the ensemble are deleted. After the concept drift detection mechanism is applied (and the possible deletion of ensemble members), a new classifier is created to learn the training chunk. The training examples of the chunk are associated with weights determined in an AdaBoost way based on the ensemble error. If the ensemble error on the current chunk is e and the example i is misclassified, then this example's weight is set to w_i = (1 - e)/e. If the example was correctly classified, its weight is maintained as 1. If the inclusion of the new classifier makes the ensemble exceed the maximum size M, the oldest ensemble member is eliminated. The classification is done by averaging the probabilities predicted by the classifiers and selecting the class with the highest probability. This approach was shown to be able to improve predictive performance in comparison to previous approaches such as SEA [170] and Wang et al.'s [178] in the presence of concept drift. A potential problem of this approach is that it resets the whole ensemble upon drift detection. This strategy can be sensitive to false alarms (false positive drift detections) and is unable to deal with recurring concepts.
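The instance-weighting rule just described can be transcribed directly into code; the guard for an error-free chunk below is our own addition, since the text does not specify that edge case.

```python
import numpy as np

def boosting_chunk_weights(y_true, y_ensemble_pred):
    """Instance weights for a new chunk, per the AdaBoost-style rule
    above: misclassified examples get w = (1 - e)/e, correct ones keep
    w = 1, where e is the ensemble error on the chunk."""
    y_true = np.asarray(y_true)
    wrong = y_true != np.asarray(y_ensemble_pred)
    e = float(wrong.mean())
    w = np.ones(len(y_true))
    if e > 0.0:                      # guard: a perfect ensemble leaves w = 1
        w[wrong] = (1.0 - e) / e
    return w
```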
Another approach inspired by the boosting framework is Elwell and Polikar's generalization of Learn++ for Non-Stationary Environments (called Learn++.NSE) [50]. This approach also sets the weights of the training examples from a new data chunk based on the ensemble error on this chunk. If an example i is misclassified, its weight is set to w_i = 1/e. Otherwise, it is set to 1. One of the main differences between this approach and Chu and Zaniolo's [36] is that it does not use a concept drift detection mechanism. Instead, the reaction to drifts is based on weights associated with each base classifier.
These weights are higher when the corresponding base classifier is able to correctly classify examples that were misclassified by the ensemble.
Weights are lower if the corresponding base classifier misclassifies examples that were correctly classified by the ensemble. Weights are also set to give more importance to the misclassifications on more recent data chunks, which are believed to represent the current concept better. The predictions given by the ensemble are based on weighted majority voting. Therefore, base classifiers that were performing poorly for some period of time can be automatically re-emphasised through their weights once they become useful. The fact that base classifiers are not deleted can help dealing with recurrent drifts. However, as the ensemble size is unlimited and a new base classifier is added for every new data chunk, the number of base classifiers may become high.
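Learn++.NSE's recency emphasis is usually described as a sigmoid weighting of each classifier's per-chunk errors; the survey itself does not give the formula, so the sketch below follows that commonly cited form, with illustrative parameter values a and b, and should be read as an approximation rather than the published pseudocode.

```python
import numpy as np

def nse_voting_weight(chunk_errors, a=0.5, b=10):
    """Voting weight of one base classifier (Learn++.NSE-style sketch).

    chunk_errors: the classifier's error on each chunk since its
    creation, oldest first. Recent chunks receive larger sigmoid
    weights; a and b control slope and cutoff (assumed values).
    """
    age = np.arange(len(chunk_errors))[::-1]        # 0 = newest chunk
    omega = 1.0 / (1.0 + np.exp(a * (age - b)))     # sigmoid recency weights
    omega /= omega.sum()
    eps = np.clip(np.asarray(chunk_errors), 1e-6, 0.5)
    beta = eps / (1.0 - eps)                        # normalized error ratios
    beta_bar = float(np.sum(omega * beta))          # recency-weighted average
    return np.log(1.0 / beta_bar)                   # low error -> large weight
```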
Alternative Chunk-Based Approaches.
Chunk-based ensembles are typically quite sensitive to a proper tuning of the size of the data chunk. In particular, a too large chunk size may delay reaction to drifts, while a too small chunk size may lead to poorly performing base classifiers. Moreover, learning every new data chunk may introduce a learning overhead that could be unnecessary when existing classifiers are considered good enough for the current concept. Some researchers proposed approaches that deviate from the typical chunk-based learning schema in an attempt to overcome some of these issues. We discuss some representative approaches in this section.
Scholz and Klinkenberg's approach [154,155] decides, for each new training chunk, whether to train a new classifier or to update the newest existing classifier with it. This decision is based on the accuracy resulting from training the most recent classifier with the new chunk in comparison with the accuracy obtained by training a new classifier on the new chunk. Only the best of these two classifiers is kept. This strategy may reduce the problem of creating poor base classifiers due to small chunk sizes, because existing classifiers can be trained with more than one chunk. Besides assigning weights to the examples within a training chunk in a boosting-like style, each classifier itself also has a weight, which is assigned depending on its performance on the new training chunk. These weights are not only used to speed up reaction to concept drifts, but also to prune unhelpful classifiers. This approach has been shown to perform well in comparison to previous approaches such as adaptive window size [95] and batch selection [94,96]. However, it did not perform so well when the drift consisted of an abrupt concept drift quickly followed by a change back to the previous concept.
Deckert [38] proposed an ensemble approach that uses a concept drift detection method to decide whether a new classifier should be created to learn a new data chunk, or whether the new data chunk should be discarded without further training.
Another alternative chunk-based approach is the Accuracy Updated Ensemble (AUE) [27,31]. In this ensemble, all component classifiers are incrementally updated with a portion of the examples from the new chunk. This may help reduce the problems associated with creating poor base classifiers due to small chunk sizes. Another novelty is the weighting of classifiers with non-linear error functions, which better promotes more accurate components. Moreover, the newest candidate classifier always receives the highest weight, as it should reflect the most recent data distribution better. AUE also contains other techniques for improving the pruning of ensembles and achieving better computational costs. The experimental studies [31] showed that AUE constructed with Hoeffding Trees obtained higher classification accuracy than other chunk ensembles in scenarios with various types of drifts as well as in stable streams.
Yet another approach to rebuilding a chunk-based ensemble was presented by Wozniak et al. The Weighted Aging Ensemble (WAE) modifies the classifier ensemble line-up on the basis of their diversity. The ensemble prediction is made according to weighted majority voting, where the weight of a given classifier depends on its accuracy and the time spent inside the ensemble [189].

Table 4
Online ensembles for non-stationary data streams.

Passive approaches
  DWM [100]: Dynamic Weighted Majority
  AddExp [99]: Additive expert ensembles for classification
  HRE [107]: Horse racing ensembles
  CDC [168]: Concept Drift Committee
  OAUE [29]: Online Accuracy Updated Ensemble
  WWH [194]: Ensemble of classifiers using overlapping windows
  ADACC [83]: Anticipative Dynamic Adaptation to Concept Changes
Active approaches
  ACE [133]: Adaptive Classifiers-Ensemble
  Todi [132]: Two Online Classifiers For Learning And Detecting Concept Drift
  DDD [127]: Diversity for Dealing with Drifts
  ADWIN Bagging [18]: Online Bagging with ADWIN drift detector
A number of approaches have been discussed in the literature to specifically tackle recurring concepts in data streams. Ramamurthy and Bhatnagar [146] proposed an ensemble tracking approach that tries to deal with recurring concepts explicitly. It maintains a global set of classifiers representing different concepts. Whenever a new training chunk is available, the error of each classifier on it is determined. MaxMSE is defined as the classification error of a classifier that predicts randomly. If at least one classifier has an error lower than a pre-defined value τ, or if the error of the weighted ensemble formed by all classifiers with error lower than AcceptanceFactor · MaxMSE is lower than τ, no new classifier is created. This reduces the overhead associated with learning every new data chunk. If neither a single classifier nor the above-mentioned ensemble has an error lower than τ, a new classifier is created and trained with the new data chunk, which is assumed to represent a new concept. One of the problems of this approach is that it has no strategy to limit the size of the global set of classifiers.
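The decision of whether a new chunk warrants a new classifier can be summarized as follows; `ensemble_error_fn` abstracts away how the weighted committee of near-fitting classifiers is formed and weighted, which the description above leaves open, so this is a sketch of the rule rather than the full algorithm.

```python
import numpy as np

def needs_new_classifier(chunk_errors, tau, acceptance_factor, max_mse,
                         ensemble_error_fn):
    """Decision rule of the ensemble-tracking approach sketched above.

    chunk_errors: error of each stored classifier on the newest chunk.
    ensemble_error_fn: callable returning the error, on the same chunk,
    of the weighted ensemble built from the given classifier indices.
    """
    chunk_errors = np.asarray(chunk_errors)
    if chunk_errors.min() < tau:
        return False                          # a stored concept still fits
    eligible = np.where(chunk_errors < acceptance_factor * max_mse)[0]
    if len(eligible) and ensemble_error_fn(eligible) < tau:
        return False                          # a committee of near-fits suffices
    return True                               # treat the chunk as a new concept
```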
Another approach for storing special definitions of previous concepts has been considered by Katakis et al. in their ensemble with conceptual clusters calculated and compared for each data chunk [91]. Jackowski [84] described an evolutionary approach for selecting and weighting classifiers for the ensemble in the presence of recurrent drifts, while Sobolewski and Wozniak used the idea of recurring concepts to generate a pool of artificial models and select the best fitted one in the case of concept drift [165].
4.1.4. Online ensembles for non-stationary streams

Online ensembles learn each incoming training example separately, rather than in chunks, and then discard it.
By doing so, these approaches are able to learn the data stream in one pass, potentially being faster and requiring less memory than chunk-based approaches. These approaches also avoid the need for selecting an appropriate chunk size. This may reduce the problems associated with poor base models resulting from small chunk sizes, even though these approaches would still normally have other parameters affecting the speed of reaction to drifts (e.g., parameters related to sliding windows and fading factors).
One of the main features to distinguish between different online ensemble learning approaches for non-stationary environments is the use of concept drift detection methods. Accordingly, they are divided into passive or active categories. The presented algorithms are summarized in Table 4.
Passive Approaches.
Passive approaches are those which do not use explicit concept drift detection methods.
Different passive online ensembles have different strategies to assign weights to classifiers, as well as to decide when to add or remove classifiers from the ensemble in order to react to potential concept drifts. Most of these approaches present mechanisms to continuously adapt to concept drifts that may occur in the stream. How fast adaptation is achieved and how sensitive this adaptation is to noise usually depends on parameters.
One of the most well-known approaches in this category is Dynamic Weighted Majority (DWM) [100], proposed by Kolter and Maloof. In this approach, each classifier has a weight that is reduced by a multiplicative constant β (0 ≤ β < 1) when it makes a wrong prediction, similar to Littlestone and Warmuth's Weighted Majority Algorithm [117]. This allows the ensemble to emphasize the classifiers that are likely to be most accurate at a given point in time. All classifiers are incrementally trained on the incoming training examples. In addition, in order to accelerate reaction to concept drift, it is possible to add a new classifier or remove existing classifiers. New classifiers are added when the ensemble misclassifies a given training example. They can learn potentially new concepts from scratch, avoiding the need for existing classifiers to forget their old knowledge when there is concept drift. Classifiers whose weights are too low are classifiers that have been unhelpful for a long period of time. They can be deleted to avoid the ensemble becoming too large. The weight updates and the addition and removal of classifiers are performed only at every p time steps, where p is a pre-defined value. Larger values of p are likely to be more robust against noise. However, too large p values can result in slow adaptation to concept drift. At every p training examples, the weights of all ensemble members are also normalized, so that the new member to be included does not dominate the decision-making of all the others. DWM has been demonstrated to achieve good performance in the presence of concept drifts [100], usually achieving similar performance to an approach with perfect forgetting. However, it may not perform as well as Littlestone and Warmuth's Weighted Majority Algorithm [117] under stationary conditions.
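A compact sketch of DWM's update step is given below, assuming scikit-learn-style incremental learners (partial_fit/predict) and integer class labels; it is simplified relative to the published pseudocode (e.g., no cap on the ensemble size), and the parameter values are illustrative.

```python
import numpy as np

class DWM:
    """Minimal Dynamic Weighted Majority sketch (illustrative)."""

    def __init__(self, make_learner, n_classes, beta=0.5, theta=0.01, p=50):
        self.make_learner, self.n_classes = make_learner, n_classes
        self.beta, self.theta, self.p = beta, theta, p  # decay, prune, period
        self.experts, self.weights, self.t = [], [], 0

    def _vote(self, x):
        scores = np.zeros(self.n_classes)
        for clf, w in zip(self.experts, self.weights):
            scores[int(clf.predict([x])[0])] += w
        return int(scores.argmax())

    def learn_one(self, x, y):
        self.t += 1
        classes = np.arange(self.n_classes)
        if not self.experts:                        # bootstrap first expert
            clf = self.make_learner()
            clf.partial_fit([x], [y], classes=classes)
            self.experts, self.weights = [clf], [1.0]
            return
        update_step = (self.t % self.p == 0)
        if update_step:                             # penalize wrong experts
            for i, clf in enumerate(self.experts):
                if int(clf.predict([x])[0]) != y:
                    self.weights[i] *= self.beta
        global_pred = self._vote(x)
        if update_step:
            m = max(self.weights)                   # normalize weights
            self.weights = [w / m for w in self.weights]
            keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in keep]
            self.weights = [self.weights[i] for i in keep]
            if global_pred != y:                    # add a fresh expert
                self.experts.append(self.make_learner())
                self.weights.append(1.0)
        for clf in self.experts:                    # all experts keep learning
            clf.partial_fit([x], [y], classes=classes)
```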
Additive Expert Ensembles (AddExp) is a method similar to DWM [99]. The main motivation for this method is the fact that it allows the definition of mistake and loss bounds. In this method, the parameter p is eliminated, so that weight updates happen whenever a base classifier misclassifies a new training example. A new classifier is always added when the prediction of the ensemble as a whole is wrong. When combined with a strategy to prune the oldest classifiers once a maximum pre-defined ensemble size is reached, the bounds are defined in the same way as when no pruning of classifiers is performed. However, eliminating the oldest classifiers may not be a good strategy to deal with non-stationary environments, as old classifiers may still be very useful. The alternative strategy of pruning the lowest-weight classifiers is more practical, but offers no theoretical guarantees.
Other approaches to combine online classifiers are also considered in the Hedge β and Winnow algorithms [117]. Kuncheva called them "horse racing" ensembles [107]. For instance, Hedge β works in a similar way to the Weighted Majority Algorithm, but instead of using an aggregating rule it selects one component classifier, based on the probability distribution obtained by normalizing the weights, to represent the final ensemble prediction. Winnow also follows the main schema of the Weighted Majority Algorithm, but uses different ideas for updating and calculating weights.
Another example of a passive online learning ensemble approach for non-stationary environments is Stanley's Concept Drift Committee (CDC) [168]. As with DWM and AddExp, all classifiers that compose the ensemble are trained on the incoming training examples. Instead of multiplying the weights of the classifiers by a constant β upon misclassifications, CDC uses weights that are proportional to the classifier's accuracy on the last n training examples. A new classifier is added whenever a new training example becomes available, rather than only when the ensemble misclassifies the current training example. When a maximum pre-defined ensemble size is reached, a new classifier is added only if an existing one can be eliminated. A classifier can be deleted if its weight is below a pre-defined threshold t and its age (number of time steps since its creation) is higher than a pre-defined maturity age. Immature classifiers do not contribute to the ensemble's prediction. This gives them a chance to learn the concept without hindering the ensemble's generalization. This approach was shown to achieve comparable or better performance than previous approaches such as FLORA4 [184] and instance-based learning 3 (IB3) [3] in the presence of concept drifts, but sometimes presented worse performance than FLORA4 before the drifts.
Yet another idea has been used in the Online Accuracy Updated Ensemble (OAUE) [29]. It inherits some successful solutions from its hybrid predecessor AUE, like incremental updating of component classifiers and learning new classifiers at some time steps. However, to more efficiently process incoming single examples and weight component classifiers, a new cost-effective weighting function was introduced. It achieves a good trade-off between predictive accuracy, memory usage and processing time.
The WWH algorithm from Yoshida et al. [194] builds different component classifiers on overlapping windows to select the best learning examples and aggregates component predictions similarly to the Weighted Majority Algorithm. Therefore, WWH can be seen as a combination of an instance selection windowing technique with an adaptive ensemble.
Quite recently, Jaber proposed the Anticipative Dynamic Adaptation to Concept Changes (ADACC) ensemble, which attempts to optimize control over the online classifiers by recognizing concepts in incoming examples [83].
Active Approaches.
Even though active online ensemble approaches are not as common as passive ones, there are a few approaches in this category. One of the advantages of using explicit drift detection methods is the possibility to inform practitioners of the existence of concept drifts. The use of concept drift detectors can also help approaches to swiftly react to concept drifts once they are discovered. However, if concept drift detectors fail to detect drifts, these approaches will be unable to react to drifts. Concept drift detectors may also present false alarms, i.e., false positive drift detections. Therefore, it is important for active ensemble approaches to implement mechanisms to achieve robustness against false alarms.
An example of an active online ensemble is the Adaptive Classifiers-Ensemble (ACE) [133]. This approach uses both an online classifier to learn new training examples and batch classifiers trained on old examples stored in a buffer. The batch classifiers are used not only to make predictions, but also to detect concept drifts. ACE considers that there is a concept drift if the accuracy of the most accurate batch classifier over the last W examples is outside the confidence interval formed by its accuracy over the W examples preceding the last W examples. Whenever a concept drift is detected or the maximum number of training examples to be stored in the buffer is attained, a new batch classifier is trained with the stored examples and both the online classifier and the buffer are reset. A pruning method is used to limit the number of batch classifiers used. This pruning method removes older classifiers first, unless they present the highest predictive accuracy over a long period of time. In that way, the approach can use old knowledge when there are recurring concepts. The classification is done by weighted majority vote.
The weight is based on the accuracy on the most recent W training examples, and it is zero if this accuracy is equal to or lower than the lower endpoint of the accuracy confidence interval. As the size of the buffer of stored examples is independent of the size of the sliding window W, ACE can respond to sudden changes even if the buffer is large. However, determining the size W of the sliding window may not be easy. ACE also requires storage of examples in an incremental way to create the batch classifiers, but this issue can be easily overcome by replacing the buffer by an online learning algorithm. A comparative experiment of ACE against other ensembles has been presented in [39].
Two Online Classifiers For Learning And Detecting Concept Drift (Todi) [132] uses two online classifiers to detect concept drift. One of them (H0) is rebuilt every time a drift is detected. The other one (H1) is not rebuilt when a drift is detected, but can be replaced by the current H0 if a detected drift is confirmed. Todi detects concept drift by performing a statistical test of equal proportions to compare H0's accuracies on the most recent W training examples and on all the training examples presented so far excluding the last W training examples. After the detection of a concept drift, a statistical test of equal proportions with significance level β is done to compare the number of correctly classified training examples by H0 and H1 since the beginning of the training of H0. If a statistically significant difference is detected, this means that H0 was successful in handling the concept drift, and the drift is confirmed. H0 then replaces H1 and is rebuilt. The classification is done by selecting the output of the most accurate classifier considering the W most recent training examples. This strategy makes the approach more robust to false alarms than approaches that reset the learning system upon drift detection [62,134]. However, no strategy is adopted to accelerate the learning of a new concept, as the new concept has to be learnt from scratch.
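The statistical machinery here is a standard two-proportion test; a minimal version is sketched below, with the stream-side bookkeeping (which counts go into which span) simplified relative to the published method.

```python
from math import sqrt
from statistics import NormalDist

def equal_proportions_test(k1, n1, k2, n2, alpha=0.05):
    """Two-proportion z-test, as used by Todi-style drift detection to
    compare accuracies over two spans of the stream (illustrative).

    k1/n1: correct predictions / examples in the recent window W;
    k2/n2: the same counts over the remaining history.
    Returns True when the proportions differ significantly.
    """
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                     # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))    # pooled standard error
    if se == 0:
        return False
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
    return p_value < alpha
```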
Another example of an active online ensemble learning approach in this category is Diversity for Dealing with Drifts (DDD) [127]. DDD is based on the observation that very highly diverse ensembles (whose base classifiers produce very different predictions from each other) are likely to have poor predictive performance under stationary conditions, but may become useful when there are concept drifts. So, in the mode prior to drift detection, DDD maintains both a low diversity ensemble and a high diversity ensemble. The low diversity ensemble is used for learning and for making predictions. The high diversity ensemble is used for learning and is only activated for predictions upon drift detection. This is because this ensemble is unlikely to perform well under stationary conditions. Concept drifts can be detected by using existing methods from the literature. Once a concept drift is detected, the approach shifts to the mode after drift detection, where it activates both the low and high diversity ensembles and creates new low and high diversity ensembles to start learning the new concept from scratch. The prediction given by DDD is then set to the weighted majority vote of the predictions given by its ensembles, except for the new high diversity ensemble. The weight of each ensemble is proportional to its prequential accuracy since drift detection. This approach manages to achieve robustness to different types of drift and to false alarms, because the different ensembles are most adequate for different situations. However, the use of more than one ensemble can make this approach heavier for applications with very strict time constraints.
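The mode-dependent combination described above can be summarized as follows; the dictionary keys and the accuracy bookkeeping are our own simplification of DDD's four-ensemble state, not the authors' exact interface.

```python
def ddd_predict(ensembles, accuracies, x, after_drift):
    """Combine DDD's ensembles as described above (simplified sketch).

    ensembles: dict with keys 'old_low', 'old_high', 'new_low', 'new_high';
    accuracies: prequential accuracy of each ensemble since drift detection.
    Before a drift, only the low-diversity ensemble predicts; after a
    drift, all ensembles except the new high-diversity one vote, weighted
    by their prequential accuracies.
    """
    if not after_drift:
        return ensembles['old_low'].predict(x)
    voters = ['old_low', 'old_high', 'new_low']   # 'new_high' only learns
    votes = {}
    for name in voters:
        pred = ensembles[name].predict(x)
        votes[pred] = votes.get(pred, 0.0) + accuracies[name]
    return max(votes, key=votes.get)              # weighted majority vote
```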
Modifications of the architecture of tree ensembles with drift detectors have also been considered by Bifet et al. [13]. The ADWIN change detector has been used to reset ensemble members when their predictive accuracy degrades significantly. This makes it possible to better deal with evolving data streams. The same ADWIN method may also be integrated with the online bagging ensemble – see ADWIN Bagging [18].
4.2. Supervised learning for regression problems

Regression analysis is a technique for estimating a functional relationship between a numeric dependent variable and a set of independent variables. It has been widely studied in statistics, pattern recognition, machine learning and data mining. Many ensemble methods can be found in the literature for solving classification tasks on streams, but only a few exist for regression tasks. The discussed algorithms are summarized in Table 5.

Table 5
Ensembles for regression from data streams.

OzaBag [137]: Online Bagging for regression
OzaBoost [137]: Online Boosting for regression
AddExp [99]: Additive expert ensembles for regression
ILLSA [90]: Incremental local learning soft sensing algorithm
eFIMT-DD [81]: Ensembles of any-time model trees
AMRules [47]: Ensemble of randomized adaptive model rules
iSOUP-Tree-MTR [135]: Ensembles of global and local trees
DCL [125]: Dynamic cross-company learning
Dycom [128]: Dynamic cross-company mapped model learning
LGPC [192]: Lazy Gaussian Process committee
OWE [162]: Online weighted ensemble of regressor models
DOER [161]: Dynamic and on-line ensemble regression
Oza and Russell's online bagging algorithm for stationary data streams [137], described in Section 4.1.2, is an example of a method that is inherently applicable both to classification and regression.
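The core of Oza and Russell's online bagging is the Poisson(1) replication of each incoming example, which approximates bootstrap sampling as the stream grows and works unchanged for regression base learners; the sketch below assumes scikit-learn-style regressors with partial_fit.

```python
import numpy as np

rng = np.random.default_rng(42)

def online_bagging_update(models, x, y):
    """One step of online bagging: each base model sees the new
    example k times, with k ~ Poisson(1)."""
    for model in models:
        for _ in range(rng.poisson(1.0)):
            model.partial_fit([x], [y])

def ensemble_regress(models, x):
    # For regression, the base predictions are typically averaged.
    return float(np.mean([m.predict([x])[0] for m in models]))
```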
Kolter and Maloof's Additive Expert Ensembles (AddExp) for non-stationary data streams also contain a version for continuous dependent variables [99]. As in the AddExp for classification problems, a weight is associated with each base learner. For classification, AddExp makes predictions by using weighted majority vote, while for regression, a weighted average is used. In the version for classification, the weight associated with a base classifier is multiplied by β, 0 ≤ β < 1, whenever it misclassifies a training example. In the version for regression, the weight of a base learner is always multiplied by β^|ŷ−y|, where ŷ is the prediction given by the base learner and y is the actual value of the dependent variable.
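The regression update and prediction rules translate directly into code; note that the rule implicitly assumes the targets are scaled so that |ŷ − y| is a meaningful exponent (e.g., normalized to [0, 1]), and the β value below is illustrative.

```python
def addexp_regression_weights(weights, predictions, y, beta=0.5):
    """AddExp-style weight update for regression, per the rule above:
    each expert's weight is multiplied by beta ** |prediction - y|."""
    return [w * beta ** abs(p - y) for w, p in zip(weights, predictions)]

def addexp_regression_predict(weights, predictions):
    # The ensemble output is the weighted average of expert predictions.
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total
```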
Kadlec and Gabrys developed an incremental local learning soft sensing algorithm (ILLSA) [90], operating in two phases. During the initial phase, a number of base models is trained, each using different concepts (subsets) of the training data. During the online data stream mining phase, the weights assigned to the models are recalculated instance-by-instance using their proposed Bayesian framework working on output posterior probabilities.
The most in-depth study on learning ensembles of model trees from data streams appears in [80,81]. This research includes two different methods for online learning of tree-based ensembles for regression from data streams. Both methods are implemented on top of single model trees induced using the FIMT-DD algorithm (a special incremental algorithm for learning any-time model trees from evolving data streams). Then, the ensembles of model trees are induced by the online bagging algorithm and consist of model trees learned with the original FIMT-DD algorithm and a randomized version named R-FIMT-DD. The authors explore the idea of randomizing the learning process through diversification of the input space and the search trajectory, and examine the validity of the statistical reasoning behind the idea of aggregating multiple predictions. It is expected that this would bring the resulting model closer to the optimal or best hypothesis, instead of relying only on the success of a greedy search strategy in a constrained hypothesis space. The authors also perform a comparison with respect to the improvements that an option tree brings to the learning process. In [82], the authors observe that the use of options acts as a kind of backtracking over past selection decisions.
Their empirical comparison has shown that the best tree found within the option tree has a better accuracy (on most of the problems) than the single tree learned by FIMT-DD. The increased predictive performance and stability come at the cost of a small increase of the processing time per example and a controllable increase in the allocation of memory. The increase in the computational complexity is due to the increased number of internal nodes being evaluated at any given point in time. The option tree incurs an additional increase in computational complexity when computing the aggregate of the multiple predictions for a single testing example, as it has to examine all of the options on the path from the root to the corresponding leaf node.
ItextendsAMRulesalgorithmbyusingrandomrulesfromdatastreams.
Severalsetsofrulesarebeinggenerated.
EachrulesetisassociatedwithasetofNattat-tributes.
Theseattributesareselectedrandomlyfromthefullsetofattributesofthedataset.
Thenewalgorithmimprovestheperfor-manceofthepreviousversion.
Osojniketal.
[135]investigatedensemblesoflocalandglobaltreesformulti-targetregressionfromdatastreams.
Authorsar-guedthatpredictingalltargetatonceismorebenecialtominingstreamsthanusinglocalmodels.
Anovelglobalmethodwaspro-posed,namedincrementalStructuredOutputPredictionTreeforMulti-targetRegression(iSOUP-Tree-MTR).
Forimprovingthepre-dictivepower,theauthorsuseditasabaselearnerforOza'sOnlineBagging.
AnapproachcalledDynamicCross-companyLearning(DCL)[125]hasbeenproposedtoperformtransferlearningfordatastreamsinnon-stationaryenvironments.
Theapproachaimsatmakingpredictionsinthecontextofagiventargetcompanyororganization.
Adatastreamcontainingtrainingexamplesfromthiscompanyororganizationisavailable,butproducesfewexamplesovertime.
Thiscanhappen,forexample,whenitisexpensivetocollectlabeledexamplesinthecontextofagivencompany.
There-fore,thisapproachmaintainsnotonlyabaselearnertolearnsuchexamples,butalsootherbaselearnerstolearnexamplesobtainedfromothercompaniesororganizations.
Aweightisassociatedtoeachbaselearner.
Thisweightismultipliedbyβ,0≤β<1,wheneverthisbaselearnerisnottheonethatprovidedthebestpredictiontoanewtargetcompany/organizationtrainingexam-ple.
So,theseweightscanbeusedtoemphasizethebaselearn-ersthatcurrentlybestreectthepresentconceptofthetargetcompany/organization.
Thepredictiongivenbytheensembleistheweightedaverageofthepredictionsgivenbythebaselearners.
AnotherapproachcalledDynamicCross-companyMappedModelLearning(Dycom)[128]extendsDCLtolearnlinearfunc-tionstomapthebaselearnerscreatedwithdatafromothercom-paniesororganizationstothecurrentconceptofthetargetcom-panyororganization.
Thesemappingfunctionsaretrainedbasedonasimplealgorithmthatusestrainingexamplesfromthetar-getcompany/organizationdatastreamandthepredictionsgiventotheseexamplesbythebaselearnersrepresentingothercom-panies/organizations.
Thisalgorithmoperatesinanonlinemannerandgivesmoreimportancetomorerecenttrainingexamples,sothatthemappingfunctionsrepresentthecurrentconceptofthecompanies/organizations.
Itisexpectedtoenableareductioninthenumberoftrainingexamplesrequiredfromthetargetcom-panywhilekeepingasimilarpredictiveperformancetoDCL.
Thisisbecauseitcanbenetfromallbaselearnersbymappingthemtotheconceptofthetargetcompany,ratherthanbenetingonlyfrombaselearnersthatcurrentlybestrepresenttheconceptofthetargetcompany.
XiaoandEckert[192]proposedanapproximationofGaussianprocessesforonlineregressiontasks.
Theycombinedseveralbasemodels,eachbeinginitializedwithrandomparameters.
Eachin-cominginstanceisusedtoupdateaselectedsubsetofbasemodelsthatarebeingchosenusingareedysubsetselection,realizinganoptimizationofasubmodularfunction.
Theauthorsshowedthattheirmethoddisplaysfavorableresultsintermsoferrorreductionandcomputationalcomplexity,howeverusedonlymethodsbasedonGaussianprocessesasareference.
On-lineWeightedEnsemble(OWE)ofregressormodelswasdiscussedbySoaresandAraujo[162].
Itwasdesignedtohandlevarioustypesofconceptdrift,includingrecurrentones.
Theen-semblemodelisbasedonaslidingwindowthatallowstoin-corporatenewsamplesandremoveredundantones.
Aboosting-likesolutionisusedforweightcalculationofensemblemodels,bymeasuringtheirerroronthecurrentwindow.
Additionally,con-tributionofoldwindowscanbetakenintoconsiderationdur-ingweightcalculation,thusallowingforswitchingbetweenrecur-ringandnon-recurringenvironments.
Finally,OWEcanexpanditsstructurebyaddingnewmodelwhentheensembleerrorisin-creasingandpruningmodelscharacterizedbyhighestlossofac-curacy.
Thisconceptwasfurtherdevelopedbythesameauthorsintheirdynamicandon-lineensembleregression(DOER)[161].
Here,theselectionandpruningofmodelswithintheensembleisbeingdonedynamically,instanceafterinstance,toofferimprovedadap-tationcapabilities.
Additionalnoveltyliesinabilityofeachbasemodeltoupdateitsparametersduringthestreamminingproce-dure.
Anevolutionary-basedensemblethatcanadaptthecompetenceareasandweightsassignedtobasemodelsforregressiontaskswasalsodiscussedbyJackowskiin[85].
5.
AdvancedissuesindatastreamanalysisTheprevioussectionshavediscussedtypicalrepresentationsofexamplesandoutputvalues(asattribute-valuepairs)andlearn-ingproblemswhicharethecommonlyencounteredindatastreamanalysis.
However,inseveralnewstudiedproblemsonecanmeetmorecomplexrepresentationsorlearningissues.
Wewillnowdis-cussensemblesolutionstotheseproblems,includinglearningfromimbalanceddata,noveltydetection,lackofcounterexamples,activelearningandnon-standarddatastructures.
5.
1.
ImbalancedclassicationNon-stationarydatastreamsmaybeaffectedbyadditionaldatacomplexityfactorsbesidesconceptdriftsandcomputationalre-quirements.
Inparticular,itconcernsclassimbalance,i.
e.
,situa-tionswhenoneofthetargetclassesisrepresentedbymuchlessinstancesthanotherclasses.
Classimbalanceisanobstacleevenforlearningfromstaticdata,asclassiersarebiasedtowardthemajorityclassesandtendtomisclassifyminorityclassexamples.
Dealingwithunequalcardinalitiesofdifferentclassesisoneofthecontemporarychallengesinbatchlearningfromstaticdata.
Ithasbeenmorestudiedinthisstaticframeworkandmanynewalgo-rithmshavealreadybeenintroduced,fortheircomprehensivere-viewseetherecentmonograph[73]orsurveys[72,101,172].
Outofthesenewsolutionsensemblesareoneofthemostpromisingdirections.
However,classimbalancehasstillreceivedlessattentioninnon-stationarylearning[77].
Notethatimbal-anceddatastreamsmaynotbecharacterizedonlybyanapproxi-matelyxedclassimbalanceratioovertime.
Therelationshipsbe-tweenclassesmayalsobenolongerpermanentinevolvingim-balancedstreams.
Amorecomplexscenarioispossiblewheretheimbalanceratioandthenotionofaminorityclassmaychangeovertime.
Itbecomesevenmorecomplexwhenmulti-classprob-lemsarebeingconsidered[181].
Belowwediscussmainensemble-basedproposalsforminingimbalancedevolvingstreams.
TheyaresummarizedinTable6.
Table 6
Ensembles for imbalanced data streams.

Chunk-based approaches
  SE [65]: Ensemble with majority class sampling
  SERA [34]: Selectively recursive approach for sampling minority class
  REA [35]: SERA with k-NN for chunk similarity analysis
  BD [116]: Boundary definition ensemble
  Learn++.CDC [42]: Learn++ with concept drift and SMOTE
Online approaches
  EONN [67]: Ensemble of online cost-sensitive neural networks
  ESOS-ELM [129]: Ensemble of subset online sequential extreme learning machines
  OOB [180]: Oversampling-based online Bagging
  UOB [180]: Undersampling-based online Bagging
  MOOB [181]: Multi-class oversampling-based online Bagging
  MUOB [181]: Multi-class undersampling-based online Bagging

Many of these proposals adapt the idea of re-sampling the incoming data to obtain more balanced class distributions.
In general, re-sampling methods transform the distribution of examples in the original data towards more balanced classes. Undersampling removes some examples from the majority classes, while oversampling adds minority class examples (either by randomly replicating them or by generating synthetic new ones).
The first proposal, by Gao et al. [65], was an ensemble approach that divided examples from the incoming data chunk into positive (the minority class) and negative (other classes) subsets. To build a new base classifier, one takes all positive instances gathered so far and randomly selects a subset of the negative instances of the new data chunk. The size of this subset is calculated based on a parameter referring to the class distribution ratio. Then, this new classifier is added to the ensemble. Predictions of base classifiers are combined using a simple voting technique. In order to accommodate this idea for a potentially infinite stream, the authors propose to sample examples from only a limited number of the most recent chunks, using either a fixed (each chunk contributes equally) or fading (the more recent chunks contribute more instances) strategy. However, as all positive examples are used to learn each classifier, this method is limited to situations with a stable definition of the minority class.
The Selectively recursive approach (SERA) [34] is another ensemble method, proposed by Chen and He, that extends the Gao et al. concept by using selective sampling of the minority class. The Mahalanobis distance is used to select a subset of the most relevant minority instances (from the previous chunks) for the current chunk of the stream, and these are combined with a bagging method applied on examples from the majority class. This approach alleviates the drawbacks of the previous method regarding drifts on the minority class, but at the same time makes SERA very sensitive to a proper selection of the number of minority samples taken under consideration. Chen and He proposed yet another ensemble, called REA [35], which changes SERA's properties by adopting the k-nearest neighbor principle to estimate the similarity of previous minority examples with the ones in the most recent chunk. The predictions of base classifiers are weighted on the basis of their classification of the recent chunk.
Lichtenwalter and Chawla [116] proposed weighted ensembles in which both classified minority and majority instances are propagated between chunks. This allows to better capture the potentially changing boundary between classes. A combination of information gain and Hellinger's distance (a skew-insensitive metric) is used to measure similarities between two data chunks and thus to implicitly check if a concept drift has taken place. This information is then used to weight ensemble members during the combination of their predictions, with a linear function being the inverse of the actual closeness of chunks. The authors acknowledge the potential limitations of this approach (like small differences in weights or reduced variance) but leave a more precise examination of different combination functions for future studies.
Ditzler and Polikar [42] proposed an extension of their Learn++ ensemble for incremental learning from imbalanced data. This combines their previous approach to learning in non-stationary scenarios with the idea of bagging, where undersampling is performed in each bag. Classifiers are weighted based on their performance on both minority and majority classes, thus preventing a significant loss of accuracy on negative cases. However, one must point out that this approach assumes a well-defined minority class and cannot handle dynamically changing properties of classes. The authors also studied a variant called Learn++.CDC (Concept Drift with SMOTE), which employs oversampling of the minority class.
Ghazikhani et al. [67] introduced an ensemble of online neural networks to handle drifting and imbalanced streams. They embedded cost-sensitive learning into the process of neural network training in order to tackle the skewed class distribution. A number of cost-sensitive neural networks is trained at the beginning of the stream using different initial random weights. Then, the ensemble is updated with new instances without set-up modifications. A cost matrix is predefined, with the penalty for errors on the minority class being twice the remaining costs. The usage of the fixed cost matrix limits the adaptability to evolving streams. Classifiers are combined using weighted voting, and individual weights are calculated with a modified Winnow strategy.
An ensemble of online sequential extreme learning machines (ESOS-ELM) was developed by Mirza et al. [129]. It maintains randomized neural networks that are trained on balanced subsets of the stream. Short- and long-term memories were implemented to store the ensemble and the progress of the stream. Two different learning schemes were proposed for moderate and high imbalance ratios (the difference being the way of processing majority class instances). However, the algorithm replicates the limitations of some of the previous methods, assuming that no drift on the minority class takes place.
Another approach to imbalanced and drifting streams was proposed by Wang et al. [180]. These authors are the only researchers who currently also consider dynamic changes of class cardinalities. They proposed a number of online bagging-based solutions that are able to cope with a dynamically changing imbalance ratio and switching of class properties (e.g. the majority becoming a minority over time). They considered a dedicated concept drift detector for imbalanced streams, whose output directly influences the oversampling or undersampling ratios, allowing to accommodate evolving data skewness. A further modification, called WEOB, uses a combination of both under- and oversampling in order to choose the better strategy for the current state of the stream. An adaptive weighting combination scheme was proposed to accommodate this hybrid solution, where the weights of the sampling strategies are either computed as their G-mean values or are binary (meaning only one of them will be used at a time). A multi-class extension of this method was discussed in [181], where concepts of multi-minority and multi-majority classes are used to model complex relations among classes.
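A minimal sketch of the oversampling-based variant (OOB) for a binary stream is shown below; the time-decayed class-size estimates and the exact rule for the Poisson rate λ are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

class OOB:
    """Sketch of oversampling-based online Bagging for a binary stream,
    assuming scikit-learn-style incremental base learners."""

    def __init__(self, models, decay=0.9):
        self.models = models
        self.decay = decay
        self.p = {0: 0.5, 1: 0.5}           # time-decayed class proportions

    def learn_one(self, x, y):
        for c in self.p:                    # update class-size estimates
            self.p[c] = self.decay * self.p[c] + (1 - self.decay) * (c == y)
        # Rare classes get lambda > 1, so their examples are oversampled.
        lam = max(self.p.values()) / max(self.p[y], 1e-6)
        for m in self.models:
            for _ in range(rng.poisson(lam)):
                m.partial_fit([x], [y], classes=[0, 1])
```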
Finally, recently some researchers have started to discuss the need for new evaluation measures to address the complexity of imbalanced data streams, see, e.g., [20,30,33].
5.2. Novelty detection and one-class classification

Due to the evolving nature of data streams, the learning algorithm has to be prepared to handle new, unseen data that do not follow the previously seen distributions.

Table 7
Ensembles for novelty detection and one-class classification.

OCLS [198]: One-class learning and summarization ensemble
UOCL [118]: Extended ensemble for one-class learning and summarization
IncOCBagg [102]: Incremental one-class Bagging
OLP [37]: One-class ensemble based on prototypes
Learn++.NC [130]: Learn++ ensemble for novel class detection
ECSMiner [122]: Ensemble for novelty detection with time constraints
MCM [121]: Ensemble for novelty detection and drifting feature space
AnyNovel [1]: Two-step clustering ensemble for novelty detection
CBCE [171]: Class-based ensemble for class evolution
CLAM [4]: Class-based micro classifier ensemble
SCARN [4]: Stream Classifier and novel and recurring class detector
Such examples may be caused by noise in the stream or may actually originate from a novel concept that has started emerging. Such a novelty may be caused by some abnormality (like a zero-day attack in networks or an anomaly in the system) or may be a new instance from a concept that was previously not seen. In the latter case a completely new class may appear in the decision space, existing classes may merge, or one of the classes may start to disappear. This may happen in the context of two possible scenarios: binary and multi-class. In the former case we may treat it as a task of recognizing a target (correct) concept and a set of potential outliers [115], while in the latter we must deal at the same time with a recognition problem among a number of classes and the detection of possible new emerging classes [53]. For the binary case we often must face the fact that it is difficult or even impossible to gather sufficient representatives of the novel class, or that they may not even form a class. Therefore, one-class classification (known as learning in the absence of counterexamples) is utilized, as it allows modeling the target concept without making any assumptions regarding the properties of the novelty observations to appear. Let us now discuss the main ensemble-based methods suitable for these scenarios. They are summarized in Table 7.
Zhu et al. [198] proposed a one-class ensemble approach to mining data streams with a concept summarization approach, providing labels not for single instances but for chunks of instances. They introduced a vague one-class learning module, based on one-class Support Vector Machines. Each base classifier utilized weights assigned to instances from a given chunk, reflecting their level of relevance (in the discussed application the relevance was based on the user's interest in given information). This was done in a two-step procedure, utilizing local and global weighting. Local weighting calculated instance weight values using examples in the given data chunk. Global weighting was used to calculate a weight value for both positive and unlabeled instances in a given chunk, utilizing information coming from classifiers trained on previous data chunks. This weight information was directly embedded in the process of classifier training. A weighted classifier combination scheme was used to make a final ensemble decision, where the weights of each classifier were calculated as an agreement measure between it and the most recent classifier in the pool. One must notice that this approach used static one-class classifiers and thus adaptability was achieved only by adding new members to the ensemble.
This idea was further developed by Liu et al. [118]. They also proposed a chunk-based ensemble of one-class classifiers for simultaneous learning from uncertain data streams and concept summarization. They proposed a different scheme for calculating instance weights by using a local kernel-density approach. It allowed generating a bound score for each example based on its local nearest neighbors in a kernel feature space. Thus, the instance weight was calculated only once and directly embedded in the process of one-class Support Vector Machine training. A combination of classifiers was done using a weighted aggregation, where the weight for each base classifier was determined by its mean square error. Similar to the previous work, the classifiers used here were static ones.
An ensemble of adaptive one-class classifiers for drifting data streams was proposed by Krawczyk and Wozniak [102]. Here, classifiers were trained with the usage of Bagging. The set-up of the ensemble remains unchanged during the stream processing, but base classifiers are updated with random subsets of examples from incoming data chunks. As a base classifier they used an incremental weighted one-class Support Vector Machine [103]. It incorporates new examples by re-weighting support vectors and adding/removing them according to the stream progress. New instances can be weighted according to two different strategies (highest priority to the newest examples, or weights based on the distance from the hypersphere center). The forgetting mechanism was implemented as a gradual decrease of the weights assigned to vectors, realized as a time-dependent function (the longer a given instance spent in the stream, the higher the forgetting ratio). This approach allowed the method to adapt to concept drift without the need for an external drift detector, as old concepts were gradually removed from the ensemble memory. Additionally, a parallel implementation was proposed in order to achieve a computational speed-up. However, the authors focused their work only on chunk-based processing of data streams.
Czarnowski and Jedrzejowicz [37] proposed yet another chunk-based ensemble of one-class classifiers for handling binary and multi-class data streams. Here a single one-class classifier (decision tree) was responsible for tackling a single class. Each class-based data chunk utilized for training classifiers consisted of class prototypes and information about whether the class predictions of these instances, carried out at earlier steps, have been correct. When a new chunk of data becomes available, an instance selection algorithm is applied to select the most valuable examples. Classifiers are combined using a weighted voting scheme.
Muhlbaier et al. [130] introduced an extension of Learn++ for cases with novel class appearance in streams. The main change over the previous version of the ensemble is an extension of the classifier combination phase. A dynamically weighted consult-and-vote scheme was proposed, where individual classifiers interchange their information regarding novel instances and select the most competent ones by assigning them the highest weights. This allows preventing cases when a new classifier trained with a novel class is outvoted by older ones which did not have access to new instances. However, this solution is suitable only for scenarios in which classes emerge in a transient manner.
Masud et al. [122] introduced an ensemble classifier for simultaneous classification and novelty detection in drifting data streams with embedded time constraints. It worked under an assumption that each example must be evaluated within a given time window so as not to create a bottleneck for rapidly incoming instances. This is of crucial importance to the novelty detection module, which is usually characterized by the highest computational complexity in the entire classification system. Additionally, the authors took into account the possible delay with which a true class label may become available to the system. These two constraints allowed creating a computationally efficient ensemble for high-speed and evolving data streams. As a base component the authors proposed the Enhanced Classifier for Data Streams with novel class Miner (ECSMiner), an ensemble system with three buffers: for potentially novel instances, for instances waiting for class labels, and for labeled instances to be used in training new classifiers.
In their follow-up work, Masud et al. [121] proposed a new ensemble method that takes into account not only concept drift and novel class appearance, but also the possibility of an evolving feature space. They assumed that new features may appear over time, which is justified by specific domain-based applications (e.g., new phrases in text stream mining). Each model in the ensemble was built using feature space homogenization based on lossless conversion, to avoid differences between training and testing sets. This work also introduced several other modifications of their methods. The outlier detection module has been enhanced with an adaptive threshold for changing definitions of novel instances. The novelty detection module was constructed with the usage of the Gini coefficient to simultaneously measure the difference between a new instance and existing classes, as well as its similarity to other novel instances stored in a buffer. Finally, the proposed classification system allowed detecting multiple novel classes at the same time using graph analysis.
Abdallah et al. [1] proposed an adaptive ensemble approach for multi-class novelty detection. The proposed method was based on a two-step cluster formation. First, a supervised learning method was applied to divide the initial data into class-based clusters. Then, unsupervised learning was applied to detect sub-concepts within each cluster and thus to create more local models. The authors showed that their algorithm can efficiently distinguish between the appearance of an actual novel concept, drift present in one of the existing sub-concepts, and the appearance of singular outliers. This was done by defining a novel concept as residing outside all existing cluster-based models and consistently moving away from all existing concepts. A forgetting mechanism was implemented to detect concepts that no longer appear in the incoming stream and mark them as irrelevant. To evaluate the model within the stream progress, the authors proposed an active learning strategy to reduce labeling costs.
Sun et al. [171] introduced the Class-Based ensemble for Class Evolution (CBCE). They considered three possible scenarios: class emergence, disappearance and re-occurrence. CBCE constructs its ensemble by storing in memory an online classifier for every single class that has appeared during the course of data stream processing. This is done via one-vs-all binary decomposition. Additionally, a dynamic undersampling technique to deal with class imbalance is applied to each base classifier to counter the evolving disproportions between instances in classes. However, CBCE requires its base classifiers to provide predictions in the form of a score, which limits the number of possible models to be used. When a novel class emerges, its prior probability is estimated and a new classifier is trained. Classifiers may be deactivated when a concept disappears and reactivated when its re-occurrence has been detected.
Two other ensemble-based approaches to novel class detection were proposed by Al-Khateeb et al. [4], namely the Class Based Micro Classifier Ensemble (CLAM) and the Stream Classifier And Novel and Recurring class detector (SCARN). CLAM uses an ensemble of micro-classifiers, where each base micro-classifier has been trained using only positive instances from a given class. This is done via a clustering approach. When a new instance becomes available, the ensemble of micro-classifiers decides whether this instance belongs to any of the existing classes or whether it is a novel one. After a certain number of instances has been tagged as representatives of a novel concept, a new classifier is trained on them and added to the ensemble. The novelty detection is conducted using a proposed neighborhood-based distance score.
The SCARN approach uses two ensemble models: a primary ensemble and an auxiliary ensemble. The primary ensemble is responsible for the distinction between known classes and potential outliers. If an outlier has been detected by the primary ensemble, it is then delegated to the auxiliary ensemble. Its role is to decide whether this is a reoccurring concept from a previously known class or a completely new case.

Table 8
Active and semi-supervised ensembles.

MV [199]: Optimal Weight Classifier Ensemble with active learning
ReaSC [123]: Ensemble of semi-supervised micro-clusters
ECU [196]: Semi-supervised ensemble integrating classifiers and clusters
COMPOSE [49]: Ensemble for initially labeled data streams
SPASC [78]: Ensemble of semi-supervised clustering algorithms
5.3. Active and semi-supervised learning

The fast availability of information about the true target value (class) of incoming examples is another issue which should be taken into account.
As mentioned in Section 3, most of the used frameworks assume immediate or not too delayed access to target values. In some situations it is possible to obtain the true example state at minimal or no cost. An example would be weather prognosis, where our prediction will be evaluated in the future. This is however connected with the problem of label latency - even if we have access to such information, it does not become available right after the arrival of a new instance. However, in many practical situations this assumption may not be realistic, mainly due to the potentially high speed of incoming examples and the costs of human labeling. Note that while cooperating with human experts one has to take into account their limited abilities, responsiveness, and threshold on the amount of data labeled in a certain amount of time. When all examples cannot be quickly labeled, it may still be possible to obtain true target values for a limited number of these examples at reasonable costs – see the discussion in Section 2.2. This can be exploited with active learning [58] or semi-supervised (including self-labeling) learning [174].
Active learning techniques must take into account the possible drifts in data and adapt their sampling rules to them [205]. One cannot use standard static uncertainty-based methods, as they are not robust to situations where drift occurs in a region of high classifier certainty. In recent years, one could see an increased number of studies dealing with this problem that propose various mechanisms for adaptive active learning over non-stationary streams [23,93,187,190]. Ensemble-inspired approaches have already been applied to select examples in static, non-stream data frameworks. However, existing work on using ensemble-based approaches for active learning in data stream mining is scarce and this direction seems worthwhile for future exploitation. We present the ensemble solutions for active and semi-supervised learning over data streams below. The discussed algorithms are summarized in Table 8.
It is worth mentioning one of the key concepts of active learning, called Query by Committee [56], where the active learning sampling is controlled by an ensemble of classifiers. The most popular methods from this domain include Query by Bagging [2] and Query by Boosting [2]. They have been proven to offer increased stability and improved instance selection for labeling compared to queries based on a single classifier's decision.
Zhu et al. [199] proposed to use active learning for controlling the adaptation progress of an ensemble over drifting data streams. The authors argued that the variance of an ensemble has a direct relationship with its error rate, and thus one should select for labeling such instances that contribute towards the minimization of the variance. The authors used the bias-variance decomposition of the ensemble error as a basis for their minimum-variance instance selection method. Additionally, these authors derived an optimal weight calculation scheme for combining components. These two elements work in an active learning loop – weights from the previous iteration are used to guide the active learning procedure, after which a set of labeled examples is used for the weight update step.
Masud et al. [123] proposed an approach where micro-clusters are generated using semi-supervised clustering, and a combination of these models is used to handle unlabeled data incoming from the stream. A label propagation technique is used to assign each micro-cluster to a class. Then, inductive label propagation is used to classify a new instance. New micro-clusters can be added to the ensemble with the progress and changes in the stream. Additionally, an ensemble pruning technique is utilized, deleting any micro-cluster with accuracy dropping below a given threshold (70%).
Learning with delayed labels has often been studied with a mechanism to propagate available labels through the next steps when only unlabeled data is available. For instance, Zhang et al. considered a hybrid ensemble integrating classifiers and clusters, where labeled examples are used to learn classifiers while clusters are formed from unlabeled data [196]. A new incoming instance receives a label resulting from voting of both classifiers and clusters. Another interesting statistical approach, representing each class in the stream by a mixture of sub-populations, was considered by Krempl and Hofer [104]. However, this approach is restricted to tracking only limited gradual drifts in unlabeled data.
The COMPOSE (COMPacted Object Sample Extraction) ensemble [49] was proposed for streams where labeled instances are available only during the initial training of classifiers. After this phase, all incoming instances are assumed to be non-labeled. COMPOSE works in three steps. First, initial labels are combined with new unlabeled data to train a semi-supervised classifier and use it to label these instances. Then, each class gets assigned a geometric descriptor to construct an enclosing boundary and provide the current distribution of this class. Finally, instances called core supports are extracted to serve as class representatives. This allows tracking concept drift in a semi-supervised manner and adapting models accordingly.
Hosseini et al. [78] proposed an ensemble of semi-supervised clustering algorithms, where each class is described by a single model. Each new incoming chunk obtains a pre-defined number of labeled instances, which are used to update the classifiers in the ensemble. Chunks are assigned based on a semi-supervised Bayesian approach. The authors claim that their approach is able to automatically recognize recurrent concepts within the data stream.
5.4. Complex data representations and structured outputs

Non-standard data and class structures have gained increasing attention in recent years from the machine learning community. Due to the advent of big data and the necessity to mine unstructured, heterogeneous and complex information, we require learning methods that can efficiently accommodate such instances. Although most of the current research concerns static, non-streaming frameworks, some research has been undertaken in the case of data streams. The most important streaming ensemble solutions are discussed below and are summarized in Table 9.
Multi-label and multi-instance learning is still a largely unexplored area in data stream mining. In the case of multi-label algorithms, a proper experimental and evaluation framework was proposed by Read et al. [149], but there is not an abundance of work that follows it, especially from the ensemble point of view. Qu et al. [145] proposed a dynamic classifier ensemble for multi-label data streams, where a binary relevance scheme was extended by using feature weighting and keeping a subset of the most recent classifiers in the pool, instead of all possible pairwise combinations. Classifiers are weighted dynamically for each incoming example from the stream.

Table 9
Ensembles for streaming complex data representations.

Multi-label data streams
  DI [145]: Dynamic ensemble with improved binary relevance
  MW [193]: Multiple-window ensemble for multi-label streams
  MLDE [166]: Multi-voting dynamic ensemble with clustering
  FCM-BR [173]: Binary relevance with fuzzy confusion matrix
Multi-instance data streams
  MILTrack [9]: Multi-instance online Boosting
  OMILBoost [144]: Online Boosting based on image patches
  Semi-WMIL [182]: Semi-supervised ensemble of weak online classifiers
Other data structures
  AdaTreeMiner [15]: XML stream mining using closed tree algorithms
  XSC [26]: Ensemble of maximal frequent subtrees for each class
  gSLU [140]: Ensemble-based framework to partition graph streams
  gEboost [139]: Boosting for imbalanced and noisy graph streams

Xioufis et al. [193] introduced an ensemble using a binary relevance model and maintaining two separate windows – one for positive and one for negative examples.
An efficient implementation of the k-NN classifier is used due to its natural incremental nature, while each base classifier is trained on an undersampled label set to tackle possible label imbalance.
The problems related to labeling costs for multi-label data streams were discussed by Wang et al. [179]. A theoretical loss function for their proposed ensemble classifier and an active learning function to select examples minimizing this function were derived. This allowed using fewer labeled instances for training and detecting concept drift on the basis of labeling the most uncertain examples.
The Multi-Label Dynamic Ensemble (MLDE) was developed in [166]. It used adaptive cluster-based classifiers that were combined by a voting method utilizing two separate weights, based on accuracy on the given data chunk and similarity among chunks.
Trajdos and Kurzynski [173] proposed a stream-based extension of the binary relevance model utilizing a fuzzy confusion matrix to correct the decisions of base classifiers in the ensemble. The correction model was updated as the stream progressed, thus adapting to its current state. However, no explicit drift detection technique was used.
Multi-instance learning is an even less exploited area in the stream mining context. Most work in this domain concentrates on image analysis applications and is used in online video processing. However, one may see a video as a stream of images. Babenko et al. [9] proposed a modification of online Boosting for learning from bags of examples. They assumed that once a bag is labeled as positive, then all examples within it are also positive and hence used for training. However, this drawback was reduced by choosing weak classifiers on the basis of a bag likelihood loss function. The ensemble could be updated with new models as the stream progressed, similarly to standard online Boosting.
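The bag likelihood idea can be conveyed with a Noisy-OR sketch of the kind used in MILBoost-style methods: a bag is positive if at least one of its instances is, and candidate weak classifiers are scored by the log-likelihood of the known bag labels. This is a schematic reconstruction, not the exact procedure of [9]:

import numpy as np

def bag_log_likelihood(candidate, bags, bag_labels):
    # candidate(x) must return p(y=1|x) for a single instance;
    # bags is a list of 2-D arrays, one array of instances per bag
    ll = 0.0
    for instances, y in zip(bags, bag_labels):
        p_inst = np.array([candidate(x) for x in instances])
        p_bag = 1.0 - np.prod(1.0 - p_inst)       # Noisy-OR bag probability
        p_bag = np.clip(p_bag, 1e-9, 1.0 - 1e-9)
        ll += y * np.log(p_bag) + (1 - y) * np.log(1.0 - p_bag)
    return ll

def select_weak_classifier(candidates, bags, bag_labels):
    # greedy step: keep the candidate maximizing the bag likelihood
    scores = [bag_log_likelihood(c, bags, bag_labels) for c in candidates]
    return candidates[int(np.argmax(scores))]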
A similar approach was proposed by Qi et al. [144], using however a different classifier selection approach based on selecting the correct image patch around the labeled target. Wang et al. [182] proposed a semi-supervised ensemble of weak online classifiers for object tracking. The final ensemble was constructed by selecting weak classifiers obtained by maximizing the log-likelihood function while minimizing an inconsistency function.
Mining XML data is well studied in static scenarios. However, modern computing environments require online and efficient document processing within time and memory constraints. Bifet and Gavaldà [15] proposed compressing XML trees into vectors that can be processed by standard classifiers, creating closed frequent pattern data structures. These are later fed into a number of stream classifiers based on variants of Bagging and Boosting for online analysis. However, the main contribution of the paper lay in the new data structures, whereas their ensembles were standard ones from the literature.
Brzezinski and Piernik [26] developed the XML Stream Classifier (XSC) ensemble. It creates a set of maximal frequent subtrees for each class independently. Label prediction is done using the association between new documents incoming from the stream and the closest maximal frequent subtree (and thus the class to which it is associated). The base classifiers are updated in a sequential manner, but as each class has its own classifier, the update rates or sizes of the update chunks may vary. This makes XSC suitable for processing imbalanced and locally drifting data streams.
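The per-class association step can be caricatured with sets of parent-child tag pairs standing in for maximal frequent subtrees; the Jaccard overlap score and all names below are illustrative assumptions, not the mining procedure of [26]:

import xml.etree.ElementTree as ET

def edge_set(xml_string):
    # represent a document by its set of parent->child tag pairs
    root, edges = ET.fromstring(xml_string), set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges

def classify(doc, class_patterns):
    # class_patterns: {label: list of frequent edge sets for that class};
    # predict the class owning the pattern closest to the document
    d = edge_set(doc)
    def jaccard(p):
        return len(p & d) / max(1, len(p | d))
    return max(class_patterns,
               key=lambda c: max((jaccard(p) for p in class_patterns[c]),
                                 default=0.0))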
Streams of graphs are also a frequent challenge for learning algorithms, as they become more and more prevalent with the constant growth of social networks. Pan et al. [140] proposed an ensemble approach for mining graph streams, where a stream is partitioned into a number of chunks, each of which contains both labeled and unlabeled graphs. A minimum-redundancy feature selection is applied independently in each chunk to reduce its dimensionality. A sliding-window solution with instance weighting is used to accommodate the possibility of drift presence and to forget outdated examples. Each chunk serves as a training set to build a classifier, and the classifiers are then formed into an ensemble. Nearly the same authors have recently extended this idea by proposing a Boosting approach called gEboost for imbalanced and noisy graph streams [139]. It maintains the graph partitioning approach (including a special feature selection from subgraphs), but for each chunk a Boosting classifier was constructed and learned with a variant of margin maximization. Instance weighting was incorporated directly into this scheme to put more emphasis on the most difficult examples for the imbalance problem.
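The flavour of such imbalance-aware boosting can be conveyed by a small weight-update routine in which misclassified minority examples receive an extra multiplicative boost. The factor lam and the overall scheme are our illustration, not the margin-maximization formulation of [139]:

import numpy as np

def reweight(weights, y_true, y_pred, minority=1, lam=2.0):
    # one AdaBoost-style update with class-sensitive emphasis
    err = np.clip(np.average(y_true != y_pred, weights=weights), 1e-9, 1 - 1e-9)
    alpha = 0.5 * np.log((1 - err) / err)          # standard boosting step size
    boost = np.where(y_true != y_pred, np.exp(alpha), np.exp(-alpha))
    boost = np.where((y_true != y_pred) & (y_true == minority),
                     lam * boost, boost)           # extra minority emphasis
    w = weights * boost
    return w / w.sum(), alpha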
6. Future research directions

In this paper, we have discussed the challenging issues of learning ensembles from data streams. We have considered both classification and regression ensembles, even though classifier ensembles are typically the most often applied approaches in data stream analysis.

In the first sections of the paper, we have presented characteristics which distinguish data streams from standard static data repositories. New requirements for using computationally effective algorithms, which should usually also be able to adapt to concept drift in non-stationary data streams, have been discussed. Different types of concept drift, their characteristics, and methods for their detection in different stream scenarios have been reviewed. Moreover, difficulties in evaluating stream classifiers in the presence of concept drift have been shown. The main part of our paper includes a detailed survey of ensembles, which are categorized with respect to different criteria (stationary or non-stationary data, chunk or online processing modes, passive or active reactions to drifts). Furthermore, we have extended this study to more complex stream situations such as class-imbalanced learning, novelty detection, active and semi-supervised learning, and dealing with more complex data structures.
Despite many interesting developments in the field of mining data streams, there is still a number of open research problems and challenges awaiting to be properly addressed. Below, we briefly present our views on potential directions that seem worthwhile to be further explored:

Better handling of delayed information and extending current techniques within semi-supervised learning: these approaches are still limited to a few ensemble proposals and definitely need more attention. In particular, in fast evolving streams, the relationship between attributes and target values may be only locally valid due to concept drift [105]. Many of the discussed approaches employ a kind of transfer learning, where predictions from models learned from labeled examples are transferred to the next unlabeled portions of the data. In general, they are more useful for limited gradual drifts, while more complex scenarios are still open problems. Developing new approaches to deal with delayed information, including ensembles, that would work in the presence of different types of drift is a non-trivial research task. It would be particularly useful for many real-life automated systems, where interaction with human experts is quite limited. Finally, delayed information may not refer to target values only, but may also concern incomplete attribute descriptions. The problem of incomplete data is more intensively studied in static, off-line data mining, where different imputation techniques have been developed. In the streaming context, there is not much research on such techniques or on other approaches which could learn classifiers while omitting such incomplete descriptions and then update the classifier structure; a simple streaming imputation baseline is sketched below.
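As a trivial baseline of the kind such research could start from, per-feature running means can impute missing attribute values on the fly; this sketch is our own illustration rather than a method from the surveyed literature:

import numpy as np

class StreamingMeanImputer:
    # impute NaNs with per-feature running means, updated online
    def __init__(self, n_features):
        self.sums = np.zeros(n_features)
        self.counts = np.zeros(n_features)

    def transform_update(self, x):
        x = np.asarray(x, dtype=float).copy()
        seen = ~np.isnan(x)
        self.sums[seen] += x[seen]
        self.counts[seen] += 1
        means = np.divide(self.sums, self.counts,
                          out=np.zeros_like(self.sums), where=self.counts > 0)
        x[~seen] = means[~seen]
        return x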
New frameworks for evaluating data stream classifiers: several interesting issues on evaluating classifiers have been studied for static, off-line data. For a comprehensive overview, we refer the reader to [88]. Although new measures [20,30,63,158] have recently been introduced, the nature of complex evolving data streams still poses requirements for novel theoretical and algorithmic solutions. This is particularly needed for more complex stream scenarios with verification latency, changing class imbalance, censored event data streams [157], multiple data streams [167], and changes of misclassification costs [105]. As researchers have considered many different kinds of measures (e.g. predictive performance, time or memory costs, reaction time and many others), a multi-criteria analysis may be more appropriate than aggregating several measures into a single coefficient [28]. Another open issue is rethinking frameworks for testing stream algorithms. Tuning parameters of streaming ensembles is more difficult than in the static case, where special validation sets or internal cross-validation are usually employed. Their equivalents for evolving streams are yet to be invented. How to access ground truth in unsupervised streams also needs to be elaborated. Finally, statistical analysis of the significance of differences between several algorithms with respect to time changes should be developed, similarly to recent recommendations to use appropriate non-parametric tests in the static off-line setup; a prequential evaluation sketch follows below.
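One building block that most of these frameworks already share is prequential (test-then-train) evaluation with a fading factor, in the spirit of [63], so that old errors fade as the stream evolves. A compact sketch:

class PrequentialAccuracy:
    # test-then-train accuracy with a fading factor alpha in (0, 1]
    def __init__(self, alpha=0.999):
        self.alpha, self.hits, self.seen = alpha, 0.0, 0.0

    def update(self, y_true, y_pred):
        self.hits = self.alpha * self.hits + (y_true == y_pred)
        self.seen = self.alpha * self.seen + 1.0
        return self.hits / self.seen

# usage inside a stream loop (model assumed to expose predict/learn):
#   for x, y in stream:
#       acc = metric.update(y, model.predict(x))   # test first ...
#       model.learn(x, y)                          # ... then train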
Benchmark datasets: the number of real-world publicly available datasets for testing stream classifiers is still too small. It limits comparative studies of different streaming algorithms. Moreover, some popular data used in the literature are questioned as to whether they represent sufficiently real drifts, see e.g. the discussion on the electricity data [202]. This is a more difficult situation compared to the state of available static datasets such as the UCI Machine Learning Repository.
Dedicated diversity measures for data stream classifier ensembles: recall that ensemble diversity is one of the important characteristics of ensembles in the standard, static data context [24,108,159]. As discussed in Section 1, several researchers studied the relationship between high ensemble predictive performance and the diversity of its components. Others used specialized diversity measures [108] to visually analyze ensemble classification accuracy. These measures have also been used to tune the combination rule for aggregating component classifier predictions or to prune too large a pool of components inside the ensemble. However, such research is not much visible in the case of streaming ensembles. On the one hand, one can say that as component classifiers are learnt from different parts of the stream, they are already different and diverse. On the other hand, our literature survey shows that only a few authors directly consider promoting diversity while constructing an ensemble or rebuilding it at the moment of detecting drifts, see e.g. the DDD ensemble [126] or other generalizations of online bagging such as [16]. However, nearly nobody directly measures the diversity of component classifiers in streams. Rare studies are based on taking into consideration the diversity measures developed for static, off-line solutions. The most recent study [32] provides a wider experimental study of using six of the most popular diversity measures [108], where a few online and chunk-based ensembles were evaluated in several scenarios of drifts. The first observation from these experiments is that the diversity of ensembles is rather low. Some diversity measures, e.g., the κ inter-agreement measure, change values over the stream in relation to occurring drifts – this is more visible for sudden changes than for gradual drifts. These results may indicate further research lines on combining selected diversity measures, perhaps also with more typical drift detectors, to better monitor changes in the evolving stream and to more precisely identify moments of drift. This could also lead to new solutions for monitoring changes in unlabeled streams. Nevertheless, more research on new diversity measures specialized for evolving streams should be undertaken; a sliding-window κ computation is sketched below.
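For reference, the pairwise κ statistic over a sliding window of predictions can be computed as below; averaging it over all classifier pairs yields a simple stream-diversity signal (low κ means high diversity). The window bookkeeping is our own illustrative framing:

import numpy as np
from itertools import combinations

def kappa(pred_a, pred_b, labels):
    # Cohen's kappa between two prediction sequences
    p_o = np.mean(pred_a == pred_b)                       # observed agreement
    p_e = sum(np.mean(pred_a == c) * np.mean(pred_b == c)
              for c in labels)                            # chance agreement
    return (p_o - p_e) / (1.0 - p_e + 1e-12)

def window_diversity(window_preds, labels):
    # window_preds: (n_classifiers, window_len) predictions on recent examples
    pairs = combinations(range(len(window_preds)), 2)
    return np.mean([kappa(window_preds[i], window_preds[j], labels)
                    for i, j in pairs])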
Dealing with multiple streams and more complex representations: nearly all streaming ensembles have been proposed to process a single stream only. However, some applications, see e.g. studies on internet messages or censored data in the variant of survival analysis [157], may provide several parallel streams. In such multiple streams, the same data events (objects identified in the data sources) may appear at different time moments in each stream and may have different descriptions. This poses several interesting and new challenges, e.g., how to aggregate the information about the same event available in different streams, how to predict the moment of an event appearing in one of the streams given knowledge of other streams, and whether to develop a new ensemble dedicated to working over such multiple streams. These aspects should be particularly important in the context of integrating different (also heterogeneous) data repositories in Big Data analysis [87]. Note that data streams are becoming more and more complex in some new applications, such as social media or electronic health records, which require dealing with many heterogeneous data representations at the same moment. Such mixed representations include structured, semi-structured and completely unstructured data fields, quite often referring to static images, video sequences, or other signals. To fully comprehend the dynamics and phenomena of these data sources, we need to find interactions among such complex and varying data. As ensembles naturally integrate diverse models, they seem to be a highly promising solution for this challenge.
Considering more complex class distributions in imbalanced streams: work on class-imbalanced and evolving streams is still in its early stages. Among the very few existing ensemble proposals, most researchers consider the simplest problem of an imbalanced class ratio, without changes of the imbalance ratio over time [180]. Note that in the static data framework, other data difficulty factors, such as the decomposition of the minority class into rare sub-concepts, overlapping with other classes, and the presence of very rare minority cases in the majority class regions, are also considered more influential than the global imbalance between classes. Considering them in drifting scenarios, where sub-concepts or rare cases appear over time and overlapping regions change, is an open research problem. Similar new challenges may refer to studying changing multiple minority classes [181]. Finally, new evaluation measures and more rigorous evaluation procedures are needed for evaluating algorithms in such complex imbalanced streams – see the discussion in [105].
More studies on the nature of some drift types: although a lot of research has been done on adapting ensembles to different concept drifts, several more detailed characteristics of drifts have not yet been consistently examined in the literature. In particular, gradual drifts are more difficult to detect and track than sudden changes or reoccurring concepts. The current drift detectors work better with sudden drifts, while the identification of characteristic moments of developing gradual or incremental drifts in real streams is still not sufficiently developed. Furthermore, a more formal definition of different kinds of gradual drifts should be proposed. The authors of [183] showed that the progress of changes inside gradual drifts may be realized in many different ways and needs more specialized solutions. The work of [127] also considers different types of gradual drifts, besides considering that drifts may occur in a sequence of several abrupt and non-severe drifts. The paper [43] postulates that the idea of the so-called limited gradual drift is used in a rather intuitive way in most work. Although the work of [183] has attempted to provide more formal definitions of drift characteristics and introduces a new taxonomy of different types of drift, more research should be undertaken to better understand the nature of some drifts, how they develop in real streams, how to measure drift magnitude (e.g. small, medium or high), and which forms of drift could be better handled by specific categories of ensembles.
Considering background knowledge or context while classifying data streams: some researchers argue for including more additional information than basic descriptions of instances when constructing predictions from streams. One of the options is to add background knowledge to drift adaptation techniques [208]. For instance, taking into account seasonal effects while analyzing the electricity benchmark dataset nicely illustrates the usefulness of this postulate [206]. Another possibility is classifying data streams taking context into consideration, i.e., usually Markov chains are used to analyze the data stream when there are inter-dependencies between successive labels, e.g., in medical diagnosis – the state of the patient depends not only on the recent observation, but his/her history is also taken into consideration. The same holds in the case of character recognition: when we know that the text is, e.g., written in English, we can recognize the current letter on the basis of its characteristics, but also take into consideration what the previous letter was (some combinations are not possible and some of them are almost impossible). There are several studies on classification with context, e.g., [70,148,186].
Self-tuning ensembles: most online and chunk-based approaches use models with parameters being either individually tuned or set to preset values – fixed for the complete analysis process. However, with changes within the stream, the previously set parameters may no longer be sufficiently good (especially in the case of parameter-sensitive methods, like support vector machines or neural networks). Therefore, proposing a new methodology for self-tuning streaming ensemble systems may lead to improved predictive power. Additionally, tuning parameters for single classifiers should take into account that they are components within the ensemble. Thus, more global update methods that can lead to obtaining more complementary models seem to be worth exploring.
Ensemble pruning: although many ensembles for data streams apply pruning procedures, they are usually based on prediction performance or on the time that the model has spent within the ensemble. However, as data stream mining is a complex task, these factors may not be sufficient to capture the full dynamics of changes. More advanced pruning techniques could also exploit a multiple-criteria analysis, including not only current predictive ability, but also the computational efficiency of base models, memory usage or other resources, the current diversity of the ensemble, available information on class labels, etc. At the same time, these pruning techniques should impose minimal computational overhead. Such compound, yet lightweight, approaches should lead to maintaining a better ensemble setup and improve adaptation abilities to various types of changes; a multi-criteria scoring sketch is given below.
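To make the idea concrete, a lightweight pruning pass could score each member on several normalized criteria and drop the weakest; the particular criteria, weights and names here are illustrative assumptions:

import numpy as np

def prune(members, keep, w_acc=0.6, w_mem=0.2, w_div=0.2):
    # members: dicts with 'accuracy' and 'diversity' in [0, 1] and
    # 'memory' in bytes; keeps the `keep` highest-scoring members
    mem = np.array([m["memory"] for m in members], dtype=float)
    cheap = 1.0 - mem / mem.max()              # cheaper model -> higher score
    scores = [w_acc * m["accuracy"] + w_mem * c + w_div * m["diversity"]
              for m, c in zip(members, cheap)]
    order = np.argsort(scores)[::-1][:keep]
    return [members[i] for i in sorted(order)]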
Other requirements for processing Big Data, and privacy issues: when dealing with massive data streams, algorithms should be able to handle not only changing data, but also big volumes of instances arriving rapidly. At the same time, an ensemble for such data must still work under strict time and memory constraints. This can be handled in two ways – by proposing algorithms with improved scalability or by using special high-performance computing environments, like SPARK, Hadoop or GPU clusters. Although some attempts to extend the most often used software, like MOA, have already been undertaken, there is still a need for efficient implementations of existing methods within these specialized frameworks for Big Data, as well as for developing new solutions natively for them. Another aspect of analyzing Big Data concerns the requirements for privacy protection, especially in complex systems where streams are a sub-part of a more complex analytical workflow [87]. Here, often not only can no information be leaked outside, but the teams participating in the analysis may also not be willing to directly share their data. This raises the need for data stream ensemble algorithms able to work in such scenarios without the possibility of reverse-engineering the underlying data from their decisions and models.
Acknowledgments

This work was supported by the Polish National Science Center under the grants no. DEC-2013/09/B/ST6/02264 and no. DEC-2013/11/B/ST6/00963. J. Gama acknowledges project MAESTRA-ICT-7502013-612944.
References

[1] Z.S. Abdallah, M.M. Gaber, B. Srinivasan, S. Krishnaswamy, AnyNovel: detection of novel concepts in evolving data streams, Evolving Syst. 7 (2) (2016) 73–93.
[2] N. Abe, H. Mamitsuka, Query learning strategies using boosting and bagging, in: Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA, July 24–27, 1998, 1998, pp. 1–9.
[3] D. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn. 6 (1991) 37–66.
[4] T. Al-Khateeb, M.M. Masud, K. Al-Naami, S.E. Seker, A.M. Mustafa, L. Khan, Z. Trabelsi, C.C. Aggarwal, J. Han, Recurring and novel class detection using class-based ensemble for evolving data stream, IEEE Trans. Knowl. Data Eng. 28 (10) (2016) 2752–2764.
[5] C. Alippi, Intelligence for Embedded Systems. A Methodological Approach, Springer International Publishing, 2014.
[6] C. Alippi, G. Boracchi, M. Roveri, Change detection tests using the ICI rule, in: Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), 2010, pp. 1–7.
[7] C. Alippi, G. Boracchi, M. Roveri, Hierarchical change-detection tests, IEEE Trans. Neural Netw. Learn. Syst. 28 (2) (2017) 246–258.
[8] C. Alippi, M. Roveri, Just-in-time adaptive classifiers. Part I: detecting nonstationary changes, IEEE Trans. Neural Netw. 19 (7) (2008) 1145–1153.
[9] B. Babenko, M. Yang, S.J. Belongie, Robust object tracking with online multiple instance learning, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1619–1632.
[10] M. Baena-García, J. Del Campo-Ávila, R. Fidalgo, A. Bifet, Early drift detection method, in: Proceedings of the Fourth ECML PKDD International Workshop on Knowledge Discovery From Data Streams (IWKDDS'06), Berlin, Germany, 2006, pp. 77–86.
[11] A. Bifet, Adaptive Learning and Mining for Data Streams and Frequent Patterns, Ph.D. thesis, Universitat Politècnica de Catalunya, 2009.
[12] A. Bifet, E. Frank, Sentiment knowledge discovery in Twitter streaming data, in: Discovery Science - 13th International Conference, DS 2010, Canberra, Australia, October 6–8, 2010. Proceedings, 2010, pp. 1–15.
[13] A. Bifet, E. Frank, G. Holmes, B. Pfahringer, Accurate ensemble for data streams: combining restricted Hoeffding trees with stacking, in: 2nd Asian Conference on Machine Learning (ACML 2010), 2010a, pp. 225–240.
[14] A. Bifet, R. Gavaldà, Learning from time-changing data with adaptive windowing, in: Proceedings of the 7th SIAM International Conference on Data Mining, 2007.
[15] A. Bifet, R. Gavaldà, Adaptive XML tree classification on evolving data streams, in: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7–11, 2009, Proceedings, Part I, 2009, pp. 147–162.
[16] A. Bifet, G. Holmes, B. Pfahringer, Leveraging bagging for evolving data streams, in: ECML/PKDD (1), 2010b, pp. 135–150.
[17] A. Bifet, G. Holmes, B. Pfahringer, R. Gavaldà, Improving adaptive bagging methods for evolving data streams, in: Proceedings of the 1st Asian Conference on Machine Learning, Lecture Notes in Computer Science, 5828, Springer, 2009a, pp. 23–47.
[18] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, R. Gavaldà, New ensemble methods for evolving data streams, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, NY, USA, 2009b, pp. 139–148.
[19] A. Bifet, G. Holmes, B. Pfahringer, P. Kranen, H. Kremer, T. Jansen, T. Seidl, MOA: massive online analysis, J. Mach. Learn. Res. (JMLR) (2010c) 1601–1604.
[20] A. Bifet, G.D.F. Morales, J. Read, G. Holmes, B. Pfahringer, Efficient online evaluation of big data stream classifiers, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10–13, 2015, 2015, pp. 59–68.
[21] A. Bifet, J. Read, B. Pfahringer, G. Holmes, I. Zliobaite, CD-MOA: change detection framework for massive online analysis, in: Advances in Intelligent Data Analysis XII - 12th International Symposium, IDA 2013, London, UK, October 17–19, 2013. Proceedings, 2013, pp. 92–103.
[22] I.I.F. Blanco, J. del Campo-Avila, G. Ramos-Jimenez, R.M. Bueno, A.A.O. Diaz, Y.C. Mota, Online and non-parametric drift detection methods based on Hoeffding's bounds, IEEE Trans. Knowl. Data Eng. 27 (3) (2015) 810–823, doi:10.1109/TKDE.2014.2345382.
[23] M. Bouguelia, Y. Belaïd, A. Belaïd, An adaptive streaming active learning strategy based on instance weighting, Pattern Recognit. Lett. 70 (2016) 38–44.
[24] G. Brown, J.L. Wyatt, R. Harris, X. Yao, Diversity creation methods: a survey and categorisation, Inf. Fusion 6 (1) (2005) 5–20.
[25] D. Brzezinski, Block-Based and Online Ensembles for Concept-Drifting Data Streams, Ph.D. thesis, Poznan University of Technology, 2015.
[26] D. Brzezinski, M. Piernik, Structural XML classification in concept drifting data streams, New Gener. Comput. 33 (4) (2015) 345–366.
[27] D. Brzezinski, J. Stefanowski, Accuracy updated ensemble for data streams with concept drift, in: Proceedings of the 6th HAIS International Conference on Hybrid Artificial Intelligent Systems, Part II, LNCS, 6679, Springer, 2011, pp. 155–163.
[28] D. Brzezinski, J. Stefanowski, Classifiers for concept-drifting data streams: evaluating things that really matter, in: ECML PKDD 2013 Workshop on Real-World Challenges for Data Stream Mining, September 27th, Prague, Czech Republic, 2013, pp. 10–14.
[29] D. Brzezinski, J. Stefanowski, Combining block-based and online methods in learning ensembles from concept drifting data streams, Inf. Sci. 265 (2014a) 50–67.
[30] D. Brzezinski, J. Stefanowski, Prequential AUC for classifier evaluation and drift detection in evolving data streams, in: New Frontiers in Mining Complex Patterns - Third International Workshop, NFMCP 2014, Held in Conjunction with ECML-PKDD 2014, Nancy, France, September 19, 2014, Revised Selected Papers, 2014b, pp. 87–101.
[31] D. Brzezinski, J. Stefanowski, Reacting to different types of concept drift: the accuracy updated ensemble algorithm, IEEE Trans. Neural Netw. Learn. Syst. 25 (1) (2014c) 81–94, doi:10.1109/TNNLS.2013.2251352.
[32] D. Brzezinski, J. Stefanowski, Ensemble diversity in evolving data streams, in: Discovery Science: 19th International Conference, DS 2016. Proceedings, LNCS, 9956, Springer, 2016, pp. 229–244.
[33] D. Brzezinski, J. Stefanowski, Prequential AUC: properties of the area under the ROC curve for data streams with concept drift, Knowl. Inf. Syst. (2017).
[34] S. Chen, H. He, SERA: selectively recursive approach towards nonstationary imbalanced stream data mining, in: International Joint Conference on Neural Networks, IJCNN 2009, Atlanta, Georgia, USA, 14–19 June 2009, 2009, pp. 522–529.
[35] S. Chen, H. He, Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach, Evolving Syst. 2 (1) (2011) 35–50.
[36] F. Chu, C. Zaniolo, Fast and light boosting for adaptive mining of data streams, in: Proceedings of the Eighth Pacific-Asia Knowledge Discovery and Data Mining Conference (PAKDD'04), Sydney, 2004, pp. 282–292.
[37] I. Czarnowski, P. Jedrzejowicz, Ensemble online classifier based on the one-class base classifiers for mining data streams, Cybern. Syst. 46 (1–2) (2015) 51–68.
[38] M. Deckert, Batch weighted ensemble for mining data streams with concept drift, in: International Symposium on Methodologies for Intelligent Systems, Springer Berlin Heidelberg, 2011, pp. 290–299.
[39] M. Deckert, J. Stefanowski, Comparing block ensembles for data streams with concept drift, in: New Trends in Databases and Information Systems, Springer Berlin Heidelberg, 2013, pp. 69–78.
[40] S. Delany, P. Cunningham, A. Tsymbal, L. Coyle, A case-based technique for tracking concept drift in spam filtering, Knowl. Based Syst. 18 (4–5) (2005) 187–195.
[41] M. Denil, D. Matheson, N. de Freitas, Consistency of online random forests, in: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16–21 June 2013, 2013, pp. 1256–1264.
[42] G. Ditzler, R. Polikar, Incremental learning of concept drift from streaming imbalanced data, IEEE Trans. Knowl. Data Eng. 25 (10) (2013) 2283–2301.
[43] G. Ditzler, M. Roveri, C. Alippi, R. Polikar, Learning in nonstationary environments: a survey, IEEE Comput. Intell. Mag. 10 (4) (2015) 12–25, doi:10.1109/MCI.2015.2471196.
[44] P. Domingos, G. Hulten, Mining high-speed data streams, in: I. Parsa, R. Ramakrishnan, S. Stolfo (Eds.), Proceedings of the ACM Sixth International Conference on Knowledge Discovery and Data Mining, ACM Press, Boston, USA, 2000, pp. 71–80.
[45] A. Dries, U. Rückert, Adaptive concept drift detection, Stat. Anal. Data Min. 2 (5–6) (2009) 311–327, doi:10.1002/sam.v2:5/6.
[46] L. Du, Q. Song, L. Zhu, X. Zhu, A selective detector ensemble for concept drift detection, Comput. J. (2014), doi:10.1093/comjnl/bxu050.
[47] J. Duarte, J. Gama, A. Bifet, Adaptive model rules from high-speed data streams, TKDD 10 (3) (2016) 30, doi:10.1145/2829955.
[48] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed., Wiley, New York, 2001.
[49] K.B. Dyer, R. Capo, R. Polikar, COMPOSE: a semisupervised learning framework for initially labeled nonstationary streaming data, IEEE Trans. Neural Netw. Learn. Syst. 25 (1) (2014) 12–26.
[50] R. Elwell, R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Trans. Neural Netw. 22 (10) (2011) 1517–1531.
[51] W. Fan, F. Chu, H. Wang, P.S. Yu, Pruning and dynamic scheduling of cost-sensitive ensembles, in: Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI'02), American Association for Artificial Intelligence, Menlo Park, USA, 2002, pp. 146–151.
[52] W. Fan, Y. an Huang, H. Wang, P.S. Yu, Active mining of data streams, in: Proceedings of the 4th SIAM International Conference on Data Mining, 2004.
[53] E.R. de Faria, I.R. Goncalves, J. Gama, A.C.P. de Leon Ferreira de Carvalho, Evaluation of multiclass novelty detection algorithms for data streams, IEEE Trans. Knowl. Data Eng. 27 (11) (2015) 2961–2973.
[54] T. Fawcett, An introduction to ROC analysis, Pattern Recognit. Lett. 27 (8) (2006) 861–874, doi:10.1016/j.patrec.2005.10.010.
[55] A. Fern, R. Givan, Online ensemble learning: an empirical study, Mach. Learn. 53 (1–2) (2003) 71–109.
[56] Y. Freund, H.S. Seung, E. Shamir, N. Tishby, Selective sampling using the query by committee algorithm, Mach. Learn. 28 (2–3) (1997) 133–168.
[57] J. Friedman, L. Rafsky, Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests, Ann. Stat. (1979) 697–717.
[58] Y. Fu, X. Zhu, B. Li, A survey on instance selection for active learning, Knowl. Inf. Syst. 35 (2) (2013) 249–283.
[59] M.M. Gaber, A. Zaslavsky, S. Krishnaswamy, A Survey of Classification Methods in Data Streams, Springer US, pp. 39–59, doi:10.1007/978-0-387-47534-9_3.
[60] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC Press, 2010.
[61] J. Gama, P. Medas, Learning decision trees from dynamic data streams, J. UCS 11 (8) (2005) 1353–1366.
[62] J. Gama, P. Medas, G. Castillo, P. Rodrigues, Learning with drift detection, in: Proceedings of the Seventh Brazilian Symposium on Artificial Intelligence (SBIA'04) - Lecture Notes in Computer Science, 3171, Springer, São Luiz do Maranhão, Brazil, 2004, pp. 286–295.
[63] J. Gama, R. Sebastião, P.P. Rodrigues, On evaluating stream learning algorithms, Mach. Learn. 90 (3) (2013) 317–346.
[64] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Comput. Surv. 46 (4) (2014) 44:1–44:37, doi:10.1145/2523813.
[65] J. Gao, B. Ding, W. Fan, J. Han, P.S. Yu, Classifying data streams with skewed class distributions and concept drifts, IEEE Internet Comput. 12 (6) (2008) 37–49.
[66] S. García, A. Fernández, J. Luengo, F. Herrera, Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. 180 (10) (2010) 2044–2064.
[67] A. Ghazikhani, R. Monsefi, H.S. Yazdi, Ensemble of online neural networks for non-stationary and imbalanced data streams, Neurocomputing 122 (2013) 535–544.
[68] R. Greiner, A.J. Grove, D. Roth, Learning cost-sensitive active classifiers, Artif. Intell. 139 (2) (2002) 137–174.
[69] F. Gustafsson, Adaptive Filtering and Change Detection, Wiley, 2000.
[70] R.M. Haralick, Decision making in context, IEEE Trans. Pattern Anal. Mach. Intell. 5 (4) (1983) 417–428, doi:10.1109/TPAMI.1983.4767411.
[71] M. Hassani, Y. Kim, S. Choi, T. Seidl, Subspace clustering of data streams: new algorithms and effective evaluation measures, J. Intell. Inf. Syst. 45 (3) (2015) 319–335.
[72] H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284.
[73] H. He, Y. Ma (Eds.), Imbalanced Learning: Foundations, Algorithms, and Applications, Wiley-IEEE Press, 2013.
[74] T.K. Ho, Complexity of classification problems and comparative advantages of combined classifiers, in: Proceedings of the First International Workshop on Multiple Classifier Systems, MCS'00, Springer-Verlag, London, UK, 2000, pp. 97–106.
[75] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1) (1994) 66–75.
[76] T.R. Hoens, N. Chawla, Learning in non-stationary environments with class imbalance, in: Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2012, pp. 168–176.
[77] T.R. Hoens, R. Polikar, N.V. Chawla, Learning from streaming data with concept drift and imbalance: an overview, Prog. AI 1 (1) (2012) 89–101.
[78] M.J. Hosseini, A. Gholipour, H. Beigy, An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams, Knowl. Inf. Syst. 46 (3) (2016) 567–597.
[79] H. Hotelling, The generalization of Student's ratio, Ann. Math. Stat. 2 (3) (1931) 360–378.
[80] E. Ikonomovska, J. Gama, S. Dzeroski, Learning model trees from evolving data streams, Data Min. Knowl. Discov. 23 (1) (2011a) 128–168, doi:10.1007/s10618-010-0201-y.
[81] E. Ikonomovska, J. Gama, S. Dzeroski, Online tree-based ensembles and option trees for regression on evolving data streams, Neurocomputing 150 (2015) 458–470, doi:10.1016/j.neucom.2014.04.076.
[82] E. Ikonomovska, J. Gama, B. Zenko, S. Dzeroski, Speeding-up Hoeffding-based regression trees with options, in: L. Getoor, T. Scheffer (Eds.), Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, Omnipress, 2011b, pp. 537–544.
[83] G. Jaber, An Approach for Online Learning in the Presence of Concept Changes, Ph.D. thesis, AgroParisTech University, 2001.
[84] K. Jackowski, Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers, Pattern Anal. Appl. 17 (4) (2014) 709–724.
[85] K. Jackowski, Adaptive splitting and selection algorithm for regression, New Gener. Comput. 33 (4) (2015) 425–448.
[86] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts, Neural Comput. 3 (1991) 79–87.
[87] N. Japkowicz, J. Stefanowski (Eds.), Big Data Analysis: New Algorithms for a New Society, Springer, 2016.
[88] N. Japkowicz, M. Shah, Evaluating Learning Algorithms: A Classification Perspective, Cambridge University Press, 2011.
[89] P.M.G. Jr., S.G. de Carvalho Santos, R.S. Barros, D.C. Vieira, A comparative study on concept drift detectors, Expert Syst. Appl. 41 (18) (2014) 8144–8156, doi:10.1016/j.eswa.2014.07.019.
[90] P. Kadlec, B. Gabrys, Local learning-based adaptive soft sensor for catalyst activation prediction, AIChE J. 57 (5) (2011) 1288–1301.
[91] I. Katakis, G. Tsoumakas, I. Vlahavas, An ensemble of classifiers for coping with recurring contexts in data streams, in: Frontiers in Artificial Intelligence and Applications (ECAI 2008), 2008, pp. 763–764.
[92] T. Kidera, S. Ozawa, S. Abe, An incremental learning algorithm of ensemble classifier systems, in: Proceedings of the International Joint Conference on Neural Networks, IJCNN 2006, Part of the IEEE World Congress on Computational Intelligence, WCCI 2006, Vancouver, BC, Canada, 16–21 July 2006, 2006, pp. 3421–3427.
[93] H. Kim, S. Madhvanath, T. Sun, Hybrid active learning for non-stationary streaming data with asynchronous labeling, in: 2015 IEEE International Conference on Big Data, Big Data 2015, Santa Clara, CA, USA, October 29 - November 1, 2015, 2015, pp. 287–292.
[94] R. Klinkenberg, Learning drifting concepts: example selection vs. example weighting, Intell. Data Anal. (IDA) J. - Spec. Issue Incremental Learn. Syst. Capable Dealing Concept Drift 8 (3) (2004) 281–300.
[95] R. Klinkenberg, T. Joachims, Detecting concept drift with support vector machines, in: Proceedings of the Seventeenth International Conference on Machine Learning (ICML'00), Morgan Kaufmann Publishers, San Francisco, CA, 2000, pp. 487–494.
[96] R. Klinkenberg, S. Ruping, Concept Drift and the Importance of Examples, Physica-Verlag, pp. 55–77.
[97] M. Kmieciak, J. Stefanowski, Handling sudden concept drift in Enron message data streams, Control Cybern. 40 (3) (2011) 667–695.
[98] J. Kolter, M. Maloof, Dynamic weighted majority: a new ensemble method for tracking concept drift, in: Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, 2003, pp. 123–130.
[99] J.Z. Kolter, M.A. Maloof, Using additive expert ensembles to cope with concept drift, in: Proceedings of the Twenty Second ACM International Conference on Machine Learning (ICML'05), Bonn, Germany, 2005, pp. 449–456.
[100] J.Z. Kolter, M.A. Maloof, Dynamic weighted majority: an ensemble method for drifting concepts, J. Mach. Learn. Res. 8 (2007) 2755–2790.
[101] B. Krawczyk, Learning from imbalanced data: open challenges and future directions, Prog. Artif. Intell. 5 (4) (2016) 221–232.
[102] B. Krawczyk, M. Wozniak, Incremental one-class bagging for streaming and evolving big data, in: 2015 IEEE TrustCom/BigDataSE/ISPA, Helsinki, Finland, August 20–22, 2015, Volume 2, 2015a, pp. 193–198.
[103] B. Krawczyk, M. Wozniak, One-class classifiers with incremental learning and forgetting for data streams with concept drift, Soft Comput. 19 (12) (2015b) 3387–3400.
[104] G. Krempl, V. Hofer, Classification in presence of drift and latency, in: Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, Vancouver, BC, Canada, December 11, 2011, 2011, pp. 596–603.
[105] G. Krempl, I. Zliobaite, D. Brzezinski, E. Hüllermeier, M. Last, V. Lemaire, T. Noack, A. Shaker, S. Sievi, M. Spiliopoulou, J. Stefanowski, Open challenges for data stream mining research, SIGKDD Explor. Newsl. 16 (1) (2014) 1–10, doi:10.1145/2674026.2674028.
[106] A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, in: Advances in Neural Information Processing Systems, 7, 1995, pp. 231–238.
[107] L.I. Kuncheva, Classifier ensembles for changing environments, in: Proceedings of the 5th MCS International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, 3077, Springer, 2004a, pp. 1–15.
[108] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, Hoboken, NJ, 2004b.
[109] L.I. Kuncheva, Classifier ensembles for detecting concept change in streaming data: overview and perspectives, in: Proceedings of the 2nd Workshop SUEMA 2008 (ECAI 2008), 2008, pp. 5–10.
[110] B. Kurlej, M. Wozniak, Active learning approach to concept drift problem, Logic J. IGPL 20 (3) (2012) 550–559.
[111] B. Lakshminarayanan, D.M. Roy, Y.W. Teh, Mondrian forests: efficient online random forests, in: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec, Canada, 2014, pp. 3140–3148.
[112] Y. Lan, Y.C. Soh, G. Huang, Ensemble of online sequential extreme learning machine, Neurocomputing 72 (13–15) (2009) 3391–3395.
[113] H.K.H. Lee, M.A. Clyde, Lossless online Bayesian bagging, J. Mach. Learn. Res. 5 (2004) 143–151.
[114] V. Lemaire, C. Salperwyck, A. Bondu, A survey on supervised classification on data streams, in: 4th European Summer School on Business Intelligence, eBISS 2014, Lecture Notes in Business Information Processing, 205, Springer, 2014, pp. 88–125.
[115] C. Li, Y. Zhang, X. Li, OcVFDT: one-class very fast decision tree for one-class classification of data streams, in: Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data, Paris, France, June 28, 2009, 2009, pp. 79–86.
[116] R. Lichtenwalter, N.V. Chawla, Adaptive methods for classification in arbitrarily imbalanced and drifting data streams, in: New Frontiers in Applied Data Mining, PAKDD 2009 International Workshops, Bangkok, Thailand, April 27–30, 2009. Revised Selected Papers, 2009, pp. 53–75.
[117] N. Littlestone, M.K. Warmuth, The weighted majority algorithm, Inf. Comput. 108 (1994) 212–261.
[118] B. Liu, Y. Xiao, P.S. Yu, L. Cao, Y. Zhang, Z. Hao, Uncertain one-class learning and concept summarization learning on uncertain data streams, IEEE Trans. Knowl. Data Eng. 26 (2) (2014) 468–484.
[119] B.I.F. Maciel, S.G.T.C. Santos, R.S.M. Barros, A lightweight concept drift detection ensemble, in: Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on, 2015, pp. 1061–1068, doi:10.1109/ICTAI.2015.151.
[120] M. Markou, S. Singh, Novelty detection: a review – part 1: statistical approaches, Signal Process. 83 (12) (2003) 2481–2497.
[121] M.M. Masud, Q. Chen, L. Khan, C.C. Aggarwal, J. Gao, J. Han, A.N. Srivastava, N.C. Oza, Classification and adaptive novel class detection of feature-evolving data streams, IEEE Trans. Knowl. Data Eng. 25 (7) (2013) 1484–1497.
[122] M.M. Masud, J. Gao, L. Khan, J. Han, B.M. Thuraisingham, Classification and novel class detection in concept-drifting data streams under time constraints, IEEE Trans. Knowl. Data Eng. 23 (6) (2011) 859–874.
[123] M.M. Masud, C. Woolam, J. Gao, L. Khan, J. Han, K.W. Hamlen, N.C. Oza, Facing the reality of data stream classification: coping with scarcity of labeled data, Knowl. Inf. Syst. 33 (1) (2012) 213–244.
[124] F.L. Minku, H. Inoue, X. Yao, Negative correlation in incremental learning, Nat. Comput. 8 (2) (2009) 289–320.
[125] L. Minku, X. Yao, Can cross-company data improve performance in software effort estimation? in: PROMISE, 2012, pp. 69–78.
[126] L.L. Minku, A.P. White, X. Yao, The impact of diversity on online ensemble learning in the presence of concept drift, IEEE Trans. Knowl. Data Eng. 22 (5) (2010) 730–742.
[127] L.L. Minku, X. Yao, DDD: a new ensemble approach for dealing with concept drift, IEEE Trans. Knowl. Data Eng. (TKDE) (2010). 16 p. (accepted).
[128] L.L. Minku, X. Yao, How to make best use of cross-company data in software effort estimation? in: ICSE, 2014, pp. 446–456.
[129] B. Mirza, Z. Lin, N. Liu, Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift, Neurocomputing 149 (2015) 316–329.
[130] M.D. Muhlbaier, A. Topalis, R. Polikar, Learn++.NC: combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes, IEEE Trans. Neural Netw. 20 (1) (2009) 152–168.
[131] H.-L. Nguyen, Y.-K. Woon, W.-K. Ng, A survey on data stream clustering and classification, Knowl. Inf. Syst. 45 (3) (2015) 535–569, doi:10.1007/s10115-014-0808-1.
[132] K. Nishida, Learning and Detecting Concept Drift, Ph.D. thesis, Hokkaido University, Japan, 2008.
[133] K. Nishida, K. Yamauchi, Adaptive classifiers-ensemble system for tracking concept drift, in: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics (ICMLC'07), Hong Kong, 2007a, pp. 3607–3612.
[134] K. Nishida, K. Yamauchi, Detecting concept drift using statistical testing, in: Proceedings of the Tenth International Conference on Discovery Science (DS'07) - Lecture Notes in Artificial Intelligence, 3316, Sendai, Japan, 2007b, pp. 264–269.
[135] A. Osojnik, P. Panov, S. Dzeroski, Comparison of tree-based methods for multi-target regression on data streams, in: New Frontiers in Mining Complex Patterns - 4th International Workshop, NFMCP 2015, Held in Conjunction with ECML-PKDD 2015, Porto, Portugal, September 7, 2015, Revised Selected Papers, 2015, pp. 17–31.
[136] N.C. Oza, Online ensemble learning, in: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, AAAI/IAAI, AAAI Press / The MIT Press, Austin, Texas, USA, 2000, p. 1109.
[137] N.C. Oza, S. Russell, Online bagging and boosting, in: Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (AISTATS'01), Morgan Kaufmann, Key West, USA, 2001, pp. 105–112.
[138] E.S. Page, Continuous inspection schemes, Biometrika 41 (1/2) (1954) 100–115, doi:10.2307/2333009.
[139] S. Pan, J. Wu, X. Zhu, C. Zhang, Graph ensemble boosting for imbalanced noisy graph stream classification, IEEE Trans. Cybern. 45 (5) (2015) 940–954.
[140] S. Pan, X. Zhu, C. Zhang, P.S. Yu, Graph stream classification using labeled and unlabeled graphs, in: 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8–12, 2013, 2013, pp. 398–409.
[141] A. Pesaranghader, H.L. Viktor, Fast Hoeffding drift detection method for evolving data streams, in: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19–23, 2016, Proceedings, Part II, 2016, pp. 96–111.
[142] B. Pfahringer, G. Holmes, R. Kirkby, New options for Hoeffding trees, in: Proceedings of the 20th Australian Joint Conference on Artificial Intelligence, Lecture Notes in Computer Science, 4830, Springer, 2007, pp. 90–99.
[143] R. Polikar, L. Upda, S.S. Upda, V. Honavar, Learn++: an incremental learning algorithm for supervised neural networks, IEEE Trans. Syst. Man Cybern. Part C 31 (4) (2001) 497–508.
[144] Z. Qi, Y. Xu, L. Wang, Y. Song, Online multiple instance boosting for object detection, Neurocomputing 74 (10) (2011) 1769–1775.
[145] W. Qu, Y. Zhang, J. Zhu, Q. Qiu, Mining multi-label concept-drifting data streams using dynamic classifier ensemble, in: Advances in Machine Learning, First Asian Conference on Machine Learning, ACML 2009, Nanjing, China, November 2–4, 2009. Proceedings, 2009, pp. 308–321.
[146] S. Ramamurthy, R. Bhatnagar, Tracking recurrent concept drift in streaming data using ensemble classifiers, in: Proceedings of the Sixth International Conference on Machine Learning and Applications (ICMLA'07), Cincinnati, Ohio, 2007, pp. 404–409.
[147] S. Raudys, Statistical and Neural Classifiers: An Integrated Approach to Design, Springer Publishing Company, Incorporated, 2014.
[148] J. Raviv, Decision making in Markov chains applied to the problem of pattern recognition, IEEE Trans. Inf. Theor. 13 (4) (1967) 536–551, doi:10.1109/TIT.1967.1054060.
[149] J. Read, A. Bifet, G. Holmes, B. Pfahringer, Scalable and efficient multi-label classification for evolving data streams, Mach. Learn. 88 (1–2) (2012) 243–272.
[150] J.J. Rodríguez, L.I. Kuncheva, Combining online classification approaches for changing environments, in: Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, SSPR & SPR'08, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 520–529.
[151] F. Roli, G. Giacinto, Design of Multiple Classifier Systems, World Scientific Publishing.
[152] G. Ross, N. Adams, D. Tasoulis, D. Hand, Exponentially weighted moving average charts for detecting concept drifts, Pattern Recognit. Lett. 33 (2) (2012) 191–198.
[153] A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischof, On-line random forests, in: Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, IEEE, 2009, pp. 1393–1400.
[154] M. Scholz, R. Klinkenberg, An ensemble classifier for drifting concepts, in: Proceedings of the Second International Workshop on Knowledge Discovery from Data Streams (IWKDDS'05), Porto, Portugal, 2005, pp. 53–64.
[155] M. Scholz, R. Klinkenberg, Boosting classifiers for drifting concepts, Intell. Data Anal. (IDA) - Spec. Issue Knowl. Discov. Data Streams 11 (1) (2007) 3–28.
[156] R. Sebastiao, J. Gama, A study on change detection methods, in: Progress in Artificial Intelligence, 14th Portuguese Conference on Artificial Intelligence, EPIA, 2009, pp. 12–15.
[157] A. Shaker, E. Hüllermeier, Survival analysis on data streams: analyzing temporal events in dynamically changing environments, Int. J. Appl. Math. Comput. Sci. 24 (1) (2014) 199–212.
[158] A. Shaker, E. Hüllermeier, Recovery analysis for adaptive learning from non-stationary data streams: experimental design and case study, Neurocomputing 150 (2015) 250–264.
[159] A.J.C. Sharkey, N.E. Sharkey, Combining diverse neural nets, Knowl. Eng. Rev. 12 (3) (1997) 231–247.
[160] D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 5th ed., CRC Press, Boca Raton, FL, 2011.
[161] S.G. Soares, R. Araújo, A dynamic and on-line ensemble regression for changing environments, Expert Syst. Appl. 42 (6) (2015a) 2935–2948.
[162] S.G. Soares, R. Araújo, An on-line weighted ensemble of regressor models to handle concept drifts, Eng. Appl. AI 37 (2015b) 392–406.
[163] P. Sobolewski, M. Wozniak, Comparable study of statistical tests for virtual concept drift detection, in: R. Burduk, K. Jackowski, M. Kurzynski, M. Wozniak, A. Zolnierek (Eds.), Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, Advances in Intelligent Systems and Computing, 226, Springer International Publishing, 2013a, pp. 329–337.
[164] P. Sobolewski, M. Wozniak, Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors, J. Universal Comput. Sci. 19 (4) (2013b) 462–483.
[165] P. Sobolewski, M. Wozniak, SCR: simulated concept recurrence – a non-supervised tool for dealing with shifting concept, Expert Syst. (2013c), doi:10.1111/exsy.12059.
[166] G. Song, Y. Ye, A new ensemble method for multi-label data stream classification in non-stationary environment, in: 2014 International Joint Conference on Neural Networks, IJCNN 2014, Beijing, China, July 6–11, 2014, 2014, pp. 1776–1783.
[167] M. Spiliopoulou, G. Krempl, Tutorial on mining multiple threads of streaming data, in: The Pacific-Asia Conference of Knowledge Discovery and Data Mining (PAKDD 2013), 2013.
[168] K.O. Stanley, Learning Concept Drift With a Committee of Decision Trees, Technical Report UT-AI-TR-03-302, Department of Computer Sciences, University of Texas at Austin, Austin, USA, 2003.
[169] J. Stefanowski, Adaptive ensembles for evolving data streams - combining block-based and online solutions, in: New Frontiers in Mining Complex Patterns - 4th International Workshop, NFMCP 2015, Held in Conjunction with ECML-PKDD 2015, Porto, Portugal, September 7, 2015, Revised Selected Papers, 2015, pp. 3–16.
[170] W. Street, Y. Kim, A streaming ensemble algorithm (SEA) for large-scale classification, in: Proceedings of the Seventh ACM International Conference on Knowledge Discovery and Data Mining (KDD'01), ACM Press, New York, 2001, pp. 377–382.
[171] Y. Sun, K. Tang, L.L. Minku, S. Wang, X. Yao, Online ensemble learning of data streams with gradually evolved classes, IEEE Trans. Knowl. Data Eng. 28 (6) (2016) 1532–1545.
[172] P. Branco, L. Torgo, R. Ribeiro, A survey of predictive modeling under imbalanced distributions, ACM Comput. Surv. 49 (2) (2016) 31:1–31:50.
[173] P. Trajdos, M. Kurzynski, Multi-label stream classification using extended binary relevance model, in: 2015 IEEE TrustCom/BigDataSE/ISPA, Helsinki, Finland, August 20–22, 2015, Volume 2, 2015, pp. 205–210.
[174] I. Triguero, S. García, F. Herrera, Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study, Knowl. Inf. Syst. 42 (2) (2015) 245–284.
[175] A. Tsymbal, The Problem of Concept Drift: Definitions and Related Work, Technical Report TCD-CS-2004-15, Trinity College Dublin, Ireland, 2004.
[176] K. Tumer, J. Ghosh, Analysis of decision boundaries in linearly combined neural classifiers, Pattern Recognit. 29 (2) (1996) 341–348.
[177] A. Wald, Sequential Analysis, John Wiley and Sons, 1947.
[178] H. Wang, W. Fan, P.S. Yu, J. Han, Mining concept-drifting data streams using ensemble classifiers, in: Proceedings of the Ninth ACM International Conference on Knowledge Discovery and Data Mining (KDD'03), ACM Press, New York, 2003, pp. 226–235.
[179] P. Wang, P. Zhang, L. Guo, Mining multi-label data streams using ensemble-based active learning, in: Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, California, USA, April 26–28, 2012, 2012, pp. 1131–1140.
[180] S. Wang, L.L. Minku, X. Yao, Resampling-based ensemble methods for online class imbalance learning, IEEE Trans. Knowl. Data Eng. 27 (5) (2015) 1356–1368.
[181] S. Wang, L.L. Minku, X. Yao, Dealing with multiple classes in online class imbalance learning, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016, 2016a, pp. 2118–2124.
[182] Z. Wang, S. Yoon, S.J. Xie, Y. Lu, D.S. Park, Visual tracking with semi-supervised online weighted multiple instance learning, Visual Comput. 32 (3) (2016b) 307–320.
[183] G. Webb, R. Hyde, H. Cao, H.L. Nguyen, F. Petitjean, Characterizing concept drift, Data Min. Knowl. Discov. J. 30 (2016) 964–994.
[184] G. Widmer, M. Kubat, Learning in the presence of concept drift and hidden context, Mach. Learn. 23 (1996) 69–101.
[185] D.H. Wolpert, The supervised learning no-free-lunch theorems, in: Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications, 2001, pp. 25–42.
[186] M. Wozniak, B. Cyganek, A First Attempt on Online Data Stream Classifier Using Context, Springer International Publishing, Cham, pp. 497–504, doi:10.1007/978-3-319-40973-3_50.
[187] M. Wozniak, B. Cyganek, A. Kasprzak, P. Ksieniewicz, K. Walkowiak, Active learning classifier for streaming data, in: Hybrid Artificial Intelligent Systems - 11th International Conference, HAIS 2016, Seville, Spain, April 18–20, 2016, Proceedings, 2016a, pp. 186–197.
[188] M. Wozniak, M. Graña, E. Corchado, A survey of multiple classifier systems as hybrid systems, Inf. Fusion 16 (2014) 3–17.
[189] M. Wozniak, A. Kasprzak, P. Cal, Application of combined classifiers to data stream classification, in: Proceedings of the 10th International Conference on Flexible Query Answering Systems FQAS 2013, LNCS, Springer-Verlag, Berlin, Heidelberg, 2013, in press.
[190] M. Wozniak, P. Ksieniewicz, B. Cyganek, A. Kasprzak, K. Walkowiak, Active learning classification of drifted streaming data, in: International Conference on Computational Science 2016, ICCS 2016, 6–8 June 2016, San Diego, California, USA, 2016b, pp. 1724–1733.
[191] M. Wozniak, P. Ksieniewicz, B. Cyganek, K. Walkowiak, Ensembles of Heterogeneous Concept Drift Detectors - Experimental Study, Springer International Publishing, Cham, pp. 538–549, doi:10.1007/978-3-319-45378-1_48.
[192] H. Xiao, C. Eckert, Lazy Gaussian process committee for real-time online regression, in: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, July 14–18, 2013, Bellevue, Washington, USA, 2013.
[193] E.S. Xioufis, M. Spiliopoulou, G. Tsoumakas, I.P. Vlahavas, Dealing with concept drift and class imbalance in multi-label stream classification, in: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16–22, 2011, 2011, pp. 1583–1588.
[194] S. Yoshida, K. Hatano, E. Takimoto, M. Takeda, Adaptive online prediction using weighted windows, IEICE Trans. 94-D (10) (2011) 1917–1923.
[195] G. Zenobi, P. Cunningham, Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error, in: Machine Learning: ECML 2001, 2001, pp. 576–587.
[196] P. Zhang, X. Zhu, J. Tan, L. Guo, Classifier and cluster ensembles for mining concept drifting data streams, in: Proceedings of the IEEE International Conference on Data Mining, 2010, pp. 1175–1180.
[197] Q. Zhao, Y. Jiang, M. Xu, Incremental learning by heterogeneous bagging ensemble, in: Advanced Data Mining and Applications - 6th International Conference, ADMA 2010, Chongqing, China, November 19–21, 2010, Proceedings, Part II, 2010, pp. 1–12.
[198] X. Zhu, W. Ding, P.S. Yu, C. Zhang, One-class learning and concept summarization for data streams, Knowl. Inf. Syst. 28 (3) (2011) 523–553.
[199] X. Zhu, P. Zhang, X. Lin, Y. Shi, Active learning from stream data using optimal weight classifier ensemble, IEEE Trans. Syst. Man Cybern. Part B 40 (6) (2010) 1607–1621.
[200] I. Zliobaite, Adaptive Training Set Formation, Ph.D. thesis, Vilnius University, 2010.
[201] I. Zliobaite, Controlled permutations for testing adaptive learning models, Knowl. Inf. Syst. 30 (2013a) 1–4.
[202] I. Zliobaite, How good is the electricity benchmark for evaluating concept drift adaptation, arXiv preprint arXiv:1301.3524 (2013b).
[203] I. Zliobaite, A. Bifet, M. Gaber, B. Gabrys, J. Gama, L. Minku, K. Musial, Next challenges for adaptive learning systems, SIGKDD Explor. Newsl. 14 (1) (2012) 48–55, doi:10.1145/2408736.2408746.
[204] I. Zliobaite, A. Bifet, B. Pfahringer, G. Holmes, Active learning with evolving streaming data, in: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Springer LNCS, 6913, 2011, pp. 597–612.
[205] I. Zliobaite, A. Bifet, B. Pfahringer, G. Holmes, Active learning with drifting streaming data, IEEE Trans. Neural Netw. Learn. Syst. 25 (1) (2014) 27–39.
[206] I. Zliobaite, A. Bifet, J. Read, B. Pfahringer, G. Holmes, Evaluation methods and decision theory for classification of streaming data with temporal dependence, Mach. Learn. 98 (3) (2015a) 455–482.
[207] I. Zliobaite, M. Budka, F.T. Stahl, Towards cost-sensitive adaptation: when is it worth updating your predictive model? Neurocomputing 150 (2015b) 240–249.
[208] I. Zliobaite, M. Pechenizkiy, J. Gama, An overview of concept drift applications, in: N. Japkowicz, J. Stefanowski (Eds.), Big Data Analysis: New Algorithms for a New Society, Springer, 2016, pp. 91–114.