Combining Prototype Selection with Local Boosting

Christos K. Aridas, Sotiris B. Kotsiantis, and Michael N. Vrahatis

Computational Intelligence Laboratory (CILab), Department of Mathematics, University of Patras, 26110 Patras, Greece
char@upatras.gr, {sotos,vrahatis}@math.upatras.gr

Abstract. Real-life classification problems require an investigation of relationships between features in heterogeneous data sets, where different predictive models can be more appropriate for different regions of the data set. A solution to this problem is the application of the local boosting of weak classifiers ensemble method. A main drawback of this approach is the time that is required to predict an unseen instance, as well as the decrease of classification accuracy in the presence of noise in the local regions. In this research work, an improved version of the local boosting of weak classifiers, which incorporates prototype selection, is presented. Experimental results on several benchmark real-world data sets show that the proposed method significantly outperforms the local boosting of weak classifiers in terms of predictive accuracy and the time that is needed to build a local model and classify a test instance.

Keywords: Local boosting · Weak learning · Prototype selection · Pattern classification

1 Introduction

In machine learning, instance-based (or memory-based) learners classify an unseen object by comparing it to a database of pre-classified objects.
The fundamental assumption is that similar instances share similar class labels. Machine learning models' assumptions do not necessarily hold globally, and local learning [1] methods were introduced to address exactly this problem. They allow learning algorithms that are designed for simple models to be extended to complex data, for which the models' assumptions are valid only locally. The most common case is the assumption of linear separability, which is usually not fulfilled globally in classification problems. Despite this, any supervised learning algorithm that is able to find only a linear separation can be used inside a local learning process, producing a model that is able to capture complex non-linear class boundaries. In this paper, a technique for boosting local weak classifiers, based on a training set reduced by prototype selection [11], is proposed. Boosting algorithms are well known to be susceptible to noise [2].
In the case of local boosting, the algorithm should manage a reasonable amount of noise and be at least as good as boosting, if not better.
For the experiments, we used two variants of Decision Trees [21] as weak learning models: one-level Decision Trees, which are known as Decision Stumps [12], and two-level Decision Trees. An extensive comparison over several data sets was performed, and the results show that the proposed method outperforms simple and local boosting in terms of classification accuracy.
In the next section, specifically in Subsect. 2.1, localized experts are discussed, while boosting approaches are described in Subsect. 2.2. In Sect. 3 the proposed method is presented. Furthermore, in Sect. 4 the results of the experiments on several UCI data sets, compared with standard boosting and local boosting, are portrayed and discussed. Finally, Sect. 5 concludes the paper and suggests further directions for current research.
2 Background Material

For completeness purposes, local weighted learning, prototype selection methods, as well as boosting classifier techniques, are briefly described in the following subsections.

2.1 Local Weighted Learning and Prototype Selection

Supervised learning algorithms are considered global if they use all available training instances in order to build a single predictive model that will be applied to any unseen test instance. On the other hand, a method is considered local if only the nearest training instances around the testing instance contribute to the class probabilities.
When the size of the training data set is small in contrast to the complexity of the classifier, the predictive model frequently overfits the noise in the training data. Therefore, successful control of the complexity of a classifier has a high impact on accomplishing good generalization. Several theoretical and experimental results [23] indicate that a local learning algorithm provides a reasonable solution to this problem. In local learning [1], each local model is built completely independently of all other models, in such a way that the total number of local models in the learning method indirectly influences how complex a function can be estimated; complexity can only be controlled by the level of adaptability of each local model. This feature prevents overfitting if a strong learning pattern exists for training each local model.
Prototype selection is a technique that aims to decrease the training set size without sacrificing the prediction performance of a memory-based learner [18]. Besides this, by reducing the training set size it may decrease the computational cost of the prediction phase. Prototype selection techniques can be grouped into three categories: preservation techniques, which aim to find a consistent subset from the training data set, ignoring the presence of noise; noise removal techniques, which aim to remove noise; and hybrid techniques, which perform both objectives concurrently [22].
2.2 Boosting Classifiers

Experimental research works have shown that ensemble methods usually perform better, in terms of classification accuracy, than the individual base classifier [2], and lately several theoretical explanations have been devised to explain the success of some commonly used ensemble methods [13]. In this work, a local boosting technique that is based on a training set reduced by prototype selection [11] is proposed, and for this reason this section introduces the boosting approach.
Boosting constructs the ensemble of classifiers by subsequently tweaking the distribution of the training set based on the accuracy of the previously created classifiers. There are several boosting variants. These methods assign a weight to each training instance. Firstly, all instances are equally weighted. In each iteration a new classification model, named the base classifier, is generated using the base learning algorithm. The creation of the base classifier has to consider the weight distribution. Then, the weight of each instance is adjusted, depending on the accuracy of the prediction of the base classifier for that instance. Thus, boosting attempts to construct new classification models that are able to better classify the "hard" instances for the previous ensemble members. The final classification is obtained from a weighted vote of the base classifiers. AdaBoost [8] is the most well-known boosting method and the one that is used in the experimental analysis that is presented in Sect. 4.
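For concreteness, the reweighting step of AdaBoost.M1 [8] can be written as follows; this is the standard textbook formulation, not a derivation specific to this paper:

$$
\epsilon_t = \sum_{i:\,h_t(x_i)\neq y_i} w_i^{(t)}, \qquad
\beta_t = \frac{\epsilon_t}{1-\epsilon_t}, \qquad
w_i^{(t+1)} = \frac{w_i^{(t)}}{Z_t}\cdot
\begin{cases}
\beta_t & \text{if } h_t(x_i)=y_i,\\
1 & \text{otherwise},
\end{cases}
$$

where $w_i^{(t)}$ is the weight of instance $i$ at iteration $t$, $h_t$ is the base classifier built in that iteration, and $Z_t$ normalizes the weights so that they sum to one. Since $\beta_t < 1$ whenever $\epsilon_t < 0.5$, misclassified instances gain relative weight. The final hypothesis is the weighted vote $H(x) = \arg\max_{y}\sum_{t:\,h_t(x)=y}\log(1/\beta_t)$.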
AdaBoost is able to use the weights in two ways to generate a new training data set to provide to the base classifier: by sampling or by reweighting the instances. In boosting by sampling, the training instances are sampled with replacement with probability proportional to their weights. In [26] the authors showed empirically that a local boosting-by-resampling technique is more robust to noise than the standard AdaBoost. The authors of [17] proposed a Boosted k-NN algorithm that creates an ensemble of models with locally modified distance weighting, which has increased generalization accuracy and never performs worse than standard k-NN. In [10] the authors presented a novel method for instance selection, based on boosting instance selection algorithms in the same way boosting is applied to classification.
3 The Proposed Algorithm

Two main disadvantages of simple local boosting are: (i) when the amount of noise is large, simple local boosting does not match the performance [26] of Bagging [3] and Random Forests [4]; (ii) saving the data for each pattern increases storage complexity, which might restrict the usage of the method to limited training sets [21]. The proposed algorithm incorporates prototype selection to handle, among others, these two problems. In the learning phase, a prototype selection [11] method based on the Edited Nearest Neighbor (ENN) [24] technique reduces the training set by removing the training instances that do not agree with the majority of their k nearest neighbors. In the application phase, it constructs a model for each test instance to be estimated, considering only a subset of the training instances. This subset is selected according to the distance between the testing sample and the available training samples. For each testing instance, a boosting ensemble of a weak learner is built using only the training instances that lie close to the current testing instance. The prototype selection aims to improve the classification accuracy as well as the time that is needed to build a model for each test instance at prediction time.
The proposed ensemble method has some free parameters, such as the number of neighbors (k1) to be considered when the prototype selection is executed, the number of neighbors (k2) to be selected in order to build the local model, the distance metric, and the weak learner. In the experiments, the well-known Euclidean distance was used as the distance metric. In general, the distance between points $x$ and $y$ in a Euclidean space $\mathbb{R}^n$ is given by

$$ d(x, y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^{n} |x_i - y_i|^2}. \qquad (1) $$

The most common value of k for the nearest neighbor rule is 5; thus, k1 was set to 5 and k2 to 50, since a neighborhood of about this size is appropriate for a simple algorithm to build a precise model [14].
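As a quick sanity check of Eq. (1), a small NumPy sketch (illustrative only):

```python
import numpy as np

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
d = np.sqrt(np.sum(np.abs(x - y) ** 2))      # Eq. (1), written out
assert np.isclose(d, np.linalg.norm(x - y))  # agrees with the built-in 2-norm
print(d)  # 5.0
```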
The proposed method is presented in Algorithm 1.

Algorithm 1. PSLB(k1, k2, distanceMetric, weakLearner)

procedure Training(k1, distanceMetric)
    for each training instance do
        Find the k1 nearest neighbors using the selected distanceMetric
        if the instance does not agree with the majority of the k1 then
            Remove this instance from the training set
        end if
    end for
end procedure

procedure Classification(k2, distanceMetric, weakLearner)
    for each testing instance do
        Find the k2 nearest neighbors using the selected distanceMetric
        Apply boosting to the base weakLearner using the k2 nearest neighbors
        The answer of the boosting ensemble is the prediction for the testing instance
    end for
end procedure
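A minimal Python sketch of Algorithm 1 follows, assuming NumPy arrays and scikit-learn, which the experiments in Sect. 4 use. The names pslb_fit and pslb_predict are illustrative and are not taken from the authors' implementation (available at https://bitbucket.org/chkoar/pslb).

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def pslb_fit(X, y, k1=5):
    """Training phase: ENN-based prototype selection (Algorithm 1).

    Keeps only the instances whose class agrees with the majority of
    their k1 nearest neighbors (the instance itself excluded)."""
    nn = NearestNeighbors(n_neighbors=k1 + 1).fit(X)  # Euclidean metric, Eq. (1)
    _, idx = nn.kneighbors(X)
    keep = []
    for i, neighbors in enumerate(idx):
        labels, counts = np.unique(y[neighbors[1:]], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:
            keep.append(i)
    return X[keep], y[keep]


def pslb_predict(X_proto, y_proto, X_test, k2=50, n_estimators=25):
    """Classification phase: one local AdaBoost ensemble per test instance."""
    nn = NearestNeighbors(n_neighbors=min(k2, len(X_proto))).fit(X_proto)
    predictions = []
    for x in X_test:
        _, idx = nn.kneighbors(x.reshape(1, -1))
        X_local, y_local = X_proto[idx[0]], y_proto[idx[0]]
        if len(np.unique(y_local)) == 1:
            # Degenerate neighborhood: every neighbor shares one class.
            predictions.append(y_local[0])
            continue
        # Decision stumps as weak learners, 25 iterations, as in Sect. 4.
        ensemble = AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=1), n_estimators=n_estimators
        ).fit(X_local, y_local)
        predictions.append(ensemble.predict(x.reshape(1, -1))[0])
    return np.array(predictions)
```

Since prototype selection runs only once, the per-query cost is dominated by fitting a 25-member ensemble on the k2 = 50 local instances.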
4 Numerical Experiments

In order to evaluate the performance of the proposed method, an initial version was implemented (https://bitbucket.org/chkoar/pslb) and a number of experiments were conducted using several data sets from different domains, chosen from the UCI repository [16]. Discrete features were transformed to numeric ones by using a simple quantization, each feature was scaled to have zero mean and a standard deviation of one, and all missing values were treated as zero.
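The described preprocessing might look as follows, assuming pandas and scikit-learn; the paper does not specify its quantization scheme, so ordinal encoding stands in for it here:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, StandardScaler


def preprocess(df: pd.DataFrame) -> np.ndarray:
    df = df.copy()
    # Simple quantization of discrete features: map categories to integers.
    cat_cols = df.select_dtypes(include=["object", "category"]).columns
    if len(cat_cols) > 0:
        df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols].astype(str))
    X = df.to_numpy(dtype=float)
    X = np.nan_to_num(X, nan=0.0)  # missing values treated as zero
    # Scale every feature to zero mean and unit standard deviation.
    return StandardScaler().fit_transform(X)
```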
In Table 1 the name, the number of patterns, the number of attributes, as well as the number of different classes for each data set are shown.

Table 1. Benchmark data sets used in the experiments

Dataset            #patterns  #attributes  #classes
cardiotocography   2126       21           10
cylinder-bands     512        25           2
dermatology        366        24           6
ecoli              336        7            8
energy-y1          768        8            3
glass              214        9            6
low-res-spect      531        100          9
magic              19020      10           2
musk-1             476        166          2
ozone              2536       72           2
page-blocks        5473       10           5
pima               768        8            2
synthetic-control  600        60           6
tic-tac-toe        958        9            2

All experiments were run on an Intel Core i3-3217U machine at 1.8 GHz, with 8 GB of RAM, running Linux Mint 17.3 64-bit, using Python and the scikit-learn [19] library.
For the experiments, we used two variants of Decision Trees [25] as weak learners: one-level Decision Trees [12], also known as Decision Stumps, and two-level Decision Trees [20]. We used the Gini impurity [5] as the criterion to measure the quality of the splits in both algorithms. The boosting process for all classifiers was performed using the AdaBoost algorithm with 25 iterations in each model.
In order to estimate the classifiers' accuracy, the whole data set was divided into five mutually exclusive folds and, for each fold, the classifier was trained on the union of all the other folds. Then, cross-validation was run five times for each algorithm and the mean value of the five folds was calculated.
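A sketch of this evaluation protocol for the boosted-stump baseline (denoted BDS in Subsect. 4.2), assuming scikit-learn; the repeated stratified splitting is one reasonable reading of the protocol, since the paper does not state how the five runs were seeded:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# BDS baseline: AdaBoost over decision stumps with the Gini criterion,
# 25 iterations per model, as described above.
bds = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1, criterion="gini"), n_estimators=25
)


def mean_cv_accuracy(model, X, y, repeats=5, folds=5):
    """Mean accuracy over `repeats` runs of `folds`-fold cross-validation."""
    run_means = []
    for seed in range(repeats):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
        run_means.append(cross_val_score(model, X, y, cv=cv).mean())
    return np.mean(run_means)
```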
4.1 Prototype Selection

The prototype selection process is independent of the base classifier, and it takes place once, in the training phase of the proposed algorithm. It depends only on the k1 parameter, the number of neighbors to be considered when the prototype selection is executed.
In Table 2 the average number of training patterns, the average number of removed patterns, as well as the average reduction for each data set are presented. The averages refer to all training folds during the 5-fold cross-validation.

Table 2. Average reduction

Dataset            #avg training patterns  #avg removed patterns  %avg reduction
cardiotocography   1701                    268                    15.73
cylinder-bands     410                     60                     14.60
dermatology        293                     7                      2.53
ecoli              269                     29                     10.86
energy-y1          614                     17                     2.77
glass              171                     40                     23.13
low-res-spect      425                     47                     11.11
magic              15216                   1798                   11.81
musk-1             381                     24                     6.25
ozone              2029                    50                     2.48
page-blocks        4378                    111                    2.53
pima               614                     109                    17.77
synthetic-control  480                     8                      1.67
tic-tac-toe        766                     1                      0.13
4.2 Using Decision Stump as Base Classifier

In the first part of the experiments, Decision Stumps [12] were used as the weak learning classifiers. Decision Stumps (DS) are one-level Decision Trees that classify instances based on the value of just a single input attribute. Each node in a decision stump represents a feature in an instance to be classified, and each branch represents a value that the node can take. Instances are classified starting at the root node and are sorted based on their attribute values. In the worst case, a Decision Stump behaves as a baseline classifier, and it will possibly perform better if the selected attribute is particularly informative.
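Such a stump can be obtained in scikit-learn by capping the tree depth at one, as in this illustrative snippet, which also inspects which single feature and threshold the stump selected:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1, criterion="gini").fit(X, y)

# The root node holds the single test the stump makes.
print("feature:", stump.tree_.feature[0])      # index of the chosen attribute
print("threshold:", stump.tree_.threshold[0])  # split point on that attribute
```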
The proposed method, denoted as PSLBDS, is compared with Boosting Decision Stumps, denoted as BDS, and the Local Boosting of Decision Stumps, denoted as LBDS. Since the proposed method uses fifty neighbors, a 50-Nearest Neighbors (50NN) classifier has been included in the comparisons. In Table 3 the average accuracy of the compared methods is presented. Table 3 indicates that the hypotheses generated by PSLBDS are apparently better, since the PSLBDS algorithm has the best mean accuracy score in nearly all cases.
Table 3. Average accuracy of the compared algorithms using a one-level decision tree as base classifier

Dataset            PSLBDS       BDS          LBDS         50NN
cardiotocography   0.682±0.028  0.548±0.088  0.659±0.015  0.607±0.029
cylinder-bands     0.613±0.030  0.560±0.037  0.584±0.014  0.582±0.017
dermatology        0.942±0.027  0.641±0.142  0.940±0.022  0.902±0.020
ecoli              0.821±0.029  0.622±0.129  0.794±0.026  0.780±0.050
energy-y1          0.844±0.090  0.706±0.050  0.836±0.092  0.822±0.091
glass              0.582±0.085  0.285±0.094  0.568±0.065  0.446±0.169
low-res-spect      0.850±0.025  0.584±0.069  0.846±0.012  0.851±0.023
magic              0.849±0.005  0.828±0.005  0.834±0.005  0.828±0.004
musk-1             0.727±0.096  0.727±0.052  0.718±0.085  0.618±0.096
ozone              0.966±0.008  0.960±0.010  0.887±0.133  0.971±0.001
page-blocks        0.954±0.012  0.853±0.163  0.950±0.013  0.942±0.007
pima               0.757±0.028  0.755±0.024  0.685±0.024  0.749±0.017
synthetic-control  0.947±0.011  0.472±0.074  0.943±0.020  0.887±0.030
tic-tac-toe        0.884±0.084  0.733±0.034  0.882±0.083  0.747±0.101

Demšar [6] suggests that non-parametric tests should be preferred over parametric ones in the context of machine learning problems, since they do not assume normal distributions or homogeneity of variance.
Therefore, in order to validate the significance of the results, the Friedman test [9], a rank-based non-parametric test for comparing several machine learning algorithms on multiple data sets, was used, with the PSLBDS algorithm as the control method. The null hypothesis of the test states that all the methods perform equivalently, and thus their ranks should be equivalent. The average rankings, according to the Friedman test, are presented in Table 4. Assuming a significance level of 0.05, the p-value of the Friedman test in Table 4 indicates that the null hypothesis has to be rejected. So, there is at least one method that performs statistically differently from the proposed method.
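The Friedman statistic can be computed from the per-data-set accuracies with SciPy, as in the sketch below, which uses the mean accuracies of Table 3; note that the Finner and Li adjustments are not part of SciPy and would have to be applied separately:

```python
from scipy.stats import friedmanchisquare

# One list per algorithm; entries are the mean accuracies over the
# 14 data sets of Table 3, in the order the table lists them.
pslbds = [0.682, 0.613, 0.942, 0.821, 0.844, 0.582, 0.850,
          0.849, 0.727, 0.966, 0.954, 0.757, 0.947, 0.884]
bds    = [0.548, 0.560, 0.641, 0.622, 0.706, 0.285, 0.584,
          0.828, 0.727, 0.960, 0.853, 0.755, 0.472, 0.733]
lbds   = [0.659, 0.584, 0.940, 0.794, 0.836, 0.568, 0.846,
          0.834, 0.718, 0.887, 0.950, 0.685, 0.943, 0.882]
knn50  = [0.607, 0.582, 0.902, 0.780, 0.822, 0.446, 0.851,
          0.828, 0.618, 0.971, 0.942, 0.749, 0.887, 0.747]

statistic, p_value = friedmanchisquare(pslbds, bds, lbds, knn50)
print(statistic, p_value)  # reject the null hypothesis when p_value < 0.05
```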
To investigate the aforementioned finding, Finner's [7] and Li's [15] post hoc procedures were used. In Table 5 the p-values obtained by applying the post hoc procedures over the results of the Friedman statistical test are presented. Finner's and Li's procedures reject those hypotheses that have a p-value ≤ 0.05. In addition, the adjusted p-values obtained through the application of the post hoc procedures are presented in Table 6. Hence, both post hoc procedures agree that the PSLBDS algorithm performs significantly better than BDS, LBDS, as well as the 50NN rule.
4.3 Using Two-Level Decision Tree as a Base Classifier

Afterwards, two-level Decision Trees were used as the weak learning base classifiers. A two-level Decision Tree is a tree with a maximum depth of two. The proposed method, denoted as PSLBDT, is compared with the Boosting Decision Trees, denoted as BDT, and the Local Boosting of Decision Trees, denoted as LBDT. Since the proposed method uses fifty neighbors, a 50-Nearest Neighbors (50NN) classifier has been included in the comparisons.
In Table 7 the average accuracy of the compared methods is presented. Table 7 indicates that the hypotheses generated by PSLBDT are apparently better, since the PSLBDT algorithm has the best mean accuracy score in most cases. The average rankings, according to the Friedman test, are presented in Table 8. The proposed algorithm was ranked in the first place again. Assuming a significance level of 0.05, the p-value of the Friedman test in Table 8 indicates that the null hypothesis has to be rejected. So, there is at least one method that performs statistically differently from the proposed method. Aiming to investigate the aforesaid, Finner's and Li's post hoc procedures were used again. In Table 9 the p-values obtained by applying the post hoc procedures over the results of Friedman's statistical test are presented. Finner's and Li's procedures reject those hypotheses that have a p-value ≤ 0.05. In addition, the adjusted p-values obtained through the application of the post hoc procedures are presented in Table 10. Both post hoc procedures agree that the PSLBDT algorithm performs significantly better than BDT and the 50NN rule, but not significantly better than LBDT, as far as the tested data sets are concerned.
4.4 Time Analysis

One of the two contributions of this study was to improve the classification time over the local boosting approach. In order to demonstrate this, the total time that is required to predict all instances in the test folds was recorded. Specifically, the prediction of each test fold was executed three times and the minimum time was recorded for each fold. Then, the average over all folds was calculated.

Table 4. Average rankings of the Friedman test (DS)

Algorithm  Ranking
PSLBDS     1.1429
LBDS       2.4286
50NN       2.8571
BDS        3.5714

Statistic  26.228571
p-value    0.000009
Table 5. Post hoc comparison for the Friedman test (DS)

i  Algorithm  z = (R0 - Ri)/SE  p         Finner    Li
3  BDS        4.97709           0.000001  0.016952  0.052189
2  50NN       3.51324           0.000443  0.033617  0.052189
1  LBDS       2.63493           0.008415  0.05      0.05

Table 6. Adjusted p-values (DS)

i  Algorithm  p (unadjusted)  p (Finner)  p (Li)
3  BDS        0.000001        0.000002    0.000001
2  50NN       0.000443        0.000664    0.000446
1  LBDS       0.008415        0.008415    0.008415

Table 7. Average accuracy of the compared algorithms using a two-level decision tree as base classifier

Dataset            PSLBDT       BDT          LBDT         50NN
cardiotocography   0.683±0.020  0.584±0.072  0.686±0.017  0.607±0.029
cylinder-bands     0.609±0.049  0.608±0.034  0.564±0.025  0.582±0.017
dermatology        0.958±0.040  0.800±0.041  0.951±0.020  0.902±0.020
ecoli              0.813±0.032  0.753±0.036  0.800±0.030  0.780±0.050
energy-y1          0.845±0.060  0.830±0.064  0.844±0.071  0.822±0.091
glass              0.608±0.048  0.569±0.112  0.652±0.057  0.446±0.169
low-res-spect      0.877±0.042  0.573±0.136  0.872±0.023  0.851±0.023
magic              0.849±0.006  0.856±0.007  0.841±0.006  0.828±0.004
musk-1             0.738±0.072  0.752±0.024  0.746±0.074  0.618±0.096
ozone              0.967±0.008  0.925±0.064  0.888±0.132  0.971±0.001
page-blocks        0.960±0.010  0.924±0.023  0.956±0.010  0.942±0.007
pima               0.763±0.023  0.742±0.014  0.730±0.018  0.749±0.017
synthetic-control  0.950±0.011  0.830±0.036  0.953±0.016  0.887±0.030
tic-tac-toe        0.893±0.078  0.665±0.126  0.889±0.081  0.747±0.101
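A sketch of the timing protocol of Sect. 4.4 (three prediction runs per test fold, keeping the minimum, then averaging over folds), assuming any fitted scikit-learn-style classifier; the helper names are illustrative:

```python
import time


def fold_prediction_time(model, X_fold, runs=3):
    """Wall-clock prediction time over one test fold: minimum of `runs` runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model.predict(X_fold)
        timings.append(time.perf_counter() - start)
    return min(timings)  # keep the minimum of the three runs


def average_prediction_time(models_and_folds):
    """Average of the per-fold minima, as reported in Table 11."""
    times = [fold_prediction_time(m, X_f) for m, X_f in models_and_folds]
    return sum(times) / len(times)
```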
In Table 11 the average prediction time, in seconds, of LBDS, PSLBDS, LBDT, and PSLBDT is presented. In the case of one-level decision trees (LBDS, PSLBDS), the proposed method reduced the expected prediction time by more than 15% in 6 of 14 cases, while in the case of two-level decision trees (LBDT, PSLBDT) it reduced the expected prediction time by more than 15% in 7 of 14 cases. In Fig. 1 the absolute percentage changes are presented.
Table 8. Average rankings of the Friedman test (two-level tree)

Algorithm  Ranking
PSLBDT     1.5
LBDT       2.2857
50NN       3.0714
BDT        3.1429

Statistic  15
p-value    0.001817

Table 9. Post hoc comparison for the Friedman test (two-level tree)

i  Algorithm  z = (R0 - Ri)/SE  p         Finner    Li
3  BDT        3.366855          0.00076   0.016952  0.046982
2  50NN       3.22047           0.00128   0.033617  0.046982
1  LBDT       1.610235          0.107347  0.05      0.05

Table 10. Adjusted p-values (two-level tree)

i  Algorithm  p (unadjusted)  p (Finner)  p (Li)
3  BDT        0.00076         0.002279    0.000851
2  50NN       0.00128         0.002279    0.001432
1  LBDT       0.107347        0.107347    0.107347

Table 11. Average prediction times, in seconds

Dataset            LBDS    PSLBDS  LBDT    PSLBDT
cardiotocography   33.89   33.26   32.43   29.20
cylinder-bands     8.16    8.07    8.45    7.86
dermatology        3.56    3.52    3.28    3.20
ecoli              5.00    3.61    4.66    2.92
energy-y1          8.58    7.19    7.59    6.25
glass              3.39    3.37    3.46    3.16
low-res-spect      6.74    6.38    5.77    3.77
magic              257.14  160.31  213.59  107.98
musk-1             9.53    9.50    8.80    7.99
ozone              14.84   4.89    7.24    1.69
page-blocks        17.27   9.34    12.28   4.27
pima               11.72   8.90    11.07   7.56
synthetic-control  6.32    6.12    3.89    3.76
tic-tac-toe        13.56   13.56   12.18   12.00
Fig. 1. Percentage change of prediction time between Local Boosting and the proposed method

5 Synopsis and Future Work

Local memory-based techniques delay the processing of the training set until they receive a request for an action like classification or local modelling.
A data set of observed training examples is always retained, and the estimate for a new test instance is obtained from an interpolation based on a neighborhood of the query instance. In the research work at hand, a method that applies local boosting after prototype selection is presented. Experiments on several data sets show that the proposed method significantly outperforms the boosting and local boosting methods in terms of classification accuracy and the time that is required to build a local model and classify a test instance. Typically, boosting algorithms are well known to be susceptible to noise [2]. In the case of local boosting, the algorithm should handle a reasonable amount of noise and be at least as good as boosting, if not better. Given the promising results obtained from the performed experiments, one can assume that the proposed method can be successfully applied to real-world classification tasks with more accuracy than the compared machine learning approaches. In future work, the proposed method will be investigated for regression problems, as well as for the problem of reducing the size of the stored set of instances by also applying feature selection instead of simple prototype selection.
References

1. Atkeson, C.G., Schaal, S., Moore, A.W.: Locally weighted learning. Artif. Intell. Rev. 11(1), 11–73 (1997)
2. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. 36(1), 105–139 (1999)
3. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
5. Breiman, L., Friedman, J., Stone, C., Olshen, R.: Classification and Regression Trees. Chapman & Hall, New York (1993)
6. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
7. Finner, H.: On a monotonicity problem in step-down multiple test procedures. J. Am. Stat. Assoc. 88(423), 920–923 (1993)
8. Freund, Y., Schapire, R.E., et al.: Experiments with a new boosting algorithm. In: ICML 1996, pp. 148–156 (1996)
9. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675 (1937)
10. García-Pedrajas, N., de Haro-García, A.: Boosting instance selection algorithms. Knowl.-Based Syst. 67, 342–360 (2014)
11. García, S., Derrac, J., Cano, J.R., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
12. Iba, W., Langley, P.: Induction of one-level decision trees. In: Proceedings of the Ninth International Workshop on Machine Learning (ML 1992), pp. 233–240. Morgan Kaufmann Publishers Inc., San Francisco (1992)
13. Kleinberg, E.M.: A mathematically rigorous foundation for supervised learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 67–76. Springer, Heidelberg (2000)
14. Kotsiantis, S.B., Kanellopoulos, D., Pintelas, P.E.: Local boosting of decision stumps for regression and classification problems. J. Comput. 1(4), 30–37 (2006)
15. Li, J.: A two-step rejection procedure for testing multiple hypotheses. J. Stat. Plann. Infer. 138(6), 1521–1527 (2008)
16. Lichman, M.: UCI Machine Learning Repository (2013)
17. Neo, T.K.C., Ventura, D.: A direct boosting algorithm for the k-nearest neighbor classifier via local warping of the distance metric. Pattern Recogn. Lett. 33(1), 92–102 (2012)
18. Olvera-López, J.A., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Kittler, J.: A review of instance selection methods. Artif. Intell. Rev. 34(2), 133–144 (2010)
19. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
20. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
21. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)
22. Segata, N., Blanzieri, E., Delany, S.J., Cunningham, P.: Noise reduction for instance-based learning with a local maximal margin approach. J. Intell. Inf. Syst. 35(2), 301–331 (2010)
23. Vapnik, V.N.: Statistical Learning Theory: Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York (1998)
24. Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 2(3), 408–421 (1972)
25. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Series in Data Management Systems, 3rd edn. Morgan Kaufmann, Burlington (2011)
26. Zhang, C.X., Zhang, J.S.: A local boosting algorithm for solving classification problems. Comput. Stat. Data Anal. 52(4), 1928–1941 (2008)