J Intell Inf Syst (2013) 41:371–406
DOI 10.1007/s10844-013-0250-y

Classification accuracy is not enough
On the evaluation of music genre recognition systems

Bob L. Sturm

Received: 12 November 2012 / Revised: 10 March 2013 / Accepted: 14 May 2013 / Published online: 14 July 2013
The Author(s) 2013. This article is published with open access at Springerlink.com

Abstract We argue that an evaluation of system behavior at the level of the music is required to usefully address the fundamental problems of music genre recognition (MGR), and indeed other tasks of music information retrieval, such as autotagging. A recent review of works in MGR since 1995 shows that most (82%) measure the capacity of a system to recognize genre by its classification accuracy. After reviewing evaluation in MGR, we show that neither classification accuracy, nor recall and precision, nor confusion tables, necessarily reflect the capacity of a system to recognize genre in musical signals. Hence, such figures of merit cannot be used to reliably rank, promote or discount the genre recognition performance of MGR systems if genre recognition (rather than identification by irrelevant confounding factors) is the objective. This motivates the development of a richer experimental toolbox for evaluating any system designed to intelligently extract information from music signals.
Keywords Music · Evaluation · Classification · Genre

BLS is supported in part by Independent Postdoc Grant 11-105218 from Det Frie Forskningsråd; and the Danish Council for Strategic Research of the Danish Agency for Science Technology and Innovation in the CoSound project, case number 11-115328.

B. L. Sturm (✉)
Audio Analysis Lab, AD:MT, Aalborg University Copenhagen, A.C. Meyers Vænge 15, 2450, Copenhagen SV, Denmark
e-mail: bst@create.aau.dk

1 Introduction

The problem of identifying, discriminating between, and learning the criteria of music genres or styles, i.e., music genre recognition (MGR), has motivated much work since 1995 (Matityaho and Furst 1995), and even earlier, e.g., Porter and Neuringer (1984). Indeed, a recent review of MGR by Fu et al. (2011) writes, "Genre classification is the most widely studied area in [music information retrieval]." MGR research is now making an appearance in textbooks (Lerch 2012).
Most published studies of MGR systems report classification performance significantly better than chance, and sometimes as well as or better than humans. For a benchmark dataset of music excerpts singly-labeled in ten genres (GTZAN, Tzanetakis and Cook 2002; Sturm 2013b), reported classification accuracies have risen from 61% (Tzanetakis and Cook 2002) to above 90%, e.g., Guaus (2009), Panagakis et al. (2009a, b), Panagakis and Kotropoulos (2010) and Chang et al. (2010). Indeed, as Bergstra et al. (2006a) write, "Given the steady and significant improvement in classification performance since 1997, we wonder if automatic methods are not already more efficient at learning genres than some people." This performance increase merits a closer look at what is working in these systems, and motivates re-evaluating the argument that genre exists to a large extent outside of the acoustic signal itself (Fabbri 1982; McKay and Fujinaga 2006; Wiggins 2009). Most exciting, it might also illuminate how people hear and conceptualize the complex phenomenon of "music" (Aucouturier and Bigand 2013).

It might be too soon to ask such questions, however. Recent work (Sturm 2012b; Marques et al. 2010, 2011a) shows that an MGR system can act as if genre is not what it is recognizing, even if it shows high classification accuracy. In a comprehensive review of the MGR literature (Sturm 2012a), we find that over 91% of papers with an experimental component (397 of 435 papers) evaluate MGR systems by classifying music excerpts and comparing the labels to the "ground truth," and over 82% of 467 published works cite classification accuracy as a figure of merit (FoM). Of those that employ this approach to evaluation, 47% employ only this approach. Furthermore, we find several cases of methodological errors leading to inflated accuracies: those of Panagakis et al. (2009a, b) and Panagakis and Kotropoulos (2010) come from accidentally using the true labels in classification (private correspondence with Y. Panagakis) (Sturm and Noorzad 2012); those of Chang et al. (2010) are irreproducible, and contradict results seen in other areas applying the same technique (Sturm 2013a); and those of Bagci and Erzin (2007) are highly unlikely with an analysis of their approach (Sturm and Gouyon 2013, unpublished). One must wonder if the "progress" in MGR seen since 1995 is not due to solving the problem: can a system have a high classification accuracy in some datasets yet not even address the problem at all?

We show here that classification accuracy does not reliably reflect the capacity of an MGR system to recognize genre. Furthermore, recall, precision and confusion tables are still not enough.
We show these FoMs (all of which have been used in the past to rank MGR systems, e.g., Chai and Vercoe (2001), Tzanetakis and Cook (2002), Aucouturier and Pachet (2003), Burred and Lerch (2004), Turnbull and Elkan (2005), Flexer (2006), DeCoro et al. (2007), Benetos and Kotropoulos (2008), Panagakis et al. (2009b), Bergstra et al. (2010), Fu et al. (2011) and Ren and Jang (2012), citing one work from each year since 2001) do not reliably reflect the capacity of an MGR system to recognize genre. While these claims have not been made overt in any of the 467 references we survey (Sturm 2012a), shades of them have appeared before (Craft et al. 2007; Craft 2007; Lippens et al. 2004; Wiggins 2009; Seyerlehner et al. 2010; Sturm 2012b), in works that argue for evaluating performance in ways that account for the ambiguity of genre being in large part a subjective construction (Fabbri 1982; Frow 2005). We go further and argue that the evaluation of MGR systems (the experimental designs, the datasets, and the FoMs), and indeed the development of future systems, must embrace the fact that the recognition of genre is to a large extent a musical problem, and must be evaluated as such. In short, classification accuracy is not enough to evaluate the extent to which an MGR system addresses what appears to be one of its principal goals: to produce genre labels indistinguishable from those humans would produce.

1.1 Arguments

Some argue that since MGR is now replaced by, or is a subproblem of, the more general problem of automatic tagging (Aucouturier and Pampalk 2008; Bertin-Mahieux et al. 2010), work in MGR is irrelevant. However, genre is one of the most used descriptors of music (Aucouturier and Pachet 2003; Scaringella et al. 2006; McKay and Fujinaga 2006): in 2007, nearly 70% of the tags on last.fm were genre labels (Bertin-Mahieux et al. 2010); and a not insignificant portion of the tags in the Million Song Dataset are genre (Bertin-Mahieux et al. 2011; Schindler et al. 2012).

Some argue that automatic tagging is more realistic than MGR because multiple tags can be given rather than the single one in MGR, e.g., Panagakis et al. (2010b), Marques et al. (2011a), Fu et al. (2011) and Seyerlehner et al. (2012). This claim and its origins are mysterious because nothing about MGR (the problem of identifying, discriminating between, and learning the criteria of music genres or styles) naturally restricts the number of genre labels people use to describe a piece of music. Perhaps this imagined limitation of MGR comes from the fact that of 435 works with an experimental component we survey (Sturm 2012a), we find only ten that use a multilabel approach (Barbedo and Lopes 2008; Lukashevich et al. 2009; Mace et al. 2011; McKay 2004; Sanden 2010; Sanden and Zhang 2011a, b; Scaringella et al. 2006; Tacchini and Damiani 2011; Wang et al. 2009). Perhaps it comes from the fact that most of the private and public datasets so far used in MGR assume a model of one genre per musical excerpt (Sturm 2012a). Perhaps it comes from the assumption that genre works in such a way that an object belongs to a genre, rather than uses a genre (Frow 2005).

Some argue that, given the ambiguity of genre and the observed lack of human consensus about such matters, MGR is an ill-posed problem (McKay and Fujinaga 2006). However, people often do agree, even under surprising constraints (Gjerdingen and Perrott 2008; Krumhansl 2010; Mace et al. 2011). Researchers have compiled MGR datasets with validation from listening tests, e.g., (Lippens et al. 2004; Meng et al. 2005); and very few researchers have overtly argued against any of the genre assignments of the most-used public dataset for MGR (Sturm 2012a, 2013b). Hence, MGR does not always appear to be an ill-posed problem, since people often use genre to describe and discuss music in consistent ways and, not to forget, since MGR makes no restriction on the number of genres relevant for describing a particular piece of music.

Some argue that though people show some consistency in using genre, they are making decisions based on information not present in the audio signal, such as composer intention or marketing strategies (McKay and Fujinaga 2006; Bergstra et al. 2006b; Wiggins 2009). However, there exist some genres or styles that appear distinguishable and identifiable from the sound, e.g., musicological criteria like tempo (Gouyon and Dixon 2004), chord progressions (Anglade et al. 2010), instrumentation (McKay and Fujinaga 2005), lyrics (Li and Ogihara 2004), and so on.

Some argue that MGR is really just a proxy problem that has little value in and of itself; and that the purpose of MGR is really to provide an efficient means to gauge the performance of features and algorithms solving the problem of measuring music similarity (Pampalk 2006; Schedl and Flexer 2012). This point of view, however, is not evident in much of the MGR literature, e.g., the three reviews devoted specifically to MGR (Aucouturier and Pachet 2003; Scaringella et al. 2006; Fu et al. 2011), the work of Tzanetakis and Cook (2002), Barbedo and Lopes (2008), Bergstra et al. (2006a), Holzapfel and Stylianou (2008), Marques et al. (2011b), Panagakis et al. (2010a), Benetos and Kotropoulos (2010), and so on. It is thus not idiosyncratic to claim that one purpose of MGR could be to identify, discriminate between, and learn the criteria of music genres in order to produce genre labels that are indistinguishable from those humans would produce.

One might argue, "MGR does not have much value since most tracks today are already annotated with genre." However, genre is not a fixed attribute like artist or instrumentation (Fabbri 1982; Frow 2005); and it is certainly not an attribute of only commercial music infallibly ordained by composers, producers, and/or consumers using perfect historical and musicological reflection. One cannot assume such metadata are static and unquestionable, or even that such information is useful, e.g., for computational musicology (Collins 2012).

Some might argue that the reasons MGR work is still published are that: 1) it provides a way to evaluate new features; and 2) it provides a way to evaluate new approaches to machine learning. While such a claim about publication is tenuous, we argue that it makes little sense to evaluate features or machine learning approaches without considering for what they are to be used, and then designing and using appropriate procedures for evaluation. We show in this paper that the typical ways in which new features and machine learning methods are evaluated for MGR provide little information about the extents to which the features and machine learning for MGR address the fundamental problem of recognizing music genre.

1.2 Organization and conventions

We organize this article as follows. Section 2 distills along three dimensions the variety of approaches that have been used to evaluate MGR: experimental design, datasets, and FoMs. We delimit our study to work specifically addressing the recognition of music genre and style, and not tags in general, i.e., the 467 works we survey (Sturm 2012a). We show most work in MGR reports classification accuracy from a comparison of predicted labels to "ground truths" of private datasets. The third section reviews three state-of-the-art MGR systems that show high classification accuracy in the most-used public music genre dataset GTZAN (Tzanetakis and Cook 2002; Sturm 2013b). In the fourth section, we evaluate the performance statistics of these three systems, starting from high-level FoMs such as classification accuracy, recall and precision, continuing to mid-level class confusions. In the fifth section, we evaluate the behaviors of these systems by inspecting low-level excerpt misclassifications, and performing a listening test that proves the behaviors of all three systems are highly distinguishable from those of humans. We conclude by discussing our results and further criticisms, and a look forward to the development and practice of better means for evaluation, not only in MGR, but also the more general problem of music description.

We use the following conventions throughout. When we refer to Disco, we are referring to those 100 excerpts in the GTZAN category named "Disco" without advocating that they are exemplary of the genre disco. The same applies for the excerpts of the other nine categories of GTZAN. We capitalize the categories of GTZAN, e.g., Disco, capitalize and quote labels, e.g., "Disco," but do not capitalize genres, e.g., disco. A number following a category in GTZAN refers to the identifying number of its excerpt filename. Altogether, "it appears this system does not recognize disco because it classifies Disco 72 as 'Metal'."

2 Evaluation in music genre recognition research

Surprisingly little has been written about evaluation, i.e., experimental design, data, and FoMs, with respect to MGR (Sturm 2012a). An experimental design is a method for testing a hypothesis. Data is the material on which a system is tested. A FoM reflects the confidence in the hypothesis after conducting an experiment. Of three review articles devoted in large part to MGR (Aucouturier and Pachet 2003; Scaringella et al. 2006; Fu et al. 2011), only Aucouturier and Pachet (2003) give a brief paragraph on evaluation. The work by Vatolkin (2012) provides a comparison of various performance statistics for music classification. Other works (Berenzweig et al. 2004; Craft et al. 2007; Craft 2007; Lippens et al. 2004; Wiggins 2009; Seyerlehner et al. 2010; Sturm 2012b) argue for measuring performance in ways that take into account the natural ambiguity of music genre and similarity. For instance, we (Sturm 2012b), Craft et al. (2007) and Craft (2007) argue for richer experimental designs than having a system apply a single label to music with a possibly problematic "ground truth." Flexer (2006) criticizes the absence of formal statistical testing in music information research, and provides an excellent tutorial based upon MGR for how to apply statistical tests. Derived from our survey (Sturm 2012a), Fig. 1 shows the annual number of publications in MGR, and the proportion that use formal statistical testing in comparing MGR systems.

[Figure 1: bar chart. Vertical axis: No. Publications (0 to 70). Horizontal axis: year (1995 to 2012). Series: All works; Experimental w/o statistics; Experimental w/ statistics.]

Fig. 1 Annual numbers of references in MGR divided by which use and do not use formal statistical tests for making comparisons (Sturm 2012a). Only about 12% of references in MGR employ formal statistical testing; and only 19.4% of the work (91 papers) appears at the Conference of the International Society for Music Information Retrieval

Table 1 Ten experimental designs of MGR, and the percentage of references having an experimental component (435 references) in our survey (Sturm 2012a) that employ them

Design      %     Description
Classify    91    To answer the question, "How well does the system predict the genres used by music?" The system applies genre labels to music, which researcher then compares to a "ground truth"
Features    33    To answer the question, "At what is the system looking to identify the genres used by music?" The system ranks and/or selects features, which researcher then inspects
Generalize  16    To answer the question, "How well does the system identify genre in varied datasets?" Classify with two or more datasets having different genres, and/or various amounts of training data
Robust      7     To answer the question, "To what extent is the system invariant to aspects inconsequential for identifying genre?" The system classifies music that researcher modifies or transforms in ways that do not harm its genre identification by a human
Eyeball     7     To answer the question, "How well do the parameters make sense with respect to identifying genre?" The system derives parameters from music; researcher visually compares
Cluster     7     To answer the question, "How well does the system group together music using the same genres?" The system creates clusters of dataset, which researcher then inspects
Scale       7     To answer the question, "How well does the system identify music genre with varying numbers of genres?" Classify with varying numbers of genres
Retrieve    4     To answer the question, "How well does the system identify music using the same genres used by the query?" The system retrieves music similar to query, which researcher then inspects
Rules       4     To answer the question, "What are the decisions the system is making to identify genres?" The researcher inspects rules used by a system to identify genres
Compose     0.7   To answer the question, "What are the internal genre models of the system?" The system creates music in specific genres, which the researcher then inspects

Some references use more than one design

Table 1 summarizes ten experimental designs we find in our survey (Sturm 2012a).
Here we see that the most widely used design by far is Classify. The experimental design used the least is Compose, and appears in only three works (Cruz and Vidal 2003, 2008; Sturm 2012b). Almost half of the works we survey (213 references) use only one experimental design; and of these, 47% employ Classify. We find only 36 works explicitly mention evaluating with an artist or album filter (Pampalk et al. 2005; Flexer 2007; Flexer and Schnitzer 2009, 2010). We find only 12 works using human evaluation for gauging the success of a system.

Typically, formally justifying a misclassification as an error is a task research in MGR often defers to the "ground truth" of a dataset, whether created by a listener (Tzanetakis and Cook 2002), the artist (Seyerlehner et al. 2010), music vendors (Gjerdingen and Perrott 2008; Ariyaratne and Zhang 2012), the collective agreement of several listeners (Lippens et al. 2004; García et al. 2007), professional musicologists (Abeßer et al. 2012), or multiple tags given by an online community (Law 2011).

Table 2 Datasets used in MGR, the type of data they contain, and the percentage of experimental work (435 references) in our survey (Sturm 2012a) that use them

Dataset                                     %     Description
Private                                     58    Constructed for research but not made available
GTZAN                                       23    Audio; http://marsyas.info/download/data_sets
ISMIR2004                                   17    Audio; http://ismir2004.ismir.net/genre_contest
Latin (Silla et al. 2008)                   5     Features; http://www.ppgia.pucpr.br/silla/lmd/
Ballroom                                    3     Audio; http://mtg.upf.edu/ismir2004/contest/tempoContest/
Homburg (Homburg et al. 2005)               3     Audio; http://www-ai.cs.uni-dortmund.de/audio.html
Bodhidharma                                 3     Symbolic; http://jmir.sourceforge.net/Codaich.html
USPOP2002 (Berenzweig et al. 2004)          2     Audio; http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html
1517-artists                                1     Audio; http://www.seyerlehner.info
RWC (Goto et al. 2003)                      1     Audio; http://staff.aist.go.jp/m.goto/RWC-MDB/
SOMeJB                                      1     Features; http://www.ifs.tuwien.ac.at/andi/somejb/
SLAC                                        1     Audio & symbols; http://jmir.sourceforge.net/Codaich.html
SALAMI (Smith et al. 2011)                  0.7   Features; http://ddmal.music.mcgill.ca/research/salami
Unique                                      0.7   Features; http://www.seyerlehner.info
Million song (Bertin-Mahieux et al. 2011)   0.7   Features; http://labrosa.ee.columbia.edu/millionsong/
ISMIS2011                                   0.4   Features; http://tunedit.org/challenge/music-retrieval

All datasets listed after Private are public

Table 2 shows the datasets used by references in our survey (Sturm 2012a). Overall, 79% of this work uses audio data or features derived from audio data, about 19% uses symbolic music data, and 6% uses features derived from other sources, e.g., lyrics, the WWW, and album art. (Some works use more than one type of data.) About 27% of work evaluates MGR systems using two or more datasets. While more than 58% of the works use datasets that are not publicly available, the most-used public dataset is GTZAN (Tzanetakis and Cook 2002; Sturm 2013b).

Table 3 Figures of merit (FoMs) of MGR, their description, and the percentage of work (467 references) in our survey (Sturm 2012a) that use them

FoM              %    Description
Mean accuracy    82   Proportion of the number of correct trials to the total number of trials
Confusion table  32   Counts of labeling outcomes for each labeled input
Recall           25   For a specific input label, proportion of the number of correct trials to the total number of trials
Confusions       24   Discussion of confusions of the system in general or with specifics
Precision        10   For a specific output label, proportion of the number of correct trials to the total number of trials
F-measure        4    Twice the product of Recall and Precision divided by their sum
Composition      4    Observations of the composition of clusters created by the system, distances within and between
Precision@k      3    Proportion of the number of correct items of a specific label in the k items retrieved
ROC              1    Precision vs. Recall (true positives vs. false positives) for several systems, parameters, etc.

Table 3 shows the FoMs used in the works we survey (Sturm 2012a). Given Classify is the most-used design, it is not surprising to find mean accuracy appears the most often.
When it appears, only about 25% of the time is it accompanied by standard deviation (or equivalent). We find 6% of the references report mean accuracy as well as recall and precision. Confusion tables are the next most prevalent FoM; and when one appears, it is not accompanied by any kind of musicological reflection about half the time. Of the works that use Classify, we find about 44% of them report one FoM only, and about 53% report more than one FoM. At least six works report human-weighted ratings of classification and/or clustering results.

One might argue that the evaluation above does not clearly reflect that most papers on automatic music tagging report recall, precision, and F-measures, and not mean accuracy. However, in our survey we do not consider work in automatic tagging unless part of the evaluation specifically considers the resulting genre tags. Hence, we see that most work in MGR uses classification accuracy (the experimental design Classify with mean accuracy as a FoM) in private datasets, or GTZAN (Tzanetakis and Cook 2002; Sturm 2013b).
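
The definitions of mean accuracy, recall, precision and F-measure in Table 3 can all be read off a confusion table. The following is a minimal sketch in Python with NumPy, not code from the survey or from any of the systems discussed; the function name `figures_of_merit`, the row/column convention (rows are input labels, columns are output labels) and the choice to report 0 for an empty row or column are assumptions made for illustration.

```python
import numpy as np

def figures_of_merit(confusion):
    """Compute FoMs from a confusion table (illustrative sketch).

    Assumed convention: confusion[i, j] counts the trials whose input
    ("ground truth") label is i and whose output label is j.
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    n_input = confusion.sum(axis=1)    # trials per input label
    n_output = confusion.sum(axis=0)   # trials per output label

    # Mean accuracy: correct trials over all trials.
    accuracy = correct.sum() / confusion.sum()

    with np.errstate(divide="ignore", invalid="ignore"):
        # Recall: per input label; precision: per output label.
        # Reporting 0 for an unused label is an assumption, not from the paper.
        recall = np.where(n_input > 0, correct / n_input, 0.0)
        precision = np.where(n_output > 0, correct / n_output, 0.0)
        total = recall + precision
        # F-measure: twice the product divided by the sum.
        f_measure = np.where(total > 0, 2 * recall * precision / total, 0.0)
    return accuracy, recall, precision, f_measure
```

For a two-label table such as `[[8, 2], [1, 9]]`, the first label has recall 8/10 and precision 8/9, which shows how the two FoMs can differ even when mean accuracy is a single number.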

3 Three state-of-the-art systems for music genre recognition

We now discuss three MGR systems that appear to perform well with respect to state of the art classification accuracy in GTZAN (Tzanetakis and Cook 2002; Sturm 2013b), and which we evaluate in later sections.

3.1 AdaBoost with decision trees and bags of frames of features (AdaBFFs)

AdaBFFs was proposed by Bergstra et al. (2006a), and performed the best in the 2005 MIREX MGR task (MIREX 2005). It combines weak classifiers trained by multiclass AdaBoost (Freund and Schapire 1997; Schapire and Singer 1999), which creates a strong classifier by counting "votes" of weak classifiers given observation $x$. With the features in $\mathbb{R}^M$ of a training set labeled in $K$ classes, iteration $l$ adds a weak classifier $v_l(x): \mathbb{R}^M \to \{-1, 1\}^K$ and weight $w_l \in [0, 1]$ to minimize the total prediction error. A positive element means it favors a class, whereas negative means the opposite. After $L$ training iterations, the classifier is the function $f(x): \mathbb{R}^M \to [-1, 1]^K$ defined

$$f(x) := \frac{\sum_{l=1}^{L} w_l v_l(x)}{\sum_{l=1}^{L} w_l}. \tag{1}$$

For an excerpt of recorded music consisting of a set of features $X := \{x_i\}$, AdaBFFs picks the class $k \in \{1, \ldots, K\}$ associated with the maximum element in the sum of weighted votes:

$$f_k(X) := \sum_{i=1}^{|X|} [f(x_i)]_k \tag{2}$$

where $[a]_k$ is the $k$th element of the vector $a$.
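
The voting rule in (1) and (2) can be sketched in a few lines. This is an illustration in Python with NumPy under stated assumptions, not the multiboost implementation: the weak-classifier votes are taken as already computed, the function names `adaboost_frame_votes` and `classify_excerpt` are hypothetical, and classes are indexed from 0 rather than 1.

```python
import numpy as np

def adaboost_frame_votes(weak_votes, weights):
    """Eq. (1): weighted, normalized votes f(x) for one feature vector.

    weak_votes: array of shape (L, K) with entries in {-1, +1}, row l being v_l(x).
    weights:    array of shape (L,) with entries w_l in [0, 1].
    Returns a length-K vector with entries in [-1, 1].
    """
    weak_votes = np.asarray(weak_votes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ weak_votes / weights.sum()

def classify_excerpt(frame_votes):
    """Eq. (2): sum f(x_i) over the features of an excerpt and pick the maximum.

    frame_votes: array of shape (|X|, K), row i being f(x_i).
    Returns the 0-based class index and the summed votes f_k(X).
    """
    totals = np.sum(np.asarray(frame_votes, dtype=float), axis=0)
    return int(np.argmax(totals)), totals
```

Normalizing by the sum of the weights in (1) keeps each element of $f(x)$ in $[-1, 1]$, so every feature vector of an excerpt contributes on the same scale to the sum in (2).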
We use the "multiboost package" (Benbouzid et al. 2012) with decision trees as the weak learners, and AdaBoost.MH (Schapire and Singer 1999) as the strong learner. The features we use are computed from a sliding Hann window of 46.4 ms and 50% overlap: 40 Mel-frequency cepstral coefficients (MFCCs) (Slaney 1998), zero crossings, mean and variance of the magnitude Fourier transform (centroid and spread), 16 quantiles of the magnitude Fourier transform (rolloff), and the error of a 32-order linear predictor. We disjointly partition the set of features into groups of 130 consecutive frames, and then compute for each group the means and variances of each dimension. For a 30-s music excerpt, this produces 9 feature vectors of 120 dimensions.
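
The grouping step above can be sketched as follows, assuming the 60 frame-level features (40 + 1 + 2 + 16 + 1) have already been computed. This is an illustrative Python/NumPy sketch, not the code used in the experiments: the function name `aggregate_frames` is hypothetical, dropping the frames left over after the last full group is an assumption (the text does not say how they are handled), and the population variance is used where the text only says "variances".

```python
import numpy as np

def aggregate_frames(frames, group_size=130):
    """Summarize disjoint groups of consecutive frames by mean and variance.

    frames: array of shape (n_frames, n_dims) of frame-level features.
    Returns an array of shape (n_groups, 2 * n_dims): means, then variances.
    Assumption: frames that do not fill a complete group are discarded.
    """
    frames = np.asarray(frames, dtype=float)
    n_groups = frames.shape[0] // group_size
    trimmed = frames[: n_groups * group_size]
    groups = trimmed.reshape(n_groups, group_size, frames.shape[1])
    # Population variance (ddof=0) is an assumption made for this sketch.
    return np.concatenate([groups.mean(axis=1), groups.var(axis=1)], axis=1)
```

As a plausibility check under an assumed 22,050 Hz sampling rate, a 46.4 ms window is about 1024 samples, a 50% overlap gives a hop of 512 samples, and a 30-s excerpt then yields roughly 1290 frames, i.e., 9 full groups of 130 frames and 9 vectors of 2 × 60 = 120 dimensions, matching the figures in the text.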
Bergstra et al. (2006a) report this approach obtains a classification accuracy of up to 83% in GTZAN. In our reproduction of the approach (Sturm 2012b), we achieve using stumps (single node decision trees) as weak classifiers a classification accuracy of up to 77.6% in GTZAN. We increase this to about 80% by using two-node decision trees.

3.2 Sparse representation classification with auditory temporal modulations (SRCAM)

SRCAM (Panagakis et al. 2009b; Sturm and Noorzad 2012) uses sparse representation classification (Wright et al. 2009) in a dictionary composed of auditory features. This approach is reported to have classification accuracies above 90% (Panagakis et al. 2009a, b; Panagakis and Kotropoulos 2010), but those results arise from a flaw in the experiment inflating accuracies from around 60% (Sturm and Noorzad 2012) (private correspondence with Y. Panagakis). We modify the approach to produce classification accuracies above 80% (Sturm 2012b). Each feature comes from a modulation analysis of a time-frequency representation; and for a 30-s sound excerpt with sampling rate 22,050 Hz, the feature dimensionality is 768. To create a dictionary, we either normalize the set of features (mapping all values in each dimension to [0, 1] by subtracting the minimum value and dividing by the largest difference), or standardize them (making all dimensions have zero mean and unit variance).

With the dictionary $D := [d_1 | d_2 | \cdots | d_N]$, and a mapping of columns to class identities $\bigcup_{k=1}^{K} I_k = \{1, \ldots, N\}$, where $I_k$ specifies the columns of $D$ belonging to class $k$, SRCAM finds for a feature vector $\tilde{x}$ (which is the feature $x$ we transform by the same normalization or standardization approach used to create the dictionary) a sparse representation $s$ by solving

$$\min \|s\|_1 \quad \text{subject to} \quad \|\tilde{x} - Ds\|_2^2 \le \varepsilon^2 \tag{3}$$

for a $\varepsilon^2 > 0$ we specify. SRCAM then defines the set of class-restricted weights $\{s_k \in \mathbb{R}^N\}_{k \in \{1, \ldots, K\}}$

$$[s_k]_n := \begin{cases} [s]_n, & n \in I_k \\ 0, & \text{else.} \end{cases} \tag{4}$$

Thus, $s_k$ are the weights in $s$ specific to class $k$. Finally, SRCAM classifies $x$ by finding the class-dependent weights giving the smallest error

$$\hat{k}(x) := \arg\min_{k \in \{1, \ldots, K\}} \|\tilde{x} - D s_k\|_2^2. \tag{5}$$

We define the confidence of SRCAM in assigning class $k$ to $x$ by comparing the errors:

$$C(k|x) := \frac{\max_{k'} J_{k'} - J_k}{\sum_{l} \left[\max_{k'} J_{k'} - J_l\right]} \tag{6}$$

where $J_k := \|\tilde{x} - D s_k\|_2^2$. Thus, $C(k|x) \in [0, 1]$ where 1 is certainty.
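
The decision and confidence rules in (4) to (6) can be sketched once a sparse representation $s$ is available. The Python/NumPy sketch below takes $s$ as given (for example from a solver of (3), which is not reproduced here); the function name `srcam_decide`, the `atom_class` array encoding the index sets $I_k$ with 0-based class labels, and the uniform fallback when all errors are equal are assumptions for illustration only.

```python
import numpy as np

def srcam_decide(x_t, D, s, atom_class, n_classes):
    """Class decision and confidences from a sparse code (illustrative sketch).

    x_t:        transformed feature vector, shape (M,).
    D:          dictionary with atoms as columns, shape (M, N).
    s:          sparse representation of x_t in D, shape (N,), assumed given.
    atom_class: shape (N,), atom_class[n] = class of column n (0-based),
                a hypothetical encoding of the index sets I_k.
    """
    x_t = np.asarray(x_t, dtype=float)
    D = np.asarray(D, dtype=float)
    s = np.asarray(s, dtype=float)
    atom_class = np.asarray(atom_class)

    J = np.empty(n_classes)
    for k in range(n_classes):
        s_k = np.where(atom_class == k, s, 0.0)   # Eq. (4): class-restricted weights
        J[k] = np.sum((x_t - D @ s_k) ** 2)       # J_k: squared residual

    k_hat = int(np.argmin(J))                     # Eq. (5): smallest error
    gaps = J.max() - J
    total = gaps.sum()
    # Eq. (6); if all errors are equal the ratio is undefined, and returning a
    # uniform confidence is an assumption of this sketch, not part of the text.
    confidence = gaps / total if total > 0 else np.full(n_classes, 1.0 / n_classes)
    return k_hat, J, confidence
```

Because the confidences in (6) are non-negative and sum to one over the classes, the class with the largest error always receives confidence 0.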

3.3 Maximum a posteriori classification of scattering coefficients (MAPsCAT)

MAPsCAT uses the novel features proposed in Mallat (2012), the use of which for MGR was first proposed by Andén and Mallat (2011). MAPsCAT applies these features within a Bayesian framework, which seeks to choose the class with minimum expected risk given observation $x$. Assuming the costs of all misclassifications are the same, and that all classes are equally likely, the Bayesian classifier becomes the maximum a posteriori (MAP) classifier (Theodoridis and Koutroumbas 2009):

$$k^{*} = \arg\max_{k \in \{1, \ldots, K\}} P[x|k] P(k) \tag{7}$$

where $P[x|k]$ is the conditional model of the observations for class $k$, and $P(k)$ is a prior. MAPsCAT assumes $P[x|k] \sim \mathcal{N}(\mu_k, C_k)$, i.e., the observations from class $k$ are distributed multivariate Gaussian with mean $\mu_k$ and covariance $C_k$. MAPsCAT estimates these parameters using unbiased minimum mean-squared error estimation and the training set. When a music excerpt produces several features $X := \{x_i\}$, MAPsCAT assumes independence between them, and picks the class maximizing the sum of the log posteriors:

$$p_k(X) := \log P(k) + \sum_{i=1}^{|X|} \log P[x_i|k]. \tag{8}$$

Scattering coefficients are attractive features for classification because they are designed to be invariant to particular transformations, such as translation and rotation, to preserve distances between stationary processes, and to embody both large- and short-scale structures (Mallat 2012). One computes these features by convolving the modulus of successive wavelet decompositions with the scaling wavelet. We use the "scatterbox" implementation (Andén and Mallat 2012) with a second-order decomposition, filter q-factor of 16, and a maximum scale of 160. For a 30-s sound excerpt with sampling rate 22,050 Hz, this produces 40 feature vectors of dimension 469. Andén and Mallat (2011) report these features used with a support vector machine obtain a classification accuracy of 82% in GTZAN. We obtain comparable results.
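
The excerpt-level rule in (8) can be sketched for Gaussian class models as follows. This Python/NumPy sketch assumes the class means, covariances and priors have already been estimated from training data; the function names `gaussian_log_density` and `map_classify` are hypothetical, classes are indexed from 0, and the scattering features themselves are not computed here.

```python
import numpy as np

def gaussian_log_density(X, mean, cov):
    """Log of the multivariate Gaussian density N(mean, cov) at each row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis terms via a linear solve rather than an explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * solved, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def map_classify(X, means, covs, priors):
    """Eq. (8): log prior plus summed log likelihoods, maximized over classes.

    X: features of one excerpt, shape (|X|, d), treated as independent.
    Returns the 0-based class index and the scores p_k(X).
    """
    scores = np.array([
        np.log(p) + gaussian_log_density(X, m, C).sum()
        for m, C, p in zip(means, covs, priors)
    ])
    return int(np.argmax(scores)), scores
```

Passing the same covariance matrix for every class corresponds to the "total covariance" configuration evaluated later, while passing a different matrix per class corresponds to the class-dependent configuration.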

4 Evaluating the performance statistics of MGR systems

We now evaluate the performance of AdaBFFs, SRCAM and MAPsCAT using Classify and mean accuracy in GTZAN (Tzanetakis and Cook 2002). Despite the fact that GTZAN is a problematic dataset, having many repetitions, mislabelings, and distortions (Sturm 2013b), we use it for four reasons: 1) it is the public benchmark dataset most used in MGR research (Table 2); 2) it was used in the initial evaluation of AdaBFFs (Bergstra et al. 2006a), SRCAM (Panagakis et al. 2009b), and the features of MAPsCAT (Andén and Mallat 2011); 3) evaluations of MGR systems using GTZAN and other datasets show comparable performance, e.g., Moerchen et al. (2006), Ren and Jang (2012), Dixon et al. (2010), Schindler and Rauber (2012); and 4) since its contents and faults are now well-studied (Sturm 2013b), we can appropriately handle its problems, and in fact use them to our advantage.

We test each system by 10 trials of stratified 10-fold cross-validation (10*10 fCV). For each fold, we test all systems using the same training and testing data. Every music excerpt is thus classified ten times by each system trained with the same data. For AdaBFFs, we run AdaBoost for 4000 iterations, and test decision trees of both two nodes and one node (stumps). For SRCAM, we test both standardized and normalized features, and solve its inequality-constrained optimization problem (3) for $\varepsilon^2 = 0.01$ using SPGL1 (van den Berg and Friedlander 2008) with at most 200 iterations. For MAPsCAT, we test systems trained with class-dependent covariances (each $C_k$ can be different) or total covariance (all $C_k$ the same). We define all priors to be equal.

It might be that the size of this dataset is too small for some approaches. For instance, since for SRCAM one excerpt produces a 768-dimensional feature, we might not expect it to learn a good model from only 90 excerpts. However, we start as many have before: assume GTZAN is large enough and has enough integrity for evaluating an MGR system.
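
The 10*10 fCV protocol described above can be sketched as a generator of index splits that every system then shares. This is a minimal Python/NumPy sketch under stated assumptions, not the partitioning code used for the experiments: the function names `stratified_folds` and `repeated_cv_splits`, the round-robin fold assignment within each class, and the seeding scheme are all hypothetical.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each item to a fold so that every class is spread evenly over folds."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    fold_of = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        # Shuffle the items of class c, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == c))
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

def repeated_cv_splits(labels, n_trials=10, n_folds=10, seed=0):
    """Yield (trial, fold, train_idx, test_idx) for repeated stratified CV."""
    for trial in range(n_trials):
        fold_of = stratified_folds(labels, n_folds, seed + trial)
        for fold in range(n_folds):
            test_idx = np.flatnonzero(fold_of == fold)
            train_idx = np.flatnonzero(fold_of != fold)
            yield trial, fold, train_idx, test_idx
```

With ten categories of 100 excerpts each, every split trains on 90 excerpts per category and tests on 10, and each excerpt lands in a test set exactly once per trial, hence ten times overall. Generating the splits once and reusing them for all systems is what makes the per-excerpt outcomes of different systems directly comparable.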

4.1 Evaluating classification accuracy

Table 4 shows classification accuracy statistics for two configurations of each system presented above. In their review of several MGR systems, Fu et al. (2011) compare the performance of several algorithms using only classification accuracy in GTZAN. The works proposing AdaBFFs (Bergstra et al. 2006a), SRCAM (Panagakis et al. 2009b), and the features of MAPsCAT (Andén and Mallat 2011) present only classification accuracy. Furthermore, based on classification accuracy, Seyerlehner et al. (2010) argue that the performance gap between MGR systems and humans is narrowing; and in this issue, Humphrey et al. conclude "progress in content-based music informatics is plateauing" (Humphrey et al. 2013). Figure 2 shows that with respect to the classification accuracies in GTZAN reported in 83 published works (Sturm 2013b), those of AdaBFFs, SRCAM, and MAPsCAT lie above what is reported best in half of this work. It is thus tempting to conclude from these that, with respect to the mean accuracy and its standard deviation, some configurations of these systems are better than others, that AdaBFFs is not as good as SRCAM and MAPsCAT, and that AdaBFFs, SRCAM, and MAPsCAT are recognizing genre better than at least half of the "competition".

These conclusions are unwarranted for at least three reasons. First, we cannot compare mean classification accuracies computed from 10*10 fCV because the samples are highly dependent (Dietterich 1996; Salzberg 1997). Hence, we cannot test a hypothesis of one system being better than another by using, e.g., a t-test, as we have erroneously done before (Sturm 2012b).
Second, Classify is answering the question, "How well does the system predict a label assigned to a piece of data?"

Table 4 Mean accuracies in GTZAN for each system, and the maximum {p_i} (9) over all 10 CV runs

System    System configuration          Mean acc., std. dev.   Max {p_i}
AdaBFFs   Decision stumps               0.776 ± 0.004          > 0.024
          Two-node trees                0.800 ± 0.006          > 0.024
SRCAM     Normalized features           0.835 ± 0.005          > 0.024
          Standardized features         0.802 ± 0.006          > 0.024
MAPsCAT   Class-dependent covariances   0.754 ± 0.004          [...]

[...] as the number of times the system with the high mean accuracy is correct and the other wrong; and [...] as that from which $c_{h>l}$ is a sample; and similarly for [...] $\geq c_{h>l} \mid q = 0.5]$ [...] $|x - 0.5|/\sigma(x)]$ where $T$ is distributed Student's t with $N - 2$ degrees of freedom (two degrees lost in the estimation of the Bernoulli parameter and its variance). For only four Disco CM excerpts (11, 13, 15, and 18) do we find that we cannot reject the null hypothesis ($p > 0.1$). Furthermore, in the case of excerpts 10 and 34, we can reject the null hypothesis in favor of the misclassification of MAPsCAT and AdaBFFs, respectively (p [...] 0.1).

Furthermore, only in the case of Reggae 88 can we reject the null hypothesis in favor of SRCAM (p [...] 0.48).
However, for Hiphop 00, the difference between the mean listening durations of subjects who selected "Disco" (4.9 ± 1.1 s) and those who selected "Hiphop" (9.5 ± 1.6 s) is significant ($p < 6 \cdot 10^{-5}$). Apparently, many subjects hastily chose the label "Disco." In these two cases, then, we can argue that SRCAM and MAPsCAT are classifying acceptably.

5.4 Summary

In Section 4, we were concerned with quantitatively measuring the extent to which an MGR system predicts the genre labels of GTZAN. This presents a rather rosy picture of performance: all of our systems have high classification accuracies, precision and F-measures in many categories, and confusion behaviors that appear to make musical sense. Though their classification accuracies in GTZAN drop significantly when using an artist filter (Sturm 2013b), they still remain high above that of chance. Due to Classify, however, we cannot reasonably argue that this means they are recognizing the genres in GTZAN, or more broadly that they will perform well in the real world recognizing the same genres (Urbano et al. 2013). In this section, we have thus been concerned with evaluating the extent to which an MGR system displays the kinds of behavior we expect of a system that has capacity to recognize genre.

By inspecting the pathological errors of the systems, and taking into consideration the mislabelings in GTZAN (Sturm 2013b), we find evidence for and against the claim that any of them can recognize genre, or that any of them are better than the others. We see MAPsCAT has over one hundred more C3s than SRCAM and AdaBFFs, but AdaBFFs "correctly" classifies more mislabeled Disco excerpts than the other two. All three systems, however, make errors that are difficult to explain if genre is what each is recognizing. We see that the confidence of these systems in their pathological errors is for the most part indistinguishable from their confidence in their C3s. While the rank of the "correct" class is often penultimate to the "wrong" one they select, there are rankings that are difficult to explain if genre is what each is recognizing. Finally, our listening test reveals that for the most part the pathological errors of these systems are readily distinguishable from those humans would commit. Their performance in that respect is quite poor.

6 On evaluation

While genre is an inescapable result of human communication (Frow 2005), it can also sometimes be ambiguous and subjective, e.g., Lippens et al. (2004), Ahrendt (2006), Craft et al. (2007), Craft (2007), Meng and Shawe-Taylor (2008), Gjerdingen and Perrott (2008) and Seyerlehner et al. (2010). A major conundrum in the evaluation of MGR systems is thus the formal justification of why particular labels are better than others. For instance, while we deride it above, an argument might be made that ABBA's "Mamma Mia" employs some of the same stylistic elements of metal used by Motörhead in "Ace Of Spades", though it is difficult to imagine the audiences of the two would agree.

The matter of evaluating MGR systems would be quite simple if only we had a checklist of essential, or at least important, attributes for each genre. Barbedo and Lopes (2007) provide a long list of such attributes in each of several genres and sub-genres, e.g., Light Orchestra Instrument Classical is marked by "light and slow songs ... played by an orchestra" and has no vocal element (like J. S. Bach's "Air on the G String"); and Soft Country Organic Pop/Rock is marked by "slow and soft songs ... typical of southern United States [with] elements both from rock and blues [and where] electric guitars and vocals are [strongly] predominant [but there is little if any] electronic elements" (like "Your Cheating Heart" by Hank Williams Sr.). Some of these attributes are clear and actionable, like "slow," but others are not, like, "[with] elements both from rock and blues." Such an approach to evaluation might thus be a poor match with the nature of genre (Frow 2005).

We have shown how evaluating the performance statistics of MGR systems using Classify in GTZAN is inadequate to meaningfully measure the extent to which a system is recognizing genre, or even whether it addresses the fundamental problem of MGR. Indeed, replacing GTZAN with another dataset, e.g., ISMIR2004 (ISMIR 2004), or expanding it, does not help as long as we do not control for all independent variables in a dataset. On the other hand, there is no doubt that we see systems performing with classification accuracies significantly above random in GTZAN and other datasets. Hence, something is working in the prediction of the labels in these datasets, but is that "something" genre recognition?

One might argue, "The answer to this question is irrelevant. The 'engineering approach' (assemble a set of labeled data, extract features, and let the pattern recognition machinery learn the relevant characteristics and discriminating rules) results in performance significantly better than random. Furthermore, with a set of benchmark datasets and standard performance measures, we are able to make meaningful comparisons between systems." This might be agreeable insofar as one restricts the application domain of MGR to predicting the single labels of the music recording excerpts in the handful of datasets in which they are trained and tested. When it comes to ascertaining their success in the real world, to decide which of several MGR systems is best and which is worst, which has promise and which does not, Classify and classification accuracy provide no reliable or even relevant gauge.
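
The phrase "significantly better than random" in the argument above can be made concrete with an exact binomial tail probability. The following is a minimal sketch under stated assumptions: independent test excerpts, equally likely labels, and invented counts that are not taken from Table 4. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 200 test excerpts, 10 equally likely labels,
# 150 predicted labels matching the dataset labels (75% accuracy).
n_test, n_correct, chance = 200, 150, 1 / 10
p_value = binomial_tail(n_test, n_correct, chance)
```

A vanishingly small `p_value` only says that the labels are predicted better than by guessing. It is silent about which cues produce the agreement, which is the question at issue here.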

One might argue, "accuracy, recall, precision, F-measures are standard performance measures, and this is the way it has always been done for recognition systems in machine learning." We do not advocate eliminating such measures, not using Classify, or even avoiding or somehow "sanitizing" GTZAN. We build all of Section 5 upon the outcome of Classify in GTZAN, but with a major methodological difference from Section 4: we consider the contents of the categories. We use the faults of GTZAN, the decision statistics, and a listening test, to illuminate the pathological behaviors of each system. As we look more closely at their behaviors, the rosy picture of the systems evaporates, as well as our confidence that any of them is addressing the problem, that any one of them is better than the others, or even that one of them will be successful in a real-world context.
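
For reference, the standard figures of merit named above can all be derived from a confusion table, as in the following minimal sketch. The nested-dictionary layout, the function name `figures_of_merit`, and the two-class counts are assumptions for illustration and are not data from this article.

```python
def figures_of_merit(confusion):
    """Accuracy and per-class (precision, recall, F-measure).

    confusion[true_label][predicted_label] -> count (nested dicts,
    with every label present as both a row and a column).
    """
    labels = sorted(confusion)
    total = sum(confusion[t][p] for t in labels for p in labels)
    correct = sum(confusion[c][c] for c in labels)
    per_class = {}
    for c in labels:
        tp = confusion[c][c]
        predicted_c = sum(confusion[t][c] for t in labels)  # column sum
        actually_c = sum(confusion[c][p] for p in labels)   # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actually_c if actually_c else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        per_class[c] = (precision, recall, f)
    return correct / total, per_class


# Invented two-class table, not data from the paper.
table = {
    "Disco": {"Disco": 8, "Metal": 2},
    "Metal": {"Disco": 1, "Metal": 9},
}
accuracy, per_class = figures_of_merit(table)
```

Every quantity returned is a function of label counts alone. Nothing about the content of the excerpts in each cell enters the computation, which is why Section 5 looks inside the categories.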

One might argue that confusion tables provide a realistic picture of system performance. However, in claiming that the confusion behavior of a system "makes musical sense," one implicitly makes two critical assumptions: 1) that the dataset being used has integrity for MGR; and 2) that the system is using cues similar to those used by humans when categorizing music, e.g., what instruments are playing, and how are they being played? what is the rhythm, and how fast is the tempo? is it for dancing, moshing, protesting or listening? is someone singing, and if so what is the subject? The faults of GTZAN, and the wide composition of its categories, obviously do not bode well for the first assumption (Sturm 2013b). The second assumption is difficult to justify, and requires one to dig deeper than the confusion behaviors, to determine how the system is encoding and using such relevant features.

Analyzing the pathological behaviors of an MGR system provides insight into whether its internal models of genres make sense with respect to the ambiguous nature of genre. Comparing the classification results with the tags given by a community of listeners shows that some behaviors do "make musical sense," but others appear less acceptable. In the case of using tags, the implicit assumption is that the tags given by an unspecified population to make their music more useful to them are to be trusted in describing the elements of music that characterize the genre(s) it uses, whether users found these upon genre ("funk" and "soul"), style ("melodic" and "classic"), form ("ballad"), function ("dance"), history ("70s" and "old school"), geography ("jamaican" and "britpop"), or others ("romantic"). This assumption is thus quite unsatisfying, and one wonders whether tags present a good way to formally evaluate MGR systems.

Analyzing the same pathological behaviors of an MGR system, but by a listening test designed specifically to test the acceptability of its choices, circumvents the need to compare tags, and gets to the heart of whether a system is producing genre labels indistinguishable from those humans would produce. Hence, we finally see by this that though our systems have classification accuracies and other statistics that are significantly higher than chance, and though each system has confusion tables that appear reasonable, a closer analysis of their confusions at the level of the music and a listening test measuring the acceptability of their classifications reveal that they are likely not recognizing genre at all.

If performance statistics better than random do not reflect the extent to which a system is solving a problem, then what can? The answer to this has import not just for MGR, but music information research in general. To this end, consider a man claiming his horse "Clever Hans" can add and subtract integers. We watch the owner ask Hans, "What is 2 and 3?" Then Hans taps his hoof until his ears raise after his fifth tap, at which point he is rewarded by the owner. To measure the extent to which Hans understands the addition and subtraction of integers, having the owner ask more questions in an uncontrolled environment does not add evidence. We can instead perform a variety of experiments that do. For instance, with the owner present and handling Hans, two people can whisper separate questions to Hans and the owner, with the ones whispering not knowing whether the same question is given or not. In place of real questions, we might ask Hans nonsensical questions, such as, "What is Bert and Ernie?" Then we can compare his answers with each of the questions. If this demonstrates that something other than an understanding of basic mathematics might be at play, then we must search for the mechanism by which Hans is able to correctly answer the owner's questions in an uncontrolled environment. We can, for instance, blindfold Hans to determine whether it is vision; or isolate him in a soundproof room with the owner outside to determine whether it is sound. Such a historical case is well documented by Pfungst (1911).

Classify using datasets having many independent variables changing between classes is akin to asking Hans to answer more questions in an uncontrolled environment. What is needed is a richer and more powerful toolbox for evaluation (Urbano et al. 2013). One must search for the mechanism of correct response, which can be evaluated by, e.g., Rules and Robust. Dixon et al. (2010) use Rules to inspect the sanity of what their system discovers useful for discriminating different genres. We show using Robust (Sturm 2012b) that two high-accuracy MGR systems can classify the same excerpt of music in radically different ways when we make minor adjustments by filtering that do not affect its musical content. Akin to nonsense questions, Matityaho and Furst (1995) notice that their system classifies a zero-amplitude signal as "Classical," and white noise as "Pop." Porter and Neuringer (1984), investigating the training and generalization capabilities of pigeons in discriminating between two genres, test whether responses are due to the music itself, or to confounds such as characteristics of the playback mechanisms, and the lengths and loudness of excerpts. Chase (2001) does the same for koi, and looks at the effect of timbre as well.
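
The kind of controlled test described above can be sketched in a few lines. Everything in the following is an illustrative assumption: `toy_system` is a deliberately confounded stand-in that keys only on loudness, not any of the systems evaluated in this article; a simple gain change stands in for the filtering used with Robust; and the "excerpts" are synthetic sinusoids.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def toy_system(signal):
    """Hypothetical stand-in for an MGR system; it keys on loudness only."""
    return "Metal" if rms(signal) > 0.3 else "Classical"


def apply_gain(signal, gain):
    """A transformation assumed not to change the musical content."""
    return [gain * x for x in signal]


def flip_rate(system, signals, transform):
    """Fraction of signals whose label changes under the transformation."""
    flips = sum(system(s) != system(transform(s)) for s in signals)
    return flips / len(signals)


def tone(amplitude, n=800):
    """Synthetic 'excerpt': five cycles of a sinusoid (purely illustrative)."""
    return [amplitude * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]


excerpts = [tone(0.2), tone(0.25), tone(0.8), tone(0.9)]
quieter = flip_rate(toy_system, excerpts, lambda s: apply_gain(s, 0.4))
silence_label = toy_system([0.0] * 800)  # a "nonsense question"
```

In this toy case, turning the volume down changes the label of half of the excerpts, and silence receives a confident genre label. Neither outcome is visible in a classification accuracy computed on the untransformed data.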

Since it is as remarkable a claim that an artificial system "recognizes genre with 85% accuracy" as that a horse is able to perform mathematics, this advocates approaching an MGR system (or autotagger, or any music information system) as if it were "Clever Hans." This of course necessitates creativity in experimental design, and requires much more effort than comparing selected tags to a "ground truth."

One might argue, "One of the reasons MGR is so popular is because evaluation is straightforward and easy. Your approach is less straightforward, and certainly unscalable, e.g., using the million song dataset (Bertin-Mahieux et al. 2011; Hu and Ogihara 2012; Schindler et al. 2012)." To this we can only ask: why attempt to solve very big problems with a demonstrably weak approach to evaluation, when the smaller problems have yet to be indisputably solved?

7 Conclusion

In this work, we have evaluated the performance statistics and behaviors of three MGR systems. Table 4 shows their classification accuracies are significantly higher than chance, and are among the best observed (and reproduced) for the GTZAN dataset. Figure 3 shows their recalls, precisions, and F-measures to be similarly high. Finally, Fig. 4 shows their confusions "make musical sense." Thus, one might take these as evidence that the systems are capable of recognizing some of the genres in GTZAN.

The veracity of this claim is considerably challenged when we evaluate the behaviors of the systems. We see that SRCAM has just as high confidences in its consistent misclassifications as in its consistently correct classifications. We see MAPsCAT, a system with a high F-score in Metal, always mistakes the excerpt of "Mamma Mia" by ABBA as "Metal" first, "Rock" second, and "Reggae" or "Country" third. We see that all subjects of our listening test have little trouble discriminating between a label given by a human and that given by these systems. In short, though these systems have superb classification accuracy, recalls, etc., in GTZAN, they do not reliably produce genre labels indistinguishable from those humans produce.

From the very nature of Classify in GTZAN, we are unable to reject the hypothesis that any of these systems is not able to recognize genre, no matter the accuracy we observe. In essence, "genre" is not the only independent variable changing between the excerpts of particular genres in our dataset; and Classify does not account for the others. There are also, just to name a few, instrumentation (disco and classical may or may not use strings), loudness (metal and classical can be played at high or low volumes), tempo (blues and country can be played fast or slow), dynamics (classical and jazz can have few or several large changes in dynamics), reverberation (reggae can involve spring reverberation, and classical can be performed in small or large halls), production (hip hop and rock can be produced in a studio or in a concert), channel bandwidth (country and classical can be heard on AM or FM radio), noise (blues and jazz can be heard from an old record or a new CD), etc. Hence, to determine if an MGR system has a capacity to recognize any genre, one must look deeper than classification accuracy and related statistics, and from many more perspectives than just Classify.

Acknowledgements Many thanks to: Carla T. Sturm for her bibliographic prowess; and Geraint Wiggins, Nick Collins, Matthew Davies, Fabien Gouyon, Arthur Flexer, and Mark Plumbley for numerous and insightful conversations. Thank you to the numerous anonymous peer reviewers who contributed greatly to this article and its organization.

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

References

Abeßer, J., Lukashevich, H., Bräuer, P. (2012). Classification of music genres based on repetitive basslines. Journal of New Music Research, 41(3), 239–257.
Ahrendt, P. (2006). Music genre classification systems - A computational approach. Ph.D. thesis, Technical University of Denmark.
Ammer, C. (2004). Dictionary of music (4th ed.). New York: The Facts on File, Inc.
Andén, J., & Mallat, S. (2011). Multiscale scattering for audio classification. In Proc. International Society for Music Information Retrieval (pp. 657–662).
Andén, J., & Mallat, S. (2012). Scatterbox v. 1.02. http://www.cmap.polytechnique.fr/scattering/. Accessed 15 Oct 2012.
Anglade, A., Benetos, E., Mauch, M., Dixon, S. (2010). Improving music genre classification using automatically induced harmony rules. Journal of New Music Research, 39(4), 349–361.
Ariyaratne, H. B., & Zhang, D. (2012). A novel automatic hierachical approach to music genre classification. In Proc. IEEE International Conference on Multimedia & Expo (pp. 564–569).
Aucouturier, J. J. (2009). Sounds like teen spirit: Computational insights into the grounding of everyday musical terms. In J. Minett, & W. Wang (Eds.), Language, evolution and the brain: Frontiers in linguistic series (pp. 35–64). Academia Sinica Press.
Aucouturier, J. J., & Bigand, E. (2013). Seven problems that keep MIR from attracting the interest of cognition and neuroscience. Journal of Intelligent Information Systems. doi:10.1007/s10844-013-0251-x.
Aucouturier, J. J., & Pachet, F. (2003). Representing music genre: A state of the art. Journal of New Music Research, 32(1), 83–93.
Aucouturier, J. J., & Pachet, F. (2004). Improving timbre similarity: How high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 1–13.
Aucouturier, J. J., & Pampalk, E. (2008). Introduction - from genres to tags: A little epistemology of music information retrieval research. Journal of New Music Research, 37(2), 87–92.
Bagci, U., & Erzin, E. (2007). Automatic classification of musical genres using inter-genre similarity. IEEE Signal Processing Letters, 14(8), 521–524.
Barbedo, J. G. A., & Lopes, A. (2007). Automatic genre classification of musical signals. EURASIP Journal on Advances in Signal Processing. doi:10.1155/2007/64960.
Barbedo, J. G. A., & Lopes, A. (2008). Automatic musical genre classification using a flexible approach. Journal of the Audio Engineering Society, 56(7/8), 560–568.
Benbouzid, D., Busa-Fekete, R., Casagrande, N., Collin, F. D., Kégl, B. (2012). Multiboost: A multi-purpose boosting package. Journal of Machine Learning Research, 13, 549–553.
Benetos, E., & Kotropoulos, C. (2008). A tensor-based approach for automatic music genre classification. In Proc. European Signal Processing Conference.
Benetos, E., & Kotropoulos, C. (2010). Non-negative tensor factorization applied to music genre classification. IEEE Transactions on Audio, Speech, and Language Processing, 18(8), 1955–1967.
Berenzweig, A., Logan, B., Ellis, D. P. W., Whitman, B. (2004). A large-scale evaluation of acoustic and subjective music-similarity measures. Computer Music Journal, 28(2), 63–76.
Bergstra, J., Casagrande, N., Erhan, D., Eck, D., Kégl, B. (2006a). Aggregate features and AdaBoost for music classification. Machine Learning, 65(2–3), 473–484.
Bergstra, J., Lacoste, A., Eck, D. (2006b). Predicting genre labels for artist using FreeDB. In Proc. International Society for Music Information Retrieval (pp. 85–88).
Bergstra, J., Mandel, M., Eck, D. (2010). Scalable genre and tag prediction with spectral covariance. In Proc. International Society for Music Information Retrieval (pp. 507–512).
Bertin-Mahieux, T., Eck, D., Mandel, M. (2010). Automatic tagging of audio: The state-of-the-art. In W. Wang (Ed.), Machine audition: Principles, algorithms and systems (pp. 334–352). IGI Publishing.
Bertin-Mahieux, T., Ellis, D. P., Whitman, B., Lamere, P. (2011). The million song dataset. In Proc. International Society for Music Information Retrieval (pp. 591–596). http://labrosa.ee.columbia.edu/millionsong/.
Burred, J. J., & Lerch, A. (2004). Hierarchical automatic audio signal classification. Journal of the Audio Engineering Society, 52(7), 724–739.
Chai, W., & Vercoe, B. (2001). Folk music classification using hidden Markov models. In Proc. International Conference on Artificial Intelligence.
Chang, K., Jang, J. S. R., Iliopoulos, C. S. (2010). Music genre classification via compressive sampling. In Proc. International Society for Music Information Retrieval (pp. 387–392).
Chase, A. (2001). Music discriminations by carp "Cyprinus carpio". Learning & Behavior, 29, 336–353.
Chen, S. H., & Chen, S. H. (2009). Content-based music genre classification using timbral feature vectors and support vector machine. In Proc. International Conference on Interaction Sciences: Information Technology, Culture and Human (pp. 1095–1101).
Collins, N. (2012). Influence in early electronic dance music: An audio content analysis investigation. In Proc. International Society for Music Information Retrieval (pp. 1–6).
Craft, A. (2007). The role of culture in the music genre classification task: Human behaviour and its effect on methodology and evaluation. Tech. Rep., Queen Mary University of London.
Craft, A., Wiggins, G. A., Crawford, T. (2007). How many beans make five? The consensus problem in music-genre classification and a new evaluation method for single-genre categorisation systems. In Proc. International Society for Music Information Retrieval (pp. 73–76).
Cruz, P., & Vidal, E. (2003). Modeling musical style using grammatical inference techniques: a tool for classifying and generating melodies. In Proc. Web Delivering of Music (pp. 77–84). doi:10.1109/WDM.2003.1233878.
Cruz, P., & Vidal, E. (2008). Two grammatical inference applications in music processing. Applied Artificial Intelligence, 22(1/2), 53–76.
DeCoro, C., Barutcuoglu, S., Fiebrink, R. (2007). Bayesian aggregation for hierarchical genre classification. In Proc. International Society for Music Information Retrieval (pp. 77–80).
Deshpande, H., Singh, R., Nam, U. (2001). Classification of music signals in the visual domain. In Proc. Digital Audio Effects. Limerick, Ireland.
Dietterich, T. (1996). Statistical tests for comparing supervised learning algorithms. Tech. Rep., Oregon State University, Corvallis, OR.
Dixon, S., Mauch, M., Anglade, A. (2010). Probabilistic and logic-based modelling of harmony. In Proc. Computer Music Modeling and Retrieval (pp. 1–19).
Fabbri, F. (1982). A theory of musical genres: Two applications. In P. Tagg & D. Horn (Eds.), Popular music perspectives (pp. 55–59). Gothenburg and Exeter.
Flexer, A. (2006). Statistical evaluation of music information retrieval experiments. Journal of New Music Research, 35(2), 113–120.
Flexer, A. (2007). A closer look on artist filters for musical genre classification. In Proc. International Society for Music Information Retrieval (pp. 341–344).
Flexer, A., & Schnitzer, D. (2009). Album and artist effects for audio similarity at the scale of the web. In Proc. Sound and Music Computing (pp. 59–64).
Flexer, A., & Schnitzer, D. (2010). Effects of album and artist filters in audio similarity computed for very large music databases. Computer Music Journal, 34(3), 20–28.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139.
Frow, J. (2005). Genre. New York: Routledge.
Fu, Z., Lu, G., Ting, K. M., Zhang, D. (2011). A survey of audio-based music classification and annotation. IEEE Transactions on Multimedia, 13(2), 303–319.
García, J., Hernández, E., Meng, A., Hansen, L. K., Larsen, J. (2007). Discovering music structure via similarity fusion. In Proc. NIPS workshop on music, brain & cognition: Learning the structure of music and its effects on the brain.
Gasser, M., Flexer, A., Schnitzer, D. (2010). Hubs and orphans - an explorative approach. In Proc. Sound and Music Computing.
Gjerdingen, R. O., & Perrott, D. (2008). Scanning the dial: The rapid recognition of music genres. Journal of New Music Research, 37(2), 93–100.
Goto, M., Hashiguchi, H., Nishimura, T., Oka, R. (2003). RWC music database: Music genre database and musical instrument sound database. In Proc. International Society for Music Information Retrieval (pp. 229–230).
Gouyon, F., & Dixon, S. (2004). Dance music classification: A tempo-based approach. In Proc. International Society for Music Information Retrieval (pp. 501–504).
Guaus, E. (2009). Audio content processing for automatic music genre classification: Descriptors, databases, and classifiers. Ph.D. thesis, Universitat Pompeu Fabra, Barcelona, Spain.
Holzapfel, A., & Stylianou, Y. (2008). Musical genre classification using nonnegative matrix factorization-based features. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 424–434.
Homburg, H., Mierswa, I., Möller, B., Morik, K., Wurst, M. (2005). A benchmark dataset for audio classification and clustering. In Proc. International Society for Music Information Retrieval (pp. 528–531).
Hu, Y., & Ogihara, M. (2012). Genre classification for million song dataset using confidence-based classifiers combination. In Proc. ACM Special Interest Group on Information Retrieval (pp. 1083–1084).
Humphrey, E. J., Bello, J. P., LeCun, Y. (2013). Feature learning and deep architectures: New directions for music informatics. Journal of Intelligent Information Systems. doi:10.1007/s10844-013-0248-5.
ISMIR (2004). Genre results. http://ismir2004.ismir.net/genre_contest/index.htm. Accessed 15 Oct 2012.
Krumhansl, C. L. (2010). Plink: "Thin slices" of music. Music Perception: An Interdisciplinary Journal, 27(5), 337–354.
Langlois, T., & Marques, G. (2009). Automatic music genre classification using a hierarchical clustering and a language model approach. In Proc. International Conference on Advances in Multimedia (pp. 188–193).
Law, E. (2011). Human computation for music classification. In T. Li, M. Ogihara, G. Tzanetakis (Eds.), Music data mining (pp. 281–301). Boca Raton, FL: CRC Press.
Lee, J. W., Park, S. B., Kim, S. K. (2006). Music genre classification using a time-delay neural network. In J. Wang, Z. Yi, J. Zurada, B. L. Lu, H. Yin (Eds.), Advances in neural networks (pp. 178–187). Berlin/Heidelberg: Springer. doi:10.1007/11760023_27.
Lerch, A. (2012). An introduction to audio content analysis: Applications in signal processing and music informatics. New York, Hoboken: Wiley/IEEE Press.
Li, T., & Ogihara, M. (2004). Music artist style identification by semi-supervised learning from both lyrics and contents. In Proc. ACM Multimedia (pp. 364–367).
Lin, C. R., Liu, N. H., Wu, Y. H., Chen, A. (2004). Music classification using significant repeating patterns. In Y. Lee, J. Li, K. Y. Whang, D. Lee (Eds.), Database systems for advanced applications (pp. 27–29). Berlin/Heidelberg: Springer.
Lippens, S., Martens, J., De Mulder, T. (2004). A comparison of human and automatic musical genre classification. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 233–236).
Lopes, M., Gouyon, F., Koerich, A., Oliveira, L. E. S. (2010). Selection of training instances for music genre classification. In Proc. International Conference on Pattern Recognition (pp. 4569–4572).
Lukashevich, H., Abeßer, J., Dittmar, C., Großmann, H. (2009). From multi-labeling to multi-domain-labeling: A novel two-dimensional approach to music genre classification. In Proc. International Society for Music Information Retrieval (pp. 459–464).
Mace, S. T., Wagoner, C. L., Teachout, D. J., Hodges, D. A. (2011). Genre identification of very brief musical excerpts. Psychology of Music, 40(1), 112–128.
Mallat, S. (2012). Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398.
Marques, G., Domingues, M., Langlois, T., Gouyon, F. (2011a). Three current issues in music autotagging. In Proc. International Society for Music Information Retrieval (pp. 795–800).
Marques, G., Langlois, T., Gouyon, F., Lopes, M., Sordo, M. (2011b). Short-term feature space and music genre classification. Journal of New Music Research, 40(2), 127–137.
Marques, G., Lopes, M., Sordo, M., Langlois, T., Gouyon, F. (2010). Additional evidence that common low-level features of individual audio frames are not representative of music genres. In Proc. Sound and Music Computing.
Matityaho, B., & Furst, M. (1995). Neural network based model for classification of music type. In Proc. Convention of Electrical and Electronics Engineers in Israel (pp. 1–5). doi:10.1109/EEIS.1995.514161.
McDermott, J., & Hauser, M. D. (2007). Nonhuman primates prefer slow tempos but dislike music overall. Cognition, 104(3), 654–668. doi:10.1016/j.cognition.2006.07.011.
McKay, C. (2004). Automatic genre classification of MIDI recordings. Ph.D. thesis, McGill University, Montréal, Canada.
McKay, C., & Fujinaga, I. (2005). Automatic music classification and the importance of instrument identification. In Proc. Conference on Interdisciplinary Musicology.
McKay, C., & Fujinaga, I. (2006). Music genre classification: Is it worth pursuing and how can it be improved? In Proc. International Society for Music Information Retrieval (pp. 101–106).
Meng, A., Ahrendt, P., Larsen, J. (2005). Improving music genre classification by short-time feature integration. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 497–500).
Meng, A., & Shawe-Taylor, J. (2008). An investigation of feature models for music genre classification using the support vector classifier. In Proc. International Society for Music Information Retrieval (pp. 604–609).
MIREX (2005). Genre results. http://www.music-ir.org/mirex/wiki/2005:MIREX2005_Results. Accessed 15 Oct 2012.
Moerchen, F., Mierswa, I., Ultsch, A. (2006). Understandable models of music collections based on exhaustive feature generation with temporal statistics. In Int. Conference on Knowledge Discovery and Data Mining (pp. 882–891).
Pampalk, E. (2006). Computational models of music similarity and their application in music information retrieval. Ph.D. thesis, Vienna University of Tech., Vienna, Austria.
Pampalk, E., Flexer, A., Widmer, G. (2005). Improvements of audio-based music similarity and genre classification. In Proc. International Society for Music Information Retrieval (pp. 628–633).
Panagakis, Y., & Kotropoulos, C. (2010). Music genre classification via topology preserving non-negative tensor factorization and sparse representations. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 249–252).
Panagakis, Y., Kotropoulos, C., Arce, G. R. (2009a). Music genre classification using locality preserving non-negative tensor factorization and sparse representations. In Proc. International Society for Music Information Retrieval (pp. 249–254).
Panagakis, Y., Kotropoulos, C., Arce, G. R. (2009b). Music genre classification via sparse representations of auditory temporal modulations. In Proc. European Signal Processing Conference (pp. 1–5).
Panagakis, Y., Kotropoulos, C., Arce, G. R. (2010a). Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification. IEEE Transactions on Audio, Speech, and Language Processing, 18(3), 576–588.
Panagakis, Y., Kotropoulos, C., Arce, G. R. (2010b). Sparse multi-label linear embedding nonnegative tensor factorization for automatic music tagging. In Proc. European Signal Processing Conference (pp. 492–496).
Pfungst, O. (translated by C. L. Rahn) (1911). Clever Hans (The horse of Mr. Von Osten): A contribution to experimental animal and human psychology. New York: Henry Holt.
Porter, D., & Neuringer, A. (1984). Music discriminations by pigeons. Experimental Psychology: Animal Behavior Processes, 10(2), 138–148.
Ren, J. M., & Jang, J. S. R. (2011). Time-constrained sequential pattern discovery for music genre classification. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 173–176).
Ren, J. M., & Jang, J. S. R. (2012). Discovering time-constrained sequential patterns for music genre classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(4), 1134–1144.
Rizzi, A., Buccino, N. M., Panella, M., Uncini, A. (2008). Genre classification of compressed audio data. In Proc. International Workshop on Multimedia Signal Processing (pp. 654–659).
Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1, 317–328.
Sanden, C. (2010). An empirical evaluation of computational and perceptual multi-label genre classification on music. Master's thesis, University of Lethbridge.
Sanden, C., & Zhang, J. Z. (2011a). Algorithmic multi-genre classification of music: An empirical study. In Proc. International Computer Music Conference (pp. 559–566).
Sanden, C., & Zhang, J. Z. (2011b). Enhancing multi-label music genre classification through ensemble techniques. In Proc. ACM Special Interest Group on Information Retrieval (pp. 705–714).
Scaringella, N., Zoia, G., Mlynek, D. (2006). Automatic genre classification of music content: A survey. IEEE Signal Processing Magazine, 23(2), 133–141.
Schapire, R., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
Schedl, M., & Flexer, A. (2012). Putting the user in the center of music information retrieval. In Proc. International Society for Music Information Retrieval (pp. 385–390).
Schedl, M., Flexer, A., Urbano, J. (2013). The neglected user in music information retrieval research. Journal of Intelligent Information Systems. doi:10.1007/s10844-013-0247-6.
Schindler, A., Mayer, R., Rauber, A. (2012). Facilitating comprehensive benchmarking experiments on the million song dataset. In Proc. International Society for Music Information Retrieval (pp. 469–474).
Schindler, A., & Rauber, A. (2012). Capturing the temporal domain in echonest features for improved classification effectiveness. In Proc. Adaptive Multimedia Retrieval.
Schnitzer, D., Flexer, A., Schedl, M., Widmer, G. (2012). Local and global scaling reduce hubs in space. Journal of Machine Learning Research, 13, 2871–2902.
Seyerlehner, K., Schedl, M., Sonnleitner, R., Hauger, D., Ionescu, B. (2012). From improved auto-taggers to improved music similarity measures. In Proc. Adaptive Multimedia Retrieval.
Seyerlehner, K., Widmer, G., Knees, P. (2010). A comparison of human, automatic and collaborative music genre classification and user centric evaluation of genre classification systems. In Proc. Adaptive Multimedia Retrieval (pp. 118–131).
Shapiro, P. (2005). Turn the beat around: The secret history of disco. London, UK: Faber & Faber.
Silla, C. N., Koerich, A. L., Kaestner, C. A. A. (2008). The Latin music database. In Proc. International Society for Music Information Retrieval (pp. 451–456).
Slaney, M. (1998). Auditory toolbox. Tech. Rep., Interval Research Corporation.
Smith, J. B. L., Burgoyne, J. A., Fujinaga, I., Roure, D. D., Downie, J. S. (2011). Design and creation of a large-scale database of structural annotations. In Proc. International Society for Music Information Retrieval (pp. 555–560).
Song, W., Chang, C. J., Liou, S. (2009). Improved confidence intervals on the Bernoulli parameter. Communications and Statistics Theory and Methods, 38(19), 3544–3560.
Sturm, B. L. (2012a). A survey of evaluation in music genre recognition. In Proc. Adaptive Multimedia Retrieval.
Sturm, B. L. (2012b). Two systems for automatic music genre recognition: What are they really recognizing? In Proc. ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies (pp. 69–74).
Sturm, B. L. (2013a). On music genre classification via compressive sampling. In Proc. IEEE International Conference on Multimedia & Expo.
Sturm, B. L. (2013b). The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use. http://arxiv.org/abs/1306.1461.
Sturm, B. L., & Noorzad, P. (2012). On automatic music genre recognition by sparse representation classification using auditory temporal modulations. In Proc. Computer Music Modeling and Retrieval.
Sundaram, S., & Narayanan, S. (2007). Experiments in automatic genre classification of full-length music tracks using audio activity rate. In Proc. Workshop on Multimedia Signal Processing (pp. 98–102).
Tacchini, E., & Damiani, E. (2011). What is a "musical world"? An affinity propagation approach. In Proc. ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies (pp. 57–62).
Theodoridis, S., & Koutroumbas, K. (2009). Pattern Recognition (4th ed.). Amsterdam, The Netherlands: Academic Press, Elsevier.
Turnbull, D., & Elkan, C. (2005). Fast recognition of musical genres using RBF networks. IEEE Transactions on Knowledge and Data Engineering, 17(4), 580–584.
Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302.
Tzanetakis, G., Ermolinskyi, A., Cook, P. (2003). Pitch histograms in audio and symbolic music information retrieval. Journal of New Music Research, 32(2), 143–152.
Umapathy, K., Krishnan, S., Jimaa, S. (2005). Multigroup classification of audio signals using time-frequency parameters. IEEE Transactions on Multimedia, 7(2), 308–315.
Urbano, J., Schedl, M., Serra, X. (2013). Evaluation in music information retrieval. Journal of Intelligent Information Systems. doi:10.1007/s10844-013-0249-4.
van den Berg, E., & Friedlander, M. P. (2008). Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2), 890–912.
Vatolkin, I. (2012). Multi-objective evaluation of music classification. In W. A. Gaul, A. Geyer-Schulz, L. Schmidt-Thieme, J. Kunze (Eds.), Challenges at the interface of data analysis, computer science, and optimization (pp. 401–410). Berlin: Springer.
Wang, F., Wang, X., Shao, B., Li, T., Ogihara, M. (2009). Tag integrated multi-label music style classification with hypergraph. In Proc. International Society for Music Information Retrieval (pp. 363–368).
Watanabe, S., & Sato, K. (1999). Discriminative stimulus properties of music in java sparrows. Behavioural Processes, 47(1), 53–57.
Wiggins, G. A. (2009). Semantic gap?? Schemantic schmap!! Methodological considerations in the scientific study of music. In Proc. IEEE International Symposium on Multimedia (pp. 477–482).
Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., Ma, Y. (2009). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 210–227.
Wu, M. J., Chen, Z. S., Jang, J. S. R., Ren, J. M. (2011). Combining visual and acoustic features for music genre classification. In Proc. International Conference on Machine Learning and Applications and Workshops (pp. 124–129).
Yao, Q., Li, H., Sun, J., Ma, L. (2010). Visualized feature fusion and style evaluation for musical genre analysis. In Proc. International Conference on Pervasive Computing, Signal Processing and Applications (pp. 883–886).
