Extracting semantic representations from word co-occurrence statistics: A computational study

JOHN A. BULLINARIA
University of Birmingham, Birmingham, England

AND

JOSEPH P. LEVY
Roehampton University, London, England

The idea that at least some aspects of word meaning can be induced from patterns of word co-occurrence is becoming increasingly popular. However, there is less agreement about the precise computations involved, and the appropriate tests to distinguish between the various possibilities. It is important that the effects of the relevant design choices and parameter values are understood if psychological models using these methods are to be reliably evaluated and compared. In this article, we present a systematic exploration of the principal computational possibilities for formulating and validating representations of word meanings from word co-occurrence statistics. We find that, once we have identified the best procedures, a very simple approach is surprisingly successful and robust over a range of psychologically relevant evaluation measures.

Behavior Research Methods, 2007, 39(3), 510-526
Copyright 2007 Psychonomic Society, Inc.
J. A. Bullinaria, j.a.bullinaria@cs.bham.ac.uk
There have been convincing suggestions in the literature (e.g., Landauer & Dumais, 1997; Lund & Burgess, 1996; Patel, Bullinaria, & Levy, 1997) that psychologically relevant and plausible representations of word meaning can be learned from exposure to streams of natural language. These claims have direct relevance to both the learning of lexical semantics by humans, and the use of such representations learned by computers but used in models of human psychological performance (e.g., Lowe & McDonald, 2000). The strongest claim is perhaps that human infants can acquire representations of word meanings by building up and manipulating the word co-occurrence statistics of the speech/text streams they encounter. The basic idea is simply that words with similar meanings will tend to occur in similar contexts, and hence word co-occurrence statistics can provide a natural basis for semantic representations.
Explicit simulations do show that vector space representations formed in this way can be used to perform remarkably well on various performance criteria, e.g., using simple vector space distance measures to carry out multiple-choice synonym judgments of the type used in Tests of English as a Foreign Language (TOEFL) (Landauer & Dumais, 1997; Levy & Bullinaria, 2001). Obviously, co-occurrence statistics on their own will not be sufficient to build complete and reliable lexical representations (French & Labiouse, 2002). For example, without extra computational apparatus, they will never be able to deal with homophones and homographs, words with the same form but different meaning (e.g., Schütze, 1998). Nor will they account for the human ability to learn the meaning of words from dictionaries or instruction. However, one can see how the statistical representations could form a computationally efficient foundation for the learning of semantic representations.
A complete learning process might take the following form:

1. Iteratively update the word co-occurrence statistics as more training data (i.e., natural language usage) is encountered.

2. Process that information into an appropriate representation of semantics, possibly employing some form of dimensional reduction or other form of data compression.

3. Use supervised learning techniques to refine those representations, e.g., by separating homophones, or by inserting dictionary-learned words.
If we can show that such computational procedures can create a useful lexical semantic representation from natural language input, then it is plausible to suggest that evolution will have furnished humans with the ability to take advantage of these statistics. This, of course, still leaves us with the task of describing exactly how the human system works, but understanding how in principle one best computes such representations is a necessary first step. In addition, although it is not the main focus of this article, understanding and following these human procedures may also be a good strategy for building artificial language processing systems.
There are numerous techniques in the literature that could be used to implement Stage 3, such as variations on the theme of learning vector quantization (LVQ), in which representations generated by unsupervised clustering or learning methods are adjusted by supervised learning (Kohonen, 1997). Implementing procedures for performing the co-occurrence counts of Stage 1 is also straightforward, but it is unlikely that in humans the word counts would be collected first and then processed later. It is more likely that the three stages coexist, so that the acquisition of the representations would automatically occur in the gradual on-line fashion observed. However, for the purposes of this article we shall assume that if we can come up with suitable formulations of the three stages independently, then they can be combined into a consistent and coherent on-line whole using existing connectionist techniques (e.g., Bishop, 1995; Haykin, 1999), and any constraints from biological plausibility will be addressed at the same time. We are thus left with the task of specifying Stage 2.
The first major problem one faces is that there are many different types of statistics one could feasibly extract from the raw co-occurrence counts to build the vector space representations of word meanings, and it is not at all obvious which is best. This leads us on to the second major problem, which is that it is not clear how one should measure the quality of the various possible representations. One can certainly try them out on various human-like language tasks, such as synonym judgments, but then it is not obvious how one should map the use of our computer-based representations onto the way that humans employ them (e.g., Bullinaria & Huckle, 1997). Nor is it obvious that for building useful computer-based representations, we want to use them in the same way anyway. Our own preliminary investigations (Levy & Bullinaria, 2001; Levy, Bullinaria, & Patel, 1998; Patel et al., 1997) have indicated that the computational details which result in the best performance levels depend crucially on the details of the particular human-like task and on how exactly we implement it. This obviously makes it difficult to reliably identify the strengths and weaknesses of the whole approach in general. Fortunately, the more complete analysis presented here reveals that once we identify our overall best approach, the results are much more consistently good.
In the remainder of this article, we shall present our systematic exploration of the principal possibilities for formulating the word co-occurrence approach to word meaning representation. We begin with a brief overview of previous work in this area, and then outline the range of computational techniques and tests to be considered here. We then explore the importance of the various details by summarizing and discussing the key results we have obtained using semantic vectors derived from the textual component of the British National Corpus (BNC), which consists of about 90 million words from a representative variety of sources (Aston & Burnard, 1998). The robustness of these results is then tested with respect to corpus size and quality. We end with some more general discussion and conclusions.
Previous Work on Co-Occurrence Statistics

Inspired by intuitions from linguistics (e.g., Saussure, 1916; Firth, 1957), work in this area has taken place within the component disciplines of computational linguistics, information retrieval, and the psychology of language. We shall now briefly outline some of the past work, emphasizing psychologically relevant results at a lexical level rather than higher levels of organization such as sentences or documents.
The work of Schütze and colleagues (e.g., Schütze, 1993) showed how co-occurrence statistics of letter 4-grams in relatively small corpora could be used to examine distances between lexical representations in a semantically relevant manner, and demonstrated the surprisingly large amount of information that is present in simple co-occurrence measurements. This "Word Space" model extracted the most statistically important dimensions from the co-occurrence statistics using singular value decomposition (SVD), a well-known statistical technique that has since been used in the work on LSA described below.
Finch and Chater (1992) used co-occurrence statistics as a basis for inducing syntactic categories. They looked at the co-occurrences of the 1,000 most frequent target words with the 150 most frequent context words using a two-word window in a 40 million word USENET newsgroup corpus. The resulting vectors produced cluster analysis dendrograms that reflected a hierarchy of syntactic categories remarkably close to a standard linguistic taxonomy, including structure right up to phrases. They also found that some of their clusters exhibited semantic regularities. The most common 150 words in a corpus of English are mostly closed class or grammatical function words. The use of such closed class word co-occurrence patterns to induce measures of semantic similarity will be examined further below. This work was continued by Redington, Chater, and Finch (1998) using the CHILDES corpus of child-directed speech. More recently, Monaghan, Chater, and Christiansen (2005) have examined the different contributions of co-occurrence-based and phonological cues in the induction of syntactic categories from the CHILDES corpus.
Lund and Burgess (1996) have developed a related framework they call HAL (Hyperspace Analogue to Language). Using the Euclidean distance between co-occurrence vectors obtained with weighted 10-word windows in a 160 million word corpus of USENET newsgroup text, they were able to predict the degree of priming of one word by another in a lexical decision task. Their work showed how simple co-occurrence patterns from an easily available source of text can produce statistics capable of simulating psychological tasks at a lexical semantic level, without a great degree of preprocessing or manipulations such as dimensionality reduction. This group has gone on to use their method in several further studies (e.g., Audet & Burgess, 1999; Burgess & Conley, 1999).
McDonald and Lowe have also reported on the use of co-occurrence statistics as measures of semantic relatedness (e.g., Lowe, 2001; McDonald & Lowe, 1998). McDonald and Shillcock (2001) describe a measure of "contextual similarity" based on co-occurrence statistics. Lowe and McDonald (2000) described the use of co-occurrence statistics to model mediated priming. Using a 10-word window, they selected the context word dimensions using an ANOVA to judge how consistent the co-occurrence patterns were across different subcorpora. Using a rather conservative criterion, the method yielded 536 context words. They ruled out a "stop list" of 571 words including closed class words and other mostly very common words that are usually seen as uninformative in the information retrieval literature.
Our own group has also reported methodological results using similar simple co-occurrence statistics. We have developed evaluation methods and used them to explore the parameter space of the methods underlying the use of vector-based semantic representations (Levy & Bullinaria, 2001; Levy et al., 1998; Patel et al., 1997). We have found that the choice of window shape and size, the number of context words, and the "stop list" can have an enormous effect on the results, and that using simple information-theoretic distance measures can often work better than the traditional Euclidean and Cosine measures. One of the main aims of this article is to explore more systematically and fully the range of design choices that can affect the performance of these methods.
Landauer and Dumais have adopted a slightly different approach derived from information retrieval (Letsche & Berry, 1997) that they call latent semantic analysis (LSA), stressing the importance of dimensionality reduction as a method of uncovering the underlying components of word meaning. Landauer and Dumais (1997) is an important paper in this field, as it demonstrated how simple word co-occurrence data was sufficient to simulate the growth in a child's vocabulary, and thus made a strong claim for the psychological utility of word co-occurrence. Using 30,473 articles designed for children from Grolier's Academic American Encyclopedia, they measured context statistics using a window that corresponded to the length of each article or its first 2,000 characters. They then used an entropy-based transform on their data and extracted the 300 most important dimensions using singular value decomposition (SVD), a procedure related to standard principal component analysis (PCA) that allows the most important underlying dimensions to be extracted from a nonsquare matrix. As well as providing further evidence that word co-occurrence data contains semantic information that can be extracted, they showed how inductive learning from realistic language input can explain an increase in performance that mirrors that of children in vocabulary acquisition.
Landauer and Dumais (1997) demonstrated the utility of their framework by using it on the synonym portion of a Test of English as a Foreign Language (TOEFL). This test is described in full detail below, but essentially, for each of 80 target words, the word most closely related in meaning must be chosen from four other words. Their program scored around 64% using the strategy of choosing the word with the largest cosine (i.e., smallest angular distance) between its derived co-occurrence vector and that of the target. They note that this score is comparable to the average score by applicants to U.S. colleges from non-English speaking countries, and would be high enough to allow admission to many U.S. universities. They go on to show that the learning rate of their model mirrors the pattern of vocabulary acquisition of children and shows how a child can induce the rough meaning of a previously unseen word from its present context and a knowledge of past word co-occurrences. Their work is an important example of a detailed cognitive model that employs co-occurrence statistics to give a numerical fit to observational data.
The computational methods underlying LSA have been applied, developed, and expanded further over the past decade. This has included using LSA to model metaphor comprehension (Kintsch, 2000; Kintsch & Bowles, 2002); a model of children's semantic memory built from an LSA analysis of a child corpus (Denhière & Lemaire, 2004); application to grading student essays (Miller, 2003); application of different sources of knowledge on reasoning (Wolfe & Goldman, 2003); mathematical improvements to the LSA distance measure (Hu et al., 2003); potential improvements in the statistical methods underlying LSA (Hofmann, 2001); and many other studies.
The above brief and selective review demonstrates the variety of psychological areas of interest that models using co-occurrence statistics can be applied to. The approach has provided insights into developmental psychology (e.g., Landauer & Dumais, 1997; Hu et al., 2003; Monaghan et al., 2005), psycholinguistics (e.g., Lowe & McDonald, 2000; Lund & Burgess, 1996), and neuropsychology (e.g., Conley, Burgess, & Glosser, 2001), as well as more technological applications that may have potential relevance to psychology, such as information retrieval (Deerwester et al., 1990) and word sense disambiguation/synonymy recognition (e.g., Burgess, 2001; Schütze, 1998; Turney, 2001). The models for all these domains depend upon an empiricist perspective of inducing linguistic generalities from language input. The results we report in this article are significant in that they demonstrate various optimalities in the design and parameter space for these statistical methods, and so strengthen the theoretical underpinnings of the models based on this approach. The need to compare semantic representations arising from different approaches and parameters has been discussed in a more general setting by Hu et al. (2005). Here we are not so much interested in measures of the similarity between different semantic spaces as in measures of how well each possible corpus-based vector space performs as a semantic representation.
We must note that there remains some controversy concerning the use of word co-occurrence statistics as the basis for representing meaning in humans. Glenberg and Robertson (2000) attack HAL and LSA for not solving Harnad's (1990) symbol grounding problem. Their alternative is an embodied approach where meaning depends on bodily actions and the affordances of objects in the environment. Any purely symbolic approach, including theories based on word co-occurrence, is judged to be inadequate in that they never make contact with the real world, relying only on internal relations between symbolic representations. They reject the solution offered for this problem by Landauer and Dumais (1997), of encoding co-occurrence between perceptual events and words or other perceptual events, because this has not yet been implemented in approaches such as HAL or LSA. Burgess (2000), in his reply to Glenberg and Robertson (2000), champions models where meaning is represented as high-dimensional vectors derived from word co-occurrence for being explicit and transparent. He reasons that Glenberg and Robertson's experimental data showing that one implementation of LSA cannot account for flexible judgments (such as the plausibility of filling a sweater with leaves as a substitute for a pillow, as against filling a sweater with water) are unfair tests, because the LSA vectors had not been derived from relevant "experiences." Burgess also points out that HAL and LSA are purely representational models, and do not describe the necessary processing machinery for taking advantage of the knowledge derived from accumulating co-occurrence patterns.

French and Labiouse (2002) also rightly claim that co-occurrence patterns on their own cannot account for all aspects of "real-world semantics." They argue that without the use of aspects of world knowledge and the flexibility of use of context that can change the meaning of a word or phrase, co-occurrence cannot capture subtle uses of language, such as lawyers being more likened to sharks than kangaroos, or that an Israeli minister is more likely to have a Jewish sounding name than a Palestinian one. Without training a model on the appropriate language material that might give it a chance to pick up this kind of information, we would like to reserve judgment on how well co-occurrence statistics could capture such meanings, but we agree that it is unlikely that word co-occurrences alone are enough to capture all aspects of semantics. We simply claim that it is surprising how much they can capture, that they are a good candidate source for inducing word roles, as we can demonstrate that a significant amount of semantic information is present and available for extraction using simple computational means, and that they provide a solid foundation for more complete representations.
Computing the Co-Occurrence Vectors

Generating the raw word co-occurrence counts is simply a matter of going through a large spoken or written corpus and counting the number of times n(c,t) each context word c occurs within a window of a certain size W around each target word t. We shall assume that the corpus is used in its raw state, with no preprocessing, thus giving us a conservative estimate of the performance levels achievable. Humans may well make use of simple transformations, such as stemming or lemmatization (Manning & Schütze, 1999, p. 132), as they experience the stream of words, and thus form better representations than our basic counting approach. For example, they might improve their performance by making use of the kind of grammatical knowledge that tells us that "walk" and "walked" are morphologically, and thus semantically, related. Our aim here is to conduct computational experiments with a view to arriving at some general guidelines for extracting the best possible lexical semantic information from a given corpus. This will provide the basis for more psychologically plausible models and theories, yet avoid the need to make specific claims and assumptions about the details of those systems before we understand the range of computational possibilities.
Naturally, the word meanings will be independent of the corpus size, so the counts are normalized to give the basic semantic vector for each word t, which is just the vector of conditional probabilities

$$p(c \mid t) = \frac{p(c,t)}{p(t)} = \frac{n(c,t)}{\sum_{c'} n(c',t)},$$

which satisfies all the usual properties of probabilities (i.e., all components are positive and sum to one). The individual word frequencies in the corpus are

$$f(t) = \frac{1}{W}\sum_{c} n(c,t), \qquad f(c) = \frac{1}{W}\sum_{t} n(c,t),$$

that is, the summed co-occurrence counts divided by the number of times each word gets counted (the window size W); and the individual word probabilities are

$$p(t) = \frac{1}{NW}\sum_{c} n(c,t), \qquad p(c) = \frac{1}{NW}\sum_{t} n(c,t),$$

that is, the word frequencies divided by N, the total number of words in the corpus.
Clearly, the window around our target word can be defined in many ways (e.g., Lund & Burgess, 1996). We could just use a window to the left of (i.e., before) the target word, or just to the right (i.e., after), or we could have a symmetric window that sums the left and right counts, or we could have vectors that keep the left and right counts separately. We can have flat windows in which all word positions are counted equally, or windows in which the closest context words count more than those more distant, for instance, in a triangular or Gaussian fashion. One could easily come up with further variations on this theme. The effect of these variations is one of the implementational details we shall explore later.
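As a concrete illustration (not part of the original implementation), the counting and normalization steps might be sketched in Python as follows; the function names and the token-list input format are our own choices, and only the rectangular and triangular window shapes are shown:

```python
from collections import defaultdict

def cooccurrence_counts(tokens, W=1, shape="rectangular"):
    """Accumulate the (optionally weighted) counts n(c,t) of each
    context word c within W positions of each target word t."""
    n = defaultdict(float)
    for i, t in enumerate(tokens):
        for d in range(1, W + 1):
            # triangular windows weight close neighbors more heavily
            w = 1.0 if shape == "rectangular" else (W - d + 1) / W
            if i - d >= 0:
                n[(tokens[i - d], t)] += w    # left context
            if i + d < len(tokens):
                n[(tokens[i + d], t)] += w    # right context
    return n

def conditional_probabilities(n):
    """Normalize the counts to the basic semantic vectors p(c|t)."""
    totals = defaultdict(float)
    for (c, t), count in n.items():
        totals[t] += count
    return {(c, t): count / totals[t] for (c, t), count in n.items()}
```

Left-only, right-only, or separate left-and-right variants follow by restricting or splitting the two accumulation branches.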
To judge how useful these basic co-occurrence vectors are for representing semantics, we need to define some independent empirical tests of their quality. There are two aspects to this:

1. How reliable are the vectors from a statistical data acquisition point of view? For example, to what extent will different representations emerge from distinct subsets of the corpus? This can be tested using only the training data, that is, only information in the corpus itself.

2. How well do the "semantic vectors" provide what we expect of a semantic representation? To test this we need comparisons against external measures of what we know a good semantic representation should be able to do, e.g., based on human performance on suitable tasks.

A systematic exploration of these points will give us clues as to what further processing might be appropriate, and how feasible the whole approach is. It will also provide some useful guidelines on appropriate implementational details which can then inform the development of specific models and theories.
Validating the Semantic Representations

Clearly, there are countless empirical tests that one might employ to estimate the semantic validity of our representations. In this article, we shall present results from four tests that have been designed to probe different aspects of the corpus-derived vectors:

TOEFL (Test of English as a Foreign Language). This is a much studied performance measure based on words taken from real TOEFL tests used by universities in the USA (Landauer & Dumais, 1997). It consists of eighty multiple-choice judgments on the closest meaning between a target word and four others (e.g., which of the following is closest in meaning to enormously: appropriately, uniquely, tremendously, or decidedly). This test was helpfully provided by Tom Landauer, and we converted the spelling of a few of the words to match our UK English corpus. It was implemented by computing the distances in our semantic space between the target and each of the four choice words, and counting the number for which the correct word is closest to the target.
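A minimal sketch of this multiple-choice scoring, assuming the semantic vectors are held in a dict of NumPy arrays and anticipating the Cosine distance measure defined below (any of the other measures could be substituted); the same scheme also serves for the Distance Comparison test that follows:

```python
import numpy as np

def cosine_distance(u, v):
    """One minus the cosine of the angle between vectors u and v."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def toefl_score(items, vectors):
    """items: list of (target, choices, correct) tuples, where choices
    is the list of four candidate words; vectors: dict mapping each
    word to its semantic vector. Returns the percentage of items for
    which the correct choice is the one closest to the target."""
    hits = 0
    for target, choices, correct in items:
        best = min(choices,
                   key=lambda w: cosine_distance(vectors[target], vectors[w]))
        hits += (best == correct)
    return 100.0 * hits / len(items)
```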
Distance Comparison. This is similar to the TOEFL test in that it involves multiple-choice similarity judgments, but rather than test fine distinctions between words, many of which occur very rarely in the corpus, it is designed to test the large-scale structure of the semantic space using words that are well distributed in the corpus. It involves 200 target words, and the comparison is between one semantically related word and ten other randomly chosen words from the 200 pairs (e.g., typical related words are brother and sister, black and white, lettuce and cabbage, bind and tie, competence and ability). The performance is the percentage of control words that are further than the related word from the target word.
Semantic Categorization. This test is designed to explore the extent to which semantic categories are represented in the vector space. It measures how often individual word vectors are closer to their own semantic category center rather than one of the other category centers (Patel et al., 1997). Ten words were taken from each of 53 semantic categories (e.g., metals, fruits, weapons, sports, colors) based on human category norms (Battig & Montague, 1969), and the percentage of the 530 words that fell closer to their own category center rather than another was computed.
Syntactic Categorization. This test examines whether syntactic information can be represented in the same vector space as semantics, or if a separate vector space is required. The degree to which word vectors are closer to their own syntactic category center rather than other category centers is measured (Levy et al., 1998). One hundred words were taken for each of twelve common parts of speech, and the percentage of the 1,200 words that fall closer to their own category center than another was computed.
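Both categorization tests reduce to the same nearest-center computation. A minimal sketch, assuming each category center is simply the mean of its members' vectors (the original papers do not fix these implementation details, and any of the distance measures introduced below could replace the cosine here):

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def categorization_score(categories, vectors):
    """categories: dict mapping a category name to its member words;
    vectors: dict mapping each word to its semantic vector.
    Returns the percentage of words lying closer to their own
    category center than to any other category's center."""
    centers = {cat: np.mean([vectors[w] for w in words], axis=0)
               for cat, words in categories.items()}
    hits = total = 0
    for cat, words in categories.items():
        for w in words:
            nearest = min(centers,
                          key=lambda c: cosine_distance(vectors[w], centers[c]))
            hits += (nearest == cat)
            total += 1
    return 100.0 * hits / total
```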
It is immediately clear that each of these tests relies on the definition of some form of distance measure on the space of semantic vectors. Again there are many possibilities. Three familiar and commonly used geometric measures are

$$\text{Euclidean:}\quad d(t_1,t_2) = \left[\sum_c \left| p(c \mid t_1) - p(c \mid t_2) \right|^2 \right]^{1/2},$$

$$\text{City Block:}\quad d(t_1,t_2) = \sum_c \left| p(c \mid t_1) - p(c \mid t_2) \right|,$$

$$\text{Cosine:}\quad d(t_1,t_2) = 1 - \frac{\sum_c p(c \mid t_1)\, p(c \mid t_2)}{\left[\sum_c p(c \mid t_1)^2\right]^{1/2} \left[\sum_c p(c \mid t_2)^2\right]^{1/2}}.$$

Euclidean and City Block are well-known Minkowski metrics. Cosine is one minus the cosine of the angle between the two vectors, and measures the similarity of the vector directions, rather than the positions in the vector space (Landauer & Dumais, 1997). Given that the vectors are probabilities, it is quite possible that information-theoretic measures such as

$$\text{Hellinger:}\quad d(t_1,t_2) = \sum_c \left[ p(c \mid t_1)^{1/2} - p(c \mid t_2)^{1/2} \right]^2,$$

$$\text{Bhattacharya:}\quad d(t_1,t_2) = -\log \sum_c \left[ p(c \mid t_1)\, p(c \mid t_2) \right]^{1/2},$$

$$\text{Kullback–Leibler:}\quad d(t_1,t_2) = \sum_c p(c \mid t_1) \log \frac{p(c \mid t_1)}{p(c \mid t_2)}$$

could be more appropriate (Zhu, 1997). The Hellinger and Kullback–Leibler measures have already been shown to work well in previous studies (Levy & Bullinaria, 2001; Patel et al., 1997).
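For concreteness, minimal NumPy implementations of these six measures, where p1 and p2 are the component arrays of two semantic vectors; the small epsilon in the Kullback–Leibler measure is our own guard against zero components (which are common in sparse co-occurrence vectors), not part of the original formulation:

```python
import numpy as np

def euclidean(p1, p2):
    return np.sqrt(np.sum((p1 - p2) ** 2))

def city_block(p1, p2):
    return np.sum(np.abs(p1 - p2))

def cosine(p1, p2):
    return 1.0 - np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))

def hellinger(p1, p2):
    return np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2)

def bhattacharya(p1, p2):
    return -np.log(np.sum(np.sqrt(p1 * p2)))

def kullback_leibler(p1, p2, eps=1e-12):
    return np.sum(p1 * np.log((p1 + eps) / (p2 + eps)))
```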
There are a number of natural alternatives to the raw probabilities p(c|t) that we should also consider for our semantic vectors. Perhaps the most widely considered (e.g., Church & Hanks, 1990; Manning & Schütze, 1999) is the Pointwise Mutual Information (PMI), which compares the actual conditional probability p(c|t) for each word t to the average or expected probability p(c); that is,

$$i(c,t) = \log \frac{p(c \mid t)}{p(c)} = \log \frac{p(c,t)}{p(t)\,p(c)}.$$

Negative values indicate less than the expected number of co-occurrences, which can arise for many reasons, including a poor coverage of the represented words in the corpus. A potentially useful variation, therefore, is to set all the negative components to zero, and use only the Positive PMI. There are many other variations on this theme, such as various odds ratios (e.g., Lowe, 2001) and the entropy-based normalization used in LSA (Landauer & Dumais, 1997). Here we shall just consider the simplest of these, namely, the simple probability ratio vectors

$$r(c,t) = \frac{p(c \mid t)}{p(c)} = \frac{p(c,t)}{p(t)\,p(c)},$$

or just the PMI without the logarithm (which we shall simply call Ratios). We still need to compute distances between these new vectors i(c,t) and r(c,t), but they are no longer probabilities, so it makes little sense to use the information-theoretic measures, and we restrict ourselves to using the geometric measures with them.
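A minimal sketch of these component transformations, assuming the conditional probabilities are held in a targets-by-contexts NumPy matrix; the handling of zero entries (clamping log 0 before taking the positive part) is our own choice:

```python
import numpy as np

def ppmi_vectors(P_ct, p_c):
    """P_ct: matrix of conditional probabilities p(c|t), one row per
    target word t, one column per context word c; p_c: vector of
    context word probabilities p(c). Returns the Positive PMI
    components max(log[p(c|t)/p(c)], 0)."""
    with np.errstate(divide="ignore"):
        pmi = np.log(P_ct / p_c)      # i(c,t); -inf where p(c|t) = 0
    pmi[~np.isfinite(pmi)] = 0.0      # clamp the zero-count entries
    return np.maximum(pmi, 0.0)

def ratio_vectors(P_ct, p_c):
    """The same comparison without the logarithm (the Ratios vectors)."""
    return P_ct / p_c
```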
The BNC corpus contains tags representing syntactic classes and so on, which naturally do not exist in most written and spoken contexts, so for our experiments on the semantic tasks these are removed. Furthermore, all punctuation is removed, leaving a corpus consisting only of a long ordered list of words. Our results are therefore conservative, not relying on any other mechanisms such as sentence comprehension. For the syntactic clustering task, the syntactic tags are retained in order to generate the syntactic category centers. In both cases, it is then straightforward to read through the cleaned corpus generating all the necessary counts in one pass.
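As a rough illustration of this kind of cleaning, a minimal sketch (the regular expression and lowercasing are our own simplifications; the actual BNC de-tagging details are not reproduced here):

```python
import re

def clean_corpus(text):
    """Reduce raw text to a long ordered list of words by stripping
    punctuation and case distinctions."""
    return re.findall(r"[a-z']+", text.lower())
```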
We have already noted many factors that need to be explored systematically. To begin with, we have the window shapes and sizes, the type of vectors we start with, and the distance metrics we use with them. Then we can see from the above equations that some depend on the low-frequency context words more than others, and given that statistical reliability depends on reasonably high word counts, we might get better results by removing the components corresponding to the lowest frequency context words. We need to explore how best to do this. Then we need to determine the effect of the corpus size, which will naturally affect how reliable the various vector components are. All these factors are likely to be related, and also depend on the kind of task we are using our vectors for. Clearly we cannot present all our results here, but it is possible for us to present a selection that gives a fair picture of which aspects are most important, and the main interactions between them.
We shall start by looking at the best performance we can get for each of our four test tasks for the various component types and distance measures. This points us to which is best overall, and we can then concentrate on that for presenting our exploration of the other factors. We then consider the statistical reliability of the semantic vectors, and how the task performances depend on window shape, size, and type, and on how many vector components are used. We end by studying the effect of changing the size and quality of the corpus, and see how the task performances change when much smaller corpora are available.
Varying the Component Type and Distance Measure

The various factors discussed above all interact and all depend on the performance measure that is being used. We have performed a fairly exhaustive search across the various parameter configurations, and shall begin by plotting the overall best performance found on each task using the full BNC text corpus for each of the various vector component types and distance measures. We shall then look in more detail at the various factors and parameters that were optimized to give those best performance levels. Figure 1 shows the best performance histograms ordered by performance. The default component type for each distance measure is the probabilities p(c|t), and we also consider the PMI, Positive PMI, and Ratios components for use with the geometric distance measures. For the three semantic tasks we see that there is a clear best approach: Positive PMI components with the Cosine distance measure. This also works well for the syntactic clustering, making it the best approach overall. Ratios components with Cosine distances is also pretty good. The other approaches are more variable in performance.
The Positive PMI results here compare extremely well with results from our own and others' previous work. For the TOEFL task, we obtain a score of 85.0%. This compares, for example, with our previous best result of 75.0% using raw probability components and the Hellinger distance metric (Levy & Bullinaria, 2001), 73.8% by Turney (2001), who used a PMI distance metric on probability components computed by search engine queries over the entire WWW, 64.4% by LSA using a much smaller corpus and SVD dimensionality reduction (Landauer & Dumais, 1997), and 64.5% as an average score by non-English speaking applicants to U.S. universities (Landauer & Dumais, 1997). It is perhaps surprising that such a simple algorithm performs so well on TOEFL, as well as on the other three tasks. This demonstrates how much information there is available in mutual information statistics of word co-occurrences. Given that there is such a clear best approach, which we shall see later is even clearer for smaller corpus sizes, it makes sense to concentrate on Positive PMI components with the Cosine distance measure in our discussion of the influence of the various parameter choices.
Statistical Reliability

Having got an idea of the best performing semantic vectors we can hope to get from our corpus, we now look at some of the properties of these vectors. It is appropriate to begin by considering the reliability of these vectors from a purely statistical point of view. Clearly, using small random samples of real text is going to introduce errors into any estimation of the probabilities, and since children are exposed to quite small data sets, this could be problematic if this kind of technique is to account for an empiricist mechanism of first language acquisition. We can get an estimate of the likely statistical variations by comparing the vectors generated from two distinct halves of the corpus. The upper graphs of Figure 2 compare the Positive PMI vectors obtained from two halves of the full BNC corpus, using a co-occurrence window consisting of one word on each side of the target word. The same word set was used as for the Distance Comparison task discussed above. On the left we plot the Cosine distances between the vectors generated from the two distinct subcorpora for each target word, and compare those with the distances between the vectors for each target word and a semantically related word and an unrelated control word. The horizontal axis shows the word count (i.e., frequency) of the target word in the corpus. As one would hope, the distances between target and control words are larger than those between semantically related words, which in turn are greater than those between identical words. The differences are even clearer in the plots of the distance ratios shown in the graphs on the right. Control/Related ratios greater than one correspond to a successful semantic relatedness distinction and good performance on our semantic tasks. Same/Related ratios of less than one indicate good statistical reliability of the vectors.
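A minimal sketch of this split-half comparison, assuming the two vector dictionaries have already been built (with the sketches above) from two distinct halves of the corpus, over the same ordered set of context words:

```python
import numpy as np

def cosine(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def split_half_ratios(v1, v2, word_pairs, controls):
    """v1, v2: dicts of Positive PMI vectors from the two half-corpora;
    word_pairs: (target, related) tuples; controls maps each target
    to an unrelated control word. Returns the Same/Related and
    Control/Related distance ratios for each target."""
    ratios = {}
    for target, related in word_pairs:
        same = cosine(v1[target], v2[target])        # same word, two halves
        related_d = cosine(v1[target], v1[related])  # related pair, one half
        control_d = cosine(v1[target], v1[controls[target]])
        ratios[target] = (same / related_d, control_d / related_d)
    return ratios
```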
From a statistical point of view, one would expect the vector quality to be better for large corpus sizes and for high-frequency words. We can see both these effects clearly in Figure 2. The upper graphs correspond to two 44.8 million word halves of the full BNC corpus. The lower two graphs correspond to two 4.6 million word subcorpora, which correspond to the corpus size in the Landauer and Dumais (1997) study. On the left, the best fit lines for the three classes show clear word count effects, with smaller related and same word distances for higher frequencies and larger corpora. On the right, the pattern is clearer in the ratio plots, and we can see how the semantic vector quality is compromised if the word frequency or corpus size becomes too small. We can conclude that our vectors do show reasonable statistical reliability, and exhibit the expected effects of semantic relatedness, word frequency, and corpus size. It also appears that the performance degrades gracefully as the corpus size is reduced toward that of typical human experience, but we shall need to look at that in more detail later.
Figure 1. The best performance obtained on the four tasks for each of the vector types and distance measures.

Varying the Context Window

The plots in Figure 2 were based on the simplest co-occurrence counts possible, namely a window of a single word on each side of the target word. The most obvious variation is to extend this window to include W words on each side (a rectangular window). It is also natural to consider the context words to be more important the closer they are to the target words, in which case we can give them a weighting that falls off linearly with distance from the target word (a triangular window). A similar Gaussian weighted window would also be natural, though we shall not look at that here. Another possibility is that the closest words to the target words might be more syntactically than semantically relevant, and so we might do well to exclude them from the window (an offset rectangular window).
Figure 3 shows how the performance on our four test tasks depends on the window size and shapes. Using Positive PMI Cosine, a symmetrical rectangular window of size one produces the highest score in each case, apart from the TOEFL task, where a triangular window of size four is slightly better. There is a general trend for the triangular windows to produce plots that are essentially equivalent to rectangular windows of a smaller size.
Figure 2. Cosine distances between Positive PMI vectors from two corpora for the same word, semantically related words, and unrelated control words (left graphs), and the ratios of those distances for individual words (right graphs). Two corpora sizes are used: 44.8 million words (upper graphs) and 4.6 million words (lower graphs).

For the best performing Positive PMI Cosine case, a fairly clear picture emerges in which performance is best for window size one, and the offset rectangular windows are not a good idea at all.
For the less successful vector and distance types, the pattern is much less clear. The Probability Euclidean case illustrates this in Figure 3. Sometimes the offset rectangular window is best (for semantic clustering), sometimes far worse than the others (TOEFL and syntactic clustering), and the optimal window size is different for each task. The change in performance as one varies the window size can be understood as a consequence of the trade-off of the increased context information, higher word counts, and better statistical reliability for larger windows, against the increased likelihood of irrelevant and misleading context information being included in the counts. It is not surprising, then, that the trade-off and optimal window type and size depend on the vector component type and distance measure employed, and we shall see later that they are also affected by the number of vector components used and the size of the corpus.
It is interesting that here, using Positive PMI Cosine, we achieve the best performance levels for all tasks using minimal window sizes, whereas in previous work with less effective vector types and distance measures (Levy et al., 1998; Patel et al., 1997), we concluded that minimal windows were only appropriate for syntactic tasks, and that larger window sizes were better for semantic tasks, with no clear optimal window size for all such tasks. This shows the importance of a full systematic study such as this, and may have implications for theories of the implementation of such algorithms in psychological or neural models, where only minimal buffer size or working memory storage would appear to be necessary to extract useful information.
The Number of Vector Components

A reasonably sized corpus, such as the 89.7 million word BNC corpus, will contain of the order of 600,000 different word types, which will each give rise to one component for each of our vectors. If we rank these words in order of frequency of occurrence in the corpus, we find the familiar Zipf's law plots seen in Figure 4, in which the log of each word's frequency falls almost linearly with the log of its position in the frequency ordered word list.
This reflects a common feature of natural languages whereby there are very few very high frequency words and very many very low frequency words.

Figure 3. Performance on the four tasks as a function of window size and shape for two representative vector types and distance measures.
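The rank-frequency data behind such Zipf plots can be computed directly from the token list; a minimal sketch:

```python
from collections import Counter

def zipf_curve(tokens):
    """Frequency of each word type against its rank in the frequency
    ordered word list; plotting log(frequency) against log(rank)
    gives the near-linear Zipf plots described above."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))   # (rank, frequency) pairs
```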
Good estimates of the probabilities that make up our semantic vector components will require reasonably high frequencies for both the target and context words. In the same way that we earlier saw that low frequency target words had less reliable vectors, it is likely that the components corresponding to low frequency context words will also be unreliable, and if we use a distance measure (such as Euclidean) which treats all the components equally, this could result in poor performance. A straightforward way to test this is to order the vector components according to the context word frequencies and see how the performance varies as we reduce the vector dimensionality by removing the lowest frequency components. Although this will remove the least reliable components, it also means that the probabilities will no longer sum to one, and we may be removing useful information from the distance measure. This is a trade-off that will clearly need empirical investigation.
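A minimal sketch of this frequency-ordered truncation, assuming the vectors are held in a targets-by-contexts matrix:

```python
import numpy as np

def truncate_components(V, context_freqs, k):
    """Keep only the k components corresponding to the highest
    frequency context words. V: (targets x contexts) matrix of
    vector components; context_freqs: frequency f(c) of each
    context word, in column order."""
    order = np.argsort(context_freqs)[::-1]   # most frequent first
    return V[:, order[:k]]
```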
Figure 5 shows how the performance on our four tasks depends on the number of components used for Positive PMI Cosine for window size one. It also shows the effect of treating the left and right context words separately to give four different rectangular window types: a window of one word to the left of the target (L), a window of one word to the right (R), a window consisting of one word on the left and one on the right (L+R), and a double length vector containing separate left and right window components (L&R). The general trend here is that the more components we use, the better, and that the L&R style vectors work best (though for the semantic tasks, only slightly better than L+R). For the TOEFL task, which contains a number of rather low frequency words, we do find a slight falloff in performance beyond around 10,000 components, but for the other tasks we are still seeing improvements at 100,000 components. Such a pattern is not general, however. For less efficient component types and/or distance measures, the performance can fall off drastically if we use too many lower frequency components. For example, Figure 6 shows this clearly for the Euclidean distance measure with the Positive PMI components. This is more like the dependence on vector dimension found in the work of Landauer and Dumais (1997), though the peak here is around 1,000 dimensions of raw co-occurrence data rather than 300 dimensions derived using SVD.
There are other ways in which one might reasonably attempt to improve performance by reducing the number of vector components, and we have looked at some of these in more detail elsewhere (Levy & Bullinaria, 2001). First, it is common practice in the information retrieval literature to exclude a "stop list" of closed class and other presumed uninformative words from consideration as context dimensions (Manning & Schütze, 1999). We have found that this practice actually results in a significant reduction in performance, and should thus be avoided. The utility of closed class or grammatical words can be estimated by looking at scores for the first 150 or so dimensions corresponding to the highest frequency words in English, as these are largely those that would be excluded by use of a stop list. We can see in Figure 5 that these words alone are able to achieve a TOEFL score of around 65%. Another idea is to order and truncate the context words according to the variance of their components across all the target words in the corpus (Lund & Burgess, 1996), rather than by frequency. We found there to be such a strong correlation between such variance and word frequency anyway, that this approach gives very similar results to the frequency ordering, and so one might just as well use the frequency ordering and avoid the need to compute the variances.
Our results here have obvious implications for neural and psychological model building. Methods such as Positive PMI Cosine automatically make good use of the less statistically reliable dimensions corresponding to the lower frequency context words, and thus obviate the need for any dimensional reduction or other manipulations of the raw vector space. However, if there exist implementational reasons (e.g., related to neural or cognitive complexity) for using other methods, for which detrimental effects can arise from the lower frequency context words, then these will clearly need to be addressed by incorporating additional mechanisms.
Figure 4. Zipf's law plots of log word frequency against log of word position in a frequency ordered list, for the untagged and tagged versions of the BNC corpus.
WehavealreadyseenthisexplicitlyinFigure2.
Wealsoknowthatourfullcorpussizeismorethanmostchildrenwillexperi-ence,andsoifoneneedsacorpusthislargeforthelearningoflexicalsemanticinformation,thissimplemethodalonewillnotbeadequatetoaccountforhumanperformance.
Fortunately,itisstraightforwardtoexploretheeffectofcor-pussizebyslicinguptheBNCcorpusintodisjointsubsetsofvarioussizesandrepeatingtheaboveexperiments.
Figure 7 shows how the best performance levels fall for our four tasks as we reduce the corpus size. Note the logarithmic scale, and that even for corpora of around 90 million words, the TOEFL and semantic clustering results are still clearly improving with increased corpus size. The distance and syntactic clustering tasks are close to ceiling performance at 90 million words. Human children will be lucky to experience 10 million words, and performance on all the semantic tasks deteriorates significantly when the corpus size is reduced that much. With 4.6 million words from the BNC corpus, the performance on the TOEFL task is 60.4% ± 4.4%, compared with the 64.4% obtained in the Landauer and Dumais (1997) study using a different corpus of that size.

We have seen that, using the Positive PMI Cosine method, performance increases as the corpora get larger, and that the semantic clustering and syntactic clustering appear to be particularly sensitive to small corpus sizes. This demonstrates how investigations such as ours can constrain neural and cognitive model building, in that we may find the performance is unrealistically low with realistic levels of learning material. It could be that this indicates the need to incorporate more powerful statistical inductive techniques, such as the dimensionality reduction used in LSA (Landauer & Dumais, 1997).
Corpus Quality

Another factor that will surely affect the quality of the emergent semantic representations is the quality of the corpus they are derived from. We have already seen, in Figure 7, a large variance in results from distinct subsections of the BNC corpus. Some of this variance is due to the statistical variations evident in Figure 2, but much is due to quality issues. For example, the BNC corpus is designed to represent a range of different sources (Aston & Burnard, 1998), which results in good vectors for the corpus as a whole, but it also results in some subsections having unusual word frequency distributions, and others with significant portions of nonstandard English (such as having "picture windows" written as "pitcher winders"), both of which will result in poor vectors from those sections. We need to look more carefully at the effect of poor quality corpora, and test the intuition that increased quantity could be used to compensate for poor quality.

Figure 5. Performance on the four tasks, for four different rectangular window types, as a function of the number of frequency ordered vector dimensions, for Positive PMI components and the Cosine distance measure.
A ready source of "poor quality English" is provided by Internet-based newsgroups, and so we created a 168 million word corpus from a random selection of such messages on a particular day in 1997. We did this by downloading the raw files and removing duplicate messages, file headers, nontext segments, and punctuation, to leave a simple word list in the same format as our de-tagged BNC corpus. We could then repeat the experiments carried out on the BNC corpus. The lack of tags precluded using the syntactic clustering test, and there was insufficient usage of too many TOEFL words to give reliable results for that test. Figure 8 shows the results on the semantic clustering and distance comparison tests for various sized newsgroup corpora, compared with corresponding BNC subsets. At all corpus sizes we see a massive reduction in performance, and increase in variability, for the newsgroup corpora, and the increase in quantity required to achieve comparable performance levels is considerable. This dependence on corpus quality will naturally have enormous consequences for modeling human performance. It is clearly not sufficient to match the quantity of language experienced between human and model; one has to match the quality too.
Results for Smaller Corpora

The reduced performance and increased variability found for small corpora leads us to consider whether the general trends observed above still hold for much smaller corpora. Landauer and Dumais (1997) used a 4.6 million word corpus derived from the electronic version of Grolier's Academic American Encyclopedia. This is likely to be a more representative corpus than the similar sized random subsections of the BNC corpus used for Figure 7, and should thus be of better "quality" in the sense discussed above. We therefore used that corpus to repeat the main semantic task experiments presented earlier. The lack of tagging precludes using it for the syntactic task, but the variation in that case across BNC subcorpora is relatively small, so a typical BNC subset of the same size was used instead for that task.
Figure 6. Performance on the four tasks, for four different rectangular window types, as a function of the number of frequency ordered vector dimensions, for Positive PMI components and the Euclidean distance measure.

Figure 7. The best performance on each task as a function of corpus size for Positive PMI components and Cosine distance measure. The error bars show the variation over different subcorpora from the full BNC corpus.
Figure 9 shows the histograms of best performance for each vector type and distance measure, for comparison with Figure 1. We do see changes in the orderings, but for the semantic tasks Positive PMI Cosine is still the clear best performer, and Ratios Cosine is still second best. For the syntactic clustering, Ratios Cosine is again the best approach. Comparison with Figure 7 shows that the Grolier corpus does give us much better performance than similar sized BNC subcorpora: 72.5%, compared with 60.4% ± 4.4%, and the 64.4% obtained in the Landauer and Dumais (1997) study. This confirms how the quality of the corpus, as well as the computational method, affects the results, and it is gratifying that a more psychologically realistic corpus shows better performance.
In Figure 10 we summarize the main effects of window size and vector dimensions for the Positive PMI Cosine case, for comparison with the results in Figures 3 and 5. For syntactic clustering the performance falls off sharply with window size, as for the full BNC corpus, but for the semantic tasks the dependence is more variable. For the distance comparison task, the dependence is still rather flat, but there is a clearer peak at window size two. For semantic clustering the dependence is again rather flat, and the peak has shifted to around window size eight. For the TOEFL task, window size two is now best, with a fairly sharp falloff for larger windows. As far as the number of vector components go, we get similar patterns here to those found for the larger BNC corpus, with a general trend for more components being better, except for very large numbers of components for the TOEFL task. These results demonstrate that, although some of our optimal details (such as vector type and distance measure) are robust across different conditions, others (such as window size) do vary depending on factors such as corpus size, quality of corpus, and nature of the task. Although the main variations are understandable from a theoretical point of view (e.g., for smaller corpora, larger windows provide larger word counts and thus reduce the statistical unreliability of the vector components), they do have obvious implications for building models of human performance.
Discussion and Conclusions

The computational experiments that we have reported in this article provide further confirmation that useful information about lexical semantics can be extracted from simple co-occurrence statistics using straightforward distance metrics. The technological implications are clear and have already been demonstrated elsewhere, namely that there is a great deal of information available for the taking, and that it may be useful for many applications, such as word sense disambiguation and information retrieval. However, our focus here has been on the use of such an underlying framework in psychological theorizing. Here as well, previous studies have shown numerous potential applications. Nevertheless, we would argue that it is useful to step back before any particular methodology becomes favored or fashionable, and fully explore the available parameter space. We have presented here a more detailed and systematic exploration of the relevant parameters and design details than is evident in previous studies.
Our experiments have demonstrated that a simple method based on vectors with components that are the positive pointwise mutual information (PMI) between the target words and words within a small context window, and distances computed using a standard cosine, is remarkably effective on our three benchmark semantic tasks and one syntactic task. Small windows are found to be the most effective, closed class words do provide useful information, low frequency words do add useful information for most tasks, and corpus size and quality are important factors. We note also that for our best performing co-occurrence statistics, dimensionality reduction is not necessary to produce some excellent results. A prime example is our analysis of the TOEFL task where, for a 90 million word corpus, we achieve a best performance of 85%, and show exactly how the performance falls off (but is still useful) as we vary the parameters away from the best values we have found. Once we settle on the best approach we have found, namely Positive PMI components and Cosine distances, the optimal parameter values are fairly robust across different tasks and corpora, but for other approaches, the results appear to be much more variable.
We have limited our experiments to the simplest manipulations, preferring to understand these before committing ourselves to more complex assumptions. This means that this work is entirely methodological, and need not in itself contradict the conclusions drawn by models of psychological phenomena that have already been developed, such as the Landauer and Dumais (1997) model of children's word meaning acquisition from text input at school. Rather, we are claiming that it is important to fully understand how variations in parameters and design affect the success of the method, so that the details of a particular model can be fully justified. For example, window size might mirror the constraint of a working memory component, and corpus size and quality may constrain how realistic a source corpus must be for training a model so that it accurately mirrors genuine human experience. For model and theory building in psychology and cognitive science, knowledge about optimal parameter values is undoubtedly useful, but need not be totally constraining. What is important is that we understand how parameters that are constrained by our knowledge of neural or cognitive systems, such as the nature of language experience, working memory capacity, or the learning algorithms that underlie the computation of co-occurrence or pairwise distance computations, might affect the efficiency of lexical information induction.
We hope, now that the simplest forms of extracting semantic information from co-occurrence patterns have been systematically studied, that the methodology can be extended to include constraints from further sources of knowledge. It is likely that, if co-occurrence patterns are used as a source of information for inducing lexical semantic constraints, then knowledge of syntax and morphology is also used. This would mean that the computational experiments we have outlined here could be extended to explore the effects of lemmatizing or stemming (reducing the forms of words to their basic forms so that walk, walking, walks, and walked would all be counted as instances of walk), or that the part of speech of a word that can be induced from parsing a sentence could be used to count the different syntactic usages of a word separately (e.g., bank as a noun or verb). Extra information from perception could also be included, either as something for a word to co-occur with (Landauer & Dumais, 1997), or as a completely separate source of information that is combined with simple lexical co-occurrence in order to learn more flexibly and perhaps to ground the representations induced from simple co-occurrence. Cue combination is claimed to be necessary to solve problems that appear to be simpler than the learning of meaning, for example, the learning of word segmentation (Christiansen, Allen, & Seidenberg, 1998), and it would appear likely that multiple sources of information are required for learning about meaning.

Figure 8. The best performance on each task as a function of corpus size and quality for Positive PMI components and Cosine distance measure. The error bars show the variation over different subcorpora.
A final suggestion for extending what co-occurrence patterns might account for is to take advantage of the fact that not all learning is unsupervised. Humans do more than process streams of word co-occurrences; they are also taught word meanings, use dictionaries, and learn from many other sources of information. Learning algorithms in the neural network literature tend to be either supervised or unsupervised, but these methods can be combined (O'Reilly, 1998). For example, an unsupervised self-organizing map (SOM) can be refined using supervised learning vector quantization (LVQ) methods (Kohonen, 1997). In the same way, we can refine our basic corpus derived representations described above by any number of supervised techniques. One simple approach could be to define a total distance measure D between members of sets of synonyms S = {s_i} with vector components v(s_i, c),

$$D = \sum_{S} \sum_{s_i, s_j \in S} \sum_c \left[ v(s_i,c) - v(s_j,c) \right]^2,$$

and then use a standard gradient descent procedure (Bishop, 1995) to reduce that distance; that is, update the vectors using

$$\Delta v(s_i,c) = -h\, \frac{\partial D}{\partial v(s_i,c)}$$

for a suitable step size h. A similar approach could be used to minimize any well-defined measure of performance error. Making sure that measure is sufficiently representative, and that the step size is not too disruptive, will not be easy, and in practice, quite sophisticated variations on this theme are likely to be required, but this is an aspect of this field that will certainly be worth pursuing in future.
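As an illustration, a minimal sketch of this gradient descent refinement; the analytic gradient follows directly from the definition of D above, while the step size and iteration count are arbitrary illustrative values:

```python
import numpy as np

def refine_synonyms(vectors, synonym_sets, h=0.01, steps=100):
    """Gradient descent on D = sum over synonym sets S of
    sum_{s_i, s_j in S} sum_c [v(s_i,c) - v(s_j,c)]^2, which pulls
    the vectors of listed synonyms toward one another."""
    for _ in range(steps):
        for S in synonym_sets:
            V = np.array([vectors[w] for w in S])
            # dD/dv(s_i,c) = 4 * sum_j [v(s_i,c) - v(s_j,c)]
            grad = 4.0 * (len(S) * V - V.sum(axis=0))
            V = V - h * grad
            for w, row in zip(S, V):
                vectors[w] = row
    return vectors
```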
Figure 9. The best performance on the four tasks for each of the vector types and distance measures, for the smaller 4.6 million word Grolier corpus.

AUTHOR NOTE

We thank Malti Patel for earlier collaborations, Will Lowe for numerous helpful discussions on corpus analysis, Tom Landauer for arranging access to the TOEFL materials, Macquarie University and the University of Birmingham for visiting appointments, and the universities that have employed us while working on this project: University of Edinburgh, Birkbeck College at the University of London, University of Reading, University of Greenwich, University of Birmingham, and Roehampton University.
Correspondence concerning this article should be addressed to J. A. Bullinaria, School of Computer Science, University of Birmingham, Birmingham B15 2TT, England (e-mail: j.a.bullinaria@cs.bham.ac.uk).

Figure 10. Dependence on window size and number of frequency ordered vector dimensions for the four tasks, for Positive PMI components and Cosine distance measure, for the smaller 4.6 million word Grolier corpus.
REFERENCES
Aston, G., & Burnard, L. (1998). The BNC handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
Audet, C., & Burgess, C. (1999). Using a high-dimensional memory model to evaluate the properties of abstract and concrete words. Proceedings of the Twenty-First Annual Conference of the Cognitive Science Society (pp. 37-42). Mahwah, NJ: Erlbaum.
Battig, W. F., & Montague, W. E. (1969). Category norms for verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology, 80(3, Pt. 2), 1-46.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.
Bullinaria, J. A., & Huckle, C. C. (1997). Modelling lexical decision using corpus derived semantic representations in a connectionist network. In J. A. Bullinaria, D. W. Glasspool, & G. Houghton (Eds.), Fourth Neural Computation and Psychology Workshop: Connectionist representations (pp. 213-226). London: Springer.
Burgess, C. (2000). Theory and operational definitions in computational memory models: A response to Glenberg and Robertson. Journal of Memory & Language, 43, 402-408.
Burgess, C. (2001). Representing and resolving semantic ambiguity: A contribution from high-dimensional memory modeling. In D. S. Gorfein (Ed.), On the consequences of meaning selection: Perspectives on resolving lexical ambiguity. Washington, DC: American Psychological Association.
Burgess, C., & Conley, P. (1999). Representing proper names and objects in a common semantic space: A computational model. Brain & Cognition, 40, 67-70.
Christiansen, M. H., Allen, J., & Seidenberg, M. S. (1998). Learning to segment speech using multiple cues: A connectionist model. Language & Cognitive Processes, 13, 221-268.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information and lexicography. Computational Linguistics, 16, 22-29.
Conley, P., Burgess, C., & Glosser, G. (2001). Age and Alzheimer's: A computational model of changes in representation. Brain & Cognition, 46, 86-90.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6), 391-407.
Denhière, G., & Lemaire, B. (2004). A computational model of children's semantic memory. In Proceedings of the Twenty-Sixth Annual Meeting of the Cognitive Science Society (pp. 297-302). Mahwah, NJ: Erlbaum.
Finch, S. P., & Chater, N. (1992). Bootstrapping syntactic categories. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society of America (pp. 820-825). Hillsdale, NJ: Erlbaum.
Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955. In Studies in linguistic analysis (pp. 1-32). Oxford: Philological Society. [Reprinted in F. R. Palmer (Ed.) (1968). Selected papers of J. R. Firth 1952–1959. London: Longman.]
French, R. M., & Labiouse, C. (2002). Four problems with extracting human semantics from large text corpora. Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society (pp. 316-322). Mahwah, NJ: Erlbaum.
Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning. Journal of Memory & Language, 43, 379-401.
Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335-346.
Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning Journal, 42, 177-196.
Hu, X., Cai, Z., Franceschetti, D., Graesser, A. C., & Ventura, M. (2005). Similarity between semantic spaces. In Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society (pp. 995-1000). Mahwah, NJ: Erlbaum.
Hu, X., Cai, Z., Franceschetti, D., Penumatsa, P., Graesser, A. C., Louwerse, M. M., McNamara, D. S., & TRG (2003). LSA: The first dimension and dimensional weighting. In Proceedings of the Twenty-Fifth Annual Conference of the Cognitive Science Society (pp. 1-6). Mahwah, NJ: Erlbaum.
Kintsch, W. (2000). Metaphor comprehension: A computational theory. Psychonomic Bulletin & Review, 7, 257-266.
Kintsch, W., & Bowles, A. R. (2002). Metaphor comprehension: What makes a metaphor difficult to understand? Metaphor & Symbol, 17, 249-262.
Kohonen, T. (1997). Self-organizing maps (2nd ed.). Berlin: Springer.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Letsche, T. A., & Berry, M. W. (1997). Large-scale information retrieval with Latent Semantic Indexing. Information Sciences—Applications, 100, 105-137.
Levy, J. P., & Bullinaria, J. A. (2001). Learning lexical properties from word usage patterns: Which context words should be used? In R. F. French & J. P. Sougne (Eds.), Connectionist models of learning, development and evolution: Proceedings of the Sixth Neural Computation and Psychology Workshop (pp. 273-282). London: Springer.
Levy, J. P., Bullinaria, J. A., & Patel, M. (1998). Explorations in the derivation of semantic representations from word co-occurrence statistics. South Pacific Journal of Psychology, 10, 99-111.
Lowe, W. (2001). Towards a theory of semantic space. In Proceedings of the Twenty-Third Annual Conference of the Cognitive Science Society (pp. 576-581). Mahwah, NJ: Erlbaum.
Lowe, W., & McDonald, S. (2000). The direct route: Mediated priming in semantic space. Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society (pp. 806-811). Mahwah, NJ: Erlbaum.
Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203-208.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
McDonald, S., & Lowe, W. (1998). Modelling functional priming and the associative boost. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 675-680). Mahwah, NJ: Erlbaum.
McDonald, S. A., & Shillcock, R. C. (2001). Rethinking the word frequency effect: The neglected role of distributional information in lexical processing. Language & Speech, 44, 295-323.
Miller, T. (2003). Essay assessment with latent semantic analysis. Journal of Educational Computing Research, 28, 2003.
Monaghan, P., Chater, N., & Christiansen, M. H. (2005). The differential role of phonological and distributional cues in grammatical categorization. Cognition, 96, 143-182.
O'Reilly, R. C. (1998). Six principles for biologically-based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455-462.
Patel, M., Bullinaria, J. A., & Levy, J. P. (1997). Extracting semantic representations from large text corpora. In J. A. Bullinaria, D. W. Glasspool, & G. Houghton (Eds.), Fourth Neural Computation and Psychology Workshop: Connectionist Representations (pp. 199-212). London: Springer.
Redington, M., Chater, N., & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22, 425-469.
Saussure, F. de (1916). Cours de linguistique générale. Paris: Payot.
Schütze, H. (1993). Word space. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing systems (Vol. 5, pp. 895-902). San Mateo, CA: Morgan Kaufmann.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24, 97-123.
Turney, P. D. (2001). Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. A. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Berlin: Springer.
Wolfe, M. B. W., & Goldman, S. R. (2003). Use of Latent Semantic Analysis for predicting psychological phenomena: Two issues and proposed solutions. Behavior Research Methods, Instruments, & Computers, 35, 22-31.
Zhu, H. (1997). Bayesian geometric theory of learning algorithms. In Proceedings of the International Conference on Neural Networks (ICNN'97), 2, 1041-1044.
(Manuscript received January 6, 2006; revision accepted for publication May 22, 2006.)