Searching for Common Sense: Populating Cyc from the Web

Cynthia Matuszek, Michael Witbrock, Robert C. Kahlert, John Cabral, Dave Schneider, Purvesh Shah, Doug Lenat
Cycorp, Inc.
3721 Executive Center Drive, Suite 100, Austin, TX 78731
{cynthia, witbrock, rck, jcabral, daves, shah, lenat}@cyc.com

Abstract

The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information: what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.

1 Introduction

The idea of building a very large-scale knowledge base that can be used as a foundation for automated knowledge acquisition has been present in artificial intelligence research for more than twenty years [Lenat et al., 1983]. In that time, an enormous amount of progress has been made [Thrun et al., 1998]; techniques developed under the umbrella of machine learning have been successfully applied to work ranging from robotics, to voice recognition, to bioinformatics. In all of these fields, the use of preexisting knowledge is widespread. Much of this work relies either on programming an inductive bias into a learning system (e.g., in systems like AM [Lenat, 1976]), or on providing an inductive bias in the form of training examples [Brown, 1996].

Also in that time, the Web has emerged as a huge repository of electronically available knowledge, and indexing systems such as Google have made that knowledge progressively more accessible [Brin and Page, 1998]. Work that relies on the web in general, and Google in particular, for information extraction is proving to be a fertile research area [Ghani, 2000; Kwok et al., 2001; Etzioni et al., 2004].

The purpose of the Cyc project is to provide computers with a store of formally represented "common sense": real world knowledge that can provide a basis for additional knowledge to be gathered and interpreted automatically [Lenat, 1995]. In the last twenty years, over three million facts and rules have been formally represented in the Cyc knowledge base by ontologists skilled in CycL, Cyc's formal representation language. Tools have been developed which allow subject matter experts to contribute directly [Panton et al., 2002; Witbrock et al., 2003; Belasco et al., 2004]. In addition, natural language generation and parsing capabilities have been developed to provide support for learning from English corpora [Witbrock et al., 2004]. As a result, the Cyc knowledge base now contains enough knowledge to support experimentation with the acquisition of additional knowledge via machine learning.

In this paper, we describe a method for gathering and verifying facts from the World Wide Web. The knowledge acquisition procedure is described at both an overview level and in detail. The work focuses on three novel approaches: using knowledge already in the Cyc KB to focus the acquisition of further knowledge; representing acquired knowledge in the knowledge base; and using Google in two distinct ways, to find facts and, separately, to verify them.

While this research is at an early stage, the initial results are promising in terms of both the acquisition speed and quality of results. Even in its preliminary form, the mechanism described is a useful tool for reducing the cost of manually entering knowledge into Cyc; the level of expertise required to enable a person to contribute to the KB is reduced, and many of the necessary steps are handled automatically, reducing the total time required. The number of sentences that can be acquired in this way and reviewed for accuracy by an untrained reviewer outstrips the rate at which sentences can be hand-authored by a trained ontologist, and the verification steps reduce the amount of work required of a human reviewer by approximately 90%.

2 Cyc and CycL

The Cyc system is made up of three distinct components, all of which are crucial to the machine learning process: the knowledge base (KB), the inference engine, and the natural language system. The Cyc KB contains more than 3.2 million assertions (facts and rules) describing more than 280,000 concepts, including more than 12,000 concept-interrelating predicates. Facts stored in the Cyc KB may be atomic (making them Ground Atomic Formulae, or GAFs), or they may be complex sentences. Depending on the predicate used, a GAF can describe instance-level or type-level knowledge. All information in the KB is asserted into a hierarchical graph of microtheories, or reasoning contexts [Guha, 1991; Lenat, 1998].

CycL queries are syntactically legal CycL sentences, which may be partially bound (that is, contain one or more variables). The Cyc inference engine is responsible for using information in the KB to determine the truth of a sentence and, if necessary, find provably correct variable bindings.

AAAI-05 / 1430

Sample instance-level GAF:
(foundingDate Cyc (YearFn 1985))

Sample entity-to-type and type-to-type GAFs:
(sellsProductType SaudiAramco PetroleumProduct)
(conditionAffectsPartType CutaneousAnthrax Skin)

Sample non-atomic sentence:
(or (foundingDate AlQaida (YearFn 1987))
    (foundingDate AlQaida (YearFn 1988)))

Sample query:
(foundingAgent PalestineIslamicJihad WHO)

The natural language component of the system consists of a lexicon, and parsing and generation subsystems. The lexicon is a component of the knowledge base that maps words and phrases to Cyc concepts, while various parsers provide methods for translating English text into CycL. The system also has a relatively complete ability to render CycL sentences into English, although the phrasing can be somewhat stilted when longer sentences are generated.

The work described in this paper targets the automatic acquisition of GAFs. Simple facts are more likely to be readily found on the web, and this approach minimizes difficulties in generating and parsing complex natural language constructs.

2.1 Overview of the Learning Cycle

Gathering information from the web proceeds in six stages, as illustrated in Figure 1:

1. Choosing a query: Because the number of concepts in the KB is so large, the number of possible CycL queries is enormous; choosing interesting, productive queries automatically is a necessary step in automating the knowledge acquisition process. An example of such a query might be:
(foundingAgent PalestineIslamicJihad WHO)

2. Searching: Once a query is selected, it is translated into one or more English search strings. The query above might be rendered into strings such as:
"PIJ, founded by"
"Palestine Islamic Jihad founder"
These strings are passed on to the Google API. The appropriate sections of any resulting documents are downloaded, and the relevant section is extracted (e.g., "PIJ founder Bashir Musa Mohammed Nafi is still at large…").

3. Parsing results: The relevant components of sentences are identified by their location relative to the search string. The terms are then parsed into CycL via the natural language parsing process described in section 3.3, resulting in one or more GAFs such as:
(foundingAgent PalestineIslamicJihad Terrorist-Nafi)

4. KB consistency checking: Some of the results retrieved during the search process are disprovable, because they are inconsistent with knowledge already present in the knowledge base; others are already known or trivially provable, and therefore redundant. Any GAF found via inference to be inconsistent or redundant is discarded.

5. Google verification: During search, the largest possible set of candidate GAFs is created. Those GAFs that are not discarded during KB consistency checking are re-rendered into English search strings, such as:
"Bashir Nafi is a founder of Palestine Islamic Jihad"
and a second Google search over those strings is performed. Any GAF that results in no retrieved documents during this phase is discarded.

6. Reviewing and asserting: The remaining GAFs are asserted into special hypothetical contexts in the knowledge base. An ontologist or human volunteer reviews them for accuracy, using a tool specific to that task [Witbrock et al., 2005], and the ones found to be correct are asserted into the knowledge base.

Figure 1: Learning is a process of selecting interesting questions, searching for that information on the web, parsing the results, performing verification and consistency checks with the document corpus and the KB, reviewing, and asserting that knowledge into the KB.

3 Implementation of the Learning Cycle

3.1 Selecting Queries

While it is often useful in an application context to look for the answer to a specific question, or automatically populate a class of information (such as founders of groups, or prime ministers of countries), satisfying the ultimate goal of populating the Cyc KB via machine learning relies in part on automatically selecting suitable sentences. There are a number of challenges that must be met in this regard. Queries should have reasonable probability of having interesting bindings and of being findable in the corpus (in this case, on the web). Some searches are unlikely to be productive for semantic reasons: some argument positions are of an infinite or continuous type, such as Time-Quantity, and could therefore yield an infinite number of mostly-uninteresting searches, such as (age OBJECT (YearsDuration 300)). Other queries are guaranteed to be redundant.

We initially limited searches to a set of 134 binary predicates which, when used to generate search strings, tended to maximize useful results from web searches.¹ The algorithm for proceeding through those predicates was as follows:

For a given search run, a depth of $D$ is selected. $D$ is the maximum number of different values that can be used for each argument of a predicate. For each binary predicate $p_i$ in our test set $P$ (where $|P| = 134$), we retrieve from the KB the type constraints on each of its two arguments. Unless the type generalizes to an infinite class, we retrieve the $D$ most fully represented values from the knowledge base; that is, those that appear in the most assertions, and therefore about which the most is known. These are assumed to be the most interesting terms of that type, and therefore the ones most likely to be found by a web search.

For $p_i$ we then have types $T_{i1}$ and $T_{i2}$. The $D$ best represented values would be $(t_{i11} \ldots t_{i1D})$ and $(t_{i21} \ldots t_{i2D})$. If neither of a predicate's arguments took values of a continuous type, there would be $2D \cdot |P|$ queries generated:

$(p_1\ t_{111}\ \mathrm{VAR}) \ldots (p_1\ t_{11D}\ \mathrm{VAR})$
$(p_1\ \mathrm{VAR}\ t_{121}) \ldots (p_1\ \mathrm{VAR}\ t_{12D})$
$\ldots$
$(p_{|P|}\ t_{|P|11}\ \mathrm{VAR}) \ldots (p_{|P|}\ t_{|P|1D}\ \mathrm{VAR})$
$(p_{|P|}\ \mathrm{VAR}\ t_{|P|21}) \ldots (p_{|P|}\ \mathrm{VAR}\ t_{|P|2D})$

For example, a set of the predicates foundingAgent and foundingDate, given a depth of 1, would produce three queries:

(foundingAgent AlQaida WHO)
(foundingAgent WHAT Terrorist-Salamat)
(foundingDate AlQaida WHEN)

The fourth permutation is not produced, because the argument constraint, Date, is of a continuous type.
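
The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cyc's implementation: the tables `ARG_TYPES` and `ASSERTION_COUNTS`, the type names `Organization` and `Agent`, and every assertion count are hypothetical stand-ins for KB lookups; only the predicate and constant names come from the text.

```python
from typing import Dict, List, Tuple

Query = Tuple[str, str, str]

# Hypothetical argument type constraints for two binary predicates.
ARG_TYPES: Dict[str, Tuple[str, str]] = {
    "foundingAgent": ("Organization", "Agent"),
    "foundingDate": ("Organization", "Date"),
}

# Types treated as infinite or continuous, so never enumerated.
CONTINUOUS_TYPES = {"Date", "Time-Quantity"}

# Hypothetical number of KB assertions mentioning each term, keyed by type.
ASSERTION_COUNTS: Dict[str, Dict[str, int]] = {
    "Organization": {"AlQaida": 120, "PalestineIslamicJihad": 80},
    "Agent": {"Terrorist-Salamat": 40, "Terrorist-Nafi": 25},
}


def top_terms(type_name: str, depth: int) -> List[str]:
    """Return the `depth` terms of a type that appear in the most assertions."""
    counts = ASSERTION_COUNTS.get(type_name, {})
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return ranked[:depth]


def generate_queries(predicates: List[str], depth: int) -> List[Query]:
    """Bind one argument position at a time and leave the other as VAR."""
    queries: List[Query] = []
    for pred in predicates:
        type1, type2 = ARG_TYPES[pred]
        if type1 not in CONTINUOUS_TYPES:
            queries += [(pred, term, "VAR") for term in top_terms(type1, depth)]
        if type2 not in CONTINUOUS_TYPES:
            queries += [(pred, "VAR", term) for term in top_terms(type2, depth)]
    return queries
```

With these toy tables, `generate_queries(["foundingAgent", "foundingDate"], 1)` returns three queries, because the Date-typed position of foundingDate is skipped; when no argument type is continuous, each predicate contributes 2D queries.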

This approach is not without problems. It relies heavily on the nature of the type constraints placed on predicates; for some predicates, such as foundingDate, this works well, while for others the argument constraints are too broad. For example, the predicate sellsProductType takes a constant of type SomethingExisting in its second argument position, because almost anything can be sold. The proposed way to address this problem is with a predicate typicalArgIsa, which would connect predicates such as sellsProductType with the collections to which they typically refer (in this case, CommodityProduct).²

Another problem lies with the assumption that the members of a class about which Cyc knows the most are the most interesting ones, which is only sometimes correct. The best-described instances of the class Person, for example, tend to be the ontologists who work at Cycorp, rather than, for example, heads of state. Future work will involve using Google in the selection of appropriate queries. Rather than using the top D most supported terms in the KB of type T, it should be possible to retrieve the hit count for up to several thousand members of T, and seek information about the D terms of T for which the most information is available.

¹ Examples: foundingAgent, foundingDate, sellsProductType, primeMinister, lifeExpectancy, awardWinners. Productive predicates were found via manual trial and error, from a set of domains selected to span a broad portion of the KB (terrorism, medical technology, conceptual works, global politics, family relationships, and sales).

3.2 Search

Generating Search Strings

Once the system has selected a query, it generates a series of search strings. The existing NL generation machinery is applied to 233 manually created special generation templates for the 134 predicates.³ Several factors motivated the construction of specialized search generation templates within the NL system. In addition to the fact that productive search strings are often unlike standard English, CycL generations tend to be somewhat stilted, since ontologists have preferred unambiguous expression of CycL meanings over naturalness. In addition, the KB generally contains one or two generation templates for any given predicate, while there may be many common ways of expressing the information that may be useful for searching.

For example, for the query (foundingAgent PalestineIslamicJihad X), the system generates the following set of search strings:

Palestine Islamic Jihad founder ____
Palestinian Islamic Jihad founder ____
PIJ founder ____
Palestine Islamic Jihad, founded by ____
Palestinian Islamic Jihad, founded by ____
PIJ, founded by ____

All possible searches were generated from the Cartesian product of the generation templates with all English renderings of the arguments. In this example, Cyc knows three names for Palestine Islamic Jihad, and has two templates for foundingAgent, resulting in six search strings.

To simplify match detection, argument positions were only allowed at the beginning or end of the template string. In order to carry out the actual search, the "____" placeholders were stripped off, and the remaining string was submitted as a quoted string to Google.

² In most cases, these constraints will be learned from analysis of common usage in the existing corpus.

³ Slightly fewer than half of the predicates have only one associated search template. Many obvious templates exist but have not been represented; in future work, an analysis of the search strings will be performed to determine what types of strings produce good results. That information will be used for automatically generating search strings for other predicates.
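
As a rough sketch of the search-string construction described in this subsection, the Cartesian product and the placeholder stripping might look as follows. The `{arg}` template syntax and the helper names are assumptions made for this illustration; the paper does not describe how the generation templates are stored.

```python
from itertools import product
from typing import List

PLACEHOLDER = "____"

# Two hypothetical templates for foundingAgent; "{arg}" marks the bound argument.
TEMPLATES = ["{arg} founder " + PLACEHOLDER, "{arg}, founded by " + PLACEHOLDER]

# The three English names the text says Cyc knows for PalestineIslamicJihad.
NAMES = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad", "PIJ"]


def search_strings(templates: List[str], names: List[str]) -> List[str]:
    """Cartesian product of generation templates and argument renderings."""
    return [template.format(arg=name) for template, name in product(templates, names)]


def to_quoted_query(search_string: str) -> str:
    """Strip the placeholder and wrap the remainder in quotes for exact matching."""
    return '"' + search_string.replace(PLACEHOLDER, "").strip() + '"'
```

Calling `search_strings(TEMPLATES, NAMES)` produces the six strings listed above in the same order, and `to_quoted_query("PIJ, founded by ____")` gives `"PIJ, founded by"`.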

Searching via Google

The search string is used with an interface to the Google API to retrieve a tuple consisting of the URL, the Google ranking, the match position and the web page text, which is then handed off to a parser that attempts to convert it into a meaningful CycL entity.

3.3 Parsing into CycL

Once a document is retrieved, the answer must be found and interpreted. First, the system searches for the exact text of the query string, and returns the search string plus either the beginning or the end of the sentence, depending on the position of the "____" in the generated string. The resulting string is searched for phrases that can be interpreted as a CycL concept that meets the type constraints imposed by the predicate. For example, in the string "PIJ founder Bashir Musa Mohammed Nafi is still…", "Bashir Musa Mohammed Nafi" is recognized as a person, and therefore as a candidate for the arg2 position of foundingAgent.

For predicates that require strings (such as nameString, which relates an entity to its name), a named entity recognizer [Prager et al., 2000] is used to find a suitable candidate. In other cases, standard parsing techniques are used to try to find a useful interpretation, including looking up strings in the Cyc lexicon, interpreting them as noun compounds or dates, and compositional interpretation. For speed, compositional interpretation is only attempted for terms judged to be constituents by a probabilistic CFG parser [Charniak, 2001]. This has the effect of eliminating a few correct answers (mostly where the parser produces an incorrect syntactic parse), but also decreases the total time spent on analysis by at least 50%.
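
The first step above, locating the search string and keeping the adjacent part of the sentence, can be illustrated with a small sketch. The sentence-boundary rule here (any of `.`, `!`, `?`, `…` ends a sentence) is a deliberately naive assumption and would mis-split text containing abbreviations; the function name and signature are hypothetical.

```python
import re
from typing import Optional

# Naive assumption: any of these characters marks a sentence boundary.
_SENTENCE_END = re.compile(r"[.!?…]")


def extract_span(text: str, search_string: str,
                 placeholder_at_end: bool = True) -> Optional[str]:
    """Return the search string plus the rest (or the start) of its sentence.

    If the template's placeholder was at the end, the answer follows the
    search string, so the span runs to the end of the sentence; otherwise
    the answer precedes it, so the span starts at the sentence beginning.
    """
    start = text.find(search_string)
    if start == -1:
        return None
    end = start + len(search_string)
    if placeholder_at_end:
        match = _SENTENCE_END.search(text, end)
        stop = match.start() if match else len(text)
        return text[start:stop].strip()
    boundaries = [m.end() for m in _SENTENCE_END.finditer(text, 0, start)]
    begin = boundaries[-1] if boundaries else 0
    return text[begin:end].strip()
```

For the example in the text, searching a page for `PIJ founder` with the placeholder at the end keeps the span beginning "PIJ founder Bashir Musa Mohammed Nafi is still", which is then handed to the term recognizers.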

Creating Candidate CycL Sentences

The result of parsing the matched section in the web page is a set of candidate CycL terms, usually constants such as Terrorist-Nafi. Substituting these terms into the original incomplete CycL query produces a set of candidate GAFs.
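
The substitution step is simple enough to show directly. In this sketch a CycL sentence is represented as a flat tuple of tokens, which is an assumption of the example and only suits binary GAFs without nested terms.

```python
from typing import List, Tuple

Sentence = Tuple[str, ...]


def candidate_gafs(query: Sentence, variable: str, terms: List[str]) -> List[Sentence]:
    """Replace the query's variable with each candidate term in turn."""
    return [tuple(term if token == variable else token for token in query)
            for term in terms]


def to_cycl(sentence: Sentence) -> str:
    """Render a flat token tuple in parenthesized CycL-style notation."""
    return "(" + " ".join(sentence) + ")"
```

Substituting `Terrorist-Nafi` for `WHO` in `("foundingAgent", "PalestineIslamicJihad", "WHO")` and rendering the result gives `(foundingAgent PalestineIslamicJihad Terrorist-Nafi)`.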

3.4 Checking Cyc KB Consistency

In principle, any useful fact added to the system should be neither disprovable nor trivially provable. For example, the sentence:

(foundingAgent PalestineIslamicJihad Terrorist-AlShikaki)

is not novel, because Cyc already knows this. Meanwhile,

(foundingAgent PalestineIslamicJihad AugusteRodin)

is novel, but trivially disprovable, as Cyc knows he died in 1917 (72 years before the PIJ was founded).

If the Cyc KB already contains knowledge that renders a new fact redundant or contradictory, that fact will be discarded. This is checked via inference; each new fact is treated as a query, and inference is performed to determine whether it can be proven false (inconsistent) or true (redundant). Cyc provides justifications for facts used to produce query results, as in Figure 2, and it is helpful for redundant facts to be marked as additionally confirmed via search-based learning. Previous work suggests that one-step inference is sufficient to identify duplicate information or disprove contradictions during typical knowledge acquisition tasks [Panton et al., 2002].

Figure 2: Justifications for the claim that a 2001 attack on Ankara meets the criteria for the query being run.
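
The three-way outcome of this check (redundant, inconsistent, or novel) can be illustrated with a toy stand-in for inference. The sketch below assumes a set lookup in place of a proof of truth, and a single hypothetical irreflexivity rule in place of a proof of falsehood; Cyc's inference engine is far more general, and the paper does not say which rules rejected any particular GAF.

```python
from typing import List, Set, Tuple

GAF = Tuple[str, str, str]

# Toy knowledge base: facts assumed to be already asserted.
KNOWN_FACTS: Set[GAF] = {
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki"),
}

# Hypothetical rule: these predicates can never relate a term to itself.
IRREFLEXIVE_PREDICATES = {"northOf"}


def classify(gaf: GAF) -> str:
    """Label a candidate GAF as 'redundant', 'inconsistent', or 'novel'."""
    if gaf in KNOWN_FACTS:
        return "redundant"          # provably true already
    predicate, arg1, arg2 = gaf
    if predicate in IRREFLEXIVE_PREDICATES and arg1 == arg2:
        return "inconsistent"       # provably false under the toy rule
    return "novel"


def keep_novel(candidates: List[GAF]) -> List[GAF]:
    """Discard redundant and inconsistent candidates before verification."""
    return [gaf for gaf in candidates if classify(gaf) == "novel"]
```

Only candidates labelled novel would move on to the Google verification step.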

3.5 Google Verification

In order to guard against parser errors and excessively general search terms (such as ambiguous acronyms), a second Google search is performed, in order to determine whether search strings generated from the new GAF will produce results. Search strings are generated from the candidate GAF that was learned, but any string containing an acronym or abbreviation (e.g., "PIJ" for "Palestinian Islamic Jihad") is supplemented with the disambiguation term: the least common word (based on Google hit counts) of the expanded acronym. In this case, the string "Palestine" is added as a term, since it is the least common word in the set 'Palestine,' 'Palestinian,' 'Islamic,' and 'Jihad.' The resulting verification search string is:

"PIJ founder Bashir Nafi" + "Palestine"

Any fact for which this verification step returns no results is considered unverified, and will not be presented to a reviewer.
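
A minimal sketch of the disambiguation step follows. The hit counts are invented relative values chosen only so that "Palestine" is the rarest word, as the text reports; a real run would obtain them from the search engine, and the function names are hypothetical.

```python
from typing import Dict, Iterable

# Invented relative hit counts, for illustration only.
HYPOTHETICAL_HITS: Dict[str, int] = {
    "Palestine": 1, "Palestinian": 2, "Jihad": 3, "Islamic": 4,
}


def disambiguation_term(expansions: Iterable[str], hit_counts: Dict[str, int]) -> str:
    """Pick the least common word among all expansions of the acronym."""
    words = sorted({word for phrase in expansions for word in phrase.split()})
    return min(words, key=lambda word: hit_counts[word])


def verification_query(search_string: str, expansions: Iterable[str],
                       hit_counts: Dict[str, int]) -> str:
    """Quote the rendered GAF and append the quoted disambiguation term."""
    term = disambiguation_term(expansions, hit_counts)
    return f'"{search_string}" + "{term}"'


def is_verified(result_count: int) -> bool:
    """A GAF counts as verified only if the second search returns something."""
    return result_count > 0
```

Given the expansions "Palestine Islamic Jihad" and "Palestinian Islamic Jihad", `verification_query("PIJ founder Bashir Nafi", ...)` reproduces the verification string shown above.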

3.6 Review and Assertion

In the final step, learned sentences are reviewed by a human curator, and, if correct, asserted into the Cyc KB. Currently, suggested sentences are presented to the reviewer in no particular order; in future, sorting methods will be implemented and tested. The most straightforward approaches involve making greater use of information already retrieved from Google: since information about the searches underlying a candidate sentence is stored in the KB, it should be possible and productive to give priority to sentences that are supported by a larger total number of documents. In effect, the numerical values returned during the verification step could be used to sort the most widely supported sentences upwards in the review process. Another possibility, if several contradictory facts are found, is giving review priority to those found in documents with the highest Google ranking.
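
The proposed ordering by document support is described only as future work, so the following is a speculative sketch of one way it could be realized. The `Candidate` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    gaf: str               # candidate sentence, rendered as text
    supporting_docs: int   # documents returned by the verification searches


def review_order(candidates: List[Candidate]) -> List[Candidate]:
    """Present the most widely supported sentences to the reviewer first."""
    return sorted(candidates, key=lambda c: c.supporting_docs, reverse=True)
```

Because Python's sort is stable, candidates with equal support keep their original relative order.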

4 Results

Statistics were gathered for a case in which 134 predicates in P were used, and D was set to 20.⁴ The majority of the searches expended, about 80%, were performed in the verification phase rather than the initial search phase. The results were as follows:

Queries: 348
Searches expended: 4290 (817 initial, 3477 verification)
GAFs found: 1016
… and rejected due to KB inconsistency: 4
… and already known to the KB: 384
… and rejected by Google verification: 566
Novel GAFs found and verified: 61

A human reviewer then went through the verified GAFs, and a sample of 53 of the unverified GAFs, and determined their actual correctness rate. The results were as follows:

                     Verified    Unverified
True (correct)       32          8**
False (incorrect)    29*         45

Total novel facts: 114
Novel, correct facts discovered: 77
Incorrect facts discovered: 37
Facts categorized correctly: 68%
Facts categorized incorrectly: 32%
… * false positives (false but verified): 25%
… ** false negatives (true but unverified): 7%

Examples of these result types:

Query: (#$hasBeliefSystems #$Iran X)
Search string: "Iran adheres to"
Candidate GAF: (#$hasBeliefSystems #$Iran #$Islam)
Verification search strings:
"Islamic Republic of Iran adheres to Islam"
"Iran adheres to Islam"
"Iran believes in Islam" (found)
"Islamic Republic of Iran believes in Islam" (found)

Example GAFs already known to the KB:
(#$vestedInterest #$Iran #$Iraq)
(#$inhabitantTypes #$Lebanon #$EthnicGroupOfKurds)

Example GAFs rejected due to KB inconsistency:
(#$northOf #$Iran #$Iran)
(#$geopoliticalSubdivision #$Iraq #$Iran)

Correct, verified GAF:
(foundingDate AfricanNationalCongress (YearFn 1912))
* Incorrect but verified GAF:
(foundingDate JewishDefenseLeague (DecadeFn 198))
Incorrect, rejected GAF:
(objectFoundInLocation KuKluxKlan GillianAnderson)
** Correct but rejected GAF:
(foundingDate KarenNationalUnion (MonthFn April (YearFn 1947)))

⁴ It takes between four and five hours to exhaust an allotment of 3,000 searches per day through the Google API.

The verification step produces comparatively few false negatives (in which a true fact is incorrectly classified as false); in this run, 80% of the novel, correct facts retrieved were correctly identified as such. Given this, it is reasonable to reject all unverifiable sentences, especially given the wealth of possible queries and the size and breadth of the corpora available. Only 61% of the incorrect facts retrieved were identified, suggesting that substantial work in decreasing the occurrence of false positives will be necessary before the need for human review is eliminated; this is unsurprising, as the Internet contains large amounts of unstructured, unchecked information.
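
The percentages reported in this section can be rechecked from the review table. The sketch below assumes the table reads 32 true and 29 false among the verified GAFs, and 8 true and 45 false among the sampled unverified GAFs; that reading is an assumption, though it matches the stated totals of 61 and 53.

```python
from typing import Dict


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def summarize(verified_true: int, unverified_true: int,
              verified_false: int, unverified_false: int) -> Dict[str, int]:
    """Derive the summary figures from the four cells of the review table."""
    total = verified_true + unverified_true + verified_false + unverified_false
    true_total = verified_true + unverified_true
    false_total = verified_false + unverified_false
    return {
        "total": total,
        "categorized_correctly_pct": pct(verified_true + unverified_false, total),
        "categorized_incorrectly_pct": pct(verified_false + unverified_true, total),
        "false_positive_pct": pct(verified_false, total),
        "false_negative_pct": pct(unverified_true, total),
        "true_facts_verified_pct": pct(verified_true, true_total),
        "false_facts_rejected_pct": pct(unverified_false, false_total),
    }


SUMMARY = summarize(verified_true=32, unverified_true=8,
                    verified_false=29, unverified_false=45)
```

Under that reading, `SUMMARY` gives a total of 114, with 68% categorized correctly, 32% incorrectly, 25% false positives, 7% false negatives, 80% of true facts verified, and 61% of false facts rejected, in line with the figures quoted in the text.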

Slightly over a third of the GAFs discovered were facts that were already known to the KB, and presumably correct to a baseline level (i.e., the correctness level achieved by human ontologists); the total number of correct facts discovered was therefore 425, 42% of the total. Verification reduces the number of novel sentences that must undergo human review from 1016 to 61, and the human review process, which takes place entirely in English, is quick and straightforward. An intermediate step towards full automation would be to identify classes of sentences that can be asserted without human review.

5 Conclusions

While great strides have been made in machine learning in the last few decades, automatically gathering useful, consistent knowledge in a machine-usable form is still a relatively unexplored research area. The original promise of the Cyc project, to provide a basis of real-world knowledge sufficient to support the sort of learning from language of which humans are capable, has not yet been fulfilled. In that time, information has become enormously more accessible, due in no small part to the widespread popularity of the Web and to effective indexing systems such as Google. Making use of that repository requires a store of real-world knowledge and some facility for natural language parsing and generation.

These results, while extremely preliminary, are encouraging. In particular, using Cyc as a basis for learning is effective, both in guiding the learning process and in representing and using the results. Pre-existing knowledge in the KB supports the construction of meaningful queries and provides a framework into which learned knowledge can be asserted and reasoned over. Comparatively shallow natural language parsing combined with the type constraint and relation knowledge in the Cyc system allows the retrieval, verification, and review of unconstrained facts at a higher rate than that achieved by human knowledge representation experts working unassisted. Perhaps more importantly, the kind of knowledge retrieved is exactly the instance-level knowledge that should not require human experts; it should instead be obtained, maintained, and reasoned over by tools that need and use that knowledge. Involving Google in every stage of the learning process allows us to exploit both Cyc's knowledge and the knowledge on the web in an extremely natural way.

The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier and more effective, but it also provides a basis for analysis of what information can be learned effectively without human interaction. Thus, over time, we hope to provide Cyc with a mechanism to truly acquire knowledge by learning.

Acknowledgments

This research was partially sponsored by ARDA's AQUAINT program. Additionally, we thank Google for allowing access to their API for research such as this.

References

[Belasco et al., 2004] A. Belasco, J. Curtis, R. C. Kahlert, C. Klein, C. Mayans, R. Reagan. Representing Knowledge Gaps Effectively. In Proc. of the 5th International Conference on Practical Aspects of Knowledge Management, Vienna, Austria, pp 159-164. Dec 2004.

[Brin and Page, 1998] Sergey Brin and Larry Page. Anatomy of a Large-scale Hypertextual Search Engine. In Proc. of the 7th International World Wide Web Conference, pp 107-117, Brisbane, Australia, Apr 1998.

[Brown, 1996] R. D. Brown. Example-Based Machine Translation in the Pangloss System. In Proc. of the 16th International Conference on Computational Linguistics, pp 169-174. Copenhagen, Denmark, August 5-9, 1996.

[Charniak, 2001] E. Charniak. A Maximum-Entropy-Inspired Parser. In Proc. of the 1st conference on North American chapter of the Association for Computational Linguistics, pp 132-139. Seattle, WA, 2000. Morgan Kaufmann Publishers.

[Etzioni et al., 2004] O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. Weld, A. Yates. Web-scale Information Extraction in KnowItAll. In Proc. of the 13th international conference on World Wide Web, pp 100-110, New York, NY, 2004.

[Ghani, 2000] R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery. Data Mining on Symbolic Knowledge Extracted from the Web. In Proc. of the 6th International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pp 29-36, Boston, MA, Aug 2000.

[Guha, 1991] R. V. Guha. Contexts: A Formalization and Some Applications. PhD thesis, Stanford University, STAN-CS-91-1399-Thesis, 1991.

[Kwok et al., 2001] C. Kwok, O. Etzioni, D. Weld. Scaling Question Answering to the Web. In ACM Transactions on Information Systems, Vol 19, Issue 3, pp 242-262. 2001.

[Lenat, 1976] D. B. Lenat. AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search. Ph.D. Dissertation, Stanford University, STAN-CS-76-570, 1976.

[Lenat et al., 1983] D. B. Lenat, A. Borning, D. McDonald, C. Taylor, S. Weyer. Knoesphere: Building Expert Systems with Encyclopedic Knowledge. In Proc. of the 8th International Joint Conference on Artificial Intelligence, Vol 1, pp 167-169, Karlsruhe, Germany, August 1983.

[Lenat, 1995] D. B. Lenat. Cyc: a Large-Scale Investment in Knowledge Infrastructure. In Communications of the ACM, Vol 38, Issue 11, pp 33-38. Nov 1995.

[Lenat, 1998] D. B. Lenat. The Dimensions of Context-Space, from http://www.cyc.com/doc/context-space.pdf.

[Panton et al., 2002] K. Panton, P. Miraglia, N. Salay, R. C. Kahlert, D. Baxter, R. Reagan. Knowledge Formation and Dialogue Using the KRAKEN Toolset. In Proc. of the 18th National Conference on Artificial Intelligence, pp 900-905, Edmonton, Canada, 2002.

[Prager et al., 2000] J. Prager, E. Brown, A. Coden, D. Radev. Question Answering by Predictive Annotation. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 184-191. Athens, Greece, 2000.

[Thrun et al., 1998] S. Thrun, C. Faloutsos, T. Mitchell, L. Wasserman. Automated Learning and Discovery: State-Of-The-Art and Research Topics in a Rapidly Growing Field, tech. report CMU-CALD-98-100, Computer Science Department, Carnegie Mellon University, 1998.

[Witbrock et al., 2003] M. Witbrock, D. Baxter, J. Curtis, D. Schneider, R. C. Kahlert, P. Miraglia, P. Wagner, K. Panton, G. Matthews, A. Vizedom. An Interactive Dialogue System for Knowledge Acquisition in Cyc. In Proc. of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 2003.

[Witbrock et al., 2004] M. Witbrock, K. Panton, S. Reed, D. Schneider, B. Aldag, M. Reimers and S. Bertolo. Automated OWL Annotation Assisted by a Large Knowledge Base. In Workshop Notes of the 2004 Workshop on Knowledge Markup and Semantic Annotation at the 3rd International Semantic Web Conference, Hiroshima, Japan, pp 71-80. Nov 2004.

[Witbrock et al., 2005] M. Witbrock, C. Matuszek, A. Brusseau, R. C. Kahlert, C. B. Fraser, D. Lenat. Knowledge Begets Knowledge: Steps towards Assisted Knowledge Acquisition in Cyc. In Proc. of the AAAI 2005 Spring Symposium on Knowledge Collection from Volunteer Contributors, Stanford, CA, March 2005.