Searching for Common Sense: Populating Cyc from the Web
Cynthia Matuszek, Michael Witbrock, Robert C. Kahlert, John Cabral, Dave Schneider, Purvesh Shah, Doug Lenat
Cycorp, Inc.
3721 Executive Center Drive, Suite 100, Austin, TX 78731
{cynthia, witbrock, rck, jcabral, daves, shah, lenat}@cyc.com

Abstract
The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge, known informally as "common sense," that provides a context for novel learned information. Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.
1 Introduction
The idea of building a very large-scale knowledge base that can be used as a foundation for automated knowledge acquisition has been present in artificial intelligence research for more than twenty years [Lenat et al., 1983]. In that time, an enormous amount of progress has been made [Thrun et al., 1998]; techniques developed under the umbrella of machine learning have been successfully applied to work ranging from robotics to voice recognition to bioinformatics. In all of these fields, the use of preexisting knowledge is widespread. Much of this work relies either on programming an inductive bias into a learning system (e.g., in systems like AM [Lenat, 1976]) or on providing an inductive bias in the form of training examples [Brown, 1996].
Also in that time, the Web has emerged as a huge repository of electronically available knowledge, and indexing systems such as Google have made that knowledge progressively more accessible [Brin and Page, 1998]. Work that relies on the web in general, and Google in particular, for information extraction is proving to be a fertile research area [Ghani, 2000; Kwok et al., 2001; Etzioni et al., 2004].
The purpose of the Cyc project is to provide computers with a store of formally represented "common sense": real-world knowledge that can provide a basis for additional knowledge to be gathered and interpreted automatically [Lenat, 1995]. In the last twenty years, over three million facts and rules have been formally represented in the Cyc knowledge base by ontologists skilled in CycL, Cyc's formal representation language. Tools have been developed that allow subject matter experts to contribute directly [Panton et al., 2002; Witbrock et al., 2003; Belasco et al., 2004]. In addition, natural language generation and parsing capabilities have been developed to provide support for learning from English corpora [Witbrock et al., 2004]. As a result, the Cyc knowledge base now contains enough knowledge to support experimentation with the acquisition of additional knowledge via machine learning.
In this paper, we describe a method for gathering and verifying facts from the World Wide Web. The knowledge acquisition procedure is described both at an overview level and in detail. The work focuses on three novel approaches: using knowledge already in the Cyc KB to focus the acquisition of further knowledge; representing acquired knowledge in the knowledge base; and using Google in two distinct ways, to find facts and, separately, to verify them. While this research is at an early stage, the initial results are promising in terms of both acquisition speed and quality of results. Even in its preliminary form, the mechanism described is a useful tool for reducing the cost of manually entering knowledge into Cyc: the level of expertise required to enable a person to contribute to the KB is reduced, and many of the necessary steps are handled automatically, reducing the total time required. The rate at which sentences can be acquired in this way and reviewed for accuracy by an untrained reviewer outstrips the rate at which sentences can be hand-authored by a trained ontologist, and the verification steps reduce the amount of work required of a human reviewer by approximately 90%.
2 Cyc and CycL
The Cyc system is made up of three distinct components, all of which are crucial to the machine learning process: the knowledge base (KB), the inference engine, and the natural language system. The Cyc KB contains more than 3.2 million assertions (facts and rules) describing more than 280,000 concepts, including more than 12,000 concept-interrelating predicates. Facts stored in the Cyc KB may be atomic (making them Ground Atomic Formulas, or GAFs), or they may be complex sentences. Depending on the predicate used, a GAF can describe instance-level or type-level knowledge. All information in the KB is asserted into a hierarchical graph of microtheories, or reasoning contexts [Guha, 1991; Lenat, 1998].
AAAI-05/1430

CycL queries are syntactically legal CycL sentences, which may be partially bound (that is, contain one or more variables). The Cyc inference engine is responsible for using information in the KB to determine the truth of a sentence and, if necessary, find provably correct variable bindings.
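To make "partially bound" concrete, the following minimal Python sketch matches a query containing `?`-prefixed variables against stored GAFs and returns the variable bindings, using the sample GAFs given in this section. It is an illustration under stated assumptions, not Cyc's inference engine: sentences are encoded as nested tuples (a hypothetical encoding), the helper names `match` and `ask` are invented here, and only direct lookup is performed, with no rules or microtheories.

```python
from typing import Dict, Iterable, List, Optional, Union

# Hypothetical encoding: a CycL sentence is a tuple; nonatomic terms such as
# (YearFn 1985) are nested tuples; strings starting with "?" are variables.
Term = Union[str, int, tuple]
Bindings = Dict[str, Term]


def is_var(term: Term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match(pattern: Term, fact: Term, bindings: Bindings) -> Optional[Bindings]:
    """One-way match of a (possibly partially bound) pattern against a ground term."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == fact else None
        return {**bindings, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            result = match(p, f, bindings)
            if result is None:
                return None
            bindings = result
        return bindings
    return bindings if pattern == fact else None


def ask(query: tuple, gafs: Iterable[tuple]) -> List[Bindings]:
    """Every binding set under which the query matches a stored GAF (lookup only, no rules)."""
    answers = []
    for gaf in gafs:
        result = match(query, gaf, {})
        if result is not None:
            answers.append(result)
    return answers


kb = [
    ("foundingDate", "Cyc", ("YearFn", 1985)),
    ("sellsProductType", "SaudiAramco", "PetroleumProduct"),
    ("conditionAffectsPartType", "CutaneousAnthrax", "Skin"),
]
print(ask(("foundingDate", "Cyc", "?WHEN"), kb))  # -> [{'?WHEN': ('YearFn', 1985)}]
```

Variables may also appear inside nonatomic terms, e.g. `("foundingDate", "?X", ("YearFn", "?Y"))` binds both `?X` and `?Y`.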
Sample instance-level GAF:
(foundingDate Cyc (YearFn 1985))
Sample entity-to-type and type-to-type GAFs:
(sellsProductType SaudiAramco PetroleumProduct)
(conditionAffectsPartType CutaneousAnthrax Skin)
Sample non-atomic sentence:
(or (foundingDate AlQaida (YearFn 1987))
    (foundingDate AlQaida (YearFn 1988)))
Sample query:
(foundingAgent PalestineIslamicJihad ?WHO)

The natural language component of the system consists of a lexicon and parsing and generation subsystems. The lexicon is a component of the knowledge base that maps words and phrases to Cyc concepts, while various parsers provide methods for translating English text into CycL. The system also has a relatively complete ability to render CycL sentences into English, although the phrasing can be somewhat stilted when longer sentences are generated.

The work described in this paper targets the automatic acquisition of GAFs. Simple facts are more likely to be readily found on the web, and this approach minimizes difficulties in generating and parsing complex natural language constructs.

2.1 Overview of the Learning Cycle
Gathering information from the web proceeds in six stages, as illustrated in Figure 1:

1. Choosing a query: Because the number of concepts in the KB is so large, the number of possible CycL queries is enormous; choosing interesting, productive queries automatically is a necessary step in automating the knowledge acquisition process. An example of such a query might be:
   (foundingAgent PalestineIslamicJihad ?WHO)
2. Searching: Once a query is selected, it is translated into one or more English search strings. The query above might be rendered into strings such as:
   "PIJ, founded by"
   "Palestine Islamic Jihad founder"
   These strings are passed on to the Google API. The appropriate sections of any resulting documents are downloaded, and the relevant section is extracted (e.g., "PIJ founder Bashir Musa Mohammed Nafi is still at large…").
3. Parsing results: The relevant components of sentences are identified by their location relative to the search string. The terms are then parsed into CycL via the natural language parsing process described in section 3.3, resulting in one or more GAFs such as:
   (foundingAgent PalestineIslamicJihad Terrorist-Nafi)
4. KB consistency checking: Some of the results retrieved during the search process are disprovable because they are inconsistent with knowledge already present in the knowledge base; others are already known or trivially provable, and therefore redundant. Any GAF found via inference to be inconsistent or redundant is discarded.
5. Google verification: During search, the largest possible set of candidate GAFs is created. Those GAFs that are not discarded during KB consistency checking are re-rendered into English search strings, such as:
   "Bashir Nafi is a founder of Palestine Islamic Jihad"
   and a second Google search over those strings is performed. Any GAF that results in no retrieved documents during this phase is discarded.
6. Reviewing and asserting: The remaining GAFs are asserted into special hypothetical contexts in the knowledge base. An ontologist or human volunteer reviews them for accuracy, using a tool specific to that task [Witbrock et al., 2005], and the ones found to be correct are asserted into the knowledge base.

Figure 1: Learning is a process of selecting interesting questions, searching for that information on the web, parsing the results, performing verification and consistency checks with the document corpus and the KB, reviewing, and asserting that knowledge into the KB.

3 Implementation of the Learning Cycle
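The six stages of the learning cycle can be summarized as a single driver loop. The sketch below is a minimal illustration, not Cycorp's implementation: every stage is a caller-supplied function (all names here, such as `learning_cycle`, `web_support`, and `reviewer_accepts`, are hypothetical), and GAFs are plain Python tuples. It shows only the control flow: candidates that fail KB consistency checking or return no verification hits never reach the reviewer.

```python
from typing import Callable, Iterable, List, Tuple

GAF = Tuple  # hypothetical tuple encoding of a CycL sentence


def learning_cycle(
    choose_queries: Callable[[], Iterable[Tuple]],
    search: Callable[[Tuple], Iterable[str]],
    parse: Callable[[Tuple, str], Iterable[GAF]],
    is_inconsistent: Callable[[GAF], bool],
    is_redundant: Callable[[GAF], bool],
    web_support: Callable[[GAF], int],
    reviewer_accepts: Callable[[GAF], bool],
) -> Tuple[List[GAF], List[GAF]]:
    """Run the six stages once; return (GAFs held for review, GAFs asserted)."""
    for_review: List[GAF] = []
    for query in choose_queries():                       # 1. choosing a query
        candidates = set()
        for snippet in search(query):                    # 2. searching
            candidates.update(parse(query, snippet))     # 3. parsing results
        for gaf in sorted(candidates, key=repr):
            if is_inconsistent(gaf) or is_redundant(gaf):
                continue                                 # 4. KB consistency checking
            if web_support(gaf) == 0:
                continue                                 # 5. Google verification
            for_review.append(gaf)                       # 6a. hypothetical context
    asserted = [g for g in for_review if reviewer_accepts(g)]  # 6b. review, then assert
    return for_review, asserted


# Usage with stand-in stages (all hypothetical):
query = ("foundingAgent", "PalestineIslamicJihad", "?WHO")
held, asserted = learning_cycle(
    choose_queries=lambda: [query],
    search=lambda q: ["PIJ founder Bashir Musa Mohammed Nafi is still at large"],
    parse=lambda q, text: [q[:2] + ("Terrorist-Nafi",)] if "Nafi" in text else [],
    is_inconsistent=lambda g: False,
    is_redundant=lambda g: False,
    web_support=lambda g: 1,
    reviewer_accepts=lambda g: True,
)
print(asserted)  # -> [('foundingAgent', 'PalestineIslamicJihad', 'Terrorist-Nafi')]
```

Passing the stages in as functions keeps the loop testable with stubs; a real system would back them with the KB, the NL generator and parser, and a search API.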
3.1 Selecting Queries
While it is often useful in an application context to look for the answer to a specific question, or to automatically populate a class of information (such as founders of groups, or prime ministers of countries), satisfying the ultimate goal of populating the Cyc KB via machine learning relies in part on automatically selecting suitable queries.
There are a number of challenges that must be met in this regard. Queries should have a reasonable probability of having interesting bindings and of being findable in the corpus (in this case, the web). Some searches are unlikely to be productive for semantic reasons: some argument positions are of an infinite or continuous type, such as Time-Quantity, and could therefore yield an infinite number of mostly uninteresting searches, such as (age OBJECT (YearsDuration 300)). Other queries are guaranteed to be redundant.
We initially limited searches to a set of 134 binary predicates which, when used to generate search strings, tended to maximize useful results from web searches.¹

The algorithm for proceeding through those predicates was as follows. For a given search run, a depth D is selected; D is the maximum number of different values that can be used for each argument of a predicate. For each binary predicate $p_i$ in our test set $P$ (where $|P| = 134$), we retrieve from the KB the type constraints on each of its two arguments. Unless the type generalizes to an infinite class, we retrieve the D most fully represented values from the knowledge base: that is, those that appear in the most assertions, and therefore about which the most is known. These are assumed to be the most interesting terms of that type, and therefore the ones most likely to be found by a web search.

For $p_i$ we then have types $T_{i1}$ and $T_{i2}$. The D best-represented values would be $(t_{i11}, \ldots, t_{i1D})$ and $(t_{i21}, \ldots, t_{i2D})$. If neither of a predicate's arguments took values of a continuous type, there would be $2D \cdot |P|$ queries generated:

$$
\begin{aligned}
&(p_1\ t_{111}\ ?\mathrm{VAR})\ \ldots\ (p_1\ t_{11D}\ ?\mathrm{VAR}) \\
&(p_1\ ?\mathrm{VAR}\ t_{121})\ \ldots\ (p_1\ ?\mathrm{VAR}\ t_{12D}) \\
&\qquad\vdots \\
&(p_{|P|}\ t_{|P|11}\ ?\mathrm{VAR})\ \ldots\ (p_{|P|}\ t_{|P|1D}\ ?\mathrm{VAR}) \\
&(p_{|P|}\ ?\mathrm{VAR}\ t_{|P|21})\ \ldots\ (p_{|P|}\ ?\mathrm{VAR}\ t_{|P|2D})
\end{aligned}
$$

For example, the set of predicates {foundingAgent, foundingDate}, given a depth of 1, would produce three queries:
(foundingAgent AlQaida ?WHO)
(foundingAgent ?WHAT Terrorist-Salamat)
(foundingDate AlQaida ?WHEN)
The fourth permutation is not produced, because the argument constraint, Date, is of a continuous type.
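The selection procedure just described can be sketched as follows. This is a minimal Python illustration, not the system's code: the argument types, the assertion counts, and the helper names (`best_represented`, `generate_queries`) are hypothetical stand-ins for what the KB would supply, and `?VAR` marks the open argument. With the toy data below and a depth of 1, it reproduces the three queries of the foundingAgent/foundingDate example.

```python
import heapq
from typing import Dict, List, Set, Tuple


def best_represented(type_name: str, assertion_counts: Dict[str, Dict[str, int]], depth: int) -> List[str]:
    """The `depth` terms of a type that appear in the most KB assertions."""
    counts = assertion_counts.get(type_name, {})
    return heapq.nlargest(depth, counts, key=counts.get)


def generate_queries(
    arg_types: Dict[str, Tuple[str, str]],       # predicate -> (arg1 type, arg2 type)
    assertion_counts: Dict[str, Dict[str, int]],  # type -> {term: number of assertions}
    continuous_types: Set[str],
    depth: int,
) -> List[Tuple[str, str, str]]:
    queries = []
    for pred, (type1, type2) in arg_types.items():
        if type1 not in continuous_types:          # bind arg1, ask for arg2
            for term in best_represented(type1, assertion_counts, depth):
                queries.append((pred, term, "?VAR"))
        if type2 not in continuous_types:          # bind arg2, ask for arg1
            for term in best_represented(type2, assertion_counts, depth):
                queries.append((pred, "?VAR", term))
    return queries


# Toy inputs (hypothetical type names and made-up counts):
arg_types = {"foundingAgent": ("Organization", "Person"),
             "foundingDate": ("Organization", "Date")}
assertion_counts = {"Organization": {"AlQaida": 900, "KuKluxKlan": 300},
                    "Person": {"Terrorist-Salamat": 50, "AugusteRodin": 40}}
for q in generate_queries(arg_types, assertion_counts, {"Date"}, depth=1):
    print(q)
```

When no argument type is continuous, each predicate contributes 2D queries, which is where the $2D \cdot |P|$ total comes from.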
This approach is not without problems. It relies heavily on the nature of the type constraints placed on predicates; for some predicates, such as foundingDate, this works well, while for others the argument constraints are too broad. For example, the predicate sellsProductType takes a constant of type SomethingExisting in its second argument position, because almost anything can be sold. The proposed way to address this problem is with a predicate typicalArgIsa, which would connect predicates such as sellsProductType with the collections to which they typically refer (in this case, CommodityProduct).²

¹ Examples: foundingAgent, foundingDate, sellsProductType, primeMinister, lifeExpectancy, awardWinners. Productive predicates were found via manual trial and error, from a set of domains selected to span a broad portion of the KB (terrorism, medical technology, conceptual works, global politics, family relationships, and sales).
Another problem lies with the assumption that the members of a class about which Cyc knows the most are the most interesting ones, which is only sometimes correct. The best-described instances of the class Person, for example, tend to be the ontologists who work at Cycorp rather than, say, heads of state. Future work will involve using Google in the selection of appropriate queries. Rather than using the top D most supported terms of type T in the KB, it should be possible to retrieve the hit count for up to several thousand members of T, and seek information about the D terms of T for which the most information is available.
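The proposed Google-based alternative amounts to re-ranking candidate terms by web hit count instead of KB assertion count. A minimal sketch, assuming a hypothetical `hit_count` function that wraps whatever search interface is available; `max_probe` stands in for the text's "up to several thousand members", and its default of 3000 is an arbitrary choice.

```python
import heapq
from itertools import islice
from typing import Callable, Iterable, List


def select_by_hit_count(
    members: Iterable[str],
    hit_count: Callable[[str], int],  # hypothetical web hit-count lookup
    depth: int,
    max_probe: int = 3000,
) -> List[str]:
    """Probe at most `max_probe` members of a type; keep the `depth` with the most hits."""
    probed = islice(members, max_probe)
    return heapq.nlargest(depth, probed, key=hit_count)


# Made-up counts for placeholder terms, for illustration only:
toy_hits = {"TermA": 12, "TermB": 45000, "TermC": 310}
print(select_by_hit_count(["TermA", "TermB", "TermC"], toy_hits.get, depth=2))
```

Capping the probe keeps the number of hit-count lookups bounded, which matters under a daily query allotment.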
3.2 Search
Generating Search Strings
Once the system has selected a query, it generates a series of search strings. The existing NL generation machinery is applied to 233 manually created special generation templates for the 134 predicates.³ Several factors motivated the construction of specialized search generation templates within the NL system. In addition to the fact that productive search strings are often unlike standard English, CycL generations tend to be somewhat stilted, since ontologists have preferred unambiguous expression of CycL meanings over naturalness. Moreover, the KB generally contains only one or two generation templates for any given predicate, while there may be many common ways of expressing the information that could be useful for searching.
For example, for the query (foundingAgent PalestineIslamicJihad ?X), the system generates the following set of search strings:
Palestine Islamic Jihad founder ___
Palestinian Islamic Jihad founder ___
PIJ founder ___
Palestine Islamic Jihad, founded by ___
Palestinian Islamic Jihad, founded by ___
PIJ, founded by ___
All possible searches were generated from the Cartesian product of the generation templates with all English renderings of the arguments. In this example, Cyc knows three names for Palestine Islamic Jihad and has two templates for foundingAgent, resulting in six search strings.
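The Cartesian product described above, together with the placeholder stripping and quoting covered in the next paragraph, can be sketched in a few lines of Python. The `{arg}` template syntax and the function names are hypothetical; the templates and names are the ones from this example.

```python
from itertools import product
from typing import List

PLACEHOLDER = "___"  # marks the unknown argument position


def generate_search_strings(templates: List[str], renderings: List[str]) -> List[str]:
    """Cartesian product of templates x English renderings of the bound argument.
    Templates use '{arg}' for the known argument (a hypothetical template syntax)."""
    strings = [t.format(arg=r) for t, r in product(templates, renderings)]
    for s in strings:
        # The open argument is only allowed at the start or end of the string.
        if not (s.startswith(PLACEHOLDER) or s.endswith(PLACEHOLDER)):
            raise ValueError(f"placeholder must be at the start or end: {s!r}")
    return strings


def to_quoted_query(search_string: str) -> str:
    """Strip the placeholder and quote the remainder for an exact-phrase search."""
    return '"' + search_string.replace(PLACEHOLDER, "").strip() + '"'


templates = ["{arg} founder ___", "{arg}, founded by ___"]
names = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad", "PIJ"]
strings = generate_search_strings(templates, names)
print(len(strings))                 # -> 6
print(to_quoted_query(strings[5]))  # -> "PIJ, founded by"
```

Iterating templates in the outer loop yields the strings in the same order as the list above.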
To simplify match detection, argument positions were only allowed at the beginning or end of the template string. In order to carry out the actual search, the "___" placeholders were stripped off, and the remaining string was submitted as a quoted string to Google.

² In most cases, these constraints will be learned from analysis of common usage in the existing corpus.
³ Slightly fewer than half of the predicates have only one associated search template. Many obvious templates exist but have not been represented; in future work, an analysis of the search strings will be performed to determine what types of strings produce good results. That information will be used for automatically generating search strings for other predicates.
Searching via Google
The search string is used with an interface to the Google API to retrieve a tuple consisting of the URL, the Google ranking, the match position, and the web page text, which is then handed off to a parser that attempts to convert it into a meaningful CycL entity.
3.3 Parsing into CycL
Once a document is retrieved, the answer must be found and interpreted. First, the system searches for the exact text of the search string and returns the search string plus either the beginning or the end of the sentence, depending on the position of the "___" in the generated string. The resulting string is searched for phrases that can be interpreted as a CycL concept that meets the type constraints imposed by the predicate. For example, in the string "PIJ founder Bashir Musa Mohammed Nafi is still…", "Bashir Musa Mohammed Nafi" is recognized as a person, and therefore as a candidate for the arg2 position of foundingAgent.
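The span-extraction step can be sketched as follows. This is a minimal illustration under explicit assumptions: `SearchHit` mirrors the (URL, rank, match position, text) tuple described in Section 3.2, sentence boundaries are approximated by a regular expression that will misfire on abbreviations, the page text and URL in the usage lines are invented, and the stored match position is ignored in favour of searching for the phrase again.

```python
import re
from typing import NamedTuple, Optional


class SearchHit(NamedTuple):
    """One search result: URL, rank, match position, and page text."""
    url: str
    rank: int
    match_position: int
    text: str


# Crude sentence boundary: '.', '!' or '?' followed by whitespace or end of text.
_BOUNDARY = re.compile(r"[.!?](?=\s|$)")


def extract_span(hit: SearchHit, phrase: str, placeholder_at_end: bool) -> Optional[str]:
    """Return the phrase plus the part of its sentence where the unknown argument sits."""
    text = hit.text
    start_of_phrase = text.find(phrase)
    if start_of_phrase < 0:
        return None
    end_of_phrase = start_of_phrase + len(phrase)
    if placeholder_at_end:
        # Keep everything from the phrase to the end of its sentence.
        m = _BOUNDARY.search(text, end_of_phrase)
        return text[start_of_phrase:m.start() if m else len(text)].strip()
    # Otherwise keep everything from the start of the sentence to the phrase.
    prior = [m.end() for m in _BOUNDARY.finditer(text, 0, start_of_phrase)]
    sentence_start = prior[-1] if prior else 0
    return text[sentence_start:end_of_phrase].strip()


hit = SearchHit("http://example.org/page", 1, 14,
                "Reports vary. PIJ founder Bashir Musa Mohammed Nafi is still at large. "
                "Others were arrested.")
print(extract_span(hit, "PIJ founder", placeholder_at_end=True))
# -> PIJ founder Bashir Musa Mohammed Nafi is still at large
```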
For predicates that require strings (such as nameString, which relates an entity to its name), a named entity recognizer [Prager et al., 2000] is used to find a suitable candidate. In other cases, standard parsing techniques are used to try to find a useful interpretation, including looking up strings in the Cyc lexicon, interpreting them as noun compounds or dates, and compositional interpretation. For speed, compositional interpretation is only attempted for terms judged to be constituents by a probabilistic CFG parser [Charniak, 2001]. This has the effect of eliminating a few correct answers (mostly where the parser produces an incorrect syntactic parse), but also decreases the total time spent on analysis by at least 50%.
Creating Candidate CycL Sentences
The result of parsing the matched section in the web page is a set of candidate CycL terms, usually constants such as Terrorist-Nafi. Substituting these terms into the original incomplete CycL query produces a set of candidate GAFs.
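Type-constrained interpretation and substitution can be illustrated with a toy lexicon. In this hypothetical sketch, every word n-gram of the extracted span is looked up in a small dictionary standing in for the Cyc lexicon, candidates whose type does not specialize the predicate's argument constraint are dropped, and the survivors are substituted for the query's open variable. The lexicon entries, type names, and hierarchy are assumptions for illustration; the real system also uses named entity recognition, noun-compound and date interpretation, and compositional parsing.

```python
from typing import Dict, List, Set, Tuple

# Toy lexicon (hypothetical): English phrase -> (CycL constant, its most specific type).
LEXICON: Dict[str, Tuple[str, str]] = {
    "PIJ": ("PalestineIslamicJihad", "TerroristGroup"),
    "Bashir Musa Mohammed Nafi": ("Terrorist-Nafi", "Terrorist"),
}
# Toy type hierarchy (hypothetical): type -> direct supertypes.
SUPERTYPES: Dict[str, Set[str]] = {"Terrorist": {"Person"}, "TerroristGroup": {"Organization"}}


def is_a(type_name: str, required: str) -> bool:
    """True if type_name is `required` or has it as a (transitive) supertype."""
    stack, seen = [type_name], set()
    while stack:
        current = stack.pop()
        if current == required:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERTYPES.get(current, ()))
    return False


def candidate_terms(span: str, required: str) -> List[str]:
    """Constants named by any word n-gram of the span that meet the type constraint."""
    words = span.split()
    found: List[str] = []
    for i in range(len(words)):
        for j in range(len(words), i, -1):
            entry = LEXICON.get(" ".join(words[i:j]))
            if entry and is_a(entry[1], required) and entry[0] not in found:
                found.append(entry[0])
    return found


def candidate_gafs(query: Tuple[str, ...], terms: List[str]) -> List[Tuple[str, ...]]:
    """Substitute each term for the query's open variable."""
    return [tuple(t if arg.startswith("?") else arg for arg in query) for t in terms]


span = "PIJ founder Bashir Musa Mohammed Nafi is still at large"
terms = candidate_terms(span, "Person")  # "PIJ" is rejected: not a Person
print(candidate_gafs(("foundingAgent", "PalestineIslamicJihad", "?WHO"), terms))
# -> [('foundingAgent', 'PalestineIslamicJihad', 'Terrorist-Nafi')]
```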
3.4 Checking Cyc KB Consistency
In principle, any useful fact added to the system should be neither disprovable nor trivially provable. For example, the sentence
(foundingAgent PalestineIslamicJihad Terrorist-AlShikaki)
is not novel, because Cyc already knows this. By contrast,
(foundingAgent PalestineIslamicJihad AugusteRodin)
is novel, but trivially disprovable, as Cyc knows he died in 1917 (72 years before the PIJ was founded).

If the Cyc KB already contains knowledge that renders a new fact redundant or contradictory, that fact will be discarded. This is checked via inference: each new fact is treated as a query, and inference is performed to determine whether it can be proven false (inconsistent) or true (redundant). Cyc provides justifications for facts used to produce query results, as in Figure 2, and it is helpful for redundant facts to be marked as additionally confirmed via search-based learning. Previous work suggests that one-step inference is sufficient to identify duplicate information or disprove contradictions during typical knowledge acquisition tasks [Panton et al., 2002].

Figure 2: Justifications for the claim that a 2001 attack on Ankara meets the criteria for the query being run.
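The one-step check can be illustrated with a toy KB that reproduces the two examples above. The stored GAF, the single rule, and the year values (the PIJ founding year is an illustrative value chosen only to match the "72 years" in the text) are hypothetical; in Cyc the check is ordinary inference over the full KB, with justifications.

```python
from typing import Dict, Set, Tuple

GAF = Tuple[str, str, str]

# Toy KB (hypothetical): one stored GAF and the facts a single rule needs.
KNOWN_GAFS: Set[GAF] = {("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki")}
DEATH_YEAR: Dict[str, int] = {"AugusteRodin": 1917}
# Illustrative value only, chosen to match the text's "72 years" remark.
FOUNDING_YEAR: Dict[str, int] = {"PalestineIslamicJihad": 1917 + 72}


def classify(gaf: GAF) -> str:
    """One-step check of a candidate GAF treated as a query."""
    if gaf in KNOWN_GAFS:
        return "redundant"         # provably true: discard, but may count as confirmation
    predicate, organization, agent = gaf
    if predicate == "foundingAgent":
        died = DEATH_YEAR.get(agent)
        founded = FOUNDING_YEAR.get(organization)
        # Toy rule: someone who died before an organization existed did not found it.
        if died is not None and founded is not None and died < founded:
            return "inconsistent"  # provably false: discard
    return "novel"                 # neither provable nor disprovable: keep


for agent in ["Terrorist-AlShikaki", "AugusteRodin", "Terrorist-Nafi"]:
    print(agent, classify(("foundingAgent", "PalestineIslamicJihad", agent)))
```

Only GAFs classified as `"novel"` move on to Google verification.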
3.5 Google Verification
In order to guard against parser errors and excessively general search terms (such as ambiguous acronyms), a second Google search is performed to determine whether search strings generated from the new GAF will produce results. Search strings are generated from the candidate GAF that was learned, but any string containing an acronym or abbreviation (e.g., "PIJ" for "Palestine Islamic Jihad") is supplemented with a disambiguation term: the least common word (based on Google hit counts) of the expanded acronym. In this case, the string "Palestine" is added as a term, since it is the least common word in the set {Palestine, Palestinian, Islamic, Jihad}. The resulting verification search string is:
"PIJ founder Bashir Nafi" + "Palestine"
Any fact for which this verification step returns no results is considered unverified and will not be presented to a reviewer.
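The construction of the verification query can be sketched as follows. The acronym table, the word-boundary test, and the function names are assumptions for illustration, and `hit_count` / `result_count` are hypothetical wrappers around a search interface. The toy counts below are made up and encode only the ordering reported above, namely that "Palestine" is the least common of the four words.

```python
import re
from typing import Callable, Dict, List


def disambiguation_term(expansions: List[str], hit_count: Callable[[str], int]) -> str:
    """Least common word (by hit count) across all expansions of an acronym."""
    words = sorted({word for expansion in expansions for word in expansion.split()})
    return min(words, key=hit_count)


def verification_query(
    sentence: str,
    acronyms: Dict[str, List[str]],
    hit_count: Callable[[str], int],
) -> str:
    """Quote the re-rendered sentence; add a disambiguation term for each acronym it uses."""
    query = f'"{sentence}"'
    for acronym, expansions in acronyms.items():
        if re.search(rf"\b{re.escape(acronym)}\b", sentence):
            query += f' + "{disambiguation_term(expansions, hit_count)}"'
    return query


def is_verified(query: str, result_count: Callable[[str], int]) -> bool:
    """A candidate whose verification query returns no results stays unverified."""
    return result_count(query) > 0


acronyms = {"PIJ": ["Palestine Islamic Jihad", "Palestinian Islamic Jihad"]}
toy_hits = {"Palestine": 1, "Palestinian": 2, "Islamic": 3, "Jihad": 4}  # ordering only
print(verification_query("PIJ founder Bashir Nafi", acronyms, toy_hits.get))
# -> "PIJ founder Bashir Nafi" + "Palestine"
```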
3.6 Review and Assertion
In the final step, learned sentences are reviewed by a human curator and, if correct, asserted into the Cyc KB. Currently, suggested sentences are presented to the reviewer in no particular order; in the future, sorting methods will be implemented and tested. The most straightforward approaches involve making greater use of information already retrieved from Google: since information about the searches underlying a candidate sentence is stored in the KB, it should be possible and productive to give priority to sentences that are supported by a larger total number of documents. In effect, the numerical values returned during the verification step could be used to sort the most widely supported sentences upwards in the review process. Another possibility, if several contradictory facts are found, is to give review priority to those found in documents with the highest Google ranking.
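One possible way to combine the two proposed orderings is a single sort key: total supporting documents (descending), then best Google rank (ascending). This combination, the `Candidate` record, and the counts in the usage lines are hypothetical; note that the sketch applies the rank tie-break to any candidates with equal support, not only to contradictory ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gaf: tuple
    supporting_docs: int  # documents found by the initial and verification searches
    best_rank: int        # best (numerically lowest) Google rank among those documents


def review_order(candidates: List[Candidate]) -> List[Candidate]:
    """Most widely supported first; ties go to the higher-ranked source document."""
    return sorted(candidates, key=lambda c: (-c.supporting_docs, c.best_rank))


# Made-up support counts and ranks, for illustration only:
queue = review_order([
    Candidate(("foundingDate", "AlQaida", ("YearFn", 1988)), supporting_docs=3, best_rank=4),
    Candidate(("foundingDate", "AlQaida", ("YearFn", 1987)), supporting_docs=3, best_rank=1),
    Candidate(("foundingDate", "AfricanNationalCongress", ("YearFn", 1912)), supporting_docs=9, best_rank=2),
])
for c in queue:
    print(c.gaf)
```

The two contradictory AlQaida founding dates have equal support, so the one backed by the better-ranked document is shown to the reviewer first.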
4 Results
Statistics were gathered for a case in which the 134 predicates in P were used and D was set to 20.⁴ The majority of the searches expended, about 80%, were performed in the verification phase rather than the initial search phase. The results were as follows:

Queries: 348
Searches expended: 4290 (817 initial, 3477 verification)
GAFs found: 1016
… and rejected due to KB inconsistency: 4
… and already known to the KB: 384
… and rejected by Google verification: 566
Novel GAFs found and verified: 61

A human reviewer then went through the verified GAFs and a sample of 53 of the unverified GAFs, and determined their actual correctness rate. The results were as follows:

                     Verified   Unverified
True (correct)          32          8**
False (incorrect)       29*        45

Total novel facts reviewed: 114
Novel, correct facts discovered: 40 (32 verified, 8 unverified)
Incorrect facts discovered: 74 (29 verified, 45 unverified)
Facts categorized correctly: 77 (68%)
Facts categorized incorrectly: 37 (32%)
… * false positives (false but verified): 29 (25%)
… ** false negatives (true but unverified): 8 (7%)

Examples of these result types:
Query: (#$hasBeliefSystems #$Iran ?X)
Search string: "Iran adheres to"
Candidate GAF: (#$hasBeliefSystems #$Iran #$Islam)
Verification search strings:
"Islamic Republic of Iran adheres to Islam"
"Iran adheres to Islam"
"Iran believes in Islam" (found)
"Islamic Republic of Iran believes in Islam" (found)
Example GAFs already known to the KB:
(#$vestedInterest #$Iran #$Iraq)
(#$inhabitantTypes #$Lebanon #$EthnicGroupOfKurds)
Example GAFs rejected due to KB inconsistency:
(#$northOf #$Iran #$Iran)
(#$geopoliticalSubdivision #$Iraq #$Iran)
Correct, verified GAF:
(foundingDate AfricanNationalCongress (YearFn 1912))
* Incorrect but verified GAF:
(foundingDate JewishDefenseLeague (DecadeFn 198))
Incorrect, rejected GAF:
(objectFoundInLocation KuKluxKlan GillianAnderson)
** Correct but rejected GAF:
(foundingDate KarenNationalUnion (MonthFn April (YearFn 1947)))

The verification step produces comparatively few false negatives (in which a true fact is incorrectly classified as false); in this run, 80% of the novel, correct facts retrieved (32 of 40) were correctly identified as such. It is therefore reasonable to reject all unverifiable sentences, especially in light of the wealth of possible queries and the size and breadth of the corpora available. Only 61% of the incorrect facts retrieved (45 of 74) were identified, suggesting that substantial work in decreasing the occurrence of false positives will be necessary before the need for human review is eliminated; this is unsurprising, as the Internet contains large amounts of unstructured, unchecked information.

⁴ It takes between four and five hours to exhaust an allotment of 3,000 searches per day through the Google API.
Slightly over a third of the GAFs discovered were facts that were already known to the KB, and presumably correct to a baseline level (i.e., the correctness level achieved by human ontologists); the total number of correct facts discovered was therefore 425, 42% of the total.
Together, KB consistency checking and Google verification reduce the number of sentences that must undergo human review from 1016 to 61, and the human review process, which takes place entirely in English, is quick and straightforward. An intermediate step toward full automation would be to identify classes of sentences that can be asserted without human review.
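The percentages reported in this section follow directly from the raw counts; the short Python check below recomputes them, rounding to the nearest whole percent, so that each derivation is explicit. Only counts stated in this section are used.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Human review of the 61 verified and 53 sampled unverified novel GAFs.
verified_true, verified_false = 32, 29
unverified_true, unverified_false = 8, 45
reviewed = verified_true + verified_false + unverified_true + unverified_false

correctly_categorized = verified_true + unverified_false    # verified & true, or rejected & false
incorrectly_categorized = verified_false + unverified_true  # false positives + false negatives

print(reviewed)                                                   # 114
print(pct(correctly_categorized, reviewed))                       # 68
print(pct(incorrectly_categorized, reviewed))                     # 32
print(pct(verified_false, reviewed))                              # 25 (false positives)
print(pct(unverified_true, reviewed))                             # 7 (false negatives)
print(pct(verified_true, verified_true + unverified_true))        # 80 of true facts verified
print(pct(unverified_false, verified_false + unverified_false))   # 61 of false facts rejected
print(pct(384, 1016), pct(425, 1016))                             # 38 42
print(pct(3477, 4290))                                            # 81 of searches spent verifying
```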
5 Conclusions
While great strides have been made in machine learning in the last few decades, automatically gathering useful, consistent knowledge in a machine-usable form is still a relatively unexplored research area. The original promise of the Cyc project, to provide a basis of real-world knowledge sufficient to support the sort of learning from language of which humans are capable, has not yet been fulfilled. In the meantime, information has become enormously more accessible, due in no small part to the widespread popularity of the Web and to effective indexing systems such as Google. Making use of that repository requires a store of real-world knowledge and some facility for natural language parsing and generation.
These results, while extremely preliminary, are encouraging. In particular, using Cyc as a basis for learning is effective, both in guiding the learning process and in representing and using the results. Pre-existing knowledge in the KB supports the construction of meaningful queries and provides a framework into which learned knowledge can be asserted and reasoned over. Comparatively shallow natural language parsing, combined with the type constraint and relation knowledge in the Cyc system, allows the retrieval, verification, and review of unconstrained facts at a higher rate than that achieved by human knowledge representation experts working unassisted. Perhaps more importantly, the kind of knowledge retrieved is exactly the instance-level knowledge that should not require human experts; it should instead be obtained, maintained, and reasoned over by tools that need and use that knowledge. Involving Google in every stage of the learning process allows us to exploit both Cyc's knowledge and the knowledge on the web in an extremely natural way.

The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective, but it also provides a basis for analysis of what information can be learned effectively without human interaction. Thus, over time, we hope to provide Cyc with a mechanism to truly acquire knowledge by learning.
Acknowledgments
This research was partially sponsored by ARDA's AQUAINT program. Additionally, we thank Google for allowing access to their API for research such as this.
References
[Belasco et al., 2004] A. Belasco, J. Curtis, R. C. Kahlert, C. Klein, C. Mayans, R. Reagan. Representing Knowledge Gaps Effectively. In Proc. of the 5th International Conference on Practical Aspects of Knowledge Management, Vienna, Austria, pp. 159-164, Dec. 2004.
[Brin and Page, 1998] Sergey Brin and Larry Page. Anatomy of a Large-scale Hypertextual Search Engine. In Proc. of the 7th International World Wide Web Conference, pp. 107-117, Brisbane, Australia, Apr. 1998.
[Brown, 1996] R. D. Brown. Example-Based Machine Translation in the Pangloss System. In Proc. of the 16th International Conference on Computational Linguistics, pp. 169-174, Copenhagen, Denmark, Aug. 5-9, 1996.
[Charniak, 2001] E. Charniak. A Maximum-Entropy-Inspired Parser. In Proc. of the 1st Conference of the North American Chapter of the Association for Computational Linguistics, pp. 132-139, Seattle, WA, 2000. Morgan Kaufmann Publishers.
[Etzioni et al., 2004] O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. Weld, A. Yates. Web-scale Information Extraction in KnowItAll. In Proc. of the 13th International Conference on World Wide Web, pp. 100-110, New York, NY, 2004.
[Ghani, 2000] R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery. Data Mining on Symbolic Knowledge Extracted from the Web. In Proc. of the 6th International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pp. 29-36, Boston, MA, Aug. 2000.
[Guha, 1991] R. V. Guha. Contexts: A Formalization and Some Applications. PhD thesis, Stanford University, STAN-CS-91-1399-Thesis, 1991.
[Kwok et al., 2001] C. Kwok, O. Etzioni, D. Weld. Scaling Question Answering to the Web. ACM Transactions on Information Systems, Vol. 19, Issue 3, pp. 242-262, 2001.
[Lenat, 1976] D. B. Lenat. AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search. PhD dissertation, Stanford University, STAN-CS-76-570, 1976.
[Lenat et al., 1983] D. B. Lenat, A. Borning, D. McDonald, C. Taylor, S. Weyer. Knoesphere: Building Expert Systems with Encyclopedic Knowledge. In Proc. of the 8th International Joint Conference on Artificial Intelligence, Vol. 1, pp. 167-169, Karlsruhe, Germany, Aug. 1983.
[Lenat, 1995] D. B. Lenat. Cyc: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM, Vol. 38, Issue 11, pp. 33-38, Nov. 1995.
[Lenat, 1998] D. B. Lenat. The Dimensions of Context-Space. http://www.cyc.com/doc/context-space.pdf.
[Panton et al., 2002] K. Panton, P. Miraglia, N. Salay, R. C. Kahlert, D. Baxter, R. Reagan. Knowledge Formation and Dialogue Using the KRAKEN Toolset. In Proc. of the 18th National Conference on Artificial Intelligence, pp. 900-905, Edmonton, Canada, 2002.
[Prager et al., 2000] J. Prager, E. Brown, A. Coden, D. Radev. Question Answering by Predictive Annotation. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 184-191, Athens, Greece, 2000.
[Thrun et al., 1998] S. Thrun, C. Faloutsos, T. Mitchell, L. Wasserman. Automated Learning and Discovery: State-Of-The-Art and Research Topics in a Rapidly Growing Field. Tech. report CMU-CALD-98-100, Computer Science Department, Carnegie Mellon University, 1998.
[Witbrock et al., 2003] M. Witbrock, D. Baxter, J. Curtis, D. Schneider, R. C. Kahlert, P. Miraglia, P. Wagner, K. Panton, G. Matthews, A. Vizedom. An Interactive Dialogue System for Knowledge Acquisition in Cyc. In Proc. of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 2003.
[Witbrock et al., 2004] M. Witbrock, K. Panton, S. Reed, D. Schneider, B. Aldag, M. Reimers, S. Bertolo. Automated OWL Annotation Assisted by a Large Knowledge Base. In Workshop Notes of the 2004 Workshop on Knowledge Markup and Semantic Annotation at the 3rd International Semantic Web Conference, pp. 71-80, Hiroshima, Japan, Nov. 2004.
[Witbrock et al., 2005] M. Witbrock, C. Matuszek, A. Brusseau, R. C. Kahlert, C. B. Fraser, D. Lenat. Knowledge Begets Knowledge: Steps towards Assisted Knowledge Acquisition in Cyc. In Proc. of the AAAI 2005 Spring Symposium on Knowledge Collection from Volunteer Contributors, Stanford, CA, Mar. 2005.