Searching for Common Sense: Populating Cyc from the Web

Cynthia Matuszek, Michael Witbrock, Robert C. Kahlert, John Cabral, Dave Schneider, Purvesh Shah, Doug Lenat
Cycorp, Inc.
3721 Executive Center Drive, Suite 100, Austin, TX 78731
{cynthia, witbrock, rck, jcabral, daves, shah, lenat}@cyc.com

Abstract
The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information – what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.

1 Introduction
The idea of building a very large-scale knowledge base that can be used as a foundation for automated knowledge acquisition has been present in artificial intelligence research for more than twenty years [Lenat et al., 1983]. In that time, an enormous amount of progress has been made [Thrun et al., 1998]; techniques developed under the umbrella of machine learning have been successfully applied to work ranging from robotics, to voice recognition, to bioinformatics. In all of these fields, the use of preexisting knowledge is widespread. Much of this work relies on either programming an inductive bias into a learning system (e.g., in systems like AM [Lenat, 1976]) or on providing an inductive bias in the form of training examples [Brown, 1996].

Also in that time, the Web has emerged as a huge repository of electronically available knowledge, and indexing systems such as Google have made that knowledge progressively more accessible [Brin and Page, 1998]. Work that relies on the web in general, and Google in particular, for information extraction is proving to be a fertile research area [Ghani, 2000; Kwok et al., 2001; Etzioni et al., 2004].

The purpose of the Cyc project is to provide computers with a store of formally represented "common sense": real world knowledge that can provide a basis for additional knowledge to be gathered and interpreted automatically [Lenat, 1995]. In the last twenty years, over three million facts and rules have been formally represented in the Cyc knowledge base by ontologists skilled in CycL, Cyc's formal representation language. Tools have been developed which allow subject matter experts to contribute directly [Panton et al., 2002; Witbrock et al., 2003; Belasco et al., 2004]. In addition, natural language generation and parsing capabilities have been developed to provide support for learning from English corpora [Witbrock et al., 2004]. As a result, the Cyc knowledge base now contains enough knowledge to support experimentation with the acquisition of additional knowledge via machine learning.

In this paper, we describe a method for gathering and verifying facts from the World Wide Web. The knowledge acquisition procedure is described at both an overview level and in detail. The work focuses on three novel approaches: using knowledge already in the Cyc KB to focus the acquisition of further knowledge; representing acquired knowledge in the knowledge base; and using Google in two distinct ways, to find facts and, separately, to verify them.

While this research is at an early stage, the initial results are promising in terms of both the acquisition speed and quality of results. Even in its preliminary form, the mechanism described is a useful tool for reducing the cost of manually entering knowledge into Cyc; the level of expertise required to enable a person to contribute to the KB is reduced, and many of the necessary steps are handled automatically, reducing the total time required. The number of sentences that can be acquired in this way and reviewed for accuracy by an untrained reviewer outstrips the rate at which sentences can be hand-authored by a trained ontologist, and the verification steps reduce the amount of work required of a human reviewer by approximately 90%.

2 Cyc and CycL
The Cyc system is made up of three distinct components, all of which are crucial to the machine learning process: the knowledge base (KB), the inference engine, and the natural language system. The Cyc KB contains more than 3.2 million assertions (facts and rules) describing more than 280,000 concepts, including more than 12,000 concept-interrelating predicates. Facts stored in the Cyc KB may be atomic (making them Ground Atomic Formulae, or GAFs), or they may be complex sentences. Depending on the predicate used, a GAF can describe instance-level or type-level knowledge. All information in the KB is asserted into a hierarchical graph of microtheories, or reasoning contexts [Guha, 1991; Lenat, 1998].

AAAI-05 / 1430

Sample instance-level GAF:
    (foundingDate Cyc (YearFn 1985))
Sample entity-to-type and type-to-type GAFs:
    (sellsProductType SaudiAramco PetroleumProduct)
    (conditionAffectsPartType CutaneousAnthrax Skin)
Sample non-atomic sentence:
    (or (foundingDate AlQaida (YearFn 1987))
        (foundingDate AlQaida (YearFn 1988)))

CycL queries are syntactically legal CycL sentences, which may be partially bound (that is, contain one or more variables). The Cyc inference engine is responsible for using information in the KB to determine the truth of a sentence and, if necessary, find provably correct variable bindings.

Sample query:
    (foundingAgent PalestineIslamicJihad WHO)

The natural language component of the system consists of a lexicon, and parsing and generation subsystems. The lexicon is a component of the knowledge base that maps words and phrases to Cyc concepts, while various parsers provide methods for translating English text into CycL. The system also has a relatively complete ability to render CycL sentences into English, although the phrasing can be somewhat stilted when longer sentences are generated.

The work described in this paper targets the automatic acquisition of GAFs. Simple facts are more likely to be readily found on the web, and this approach minimizes difficulties in generating and parsing complex natural language constructs.

2.1 Overview of the Learning Cycle
Gathering information from the web proceeds in six stages, as illustrated in Figure 1:

1. Choosing a query: Because the number of concepts in the KB is so large, the number of possible CycL queries is enormous; choosing interesting, productive queries automatically is a necessary step in automating the knowledge acquisition process. An example of such a query might be:
    (foundingAgent PalestineIslamicJihad WHO)

2. Searching: Once a query is selected, it is translated into one or more English search strings. The query above might be rendered into strings such as:
    "PIJ, founded by"
    "Palestine Islamic Jihad founder"
These strings are passed on to the Google API. The appropriate sections of any resulting documents are downloaded, and the relevant section is extracted (e.g., "PIJ founder Bashir Musa Mohammed Nafi is still at large…").

3. Parsing results: The relevant components of sentences are identified by their location relative to the search string. The terms are then parsed into CycL via the natural language parsing process described in section 3.3, resulting in one or more GAFs such as:
    (foundingAgent PalestineIslamicJihad Terrorist-Nafi)

4. KB consistency checking: Some of the results retrieved during the search process are disprovable, because they are inconsistent with knowledge already present in the knowledge base; others are already known or trivially provable, and therefore redundant. Any GAF found via inference to be inconsistent or redundant is discarded.

5. Google verification: During search, the largest possible set of candidate GAFs is created. Those GAFs that are not discarded during KB consistency checking are re-rendered into English search strings, such as:
    "Bashir Nafi is a founder of Palestine Islamic Jihad"
and a second Google search over those strings is performed. Any GAF that results in no retrieved documents during this phase is discarded.

6. Reviewing and asserting: The remaining GAFs are asserted into special hypothetical contexts in the knowledge base. An ontologist or human volunteer reviews them for accuracy, using a tool specific to that task [Witbrock et al., 2005], and the ones found to be correct are asserted into the knowledge base.

Figure 1: Learning is a process of selecting interesting questions, searching for that information on the web, parsing the results, performing verification and consistency checks with the document corpus and the KB, reviewing, and asserting that knowledge into the KB.

3 Implementation of the Learning Cycle
3.1 Selecting Queries
While it is often useful in an application context to look for the answer to a specific question, or automatically populate a class of information (such as founders of groups, or prime ministers of countries), satisfying the ultimate goal of populating the Cyc KB via machine learning relies in part on automatically selecting suitable sentences. There are a number of challenges that must be met in this regard. Queries should have reasonable probability of having interesting bindings and of being findable in the corpus (in this case, on the web). Some searches are unlikely to be productive for semantic reasons: some argument positions are of an infinite or continuous type, such as Time-Quantity, and could therefore yield an infinite number of mostly-uninteresting searches, such as (age OBJECT (YearsDuration 300)). Other queries are guaranteed to be redundant.

We initially limited searches to a set of 134 binary predicates which, when used to generate search strings, tended to maximize useful results from web searches.¹ The algorithm for proceeding through those predicates was as follows:

For a given search run, a depth of $D$ is selected. $D$ is the maximum number of different values that can be used for each argument of a predicate. For each binary predicate $p_i$ in our test set $P$ (where $|P| = 134$), we retrieve from the KB the type constraints on each of its two arguments. Unless the type generalizes to an infinite class, we retrieve the $D$ most fully represented values from the knowledge base – that is, those that appear in the most assertions, and therefore about which the most is known. These are assumed to be the most interesting terms of that type, and therefore the ones most likely to be found by a web search.

For $p_i$ we then have types $T_{i1}$ and $T_{i2}$. The $D$ best represented values would be $(t_{i11} \ldots t_{i1D})$ and $(t_{i21} \ldots t_{i2D})$. If neither of a predicate's arguments took values of a continuous type, there would be $2D \cdot |P|$ queries generated:

    $(p_1\ t_{111}\ \mathrm{VAR}) \ldots (p_1\ t_{11D}\ \mathrm{VAR})$
    $(p_1\ \mathrm{VAR}\ t_{121}) \ldots (p_1\ \mathrm{VAR}\ t_{12D})$
    $\ldots$
    $(p_{|P|}\ t_{|P|11}\ \mathrm{VAR}) \ldots (p_{|P|}\ t_{|P|1D}\ \mathrm{VAR})$
    $(p_{|P|}\ \mathrm{VAR}\ t_{|P|21}) \ldots (p_{|P|}\ \mathrm{VAR}\ t_{|P|2D})$

For example, a set of the predicates foundingAgent and foundingDate, given a depth of 1, would produce three queries:

    (foundingAgent AlQaida WHO)
    (foundingAgent WHAT Terrorist-Salamat)
    (foundingDate AlQaida WHEN)

The fourth permutation is not produced, because the argument constraint, Date, is of a continuous type.
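
The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual code: the dictionaries stand in for KB lookups, the type names "Organization" and "Agent" and the ordering of terms are invented for the example, and a single placeholder VAR is used where the paper's examples use named variables.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-ins for KB lookups. Predicate and term names come from the text;
# the type names "Organization" and "Agent" are hypothetical.
ARG_TYPES: Dict[str, Tuple[str, str]] = {
    "foundingAgent": ("Organization", "Agent"),
    "foundingDate": ("Organization", "Date"),
}
CONTINUOUS_TYPES: Set[str] = {"Date", "Time-Quantity"}
# Terms of each type, assumed to be sorted by number of KB assertions (most first).
BEST_REPRESENTED: Dict[str, List[str]] = {
    "Organization": ["AlQaida", "PalestineIslamicJihad"],
    "Agent": ["Terrorist-Salamat", "Terrorist-Nafi"],
}


def generate_queries(
    arg_types: Dict[str, Tuple[str, str]],
    best_represented: Dict[str, List[str]],
    continuous_types: Set[str],
    depth: int,
) -> List[str]:
    """Bind each non-continuous argument position to its top `depth` terms,
    leaving the other position as a variable."""
    queries: List[str] = []
    for pred, (type1, type2) in arg_types.items():
        if type1 not in continuous_types:
            for term in best_represented[type1][:depth]:
                queries.append(f"({pred} {term} VAR)")
        if type2 not in continuous_types:
            for term in best_represented[type2][:depth]:
                queries.append(f"({pred} VAR {term})")
    return queries


queries = generate_queries(ARG_TYPES, BEST_REPRESENTED, CONTINUOUS_TYPES, depth=1)
```

With a depth of 1 the sketch yields three queries, matching the example in the text: the Date position of foundingDate is skipped because its type is continuous.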

This approach is not without problems. It relies heavily on the nature of the type constraints placed on predicates; for some predicates, such as foundingDate, this works well, while for others the argument constraints are too broad. For example, the predicate sellsProductType takes a constant of type somethingExisting in its second argument position, because almost anything can be sold. The proposed way to address this problem is with a predicate typicalArgIsa, which would connect predicates such as sellsProductType with the collections to which they typically refer (in this case, CommodityProduct).²

¹ Examples: foundingAgent, foundingDate, sellsProductType, primeMinister, lifeExpectancy, awardWinners. Productive predicates were found via manual trial and error, from a set of domains selected to span a broad portion of the KB (terrorism, medical technology, conceptual works, global politics, family relationships, and sales).

Another problem lies with the assumption that the members of a class about which Cyc knows the most are the most interesting ones, which is only sometimes correct. The best-described instances of the class Person, for example, tend to be the ontologists who work at Cycorp, rather than, for example, heads of state.

Future work will involve using Google in the selection of appropriate queries. Rather than using the top D most supported terms in the KB of type T, it should be possible to retrieve the hit count for up to several thousand members of T, and seek information about the D terms of T for which the most information is available.

3.2 Search

Generating Search Strings
Once the system has selected a query, it generates a series of search strings. The existing NL generation machinery is applied to 233 manually created special generation templates for the 134 predicates.³ Several factors motivated the construction of specialized search generation templates within the NL system. In addition to the fact that productive search strings are often unlike standard English, CycL generations tend to be somewhat stilted, since ontologists have preferred unambiguous expression of CycL meanings over naturalness. In addition, the KB generally contains one or two generation templates for any given predicate, while there may be many common ways of expressing the information that may be useful for searching.

For example, for the query (foundingAgent PalestineIslamicJihad X), the system generates the following set of search strings:

    Palestine Islamic Jihad founder ____
    Palestinian Islamic Jihad founder ____
    PIJ founder ____
    Palestine Islamic Jihad, founded by ____
    Palestinian Islamic Jihad, founded by ____
    PIJ, founded by ____

All possible searches were generated from the Cartesian product of the generation templates with all English renderings of the arguments. In this example, Cyc knows three names for Palestine Islamic Jihad, and has two templates for foundingAgent, resulting in six search strings. To simplify match detection, argument positions were only allowed at the beginning or end of the template string. In order to carry out the actual search, the "____" placeholders were stripped off, and the remaining string was submitted as a quoted string to Google.

² In most cases, these constraints will be learned from analysis of common usage in the existing corpus.
³ Slightly fewer than half of the predicates have only one associated search template. Many obvious templates exist but have not been represented; in future work, an analysis of the search strings will be performed to determine what types of strings produce good results. That information will be used for automatically generating search strings for other predicates.

Searching via Google
The search string is used with an interface to the Google API to retrieve a tuple consisting of the URL, the Google ranking, the match position and the web page text, which is then handed off to a parser that attempts to convert it into a meaningful CycL entity.
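
As a rough illustration of the search-string generation described in this section, the Cartesian product of templates and argument names can be written directly with itertools.product. The template syntax (an `{arg}` slot for the known argument) and the function names are assumptions made for this sketch; the actual system uses Cyc's NL generation templates, and no call to any search API is shown.

```python
from itertools import product
from typing import List

BLANK = "____"  # marks the position of the unknown argument


def search_strings(names: List[str], templates: List[str]) -> List[str]:
    """Combine every template with every English name of the bound argument."""
    return [template.format(arg=name) for template, name in product(templates, names)]


def to_quoted_query(search_string: str) -> str:
    """Strip the placeholder and wrap the remainder in quotes for an exact-phrase search."""
    return '"' + search_string.replace(BLANK, "").strip() + '"'


# The three names and two template shapes are taken from the example in the text.
names = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad", "PIJ"]
templates = ["{arg} founder " + BLANK, "{arg}, founded by " + BLANK]

strings = search_strings(names, templates)
quoted = [to_quoted_query(s) for s in strings]
```

Three names and two templates give the six strings listed above, in the same order; because the blank is only allowed at the start or end of a template, stripping it leaves one contiguous phrase to quote.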

3.3 Parsing into CycL
Once a document is retrieved, the answer must be found and interpreted. First, the system searches for the exact text of the query string, and returns the search string plus either the beginning or the end of the sentence, depending on the position of the "____" in the generated string. The resulting string is searched for phrases that can be interpreted as a CycL concept that meets the type constraints imposed by the predicate. For example, in the string "PIJ founder Bashir Musa Mohammed Nafi is still…", "Bashir Musa Mohammed Nafi" is recognized as a person, and therefore as a candidate for the arg2 position of foundingAgent.

For predicates that require strings (such as nameString, which relates an entity to its name), a named entity recognizer [Prager et al., 2000] is used to find a suitable candidate. In other cases, standard parsing techniques are used to try to find a useful interpretation, including looking up strings in the Cyc lexicon, interpreting them as noun compounds or dates, and compositional interpretation. For speed, compositional interpretation is only attempted for terms judged to be constituents by a probabilistic CFG parser [Charniak, 2001]. This has the effect of eliminating a few correct answers (mostly where the parser produces an incorrect syntactic parse), but also decreases the total time spent on analysis by at least 50%.

Creating Candidate CycL Sentences
The result of parsing the matched section in the web page is a set of candidate CycL terms, usually constants such as Terrorist-Nafi. Substituting these terms into the original incomplete CycL query produces a set of candidate GAFs.
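
The match-and-substitute step can be illustrated with a deliberately simplified sketch. It assumes a toy lexicon keyed by exact phrase and a single type label per constant (both hypothetical), and it replaces the named entity recognition and parsing described above with a plain substring lookup, so it shows only the shape of the data flow, not the actual parsing machinery.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: English phrase -> (CycL constant, type label).
LEXICON: Dict[str, Tuple[str, str]] = {
    "Bashir Musa Mohammed Nafi": ("Terrorist-Nafi", "Person"),
    "Palestine Islamic Jihad": ("PalestineIslamicJihad", "Organization"),
}

SENTENCE_END = re.compile(r"[.!?…]")


def extract_fragment(page_text: str, search_string: str, blank_at_end: bool = True) -> Optional[str]:
    """Return the rest of the sentence after the match (blank at end),
    or the start of the sentence before it (blank at beginning)."""
    idx = page_text.find(search_string)
    if idx < 0:
        return None
    if blank_at_end:
        rest = page_text[idx + len(search_string):]
        end = SENTENCE_END.search(rest)
        return (rest[:end.start()] if end else rest).strip()
    before = page_text[:idx]
    ends = [m.end() for m in SENTENCE_END.finditer(before)]
    return (before[ends[-1]:] if ends else before).strip()


def candidate_gafs(query: str, variable: str, fragment: str, required_type: str) -> List[str]:
    """Substitute every lexicon constant of the required type found in the fragment."""
    return [
        query.replace(variable, constant)
        for phrase, (constant, ctype) in LEXICON.items()
        if phrase in fragment and ctype == required_type
    ]


text = "Officials said PIJ founder Bashir Musa Mohammed Nafi is still at large. More follows."
fragment = extract_fragment(text, "PIJ founder")
gafs = candidate_gafs("(foundingAgent PalestineIslamicJihad WHO)", "WHO", fragment or "", "Person")
```

The type filter is what keeps an organization name that happens to appear in the fragment from being proposed as a founder.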

3.4 Checking Cyc KB Consistency
In principle, any useful fact added to the system should be neither disprovable nor trivially provable. For example, the sentence:

    (foundingAgent PalestineIslamicJihad Terrorist-AlShikaki)

is not novel, because Cyc already knows this. Meanwhile,

    (foundingAgent PalestineIslamicJihad AugusteRodin)

is novel, but trivially disprovable, as Cyc knows he died in 1917 (72 years before the PIJ was founded).

If the Cyc KB already contains knowledge that renders a new fact redundant or contradictory, that fact will be discarded. This is checked via inference; each new fact is treated as a query, and inference is performed to determine whether it can be proven false (inconsistent) or true (redundant). Cyc provides justifications for facts used to produce query results, as in Figure 2, and it is helpful for redundant facts to be marked as additionally confirmed via search-based learning. Previous work suggests that one-step inference is sufficient to identify duplicate information or disprove contradictions during typical knowledge acquisition tasks [Panton et al., 2002].

Figure 2: Justifications for the claim that a 2001 attack on Ankara meets the criteria for the query being run.
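
A toy version of this filter is sketched below. It is not Cyc inference: the "KB" is a set of tuples plus two small lookup tables, and the single hand-written rule (an agent who died before an organization was founded cannot have founded it) stands in for one-step inference. The founding year 1989 is simply what the text's "72 years" remark implies, and all table and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Gaf = Tuple[str, str, str]  # (predicate, arg1, arg2)

# Toy knowledge base built from the two examples in the text.
KNOWN_GAFS: Set[Gaf] = {
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki"),
}
DEATH_YEAR: Dict[str, int] = {"AugusteRodin": 1917}
FOUNDING_YEAR: Dict[str, int] = {"PalestineIslamicJihad": 1917 + 72}  # implied by the text


def classify(gaf: Gaf) -> str:
    """Label a candidate GAF as 'redundant', 'inconsistent', or 'novel'."""
    if gaf in KNOWN_GAFS:
        return "redundant"
    predicate, organization, agent = gaf
    if predicate == "foundingAgent":
        died = DEATH_YEAR.get(agent)
        founded = FOUNDING_YEAR.get(organization)
        if died is not None and founded is not None and died < founded:
            return "inconsistent"
    return "novel"


def keep_novel(candidates: List[Gaf]) -> List[Gaf]:
    """Discard candidates that the toy KB can already prove or disprove."""
    return [gaf for gaf in candidates if classify(gaf) == "novel"]


candidates: List[Gaf] = [
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-AlShikaki"),
    ("foundingAgent", "PalestineIslamicJihad", "AugusteRodin"),
    ("foundingAgent", "PalestineIslamicJihad", "Terrorist-Nafi"),
]
surviving = keep_novel(candidates)
```

Only the third candidate survives; the first is already known and the second is disproved by the date rule.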

3.5 Google Verification
In order to guard against parser errors and excessively general search terms (such as ambiguous acronyms), a second Google search is performed, in order to determine whether search strings generated from the new GAF will produce results. Search strings are generated from the candidate GAF that was learned, but any string containing an acronym or abbreviation (e.g., "PIJ" for "Palestinian Islamic Jihad") is supplemented with a disambiguation term: the least common word (based on Google hit counts) of the expanded acronym. In this case, the string "Palestine" is added as a term, since it is the least common word in the set 'Palestine,' 'Palestinian,' 'Islamic,' and 'Jihad.' The resulting verification search string is:

    "PIJ founder Bashir Nafi" +"Palestine"

Any fact for which this verification step returns no results is considered unverified, and will not be presented to a reviewer.
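
The choice of disambiguation term reduces to taking a minimum over hit counts. In the sketch below the hit counts are invented placeholder numbers, chosen only so that "Palestine" is the rarest word as the text reports; a real run would obtain them from the search engine. The function names and the exact spacing of the `+` operator in the output are likewise assumptions.

```python
from typing import Dict, List

# Placeholder hit counts (NOT real data); only their relative order matters here.
HIT_COUNTS: Dict[str, int] = {
    "Palestine": 10,
    "Palestinian": 20,
    "Islamic": 30,
    "Jihad": 40,
}


def disambiguation_term(expansions: List[str], hit_counts: Dict[str, int]) -> str:
    """Return the least common word among all words of the acronym's expansions."""
    words = sorted({word for name in expansions for word in name.split()})
    return min(words, key=lambda word: hit_counts[word])


def verification_query(phrase: str, expansions: List[str], hit_counts: Dict[str, int]) -> str:
    """Quote the rendered GAF and require the disambiguation term as an extra word."""
    term = disambiguation_term(expansions, hit_counts)
    return f'"{phrase}" +"{term}"'


expansions = ["Palestine Islamic Jihad", "Palestinian Islamic Jihad"]
query = verification_query("PIJ founder Bashir Nafi", expansions, HIT_COUNTS)
```

Sorting the word set before taking the minimum makes the result deterministic if two words happen to have equal counts.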

3.6 Review and Assertion
In the final step, learned sentences are reviewed by a human curator, and, if correct, asserted into the Cyc KB. Currently, suggested sentences are presented to the reviewer in no particular order; in future, sorting methods will be implemented and tested. The most straightforward approaches involve making greater use of information already retrieved from Google: since information about the searches underlying a candidate sentence is stored in the KB, it should be possible and productive to give priority to sentences that are supported by a larger total number of documents. In effect, the numerical values returned during the verification step could be used to sort the most widely supported sentences upwards in the review process. Another possibility, if several contradictory facts are found, is giving review priority to those found in documents with the highest Google ranking.

4 Results
Statistics were gathered for a case in which 134 predicates in P were used, and D was set to 20.⁴ The majority of the searches expended, about 80%, were performed in the verification phase rather than the initial search phase. The results were as follows:

    Queries: 348
    Searches expended: 4290 (817 initial, 3477 verification)
    GAFs found: 1016
      …and rejected due to KB inconsistency: 4
      …and already known to the KB: 384
      …and rejected by Google verification: 566
    Novel GAFs found and verified: 61

⁴ It takes between four and five hours to exhaust an allotment of 3,000 searches per day through the Google API.

An example of a query and its verification:

    Query: (#$hasBeliefSystems #$Iran X)
    Search string: "Iran adheres to"
    Candidate GAF: (#$hasBeliefSystems #$Iran #$Islam)
    Verification search strings:
      "Islamic Republic of Iran adheres to Islam"
      "Iran adheres to Islam"
      "Iran believes in Islam" (found)
      "Islamic Republic of Iran believes in Islam" (found)

    Example GAFs already known to the KB:
      (#$vestedInterest #$Iran #$Iraq)
      (#$inhabitantTypes #$Lebanon #$EthnicGroupOfKurds)
    Example GAFs rejected due to KB inconsistency:
      (#$northOf #$Iran #$Iran)
      (#$geopoliticalSubdivision #$Iraq #$Iran)

A human reviewer then went through the verified GAFs, and a sample of 53 of the unverified GAFs, and determined their actual correctness rate. The results were as follows:

                          Verified    Unverified
    True (correct)        32          8**
    False (incorrect)     29*         45

    Total novel facts: 114
    Novel, correct facts discovered: 77
    Incorrect facts discovered: 37
    Facts categorized correctly: 68%
    Facts categorized incorrectly: 32%
      …* false positives (false but verified): 25%
      …** false negatives (true but unverified): 7%

Examples of these result types:

    Correct, verified GAF:
      (foundingDate AfricanNationalCongress (YearFn 1912))
    * Incorrect but verified GAF:
      (foundingDate JewishDefenseLeague (DecadeFn 198))
    Incorrect, rejected GAF:
      (objectFoundInLocation KuKluxKlan GillianAnderson)
    ** Correct but rejected GAF:
      (foundingDate KarenNationalUnion (MonthFn April (YearFn 1947)))

The verification step produces comparatively few false negatives (in which a true fact is incorrectly classified as false); in this run, 80% of the novel, correct facts retrieved were correctly identified as such. Given this, it is reasonable to reject all unverifiable sentences, especially given the wealth of possible queries and the size and breadth of the corpora available. Only 61% of the incorrect facts retrieved were identified, suggesting that substantial work in decreasing the occurrence of false positives will be necessary before the need for human review is eliminated; this is unsurprising, as the Internet contains large amounts of unstructured, unchecked information.

Slightly over a third of the GAFs discovered were facts that were already known to the KB, and presumably correct to a baseline level (i.e., the correctness level achieved by human ontologists); the total number of correct facts discovered was therefore 425, 42% of the total. Verification reduces the number of novel sentences that must undergo human review from 1016 to 61, and the human review process, which takes place entirely in English, is quick and straightforward. An intermediate step towards full automation would be to identify classes of sentences that can be asserted without human review.
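
The percentages quoted in this section can be rechecked from the four cells of the review table. The short calculation below uses only the counts reported above (32, 8, 29, 45); the variable names are ours. It suggests that the figures 77 and 37 correspond to the counts of correctly and incorrectly categorized facts, i.e. 32 + 45 and 29 + 8.

```python
# Cell counts from the human review of verified and unverified GAFs.
verified_true, unverified_true = 32, 8
verified_false, unverified_false = 29, 45

total = verified_true + unverified_true + verified_false + unverified_false

# A fact is categorized correctly if it is true and verified, or false and unverified.
categorized_correctly = verified_true + unverified_false
categorized_incorrectly = verified_false + unverified_true


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


rates = {
    "categorized correctly": pct(categorized_correctly, total),
    "categorized incorrectly": pct(categorized_incorrectly, total),
    "false positives": pct(verified_false, total),
    "false negatives": pct(unverified_true, total),
    # Share of true facts that passed verification.
    "correct facts identified": pct(verified_true, verified_true + unverified_true),
    # Share of false facts that were rejected by verification.
    "incorrect facts identified": pct(unverified_false, verified_false + unverified_false),
}
```

The two row-wise rates are the ones discussed in the text: the share of true facts that survive verification and the share of false facts that verification rejects.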

5 Conclusions
While great strides have been made in machine learning in the last few decades, automatically gathering useful, consistent knowledge in a machine-usable form is still a relatively unexplored research area. The original promise of the Cyc project – to provide a basis of real-world knowledge sufficient to support the sort of learning from language of which humans are capable – has not yet been fulfilled. In that time, information has become enormously more accessible, due in no small part to the widespread popularity of the Web and to effective indexing systems such as Google. Making use of that repository requires a store of real-world knowledge and some facility for natural language parsing and generation.

These results, while extremely preliminary, are encouraging. In particular, using Cyc as a basis for learning is effective, both in guiding the learning process and in representing and using the results. Pre-existing knowledge in the KB supports the construction of meaningful queries and provides a framework into which learned knowledge can be asserted and reasoned over. Comparatively shallow natural language parsing combined with the type constraint and relation knowledge in the Cyc system allows the retrieval, verification, and review of unconstrained facts at a higher rate than that achieved by human knowledge representation experts working unassisted. Perhaps more importantly, the kind of knowledge retrieved is exactly the instance-level knowledge that should not require human experts – it should instead be obtained, maintained, and reasoned over by tools that need and use that knowledge. Involving Google in every stage of the learning process allows us to exploit both Cyc's knowledge and the knowledge on the web in an extremely natural way.

The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier and more effective, but it also provides a basis for analysis of what information can be learned effectively without human interaction. Thus, over time, we hope to provide Cyc with a mechanism to truly acquire knowledge by learning.

Acknowledgments
This research was partially sponsored by ARDA's AQUAINT program. Additionally, we thank Google for allowing access to their API for research such as this.

References
[Belasco et al., 2004] A. Belasco, J. Curtis, R. C. Kahlert, C. Klein, C. Mayans, R. Reagan. Representing Knowledge Gaps Effectively. In Proc. of the 5th International Conference on Practical Aspects of Knowledge Management, Vienna, Austria, pp 159-164, Dec 2004.

[Brin and Page, 1998] Sergey Brin and Larry Page. Anatomy of a Large-scale Hypertextual Search Engine. In Proc. of the 7th International World Wide Web Conference, pp 107-117, Brisbane, Australia, Apr 1998.

[Brown, 1996] R. D. Brown. Example-Based Machine Translation in the Pangloss System. In Proc. of the 16th International Conference on Computational Linguistics, pp 169-174, Copenhagen, Denmark, August 5-9, 1996.

[Charniak, 2001] E. Charniak. A Maximum-Entropy-Inspired Parser. In Proc. of the 1st conference on North American chapter of the Association for Computational Linguistics, pp 132-139, Seattle, WA, 2000. Morgan Kaufmann Publishers.

[Etzioni et al., 2004] O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. Weld, A. Yates. Web-scale Information Extraction in KnowItAll. In Proc. of the 13th international conference on World Wide Web, pp 100-110, New York, NY, 2004.

[Ghani, 2000] R. Ghani, R. Jones, D. Mladenic, K. Nigam, S. Slattery. Data Mining on Symbolic Knowledge Extracted from the Web. In Proc. of the 6th International Conference on Knowledge Discovery and Data Mining Workshop on Text Mining, pp 29-36, Boston, MA, Aug 2000.

[Guha, 1991] R. V. Guha. Contexts: A Formalization and Some Applications. PhD thesis, Stanford University, STAN-CS-91-1399-Thesis, 1991.

[Kwok et al., 2001] C. Kwok, O. Etzioni, D. Weld. Scaling Question Answering to the Web. In ACM Transactions on Information Systems, Vol 19, Issue 3, pp 242-262, 2001.

[Lenat, 1976] D. B. Lenat. AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search. Ph.D. Dissertation, Stanford University, STAN-CS-76-570, 1976.

[Lenat et al., 1983] D. B. Lenat, A. Borning, D. McDonald, C. Taylor, S. Weyer. Knoesphere: Building Expert Systems with Encyclopedic Knowledge. In Proc. of the 8th International Joint Conference on Artificial Intelligence, Vol 1, pp 167-169, Karlsruhe, Germany, August 1983.

[Lenat, 1995] D. B. Lenat. Cyc: a Large-Scale Investment in Knowledge Infrastructure. In Communications of the ACM, Vol 38, Issue 11, pp 33-38, Nov 1995.

[Lenat, 1998] D. B. Lenat. The Dimensions of Context-Space, from http://www.cyc.com/doc/context-space.pdf.

[Panton et al., 2002] K. Panton, P. Miraglia, N. Salay, R. C. Kahlert, D. Baxter, R. Reagan. Knowledge Formation and Dialogue Using the KRAKEN Toolset. In Proc. of the 18th National Conference on Artificial Intelligence, pp 900-905, Edmonton, Canada, 2002.

[Prager et al., 2000] J. Prager, E. Brown, A. Coden, D. Radev. Question Answering by Predictive Annotation. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 184-191, Athens, Greece, 2000.

[Thrun et al., 1998] S. Thrun, C. Faloutsos, T. Mitchell, L. Wasserman. Automated Learning and Discovery: State-Of-The-Art and Research Topics in a Rapidly Growing Field. Tech. report CMU-CALD-98-100, Computer Science Department, Carnegie Mellon University, 1998.

[Witbrock et al., 2003] M. Witbrock, D. Baxter, J. Curtis, D. Schneider, R. C. Kahlert, P. Miraglia, P. Wagner, K. Panton, G. Matthews, A. Vizedom. An Interactive Dialogue System for Knowledge Acquisition in Cyc. In Proc. of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 2003.

[Witbrock et al., 2004] M. Witbrock, K. Panton, S. Reed, D. Schneider, B. Aldag, M. Reimers and S. Bertolo. Automated OWL Annotation Assisted by a Large Knowledge Base. In Workshop Notes of the 2004 Workshop on Knowledge Markup and Semantic Annotation at the 3rd International Semantic Web Conference, Hiroshima, Japan, pp 71-80, Nov 2004.

[Witbrock et al., 2005] M. Witbrock, C. Matuszek, A. Brusseau, R. C. Kahlert, C. B. Fraser, D. Lenat. Knowledge Begets Knowledge: Steps towards Assisted Knowledge Acquisition in Cyc. In Proc. of the AAAI 2005 Spring Symposium on Knowledge Collection from Volunteer Contributors, Stanford, CA, March 2005.
