TimeML-Compliant Text Analysis for Temporal Reasoning

Branimir Boguraev and Rie Kubota Ando
IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
bran@us.ibm.com, rie1@us.ibm.com

Abstract
Reasoning with time¹ needs more than just a list of temporal expressions. TimeML, an emerging standard for temporal annotation as a language capturing properties and relationships among time-denoting expressions and events in text, is a good starting point for bridging the gap between temporal analysis of documents and reasoning with the information derived from them. Hard as TimeML-compliant analysis is, the small size of the only currently available annotated corpus makes it even harder. We address this problem with a hybrid TimeML annotator, which uses cascaded finite-state grammars (for temporal expression analysis, shallow syntactic parsing, and feature generation) together with a machine learning component capable of effectively using large amounts of unannotated data.
1 Temporal Analysis of Documents
Many information extraction tasks limit analysis of time to identifying a narrow class of time expressions, which literally specify a temporal point or an interval. For instance, a recent (2004) ACE task is that of temporal expression recognition and normalisation (TERN; see http://timex2.mitre.org/tern.html). It targets absolute date/time specifications (e.g. "June 15th, 1998"), descriptions of intervals ("three semesters"), referential (relative) expressions ("last week"), and so forth. A fraction of such expressions may include a relational component ("the two weeks since the conference", "a month of delays following the disclosure"), making them event-anchored; however, the majority refer only to what in a more syntactic framework would be considered a 'temporal adjunct'. The TERN task thus does not address the general question of associating a time stamp with an event.
Deeper document analysis requires awareness of temporal aspects of discourse. Several applications have recently started addressing some issues of time. Document summarisation tackles identification and normalisation of time expressions [Mani & Wilson, 2000], time-stamping of event clauses [Filatova and Hovy, 2001], and temporal ordering of events in news [Mani et al., 2003]. Operational question answering (QA) systems can now (under certain conditions) answer e.g. 'when' or 'how long' questions [Prager et al., 2003].

¹ This work was supported by the ARDA NIMD (Novel Intelligence and Massive Data) program PNWD-SW-6059.
Beyond manipulation of temporal expressions, advanced content analysis projects are beginning to define operational requirements for, in effect, temporal reasoning. More sophisticated QA, for instance, needs more than just information derived from 'bare' temporal markers [Pustejovsky et al., 2003; Schilder & Habel, 2003]. Intelligence analysis typically handles contradictory information while looking for mutually corroborating facts; for this, temporal relations within such an information space are essential. Multi-document summarisation crucially requires temporal ordering over events described across the collection.

A temporal reasoner requires a framework capturing the ways in which relationships among entities are described in text, anchored in time, and related to each other. Closely related are the questions of defining a representation that can accommodate the components of a temporal structure, and of implementing a text analysis process for instantiating such a structure.
This paper describes an effort towards an analytical framework for detailed time information extraction. We sketch the temporal reasoning component which is the ultimate 'client' of the analysis. We motivate our choice of TimeML, an emerging standard for annotation of temporal information in text, as a representational framework; in the process, we highlight TimeML's main features, and characterise a mapping from a TimeML-compliant representation to an isomorphic set of time points and intervals expected by the reasoner.

We develop a strategy for time analysis of text, a synergistic approach deploying both finite-state (FS) grammars and machine learning techniques. The respective strengths of these technologies are well suited to the challenges of the task: complexity of analysis, and paucity of examples of TimeML-style annotation. A complex cascade of FS grammars targets certain components of TimeML (time expressions, in particular), identifies syntactic clues for marking other components (related to temporal links), and derives features for use by machine learning. Training is on a TimeML-annotated corpus; given the small (and thus problematic for training) size of the only (so far) available reference corpus (TimeBank), we incorporate a learning strategy developed to leverage large volumes of unlabeled data.
To our knowledge, this is the first attempt to use the representational principles of TimeML for practical analysis of time. This is also the first use of a TimeML corpus as reference data for implementing temporal analysis.
2 Motivation: Reasoning with Time
We are motivated by the goal of developing a useful, and reusable, temporal analysis framework, where 'downstream' applications are enabled to reason and draw inferences over time elements.

A hybrid reasoner [Fikes et al., 2003], to be deployed in intelligence analysis, maintains a directed graph of time points, intervals defined via start and end points, and temporal relations such as BEFORE, AFTER, and EQUALPOINT. The graph is assumed to be generated via a mapping process, external to the reasoner, from a (temporal) text analysis. Relations are operationalised, and a temporal algebra evaluates instances, draws inferences over goals, and broadens a base of inferred assertions on the basis of relational axioms. An example within the reasoner's inferential capability is: (find instances of int such that (during int 2003)).
Reasoning with relations such as during (associating an event with an interval) and costarts (associating two events), instantiated for the example fragment

"On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation."

would infer, on the basis of predicates like

(during Iran-accuses-Taliban-take-hostages August-9-1998)
(costarts Iran-accuses-Taliban-take-hostages Iran-Taliban-Crisis)

that the answer to the question "When did the Iranian-Taliban crisis begin?" is "August 9, 1998". Details of this inferential process need not concern us here.
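For readers who want a concrete picture, the following Python sketch models only the two relations in the example. Each event contributes a start and an end point; a costarts assertion merges two start points (as an EQUALPOINT assertion would), and a during assertion records that an event's endpoints fall inside a calendar interval. The class TemporalStore and its methods are hypothetical illustrations under our own assumptions, not the API of the reasoner in [Fikes et al., 2003].

```python
class TemporalStore:
    """Toy time-point store (hypothetical). Each event contributes a start and an
    end point; points asserted equal are merged with union-find."""

    def __init__(self):
        self.parent = {}   # time point -> parent point in the union-find forest
        self.within = {}   # representative point -> calendar intervals containing it

    def _find(self, point):
        self.parent.setdefault(point, point)
        while self.parent[point] != point:
            self.parent[point] = self.parent[self.parent[point]]  # path halving
            point = self.parent[point]
        return point

    def _equalpoint(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra
            # the merged point inherits every interval known to contain either point
            self.within.setdefault(ra, set()).update(self.within.pop(rb, set()))

    def during(self, event, interval):
        """(during event interval): both endpoints of event fall inside interval."""
        for point in ((event, "start"), (event, "end")):
            self.within.setdefault(self._find(point), set()).add(interval)

    def costarts(self, a, b):
        """(costarts a b): a and b share one start point."""
        self._equalpoint((a, "start"), (b, "start"))

    def began_within(self, event):
        """Calendar intervals known to contain the start point of event."""
        return self.within.get(self._find((event, "start")), set())


store = TemporalStore()
store.during("Iran-accuses-Taliban-take-hostages", "August-9-1998")
store.costarts("Iran-accuses-Taliban-take-hostages", "Iran-Taliban-Crisis")
print(store.began_within("Iran-Taliban-Crisis"))  # {'August-9-1998'}
```

The union-find merge means the order of assertions does not matter: asserting costarts before during yields the same answer.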
We gloss over issues like enumerating the range of temporal relations and axioms, describing the reasoner's model of events (e.g. Iran-accuses-Taliban-take-hostages), and elaborating its notion of 'a point in time' (subsuming both literal expressions and event specifications). Operationally, a separate component maps temporal analysis results to a suitably neutral, and expressive, ontological representation of time (DAML-Time [Hobbs et al., 2002]). This allows a representation hospitable to a first-order logic inference formalism (like the one assumed in Hobbs et al.) to be kept separate from surface text analysis, much like the traditional separation along the syntax-semantics interface.
We start from the belief that the representation for the reasoner is derivable from a TimeML-compliant text analysis.²

² We are not alone: work on temporal reasoning from a formal inference point of view reaches a similar conclusion: "... the [TimeML] annotation scheme itself, due to its closer tie to surface texts, can be used as the first pass in the syntax-semantics interface of a temporal resolution framework such as ours. The more complex representation, suitable for more sophisticated reasoning, can then be obtained by translating from the annotations." [Han & Lavie, 2004].

TimeML is a proposal for annotating time information; e.g. the example fragment above would be marked up as:

[Figure: TimeML mark-up of "On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation."]
TimeML is described in Section 3. Essentially, it promotes explicit representation and typing of time expressions and events, and an equally explicit mechanism for linking these with temporal links, using a vocabulary of temporal relations. In addition to in-line mark-up, explicit links are marked. Event instance identifiers ei1, ei2, and ei4 refer, respectively, to the accusation in the first sentence, the crisis, and the reference to "that accusation" in the second sentence. The relType attributes on the link descriptions define temporal relationships among event instances and time expressions; in this particular example, an IDENTITY link encodes the co-reference between the two mentions of the accusation event, one in each sentence. It is the combination of event descriptors, their anchoring to time points, and the semantics of relational links which enables the derivation of the during and costarts associations that the reasoner understands.
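The following sketch illustrates how such links could be read off and turned into reasoner-style predicates. The mark-up fragment is a hypothetical, simplified rendering of the example (it is not the original figure, keeps only the tags needed here, and takes the year from the expected answer); the relType-to-predicate table (IS_INCLUDED to during, BEGUN_BY to costarts) and the use of the IDENTITY link to resolve "that accusation" to the first mention are likewise our own illustrative assumptions. Only the standard-library xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TimeML for the example (not the original figure).
TIMEML = """<TimeML>
On <TIMEX3 tid="t1" type="DATE" value="1998-08-09">9 August</TIMEX3> Iran
<EVENT eid="e1" class="REPORTING">accuses</EVENT> the Taliban of taking hostages.
The <EVENT eid="e2" class="OCCURRENCE">crisis</EVENT> began with that
<EVENT eid="e4" class="REPORTING">accusation</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1"/>
<MAKEINSTANCE eiid="ei2" eventID="e2"/>
<MAKEINSTANCE eiid="ei4" eventID="e4"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/>
<TLINK lid="l2" eventInstanceID="ei4" relatedToEventInstance="ei1" relType="IDENTITY"/>
<TLINK lid="l3" eventInstanceID="ei2" relatedToEventInstance="ei4" relType="BEGUN_BY"/>
</TimeML>"""

# Assumed mapping from TLINK relType to reasoner predicates.
PREDICATE = {"IS_INCLUDED": "during", "BEGUN_BY": "costarts"}

def tlinks_to_predicates(xml_text):
    root = ET.fromstring(xml_text)
    # IDENTITY links mark co-referent mentions; map each to one canonical instance.
    canonical = {link.get("eventInstanceID"): link.get("relatedToEventInstance")
                 for link in root.iter("TLINK") if link.get("relType") == "IDENTITY"}

    def resolve(eiid):
        return canonical.get(eiid, eiid)

    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}
    facts = []
    for link in root.iter("TLINK"):
        predicate = PREDICATE.get(link.get("relType"))
        if predicate is None:        # IDENTITY and unmapped types produce no fact
            continue
        source = resolve(link.get("eventInstanceID"))
        if link.get("relatedToTime") is not None:
            target = times[link.get("relatedToTime")]
        else:
            target = resolve(link.get("relatedToEventInstance"))
        facts.append((predicate, source, target))
    return facts

print(tlinks_to_predicates(TIMEML))
# [('during', 'ei1', '1998-08-09'), ('costarts', 'ei2', 'ei1')]
```

Because "that accusation" (ei4) is resolved to ei1, the costarts fact ends up relating the crisis to the same instance that the during fact anchors in time, which is what the inference needs.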
3 TimeML: a Mark-up Language for Time
Most content analysis applications to date do not explicitly incorporate temporal reasoning, and their needs can be met by analysis of simple time expressions (dates, intervals, etc.). This is largely the motivation for TERN's TIMEX2 tag; at the same time, it explains why TIMEX2 is inadequate for supporting the representational requirements outlined earlier.³

TimeML aims at capturing the richness of time information in documents. It marks up more than just temporal expressions, and focuses on ways of systematically anchoring event predicates to time-denoting expressions, and on ordering such event expressions relative to each other.

TimeML derives higher expressiveness from explicitly separating the representation of temporal expressions from that of events. Time analysis is distributed across four component structures: TIMEX3, SIGNAL, EVENT, and LINK; all are rendered as tags, with attributes [Saurí et al., 2004].⁴

³ For a notable extension to TIMEX2, see [Gaizauskas & Setzer, 2002]. Although it is an attempt to codify some relational information linking the TIMEX with an event, it is still limited, both in terms of scope (only links with certain syntactic shape can be captured) and representational power (it is hard to separate an event mention from possibly multiple event instances); see [Pustejovsky et al., 2003].

⁴ Additionally, a MAKEINSTANCE tag embodies the difference between event tokens and event instances: for example, the analysis of "Max taught on Monday and Tuesday" requires two different instances to be created for a teaching EVENT. Even if there is typically a one-to-one mapping between an EVENT and an instance, the language requires that a realisation of that event be created.
TIMEX3 extends⁵ the TIMEX2 [Ferro, 2001] attributes: it captures temporal expressions (commonly categorised as DATE, TIME, DURATION), both literal and intensionally specified. SIGNAL tags are (typically) function words indicative of relationships between temporal objects: temporal prepositions (for, during, etc.) or temporal connectives (before, while).
EVENT, in TimeML nomenclature, is a cover term for situations that happen or occur; these can be punctual, or last for a period of time. TimeML posits a refined typology of events [Pustejovsky et al., 2003]. All classes of event expressions (tensed verbs, stative adjectives and other modifiers, event nominals) are marked up with suitable attributes on the EVENT tag.
Finally, the LINK tag is used to encode a variety of relations that exist between the temporal elements in a document, as well as to establish an explicit ordering of events. Three subtypes of the LINK tag are used to represent strict temporal relationships between events or between an event and a time (TLINK), subordination between two events or an event and a signal (SLINK), and aspectual relationships between an aspectual event and its argument (ALINK).

TimeML's richer component set, in-line mark-up of temporal primitives, and non-consuming tags for temporal relations across arbitrarily long text spans make it highly compatible with the current paradigm of annotation-based encapsulation of document analysis.
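To make the division of labour among the tags concrete, here is a minimal in-memory model for the sentence from footnote 4, "Max taught on Monday and Tuesday": one EVENT token, two MAKEINSTANCE realisations of it, and one TLINK anchoring each instance to its own TIMEX3. The dataclass names mirror TimeML tags, but the Python types and field names are our own hypothetical choices, not part of any TimeML toolkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timex3:
    tid: str
    text: str
    type: str            # DATE, TIME, DURATION, ...

@dataclass(frozen=True)
class Event:
    eid: str
    text: str
    event_class: str     # OCCURRENCE, STATE, REPORTING, ...

@dataclass(frozen=True)
class MakeInstance:
    eiid: str
    event: Event         # several instances may realise one EVENT token

@dataclass(frozen=True)
class TLink:
    instance: MakeInstance
    related_to_time: Timex3
    rel_type: str        # e.g. IS_INCLUDED

taught = Event("e1", "taught", "OCCURRENCE")
monday, tuesday = Timex3("t1", "Monday", "DATE"), Timex3("t2", "Tuesday", "DATE")
# One teaching EVENT token, two instances of it: one per day.
inst1, inst2 = MakeInstance("ei1", taught), MakeInstance("ei2", taught)
links = [TLink(inst1, monday, "IS_INCLUDED"), TLink(inst2, tuesday, "IS_INCLUDED")]

for link in links:
    print(link.instance.eiid, link.instance.event.text, link.rel_type, link.related_to_time.text)
# ei1 taught IS_INCLUDED Monday
# ei2 taught IS_INCLUDED Tuesday
```

Links attach to instances rather than to the EVENT token, which is what lets a single mention carry two different temporal anchors.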
4 TimeML and Temporal Analysis
TimeML's annotation-based representation facilitates integration of time analysis with the analysis of other syntactic and/or discourse phenomena; it also naturally supports exploitation of larger contextual effects by the temporal parser proper (see 4.4). This is a crucial observation, given that the most attractive characteristic of TimeML, its intrinsic richness of expression, is also what makes it challenging for analysis.
There are two broad categories of problems for developing an automated TimeML analyser: of substance and of infrastructure. Substantive issues include normalising time expressions to a canonical representation (TIMEX3's value attribute), identifying a broad range of events (e.g. event nominals and predicative adjectives acting as event specifiers), linking time-denoting expressions (typically a TIMEX3 and an EVENT), and typing of those LINKs.
The infrastructure problems (small size and less than consistent mark-up of the TimeBank corpus) are due to the fact that this first version is largely a side product of a small number of annotators trying out TimeML's expressive capabilities. TimeBank is thus intended as a reference, and not for training. Our hybrid approach to temporal parsing, combining finite-state (FS) recognition with machine learning from sparse data (4.2), is largely motivated by this nature of TimeBank.
4.1 The TimeBank corpus
TimeBank has only 186 documents (68.5K words). If we held out 10% of the corpus as test data, we would have barely over 60K words for training.
Below we show counts of (EVENT-TIMEX3) TLINK⁶ and EVENT types [Saurí et al., 2004].

⁵ TIMEX2 and TIMEX3 differ substantially in their treatment of event anchoring and sets of times.
TLINK examples are particularly sparse; the data also shows a highly uneven distribution of examples of different types. In comparison, the Penn TreeBank corpus for part-of-speech tagging contains >1M words (>16 times larger than TimeBank); the CoNLL'03 named entity chunking training set (at http://cnts.uia.ac.be/conll2003/ner/) has over 200K words with 23K examples (15 times more than TLINK examples) over just 4 name classes (compared to the 13 TLINK classes defined by TimeML). TERN's training set (almost 800 documents/300K words) is considered to be somewhat sparse, with over 8K TIMEX examples.
TLINK type      # occurrences    EVENT type    # occurrences
IS_INCLUDED     866              OCCURRENCE    4,452
DURING          146              STATE         1,181
ENDS            102              REPORTING     1,010
SIMULTANEOUS    69               I_ACTION      668
ENDED_BY        52               I_STATE       586
AFTER           41               ASPECTUAL     295
BEGINS          37               PERCEPTION    51
BEFORE          35
INCLUDES        29
BEGUN_BY        27
IAFTER          5
IDENTITY        5
IBEFORE         1
Total:          1,451            Total:        8,243

4.2 Analytical strategy
Minimally, the reasoner would require that the analytical framework support time-stamping and temporal ordering of events; thus we target the analysis tasks of finding TIMEX3s, assigning canonical values, marking and typing EVENTs, and associating (some of them) with TIMEX3 tags.
TIMEX3 expressions are naturally amenable to FS description. FS devices can also encode some larger context for time analysis (temporal connectives for marking putative events, clause boundaries for scoping possible event-time pairs, etc.; see 4.4). To complement such analysis, a machine learning approach can cast the problem of marking EVENTs as chunking. Recently, [Ando, 2004] has developed a framework for exploiting large amounts of unannotated corpora in supervised learning for chunking. In such a framework, mid-to-high-level syntactic parsing, typically derived by FS cascades, can produce rich features for classifiers. Thus, we combine FS grammars for temporal expressions, embedded in a general purpose shallow parser, with machine learning trained with TimeBank and unannotated corpora.
4.3 FS-based parser for temporal expressions
Viewing TIMEX3 analysis as an information extraction task, a cascade of finite-state grammars with broad coverage (compiled down to a single TIMEX3 automaton with 500 states and over 16000 transitions) targets abstract temporal entities such as UNIT, POINT, PERIOD, RELATION, etc.; these may be further decomposed and typed into e.g. MONTH, DAY, YEAR (for a UNIT), or INTERVAL or DURATION (for a PERIOD).

⁶ In all of our experiments we exclude TIMEX3 mark-up in meta-data; the TLINK counts only reflect links to temporal expressions in the body of documents.
Fine-grained analysis of temporal expressions, instantiating attributes like granularity, cardinality, ref_direction, and so forth, is crucially required for normalising a TIMEX3: representing "the last five years" as illustrated below facilitates the derivation of a value for the TIMEX3 value attribute.

[timex: [relative: true] [ref_direction: past] [cardinality: 5] [granularity: year]]

Such analysis amounts to a parse tree under the TIMEX3. (Not shown above is additional information, anchoring the expression into the larger discourse and informing other normalisation processes which emit the full complement of TIMEX3 attributes: type, temporalFunction, anchorTimeID, etc.)
TimeBank does not contain such fine-grained mark-up: the grammars thus perform an additional 'discovery' task, for which no training data currently exists, but which is essential for discourse-level post-processing, handling e.g. ambiguous and/or underspecified time expressions or the relationship between document-internal and document-external temporal properties (such as 'document creation time').
4.4 Shallow parsing for feature generation
In principle, substantial discourse analysis can be carried out from a shallow syntactic base, and derived by means of FS cascading [Kennedy & Boguraev, 1996]. Our grammars interleave shallow parsing with named entity extraction. They specify temporal expressions in terms of linguistic units, as opposed to simply lexical cues (as many temporal taggers to date do). This point cannot be over-emphasised.
One of the complex problems for TimeML analysis is that of event identification. A temporal tagger, if narrowly focused on time expressions only (cf. [Schilder & Habel, 2003]), offers no clues to what events are there in the text. In contrast, a temporal parser aware of the syntax of a time phrase like "during the long and ultimately unsuccessful war in Afghanistan" is very close to knowing, from configurational properties of a prepositional phrase, that the nominal argument ("war") of the temporal preposition ("during") is an event nominal. Ultimately, syntactic analysis beyond TimeML components is used to derive features for the classifiers tasked with finding EVENTs and LINKs (Section 5).
Feature generation typically relies on a mix of lexical properties and some configurational syntactic information (depending on the complexity of the task). Our scheme additionally needs some semantic typing, knowledge of the boundaries of longer syntactic units (typically a variety of clauses), and some grammatical function information.
An example (simplified) of the FS cascade output is:

[Snt [svoClause [tAdjunct In [NP [timex3 the 1988 period timex3] NP] tAdjunct],
    [SUB [NP the company NP] SUB]
    [VG [GrmEventOccurrence earned GrmEventOccurrence] VG]
    [OBJ [NP [Money $20.6 million Money] NP] OBJ] svoClause]
  ... Snt]

Most of the above is self-explanatory, but we emphasise a few key points.
The analysis captures the mix of syntactic chunks, semantic categories, and TimeML components used for feature generation. It maintains local TIMEX3 analysis; the time expression is inside a larger clause boundary, with internal grammatical function identification for some of the event predicates. The specifics of mapping configurational information into feature vectors are described in Section 5.
4.5 Machine learning for TimeML components
TimeML parsing is thus a bifurcated process of TimeML component recognition: TIMEX3s are marked by FS grammars; SIGNALs, EVENTs and LINKs are identified by classification models derived from analysis of both TimeBank and large unannotated corpora. Features for these models are derived from common strategies for exploiting local context, as well as from mining the results (both mark-up and configurational) of the FS grammar cascade, as illustrated in the previous section. (More details on feature generation follow in Section 5 below.)

Classifiers and feature vectors
The classification framework we adopt for this work is based on the principle of empirical risk minimization.
In particular, we use a linear classifier, which makes classification decisions by thresholding inner products of feature vectors and weight vectors. It learns weight vectors by minimizing classification errors (empirical risk) on annotated training data. For our experiments (Section 6), we use the Robust Risk Minimization (RRM) classifier [Zhang et al., 2002], which has been shown useful for a number of text analysis tasks such as syntactic chunking, named entity chunking, and part-of-speech tagging.

In marked contrast to generative models, where assumptions about features are tightly coupled with algorithms, RRM (as is the case with discriminative methods in general) enjoys a clear separation of feature representation from the underlying algorithms for training and classification. This facilitates experimentation with different feature representations, since changing the features does not require changing the algorithms that manipulate them. We show how the choice of features affects performance in Section 6.
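The separation of feature representation from the learning algorithm can be illustrated with a generic linear classifier over sparse, string-keyed features. The sketch below minimises an empirical risk, the modified Huber loss L(z) = max(0, 1 - z)² for z ≥ -1 and -4z otherwise (with z = y·(w·x)), by plain stochastic gradient descent. It is a stand-in chosen for brevity, not the RRM algorithm of [Zhang et al., 2002], and the feature names in the toy data are hypothetical.

```python
def score(w, x):
    """Inner product of sparse weight and feature vectors (dicts keyed by feature name)."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train(examples, epochs=20, lr=0.1):
    """examples: list of (features, label) with label +1 or -1.
    Minimises the modified Huber loss of z = label * score by SGD."""
    w = {}
    for _ in range(epochs):
        for x, y in examples:
            z = y * score(w, x)
            if z >= 1:            # loss and gradient are zero here
                continue
            dl_dz = -4.0 if z < -1 else -2.0 * (1.0 - z)
            for f, v in x.items():
                w[f] = w.get(f, 0.0) - lr * dl_dz * y * v
    return w

def classify(w, x, threshold=0.0):
    """Decision by thresholding the inner product."""
    return 1 if score(w, x) > threshold else -1

# Toy EVENT-vs-not data with hypothetical feature names.
examples = [
    ({"word=explosion": 1.0, "subj_of=occur": 1.0}, 1),
    ({"word=war": 1.0, "pobj_of=during": 1.0}, 1),
    ({"word=table": 1.0, "pobj_of=on": 1.0}, -1),
    ({"word=company": 1.0, "subj_of=earn": 1.0}, -1),
]
w = train(examples)
print(classify(w, {"word=earthquake": 1.0, "subj_of=occur": 1.0}))  # 1
```

Since features are only dictionary keys, adding or removing a feature type changes the input dictionaries but leaves train and classify untouched.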
Word profiling for exploitation of unannotated corpora
In general, classification learning requires a substantial amount of labeled data for training, considerably more than what TimeBank offers (cf. 4.1). The size of the available training data is thus potentially a limiting factor for supervised learning approaches. We, however, seek to improve performance by exploiting unannotated corpora, with their natural advantages of size and availability.
We use a word profiling technique, developed specially for exploiting a large unannotated corpus for tagging/chunking tasks [Ando, 2004]. Word profiling identifies, and extracts, word-characteristic information from unannotated corpora; it does this, in essence, by collecting and compressing feature frequencies from the corpus. Word profiling turns co-occurrence counts of words and features (e.g. 'next word', 'head of subject', etc.) into new feature vectors. For instance, observing that "extinction" and "explosion" are often used as the syntactic subject of "occur", and that "earthquakes" "happen", helps to predict that "explosion", "extinction", and "earthquake" all function like event nominals. Below (6.1) we demonstrate the effectiveness of word profiling, specifically for EVENT recognition.
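The intuition behind word profiling can be sketched with raw co-occurrence counts. In the toy example below, each word's profile is its length-normalised vector of counts over syntactic contexts (such as 'subject of occur'), and words used in similar contexts end up close under cosine similarity. This is a simplification under our own assumptions: the (word, context) observations are invented, and the compression step that [Ando, 2004] applies to the counts is omitted.

```python
import math
from collections import Counter, defaultdict

# Invented (word, syntactic context) observations from a parsed, unannotated corpus.
PAIRS = [
    ("explosion", "subj_of=occur"), ("explosion", "subj_of=happen"),
    ("extinction", "subj_of=occur"),
    ("earthquake", "subj_of=happen"), ("earthquake", "subj_of=strike"),
    ("table", "obj_of=buy"), ("table", "pobj_of=on"),
    ("company", "subj_of=earn"), ("company", "obj_of=buy"),
]

def build_profiles(pairs):
    """Collect context counts per word and L2-normalise them (no compression step)."""
    counts = defaultdict(Counter)
    for word, context in pairs:
        counts[word][context] += 1
    profiles = {}
    for word, ctx in counts.items():
        norm = math.sqrt(sum(c * c for c in ctx.values()))
        profiles[word] = {k: c / norm for k, c in ctx.items()}
    return profiles

def cosine(p, q):
    # Profiles are unit length, so the dot product is the cosine similarity.
    return sum(v * q.get(k, 0.0) for k, v in p.items())

def nearest(profiles, word):
    others = [w for w in profiles if w != word]
    return max(others, key=lambda w: cosine(profiles[word], profiles[w]))

profiles = build_profiles(PAIRS)
print(nearest(profiles, "earthquake"))   # explosion
```

In a tagger, a word's profile entries would be added as extra features, so that an unseen or rare word such as "earthquake" can borrow evidence from words like "explosion" that the labeled data does cover.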
5 Implementation
To use classifiers, one needs to design a feature vector representation for the objects to be classified. This entails selection of some predictive attributes of the objects (in effect promoting these to the status of features) and definition of mappings between vector dimensions and those attributes (feature mapping). In this section we describe the essence of our feature design for EVENT and TLINK recognition.⁷

5.1 EVENT recognition
Similarly to named entity chunking, we cast the EVENT recognition task as a problem of sequential labeling of tokens by encoding chunk information into token tags.
For a given class, this generates three tags: E:class (the last, end, token of a chunk denoting a mention of class type), I:class (a token inside of a chunk), and O (any token outside of any target chunk). The example sequence below indicates that the two tokens "very bad" are spanned by an event-state annotation.

... another/O very/I:event-state bad/E:event-state week/O ...

In this way, the EVENT chunking task becomes a (2k+1)-way classification of tokens, where k is the number of EVENT types; this is followed by a Viterbi-style decoding.
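The tag encoding and a constrained Viterbi-style decoding can be sketched as follows. The encoder turns chunk spans into E:class / I:class / O tags, and the decoder picks the highest-scoring tag sequence subject to the constraint that an I:class tag must be followed by I:class or E:class of the same class, and cannot end the sentence. In a real system the per-token scores would come from the classifier; here they are made-up numbers, and the transition constraints are our reading of the scheme rather than the authors' code.

```python
def encode(n_tokens, chunks):
    """chunks: (start, end_exclusive, cls) spans -> one E:/I:/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end, cls in chunks:
        for i in range(start, end - 1):
            tags[i] = f"I:{cls}"
        tags[end - 1] = f"E:{cls}"
    return tags

def decode(tags):
    """Inverse of encode: tags -> (start, end_exclusive, cls) spans."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        kind, cls = tag.split(":", 1)
        if start is None:
            start = i
        if kind == "E":
            chunks.append((start, i + 1, cls))
            start = None
    return chunks

def allowed(prev, cur):
    """An I:c tag must be followed by I:c or E:c; any tag may follow O or E:c."""
    if prev.startswith("I:"):
        return cur in (prev, "E:" + prev[2:])
    return True

def viterbi(scores):
    """scores: one dict per token mapping each tag to a log-score; returns the best valid path."""
    best = {tag: (s, [tag]) for tag, s in scores[0].items()}
    for column in scores[1:]:
        new = {}
        for cur, s_cur in column.items():
            options = [(s + s_cur, path + [cur])
                       for prev, (s, path) in best.items() if allowed(prev, cur)]
            if options:                      # skip tags unreachable under the constraints
                new[cur] = max(options, key=lambda o: o[0])
        best = new
    # A sentence cannot end inside a chunk.
    finals = [v for tag, v in best.items() if not tag.startswith("I:")]
    return max(finals, key=lambda o: o[0])[1]

tokens = ["another", "very", "bad", "week"]
print(encode(len(tokens), [(1, 3, "event-state")]))
# ['O', 'I:event-state', 'E:event-state', 'O']

# Made-up log-scores: a greedy per-token choice would give O I:... O O (invalid).
scores = [
    {"O": 0.0, "I:event-state": -3.0, "E:event-state": -3.0},
    {"O": -1.0, "I:event-state": -0.5, "E:event-state": -2.0},
    {"O": -0.4, "I:event-state": -3.0, "E:event-state": -0.6},
    {"O": 0.0, "I:event-state": -3.0, "E:event-state": -3.0},
]
path = viterbi(scores)
print(path, decode(path))
# ['O', 'I:event-state', 'E:event-state', 'O'] [(1, 3, 'event-state')]
```

With k event types the tag set has 2k+1 members (one I: and one E: tag per type, plus a shared O), matching the (2k+1)-way classification described above.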
(We use the same scheme for SIGNAL recognition.)

The feature representation used for EVENT extraction experiments mimics the one developed for a comparative study of entity recognition with word profiling [Ando, 2004]. The features we extract are: token, capitalization, part-of-speech (POS) in a 3-token window; bigrams of adjacent words in a 5-token window; words in the same syntactic chunk; head words in a 3-chunk window; word uni- and bi-grams based on subject-verb-object and preposition-noun constructions; syntactic chunk types (noun or verb group chunks only); token tags in a 2-token window to the left; tri-grams of POS, capitalization, and word ending; tri-grams of POS, capitalization, and left tag.
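A few of the listed feature types can be sketched as a per-token feature extractor. The function below covers only the window-based lexical ones (token, capitalization and POS in a 3-token window; adjacent-word bigrams in a 5-token window; tags of the two tokens to the left; and a POS/capitalization/word-ending conjunction, which is our reading of the "tri-gram" feature). Chunk-, head- and construction-based features would additionally need the FS cascade output. Feature-name strings and the capitalization classes are hypothetical choices.

```python
def cap_class(word):
    """Coarse capitalization class of a word (hypothetical classes)."""
    if word.isupper():
        return "ALLCAPS"
    if word[:1].isupper():
        return "Init"
    return "lower"

def token_features(words, pos, left_tags, i):
    """Window features for token i; left_tags[j] is the tag already decoded for token j < i."""
    feats = []
    for d in (-1, 0, 1):                      # token, capitalization, POS in a 3-token window
        j = i + d
        if 0 <= j < len(words):
            feats += [f"w[{d}]={words[j].lower()}",
                      f"cap[{d}]={cap_class(words[j])}",
                      f"pos[{d}]={pos[j]}"]
    for d in (-2, -1, 0, 1):                  # adjacent-word bigrams inside the 5-token window
        j = i + d
        if 0 <= j and j + 1 < len(words):
            feats.append(f"bi[{d}]={words[j].lower()}_{words[j + 1].lower()}")
    for d in (-2, -1):                        # tags of the two tokens to the left
        if i + d >= 0:
            feats.append(f"tag[{d}]={left_tags[i + d]}")
    # conjunction of POS, capitalization and word ending (last three characters)
    feats.append(f"tri={pos[i]}|{cap_class(words[i])}|{words[i][-3:].lower()}")
    return feats

words = ["another", "very", "bad", "week"]
pos = ["DT", "RB", "JJ", "NN"]
print(token_features(words, pos, ["O", "I:event-state"], 2))
```

The left-tag features depend on decisions already made for earlier tokens, which is why such taggers decode left to right (or, as above, under a Viterbi-style search).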
5.2 TLINK recognition
TLINK is a relation between events and time expressions which can link two EVENTs, two TIMEX3s, or an EVENT and a TIMEX3. Presently (see 4.2) we focus on TLINKs between events and time expressions.
As a relational link, TLINK does not naturally fit the tagging abstraction for a chunking problem outlined above. Instead, we formulate a classification task as follows. After posting EVENT and TIMEX3 annotations (by the event classifier and the FS temporal parser, respectively), for each pairing between an EVENT and a TIMEX3, we ask whether it is a certain type of TLINK. This defines an (ℓ+1)-way classification problem, where ℓ is the number of TLINK types (BEFORE, AFTER, etc.; Section 4.1). The adjustment term '+1' is for the negative class (not-a-temporal-link).

⁷ We do not discuss SIGNAL recognition here, as the signal tag itself contributes nothing to EVENT or TLINK recognition, beyond what is captured by a lexical feature over the temporal connective.
The relation-extraction nature of the task of posting TLINKs requires a different feature representation, capable of encoding the syntactic function of the relation arguments (EVENTs and TIMEX3s), and some of the larger context of their mentions. To that end, we consider the following five partitions (defined in terms of tokens): spans of arguments (P1 or P2); two tokens to the left/right of the left/right argument (Pleft/Pright); and the tokens between the arguments (Pmiddle). From each partition, we extract tokens and parts-of-speech as features.
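The five partitions can be computed directly from the two argument spans. In the sketch below, spans are token offsets with exclusive ends, P1 is whichever argument comes first in the text, and each partition contributes token and POS features prefixed with its name; the function names, the tokenisation and the POS tags of the example are our own assumptions.

```python
def partitions(n_tokens, arg_a, arg_b, width=2):
    """arg_a, arg_b: (start, end_exclusive) token spans of an EVENT and a TIMEX3,
    assumed not to overlap. P1 is whichever argument comes first in the text."""
    (s1, e1), (s2, e2) = sorted([arg_a, arg_b])
    return {
        "P1": range(s1, e1),
        "P2": range(s2, e2),
        "Pleft": range(max(0, s1 - width), s1),          # two tokens left of P1
        "Pright": range(e2, min(n_tokens, e2 + width)),  # two tokens right of P2
        "Pmiddle": range(e1, s2),                        # tokens between the arguments
    }

def partition_features(words, pos, arg_a, arg_b):
    """Token and POS features, prefixed with the partition they come from."""
    feats = []
    for name, idx in partitions(len(words), arg_a, arg_b).items():
        for i in idx:
            feats += [f"{name}:w={words[i].lower()}", f"{name}:pos={pos[i]}"]
    return feats

words = ["In", "the", "1988", "period", "the", "company", "earned", "$", "20.6", "million"]
pos = ["IN", "DT", "CD", "NN", "DT", "NN", "VBD", "$", "CD", "CD"]
# EVENT "earned" = token 6; TIMEX3 "the 1988 period" = tokens 1..3
print(partition_features(words, pos, (6, 7), (1, 4)))
```

Prefixing each feature with its partition name lets the classifier distinguish, say, "company" between the arguments from "company" outside them.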
We also consider segments (syntactic constructions derived by FS analysis: 'when-clause', 'subject', etc.) in certain relationships to partitions: contained in P1, P2, or Pmiddle; covering P1 (or P2) but not overlapping with P2 (or P1); occurring to the left of P1 (or the right of P2); or covering both P1 and P2. We use uni- and bi-grams of types of these segments as features.
In this feature representation, segments play a crucial role by capturing the syntactic functions of EVENTs and TIMEX3s, as well as the syntactic relations between them. Thus in the example analysis in Section 4.4, svoClause is the smallest segment containing both an EVENT and a TIMEX3, indicative of a direct syntactic relation between the two. In the next example, the TIMEX3 and EVENT chunks are contained in different clauses (a thatClause and a svoClause, respectively), which structurally prohibits a TLINK relation between the two.
[Snt Analysts have complained
    [thatClause that [timex3 third-quarter timex3] corporate earnings haven't been very good thatClause]
    [svoClause , but the effect [event hit event] ... svoClause] Snt]

Thus our feature representation is capable of capturing this information via the types of the segments that contain one of the EVENT and the TIMEX3 without overlapping the other.
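The segment relations can be sketched in the same style. Given FS segments as (type, start, end) token spans, the functions below assign each segment to one of the relations listed above, emit segment-type unigrams and bigrams per relation, and report the smallest segment covering both arguments: svoClause for the example in Section 4.4 and only the whole sentence (Snt) for the thatClause example. The relation labels, the left-to-right ordering used for bigrams, and the token offsets of the two examples are our own assumptions.

```python
def covers(outer, inner):
    """True if span outer contains span inner (spans are (start, end_exclusive))."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def segment_relation(seg, p1, p2):
    """Relation of one FS segment to the argument spans p1 (left) and p2 (right)."""
    middle = (p1[1], p2[0])
    if covers(p1, seg):
        return "in_P1"
    if covers(p2, seg):
        return "in_P2"
    if middle[0] < middle[1] and covers(middle, seg):
        return "in_Pmiddle"
    if covers(seg, p1) and covers(seg, p2):
        return "covers_both"
    if covers(seg, p1) and not overlaps(seg, p2):
        return "covers_P1_only"
    if covers(seg, p2) and not overlaps(seg, p1):
        return "covers_P2_only"
    if seg[1] <= p1[0]:
        return "left_of_P1"
    if seg[0] >= p2[1]:
        return "right_of_P2"
    return None                               # other configurations ignored here

def segment_features(segments, p1, p2):
    """segments: (type, start, end) triples. Uni- and bi-grams of segment types per relation."""
    by_relation = {}
    for seg_type, start, end in sorted(segments, key=lambda s: (s[1], -s[2])):
        relation = segment_relation((start, end), p1, p2)
        if relation is not None:
            by_relation.setdefault(relation, []).append(seg_type)
    feats = []
    for relation, types in by_relation.items():
        feats += [f"{relation}:{t}" for t in types]
        feats += [f"{relation}:{a}_{b}" for a, b in zip(types, types[1:])]
    return feats

def smallest_covering(segments, p1, p2):
    """Type of the shortest segment containing both arguments, or None."""
    covering = [(end - start, t) for t, start, end in segments
                if covers((start, end), p1) and covers((start, end), p2)]
    return min(covering)[1] if covering else None

# Token offsets for the two example analyses (our own tokenisation).
FIRST = [("Snt", 0, 11), ("svoClause", 0, 10), ("tAdjunct", 0, 4), ("timex3", 1, 4),
         ("SUB", 4, 6), ("GrmEventOccurrence", 6, 7), ("OBJ", 7, 10)]
SECOND = [("Snt", 0, 16), ("thatClause", 3, 11), ("timex3", 4, 5),
          ("svoClause", 11, 16), ("event", 15, 16)]
print(smallest_covering(FIRST, (1, 4), (6, 7)))     # svoClause
print(smallest_covering(SECOND, (4, 5), (15, 16)))  # Snt
print(segment_features(SECOND, (4, 5), (15, 16)))
```

In the second example the features covers_P1_only:thatClause and covers_P2_only:svoClause are exactly the signal that each argument sits in its own clause.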
6 Experiments
We present here performance results on EVENT and TLINK recognition only. This is largely because the primary focus of this paper is to report on how effective our analytical strategy is in leveraging the reference nature of the small TimeBank corpus for training classifiers for TimeML. Of the remaining components, SIGNAL was briefly mentioned earlier (see footnote 7), and TIMEX3 recognition, driven by FS grammars, belongs to a different paper. Since this is the first attempt to build a TimeML-compliant analyser (cf. Section 1), there are no comparable results in the literature. The results (micro-averaged F-measure) reflect experiments with different settings, against the TimeBank corpus, and are produced by 5-fold cross validation.
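For reference, micro-averaging pools true positives, false positives and false negatives before computing precision and recall, so frequent classes weigh more than rare ones. The sketch below also pools across the folds of a cross validation; the round-robin fold assignment, the function names and the counts in the usage lines are illustrative assumptions, not the paper's data or splits.

```python
def kfold_splits(n_items, k=5):
    """Round-robin assignment of items (e.g. documents) to k folds.
    Returns one (train_indices, test_indices) pair per fold."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    return [([j for f, fold in enumerate(folds) if f != i for j in fold], test)
            for i, test in enumerate(folds)]

def micro_f1(counts):
    """counts: (tp, fp, fn) triples, e.g. one per class per fold; pooled before scoring."""
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    # F1 = 2PR / (P + R), which simplifies to 2tp / (2tp + fp + fn)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

splits = kfold_splits(186, k=5)                      # e.g. the 186 TimeBank documents
print([len(test) for _, test in splits])             # [38, 37, 37, 37, 37]
print(round(micro_f1([(8, 2, 2), (1, 1, 3)]), 3))    # 0.692
```

Pooling counts across folds and averaging per-fold F-scores are both common conventions; they give close but not identical numbers.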
6.1 EVENT recognition
It should be clear, by looking at the example analysis in Section 4.4, how local information and syntactic environment both contribute to the feature generation process. Figure 1 shows performance results with and without word profiling for exploiting an unannotated corpus. For word profiling, we extracted feature co-occurrence counts from 40M words of the 1991 Wall Street Journal. The proposed event chunks are counted as correct only when both the chunk boundaries and event types are correct.

Features                  with typing    w/o typing
basic                     61.3           78.6
basic + word profiling    64.0 (+2.7)    80.3 (+1.7)

Figure 1: Event extraction results, with/without typing. Parentheses show contribution of word profiling, over using basic features only.
While word profiling improves performance, 64.0% F-measure is lower than typical performance of, for instance, named entity chunking. On the other hand, when we train the EVENT classifiers without typing, we obtain 80.3% F-measure. This is indicative of the inherent complexity of the EVENT typing task.
6.2 TLINK recognition
In this experimental setting, we only consider the pairings of EVENT and TIMEX3 which appear within a certain distance in the same sentence.⁸

For comparison, we implement the following simple baseline method. Considering the text sequence of EVENTs and TIMEX3s, only 'close' pairs of potential arguments are coupled with TLINKs; EVENT e and TIMEX3 t are close if and only if e is the closest EVENT to t and t is the closest TIMEX3 to e. For all other pairings, no temporal relation is posted. Depending on the 'with-'/'without-typing' setting, the baseline method either types the TLINK as the most populous class in TimeBank, IS_INCLUDED, or simply marks it as 'it exists'.
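The baseline's notion of a 'close' pair (mutual nearest neighbours in the text sequence) can be sketched as below. Mentions are represented by a hypothetical (kind, position) pair with position a token offset and are assumed to be listed in text order; how ties in distance are broken is not specified in the text, so the sketch simply takes the first candidate in text order, and the label "EXISTS" for the untyped setting is our own placeholder.

```python
def close_pairs(mentions):
    """mentions: (kind, position) with kind "EVENT" or "TIMEX3", position a token offset.
    Returns (event_pos, timex_pos) pairs that are each other's nearest neighbour."""
    events = [p for kind, p in mentions if kind == "EVENT"]
    timexes = [p for kind, p in mentions if kind == "TIMEX3"]
    if not events or not timexes:
        return []

    def nearest(p, candidates):
        return min(candidates, key=lambda c: abs(c - p))   # ties: first in text order

    pairs = []
    for t in timexes:
        e = nearest(t, events)
        if nearest(e, timexes) == t:
            pairs.append((e, t))
    return pairs

def baseline(mentions, typed=True):
    """Post one TLINK per close pair: IS_INCLUDED when typed, a bare 'exists' label otherwise."""
    label = "IS_INCLUDED" if typed else "EXISTS"
    return [(e, t, label) for e, t in close_pairs(mentions)]

mentions = [("TIMEX3", 1), ("EVENT", 6), ("EVENT", 12), ("TIMEX3", 20), ("EVENT", 22)]
print(baseline(mentions))   # [(6, 1, 'IS_INCLUDED'), (22, 20, 'IS_INCLUDED')]
```

Note that the EVENT at position 12 receives no TLINK at all, since neither TIMEX3 has it as its nearest EVENT; this is one reason such a baseline misses many relations over longer distances.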
Results are shown in Figure 2.

Distance (# of TLINKs)      Features    with typing    w/o typing
≤64 tokens (1370 TLINKs)    baseline    21.8           34.9
                            basic       52.1           74.1
                            basic+FS    53.1 (+1.0)    74.8 (+0.7)
≤16 tokens (1269 TLINKs)    baseline    38.7           61.3
                            basic       52.8           75.8
                            basic+FS    54.3 (+1.5)    76.5 (+0.7)
≤4 tokens (789 TLINKs)      baseline    49.8           76.1
                            basic       57.0           80.1
                            basic+FS    58.8 (+1.8)    81.8 (+1.7)

Figure 2: TLINK extraction results, with/without typing. Parentheses show contribution of grammar-derived features, over using basic ones only. Baseline posts TLINKs over 'close' EVENT/TIMEX3 pairs.

⁸ To evaluate the TLINK classifier alone, we use the EVENT and TIMEX3 annotations in TimeBank.

Clearly, the detection of temporal relations between events and time expressions requires more than simply coupling the closest pairs within a sentence (as the baseline does).
It is also clear that the baseline method performs poorly, especially for pairings over relatively long distances. For instance, it produces 34.9% when we consider the pairings within 64 tokens without typing. In the same setting, our method produces 74.8% in F-measure, substantially outperforming the baseline.
We compare performance in two types of feature representation: 'basic' and 'basic+FS grammar', which reflect the without- and with-segment-type information obtained by the grammar analysis, respectively. As the positive deltas show, configurational syntactic information can be exploited beneficially by our process.
Focusing on within-4-token pairings, we achieve 81.8% F-measure without typing of TLINKs, and 58.8% with typing. (The task without typing is a binary classification to detect whether the pairing has a TLINK relation or not, regardless of the type.) As the figure shows, the task becomes harder when we consider longer distance pairings. Within a 64-token distance, we obtain figures of 74.8% and 53.1%, without and with typing respectively.
While we are moderately successful in detecting the existence of temporal relations, the noticeable differences in performance between the task settings with and without typing indicate that we are not as successful in distinguishing one type from another. In particular, the relatively low performance of TLINK typing highlights the difficulty in distinguishing between DURING and IS_INCLUDED.
The guidelines (and common sense analysis) suggest that the IS_INCLUDED type should be assigned if the time point or duration of the EVENT is included in the duration of the associated TIMEX3. DURING, on the other hand, should be assigned as a type if some relation represented by the EVENT holds during the duration of the TIMEX3. We note that for this particular typing problem, the subtle distinctions are hard even for human annotators: the TimeBank corpus displays a number of occasions where inconsistent tagging is evident.
7 Conclusion
TimeML is a significant development in time analysis, as it captures detailed information, anchored in eventuality and linguistic structure, that has been shown to be crucial for inferential and reasoning tasks. In addition to defining annotation guidelines, the TimeML effort notably created the first reference corpus illustrative of the expressiveness of the language.

Unfortunately, the small size of the TimeBank corpus prevents its straightforward use as a training resource, a problem further exacerbated by the inherent complexity of TimeML-compliant analysis. And yet, for reasoning engines to function, TimeML analysers need to be built.
[Mani et al., 2004] discuss some pioneering work in linking events with times, and ordering events, indicative of productive strategies for posting (some) TLINK information. However, the nature of these efforts is such that differences in premises, representation, and focus make a direct performance comparison impossible. Furthermore, the work pre-dates TimeML, and cannot be conveniently mapped to TimeBank data; this, in effect, precludes a quantitative comparison with our work.
In a first systematic attempt at TimeML-compliant analysis, and leveraging the TimeBank corpus, we have developed a strategy which synergistically blends finite-state analysis over linguistic annotations with a state-of-the-art machine learning technique. Particularly effective are: aggressive analysis, by complex grammars, of both TimeML components and syntactic structure; coupled with a learning algorithm capable of training over unannotated data, in addition to exploiting arbitrarily small amounts of labeled data. While work remains (notably refining the TLINK recogniser, targeting other types of LINKs, and enhancing EVENT recognition with external lexical resources), this is a significant step in instantiating a deeper time analysis, capable of satisfying the needs of reasoning engines.
References

[Ando, 2004] R. K. Ando. Exploiting unannotated corpora for tagging and chunking. In Proceedings of ACL-04.
[Ferro, 2001] L. Ferro. TIDES: Instruction manual for the annotation of temporal expressions. MTR 01W0000046V01, The MITRE Corporation, 2001.
[Fikes et al., 2003] R. Fikes, J. Jenkins, and G. Frank. JTP: A system architecture and component library for hybrid reasoning. TR KSL-03-01, Stanford University, 2003.
[Filatova and Hovy, 2001] E. Filatova and E. Hovy. Assigning time-stamps to event-clauses. In Proceedings of the 10th Conference of the EACL, Toulouse, France, 2001.
[Gaizauskas & Setzer, 2002] R. Gaizauskas and A. Setzer, editors. Annotation Standards for Temporal Information in NL, Las Palmas, Spain, 2002.
[Han & Lavie, 2004] B. Han and A. Lavie. Framework for resolution of time in natural language. TALIP Special Issue, Spatial and Temporal Information Processing, 2004.
[Hobbs et al., 2002] J. R. Hobbs, G. Ferguson, J. Allen, P. Hayes, and A. Pease. A DAML ontology of time, 2002.
[Kennedy & Boguraev, 1996] C. Kennedy and B. Boguraev. Anaphora for everyone: Pronominal anaphora resolution without a parser. In Proceedings of COLING-96, Copenhagen, DK, 1996.
[Mani & Wilson, 2000] I. Mani and G. Wilson. Robust temporal processing of news. In Proceedings of the 38th Annual Meeting of the ACL, Hong Kong, 2000.
[Mani et al., 2003] I. Mani, B. Schiffman, and J. Zhang. Inferring temporal ordering of events in news. In Proceedings of ACL-41 (HLT-NAACL), Edmonton, Canada, 2003.
[Mani et al., 2004] I. Mani, J. Pustejovsky, and B. Sundheim. Introduction: special issue on temporal information processing. ACM Transactions on Asian Language Information Processing, 3(1):1–10, 2004.
[Prager et al., 2003] J. Prager, J. Chu-Carroll, E. Brown, and C. Czuba. Question answering using predictive annotation. In Advances in Question Answering, 2003.
[Pustejovsky et al., 2003] J. Pustejovsky, J. Castano, R. Ingria, R. Saurí, R. Gaizauskas, A. Setzer, G. Katz, and D. Radev. TimeML: Robust specification of event and temporal expressions in text. In AAAI Spring Symposium on New Directions in Question-Answering, pages 28–34, Stanford, CA, 2003.
[Saurí et al., 2004] R. Saurí, J. Littman, R. Gaizauskas, A. Setzer, and J. Pustejovsky. TimeML annotation guidelines, Version 1.1, TERQAS Workshop, 2004.
[Schilder & Habel, 2003] F. Schilder and C. Habel. Temporal information extraction for temporal QA. In AAAI Spring Symposium on New Directions in Question-Answering, pages 35–44, Stanford, CA, 2003.
[Zhang et al., 2002] T. Zhang, F. Damerau, and D. E. Johnson. Text chunking based on a generalization of Winnow. Journal of Machine Learning Research, 2:615–637, 2002.