TimeML-Compliant Text Analysis for Temporal Reasoning

Branimir Boguraev and Rie Kubota Ando
IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
bran@us.ibm.com, rie1@us.ibm.com

Abstract

Reasoning with time¹ needs more than just a list of temporal expressions. TimeML, an emerging standard for temporal annotation as a language capturing properties and relationships among time-denoting expressions and events in text, is a good starting point for bridging the gap between temporal analysis of documents and reasoning with the information derived from them. Hard as TimeML-compliant analysis is, the small size of the only currently available annotated corpus makes it even harder. We address this problem with a hybrid TimeML annotator, which uses cascaded finite-state grammars (for temporal expression analysis, shallow syntactic parsing, and feature generation) together with a machine learning component capable of effectively using large amounts of unannotated data.

1 Temporal Analysis of Documents

Many information extraction tasks limit analysis of time to identifying a narrow class of time expressions, which literally specify a temporal point or an interval. For instance, a recent (2004) ACE task is that of temporal expression recognition and normalisation (TERN; see http://timex2.mitre.org/tern.html). It targets absolute date/time specifications (e.g. "June 15th, 1998"), descriptions of intervals ("three semesters"), referential (relative) expressions ("last week"), and so forth. A fraction of such expressions may include a relational component ("the two weeks since the conference", "a month of delays following the disclosure"), making them event-anchored; however, the majority refer only to what in a more syntactic framework would be considered as a 'temporal adjunct'. The TERN task thus does not address the general question of associating a time stamp with an event.

Deeper document analysis requires awareness of temporal aspects of discourse. Several applications have recently started addressing some issues of time. Document summarisation tackles identification and normalisation of time expressions [Mani & Wilson, 2000], time stamping of event clauses [Filatova and Hovy, 2001], and temporal ordering of events in news [Mani et al., 2003]. Operational question answering (QA) systems can now (under certain conditions) answer e.g. 'when' or 'how long' questions [Prager et al., 2003].

¹ This work was supported by the ARDA NIMD (Novel Intelligence and Massive Data) program PNWD-SW-6059.

Beyond manipulation of temporal expressions, advanced content analysis projects are beginning to define operational requirements for, in effect, temporal reasoning. More sophisticated QA, for instance, needs more than just information derived from 'bare' temporal markers [Pustejovsky et al., 2003; Schilder & Habel, 2003]. Intelligence analysis typically handles contradictory information, while looking for mutually corroborating facts; for this, temporal relations within such an information space are essential. Multi-document summarisation crucially requires temporal ordering over events described across the collection.

A temporal reasoner requires a framework capturing the ways in which relationships among entities are described in text, anchored in time, and related to each other. Related are questions of defining a representation that can accommodate components of a temporal structure, and implementing a text analysis process for instantiating such a structure.

This paper describes an effort towards an analytical framework for detailed time information extraction. We sketch the temporal reasoning component which is the ultimate 'client' of the analysis. We motivate our choice of TimeML, an emerging standard for annotation of temporal information in text, as a representational framework; in the process, we highlight TimeML's main features, and characterise a mapping from a TimeML-compliant representation to an isomorphic set of time-points and intervals expected by the reasoner.

We develop a strategy for time analysis of text, a synergistic approach deploying both finite-state (FS) grammars and machine learning techniques. The respective strengths of these technologies are well suited for the challenges of the task: complexity of analysis, and paucity of examples of TimeML-style annotation. A complex cascade of FS grammars targets certain components of TimeML (time expressions, in particular), identifies syntactic clues for marking other components (related to temporal links), and derives features for use by machine learning. The training is on a TimeML annotated corpus; given the small (and thus problematic for training) size of the only (so far) available reference corpus (TimeBank), we incorporate a learning strategy developed to leverage large volumes of unlabeled data.

To our knowledge, this is the first attempt to use the representational principles of TimeML for practical analysis of time. This is also the first use of a TimeML corpus as reference data for implementing temporal analysis.

2 Motivation: Reasoning with Time

We are motivated by developing a useful, and reusable, temporal analysis framework, where 'downstream' applications are enabled to reason and draw inferences over time elements. A hybrid reasoner [Fikes et al., 2003], to be deployed in intelligence analysis, maintains a directed graph of time points, intervals defined via start and end points, and temporal relations such as BEFORE, AFTER, and EQUALPOINT. The graph is assumed generated via a mapping process, external to the reasoner, from a (temporal) text analysis. Relations are operationalised, and temporal algebra evaluates instances, draws inference over goals, and broadens a base of inferred assertions on the basis of relational axioms. An example within the reasoner's inferential capability is:

(find instances of int such that (during int 2003)).

Reasoning with relations such as during (associating an event with an interval), costarts (associating two events), instantiated for the example fragment:

"On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation."

would infer, on the basis of predicates like:

(during Iran-accuses-Taliban-take-hostages August-9-1998)
(costarts Iran-accuses-Taliban-take-hostages Iran-Taliban-Crisis)

that the answer to the question "When did the Iranian-Taliban crisis begin" is "August 9, 1998".

Details of this inferential process need not concern us here. We gloss over issues like enumerating the range of temporal relations and axioms, describing the reasoner's model of events (e.g. Iran-accuses-Taliban-take-hostages), and elaborating its notion of 'a point in time' (subsuming both literal expressions and event specifications). Operationally, a separate component maps temporal analysis results to a suitably neutral, and expressive, ontological representation of time (DAML-Time [Hobbs et al., 2002]). This allows for a representation hospitable to first-order logic inference formalism (like the one assumed in Hobbs et al.) to be kept separate from surface text analysis: much like the traditional separation along the syntax-semantics interface.
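
The inference described earlier in this section, from the during and costarts predicates to the start date of the crisis, can be sketched in a few lines. This is only an illustration under stated assumptions: the fact list, the `INTERVALS` table, and the helper names `containing_interval` and `start_bounds` are hypothetical and are not part of the reasoner of [Fikes et al., 2003], whose axioms are far richer.

```python
from datetime import date

# Hypothetical fact base mirroring the two predicates quoted in the text.
FACTS = [
    ("during", "Iran-accuses-Taliban-take-hostages", "August-9-1998"),
    ("costarts", "Iran-accuses-Taliban-take-hostages", "Iran-Taliban-Crisis"),
]

# Hypothetical interval table: interval name -> (first day, last day).
INTERVALS = {"August-9-1998": (date(1998, 8, 9), date(1998, 8, 9))}


def containing_interval(event, facts):
    """Return the name of an interval the event occurs during, if any."""
    for relation, first, second in facts:
        if relation == "during" and first == event:
            return second
    return None


def start_bounds(event, facts, intervals):
    """Bound the start of an event, following costarts links one step.

    An event that occurs during an interval starts inside that interval;
    two events related by costarts share the same start.
    """
    candidates = [event]
    for relation, first, second in facts:
        if relation == "costarts" and event in (first, second):
            candidates.append(second if first == event else first)
    for candidate in candidates:
        name = containing_interval(candidate, facts)
        if name is not None:
            return intervals[name]
    return None


earliest, latest = start_bounds("Iran-Taliban-Crisis", FACTS, INTERVALS)
print(earliest.isoformat(), latest.isoformat())
```

Because the containing interval is a single day, the earliest and latest possible start coincide, which is what licenses the single answer "August 9, 1998".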

We start from the belief that the representation for the reasoner is derivable from a TimeML-compliant text analysis.² TimeML is a proposal for annotating time information; e.g. the first example sentence above would be marked up as:

On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation.

² We are not alone: work on temporal reasoning from formal inference point of view reaches a similar conclusion: "... the [TimeML] annotation scheme itself, due to its closer tie to surface texts, can be used as the first pass in the syntax-semantics interface of a temporal resolution framework such as ours. The more complex representation, suitable for more sophisticated reasoning, can then be obtained by translating from the annotations." [Han & Lavie, 2004].

TimeML is described in Section 3. Essentially, it promotes explicit representation and typing of time expressions and events, and an equally explicit mechanism for linking these with temporal links, using a vocabulary of temporal relations. In addition to in-line mark-up, explicit links are marked. Event instance identifiers, ei1, ei2, and ei4 refer to, respectively, the accusation in the first sentence, the crisis, and the reference to "that accusation" in the second sentence. The relType attributes on the link descriptions define temporal relationships between event instances and time expressions; in this particular example, an IDENTITY link encodes the co-referentiality between the event instances (mentions) in the two sentences of the accusation event of the earlier example.

It is the combination of event descriptors, their anchoring to time points, and the semantics of relational links, which enables the derivation of during and costarts associations that the reasoner understands.

3 TimeML: a Mark-up Language for Time

Most content analysis applications to date do not explicitly incorporate temporal reasoning, and their needs can be met by analysis of simple time expressions (dates, intervals, etc). This is largely the motivation for TERN's TIMEX2 tag; at the same time it explains why TIMEX2 is inadequate for supporting the representational requirements outlined earlier.³

TimeML aims at capturing the richness of time information in documents. It marks up more than just temporal expressions, and focuses on ways of systematically anchoring event predicates to time-denoting expressions, and on ordering such event expressions relative to each other.

TimeML derives higher expressiveness from explicitly separating representation of temporal expressions from that of events. Time analysis is distributed across four component structures: TIMEX3, SIGNAL, EVENT, and LINK; all are rendered as tags, with attributes [Saurí et al., 2004].⁴

TIMEX3 extends⁵ the TIMEX2 [Ferro, 2001] attributes: it captures temporal expressions (commonly categorised as DATE, TIME, DURATION), both literal and intensionally specified. SIGNAL tags are (typically) function words indicative of relationships between temporal objects: temporal prepositions (for, during, etc.) or temporal connectives (before, while). EVENT, in TimeML nomenclature, is a cover term for situations that happen or occur; these can be punctual, or last for a period of time. TimeML posits a refined typology of events [Pustejovsky et al., 2003]. All classes of event expressions (tensed verbs, stative adjectives and other modifiers, event nominals) are marked up with suitable attributes on the EVENT tag. Finally, the LINK tag is used to encode a variety of relations that exist between the temporal elements in a document, as well as to establish an explicit ordering of events. Three subtypes to the LINK tag are used to represent strict temporal relationships between events or between an event and a time (TLINK), subordination between two events or an event and a signal (SLINK), and aspectual relationship between an aspectual event and its argument (ALINK).

TimeML's richer component set, in-line mark-up of temporal primitives, and non-consuming tags for temporal relations across arbitrarily long text spans, make it highly compatible with the current paradigm of annotation-based encapsulation of document analysis.

³ For a notable extension to TIMEX2, see [Gaizauskas & Setzer, 2002]. An attempt to codify some relational information linking the TIMEX with an event, it is still limited, both in terms of scope (only links with certain syntactic shape can be captured) and representational power (it is hard to separate an event mention from possibly multiple event instances); see [Pustejovsky et al., 2003].

⁴ Additionally, a MKINSTANCE tag embodies the difference between event tokens and event instances: for example, the analysis of "Max taught on Monday and Tuesday" requires two different instances to be created for a teaching EVENT. Even if typically there is a one-to-one mapping between an EVENT and an instance, the language requires that a realisation of that event is created.

4 TimeML and Temporal Analysis

TimeML's annotation-based representation facilitates integration of time analysis with the analysis of other syntactic and/or discourse phenomena; it also naturally supports exploitation of larger contextual effects by the temporal parser proper (see 4.4). This is a crucial observation, given that the prominently attractive characteristic of TimeML (its intrinsic richness of expression) makes it challenging for analysis.

There are two broad categories of problems for developing an automated TimeML analyser: of substance and of infrastructure. Substantive issues include normalising time expressions to a canonical representation (TIMEX3's value attribute), identifying a broad range of events (e.g. event nominals and predicative adjectives acting as event specifiers), linking time-denoting expressions (typically a TIMEX3 and an EVENT), and typing of those LINKs.

The infrastructure problems (small size and less than consistent mark-up of the TimeBank corpus) are due to the fact that this, first, version is largely a side product of a small number of annotators trying out TimeML's expressive capabilities. TimeBank is thus intended as a reference, and not for training. Our hybrid approach to temporal parsing, combining finite-state (FS) recognition with machine learning from sparse data (4.2), is largely motivated by this nature of TimeBank.

4.1 The TimeBank corpus

TimeBank has only 186 documents (68.5K words). If we held out 10% of the corpus as test data, we would have barely over 60K words for training. Below we show counts of (EVENT-TIMEX3) TLINK⁶ and EVENT types [Saurí et al., 2004]. TLINK examples are particularly sparse; the data also shows highly uneven distribution of examples of different types.

In comparison, the Penn TreeBank corpus for part-of-speech tagging contains >1M words (>16 times larger than TimeBank); the CoNLL'03 named entity chunking training set (at http://cnts.uia.ac.be/conll2003/ner/) has over 200K words with 23K examples (15 times more than TLINK examples) over just 4 name classes (compared to the 13 TLINK classes defined by TimeML). TERN's training set (almost 800 documents/300K words) is considered to be somewhat sparse, with over 8K TIMEX examples.

TLINK type      # occurrences     EVENT type     # occurrences
ISINCLUDED        866             OCCURRENCE     4,452
DURING            146             STATE          1,181
ENDS              102             REPORTING      1,010
SIMULTANEOUS       69             IACTION          668
ENDEDBY            52             ISTATE           586
AFTER              41             ASPECTUAL        295
BEGINS             37             PERCEPTION        51
BEFORE             35
INCLUDES           29
BEGUNBY            27
IAFTER              5
IDENTITY            5
IBEFORE             1
Total:          1,451             Total:         8,243

⁵ TIMEX2 and TIMEX3 differ substantially in their treatment of event anchoring and sets of times.

4.2 Analytical strategy

Minimally, the reasoner would require that the analytical framework supports time stamping and temporal ordering of events; thus we target the analysis tasks of finding TIMEX3's, assigning canonical values, marking and typing EVENTs, and associating (some of them) with TIMEX3 tags.

TIMEX3 expressions are naturally amenable to FS description. FS devices can also encode some larger context for time analysis (temporal connectives for marking putative events, clause boundaries for scoping possible event-time pairs, etc; see 4.4). To complement such analysis, a machine learning approach can cast the problem of marking EVENTs as chunking. Recently, [Ando, 2004] has developed a framework for exploiting large amounts of unannotated corpora in supervised learning for chunking. In such a framework, mid-to-high-level syntactic parsing (typically derived by FS cascades) can produce rich features for classifiers. Thus, we combine FS grammars for temporal expressions, embedded in a general purpose shallow parser, with machine learning trained with TimeBank and unannotated corpora.

4.3 FS-based parser for temporal expressions

Viewing TIMEX3 analysis as an information extraction task, a cascade of finite-state grammars with broad coverage (compiled down to a single TIMEX3 automaton with 500 states and over 16000 transitions) targets abstract temporal entities such as UNIT, POINT, PERIOD, RELATION, etc; these may be further decomposed and typed into e.g. MONTH, DAY, YEAR (for a UNIT); or INTERVAL or DURATION (for a PERIOD). Fine-grained analysis of temporal expressions, instantiating attributes like granularity, cardinality, ref_direction, and so forth, is crucially required for normalising a TIMEX3: representing "the last five years" as illustrated below facilitates the derivation of a value for the TIMEX3 value attribute.

[timex : [relative : true]
         [ref_direction : past]
         [cardinality : 5]
         [granularity : year]]

Such analysis amounts to a parse tree under the TIMEX3. (Not shown above is additional information, anchoring the expression into the larger discourse and informing other normalisation processes which emit the full complement of TIMEX3 attributes: type, temporalFunction, anchorTimeID, etc).

TimeBank does not contain such fine-grained mark-up: the grammars thus perform an additional 'discovery' task, for which no training data currently exists, but which is essential for discourse-level post-processing, handling e.g. ambiguous and/or underspecified time expressions or the relationship between document-internal and document-external temporal properties (such as 'document creation time').

⁶ In all of our experiments we exclude TIMEX3 markup in metadata; the TLINK counts only reflect links to temporal expressions in the body of documents.
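
To make the link between the fine-grained analysis and normalisation concrete, the sketch below derives two candidate values from the feature structure for "the last five years". It is a minimal illustration, not the normalisation component described here: the dictionary encoding, the `UNIT_CODES` table, and the functions `duration_value` and `anchored_year_span` are hypothetical, and the use of an ISO 8601 style duration string (such as P5Y) for the value is an assumption about the target format.

```python
from datetime import date

# Hypothetical encoding of the feature structure shown for "the last five years".
LAST_FIVE_YEARS = {
    "relative": True,
    "ref_direction": "past",
    "cardinality": 5,
    "granularity": "year",
}

# ISO 8601 duration designators for a few granularities (illustrative subset).
UNIT_CODES = {"year": "Y", "month": "M", "week": "W", "day": "D"}


def duration_value(fs):
    """Build an ISO 8601 style duration string from cardinality and granularity."""
    return "P{}{}".format(fs["cardinality"], UNIT_CODES[fs["granularity"]])


def anchored_year_span(fs, anchor):
    """Resolve a relative, year-granularity expression against an anchor date.

    Returns (first_year, last_year). The anchor stands in for something like
    the document creation time; how the anchor is chosen is outside this sketch.
    """
    if not fs.get("relative") or fs["granularity"] != "year":
        raise ValueError("only relative, year-granularity expressions are handled")
    n = fs["cardinality"]
    if fs["ref_direction"] == "past":
        return (anchor.year - n, anchor.year)
    return (anchor.year, anchor.year + n)


print(duration_value(LAST_FIVE_YEARS))
print(anchored_year_span(LAST_FIVE_YEARS, date(1998, 8, 9)))
```

The point of the sketch is that neither value can be computed from the surface string alone: cardinality, granularity, and direction each have to be available as separate attributes.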

4.4 Shallow parsing for feature generation

In principle, substantial discourse analysis can be carried out from a shallow syntactic base, and derived by means of FS cascading [Kennedy & Boguraev, 1996]. Our grammars interleave shallow parsing with named entity extraction. They specify temporal expressions in terms of linguistic units, as opposed to simply lexical cues (as many temporal taggers to date do). This point cannot be over-emphasised. One of the complex problems for TimeML analysis is that of event identification. A temporal tagger, if narrowly focused on time expressions only (cf. [Schilder & Habel, 2003]), offers no clues to what events are there in the text. In contrast, a temporal parser aware of the syntax of a time phrase like "during the long and ultimately unsuccessful war in Afghanistan" is very close to knowing (from configurational properties of a prepositional phrase) that the nominal argument ("war") of the temporal preposition ("during") is an event nominal.

Ultimately, syntactic analysis beyond TimeML components is used to derive features for the classifiers tasked with finding EVENTs and LINKs (Section 5). Feature generation typically relies on a mix of lexical properties and some configurational syntactic information (depending on the complexity of the task). Our scheme additionally needs some semantic typing, knowledge of boundaries of longer syntactic units (typically a variety of clauses), and some grammatical function. An example (simplified) of the FS cascade output is:

[Snt
  [svoClause
    [tAdjunct In [NP [timex3 the 1988 period timex3] NP] tAdjunct],
    [SUB [NP the company NP] SUB]
    [VG [GrmEventOccurrence earned grmEventOccurrence] VG]
    [OBJ [NP [Money $20.6 million Money] NP] OBJ]
  svoClause]
  ...
Snt]

Most of the above is self-explanatory, but we emphasise a few key points. The analysis captures the mix of syntactic chunks, semantic categories, and TimeML components used for feature generation. It maintains local TIMEX3 analysis; the time expression is inside of a larger clause boundary, with internal grammatical function identification for some of the event predicates. The specifics of mapping configurational information into feature vectors are described in Section 5.

4.5 Machine learning for TimeML components

TimeML parsing is thus a bifurcated process of TimeML components recognition: TIMEX3's are marked by FS grammars; SIGNALs, EVENTs and LINKs are identified by classification models derived from analysis of both TimeBank and large unannotated corpora. Features for these models are derived from common strategies for exploiting local context, as well as from mining the results (both mark-up and configurational) from the FS grammar cascading, as illustrated in the previous section. (More details on feature generation follow in Section 5 below.)

Classifiers and feature vectors

The classification framework we adopt for this work is based on a principle of empirical risk minimization. In particular, we use a linear classifier, which makes classification decisions by thresholding inner products of feature vectors and weight vectors. It learns weight vectors by minimizing classification errors (empirical risk) on annotated training data. For our experiments (Section 6), we use the Robust Risk Minimization (RRM) classifier [Zhang et al., 2002], which has been shown useful for a number of text analysis tasks such as syntactic chunking, named entity chunking, and part-of-speech tagging.

In marked contrast to generative models, where assumptions about features are tightly coupled with algorithms, RRM (as is the case with discriminative analysis) enjoys clear separation of feature representation from the underlying algorithms for training and classification. This facilitates experimentation with different feature representations, since the separation between these and the algorithms which manipulate them does not require change in algorithms. We show how choice of features affects performance in Section 6.
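
The decision rule described above, thresholding the inner product of a feature vector and a weight vector, can be sketched as follows. Note the substitution: the training loop is a plain perceptron, used here only as a simple stand-in for error-driven learning of a linear classifier; it is not the RRM algorithm of [Zhang et al., 2002]. The sparse dictionary representation, the feature names, and the function names are likewise assumptions made for illustration.

```python
def score(weights, features):
    """Inner product of a sparse feature vector and a sparse weight vector."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def predict(weights, features, threshold=0.0):
    """Classify as positive when the inner product exceeds the threshold."""
    return score(weights, features) > threshold


def train_perceptron(examples, epochs=10):
    """Perceptron training (a stand-in, NOT RRM): adjust weights on each error."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            if predict(weights, features) != label:
                sign = 1.0 if label else -1.0
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + sign * value
    return weights


# Toy training data with hypothetical feature names: is the token an event?
EXAMPLES = [
    ({"pos=VBD": 1.0, "bias": 1.0}, True),
    ({"pos=DT": 1.0, "bias": 1.0}, False),
]

WEIGHTS = train_perceptron(EXAMPLES)
print(predict(WEIGHTS, {"pos=VBD": 1.0, "bias": 1.0}))
```

The separation stressed in the text is visible here: changing the feature representation means changing only the dictionaries passed in, while `score`, `predict`, and the training loop stay untouched.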

Word profiling for exploitation of unannotated corpora

In general, classification learning requires a substantial amount of labeled data for training, considerably more than what TimeBank offers (cf. 4.1). This characteristic of size is potentially a limiting factor in supervised learning approaches. We, however, seek to improve performance by exploiting unannotated corpora, with their natural advantages of size and availability. We use a word profiling technique, developed specially for exploiting a large unannotated corpus for tagging/chunking tasks [Ando, 2004]. Word profiling identifies, and extracts, word-characteristic information from unannotated corpora; it does this, in essence, by collecting and compressing feature frequencies from the corpus.

Word profiling turns co-occurrence counts of words and features (e.g. 'next word', 'head of subject', etc) into new feature vectors. For instance, observing that "extinction" and "explosion" are often used as syntactic subject to "occur", and that "earthquakes" "happen", helps to predict that "explosion", "extinction", and "earthquake" all function like event nominals. Below (6.1) we demonstrate the effectiveness of word profiling, specifically for EVENT recognition.
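
The counting half of word profiling can be illustrated with a toy corpus. The sketch below only collects word/feature co-occurrence counts and length-normalises them into profile vectors; the compression step of [Ando, 2004] is deliberately omitted, so this should be read as an assumption-laden simplification rather than the technique itself. The observation list, the feature labels (`subj_of=...`), and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_profiles(observations):
    """Turn (word, feature) co-occurrence observations into unit-length vectors."""
    counts = defaultdict(Counter)
    for word, feature in observations:
        counts[word][feature] += 1
    profiles = {}
    for word, feats in counts.items():
        norm = sqrt(sum(c * c for c in feats.values()))
        profiles[word] = {f: c / norm for f, c in feats.items()}
    return profiles


def cosine(p, q):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * q.get(f, 0.0) for f, v in p.items())


# Toy observations, as if harvested from parsed but unannotated text.
OBSERVATIONS = [
    ("explosion", "subj_of=occur"),
    ("explosion", "subj_of=occur"),
    ("explosion", "subj_of=happen"),
    ("extinction", "subj_of=occur"),
    ("earthquake", "subj_of=happen"),
    ("company", "subj_of=earn"),
]

PROFILES = build_profiles(OBSERVATIONS)
print(cosine(PROFILES["explosion"], PROFILES["extinction"]))
print(cosine(PROFILES["explosion"], PROFILES["company"]))
```

Words that share subject-of contexts such as "occur" and "happen" end up with similar profiles, which is the signal a downstream EVENT classifier can exploit even for words it never saw in labeled data.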

5 Implementation

To use classifiers, one needs to design feature vector representation for the objects to be classified. This entails selection of some predictive attributes of the objects (in effect promoting these to the status of features) and definition of mappings between vector dimensions and those attributes (feature mapping). In this section we describe the essence of our feature design for EVENT and TLINK recognition.⁷

5.1 EVENT recognition

Similarly to named entity chunking, we cast the EVENT recognition task as a problem of sequential labeling of tokens by encoding chunk information into token tags. For a given class, this generates three tags: E:class (the last, end, token of a chunk denoting a mention of class type), I:class (a token inside of a chunk), and O (any token outside of any target chunk). The example sequence below indicates that the two tokens "very bad" are spanned by an event-state annotation.

··· another/O very/I:event-state bad/E:event-state week/O ···

In this way, the EVENT chunking task becomes a (2k+1)-way classification of tokens, where k is the number of EVENT types (two tags per type, plus the shared O tag); this is followed by a Viterbi-style decoding. (We use the same scheme for SIGNAL recognition.)

The feature representation used for EVENT extraction experiments mimics the one developed for a comparative study of entity recognition with word profiling [Ando, 2004]. The features we extract are: token, capitalization, part-of-speech (POS) in 3-token window; bi-grams of adjacent words in 5-token window; words in the same syntactic chunk; head words in 3-chunk window; word uni- and bi-grams based on subject-verb-object and preposition-noun constructions; syntactic chunk types (noun or verb group chunks only); token tags in 2-token window to the left; tri-grams of POS, capitalization, and word ending; tri-grams of POS, capitalization, and left tag.
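
The chunk-to-tag encoding of Section 5.1 can be written down directly. The sketch below is an illustration only: the span representation (start index, exclusive end index, class) and the function names `encode_chunks` and `tag_inventory` are assumptions, and the Viterbi-style decoding step is not shown.

```python
def encode_chunks(tokens, chunks):
    """Encode chunks as per-token tags: I:class inside, E:class on the last token.

    chunks is a list of (start, end_exclusive, cls) spans over token indices;
    tokens outside every chunk receive the tag O.
    """
    tags = ["O"] * len(tokens)
    for start, end, cls in chunks:
        for i in range(start, end - 1):
            tags[i] = "I:" + cls
        tags[end - 1] = "E:" + cls
    return tags


def tag_inventory(classes):
    """All tags the token classifier must choose among: 2k class tags plus O."""
    return [prefix + ":" + cls for cls in classes for prefix in ("I", "E")] + ["O"]


TOKENS = ["another", "very", "bad", "week"]
TAGS = encode_chunks(TOKENS, [(1, 3, "event-state")])
print(" ".join(tok + "/" + tag for tok, tag in zip(TOKENS, TAGS)))

# The seven EVENT types counted in Section 4.1 give 2 * 7 + 1 = 15 tags.
EVENT_TYPES = ["OCCURRENCE", "STATE", "REPORTING", "IACTION",
               "ISTATE", "ASPECTUAL", "PERCEPTION"]
print(len(tag_inventory(EVENT_TYPES)))
```

A single-token chunk receives only the E tag, since its first token is also its last.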

5.2 TLINK recognition

TLINK is a relation between events and time expressions which can link two EVENTs, two TIMEX3's, or an EVENT and a TIMEX3. Presently (see 4.2) we focus on TLINKs between events and time expressions.

As a relational link, TLINK does not naturally fit the tagging abstraction for a chunking problem, outlined above. Instead, we formulate a classification task as follows. After posting EVENT and TIMEX3 annotations (by the event classifier and the FS temporal parser, respectively), for each pairing between an EVENT and a TIMEX3, we ask whether it is a certain type of TLINK. This defines a (l+1)-way classification problem, where l is the number of TLINK types (BEFORE, AFTER, etc; Section 4.1). The adjustment term '+1' is for the negative class (not-a-temporal-link).

The relation-extraction nature of the task of posting TLINKs requires a different feature representation, capable of encoding the syntactic function of the relation arguments (EVENTs and TIMEX3's), and some of the larger context of their mentions. To that end, we consider the following five partitions (defined in terms of tokens): spans of arguments (P1 or P2); two tokens to the left/right of the left/right argument (Pleft/Pright); and the tokens between the arguments (Pmiddle). From each partition, we extract tokens and parts-of-speech as features.

⁷ We do not discuss SIGNAL recognition here, as the signal tag itself contributes nothing to EVENT or TLINK recognition, beyond what is captured by a lexical feature over the temporal connective.
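
The five token partitions can be computed with simple slicing. The following sketch assumes that each argument is given as a (start, exclusive end) token span, that the two spans do not overlap, and that P1 denotes the leftmost argument; the function name `partitions` and the tokenisation of the example are hypothetical.

```python
def partitions(tokens, arg1, arg2, window=2):
    """Split a token sequence into the five partitions around two arguments.

    arg1 and arg2 are non-overlapping (start, end_exclusive) spans; the
    leftmost one is treated as P1 (an assumption of this sketch).
    """
    left, right = sorted([arg1, arg2])
    return {
        "P1": tokens[left[0]:left[1]],
        "P2": tokens[right[0]:right[1]],
        "Pleft": tokens[max(0, left[0] - window):left[0]],
        "Pright": tokens[right[1]:right[1] + window],
        "Pmiddle": tokens[left[1]:right[0]],
    }


# Token sequence from the example analysis in Section 4.4.
TOKENS = ["In", "the", "1988", "period", "the", "company",
          "earned", "$20.6", "million"]
TIMEX3_SPAN = (1, 4)   # "the 1988 period"
EVENT_SPAN = (6, 7)    # "earned"

PARTS = partitions(TOKENS, EVENT_SPAN, TIMEX3_SPAN)
for name, span in PARTS.items():
    print(name, span)
```

Tokens (and, given a tagger, their parts-of-speech) would then be read off each partition as features; only one token is available to the left here because the TIMEX3 sits near the start of the sentence.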

We also consider segments (syntactic constructions derived by FS analysis: 'when-clause', 'subject', etc) in certain relationship to partitions: contained in P1, P2, or Pmiddle; covering P1 (or P2) but not overlapping with P2 (or P1); occurring to the left of P1 (or the right of P2); or covering both P1 and P2. We use uni- and bi-grams of types of these segments as features.

In this feature representation, segments play a crucial role by capturing the syntactic functions of EVENTs and TIMEX3's, as well as the syntactic relations between them. Thus in the example analysis in Section 4.4, svoClause is the smallest segment containing both an EVENT and a TIMEX3, indicative of a direct syntactic relation between the two. In the next example, the TIMEX3 and EVENT chunks are contained in different clauses (a thatClause and a svoClause, respectively), which structurally prohibits a TLINK relation between the two.

[Snt Analysts have complained
  [thatClause that [timex3 third-quarter timex3] corporate earnings haven't been very good thatClause]
  [svoClause , but the effect [event hit event] ... svoClause]
Snt]

Thus our feature representation is capable of capturing this information via the types of the segments that contain each of EVENT and TIMEX3 without overlapping.

6 Experiments

We present here performance results on EVENT and TLINK recognition only. This is largely because the primary focus of this paper is to report on how effective our analytical strategy is in leveraging the reference nature of the small TimeBank corpus for training classifiers for TimeML. Of the remaining components, SIGNAL was briefly mentioned earlier (see footnote 7), and TIMEX3 recognition, driven by FS grammars, belongs to a different paper. Since this is the first attempt to build a TimeML-compliant analyser (cf. Section 1), there are no comparable results in the literature.

The results (micro-averaged F-measure) reflect experiments with different settings, against the TimeBank corpus, and produced by 5-fold cross validation.
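
For readers unfamiliar with the metric, micro-averaging pools the raw true-positive, false-positive, and false-negative counts over all classes before computing precision and recall, so frequent classes weigh more than rare ones. The sketch below shows that computation; the function name `micro_f` and the toy counts are illustrative assumptions and do not reproduce any figure reported here.

```python
def micro_f(counts):
    """Micro-averaged F-measure from a list of (tp, fp, fn) tuples.

    Each tuple holds the counts for one class (or one fold); the counts are
    summed first, and precision, recall, and F are computed on the totals.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy counts for two classes: one frequent and well recognised, one rare.
TOY_COUNTS = [(8, 2, 2), (2, 2, 2)]
print(round(100 * micro_f(TOY_COUNTS), 1))
```

With the toy counts, the pooled totals are 10 true positives, 4 false positives, and 4 false negatives, so precision and recall are both 10/14.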

6.1 EVENT recognition

It should be clear, by looking at the example analysis (Section 4.4), how local information and syntactic environment both contribute to the feature generation process. Figure 1 shows performance results with and without word profiling for exploiting an unannotated corpus. For word profiling, we extracted feature co-occurrence counts from 40M words of 1991 Wall Street Journal.

features                  with typing     w/o typing
basic                     61.3            78.6
basic + word-profiling    64.0 (+2.7)     80.3 (+1.7)

Figure 1: Event extraction results, with/without typing. Parentheses show contribution of word profiling, over using basic features only.

The proposed event chunks are counted as correct only when both the chunk boundaries and event types are correct. While word profiling improves performance, 64.0% F-measure is lower than typical performance of, for instance, named entity chunking. On the other hand, when we train the EVENT classifiers without typing, we obtain 80.3% F-measure. This is indicative of the inherent complexity of the EVENT typing task.

6.2 TLINK recognition

In this experimental setting, we only consider the pairings of EVENT and TIMEX3 which appear within a certain distance in the same sentences.⁸

For comparison, we implement the following simple baseline method. Considering the text sequence of EVENTs and TIMEX3's, only 'close' pairs of potential arguments are coupled with TLINKs; EVENT e and TIMEX3 t are close if and only if e is the closest EVENT to t and t is the closest TIMEX3 to e. For all other pairings, no temporal relation is posted. Depending on the 'with-'/'without-typing' setting, the baseline method either types the TLINK as the most populous class in TimeBank, ISINCLUDED, or simply marks it as 'it exists'.
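
The mutual-closeness condition of the baseline is easy to misread, so a small sketch may help. It assumes that each EVENT and TIMEX3 is represented by a single token offset within one sentence, that distance is the absolute difference of offsets, and that ties are broken in favour of the leftmost candidate; the function names and the label `EXISTS` for the untyped setting are hypothetical.

```python
def closest(position, candidates):
    """Candidate nearest to position; ties go to the leftmost (an assumption)."""
    return min(candidates, key=lambda c: (abs(c - position), c))


def baseline_tlinks(event_positions, timex_positions, typed=True):
    """Post a TLINK only for mutually closest EVENT/TIMEX3 pairs.

    With typing, every posted link receives the most populous TimeBank class,
    ISINCLUDED; without typing it is only marked as existing.
    """
    links = []
    if not event_positions or not timex_positions:
        return links
    label = "ISINCLUDED" if typed else "EXISTS"
    for e in event_positions:
        t = closest(e, timex_positions)
        if closest(t, event_positions) == e:
            links.append((e, t, label))
    return links


# Three events and two time expressions, given as token offsets.
print(baseline_tlinks([2, 10, 11], [4, 20]))
```

In the example, the events at offsets 10 and 11 both have the TIMEX3 at offset 4 as their nearest time expression, but that expression's nearest event is the one at offset 2, so only a single link is posted and the TIMEX3 at offset 20 stays unlinked.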

Results are shown in Figure 2. Clearly, the detection of temporal relations between events and time expressions requires more than simply coupling the closest pairs within a sentence (as the baseline does). It is also clear that the baseline method performs poorly, especially for pairings over relatively long distances. For instance, it produces 34.9% when we consider the pairings within 64 tokens without typing. In the same setting, our method produces 74.8% in F-measure, significantly outperforming the baseline.

distance (# of tlinks)                  features     with typing     w/o typing
distance ≤ 64 tokens (1370 tlinks)      baseline     21.8            34.9
                                        basic        52.1            74.1
                                        basic+FS     53.1 (+1.0)     74.8 (+0.7)
distance ≤ 16 tokens (1269 tlinks)      baseline     38.7            61.3
                                        basic        52.8            75.8
                                        basic+FS     54.3 (+1.5)     76.5 (+0.7)
distance ≤ 4 tokens (789 tlinks)        baseline     49.8            76.1
                                        basic        57.0            80.1
                                        basic+FS     58.8 (+1.8)     81.8 (+1.7)

Figure 2: TLINK extraction results, with/without typing. Parentheses show contribution of grammar-derived features, over using basic ones only. Baseline posts TLINKs over 'close' EVENT/TIMEX3 pairs.

⁸ To evaluate the TLINK classifier alone, we use the EVENT and TIMEX3 annotations in TimeBank.

We compare performance in two types of feature representation: 'basic' and 'basic+FS grammar', which reflect the without- and with-segment-type information obtained by the grammar analysis, respectively. As the positive deltas show, configurational syntactic information can be exploited beneficially by our process.

Focusing on within-4-tokens pairings, we achieve 81.8% F-measure without typing of TLINKs, and 58.8% with typing. (The task without typing is a binary classification to detect whether the pairing has a TLINK relation or not, regardless of the type.) As the figure shows, the task becomes harder when we consider longer distance pairings. Within a 64 token distance, we obtain figures of 74.8% and 53.1%, without and with typing respectively.

While we are moderately successful in detecting the existence of temporal relations, the noticeable differences in performance between the task settings with and without typing indicate that we are not as successful in distinguishing one type from another. In particular, the relatively low performance of TLINK typing highlights the difficulty in distinguishing between DURING and ISINCLUDED.

The guidelines (and common sense analysis) suggest that ISINCLUDED type should be assigned if the time point or duration of EVENT is included in the duration of the associated TIMEX3. DURING, on the other hand, should be assigned as a type if some relation represented by the EVENT holds during the duration of the TIMEX3. We note that for this particular typing problem, the subtle distinctions are hard even for human annotators: the TimeBank corpus displays a number of occasions where inconsistent tagging is evident.

7 Conclusion

TimeML is a significant development in time analysis, as it captures detailed information, anchored in eventuality and linguistic structure, and shown to be crucial in inferential and reasoning tasks. In addition to defining annotation guidelines, the TimeML effort notably created the first reference corpus illustrative of expressiveness of the language.

Unfortunately, the small size of the TimeBank corpus prevents its straightforward use as a training resource, a problem further exacerbated by the inherent complexity of TimeML-compliant analysis. And yet, for reasoning engines to function, TimeML analysers need to be built.

[Mani et al., 2004] discuss some pioneering work in linking events with times, and ordering events, indicative of productive strategies for posting (some) TLINK information. However, the nature of these efforts is such that differences in premises, representation, and focus make a direct performance comparison impossible. Furthermore, the work pre-dates TimeML, and cannot be conveniently mapped to TimeBank data; this, in effect, precludes a quantitative comparison with our work.

In a first systematic attempt at TimeML-compliant analysis, and leveraging the TimeBank corpus, we have developed a strategy which synergistically blends finite-state analysis over linguistic annotations with a state-of-the-art machine learning technique. Particularly effective are: aggressive analysis, by complex grammars, of both TimeML components and syntactic structure; coupled with a learning algorithm capable of training over unannotated data, in addition to exploiting arbitrarily small amounts of labeled data. While work remains (notably refining the TLINK recogniser, targeting other types of LINKs, and enhancing EVENT recognition with external lexical resources), this is a significant step in instantiating a deeper time analysis, capable of satisfying the needs of reasoning engines.

References

[Ando, 2004] R. K. Ando. Exploiting unannotated corpora for tagging and chunking. In Proceedings of ACL-04.
[Ferro, 2001] L. Ferro. TIDES: Instruction manual for the annotation of temporal expressions. MTR 01W0000046V01, The MITRE Corporation, 2001.
[Fikes et al., 2003] R. Fikes, J. Jenkins, and G. Frank. JTP: A system architecture and component library for hybrid reasoning. TR KSL-03-01, Stanford University, 2003.
[Filatova and Hovy, 2001] E. Filatova and E. Hovy. Assigning time-stamps to event-clauses. In Proceedings of the 10th Conference of the EACL, Toulouse, France, 2001.
[Gaizauskas & Setzer, 2002] R. Gaizauskas and A. Setzer, editors. Annotation Standards for Temporal Information in NL, Las Palmas, Spain, 2002.
[Han & Lavie, 2004] B. Han and A. Lavie. Framework for resolution of time in natural language. TALIP Special Issue, Spatial and Temporal Information Processing, 2004.
[Hobbs et al., 2002] J. R. Hobbs, G. Ferguson, J. Allen, P. Hayes, and A. Pease. A DAML ontology of time, 2002.
[Kennedy & Boguraev, 1996] C. Kennedy and B. Boguraev. Anaphora for everyone: Pronominal anaphora resolution without a parser. In Proceedings of COLING-96, Copenhagen, DK, 1996.
[Mani & Wilson, 2000] I. Mani and G. Wilson. Robust temporal processing of news. In Proceedings of the 38th Annual Meeting of the ACL, Hong Kong, 2000.
[Mani et al., 2003] I. Mani, B. Schiffman, and J. Zhang. Inferring temporal ordering of events in news. In Proceedings of ACL-41 (HLT-NAACL), Edmonton, Canada, 2003.
[Mani et al., 2004] I. Mani, J. Pustejovsky, and B. Sundheim. Introduction: special issue on temporal information processing. ACM Transactions Asian Language Information Processing, 3(1):1–10, 2004.
[Prager et al., 2003] J. Prager, J. Chu-Carroll, E. Brown, and C. Czuba. Question answering using predictive annotation. In Advances in Question Answering, 2003.
[Pustejovsky et al., 2003] J. Pustejovsky, J. Castano, R. Ingria, R. Saurí, R. Gaizauskas, A. Setzer, G. Katz, and D. Radev. TimeML: Robust specification of event and temporal expressions in text. In AAAI Spring Symposium on New Directions in Question-Answering, pages 28–34, Stanford, CA, 2003.
[Saurí et al., 2004] R. Saurí, J. Littman, R. Gaizauskas, A. Setzer, and J. Pustejovsky. TimeML annotation guidelines, Version 1.1, TERQAS Workshop, 2004.
[Schilder & Habel, 2003] F. Schilder and C. Habel. Temporal information extraction for temporal QA. In AAAI Spring Symposium on New Directions in Question-Answering, pages 35–44, Stanford, CA, 2003.
[Zhang et al., 2002] T. Zhang, F. Damerau, and D. E. Johnson. Text chunking based on a generalization of Winnow. Journal of Machine Learning Research, 2:615–637, 2002.