TimeML-Compliant Text Analysis for Temporal Reasoning

Branimir Boguraev and Rie Kubota Ando
IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
bran@us.ibm.com, rie1@us.ibm.com

Abstract

Reasoning with time¹ needs more than just a list of temporal expressions. TimeML, an emerging standard for temporal annotation as a language capturing properties and relationships among time-denoting expressions and events in text, is a good starting point for bridging the gap between temporal analysis of documents and reasoning with the information derived from them. Hard as TimeML-compliant analysis is, the small size of the only currently available annotated corpus makes it even harder. We address this problem with a hybrid TimeML annotator, which uses cascaded finite-state grammars (for temporal expression analysis, shallow syntactic parsing, and feature generation) together with a machine learning component capable of effectively using large amounts of unannotated data.

1 Temporal Analysis of Documents

Many information extraction tasks limit analysis of time to identifying a narrow class of time expressions, which literally specify a temporal point or an interval. For instance, a recent (2004) ACE task is that of temporal expression recognition and normalisation (TERN; see http://timex2.mitre.org/tern.html). It targets absolute date/time specifications (e.g. "June 15th, 1998"), descriptions of intervals ("three semesters"), referential (relative) expressions ("last week"), and so forth. A fraction of such expressions may include a relational component ("the two weeks since the conference", "a month of delays following the disclosure"), making them event-anchored; however, the majority refer only to what in a more syntactic framework would be considered as a 'temporal adjunct'. The TERN task thus does not address the general question of associating a time stamp with an event.

Deeper document analysis requires awareness of temporal aspects of discourse. Several applications have recently started addressing some issues of time. Document summarisation tackles identification and normalisation of time expressions [Mani & Wilson, 2000], time stamping of event clauses [Filatova and Hovy, 2001], and temporal ordering of events in news [Mani et al., 2003]. Operational question answering (QA) systems can now (under certain conditions) answer e.g. 'when' or 'how long' questions [Prager et al., 2003].

Beyond manipulation of temporal expressions, advanced content analysis projects are beginning to define operational requirements for, in effect, temporal reasoning. More sophisticated QA, for instance, needs more than just information derived from 'bare' temporal markers [Pustejovsky et al., 2003; Schilder & Habel, 2003]. Intelligence analysis typically handles contradictory information, while looking for mutually corroborating facts; for this, temporal relations within such an information space are essential. Multi-document summarisation crucially requires temporal ordering over events described across the collection.

A temporal reasoner requires a framework capturing the ways in which relationships among entities are described in text, anchored in time, and related to each other. Related are questions of defining a representation that can accommodate components of a temporal structure, and implementing a text analysis process for instantiating such a structure.

¹ This work was supported by the ARDA NIMD (Novel Intelligence and Massive Data) program PNWD-SW-6059.

This paper describes an effort towards an analytical framework for detailed time information extraction. We sketch the temporal reasoning component which is the ultimate 'client' of the analysis. We motivate our choice of TimeML, an emerging standard for annotation of temporal information in text, as a representational framework; in the process, we highlight TimeML's main features, and characterise a mapping from a TimeML-compliant representation to an isomorphic set of time-points and intervals expected by the reasoner.

We develop a strategy for time analysis of text, a synergistic approach deploying both finite-state (FS) grammars and machine learning techniques. The respective strengths of these technologies are well suited for the challenges of the task: complexity of analysis, and paucity of examples of TimeML-style annotation. A complex cascade of FS grammars targets certain components of TimeML (time expressions, in particular), identifies syntactic clues for marking other components (related to temporal links), and derives features for use by machine learning. The training is on a TimeML annotated corpus; given the small (and thus problematic for training) size of the only (so far) available reference corpus (TimeBank), we incorporate a learning strategy developed to leverage large volumes of unlabeled data.

To our knowledge, this is the first attempt to use the representational principles of TimeML for practical analysis of time. This is also the first use of a TimeML corpus as reference data for implementing temporal analysis.

2 Motivation: Reasoning with Time

We are motivated by developing a useful, and reusable, temporal analysis framework, where 'downstream' applications are enabled to reason and draw inferences over time elements.

A hybrid reasoner [Fikes et al., 2003], to be deployed in intelligence analysis, maintains a directed graph of time points, intervals defined via start and end points, and temporal relations such as BEFORE, AFTER, and EQUALPOINT. The graph is assumed generated via a mapping process, external to the reasoner, from a (temporal) text analysis. Relations are operationalised, and temporal algebra evaluates instances, draws inference over goals, and broadens a base of inferred assertions on the basis of relational axioms. An example within the reasoner's inferential capability is:

(find instances of int such that (during int 2003)).

Reasoning with relations such as during (associating an event with an interval), costarts (associating two events), instantiated for the example fragment:

"On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation."

would infer, on the basis of predicates like:

(during Iran-accuses-Taliban-take-hostages August-9-1998)
(costarts Iran-accuses-Taliban-take-hostages Iran-Taliban-Crisis)

that the answer to the question "When did the Iranian-Taliban crisis begin" is "August 9, 1998".

Details of this inferential process need not concern us here.
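
As a rough illustration of the example above, the inference can be sketched in a few lines of Python. This is a toy sketch, not the reasoner of [Fikes et al., 2003]: the fact tuples, the function name start_intervals, and the single rule (an event that costarts with another begins in whatever interval the other occurs during, with costarts treated as symmetric) are assumptions made for illustration only.

```python
# Toy facts as (relation, arg1, arg2) tuples, copied from the two predicates in the text.
FACTS = {
    ("during", "Iran-accuses-Taliban-take-hostages", "August-9-1998"),
    ("costarts", "Iran-accuses-Taliban-take-hostages", "Iran-Taliban-Crisis"),
}


def start_intervals(event, facts):
    """Return the intervals in which `event` is inferred to start.

    Assumed rule (not taken from the reasoner): if (costarts a b) holds and
    (during a t) holds, then b starts during t, and symmetrically for a.
    """
    results = set()
    for rel, a, b in facts:
        if rel == "costarts" and event in (a, b):
            other = a if b == event else b
            # Collect every interval the co-starting event is anchored to.
            results.update(t for r, e, t in facts if r == "during" and e == other)
    return results


answer = start_intervals("Iran-Taliban-Crisis", FACTS)
```

Under these assumptions, answer is the set containing only August-9-1998, which corresponds to the answer given in the text.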
We gloss over issues like enumerating the range of temporal relations and axioms, describing the reasoner's model of events (e.g. Iran-accuses-Taliban-take-hostages), and elaborating its notion of 'a point in time' (subsuming both literal expressions and event specifications). Operationally, a separate component maps temporal analysis results to a suitably neutral, and expressive, ontological representation of time (DAML-Time [Hobbs et al., 2002]). This allows for a representation hospitable to first-order logic inference formalism, like the one assumed in Hobbs et al., to be kept separate from surface text analysis: much like the traditional separation along the syntax-semantics interface.

We start from the belief that the representation for the reasoner is derivable from a TimeML-compliant text analysis.² TimeML is a proposal for annotating time information; e.g. the first example sentence above would be marked up as:

On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation.

² We are not alone: work on temporal reasoning from formal inference point of view reaches a similar conclusion: "... the [TimeML] annotation scheme itself, due to its closer tie to surface texts, can be used as the first pass in the syntax-semantics interface of a temporal resolution framework such as ours. The more complex representation, suitable for more sophisticated reasoning, can then be obtained by translating from the annotations." [Han & Lavie, 2004].

TimeML is described in Section 3. Essentially, it promotes explicit representation and typing of time expressions and events, and an equally explicit mechanism for linking these with temporal links, using a vocabulary of temporal relations. In addition to in-line mark-up, explicit links are marked. Event instance identifiers, ei1, ei2, and ei4 refer to, respectively, the accusation in the first sentence, the crisis, and the reference to "that accusation" in the second sentence. The relType attributes on the link descriptions define temporal relationships between event instances and time expressions; in this particular example, an IDENTITY link encodes the co-referentiality between the event instances (mentions) in the two sentences of the accusation event of the earlier example. It is the combination of event descriptors, their anchoring to time points, and the semantics of relational links, which enables the derivation of during and costarts associations that the reasoner understands.

3 TimeML: a Mark-up Language for Time

Most content analysis applications to date do not explicitly incorporate temporal reasoning, and their needs can be met by analysis of simple time expressions (dates, intervals, etc). This is largely the motivation for TERN's TIMEX2 tag; at the same time it explains why TIMEX2 is inadequate for supporting the representational requirements outlined earlier.³

TimeML aims at capturing the richness of time information in documents. It marks up more than just temporal expressions, and focuses on ways of systematically anchoring event predicates to time-denoting expressions, and on ordering such event expressions relative to each other.

TimeML derives higher expressiveness from explicitly separating representation of temporal expressions from that of events. Time analysis is distributed across four component structures: TIMEX3, SIGNAL, EVENT, and LINK; all are rendered as tags, with attributes [Saurí et al., 2004].⁴

TIMEX3 extends⁵ the TIMEX2 [Ferro, 2001] attributes: it captures temporal expressions (commonly categorised as DATE, TIME, DURATION), both literal and intensionally specified. SIGNAL tags are (typically) function words indicative of relationships between temporal objects: temporal prepositions (for, during, etc.) or temporal connectives (before, while). EVENT, in TimeML nomenclature, is a cover term for situations that happen or occur; these can be punctual, or last for a period of time. TimeML posits a refined typology of events [Pustejovsky et al., 2003]. All classes of event expressions (tensed verbs, stative adjectives and other modifiers, event nominals) are marked up with suitable attributes on the EVENT tag. Finally, the LINK tag is used to encode a variety of relations that exist between the temporal elements in a document, as well as to establish an explicit ordering of events. Three subtypes of the LINK tag are used to represent strict temporal relationships between events or between an event and a time (TLINK), subordination between two events or an event and a signal (SLINK), and aspectual relationship between an aspectual event and its argument (ALINK).

TimeML's richer component set, in-line mark-up of temporal primitives, and non-consuming tags for temporal relations across arbitrarily long text spans, make it highly compatible with the current paradigm of annotation-based encapsulation of document analysis.

³ For a notable extension to TIMEX2, see [Gaizauskas & Setzer, 2002]. An attempt to codify some relational information linking the TIMEX with an event, it is still limited, both in terms of scope (only links with certain syntactic shape can be captured) and representational power (it is hard to separate an event mention from possibly multiple event instances); see [Pustejovsky et al., 2003].

⁴ Additionally, a MKINSTANCE tag embodies the difference between event tokens and event instances: for example, the analysis of "Max taught on Monday and Tuesday" requires two different instances to be created for a teaching EVENT. Even if typically there is a one-to-one mapping between an EVENT and an instance, the language requires that a realisation of that event is created.

4 TimeML and Temporal Analysis

TimeML's annotation-based representation facilitates integration of time analysis with the analysis of other syntactic and/or discourse phenomena; it also naturally supports exploitation of larger contextual effects by the temporal parser proper (see 4.4). This is a crucial observation, given that the prominently attractive characteristic of TimeML, its intrinsic richness of expression, makes it challenging for analysis.

There are two broad categories of problems for developing an automated TimeML analyser: of substance and of infrastructure. Substantive issues include normalising time expressions to a canonical representation (TIMEX3's value attribute), identifying a broad range of events (e.g. event nominals and predicative adjectives acting as event specifiers), linking time-denoting expressions (typically a TIMEX3 and an EVENT), and typing of those LINKs.

The infrastructure problems (small size and less than consistent mark-up of the TimeBank corpus) are due to the fact that this, first, version is largely a side product of a small number of annotators trying out TimeML's expressive capabilities. TimeBank is thus intended as a reference, and not for training. Our hybrid approach to temporal parsing, combining finite-state (FS) recognition with machine learning from sparse data (4.2), is largely motivated by this nature of TimeBank.

4.1 The TimeBank corpus

TimeBank has only 186 documents (68.5K words). If we held out 10% of the corpus as test data, we would have barely over 60K words for training. Below we show counts of (EVENT-TIMEX3) TLINK⁶ and EVENT types [Saurí et al., 2004]. TLINK examples are particularly sparse; the data also shows highly uneven distribution of examples of different types. In comparison, the Penn TreeBank corpus for part-of-speech tagging contains more than 1M words (more than 16 times larger than TimeBank); the CoNLL'03 named entity chunking training set (at http://cnts.uia.ac.be/conll2003/ner/) has over 200K words with 23K examples (15 times more than TLINK examples) over just 4 name classes (compared to the 13 TLINK classes defined by TimeML). TERN's training set (almost 800 documents/300K words) is considered to be somewhat sparse, with over 8K TIMEX examples.

tlink type      # occurrences    event type    # occurrences
IS_INCLUDED     866              OCCURRENCE    4,452
DURING          146              STATE         1,181
ENDS            102              REPORTING     1,010
SIMULTANEOUS    69               I_ACTION      668
ENDED_BY        52               I_STATE       586
AFTER           41               ASPECTUAL     295
BEGINS          37               PERCEPTION    51
BEFORE          35
INCLUDES        29
BEGUN_BY        27
IAFTER          5
IDENTITY        5
IBEFORE         1
Total:          1,451            Total:        8,243

⁵ TIMEX2 and TIMEX3 differ substantially in their treatment of event anchoring and sets of times.

4.2 Analytical strategy

Minimally, the reasoner would require that the analytical framework supports time stamping and temporal ordering of events; thus we target the analysis tasks of finding TIMEX3's, assigning canonical values, marking and typing EVENTs, and associating (some of them) with TIMEX3 tags.

TIMEX3 expressions are naturally amenable to FS description. FS devices can also encode some larger context for time analysis (temporal connectives for marking putative events, clause boundaries for scoping possible event-time pairs, etc; see 4.4). To complement such analysis, a machine learning approach can cast the problem of marking EVENTs as chunking. Recently, [Ando, 2004] has developed a framework for exploiting large amounts of unannotated corpora in supervised learning for chunking. In such a framework, mid-to-high-level syntactic parsing (typically derived by FS cascades) can produce rich features for classifiers. Thus, we combine FS grammars for temporal expressions, embedded in a general purpose shallow parser, with machine learning trained with TimeBank and unannotated corpora.

4.3 FS-based parser for temporal expressions

Viewing TIMEX3 analysis as an information extraction task, a cascade of finite-state grammars with broad coverage (compiled down to a single TIMEX3 automaton with 500 states and over 16000 transitions) targets abstract temporal entities such as UNIT, POINT, PERIOD, RELATION, etc; these may be further decomposed and typed into e.g. MONTH, DAY, YEAR (for a UNIT); or INTERVAL or DURATION (for a PERIOD). Fine-grained analysis of temporal expressions, instantiating attributes like granularity, cardinality, ref_direction, and so forth, is crucially required for normalising a TIMEX3: representing "the last five years" as illustrated below facilitates the derivation of a value for the TIMEX3 value attribute.

[timex: [relative: true]
        [ref_direction: past]
        [cardinality: 5]
        [granularity: year]]

Such analysis amounts to a parse tree under the TIMEX3. (Not shown above is additional information, anchoring the expression into the larger discourse and informing other normalisation processes which emit the full complement of TIMEX3 attributes: type, temporalFunction, anchorTimeID, etc). TimeBank does not contain such fine-grained mark-up: the grammars thus perform an additional 'discovery' task, for which no training data currently exists, but which is essential for discourse-level post-processing, handling e.g. ambiguous and/or underspecified time expressions or the relationship between document-internal and document-external temporal properties (such as 'document creation time').

⁶ In all of our experiments we exclude TIMEX3 markup in metadata; the TLINK counts only reflect links to temporal expressions in the body of documents.
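
To make the link between the fine-grained attributes of Section 4.3 and a normalised value more concrete, the following Python sketch derives two candidate values from the feature structure for "the last five years". It is only a sketch under stated assumptions: the class TimexFeatures, the functions duration_value and anchored_year_span, and the choice of an ISO 8601 style duration string as the target are illustrative and are not described in the paper; the anchor year would come from discourse-level information such as the document creation time.

```python
from dataclasses import dataclass

# ISO 8601 duration designators; this mapping is an assumption for the sketch.
_UNIT_CODES = {"year": "Y", "month": "M", "week": "W", "day": "D"}


@dataclass
class TimexFeatures:
    """Attributes mirroring the feature structure shown in Section 4.3."""
    relative: bool
    ref_direction: str  # "past" or "future"
    cardinality: int
    granularity: str    # "year", "month", "week" or "day"


def duration_value(f):
    """Render the length of the expression as an ISO 8601 style duration."""
    return f"P{f.cardinality}{_UNIT_CODES[f.granularity]}"


def anchored_year_span(f, anchor_year):
    """Resolve a relative, year-grained expression against an anchor year."""
    if not f.relative or f.granularity != "year":
        raise ValueError("sketch only handles relative expressions in years")
    if f.ref_direction == "past":
        return (anchor_year - f.cardinality, anchor_year)
    return (anchor_year, anchor_year + f.cardinality)


last_five_years = TimexFeatures(
    relative=True, ref_direction="past", cardinality=5, granularity="year"
)
value = duration_value(last_five_years)
span = anchored_year_span(last_five_years, anchor_year=2005)
```

The point of the sketch is that neither result can be computed from the surface string alone: both depend on the cardinality, granularity and direction attributes that the grammars instantiate.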

4.4 Shallow parsing for feature generation

In principle, substantial discourse analysis can be carried out from a shallow syntactic base, and derived by means of FS cascading [Kennedy & Boguraev, 1996]. Our grammars interleave shallow parsing with named entity extraction. They specify temporal expressions in terms of linguistic units, as opposed to simply lexical cues (as many temporal taggers to date do).

This point cannot be over-emphasised. One of the complex problems for TimeML analysis is that of event identification. A temporal tagger, if narrowly focused on time expressions only (cf. [Schilder & Habel, 2003]), offers no clues to what events are there in the text. In contrast, a temporal parser aware of the syntax of a time phrase like "during the long and ultimately unsuccessful war in Afghanistan" is very close to knowing (from configurational properties of a prepositional phrase) that the nominal argument ("war") of the temporal preposition ("during") is an event nominal. Ultimately, syntactic analysis beyond TimeML components is used to derive features for the classifiers tasked with finding EVENTs and LINKs (Section 5).

Feature generation typically relies on a mix of lexical properties and some configurational syntactic information (depending on the complexity of the task). Our scheme additionally needs some semantic typing, knowledge of boundaries of longer syntactic units (typically a variety of clauses), and some grammatical function. An example (simplified) of the FS cascade output is:

[Snt
  [svoClause
    [tAdjunct In [NP [timex3 the 1988 period timex3] NP] tAdjunct] ,
    [SUB [NP the company NP] SUB]
    [VG [GrmEventOccurrence earned GrmEventOccurrence] VG]
    [OBJ [NP [Money $20.6 million Money] NP] OBJ]
  svoClause]
  ...
Snt]

Most of the above is self-explanatory, but we emphasise a few key points. The analysis captures the mix of syntactic chunks, semantic categories, and TimeML components used for feature generation. It maintains local TIMEX3 analysis; the time expression is inside of a larger clause boundary, with internal grammatical function identification for some of the event predicates. The specifics of mapping configurational information into feature vectors are described in Section 5.

4.5 Machine learning for TimeML components

TimeML parsing is thus a bifurcated process of TimeML components recognition: TIMEX3's are marked by FS grammars; SIGNALs, EVENTs and LINKs are identified by classification models derived from analysis of both TimeBank and large unannotated corpora. Features for these models are derived from common strategies for exploiting local context, as well as from mining the results (both mark-up and configurational) from the FS grammar cascading, as illustrated in the previous section. (More details on feature generation follow in Section 5 below.)

Classifiers and feature vectors

The classification framework we adopt for this work is based on a principle of empirical risk minimization. In particular, we use a linear classifier, which makes classification decisions by thresholding inner products of feature vectors and weight vectors. It learns weight vectors by minimizing classification errors (empirical risk) on annotated training data. For our experiments (Section 6), we use the Robust Risk Minimization (RRM) classifier [Zhang et al., 2002], which has been shown useful for a number of text analysis tasks such as syntactic chunking, named entity chunking, and part-of-speech tagging.

In marked contrast to generative models, where assumptions about features are tightly coupled with algorithms, RRM (as is the case with discriminative analysis) enjoys clear separation of feature representation from the underlying algorithms for training and classification. This facilitates experimentation with different feature representations, since the separation between these and the algorithms which manipulate them does not require change in algorithms. We show how choice of features affects performance in Section 6.
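
The decision rule described above (thresholding the inner product of a feature vector and a weight vector) can be sketched as follows. This is not RRM: the training loop uses plain perceptron updates as a stand-in, and the feature names and the four toy examples are invented for illustration. What the sketch does show is the separation the text emphasises, since the training code never inspects what a feature means.

```python
from collections import defaultdict


def score(weights, features):
    """Inner product of a weight vector with a sparse binary feature vector."""
    return sum(weights[f] for f in features)


def predict(weights, features, threshold=0.0):
    """Classify by thresholding the inner product."""
    return score(weights, features) > threshold


def train_perceptron(examples, epochs=10):
    """Learn weights from (feature_set, label) pairs.

    Perceptron updates are used here only as a simple stand-in for the
    RRM training procedure, which this sketch does not implement.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in examples:
            if predict(weights, features) != label:
                delta = 1.0 if label else -1.0
                for f in features:
                    weights[f] += delta
    return weights


# Hypothetical token-level examples: is the token an event mention?
EXAMPLES = [
    ({"token=earned", "pos=VBD"}, True),
    ({"token=company", "pos=NN"}, False),
    ({"token=hit", "pos=VBD"}, True),
    ({"token=week", "pos=NN"}, False),
]
weights = train_perceptron(EXAMPLES)
```

Swapping in a different feature representation only changes the contents of the feature sets; score, predict and train_perceptron stay as they are.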

Word profiling for exploitation of unannotated corpora

In general, classification learning requires a substantial amount of labeled data for training, considerably more than what TimeBank offers (cf. 4.1). This characteristic of size is potentially a limiting factor in supervised learning approaches. We, however, seek to improve performance by exploiting unannotated corpora, with their natural advantages of size and availability. We use a word profiling technique, developed specially for exploiting a large unannotated corpus for tagging/chunking tasks [Ando, 2004]. Word profiling identifies, and extracts, word-characteristic information from unannotated corpora; it does this, in essence, by collecting and compressing feature frequencies from the corpus.

Word profiling turns co-occurrence counts of words and features (e.g. 'next word', 'head of subject', etc) into new feature vectors. For instance, observing that "extinction" and "explosion" are often used as syntactic subject to "occur", and that "earthquakes" "happen", helps to predict that "explosion", "extinction", and "earthquake" all function like event nominals. Below (6.1) we demonstrate the effectiveness of word profiling, specifically for EVENT recognition.
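
The counting half of word profiling can be illustrated with a small Python sketch. The observations below are invented, the feature names are hypothetical, and the compression step of [Ando, 2004] is not implemented; the sketch only shows how co-occurrence counts become per-word vectors in which words with similar distributions end up close together.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical (word, feature) observations, as a shallow parser might emit
# them from an unannotated corpus.
OBSERVATIONS = [
    ("explosion", "subject_of=occur"),
    ("explosion", "subject_of=occur"),
    ("extinction", "subject_of=occur"),
    ("earthquake", "subject_of=happen"),
    ("earthquake", "subject_of=occur"),
    ("company", "subject_of=earn"),
    ("company", "subject_of=say"),
]


def build_profiles(observations):
    """Map each word to the relative frequencies of its co-occurring features."""
    counts = defaultdict(Counter)
    for word, feature in observations:
        counts[word][feature] += 1
    profiles = {}
    for word, feature_counts in counts.items():
        total = sum(feature_counts.values())
        profiles[word] = {f: n / total for f, n in feature_counts.items()}
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse profiles."""
    dot = sum(v * q.get(f, 0.0) for f, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


profiles = build_profiles(OBSERVATIONS)
```

In this toy data, the profiles of "explosion" and "extinction" coincide, "earthquake" overlaps with both, and "company" shares no feature with any of them, which is the kind of signal a classifier can use to generalise over event nominals unseen in the labeled data.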

5 Implementation

To use classifiers, one needs to design feature vector representation for the objects to be classified. This entails selection of some predictive attributes of the objects (in effect promoting these to the status of features) and definition of mappings between vector dimensions and those attributes (feature mapping). In this section we describe the essence of our feature design for EVENT and TLINK recognition.⁷

5.1 EVENT recognition

Similarly to named entity chunking, we cast the EVENT recognition task as a problem of sequential labeling of tokens by encoding chunk information into token tags. For a given class, this generates three tags: E:class (the last, end, token of a chunk denoting a mention of class type), I:class (a token inside of a chunk), and O (any token outside of any target chunk). The example sequence below indicates that the two tokens "very bad" are spanned by an event-state annotation.

... another/O very/I:event-state bad/E:event-state week/O ...

In this way, the EVENT chunking task becomes a (2k+1)-way classification of tokens where k is the number of EVENT types; this is followed by a Viterbi-style decoding. (We use the same scheme for SIGNAL recognition.)

The feature representation used for EVENT extraction experiments mimics the one developed for a comparative study of entity recognition with word profiling [Ando, 2004]. The features we extract are: token, capitalization, part-of-speech (POS) in 3-token window; bi-grams of adjacent words in 5-token window; words in the same syntactic chunk; head words in 3-chunk window; word uni- and bi-grams based on subject-verb-object and preposition-noun constructions; syntactic chunk types (noun or verb group chunks only); token tags in 2-token window to the left; tri-grams of POS, capitalization, and word ending; tri-grams of POS, capitalization, and left tag.
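
The E/I/O tagging scheme of Section 5.1 can be sketched in Python as follows. The function names and the (start, end, class) chunk representation with an exclusive end index are assumptions of this sketch; the Viterbi-style decoding over classifier scores is not shown, only the mapping between chunks and token tags.

```python
def encode_chunks(tokens, chunks):
    """Turn (start, end, cls) chunks (end exclusive) into one tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, cls in chunks:
        for i in range(start, end - 1):
            tags[i] = f"I:{cls}"      # tokens inside the chunk
        tags[end - 1] = f"E:{cls}"    # last token of the chunk
    return tags


def decode_chunks(tags):
    """Recover (start, end, cls) chunks from a well-formed tag sequence."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        kind, cls = tag.split(":", 1)
        if start is None:
            start = i
        if kind == "E":
            chunks.append((start, i + 1, cls))
            start = None
    return chunks


def tag_set(classes):
    """All tags for k classes: two per class plus the shared O tag."""
    tags = ["O"]
    for cls in classes:
        tags += [f"I:{cls}", f"E:{cls}"]
    return tags


tokens = ["another", "very", "bad", "week"]
tags = encode_chunks(tokens, [(1, 3, "event-state")])
```

With the seven EVENT types counted in Section 4.1, tag_set yields 2 x 7 + 1 = 15 tags, which is the (2k+1)-way classification mentioned above.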

5.2 TLINK recognition

TLINK is a relation between events and time expressions which can link two EVENTs, two TIMEX3's, or an EVENT and a TIMEX3. Presently (see 4.2) we focus on TLINKs between events and time expressions.

As a relational link, TLINK does not naturally fit the tagging abstraction for a chunking problem, outlined above. Instead, we formulate a classification task as follows. After posting EVENT and TIMEX3 annotations (by the event classifier and the FS temporal parser, respectively), for each pairing between an EVENT and a TIMEX3, we ask whether it is a certain type of TLINK. This defines a (ℓ+1)-way classification problem, where ℓ is the number of TLINK types (BEFORE, AFTER, etc; Section 4.1). The adjustment term '+1' is for the negative class (not-a-temporal-link).

⁷ We do not discuss SIGNAL recognition here, as the signal tag itself contributes nothing to EVENT or TLINK recognition, beyond what is captured by a lexical feature over the temporal connective.

The relation-extraction nature of the task of posting TLINKs requires a different feature representation, capable of encoding the syntactic function of the relation arguments (EVENTs and TIMEX3's), and some of the larger context of their mentions. To that end, we consider the following five partitions (defined in terms of tokens): spans of arguments (P1 or P2); two tokens to the left/right of the left/right argument (Pleft/Pright); and the tokens between the arguments (Pmiddle). From each partition, we extract tokens and parts-of-speech as features.
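
The five token partitions can be sketched in Python as follows. This is an illustration under assumptions: argument spans are given as (start, end) token offsets with an exclusive end, P1 is taken to be the left-hand argument and P2 the right-hand one, and only token features are generated (part-of-speech features would be produced in the same way from a parallel list of tags). The function names are hypothetical.

```python
def partitions(tokens, left_span, right_span, window=2):
    """Split a sentence into P1, P2, Pleft, Pmiddle and Pright.

    Spans are (start, end) with end exclusive; left_span must end before
    right_span starts.
    """
    (ls, le), (rs, re_) = left_span, right_span
    if le > rs:
        raise ValueError("argument spans must be ordered and non-overlapping")
    return {
        "P1": tokens[ls:le],
        "P2": tokens[rs:re_],
        "Pleft": tokens[max(0, ls - window):ls],
        "Pmiddle": tokens[le:rs],
        "Pright": tokens[re_:re_ + window],
    }


def token_features(parts):
    """One feature per (partition, token) pair."""
    return {f"{name}:tok={tok.lower()}" for name, toks in parts.items() for tok in toks}


# Tokenised version of the example sentence from Section 4.4.
TOKENS = "In the 1988 period , the company earned $ 20.6 million".split()
# TIMEX3 "the 1988 period" is the left argument, EVENT "earned" the right one.
parts = partitions(TOKENS, left_span=(1, 4), right_span=(7, 8))
features = token_features(parts)
```

Prefixing each feature with its partition name keeps, for instance, "the" inside the time expression distinct from "the" between the two arguments.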

We also consider segments (syntactic constructions derived by FS analysis: 'when-clause', 'subject', etc) in certain relationship to partitions: contained in P1, P2, or Pmiddle; covering P1 (or P2) but not overlapping with P2 (or P1); occurring to the left of P1 (or the right of P2); or covering both P1 and P2. We use uni- and bi-grams of types of these segments as features.

In this feature representation, segments play a crucial role by capturing the syntactic functions of EVENTs and TIMEX3's, as well as the syntactic relations between them. Thus in the example analysis in Section 4.4, svoClause is the smallest segment containing both an EVENT and a TIMEX3, indicative of a direct syntactic relation between the two. In the next example, the TIMEX3 and EVENT chunks are contained in different clauses (a thatClause and a svoClause, respectively), which structurally prohibits a TLINK relation between the two.

[Snt Analysts have complained
  [thatClause that [timex3 third-quarter timex3] corporate earnings haven't been very good thatClause]
  [svoClause , but the effect [event hit event] ... svoClause]
Snt]

Thus our feature representation is capable of capturing this information via the types of the segments that contain each of EVENT and TIMEX3 without overlapping.

6 Experiments

We present here performance results on EVENT and TLINK recognition only. This is largely because the primary focus of this paper is to report on how effective our analytical strategy is in leveraging the reference nature of the small TimeBank corpus for training classifiers for TimeML. Of the remaining components, SIGNAL was briefly mentioned earlier (see footnote 7), and TIMEX3 recognition, driven by FS grammars, belongs to a different paper. Since this is the first attempt to build a TimeML-compliant analyser (cf. Section 1), there are no comparable results in the literature.

The results (micro-averaged F-measure) reflect experiments with different settings, against the TimeBank corpus, and produced by 5-fold cross validation.
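
For readers unfamiliar with the metric, a micro-averaged F-measure can be computed as in the Python sketch below. It is a generic illustration rather than the evaluation code behind the reported figures: pooling true positives, predictions and gold items over all documents before computing precision and recall is one common reading of micro-averaging, and the chunk triples and the round-robin fold assignment are assumptions of the sketch.

```python
def micro_f1(gold_by_doc, pred_by_doc):
    """Micro-averaged F-measure over documents.

    Each argument is a list with one set of (start, end, type) items per
    document; an item counts as correct only on an exact match.
    """
    tp = sum(len(g & p) for g, p in zip(gold_by_doc, pred_by_doc))
    n_pred = sum(len(p) for p in pred_by_doc)
    n_gold = sum(len(g) for g in gold_by_doc)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def five_folds(n_docs, k=5):
    """Assign document indices to k folds in round-robin order."""
    return [list(range(i, n_docs, k)) for i in range(k)]


gold = [{(0, 1, "STATE"), (3, 4, "OCCURRENCE")}, {(2, 3, "REPORTING")}]
pred = [{(0, 1, "STATE"), (3, 4, "STATE")}, {(2, 3, "REPORTING"), (5, 6, "OCCURRENCE")}]
f1 = micro_f1(gold, pred)
folds = five_folds(186)
```

In the toy data, the chunk at (3, 4) has the right boundaries but the wrong type and therefore counts as an error, in line with the strict matching criterion used for typed results.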

6.1 EVENT recognition

It should be clear, by looking at the example analysis (Section 4.4), how local information and syntactic environment both contribute to the feature generation process. Figure 1 shows performance results with and without word profiling for exploiting an unannotated corpus. For word profiling, we extracted feature co-occurrence counts from 40M words of 1991 Wall Street Journal. The proposed event chunks are counted as correct only when both the chunk boundaries and event types are correct.

features                  with typing    w/o typing
basic                     61.3           78.6
basic + word-profiling    64.0 (+2.7)    80.3 (+1.7)

Figure 1: Event extraction results, with/without typing. Parentheses show contribution of word profiling, over using basic features only.

While word profiling improves performance, 64.0% F-measure is lower than typical performance of, for instance, named entity chunking. On the other hand, when we train the EVENT classifiers without typing, we obtain 80.3% F-measure. This is indicative of the inherent complexity of the EVENT typing task.

6.2 TLINK recognition

In this experimental setting, we only consider the pairings of EVENT and TIMEX3 which appear within a certain distance in the same sentences.⁸

For comparison, we implement the following simple baseline method. Considering the text sequence of EVENTs and TIMEX3's, only 'close' pairs of potential arguments are coupled with TLINKs; EVENT e and TIMEX3 t are close if and only if e is the closest EVENT to t and t is the closest TIMEX3 to e. For all other pairings, no temporal relation is posted. Depending on the 'with-'/'without-typing' setting, the baseline method either types the TLINK as the most populous class in TimeBank, IS_INCLUDED, or simply marks it as 'it exists'.
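
The baseline just described can be sketched in Python as follows. Several details are assumptions of the sketch, since the text does not specify them: EVENTs and TIMEX3's are represented by single token positions within one sentence, distance is the absolute difference of positions, ties go to the earlier candidate, and the untyped label is written as EXISTS.

```python
def closest(position, candidates):
    """Candidate position nearest to `position`; ties go to the earlier one."""
    return min(candidates, key=lambda c: (abs(c - position), c))


def baseline_tlinks(event_positions, timex_positions, typed=True):
    """Post a link for every mutually closest EVENT/TIMEX3 pair."""
    if not event_positions or not timex_positions:
        return []
    # With typing, every posted link receives the most frequent TimeBank class.
    label = "IS_INCLUDED" if typed else "EXISTS"
    links = []
    for e in event_positions:
        t = closest(e, timex_positions)
        if closest(t, event_positions) == e:
            links.append((e, t, label))
    return links


links = baseline_tlinks(event_positions=[2, 7, 12], timex_positions=[3, 20])
```

In this toy sentence the EVENT at position 7 is left unlinked: its closest TIMEX3 is at position 3, but that expression is closer to the EVENT at position 2, so the pair is not mutually closest.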

Results are shown in Figure 2. Clearly, the detection of temporal relations between events and time expressions requires more than simply coupling the closest pairs within a sentence (as the baseline does). It is also clear that the baseline method performs poorly, especially for pairings over relatively long distances. For instance, it produces 34.9% when we consider the pairings within 64 tokens without typing. In the same setting, our method produces 74.8% in F-measure, significantly outperforming the baseline.

distance (# of tlinks)    features    with typing    w/o typing
distance ≤ 64 tokens      baseline    21.8           34.9
(1370 tlinks)             basic       52.1           74.1
                          basic+FS    53.1 (+1.0)    74.8 (+0.7)
distance ≤ 16 tokens      baseline    38.7           61.3
(1269 tlinks)             basic       52.8           75.8
                          basic+FS    54.3 (+1.5)    76.5 (+0.7)
distance ≤ 4 tokens       baseline    49.8           76.1
(789 tlinks)              basic       57.0           80.1
                          basic+FS    58.8 (+1.8)    81.8 (+1.7)

Figure 2: TLINK extraction results, with/without typing. Parentheses show contribution of grammar-derived features, over using basic ones only. Baseline posts TLINKs over 'close' EVENT/TIMEX3 pairs.

⁸ To evaluate the TLINK classifier alone, we use the EVENT and TIMEX3 annotations in TimeBank.

We compare performance in two types of feature representation: 'basic' and 'basic+FS grammar', which reflect the without- and with-segment-type information obtained by the grammar analysis, respectively. As the positive deltas show, configurational syntactic information can be exploited beneficially by our process.

Focusing on within-4-tokens pairings, we achieve 81.8% F-measure without typing of TLINKs, and 58.8% with typing. (The task without typing is a binary classification to detect whether the pairing has a TLINK relation or not, regardless of the type.) As the figure shows, the task becomes harder when we consider longer distance pairings. Within a 64 token distance, we obtain figures of 74.8% and 53.1%, without and with typing respectively.

While we are moderately successful in detecting the existence of temporal relations, the noticeable differences in performance between the task settings with and without typing indicate that we are not as successful in distinguishing one type from another. In particular, the relatively low performance of TLINK typing highlights the difficulty in distinguishing between DURING and IS_INCLUDED.

The guidelines (and common sense analysis) suggest that the IS_INCLUDED type should be assigned if the time point or duration of EVENT is included in the duration of the associated TIMEX3. DURING, on the other hand, should be assigned as a type if some relation represented by the EVENT holds during the duration of the TIMEX3. We note that for this particular typing problem, the subtle distinctions are hard even for human annotators: the TimeBank corpus displays a number of occasions where inconsistent tagging is evident.

7 Conclusion

TimeML is a significant development in time analysis, as it captures detailed information, anchored in eventuality and linguistic structure, and shown to be crucial for inferential and reasoning tasks. In addition to defining annotation guidelines, the TimeML effort notably created the first reference corpus illustrative of expressiveness of the language.

Unfortunately, the small size of the TimeBank corpus prevents its straightforward use as a training resource, a problem further exacerbated by the inherent complexity of TimeML-compliant analysis. And yet, for reasoning engines to function, TimeML analysers need to be built.

[Mani et al., 2004] discuss some pioneering work in linking events with times, and ordering events, indicative of productive strategies for posting (some) TLINK information. However, the nature of these efforts is such that differences in premises, representation, and focus make a direct performance comparison impossible. Furthermore, the work pre-dates TimeML, and cannot be conveniently mapped to TimeBank data; this, in effect, precludes a quantitative comparison with our work.

In a first systematic attempt at TimeML-compliant analysis, and leveraging the TimeBank corpus, we have developed a strategy which synergistically blends finite-state analysis over linguistic annotations with a state-of-the-art machine learning technique. Particularly effective are: aggressive analysis, by complex grammars, of both TimeML components and syntactic structure; coupled with a learning algorithm capable of training over unannotated data, in addition to exploiting arbitrarily small amounts of labeled data. While work remains (notably refining the TLINK recogniser, targeting other types of LINKs, and enhancing EVENT recognition with external lexical resources), this is a significant step in instantiating a deeper time analysis, capable of satisfying the needs of reasoning engines.

References

[Ando, 2004] R. K. Ando. Exploiting unannotated corpora for tagging and chunking. In Proceedings of ACL-04.
[Ferro, 2001] L. Ferro. TIDES: Instruction manual for the annotation of temporal expressions. MTR 01W0000046V01, The MITRE Corporation, 2001.
[Fikes et al., 2003] R. Fikes, J. Jenkins, and G. Frank. JTP: A system architecture and component library for hybrid reasoning. TR KSL-03-01, Stanford University, 2003.
[Filatova and Hovy, 2001] E. Filatova and E. Hovy. Assigning time-stamps to event-clauses. In Proceedings of the 10th Conference of the EACL, Toulouse, France, 2001.
[Gaizauskas & Setzer, 2002] R. Gaizauskas and A. Setzer, editors. Annotation Standards for Temporal Information in NL, Las Palmas, Spain, 2002.
[Han & Lavie, 2004] B. Han and A. Lavie. Framework for resolution of time in natural language. TALIP Special Issue, Spatial and Temporal Information Processing, 2004.
[Hobbs et al., 2002] J. R. Hobbs, G. Ferguson, J. Allen, P. Hayes, and A. Pease. A DAML ontology of time, 2002.
[Kennedy & Boguraev, 1996] C. Kennedy and B. Boguraev. Anaphora for everyone: Pronominal anaphora resolution without a parser. In Proceedings of COLING-96, Copenhagen, DK, 1996.
[Mani & Wilson, 2000] I. Mani and G. Wilson. Robust temporal processing of news. In Proceedings of the 38th Annual Meeting of the ACL, Hong Kong, 2000.
[Mani et al., 2003] I. Mani, B. Schiffman, and J. Zhang. Inferring temporal ordering of events in news. In Proceedings of ACL-41 (HLT-NAACL), Edmonton, Canada, 2003.
[Mani et al., 2004] I. Mani, J. Pustejovsky, and B. Sundheim. Introduction: special issue on temporal information processing. ACM Transactions on Asian Language Information Processing, 3(1):1–10, 2004.
[Prager et al., 2003] J. Prager, J. Chu-Carroll, E. Brown, and C. Czuba. Question answering using predictive annotation. In Advances in Question Answering, 2003.
[Pustejovsky et al., 2003] J. Pustejovsky, J. Castano, R. Ingria, R. Saurí, R. Gaizauskas, A. Setzer, G. Katz, and D. Radev. TimeML: Robust specification of event and temporal expressions in text. In AAAI Spring Symposium on New Directions in Question-Answering, pages 28–34, Stanford, CA, 2003.
[Saurí et al., 2004] R. Saurí, J. Littman, R. Gaizauskas, A. Setzer, and J. Pustejovsky. TimeML annotation guidelines, Version 1.1, TERQAS Workshop, 2004.
[Schilder & Habel, 2003] F. Schilder and C. Habel. Temporal information extraction for temporal QA. In AAAI Spring Symposium on New Directions in Question-Answering, pages 35–44, Stanford, CA, 2003.
[Zhang et al., 2002] T. Zhang, F. Damerau, and D. E. Johnson. Text chunking based on a generalization of Winnow. Journal of Machine Learning Research, 2:615–637, 2002.
