
HAL Id: hal-01637143
https://hal.archives-ouvertes.fr/hal-01637143
Submitted on 17 Nov 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not.
The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Core Techniques of Question Answering Systems over Knowledge Bases: a Survey

Dennis Diefenbach, Vanessa Lopez, Kamal Singh, Pierre Maret

To cite this version: Dennis Diefenbach, Vanessa Lopez, Kamal Singh, Pierre Maret. Core Techniques of Question Answering Systems over Knowledge Bases: a Survey. Knowledge and Information Systems (KAIS), Springer, 2017. hal-01637143

Published in Knowledge and Information Systems

Dennis Diefenbach¹, Vanessa Lopez², Kamal Singh¹ and Pierre Maret¹
¹ Université de Lyon, CNRS UMR 5516 Laboratoire Hubert Curien, F-42023, Saint-Etienne, France
² IBM Research Ireland, Damastown Industrial Estate, Dublin

Abstract.
The Semantic Web contains an enormous amount of information in the form of knowledge bases (KB). To make this information available, many question answering (QA) systems over KBs were created in recent years. Building a QA system over KBs is difficult because there are many different challenges to be solved. In order to address these challenges, QA systems generally combine techniques from natural language processing, information retrieval, machine learning and Semantic Web. The aim of this survey is to give an overview of the techniques used in current QA systems over KBs. We present the techniques used by the QA systems which were evaluated on a popular series of benchmarks: Question Answering over Linked Data (QALD). Techniques that solve the same task are first grouped together and then described. The advantages and disadvantages are discussed for each technique. This allows a direct comparison of similar techniques. Additionally, we point to techniques that are used over WebQuestions and SimpleQuestions, which are two other popular benchmarks for QA systems.
Keywords: Question Answering; QALD; WebQuestions; SimpleQuestions; Survey; Semantic Web; Knowledge base

Received 29 Jul 2016, Revised 10 Jul 2017, Accepted 30 Jul 2017

1. Introduction

Question answering (QA) is a very old research field in computer science. The first QA systems were developed to access data over databases in the late sixties and early seventies. More recent works designed QA systems over free text, i.e., finding the part of text that contains the answer of a question from a set of documents. These are referred to as information retrieval (IR) based QA systems. In the last two decades, thanks to the development of the Semantic Web, a lot of new structured data has become available on the web in the form of knowledge bases (KBs). Nowadays, there are KBs about media, publications, geography, life-science and more¹. QA over KBs has emerged to make this information available to the end user. For a more detailed historical overview see (Lopez et al. 2011).
The publication of new KBs accelerated thanks to the publication by the W3C of the de facto standards RDF² and SPARQL³. RDF is a format for publishing new KBs. SPARQL is a powerful query language for RDF, but it requires expert knowledge. The idea behind a QA system over KBs is to find in a KB the information requested by the user using natural language. This is generally addressed by translating a natural language question to a SPARQL query that can be used to retrieve the desired information. In the following, when we speak about QA systems, we refer to QA systems over KBs.
Due to the large number of QA systems nowadays, an in-depth analysis of all of them would require extensive work. Instead, we restrict ourselves to QA systems participating in a specific benchmark. This allows for a comparison of the overall performance of the QA systems and gives at least a hint of how good the single techniques used in QA systems are. We concentrate here on the "Question Answering over Linked Data" (QALD) benchmark, which is in line with the principles of the Semantic Web: it focuses on RDF KBs; it proposes KBs in different domains, assuming that a QA system should be able to query any KB; it proposes large open-domain KBs like DBpedia, such that scalability becomes an issue; it offers questions where the information should be extracted from multiple interlinked KBs; it contains questions with multiple relations and operators such as comparatives and superlatives; and it proposes multilingual questions, assuming that the users can be of different nationalities. Moreover, we collected a list of QA systems evaluated over WebQuestions and SimpleQuestions, two popular benchmarks, and we point to approaches used by these QA systems and not by the ones evaluated over QALD.
Differently from other surveys in the field of QA over KBs, we focus on the techniques behind existing QA systems. We identified five tasks in the QA process and describe how QA systems solve them. This way each QA system participating in QALD is decomposed and compared. The tasks are question analysis, phrase mapping, disambiguation, query construction and querying distributed knowledge. Our main aim is to describe, classify and compare all techniques used by QA systems participating in QALD. This provides a good overview of how QA systems over KBs work.
The paper is organized as follows: in section 2, we position our survey with respect to past surveys on QA systems. In section 3, we describe three popular benchmarks in QA, namely WebQuestions, SimpleQuestions and QALD. In section 4, we describe how the compared QA systems were selected. In section 5, we give an overview of the five tasks. In section 6, we describe the techniques used for the question analysis task. In section 7, we discuss the techniques used for the phrase mapping task and, in section 8, the techniques for the disambiguation task are provided. In section 9, we describe how the different QA systems construct a query. Finally, in section 10, we describe what changes should be made if the required information is distributed across different KBs. In sections 11 and 12, we compare the analysed systems to the ones evaluated over WebQuestions and SimpleQuestions. In section 13, we give an overview of the evolution of the challenges in the field of QA. Based on the analysis of this publication, in section 14, we point to future developments in the domain of QA.
¹ http://lod-cloud.net
² http://www.w3.org/TR/rdf11-mt/
³ http://www.w3.org/TR/sparql11-query/

2. Related Work

There have been various surveys on QA systems in the field of IR driven by the Text Retrieval Conference (TREC), the Cross Language Evaluation Forum (CLEF) and NII Test Collections for IR systems (NTCIR) campaigns, see (Kolomiyets & Moens 2011) (Allam & Haggag 2012) (Dwivedi & Singh 2013).

In (Allam & Haggag 2012) (Dwivedi & Singh 2013), IR based QA systems are analysed based on three core components: (1) question analysis and classification, (2) information retrieval to extract the relevant documents with the answer and (3) answer extraction and ranking, which combine techniques from natural language processing (NLP), information retrieval (IR) and information extraction (IE), respectively (Allam & Haggag 2012). Based on the overall approach, in (Dwivedi & Singh 2013) QA systems are categorised into: Linguistic, Statistical and Pattern Matching.

In these surveys, only a few domain-specific QA approaches over structured KBs are briefly described. In the open domain QA systems analysed, KBs and taxonomies are used; however, they are not used to find the answer, but rather as features in the QA process, where answers are extracted from text. For example, they are used to support the question type classification and predict the expected answer type or to semantically enrich the parse-trees to facilitate the question-answering matching, often combined with ML approaches (Support Vector Machine, Nearest Neighbors, Bayesian classifiers, Decision Tree, etc.) or approaches that learn text patterns from text.
Differently from these surveys, we deep dive into the techniques applied by QA systems to extract answers from KBs. In particular, we focus on answering open-domain queries over Linked Data that have been proposed since the introduction of QALD in 2011. We will see that the QALD benchmark introduces different challenges for QA systems over KBs (detailed in section 13), for which a wide variety of novel or existing techniques and their combinations have been applied. Moreover, we point to some techniques used by QA systems evaluated over WebQuestions and SimpleQuestions.
The existing QALD reports (Lopez et al. 2013) (Cimiano et al. 2013) (Unger et al. 2014) (Unger et al. 2015) (Unger et al. 2016), surveys on the challenges faced by open domain QA systems over KBs (Lopez et al. 2011) (Freitas et al. 2012) and the latest survey on the topic (Höffner et al. 2016) give short overviews for each QA system. They do not allow for a detailed comparison of the techniques used in each of the stages of the QA process. In this survey, the advantages and disadvantages of each technique applied by each of the 32 QA systems participating in any of the QALD campaigns are described.
3. Benchmarks for QA

Benchmarks for QA systems over KBs are a new development. In 2011, there was still no established benchmark that was used to compare different QA systems, as described in (Lopez et al. 2011). Therefore, QA systems were tested on different KBs using small scale scenarios with ad-hoc tasks. In recent years, different benchmarks for QA systems over KBs have emerged.
The three most popular ones are WebQuestions (Berant et al. 2013), SimpleQuestions (Bordes et al. 2015) and QALD⁴ (Lopez et al. 2013) (Cimiano et al. 2013) (Unger et al. 2014) (Unger et al. 2015) (Unger et al. 2016). In fact QALD is not one benchmark but a series of evaluation campaigns for QA systems over KBs. Six challenges have been proposed up till now. An overview is given in Table 1. Each year one of the tasks included QA over DBpedia. In the following, we concentrate on these results. Moreover, to address the case where the requested information is distributed across many KBs, we consider QALD-4 Task 2. It offers 3 interlinked KBs, 25 training questions and 25 test questions that can be answered only by searching and combining information distributed across the 3 KBs. We compare these three benchmarks based on the dataset, the questions and the evaluation metrics.

⁴ http://www.sc.cit-ec.uni-bielefeld.de/qald/

| Challenge | Task | Dataset | Questions | Language |
|---|---|---|---|---|
| QALD-1 | 1 | DBpedia 3.6 | 50 train / 50 test | en |
| | 2 | MusicBrainz | 50 train / 50 test | en |
| QALD-2 | 1 | DBpedia 3.7 | 100 train / 100 test | en |
| | 2 | MusicBrainz | 100 train / 100 test | en |
| QALD-3 | 1 | DBpedia 3.8 | 100 train / 99 test | en, de, es, it, fr, nl |
| | 2 | MusicBrainz | 100 train / 99 test | en |
| QALD-4 | 1 | DBpedia 3.9 | 200 train / 50 test | en, de, es, it, fr, nl, ro |
| | 2 | SIDER, Diseasome, Drugbank | 25 train / 50 test | en |
| | 3 | DBpedia 3.9 with abstracts | 25 train / 10 test | en |
| QALD-5 | 1 | DBpedia 2014 | 300 train / 50 test | en, de, es, it, fr, nl, ro |
| | 2 | DBpedia 2014 with abstracts | 50 train / 10 test | en |
| QALD-6 | 1 | DBpedia 2015 | 350 train / 100 test | en, de, es, it, fr, nl, ro, fa |
| | 2 | DBpedia 2015 with abstracts | 50 train / 25 test | en |
| | 3 | LinkedSpending | 100 train / 50 test | en |

Table 1. Overview of the QALD challenges launched so far.
3.1. Datasets

WebQuestions and SimpleQuestions contain questions that can be answered using Freebase (Google 2016). The QALD challenges always included DBpedia⁵ as an RDF dataset besides some others. The main difference between DBpedia and Freebase is how the information is structured. Freebase contains binary and also non-binary relationships. Non-binary relationships are stored using one type of reification that uses "mediator nodes". An example is given in Figure 1. This is not the case in DBpedia, where only binary relations are stored. In particular temporal information is not present. Some of the works over WebQuestions like (Bordes et al. 2015) and (Jain 2016) converted non-binary relationships to binary relationships by dropping the "mediator node". (Bordes et al. 2015) noted that this way still 86% of the questions in WebQuestions can be answered. Moreover, for each relationship, Freebase also stores its inverse; this is not the case for DBpedia, since inverses are considered redundant. Regarding the size, the DBpedia endpoint contains nearly 400.000 triples while the last Freebase dump contains 1.9 billion triples. Note that due to reification and due to the materialization of the inverse of each property, this is roughly the same amount of information. Moreover, note that WebQuestions and SimpleQuestions are often tackled by using only a subset of Freebase. These subsets are denoted as FB2M and FB5M, containing only 2 and 5 million entities, respectively.
⁵ http://www.dbpedia.org/

[Figure 1: graph with the nodes Portugal, m, EU and 1986-01-01 and the edge labels organization_member.member_of, organization_membership.organization and organization.organization_membership.from]

Figure 1. Freebase also contains non-binary relationships like "Portugal is a member of the European Union since 1986.". This information is stored in Freebase using one type of reification. It uses "mediator nodes" (in this example m). Note that in the dump, for each relationship, its inverse is also stored. This is not the case with DBpedia.
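
The conversion of a reified statement into binary relations by dropping the mediator node can be sketched roughly as follows. This is only an illustration under assumptions: triples are plain Python tuples, the property names are shortened from Figure 1, the helper `drop_mediators` is hypothetical, and joining the two property names with "/" is an arbitrary choice rather than the procedure of (Bordes et al. 2015) or (Jain 2016).

```python
from collections import defaultdict

def drop_mediators(triples, mediators):
    """Collapse paths subject -> mediator -> object into direct binary triples."""
    incoming = defaultdict(list)   # mediator -> [(subject, property)]
    outgoing = defaultdict(list)   # mediator -> [(property, object)]
    binary = []
    for s, p, o in triples:
        if o in mediators:
            incoming[o].append((s, p))
        elif s in mediators:
            outgoing[s].append((p, o))
        else:
            binary.append((s, p, o))   # already a binary relationship
    for m in mediators:
        for s, p_in in incoming[m]:
            for p_out, o in outgoing[m]:
                # Hypothetical naming scheme for the collapsed property.
                binary.append((s, p_in + "/" + p_out, o))
    return binary

# The statement of Figure 1, with the mediator node called "m".
triples = [
    ("Portugal", "organization_member.member_of", "m"),
    ("m", "organization_membership.organization", "EU"),
    ("m", "organization_membership.from", "1986-01-01"),
]

for triple in drop_mediators(triples, {"m"}):
    print(triple)
```

The two resulting triples link Portugal to the EU and Portugal to the date, but they no longer record that the date belongs to this particular membership, which is the information given up by the conversion.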
3.2. Questions

WebQuestions contains 5810 questions which follow a very similar pattern due to the crowd-sourcing method used to construct the dataset. Around 97% of them can be answered using only one reified statement with potentially some constraints like type constraints or temporal constraints⁶. SimpleQuestions contains 108.442 questions that can be answered using one binary relation. The questions of QALD are prepared each year by the organizers of the challenge. They can generally be answered using up to three binary relations and often need modifiers like ORDER BY and COUNT. Moreover, some of the questions are out of scope. The training set is rather small, with about 50 to 250 questions. This results in few opportunities for supervised learning. Finally, while the SimpleQuestions and the QALD datasets are annotated with a SPARQL query for each question, the WebQuestions dataset is only annotated with the labels of the answers. There exists a dataset containing SPARQL queries for each of the questions in WebQuestions, which is called WebQuestionsSP (Yih et al. 2016).
3.3. Evaluation

For evaluation, three parameters are used: precision, recall, and F-measure. They are the same across all benchmarks. Precision indicates how many of the answers are correct. For a question q it is computed as:

$$\mathrm{Precision}(q) = \frac{\text{number of correct system answers for } q}{\text{number of system answers for } q}.$$

The second parameter is recall. For each question there is an expected set of correct answers; these are called the gold standard answers. Recall indicates how many of the gold standard answers are returned by the system. It is computed as:

$$\mathrm{Recall}(q) = \frac{\text{number of correct system answers for } q}{\text{number of gold standard answers for } q}.$$

We explain these parameters using the following example. Assume a user asks the question: "Which are the three primary colors?". The gold standard answers would be "green", "red" and "blue". If a system returns "green" and "blue" as the answers, then the precision is 1, since all answers are correct, but the recall is only 2/3, since the answer "red" is missing.

⁶ https://www.microsoft.com/en-us/download/details.aspx?id=52763

The global precision and recall of a system can be computed in two ways; they are denoted as the micro and macro precision and recall, respectively. The micro precision and recall of a system are computed by taking the average of the precision and recall over all answered questions, i.e. non-answered questions are not taken into account. The macro precision and recall are computed by taking the average of the precision and recall over all questions. The (micro or macro) F-measure is the harmonic mean of the (micro or macro) precision and recall and it is computed as follows:

$$\text{F-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

Note that the F-measure is near to zero if either precision or recall is near to zero and is equal to one only if both precision and recall are one. In this survey we report macro precision, recall and F-measure since these are the official metrics used by the benchmarks. Another parameter which is important for the user is runtime. This is generally indicated as the average time taken by the QA system for answering the questions.
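
The metrics above can be sketched in a few lines of Python. The sketch follows the definitions given in this section (micro: average over answered questions only; macro: average over all questions). The function names are hypothetical, and scoring an unanswered question with precision 0 is an assumption made here, not something the benchmarks prescribe in this text.

```python
def precision_recall(system_answers, gold_answers):
    """Precision and recall of one question, with answers compared as sets."""
    system, gold = set(system_answers), set(gold_answers)
    correct = len(system & gold)
    # Assumption: an empty answer set gets precision 0.
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

def f_measure(precision, recall):
    """2 * P * R / (P + R), defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_scores(results, answered_only):
    """results: list of (system_answers, gold_answers), one pair per question.

    answered_only=True gives the 'micro' scores as defined in this section,
    answered_only=False gives the 'macro' scores.
    """
    if answered_only:
        results = [r for r in results if r[0]]
    if not results:
        return 0.0, 0.0, 0.0
    scores = [precision_recall(s, g) for s, g in results]
    p = sum(x[0] for x in scores) / len(scores)
    r = sum(x[1] for x in scores) / len(scores)
    return p, r, f_measure(p, r)

# The primary-colors example plus one hypothetical unanswered question.
results = [
    (["green", "blue"], ["green", "red", "blue"]),
    ([], ["Paris"]),
]
print(average_scores(results, answered_only=True))
print(average_scores(results, answered_only=False))
```

For the primary-colors question alone, precision is 1 and recall is 2/3, which gives an F-measure of 0.8; adding the unanswered question halves the macro precision and recall but leaves the micro values unchanged.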
4. Selection of the QA systems

We briefly describe how the QA systems compared in this paper were selected. We consider the QA systems that either directly participated in the QALD challenges or that were evaluated afterwards reusing the same evaluation set-up. To identify the latter systems, we searched in Google Scholar for all publications mentioning or citing the publications of the QALD challenges (Lopez et al. 2013) (Cimiano et al. 2013) (Unger et al. 2014) (Unger et al. 2015) (Unger et al. 2016). From among these, we took the publications referring to QA systems.

We excluded systems that were only able to answer questions formulated in a controlled natural language, namely squall2sparql (Ferré 2013) and CANaLI (Mazzeo & Zaniolo 2016). Moreover, we excluded systems that propose a graphical user interface to guide users through the construction of a query. These include Sparklis (Ferré 2017) and Scalewelis (Joris & Ferré 2013).

Note that over the last years other types of QA systems have also emerged. Among these there is hybrid QA, i.e. QA using both free text and KBs. This problem was tackled by the QA systems HAWK (Usbeck et al. 2015) and YodaQA (Baudiš & Šedivý 2015). Another type of QA system is QA over statistical data expressed using the RDF Data Cube vocabulary. This problem was tackled by CubeQA (Höffner & Lehmann 2015) and QA3 (Atzori et al. 2016). Since there are only a few works in these directions, we direct the readers directly to them.

The QA systems APAQ, AskDBpedia, KWGAnswer, MHE, NbFramework, PersianQA, RO_FII and UIQA that participated in QALD were excluded since no corresponding publication could be found.

Table 2 contains a list of the considered QA systems with the scores they achieved on the QALD test sets.
5. The question answering process

To classify the techniques used in QA systems we divide the QA process into five tasks: question analysis, phrase mapping, disambiguation, query construction and querying distributed knowledge (see Figure 2).

| QA system | Lang | Total | Precision | Recall | F-measure | Runtime | Reference |
|---|---|---|---|---|---|---|---|
| QALD-1 | | | | | | | |
| FREyA (Damljanovic et al. 2010) | en | 50 | 0.54 | 0.46 | 0.50 | 36 s | (Lopez et al. 2013) |
| PowerAqua (Lopez et al. 2012) | en | 50 | 0.48 | 0.44 | 0.46 | 20 s | (Lopez et al. 2013) |
| TBSL (Unger et al. 2012) | en | 50 | 0.41 | 0.42 | 0.42 | - | (Unger et al. 2012) |
| Treo (Freitas & Curry 2014) | en | 50 | | | | | |
| QALD-2 | | | | | | | |
| SemSeK (Aggarwal & Buitelaar 2012) | en | 100 | 0.35 | 0.38 | 0.37 | - | (Lopez et al. 2013) |
| BELA (Walter et al. 2012) | en | 100 | 0.19 | 0.22 | 0.21 | - | (Walter et al. 2012) |
| QAKiS (Cabrio et al. n.d.) | en | 100 | 0.14 | 0.13 | 0.13 | - | (Lopez et al. 2013) |
| QALD-3 | | | | | | | |
| gAnswer (Zou et al. 2014) | en | 100 | 0.40 | 0.40 | 0.40 | 1 s | (Zou et al. 2014) |
| RTV (Giannone et al. 2013) | en | 99 | 0.32 | 0.34 | 0.33 | - | (Cimiano et al. 2013) |
| Intui2 (Dima 2013) | en | 99 | 0.32 | 0.32 | 0.32 | - | (Cimiano et al. 2013) |
| SINA (Shekarpour et al. 2015) | en | 100 | 0.32 | 0.32 | 0.32 | 10-20 s | (Shekarpour et al. 2015) |
| DEANNA (Yahya et al. 2012) | en | 100 | 0.21 | 0.21 | 0.21 | 1-50 s | (Zou et al. 2014) |
| SWIP (Pradel et al. 2012) | en | 99 | 0.16 | 0.17 | 0.17 | - | (Cimiano et al. 2013) |
| Zhu et al. (Zhu et al. 2015) | en | 99 | 0.38 | 0.42 | 0.38 | - | (Zhu et al. 2015) |
| QALD-4 | | | | | | | |
| Xser (Xu et al. 2014) | en | 50 | 0.72 | 0.71 | 0.72 | - | (Unger et al. 2014) |
| gAnswer (Zou et al. 2014) | en | 50 | 0.37 | 0.37 | 0.37 | 0.973 s | (Unger et al. 2014) |
| CASIA (He et al. 2014) | en | 50 | 0.32 | 0.40 | 0.36 | - | (Unger et al. 2014) |
| Intui3 (Dima 2014) | en | 50 | 0.23 | 0.25 | 0.24 | - | (Unger et al. 2014) |
| ISOFT (Park et al. 2014) | en | 50 | 0.21 | 0.26 | 0.23 | - | (Unger et al. 2014) |
| Hakimov et al. (Hakimov et al. 2015) | en | 50 | 0.52 | 0.13 | 0.21 | - | (Hakimov et al. 2015) |
| QALD-4 Task 2 | | | | | | | |
| GFMed (Marginean 2017) | en | 25 | 0.99 | 1.0 | 0.99 | - | (Unger et al. 2014) |
| SINA (Shekarpour et al. 2015) | en | 25 | 0.95 | 0.90 | 0.92 | 4-120 s | (Shekarpour et al. 2015) |
| POMELO (Hamon et al. 2014) | en | 25 | 0.87 | 0.82 | 0.85 | 2 s | (Unger et al. 2014) |
| Zhang et al. (Zhang et al. 2016) | en | 25 | 0.89 | 0.88 | 0.88 | - | (Zhang et al. 2016) |
| TR Discover (Song et al. 2015) | en | 25 | 0.34 | 0.80 | 0.48 | - | (Song et al. 2015) |
| QALD-5 | | | | | | | |
| Xser (Xu et al. 2014) | en | 50 | 0.74 | 0.72 | 0.73 | - | (Unger et al. 2015) |
| QAnswer (Ruseti et al. 2015) | en | 50 | 0.46 | 0.35 | 0.40 | - | (Unger et al. 2015) |
| SemGraphQA (Beaumont et al. 2015) | en | 50 | 0.31 | 0.32 | 0.31 | - | (Unger et al. 2015) |
| YodaQA (Baudiš & Šedivý 2015) | en | 50 | 0.28 | 0.25 | 0.26 | - | (Unger et al. 2015) |
| QALD-6 | | | | | | | |
| UTQA (Pouran-ebnveyseh 2016) | en | 100 | 0.82 | 0.69 | 0.75 | - | (Unger et al. 2016) |
| UTQA (Pouran-ebnveyseh 2016) | es | 100 | 0.76 | 0.62 | 0.68 | - | (Unger et al. 2016) |
| UTQA (Pouran-ebnveyseh 2016) | fa | 100 | 0.70 | 0.61 | 0.65 | - | (Unger et al. 2016) |
| SemGraphQA (Beaumont et al. 2015) | en | 100 | 0.70 | 0.25 | 0.37 | - | (Unger et al. 2016) |

Evaluated on the training set

Table 2. This table summarizes the results obtained by the QA systems evaluated over QALD. We included the QALD tasks querying DBpedia and QALD-4 task 2, which addresses the problem of QA over interlinked KBs. We indicated with "" the systems that did not participate directly in the challenges, but were evaluated on the same benchmark afterwards. We indicate the average running time of a query for the systems where we found it. Even if the runtime evaluations were executed on different hardware, it still helps to give an idea about the scalability.
In the following, we briefly describe the first four steps using a simple example. Assume the user inputs the question: "What is the population of Europe?" and that DBpedia is the queried KB.

Figure 2. Tasks in the QA process

In the question analysis task, we include all techniques that use purely syntactic features to extract information about a given question. These include, for example, determining the type of the question (it is a "What" question), identifying named entities, like "Europe", identifying relations and entities (subjects and objects), like the relation "is the population of", and their dependencies, like the dependency between "is the population of" and "Europe".

In the phrase mapping task, one starts with a phrase s and tries to identify resources that with high probability correspond to s. For example, for the phrase "Europe" possible resources in DBpedia are: dbr:Europe_(band)⁷ (that refers to a band called Europe), dbr:Europe (that refers to Europe as a continent) and dbr:Europe_(dinghy) (a particular type of boat).

In the disambiguation task, we include techniques that are used to determine which of the resources identified during the phrase mapping task are the right ones. In the above example, "Europe" cannot refer to a band or a boat since it does not make sense to speak about their population. Therefore, these two resources can be excluded.

The query construction task describes how to construct a SPARQL query that can be sent to an endpoint. This part also covers the construction of queries that require special operators such as comparatives and superlatives. In the example above, the generated SPARQL query would be:

    SELECT ?p WHERE { dbr:Europe dbp:populationTotal ?p }

In the distributed knowledge task, we include techniques that are used in the case when the information to answer the question must be retrieved from several KBs.
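
To make the first four tasks concrete, the example can be traced with a toy pipeline. Everything in this sketch is an assumption made for illustration: the lexicon `CANDIDATES`, the set `HAS_POPULATION`, the regular expression and the function names are hypothetical and do not correspond to any of the surveyed systems, which use far richer techniques for each step.

```python
import re

# Hypothetical toy lexicon standing in for the phrase mapping task.
CANDIDATES = {
    "Europe": ["dbr:Europe_(band)", "dbr:Europe", "dbr:Europe_(dinghy)"],
    "is the population of": ["dbp:populationTotal"],
}

# Hypothetical knowledge used for disambiguation: resources that can have a population.
HAS_POPULATION = {"dbr:Europe"}

def analyse(question):
    """Question analysis: split a 'What <relation> <entity>' question with a regex."""
    match = re.match(r"What (is the .+ of) (.+?)\??$", question)
    if match is None:
        raise ValueError("unsupported question pattern")
    return {"type": "What", "relation": match.group(1), "entity": match.group(2)}

def build_query(question):
    parts = analyse(question)
    properties = CANDIDATES[parts["relation"]]   # phrase mapping (relation)
    entities = CANDIDATES[parts["entity"]]       # phrase mapping (entity)
    # Disambiguation: keep the first candidate for which a population makes sense.
    entity = next(e for e in entities if e in HAS_POPULATION)
    # Query construction: fill a single triple pattern.
    return "SELECT ?p WHERE { %s %s ?p }" % (entity, properties[0])

print(build_query("What is the population of Europe?"))
```

The printed query is the one given in the text. The sketch also shows why the order of the tasks is not fixed: disambiguation here happens before query construction, but a system could equally build one query per candidate and discard the queries that return nothing.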
Note that QA systems generally do not follow such a strict division into five tasks. However, based on the current state of the art (including systems not reviewed here because they were not evaluated for QALD), we have found that QA systems can typically be decomposed in this way. Note that the subdivision into tasks does not aim to present a general architecture for QA systems, but to classify and compare QA systems. Moreover, in general there is no preferred order to address the tasks, e.g. Xser (Xu et al. 2014) first performs disambiguation and then constructs the query while SEMPRE (Berant et al. 2013) proceeds the other way around.
In section 13, the tasks are mapped to the challenges identified in the state of the art for QA systems.
The following five sections describe the techniques that are used by QA systems to solve the different tasks. We focus on techniques that can be adapted to any KB, i.e. techniques that do not use DBpedia- or Wikipedia-specific features.
6. Question analysis

In this step, the question of the user is analyzed based on purely syntactic features. QA systems use syntactic features to deduce, for example, the right segmentation of the question, determine which phrase corresponds to an instance (subject or object), property or class and the dependency between the different phrases. Note that some systems like Xser (Xu et al. 2014) decide already in this step what part of the question corresponds to an instance, relation and class. Other systems like PowerAqua (Lopez et al. 2012) use this step to find how the terms in the question relate to each other and make a hypothesis about the correspondence. The right one can be selected afterwards. We classify techniques for question analysis into techniques for recognizing named entities, techniques for segmenting the question using Part-of-speech (POS) tagging and techniques to identify dependencies using parsers.

⁷ @prefix dbr:

    WRB   VBD  DT   NNP       NNP    VBN      .
    When  was  the  European  Union  founded  ?

Figure 3. POS tags returned by the Stanford POS tagger (http://nlp.stanford.edu/software/tagger.shtml) for the question "When was the European Union founded?". For example DT stands for determiner, NNP for proper singular noun.
6.1. Recognizing named entities

An important task in QA is to segment the question, i.e. identify contiguous spans of tokens that refer to a resource. As an example consider the question: "Who directed One Day in Europe?". The span of tokens "One Day in Europe" refers to a film. QA systems use different strategies to solve this problem.
Named Entity Recognition (NER) tools from NLP

One approach is to use NER tools used in natural language processing. Unfortunately, NER tools are generally adapted to a specific domain. It was observed in (He et al. 2014) that the Stanford NER tool⁸ could recognize only 51.5% of the named entities in the QALD-3 training set.

N-gram strategy

A common strategy is to try to map n-grams (groups of n words) in the question to entities in the underlying KB. If at least one resource is found, then the corresponding n-gram is considered as a possible named entity (NE). This is used for example in SINA (Shekarpour et al. 2015) and CASIA (He et al. 2014). This has the advantage that each NE in the KB can be recognized. The disadvantage is that many possible candidates will appear and that the disambiguation will become computationally expensive.
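
A minimal sketch of the n-gram strategy is given below. It assumes that the KB labels are available as a dictionary from lower-cased label to resources; the dictionary `KB_LABELS`, its resource names and the function names are hypothetical, and real systems query an index instead of an in-memory dictionary.

```python
def ngrams(tokens, max_n):
    """Yield (start, end, phrase) for every n-gram with 1 <= n <= max_n, longest first."""
    for n in range(max_n, 0, -1):
        for start in range(len(tokens) - n + 1):
            yield start, start + n, " ".join(tokens[start:start + n])

def candidate_entities(question, kb_labels, max_n=4):
    """Return every n-gram of the question that matches at least one KB label."""
    tokens = question.rstrip("?").split()
    found = []
    for _, _, phrase in ngrams(tokens, max_n):
        resources = kb_labels.get(phrase.lower())
        if resources:
            found.append((phrase, resources))
    return found

# Hypothetical label dictionary (resource names are illustrative only).
KB_LABELS = {
    "one day in europe": ["dbr:One_Day_in_Europe"],
    "one day": ["dbr:One_Day_(novel)"],
    "europe": ["dbr:Europe", "dbr:Europe_(band)"],
}

for phrase, resources in candidate_entities("Who directed One Day in Europe?", KB_LABELS):
    print(phrase, "->", resources)
```

Even on this tiny dictionary the question yields three overlapping candidate spans ("One Day in Europe", "One Day" and "Europe"), which illustrates the disadvantage named above: the strategy finds every entity that has a label in the KB, but leaves the choice between the candidates to the disambiguation step.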
Entity Linking (EL) Tools

Some engines use other tools to recognize NE. These are DBpedia Spotlight (Daiber et al. 2013) and AIDA (Yosef et al. 2011). These tools do not only recognize NE, but also find the corresponding resources in the underlying KB, i.e. they perform entity linking. As a result, they perform many steps at once: identifying contiguous spans of tokens that refer to some entity, finding possible resources that can correspond to them and disambiguating between them.
6.2. Segmenting the question using POS tagging

POS tags are used mainly to identify which phrases correspond to instances (subjects or objects), to properties, to classes and which phrases do not contain relevant information. An example of a POS tagged question is given in Figure 3. Often nouns refer to instances and classes (like "European") and verbs to properties (like "founded" above). This does not always hold. For example in the question "Who is the director of Star Wars?" the noun "director" may refer to a property (e.g., dbo:director).

⁸ http://nlp.stanford.edu/software/CRF-NER.shtml

    none  V-B    C-B        none  none  E-B       E-I    R-B      .
    By    which  countries  was   the   European  Union  founded  ?

Figure 4. Question annotated using the CoNLL IOB format. E-B indicates that the entity is beginning and E-I that it is continuing (this way phrases with more words are labeled).

The general strategy using POS tags is to identify some reliable POS tag expressions to recognize entities, relations and classes. These expressions can then be easily identified using regular expressions over the POS tags.
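
The idea of matching regular expressions over POS tags can be sketched as follows. The tagged question is copied by hand from Figure 3 instead of being produced by a tagger, and both the helper `match_tag_pattern` and the two patterns are assumptions chosen for illustration; they are not the rules of any surveyed system.

```python
import re

def match_tag_pattern(tagged, pattern):
    """Return the phrases whose POS tag sequence matches the regular expression.

    The tags are joined into one string in which every tag is followed by a
    space, so a pattern such as r"\b(?:NNP )+" matches a run of NNP tokens.
    """
    tags = " ".join(tag for _, tag in tagged) + " "
    phrases = []
    for m in re.finditer(pattern, tags):
        first = tags[:m.start()].count(" ")   # index of the first matched token
        length = m.group().count(" ")         # number of matched tokens
        phrases.append(" ".join(word for word, _ in tagged[first:first + length]))
    return phrases

# POS tags of Figure 3, transcribed by hand.
TAGGED = [
    ("When", "WRB"), ("was", "VBD"), ("the", "DT"),
    ("European", "NNP"), ("Union", "NNP"), ("founded", "VBN"),
]

entities = match_tag_pattern(TAGGED, r"\b(?:NNPS? )+")   # runs of proper nouns
relations = match_tag_pattern(TAGGED, r"\bVBN ")          # past participles
print(entities)    # ['European Union']
print(relations)   # ['founded']
```

As the "director" example above shows, such patterns are only heuristics: a noun-based pattern would wrongly treat "director" as an instance or class, which is why systems either maintain large hand-written rule sets or learn the rules from data.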
Handmade rules

Some engines rely on handwritten regular expressions. This is the case for PowerAqua (Lopez et al. 2012) (Lopez et al. 2007), Treo (Freitas & Curry 2014) and DEANNA (Yahya et al. 2012). PowerAqua uses regular expressions, based on the GATE NLP tool (Cunningham et al. 2002), to identify the question type and to group the identified expressions into triples of phrases. To do this PowerAqua relies on an extensive list of question templates, i.e., depending on the type of the question and its structure, the phrases are mapped to triples.
Learning rules

Instead of writing handmade rules, it was proposed to use machine learning algorithms to detect them (Xu et al. 2014). The idea is to create a training set where questions are tagged with entities (E), relations (R), classes (C), variables (V) and "none" tags using the CoNLL IOB format (inside-outside-beginning) as in Figure 4. Once a training corpus is constructed it is used to build a phrase tagger. The features used by the phrase tagger built in Xser (Xu et al. 2014) are: POS tags, NER tags and the words of the question itself. This way one can construct a tagger that is able to identify entities, relations, classes and variables in a given question and learn automatically rules to do that starting from the training data. A similar approach was followed also by UTQA (Pouran-ebnveyseh 2016). The disadvantage is that a training corpus must be created. The advantage is that no handmade rules have to be found.
3.
IdentifyingdependenciesusingparsersPOStagginginformationdoesnotallowtoidentifytherelationbetweenthedifferentchunksinaquestion.
Thisisthereasonforusingparsers.
Aparserisbasedonaformalgrammar.
Aformalgrammarconsistsofsymbols(words)andproductionrulesthatareusedtocombinethesymbols.
Givenasentence,aparsercomputesthecombinationofproductionrulesthatgeneratethesentenceaccordingtotheunderlyinggrammar.
Exampleswillbegiveninthenextsection.
Notethattherearedifferenttypesofformalgrammars.
ThesystemsparticipatingtotheQALDchallengesuseparsersbasedondifferenttypesofgrammars.
Wepresentthembelowandshowtheiradvantages.
6.3.1. Parsers based on phrase structure grammars

Some parsers rely on phrase structure grammars. The idea of phrase structure grammars is to break down a sentence into its constituent parts. An example is given in Figure 5. These types of trees are used similarly to POS tags, i.e. one tries to find some graph patterns that map to instances, properties or classes with high confidence. Differently from POS tags, one can deduce the dependencies between entities, relations and classes. Parsers of such type are used in the QA systems Intui2 (Dima 2013), Intui3 (Dima 2014) and FREyA (Damljanovic et al. 2010).

    (SBARQ
      (WHPP (IN By)
        (WHNP (WDT which) (NNS countries)))
      (SQ (VBD was)
        (NP (DT the) (NNP European) (NNP Union))
        (VP (VBN founded)))
      (. ?))

Figure 5. Parsing tree of the question "By which countries was the European Union founded?" returned by the Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml). At the bottom of the tree are the words in the question and the corresponding POS tags. The tags above denote phrasal categories like noun phrase (NP).
6.3.2. Parsers based on dependency grammars

The idea behind dependency grammars is that the words in a sentence depend on each other, i.e. a word "A" depends on a word "B". "B" is called the head (or governor) and "A" is called the dependent. Moreover, parsers also generally indicate the type of relation between "A" and "B".

Stanford dependencies, Universal dependencies

Figure 6 shows the result of the Stanford dependency parser for the question "By which countries was the European Union founded?". For example, the tree indicates that "founded" is the head of "By", or that "By" is the dependent of "founded". The Stanford parser also returns the type of the dependency; in the example above "prep" indicates that "by" is a preposition of "founded". The advantages of using dependency trees can be seen by looking at the example above. In the original question, the words in the relational expression "founded by" are not adjacent, which makes them difficult to extract. This is not the case in the dependency tree, where "by" is connected to "founded". This is the main reason why dependency representations are used for relation extraction. Moreover, the parsing tree contains grammatical relations like nsubj (nominal subject), nsubjpass (nominal passive subject) and others which can be used to identify the relations between different phrases in the question.

A method to extract relations is to search the dependency tree for the biggest connected subtree that can be mapped to a property. In the example above this would be the subtree consisting of the nodes "founded" and "by". The arguments associated with a relation are then extracted by searching around the subtree. This is for example used by gAnswer (Zou et al. 2014). Another strategy is to first identify named entities in the dependency tree and choose the shortest path between them as a relation. In the example above these are the entities "country" and "European Union". This is used in the pattern library PATTY (Nakashole et al. 2012).
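
The shortest-path strategy can be sketched on the dependency tree of the running example. The edges below are transcribed by hand from Figure 6 rather than produced by a parser, words are used as node identifiers (which only works because no word occurs twice), and the function `shortest_path` is a hypothetical helper, not code from PATTY.

```python
from collections import deque

# Dependency edges (head, relation, dependent), transcribed from Figure 6.
EDGES = [
    ("founded", "prep", "By"),
    ("founded", "nsubjpass", "Union"),
    ("founded", "auxpass", "was"),
    ("By", "pobj", "countries"),
    ("countries", "det", "which"),
    ("Union", "det", "the"),
    ("Union", "nn", "European"),
]

def shortest_path(edges, source, target):
    """Breadth-first search over the dependency tree, ignoring edge direction."""
    neighbours = {}
    for head, _, dependent in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)
    queue, previous = deque([source]), {source: None}
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

path = shortest_path(EDGES, "countries", "Union")
print(path)         # ['countries', 'By', 'founded', 'Union']
print(path[1:-1])   # ['By', 'founded']
```

The inner nodes of the path are exactly the words of the relational expression "founded by", although they are not adjacent in the question itself.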
    founded
      prep: By
        pobj: countries
          det: Which
      nsubjpass: Union
        det: the
        nn: European
      auxpass: was

Figure 6. Result of the Stanford dependency parser for the question "By which countries was the European Union founded?".

    By [which]V [countries]C was the [European Union]E [founded]P

Figure 7. Parse tree for the question "By which countries was the European Union founded?" using the Stanford dependency parser

Phrase dependencies and DAGs

While the Stanford dependency parser considers dependencies between words, the system Xser (Xu et al. 2014) considers dependencies on a phrase level, namely between phrases referring to entities, relations, classes and variables as in section 6.2. The following dependency relation is considered: relations are the head of the two corresponding arguments and classes are the head of the one corresponding argument (arguments are either entities, variables or other relations and classes). As an example, look at the dependency graph of the question "By which countries was the European Union founded?" in Figure 7. The phrase "founded" is the head of the phrase "which" and the entity "European Union". The class "country" is the head of the variable "which". Note that this is not a tree, but a directed acyclic graph (DAG). To compute the DAG, Xser uses a shift-reduce parser that is trained on a manually annotated dataset. The parser uses as features the POS tags of the words, the type of the phrase and the phrase itself. The advantage is that the parser learns automatically what the relation between the phrases is. The disadvantage is that one needs a manually annotated corpus.
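
The phrase-level dependencies of the example can be written down as a small graph. The sketch below checks that the graph is acyclic and reads triple patterns off it. It is an interpretation made for illustration, not the procedure used by Xser: the data structures, the function names and the order of the two arguments of a relation are all assumptions.

```python
# Phrase labels as in section 6.2: V = variable, C = class, E = entity, R = relation.
PHRASES = {"which": "V", "countries": "C", "European Union": "E", "founded": "R"}

# Phrase-level dependencies (head, argument) as described in the text.
DEPENDENCIES = [
    ("founded", "which"),
    ("founded", "European Union"),
    ("countries", "which"),
]

def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is a DAG if all nodes can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    for _, arg in edges:
        indegree[arg] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for head, arg in edges:
            if head == node:
                indegree[arg] -= 1
                if indegree[arg] == 0:
                    ready.append(arg)
    return removed == len(nodes)

def to_triple_patterns(phrases, edges):
    """Relations yield (arg1, relation, arg2); classes yield (arg, 'rdf:type', class)."""
    def term(phrase):
        return "?" + phrase if phrases[phrase] == "V" else phrase
    patterns = []
    for head, label in phrases.items():
        args = [a for h, a in edges if h == head]
        if label == "R" and len(args) == 2:
            # Assumption: argument order is taken as listed, not resolved linguistically.
            patterns.append((term(args[0]), head, term(args[1])))
        elif label == "C" and len(args) == 1:
            patterns.append((term(args[0]), "rdf:type", head))
    return patterns

print(is_acyclic(PHRASES, DEPENDENCIES))
for pattern in to_triple_patterns(PHRASES, DEPENDENCIES):
    print(pattern)
```

The variable "which" has two heads ("founded" and "countries"), which is why the structure is a DAG and not a tree. The resulting patterns still contain phrases; mapping them to KB resources is the job of the phrase mapping and disambiguation tasks.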
Columns: NER; NE n-gram strategy; EL tools; POS handmade; POS learned; Parser structural grammar; Dependency parser; Phrase dependencies and DAG; Reference

| QA system | Marks | Reference |
|---|---|---|
| BELA | x | (Walter et al. 2012) |
| CASIA | x x x | (He et al. 2014) |
| DEANNA | x x x | (Yahya et al. 2012) |
| FREyA | x | (Damljanovic et al. 2010) |
| gAnswer | x | (Zou et al. 2014) |
| GFMed | | (Marginean 2017) |
| Hakimov et al. | x | (Hakimov et al. 2015) |
| Intui2 | x | (Dima 2013) |
| Intui3 | x x | (Dima 2014) |
| ISOFT | x x x | (Park et al. 2014) |
| POMELO | x | (Hamon et al. 2014) |
| PowerAqua | x | (Lopez et al. 2012) |
| QAKiS | x x | (Cabrio et al. n.d.) |
| QAnswer | x x | (Ruseti et al. 2015) |
| RTV | x | (Giannone et al. 2013) |
| SemGraphQA | x x | (Beaumont et al. 2015) |
| SemSeK | x x x | (Aggarwal & Buitelaar 2012) |
| SINA | x | (Shekarpour et al. 2015) |
| SWIP | x | (Pradel et al. 2012) |
| TBSL | x | (Unger et al. 2012) |
| TR Discover | | (Song et al. 2015) |
| Treo | x x | (Freitas & Curry 2014) |
| UTQA | x | (Pouran-ebnveyseh 2016) |
| Xser | x x | (Xu et al. 2014) |
| Zhang et al. | x | (Zhang et al. 2016) |
| Zhu et al. | x | (Zhu et al. 2015) |

Table 3. Each line of this table corresponds to a QA system. The columns indicate which of the techniques presented in the question analysis task (left table) and in the phrase mapping task (right table) is used. The columns refer to the subsections of section 6 and 7. We put a question mark if it is not clear from the publication which technique was used.
6.4. Summary and research challenges for question analysis

Table 3 shows which strategy is used by the different engines for question analysis. We put a question mark if it is not clear from the description of the publication which technique is used. The research challenges that have been identified are the following: the identification of question types, multilinguality, and the identification of aggregation, comparison and negation operators. Section 13 gives a transversal view on the research challenges through the surveys where they are discussed.
7. Phrase mapping

This step starts with a phrase (one or more words) s and tries to find, in the underlying KB, a set of resources which correspond to s with high probability. Note that s could correspond to an instance, a property, or a class. As an example, the phrase EU could correspond to the DBpedia resource dbr:European_Union but also to dbr:University_of_Edinburgh or dbr:Execution_unit (a part of a CPU). In the following, we provide an overview of the techniques used by QA systems to deal with these problems.
7.1. Knowledge base labels

RDF Schema introduces the property rdfs:label⁹ to provide a human-readable version of a resource's name. Moreover, multilingual labels are supported using language tagging.
For example, the following triples appear in DBpedia:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbr: <http://dbpedia.org/resource/>
dbr:European_Union rdfs:label "European Union"@en .
dbr:European_Union rdfs:label "Europäische Union"@de .
dbr:European_Union rdfs:label "Union européenne"@fr .

saying that the human-readable version of dbr:European_Union in French is Union européenne.
By using the RDF Schema convention, the resources corresponding to a phrase s can be found by searching all resources r in the KB whose label label(r) is equal to or contains s. This strategy is used by all QA systems.
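The label lookup just described can be sketched in a few lines. This is only an illustration under stated assumptions: the table `LABELS` is toy data (not a DBpedia dump), the function name `find_resources` is hypothetical, and matching is a case-insensitive substring test rather than an indexed search.

```python
from typing import Dict, List

# Toy label table: resource -> labels (illustrative data, not a DBpedia dump).
LABELS: Dict[str, List[str]] = {
    "dbr:European_Union": ["European Union", "Europäische Union", "Union européenne"],
    "dbr:Council_of_the_European_Union": ["Council of the European Union"],
    "dbr:Angela_Merkel": ["Angela Merkel"],
}


def find_resources(phrase: str, labels: Dict[str, List[str]] = LABELS) -> List[str]:
    """Return all resources r for which some label(r) equals or contains the phrase."""
    needle = phrase.casefold()
    return sorted(
        resource
        for resource, resource_labels in labels.items()
        if any(needle in label.casefold() for label in resource_labels)
    )
```

A linear scan like this does not scale to a full KB, which is why an index over the labels is needed in practice.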
To answer a question in reasonable time, some type of index is needed to search through all the labels. There are two common choices. The first is to use a triple-store built-in index like in Virtuoso¹⁰. This is for example used by SINA (Shekarpour et al. 2015). The second is to use Lucene. Engines that use Lucene are PowerAqua (Lopez et al. 2012) and SemSeK (Aggarwal & Buitelaar 2012). In both cases one can search very fast through the labels. The advantage of a built-in triple-store index is that it is kept synchronized with the corresponding KB.
Several problems can arise that can be grouped in two categories. The first category contains problems that arise because the phrase s, as a string, is similar to label(r) but not a strict subset. For example, s could be misspelled, the order of the words in s and label(r) could be different, or there could be singular/plural variations (like city/cities). The second category contains problems that arise because s and label(r) are completely different as strings but similar from a semantic point of view. This can happen for example when the question contains abbreviations like EU for dbr:European_Union, nicknames like Mutti for dbr:Angela_Merkel or relational phrases like is married to corresponding to the DBpedia property dbo:spouse. All these problems arise because the vocabulary used in the question is different from the vocabulary used in the KB. The term "lexical gap" is used to refer to these problems.
7.2. Dealing with string similarity

One possibility to deal with misspelling is to measure the distance between the phrase s and the labels of the different resources of the KB. There are a lot of different distance metrics that can be used. The most common ones are the Levenshtein distance and the Jaccard distance. Lucene, for example, offers fuzzy searches which return words similar to the given one based on the Levenshtein distance or another edit distance. Moreover, Lucene also offers stemming, which allows one to map "cities" to "city" since the stem is in both cases "citi".
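The two distance metrics named above can be illustrated with a short, self-contained sketch. It is not how Lucene implements fuzzy search; the Levenshtein distance is computed with the textbook dynamic program, and the Jaccard distance is taken here over lowercased word sets, which is one common choice and an assumption of this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the lowercased word sets of the two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

The Levenshtein distance is suited to misspellings ("Europen Union" is one edit away from "European Union"), while the word-set Jaccard distance ignores word order ("Union European" and "European Union" have distance 0).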
⁹ @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
¹⁰ https://github.com/openlink/virtuoso-opensource

7.3. Dealing with semantic similarity

In this section we consider the phrase mapping problem when s and label(r) have only a semantic relation.
7.3.1. Databases with lexicalizations

One frequent approach is to use a lexical database. Examples of such databases are WordNet and Wiktionary. The strategy is to first expand s by synonyms s1, ..., sn found in the lexical database. Then s1, ..., sn are used to retrieve the resources instead of s. For example, WordNet returns for EU the synonyms European Union, European Community, EC, European Economic Community, EEC, Common Market, Europe, which are then used to retrieve the corresponding resources in the KB. More precisely, WordNet groups words in sets of roughly synonymous words called synsets. For example, the word EU is contained in two synsets. The first was described above and refers to "an international organization of European countries formed after World War II to reduce trade barriers and increase cooperation among its members". The second synset contains europium, Eu, atomic number 63 and refers to "a bivalent and trivalent metallic element of the rare earth group". WordNet is used for example in TBSL (Unger et al. 2012), PowerAqua (Lopez et al. 2012) and SemSeK (Aggarwal & Buitelaar 2012). The main advantage is that more mappings can be resolved, improving recall. The disadvantage is that the expansion of s increases the number of possible corresponding resources. This can affect precision and has the side effect that the disambiguation process becomes computationally heavier. Moreover, lexical databases are often domain-independent and cannot help with domain-specific mappings.
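The expansion strategy can be sketched as follows. The dictionary `SYNSETS` is a hand-written stand-in for a lexical database, filled with the two synsets quoted above; `LABEL_INDEX` and the function names are likewise hypothetical, and no real WordNet API is called.

```python
from typing import Dict, List

# Toy stand-in for a lexical database: word -> list of synsets (assumed data).
SYNSETS: Dict[str, List[List[str]]] = {
    "EU": [
        ["European Union", "European Community", "EC",
         "European Economic Community", "EEC", "Common Market", "Europe"],
        ["europium", "Eu", "atomic number 63"],
    ],
}

# Toy label index: lowercased label -> resource (assumed data).
LABEL_INDEX: Dict[str, str] = {
    "european union": "dbr:European_Union",
    "europe": "dbr:Europe",
    "europium": "dbr:Europium",
}


def expand(phrase: str) -> List[str]:
    """Return the phrase followed by all synonyms from every synset containing it."""
    return [phrase] + [syn for synset in SYNSETS.get(phrase, []) for syn in synset]


def candidates(phrase: str) -> List[str]:
    """Look up every expanded variant in the label index, keeping first-seen order."""
    found: List[str] = []
    for variant in expand(phrase):
        resource = LABEL_INDEX.get(variant.lower())
        if resource is not None and resource not in found:
            found.append(resource)
    return found
```

In this toy setting the bare phrase EU matches no label, while the expansion retrieves three candidate resources, two of which are wrong for a question about the European Union. This is the recall gain and the precision cost described above.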
7.3.2. Redirects

Another way to collect new labels of a resource is to follow, if present, the owl:sameAs links. The labels in the connected KB can then be used in the same way as in the original KB. Moreover, the anchor texts of links that point to a KB resource can also be used. The labels generated from the anchor texts of the Wikipedia links are contained in DBpedia lexicalizations¹¹.
7.3.3. A database with relational lexicalizations (PATTY)

PATTY (Nakashole et al. 2012) is a database with relational lexicalizations. The main objective is to collect natural language expressions and group them if they refer to the same relation. These groups are called "pattern synsets" in analogy to WordNet. Moreover, the pattern synsets are organized in a taxonomy like synsets in WordNet. An example of such a pattern synset is (is album, [[num]] album by, was album, [[det]] album with, [[adj]] album by) where [[num]] corresponds to a cardinal number, [[det]] corresponds to a determiner and [[adj]] corresponds to an adjective (for this reason they are called "patterns"). Examples where different phrases express the same relation are: "Beatles's album Revolver", "Revolver is a 1966 album by the Beatles", "Revolver is an album with the Beatles", and "Revolver is the seventh album by the Beatles". PATTY can be used as a standard database with lexicalizations as in 7.3.1 and shares its advantages and disadvantages. PATTY is for example used by Xser (Xu et al. 2014).
¹¹ http://wiki.dbpedia.org/lexicalizations

7.3.4. Finding mappings using extracted knowledge

There are different tools that extract binary relations expressed in natural language from text corpora. An example of such a binary relation is ("Angela Merkel", "is married to", "Joachim Sauer"). Note that it is a relation expressed in natural language, so the subject, object and predicate are not mapped to any KB. Tools that can extract this type of triples are for example ReVerb (Fader et al. 2011), TEXTRUNNER (Yates et al. 2007), WOE (Wu & Weld 2010) and PATTY (Nakashole et al. 2012). An approach that uses this data to find natural language representations of properties in a KB was described in (Berant et al. 2013). It assumes that the subject and object of the binary relation are mapped to instances in an underlying KB. First, the relational phrase is normalized to an expression rel. In a second step, the classes of the subject and object are added, obtaining rel[class1, class2], i.e. in the example above married[Person, Person]. Then all entity pairs that are connected in the text by the relation rel and that match the classes are computed. Denote this set as Sup(rel[class1, class2]). Moreover, for all properties in the KB the sets of connected entity pairs are computed. For a property p we denote this set as Sup(p). The relation rel[class1, class2] is then aligned to the property p if the domain and range of both agree and $\mathrm{Sup}(rel[class_1, class_2]) \cap \mathrm{Sup}(p) \neq \emptyset$. This technique was used by CASIA (He et al. 2014). The main disadvantage here is that the data can be very noisy. A similar approach is described in gAnswer (Zou et al. 2014).
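The alignment criterion above (matching domain and range, and a non-empty intersection of the two support sets) can be sketched as follows. The data in `TEXT_SUPPORT` and `KB_PROPERTIES` is invented for illustration, the function name `align` is hypothetical, and class matching is a plain equality test, so class hierarchies are ignored.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
TypedRelation = Tuple[str, str, str]  # (rel, class1, class2)

# Sup(rel[class1, class2]): entity pairs connected by rel in the text (toy data).
TEXT_SUPPORT: Dict[TypedRelation, Set[Pair]] = {
    ("married", "Person", "Person"): {
        ("dbr:Angela_Merkel", "dbr:Joachim_Sauer"),
        ("dbr:Barack_Obama", "dbr:Michelle_Obama"),
    },
}

# For each KB property: domain, range and Sup(p) (toy data).
KB_PROPERTIES: Dict[str, dict] = {
    "dbo:spouse": {"domain": "Person", "range": "Person",
                   "pairs": {("dbr:Barack_Obama", "dbr:Michelle_Obama")}},
    "dbo:birthPlace": {"domain": "Person", "range": "Place",
                       "pairs": {("dbr:Angela_Merkel", "dbr:Hamburg")}},
}


def align(text_support: Dict[TypedRelation, Set[Pair]],
          kb_properties: Dict[str, dict]) -> Dict[TypedRelation, List[str]]:
    """Align rel[class1, class2] to p if the signatures agree and the supports intersect."""
    alignments: Dict[TypedRelation, List[str]] = {}
    for (rel, class1, class2), sup_rel in text_support.items():
        for prop, info in kb_properties.items():
            same_signature = (info["domain"], info["range"]) == (class1, class2)
            if same_signature and sup_rel & info["pairs"]:
                alignments.setdefault((rel, class1, class2), []).append(prop)
    return alignments
```

With noisy extractions a single shared pair is weak evidence, so a practical system would probably rank alignments by the size of the intersection rather than accept any non-empty one.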
7.3.5. Finding mappings using large texts

There are two methods that use large texts to find natural language representations for phrases. The first is the core of the BOA framework (Gerber & Ngomo 2011) and M-Atoll (Walter et al. 2014). Both take as input a property p contained in a KB and try to extract natural language expressions for p from a large text corpus. To do that, both extract from the KB the subject-object pairs (x, y) that are connected by the property p. Then the text corpus is scanned and all sentences that contain both label(x) and label(y) are retrieved. At the end, the segments of text between label(x) and label(y), or label(y) and label(x), are extracted. The idea is that these text segments are a natural language representation of the property p. These text segments are stored together with the range and domain of p and ranked such that the top-ranked patterns are more likely natural language representations of p. Using this data, a relational phrase r can be mapped to a property p by searching for a similar text fragment in the BOA framework. This technique is for example used in TBSL (Unger et al. 2012). Moreover, QAKiS (Cabrio et al. n.d.) uses the pattern library WikiFrameworks (Mahendra et al. 2011) that was constructed in a very similar way. QAnswer (Ruseti et al. 2015) also uses this approach, but the relational phrase is extracted using dependency trees. The big advantage of this approach is that the extracted relational phrases are directly mapped to the corresponding properties. The relational expressions found are tightly connected to the KB, unlike in PATTY where the relations are more KB-independent.
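The extraction step shared by BOA and M-Atoll can be sketched as below. This is a simplified illustration and not the implementation of either framework: `PAIRS` and `SENTENCES` are invented, `extract_patterns` is a hypothetical name, and patterns are ranked by raw frequency only.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Labels of subject-object pairs (x, y) connected by one property p, e.g. dbo:spouse (toy data).
PAIRS: List[Tuple[str, str]] = [
    ("Angela Merkel", "Joachim Sauer"),
    ("Barack Obama", "Michelle Obama"),
]

SENTENCES: List[str] = [
    "Angela Merkel is married to Joachim Sauer.",
    "Barack Obama is married to Michelle Obama.",
    "Michelle Obama is the wife of Barack Obama.",
    "Angela Merkel met Barack Obama in Berlin.",
]


def extract_patterns(pairs: Iterable[Tuple[str, str]], sentences: Iterable[str]) -> Counter:
    """Count the text segments found between label(x) and label(y), in either order."""
    sentences = list(sentences)
    patterns: Counter = Counter()
    for label_x, label_y in pairs:
        for sentence in sentences:
            for first, second, template in ((label_x, label_y, "?x {} ?y"),
                                            (label_y, label_x, "?y {} ?x")):
                match = re.search(re.escape(first) + r"\s+(.+?)\s+" + re.escape(second), sentence)
                if match:
                    patterns[template.format(match.group(1))] += 1
    return patterns
```

Calling `extract_patterns(PAIRS, SENTENCES).most_common()` ranks "?x is married to ?y" first. The fourth sentence contributes nothing because it does not contain both labels of any pair.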
Another approach that uses large texts is based on distributional semantics. The idea behind distributional semantics is that if two words are similar then they appear in the same context. An n-dimensional vector v_i is associated with each word w_i. The vectors are created such that words that appear in the same context have similar vectors. Among the QA systems participating in QALD, two tools based on distributional semantics are used: word2vec¹² and Explicit Semantic Analysis (ESA)¹³. In both cases the vectors that are associated with the words have the property that the cosine similarity of a given pair of words is large if the words are similar. In the case of word2vec, the experimental results showed that the closest vectors to the vector of France, vec(France), are the vectors vec(Spain), vec(Belgium), vec(Netherlands), vec(Italy). Moreover, the vector vec(queen) is very near to the vector vec(king) - vec(man) + vec(woman). In this sense, the semantics of the words are reflected in their associated vectors.

¹² https://code.google.com/p/word2vec/
¹³ http://code.google.com/p/dkpro-similarity-asl/

The general strategy in the phrase mapping task is the following. Let us assume that there is a phrase s and a set of possible candidates {x1, ..., xn} which can be instances, relations or classes. Then the vector representation v0 of s and v1, ..., vn of the lexicalizations of x1, ..., xn are retrieved. Since the similarity of the words is reflected in the similarity of the vectors, the best candidates from {x1, ..., xn} are the ones whose vectors are most similar to v0. For example, if the right semantics are captured then the vector vec(spouse) should be similar to the vector vec(married). The main advantage is that this technique helps to close the lexical gap. However, the disadvantages are that it can introduce noise and that it is generally a quite expensive operation. For this reason the possible candidate set is generally not the entire set of instances, relations and classes, but only a subset of them. CASIA (He et al. 2014), for example, uses this technique only for classes and uses word2vec as the tool. Treo uses a strategy such that it can assume that a phrase s corresponds to a property or class of a particular instance. In this case the candidate set contains only 10-100 elements. Here ESA is used as the tool.
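The general strategy (retrieve v0 for the phrase, retrieve v1, ..., vn for the candidates, keep the candidates whose vectors are most similar to v0) can be sketched as follows. The three-dimensional vectors in `TOY_VECTORS` are invented for illustration and do not come from word2vec or ESA; `rank_candidates` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence

# Hand-made toy vectors (assumed values, not trained embeddings).
TOY_VECTORS: Dict[str, List[float]] = {
    "married": [0.9, 0.1, 0.0],
    "spouse": [0.8, 0.2, 0.1],
    "birthPlace": [0.1, 0.9, 0.2],
    "author": [0.2, 0.1, 0.9],
}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; close to 1 for similar directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(phrase: str, candidates: List[str],
                    vectors: Dict[str, List[float]] = TOY_VECTORS) -> List[str]:
    """Sort the candidates by decreasing cosine similarity to the vector of the phrase."""
    v0 = vectors[phrase]
    return sorted(candidates, key=lambda c: cosine_similarity(v0, vectors[c]), reverse=True)
```

The cost is one similarity computation per candidate, which is why restricting the candidate set, as CASIA and Treo do, matters in practice.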
7.4. Wikipedia specific approaches

Some engines use other tools for the phrase mapping task, namely DBpedia Lookup¹⁴ and the Wikipedia Miner tool¹⁵. The first is for example used by gAnswer (Zou et al. 2014), the second by Xser (Xu et al. 2014) and Zhu et al. (Zhu et al. 2015).
7.5. Summary and research challenges for the phrase mapping task

Table 4 gives an overview showing which technique is used by which engine for phrase mapping. The important point in this step is that one has to find a balance between selecting as few candidate resources as possible, to improve precision and time performance, and selecting enough candidates so that the relevant one is also selected, to improve recall. The research challenges identified in the phrase mapping step are: filling the lexical/vocabulary gap and multilinguality. The latter applies if the vocabulary in the user query and the KB vocabulary are expressed (lexicalized) in different languages. See section 13 for a transversal view on the research challenges.
¹⁴ https://github.com/dbpedia/lookup
¹⁵ http://wikipedia-miner.cms.waikato.ac.nz

Columns: Knowledge base labels | String similarity | Lucene index or similar | WordNet/Wiktionary | Redirects | PATTY | Using extracted knowledge | BOA or similar | Distributional Semantics | Wikipedia specific approaches | Reference

QA system | Techniques marked | Reference
BELA | xxxxxx | (Walter et al. 2012)
CASIA | xxxx | (He et al. 2014)
DEANNA | x | (Yahya et al. 2012)
FREyA | xxx | (Damljanovic et al. 2010)
gAnswer | xxxx | (Zou et al. 2014)
GFMed | xx | (Marginean 2017)
Hakimov et al. | xx | (Hakimov et al. 2015)
Intui2 | x | (Dima 2013)
Intui3 | xxxx | (Dima 2014)
ISOFT | xxxx | (Park et al. 2014)
POMELO | x | (Hamon et al. 2014)
PowerAqua | xxxx | (Lopez et al. 2012)
QAKiS | xx | (Cabrio et al. n.d.)
QAnswer | xxxxxx | (Ruseti et al. 2015)
RTV | xxx | (Giannone et al. 2013)
SemGraphQA | xxxx | (Beaumont et al. 2015)
SemSeK | xxxxx | (Aggarwal & Buitelaar 2012)
SINA | x | (Shekarpour et al. 2015)
SWIP | xx | (Pradel et al. 2012)
TBSL | xxxxx | (Unger et al. 2012)
TR Discover | x | (Song et al. 2015)
Treo | xxx | (Freitas & Curry 2014)
UTQA | xxxx | (Pouran-ebn veyseh 2016)
Xser | xxx | (Xu et al. 2014)
Zhang et al. | xx | (Zhange et al. 2016)
Zhu et al. | xxx | (Zhu et al. 2015)

Table 4. Each line of this table corresponds to a QA system. The columns indicate which of the techniques presented in the phrase mapping task is used. The columns refer to the subsections of section 7.

8. Disambiguation

Two ambiguity problems can arise. The first is that, from the question analysis step, the segmentation and the dependencies between the segments are ambiguous. For example, in the question "Give me all european countries." the segmentation may or may not group the expression "european countries", leading to two possibilities. The second is that the phrase mapping step returns multiple possible resources for one phrase. In the example above, "european" could map to different meanings of the word "Europe". This section explains how question answering systems deal with these ambiguities.
8.1. Local Disambiguation

Due to ambiguities, QA systems generate many possible interpretations. To rank them, mainly two features are used. The first is the (string or semantic) similarity of the label of the resource and the corresponding phrase. The second is a type consistency check between the properties and their arguments. The first feature is used to rank the possible interpretations, the second to exclude some. These features are "local" in the sense that only the consistency between two directly related resources is checked. The advantage is that "local" disambiguation is very easy and fast. Moreover, it is often very powerful. A main disadvantage is that actual KBs often do not contain domain and range information for a property, so that the type consistency check cannot be done. Consider for example the question "Who is the director of The Lord of the Rings". In this case "The Lord of the Rings" is clearly referring to the film and not to the book. If the property corresponding to director has no domain/range information, then this strategy does not allow one to decide whether "The Lord of the Rings" is a book or a film. The interpretation as a book can only be excluded later by querying the KB. This form of disambiguation is used by all systems.

[Figure 8: graph with the vertices v1, v2 and EU and the edge labels wife and president.]
Figure 8. A graph G which is semantically equivalent to the question "Who is the wife of the president of the EU". The ambiguity of "EU" in the phrase carries over to an ambiguity of the corresponding vertex of G.
8.2. Graph search

The QA systems gAnswer (Zou et al. 2014), PowerAqua (Lopez et al. 2012), SemSeK (Aggarwal & Buitelaar 2012) and Treo (Freitas & Curry 2014) use the graph structure of the KB to resolve the ambiguity, although they follow two different strategies. We explain them using the following question as an example: "Who is the wife of the president of the EU".

The QA system gAnswer (Zou et al. 2014) assumes that in the question analysis step the intention of a question q can be translated into a graph G. See Figure 8 for the question above. This graph contains a node for each variable, entity and class in the question, and an edge for each relation. Moreover, it is semantically equivalent to q. The ambiguity that arises in the phrase mapping step carries over to an ambiguity of vertices and edges of G. The idea of gAnswer is to disambiguate the vertices and edges of G by searching the KB for a subgraph isomorphic to G such that the corresponding vertices correspond to the segments of q with high probability. This is achieved by assigning a score to each possible match, which is proportional to the distance between the labels of the resources and the segments of the question. The top-k matches are retrieved. In a similar fashion, PowerAqua explores the candidate properties; however, it uses an iterative approach to balance precision and recall, selecting first the most likely mappings and interpretations based on the question analysis, and re-iterating until an answer is found or the whole solution space has been analysed. It assigns a score to each of the queries based on the score of the selected matches and on whether the matches are directly related or not (semantic distance).

The QA systems SemSeK and Treo solve the ambiguity from another perspective. In this case, only the instances identified in the question analysis step are expanded in the phrase mapping step, leading to ambiguity. The relational phrases are not expanded. For the concrete example, "EU" would be identified and some candidate instances would be generated. The relational phrases "president of" and "wife of" are not expanded. Instead, a graph search is started from the instances and all the properties attached to them are compared to the relational phrases detected in the question. In this case, all properties attached to the different interpretations of "EU" are compared. If no relation fits then the expanded instance is excluded. Note that the second approach has higher recall, since all attached properties are explored, whereas the first probably has higher precision.

The approaches used in SemSeK, Treo and gAnswer assume that one can deduce all relational phrases from the question. However, they can also be implicit. PowerAqua is able to find a subgraph to translate the user query even if not all entities in the query are matched. For all of them, the performance will decrease if there is too much ambiguity, i.e. if there are too many candidate matches to analyse.

[Figure 9: hidden states X_{t-1}, X_t, X_{t+1} and observed states Y_{t-1}, Y_t, Y_{t+1}.]
Figure 9. Conditional dependencies in a Hidden Markov Model
8.3. Hidden Markov Model (HMM)

Hidden Markov Models (HMM) are used by SINA (Shekarpour et al. 2015) and RTV (Giannone et al. 2013) to address the ambiguity that arises in the phrase mapping phase. The idea behind this strategy is sketched using an example. Assume the question is: "By which countries was the EU founded". In a Hidden Markov Model (HMM) one has two stochastic processes $(X_t)_{t \in \mathbb{N}}$ and $(Y_t)_{t \in \mathbb{N}}$ where only the last one is observed. The possible values of the random variables $X_t$ are called hidden states whereas the possible values of the random variables $Y_t$ are called observed states. For the example above, the set of observed states is {"countries", "EU", "founded"}, i.e. the set of segments of the question that have an associated resource. The set of hidden states is {dbo:Country, dbr:Euro, dbr:European_Union, dbr:Europium, dbp:founded, dbp:establishedEvent}, i.e. the set of possible resources associated with the segments. In a Hidden Markov Model the following dependencies between the random variables are assumed:

– $P(X_t = x_t \mid X_0 = x_0, \ldots, X_{t-1} = x_{t-1}, Y_0 = y_0, \ldots, Y_t = y_t) = P(X_t = x_t \mid X_{t-1} = x_{t-1})$, i.e. the value of a variable $X_t$ only depends on the value of $X_{t-1}$, which means that $(X_t)_{t \in \mathbb{N}}$ is a Markov chain;
– $P(Y_t = y_t \mid X_0 = x_0, \ldots, X_t = x_t, Y_0 = y_0, \ldots, Y_{t-1} = y_{t-1}) = P(Y_t = y_t \mid X_t = x_t)$, i.e. the value of $Y_t$ depends only on $X_t$.

If one indicates the conditional dependencies with an arrow, one gets the diagram in figure 9. In the context of disambiguation this means that the appearance of a resource at time $t$ depends only on the appearance of another resource at time $t-1$ and that the segments appear with some probability given a resource. The disambiguation process is reduced to finding the most likely sequence of hidden states (the resources) given the sequence of observed states (the segments). This is a standard problem that can be solved by the Viterbi algorithm. To complete the modeling one needs to specify three more parameters:

– the initial probability, i.e. $P(X_0 = x)$ for $x \in X$;
– the transition probability, i.e. $P(X_t = x_1 \mid X_{t-1} = x_2)$ for $x_1, x_2 \in X$;
– the emission probability, i.e. $P(Y_t = y \mid X_t = x)$ for $x \in X$ and $y \in Y$.

These can be bootstrapped differently. In SINA the emission probability is set according to a string similarity measure between the label of the resource and the segment. In RTV the emission probabilities are estimated using word embeddings. In SINA the initial probabilities and the transition probabilities are estimated based on the distance of the resources in the KB and their popularity. Retrieving the distance between all resources is computationally expensive, making this approach slow. In RTV the initial and transition probabilities are set to be uniform for all resources, making the estimation fast but less accurate.

An advantage of this technique is that it is not necessary to know the dependency between the different resources but only a set of possible resources.
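The decoding step can be made concrete with a small Viterbi sketch over the example states. All probabilities below are invented for illustration; they are not the values SINA or RTV would estimate. The function returns the most likely sequence of hidden states (resources) for the observed segments, together with its joint probability.

```python
from typing import Dict, List, Tuple

OBSERVATIONS = ["countries", "EU", "founded"]
STATES = ["dbo:Country", "dbr:Euro", "dbr:European_Union",
          "dbr:Europium", "dbp:founded", "dbp:establishedEvent"]

# Toy parameters (assumed values).
INITIAL: Dict[str, float] = {state: 1 / len(STATES) for state in STATES}
EMISSION: Dict[str, Dict[str, float]] = {
    "dbo:Country": {"countries": 1.0},
    "dbr:Euro": {"EU": 1.0},
    "dbr:European_Union": {"EU": 1.0},
    "dbr:Europium": {"EU": 1.0},
    "dbp:founded": {"founded": 1.0},
    "dbp:establishedEvent": {"founded": 1.0},
}
TRANSITION: Dict[str, Dict[str, float]] = {
    "dbo:Country": {"dbr:European_Union": 0.6, "dbr:Euro": 0.3, "dbr:Europium": 0.1},
    "dbr:European_Union": {"dbp:founded": 0.7, "dbp:establishedEvent": 0.3},
    "dbr:Euro": {"dbp:founded": 0.2, "dbp:establishedEvent": 0.8},
    "dbr:Europium": {"dbp:founded": 0.5, "dbp:establishedEvent": 0.5},
    "dbp:founded": {"dbp:founded": 1.0},
    "dbp:establishedEvent": {"dbp:establishedEvent": 1.0},
}


def viterbi(observations: List[str], states: List[str],
            initial: Dict[str, float],
            transition: Dict[str, Dict[str, float]],
            emission: Dict[str, Dict[str, float]]) -> Tuple[float, List[str]]:
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (initial.get(s, 0.0) * emission[s].get(observations[0], 0.0), [s])
            for s in states}
    for observation in observations[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                ((best[prev][0] * transition[prev].get(s, 0.0), best[prev][1])
                 for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emission[s].get(observation, 0.0), path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])
```

With these toy numbers, `viterbi(OBSERVATIONS, STATES, INITIAL, TRANSITION, EMISSION)` selects dbr:European_Union for "EU", because the transitions from dbo:Country and to dbp:founded favour it over dbr:Euro and dbr:Europium.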
8.4. Integer Linear Program (ILP)

In the QA system DEANNA (Yahya et al. 2012) it was proposed to set up an Integer Linear Program (ILP), which is an optimisation tool, to solve the disambiguation task. This technique addresses the ambiguity of the phrase mapping phase and some ambiguity that arises during the segmentation. The ILP uses decision variables to model the problem of finding the best answer as an optimisation problem. In particular, it uses boolean variables to indicate whether a segment of the question is chosen or not, whether a resource corresponding to a segment is chosen, and whether a segment corresponds to a property or an instance. The constraints include conditions such that the chosen segments do not overlap, that if a segment is chosen then one corresponding resource must be chosen, and so on. The objective function includes three terms. The first increases if the label of a resource is similar to the corresponding segment. The second increases if two selected resources often occur in the same context. The third tries to maximize the number of selected segments. Note that the second term makes the disambiguation process work. Thus, after solving the ILP, the optimal values obtained point to the optimal answer, which is returned as the answer to the question. The main disadvantage is that some dependencies between the segments have to be computed in the question analysis phase.
8.5. Markov Logic Network

The question answering system CASIA (He et al. 2014) uses a Markov Logic Network (MLN) for the disambiguation task. The MLN is used to learn a model for choosing the right segmentation, for mapping phrases to resources, and for grouping resources into a graph. The idea is to define some constraints using first-order logic formulas. An MLN allows one to treat some of them as hard constraints that must be fulfilled and others as soft constraints, i.e. if they are not satisfied a penalty is applied. In this case both the ambiguities that arise in the question analysis and in the phrase mapping stage are resolved. Examples of hard constraints are: if a phrase of the question is chosen then one of the corresponding resources must be chosen, or the chosen phrases cannot overlap. Examples of soft constraints are: if a phrase has a particular POS tag then it is mapped to a relation, or the resource whose label is most similar to the corresponding phrase in the question must be chosen. The hard constraints of an MLN behave similarly to the constraints in an ILP, while the soft constraints allow more flexibility. The penalties for the soft constraints are learned in a training phase.

The advantage of an MLN is that it allows more flexibility than an ILP in choosing the constraints. However, a training phase is needed.
8.6. Structured perceptron

The engine Xser (Xu et al. 2014) uses a structured perceptron to solve the disambiguation task. The idea is to consider features during disambiguation such as: the similarity of a phrase and the corresponding resource, the popularity of a label for a resource, the compatibility of the range and domain of a property with the types of the arguments, and the number of phrases in the question that are in the same domain. In a training phase, for each of the features $f$ a weight $w$ is computed such that the expected configuration $z$ fulfills:

$z = \operatorname{argmax}_{y \in Y(x)} w \cdot f(x, y)$

where $x$ is the question and $Y(x)$ is the set of possible configurations of the resources and dependency relations. The configuration which maximizes the expression above is chosen. In this approach, the ambiguity that arises in the phrase mapping phase is resolved. However, a training phase is needed.
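The argmax above and the training phase can be illustrated with the standard structured perceptron update: whenever the predicted configuration differs from the gold one, the gold features are added to the weights and the predicted features are subtracted. This is a generic sketch, not the training procedure of Xser. The feature values in `TOY_FEATURES`, the candidate set and all function names are assumptions made for the example, and a configuration is reduced to a single resource for brevity.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Toy feature vectors f(x, y) for the phrase "EU" (assumed values).
TOY_FEATURES: Dict[Tuple[str, str], Features] = {
    ("EU", "dbr:Europium"): {"similarity": 0.6, "popularity": 0.1},
    ("EU", "dbr:Euro"): {"similarity": 0.5, "popularity": 0.5},
    ("EU", "dbr:European_Union"): {"similarity": 0.4, "popularity": 0.9},
}


def candidates(x: str) -> List[str]:
    """Y(x): the possible configurations for the question x."""
    return [y for (phrase, y) in TOY_FEATURES if phrase == x]


def features(x: str, y: str) -> Features:
    return TOY_FEATURES[(x, y)]


def score(w: Features, f: Features) -> float:
    """Dot product w · f(x, y) over sparse feature dictionaries."""
    return sum(w.get(name, 0.0) * value for name, value in f.items())


def predict(w: Features, x: str,
            candidates: Callable[[str], List[str]] = candidates,
            features: Callable[[str, str], Features] = features) -> str:
    """z = argmax over y in Y(x) of w · f(x, y)."""
    return max(candidates(x), key=lambda y: score(w, features(x, y)))


def train(examples: List[Tuple[str, str]], epochs: int = 5) -> Features:
    """Structured perceptron: reward gold features, penalise wrongly predicted ones."""
    w: Features = {}
    for _ in range(epochs):
        for x, gold in examples:
            guess = predict(w, x)
            if guess != gold:
                for name, value in features(x, gold).items():
                    w[name] = w.get(name, 0.0) + value
                for name, value in features(x, guess).items():
                    w[name] = w.get(name, 0.0) - value
    return w
```

Trained on the single example ("EU", "dbr:European_Union"), the weights move towards popularity and away from string similarity, after which `predict` returns dbr:European_Union.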
8.7. User feedback

There exist situations in which the QA engine cannot do the disambiguation automatically. This can happen because the disambiguation technique used by the engine does not suffice or because the question is really ambiguous. Therefore, some systems ask the user to resolve the ambiguity by choosing between some proposed resources. A system that relies heavily on user feedback for the disambiguation is FREyA (Damljanovic et al. 2012).
8.8. Summary and research challenges for disambiguation

Table 5 gives an overview of the different techniques used by the QA systems for disambiguation. Note that the systems GFMed, POMELO, TR Discover and Zhang et al. were evaluated on the QALD-4 task 2, and there the disambiguation problem does not really arise since the three interlinked KBs are too small. Beyond the disambiguation techniques, the research challenges in this domain are the lexical ambiguity of the matches and the use of vague prepositions or verbs (have/be) instead of explicit relationships (for example: movies of Pedro Almodovar) that can convey different interpretations (in the example: directed, produced, starring, etc.). These challenges are listed in section 13, where they are discussed through the previous surveys on KB-based QA systems.
Columns: Local disambiguation | Graph search | HMM | ILP | MLN | Structured perceptron | User feedback | Reference

QA system | Techniques marked | Reference
BELA | x | (Walter et al. 2012)
CASIA | xx | (He et al. 2014)
DEANNA | xx | (Yahya et al. 2012)
FREyA | xx | (Damljanovic et al. 2010)
gAnswer | xx | (Zou et al. 2014)
GFMed | x | (Marginean 2017)
Hakimov et al. | x | (Hakimov et al. 2015)
Intui2 | x | (Dima 2013)
Intui3 | x | (Dima 2014)
ISOFT | x | (Park et al. 2014)
POMELO | x | (Hamon et al. 2014)
PowerAqua | xx | (Lopez et al. 2012)
QAKiS | x | (Cabrio et al. n.d.)
QAnswer | x | (Ruseti et al. 2015)
RTV | xx | (Giannone et al. 2013)
SemGraphQA | x | (Beaumont et al. 2015)
SemSeK | xx | (Aggarwal & Buitelaar 2012)
SINA | xx | (Shekarpour et al. 2015)
SWIP | xx | (Pradel et al. 2012)
TBSL | x | (Unger et al. 2012)
Treo | xx | (Freitas & Curry 2014)
TR Discover | x | (Song et al. 2015)
UTQA | | (Pouran-ebn veyseh 2016)
Xser | xx | (Xu et al. 2014)
Zhang et al. | x | (Zhange et al. 2016)
Zhu et al. | x | (Zhu et al. 2015)

Table 5. Each line of this table corresponds to a QA system. The columns indicate which of the techniques presented in the disambiguation task are used. The columns refer to the subsections of section 8. We put a question mark if it is not clear from the publication which technique was used.

9. Query construction

In this section, we describe how each QA system constructs a SPARQL query. A problem arises during the query construction, the so-called "semantic gap". Assume for example that a user asks the question: "Which countries are in the European Union". One would probably assume that in the KB there are triples like:

dbr:Greece dbp:member dbr:European_Union .
dbr:France dbp:member dbr:European_Union .

But this is not the case; in DBpedia the requested information is encoded as:

dbr:Greece dct:subject dbc:Member_states_of_the_European_Union .
dbr:France dct:subject dbc:Member_states_of_the_European_Union .

So instead of a property "dbp:member", DBpedia uses the class "dbc:Member_states_of_the_European_Union" to encode the information. The "semantic gap" refers to the problem that the KB encodes information differently from what one could deduce from the question. This shows that in general it is impossible to deduce the form of the SPARQL query knowing only the question. Therefore, we classify the approaches for the query construction based on how the SPARQL query form is deduced. We distinguish between approaches where the SPARQL query form is based on templates, approaches where it is deduced from the question analysis phase, approaches where it is deduced using machine learning techniques, and approaches where it is deduced using only semantic information. The last subsection describes the approach of SemSeK and Treo, which do not generate a SPARQL query.
9.1. Query construction using templates

Some engines use templates to generate the SPARQL query, i.e. a set of predefined queries with some slots that have to be filled. The system QAKiS (Cabrio et al. n.d.) restricts itself to SELECT queries with only one triple. The system ISOFT (Park et al. 2014) uses a small set of templates to generate SPARQL queries: these include ASK queries over one triple, some simple SELECT queries with one or two triples, and templates that use a COUNT, ORDER BY or FILTER expression containing only one triple. Also, PowerAqua (Lopez et al. 2012) assumes that the input question can be reduced to one or two linguistic triples (not more than two predicates) following a set of templates; then each linguistic triple is semantically matched into one or more KB triples that can be combined into a SELECT query. After some graph-based disambiguation the SPARQL query is constructed. The disadvantage is that not all questions can be treated using templates.
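Template-based construction amounts to filling the slots of a predefined query string. The three templates below are hypothetical examples in the spirit of the one-triple SELECT, ASK and COUNT templates mentioned above; they are not the templates of QAKiS, ISOFT or PowerAqua. PREFIX declarations are omitted, so the generated strings assume an endpoint where dbr: and dbo: are predefined.

```python
from typing import Dict

# Hypothetical one-triple templates; doubled braces are literal braces for str.format.
TEMPLATES: Dict[str, str] = {
    "select_one_triple": "SELECT DISTINCT ?x WHERE {{ {subject} {predicate} ?x . }}",
    "ask_one_triple": "ASK WHERE {{ {subject} {predicate} {object} . }}",
    "count_one_triple": "SELECT (COUNT(DISTINCT ?x) AS ?count) WHERE {{ ?x {predicate} {object} . }}",
}


def fill_template(name: str, **slots: str) -> str:
    """Fill the slots of the named template; a missing slot raises KeyError."""
    return TEMPLATES[name].format(**slots)
```

For the disambiguated resources dbr:Barack_Obama and dbo:spouse, `fill_template("select_one_triple", subject="dbr:Barack_Obama", predicate="dbo:spouse")` yields a one-triple SELECT query. A question whose answer requires a query shape outside `TEMPLATES` cannot be handled, which is the disadvantage noted above.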
9.2. Query construction guided by information from the question analysis

Most of the systems start with the information obtained in the question analysis part and deduce from it the form of the SPARQL query.

FREyA and Intui3 (Dima 2014) start from the segmentation of the question. In the phrase mapping phase some segments have an associated resource. These resources are then combined into triples respecting the order of the segments in the question. If necessary, some additional variables between the identified segments are added, for example if one relation is followed by another relation.

DEANNA (Yahya et al. 2013) selects some triple phrase candidates in the question analysis phase using regular expressions over POS tags. These are mapped to resources in the phrase mapping phase. In the disambiguation phase the best phrase candidates and the best phrase mappings are chosen using an ILP. This returns a set of triples that is then used to construct a SELECT query.

The QA systems gAnswer, QAnswer, RTV and SemGraphQA start with a dependency tree. In gAnswer (Zou et al. 2014) first the relations and the associated arguments are deduced. A graph G is constructed which has an edge for each relation and a vertex for each argument. The graph G reflects the structure of the final SPARQL query. The relations and arguments are mapped to possible resources. The right resources are identified using the subgraph isomorphism strategy described in section 8.2. Then the SPARQL query is constructed. QAnswer (Ruseti et al. 2015) first scans the dependency tree to find subgraphs of tokens that correspond to some resources. This way many graphs with associated resources are created. Then a local disambiguation is performed. The top-ranked graph is chosen and the SPARQL query is constructed from this graph. RTV (Giannone et al. 2013) uses the dependency graph to construct an ordered list of alternating properties and non-properties. The corresponding resources are searched and disambiguated using an HMM. From this sequence the SPARQL query is generated.

Xser (Xu et al. 2014) uses three different machine learning algorithms. The first and the second are claimed to be KB independent. The first is used to determine the segments of the question corresponding to variables, properties, instances, and classes. The second is used to find the dependencies between the phrases as described in section 6.3.2. The third is used in the disambiguation phase, which is described in section 8.6, and is KB dependent. Since the first two algorithms are claimed to be KB independent, this approach also constructs the form of the SPARQL query by analyzing the question.
CoreTechniquesofQuestionAnsweringSystemsoverKnowledgeBases:aSurvey25[H]LexicalitemSyntacticcategorySemanticrepresentationBarackObamaNPdbr:Barack_Obamais(S\NP)/(S\NP)f.
x.
f(x)marriedto(S\NP)/NPy.
x.
dbo:spouse(x,y)MichelleObamaNPdbr:Michelle_ObamaAlltheseapproachessharethesamedisadvantage.
Allofthemmaketheimplicitas-sumptionthatitispossibletodeducethestructureoftheSPARQLqueryfromthestructureofthequestionwithoutknowinghowtheknowledgeisencodedintotheKB.
9.
3.
QueryconstructionusingSemanticParsingSemanticparsersareaparticulartypeofparsersthatcouplesyntacticrulestosemanticcomposition.
Afterparsingthequestiononegetsasemanticinterpretationofit.
FromamongtheQAsystemsevaluatedoverQALD,differentgrammarsforseman-ticparserswereused:GFgrammarsusedbyGFMed(Marginean2017),feature-basedcontext-freegrammar(FCFG)usedbyTRDiscover(Songetal.
2015),CombinatoryCategorialGrammar(CCG)usedbyHakimovetal.
(Hakimovetal.
2015)andlexicaltree-adjointgrammars(LTAG)usedbyTBSL(Ungeretal.
2012)andBELA(Walteretal.
2012).
We give a brief example using the CCG grammar. Consider the phrase "Barack Obama is married to Michelle Obama". To parse the sentence, a set of grammar rules is needed. The first column of the rule table indicates the phrases to which the rules are associated. The main syntactic categories are NP, standing for noun phrase, and S, standing for sentence, and combinations of them. The syntactic category (S\NP)/NP, for example, indicates that it can be combined with a noun phrase (NP) on the left and on the right to get a sentence S. Applying these rules, the sentence can be parsed from a syntactic point of view. Coupled to the syntactic rules is a semantic representation. Without going into details, this is expressed using lambda calculus. For example, the phrase married to is semantically a binary function which takes two arguments. Since it is the passive form of the property dbo:spouse, the arguments are inverted. The semantic representation of Michelle Obama is just a constant, which in DBpedia is dbr:Michelle_Obama. The application of the above syntactic rule between married to and Michelle Obama results in the semantic representation λx.dbo:spouse(x,dbr:Michelle_Obama), i.e. x is the spouse of Michelle Obama. This way the whole sentence can be parsed, leading to the semantic representation dbo:spouse(dbr:Barack_Obama,dbr:Michelle_Obama), i.e. Barack Obama's spouse is Michelle Obama. This way the sentence is completely understood from a semantic point of view as well. The semantic representation can then be translated to a SPARQL query.
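The composition described above can be illustrated with a minimal Python sketch, in which lambda terms are written as closures and logical forms as tuples. Several points are assumptions made only for illustration: the copula "is" is given identity semantics, a logical form dbo:spouse(x,y) is mapped to the triple pattern (x, dbo:spouse, y), the declarative sentence is rendered as an ASK query, and prefix declarations are omitted. None of this is taken from the implementation of the cited systems.

```python
# Semantic side of a tiny CCG lexicon. Categories are given in comments only;
# logical forms are plain tuples (predicate, subject, object).
barack = "dbr:Barack_Obama"        # category NP
michelle = "dbr:Michelle_Obama"    # category NP


def married_to(y):
    # category (S\NP)/NP, semantics: lambda y. lambda x. dbo:spouse(x, y)
    return lambda x: ("dbo:spouse", x, y)


def copula(f):
    # assumed identity semantics for "is": lambda f. f
    return f


def forward_apply(functor, argument):
    # forward application: X/Y followed by Y yields X
    return functor(argument)


def backward_apply(argument, functor):
    # backward application: Y followed by X\Y yields X
    return functor(argument)


# "married to" + "Michelle Obama" -> lambda x. dbo:spouse(x, dbr:Michelle_Obama)
verb_phrase = forward_apply(married_to, michelle)
# "is" + verb phrase (identity, by assumption)
verb_phrase = forward_apply(copula, verb_phrase)
# "Barack Obama" + verb phrase -> full logical form
logical_form = backward_apply(barack, verb_phrase)


def to_sparql_ask(lf):
    # assumed mapping: predicate(subject, object) -> "subject predicate object"
    predicate, subject, obj = lf
    return f"ASK WHERE {{ {subject} {predicate} {obj} . }}"


print(logical_form)
print(to_sparql_ask(logical_form))
```

The sketch only covers the semantic side; a real semantic parser also checks that the syntactic categories of adjacent phrases are compatible before applying a rule.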
The main advantage of this approach is that one can directly get a semantic representation of a question. This also includes the superlative and comparative variations of the question. A disadvantage is that the questions have to be well-formulated, i.e. the parsers are not robust with respect to malformed questions. The main disadvantage is that one needs a corresponding semantic representation for each lexical item. To generate the semantic representations, Hakimov et al. (Hakimov et al. 2015) adapted the algorithm of Zettlemoyer & Collins (Zettlemoyer & Collins 2012) that generates the semantic representation from a learning corpus of pairs of questions and the corresponding semantic representation. The main problem encountered by Hakimov et al. (Hakimov et al. 2015) is that many lexical items do not appear in the training corpus, leading to low recall. To alleviate this problem, TBSL (Unger et al. 2012) generates candidate semantic representations of unknown lexical items based on their POS tags. In the example above, if the lexical item married to is unknown, then possible interpretations are generated, such as two binary functions with the arguments x and y exchanged. Since there is no knowledge about the fact that married to has to be mapped to the property dbo:spouse, some templates are generated, i.e. the parser is able to parse the question, but a binary property corresponding to married to has still to be found.
9.4. Query construction using machine learning

CASIA (He et al. 2014) uses a machine learning approach for the whole QA process. The question analysis phase is used to segment the question and to extract features like the position of a phrase, the POS tag of a phrase, the type of dependency in the dependency tree and some others. In the phrase mapping phase, resources are associated with the segments and new features are extracted: the type of the resources and a score for the similarity between the segment and the resource. In the disambiguation phase, the extracted features are used in an MLN (as described in 8.5) to find the most probable relation between the segments and to find the most probable mapping. The detected relations are then used to generate the SPARQL query. The disambiguation phase must be retrained for each new KB.
9.5. Query construction relying on semantic information

The QA system SINA (Shekarpour et al. 2015) was developed to deal primarily with keyword queries, i.e. instead of inserting the question "What is the capital of Belgium" the user can also insert the keywords "capital Belgium". This implies that the relation between the different resources is not explicit like in a natural language question. In this case the dependencies between the resources have to be derived using the KB.

SINA first segments the question and finds associated resources. These are disambiguated using an HMM. Once the resources are determined, SINA constructs the query in the following way. For each instance or class, a vertex is created. For each property, an edge is created. The edges are used to connect the vertices if the types of the range and domain allow it. If not, then either one or two new vertices are created that correspond to variables. Note that the combinatorics could allow more than one graph. In this case, they are all considered because it is not clear which one reflects the user's intention. Moreover, at the end of the process, it is possible that one gets unconnected subgraphs. In this case, for each pair of vertices in two fixed subgraphs the set of possible properties that can connect them is computed and taken into account. All the possible graphs are translated to a SPARQL query and executed. The QA system POMELO proceeds in a similar way.
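The vertex-and-edge construction described for SINA can be sketched roughly as follows for the keyword query "capital Belgium". This is a simplified illustration, not SINA's implementation: the function names, the toy type assignment and the toy domain/range schema are hypothetical, only one graph is built instead of enumerating all admissible ones, and unconnected subgraphs are not handled.

```python
from itertools import count


def build_triple_patterns(instances, properties, types, schema):
    """Attach a type-compatible instance to each property, else a fresh variable."""
    fresh = count()
    patterns = []
    for prop in properties:
        domain, rng = schema[prop]
        # an instance can be the subject if its types include the property's domain
        subj = next((i for i in instances if domain in types[i]), None)
        # an instance can be the object if its types include the property's range
        obj = next((i for i in instances if rng in types[i] and i != subj), None)
        if subj is None:
            subj = f"?v{next(fresh)}"   # new vertex corresponding to a variable
        if obj is None:
            obj = f"?v{next(fresh)}"    # new vertex corresponding to a variable
        patterns.append((subj, prop, obj))
    return patterns


def to_sparql(patterns):
    body = " ".join(f"{s} {p} {o} ." for s, p, o in patterns)
    return f"SELECT * WHERE {{ {body} }}"


# Toy data (hypothetical, for illustration only).
types = {"dbr:Belgium": {"dbo:Country"}}
schema = {"dbo:capital": ("dbo:Country", "dbo:City")}   # property -> (domain, range)

patterns = build_triple_patterns(["dbr:Belgium"], ["dbo:capital"], types, schema)
query = to_sparql(patterns)
print(query)
```

Because the graph is derived from domain and range information in the KB rather than from the syntax of the input, the same code path serves both keyword queries and full questions.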
The QA system developed by Zhang et al. also does not rely on syntactic features to construct the query. The query is generated using an ILP. First, those resources are identified which can be referred to by the question. These are then combined using some handmade rules into triple patterns. Among all possible triple patterns, some are selected using an ILP.

The advantage of this strategy is that the graph is constructed starting from the underlying KB and not using the syntax of the question. The disadvantages are that this process is computationally complex and that the syntax of the question is not respected. For example, the systems will not see any difference between the questions "Who is the mother of Angela Merkel" and "Angela Merkel is the mother of who".
9.6. Approach not using SPARQL

Treo (Freitas & Curry 2014) and SemSek (Aggarwal & Buitelaar 2012) do not generate a SPARQL query, but instead navigate through the KB by dereferencing resources. Consider the following example: "In which city was the leader of the European Union born". SemSek first identifies a "central term" in the question, in this case "European Union". Using the dependency tree and starting from the "central term", an ordered list of potential terms corresponding to resources is generated. In the example above the list would be: "European Union", "leader", "born", "city". Then candidate resources for the first term (for example dbr:European_Union) are searched and the corresponding URI http://dbpedia.org/resource/European_Union is dereferenced to search all corresponding properties. These are compared with the second term in the list. The object is considered if one of the properties is similar (as a string or semantically) to the second term in the list (like dbp:leaderName or dbp:leaderTitle). This generates two new exploring directions in the algorithm. Otherwise the current direction is not further analyzed. The systems continue like this and search for the right answer in the graph.
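The navigation strategy of Section 9.6 can be illustrated with a small sketch over an in-memory toy graph. Everything in it is an assumption made for illustration: the `ex:` resources and their properties are invented, "dereferencing" is simulated by a dictionary lookup, and similarity is a crude substring test extended by a hand-written synonym table. The final term "city" (the expected answer type) is not checked here.

```python
# Toy KB: resource -> list of (property, object). Purely illustrative data.
TOY_KB = {
    "dbr:European_Union": [
        ("dbp:leaderName", "ex:Leader_A"),
        ("dbo:currency", "dbr:Euro"),
    ],
    "ex:Leader_A": [
        ("dbo:birthPlace", "ex:City_X"),
        ("dbo:party", "ex:Party_P"),
    ],
}

# Hand-written stand-in for semantic similarity (hypothetical).
SYNONYMS = {"born": ["birth"]}


def similar(term, prop):
    """True if the term, or one of its synonyms, occurs in the property's local name."""
    local_name = prop.split(":")[-1].lower()
    keys = [term.lower()] + SYNONYMS.get(term.lower(), [])
    return any(key in local_name for key in keys)


def navigate(start, terms, kb):
    """Follow, term by term, only the properties similar to the current term."""
    frontier = [start]
    for term in terms:
        next_frontier = []
        for resource in frontier:
            for prop, obj in kb.get(resource, []):   # simulated dereferencing
                if similar(term, prop):
                    next_frontier.append(obj)         # new exploring direction
        frontier = next_frontier                      # other directions are dropped
    return frontier


answers = navigate("dbr:European_Union", ["leader", "born"], TOY_KB)
print(answers)
```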
9.7. Summary and research challenges for the query construction task

Table 6 gives an overview of the different techniques used by the QA systems for query construction. In this domain, the current research challenges that are identified are the interpretation of adjective modifiers and superlatives; the implementation of aggregation, comparison and negation operators; the resolution of syntactic and scope ambiguities; non-compositionality; and semantic tractability. Note that all challenges identified in previous surveys are discussed in section 13.
10. Querying distributed knowledge

In the previous sections, we discussed the techniques for answering questions over a single KB. Nevertheless, one pertinent question would be: what changes in the case of multiple KBs? Only few QA systems tackled this problem. We found in the literature that the problem can be classified into two groups. The first assumes that the KBs are disjoint graphs. The second assumes that the KBs are interlinked, i.e. resources that refer to the same entity are identified through the KBs using owl:sameAs links (two identified resources are called aligned).
10.1. Considering unconnected KBs

In this setting the KBs are not interlinked and in particular different KBs can refer to the same entity using different URIs. This scenario was tackled by PowerAqua (Lopez et al. 2012) and Zhang et al. (Zhang et al. 2016). Assume for example that for the question "Which rivers flow in European countries" a part of the information can be found in two different KBs: one containing information like "(river, flow, country)" and the second one "(country, type, European)" (i.e. in general one can say that there are triple patterns matching different KBs). These cannot be executed as a SPARQL query because the URIs for countries in the first and second KBs are different and not linked. Therefore, the results are retrieved independently from both KBs and merged by comparing the labels of the URIs. Then either a SPARQL query is generated that contains the aligned URIs (i.e. uri:a owl:sameAs uri:b) or the result is computed without a SPARQL query.

Using templates | Using info. from the QA | Using Semantic Parsing | Using machine learning | Semantic information | Not generating SPARQL | Reference
BELA x (Walter et al. 2012)
CASIA x (He et al. 2014)
DEANNA x (Yahya et al. 2012)
FREyA x (Damljanovic et al. 2010)
gAnswer x (Zou et al. 2014)
GFMed x (Marginean 2017)
Hakimov et al. x (Hakimov et al. 2015)
Intui2 x (Dima 2013)
Intui3 x (Dima 2014)
ISOFT x (Park et al. 2014)
POMELO x (Hamon et al. 2014)
PowerAqua x (Lopez et al. 2012)
QAKiS x (Cabrio et al. n.d.)
QAnswer x (Ruseti et al. 2015)
RTV x (Giannone et al. 2013)
SemGraphQA x (Beaumont et al. 2015)
SemSeK x (Aggarwal & Buitelaar 2012)
SINA x (Shekarpour et al. 2015)
SWIP x (Pradel et al. 2012)
TBSL x (Unger et al. 2012)
Treo x (Freitas & Curry 2014)
TR Discover x (Song et al. 2015)
UTQA (Pouran-ebnveyseh 2016)
Xser x (Xu et al. 2014)
Zhang et al. x (Zhang et al. 2016)
Zhu et al. x (Zhu et al. 2015)

Table 6. Each line of this table corresponds to a QA system. The columns indicate which of the techniques presented in the query construction task are used. The columns refer to the subsections of section 9.
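The label-based merging of results from unconnected KBs, as described in Section 10.1 for the question "Which rivers flow in European countries", can be sketched as follows. The `kb1:` and `kb2:` URIs, the labels and the normalisation by lower-casing are hypothetical choices for illustration; the cited systems may compare labels in a more elaborate way.

```python
# Partial results retrieved independently from two unconnected toy KBs.
# KB 1 holds (river, flow, country) facts, KB 2 holds (country, type, European).
kb1_flows = [("kb1:Rhine", "kb1:Germany"), ("kb1:Nile", "kb1:Egypt")]
kb1_labels = {"kb1:Germany": "Germany", "kb1:Egypt": "Egypt"}

kb2_european = ["kb2:DE"]
kb2_labels = {"kb2:DE": "germany "}


def normalize(label):
    """Minimal label normalisation (an assumption): trim and lower-case."""
    return label.strip().lower()


def align_by_label(uris_a, labels_a, uris_b, labels_b):
    """Pairs of URIs from two KBs whose labels coincide after normalisation."""
    index = {normalize(labels_b[u]): u for u in uris_b}
    return [
        (u, index[normalize(labels_a[u])])
        for u in uris_a
        if normalize(labels_a[u]) in index
    ]


countries_kb1 = [country for _, country in kb1_flows]
aligned = align_by_label(countries_kb1, kb1_labels, kb2_european, kb2_labels)
aligned_kb1 = {a for a, _ in aligned}

# Join without SPARQL: keep rivers whose country was aligned to a European country.
rivers = [river for river, country in kb1_flows if country in aligned_kb1]

# Alternatively, the alignment can be materialised as owl:sameAs statements.
same_as = [f"{a} owl:sameAs {b} ." for a, b in aligned]
print(rivers, same_as)
```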
10.2. Considering interlinked KBs

The task of querying interlinked KBs was proposed at QALD-4. It was tackled by GFMed (Marginean 2017), SINA (Shekarpour et al. 2015), POMELO (Hamon et al. 2014), TR Discover (Song et al. 2015) and Zhang et al. (Zhang et al. 2016). In the case where the KBs are interlinked, there is no particular technique used. In fact one can see the interlinked KBs as one big KB. The identification with owl:sameAs links must be considered during query construction. Note that, in such a scenario, scalability can easily become a problem. This is not the case for the QALD-4 task since only three small KBs are interlinked.
Unconnected KBs | Interlinked KBs | Reference
GFMed x (Marginean 2017)
POMELO x (Hamon et al. 2014)
PowerAqua x (Lopez et al. 2012)
SINA x (Shekarpour et al. 2015)
TR Discover x (Song et al. 2015)
Zhang et al. x (Zhang et al. 2016)

Figure 10. Each line of this table corresponds to a QA system and indicates which of the techniques presented in the distributed task it uses. The columns refer to the subsections of section 10.
QA systems | Precision | Recall | F-measure | Reference
--- | --- | --- | --- | ---
Yao et al. (2014) | 0.480 | 0.337 | 0.354 | (Yao & Van Durme 2014)
SEMPRE | 0.413 | 0.480 | 0.357 | (Berant et al. 2013)
Bao et al. (2014) | - | - | 0.375 | (Bao et al. 2014)
Bordes et al. (2014) | - | - | 0.392 | (Bordes et al. 2014)
PARASEMPRE | 0.466 | 0.405 | 0.399 | (Berant & Liang 2014)
Clarke et al. (2015) | - | - | 0.401 | (Clarke 2015)
Dong et al. (2015) | - | - | 0.401 | (Dong et al. 2015)
Yang et al. (2014) | - | - | 0.413 | (Yang et al. 2014)
GraphParser | - | - | 0.413 | (Reddy et al. 2014)
Bordes et al. (2015) | - | - | 0.422 | (Bordes et al. 2015)
Zhang et al. (2016) | - | - | 0.426 | (Zhang et al. 2016)
Yao (2015) | 0.545 | 0.526 | 0.443 | (Yao 2015)
Yang et al. (2015) | - | - | 0.449 | (Yang et al. 2015)
Aqqu | 0.604 | 0.498 | 0.494 | (Bast & Haussmann 2015)
AgendaIL | - | - | 0.497 | (Berant & Liang 2015)
Wang et al. (2014) | 0.525 | 0.447 | 0.453 | (Wang et al. 2014)
Reddy et al. (2016) | 0.611 | 0.490 | 0.503 | (Reddy et al. 2016)
Abujabal et al. (2017) | - | - | 510 | (Abujabal et al. 2017)
Yavuz et al. (2016) | - | - | 0.516 | (Yavuz et al. 2016)
STAGG | 0.607 | 0.528 | 0.525 | (Yih et al. 2015)
Ture et al. (2016) | - | - | 0.522 | (Ture & Jojic 2016)
Jain (2016) | 0.649 | 0.552 | 0.557 | (Jain 2016)

Table 7. This table summarizes the QA systems evaluated over WebQuestions. It contains publications describing a QA system evaluated over WebQuestions that cited (Berant et al. 2013) according to Google Scholar.
11. QA systems evaluated over WebQuestions

In this section, we describe QA systems evaluated over WebQuestions. Table 7 contains a list of QA systems evaluated over WebQuestions. To identify these systems, we searched in Google Scholar for all publications citing the publication (Berant et al. 2013) which introduced the benchmark. In the following, we focus on techniques used by QA systems evaluated over WebQuestions that were not used by QA systems evaluated over QALD. Due to space limitations, a detailed analysis as for QALD is not possible.

Many of the techniques presented in section 6 about question analysis are also used by QA systems evaluated over WebQuestions. For example, Reddy et al. 2016 makes extensive use of dependency trees, while SEMPRE (Berant et al. 2013) and PARASEMPRE (Berant & Liang 2014) use POS tags as features. In addition to the techniques presented in section 6, some works make use of neural networks. These have been successfully used in many NLP areas. Different architectures have been explored. Convolutional neural networks and recurrent neural networks were used to identify or focus on particular parts of the questions (like the one referring to the type, the relation or the entity contained in a question). Convolutional neural networks are for example used in Dong et al. 2015, while recurrent neural networks are used in Zhang et al. 2016 and Ture & Jojic 2016.
Many QA systems use approaches similar to the ones presented in section 7 about the phrase mapping task. For example, the strategy presented in section 7.3.5 is used by SEMPRE (Berant et al. 2013), Yang et al. 2015 and GraphParser (Reddy et al. 2014). One additional dataset that was used to close the lexical gap was the PARALEX corpus (Fader et al. 2013). It contains 18 million pairs of questions from wikianswers.com which were tagged as having the same meaning by users. This dataset can be used to learn different ways to express the same relation. It was used for example by PARASEMPRE (Berant & Liang 2014) and Bordes et al. 2015. Moreover, differently from QALD, many works use the datasets and strategies presented in section 7 to create embeddings (using neural networks) for relations and entities. This is for example the case for Bordes et al. 2015, Yang et al. 2014, Yavuz et al. 2016, STAGG (Yih et al. 2015) and Ture & Jojic 2016.
The disambiguation problem discussed in section 8 is approached with similar strategies both in QALD and WebQuestions. To disambiguate the entities, many QA systems evaluated over WebQuestions use EL tools for Freebase. To disambiguate between different generated queries, mainly machine learning algorithms are used, like structured perceptrons (Reddy et al. 2016) and learning to rank, used in Aqqu (Bast & Haussmann 2015). The features are mainly the same.

The query construction problem discussed in section 9 is simpler than for QALD. Many QA systems restrict to simple triple patterns, like Bordes et al. 2015 and Ture & Jojic 2016, while most of them restrict to simple or reified triple patterns, as is done in Aqqu (Bast & Haussmann 2015) or Reddy et al. 2016. Some QA systems follow a strategy that is similar to the one presented in section 9.6. However, one strategy is not used over QALD; it is generally referred to as the "information retrieval approach". Consider for example the question: "When did Avatar release in UK". First, the main entity of the question is searched. In the example, the entity is "Avatar". Then all entities connected to it via a simple or a reified statement are considered as a possible answer. This includes for example the date "17-12-2009" but also the director of Avatar, the characters that played in the film and many others. In this manner, the problem is reduced to a classification problem, i.e., determining if an entity is an answer or not. This is done, for example, by Yao & Van Durme 2014, Bordes et al. 2014, Bordes et al. 2015, Dong et al. 2015 and Zhang et al. 2016.
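The "information retrieval approach" described above can be sketched as candidate generation followed by a binary decision per candidate. In this illustration the toy KB, the `ex:` identifiers, the cue-word table and the threshold are all hypothetical; the cue-word overlap merely stands in for the trained classifiers used by the cited systems.

```python
# Toy neighbourhood of the main entity: (relation, connected entity or literal).
TOY_KB = {
    "ex:Avatar": [
        ("ex:releaseDateUK", "17-12-2009"),
        ("ex:director", "ex:James_Cameron"),
        ("ex:character", "ex:Jake_Sully"),
    ],
}

# Hypothetical cue words per relation; a real system would learn such evidence.
CUES = {
    "ex:releaseDateUK": {"when", "release", "uk"},
    "ex:director": {"who", "directed", "director"},
    "ex:character": {"who", "character", "played"},
}


def tokenize(question):
    return set(question.lower().replace("?", "").split())


def is_answer(question, relation, threshold=2):
    """Binary decision: enough cue words of the relation occur in the question."""
    overlap = tokenize(question) & CUES.get(relation, set())
    return len(overlap) >= threshold


def answer(question, main_entity, kb=TOY_KB):
    """Classify every entity connected to the main entity as answer or not."""
    return [obj for relation, obj in kb.get(main_entity, []) if is_answer(question, relation)]


result = answer("When did Avatar release in UK", "ex:Avatar")
print(result)
```

The design choice worth noting is that no query is constructed at all: the structure of the answer is fixed in advance to "something connected to the main entity", which is why the approach fits the simple questions of WebQuestions better than the more complex QALD questions.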
The task of querying distributed knowledge is not addressed in the WebQuestions benchmark.

In summary, one can say that over WebQuestions more attention was put on closing the lexical gap, while in QALD more effort was put into the query construction process. This is mainly due to the type of questions contained in the benchmarks.
QA systems | F-measure | Reference
--- | --- | ---
Bordes et al. (2015) | 0.627 | (Bordes et al. 2015)
Yin et al. (2016) | 0.683 | (Yin et al. 2016)
Dai et al. (2016) | 0.626 | (Dai et al. 2016)
Golub and He (2016) | 0.709 | (Golub & He 2016)
Lukovnikov et al. (2017) | 0.712 | (Lukovnikov et al. 2017)

Table 8. This table summarizes the QA systems evaluated over SimpleQuestions. It contains publications describing a QA system evaluated over SimpleQuestions that cited (Bordes et al. 2015) according to Google Scholar. Every system was evaluated over FB2M except the one marked with () which was evaluated over FB5M.
12. QA systems evaluated over SimpleQuestions

In this section, we describe QA systems evaluated over SimpleQuestions. Table 8 contains a list of QA systems evaluated over SimpleQuestions. To identify these systems, we searched in Google Scholar for all publications citing the publication (Bordes et al. 2015) which introduced the benchmark.

All systems evaluated over this dataset use neural network architectures. In fact this benchmark was created to experiment with these machine learning techniques, since they need a lot of training data to perform well. The different QA systems evaluated over SimpleQuestions follow a similar strategy. Remember that every question in this benchmark can be solved by one single triple, i.e. only a subject and a predicate have to be found. The possible triples are found in the following way. First, using an n-gram strategy (see section 6.1), candidate entities are identified. Then, by looking in the KB, all triples containing these entities as a subject are identified. The problem is reduced to choosing the right one among the list of triples. This last step is done using different neural network architectures.
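The common strategy outlined above (n-grams, candidate entities, candidate triples, selection of one triple) can be sketched as follows. The label index, the triples and the `ex:` identifiers are toy data, and the word-overlap score is only a placeholder for the neural scoring functions of the cited systems.

```python
def ngrams(tokens, max_n=3):
    """All word n-grams of the question up to length max_n."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


# Toy label index and toy triples (hypothetical identifiers).
LABELS = {"barack obama": "ex:Barack_Obama", "obama": "ex:Obama_City"}
TRIPLES = [
    ("ex:Barack_Obama", "ex:place_of_birth", "ex:Honolulu"),
    ("ex:Barack_Obama", "ex:spouse", "ex:Michelle_Obama"),
    ("ex:Obama_City", "ex:country", "ex:Japan"),
]


def tokens_of(question):
    return question.lower().rstrip("?").split()


def candidate_triples(question):
    """Triples whose subject is an entity matched by some n-gram of the question."""
    entities = {LABELS[g] for g in ngrams(tokens_of(question)) if g in LABELS}
    return [t for t in TRIPLES if t[0] in entities]


def score(question, triple):
    """Placeholder scorer: word overlap between question and predicate name."""
    predicate_words = set(triple[1].split(":")[-1].split("_"))
    return len(set(tokens_of(question)) & predicate_words)


def best_triple(question):
    return max(candidate_triples(question), key=lambda t: score(question, t))


question = "What is the place of birth of Barack Obama?"
print(best_triple(question))
```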
Let us see in more detail how the different phases of the QA process are tackled. The question analysis task is generally not treated as a separate task, but some parts of the networks are dedicated to it. Golub & He 2016 and Yin et al. 2016 use so-called attention mechanisms. This means that the network learns to focus on the parts of the questions that refer to a subject or to a predicate.

For the phrase mapping task, the QA systems rely on the labels provided by the KB as described in subsection 7.1. Moreover, the question is encoded either at the character level like in Golub & He 2016, or using word embeddings like in Dai et al. 2016, or in both ways like in Lukovnikov et al. 2017. Word embedding is used to bridge the lexical gap following the idea presented in subsection 7.3.5. The character encoding is used to overcome the out-of-vocabulary problem, i.e. the number of possible words appearing in a question is too big to include them all in the vocabulary.
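The out-of-vocabulary problem can be made concrete with a small sketch. It contrasts a fixed word vocabulary, where an unseen word collapses to an unknown token, with a character-based view, where the unseen word still shares most of its pieces with a related word. The vocabulary, the `<UNK>` token and the use of character trigrams are illustrative assumptions; the cited systems use neural character-level encoders rather than trigram sets.

```python
# Hypothetical fixed word vocabulary of a word-level encoder.
VOCAB = {"who", "wrote", "the", "book"}


def encode_word(word):
    """Word-level encoding: anything outside the vocabulary becomes <UNK>."""
    return word if word in VOCAB else "<UNK>"


def char_ngrams(word, n=3):
    """Character n-grams of a word, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def shared_ngrams(word_a, word_b):
    """Character n-grams that two words have in common."""
    return set(char_ngrams(word_a)) & set(char_ngrams(word_b))


print(encode_word("hobbit"))                 # the word itself is lost
print(char_ngrams("hobbit"))                 # but its characters are still usable
print(sorted(shared_ngrams("hobbit", "hobbits")))
```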
The disambiguation problem is handled as a ranking problem. Here the different neural network architectures are used to compute a score between a question and a tuple of a subject and a predicate. The pair with the highest score is taken.

The query construction problem is very simple since only SPARQL queries with one triple pattern must be generated. The problem of querying multiple KBs is not addressed. The main problem tackled in this benchmark is the disambiguation problem.
13. Evolution of research challenges of Question Answering over KBs

We look at the research challenges described in the literature since the beginning of open domain KB-based QA, through different published surveys and QALD evaluation reports. Table 9 lists each challenge, together with the reference(s) to the survey where it is discussed and the years covered, i.e., the range of publication years for the KB-based QA systems cited in the given survey, excluding other kinds of QA systems that share some of the challenges (such as Natural Language Interfaces for Databases). We group similar challenges together and, if it applies, map them across the five tasks in the QA process. As such we associate the tasks, in which we cluster the techniques analysed in this paper, with the challenges they try to tackle.

Challenge | Survey, Years covered | Short description | Task
--- | --- | --- | ---
Identify Question types | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2013), 2005-12 | Includes wh-questions, requests (give me), nominal or definitions, topicalised entities, how questions with adjective or quantifications (how big/many) | Question Analysis
Lexical gap, Vocabulary gap | (Lopez et al. 2011), 2004-11; (Lopez et al. 2013), 2005-12; (Freitas et al. 2012), 2004-11; (Freitas et al. 2015), 2011-12; (Höffner et al. 2016), 2011-15 | Query and databases may not be expressed using the same vocabulary (synonymy) or at the same level of abstraction. It requires to bridge the gap between the vocabulary in the user query and the KB vocabulary | Mapping
Multilingual QA | (Cimiano et al. 2013), QALD-3; (Unger et al. 2014), QALD-4; (Höffner et al. 2016), 2011-15 | Mediates between a user expressing an information need in her own language and the semantic data | Question Analysis, Mapping
Light expression, Vagueness, Lexical ambiguity | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2011), 2004-11; (Lopez et al. 2013), 2005-12; (Freitas et al. 2015), 2004-11; (Höffner et al. 2016), 2011-15 | Queries with words that can be interpreted through different ontological entities or semantically weak constructions. Relations that are expressed implicitly with the use of verbs such as be/have or light prepositions that can convey different meanings | Disambiguation
Semantic gap, Conceptual complexity | (Lopez et al. 2013), 2005-12; (Freitas et al. 2012), 2004-11; (Freitas et al. 2015), 2011-12 | Queries that are not necessarily structured in the knowledge base in the same way as in the question | Query construction
Spatial and temporal prepositions | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2013), 2005-12; (Höffner et al. 2016), 2011-15 | Requires to capture the domain-independent meaning of spatial (in, next, through) and temporal (after, during) prepositions | -
Adjective modifiers and superlatives | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2011), 2004-11; (Lopez et al. 2013), 2005-12; (Höffner et al. 2016), 2011-15 | Superlative modifiers and attribute selectors (how+adj) require mapping each adjective to a KB predicate (e.g., area/population for smallest), as well as keeping the polarity for superlatives (order by ASC/DESC) | Disambiguation, Query construction
Aggregations, comparison and negation operators | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2011), 2004-11; (Lopez et al. 2013), 2005-12; (Höffner et al. 2016), 2011-15 | Aggregation operators are those calculating a min, max, sum, an average or a count over a number of individuals fulfilling a certain property. Comparison operators compare numbers to a given order. The challenge is to realize a quantifier through logical operators | Question Analysis, Query Construction
Syntactic and Scope ambiguities | (Cimiano & Minock 2009), 2004-07 | Syntactic ambiguity regarding the constituent that prepositional phrases or relative clauses can attach to (to the last or to a non-precedent constituent in a sentence), or when multiple scope quantifiers are present in a query (most, all, each, etc.) | Query construction
Distributed QA, Entity reconciliation | (Lopez et al. 2011), 2004-11; (Freitas et al. 2012), 2004-11; (Unger et al. 2014), QALD-4; (Höffner et al. 2016), 2011-15 | Combining facts across sources requires matching at schema level as well as entity level (find semantically equivalent dataset entities given a query entity) to join partial results or translations | Distributed Knowledge
Non-compositionality | (Cimiano & Minock 2009), 2004-07 | Parts of a question that do not correspond to any logical form and need to be ignored (e.g., largest cities in the world if world is not explicitly modelled) | Query construction
Semantic tractability | (Freitas et al. 2012), 2004-11 | To answer queries not supported by explicit dataset statements (e.g., inferring an entity x is an actress because of the statement x starred y movie) | Mapping, Disambiguation, Query construction
Out of scope | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2011), 2004-11 | A system should inform about the fact that the question is out of scope of the KB (vs. out of the capabilities of the system) | Across all
Portability | (Cimiano & Minock 2009), 2004-07; (Lopez et al. 2011), 2004-11 | The level of effort required to port the system to other sources (e.g., handcrafted lexicon, training) | Across all
Scalability | (Lopez et al. 2011), 2004-2011 | Required both in terms of KB size and their number while keeping real time performance | Across all
Hybrid QA, Semantic and textual gap | (Lopez et al. 2011), 2004-11; (Unger et al. 2014), QALD-4 | Combining both structured and unstructured information into one answer | -

Table 9. Challenges in the state of the art for QA systems over KBs
The first survey by Cimiano and Minock (Cimiano & Minock 2009), published in 2010, presents a quantitative analysis based on the input (user) questions as well as the outputs (formal queries) from a Geographical dataset, often used to evaluate the first domain-specific KB-based systems. The following challenges are identified: question types; language light; lexical ambiguities; syntactic ambiguities; scope ambiguities; spatial and temporal prepositions; adjective modifiers and superlatives; aggregation, comparison and negations; non-compositionality; out of scope; and variability of linguistic forms (i.e., questions matching to the same logical query - not included in Table 9). Some of these first identified challenges, like capturing the meaning of spatial and temporal prepositions, are not yet tackled by the kind of QA systems surveyed here. Cimiano and Minock also argued that research on deep linguistic and semantic approaches, such as paraphrase generation, discourse processing and guidance, could improve the performance of these systems by guiding users to generate a non-ambiguous form of the question (i.e., in line with the capabilities of the system), as well as to process questions not in isolation, but in context with previously asked questions. For these early systems, portability was one of the most challenging issues, although the customisation effort was rarely quantified or compared (Cimiano & Minock 2009).
The Lopez et al. 2011 survey (Lopez et al. 2011) discusses the challenges that arose when moving from a classic KB system to an open domain QA system for the Web of data, in terms of portability, scalability, mapping and disambiguation, distributed query (fusion and ranking of facts across sources) and bridging the gap between the semantic data and unstructured textual information. Lexical ambiguities (noun homonymy or polysemy) are less of a problem when querying an unambiguous knowledge representation in a restricted domain. However, for open domain QA, a major challenge is to disambiguate the correct interpretation while scaling up to large and heterogeneous sources. Filtering and ranking techniques are often used to prune the large space of candidate solutions and obtain real time performance (Lopez et al. 2011).
Freitas et al. 2012 (Freitas et al. 2012) categorized the challenges of querying heterogeneous datasets into query expressivity, usability, vocabulary-level semantic matching, entity reconciliation and tractability. As stated in (Freitas et al. 2012), differently from IR, entity-centric QA approaches aim towards more sophisticated semantic matching techniques to target queries with high expressivity (able to operate over the data, including superlatives, aggregators, etc.), without assuming that the users are aware of the internal representations (high usability through intuitive query interfaces). For Freitas et al. 2015 (Freitas et al. 2015), the process of mapping and semantic interpretation of schema-agnostic queries involves coping with conceptual complexity, term ambiguity, vocabulary gap (synonymy) and vagueness/indeterminacy (where words or propositions fail to map the exact meaning intended by the transmitter). Freitas et al. (Freitas et al. 2015) proposed entropy measures to quantify the semantic complexity of mapping queries to database elements, showing that a large number of possible interpretations for words/prepositions had a negative impact on the F-measure reported for systems participating in QALD-1 and QALD-2.
Most of the challenges specified by Cimiano and Minock (Cimiano & Minock 2009) for the first domain-specific QA systems remain valid for the open domain questions presented in the QALD benchmarks. Based on these challenges and the results from the QALD-1 and QALD-2 campaigns, Lopez et al. 2013 (Lopez et al. 2013) analyzed the characteristic problems involved in the task of mapping NL queries to formal queries, such as: (1) the limitations to bridge the lexical gap (between the vocabulary in the user query and the KB vocabulary) using only string metrics and generic dictionaries; (2) the difficulties arising on interpreting words through different entities because of the openness and heterogeneity of the sources (including duplicated or overlapping URIs and complex conjunctive KB terms from YAGO categories); and (3) complex queries with information needs that can only be expressed using aggregation (requiring counting), comparison (requiring filters), superlatives (requiring sorting results), temporal reasoning (e.g., on the same day, etc.), or a combination of them.
In addition to the challenges reported in Lopez et al. 2013 (Lopez et al. 2013) for the first two campaigns, QALD-3 (Cimiano et al. 2013) introduced the multilinguality challenge and QALD-4 (Unger et al. 2014) introduced two new challenges as part of two new tasks. The first task, distributed QA, introduces questions for which answers are distributed across a collection of interconnected datasets in the biomedical domain. The second task, Hybrid QA, evaluates approaches that can process both structured and unstructured information, considering that lots of information is still available only in textual form, e.g., in the form of abstracts. Lastly, the latest survey on challenges for semantic QA systems (Höffner et al. 2016) classifies these into seven challenges: the lexical gap, ambiguity, multilingualism, complex queries, distributed knowledge, temporal and spatial queries and templates. The templates challenge refers to the systems that use templates to capture the structure of (complex) queries with more than one basic graph pattern.
14. Conclusions

This analysis of QA systems shows that most of the systems have many common techniques. For example, a lot of systems use similar techniques in the question analysis and phrase mapping tasks. Moreover, it shows that QA systems generally concentrate on improving only some techniques, whereas they leave aside some others. For example, a lot of systems only use local disambiguation techniques, others only use basic techniques for the phrase mapping task.

It is nearly impossible to compare existing techniques individually. There are many reasons for that. From the performance of a monolithic system it is impossible to deduce the contribution of a single part. It is also not sufficient to just compare techniques that solve the same task against each other, since a QA system is a combination of many components, and a balance between precision and recall must be found considering all steps. A component with high recall and low precision can be good or bad depending on the next steps. So a comparison can only be made by comparing the combinations of different techniques in a whole pipeline.

One solution to address these problems is to develop future QA systems using a modular approach. This will allow the community to contribute new plug-ins in order to either replace existing approaches or create new ones. The aim is that a QA system should not always be built from scratch, such that research can focus more on single tasks. As a side effect, it will become easier to compare techniques which solve the same task. These plug-ins should satisfy the constraints of the Semantic Web. They should be KB independent, scalable, possibly multilingual and require as little effort as possible to set up a QA system over a new dataset. We are aware of four frameworks that attempt to provide a reusable architecture for QA systems: QALL-ME (Ferrandez et al. 2011), openQA (Marx et al. 2014), OKBQA (footnote 16) and QANARY (Both et al. 2016, Diefenbach, Singh, Both, Cherix, Lange & Auer 2017). Such integration would also allow to build and reuse services around QA systems like speech recognition modules and reusable front-ends (Diefenbach, Amjad, Both, Singh & Maret 2017). This way the research community is enabled to tackle new research directions like developing speech recognition systems specifically designed to answer questions over KBs and to study the interaction of QA systems with the user.
The comparison between QALD and WebQuestions shows that these benchmarks are quite similar. Despite that, both benchmarks are addressed by two quite isolated communities. That can be seen by the fact that QA systems are either evaluated on QALD or on WebQuestions and not on both. We hope that in future these communities will meet so that both can learn from each other. One possible meeting point is the Wikidata KB. There are several reasons for that. The Freebase KB is not updated anymore and all related services were shut down. The Wikidata dump contains both a reified and a non-reified version of the information, so that moving from DBpedia and Freebase to Wikidata is simple. Moreover, there are more and more well-maintained services around Wikidata that can be used to develop new QA systems.

At the same time, the fact that most QA systems are evaluated only over one KB shows that more work is needed in creating QA systems that are truly KB independent. This would also allow to get new insights into how the quality of the KB affects the quality of the QA system. These points should be tackled more by the QA community.

Finally, a poorly addressed point is the interaction of QA systems with users. We are not aware of works that study the usability of open domain KB-based QA systems and there are only a few works applying QA over KBs in real scenarios (Song et al. 2015, Lopez et al. 2016). The interaction with the user can also be a good opportunity to improve QA systems over time and to collect training data, which is becoming more and more important to improve the performance and understanding of these systems.

Research in QA over KBs is and remains a hot and interesting topic!
Acknowledgements. This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 642795.
16 http://www.okbqa.org

References

Abujabal, A., Yahya, M., Riedewald, M. & Weikum, G. (2017), Automated template generation for question answering over knowledge graphs, in 'Proceedings of the 26th International Conference on World Wide Web', pp. 1191–1200.

Aggarwal, N. & Buitelaar, P. (2012), 'A system description of natural language query over dbpedia', Proc. of Interacting with Linked Data (ILD 2012) [37].

Allam, A. M. & Haggag, M. H. (2012), 'The question answering systems: A survey', Int Journal of Research and Reviews in Information Sciences (IJRRIS) 2(3).

Atzori, M., Mazzeo, G. & Zaniolo, C. (2016), QA3@QALD-6: Statistical Question Answering over RDF cubes, in 'ESWC'. to appear.

Bao, J., Duan, N., Zhou, M. & Zhao, T. (2014), 'Knowledge-based question answering as machine translation', Cell 2(6).

Bast, H. & Haussmann, E. (2015), More accurate question answering on freebase, in 'Proceedings of the 24th ACM International on Conference on Information and Knowledge Management', ACM.

Baudiš, P. & Šedivý, J. (2015), 'QALD challenge and the YodaQA system: Prototype notes'.

Beaumont, R., Grau, B. & Ligozat, A.-L. (2015), SemGraphQA@QALD-5: LIMSI participation at QALD-5@CLEF, in 'Working Notes for CLEF 2015 Conference', CLEF.

Berant, J., Chou, A., Frostig, R. & Liang, P. (2013), Semantic Parsing on Freebase from Question-Answer Pairs, in 'EMNLP'.

Berant, J. & Liang, P. (2014), Semantic parsing via paraphrasing, in 'ACL (1)'.

Berant, J. & Liang, P. (2015), 'Imitation learning of agenda-based semantic parsers', Transactions of the Association for Computational Linguistics.

Bordes, A., Chopra, S. & Weston, J. (2014), 'Question answering with subgraph embeddings', arXiv preprint arXiv:1406.3676.

Bordes, A., Usunier, N., Chopra, S. & Weston, J. (2015), 'Large-scale simple question answering with memory networks', arXiv preprint arXiv:1506.02075.
Both, A., Diefenbach, D., Singh, K., Shekarpour, S., Cherix, D. & Lange, C. (2016), Qanary – a methodology for vocabulary-driven open question answering systems, in 'International Semantic Web Conference', Springer.

Cabrio, E., Cojan, J., Aprosio, A. P., Magnini, B., Lavelli, A. & Gandon, F. (n.d.), QAKiS: an open domain QA system based on relational patterns, in 'Proceedings of the 2012th International Conference on Posters & Demonstrations Track - Volume 914', CEUR-WS.org.

Cimiano, P., Lopez, V., Unger, C., Cabrio, E., Ngomo, A.-C. N. & Walter, S. (2013), Multilingual question answering over linked data (qald-3): Lab overview, in 'Information Access Evaluation. Multilinguality, Multimodality, and Visualization', Springer.

Cimiano, P. & Minock, M. (2009), Natural language interfaces: What is the problem - a data-driven quantitative analysis, in 'NLDB', Springer, pp. 192–206.

Clarke, D. (2015), 'Simple, fast semantic parsing with a tensor kernel', arXiv preprint arXiv:1507.00639.

Cunningham, H., Maynard, D., Bontcheva, K. & Tablan, V. (2002), GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications, in 'Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02)'.

Dai, Z., Li, L. & Xu, W. (2016), 'Cfo: Conditional focused neural question answering with large-scale knowledge bases', arXiv preprint arXiv:1606.01994.

Daiber, J., Jakob, M., Hokamp, C. & Mendes, P. N. (2013), Improving efficiency and accuracy in multilingual entity extraction, in 'Proceedings of the 9th International Conference on Semantic Systems', ACM.

Damljanovic, D., Agatonovic, M. & Cunningham, H. (2010), Identification of the Question Focus: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction, in 'LREC'.

Damljanovic, D., Agatonovic, M. & Cunningham, H. (2012), FREyA: An interactive way of querying linked data using natural language, in 'The Semantic Web: ESWC 2011 Workshops', Springer.
CoreTechniquesofQuestionAnsweringSystemsoverKnowledgeBases:aSurvey37Diefenbach,D.
,Amjad,S.
,Both,A.
,Singh,K.
&Maret,P.
(2017),Trill:Areusablefront-endforqasystems,in'ESWCP&D'.
Diefenbach,D.
,Singh,K.
,Both,A.
,Cherix,D.
,Lange,C.
&Auer,S.
(2017),TheQa-naryEcosystem:gettingnewinsightsbycomposingQuestionAnsweringpipelines,in'ICWE'.
Dima,C.
(2013),'Intui2:Aprototypesystemforquestionansweringoverlinkeddata',ProceedingsoftheQuestionAnsweringoverLinkedDatalab(QALD-3)atCLEF.
Dima,C.
(2014),AnsweringnaturallanguagequestionswithIntui3,in'ConferenceandLabsoftheEvaluationForum(CLEF)'.
Dong,L.
,Wei,F.
,Zhou,M.
&Xu,K.
(2015),Questionansweringoverfreebasewithmulti-columnconvolutionalneuralnetworks.
,in'ACL(1)'.
Dwivedi,S.
K.
&Singh,V.
(2013),'Researchandreviewsinquestionansweringsys-tem',ProcediaTechnology10,417–424.
Fader,A.
,Soderland,S.
&Etzioni,O.
(2011),Identifyingrelationsforopeninforma-tionextraction,in'ProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing',AssociationforComputationalLinguistics.
Fader,A.
,Zettlemoyer,L.
S.
&Etzioni,O.
(2013),Paraphrase-drivenlearningforopenquestionanswering.
,in'ACL(1)',Citeseer.
Ferrandez,O.
,Spurk,C.
,Kouylekov,M.
,Dornescu,I.
,Ferrandez,S.
,Negri,M.
,Izquierdo,R.
,Tomas,D.
,Orasan,C.
,Neumann,G.
etal.
(2011),'TheQALL-MEframework:Aspeciable-domainmultilingualquestionansweringarchitecture',Websemantics:Science,servicesandagentsontheworldwideweb.
Ferré,S.
(2013),squall2sparql:aTranslatorfromControlledEnglishtoFullSPARQL1.
1,in'Work.
MultilingualQuestionAnsweringoverLinkedData(QALD-3)'.
Ferré,S.
(2017),'Sparklis:anexpressivequerybuilderforsparqlendpointswithguid-anceinnaturallanguage',SemanticWeb8(3),405–418.
Freitas,A.
&Curry,E.
(2014),Naturallanguagequeriesoverheterogeneouslinkeddatagraphs:Adistributional-compositionalsemanticsapproach,in'Proceedingsofthe19thinternationalconferenceonIntelligentUserInterfaces',ACM.
Freitas,A.
,Curry,E.
,Oliveira,J.
G.
&O'Riain,S.
(2012),'QueryingHeterogeneousDatasetsontheLinkedDataWeb:Challenges,Approaches,andTrends',IEEEIn-ternetComputing.
Freitas,A.
,EfsonSales,J.
,Handschuh,S.
&Curry,E.
(2015),HowhardisthisqueryMeasuringtheSemanticComplexityofSchema-agnosticQueries,in'Proceedingsofthe11thInternationalConferenceonComputationalSemantics',AssociationforComputationalLinguistics,London,UK.
Gerber,D.
&Ngomo,A.
-C.
N.
(2011),Bootstrappingthelinkeddataweb,in'1stWork-shoponWebScaleKnowledgeExtraction@ISWC',Vol.
2011.
Giannone,C.
,Bellomaria,V.
&Basili,R.
(2013),'AHMM-basedapproachtoquestionansweringagainstlinkeddata',ProceedingsoftheQuestionAnsweringoverLinkedDatalab(QALD-3)atCLEF.
Golub,D.
&He,X.
(2016),'Character-levelquestionansweringwithattention',arXivpreprintarXiv:1604.
00727.
Google(2016),'Freebasedatadumps',https://developers.
google.
com/freebase/data.
Hakimov,S.
,Unger,C.
,Walter,S.
&Cimiano,P.
(2015),Applyingsemanticparsingtoquestionansweringoverlinkeddata:Addressingthelexicalgap,in'NaturalLan-guageProcessingandInformationSystems',Springer.
38D.
DiefenbachetalHamon,T.
,Grabar,N.
,Mougin,F.
&Thiessard,F.
(2014),DescriptionofthePOMELOSystemfortheTask2ofQALD-2014.
,in'CLEF(WorkingNotes)'.
He,S.
,Zhang,Y.
,Liu,K.
&Zhao,J.
(2014),'CASIA@V2:AMLN-basedQuestionAnsweringSystemoverLinkedData',Proc.
ofQALD-4.
Hffner,K.
&Lehmann,J.
(2015),'QuestionAnsweringonStatisticalLinkedData'.
Hffner,K.
,Walter,S.
,Marx,E.
,Usbeck,R.
,Lehmann,J.
&NgongaNgomo,A.
-C.
(2016),'SurveyonChallengesofQuestionAnsweringintheSemanticWeb',Se-manticWebJournal.
Jain,S.
(2016),Questionansweringoverknowledgebaseusingfactualmemorynet-works,in'ProceedingsofNAACL-HLT'.
Joris,G.
&Ferré,S.
(2013),Scalewelis:ascalablequery-basedfacetedsearchsystemontopofsparqlendpoints,in'Work.
MultilingualQuestionAnsweringoverLinkedData(QALD-3)'.
Kolomiyets,O.
&Moens,M.
-F.
(2011),'Asurveyonquestionansweringtechnologyfromaninformationretrievalperspective',Inf.
Sci.
181(24),5412–5434.
Lopez,V.
,Fernández,M.
,Motta,E.
&Stieler,N.
(2012),'Poweraqua:SupportingUsersinQueryingandExploringtheSemanticWeb',Semant.
web3(3).
Lopez,V.
,Tommasi,P.
,Kotoulas,S.
&Wu,J.
(2016),Queriodali:Questionansweringoverdynamicandlinkedknowledgegraphs,in'InternationalSemanticWebConfer-ence',Springer,pp.
363–382.
Lopez,V.
,Unger,C.
,Cimiano,P.
&Motta,E.
(2013),'Evaluatingquestionansweringoverlinkeddata',WebSemantics:Science,ServicesandAgentsontheWorldWideWeb.
Lopez,V.
,Uren,V.
,Motta,E.
&Pasin,M.
(2007),'Aqualog:Anontology-drivenques-tionansweringsystemfororganizationalsemanticintranets',WebSemantics:Sci-ence,ServicesandAgentsontheWorldWideWeb5(2),72–105.
Lopez,V.
,Uren,V.
,Sabou,M.
&Motta,E.
(2011),'Isquestionansweringtforthesemanticwebasurvey',SemanticWeb2(2).
Lukovnikov,D.
,Fischer,A.
,Lehmann,J.
&Auer,S.
(2017),Neuralnetwork-basedquestionansweringoverknowledgegraphsonwordandcharacterlevel,in'Proceed-ingsofthe26thInternationalConferenceonWorldWideWeb',InternationalWorldWideWebConferencesSteeringCommittee,pp.
1211–1220.
Mahendra,R.
,Wanzare,L.
,Bernardi,R.
,Lavelli,A.
&Magnini,B.
(2011),Acquiringrelationalpatternsfromwikipedia:Acasestudy,in'Proc.
ofthe5thLanguageandTechnologyConference'.
Marginean,A.
(2017),'Questionansweringoverbiomedicallinkeddatawithgrammat-icalframework',SemanticWeb8(4),565–580.
Marx,E.
,Usbeck,R.
,Ngomo,A.
-C.
N.
,Hffner,K.
,Lehmann,J.
&Auer,S.
(2014),Towardsanopenquestionansweringarchitecture,in'Proceedingsofthe10thInter-nationalConferenceonSemanticSystems',ACM.
Mazzeo,G.
M.
&Zaniolo,C.
(2016),'AnsweringControlledNaturalLanguageQues-tionsonRDFKnowledgeBases'.
Nakashole,N.
,Weikum,G.
&Suchanek,F.
(2012),PATTY:ataxonomyofrelationalpatternswithsemantictypes,in'Proceedingsofthe2012JointConferenceonEmpir-icalMethodsinNaturalLanguageProcessingandComputationalNaturalLanguageLearning',AssociationforComputationalLinguistics.
Park,S.
,Shim,H.
&Lee,G.
G.
(2014),ISOFTatQALD-4:Semanticsimilarity-basedquestionansweringsystemoverlinkeddata,in'CLEF'.
CoreTechniquesofQuestionAnsweringSystemsoverKnowledgeBases:aSurvey39Pouran-ebnveyseh,A.
(2016),Cross-LingualQuestionAnsweringUsingProleHMM&UniedSemanticSpace,in'ESWC'.
toappear.
Pradel,C.
,Haemmerlé,O.
&Hernandez,N.
(2012),Asemanticwebinterfaceusingpatterns:theSWIPsystem,in'GraphStructuresforKnowledgeRepresentationandReasoning',Springer.
Reddy,S.
,Lapata,M.
&Steedman,M.
(2014),'Large-scalesemanticparsingwithoutquestion-answerpairs',TransactionsoftheAssociationforComputationalLinguis-tics.
Reddy,S.
,Tckstrm,O.
,Collins,M.
,Kwiatkowski,T.
,Das,D.
,Steedman,M.
&La-pata,M.
(2016),'Transformingdependencystructurestologicalformsforsemanticparsing',TransactionsoftheAssociationforComputationalLinguistics.
Ruseti,S.
,Mirea,A.
,Rebedea,T.
&Trausan-Matu,S.
(2015),QAnswer-EnhancedEn-tityMatchingforQuestionAnsweringoverLinkedData,in'CLEF(WorkingNotes)',CLEF.
Shekarpour,S.
,Marx,E.
,Ngomo,A.
-C.
N.
&Auer,S.
(2015),'Sina:Semanticinter-pretationofuserqueriesforquestionansweringoninterlinkeddata',WebSemantics:Science,ServicesandAgentsontheWorldWideWeb.
Song,D.
,Schilder,F.
,Smiley,C.
,Brew,C.
,Zielund,T.
,Bretz,H.
,Martin,R.
,Dale,C.
,Duprey,J.
,Miller,T.
etal.
(2015),TRdiscover:ANaturalLanguageInterfaceforQueryingandAnalyzingInterlinkedDatasets,in'TheSemanticWeb-ISWC2015',Springer.
Ture,F.
&Jojic,O.
(2016),'Simpleandeffectivequestionansweringwithrecurrentneuralnetworks',arXivpreprintarXiv:1606.
05029.
Unger,C.
,Bühmann,L.
,Lehmann,J.
,NgongaNgomo,A.
-C.
,Gerber,D.
&Cimiano,P.
(2012),Template-basedquestionansweringoverrdfdata,in'Proceedingsofthe21stinternationalconferenceonWorldWideWeb',ACM,pp.
639–648.
Unger,C.
,Forascu,C.
,Lopez,V.
,Ngomo,A.
-C.
N.
,Cabrio,E.
,Cimiano,P.
&Walter,S.
(2014),Questionansweringoverlinkeddata(QALD-4),in'WorkingNotesforCLEF2014Conference'.
Unger,C.
,Forascu,C.
,Lopez,V.
,Ngomo,A.
-C.
N.
,Cabrio,E.
,Cimiano,P.
&Walter,S.
(2015),AnsweringoverLinkedData(QALD-5).
,in'WorkingNotesforCLEF2015Conference'.
Unger,C.
,Ngomo,A.
-C.
N.
,Cabrio,E.
&Cimiano(2016),6thOpenChallengeonQuestionAnsweringoverLinkedData(QALD-6),in'TheSemanticWeb:ESWC2016Challenges.
'.
Usbeck,R.
,Ngomo,A.
-C.
N.
,Bühmann,L.
&Unger,C.
(2015),HAWK–HybridQues-tionAnsweringUsingLinkedData,in'TheSemanticWeb.
LatestAdvancesandNewDomains',Springer.
Walter,S.
,Unger,C.
&Cimiano,P.
(2014),M-ATOLL:aframeworkforthelexical-izationofontologiesinmultiplelanguages,in'TheSemanticWeb–ISWC2014',Springer.
Walter,S.
,Unger,C.
,Cimiano,P.
&Br,D.
(2012),Evaluationofalayeredapproachtoquestionansweringoverlinkeddata,in'TheSemanticWeb–ISWC2012',Springer.
Wang,Z.
,Yan,S.
,Wang,H.
&Huang,X.
(2014),Anoverviewofmicrosoftdeepqasystemonstanfordwebquestionsbenchmark,Technicalreport,Technicalreport,Mi-crosoftResearch.
Wu,F.
&Weld,D.
S.
(2010),OpeninformationextractionusingWikipedia,in'Proceed-40D.
Diefenbachetalingsofthe48thAnnualMeetingoftheAssociationforComputationalLinguistics',AssociationforComputationalLinguistics.
Xu,K.
,Feng,Y.
&Zhao,D.
(2014),'Xser@QALD-4:AnsweringNaturalLanguageQuestionsviaPhrasalSemanticParsing'.
Yahya,M.
,Berberich,K.
,Elbassuoni,S.
,Ramanath,M.
,Tresp,V.
&Weikum,G.
(2012),Naturallanguagequestionsforthewebofdata,in'Proceedingsofthe2012JointConferenceonEmpiricalMethodsinNaturalLanguageProcessingandCom-putationalNaturalLanguageLearning',AssociationforComputationalLinguistics.
Yahya,M.
,Berberich,K.
,Elbassuoni,S.
&Weikum,G.
(2013),Robustquestionan-sweringovertheweboflinkeddata,in'Proceedingsofthe22ndACMinternationalconferenceonConferenceoninformation&knowledgemanagement',ACM.
Yang,M.
-C.
,Duan,N.
,Zhou,M.
&Rim,H.
-C.
(2014),Jointrelationalembeddingsforknowledge-basedquestionanswering.
,in'EMNLP'.
Yang,M.
-C.
,Lee,D.
-G.
,Park,S.
-Y.
&Rim,H.
-C.
(2015),'Knowledge-basedquestionansweringusingthesemanticembeddingspace',ExpertSystemswithApplications.
Yao,X.
(2015),Leanquestionansweringoverfreebasefromscratch.
,in'HLT-NAACL'.
Yao,X.
&VanDurme,B.
(2014),InformationExtractionoverStructuredData:Ques-tionAnsweringwithFreebase.
,in'ACL(1)',Citeseer.
Yates,A.
,Cafarella,M.
,Banko,M.
,Etzioni,O.
,Broadhead,M.
&Soderland,S.
(2007),Textrunner:openinformationextractionontheweb,in'ProceedingsofHumanLan-guageTechnologies:TheAnnualConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguistics:Demonstrations',AssociationforCom-putationalLinguistics.
Yavuz,S.
,Gur,I.
,Su,Y.
,Srivatsa,M.
&Yan,X.
(2016),Improvingsemanticparsingviaanswertypeinference.
,in'EMNLP',pp.
149–159.
Yih,S.
W.
-t.
,Chang,M.
-W.
,He,X.
&Gao,J.
(2015),'Semanticparsingviastagedquerygraphgeneration:Questionansweringwithknowledgebase'.
Yih,W.
-t.
,Richardson,M.
,Meek,C.
,Chang,M.
-W.
&Suh,J.
(2016),Thevalueofsemanticparselabelingforknowledgebasequestionanswering.
,in'ACL(2)'.
Yin,W.
,Yu,M.
,Xiang,B.
,Zhou,B.
&Schütze,H.
(2016),'Simplequestionansweringbyattentiveconvolutionalneuralnetwork',arXivpreprintarXiv:1606.
03391.
Yosef,M.
A.
,Hoffart,J.
,Bordino,I.
,Spaniol,M.
&Weikum,G.
(2011),'Aida:Anonlinetoolforaccuratedisambiguationofnamedentitiesintextandtables',Pro-ceedingsoftheVLDBEndowment4.
Zettlemoyer,L.
S.
&Collins,M.
(2012),'Learningtomapsentencestologicalform:Structuredclassicationwithprobabilisticcategorialgrammars',arXivpreprintarXiv:1207.
1420.
Zhang,Y.
,Liu,K.
,He,S.
,Ji,G.
,Liu,Z.
,Wu,H.
&Zhao,J.
(2016),'Questionanswer-ingoverknowledgebasewithneuralattentioncombiningglobalknowledgeinforma-tion',arXivpreprintarXiv:1606.
00979.
Zhange,Y.
,He,S.
,Liu,K.
&Zhao,J.
(2016),'AJointModelforQuestionAnsweringoverMultipleKnowledgeBases'.
Zhu,C.
,Ren,K.
,Liu,X.
,Wang,H.
,Tian,Y.
&Yu,Y.
(2015),'AGraphTraver-salBasedApproachtoAnswerNon-AggregationQuestionsOverDBpedia',arXivpreprintarXiv:1510.
04780.
Zou,L.
,Huang,R.
,Wang,H.
,Yu,J.
X.
,He,W.
&Zhao,D.
(2014),NaturallanguagequestionansweringoverRDF:agraphdatadrivenapproach,in'Proceedingsofthe2014ACMSIGMODinternationalconferenceonManagementofdata',ACM.
Author Biographies

Dennis Diefenbach is a European researcher. In 2010 he received a pre-degree in physics. He received a B.S. and an M.S. in mathematics in 2012 and 2014 respectively, at the University of Kaiserslautern, Germany. His specialization was in algebra, geometry and computer algebra. In 2015 he received a Marie Sklodowska-Curie fellowship in the framework of the WDAqua project (www.wdaqua.eu). He is a Ph.D. student at Laboratoire Hubert Curien in Saint-Etienne, France. His research interest is in Question Answering over Knowledge Bases and related topics. He is the main developer behind a reusable front-end for QA systems called Trill (https://github.com/WDAqua/Trill) and the QA system called WDAqua-core0, which can be found at www.wdaqua.eu/qa.

Vanessa Lopez has been a researcher at IBM Research Ireland since 2012, where she investigates solutions for harnessing urban and web data as city knowledge, through Linked Data technologies to support data integration, and envision natural ways for users to query, explore and find useful insights across data sources. Her research has been applied to develop context-aware applications for smarter cities and Social and Healthcare. Prior to joining IBM, she was a researcher at KMi (Open University) from 2003, where she investigated NL Question Answering interfaces for the Web of Data and received a PhD degree. She graduated in 2002 with a degree in computer engineering from the Technical University of Madrid (UPM), where she held an internship at the AI Lab. She has co-authored more than 40 publications in high-impact conferences and journals.

Kamal Deep Singh received the B.Tech. degree in electrical engineering from the Indian Institute of Technology Delhi (IITD), Delhi, India, in 2002 and the Ph.D. degree in computer science from the University of Rennes 1, Rennes, France, in 2007. He then joined the Dionysos Group, National Research Institute in Computer Science (INRIA), as a Postdoctoral Researcher, where he codeveloped many components of quality-of-experience estimation tools and worked mainly on the analysis of video-based applications. After that, he was a Postdoctoral Researcher with Telecom Bretagne, Rennes, where he worked on Internet of Things and cognitive radio. He is currently an Associate Professor with the University of Saint-Étienne, Saint-Étienne, France, where he is part of the research team called Connected Intelligence at the Laboratoire Hubert Curien. His research interests include Internet of Things, smart cities, big data, semantic web, quality of experience, and software-defined networking.

Pierre Maret has been a Professor in Computer Science at the University Jean Monnet (University of Lyon), Laboratoire Hubert Curien, in Saint Etienne since 2009. He received a PhD in Computer Science in 1995 and became an Associate Professor at INSA Lyon in 1997. His research interests are data and knowledge modeling, semantic web, knowledge management, social networks, and virtual communities. He conducts research in collaboration with international research groups and industry. He leads the ITN Marie Sklodowska-Curie WDAqua for its French part, and co-chairs the workshop Web Intelligence and Communities hosted at The Web Conference (W3c). He leads the international master track Cyber-Physical and Social Systems (CPS2) and coordinates the international relations of the Faculty of Sciences and Technologies.

Correspondence and offprint requests to: Dennis Diefenbach, Université de Lyon, CNRS UMR 5516 Laboratoire Hubert Curien, F-42023, Saint-Etienne, France. Email: dennis.diefenbach@univ-st-etienne.fr
