He et al. BMC Medical Informatics and Decision Making 2019, 19(Suppl 2):52. https://doi.org/10.1186/s12911-019-0761-8

RESEARCH  Open Access

Applying deep matching networks to Chinese medical question answering: a study and a dataset

Junqing He1,2*, Mingming Fu1,2 and Manshu Tu1,2

From the 4th China Health Information Processing Conference, Shenzhen, China, 1-2 December 2018

Abstract

Background: Medical and clinical question answering (QA) has attracted considerable attention from researchers in recent years.
Despite remarkable advances in this field, development in the Chinese medical domain lags behind, which can be attributed to the difficulty of Chinese text processing and the lack of large-scale datasets. To bridge the gap, this paper introduces a Chinese medical QA dataset and proposes effective methods for the task.
Methods: We first construct a large-scale Chinese medical QA dataset. We then leverage deep matching neural networks to capture the semantic interaction between words in questions and answers. Considering that Chinese word segmentation (CWS) tools may fail to identify clinical terms, we design a module that merges word segments and produces a new representation. It learns common compositions of words or segments using convolutional kernels and selects the strongest signals by windowed pooling.
Results: We identify the best performer among popular CWS tools on our dataset. In our experiments, deep matching models substantially outperform existing methods. Results also show that our proposed semantic clustered representation module improves model performance by up to 5.5% in Precision at 1 and 4.9% in Mean Average Precision.
Conclusions: In this paper, we introduce a large-scale Chinese medical QA dataset and cast the task as a semantic matching problem. We also compare different CWS tools and input units. Of the two state-of-the-art deep matching neural networks, MatchPyramid performs better. Results also show the effectiveness of the proposed semantic clustered representation module.
Keywords: Medical question answering, Chinese word segmentation, Semantic matching, Convolutional neural networks, Deep learning

Background

Automatic medical question answering is a special kind of question answering (QA) that involves medical or clinical knowledge. There is an urgent need to develop advanced automatic medical QA systems because of the shortage of professionals and the inconvenient access to hospitals for some people.
*Correspondence: hejunqing@hccl.ioa.ac.cn
1Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, 100190 Beijing, China
2University of Chinese Academy of Sciences, 100049 Beijing, China

According to an American health survey, 59% of U.S. adults had looked on the Internet for health information, and 77% of them used general search engines [1].
However, users have to sift through numerous query results to find the desired information. For this reason, health consultancy websites have arisen, where thousands of medical professionals and enthusiastic patients answer the questions posed by users. However, this kind of service cannot provide immediate and accurate answers, which is hard to bear for some patients. Moreover, medical QA systems also benefit physicians by providing previous answers from colleagues as a reference.
Traditional Medical QA

Previous studies on medical QA mainly focused on extracting answers from passages in books, health care records, and other clinical materials to assist in decision making [2].
Until now, remarkable progress has been made by researchers, and advanced information retrieval techniques have been applied to this task [3–6]. However, these works remained within the dominant paradigm of Evidence-Based Medicine (EBM), which provides scientific evidence instead of a precise answer, and they targeted only certain types of questions. These limitations made them unsuitable for patients and non-professional people.
Online medical QA has been drawing the attention of scholars because of its tremendous demand. Jain and Dodiya presented rule-based architectures for online medical QA and described question processing and answer retrieval in detail [7]. However, rules fail to cover the linguistic variety encountered in practice.
Wang et al. proposed training word embeddings [8, 9] as semantic representations and evaluating the similarity between words as the correlation score between sentences [10]. However, all the methods above rely on well-designed templates, sophisticated features, and extensive manual tuning.
Chinese Medical QA

Compared with English medical QA systems, research on Chinese QA in the medical field is immature and still at a preliminary stage of development [2].
It is a challenging task with two main difficulties:

1. Chinese word segmentation (CWS) performs worse in the medical domain than in the open domain. For dictionary-based methods, there is no publicly available Chinese clinical knowledge base or standard of clinical terms like the Systematized Nomenclature of Medicine (SNOMED). For data-driven methods, there are no annotated Chinese medical texts with which to train a CWS tool. Moreover, online QA data contain unprofessional descriptions, typing errors, and abbreviations. These phenomena also degrade the performance of CWS tools.
2. There are not enough Chinese medical QA datasets for study. Though data exist from challenges promoting research on medical QA, including the BioASQ challenges [11], CLEF tasks, and TREC medical tracks [12], none of them are in Chinese. To bridge the gap, we construct a large Chinese medical non-factoid QA dataset formulated in natural language, named webMedQA, and make it publicly available.
Even so, Li combined multi-label classification scores and BM25 [13] values for question retrieval over a corpus of pre-built question-answer pairs [14]. He also applied the TextRank [15] algorithm to re-rank candidates. His data were crawled from the web and are not publicly available. The method was word-based and suffered from Chinese word segmentation failures in some cases.
Zhang et al. then proposed a multi-scale convolutional neural network (CNN, [16]) for Chinese medical QA and released a dataset [17] (to our knowledge, the only one that is publicly available). This end-to-end approach eliminates human effort and avoids CWS failure by using character-based input. However, it uses the cosine distance between the CNN representations of questions and answers as the similarity, which cannot capture the relations between words in questions and answers.
Deep Matching in Open-domain QA

For open-domain QA, researchers have produced meaningful work on selecting answers by semantic matching at various levels. Hu et al. proposed ARC-I and ARC-II, which first conduct word-level matching between sentences and then apply CNNs to extract high-level signals from the matching results [18].
Qiu and Huang then upgraded the structure of ARC-I with a tensor layer [19]. Later, long short-term memory (LSTM, [20]) was adopted to construct sentence representations, with cosine similarity used as the score [21]. Wan et al. further improved the representation by strengthening positional information using a bidirectional LSTM [22] and replaced the cosine similarity with a multilayer perceptron (MLP).
Pang et al. then proposed MatchPyramid, which uses CNNs to extract hierarchical signals at the word, phrase, and sentence levels [23]; it can capture rich matching patterns and identify salient signals such as n-gram and n-term matchings.
In this paper, we cast the QA task as a semantic matching problem that selects the most related answer. We first find the best CWS tool and the most suitable input unit for the task. We then apply different state-of-the-art matching models to our task and compare them with baselines. We further propose a CNN-based semantic clustered representation (CSCR) that merges word segments probably split incorrectly by CWS and produces a new representation compatible with deep matching models.
The main contributions of this work can be summarized as follows:

- We construct a large-scale comprehensive Chinese medical QA corpus for research and practical application. To our knowledge, it is the largest publicly available Chinese medical QA corpus so far.
- We propose a neural network to work around the CWS problem for Chinese medical texts. It can semantically cluster characters or word segments into words and clinical terms and then produce a word-level representation.
To the best of our knowledge, it is the first model to improve the results of CWS inputs by post-processing.
- We apply semantic matching approaches to Chinese medical QA and conduct a series of experiments on different input units and matching models. We build a brand-new Chinese medical QA system using the best performer and report a benchmark result on our dataset.
Methods

Dataset Construction and Content

Our Chinese medical question answering (QA) data are collected from professional health-related consultancy websites such as Baidu Doctor [24] and 120Ask [25]. Users first fill in a form with personal information and then describe their illnesses and health questions. These questions are open to all registered clinicians and users until the question proposer chooses the most satisfying answer and closes the question. Doctors and enthusiastic users can post their diagnoses and advice under the questions, with their titles and specialties displayed together with their answers. Questioners can also inquire further if they are interested in one of the answers, which is a rare case in the collected data. The category each question belongs to is also selected by its proposer.
We filtered the collected data for questions with adopted answers, which add up to a total of 65941 items. We then cleaned up all web tags, links, and garbled bytes using our preprocessing tool, leaving only digits, punctuation, and Chinese and English characters. We also dropped questions whose best answers are longer than 500 characters. Questions with more than one best-adopted reply were also removed. Finally, we obtained a set of 63284 questions. We further sampled 4 negative answers for each question for related research such as answer ranking and recommendation. For questions with fewer than 4 negative replies, we randomly sampled answers from other questions as supplementation. We then split the dataset into training, development, and test sets in the proportion 8:1:1 within each category.
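The sampling and splitting procedure above can be sketched as follows. This is our own simplified illustration, not the authors' preprocessing code: the field names are hypothetical, and per-category stratification is omitted for brevity (the paper splits 8:1:1 within each category).

```python
import random

def build_dataset(questions, n_neg=4, seed=0):
    """For each question, keep its adopted answer (label 1) and sample
    n_neg negative answers (label 0), borrowing answers from other
    questions when a question has too few negatives of its own."""
    rng = random.Random(seed)
    pool = [q["best_answer"] for q in questions]  # fallback negatives
    samples = []
    for idx, q in enumerate(questions):
        # 5 fields per sample, mirroring the webMedQA line format:
        # (question ID, binary label, category, question, answer)
        samples.append((q["id"], 1, q["category"], q["text"], q["best_answer"]))
        negs = list(q["other_answers"])
        while len(negs) < n_neg:                  # supplement from other questions
            j = rng.randrange(len(pool))
            if j != idx:
                negs.append(pool[j])
        for a in negs[:n_neg]:
            samples.append((q["id"], 0, q["category"], q["text"], a))
    return samples

def split_8_1_1(samples, seed=0):
    """Split question IDs 8:1:1 into train/dev/test (global, not stratified)."""
    rng = random.Random(seed)
    ids = sorted({s[0] for s in samples})
    rng.shuffle(ids)
    n = len(ids)
    cut1, cut2 = int(0.8 * n), int(0.9 * n)
    groups = (set(ids[:cut1]), set(ids[cut1:cut2]), set(ids[cut2:]))
    return tuple([s for s in samples if s[0] in g] for g in groups)

# toy usage: 10 questions, each with one adopted answer and 2 own negatives
qs = [{"id": i, "category": "c", "text": "q", "best_answer": "a%d" % i,
      "other_answers": ["x", "y"]} for i in range(10)]
train, dev, test = split_8_1_1(build_dataset(qs))
```

Splitting by question ID rather than by sample keeps all five candidates of a question inside one partition, which is necessary for ranking evaluation.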
Zhang et al. also introduced a Chinese medical QA dataset (cMedQA) [17]. A comparison of these two open datasets is listed in Table 1. The statistics of the questions and answers in the training, validation, and test sets are listed in Table 2. The average length of questions is shorter than that of answers, and all lengths are similar across the training, development, and test sets.

In the webMedQA dataset, each line is a QA sample containing 5 fields separated by tabs: a question ID, a binary label indicating whether the answer was adopted, its category, the question, and an answer. The ID is unique for each question, and label 1 indicates the answer is correct. A clinical category is given for each sample but may be wrong in some cases.
Table 1 Comparison of cMedQA and our webMedQA dataset

Dataset              cMedQA    webMedQA
#Ans   Train          94134      253050
       Dev             3774       31685
       Test            3835       31685
       Total         101743      316420
#Ques  Train          50000       50610
       Dev             2000        6337
       Test            2000        6337
       Total          54000       63284
Contain category         No         Yes

The translations of the clinical category, question, and answer are listed in the cells under the original texts; these translations are not included in the dataset. A sample is given in Fig. 1.
There are 23 categories of consultancy in our dataset, covering most of the clinical departments for common diseases and health problems. The number of questions in each category of the webMedQA dataset is listed in Table 3. We can see that Internal Medicine, Surgery, and Gynecology are the divisions of greatest concern in the dataset; accordingly, more medical effort should be devoted to these divisions in hospitals. While the number of inquiries about Internal Medicine reaches 18327, the numbers of questions about Genetics or Medical Examination are under one hundred. The number of questions across categories is severely imbalanced.
Convolutional Semantic Clustered Representation

CNNs have been successfully applied to natural language processing as advanced feature representations in many areas, including text classification [26], sentence modeling [27], and QA [28]. They can capture local features using convolving filters [16]. Based on this consideration, we assume that filters in a CNN can learn to identify clinical terms and generate their representations.

Table 2 The statistics of answers and questions in the webMedQA dataset

                       Train      Dev      Test
Number of Ans.        253050    31685     31685
Avg. Length of Ans.   146.88   147.74    148.50
Max Length of Ans.       500      499       499
Min Length of Ans.         2        2         2
Number of Ques.        50610     6337      6337
Avg. Length of Ques.   86.68    87.43     86.08
Max Length of Ques.     1312     1302      1150
Min Length of Ques.        2        3         5

Fig. 1 A sample in the webMedQA dataset. The 5 fields are on the left with their contents on the right
The Convolutional Semantic Clustered Representation (CSCR) model employs a CNN to automatically recognize words and terms by max pooling over neighborhoods, inspired by Very Deep Convolutional Neural Networks (VDCNN) [29]. The architecture of CSCR is illustrated in Fig. 2.
Let x_i ∈ R^k be the k-dimensional character embedding corresponding to the i-th character in the sentence. A sentence of length n is represented as

x_{1:n} = x_1 ⊕ x_2 ⊕ ... ⊕ x_n    (1)

where ⊕ is the concatenation operator. For a filter w ∈ R^{h×k}, applied to a window of h characters to produce a feature c_i, the convolution operation is formulated as

c_i = f(w · x_{i:i+h−1} + b)    (2)

where x_{i:i+h−1} denotes the concatenation of characters x_i, x_{i+1}, ..., x_{i+h−1}, b ∈ R is a bias term, and f is a non-linear function such as tanh or ReLU [30]. This filter is applied to each possible window of characters in the sentence, with padding, to produce a feature map

c = [c_1, c_2, ..., c_n]    (3)

with c ∈ R^n. Notice that we obtain a feature map of the same length as the sentence because of padding.

Table 3 The frequency distribution over the categories

Internal Medicine     18327    Cosmetology            775
Surgery               13511    Drugs                  529
Gynecology             8691    Health Care            439
Pediatrics             5312    Assistant Inspection   430
Dermatology            4969    Rehabilitation         276
Ophthalmology &        3983    Home Environment       253
  Otolaryngology               Child Education        247
Oncology               2118    Nutrition and Health   172
Mental Health          1536    Slimming               169
Chinese Medicine       1452    Genetics                86
Infectious Diseases    1360    Medical Examination     64
Plastic Surgery        1211    Others                  31
We then perform a max-over-time pooling operation with window size m at every step, with stride length d (d is a factor of n). In practice, we take the max signal within m = 3 and set d = 2 so that the convolution results overlap. We thus obtain a vector of max values ĉ ∈ R^{n/d}:

ĉ = [max{c_{1:m}}, max{c_{1+d:m+d}}, ..., max{c_{n−d:n−d+m}}]    (4)

The idea is to capture the most important composition patterns of characters that form a word or clinical term in each window m. The max-value vector ĉ can be seen as the maximum correlation degrees between all possible terms in a sentence and filter w. In other words, it is a representation of clustered terms with regard to filter w. This is the process by which the terms related to one filter are represented. The model uses multiple filters (with various heights) to obtain multiple representations of clustered terms, and we concatenate the vectors as a matrix z ∈ R^{(n/d)×|filters|}, with each row a semantic representation of the characters in a certain block (with n/d blocks in total):

z = [ĉ^1, ĉ^2, ..., ĉ^{#filters}]    (5)

Given an input matrix of embeddings, unlike the canonical CNN that produces a sentence vector, our model produces a matrix in which each row is a vector of clustered semantic signals. That means our model enables word-level semantic matching in the following operations.
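To make the clustering operation concrete, the following sketch (our own simplified plain-Python illustration, not the authors' implementation) shows one filter's feature map (Eqs. 2-3) with same-length padding, followed by the overlapping windowed max pooling of Eq. 4 with m = 3 and d = 2:

```python
def conv1d_same(embeds, filt, bias=0.0):
    """embeds: list of n k-dim vectors; filt: h x k filter.
    Returns n ReLU features, same length as the sentence (Eqs. 2-3)."""
    h, k = len(filt), len(filt[0])
    n = len(embeds)
    pad = h // 2
    padded = [[0.0] * k] * pad + embeds + [[0.0] * k] * (h - 1 - pad)
    feats = []
    for i in range(n):
        s = bias
        for j in range(h):
            s += sum(w * x for w, x in zip(filt[j], padded[i + j]))
        feats.append(max(s, 0.0))  # ReLU non-linearity
    return feats

def windowed_max_pool(feats, m=3, d=2):
    """Max over each window of size m, moving with stride d < m so
    consecutive windows overlap (Eq. 4); yields n/d clustered signals."""
    return [max(feats[i:i + m]) for i in range(0, len(feats) - d + 1, d)]

# one row of the clustered representation z for a toy 4-character sentence
embeds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
filt = [[1.0, 0.0], [0.0, 1.0]]            # one filter with h=2, k=2
row = windowed_max_pool(conv1d_same(embeds, filt))
```

Running this over every filter and stacking the resulting rows gives the matrix z of Eq. 5, one clustered-signal row per filter.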
Fig. 2 Illustration of CSCR with a character-level input. m is the length of the input sentence and d is the length of the embedding for each character

Deep Matching Networks

After clustering the characters into latent medical terms and representing a sentence as the matrix z, we need to compute the matching degree between the clustered representations of a question-answer pair to identify whether the answer is the best one. We introduce two different models for semantic matching in this paper: multiple positional sentence representation with long short-term memory (MV-LSTM, [22]) and MatchPyramid [23].
MV-LSTM is a basic matching model with steady performance, while MatchPyramid is a state-of-the-art model for text matching.
MV-LSTM

Positional Sentence Representation MV-LSTM utilizes a bidirectional LSTM [20] to generate, for each word, two hidden states that reflect the meaning of the whole sentence from the two directions. The positional sentence representation is produced by concatenating them directly. Using the LSTM, we obtain a hidden vector →h for the forward direction and another ←h for the reverse direction. The representation for position t in a sentence is p_t = [→h_t, ←h_t]^T, where (·)^T stands for the transpose operation on a matrix or vector. For a sentence of length l, with each positional representation of dimension d (here d = #filters), we finally obtain a matrix of size l × d as the semantic representation of the sentence.
Interaction between Two Sentences After the sentences are represented, each position of the question Q and the answer A interacts to compute a similarity score matrix S ∈ R^{m×n} (m is the length of the question matrix Q and n is the length of the answer matrix A) using a bilinear matrix B ∈ R^{d×d} (here d = #filters). Each element sim of matrix S is computed as follows:

sim(Q_i, A_j) = Q_i B A_j^T + b    (6)

where i and j denote the i-th and j-th rows of Q and A respectively, B is the bilinear matrix that re-weights the interactions between different dimensions of the vectors, and b is the bias. In this way, we compute a similarity score matrix of size m × n, with each element denoting the score of the two corresponding vectors.
We do not use the tensor layer, for faster speed and smaller storage. This also simplifies the model and makes its structure clearer.
Interaction Aggregation Once we have computed the similarity score matrix between two sentences, k-max pooling is used to extract the k strongest interactions in the matrix as a vector v [31]. Finally, we use an MLP to aggregate the filtered interaction signals. We utilize two layers of neural networks to generate the final matching scores for a binary classifier as follows:

(s_0, s_1)^T = W_s f(W_r v + b_r) + b_s    (7)

where s_0 and s_1 are the final matching scores of the corresponding classes, W_r and W_s stand for the weights, and b_r and b_s are the corresponding biases. f represents an activation function, which is tanh in our setting.
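The interaction and aggregation steps (Eqs. 6-7) can be sketched in plain Python as follows. This is a hypothetical simplified illustration with our own toy weights, not the MatchZoo implementation used in the experiments:

```python
import math

def bilinear_sim(Q, A, B, b=0.0):
    """S[i][j] = Q_i · B · A_j + b for row vectors Q_i, A_j (Eq. 6)."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    def matvec(M, v):
        return [dot(row, v) for row in M]
    return [[dot(qi, matvec(B, aj)) + b for aj in A] for qi in Q]

def k_max(S, k):
    """Flatten the score matrix and keep the k strongest interactions."""
    flat = sorted((x for row in S for x in row), reverse=True)
    return flat[:k]

def mlp_scores(v, Wr, br, Ws, bs):
    """Two-layer aggregation: (s0, s1) = Ws · tanh(Wr · v + br) + bs (Eq. 7)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, v)) + b)
              for row, b in zip(Wr, br)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(Ws, bs)]

Q = [[1.0, 0.0], [0.0, 1.0]]           # toy question representation (2 x d)
A = [[1.0, 1.0], [0.0, 2.0]]           # toy answer representation (2 x d)
B = [[1.0, 0.0], [0.0, 1.0]]           # identity bilinear matrix
v = k_max(bilinear_sim(Q, A, B), k=2)
scores = mlp_scores(v, [[0.5, 0.5]], [0.0], [[1.0], [-1.0]], [0.0, 0.0])
```

With the identity bilinear matrix the interaction degenerates to dot products; learning B is what lets the model re-weight dimension pairs.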
MatchPyramid

Unlike MV-LSTM, MatchPyramid directly uses the word embeddings as the text representation. In our system, we use the matrix z as the text representation, considering each row as a word embedding. A matching matrix S is computed, with each element sim being the dot product of word embeddings from question Q and answer A respectively:

sim(Q_i, A_j) = Q_i · A_j    (8)

Based on this operation, the matching matrix S corresponds to a grayscale image.
Hierarchical Convolution Several layers of convolution are then performed, each applied to the result of the previous operation. Square kernels and ReLU activation are adopted. A dynamic pooling strategy, a kind of max pooling over rectangular areas, is used afterward. The results are then reshaped into a vector and fed to a fully connected layer to predict the final matching scores s_0 and s_1 for each question-answer pair.
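The first stage of this pipeline can be sketched as follows (our own simplified illustration, not the authors' code): the dot-product matching matrix of Eq. 8, followed by dynamic pooling, i.e. max pooling over rectangles sized so that a variable m × n matrix maps to a fixed output size (output sizes are assumed not to exceed the matrix dimensions):

```python
def matching_matrix(Q, A):
    """S[i][j] = Q_i · A_j: the 'image' that MatchPyramid convolves over."""
    return [[sum(q * a for q, a in zip(qi, aj)) for aj in A] for qi in Q]

def dynamic_max_pool(S, out_rows, out_cols):
    """Max pooling over rectangles: any m x n matrix -> out_rows x out_cols."""
    m, n = len(S), len(S[0])
    rh, cw = -(-m // out_rows), -(-n // out_cols)   # ceiling division
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            block = [S[i][j]
                     for i in range(r * rh, min(m, (r + 1) * rh))
                     for j in range(c * cw, min(n, (c + 1) * cw))]
            row.append(max(block))
        pooled.append(row)
    return pooled

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy question (3 x d)
A = [[1.0, 0.0], [2.0, 0.0]]               # toy answer (2 x d)
S = matching_matrix(Q, A)
P = dynamic_max_pool(S, 1, 1)
```

The fixed-size pooled output is what allows a fully connected layer to follow, even though questions and answers have varying lengths.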
Model Optimization A softmax function is applied to the matching scores of each class for the binary classifier. Cross entropy is then used as the objective function, which the whole model learns to minimize:

loss = −Σ_{i=1}^{N} [ y^{(i)} log p_1^{(i)} + (1 − y^{(i)}) log p_0^{(i)} ]    (9)

p_k = e^{s_k} / (e^{s_0} + e^{s_1}),  k = 0, 1    (10)

where y^{(i)} is the label of the i-th training instance. We apply the stochastic gradient descent method Adam [32] for parameter updates and dropout for regularization [33].
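The objective (Eqs. 9-10) can be written out directly; a minimal plain-Python sketch, with the usual max-subtraction for numerical stability (an implementation detail we add, not stated in the paper):

```python
import math

def softmax2(s0, s1):
    """p_k = e^{s_k} / (e^{s_0} + e^{s_1}) for the binary classifier (Eq. 10)."""
    m = max(s0, s1)                 # subtract the max for numerical stability
    e0, e1 = math.exp(s0 - m), math.exp(s1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def cross_entropy(scores, labels):
    """loss = -sum_i [ y_i log p1_i + (1 - y_i) log p0_i ]  (Eq. 9)."""
    loss = 0.0
    for (s0, s1), y in zip(scores, labels):
        p0, p1 = softmax2(s0, s1)
        loss -= y * math.log(p1) + (1 - y) * math.log(p0)
    return loss

# confident, correct predictions give a near-zero loss
loss = cross_entropy([(0.0, 5.0), (5.0, 0.0)], [1, 0])
```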
Results

In this section, we conduct three experiments on our webMedQA dataset. The first experiment investigates the performance of MV-LSTM with different CWS tools. The second experiment compares the performance of two input units and several matching models. In the third experiment, we validate whether the proposed CSCR module can improve system performance.
Evaluation Metrics

To measure the precision of our models and the ranking of the gold answers, we use Precision at 1 (P@1) and Mean Average Precision (MAP) as evaluation metrics. Since there is only one positive example in a list, P@1 and MAP can be formalized as follows:

P@1 = (1/N) Σ_{i=1}^{N} δ( r(s_1(a_i^+)) = 1 )    (11)

MAP = (1/N) Σ_{i=1}^{N} 1 / r(s_1(a_i^+))    (12)

where N is the number of test ranking lists and a_i^+ is the i-th positive candidate. r(·) denotes the rank of a sentence according to its score, and δ is the indicator function. s_1 is the final score for class 1 produced by the matching models, as in Eq. 7 above.
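For the single-positive case of Eqs. 11-12, both metrics reduce to functions of the positive answer's rank; a small sketch (our own illustration, with hypothetical candidate lists of (score, is_positive) pairs):

```python
def rank_of_positive(candidates):
    """1-based rank of the positive answer after sorting by score."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    return 1 + next(i for i, c in enumerate(ranked) if c[1])

def p_at_1(lists):
    """Fraction of lists whose positive answer is ranked first (Eq. 11)."""
    return sum(rank_of_positive(l) == 1 for l in lists) / len(lists)

def mean_average_precision(lists):
    """With one positive per list, MAP is the mean reciprocal rank (Eq. 12)."""
    return sum(1.0 / rank_of_positive(l) for l in lists) / len(lists)

lists = [
    [(0.9, True), (0.3, False), (0.2, False)],   # positive ranked 1st
    [(0.4, False), (0.8, False), (0.6, True)],   # positive ranked 2nd
]
p1 = p_at_1(lists)
map_score = mean_average_precision(lists)
```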
Experiment on CWS Tools

We use three popular Chinese word segmentation tools, jieba [34], Ansj [35], and Fnlp [36], to split the sentences into tokens and check their influence on the results. We drop all words that appear fewer than twice in the dataset. We use MV-LSTM as the matching model here. We set the number of hidden units of the bi-LSTM to 100, and the dropout rate is set to 0.5. We set length_q = 50 and length_a = 100, since this is the best setting for MV-LSTM. k is set to 50. Word embeddings are randomly initialized with a dimensionality of 200. The hidden size of the LSTM is 100. The learning rate is 0.001, and the Adam [32] optimizer is used. We use MatchZoo [37] and TensorFlow [38] for implementation. We run the models for 40 epochs, pick the best performers on the validation set, and report their results on the test set. The results are displayed in Table 4 below.
As we can see in Table 4, jieba achieves the highest results in both P@1 and MAP. Ansj performs the worst of the three CWS tools. Considering that Ansj has a smaller vocabulary size, we suppose that Ansj cuts sentences into smaller segments.
Experiment on Input Units and Models

In this experiment, we compare the results of word-based and character-based inputs with BM25, multi-CNN, MV-LSTM, and MatchPyramid on our webMedQA dataset. We use the segmented results from jieba as the word-level inputs, since it performs the best. We drop all words and characters that appear fewer than twice in the dataset. The vocabulary size for characters is 9648. For multi-CNN, we set the kernel heights to 3 and 4 as in [17]. We use 80 kernels for each size and set the margin to 0.01 for the hinge loss. The learning rate is 0.001. For MV-LSTM, the parameter settings for word-based input are identical to the first experiment above. For character-based input, we set length_q = 200 and length_a = 400. For MatchPyramid, the convolution kernels are of size [3, 3] and 64 kernels are used. The dynamic pooling size is set to [3, 10]. Other parameters are the same as for MV-LSTM. We train these models for 50 epochs. Results are given in Table 5.
Table 4 Performance of different CWS tools on webMedQA with MV-LSTM

        Vocab Size   P@1 (%)   MAP (%)
Ansj         44140      57.7      73.5
Fnlp        145058      57.9      74.4
jieba        94630      59.3      75.3
Table 5 The performance of different matching models using character-level and word-level inputs

Input Unit   Model            P@1 (%)   MAP (%)
             Random              20.0      45.7
Char         BM25                26.6      51.2
             multiCNN [17]       39.8      60.1
             MV-LSTM             58.1      74.5
             MatchPyramid        66.0      79.3
Word         BM25                23.6      49.0
             multiCNN [17]       40.0      60.5
             MV-LSTM             59.3      75.3
             MatchPyramid        58.8      74.9

We can see from Table 5 that the matching models substantially outperform the baselines. This indicates that capturing semantic similarity at the word level enables the models to achieve great improvement.
BM25 performs the worst, only 6.6% higher than random choice in P@1. This shows that the questions and answers in our dataset share very few common words, which makes the task difficult. The performance of multi-CNN [17] with word-based and character-based input is close, achieving only 40.0% P@1 and 60.1% MAP.
The same input unit performs differently with different matching models. MV-LSTM achieves 59.3% P@1 and 75.3% MAP with word-based input, 1.2% higher than with character-based input. In contrast, MatchPyramid performs better with character-based input, with the highest P@1 of 66.0% and MAP of 79.3%, which are 7.2% and 4.4% better than the word-based results in P@1 and MAP respectively.
Experiment on CSCR

In this experiment, we validate whether the proposed CSCR module can generate better representations given inputs of different granularities. We add CSCR to both MV-LSTM and MatchPyramid. For MV-LSTM, the kernel heights are set to [1, 2, 3] and 64 kernels are used for each size in our experiment. For MatchPyramid, the kernel heights are set to [2, 3, 4]. Other parameter settings are the same as in the second experiment above. The results are shown in Figs. 3 and 4.
Figure 3 compares the P@1 results of models with and without CSCR. Interestingly, CSCR improves the performance of MV-LSTM no matter which input unit it uses, raising the P@1 of character-based input by 3.0%. Character-level and word-level inputs do not influence the performance of the model with CSCR. Moreover, character-based input with CSCR outperforms word-based input without CSCR. Positive results for MV-LSTM can also be observed in Fig. 4.
However, for MatchPyramid, the results are mixed. The system with CSCR using word-based input gains a 5.5% improvement in P@1, and CSCR improves the MAP by 4.2% with word input. But there is no significant improvement when using characters. Using characters directly as input is the best choice for this model: it achieves a record of 66.0% in P@1 and 79.3% in MAP, which serves as a competitive benchmark on webMedQA.

Fig. 3 P@1 of matching models with and without CSCR using different input units

Fig. 4 MAP of matching models with and without CSCR using different input units
Discussion

The most suitable CWS tool for our dataset

Jieba performs best among the three CWS tools in the first experiment. Segmentation results produced by Ansj, Fnlp, and jieba on the same sample are listed in Fig. 5 below. As we can see, both Ansj and Fnlp produce wrong segmentation results: Ansj cuts words into smaller pieces, while Fnlp wrongly merges two words into one. Of these tools, jieba performs best on our medical corpus.
Word-based input vs. character-based input

In our experiments, character-based results overtake word-based input except for multi-CNN and MV-LSTM without CSCR. This can be attributed to CWS failures in the medical domain. There is no significant difference between the two input units with multi-CNN, which is opposite to the conclusion of Zhang et al. [17]. A plausible explanation is that we randomly initialize the word or character embeddings instead of using pre-trained embeddings. Training word vectors on incorrect word segmentation results may harm performance, and Zhang et al. did not compare word-based and character-based inputs without pre-training the embeddings. MV-LSTM with characters as input performs worse than with words. Based on this phenomenon, we infer that MV-LSTM needs input units that already carry word-level meaning, since it fails to cluster semantic units from raw characters.
For MatchPyramid, feeding characters as input performs better. It is plausible that the small convolutional kernels and hierarchical CNN layers in MatchPyramid can capture richer details and generate fine-grained representations, which suits character-level inputs better than word-level inputs.

Fig. 5 The segmentation results of CWS tools on a sample. Segments are separated by /
Deep matching models outperform multi-CNN

Multi-CNN achieves a worse result on our dataset than on the cMedQA dataset. This may be attributed to the difficulty of our task. The cMedQA data come from a single website and are therefore highly consistent, while our data are collected from various websites. Moreover, the average lengths of questions and answers in our dataset are shorter (87 vs. 117 and 147 vs. 216), and our data are more conversational. Therefore, our task is more challenging than cMedQA. Deep matching models outperform multi-CNN substantially. A plausible explanation is that MV-LSTM and MatchPyramid learn the relationships between words or sub-words, which is beyond the ability of multi-CNN.
Take the sample in Fig. 1 as an example. Matching models can learn the correlations between words in the question and the answers (e.g., "hormone", "imbalance", and "acne" in the question and "nurse", "water", "exercises", and "sleep" in the answer) and then select the top scores to make a decision. Multi-CNN filters out the important words and produces a representation for each of these two groups of words; the cosine distance between these representations is then used as the ranking evidence. But the semantic similarity between these two groups of words is low. Therefore, matching models can capture the word-level relationships and achieve better performance.
The influence of CSCR

Comparing the P@1 and MAP results of the matching models with different input units, we find that CSCR boosts the performance of the matching models in most cases (except the P@1 of MatchPyramid with character-based input). This indicates that CSCR helps the models achieve better performance by alleviating the negative effect of input units and the CWS problem. CSCR improves the results of both matching models with word-based input, especially MatchPyramid. This implies that CSCR can produce better representations than raw CWS results and helps ease the CWS problem in the medical domain. Character input with CSCR even achieves better results than word input; therefore, by using the proposed CSCR module, the matching models can achieve better results without CWS than with it. However, no increase is detected for character-level input in P@1 when using MatchPyramid. This is partly attributable to the deep CNNs in MatchPyramid, which can capture semantic meanings and extract high-level features from coarse character representations, making CSCR unnecessary.
Conclusion

In this paper, we introduce a large-scale Chinese medical QA dataset, webMedQA, for research and multiple applications in related fields. We cast medical QA as an answer selection problem and conduct experiments on it. We compare the performance of different CWS tools, and we evaluate the performance of two state-of-the-art matching models using character-based and word-based input units. Experimental results show the necessity of word segmentation when using MV-LSTM and the superiority of MatchPyramid when using characters as input. Confronted with the difficulty of word segmentation for medical terms, we propose a novel architecture that can semantically cluster word segments and produce a new representation. Experimental results reveal a substantial improvement in both metrics compared with vanilla MV-LSTM for both word and character inputs. But for MatchPyramid, character-based input remains the best configuration. Through these experiments, we provide a strong baseline for the QA task on the webMedQA dataset. We hope our paper provides helpful information for research fellows and promotes development in Chinese medical text processing and related fields.
Abbreviations

CNN: Convolutional neural network; CSCR: Convolutional semantic clustered representation; CWS: Chinese word segmentation; LSTM: Long short-term memory; MAP: Mean average precision; MLP: Multilayer perceptron; P@1: Precision at 1; QA: Question answering

Funding

Publication costs are funded by the National Natural Science Foundation of China (Nos. 11590770-4, 61650202, 11722437, U1536117, 61671442, 11674352, 11504406, 61601453) and the National Key Research and Development Program (Nos. 2016YFB0801203, 2016YFC0800503, 2017YFB1002803).
Availability of data and materials

The webMedQA dataset will be released at https://github.com/hejunqing/webMedQA after publication.

About this supplement

This article has been published as part of BMC Medical Informatics and Decision Making Volume 19 Supplement 2, 2019: Proceedings from the 4th China Health Information Processing Conference (CHIP 2018). The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-19-supplement-2.
Authors' contributions

JH conceived the study and developed the algorithm. MF and MT preprocessed and constructed the dataset. JH and MF conducted the experiments. JH wrote the first draft of the manuscript. All the authors participated in the preparation of the manuscript and approved the final version.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.
Heetal.
BMCMedicalInformaticsandDecisionMaking2019,19(Suppl2):52Page100of197Publisher'sNoteSpringerNatureremainsneutralwithregardtojurisdictionalclaimsinpublishedmapsandinstitutionalaffiliations.
Published:9April2019References1.
Internet&AmericanLifeProject.
http://www.
pewinternet.
org/Reports/2013/Health-online.
aspx.
Accessed13March2018.
2.
ZhouX,WuB,ZhouQ.
Adepthevidencescorefusionalgorithmforchinesemedicalintelligencequestionansweringsystem.
JHealthcEng.
2018;2018:1–8.
3.
LeeM,CiminoJ,ZhuHR,SableC,ShankerV,ElyJ,YuH.
Beyondinformationretrieval–medicalquestionanswering.
In:AMIAAnnualSymposiumProceedings.
Washington:AmericanMedicalInformaticsAssociation;2006.
p.
469.
4.
AthenikosSJ,HanH,BrooksAD.
Aframeworkofalogic-basedquestion-answeringsystemforthemedicaldomain(loqas-med).
In:Proceedingsofthe2009ACMSymposiumonAppliedComputing.
Honolulu:ACM;2009.
p.
847–51.
5.
MurdockJW,FanJ,LallyA,ShimaH,BoguraevB.
Textualevidencegatheringandanalysis.
IBMJResDev.
2012;56(3.
4):8–1.
6.
AbachaAB,ZweigenbaumP.
Means:Amedicalquestion-answeringsystemcombiningnlptechniquesandsemanticwebtechnologies.
InfProcessManag.
2015;51(5):570–94.
7.
JainS,DodiyaT.
Rulebasedarchitectureformedicalquestionansweringsystem.
In:ProceedingsoftheSecondInternationalConferenceonSoftComputingforProblemSolving(SocProS2012).
Jaipur:Springer;2014.
p.
1225–33.
8.
MikolovT,SutskeverI,ChenK,CorradoGS,DeanJ.
Distributedrepresentationsofwordsandphrasesandtheircompositionality.
In:AdvancesinNeuralInformationProcessingSystems26.
NewYork:CurranAssociates,Inc.
;2013.
p.
3111–119.
9.
PenningtonJ,SocherR,ManningCD.
Glove:Globalvectorsforwordrepresentation.
In:Proceedingsofthe2014ConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP).
Doha:AssociationforComputationalLinguistics;2014.
p.
1532–43.
10.
WangJ,ManC,ZhaoY,WangF.
Ananswerrecommendationalgorithmformedicalcommunityquestionansweringsystems.
In:2016IEEEInternationalConferenceonServiceOperationsandLogisticsandInformatics(SOLI).
Beijing:IEEE;2016.
p.
139–44.
11.
BalikasG,KritharaA,PartalasI,PaliourasG.
Bioasq:Achallengeonlarge-scalebiomedicalsemanticindexingandquestionanswering.
In:MultimodalRetrievalintheMedicalDomain.
Cham:Springer;2015.
p.
26–39.
12. Roberts K, Simpson M, Demner-Fushman D, Voorhees E, Hersh W. State-of-the-art in biomedical literature retrieval for clinical cases: a survey of the TREC 2014 CDS track. Inf Retr J. 2016;19(1-2):113–48.
13. Singhal A, Salton G, Mitra M, Buckley C. Document length normalization. Inf Process Manag. 1996;32(5):619–33.
14. Li C. Research and application on intelligent inquiry guidance and medical question answering methods. Master's thesis, Dalian University of Technology, Computer Science Department. 2016.
15. Mihalcea R, Tarau P. TextRank: Bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP). Barcelona: Association for Computational Linguistics; 2004.
16. Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86(11):2278–324.
17. Zhang S, Zhang X, Wang H, Cheng J, Li P, Ding Z. Chinese medical question answering using end-to-end character-level multi-scale CNNs. Appl Sci. 2017;7(8):767.
18. Hu B, Lu Z, Li H, Chen Q. Convolutional neural network architectures for matching natural language sentences. In: Advances in Neural Information Processing Systems 27 (NIPS 2014). Montreal: Curran Associates, Inc.; 2014. p. 2042–050.
19. Qiu X, Huang X. Convolutional neural tensor network architecture for community-based question answering. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015). Buenos Aires: AAAI Press; 2015. p. 1305–11.
20. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
21. Palangi H, Deng L, Shen Y, Gao J, He X, Chen J, Song X, Ward R. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Trans Audio, Speech Lang Process (TASLP). 2016;24(4):694–707.
22. Wan S, Lan Y, Guo J, Xu J, Pang L, Cheng X. A deep architecture for semantic matching with multiple positional sentence representations. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI'16. Phoenix: AAAI Press; 2016. p. 2835–841.
23. Pang L, Lan Y, Guo J, Xu J, Wan S, Cheng X. Text matching as image recognition. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix: AAAI Press; 2016. p. 2793–799.
24. Baidu Doctor. https://muzhi.baidu.com. Accessed 18 July 2017.
25. 120Ask. https://www.120ask.com. Accessed 18 July 2017.
26. Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: Association for Computational Linguistics; 2014. p. 1746–51.
27. Shen Y, He X, Gao J, Deng L. Learning semantic representations using convolutional neural networks for web search. In: Proceedings of the 23rd International Conference on World Wide Web. WWW '14 Companion. Seoul: ACM; 2014. p. 373–4.
28. Feng M, Xiang B, Glass MR, Wang L, Zhou B. Applying deep learning to answer selection: A study and an open task. CoRR. 2015;abs/1508.01585.
29. Conneau A, Schwenk H, Barrault L, Lecun Y. Very deep convolutional networks for text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Valencia: Association for Computational Linguistics; 2017. p. 1107–16.
30. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Gordon G, Dunson D, Dudík M, editors. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 15. Fort Lauderdale: PMLR; 2011. p. 315–23.
31. Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Baltimore: Association for Computational Linguistics; 2014. p. 655–65.
32. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12(Jul):2121–159.
33. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. CoRR. 2012;abs/1207.0580. http://arxiv.org/abs/1207.0580.
34. Jieba Project. https://github.com/fxsjy/jieba. Accessed 14 Sept 2017.
35. Ansj Project. https://github.com/NLPchina/ansj_seg. Accessed 14 Sept 2017.
36. FNLP Project. https://github.com/FudanNLP/fnlp. Accessed 14 Sept 2017.
37. Fan Y, Pang L, Hou J, Guo J, Lan Y, Cheng X. MatchZoo: A toolkit for deep text matching. CoRR. 2017;abs/1707.07270. http://arxiv.org/abs/1707.07270.
38. Tensorflow. https://www.tensorflow.org. Accessed 15 Sept 2017.