匹配关键字排名查询

关键字排名查询  时间:2021-04-30  阅读:()
基本的匹配计算主要内容关键词查询结构化查询字符串的匹配算法允许出错的字符串的匹配算法关键词查询目前关键词查询是最常用的信息查询方式.
又可分为:1.
单个词2.
多个词组成的上下文(高级检索)3.
多个词用and,or或not组成的句子4.
自然语言句子单个词查询用一个最贴切的词表示查询的意思考研原理:文档用词组成的向量或文本.
匹配变成是否文档中含有查询词.
上下文查询类型:短语Phrasee.
g,ShandongUniversity近似句子允许有拼写错误等ShadongUniversity高级查询中组成的上下文:书名作者,出版社,发表时间,价格DefinitionAsyntax(语法)composedofatomsthatretrievedocuments,andofBooleanoperatorswhichworkontheiroperandse.
g,translationAND(syntaxORsyntactic)(布尔表达式)FuzzyBooleanRetrievedocumentsappearinginsomeoperands(TheANDmayrequireittoappearinmoreoperandsthantheOR)RankedhigherwhichhasalargernumberofelementsBoolean查询语法句法自然语言Generalizationof"fuzzyBoolean"AqueryisanenumerationofwordsandcontextqueriesAllthedocumentsmatchingaportionoftheuserqueryareretrievedSetathresholdsothatthedocumentwithverylowweightarenotretrieved结构查询内容与结构混合查询-给出匹配模板进行匹配三种结构-固定结构-超链结构-层次结构Fixed(固定)StructureDocument:afixedsetoffieldsEX:amailhasasender,areceiver,adate,asubjectandabodyfieldSearchforthemailssenttoagivenpersonwith"football"intheSubjectfieldAhypertextisadirectedgraphwherenodesholdsometext(textcontents)thelinksrepresentconnectionsbetweennodesorbetweenpositionsinsidenodes(structuralconnectivity)HypertextHierarchicalStructureHierarchicalStructure层次查询的处理从根到叶逐层限制的多次查询基于树或图的匹配算法StringMatchingdetectingtheoccurrenceofaparticularsubstring(pattern)inanotherstring(text)AstraightforwardSolutionTheKnuth-Morris-PrattAlgorithmStraightforwardsolutionAlgorithm:SimplestringmatchingInput:PandT,thepatternandtextstrings;m,thelengthofP.
Thepatternisassumedtobenonempty.
Output:ThereturnvalueistheindexinTwhereacopyofPbegins,or-1ifnomatchforPisfound.
Definition11.
1NotationforpatternsandtextPthepatternbeingsearchfor;TthetextinwhichPissought;mthelengthofPnthelengthofT,notknowntothealgorithm;mm)/*misthelengthofP*/match=i;//matchfound.
successcase//break;/*exittheloophere*/if(tj==pk){j++;k++;}else//Backupovermatchedcharacters.
intbackup=k-1;//从本次查询点的下一个顶点开始/j=j-backup;k=k-backup;Slidepatternforward,startover.
j++;i=j;returnmatch;ikjPTAnalysisWorst-casecomplexityisin(mn)P=aaabT=aaaaaaaaaaaaaabNeedtobackup.
However,itworksquitewellonaveragefornaturallanguage.
TheKnuth-Morris-PrattAlgorithmPatternMatchingwithFiniteAutomata(自动机)e.
g.
P="AABC"startisthebeginningindexofTIdea:rememberingthematchedpartbyutilizingtheprefixofpatternPanddonotconsiderT.
However,itisnotscalableforthesizeoftermtable.
TheKnuth-Morris-PrattFlowchart(流程图)Characterlabelsareinsidethenodes,notonthearcs.
Eachnodehastwoarrowsouttoothernodes:successlink,orfaillinknextcharacterisreadonlyafterasuccesslinkAspecialnode,node0,called"getnextchar"whichreadinnexttextcharacter.
e.
g.
P="ABABCB"T=ABABABCBConstructionoftheKMPFlowchartDefinition:FaillinksWedefinefail[k]asthelargestr(withr=1)/p1,…,ps与pk-s+1,…,pk-1比较5.
if(ps==pk-1)/*就是它!
*/6.
break;7.
s=fail[s];}/*否则递归向下找*/8.
fail[k]=s+1;}fail[1]=0;fail[2-1]=fail[1]=s=0;fail[2]=1;k=3;s=fail[3-1]=1;p2p1,s=fail[1]=0;fail[3]=s+1=1.
k=4;s=fail[3]=1;p1=p3;fail[4]=s+1=2;Tocomputefail[8],s=fail[7]=5,butp7p5,recomputes=fail[5]=3,butp7p3either,sore-computes=fail[3]=1.
stillp7p1.
Finally,s=fail[1]=0,endthesearch,andfail[8]isassigneds+1=1;PABABABCBfail01123451index12345678TheKnuth-Morris-PrattScanAlgorithmintkmpScan(char[]P,char[]T,intm,int[]fail)intmatch,j,k;match=-1;j=1;k=1;while(endText(T,j)==false)if(k>m)//success//match=j-m;break;if(k==0)//thepointofTmovesahead,andrescan//j++;k=1;elseif(tj==pk)//successatpositionkofP//j++;k++;else//Followfailarrow.
k=fail[k];//failandgobacktothepointofPcontinueloop.
returnmatch;没有使用变量iAnalysisBasedonthesimilarmethodonanalyzingthetimecomplexityofalgorithmKMPsetup,Thescanalgorithmrequires2ncharactercomparisonsintheworstcaseOverall:worstcasecomplexityis(n+m)RK算法输入:TwonbitstringsA(a1,a2,…,an)andB(b1,b2,…,bn)输出:whetherA=B.
传统方法:传输n位依次比较.
指纹机制:定义n位整数根据指纹函数Fp(x)=xmodp,p是一个素数比较Fp(a)是否等于Fp(b),传输位数减小为O(logp)设代表字符集合,x,定义函数ord(x),d=||,ord(x):{0,1,2,…,d-1}对任意的模式P,|P|=m,利用多项式指纹Q(P)=ord(P1)dm-1+ord(P2)dm-2+…+ord(Pm-1)d+ord(Pm)代表P同样对文本T=T1,T2,….
,Tn从左到右计算长度为m的连续子串的指纹,如Q(i)=ord(Ti)dm-1+ord(Ti+1)dm-2+…+ord(Ti+m-2)d+ord(Ti+m-1)并和Q(P)相比较.
若相同,则找到匹配的子串.
起始位置为i,00,b->1aa0*2+0=0,bb->2*1+1=3,03ba->(3-2*1)*2+0=202ab->(2-1*2)*2+1=1ba->(1-0*2)*2+0=2aa->(2-1*2)*2+0=0findtheposition问题是得到的整数无法表示了,过于大取素数q,Q(i)(modq)=Q(p)(modq)Q(i+1)(modq)=(Q(i)–ord(Ti)dm-1)*d)(modq)+ord(Ti+m)但这样的话,当Q(i)(modq)=Q(p)(modq),不一定对应的字符串相同,这时可以逐位进行检查,有人证明该算法的期望时间复杂性为O(m+n),是较好的算法.
特点:可以推广到高维的字符串匹配,是否可以应用到对2维图像的匹配应用到对3维物体的匹配计算具有一定误差的匹配ElementsofDynamicProgrammingConstructingsolutiontoaproblembybuildingitupdynamicallyfromsolutionstosmaller(orsimpler)sub-problemssub-instancesarecombinedtoobtainsub-instancesofincreasingsize,untilfinallyarrivingatthesolutionoftheoriginalinstance.
makeachoiceateachstep,butthechoicemaydependonthesolutionstosub-problemsPrincipleofoptimalitytheoptimalsolutiontoanynontrivialinstanceofaproblemisacombinationofoptimalsolutionstosomeofitssub-instances.
Memorization(foroverlappingsub-problems)avoidcalculatingthesamethingtwice,usuallybykeepingatableofknowresultsthatfillsupassub-instancesaresolved.
Principleofoptimalitytheoptimalsolutiontoanynontrivialinstanceofaproblemisacombinationofoptimalsolutionstosomeofitssub-instances.
Memorization(foroverlappingsub-problems)avoidcalculatingthesamethingtwice,usuallybykeepingatableofknowresultsthatfillsupassub-instancesaresolved.
MemorizationforDynamicprogrammingversionofarecursivealgorithme.
g.
Tradespaceforspeedbystoringsolutionstosub-problemsratherthanre-computingthem.
Assolutionsarefoundforsuproblems,theyarerecordedinadictionary,Beforeanyrecursivecall,sayonsubproblemQ,checkthedictionarytoseeifasolutionforQhasbeenstored.
Ifnosolutionhasbeenstored,goaheadwithrecursivecall.
IfasolutionhasbeenstoredforQ,retrievethestoredsolution,anddonotmaketherecursivecall.
Justbeforereturningthesolution,storeitinthedictionary.
Dynamicprogrammingversionofthefib.
DevelopmentofadynamicprogrammingalgorithmCharacterizethestructureofanoptimalsolutionBreakingaproblemintosub-problemwhetherprincipleofoptimalityapplyRecursivelydefinethevalueofanoptimalsolutiondefinethevalueofanoptimalsolutionbasedonvalueofsolutionstosub-problemsComputethevalueofanoptimalsolutioninabottom-upfashioncomputeinabottom-upfashionandsavethevaluesalongthewaylaterstepsusethesavevaluesofperviousstepsConstructanoptimalsolutionfromcomputedinformation字符串的近似匹配(Approximatestringmatching)Inmanyapplicationswecan'texpectanexactcopy,wewanttofindaapproximatingstringmatchwithatmostkmistakes,e.
g.
,aspellingcorrector.
Wewilldevelopadynamicprogrammingalgorithmforthek-approximatematch.
Definition:Letkbeanonnegativeinteger.
Ak-approximatematchisamatchofPinTthathasatmostkdifferences.
Thedifferencescanbeanyofthefollowingthreetypes,thenameofthedifferenceistheoperationneededonTtobringitclosertoP.
Revise:ThecorrespondingcharactersinPandTaredifferent;Delete:TcontainsacharacterthatismissingfromP.
Insert:TismissingacharacterthatappearsinP.
如何修改T中的子串,使其能匹配上e.
g.
3-approximatematchP:unnecessarilyT:unescessaraly(madethreespellingerrors)Definition11.
6DifferencetableD[i][j]=theminimumnumberofdifferencebetweenP1,…,PiandasegmentofTendingattj.
1im,1jm.
定义:D[0][j]=0;D[i,0]=i;Therewillbeak-approximatematchendingattjforanyjsuchthatD[m][j]k,sowecanstopassoonaswefindanentrylessthanorequaltokinthelastrowofD,whichisthefirstk-approximatematch.
TherulesforthecomputationofDD[i][j]=D[i-1][j-1]ifpi=tj/*noerror*/D[i][j]=D[i-1][j-1]+1ifpitjandrevisetjtopiandbothiandjincrease;D[i][j]=D[i-1][j]+1ifinsertpiintoT,onlyiincrease.
D[i][j]=D[i][j-1]+1ifdeletetjfromTandonlyjincrease.
Eachentryrequiresonlyentriesaboveitandtoitsleftinthetable0000012m12mD[i-1][j-1]D[i-1][j]D[i][j-1]D[i][j]D[i][j]iscalculatedtogettheminimumvaluefromabove4formulaeHaveahsppyday000000000000h1a2p3P4y51111110111112221211222223322221233334333321244444444321D[5][12]=1,t[8.
.
12]hasonemisspellingwithP.
NonserialMonadicDPFormulations:Longest-Common-SubsequenceGivenasequenceA=,asubsequenceofAcanbeformedbydeletingsomeentriesfromA.
GiventwosequencesA=andB=,findthelongestsequencethatisasubsequenceinbothAandB.
IfA=andB=,thelongestcommonsubsequenceofAandBis.
Longest-Common-SubsequenceProblemLetF[i,j]denotethelengthofthelongestcommonsubsequenceofthefirstielementsofAandthefirstjelementsofB.
TheobjectiveoftheLCSproblemistofindF[n,m].
Wecanwrite:左下和右上的最大值ConsidertheLCSoftwoamino-acidsequencesHEAGAWGHEEandPAWHEAE.
TheFtableforcomputingtheLCSofthesequences.
TheLCSisAWHEE.
F[7,10]=5LCS=AWHEE,

VPS云服务器GT线路,KVM虚vps消息CloudCone美国洛杉矶便宜年付VPS云服务器补货14美元/年

近日CloudCone发布了最新的补货消息,针对此前新年闪购年付便宜VPS云服务器计划方案进行了少量补货,KVM虚拟架构,美国洛杉矶CN2 GT线路,1Gbps带宽,最低3TB流量,仅需14美元/年,有需要国外便宜美国洛杉矶VPS云服务器的朋友可以尝试一下。CloudCone怎么样?CloudCone服务器好不好?CloudCone值不值得购买?CloudCone是一家成立于2017年的美国服务器...

木木云35元/月,美国vps服务器优惠,1核1G/500M带宽/1T硬盘/4T流量

木木云怎么样?木木云品牌成立于18年,此为贵州木木云科技有限公司旗下新运营高端的服务器的平台,目前已上线美国中部大盘鸡,母鸡采用E5-267X系列,硬盘全部组成阵列。目前,木木云美国vps进行了优惠促销,1核1G/500M带宽/1T硬盘/4T流量,仅35元/月。点击进入:木木云官方网站地址木木云优惠码:提供了一个您专用的优惠码: yuntue目前我们有如下产品套餐:DV型 1H 1G 500M带宽...

酷锐云香港(19元/月) ,美国1核2G 19元/月,日本独立物理机,

酷锐云是一家2019年开业的国人主机商家,商家为企业运营,主要销售主VPS服务器,提供挂机宝和云服务器,机房有美国CERA、中国香港安畅和电信,CERA为CN2 GIA线路,提供单机10G+天机盾防御,提供美国原生IP,支持媒体流解锁,商家的套餐价格非常美丽,CERA机房月付20元起,香港安畅机房10M带宽月付25元,有需要的朋友可以入手试试。酷锐云自开业以来一直有着良好的产品稳定性及服务态度,支...

关键字排名查询为你推荐
用户googlethinkphpThinkphp和onethink有什么区别开启javascript怎么在浏览器中启用JavaScript?centos6.5如何安装linux centos6.5搜狗360360浏览器为什么不能让我自动登录了特朗普吐槽iPhone为什么这么多人讨厌苹果呢?iPhone配置不足但是iOS流畅度确实很高很强大,性能领先几乎所有国产字节跳动回应TikTok易主#北京字节跳动科技有限公司#小说审核有三面么?我面试了两轮就叫我回家等消息了 要是刷下来了也该告360防火墙在哪里设置电脑或电脑360有联网防火墙吗,在哪里设置设计eset腾讯公司电话腾讯总公司服务热线是多少
中文域名注册 中国十大域名注册商 韩国服务器租用 美国linux主机 中文域名交易中心 yardvps jsp主机 秒解服务器 京东云擎 lamp配置 论坛空间 浙江独立 空间出租 bgp双线 域名接入 卡巴斯基试用版 稳定免费空间 天翼云盘 台湾谷歌 中国电信测速器 更多