Matching | Keyword Ranking Query
Posted: 2021-04-30
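Basic Matching Computation

Main contents:
1. Keyword queries
2. Structured queries
3. String-matching algorithms
4. Error-tolerant (approximate) string-matching algorithms

Keyword Queries
Keyword queries are currently the most common way of searching for information. They can be further divided into:
1. a single word;
2. a context made up of several words (advanced search);
3. a sentence of several words combined with and, or, or not;
4. a natural-language sentence.

Single-word query: the meaning of the query is expressed by the single most fitting word. Underlying principle: a document is treated as a vector or text made up of words, so matching reduces to whether the document contains the query word.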
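A minimal sketch of the single-word case and of words combined with and, or, not. It assumes a toy in-memory inverted index; the class `KeywordIndex` and its methods are illustrative names, not taken from the notes. A single-word query becomes a lookup of the documents that contain the word. AND, OR, and NOT become set intersection, union, and complement over document ids.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index for keyword queries (illustration only).
class KeywordIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final int numDocs;

    KeywordIndex(List<String> docs) {
        numDocs = docs.size();
        for (int id = 0; id < docs.size(); id++) {
            // Split on runs of non-word characters; lower-case so matching ignores case.
            for (String word : docs.get(id).toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    postings.computeIfAbsent(word, w -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    // Single-word query: ids of the documents that contain the word.
    Set<Integer> word(String w) {
        return new TreeSet<>(postings.getOrDefault(w.toLowerCase(), Collections.emptySet()));
    }

    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);          // intersection
        return r;
    }

    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);             // union
        return r;
    }

    // NOT is taken relative to the whole collection of documents.
    Set<Integer> not(Set<Integer> a) {
        Set<Integer> r = new TreeSet<>();
        for (int id = 0; id < numDocs; id++) {
            if (!a.contains(id)) r.add(id);
        }
        return r;
    }
}
```

Consider the documents "syntax of translation", "syntactic translation rules", "machine translation", and "syntax trees". The query translation AND (syntax OR syntactic) is `KeywordIndex.and(idx.word("translation"), KeywordIndex.or(idx.word("syntax"), idx.word("syntactic")))`, which yields {0, 1}.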
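Context queries come in several types:
Phrase: e.g., "Shandong University".
Approximate phrase: spelling errors and similar differences are tolerated, e.g., "Shadong University".
Context made up of fields in advanced search: title, author, publisher, publication date, price.

Boolean Queries
Definition: a syntax composed of atoms that retrieve documents, and of Boolean operators that work on their operands, e.g., translation AND (syntax OR syntactic) (a Boolean expression).
Fuzzy Boolean: retrieve documents that appear in some of the operands (the AND may require a document to appear in more operands than the OR); documents that match a larger number of elements are ranked higher.

Natural-Language Queries
A generalization of "fuzzy Boolean": a query is an enumeration of words and context queries. All the documents matching a portion of the user query are retrieved, and a threshold is set so that documents with very low weight are not retrieved.

Structured Queries
Mixed content-and-structure queries: a matching template is given, and documents are matched against it. Three kinds of structure are considered: fixed structure, hypertext structure, and hierarchical structure.
Fixed structure: a document has a fixed set of fields. Example: a mail has a sender, a receiver, a date, a subject, and a body field; search for the mails sent to a given person with "football" in the Subject field.
Hypertext: a hypertext is a directed graph where nodes hold some text (text contents) and the links represent connections between nodes or between positions inside nodes (structural connectivity).
Hierarchical structure: a hierarchical query is processed as a sequence of queries that restrict the candidates level by level, from the root to the leaves, using tree- or graph-based matching algorithms.

String Matching
String matching is detecting the occurrence of a particular substring (the pattern) in another string (the text). Two solutions are presented: a straightforward solution and the Knuth-Morris-Pratt algorithm.

Straightforward Solution
Algorithm: simple string matching
Input: P and T, the pattern and text strings; m, the length of P. The pattern is assumed to be nonempty.
Output: the return value is the index in T where a copy of P begins, or -1 if no match for P is found.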
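Definition 11.1 Notation for patterns and text
P: the pattern being searched for; T: the text in which P is sought; m: the length of P; n: the length of T, not known to the algorithm; m ≤ n.

[Figure: P aligned under T, with i the current guess for where P begins in T, j the current position in T, and k the current position in P.]

    // Simple string matching. P and T are 0-based Java arrays, while i, j, k are
    // 1-based positions as in the notation above: t_j is T[j - 1], p_k is P[k - 1].
    static int simpleScan(char[] P, char[] T, int m) {
        int match = -1;  // value to return
        int i = 1;       // current guess where P begins in T
        int j = 1;       // index of current character in T
        int k = 1;       // index of current character in P
        // j <= T.length means "not at the end of T"; testing k > m after the loop
        // also catches a match that ends at the last character of T.
        while (j <= T.length && k <= m) {
            if (T[j - 1] == P[k - 1]) {
                j++;
                k++;
            } else {
                // Back up over matched characters.
                int backup = k - 1;
                j = j - backup;
                k = k - backup;
                // Slide pattern forward and start over from the position after i.
                j++;
                i = j;
            }
        }
        if (k > m) match = i;  // match found
        return match;
    }

Analysis: the worst-case complexity is in Θ(mn). For example, with P = aaab and T = aaaaaaaaaaaaaab, the algorithm must back up after every unsuccessful attempt. However, it works quite well on average for natural-language text.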
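The Θ(mn) worst case can be checked concretely with a separate, 0-based helper. `SimpleScanCost` is a hypothetical name. It uses the same back-up-and-slide strategy and counts character comparisons. For P = a^(m-1)b and T = a^(n-1)b, every unsuccessful attempt compares all m characters before failing. That gives (n - m + 1)·m comparisons in total.

```java
// Hypothetical helper: the simple (brute-force) scan, instrumented to count comparisons.
class SimpleScanCost {
    // Returns {0-based index where P begins in T, or -1; number of character comparisons}.
    static int[] scan(String p, String t) {
        int m = p.length(), n = t.length();
        int i = 0, j = 0, k = 0;   // i: current guess, j: position in T, k: position in P
        int comparisons = 0;
        while (j < n && k < m) {
            comparisons++;
            if (t.charAt(j) == p.charAt(k)) {
                j++;
                k++;
            } else {
                j = j - k + 1;     // back up over the k matched characters, then slide by one
                k = 0;
                i = j;
            }
        }
        return new int[] { k == m ? i : -1, comparisons };
    }
}
```

`SimpleScanCost.scan("aaab", "a".repeat(14) + "b")` returns {11, 48}. The pattern is found at 0-based index 11 after 12 attempts of 4 comparisons each.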
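The Knuth-Morris-Pratt Algorithm

Pattern Matching with Finite Automata
Example: P = "AABC"; in the automaton, start is the state at the beginning of T.
Idea: remember how much of P has been matched by using the prefixes of the pattern P, so that the scan never has to reconsider (back up in) T. However, this does not scale well, because the transition table needs an entry for every character of the alphabet in every state.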
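Below is a sketch of the finite-automaton idea for a fixed alphabet. It assumes the pattern is nonempty and every character of P and T belongs to that alphabet. The class name and the `alphabet` parameter are ours. The construction is the common "restart state" method, not one given in the notes. The table has (m + 1) × |Σ| entries, which is the size problem mentioned above.

```java
// Hypothetical sketch: string matching with a deterministic finite automaton.
// Assumption: the pattern is nonempty and every character of P and T is in `alphabet`.
class AutomatonMatcher {
    // delta[q][c] = next state after reading alphabet character c in state q,
    // where state q means "the last q characters read equal p_1..p_q".
    static int[][] build(String p, String alphabet) {
        int m = p.length(), sigma = alphabet.length();
        int[][] delta = new int[m + 1][sigma];   // zero-filled: from state 0, a mismatch stays in 0
        delta[0][alphabet.indexOf(p.charAt(0))] = 1;
        int x = 0;                               // state reached on p_2..p_q (the restart state)
        for (int q = 1; q <= m; q++) {
            for (int c = 0; c < sigma; c++) {
                delta[q][c] = delta[x][c];       // on a mismatch, behave like the restart state
            }
            if (q < m) {
                int next = alphabet.indexOf(p.charAt(q));
                delta[q][next] = q + 1;          // on a match, advance
                x = delta[x][next];
            }
        }
        return delta;
    }

    // Returns the 0-based index where P first occurs in T, or -1; each character of T is read once.
    static int search(String p, String t, String alphabet) {
        int[][] delta = build(p, alphabet);
        int m = p.length(), q = 0;
        for (int j = 0; j < t.length(); j++) {
            q = delta[q][alphabet.indexOf(t.charAt(j))];
            if (q == m) return j - m + 1;
        }
        return -1;
    }
}
```

For example, `AutomatonMatcher.search("AABC", "AAABCA", "ABC")` returns 1. After reading "AAA" the automaton stays in state 2, because "AA" is still a prefix of P, so T is never re-read.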
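The Knuth-Morris-Pratt Flowchart
Character labels are inside the nodes, not on the arcs.
Each node has two arrows out to other nodes: a success link and a fail link.
The next character is read only after a success link.
A special node, node 0, called "get next char", reads in the next text character.
Example: P = "ABABCB", T = "ABABABCB".

Construction of the KMP Flowchart
Definition (fail links): we define fail[k] as the largest r (with r < k) such that p1, …, p(r-1) matches p(k-r+1), …, p(k-1); by convention, fail[1] = 0.
The fail links are computed by trying, for each k, to extend the match found for k - 1. Compare p1, …, ps with p(k-s), …, p(k-1); if they differ, keep following the fail links downward.

    // Computes fail[1..m] for the pattern P (fail must have m + 1 entries; fail[0] is unused).
    // P is a 0-based Java array, so p_k is P[k - 1].
    static void kmpSetup(char[] P, int m, int[] fail) {
        int k, s;
        fail[1] = 0;
        for (k = 2; k <= m; k++) {
            s = fail[k - 1];
            while (s >= 1) {
                // p_1..p_(s-1) already matches p_(k-s)..p_(k-2); compare p_s with p_(k-1).
                if (P[s - 1] == P[k - 2])
                    break;      // found it
                s = fail[s];    // otherwise, keep searching down the fail links
            }
            fail[k] = s + 1;
        }
    }

Worked example for P = ABABABCB: fail[1] = 0. For k = 2, s = fail[2-1] = fail[1] = 0, so fail[2] = 1. For k = 3, s = fail[3-1] = 1; p2 ≠ p1, so s = fail[1] = 0 and fail[3] = s + 1 = 1.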
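For k = 4, s = fail[3] = 1 and p1 = p3, so fail[4] = s + 1 = 2. To compute fail[8]:
1. s = fail[7] = 5, but p7 ≠ p5.
2. Recompute s = fail[5] = 3, but p7 ≠ p3 either.
3. Recompute s = fail[3] = 1; still p7 ≠ p1.
4. Finally, s = fail[1] = 0, which ends the search, and fail[8] is assigned s + 1 = 1.

    index  1 2 3 4 5 6 7 8
    P      A B A B A B C B
    fail   0 1 1 2 3 4 5 1

The Knuth-Morris-Pratt Scan Algorithm

    // P and T are 0-based Java arrays, while j and k are 1-based positions:
    // t_j is T[j - 1] and p_k is P[k - 1]; fail[1..m] comes from the setup step.
    static int kmpScan(char[] P, char[] T, int m, int[] fail) {
        int match = -1;
        int j = 1, k = 1;
        // j <= T.length plays the role of endText(T, j) == false; testing k > m
        // after the loop also catches a match that ends at the last character of T.
        while (j <= T.length && k <= m) {
            if (k == 0) {
                // Fell off the front of P: the pointer into T moves ahead, and P is rescanned.
                j++;
                k = 1;
            } else if (T[j - 1] == P[k - 1]) {
                // Success at position k of P.
                j++;
                k++;
            } else {
                // Follow the fail arrow back to an earlier position in P.
                k = fail[k];
            }
        }
        if (k > m) match = j - m;  // success: P begins at t_(j-m)
        return match;
    }

Unlike the straightforward algorithm, this version does not use the variable i.

Analysis: the time complexity of the KMP setup algorithm is analyzed in a similar way. By that argument, the scan algorithm requires at most 2n character comparisons in the worst case. Overall, the worst-case complexity is Θ(n + m).

Rabin-Karp (RK) Algorithm
Input: two n-bit strings A = (a1, a2, …, an) and B = (b1, b2, …, bn).
Output: whether A = B.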
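Traditional method: transmit all n bits and compare them one by one.

Fingerprint method: treat each string as an n-bit integer and apply the fingerprint function $F_p(x) = x \bmod p$, where p is a prime. Then compare whether $F_p(a) = F_p(b)$; the number of bits transmitted drops to O(log p).

Let Σ be the character set. For x ∈ Σ define ord(x), with d = |Σ| and ord: Σ → {0, 1, 2, …, d-1}. For any pattern P with |P| = m, represent P by the polynomial fingerprint
$$Q(P) = \operatorname{ord}(P_1)\,d^{m-1} + \operatorname{ord}(P_2)\,d^{m-2} + \dots + \operatorname{ord}(P_{m-1})\,d + \operatorname{ord}(P_m).$$
Likewise, for the text T = T1, T2, …, Tn, compute from left to right the fingerprint of every substring of length m,
$$Q(i) = \operatorname{ord}(T_i)\,d^{m-1} + \operatorname{ord}(T_{i+1})\,d^{m-2} + \dots + \operatorname{ord}(T_{i+m-2})\,d + \operatorname{ord}(T_{i+m-1}),$$
and compare it with Q(P). If they are equal, a matching substring has been found.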
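The match then starts at position i (1 ≤ i ≤ n - m + 1). Example: Σ = {a, b} with ord(a) = 0 and ord(b) = 1, so d = 2. For P = aa, Q(P) = 0·2 + 0 = 0. Sliding a window of length 2 along the text, each fingerprint is obtained from the previous one:

    window  fingerprint
    bb      2·1 + 1 = 3
    ba      (3 - 1·2)·2 + 0 = 2
    ab      (2 - 1·2)·2 + 1 = 1
    ba      (1 - 0·2)·2 + 0 = 2
    aa      (2 - 1·2)·2 + 0 = 0

The last window has fingerprint 0 = Q(P), which gives the position of the match.

The problem is that these integers quickly become too large to represent. Instead, choose a prime q and compare $Q(i) \bmod q$ with $Q(P) \bmod q$, updating the fingerprint with
$$Q(i+1) \bmod q = \big((Q(i) - \operatorname{ord}(T_i)\,d^{m-1})\cdot d + \operatorname{ord}(T_{i+m})\big) \bmod q.$$
However, when $Q(i) \bmod q = Q(P) \bmod q$, the corresponding strings are not necessarily the same, so the candidate must then be checked character by character. It has been shown that the expected time complexity of this algorithm is O(m + n), which makes it a good algorithm.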
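Below is a sketch of the RK scan described above. Every fingerprint stays reduced mod q, and each fingerprint hit is verified character by character. The alphabet string (which defines ord), the prime q, and the 0-based return value are assumptions of this sketch. With a tiny prime such as q = 2, hits that are not real matches do occur, and the verification step rejects them.

```java
// Hypothetical sketch of Rabin-Karp matching with fingerprints reduced mod a prime q.
// Assumptions: P is nonempty, every character is in `alphabet`, ord(c) = alphabet.indexOf(c),
// d = alphabet.length(), and q is a prime small enough that q * d fits easily in a long.
class RabinKarp {
    static int ord(String alphabet, char c) {
        return alphabet.indexOf(c);
    }

    // Returns the 0-based index of the first occurrence of P in T, or -1.
    static int search(String p, String t, String alphabet, long q) {
        int m = p.length(), n = t.length(), d = alphabet.length();
        if (m > n) return -1;
        long dm1 = 1;                                  // d^(m-1) mod q
        for (int i = 1; i < m; i++) dm1 = (dm1 * d) % q;
        long qp = 0, qt = 0;                           // Q(P) mod q and fingerprint of T[0..m-1] mod q
        for (int i = 0; i < m; i++) {                  // Horner's rule
            qp = (qp * d + ord(alphabet, p.charAt(i))) % q;
            qt = (qt * d + ord(alphabet, t.charAt(i))) % q;
        }
        for (int i = 0; ; i++) {
            // Equal fingerprints do not guarantee equal strings, so verify character by character.
            if (qp == qt && t.regionMatches(i, p, 0, m)) return i;
            if (i + m >= n) return -1;
            // Q(i+1) = ((Q(i) - ord(T_i) d^(m-1)) * d + ord(T_(i+m))) mod q, kept non-negative.
            long drop = (ord(alphabet, t.charAt(i)) * dm1) % q;
            qt = (((qt - drop + q) % q) * d + ord(alphabet, t.charAt(i + m))) % q;
        }
    }
}
```

With the text bbabaa, whose length-2 windows are bb, ba, ab, ba, aa, the call `RabinKarp.search("aa", "bbabaa", "ab", 13)` returns 4, the 0-based start of the last window.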
Features: the method can be generalized to higher-dimensional string matching. Can it be applied to matching 2-D images, or to matching 3-D objects?

Computing Matches That Tolerate Errors

Elements of Dynamic Programming
Constructing a solution to a problem by building it up dynamically from solutions to smaller (or simpler) sub-problems: sub-instances are combined to obtain sub-instances of increasing size, until finally arriving at the solution of the original instance.
A choice is made at each step, but the choice may depend on the solutions to sub-problems.
Principle of optimality: the optimal solution to any nontrivial instance of a problem is a combination of optimal solutions to some of its sub-instances.
Memoization (for overlapping sub-problems): avoid calculating the same thing twice, usually by keeping a table of known results that fills up as sub-instances are solved.
Memoization: a Dynamic-Programming Version of a Recursive Algorithm (e.g., fib)
Trade space for speed by storing solutions to sub-problems rather than re-computing them:
1. As solutions are found for sub-problems, they are recorded in a dictionary.
2. Before any recursive call, say on sub-problem Q, check the dictionary to see whether a solution for Q has been stored.
3. If no solution has been stored, go ahead with the recursive call.
4. If a solution has been stored for Q, retrieve the stored solution and do not make the recursive call.
5. Just before returning the solution, store it in the dictionary.

Dynamic-programming version of fib.
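A minimal Java sketch of both versions of fib. A `HashMap` stands in for the dictionary; that is our choice, since the notes do not fix a data structure. The memoized version checks the dictionary at the start of each call, which has the same effect as checking before each recursive call. The bottom-up version solves the sub-problems in increasing order and keeps only the last two values. Results overflow `long` for n > 92.

```java
import java.util.HashMap;
import java.util.Map;

class Fib {
    private static final Map<Integer, Long> soln = new HashMap<>();  // the dictionary

    // Memoized (top-down) version, with F(0) = 0 and F(1) = 1.
    static long fibMemo(int n) {
        if (n <= 1) return n;
        Long stored = soln.get(n);
        if (stored != null) return stored;   // already solved: no recursive calls
        long result = fibMemo(n - 1) + fibMemo(n - 2);
        soln.put(n, result);                 // store just before returning
        return result;
    }

    // Bottom-up dynamic-programming version.
    static long fibBottomUp(int n) {
        if (n <= 1) return n;
        long prev = 0, cur = 1;              // F(i-2), F(i-1)
        for (int i = 2; i <= n; i++) {
            long next = prev + cur;
            prev = cur;
            cur = next;
        }
        return cur;
    }
}
```

Both calls `Fib.fibMemo(50)` and `Fib.fibBottomUp(50)` return 12586269025. Without the dictionary, the plain recursion would make an exponential number of calls.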
Development of a Dynamic-Programming Algorithm
1. Characterize the structure of an optimal solution: break the problem into sub-problems and check whether the principle of optimality applies.
2. Recursively define the value of an optimal solution based on the values of solutions to sub-problems.
3. Compute the value of an optimal solution in a bottom-up fashion, saving the values along the way so that later steps can use the saved values of previous steps.
4. Construct an optimal solution from the computed information.

Approximate String Matching
In many applications we cannot expect an exact copy; instead, we want to find an approximate match with at most k mistakes, e.g., in a spelling corrector.
We will develop a dynamic-programming algorithm for the k-approximate match.
Definition: let k be a nonnegative integer. A k-approximate match is a match of P in T that has at most k differences. The differences can be any of the following three types; the name of each difference is the operation needed on T to bring it closer to P.
Revise: the corresponding characters in P and T are different.
Delete: T contains a character that is missing from P.
Insert: T is missing a character that appears in P.
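In other words, the question is how a substring of T can be modified so that it matches P. For example, a 3-approximate match:
P: unnecessarily
T: unescessaraly (three spelling errors were made)

Definition 11.6 Difference table: D[i][j] is the minimum number of differences between p1, …, pi and a segment of T ending at tj, for 1 ≤ i ≤ m and 1 ≤ j ≤ n.
We also define D[0][j] = 0 (a match may begin anywhere in T) and D[i][0] = i (all i characters of P are missing).
There will be a k-approximate match ending at tj for any j such that D[m][j] ≤ k. So we can stop as soon as we find an entry less than or equal to k in the last row of D; that entry marks the first k-approximate match.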
The rules for computing D:
D[i][j] = D[i-1][j-1] if pi = tj (no error);
D[i][j] = D[i-1][j-1] + 1 if pi ≠ tj, revising tj to pi (both i and j increase);
D[i][j] = D[i-1][j] + 1 if pi is inserted into T (only i increases);
D[i][j] = D[i][j-1] + 1 if tj is deleted from T (only j increases).
D[i][j] is the minimum of the values given by the applicable formulae above, so each entry requires only the entries above it and to its left in the table.

[Figure: the table D, rows 0..m and columns 0..n, showing that D[i][j] depends on D[i-1][j-1], D[i-1][j], and D[i][j-1].]

Example: P = happy, T = Have a hsppy day.
[Table: difference table D for P = happy and T = Have a hsppy day.]
D[5][12] = 1, so t[8..12] ("hsppy") has one misspelling with respect to P.
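A sketch of the difference-table algorithm in Java. It follows the conventions above: D[0][j] = 0, D[i][0] = i, and each entry is the minimum of the applicable rules. The table is filled column by column, so the scan can stop at the first column j whose last-row entry D[m][j] is at most k. The class and method names are our own.

```java
// Hypothetical sketch: first k-approximate match of P in T via the difference table D.
class ApproxMatch {
    // Returns the smallest j (1-based) with D[m][j] <= k, i.e. the end of the
    // first k-approximate match of P in T, or -1 if there is none.
    static int firstMatchEnd(String p, String t, int k) {
        int m = p.length(), n = t.length();
        int[][] d = new int[m + 1][n + 1];   // D[0][j] = 0: a match may start anywhere in T
        for (int i = 0; i <= m; i++) d[i][0] = i;
        for (int j = 1; j <= n; j++) {
            for (int i = 1; i <= m; i++) {
                boolean same = p.charAt(i - 1) == t.charAt(j - 1);
                int diag = d[i - 1][j - 1] + (same ? 0 : 1);  // no error, or revise t_j to p_i
                int ins = d[i - 1][j] + 1;                   // insert p_i into T
                int del = d[i][j - 1] + 1;                   // delete t_j from T
                d[i][j] = Math.min(diag, Math.min(ins, del));
            }
            if (d[m][j] <= k) return j;      // first entry <= k in the last row
        }
        return -1;
    }
}
```

With P = happy and T = Have a hsppy day, `ApproxMatch.firstMatchEnd("happy", "Have a hsppy day", 1)` returns 12, consistent with D[5][12] = 1. With k = 0 it returns -1, because T contains no exact copy of P.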
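Nonserial Monadic DP Formulations: Longest Common Subsequence
Given a sequence A = ⟨a1, a2, …, an⟩, a subsequence of A can be formed by deleting some entries from A.
Given two sequences A = ⟨a1, a2, …, an⟩ and B = ⟨b1, b2, …, bm⟩, find the longest sequence that is a subsequence of both A and B. For example, if A = ⟨…⟩ and B = ⟨…⟩, the longest common subsequence of A and B is ⟨…⟩.

Longest-Common-Subsequence Problem
Let F[i, j] denote the length of the longest common subsequence of the first i elements of A and the first j elements of B. The objective of the LCS problem is to find F[n, m]. We can write:
$$F[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0,\\ F[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } a_i = b_j,\\ \max\big(F[i,j-1],\, F[i-1,j]\big) & \text{if } i, j > 0 \text{ and } a_i \neq b_j. \end{cases}$$
That is, when a_i ≠ b_j, F[i, j] is the larger of its two neighbouring entries F[i, j-1] and F[i-1, j].

Consider the LCS of two amino-acid sequences, HEAGAWGHEE and PAWHEAE.
[Figure: the F table for computing the LCS of the two sequences.]
With A = PAWHEAE (n = 7) and B = HEAGAWGHEE (m = 10), F[7, 10] = 5, and the LCS is AWHEE.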
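A sketch of the F-table computation in Java. It uses the rule F[i, j] = F[i-1, j-1] + 1 when a_i = b_j, and otherwise the larger of F[i, j-1] and F[i-1, j]. A trace-back then recovers one longest common subsequence. When several exist, the one returned depends on the tie-breaking rule, which here prefers moving up. Class and method names are our own.

```java
// Hypothetical sketch: longest common subsequence via the F table.
class Lcs {
    static String lcs(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] f = new int[n + 1][m + 1];     // row 0 and column 0 stay 0
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    f[i][j] = f[i - 1][j - 1] + 1;
                } else {
                    f[i][j] = Math.max(f[i][j - 1], f[i - 1][j]);
                }
            }
        }
        // Trace back from F[n][m] to recover one LCS (collected in reverse order).
        StringBuilder sb = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                sb.append(a.charAt(i - 1));
                i--;
                j--;
            } else if (f[i - 1][j] >= f[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return sb.reverse().toString();
    }
}
```

`Lcs.lcs("PAWHEAE", "HEAGAWGHEE")` returns "AWHEE", whose length 5 is F[7, 10].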