匹配关键字排名查询

关键字排名查询  时间:2021-04-30  阅读:()
基本的匹配计算主要内容关键词查询结构化查询字符串的匹配算法允许出错的字符串的匹配算法关键词查询目前关键词查询是最常用的信息查询方式.
又可分为:1.
单个词2.
多个词组成的上下文(高级检索)3.
多个词用and,or或not组成的句子4.
自然语言句子单个词查询用一个最贴切的词表示查询的意思考研原理:文档用词组成的向量或文本.
匹配变成是否文档中含有查询词.
上下文查询类型:短语Phrasee.
g,ShandongUniversity近似句子允许有拼写错误等ShadongUniversity高级查询中组成的上下文:书名作者,出版社,发表时间,价格DefinitionAsyntax(语法)composedofatomsthatretrievedocuments,andofBooleanoperatorswhichworkontheiroperandse.
g,translationAND(syntaxORsyntactic)(布尔表达式)FuzzyBooleanRetrievedocumentsappearinginsomeoperands(TheANDmayrequireittoappearinmoreoperandsthantheOR)RankedhigherwhichhasalargernumberofelementsBoolean查询语法句法自然语言Generalizationof"fuzzyBoolean"AqueryisanenumerationofwordsandcontextqueriesAllthedocumentsmatchingaportionoftheuserqueryareretrievedSetathresholdsothatthedocumentwithverylowweightarenotretrieved结构查询内容与结构混合查询-给出匹配模板进行匹配三种结构-固定结构-超链结构-层次结构Fixed(固定)StructureDocument:afixedsetoffieldsEX:amailhasasender,areceiver,adate,asubjectandabodyfieldSearchforthemailssenttoagivenpersonwith"football"intheSubjectfieldAhypertextisadirectedgraphwherenodesholdsometext(textcontents)thelinksrepresentconnectionsbetweennodesorbetweenpositionsinsidenodes(structuralconnectivity)HypertextHierarchicalStructureHierarchicalStructure层次查询的处理从根到叶逐层限制的多次查询基于树或图的匹配算法StringMatchingdetectingtheoccurrenceofaparticularsubstring(pattern)inanotherstring(text)AstraightforwardSolutionTheKnuth-Morris-PrattAlgorithmStraightforwardsolutionAlgorithm:SimplestringmatchingInput:PandT,thepatternandtextstrings;m,thelengthofP.
Thepatternisassumedtobenonempty.
Output:ThereturnvalueistheindexinTwhereacopyofPbegins,or-1ifnomatchforPisfound.
Definition11.
1NotationforpatternsandtextPthepatternbeingsearchfor;TthetextinwhichPissought;mthelengthofPnthelengthofT,notknowntothealgorithm;mm)/*misthelengthofP*/match=i;//matchfound.
successcase//break;/*exittheloophere*/if(tj==pk){j++;k++;}else//Backupovermatchedcharacters.
intbackup=k-1;//从本次查询点的下一个顶点开始/j=j-backup;k=k-backup;Slidepatternforward,startover.
j++;i=j;returnmatch;ikjPTAnalysisWorst-casecomplexityisin(mn)P=aaabT=aaaaaaaaaaaaaabNeedtobackup.
However,itworksquitewellonaveragefornaturallanguage.
TheKnuth-Morris-PrattAlgorithmPatternMatchingwithFiniteAutomata(自动机)e.
g.
P="AABC"startisthebeginningindexofTIdea:rememberingthematchedpartbyutilizingtheprefixofpatternPanddonotconsiderT.
However,itisnotscalableforthesizeoftermtable.
TheKnuth-Morris-PrattFlowchart(流程图)Characterlabelsareinsidethenodes,notonthearcs.
Eachnodehastwoarrowsouttoothernodes:successlink,orfaillinknextcharacterisreadonlyafterasuccesslinkAspecialnode,node0,called"getnextchar"whichreadinnexttextcharacter.
e.
g.
P="ABABCB"T=ABABABCBConstructionoftheKMPFlowchartDefinition:FaillinksWedefinefail[k]asthelargestr(withr=1)/p1,…,ps与pk-s+1,…,pk-1比较5.
if(ps==pk-1)/*就是它!
*/6.
break;7.
s=fail[s];}/*否则递归向下找*/8.
fail[k]=s+1;}fail[1]=0;fail[2-1]=fail[1]=s=0;fail[2]=1;k=3;s=fail[3-1]=1;p2p1,s=fail[1]=0;fail[3]=s+1=1.
k=4;s=fail[3]=1;p1=p3;fail[4]=s+1=2;Tocomputefail[8],s=fail[7]=5,butp7p5,recomputes=fail[5]=3,butp7p3either,sore-computes=fail[3]=1.
stillp7p1.
Finally,s=fail[1]=0,endthesearch,andfail[8]isassigneds+1=1;PABABABCBfail01123451index12345678TheKnuth-Morris-PrattScanAlgorithmintkmpScan(char[]P,char[]T,intm,int[]fail)intmatch,j,k;match=-1;j=1;k=1;while(endText(T,j)==false)if(k>m)//success//match=j-m;break;if(k==0)//thepointofTmovesahead,andrescan//j++;k=1;elseif(tj==pk)//successatpositionkofP//j++;k++;else//Followfailarrow.
k=fail[k];//failandgobacktothepointofPcontinueloop.
returnmatch;没有使用变量iAnalysisBasedonthesimilarmethodonanalyzingthetimecomplexityofalgorithmKMPsetup,Thescanalgorithmrequires2ncharactercomparisonsintheworstcaseOverall:worstcasecomplexityis(n+m)RK算法输入:TwonbitstringsA(a1,a2,…,an)andB(b1,b2,…,bn)输出:whetherA=B.
传统方法:传输n位依次比较.
指纹机制:定义n位整数根据指纹函数Fp(x)=xmodp,p是一个素数比较Fp(a)是否等于Fp(b),传输位数减小为O(logp)设代表字符集合,x,定义函数ord(x),d=||,ord(x):{0,1,2,…,d-1}对任意的模式P,|P|=m,利用多项式指纹Q(P)=ord(P1)dm-1+ord(P2)dm-2+…+ord(Pm-1)d+ord(Pm)代表P同样对文本T=T1,T2,….
,Tn从左到右计算长度为m的连续子串的指纹,如Q(i)=ord(Ti)dm-1+ord(Ti+1)dm-2+…+ord(Ti+m-2)d+ord(Ti+m-1)并和Q(P)相比较.
若相同,则找到匹配的子串.
起始位置为i,00,b->1aa0*2+0=0,bb->2*1+1=3,03ba->(3-2*1)*2+0=202ab->(2-1*2)*2+1=1ba->(1-0*2)*2+0=2aa->(2-1*2)*2+0=0findtheposition问题是得到的整数无法表示了,过于大取素数q,Q(i)(modq)=Q(p)(modq)Q(i+1)(modq)=(Q(i)–ord(Ti)dm-1)*d)(modq)+ord(Ti+m)但这样的话,当Q(i)(modq)=Q(p)(modq),不一定对应的字符串相同,这时可以逐位进行检查,有人证明该算法的期望时间复杂性为O(m+n),是较好的算法.
特点:可以推广到高维的字符串匹配,是否可以应用到对2维图像的匹配应用到对3维物体的匹配计算具有一定误差的匹配ElementsofDynamicProgrammingConstructingsolutiontoaproblembybuildingitupdynamicallyfromsolutionstosmaller(orsimpler)sub-problemssub-instancesarecombinedtoobtainsub-instancesofincreasingsize,untilfinallyarrivingatthesolutionoftheoriginalinstance.
makeachoiceateachstep,butthechoicemaydependonthesolutionstosub-problemsPrincipleofoptimalitytheoptimalsolutiontoanynontrivialinstanceofaproblemisacombinationofoptimalsolutionstosomeofitssub-instances.
Memorization(foroverlappingsub-problems)avoidcalculatingthesamethingtwice,usuallybykeepingatableofknowresultsthatfillsupassub-instancesaresolved.
Principleofoptimalitytheoptimalsolutiontoanynontrivialinstanceofaproblemisacombinationofoptimalsolutionstosomeofitssub-instances.
Memorization(foroverlappingsub-problems)avoidcalculatingthesamethingtwice,usuallybykeepingatableofknowresultsthatfillsupassub-instancesaresolved.
MemorizationforDynamicprogrammingversionofarecursivealgorithme.
g.
Tradespaceforspeedbystoringsolutionstosub-problemsratherthanre-computingthem.
Assolutionsarefoundforsuproblems,theyarerecordedinadictionary,Beforeanyrecursivecall,sayonsubproblemQ,checkthedictionarytoseeifasolutionforQhasbeenstored.
Ifnosolutionhasbeenstored,goaheadwithrecursivecall.
IfasolutionhasbeenstoredforQ,retrievethestoredsolution,anddonotmaketherecursivecall.
Justbeforereturningthesolution,storeitinthedictionary.
Dynamicprogrammingversionofthefib.
DevelopmentofadynamicprogrammingalgorithmCharacterizethestructureofanoptimalsolutionBreakingaproblemintosub-problemwhetherprincipleofoptimalityapplyRecursivelydefinethevalueofanoptimalsolutiondefinethevalueofanoptimalsolutionbasedonvalueofsolutionstosub-problemsComputethevalueofanoptimalsolutioninabottom-upfashioncomputeinabottom-upfashionandsavethevaluesalongthewaylaterstepsusethesavevaluesofperviousstepsConstructanoptimalsolutionfromcomputedinformation字符串的近似匹配(Approximatestringmatching)Inmanyapplicationswecan'texpectanexactcopy,wewanttofindaapproximatingstringmatchwithatmostkmistakes,e.
g.
,aspellingcorrector.
Wewilldevelopadynamicprogrammingalgorithmforthek-approximatematch.
Definition:Letkbeanonnegativeinteger.
Ak-approximatematchisamatchofPinTthathasatmostkdifferences.
Thedifferencescanbeanyofthefollowingthreetypes,thenameofthedifferenceistheoperationneededonTtobringitclosertoP.
Revise:ThecorrespondingcharactersinPandTaredifferent;Delete:TcontainsacharacterthatismissingfromP.
Insert:TismissingacharacterthatappearsinP.
如何修改T中的子串,使其能匹配上e.
g.
3-approximatematchP:unnecessarilyT:unescessaraly(madethreespellingerrors)Definition11.
6DifferencetableD[i][j]=theminimumnumberofdifferencebetweenP1,…,PiandasegmentofTendingattj.
1im,1jm.
定义:D[0][j]=0;D[i,0]=i;Therewillbeak-approximatematchendingattjforanyjsuchthatD[m][j]k,sowecanstopassoonaswefindanentrylessthanorequaltokinthelastrowofD,whichisthefirstk-approximatematch.
TherulesforthecomputationofDD[i][j]=D[i-1][j-1]ifpi=tj/*noerror*/D[i][j]=D[i-1][j-1]+1ifpitjandrevisetjtopiandbothiandjincrease;D[i][j]=D[i-1][j]+1ifinsertpiintoT,onlyiincrease.
D[i][j]=D[i][j-1]+1ifdeletetjfromTandonlyjincrease.
Eachentryrequiresonlyentriesaboveitandtoitsleftinthetable0000012m12mD[i-1][j-1]D[i-1][j]D[i][j-1]D[i][j]D[i][j]iscalculatedtogettheminimumvaluefromabove4formulaeHaveahsppyday000000000000h1a2p3P4y51111110111112221211222223322221233334333321244444444321D[5][12]=1,t[8.
.
12]hasonemisspellingwithP.
NonserialMonadicDPFormulations:Longest-Common-SubsequenceGivenasequenceA=,asubsequenceofAcanbeformedbydeletingsomeentriesfromA.
GiventwosequencesA=andB=,findthelongestsequencethatisasubsequenceinbothAandB.
IfA=andB=,thelongestcommonsubsequenceofAandBis.
Longest-Common-SubsequenceProblemLetF[i,j]denotethelengthofthelongestcommonsubsequenceofthefirstielementsofAandthefirstjelementsofB.
TheobjectiveoftheLCSproblemistofindF[n,m].
Wecanwrite:左下和右上的最大值ConsidertheLCSoftwoamino-acidsequencesHEAGAWGHEEandPAWHEAE.
TheFtableforcomputingtheLCSofthesequences.
TheLCSisAWHEE.
F[7,10]=5LCS=AWHEE,

云步云72.5元/月起云服务器,香港安畅/葵湾/将军澳/沙田/大浦CN2机房,2核2G5M

云步云怎么样?云步云是创建于2021年的品牌,主要从事出售香港vps、美国VPS、日本VPS、香港独立服务器、香港站群服务器等,机房有香港、美国、日本东京等机房,目前在售VPS线路有CN2+BGP、CN2 GIA,香港的线路也是CN2直连大陆,该公司旗下产品均采用KVM虚拟化架构。目前,云步云提供香港安畅、沙田、大浦、葵湾、将军澳、新世界等CN2机房云服务器,2核2G5M仅72.5元/月起。点击进...

VPSDime7美元/月,美国达拉斯Windows VPS,2核4G/50GB SSD/2TB流量/Hyper-V虚拟化

VPSDime是2013年成立的国外VPS主机商,以大内存闻名业界,主营基于OpenVZ和KVM虚拟化的Linux套餐,大内存、10Gbps大带宽、大硬盘,有美国西雅图、达拉斯、新泽西、英国、荷兰机房可选。在上个月搞了一款达拉斯Linux系统VPS促销,详情查看:VPSDime夏季促销:美国达拉斯VPS/2G内存/2核/20gSSD/1T流量/$20/年,此次推出一款Windows VPS,依然是...

HostKvm:夏季优惠,香港云地/韩国vps终身7折,线路好/机器稳/适合做站

hostkvm怎么样?hostkvm是一家国内老牌主机商家,商家主要销售KVM架构的VPS,目前有美国、日本、韩国、中国香港等地的服务,站长目前还持有他家香港CN2线路的套餐,已经用了一年多了,除了前段时间香港被整段攻击以外,一直非常稳定,是做站的不二选择,目前商家针对香港云地和韩国机房的套餐进行7折优惠,其他套餐为8折,商家支持paypal和支付宝付款。点击进入:hostkvm官方网站地址hos...

关键字排名查询为你推荐
重庆大学网上服务大厅用文章文章操作http企业cms目前最好用的企业cms是哪个?重庆网络公司一九互联重庆网络公司,重庆网络优化,重庆页面制作性价比高且便宜的网络公司有哪些?德国iphone禁售令有人说苹果手机从2017年开始,中国禁售了outlookexpress如何开启OUTLOOK EXPRESS功能?平阴县教育和体育局下属锦东小学教学设备采购项目竞争性磋商文件文档下载怎么下载百度文档2828商机网千元能办厂?28商机网是真的吗?
如何注册域名 vps租用 3322免费域名 阿里云邮箱登陆首页 highfrequency ion 云主机51web php空间申请 北京双线机房 河南m值兑换 域名评估 河南移动m值兑换 hkt t云 江苏双线服务器 银盘服务 yundun 便宜空间 重庆电信服务器托管 photobucket 更多