Efficient Algorithms for Substring Near Neighbor Problem
Alexandr Andoni, Piotr Indyk
MIT

What's SNN?
  SNN ≈ Text Indexing with mismatches
  Text Indexing:
    Construct a data structure on a text T[1..n], such that,
    given a query pattern P[1..m], we can find the occurrences of P in T
  Text indexing with mismatches:
    Given P, find the substrings of T that are equal to P except in ≤ R characters
    E.g., computational bio (BLAST)
  T = GAGTAACTCAATA
  P = AGTA

Outline
  General approach
    View: Near Neighbor in Hamming
    Focus: reducing space
  Background
    Locality-Sensitive Hashing (LSH)
  Solution
    Reducing query & preprocessing
    Redesign LSH
  Concluding remarks

Approach (Or, why SNN?)
  SNN = a near neighbor problem in Hamming metric with m dimensions:
    Construct a data structure on D = {all substrings of T of length m}
    Given P, find a point in D that is at distance ≤ R from P
  Use a NN data structure for Hamming
  D = {GAGT, AGTA, GTAA, …, AATA}
  T = GAGTAACTCAATA
  P = AGTA

Approximate NN
  Exact NN problem seems hard (e.g., hard w/o exponential space or O(n) query time)
  Approximate NN is easier
    Defined for approximation c = 1+ε as:
    OK to report a point at distance ≤ cR (when there is a point at distance ≤ R)

                     Query             Space
    [KOR98, IM98]    poly(log n, m)    n^O(1/ε^2)
    LSH [IM98]       n^(1/c) + m       n^(1+1/c)

  [Figure: query point q with radii R and cR]

Our contribution
  Problem: need m in advance for NN
    Have to construct a data structure for each m ≤ M
  Here: approx SNN data structure for unknown m
    Without degradation in space or query time
  Our algorithm for SNN based on LSH:
    Supports patterns of length m ≤ M
    Optimal* space: n^(1+1/c)
    Optimal* query time: n^(1/c)
    Slightly worse preprocessing time if c > 3
    (* modulo subpoly factors)
  Also extends to l1

Locality-Sensitive Hashing
  Based on a family of hash functions {g}
  For points P[1..m], Q[1..m]:
    If dist(P,Q) ≤ R, Pr_g[g(P) = g(Q)] = “medium”
    If dist(P,Q) > cR, Pr_g[g(P) = g(Q)] = “low”
  Idea:
    Construct L hash tables with random g1, g2, …, gL
    For query P, look at buckets g1(P), g2(P), …, gL(P)
  Space: L*n
  Query time: L

LSH for Hamming
  Hash function g: projection onto k coordinates
    E.g.: g1(“AGTA”) = “AA” (k = 2)
  L = #hash tables = n^(1/c)
  k = |log n / log(1 - cR/m)| < m * log n
  T = GAGTAACTCAATA
  D = {GAGT, AGTA, GTAA, …, AATA}
  HT1: GT ->
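
To make the "LSH for Hamming" slide concrete, here is a minimal Python sketch of the classical fixed-m scheme: index every length-m substring of T in L hash tables, then probe the L buckets of the query. It is not the unknown-m algorithm of the talk. The names build_tables, query and hamming are hypothetical, and taking each g to be a projection onto k randomly sampled coordinates is an assumption consistent with the slide's example g1(“AGTA”) = “AA”. The values of k and L are picked by hand here rather than from the slide's formulas.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def build_tables(text, m, k, num_tables, seed=0):
    """Index all length-m substrings of text in num_tables hash tables.

    Assumption: each hash function g projects a string onto k coordinates
    sampled uniformly at random (with replacement) from 0..m-1.
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        coords = [rng.randrange(m) for _ in range(k)]
        buckets = defaultdict(list)
        for start in range(len(text) - m + 1):
            sub = text[start:start + m]
            key = "".join(sub[i] for i in coords)
            buckets[key].append(start)  # store the start position, not the substring
        tables.append((coords, buckets))
    return tables


def query(text, tables, pattern, max_dist):
    """Return (start, distance) of the closest candidate within max_dist, or None.

    Only substrings that collide with the pattern in some table are examined,
    so a near neighbor can be missed; more tables lower that probability.
    """
    m = len(pattern)
    best = None
    seen = set()
    for coords, buckets in tables:
        key = "".join(pattern[i] for i in coords)
        for start in buckets.get(key, ()):
            if start in seen:
                continue
            seen.add(start)
            d = hamming(text[start:start + m], pattern)
            if d <= max_dist and (best is None or d < best[1]):
                best = (start, d)
    return best


T = "GAGTAACTCAATA"
tables = build_tables(T, m=4, k=2, num_tables=3)
# P = AGTA occurs exactly in T (0-based start 1), so it collides in every table.
print(query(T, tables, "AGTA", max_dist=2))  # (1, 0)
```

Storing start positions instead of substrings keeps each table at O(n) entries, which matches the slide's space bound of L*n.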