python - Algorithm for searching text from a corrupted file


I have to search for tags in a text file that is damaged, meaning its data has changed (some characters have been deleted or modified). For example, I have to search for the tag -> "no of pages".

text file 1:

bhaskar rao mukku (57)abstract in system 2 pedal rods pedals, 1 side balls based axle, hollowed secondary axle, counter axle, 2 splined gear wheels has 2 clutch pin holes on circular pitch, 2 splined gear wheels has ratchet gears on circular pitch, sprocket wheel, 4 clutch pins , liver used convert ordinary bicycle gear bicycle. number page : 10

text file 2:

bhaskar rao mukku (57)abstract in system 2 pedal rods pedals, 1 side balls based axle, hollowed secondary axle, counter axle, 2 splined gear wheels has 2 clutch pin holes on circular pitch, 2 splined gear wheels has ratchet gears on circular pitch, sprocket wheel, 4 clutch pins , liver used convert ordinary bicycle gear bicycle. no. of pages: 10

text file 3:

bhaskar rao mukku (57)abstract in system 2 pedal rods pedals, 1 side balls based axle, hollowed secondary axle, counter axle, 2 splined gear wheels has 2 clutch pin holes on circular pitch, 2 splined gear wheels has ratchet gears on circular pitch, sprocket wheel, 4 clutch pins , liver used convert ordinary bicycle gear bicycle. no of pages: 10

Above are samples of the text files. As you can see, in these files the word "number" has been modified into 3 different forms. For all 3 files, the code must output the corresponding variant of the tag ("number page", "no. of pages" and "no of pages").

What I have tried so far is to find the longest common subsequence between the tag and a continuous string from the text file (of length equal to that of the tag), and then calculate the percentage of characters matched. If the percentage is greater than 85, the code outputs that continuous string.

My code:

    def lcs(s, t):
        """Return the length of the longest common subsequence of s and t."""
        m = len(s)
        n = len(t)
        counter = [[0] * (n + 1) for x in range(m + 1)]
        for i in range(m):
            for j in range(n):
                if s[i] == t[j]:
                    counter[i + 1][j + 1] = counter[i][j] + 1
                else:
                    counter[i + 1][j + 1] = max(counter[i + 1][j], counter[i][j + 1])
        return counter[m][n]

    def match(word, tag):
        # modify() is a helper that is not shown in this post
        word = modify(word)
        tag = modify(tag)
        sq = lcs(word, tag)
        return float(sq) / max(len(word), len(tag))

    i = 0
    start = end = 0  # records the position of the matched tag in string
    p = 0.85         # percentage threshold, raised to the best score found so far

    while i < len(string):  # string contains the text file
        j = i
        while j < i + len(tag) + 7:  # tag is the tag I want to search for
            arr = match(string[i:j + 1], tag)
            # print(str(p) + " " + str(arr) + ' ' + string[i:j + 1] + ' ' + str(i))
            if arr > p:
                p = arr
                start = i
                end = j
            elif p == arr:
                # tie: keep the new span if it is not longer than the current one
                if end - start >= j - i:
                    start = i
                    end = j
            j += 1
        i += 1

But the code fails in many cases, such as text file 1. Is there any other way of searching more accurately and efficiently?
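To illustrate the failure on text file 1, here is a minimal sketch that scores the three variants from the samples against the tag with the same LCS ratio. It assumes that modify() only lowercases its input (modify() is not shown, so this is a guess), and the names lcs_length and ratio are made up for this example.

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t (row-by-row DP)."""
    prev = [0] * (len(t) + 1)
    for ch in s:
        curr = [0]
        for j, other in enumerate(t):
            if ch == other:
                curr.append(prev[j] + 1)
            else:
                curr.append(max(curr[j], prev[j + 1]))
        prev = curr
    return prev[-1]


def ratio(candidate, tag):
    # Same score as match() in the post, except that the unshown modify()
    # step is replaced by lowercasing (an assumption made for this sketch).
    candidate, tag = candidate.lower(), tag.lower()
    return lcs_length(candidate, tag) / max(len(candidate), len(tag))


tag = "no of pages"
for variant in ("number page", "no. of pages", "no of pages"):
    print(f"{variant!r}: {ratio(variant, tag):.2f}")
```

Under these assumptions the sketch prints about 0.55 for "number page", 0.92 for "no. of pages" and 1.00 for "no of pages". Only the first falls below the 0.85 threshold, which is consistent with the failure on text file 1.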

