what is the need of else block in the method "push_links" of the following code?

81 views Asked by At

This code is for Aho-Corasick algorithm which i have refereed from here

I understood this code up to if block of push_links method but i didn't get the use or requirement for the else part of the same method. More specifically first method is used for the construction of trie. The remaining work is done by second method i.e linking the node to their longest proper suffix which are prefix of some pattern also. This is carried out by the If block then what is the need of else part.

Please help me in this.

const int MAXN = 404, MOD = 1e9 + 7, sigma = 26;

int term[MAXN], len[MAXN], to[MAXN][sigma], link[MAXN], sz = 1;    

  // this method is for constructing trie

void add_str(string s)    
{ 

  // here cur denotes current node   

  int cur = 0;    

  // this for loop adds string to trie character by character

  for(auto c: s)        
    {     
      if(!to[cur][c - 'a'])  
        {   

  //here two nodes or characters are linked using transition                                       
  //array "to"

         to[cur][c - 'a'] = sz++;  
         len[to[cur][c - 'a']] = len[cur] + 1;     

         }

  // for updating the current node  

       cur = to[cur][c - 'a'];  
    }

  //for marking the leaf nodes or terminals

   term[cur] = cur;   
}   


void push_links()  
{
 //here queue is used for breadth first search of the trie  
 int que[sz];  
 int st = 0, fi = 1;  

 //very first node is enqueued 
 que[0] = 0;   

 while(st < fi)  
   { 

 // here nodes in the queue are dequeued  
    int V = que[st++];  

 // link[] array contains the suffix links.
    int U = link[V];


    if(!term[V]) term[V] = term[U];   

  // here links for the other nodes are find out using assumption that the  
  // link for the parent node is defined    

   for(int c = 0; c < sigma; c++)   

   // this if condition ensures that transition is possible to the next node 
   // for input 'c' 

        if(to[V][c])    
        {   

   // for the possible transitions link to the reached node is assigned over 
   // here which is nothing but transition on input 'c' from the link of the 
   // current node

            link[to[V][c]] = V ? to[U][c] : 0;  
            que[fi++] = to[V][c];  
        }  
        else 
        {  
            to[V][c] = to[U][c];  
        }  
   }  
}
2

There are 2 answers

1
Micromega On

IMO you don't need the else-condition. If there is no children either it's already a link or nothing.

0
rdkl On

There are some variations of Aho-Corasick algorithm. Base algorithm assumes that if edge from current node (cur) over symbol (c) is missing, then you go via suffix links to the first node that has edge over c (you make move via this edge). But your way over suffix links is the same (from the same cur and c), because you don't change automaton while searching. So you can cache it (save result of

// start from node 
while (parent of node doesn't have an edge over c) {
  node = parent
}
// set trie position
node = to[node][c]
// go to next character

in to[node][c]. So next time you won't do this again. And it transfrom automaton from non-deterministic into deterministic state machine (you don't have to use link array after pushing, you can use only to array).

There are some problems with this implementation. First, you can get an index of string you found (you don't save it). Also, len array isn't used anywhere.

For

means, this algorithm is just checking the existence of the character in the current node link using "link[to[V][c]] = V ? to[U][c] : 0;". should not it verify in the parents link also?

Yes, it's ok, because if to[U][c] is 0, then there are no edges via c from all chain U->suffix_parent->suffix parent of suffix_parent ... -> root = 0. So you should set to[V][c] to zero.