I have been watching the lecture by Prof. Erik Demaine from MIT in the Advanced Data Structure course, focusing on Suffix Trees and String Matching in general. In the beginning of the lecture, he mentioned that it can be useful to use Suffix Trees to answer queries, including wildcards.
At some point in the lecture (link attached with a time stamp), Erik said we can determine how many occurrences of a particular pattern exist in our string by augmenting the suffix trie with subtree size.
So, I am trying to understand if the subtree size includes the size of the node itself. For example, in the timestamp I provided, let's examine how many times 'ana' appears in 'banana'. We know it appears twice. But if the node on the board in the video, which is a parent of 1 and 3, the red indexes, is augmented with subtree size and includes itself in the count, then it will have SubtreeSize=3, which is not the correct number of occurrences of the pattern 'ana' in 'banana'. Or I might have looked at the wrong node.
Please help. I know it might seem easy or trivial, but I am striving to advance my knowledge.
The lecture: (at the time 47:40) https://youtu.be/NinWEPPrkDQ?si=9GFBd_U8N4nHaXnH&t=2862
to summarize, I would like to know the next:
- Whether the node I referred to in the lecture accurately indicates the number of occurrences of the pattern 'ana' in the text represented by the suffix trie.
- Whether the subtree size includes the count of the node itself or just the leaf nodes.
