In a nutshell, Shazam records a fingerprint of the song you're listening to and sends it to its backend servers, where it is matched against a fingerprint database. The lookup process produces a histogram of time offsets for each candidate song in the index and declares the song with the most matches at a single offset the winner. Details about the algorithm can be found in the original paper here.
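To make sure I understand the matching step, here is a rough sketch of how I picture the offset-histogram scoring. The names and data structures are my own simplification of the paper, not Shazam's actual code:

```python
from collections import Counter, defaultdict

def best_match(query_hashes, index):
    """query_hashes: list of (hash, query_time) pairs from the recorded snippet.
    index: dict mapping hash -> list of (song_id, track_time) pairs.
    Both are simplified stand-ins for the fingerprints described in the paper."""
    offset_histograms = defaultdict(Counter)
    for h, q_time in query_hashes:
        for song_id, t_time in index.get(h, []):
            # Hashes from the true song line up at one consistent time offset.
            offset_histograms[song_id][t_time - q_time] += 1

    best_song, best_score = None, 0
    for song_id, histogram in offset_histograms.items():
        # Score = height of the tallest bin, i.e. matches at a single offset.
        score = max(histogram.values())
        if score > best_score:
            best_song, best_score = song_id, score
    return best_song, best_score
```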
According to this blog post, Shazam splits its index into tiers in order to speed up the lookup process. The fingerprints of the most popular songs are stored in the first tier, which gets queried first. If no matching song is found in the first tier, the search proceeds to the second tier, and so on.
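As I understand the blog post, the tiered lookup would look roughly like the sketch below. The `min_score` threshold is my own assumption; the post doesn't say how a tier's result gets accepted or rejected:

```python
def tiered_lookup(query_hashes, tiers, min_score):
    """tiers: list of indexes ordered from most to least popular songs.
    Uses best_match() from the sketch above. min_score is a threshold I am
    assuming exists; nothing published confirms how this decision is made."""
    for tier_index in tiers:
        song_id, score = best_match(query_hashes, tier_index)
        if song_id is not None and score >= min_score:
            # Accept the first tier that produces a confident hit and stop.
            return song_id, score
    # No tier produced a match above the threshold.
    return None, 0
```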
What I don't get is how Shazam avoids false positives with such an architecture. For example, how does it avoid settling on a popular track in the first tier with a decent matching score when a less popular track in a lower tier would score even higher? Does it use a scoring function and a threshold? If so, what would that scoring function look like?
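For concreteness, this is the kind of thresholded acceptance test I have in mind; every name and number here is purely a guess on my part, not anything Shazam has published:

```python
def confident_enough(score, num_query_hashes, abs_threshold=20, rel_threshold=0.05):
    """Require both an absolute number of aligned hashes and a fraction of the
    query's hashes, so a popular track with a mediocre score in tier 1 doesn't
    shadow a better match sitting in a lower tier."""
    return score >= abs_threshold and score / num_query_hashes >= rel_threshold
```

Is it something along these lines, or does the real system decide differently when to stop descending through the tiers?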