Frequency Count in Haskell

1.3k views Asked by At

So The task i'm trying to accomplish is to get a frequency list of all possible patterns of length n from a string s.

Input (Text,Length) 
Output (Frequency)
String -> Int -> [Int]
freqCount s n = frequency list [int] in alphabetical order

["A","B","C","D"]

where the characters in s are limited to the four mentioned above, so I figured first step would be to get all the possible permutations of length n with repetitions allowed.

permutationsR k = sort(replicateM k ['A','B','C','D'])

eg.

permutationsR 2 

would give the output

 ["AA","AB","AC","AD","BA","BB","BC","BD","CA","CB","CC","CD","DA","DB","DC","DD"]

and then for each pattern get a count of how many times each occurred so something like

patternCount:: String -> String -> Int
patternCount text pattern = length (filter (\x -> x == pattern) [take (length(pattern)) (drop x text)| x <- [0..length(text)- length(pattern)]])
frequencyCount s n = map (\x -> patternCount s x) (permutationsR n)

However i figure this is going to be extremely inefficient because I am essentially going through the entire list to check for each pattern length(permutationsR n) times instead of what I reason should be possible to do in only one iteration.

Is there a way to generate a frequency map like i would in an imperative language.

i.e. in pseudocode

where s = string
and n = length of pattern
//pattern is a map where key = pattern and value = frequencyCount

patterns = {"AA":0,"AB":0,"AC:0...}
for (i = 0; i < (length s - n); i++){
    patterns[s[i:(i+n)]] += 1
}

basically just iterating through once, splitting the string from (i:i+n) and updating the patterns map with each occurence.

example input,output would be something like this

s= "AABBCA"
n = 2
frequencyList s n = [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0]
1

There are 1 answers

0
KJ4TIP On BEST ANSWER

Here's a possible solution:

import Data.List

groupsOf n str =
  unfoldr (\s ->
            if length s < n
            then Nothing
            else Just (take n s, tail s)) str

frequency :: (Ord t, Eq t) => [t] -> [(t, Int)]
frequency =
  map (\s -> (head s, length s)) . group . sort

groupsOf splits an input string into overlapping sequences of length n. For example,

groupsOf 3 "AABCABC"

would give

["AAB", "ABC", "BCA", "CAB", "ABC"].  

frequency then will count the occurrences of each subsequence, so

frequency $ groupsOf 3 "AABCABC"

should give

[("AAB", 1), ("ABC", 2), ("BCA", 1), ("CAB", 1)].

Any subsequence that does not appear in the result occurred zero times.