So The task i'm trying to accomplish is to get a frequency list of all possible patterns of length n from a string s.
Input (Text,Length)
Output (Frequency)
String -> Int -> [Int]
freqCount s n = frequency list [int] in alphabetical order
["A","B","C","D"]
where the characters in s are limited to the four mentioned above, so I figured first step would be to get all the possible permutations of length n with repetitions allowed.
permutationsR k = sort(replicateM k ['A','B','C','D'])
eg.
permutationsR 2
would give the output
["AA","AB","AC","AD","BA","BB","BC","BD","CA","CB","CC","CD","DA","DB","DC","DD"]
and then for each pattern get a count of how many times each occurred so something like
patternCount:: String -> String -> Int
patternCount text pattern = length (filter (\x -> x == pattern) [take (length(pattern)) (drop x text)| x <- [0..length(text)- length(pattern)]])
frequencyCount s n = map (\x -> patternCount s x) (permutationsR n)
However i figure this is going to be extremely inefficient because I am essentially going through the entire list to check for each pattern length(permutationsR n) times instead of what I reason should be possible to do in only one iteration.
Is there a way to generate a frequency map like i would in an imperative language.
i.e. in pseudocode
where s = string
and n = length of pattern
//pattern is a map where key = pattern and value = frequencyCount
patterns = {"AA":0,"AB":0,"AC:0...}
for (i = 0; i < (length s - n); i++){
patterns[s[i:(i+n)]] += 1
}
basically just iterating through once, splitting the string from (i:i+n) and updating the patterns map with each occurence.
example input,output would be something like this
s= "AABBCA"
n = 2
frequencyList s n = [1,1,0,0,0,1,1,0,1,0,0,0,0,0,0,0]
Here's a possible solution:
groupsOf
splits an input string into overlapping sequences of length n. For example,would give
frequency
then will count the occurrences of each subsequence, soshould give
Any subsequence that does not appear in the result occurred zero times.