I have some data that looks like this:
29 32 33 46 47 48
29 34 35 39 40 43
29 35 36 38 41 43
30 31 32 34 36 49
30 32 35 40 43 44
39 40 43 46 47 50
7 8 9 39 40 43
1 7 8 12 40 43
There is actually a lot more data, but I wanted to keep this short. I'd like to find the longest common subsequences across all rows and sort them by decreasing frequency, reporting only those common subsequences that contain more than one element and occur more than once. Is there a way to do this in R?
So an example result would be something like:
[29] 3
[30] 2
...
(etc. for all the single duplicates across each row and their frequencies)
...
[46 47] 2
[39 40 43] 3
[40 43] 2
It seems like you are asking two different kinds of questions. You want 1) the lengths of contiguous runs of a single value, taken column-wise, and 2) counts (non-contiguous) of n-grams that are built row-wise but counted column-wise.
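As a rough illustration of those two pieces, here is a minimal base-R sketch, not necessarily the approach used in the sections that follow. It assumes the sample rows from the question are read into a matrix called `dat`, and the helper `row_ngrams` is a hypothetical name introduced here. It also assumes that an n-gram means a contiguous slice of a row, counted across all rows regardless of which columns it occupies. Shorter n-grams nested inside longer ones are counted too, so the frequency for a pair such as `40 43` will be higher than in the example result in the question.

```r
# Sample rows copied from the question
txt <- "29 32 33 46 47 48
29 34 35 39 40 43
29 35 36 38 41 43
30 31 32 34 36 49
30 32 35 40 43 44
39 40 43 46 47 50
7 8 9 39 40 43
1 7 8 12 40 43"
dat <- as.matrix(read.table(text = txt))

# 1) Contiguous runs of a single value within each column
runs <- do.call(rbind, lapply(seq_len(ncol(dat)), function(j) {
  r <- rle(dat[, j])
  data.frame(value = r$values, freq = r$lengths)
}))
runs <- runs[runs$freq > 1, ]          # keep values repeated at least twice
runs <- runs[order(-runs$freq), ]      # most frequent first

# 2) All contiguous n-grams (length >= min_n) of one row, as strings
row_ngrams <- function(x, min_n = 2) {
  n <- length(x)
  if (n < min_n) return(character(0))
  out <- character(0)
  for (k in seq(min_n, n)) {
    for (s in seq_len(n - k + 1)) {
      out <- c(out, paste(x[s:(s + k - 1)], collapse = " "))
    }
  }
  unique(out)                          # count each n-gram once per row
}

grams <- unlist(lapply(seq_len(nrow(dat)), function(i) row_ngrams(dat[i, ])))
counts <- table(grams)
counts <- sort(counts[counts > 1], decreasing = TRUE)

runs
counts
```

Generating every slice of every row is quadratic in the row length, which is fine for six columns but may need rethinking for much wider data.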
Output of single
Output of ngrams
Combining the data
Output
Your data