Find the combinations of a given encoded string using Ruby

1k views Asked by At

I was asked this question during an interview and I couldn't come up with a satisfactory solution for it. Would appreciate if anybody could give some pointers.

Given a mapping like

  mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}

 encode("A") == "1"
 encode("BA") == "21"
 encode("ABC") == "123"
 encode("") == ""


decode("1") == ["A"] -> 1
decode("21") == ["BA", "V"] -> 2
decode("123") == ["ABC", "JC", "AX"] -> 3
decode("012") == [] -> 0
decode("") == [""] -> 1
decode("102") == ["JB"] -> 1


numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0

We need a numDecode method which gives the length of unique solution array.

Updated :

Given a mapping like :

mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}

Suppose we are given a string as "A" the it can be encoded as : "1"

encode("A") should return "1"
encode("BA") should return "21" as if mapping is a hash then B has a value of 2, A has a value of 1.
encode("ABC") should return "123" as mapping["A" is 1, mapping["B"] is 2, and mapping["C"] is 3.
encode("") should return "" as it is not in mapping.


Now if decode("1") is called then it should return an array with one element i.e. ["A"] as key matching with 1 as value in mapping is "A".
decode("") should return an array with empty string i.e. [""].
decode("21") should return an array ["BA", "U"] as 2 is "B", 1 is "A" and "U" is 21 in mapping.
decode("012") should return an empty array as string starts with "0" which is not in mapping keys.
decode("102") should return an array as ["JB"] as "10" is J and "2" is B.

And finally numDecode should return the count of unique decoded strings in array. So,

numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
6

There are 6 answers

2
Cary Swoveland On
@he = Hash[("A".."Z").to_a.zip((1..26).to_a.map(&:to_s))]
                 # => {"A"=>"1", "B"=>"2",...,"Z"=>"26"} 
@hd = @he.invert # => {"1"=>"A", "2"=>"B",.., "26"=>"Z"}

def decode(str, comb='', arr=[])
  return arr << s if str.empty?

  # Return if the first character of str is not a key of @hd
  return arr unless (c = @hd[str[0]])

  # Recurse with str less the first char, s with c appended and arr
  arr = decode(str[1..-1], s+c, arr)

  # If the first two chars of str are a key of @hd (with value c),
  # recurse with str less the first two chars, s with c appended and arr
  arr = decode(str[2..-1], s+c, arr) if str.size > 1 && (c = @hd[str[0..1]])

  arr
end  

def num_decode(str) decode(str).size end

decode('1')        # => ["A"]
decode('')         # => [""].
decode('21')       # => ["BA", "U"]
decode('012')      # => [""]
decode('102')      # => ["JB"]
decode('123')      # => ["ABC", "AW", "LC"]
decode('12345')    # => ["ABCDE", "AWDE", "LCDE"]
decode('120345')   # => ["ATCDE"] 
decode('12720132') # => ["ABGTACB", "ABGTMB", "LGTACB", "LGTMB"]     

Any more? Yes, I see a hand back there. The gentleman with the red hat wants to see '12121212':

decode('12121212')
  # => ["ABABABAB", "ABABABL", "ABABAUB", "ABABLAB", "ABABLL",   "ABAUBAB",
        "ABAUBL",   "ABAUUB",  "ABLABAB", "ABLABL",  "ABLAUB",   "ABLLAB",
        "ABLLL",    "AUBABAB", "AUBABL",  "AUBAUB",  "AUBLAB",   "AUBLL",
        "AUUBAB",   "AUUBL",   "AUUUB",   "LABABAB", "LABABL",   "LABAUB",
        "LABLAB",   "LABLL",   "LAUBAB",  "LAUBL",   "LAUUB",    "LLABAB",
        "LLABL",    "LLAUB",   "LLLAB",   "LLLL"]

num_decode('1')        # =>  1
num_decode('21')       # =>  2
num_decode('12121212') # => 34
num_decode('12912912') # =>  8 
0
Bandrami On

This looks like a combinatorics problem, but it's also a parsing problem.

(You asked for pointers, so I'm doing this in English rather than dusting off my Ruby.)

I would do something like this:

  1. If X is an empty string, return 1
  2. If X is not a string composed of digits starting with a nonzero digit, return 0
  3. If X contains no 1's or 2's, return 1 (there's only one possible parsing)
  4. If X contains 1's or 2's, it gets a bit more complicated:

Every 1 that is not the last character in X matches both "A" and the first digit of one of the letters "J" through "S".

Every 2 that is not the last character in X and is followed by a digit less than 7 matches both "B" and the first digit of one of the letters.

Count up your 1's and 2's that meet those criteria. Let that number be Y. You have 2^Y combinations of those, so the answer should be 2^Y but you have to subtract 1 for every time you have a 1 and 2 next to each other.

So, if you haven't returned by Step 4 above, count up your 1's that aren't the last character in X, and the 2's that both aren't the last character in X and aren't followed by a 7,8,9, or 10. Let the sum of those counts be called Y.

Now count every instance that those 1's and 2's are neighbors; let that sum be called Z.

The number of possible parsings is (2^Y) - Z.

0
markegli On

In the spirit of giving “some pointers”, instead of writing an actually implementation for numDecode let me say that the most logically straightforward way to tackle this problem is with recursion. If the string passed to numDecode is longer than one character then look at the beginning of the string and based on what you see use one or two (or zero) recursive calls to find the correct value.

And the risk of revealing too much, numDecode("1122") should make recursive calls to numDecode("122") and numDecode("22").

0
Matt On

Here's a recursive solution:

$mapping = Hash[(0..25).map { |i| [('A'.ord+i).chr,i+1] }]
$itoa = Hash[$mapping.to_a.map { |pair| pair.reverse.map(&:to_s) }]

def decode( str )
  return [''] if str.empty?
  return $itoa.key?(str) ? [$itoa[str]] : nil if str.length == 1
  retval = []
  0.upto(str.length-1) do |i|
    word = $itoa[str[0..i]] or next
    tails = decode(str[i+1..-1]) or next
    retval.concat tails.map { |tail| word + tail }
  end
  return retval
end

Some sample output:

p decode('1') # ["A"]
p decode('21') # ["BA", "U"]
p decode('123') # ["ABC", "AW", "LC"]
p decode('012') # []
p decode('') # [""]
p decode('102') # ["JB"]
p decode('12345') # ["ABCDE", "AWDE", "LCDE"]

Note differences between this output and the question. E.g. The 21st letter of the alphabet is "U", not "V". etc.

2
Neil Slater On

This is an interesting question, and the interview technique that goes with it is most likely to see how far the critical thinking goes. A good interviewer would probably not expect a single canonically correct answer.

If you get as far as a recursive decode solution that you then enumerate, then you are doing well IMO (at least I'd hire most candidates who could demonstrate clearly thinking through a piece of recursive code at interview!)

Having said that, one key hint is that the question asks for a num_decode function, not necessarily for implementations of encode and decode.

There is a deeper understanding and structure accessible here, that can be gained from analysing the permutations and combinations. It allows you to write a num_decode function that can handle long strings with millions of possible answers, without filling memory or taking hours to enumerate all possibilities.

First note that any set of separate ambiguous encoding multiply the number of possibilities for the whole string:

1920 -> 19 is ambiguous 'AI' or 'S' -> 'AIT' or 'ST'

192011 -> 11 is also ambiguous 'AA' or 'K' -> 'AITAA', 'AITK', 'STAA', 'STK'

Here 19 has two possible interpretations, and 11 also has two. A string with both of these separate instances of ambiguous codings has 2 * 2 == 4 valid combinations.

Each independent section of ambiguous coding multiplies the size of the whole set of decode values by the number of possibilities that it represents.

Next how to deal with longer ambiguous sections. What happens when you add an ambiguous digit to an ambiguous sequence:

11 -> 'AA' or 'K' -> 2
111 -> 'AAA', 'AK', 'KA' -> 3
1111 -> 'AAAA', 'AAK', 'AKA', 'KAA', 'KK' -> 5
11111 -> 'AAAAA', 'AAAK', 'AAKA', 'AKAA', 'AKK', 'KAAA', 'KAK', 'KKA' -> 8

2,3,5,8 should look familiar, it is the Fibonacci sequence, what's going on? The answer is that adding one digit to the sequence allows all the previous combinations plus those of the sub-sequence before that. By adding a digit 1 to the sequence 1111 you can either interpret it as 1111(1) or 111(11) - so you can add together the number of possibilities in 1111 and 111 to get the number of possibilities in 11111. That is, N(i) = N(i-1) + N(i-2) which is how to construct the Fibonacci series.

So, if we can detect ambiguous coding sequences, and get their length, we can now calculate the number of possible decodes, without actually doing the decode:

# A caching Fibonacci sequence generator
def fib n
  @fibcache ||= []
  return @fibcache[n] if @fibcache[n]
  a = b = 1
  n.times do |i|
    a, b = b, a + b
    @fibcache[i+1] = a
  end
  @fibcache[n]
end

def num_decode encoded
  # Check that we don't have invalid sequences, raising here, but you 
  # could technically return 0 and be correct according to question
  if encoded.match(/[^0-9]/) || encoded.match(/(?<![12])0/)
    raise ArgumentError, "Not a valid encoded sequence"
  end

  # The look-ahead assertion ensures we don't match
  # a '1' or '2' that is needed by a '10' or '20'
  ambiguous = encoded.scan(/[12]*1[789]|[12]+[123456](?![0])/)

  ambiguous.inject(1) { |n,s| n * fib(s.length) }
end

# A few examples:
num_decode('')  # => 1
num_decode('1') # => 1
num_decode('12') # => 2
num_decode('120') # => 1
num_decode('12121212') # => 34
num_decode('1212121212121212121212121211212121212121') # => 165580141

It is relatively short strings like the last one which foil attempts to enumerate the possibilities directly by decoding.

The regex in the scan took a little experimentation to get right. Adding 7,8 or 9 is ambiguous after a 1, but not after a 2. You also want to avoid counting a 1 or 2 directly before a 0 as part of an ambiguous sequence because 10 or 20 have no other interpretations. I think I made about a dozen attempts at the regex before settling on the current version (which I believe to be correct, but I did keep finding exceptions to correct values most times I tested the first versions).

Finally, as an exercise, it should be possible to use this code as the basis from which to write a decoder that directly output the Nth possible decoding (or even one that enumerated them lazily from any starting point, without requiring excessive memory or CPU time).

0
Wyatt Lansdale On
# just look for all singles and double as you go down and keep repeating this.. if you get to the end where the string would be 1 or 2 digets long you count 1
# IE
# 121
# 1 that's good 2 that's good 1 that's good if all good then count + 1 
# 12 that's good 1 that's good ... no more doubles if all good then count + 1
# 1 that's good 21 that's good  if all good then count + 1 
# test this on other cases

$str = "2022"
$strlength = $str.length
$count = 0

def decode(str)
    if str[0].to_i >= 1 and str[0].to_i <= 9
        $count += 1 if str.length == 1
        decode(str[1..-1])
    end

    if str[0..1].to_i >= 10 and str[0..1].to_i <= 26
        $count += 1 if str.length == 2
        p str.length
        decode(str[2..-1])
    end
end



decode($str)

p " count is #{$count}"