More concise way of setting a counter, incrementing, and returning?

84 views Asked by At

I'm working on a problem in which I want to compare two strings of equal length, char by char. For each index where the chars differ, I need to increment the counter by 1. Right now I have this:

def compute(strand1, strand2)
    raise ArgumentError, "Sequences are different lengths" unless strand1.length == strand2.length

    mutations = 0
    strand1.chars.each_with_index { |nucleotide, index| mutations += 1 if nucleotide != strand2[index] }
    mutations

    end
end

I'm new to this language but this feels very non-Ruby to me. Is there a one-liner that can consolidate the process of setting a counter, incrementing it, and then returning it?

I was thinking along the lines of selecting all chars that don't match up and then getting the size of the resulting array. However, from what I could tell, there is no select_with_index method. I was also looking into inject but can't seem to figure out how I would apply it in this scenario.

4

There are 4 answers

9
user12341234 On BEST ANSWER

Probably the simplest answer is to just count the differences:

strand1.chars.zip(strand2.chars).count{|a, b| a != b}
3
Nic Nilov On

Here is a number of one-liners:

strand1.size.times.select { |index| strand1[index] != strand2[index] }.size

This is not the best one since it generates an intermediary array up to O(n) in size.

strand1.size.times.inject(0) { |sum, index| sum += 1 if strand1[index] != strand2[index]; sum }

Does not take extra memory but is a bit hard to read.

strand1.chars.each_with_index.count { |x, index| x != strand2[index] }

I'd go with that. Kudos to user12341234 for mentioning count which is this is built upon.

Update

Benchmarking on my machine gives different results from what @CarySwoveland gets:

                   user     system      total        real
mikej          4.080000   0.320000   4.400000 (  4.408245)
user12341234   3.790000   0.210000   4.000000 (  4.003349)
Nik            2.830000   0.170000   3.000000 (  3.008533)


                   user     system      total        real
mikej          3.590000   0.020000   3.610000 (  3.618018)
user12341234   4.040000   0.140000   4.180000 (  4.183357)
lazy user      4.250000   0.240000   4.490000 (  4.497161)
Nik            2.790000   0.010000   2.800000 (  2.808378)

This is not to point out my code as having better performance, but to mention that environment plays a great role in choosing any particular implementation approach.

I'm running this on native 3.13.0-24-generic #47-Ubuntu SMP x64 on 12-Core i7-3930K with enough RAM.

6
Cary Swoveland On

This is an just extended comment, so no upvotes please (downvotes are OK).

Gentlemen: start your engines!

Test problem

def random_string(n, selection)
  n.times.each_with_object('') { |_,s| s << selection.sample }
end

n = 5_000_000
a = ('a'..'e').to_a
s1 = random_string(n,a)
s2 = random_string(n,a)

Benchmark code

require 'benchmark'

Benchmark.bm(12) do |x|

  x.report('mikej') do
    s1.chars.each_with_index.inject(0) {|c,(n,i)| n==s2[i] ? c : c+1}
  end

  x.report('user12341234') do
    s1.chars.zip(s2.chars).count{|a,b| a != b }
  end

  x.report('lazy user') do
    s1.chars.zip(s2.chars).lazy.count{|a,b| a != b }
  end

  x.report('Nik') do
    s1.chars.each_with_index.count { |x,i| x != s2[i] }
  end
end

Results

mikej          6.220000   0.430000   6.650000 (  6.651575)
user12341234   6.600000   0.900000   7.500000 (  7.504499)
lazy user      7.460000   7.800000  15.260000 ( 15.255265)
Nik            6.140000   3.080000   9.220000 (  9.225023)

                  user     system      total        real
mikej          6.190000   0.470000   6.660000 (  6.662569)
user12341234   6.720000   0.500000   7.220000 (  7.223716)
lazy user      7.250000   7.110000  14.360000 ( 14.356845)
Nik            5.690000   0.920000   6.610000 (  6.621889)

[Edit: I added a lazy version of 'user12341234'. As is generally the case with lazy versions of enumerators, there is a tradeoff between the amount of memory used and execution time. I did several runs. Here I report the results of two typical ones. There was very little variability for 'mikej' and 'user12341234', somewhat more for lazy user and quite a bit for Nik.]

2
mikej On

inject would indeed by a way to do this with a one-liner so you are on the right track:

strand1.chars.each_with_index.inject(0) { |count, (nucleotide, index)|
  nucleotide == strand2[index] ? count : count + 1 }

We start with an initial value of 0 and then if the 2 letters are the same we just return the current value of the accumulator value, if the letters are different we add 1.

Also, notice here that each_with_index is being called without a block. When called like this it returns an enumerator object. This lets us use one of the other enumerator methods (in this case inject) with the pairs of value and index returned by each_with_index, allowing the functionality of each_with_index and inject to be combined.


Edit: looking at it, user12341234's solution is a nice way of doing it. Good use of zip! I'd probably go with that.