Sum if frequency countif unique cells

1.2k views Asked by At

We get calls in our system, people press menu keys on their phone.

We don't want to count duplicate calls. This formula works.

We want the formula to ignore calls that were less than 60 seconds.

I only want the function to run when the row in column C is <60.

This is my function:

=SUM(--(FREQUENCY(IF(($B$1:$B$94={"1,1"}),COUNTIF($A$1:$A$94,"<"&$A$1:$A$94),""),COUNTIF($A$1:$A$94,"<"&$A$1:$A$94))>0))

With the data as such:

       A           B      C
(888) 208-0980    1,1    479
(888) 208-0980    1,1    479
(888) 208-0980    1,1    60
(888) 208-0980    1,2    50
(888) 540-7272    1,4    10

Here is a screenshot:

enter image description here

This is an array so when you copy it in you will need to press Ctrl + Shift + Enter.

The function only counts one instance of the number given the constraint. I want to add another constraint that is if C is greater than 60.

1

There are 1 answers

0
DMM On

OK, its a hell of a formula and does take some deconstructing.

The =COUNTIF($A$1:$A$94,"<"&$A$1:A$94) phrase generates an array of length 94 equal to {0,1,2,...,93} except where the corresponding phone number in $A$1:$A$94 duplicates an earlier number (when reading down column A), the corresponding array value of the first occurrence of the number is used. So, for example, the number "(888) 941-1960" occurs three times in rows 11, 12 and 13 of column A. The eleventh array element corresponds to the first occurrence of this number (in A11) and its value is 10. This value is repeated in the twelfth and thirteenth elements of the array, reflecting the duplicate values in A12 and A13. The number of distinct values in the array matches to the number of distinct phone numbers in $A$1:$A$94 I'll use PhoneIndex as a shorthand for this array.

The IF part re-written is IF($B1:$B94={"1,1"},PhoneIndex,""). This is an array of length 94 which simply replaces elements of PhoneIndex with zero-length strings where the corresponding entry in column B is not "1,1". I'll use PhoneIndex2 as a shorthand for this second array.

The FREQUENCY part rewritten is FREQUENCY(PhoneIndex2, PhoneIndex). An undocumented feature of the FREQUENCY function is that if values are duplicated in the second argument (the so-called Bins_array) then the duplicates are ignored and corresponding frequency values in the Bins_array are set as zero. PhoneIndex corresponds to all the phone numbers in $A$1:$A$94 (with duplicates), so the function delivers a third array (again of length 94) whose elements correspond in order to the phone numbers in $A$1:$A$94 and which provide a frequency count of the number of calls made from that number where 1,1 was pressed but any elements in the array corresponding to a duplicated phone number have a value of zero. I'll call this array CallCount.

The rest of the formula boils down to {=SUM(--(CallCount>0))}. The boolean comparison converts the array of frequency values in CallCount to an array containing True and False values according to whether or not any calls from the corresponding phone number in $A$1:$A$94 were associated associated with a "1,1" in column B. The double negative converts the array of booleans into an array of 1's and 0's which can then be summed. It is the boolean comparison and the double negative which ensures that a frequency count greater than 1 gets converted to a value of 1 for use in the summation. This is the mechanism which excludes duplicate calls in the final result.

Having deconstructed the formula, I'll now turn to modifying it to exclude calls less than 60 seconds. The obvious starting point is that IFpart. My intial thought was to modify its first argument and make it something like

AND($B$1:$B$94={"1,1"}, $C$1:$C$94>=60)

but, though valid syntactically, it delivers an array whose elements are either all True or all False - with the latter being by far the most likely outcome. I'll leave it as an exercise for the reader to figure out why both these observations hold.

Of course, what is really required is to combine the two arrays of booleans ($B$1:$B$94={"1,1"} and $C$1:$C$94>=60) to yield an array of length 94 where each element is the equivalent of an AND operation being performed on the two corresponding elements from each input array.

One solution is

1=((1*($B$1:$B$94={"1,1"}))*(1*($C$1:$C$94>=60)))

but no doubt there are others. The 1*'s convert each boolean array into an array of 1's and 0's (it is my preferred alternative to the double negative used in the original formula to do the same conversion job). The two arrays are multiplied together on a corresponding element basis giving a "product array" of length 94 which also comprises 1's and 0's. This array is then converted back to an array of True and False elements via the 1= bit. The resulting boolean array has 94 elements and, because of the way it is constructed, its elements have the required properties. (Readers of a mathematical mind will recognise the isomorphism being exploited here to get round Excel's behaviour with AND in array formulae.)

Substituting the above for the first argument of the IF() function in the original formula should do the trick.

One final observation: very long formulae can be simplified using Excel's Name Manager. It is perfectly valid to use a formula when defining a name, so PhoneIndex can be defined in the Name Manager as set out above. Names can also utilise other names in their definition, so providing easy means of defining PhoneIndex2 and CallCount. Ultimately, the phone calls formula could be written much more succinctly as =SUM(--(CallCount>0)). Sensible use of naming can be used to show how a complex formula has been constructed from individual elements and can greatly ease the burden of understanding what the formula is intended to do - useful when, as in this case, the formula requires change.