In a single DataFrame, I want to count the rows in which "new_page" and "treatment" don't match.

Can someone also explain to me how to add an image? https://ibb.co/gSv7FR4

I imagine it's something like the following, where the count goes up whenever the condition is met. In addition to a solution to the problem above, I would appreciate an explanation of how it works.

if df.group[n] == 'treatment' and df.landing_page[n] == 'new_page':
    count += 1

4 Answers

hacker315 On Best Solutions

This gives the number of rows in which 'new_page' and 'treatment' appear together:

((df.group=='treatment') & (df.landing_page=='new_page')).sum()
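
The question asks for the rows where the two values do not match, so here is a minimal sketch of that variant. It assumes a "mismatch" means exactly one of the two conditions holds; the sample data is made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control"],
    "landing_page": ["new_page", "old_page", "new_page", "old_page"],
})

is_treatment = df["group"] == "treatment"
is_new_page = df["landing_page"] == "new_page"

# A row is mismatched when exactly one of the two conditions is True,
# i.e. when the two boolean Series disagree.
mismatches = int((is_treatment != is_new_page).sum())
print(mismatches)
```

Comparing the two boolean Series with != acts as an element-wise exclusive or, so no explicit loop or counter is needed.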
Anakhand On
sum((df.group == 'treatment') & (df.landing_page == 'new_page'))

Here df.group == 'treatment' and df.landing_page == 'new_page' are each boolean arrays indicating the positions at which the predicate is True. The & operator combines them into a boolean array indicating the positions at which both predicates are True. The parentheses are required because & binds more tightly than == in Python. Summing the array returns the number of True values.
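
To make the intermediate arrays concrete, here is a small sketch on made-up data (the three rows are an assumption for illustration; the column names come from the question):

```python
import pandas as pd

# Hypothetical three-row frame, just to show the intermediate masks.
df = pd.DataFrame({
    "group": ["treatment", "control", "treatment"],
    "landing_page": ["new_page", "new_page", "old_page"],
})

in_treatment = df["group"] == "treatment"        # True, False, True
on_new_page = df["landing_page"] == "new_page"   # True, True, False

# Element-wise AND: True only where both masks are True.
both = in_treatment & on_new_page                # True, False, False

count = int(both.sum())
print(both.tolist(), count)
```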

If you want to be more verbose, this

import numpy as np

sum(np.logical_and(df.group == 'treatment', df.landing_page == 'new_page'))

also works.

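Although the first approach is more readable, it walks the whole length of each column to create the necessary temporary arrays. A direct "lazy" way that avoids those temporaries is shown below, although iterating row by row with iterrows() is usually much slower in practice than the vectorized versions. Note that iterrows() yields (index, row) pairs, so the row has to be unpacked:

sum(1 for _, row in df.iterrows() if row['group'] == 'treatment' and row['landing_page'] == 'new_page')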
Wen-Ben On

Let us stack with pandas

alexprice On

You can use the fact that True is treated as 1 by the pandas sum() function.
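
A short sketch of that behaviour, using a made-up boolean Series (not the answer's original code, which is missing here):

```python
import pandas as pd

# Hypothetical boolean flags, e.g. the result of a comparison.
flags = pd.Series([True, False, True, True])

# Each True counts as 1 and each False as 0.
total = int(flags.sum())

# By the same logic, the mean is the fraction of True values.
share = float(flags.mean())
print(total, share)
```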