%timeit issue in Jupyter due to "%" syntax in function?


I want to %timeit a function in Jupyter.

Generate Data

import pandas as pd

df = pd.DataFrame()
df["One"] = range(1, 1001)
df["Two"] = range(2000, 3000)
df["Three"] = range(3000, 4000)
df.set_index(["One"], drop=True, inplace=True)

Set up function

import gc

def test_iterrows(df):
    # Overwrites column "Three" in place with the labels "Even"/"Odd".
    for index, row in df.iterrows():
        if (row["Three"] & 1 == 0):
            df.loc[index, "Three"] = "Even"
        else:
            df.loc[index, "Three"] = "Odd"
    print(df.head())
    gc.collect()
    return None

When I run test_iterrows(df), I get:

      Two Three
One            
1    2000  Even
2    2001   Odd
3    2002  Even
4    2003   Odd
5    2004  Even

Fine. The function works. However, when I do %timeit test_iterrows(df), I get an error:

<ipython-input-29-326f4a0f49ee> in test_iterrows(df)
     13 def test_iterrows(df):
     14     for index, row in df.iterrows():
---> 15         if (row["Three"] & 1 == 0):
     16             df.loc[index, "Three"] = "Even"
     17         else:

TypeError: unsupported operand type(s) for &: 'str' and 'int'

What is going on here? My (probably wrong) interpretation is that I apparently can't %timeit functions that contain %.


Answer by MSeifert (best answer)

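%timeit executes the statement repeatedly, and the function changes df in place. Once the function has run, column "Three" holds the strings "Even" and "Odd" instead of integers, so any later run fails on row["Three"] & 1. Note that I get the same exception when I just call the function twice: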

test_iterrows(df)
test_iterrows(df)
# TypeError: unsupported operand type(s) for &: 'str' and 'int'
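
The same failure can be reproduced without pandas. The following is only a minimal sketch: a plain list stands in for the DataFrame column, `label_parity` is a hypothetical stand-in for test_iterrows, and the standard-library `timeit` module stands in for the %timeit magic.

```python
import timeit


def label_parity(values):
    """Overwrite each integer in `values` with "Even" or "Odd", in place."""
    for i, v in enumerate(values):
        values[i] = "Even" if v & 1 == 0 else "Odd"


data = list(range(3000, 3010))

# A single timed call works, because `data` still holds integers.
timeit.timeit(lambda: label_parity(data), number=1)

# A second timed call sees the strings written by the first one and fails.
# This is the situation a repeated timing run creates.
try:
    timeit.timeit(lambda: label_parity(data), number=1)
    second_call_error = None
except TypeError as exc:
    second_call_error = exc

print(second_call_error)
```

The error has nothing to do with the timing tool itself: any second call on the already-modified data fails the same way.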

You should probably pass in a copy, although that slightly "biases" the timings, because the measurement then also includes the time it takes to make the copy:

%timeit test_iterrows(df.copy())  # time the execution with a copy
%timeit df.copy()                 # compared to the time it takes to just copy it
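
Outside of IPython, the same comparison could be sketched with the standard-library `timeit` module. This is an illustration under assumptions, not the measurement from the question: a plain list replaces the DataFrame, `label_parity` is a hypothetical in-place function, and the `number` and `repeat` values are arbitrary.

```python
import timeit


def label_parity(values):
    """Overwrite each integer in `values` with "Even" or "Odd", in place."""
    for i, v in enumerate(values):
        values[i] = "Even" if v & 1 == 0 else "Odd"


data = list(range(3000, 4000))

# Time the function on a fresh copy each call, so `data` is never modified.
with_copy = min(timeit.repeat(lambda: label_parity(data.copy()),
                              number=100, repeat=5))

# Time the copy alone, to see how much of the figure above it accounts for.
copy_only = min(timeit.repeat(lambda: data.copy(), number=100, repeat=5))

# Rough estimate of the function's own cost (seconds per 100 calls).
estimate = with_copy - copy_only
print(with_copy, copy_only, estimate)
```

Subtracting the two numbers gives only a rough estimate, since both measurements carry their own noise.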

Also, I'm not quite sure what the gc.collect() call is supposed to do there: gc.collect() only collects objects that can't be freed by normal means (reference counting) because they are part of reference cycles.
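
To illustrate that last point, here is a small sketch that assumes CPython's reference-counting behaviour. `Node` is a hypothetical class used only to build a cycle, and weak references are used to observe when each object is actually freed.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""


gc.disable()  # keep the automatic cycle collector out of the demonstration

# No cycle: reference counting frees the object as soon as the last name is gone.
plain = Node()
plain_ref = weakref.ref(plain)
del plain
plain_freed_without_gc = plain_ref() is None

# Cycle: the object refers to itself, so its reference count never drops to zero.
cyclic = Node()
cyclic.me = cyclic
cyclic_ref = weakref.ref(cyclic)
del cyclic
cycle_alive_before_collect = cyclic_ref() is not None

gc.collect()  # only now is the unreachable cycle found and freed
cycle_freed_after_collect = cyclic_ref() is None

gc.enable()
print(plain_freed_without_gc, cycle_alive_before_collect, cycle_freed_after_collect)
```

Nothing in test_iterrows obviously creates such cycles, so the explicit gc.collect() call there probably adds run time without freeing anything that would not be freed anyway.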