How Does Python Apply a Method from one Library to the Object of Another?

While using pandarallel to run .apply methods on my dataframes across all cores, I came across syntax I had never seen before. More precisely, it's a use of dot syntax that I don't understand.

import pandas as pd
from pandarallel import pandarallel

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})


So far so good, just setting up a dataframe. Next, to get pandarallel ready, we do

pandarallel.initialize()


Next up is the bit where I am confused: to use pandarallel we call this method on the dataframe

df.parallel_apply(func)


My question is: if the dataframe df was instantiated using the pandas library, and pandas does not have a method called parallel_apply, how is it that Python knows to use the pandarallel method on the pandas object?

I presume it's something to do with the initialization, but I have never seen this before and I don't understand what's happening in the back end.

There are 2 answers

0
Bruno Mello On BEST ANSWER

You can attach your own methods to a class after it has been defined, and every instance of that class picks them up:

def my_func(self):
    return 2*self


pd.DataFrame.my_method = my_func

df.my_method()

   a   b
0  2   8
1  4  10
2  6  12

You can even pass arguments:

def sum_x(self, x):
    return self+x

pd.DataFrame.sum_x = sum_x

df.sum_x(3)

   a  b
0  4  7
1  5  8
2  6  9

The first argument receives the instance the method is called on, exactly like self in a method defined inside a class.
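
This also works for objects created before the assignment, because Python looks a method up on the class at call time rather than copying it into each instance. Here is a minimal sketch of that, using a plain class as a stand-in for DataFrame so it runs without pandas (Table and doubled are made-up names for illustration):

```python
class Table:
    """Hypothetical stand-in for a library class such as pandas.DataFrame."""

    def __init__(self, values):
        self.values = list(values)


# The instance is created before the method exists.
t = Table([1, 2, 3])
had_method_before = hasattr(t, "doubled")


def doubled(self):
    # `self` is whichever instance the method is called on.
    return [2 * v for v in self.values]


# Assigning a function to the class makes it available on every instance,
# including ones that already exist.
Table.doubled = doubled

has_method_after = hasattr(t, "doubled")
result = t.doubled()  # equivalent to Table.doubled(t)

# The method lives on the class, not in the instance's own attributes.
stored_on_instance = "doubled" in vars(t)
```

Here t gains doubled even though it was built first, which is the same reason a df created before pandarallel.initialize() can still call parallel_apply afterwards.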

1
Carcigenicate On

It appears to happen in initialize:

DataFrame.parallel_apply = parallelize(*args)

Python classes, DataFrame included, allow attributes to be added after the class is defined, and that's what's happening here. parallelize appears to be a factory function that builds a function from the passed args. The function it returns is assigned to DataFrame.parallel_apply, so it then behaves as a method on every DataFrame.
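
To make the pattern concrete, here is a rough sketch of a factory function whose result is installed as a method by an initialize step. It is not pandarallel's actual code: Frame is a made-up stand-in for DataFrame, the names parallelize and initialize are borrowed only to mirror the snippet above, and the "parallel" work is done sequentially to keep the example small.

```python
class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, values):
        self.values = list(values)


def parallelize(n_workers):
    """Hypothetical factory: builds a function meant to be used as a method."""

    def parallel_apply(self, func):
        # Split the values into at most n_workers chunks. A real
        # implementation would hand each chunk to a separate process;
        # this sketch processes the chunks one after another.
        size = max(1, -(-len(self.values) // n_workers))  # ceiling division
        chunks = [
            self.values[i:i + size] for i in range(0, len(self.values), size)
        ]
        return [func(v) for chunk in chunks for v in chunk]

    return parallel_apply


def initialize(n_workers=2):
    # Attach the generated function to the class, not to any one instance.
    Frame.parallel_apply = parallelize(n_workers)


frame = Frame([1, 2, 3])
available_before = hasattr(frame, "parallel_apply")

initialize()

available_after = hasattr(frame, "parallel_apply")
squares = frame.parallel_apply(lambda v: v * v)
```

Before initialize() runs, frame has no parallel_apply attribute. Afterwards the call resolves through the class, with n_workers captured in the closure the factory returned.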