What is the fastest way to get a group's name by key using pandas?


I searched around as much as I could but did not find an answer.

If I use groupby in pandas, and I have a group, call it group1, how do I get group1's name?

I am using groupby and apply, so I am not explicitly pulling the groups, which is why I need to do this.

Suppose I group df by two things.

df.groupby(['key1','key2'])

Then I get a group using this: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.get_group.html#pandas.core.groupby.GroupBy.get_group

I want to avoid doing:

group1.key1.unique()[0]
group1.key2.unique()[0]

to get the name, because that is slow.

There is 1 answer

JD Long (best answer)

It's not clear to me what you mean by name of the group. Do you mean the values in the column you are grouping by?

Apply will break the dataframe into multiple smaller dataframes by the groupby columns. The columns you group by are still inside the smaller dataframes. Is that what you are after?

As an illustration:

example data:

import numpy as np
import pandas as pd

np.random.seed(1)
n = 10
df = pd.DataFrame({'mygroups': np.random.choice(['dogs','cats','cows','chickens'], size=n),
                   'mygroups2': np.random.choice(['dogs','cats','cows','chickens'], size=n),
                   'data': np.random.randint(1000, size=n)})
print(df.head())
   data  mygroups mygroups2
0   254      cats      dogs
1   357  chickens      cats
2   914      dogs      dogs
3   468      dogs  chickens
4   907  chickens      cats

let's group it and make up a silly function:

gb = df.groupby(['mygroups','mygroups2'])
def someFunction(ingroup):
    print(ingroup)
    return ""

gb.apply(someFunction)


   data mygroups mygroups2
7   668     cats      cats
   data mygroups mygroups2
7   668     cats      cats
   data mygroups mygroups2
0   254     cats      dogs
5   252     cats      dogs
   data  mygroups mygroups2
1   357  chickens      cats
4   907  chickens      cats
   data  mygroups mygroups2
6   490  chickens      cows
8   925  chickens      cows
   data mygroups mygroups2
3   468     dogs  chickens
   data mygroups mygroups2
2   914     dogs      dogs
9   398     dogs      dogs
Out[718]:
mygroups  mygroups2
cats      cats         
          dogs         
chickens  cats         
          cows         
dogs      chickens     
          dogs         
dtype: object

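So you can see in the printed output that each call made by apply receives every column of the input dataframe, including the grouping columns. (The first group is printed twice because older pandas versions call the function on the first group one extra time.)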

EDIT:

I'm not sure how to grab a tuple of keys from an apply, but I can from a loop:

# iterating a groupby yields (key, sub-dataframe) pairs
for key, group in gb:
    print('this group key = ' + str(key))
    print('this group values = ')
    print(group)
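
One more option that may cover the apply case: as far as I know, pandas attaches the group key to each sub-dataframe as a `.name` attribute before handing it to the function, so the key can be read without touching the columns at all. Below is a minimal sketch under that assumption. The small dataframe and the names `record_key` and `seen_keys` are made up for illustration and are not from the question.

```python
import pandas as pd

# Small made-up frame, grouped by two keys as in the question
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b"],
    "key2": ["x", "y", "x", "x"],
    "data": [1, 2, 3, 4],
})
gb = df.groupby(["key1", "key2"])

seen_keys = []

def record_key(group):
    # Assumption: pandas sets `.name` on each group to its key
    # (a tuple here, because we group by two columns)
    seen_keys.append(group.name)
    return group["data"].sum()

sums = gb.apply(record_key)

# The full set of keys is also available without applying anything
all_keys = list(gb.groups)
```

Reading `group.name` avoids the `unique()[0]` lookups from the question because no column is scanned. Recent pandas versions may warn that apply is operating on the grouping columns, which is unrelated to `.name`.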