GroupBy mean not working on Titanic dataset in Python


I am using the Titanic dataset and trying to run the groupby command, but it's not working as shown in countless tutorials online. I have named my DataFrame ks_cl. Here is the command I executed in VS Code:

ks_cl.groupby(['sex']).mean()

This is the output:

NotImplementedError                       Traceback (most recent call last)
File d:\Program Files\Python\Lib\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general..array_func(values)
   1489 try:
-> 1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   1493         how,
   1494         axis=data.ndim - 1,
   1495         min_count=min_count,
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
   1501     # try to python agg
   1502     # TODO: shouldn't min_count matter?

File d:\Program Files\Python\Lib\site-packages\pandas\core\groupby\ops.py:959, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
    958 ngroups = self.ngroups
--> 959 return cy_op.cython_operation(
    960     values=values,
    961     axis=axis,
    962     min_count=min_count,
    963     comp_ids=ids,
...
   1698             # e.g. "foo"
-> 1699             raise TypeError(f"Could not convert {x} to numeric") from err
   1700 return x

TypeError: Could not convert CSSSCSSSSSQSSSCSSCQSCSSSSSSSSSSSSCSCSSSSSSSSSQSSSCSSSCCSSQSCSCSSSSSSSCSSSSSSSQSCSSCCCSSSSCQSCSSCCSSSSCCSSCSSCCSSSSSQSSSSSSSSSSSSSCSCSCSSSCSQSSSCSSSCSSSSCCSSSSSCSSSSSSSCSCSCSSSSSSSSSCSCSSQQSSSCCSSCSSSSSSSSSSSQSSSCSSSSSSSSSSSSCCCCSSSSCSSCSCCCSSQS to numeric

I was expecting this output:

[Screenshot of the expected output]

Best answer, by Timeless:

You need to set numeric_only=True in GroupBy.mean:

numeric_only : (bool), default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.

Source: [docs]
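The documentation above says numeric_only keeps only float, int and boolean columns. A minimal sketch of doing that selection by hand with select_dtypes, on a small made-up frame (the values are invented for illustration and are not from the real dataset), which should give the same result as the built-in flag:

```python
import pandas as pd

# Small made-up frame with the same kinds of columns as the Titanic data
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],   # strings: cannot be averaged
    "Age": [22.0, 38.0, 26.0, 35.0],    # float
    "Survived": [0, 1, 1, 0],           # int
    "Adult": [True, True, True, True],  # bool
})

# Keep only the float/int/bool columns ("number" alone would skip bool)
numeric_cols = df.select_dtypes(include=["number", "bool"]).columns
manual = df.groupby("Sex")[list(numeric_cols)].mean()

# Should match what numeric_only=True does
builtin = df.groupby("Sex").mean(numeric_only=True)
print(manual)
print(manual.equals(builtin))
```

The grouping key Sex is not in numeric_cols, so it only serves as the index of the result.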

As of pandas 2.0.0:

Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False (GH46072)
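Because of that change, the same call can behave differently depending on your pandas version. The sketch below uses made-up data and a hypothetical helper, plain_mean_raises; since the exact error text differs between releases, it only checks for a TypeError:

```python
import pandas as pd

df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

def plain_mean_raises(frame: pd.DataFrame) -> bool:
    """Return True if groupby(...).mean() without numeric_only raises TypeError."""
    try:
        frame.groupby("Sex").mean()
    except TypeError:
        return True
    return False

pandas_major = int(pd.__version__.split(".")[0])
# On pandas >= 2.0 numeric_only defaults to False, so the string column
# "Embarked" makes the plain call fail
print(pandas_major, plain_mean_raises(df))

# Passing numeric_only=True explicitly works on recent versions
print(df.groupby("Sex").mean(numeric_only=True))
```

On pandas 1.x the plain call instead dropped the text columns (with a FutureWarning on 1.5), which is likely why older tutorials show it working.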

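import pandas as pd

link = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

ks_cl = pd.read_csv(link)

# Column names in this CSV are capitalized ("Sex", "Embarked", ...)
# numeric_only=True skips the text columns (Name, Ticket, Cabin, Embarked)
out = ks_cl.groupby("Sex").mean(numeric_only=True)
print(out)

Output: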

        PassengerId  Survived   Pclass       Age    SibSp    Parch      Fare
Sex                                                                         
female   431.028662  0.742038 2.159236 27.915709 0.694268 0.649682 44.479818
male     454.147314  0.188908 2.389948 30.726645 0.429809 0.235702 25.523893
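As for the long CSSSCSSS... string in the error: judging by the traceback, the mean is computed by first summing each group's values, and summing text values concatenates them. The letters S, C and Q look like the port codes of the embarkation column, so the string is most likely that column glued together. A small illustration with made-up values, followed by the alternative of selecting the columns you want explicitly (column names follow the CSV used above):

```python
import pandas as pd

# Summing an object column of strings concatenates them
embarked = pd.Series(["C", "S", "S", "Q"], dtype=object)
print(embarked.sum())  # CSSQ

# Made-up rows with Titanic-style column names
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Survived": [0, 1, 1, 0],
})

# Selecting only numeric columns up front avoids the error
# without relying on numeric_only at all
subset = df.groupby("Sex")[["Age", "Survived"]].mean()
print(subset)
```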