Groupby Mean not working on titanic dataset in Python

Question

Asked by Muhammad Ali Siddiqui on 12 May 2023 at 06:39 · 886 views

I am using the Titanic dataset and trying to run the groupby command, but it's not working the way countless online tutorials show. My DataFrame is named ks_cl. Here is the command I executed in VS Code:

ks_cl.groupby(['sex']).mean()

This is the output:

NotImplementedError                       Traceback (most recent call last)
File d:\Program Files\Python\Lib\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1489 try:
-> 1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   1493         how,
   1494         axis=data.ndim - 1,
   1495         min_count=min_count,
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
   1501     # try to python agg
   1502     # TODO: shouldn't min_count matter?

File d:\Program Files\Python\Lib\site-packages\pandas\core\groupby\ops.py:959, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
    958 ngroups = self.ngroups
--> 959 return cy_op.cython_operation(
    960     values=values,
    961     axis=axis,
    962     min_count=min_count,
    963     comp_ids=ids,
...
   1698             # e.g. "foo"
-> 1699             raise TypeError(f"Could not convert {x} to numeric") from err
   1700 return x

TypeError: Could not convert CSSSCSSSSSQSSSCSSCQSCSSSSSSSSSSSSCSCSSSSSSSSSQSSSCSSSCCSSQSCSCSSSSSSSCSSSSSSSQSCSSCCCSSSSCQSCSSCCSSSSCCSSCSSCCSSSSSQSSSSSSSSSSSSSCSCSCSSSCSQSSSCSSSCSSSSCCSSSSSCSSSSSSSCSCSCSSSSSSSSSCSCSSQQSSSCCSSCSSSSSSSSSSSQSSSCSSSSSSSSSSSSCCCCSSSSCSSCSCCCSSQS to numeric

I was expecting this output: [image not available]

**Timeless** · Accepted Answer · 2023-05-12T06:48:19+00:00

You need to pass numeric_only=True to GroupBy.mean:

numeric_only : (bool), default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.

Source: [docs]

And as of pandas 2.0.0:

Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False (GH46072)
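To see the effect of that default without downloading anything, here is a minimal sketch on a small made-up frame. The column names and values are invented for illustration and are not taken from the Titanic data. The try/except is there because the plain call raises TypeError on pandas 2.x, while older versions only warn or silently drop the string column.

```python
import pandas as pd

# Hypothetical toy data: one grouping key, one string column, two numeric columns.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "S", "Q"],
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 8.05],
})

# With numeric_only=False (the default since pandas 2.0), the string column
# "embarked" is passed to mean() and cannot be averaged.
try:
    default_result = df.groupby("sex").mean()
except TypeError as err:
    default_result = None
    print("default call failed with", type(err).__name__)

# Asking for numeric columns only leaves "age" and "fare".
numeric_means = df.groupby("sex").mean(numeric_only=True)
print(numeric_means)
```

Passing numeric_only=True explicitly behaves the same way before and after the 2.0 change, so it is the safer spelling when code has to run on more than one pandas version.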

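import pandas as pd

link = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

ks_cl = pd.read_csv(link)

# numeric_only=True skips the non-numeric (string) columns instead of failing on them
out = ks_cl.groupby("Sex").mean(numeric_only=True)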

Output:

print(out)

        PassengerId  Survived    Pclass        Age     SibSp     Parch       Fare
Sex
female   431.028662  0.742038  2.159236  27.915709  0.694268  0.649682  44.479818
male     454.147314  0.188908  2.389948  30.726645  0.429809  0.235702  25.523893
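An alternative that does not rely on numeric_only is to pick the columns before aggregating. This is only a sketch on a hypothetical toy frame (the names and values below are assumptions, not the real dataset), showing two ways to do the selection: by naming the columns, or by deriving them from the dtypes with select_dtypes.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# Option 1: name the columns to average.
by_name = df.groupby("Sex")[["Age", "Fare"]].mean()

# Option 2: collect every numeric column from the dtypes, then aggregate.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
by_dtype = df.groupby("Sex")[numeric_cols].mean()

print(by_name)
print(by_dtype)
```

Naming the columns makes the intent explicit and keeps ID-like numeric columns (such as PassengerId in the output above) out of the result, at the cost of having to update the list when the data changes.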
