Why does a series or dataframe column of integers containing NaN values have "float64" as data type?


Whether I load the data with pd.read_csv or define a Series of integers directly, as soon as it contains a NaN value the data type of that Series or column becomes "float64", and every numeric value is displayed with a trailing ".0".

The data type of each column read from a CSV file is one of the characteristics I use in my analysis. When a column contains only integers and NaN values, then after loading the table with pandas.read_csv the column's dtype attribute reports "float64", even though all of its non-missing values are integers.
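A minimal reproduction of the behavior described above, assuming a small in-memory CSV (the column names "id" and "count" are hypothetical, chosen only for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV text; the second row has no value in the "count" column.
csv_text = "id,count\n1,10\n2,\n3,30\n"

df = pd.read_csv(io.StringIO(csv_text))

# "id" has no missing values and stays an integer column;
# "count" has one missing value and is read as float64.
print(df.dtypes)
print(df["count"].tolist())

# The same upcast happens when a Series is built directly.
s = pd.Series([1, 2, float("nan")])
print(s.dtype)
```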


There is 1 answer

wotb On

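Pure integers cannot be NaN. NaN is a floating-point value, and the default NumPy-backed "int64" dtype has no way to represent it, so pandas upcasts the whole column to "float64". What you want is the nullable integer type, "Int64".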

In code this might look something like:

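df = pd.read_csv("file.csv", dtype={"col1": str, "col_with_nan": "Int64"})

Note the capital "I" in "Int64", and that the dtype name is passed as a quoted string (a bare Int64 would raise a NameError). Lowercase "int64" is the ordinary NumPy dtype, which cannot hold missing values.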
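As a rough end-to-end sketch of the suggestion above, the snippet below reads a small in-memory CSV instead of "file.csv" (the CSV contents are made up for illustration; the column names follow the example in this answer). It also shows one way to convert a column that has already been loaded as float64, assuming all of its non-missing values are whole numbers:

```python
import io
import pandas as pd

# Made-up CSV text standing in for "file.csv"; row "b" has a missing integer.
csv_text = "col1,col_with_nan\na,1\nb,\nc,3\n"

df2 = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"col1": str, "col_with_nan": "Int64"},  # nullable integer dtype
)

# The column keeps integer values; the missing entry becomes pd.NA, not NaN.
print(df2.dtypes)
print(df2["col_with_nan"].tolist())

# An existing float64 column can be converted after loading,
# provided its non-missing values are whole numbers.
floats = pd.Series([1.0, float("nan"), 3.0])
ints = floats.astype("Int64")
print(ints.dtype)
```

With this dtype the values print without the trailing ".0", and the dtype check in the analysis can test for "Int64" rather than "float64".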