Julia : Dataframes packages having trouble to convert column containing both int and float

280 views Asked by At

I'm a R user with great interest for Julia. I don't have a computer science background. I just tried to read a 'csv' file in Juno with the following command:

using CSV
using DataFrames

df = CSV.read(joinpath(Pkg.dir("DataFrames"), 
"path/to/database.csv"));

and got the following error message

CSV.CSVError('error parsing a 'Int64' value on column 26, row 289; encountered '.'"
in read at CSV/src/Source.jl:294
in #read#29 at CSV/src/Source.jl:299
in stream! at DataStreams/src/DataStreams.jl:145
in stream!#5 at DataStreams/src/DataStreams.jl:151
in stream! at DataStreams/src/DataStreams.jl:187
in streamto! at DataStreams/src/DataStreams.jl:173
in streamfrom at CSV/src/Source.jl:195
in paresefield at CSV/src/paresefield.jl:107
in paresefield at CSV/src/paresefield.jl:127
in checknullend at CSV/src/paresefield.jl:56

I look at the entry indicated in the data frame: the row 287, 288 are like this 30, 33 respectively (seem to be of type Integer) and the the row 289 is 30.445 (which is of type float).

Is the problem that DataFrames filling the column with Int and stopped when it saw an Float?

Many thanks in advance

1

There are 1 answers

1
Bogumił Kamiński On

The problem is that float happens too late in the data set. By default CSV.jl uses rows_for_type_detect value equal to 100. Which means that only first 100 rows are used to determine the type of a column in the output. Set rows_for_type_detect keyword parameter in CSV.read to e.g. 300 and all should work correctly.

Alternatively you can pass types keyword argument to manually set column type (in this case Float64 for this column would be appropriate).