Preventing csvkit from modifying dates/times?

984 views Asked by At

I'm just trying out csvkit for converting Excel to csv. However, it's not taking into account formatting on dates and times, and producing different results from Excel's own save-as-csv. For example, this is a row of a spreadsheet:

enter image description here

And this what Excel's save-as produces:

22/04/1959,Bar,F,01:32.00,01:23.00,00:59.00,00:47.23

The date has no special formatting, and the time is formatted as [mm].ss.00. However, this is in2csv's version of the csv:

1959-04-22,Bar,F,0.00106481481481,0.000960648148148,0.00068287037037,0.000546643518519

which is of course of no use at all. Any ideas? There don't seem to be any command-line options for this - no-inference doesn't help. Thanks.

EDIT

Both csvkit ans xlrd do seem to take into account formatting, but they're not smart about it. A date of 21/02/1066 is passed though as the text string '21/02/1066' in both cases, but a date '22/04/1959' is turned into '21662.0' by xlrd, and 1959-04-22 by csvkit. Both of them just give up on small elapsed times and pass through the float representation. This is Ok if you know that the cell should contain an elapsed time, because you can just multiply by 24*60*60 to get the right answer.

I don't think xlrd would be much help here since its date tuple functions only handle seconds, and not centiseconds.

EDIT 2

Found out something interesting. I started with a base spreadsheet containing times. In one of them I formatted the times as [m:]ss.00, and in the other I formatted them as [mm:]ss.00. I then saved each as a .xls and a .xlsx, giving a total of 4 spreadsheets. Excel could convert all 4 to csv, and all the time text in the csv's appeared as originally written (ie. 0:21.0, for example, for 0m 21.0s).

in2csv can't handle the two .xls versions at all; this time appears as 00:00:21. It also can't handle the [m:]ss.00 version of the .xlsx - conversion gives the catch-all 'index out of range' error. The only one of the 4 spreadsheets that in2csv can handle is the .xlsx one, with [mm:]ss.00 formatting.

1

There are 1 answers

0
Joe Larson On

The optional -I argument should be working to avoid this issue. When testing your sample data, I get what Excel's save-as produces.

Command: in2csv sample.csv -I > sample-output-i.csv

Output: 22/04/1959,Bar,F,01:32.00,01:23.00,00:59.00,00:47.23

-I, --no-inference Disable type inference when parsing CSV input.

https://csvkit.readthedocs.io/en/latest/scripts/in2csv.html