I am trying to load a CSV into a DataFrame using pandas.read_csv. My data has columns of cell IDs that are 18 digits long, plus other columns with other data. Sometimes there is an empty entry, as shown below:
| root_id_orig | root_id_final | coarse_id |
|---|---|---|
| 648518346489344345 | 648518346489344345 | local |
| 648518346509145466 | | |
| 648518346489461189 | 648518346489461189 | intersegmental |
When I use pandas.read_csv, it reads the empty entries as NaN, which is good, but it also rounds the 18-digit numbers. I can force it to display all 18 digits, but then the last two digits are replaced seemingly at random, so that '648518346489344345' becomes '648518346489344308'.
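
For reference, the same kind of change can be reproduced without pandas. This is a minimal sketch, assuming the column is being stored as float64 (which is what pandas normally falls back to when an integer column contains NaN). The variable names are only illustrative.

```python
# One of the 18-digit IDs from the table above
cell_id = 648518346489344345

# float64 has a 53-bit significand, so above 2**53 not every integer
# can be represented exactly
as_float = float(cell_id)
round_trip = int(as_float)

print(cell_id > 2**53)        # True
print(round_trip == cell_id)  # False: the trailing digits changed
print(round_trip % 128)       # 0: at this magnitude floats are 128 apart
```

If that assumption holds, the digits are not random: each ID is snapped to the nearest value a float64 can hold.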
I would like to load in the data and avoid this rounding issue, but still have something like NaN in the empty entries, so that I know to ignore them later. Alternatively, I could just drop the rows with the empty entries, since honestly I do that later anyway. Any advice?
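
One possible approach, sketched under the assumption that the IDs are only used as labels (never for arithmetic): read the ID columns as strings so no float conversion happens, keep NaN for the empty entries, and convert to integers only after dropping the incomplete rows. The inline `csv_text` is a stand-in for the real file, and `id_cols`, `seg_ids` and `complete` are illustrative names.

```python
import io
import pandas as pd

# Stand-in for the real CSV file; the second row has empty entries
csv_text = """root_id_orig,root_id_final,coarse_id
648518346489344345,648518346489344345,local
648518346509145466,,
648518346489461189,648518346489461189,intersegmental
"""

id_cols = ["root_id_orig", "root_id_final"]

# Reading the ID columns as str keeps every digit; empty fields still become NaN
seg_ids = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in id_cols})
print(seg_ids["root_id_final"].isna().tolist())  # [False, True, False]

# Drop rows with a missing final ID, then convert the IDs to 64-bit integers
complete = seg_ids.dropna(subset=["root_id_final"]).copy()
for col in id_cols:
    complete[col] = complete[col].astype("int64")

print(complete["root_id_final"].tolist())
# [648518346489344345, 648518346489461189]
```

Keeping the IDs as strings until the missing rows are gone avoids ever passing them through a floating-point type.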
Edit: actual pandas output.

Here test1 is a CSV of data with no empty/missing entries, and test2 is a CSV of data with missing entries.

    import pandas as pd

    # Show floats as whole numbers, 18 characters wide, instead of scientific notation
    pd.set_option('display.float_format', lambda x: '%18.0f' % x)
    segIDs1 = pd.read_csv(csv_path+'test1.csv')
    segIDs2 = pd.read_csv(csv_path+'test2.csv')
    print(segIDs1)
    print(segIDs2)

segIDs1 prints as follows:

               root_id_orig       root_id_final       coarse_id
    0    648518346492622267  648518346492622267           local
    1    648518346490149896  648518346490149896  intersegmental
    2    648518346475243320  648518346475243320           local
    3    648518346486220960  648518346486220960           local
    4    648518346486220960  648518346491547966  intersegmental
    ..                  ...                 ...             ...
    348  648518346494699683  648518346526246871              MN
    349  648518346491602705  648518346499802323           local
    350  648518346492012120  648518346503946592           local
    351  648518346476192927  648518346499062337           local
    352  648518346493059320  648518346492999344  intersegmental

    [353 rows x 3 columns]

segIDs2 prints as follows:

               root_id_orig       root_id_final       coarse_id
    0    648518346492622267  648518346492622208           local
    1    648518346490149896  648518346490149888  intersegmental
    2    648518346475243320  648518346475243264           local
    3    648518346486220960  648518346486220928           local
    4    648518346486220960  648518346491547904  intersegmental
    ..                  ...                 ...             ...
    529  648518346475585266                 NaN             NaN
    530  648518346472501734                 NaN             NaN
    531  648518346471918758                 NaN             NaN
    532  648518346468216120                 NaN             NaN
    533  648518346468216120                 NaN             NaN

    [534 rows x 3 columns]