Fetching a CSV that has a vector in one column, c(,)


I was using R to prepare my data, but I now find myself forced to use Python instead. The CSV files were saved from an sf data frame, in which a geometry column stores both long and lat. My files have the following structure:

a,geometry,b
50,c(-95.11, 10.19),32.24
60,,c(-95.12, 10.27),22.79
70,c(-95.13, 10.28),14.91
80,c(-95.14, 10.33),18.35
90,c(-95.15, 10.5),28.35
99,c(-95.16, 10.7),48.91

The aim is to read the file knowing that c(-95.11, 10.19) holds 2 values, lon and lat, so that they can be stored in two different columns. However, the separator also appears inside the value, which is not quoted as a string, and that makes this really hard to do.

The expected output should be:
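A minimal sketch of the problem, using Python's standard csv module on a single line shaped like the sample above (the names sample, header and first are only illustrative): the comma inside c(...) is treated as an ordinary separator, so the data row ends up with one more field than the header.

```python
import csv
import io

# One header line and one data line in the same shape as the sample above.
sample = "a,geometry,b\n50,c(-95.11, 10.19),32.24\n"

header, first = list(csv.reader(io.StringIO(sample)))

# The unquoted comma inside c(...) splits the geometry into two fields,
# so the data row no longer lines up with the three header columns.
print(len(header), header)
print(len(first), first)
```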

a,long,lat,b
50,-95.11, 10.19,32.24
60,,-95.12, 10.27,22.79
70,-95.13, 10.28,14.91
80,-95.14, 10.33,18.35
90,-95.15, 10.5,28.35
99,-95.16, 10.7,48.91

There are 2 answers

1
Timus On BEST ANSWER

Does this work (input file: data.csv; output file: data_out.csv)?

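import csv

# newline='' lets the csv module handle line endings itself
with open('data.csv', 'r', newline='') as fin, open('data_out.csv', 'w', newline='') as fout:
    reader, writer = csv.reader(fin), csv.writer(fout)
    next(reader)  # skip the original header
    writer.writerow(['a', 'long', 'lat', 'b'])
    for row in reader:
        # the comma inside c(...) splits the geometry into two fields,
        # e.g. 'c(-95.11' and ' 10.19)'
        row[1] = row[1][2:]    # drop the leading 'c('
        row[2] = row[2][1:-1]  # drop the leading blank and the trailing ')'
        writer.writerow(row)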

Your sample output has a blank after the second column: is this intended? Also, your sample input has a double , after the first column in line two?
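If the doubled , in line two is really present in the files, index-based slicing will pick the wrong fields for that row. A more tolerant variant could locate the c(...) pair with a regular expression instead. This is only a sketch under two assumptions: each line has exactly one value before and one value after the geometry, and the extra empty field is a typo that can be dropped. The names PAIR and split_geometry are hypothetical helpers, not part of any library.

```python
import re

# Matches an R-style numeric pair such as c(-95.11, 10.19).
PAIR = re.compile(r"c\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")

def split_geometry(line):
    """Return [a, long, lat, b] for one data line, or None if no c(...) is found."""
    match = PAIR.search(line)
    if match is None:
        return None
    # Assumption: one value on each side of the pair. Stripping commas means
    # a doubled separator (as in the second sample row) is silently dropped.
    before = line[:match.start()].strip().strip(",")
    after = line[match.end():].strip().strip(",")
    return [before, match.group(1), match.group(2), after]

for line in ["50,c(-95.11, 10.19),32.24", "60,,c(-95.12, 10.27),22.79"]:
    print(split_geometry(line))
```

Each returned list can be passed to csv.writer's writerow in place of the sliced row; a None result marks lines without a geometry (the header, for instance), which would need separate handling.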

0
Jindra Lacko On

If you are looking for an R-based solution, you may consider extracting the coordinates from the {sf} geometry column into regular columns and saving the result accordingly.

Consider this example, built on three semi-random North Carolina cities:

library(sf)
library(dplyr)

cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
                     x = c(-78.633333, -79.819444, -77.912222),
                     y = c(35.766667, 36.08, 34.223333)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

cities # a class sf data.frame
Simple feature collection with 3 features and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: -79.81944 ymin: 34.22333 xmax: -77.91222 ymax: 36.08
geographic CRS: WGS 84
        name                   geometry
1    Raleigh POINT (-78.63333 35.76667)
2 Greensboro    POINT (-79.81944 36.08)
3 Wilmington POINT (-77.91222 34.22333)

mod_cit <- cities %>% 
  mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2]) %>% 
  st_drop_geometry()

mod_cit # a regular data.frame
        name      long      lat
1    Raleigh -78.63333 35.76667
2 Greensboro -79.81944 36.08000
3 Wilmington -77.91222 34.22333