Error when pulling data from a dictionary and combining paths in pandas


I am trying to pull data for 3 cities. How can I read the data for all 3 cities at once instead of one file at a time, as below? Is my file-reading code duplicated? And how should I read the file name from the dictionary to avoid the error? Thanks so much.

import csv
with open('C:\\Users\\jasch\\chicago.csv') as chicago_data:
    csvReader = csv.reader(chicago_data)

import csv
with open('C:\\Users\\jasch\\new_york_city.csv') as new_york_data:
    csvReader = csv.reader(new_york_data)

import csv
with open('C:\\Users\\jasch\\washington.csv') as washington_data:
    csvReader = csv.reader(washington_data)

import time
import pandas as pd
import numpy as np

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

df = pd.read_csv(CITY_DATA[city])

df['Start Time'] = pd.to_datetime(df['Start Time'])
df['month'] = df['Start Time'].dt.month
print (df['month'])

NameError                                 Traceback (most recent call last)
<ipython-input-16-b1588646f194> in <module>()
      7               'washington': 'washington.csv' }
      8 
----> 9 df = pd.read_csv(CITY_DATA[city])
     10 
     11 df['Start Time'] = pd.to_datetime(df['Start Time'])

NameError: name 'city' is not defined

The 3 CSV files of city data have almost the same column names, shown below.

     Start Time             End Time  Trip Duration  \
0  2017-05-29 18:36:27  2017-05-29 18:49:27            780   
1  2017-06-12 19:00:33  2017-06-12 19:24:22           1429   
2  2017-02-13 17:02:02  2017-02-13 17:20:10           1088   
3  2017-04-24 18:39:45  2017-04-24 18:54:59            914   
4  2017-01-26 15:36:07  2017-01-26 15:43:21            434   

              Start Station                          End Station  \
0     Columbus Dr & Randolph St                 Federal St & Polk St   
1        Kingsbury St & Erie St  Orleans St & Merchandise Mart Plaza   
2         Canal St & Madison St              Paulina Ave & North Ave   
3  Spaulding Ave & Armitage Ave       California Ave & Milwaukee Ave   
4        Clark St & Randolph St         Financial Pl & Congress Pkwy   

    User Type  Gender  Birth Year  
0  Subscriber    Male      1991.0  
1    Customer     NaN         NaN  
2  Subscriber  Female      1982.0  
3  Subscriber    Male      1966.0  
4  Subscriber  Female      1983.0   

Answer by tobsecret (best answer)

You don't need to read the files with the csv module first; pandas can open them directly. You also reassign csvReader twice, so once those three blocks have run, nothing refers to the readers for the first two files (Chicago and New York).

Below is the pandas way of reading in multiple files and combining them into one DataFrame:

import pandas as pd
import os

city_data_files = [
    'C:\\Users\\jasch\\chicago.csv',
    'C:\\Users\\jasch\\new_york_city.csv',
    'C:\\Users\\jasch\\washington.csv',
]

In the lines below, we loop through the list of file paths and create a DataFrame for each one, which leaves us with a list of DataFrames. We also use the .assign() method to add a column holding the file name, so that after combining the DataFrames we can still tell which row came from which file.

dfs = [
    pd.read_csv(city_data_file, parse_dates=['Start Time'])
      .assign(filename=os.path.basename(city_data_file))
    for city_data_file in city_data_files
]

Now we can go ahead and combine all the DataFrames into one DataFrame.

df = pd.concat(dfs) # this line combines the contents of the files 
df['month'] = df['Start Time'].dt.month
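
To see what the combined DataFrame looks like without the real files, here is a minimal sketch that runs the same steps on in-memory text. The sample_files dictionary, its paths and its two-column rows are made up for illustration, and ignore_index=True is an optional extra that renumbers the rows after combining.

```python
import io
import os

import pandas as pd

# Hypothetical stand-ins for the real CSV files: made-up path -> file contents.
sample_files = {
    "data/chicago.csv": (
        "Start Time,Trip Duration\n"
        "2017-05-29 18:36:27,780\n"
        "2017-06-12 19:00:33,1429\n"
    ),
    "data/washington.csv": (
        "Start Time,Trip Duration\n"
        "2017-02-13 17:02:02,1088\n"
    ),
}

# One DataFrame per "file", each tagged with the file name it came from.
dfs = [
    pd.read_csv(io.StringIO(text), parse_dates=["Start Time"])
      .assign(filename=os.path.basename(path))
    for path, text in sample_files.items()
]

# ignore_index=True gives the combined rows a fresh 0..n-1 index.
df = pd.concat(dfs, ignore_index=True)
df["month"] = df["Start Time"].dt.month

# The filename column lets us split or count rows per source file.
rows_per_file = df.groupby("filename").size().to_dict()
print(rows_per_file)
```

Without ignore_index=True each file's rows keep their own 0-based index, so index labels repeat in the combined DataFrame.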

As for your error - the stack trace is telling you exactly what the problem is:

----> 9 df = pd.read_csv(CITY_DATA[city])
NameError: name 'city' is not defined

You are using the variable city but have never defined it anywhere in your code.
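
If you want to keep the CITY_DATA dictionary, one possible fix is sketched below: either define city before using it as a key, or loop over the dictionary so every city is read. The data_dir folder and the one-row sample files are assumptions made only so the snippet runs on its own; in your case data_dir would be the folder that holds the real CSV files.

```python
import os
import tempfile

import pandas as pd

CITY_DATA = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}

# Assumption: data_dir stands in for the folder containing the real files.
# Tiny placeholder files are written here so the example is self-contained.
data_dir = tempfile.mkdtemp()
for filename in CITY_DATA.values():
    with open(os.path.join(data_dir, filename), 'w') as f:
        f.write('Start Time,Trip Duration\n2017-05-29 18:36:27,780\n')

# Option 1: define city before using it as a dictionary key.
city = 'chicago'
df = pd.read_csv(os.path.join(data_dir, CITY_DATA[city]),
                 parse_dates=['Start Time'])
df['month'] = df['Start Time'].dt.month

# Option 2: loop over the dictionary and tag each row with its city.
frames = [
    pd.read_csv(os.path.join(data_dir, filename), parse_dates=['Start Time'])
      .assign(city=name)
    for name, filename in CITY_DATA.items()
]
combined = pd.concat(frames, ignore_index=True)
print(combined['city'].tolist())
```

os.path.join builds the full path from the folder and the file name, so the dictionary can keep storing bare file names.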