Reading data from a CSV and creating a graph


I have a CSV file with data in the following format:

Issue_Type     DateTime
Issue1          03/07/2011 11:20:44
Issue2          01/05/2011 12:30:34
Issue3          01/01/2011 09:44:21
...             ...

I can read this CSV file, but I can't work out how to plot a graph (or rather a trend) from the data.

For instance, I'm trying to plot a graph with the DateTime (month only) on the X-axis and the number of issues on the Y-axis. The result would be a line graph with 3 lines, one per issue category, showing how many issues of that category occurred in each month.

I don't have any plotting code to share yet. So far I'm only reading the CSV file, and I'm not sure how to proceed from there to plot a graph.

PS: I'm not bent on using Python. Since I've parsed CSV files with Python before, I thought of using it, but if there is an easier approach in some other language, I would be open to exploring that as well.

3

There are 3 answers

0
Antimony On

The first thing you need to do is to parse the datetime fields as dates/times. Try using dateutil.parser for that.
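
As a rough sketch of that parsing step, the snippet below uses the standard library's datetime.strptime rather than dateutil.parser, so it runs without extra packages. The helper name parse_timestamp and the day/month/year order in DATE_FORMAT are assumptions; the sample data does not make the order clear, so adjust the format if the month comes first.

```python
from datetime import datetime

# Assumption: dates are day/month/year. Use "%m/%d/%Y %H:%M:%S" if month comes first.
DATE_FORMAT = "%d/%m/%Y %H:%M:%S"


def parse_timestamp(text):
    """Turn a string such as '03/07/2011 11:20:44' into a datetime object."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


parsed = parse_timestamp("03/07/2011 11:20:44")
print(parsed.year, parsed.month)
```

Once the value is a datetime, the month is available as parsed.month, which is all the later grouping needs.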

Next, you will need to count the number of issues of each type in each month. The naive way to do that would be to maintain a list of monthly counters for each issue type, iterate through each row, see which month and which issue type it belongs to, and then increment the appropriate counter.
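
The counting step described above could look roughly like this. It swaps the lists of lists for a collections.Counter keyed by (issue type, year, month), which is one possible way to hold the counters. The rows list is made-up sample data in the question's format, and the function name and the day/month/year date_format are assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows as (issue_type, timestamp) pairs, as if already read from the CSV.
rows = [
    ("Issue1", "03/07/2011 11:20:44"),
    ("Issue2", "01/05/2011 12:30:34"),
    ("Issue3", "01/01/2011 09:44:21"),
    ("Issue1", "15/07/2011 08:00:00"),
]


def count_by_type_and_month(rows, date_format="%d/%m/%Y %H:%M:%S"):
    """Count issues per (issue_type, year, month); the date order is an assumption."""
    counts = Counter()
    for issue_type, timestamp in rows:
        when = datetime.strptime(timestamp, date_format)
        counts[(issue_type, when.year, when.month)] += 1
    return counts


counts = count_by_type_and_month(rows)
for key in sorted(counts):
    print(key, counts[key])
```

A Counter returns 0 for any month with no issues, so building the per-type series for plotting does not need special handling for empty months.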

When you have such a frequency count of issues, sorted by issue types, you can simply plot them against dates like this:

import datetime as dt
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# starting_year, ending_year and issues must be defined beforehand;
# issues[i] holds one count per month for issue type i + 1.
dates = []
for year in range(starting_year, ending_year + 1):  # + 1 so ending_year is included
    for month in range(1, 13):  # the end of range() is exclusive, so 13 covers December
        dates.append(dt.datetime(year=year, month=month, day=1))

month_formatter = mdates.DateFormatter('%b')  # Format dates to only show month names
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(dates, issues[0])  # To plot just issues of type 1 (x values first, then y)
ax.plot(dates, issues[1])  # To plot just issues of type 2
ax.plot(dates, issues[2])  # To plot just issues of type 3
ax.xaxis.set_major_formatter(month_formatter)  # Format X tick labels
plt.show()
plt.close()
3
Erlinska On

A way to do this is to use dataframes with pandas.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.read_csv(r"D:\Programmes Python\Data\Data_csv.txt", sep=";")  # Reads the csv; raw string keeps the backslashes literal
df.index = pd.to_datetime(df["DateTime"]) #Set the index of the dataframe to the DateTime column
del df["DateTime"] #The DateTime column is now useless

fig, ax = plt.subplots()
ax.plot(df.index,df["Issue_Type"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m'))  #This will only show the month number on the graph

This assumes that Issue1/2/3 are integers. I assumed they were because I didn't really understand what they were supposed to be.

Edit: This should do the trick then. It's not pretty and can probably be optimised, but it works well:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.read_csv(r"D:\Programmes Python\Data\Data_csv.txt", sep=";")
df.index = pd.to_datetime(df["DateTime"])
del df["DateTime"]
issue_numbers = []  # not named "list", which would shadow the built-in
for issue in df["Issue_Type"]:
    issue_numbers.append(int(issue[5:]))  # "Issue2" -> 2
df["Issue_number"] = issue_numbers

fig, ax = plt.subplots()
ax.plot(df.index,df["Issue_number"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m'))
plt.show()
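
If the goal is the number of issues per month for each issue type, as the question describes, a possible alternative is to let pandas do the counting with pd.crosstab. This is only a sketch: csv_text is made-up sample data standing in for the real file, and the comma separator and day/month/year date format are assumptions.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file; the comma separator is an assumption.
csv_text = """Issue_Type,DateTime
Issue1,03/07/2011 11:20:44
Issue2,01/05/2011 12:30:34
Issue3,01/01/2011 09:44:21
Issue1,15/07/2011 08:00:00
"""

df = pd.read_csv(io.StringIO(csv_text))
# Assumption: day/month/year order; change the format if the month comes first.
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M:%S")

# Rows are month numbers, columns are issue types, values are issue counts.
counts = pd.crosstab(df["DateTime"].dt.month, df["Issue_Type"])
print(counts)
```

With matplotlib installed, calling counts.plot() followed by plt.show() should then draw one line per issue type against the month number.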
0
smundlay On

Honestly, I would just use R. Check this link out for downloading and setting up R and RStudio.

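data <- read.csv(file="c:/yourdatafile.csv", header=TRUE, sep=",")
# Assumes day/month/year order; use "%m/%d/%Y" if the month comes first
data$Month <- format(as.Date(data$DateTime, format="%d/%m/%Y"), "%m")
counts <- table(data$Month, data$Issue_Type)  # issues per month and issue type
# One line per issue type, month number on the X-axis
matplot(as.integer(rownames(counts)), counts, type="l", xlab="Month", ylab="Number of issues")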