CSV to frequency polygon using R or Python


I have a result.csv file that contains data in the following format: date,tweets

2015-06-15,tweet
2015-06-15,tweet
2015-06-12,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-08,tweet
2015-06-08,tweet

I want to plot a frequency polygon with the dates on the x axis and the number of entries for each date on the y axis.

I have tried the following code:

pf<-read.csv("result.csv")
library(ggplot2)
qplot(datetime, data =pf, geom = "freqpoly")

but it shows the following error: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

Can anyone tell me how to solve this problem? I am totally new to R, so any guidance would be a great help.


There are 2 answers

Gregor Thomas On BEST ANSWER

Your issue is that you are trying to treat datetime as continuous, but it was imported as a factor (discrete/categorical). Let's convert it to a Date object, and then things should work:

pf$datetime = as.Date(pf$datetime)
qplot(datetime, data =pf, geom = "freqpoly")
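For reference, here is a minimal self-contained sketch of the same fix written with ggplot() instead of qplot(). It assumes the file has a header row naming the columns datetime and atweet (an assumption, since the question does not show the header), and it reads the sample rows from an inline string rather than from result.csv. Setting binwidth = 1 asks for one bin per calendar day, because Date values are measured in days.

```r
library(ggplot2)

# Inline copy of the sample rows; the header names are an assumption.
csv_text <- "datetime,atweet
2015-06-15,tweet
2015-06-15,tweet
2015-06-12,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-08,tweet
2015-06-08,tweet
"

pf <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert the text column to Date so ggplot2 treats it as continuous.
pf$datetime <- as.Date(pf$datetime, format = "%Y-%m-%d")

# One bin per day; days without tweets appear as zero counts.
p <- ggplot(pf, aes(x = datetime)) +
  geom_freqpoly(binwidth = 1)
print(p)
```

With the real file, replacing text = csv_text with "result.csv" should be the only change needed, provided the header matches.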
Vahan Nanumyan On

Based on your code, I assume that the result.csv has a header: datetime, atweet. By default, read.csv takes the first line of the CSV file as column names. That means you will be able to access the two columns with pf$datetime and pf$atweet.

If you look at the documentation of read.csv, you will find that stringsAsFactors = default.stringsAsFactors(), which was TRUE before R 4.0.0 (since R 4.0.0 the default is FALSE). That is, under the old default the strings from CSV files are converted to factors.
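A quick way to see what read.csv did with the column is to check its class. The sketch below passes stringsAsFactors explicitly so the result does not depend on the default of the installed R version; the two-row inline CSV and its header names are made up for illustration.

```r
# Tiny inline CSV; the header names are an assumption.
csv_text <- "datetime,atweet\n2015-06-15,tweet\n2015-06-12,tweet\n"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_string <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$datetime)  # "factor"
class(as_string$datetime)  # "character"
```

In neither case is the column a Date; read.csv does not detect dates on its own.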

Now, even if you set stringsAsFactors = FALSE, you still get the same error. That is because ggplot does not know how to order the dates: it sees plain character strings, not dates. To convert the strings into actual date-time values, you can use strptime.

Here is the working example:

pf<-read.csv("result.csv", stringsAsFactors=FALSE)
library(ggplot2)

qplot(strptime(pf$datetime, "%Y-%m-%d"), data=pf, geom='freqpoly')
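If you would rather see the per-date counts explicitly, one alternative (not what the code above does) is to count the rows yourself and draw a line through the counts. This is only a sketch: the inline data and header names are assumptions, and unlike a binned frequency polygon the line connects observed dates directly, so days with no tweets are not drawn as zero.

```r
library(ggplot2)

# Inline copy of the sample rows; the header names are an assumption.
csv_text <- "datetime,atweet
2015-06-15,tweet
2015-06-15,tweet
2015-06-12,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-08,tweet
2015-06-08,tweet
"
pf <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Count rows per date string; table() sorts the dates alphabetically,
# which is also chronological for the YYYY-MM-DD format.
tab <- table(pf$datetime)
counts <- data.frame(
  date = as.Date(names(tab)),
  n    = as.vector(tab)
)

# Line through the per-date counts.
p <- ggplot(counts, aes(x = date, y = n)) +
  geom_line()
print(p)
```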