Fitting a polynomial curve to time series data


I have a time series graph with monthly article frequency as the y axis. The data looks like this:

     Count.V       Date      Month       Week       Year
2637       6 2006-01-02 2006-01-01 2006-01-02 2006-01-01
406        4 2006-01-03 2006-01-01 2006-01-02 2006-01-01
543        4 2006-01-04 2006-01-01 2006-01-02 2006-01-01
998        3 2006-01-05 2006-01-01 2006-01-02 2006-01-01
1400       4 2006-01-06 2006-01-01 2006-01-02 2006-01-01
2218       4 2006-02-01 2006-02-01 2006-01-30 2006-01-01
2792       6 2006-02-02 2006-02-01 2006-01-30 2006-01-01
2488      10 2006-02-03 2006-02-01 2006-01-30 2006-01-01
954        8 2006-02-04 2006-02-01 2006-01-30 2006-01-01
2622       3 2006-02-06 2006-02-01 2006-02-06 2006-01-01
2321      11 2006-02-07 2006-02-01 2006-02-06 2006-01-01
2452      10 2006-03-21 2006-03-01 2006-03-20 2006-01-01
2267       5 2006-03-22 2006-03-01 2006-03-20 2006-01-01
1408       3 2006-03-23 2006-03-01 2006-03-20 2006-01-01
2602       3 2006-03-24 2006-03-01 2006-03-20 2006-01-01
2489       5 2006-03-25 2006-03-01 2006-03-20 2006-01-01
2771       1 2006-03-27 2006-03-01 2006-03-27 2006-01-01

I use ggplot2 to plot it:

library(ggplot2)
library(scales)  # for date_format()

MyPlot <- ggplot(data = df, aes(x = Month, y = Count.V)) +
  stat_summary(fun.y = sum, geom = "line") +
  scale_x_date(labels = date_format("%m-%y"), breaks = "3 months")

Time series plot

However, when I try to fit a polynomial curve to the data, e.g.,

MyPlot + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) 

something doesn't work right:

Time series graph with a polynomial curve (failed)

What am I doing wrong?

EDIT: Added a portion of the data frame covering multiple months:

> dput(df) structure(list(Count.V = c(6L, 4L, 4L, 3L, 4L, 5L, 2L, 8L, 6L, 5L, 12L, 1L, 2L, 3L, 4L, 2L, 2L, 4L, 4L, 4L, 6L, 6L, 2L, 4L, 4L, 6L, 10L, 8L, 3L, 11L, 8L, 13L, 3L, 9L, 7L, 4L, 7L, 9L, 5L, 4L, 5L, 6L, 5L, 9L, 5L, 11L, 4L, 6L, 2L, 8L, 3L, 5L, 4L, 3L, 5L, 4L, 2L, 3L, 3L, 3L, 8L, 6L, 1L, 3L, 10L, 5L, 3L, 3L, 5L, 1L, 8L, 4L, 3L, 2L, 1L, 4L, 4L, 4L, 5L, 7L, 8L, 3L, 4L, 7L, 5L, 3L, 3L, 4L, 6L, 3L, 2L, 3L, 2L, 5L, 6L, 4L, 5L, 8L, 3L, 4L), Date = structure(c(13150, 13151, 13152, 13153, 13154, 13155, 13157, 13158, 13159, 13161, 13162, 13164, 13165, 13166, 13168, 13169, 13171, 13172, 13173, 13174, 13175, 13176, 13178, 13179, 13180, 13181, 13182, 13183, 13185, 13186, 13187, 13188, 13189, 13190, 13192, 13193, 13194, 13195, 13196, 13197, 13199, 13200, 13201, 13202, 13203, 13204, 13206, 13207, 13208, 13209, 13210, 13211, 13214, 13215, 13216, 13217, 13218, 13220, 13221, 13222, 13223, 13224, 13225, 13227, 13228, 13229, 13230, 13231, 13232, 13234, 13235, 13236, 13237, 13238, 13239, 13241, 13242, 13243, 13244, 13245, 13246, 13248, 13249, 13250, 13251, 13252, 13253, 13256, 13257, 13258, 13259, 13260, 13262, 13263, 13264, 13265, 13266, 13267, 13270, 13271), class = "Date"), Month = structure(c(13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13180, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13208, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13239, 13269, 13269 ), class = "Date"), Week = structure(c(13150, 13150, 13150, 13150, 13150, 13150, 13157, 13157, 
13157, 13157, 13157, 13164, 13164, 13164, 13164, 13164, 13171, 13171, 13171, 13171, 13171, 13171, 13178, 13178, 13178, 13178, 13178, 13178, 13185, 13185, 13185, 13185, 13185, 13185, 13192, 13192, 13192, 13192, 13192, 13192, 13199, 13199, 13199, 13199, 13199, 13199, 13206, 13206, 13206, 13206, 13206, 13206, 13213, 13213, 13213, 13213, 13213, 13220, 13220, 13220, 13220, 13220, 13220, 13227, 13227, 13227, 13227, 13227, 13227, 13234, 13234, 13234, 13234, 13234, 13234, 13241, 13241, 13241, 13241, 13241, 13241, 13248, 13248, 13248, 13248, 13248, 13248, 13255, 13255, 13255, 13255, 13255, 13262, 13262, 13262, 13262, 13262, 13262, 13269, 13269), class = "Date"), Year = structure(c(13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149, 13149), class = "Date")), .Names = c("Count.V", "Date", "Month", "Week", "Year"), row.names = c(2637L, 406L, 543L, 998L, 1400L, 2667L, 1211L, 140L, 737L, 545L, 2573L, 978L, 2119L, 842L, 1866L, 1002L, 1956L, 1229L, 2278L, 1889L, 1285L, 1020L, 964L, 1584L, 2218L, 2792L, 2488L, 954L, 2622L, 2321L, 796L, 501L, 294L, 2476L, 2541L, 642L, 177L, 1222L, 1249L, 990L, 2776L, 580L, 1181L, 1792L, 431L, 224L, 214L, 679L, 1601L, 1655L, 645L, 2785L, 1507L, 1580L, 1274L, 2083L, 157L, 2491L, 2733L, 1533L, 2332L, 328L, 1995L, 1598L, 2452L, 2267L, 1408L, 2602L, 2489L, 2771L, 2323L, 1714L, 907L, 1522L, 882L, 2727L, 844L, 2105L, 253L, 
1160L, 2075L, 1435L, 821L, 1284L, 2406L, 2357L, 1499L, 2145L, 1539L, 1890L, 1856L, 27L, 887L, 1500L, 812L, 1677L, 1965L, 2580L, 823L, 1482L), class = "data.frame")

There is 1 answer

Mamoun Benghezal (best answer)

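Try using mean instead of sum in stat_summary. stat_smooth fits the polynomial to the raw rows (the individual daily Count.V values), not to the summarised monthly totals, so with sum the line and the fitted curve end up on very different scales. With mean both layers are on the same scale: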

# Note: fun.y is deprecated since ggplot2 3.3.0; use fun = mean there.
ggplot(data = df, aes(x = Month, y = Count.V)) +
    stat_summary(fun.y = mean, geom = "line") +
    stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) +
    geom_point() +
    scale_x_date(labels = date_format("%m-%y"), breaks = "3 months")
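
If the y axis should keep showing monthly totals rather than means, one alternative (not part of the answer above, just a sketch) is to aggregate the data to one row per month first and then fit the polynomial to those totals. The data frame below is made-up toy data standing in for the question's df, and the name monthly is only illustrative:

```r
library(ggplot2)
library(scales)  # for date_format()

# Toy stand-in for the question's df: one random count per day in 2006.
# (Assumption: the real df has the same Month and Count.V columns.)
set.seed(1)
dates <- seq(as.Date("2006-01-01"), as.Date("2006-12-31"), by = "day")
df <- data.frame(
  Date    = dates,
  Month   = as.Date(format(dates, "%Y-%m-01")),
  Count.V = rpois(length(dates), lambda = 5)
)

# Collapse the daily rows to one total per month (base R).
monthly <- aggregate(df["Count.V"], by = df["Month"], FUN = sum)

# Line and smoother now see the same values, so they share one scale.
p <- ggplot(monthly, aes(x = Month, y = Count.V)) +
  geom_line() +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE) +
  scale_x_date(labels = date_format("%m-%y"), breaks = "3 months")

print(p)
```

Because the smoother is fit to the aggregated rows, the curve follows the monthly sums directly. Note that a cubic fit needs at least four distinct months, and with only a handful of months it will follow the points very closely.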