R - estimating missing values

Question

R - estimating missing values

243 views Asked by Craig At 13 September 2017 at 00:08

Let's assume I have a table as such:

Date        Sales
09/01/2017  9000
09/02/2017  12000
09/03/2017  0
09/04/2017  11000
09/05/2017  14400
09/06/2017  0
09/07/2017  0
09/08/2017  21000
09/09/2017  15000
09/10/2017  23100
09/11/2017  0
09/12/2017  32000
09/13/2017  8000

The values in the table are estimated by an R program which I have no access to (it's a black box right now). Now there are a few days with 0 values which tend to creep in due to issues in our ingestion/ETL process. I need to estimate the values for the dates with 0 data.

Our approach is to:

Draw a line from the date prior to the missing data to the date right after the missing data
Estimate the value for the missing date from the line

Now if there's only one day with missing data between two good days, a straightforward mean would work. If there are two or more consecutive days with missing data, the mean would not work, so I'm trying to formulate a way to go about estimating values for multiple data points.

Would this approach work in R? I'm a total n00b at R so I'm not really sure if this is even feasible.

Original Q&A

There are 2 answers

www On 13 September 2017 at 02:28

We can also use the na.interpolation function from the imputeTS package. The default method of na.interpolation is linear interpolation, but we can also specify other methods if we want.

library(dplyr)
library(imputeTS)

dt2 <- dt %>%
  replace(. == 0, NA) %>%
  mutate(Sales = na.interpolation(Sales))

dt2
         Date Sales
1  09/01/2017  9000
2  09/02/2017 12000
3  09/03/2017 11500
4  09/04/2017 11000
5  09/05/2017 14400
6  09/06/2017 16600
7  09/07/2017 18800
8  09/08/2017 21000
9  09/09/2017 15000
10 09/10/2017 23100
11 09/11/2017 27550
12 09/12/2017 32000
13 09/13/2017  8000

Data

dt <- read.table(text = "Date        Sales
09/01/2017  9000
                 09/02/2017  12000
                 09/03/2017  0
                 09/04/2017  11000
                 09/05/2017  14400
                 09/06/2017  0
                 09/07/2017  0
                 09/08/2017  21000
                 09/09/2017  15000
                 09/10/2017  23100
                 09/11/2017  0
                 09/12/2017  32000
                 09/13/2017  8000",
                 header = TRUE, stringsAsFactors = FALSE)

**G5W** · Accepted Answer · 2017-09-13T00:34:15+00:00

You can fill in the values with linear interpolation using the function approxfun.

## Your data
df = read.table(text="Date        Sales
09/01/2017  9000
09/02/2017  12000
09/03/2017  0
09/04/2017  11000
09/05/2017  14400
09/06/2017  0
09/07/2017  0
09/08/2017  21000
09/09/2017  15000
09/10/2017  23100
09/11/2017  0
09/12/2017  32000
09/13/2017  8000",
header=TRUE, stringsAsFactors=FALSE)
df$Date = as.Date(df$Date, format="%m/%d/%Y")


## Create function for linear interpolation
Interp = approxfun(df[df$Sales > 0, ])

## Use function to fill in interpolated values
Vals = Interp(df$Date[df$Sales == 0])
df$Sales[df$Sales == 0] = Vals
plot(df, type="l")
grid()

TechQA.

R - estimating missing values

There are 2 answers

Related Questions in R

Related Questions in DATA-QUALITY

Popular Questions

Popular Tags

Trending Questions