R Replace <NA> with date, if other columns value is

1.9k views Asked by At

Let's say I have two columns, one of location and one of dates, some of which are missing.

    City       Date
1   Sheffield  19/05/1985
2   Sheffield  21/06/1986
3   Sheffield  <NA>
4   Newcastle  14/07/1996
5   Newcastle  <NA>
6   Liverpool  12/11/2001

I need to replace the missing dates with a dummy date (let's say 21/06/1866), but ONLY for the city of Sheffield.

In SQL the syntax would be:

UPDATE Dataframe SET Date = "21/06/1866" WHERE city="Sheffield"

In R I've tried the following with limited (to no) success:

filter <- (Dataframe$Date == is.na(Dataframe$Date) & Dataframe$City =="Sheffield")
Dataframe[filter,"Date"] <- as.Date("1866/06/21")

However this throws me the following error:

Error in [<-.data.frame(*tmp*, filter, "Date", value = -37814) : missing values are not allowed in subscripted assignments of data frames

Anyone got any ideas? I feel like this is probably insanely easy but I'm just not seeing how it's done yet.

1

There are 1 answers

0
Kristofersen On
df[df$City == "Sheffield" & is.na(df$Date), "Date"] = as.Date("1866/06/21", format = "%Y/%m/%d")

> df
       City       Date
1 Sheffield 1985-05-19
2 Sheffield 1986-06-21
3 Sheffield 1866-06-21
4 Newcastle 1996-07-14
5 Newcastle       <NA>
6 Liverpool 2001-11-12

if you look at your filter it just returns FALSE all the way across.

> filter <- (df$Date == is.na(df$Date) & df$City =="Sheffield")
> filter
[1] FALSE FALSE FALSE FALSE FALSE FALSE

You just need to write is.na(df$Date) this will return a logical vector indicating if the date is NA. What you wrote checks of date is TRUE or FALSE (returned from is.na(df$Date)), which it never is because it's a date.

> filter <- (is.na(df$Date) & df$City =="Sheffield")
> filter
[1] FALSE FALSE  TRUE FALSE FALSE FALSE