I'm doing some data cleaning. I use mutate() from dplyr a lot, since it generates new columns step by step and I can easily see how things progress.
Here are two examples where I get this error:
Error: incompatible size (%d), expecting %d (the group size) or 1
Example 1: Get the town name from a zip code. The data simply looks like this:
    Zip
1 02345
2 02201
I notice that when the data contains NA, it doesn't work.
Without NA it works:
library(dplyr)
library(zipcode)
data(zipcode)
test = data.frame(Zip=c('02345','02201'),stringsAsFactors=FALSE)
test %>%
  rowwise() %>%
  mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
resulting in
Source: local data frame [2 x 2]
Groups: <by row>
    Zip   Town1
1 02345 Manomet
2 02201  Boston
With NA it doesn't work:
library(dplyr)
library(zipcode)
data(zipcode)
test = data.frame(Zip=c('02345','02201',NA),stringsAsFactors=FALSE)
test %>%
  rowwise() %>%
  mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
resulting in
Error: incompatible size (%d), expecting %d (the group size) or 1
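A likely cause, sketched in base R: under rowwise(), the NA row sees Zip = NA, and na.omit() drops that single value. This leaves a zero-length vector, so the subset returns zero towns where mutate() needs exactly one. The two-row lookup table below is a hypothetical stand-in for zipcode::zipcode, not the real data. The sketch also shows match() as an alternative, because it returns one position per input (NA when there is no match).

```r
# Hypothetical stand-in for zipcode::zipcode (only the columns the question uses)
lookup <- data.frame(zip  = c("02345", "02201"),
                     city = c("Manomet", "Boston"),
                     stringsAsFactors = FALSE)

# What the NA row sees under rowwise(): na.omit() removes the only value
zip_na  <- NA_character_
n_kept  <- length(na.omit(zip_na))                      # 0
town_na <- lookup[lookup$zip == na.omit(zip_na), "city"]
n_town  <- length(town_na)                              # 0, but mutate() wants 1 per row

# match() gives one position per zip (NA if not found), so indexing
# with it yields one town per row, including the NA row
zips  <- c("02345", "02201", NA)
towns <- lookup$city[match(zips, lookup$zip)]
print(towns)   # c("Manomet", "Boston", NA)
```

match() is vectorized, so no rowwise() step is needed. With the real table, this would presumably become test %>% mutate(Town1 = zipcode$city[match(Zip, zipcode$zip)]). I have not run that against the zipcode package, and if a zip appears more than once, match() returns only the first hit.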
Example 2: I want to get rid of the redundant state abbreviation that appears in the Town column of the following data.
         Town State
1   BOSTON MA    MA
2 NORTH AMAMS    MA
3  CHICAGO IL    IL
This is how I do it: (1) split the string in Town into words, e.g. 'BOSTON' and 'MA' for row 1; (2) check whether any of these words match the State of that row; (3) delete the matched word.
library(dplyr)
test = data.frame(Town=c('BOSTON MA','NORTH AMAMS','CHICAGO IL'), State=c('MA','MA','IL'), stringsAsFactors=FALSE)
test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(Town1 = Town.word[-is.state])
This results in:
         Town State Town.word is.state   Town1
1   BOSTON MA    MA  <chr[2]>        2  BOSTON
2 NORTH AMAMS    MA  <chr[2]>       NA      NA
3  CHICAGO IL    IL  <chr[2]>        2 CHICAGO
For example, row 1 shows is.state == 2, meaning the 2nd word in Town is the state name. After removing that word, Town1 is the correct town name.
Now I want to fix the NA in row 2, but adding na.omit causes an error:
test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(Town1 = Town.word[-na.omit(is.state)]) 
results in:
Error: incompatible size (%d), expecting %d (the group size) or 1
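The row-2 failure comes down to negative indexing with an empty vector. match() returns NA, na.omit() turns that into a zero-length integer, and x[-integer(0)] selects nothing rather than everything. Without na.omit(), x[-NA] selects a single NA, which explains the NA in the earlier output. The base-R sketch below reproduces this outside dplyr using the words from rows 1 and 2.

```r
# Town.word for rows 1 and 2 of the example
row1 <- c("BOSTON", "MA")
row2 <- c("NORTH", "AMAMS")

idx1 <- match("MA", row1)   # 2
idx2 <- match("MA", row2)   # NA: "MA" is not one of the words

kept1 <- row1[-idx1]             # "BOSTON", length 1
kept2 <- row2[-na.omit(idx2)]    # character(0): -integer(0) is still integer(0)
kept3 <- row2[-idx2]             # NA: an integer NA index selects one NA
```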
I checked the data type and size:
test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(length(is.state) ) %>%       
  mutate(class(na.omit(is.state)))
results in:
         Town State Town.word is.state length(is.state) class(na.omit(is.state))
1   BOSTON MA    MA  <chr[2]>        2                1                  integer
2 NORTH AMAMS    MA  <chr[2]>       NA                1                  integer
3  CHICAGO IL    IL  <chr[2]>        2                1                  integer
So it is an integer (%d) of length 1. Can somebody tell me what's wrong? Thanks
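The check above measures length(is.state), which is 1 in every row. The expression inside mutate(), however, uses na.omit(is.state), and its length drops to 0 in row 2. The sketch below shows both lengths. It then adds a guarded helper that always returns exactly one string per row. drop_state_word is a hypothetical name introduced here, and paste(collapse = " ") is an added assumption so that multi-word town names also come back as a single value.

```r
is_state <- c(2L, NA, 2L)   # the is.state column shown above

len_raw  <- sapply(is_state, function(x) length(x))            # 1 1 1
len_omit <- sapply(is_state, function(x) length(na.omit(x)))   # 1 0 1

# Hypothetical helper: drop the state word if present, always return one string
drop_state_word <- function(words, state) {
  idx  <- match(state, words)
  kept <- if (is.na(idx)) words else words[-idx]   # keep all words when no match
  paste(kept, collapse = " ")                      # one string even for "NORTH AMAMS"
}

Town  <- c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL")
State <- c("MA", "MA", "IL")
Town1 <- mapply(drop_state_word, strsplit(Town, split = " "), State,
                USE.NAMES = FALSE)
print(Town1)   # "BOSTON" "NORTH AMAMS" "CHICAGO"
```

In the pipeline, the same helper could presumably be applied as mutate(Town1 = mapply(drop_state_word, Town.word, State, USE.NAMES = FALSE)) without rowwise(). I have not checked this against the dplyr version used in the question.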
 
                        
Can you just use sub() to strip it out? (This way also catches commas after the town, if that happens.)
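One way the sub() approach might look, assuming the state abbreviation is always the last word of Town. sub() uses only the first element of its pattern, so the sketch builds one pattern per row and applies it pairwise with mapply(). The fourth row, "BOSTON, MA", is hypothetical data added to show the optional comma being removed. strip_state is an illustrative name.

```r
Town  <- c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL", "BOSTON, MA")
State <- c("MA", "MA", "IL", "MA")

# Hypothetical helper: remove an optional comma, a space and the state at the end
strip_state <- function(town, state) sub(paste0(",? ", state, "$"), "", town)

# sub() is vectorized over x but not over pattern, hence mapply()
Town1 <- mapply(strip_state, Town, State, USE.NAMES = FALSE)
print(Town1)   # "BOSTON" "NORTH AMAMS" "CHICAGO" "BOSTON"
```

Anchoring the pattern with $ keeps "NORTH AMAMS" intact. Also, a row without a trailing state is never shortened, so there is no NA or zero-length case to guard against.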
NB: if you use ungroup() here with a rowwise_df (as this is), it will also wipe the tbl_df class and output a plain data.frame. That is fine for your data, but it will clobber your screen if you aren't careful and are looking at large amounts of data (as I've done countless times). (GitHub references #936 and #553.)