I need to remove the street type (St, Blvd, Rd, etc.) from a series of addresses as a cleanup step before a data match. I'm using the code below, but for some addresses the result is missing part of the street name that I want to keep.

library(tidyverse)
c("9123 GLENOAKS BLVD","123 E AVENUE K6 STE B","123 CAMP PLENTY RD","900 E VICTORIA ST","460 SAN FERNANDO RD","176 S SANTA FE AVE STE 9") %>% 
sub("AVE.*$|ST.*$| BLVD.*$| RD.*$| PL.*$| 3RD.*$| APT.*$| DR.*$", "", .) 

[1] "9123 GLENOAKS"    "123 E "           "123 CAMP"         "900 E VICTORIA "  "460 SAN FERNANDO" "176 S SANTA FE " 

Below is the expected output

[1] "9123 GLENOAKS"    "123 E AVENUE K6 "           "123 CAMP PLENTY"         "900 E VICTORIA "  "460 SAN FERNANDO" "176 S SANTA FE " 

1 Answer

Wiktor Stribiżew On Best Solutions

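Your pattern has no word boundaries, so AVE matches inside AVENUE and " PL" matches the start of " PLENTY", and everything from that point on is removed. To match the street types only as whole words, you may use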

sub("(.*?)\\s+(?:AVE|STE?|BLVD|RD|PL|3RD|APT|DR)\\b.*", "\\1", .)

Details

  • (.*?) - Group 1 (the replacement pattern refers to its value as \1): any zero or more chars, as few as possible
  • \s+ - 1 or more whitespace chars
  • (?:AVE|STE?|BLVD|RD|PL|3RD|APT|DR) - a non-capturing group of alternatives: AVE, ST or STE, BLVD, RD, PL, 3RD, APT or DR
  • \b - a word boundary, so that an alternative only matches as a whole word
  • .* - the rest of the input, which is matched so that it gets removed.
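To check the pattern against the sample addresses from the question, here is a minimal sketch in base R. The variable names (addresses, pattern, cleaned) are only illustrative, and perl = TRUE is an assumption added here so that the lazy quantifier, the non-capturing group and \b are handled by the PCRE engine.

```r
# Sample addresses taken from the question.
addresses <- c("9123 GLENOAKS BLVD", "123 E AVENUE K6 STE B",
               "123 CAMP PLENTY RD", "900 E VICTORIA ST",
               "460 SAN FERNANDO RD", "176 S SANTA FE AVE STE 9")

# Keep Group 1; drop the whitespace, the street type and everything after it.
pattern <- "(.*?)\\s+(?:AVE|STE?|BLVD|RD|PL|3RD|APT|DR)\\b.*"

# perl = TRUE is an assumption of this sketch (see the note above).
cleaned <- sub(pattern, "\\1", addresses, perl = TRUE)

print(cleaned)
# The values are: "9123 GLENOAKS", "123 E AVENUE K6", "123 CAMP PLENTY",
# "900 E VICTORIA", "460 SAN FERNANDO", "176 S SANTA FE"
```

Because \s+ is part of the match, the results carry no trailing space, unlike some of the entries in the expected output in the question. AVENUE and PLENTY are left alone because \b requires the street type to end at a word boundary.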