extract value between specific string and colon in R

Question


Asked by mashimena on 19 June 2023 at 06:54

I have a table like this:

No, Memo
  1, Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2, Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3, Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4, Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.

I want to extract the strings that follow Date:, City: and Note:. For example, in row 1 I need to extract "2020/10/22", which lies between Date: and City:; "UA", which lies between City: and Note:; and "True mastery of any skill takes a lifetime.", which follows Note:.

Desired output:

 No Date       City Note
  1 2020/10/22 UA   True mastery of any skill takes a lifetime.
  2 2022/11/01 CH   Sweat is the lubricant of success.
  3 2022y11m1d UA   Every noble work is at first impossible.
  4 2022y2m15d AA   Live beautifully, dream passionately, love completely.

Does anyone know how to do this? Any help would be great. Thank you.


There are 2 answers

G. Grothendieck On 05 July 2023 at 06:48

Insert a newline before each keyword in Memo. At that point the text is in DCF (Debian Control File) format, so it can be read with read.dcf. This approach is general: it does not depend on the particular keywords in Memo, and it uses only base R, no packages (the native pipe |> needs R 4.1 or later).

DF |>
  transform(Memo = gsub("(\\w+: )", "\n\\1", Memo)) |>
  with(data.frame(No, read.dcf(textConnection(Memo))))

giving

  No       Date City                                                   Note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

Note

DF <- data.frame(
  No = 1:4,
  Memo = c(
    "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
    "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
    "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
    "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
  )
)
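To see why read.dcf works here, the sketch below applies the same gsub() step to a single Memo value and prints the intermediate text. It reuses the answer's regex; the variable names memo1, dcf_text and rec are only illustrative.

```r
# One Memo value from the question
memo1 <- "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime."

# Insert a newline before every "word: " keyword, as in the answer above;
# each keyword now starts its own "Key: value" line
dcf_text <- gsub("(\\w+: )", "\n\\1", memo1)
cat(dcf_text)

# read.dcf turns each "Key: value" line into a column of a character matrix;
# the leading blank line is skipped, and surrounding whitespace in values
# is stripped by default
rec <- read.dcf(textConnection(dcf_text))
print(rec)
```

One caveat of this design: the pattern matches any word followed by ": ", so a Note that itself contained something like "Tip: " would be split into an extra field.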

f.lechleitner On 19 June 2023 at 07:30 (accepted answer)

My solution uses regular expressions with stringr and dplyr:

library(stringr)
library(dplyr)

# Read the example table; ";" separates No from Memo
df <- read.table(
  text = "No; Memo
  1; Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2; Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3; Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4; Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.",
  sep = ";",
  header = TRUE,
  strip.white = TRUE  # drop the space that follows each ";"
)

# Lookbehind (?<=...) and lookahead (?=...) anchor each match
# without including the keywords themselves in the result
df_test <- df %>%
  mutate(
    date = str_extract(Memo, "(?<=Date: )(.*)(?= City)"),
    city = str_extract(Memo, "(?<=City: )(.*)(?= Note)"),
    note = str_extract(Memo, "(?<=Note: ).*")
  ) %>%
  select(-Memo)

> df_test
  No       date city                                                   note
1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
2  2 2022/11/01   CH                     Sweat is the lubricant of success.
3  3 2022y11m1d   UA               Every noble work is at first impossible.
4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.

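Each regex matches everything between a positive lookbehind, such as (?<=Date: ), and a positive lookahead, such as (?= City). The lookarounds anchor the match, but the keywords they contain are not part of the extracted text.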
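The same lookaround patterns also work in base R without stringr, via regexpr(..., perl = TRUE) and regmatches(). The sketch below is an illustrative alternative, not part of the original answer; extract_field is a hypothetical helper name. It also uses the lazy quantifier .*? so that the Date and City matches stop at the first " City" and " Note" rather than the last.

```r
# Hypothetical helper: return the lookaround match for each element, NA if none
extract_field <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # perl = TRUE enables lookarounds
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # regmatches drops non-matches
  out
}

memo <- c(
  "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
  "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
  "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
  "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
)

res <- data.frame(
  No   = 1:4,
  date = extract_field(memo, "(?<=Date: ).*?(?= City)"),
  city = extract_field(memo, "(?<=City: ).*?(?= Note)"),
  note = extract_field(memo, "(?<=Note: ).*"),
  stringsAsFactors = FALSE
)
print(res)
```

Rows without a keyword get NA instead of silently shifting the remaining matches, which is why the helper fills an NA vector rather than using regmatches() output directly.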

