I have a table where column A has values in '12:30:45' format. I want to create a column B where I only get the digits after the first colon of column A.

How do you use regex in Python to extract everything after the first colon, colon included, so that in the end we get ':30:45'?

I've seen regexes for extracting numbers, extracting strings, and splitting values on spaces, but couldn't find one for extracting what comes after a colon.

I'm new to regex and any suggestions would be greatly appreciated.

2 Answers

Erfan On Best Solutions

You can use Series.str.extract with a regex to get everything from the first colon onward, colon included:

import pandas as pd

# Build a reproducible example dataframe
df = pd.DataFrame({'A': ['12:30:45', '10:44:09', '8888']})
print(df)

          A
0  12:30:45
1  10:44:09
2      8888

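# [^:]* skips everything before the first colon; (:.*) captures the rest
df['B'] = df['A'].str.extract(r'^[^:]*(:.*)$', expand=False)

print(df)
          A       B
0  12:30:45  :30:45
1  10:44:09  :44:09
2      8888     NaN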
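
To see what the pattern does outside of pandas, here is a minimal sketch using the standard re module on the same sample values. The helper name after_first_colon is made up for illustration and is not part of any library:

```python
import re

# Same pattern as above: skip everything up to the first colon, then
# capture from that colon to the end of the string.
pattern = re.compile(r'^[^:]*(:.*)$')

def after_first_colon(value):
    """Return the text from the first colon onward, or None if there is no colon."""
    match = pattern.match(value)
    return match.group(1) if match else None

samples = ['12:30:45', '10:44:09', '8888']
results = [after_first_colon(s) for s in samples]
print(results)  # [':30:45', ':44:09', None]
```

The None for '8888' corresponds to the NaN that str.extract produces for rows where the pattern does not match.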

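If you want to keep the values which don't have a :, so you don't get a NaN, add a fallback with the regex alternation |, which you can read as "or". Keep a single capture group, otherwise str.extract returns one column per group:

# Starting again from the original dataframe:
# match from the first colon onward, or the whole string if it has no colon
df['B'] = df['A'].str.extract(r'(:.*|^[^:]*$)', expand=False)
print(df)
          A       B
0  12:30:45  :30:45
1  10:44:09  :44:09
2      8888    8888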
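
If the fallback inside the regex feels hard to read, one possible alternative is to do it in two steps: extract first, then fill the gaps from the original column with fillna. This is only a sketch on the same example data, and the intermediate name extracted is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['12:30:45', '10:44:09', '8888']})

# Capture from the first colon onward; rows without a colon become NaN.
extracted = df['A'].str.extract(r'(:.*)', expand=False)

# Fall back to the original value wherever nothing was captured.
df['B'] = extracted.fillna(df['A'])
print(df)
```

This keeps the pattern itself simple and makes the fallback rule explicit in ordinary pandas code.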

ALollz On

Use .str.split, limiting the number of splits to one with n=1.

print(df)
                  time
0             12:30:45
1                12:30
2                12312
3  1:123:123123:123123

df.time.str.split(':', n=1).str[1]
#0                30:45
#1                   30
#2                  NaN
#3    123:123123:123123
#Name: time, dtype: object
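
For anyone who wants to run this, here is a self-contained sketch that rebuilds the example dataframe shown above (the construction is an assumption based on the printed output) and splits it in two visible steps:

```python
import pandas as pd

# Rebuilt from the printed example; the original construction was not shown.
df = pd.DataFrame({'time': ['12:30:45', '12:30', '12312', '1:123:123123:123123']})

# n=1 splits only on the first colon, so each row yields at most two pieces.
parts = df['time'].str.split(':', n=1)

# .str[1] picks the second piece; rows with no colon have only one piece,
# so they come back as NaN.
after = parts.str[1]
print(after)
```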

If you really need the leading colon, prepend it: ':' + df.time.str.split(':', n=1).str[1]
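
A short sketch of that last step, on a small Series made up for illustration. Rows without a colon stay NaN after the colon is prepended, since string concatenation with NaN yields NaN:

```python
import pandas as pd

s = pd.Series(['12:30:45', '12:30', '12312'])

# Split once on the first colon, take the remainder, and put the colon back.
with_colon = ':' + s.str.split(':', n=1).str[1]
print(with_colon)
```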