How do I extract two center columns from a tab-delimited line of text?


I need two regular expressions: one that finds the second block of digits and one that finds the third block of digits. My data looks like this:

8782910291827182    04  1988    081

One expression should find the 04 and the other should find the 1988. I already have expressions that find the first 16 digits and the last 3 digits, but I am stuck on matching the numbers in the second and third sections.


There are 2 answers

xdazz On

Find the two-digit block:

\b\d{2}\b

Find the four-digit block:

\b\d{4}\b
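
As a rough sketch of how these two patterns might be applied, here they are in Ruby (the language used in the other answer). The variable names are illustrative only. `String#[]` with a regex returns the first match, or `nil` if nothing matches.

```ruby
# Sample line from the question; variable names are illustrative.
line = '8782910291827182    04  1988    081'

# \b\d{2}\b matches a run of exactly two digits between word boundaries.
second_field = line[/\b\d{2}\b/]

# \b\d{4}\b matches a run of exactly four digits between word boundaries.
third_field = line[/\b\d{4}\b/]

puts second_field
puts third_field
```

This relies on each field having a distinct width. If another field in the line were also two or four digits long, the pattern would return whichever one comes first.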
Todd A. Jacobs On

Use Field-Splitting Instead

Based on your corpus, it seems that one should be able to rely on the existence of four fields separated by tabs or other whitespace. Splitting fields is much easier than building and testing a regex, so I'd recommend skipping the regex unless there are edge cases not included in your examples.

Consider the following Ruby examples:

# Split the string into fields.
string = '8782910291827182    04  1988    081'
fields = string.split(/\s+/)
#=> ["8782910291827182", "04", "1988", "081"]

# Access members of the field array.
fields.first
#=> "8782910291827182"

fields[1]
#=> "04"

fields[2]
#=> "1988"

# Unpack array elements into variables.
field1, field2, field3, field4 = fields
p field2, field3
#=> ["04", "1988"]

A regular expression will force you to spend more time on pattern matching, especially as your corpus grows more complex; string-splitting is generally simpler, and will enable you to focus more on the result set. In most cases, the end results will be functionally similar, so which one is more useful to you will depend on what you're really trying to do. It's always good to have alternative options!
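
For comparison, a regex-based version might look like the sketch below. It assumes every line has exactly the layout of the sample (16, 2, 4, and 3 digits separated by whitespace); the pattern and variable names are illustrative, not taken from the question.

```ruby
line = '8782910291827182    04  1988    081'

# Assumed layout: 16 digits, 2 digits, 4 digits, 3 digits,
# separated by one or more whitespace characters.
pattern = /\A(\d{16})\s+(\d{2})\s+(\d{4})\s+(\d{3})\z/

match = pattern.match(line)
if match
  # Capture groups are numbered from 1, so the second and third
  # fields are groups 2 and 3.
  second_field = match[2]
  third_field = match[3]
  puts second_field, third_field
else
  warn 'line did not match the expected layout'
end
```

The anchored pattern also validates the line, which splitting does not, but any change to the field widths means editing the regex.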