Separating integers, floats, and scientific numbers as well as names using Python regular expressions

I have a text file that contains meteorological data, including fields such as station name, latitude, temperature, pressure, etc. All of a station's data is on a single line, as follows:

met = 'KIRKENES (CIV/MIL)              -8.666667       5.350000       5.866667      HORNSUND RIVER     ENAN      7.9999998E-02  93  85  2.0000000E-02  0.1600000      4.9999997E-02      -999.9000      -999.9000      8  7  3  22.50000'

This should be split into a list of all the fields, in order, i.e.

['KIRKENES (CIV/MIL)', '-8.666667', ..., 'HORNSUND RIVER', 'ENAN', '7.9999998E-02', '93', ..., '22.50000']

I tried several regular expressions, but unfortunately had no luck. This is a sample of what I tried, to get the floats and integers only:

regex = '^-?\d*(.\d+)?$'
print re.findall(regex, met)

but it simply returns nothing! I also tried this for numbers in scientific notation, and got nothing as well:

regexSci = 're.findall('/[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?/', met)'

Note that I want a single regex that finds all of these forms at once, but I failed even to parse each form separately.

What am I doing wrong, and how can I get this done?

There is 1 answer

alecxe (best answer)

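From what I understand, the fields are separated by two or more spaces while the names contain only single spaces, so you can just split on 2 or more whitespace characters with re.split():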

In [1]: import re

In [2]: met = 'KIRKENES (CIV/MIL)              -8.666667       5.350000       5.866667      HORNSUND RIVER     ENAN      7.9999998E-02  93  85  2.0000000E-02  0.1600000      4.9999997E-02      -999.9000      -999.9000      8  7  3  22.50000'

In [3]: re.split(r"\s{2,}", met)
Out[3]: 
['KIRKENES (CIV/MIL)',
 '-8.666667',
 '5.350000',
 '5.866667',
 'HORNSUND RIVER',
 'ENAN',
 '7.9999998E-02',
 '93',
 '85',
 '2.0000000E-02',
 '0.1600000',
 '4.9999997E-02',
 '-999.9000',
 '-999.9000',
 '8',
 '7',
 '3',
 '22.50000']
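
If the numeric fields are needed as numbers rather than strings, the split can be followed by a conversion step. The following is only a minimal sketch for Python 3, not part of the answer above: the helper names `convert` and `parse_fields` and the two patterns `INT_RE` and `FLOAT_RE` are hypothetical, and it assumes every field is separated by at least two whitespace characters, as in the sample line. It also uses `fullmatch` rather than `findall` with `^...$`: a pattern anchored with `^` and `$` can only match when the entire input string is a single number, which is one likely reason the attempts in the question returned nothing.

```python
import re

# Sample line from the question, split across literals for readability.
met = ('KIRKENES (CIV/MIL)              -8.666667       5.350000       5.866667'
       '      HORNSUND RIVER     ENAN      7.9999998E-02  93  85  2.0000000E-02'
       '  0.1600000      4.9999997E-02      -999.9000      -999.9000      8  7  3  22.50000')

# Hypothetical helper patterns (not from the original answer).
INT_RE = re.compile(r'[+-]?\d+')
FLOAT_RE = re.compile(r'[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?')


def convert(token):
    """Return token as int or float if it looks like a number, else unchanged."""
    if INT_RE.fullmatch(token):
        return int(token)
    if FLOAT_RE.fullmatch(token):
        return float(token)
    return token


def parse_fields(line):
    # Assumption: fields are separated by two or more whitespace characters,
    # so single spaces inside names such as 'HORNSUND RIVER' are preserved.
    return [convert(tok) for tok in re.split(r'\s{2,}', line.strip())]


fields = parse_fields(met)
print(fields)
```

With this approach the names stay as strings, while values such as '93' become `int` and values such as '7.9999998E-02' become `float`. A name that happens to be separated from its neighbour by a single space would not be split correctly, so the two-space assumption should be checked against the real file.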