How to extract specific data from a text file using Python

I have a data file in which each block starts with a heading that lists a geometrical parameter combination, followed by the related data generated by the software. The file has the following structure.

The data file starts here:

#Parameters = {g5=0.6; g4=0.6; g3=0.6; g2=0.8; g1=1; w=0.6; ct=0.3; l1=20; st=0.4; L=30; phi=0; theta=0; alpha=0}
#"Frequency / GHz"  "RCS (Spherical) (0 0 100)(Abs) [pw] (1) [Magnitude]"
#-----------------------------------------------------------------------
1.0000000000000 -61.456712241986
1.0450000000000 -52.352042811820
1.0900000000000 -57.739956728723
1.1350000000000 -51.577139635280
1.1800000000000 -54.422408491602
1.2250000000000 -50.917549159602
................................
................................
9.9100000000000 -26.813663627631
9.9550000000000 -26.694234191386
10.000000000000 -26.738559594392
#Parameters = {g5=0.4; g4=0.4; g3=0.8; g2=0.6; g1=0.8; w=0.6; ct=0.3; l1=20; st=0.4; L=30; phi=0; theta=0; alpha=0}
#"Frequency / GHz"  "RCS (Spherical) (0 0 100)(Abs) [pw] (2) [Magnitude]"
#-----------------------------------------------------------------------
1.0000000000000 -52.852779305719
1.0450000000000 -55.167232774034
1.0900000000000 -56.453250374865
1.1350000000000 -51.165611656267

The same structure repeats until the end of the data set. I want to extract the geometrical parameters g5, g4, g3, g2, g1, and l1 into separate columns, with the related frequency and RCS values in the same row, as in this example extraction for the first data set.

g1  g2  g3  g4  g5  l1 1.0000000000000  1.0450000000000  1.0900000000000  1.1350000000000 .......
1   0.8 0.6 0.6 0.6 20 -61.456712241986 -52.352042811820 -57.739956728723 -51.577139635280......

I tried this with the following code:

import argparse

import pandas as pd
def process_data(file_path):
    g1_list = []
    g2_list = []
    g3_list = []
    g4_list = []
    g5_list = []
    l1_list = []
    frequency_list = []
    rcs_list = []

    with open(file_path, 'r') as file:
        lines = file.readlines()

        for line in lines:
            if line.startswith('#Parameters = {'):
                params = line.split(';')

                l1 = float(params[7].split('=')[1].strip())  # Extract l1
                g1 = float(params[4].split('=')[1].strip())  # Extract g1
                g2 = float(params[3].split('=')[1].strip())  # Extract g2
                g3 = float(params[2].split('=')[1].strip())  # Extract g3
                g4 = float(params[1].split('=')[1].strip())  # Extract g4
                g5 = float(params[0].split('=')[1].strip())  # Extract g5

            elif any(char.isdigit() for char in line[:10]):
                values = line.split()
                frequency = float(values[0].replace('\t', ''))
                rcs = float(values[1])
                g1_list.append(g1)
                g2_list.append(g2)
                g3_list.append(g3)
                g4_list.append(g4)
                g5_list.append(g5)
                l1_list.append(l1)
                frequency_list.append(frequency)
                rcs_list.append(rcs)

    data = {
        'g1': g1_list,
        'g2': g2_list,
        'g3': g3_list,
        'g4': g4_list,
        'g5': g5_list,
        'l1': l1_list,
        'Frequency / GHz': frequency_list,
        'RCS': rcs_list
    }
    rcs_geometry = pd.DataFrame(data)
    return rcs_geometry

But when debugging, I get the following error:

Traceback (most recent call last):
  File "C:\Users\basnaym1\GitHub\deepa_learining_Neural_N\python_based\square_shaped_rfid\data_arrangement_square_shaped_rfid.py", line 135, in <module>
    rcs_geometry = process_data(args.file_path)
  File "C:\Users\basnaym1\GitHub\deepa_learining_Neural_N\python_based\square_shaped_rfid\data_arrangement_square_shaped_rfid.py", line 38, in process_data
    g5 = float(params[0].split('=')[1].strip())  # Extract g5
ValueError: could not convert string to float: '{g5'

How can I modify the code to extract g5?

Answer by dafrandle:

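This is a great use case for regular expressions. The error occurs because splitting the whole line on ';' leaves the prefix in the first element ('#Parameters = {g5=0.6'), so split('=')[1] returns ' {g5' instead of the number.

Here is how I would do this: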

import re

data = "all text data here"  # placeholder: the full contents of the data file as one string

# Capture everything between the curly braces of each "#Parameters" line.
line_select_pattern = r"#Parameters = \{([^}]*)\}"

matches_raw = re.findall(line_select_pattern, data)

# Capture each key=value pair; this assumes the keys are consistent throughout the data.
kv_pattern = r"(g5|g4|g3|g2|g1|w|ct|l1|st|L|phi|theta|alpha)=([\d.]+)"

result_list = []

for match in matches_raw:
    kv_pairs_list = re.findall(kv_pattern, match)
    this_dict = {key: value for key, value in kv_pairs_list}
    result_list.append(this_dict)

Now result_list has one entry for each "#Parameters" line in the data, and each entry is a dictionary of that line's parameters. The values are strings, so convert them with float() where you need numbers.
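
To make the result concrete, here is a minimal sketch that runs the same two-step idea on the first header line from the question. The names `sample`, `block_pattern`, and `params` are illustrative, and the key pattern is loosened to `\w+` with an optional sign, which is an assumption rather than part of the approach above.

```python
import re

# First header line copied from the question's sample data.
sample = (
    "#Parameters = {g5=0.6; g4=0.6; g3=0.6; g2=0.8; g1=1; w=0.6; ct=0.3; "
    "l1=20; st=0.4; L=30; phi=0; theta=0; alpha=0}"
)

block_pattern = r"#Parameters = \{([^}]*)\}"  # text inside the braces
kv_pattern = r"(\w+)=([-+]?[\d.]+)"           # assumed: any word key, optional sign

inner = re.findall(block_pattern, sample)[0]

# Convert each captured value from a string to a float while building the dict.
params = {key: float(value) for key, value in re.findall(kv_pattern, inner)}

print(params["g5"], params["g1"], params["l1"])  # 0.6 1.0 20.0
```

Because the braces and the "#Parameters = " prefix are stripped before the pairs are read, the first key comes out as `g5` rather than `{g5`.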

If you can't get all of the text into a single variable, you can still do this; you would just run the first pattern against each line in your read loop.
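
The line-by-line variant could look roughly like the sketch below, which also collects the frequency/RCS pairs into the same row as the geometry, as the question asks. The function name `parse_blocks`, the `WANTED` tuple, and the choice to key each RCS value by the frequency text as written in the file are all assumptions made for illustration. It also assumes every header line contains all six wanted keys.

```python
import re

PARAM_LINE = re.compile(r"#Parameters = \{([^}]*)\}")
KEY_VALUE = re.compile(r"(\w+)=([-+]?[\d.]+)")
WANTED = ("g1", "g2", "g3", "g4", "g5", "l1")  # columns requested in the question


def parse_blocks(lines):
    """Return one dict per '#Parameters' block: the geometry values plus one
    entry per frequency, keyed by the frequency text as written in the file."""
    rows = []
    current = None
    for line in lines:
        header = PARAM_LINE.match(line)
        if header:
            params = dict(KEY_VALUE.findall(header.group(1)))
            current = {key: float(params[key]) for key in WANTED}
            rows.append(current)
            continue
        parts = line.split()
        # Data lines have exactly two fields; other comment lines, blank
        # lines, and dotted filler lines are skipped.
        if current is None or line.startswith("#") or len(parts) != 2:
            continue
        try:
            float(parts[0])  # check that the first field is a number
            current[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return rows


sample_lines = [
    "#Parameters = {g5=0.6; g4=0.6; g3=0.6; g2=0.8; g1=1; w=0.6; ct=0.3; "
    "l1=20; st=0.4; L=30; phi=0; theta=0; alpha=0}",
    '#"Frequency / GHz"  "RCS (Spherical) (0 0 100)(Abs) [pw] (1) [Magnitude]"',
    "#-----------------------------------------------------------------------",
    "1.0000000000000 -61.456712241986",
    "1.0450000000000 -52.352042811820",
    "................................",
    "#Parameters = {g5=0.4; g4=0.4; g3=0.8; g2=0.6; g1=0.8; w=0.6; ct=0.3; "
    "l1=20; st=0.4; L=30; phi=0; theta=0; alpha=0}",
    "1.0000000000000 -52.852779305719",
]

rows = parse_blocks(sample_lines)
print(rows[0])
```

To read from a real file, an open file object can be passed directly, for example `with open(file_path) as f: rows = parse_blocks(f)`. If pandas is available, `pd.DataFrame(rows)` should turn the list into a table with one row per parameter block.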