Double if conditional in the line.startswith strategy


I have a data.dat file with this format:

REAL PART 

FREQ     1.6     5.4     2.1    13.15    13.15    17.71
FREQ     51.64   51.64   82.11  133.15   133.15   167.71

.
.
.

IMAGINARY PART 

FREQ     51.64    51.64     82.12   132.15    129.15    161.71
FREQ     5.64     51.64     83.09   131.15    120.15    160.7

.
.
.

REAL PART 

FREQ     1.6     5.4     2.1    13.15    15.15    17.71
FREQ     51.64   57.64   82.11  183.15   133.15   167.71

.
.
.

IMAGINARY PART 

FREQ     53.64    53.64     81.12   132.15    129.15    161.71
FREQ     5.64     55.64     83.09   131.15    120.15    160.7

REAL PART and IMAGINARY PART blocks alternate like this throughout the file.

Within each REAL PART block, I would like to split each line that starts with FREQ.

I have managed to:

1) split lines and extract the value of FREQ and

2) append this result to a list of lists, and

3) create a final list, All_frequencies:

import itertools

FREQ = []
fname = 'data.dat'
with open(fname, 'r') as f:
    for line in f:
        if line.startswith(' FREQ'):
            FREQS = line.split()
            FREQ.append(FREQS)

print('Final FREQ =', FREQ)
All_frequencies = list(itertools.chain.from_iterable(FREQ))
print('All_frequencies =', All_frequencies)

The problem with this code is that it also extracts the FREQ values from the IMAGINARY PART blocks. Only the FREQ values from the REAL PART blocks should be extracted.

I have tried to make something like:

if line.startswith('REAL PART'):
   if line.startswith('IMAGINARY PART'):
      code...

or:

if line.startswith(' REAL') and line.startswith(' FREQ'):
   code...

But neither attempt works. I would appreciate any help.


There are 4 answers

Mark Ransom On BEST ANSWER

It appears based on the sample data in the question that lines starting with 'REAL' or 'IMAGINARY' don't have any data on them, they just mark the beginning of a block. If that's the case (and you don't go changing the question again), you just need to keep track of which block you're in. You can also use yield instead of building up an ever-larger list of frequencies, as long as this code is in a function.

import itertools

def read_real_parts(fname):
    # the file stays open until the generator is exhausted
    with open(fname, 'r') as f:
        real_part = False
        for line in f:
            if line.startswith(' REAL'):
                real_part = True
            elif line.startswith(' IMAGINARY'):
                real_part = False
            elif line.startswith(' FREQ') and real_part:
                FREQS = line.split()
                yield FREQS

FREQ = read_real_parts('data.dat') #this gives you a generator
All_frequencies = list(itertools.chain.from_iterable(FREQ)) #then convert to list
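The sample in the question does not make it clear whether the lines in data.dat really begin with a space. As a minimal sketch of the same flag idea, the variant below strips surrounding whitespace before testing the prefix and accepts any iterable of lines, so it can be tried on an in-memory sample. The names iter_real_freqs and SAMPLE are made up for illustration, and the sample text is an assumption about the file layout.

```python
import io
import itertools

def iter_real_freqs(lines):
    """Yield the split fields of each FREQ line found inside a REAL PART block."""
    in_real = False
    for line in lines:
        stripped = line.strip()  # tolerate leading/trailing whitespace
        if stripped.startswith('REAL'):
            in_real = True
        elif stripped.startswith('IMAGINARY'):
            in_real = False
        elif in_real and stripped.startswith('FREQ'):
            yield stripped.split()

# Hypothetical sample mimicking the layout shown in the question
SAMPLE = """\
 REAL PART
 FREQ  1.6  5.4
 IMAGINARY PART
 FREQ  51.64  82.12
 REAL PART
 FREQ  2.1  13.15
"""

rows = list(iter_real_freqs(io.StringIO(SAMPLE)))
flat = list(itertools.chain.from_iterable(rows))
print(flat)
```

As in the code above, each row still carries the 'FREQ' label as its first field; slicing with [1:] would drop it.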
appills On

We start with a flag set to False. When we find a line that contains "REAL", we set the flag to True and start copying the lines below it. When we find a line that contains "IMAGINARY", we set the flag back to False and skip lines until another "REAL" turns it on again.

Using the flag concept in a simple way:

with open('this.txt', 'r') as content:
    my_lines = content.readlines()

my_real_flag = False
with open('another.txt', 'w') as f:
    for line in my_lines:
        if "REAL" in line:
            my_real_flag = True
        elif "IMAGINARY" in line:
            my_real_flag = False
        if my_real_flag:
            # do code here because we are inside a REAL block
            f.write(line)
        # otherwise skip the line: we are not inside a REAL block

this.txt looks like this:

REAL
1
2
3
IMAGINARY
4
5
6
REAL
1
2
3
IMAGINARY
4
5
6

another.txt ends up looking like this:

REAL
1
2
3
REAL
1
2
3

Original answer that only works when there is one REAL section

If the file is "small" enough to be read as an entire string and there is only one instance of "IMAGINARY PART", you can do this:

file_str = file_str.split("IMAGINARY PART")[0]

which would get you everything above the "IMAGINARY PART" line.

You can then apply the rest of your code to this file_str string, which contains only the real part.

To elaborate, file_str is a str obtained as follows:

with open('data.dat', 'r') as my_data:
    file_str = my_data.read()

The "with" block is referenced all over Stack Exchange, so there may be a better explanation for it than mine. I intuitively think about it as:

"Open a file named 'data.dat' read-only and bind it to the variable my_data. Once it's opened, read the entire file into a str, file_str, using my_data.read(), then close 'data.dat'."

Now you have a str, and you can apply all the applicable str methods to it.
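If the whole-string approach is still preferred when there are several REAL PART blocks, the split idea can be applied twice. This is only a sketch under the assumption that the markers appear literally as "REAL PART" and "IMAGINARY PART"; the function name real_sections and the sample string are hypothetical.

```python
def real_sections(file_str):
    """Return the text of every REAL PART block in file_str."""
    sections = []
    # Every chunk after the first begins right after a "REAL PART" marker
    for chunk in file_str.split("REAL PART")[1:]:
        # Keep only what precedes the next "IMAGINARY PART" marker, if any
        sections.append(chunk.split("IMAGINARY PART")[0])
    return sections

# Hypothetical in-memory sample standing in for the contents of data.dat
sample = "REAL PART\nFREQ 1 2\nIMAGINARY PART\nFREQ 3 4\nREAL PART\nFREQ 5 6\n"

real_lines = [line
              for section in real_sections(sample)
              for line in section.splitlines()
              if line.strip().startswith("FREQ")]
print(real_lines)
```

This still reads the entire file into memory, so it only suits files that are small enough for that.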

If "IMAGINARY PART" happens frequently throughout the file or the file is too big, Tadhg's suggestion of a flag and a break works well.

with open('data.dat', 'r') as f:
    for line in f:
        if "IMAGINARY PART" in line:
            break  # stop at the first IMAGINARY PART line; "with" closes the file
        # do stuff with the line here
Tadhg McDonald-Jensen On

You would need to keep track of which part you are looking at, so you can use a flag to do this:

section = None #will change to either "real" or "imag"
for line in f:
    if line.startswith("IMAGINARY PART"):
        section = "imag"
    elif line.startswith('REAL PART'):
        section = "real"
    else:
        freqs = line.split()
        if section == "real":
            FREQ.append(freqs)
        #elif section == "imag":
        #    IMAG_FREQ.append(freqs)

By the way, instead of appending to FREQ and then needing itertools.chain.from_iterable to flatten it, you might consider just extending FREQ instead.
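A small sketch of the difference, using two made-up FREQ lines rather than the real file: appending builds a list of lists that has to be flattened afterwards, while extending keeps the list flat from the start.

```python
import itertools

lines = ["FREQ 1.6 5.4", "FREQ 51.64 82.11"]  # hypothetical FREQ lines

# Approach from the question: list of lists, flattened afterwards
nested = []
for line in lines:
    nested.append(line.split())
flattened = list(itertools.chain.from_iterable(nested))

# Alternative: extend adds the individual fields directly
flat = []
for line in lines:
    flat.extend(line.split())

print(flattened == flat)
```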

Bill Bell On

Think of this as a state machine with two states. When the program reads a line beginning with REAL, it enters the REAL state and aggregates values. When it reads a line beginning with IMAGINARY, it enters the other state and ignores values.

REAL, IMAGINARY = 1,2

FREQ = []
fname = 'data.dat'
f = open(fname)
state = None
for line in f:
    line = line.strip()
    if not line: continue
    if line.startswith('REAL'):
        state = REAL
        continue
    elif line.startswith('IMAGINARY'):
        state = IMAGINARY
        continue
    else:
        pass
    if state == IMAGINARY:
        continue
    freqs = line.split()[1:]
    FREQ.extend(freqs)

I assume that you want only the numeric values; hence the [1:] at the end of the assignment to freqs near the end of the script, which drops the leading FREQ label.

Using your data file, without the ellipsis lines, produces the following result in FREQ:

['1.6', '5.4', '2.1', '13.15', '13.15', '17.71', '51.64', '51.64', '82.11', '133.15', '133.15', '167.71', '1.6', '5.4', '2.1', '13.15', '15.15', '17.71', '51.64', '57.64', '82.11', '183.15', '133.15', '167.71']
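The values in FREQ are still strings. If numbers are needed, a conversion along these lines should work; this is a sketch that assumes every token is a valid decimal number, since float() raises ValueError otherwise.

```python
FREQ = ['1.6', '5.4', '2.1', '13.15']  # first few values from the result above

# Convert each string token to a float
numeric = [float(value) for value in FREQ]
print(numeric)
```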