I have a text file with this format:

0   -82.871 2.52531 36.64   138 96.05
0   -76.1014    2.52577 35.36   137 83.9
0   -76.1869    5.57562 35.36   137 62.8
0   -18.1623    -11.6886    386.08  411 200.9
0   -4.62234    -4.91846    325.92  364 252.2
0   -2.52609    -1.63149    325.92  364 85.4
0   -2.52609    -1.63149    112.16  197 48.4
0   -18.1623    -4.91846    -54.24  67  69.55
0   -18.1623    -4.91846    386.08  411 64.55
12345678  1
12345678  2
2   25.2279 -72.3226    48.16   147 221.55
2   28.7109 -70.2263    48.16   147 1587.7
2   76.1009 -63.4562    46.88   146 110.35
2   31.9979 -65.5526    48.16   147 1601.8
2   35.4805 -63.4559    48.16   147 310.25
2   31.9979 -58.7826    49.44   148 492.8
2   35.4805 -56.6859    46.88   146 42.6
2   1.63117 -43.1461    73.76   167 54.55
2   4.91818 -38.4723    76.32   169 75.4

I have written a program that skips the header with line = raw_dat.readlines()[7:] and then reads line by line until it encounters the magic_number, at which point it breaks out of the loop:

file = 'Runnumber169raw10.txt'
magic_number = '12345678'

event1 = []
x1 = []
y1 = []
z1 = []
tb1 = []
q1 = []
Xnoselection = []
X = []
distanceradius = 0

with open(file, 'r') as raw_dat:
    line = raw_dat.readlines()[7:]
    
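    for lines in line:
        fields = lines.split()  # split once and keep the result
        print(lines)
        # Guard against blank lines, which split into an empty list
        if fields and fields[0] == magic_number:
            break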

The issue is that the loop stops for good at the break statement. The break is necessary for our purposes: we want to read the file until the magic_number appears, store the rows belonging to the current first-column value, and then continue reading after the magic_number. That last step is what I can't get working: resuming after the magic_number and storing the rows for the next first-column value. Note that this is just a test file; the real file has 10000 events (the first column runs from 0 to 10000). I have also tried Pandas:

import numpy as np
import pandas as pd

data = pd.read_csv('Runnumber169raw10.txt', sep='\t', skiprows=5)
event_series = pd.Series(data['eventno.'])
x_series = pd.Series(data['X'])
y_series = pd.Series(data['Y'])
z_series = pd.Series(data['Z'])
tb_series = pd.Series(data['Tb'])
q_series = pd.Series(data['Q'])

event_data = event_series[event_series == '0']
x_data = x_series[event_series == '0']
y_data = y_series[event_series == '0']
z_data = z_series[event_series == '0']
tb_data = tb_series[event_series == '0']
q_data = q_series[event_series == '0']

event = np.array(event_data)
x = np.array(x_data)
y = np.array(y_data)
z = np.array(z_data)
tb = np.array(tb_data)
q = np.array(q_data)

The problem with the Pandas approach is that it searches the entire file for rows whose first column is 0 (in this case), which is not how I want to read the file. To clarify, I want to read until the magic_number is encountered, stop, store the rows read so far (those sharing the first-column value), then continue reading after the magic number and repeat. Can anyone offer any suggestions?

There is 1 answer
Barmar On

Use enumerate() to get the line indexes while iterating the first time. Then you can restart where you left off.

with open(file, 'r') as raw_dat:
    line = raw_dat.readlines()[7:]

for index, lines in enumerate(line):
    print(lines)
    # Compare the first whitespace-separated field, not the first character
    if lines.split()[0] == magic_number:
        break

for lines in line[index+1:]:
    pass  # do other stuff
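
If the file contains many marker lines, the same index idea can be applied to all of them at once instead of writing one loop per section. The following is only a sketch under some assumptions: marker lines carry the magic number as their first field, blank lines can be ignored, and `split_by_marker_index` is a hypothetical helper name that does not appear in the question.

```python
def split_by_marker_index(lines, magic_number="12345678"):
    """Split a list of text lines into blocks separated by marker lines."""
    # Positions of every line whose first field is the magic number
    marker_idx = [
        i for i, text in enumerate(lines)
        if text.split() and text.split()[0] == magic_number
    ]
    blocks = []
    start = 0
    # Add len(lines) as a final boundary so the last block is collected too
    for idx in marker_idx + [len(lines)]:
        block = [text.split() for text in lines[start:idx] if text.strip()]
        if block:  # consecutive markers produce empty slices; skip them
            blocks.append(block)
        start = idx + 1
    return blocks


# Small in-memory sample modelled on the data in the question
sample = [
    "0\t-82.871\t2.52531\t36.64\t138\t96.05\n",
    "0\t-76.1014\t2.52577\t35.36\t137\t83.9\n",
    "12345678\t1\n",
    "12345678\t2\n",
    "2\t25.2279\t-72.3226\t48.16\t147\t221.55\n",
]
blocks = split_by_marker_index(sample)
print(len(blocks))
```

Each block is a list of split rows (still strings), so the columns can be converted to numbers afterwards.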

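Or don't read the entire file into a list; loop over the file's lines directly. Because the file object is an iterator, the second loop resumes right after the line where the first loop stopped.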

with open(file, 'r') as raw_dat:
    # skip first 7 lines
    for _ in range(7):
        raw_dat.readline()

    for lines in raw_dat:
        print(lines)
        if lines.split()[0] == magic_number:
            break

    # continues with the line after the magic number
    for lines in raw_dat:
        pass  # do other stuff
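
To repeat this for every event without writing one loop per section, the streaming approach can be wrapped in a generator that yields one block of rows each time a marker line is reached. This is a rough sketch rather than a definitive implementation: `iter_events` and its `header_lines` parameter are hypothetical names, and it assumes the first column is an integer event number and the remaining columns are numeric, as in the sample data.

```python
import io
from itertools import islice


def iter_events(raw_dat, magic_number="12345678", header_lines=0):
    """Yield lists of parsed rows, one list per block between marker lines."""
    # Consume (and discard) the header lines
    for _ in islice(raw_dat, header_lines):
        pass

    rows = []
    for text in raw_dat:
        fields = text.split()
        if not fields:
            continue  # ignore blank lines
        if fields[0] == magic_number:
            if rows:  # consecutive marker lines do not yield empty blocks
                yield rows
                rows = []
            continue
        event_no = int(fields[0])
        values = [float(v) for v in fields[1:]]
        rows.append((event_no, *values))
    if rows:  # last block has no trailing marker line
        yield rows


# In-memory stand-in for the real file, modelled on the question's sample
demo = io.StringIO(
    "0\t-82.871\t2.52531\t36.64\t138\t96.05\n"
    "0\t-76.1014\t2.52577\t35.36\t137\t83.9\n"
    "12345678\t1\n"
    "12345678\t2\n"
    "2\t25.2279\t-72.3226\t48.16\t147\t221.55\n"
)
events = list(iter_events(demo))
for rows in events:
    print(rows[0][0], len(rows))  # event number and row count of the block
```

With the real file, the same generator could be driven by `with open(file) as raw_dat:` and `iter_events(raw_dat, header_lines=7)`, so only one event's rows are held in memory at a time.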