I have a text file with this format:
0 -82.871 2.52531 36.64 138 96.05
0 -76.1014 2.52577 35.36 137 83.9
0 -76.1869 5.57562 35.36 137 62.8
0 -18.1623 -11.6886 386.08 411 200.9
0 -4.62234 -4.91846 325.92 364 252.2
0 -2.52609 -1.63149 325.92 364 85.4
0 -2.52609 -1.63149 112.16 197 48.4
0 -18.1623 -4.91846 -54.24 67 69.55
0 -18.1623 -4.91846 386.08 411 64.55
12345678 1
12345678 2
2 25.2279 -72.3226 48.16 147 221.55
2 28.7109 -70.2263 48.16 147 1587.7
2 76.1009 -63.4562 46.88 146 110.35
2 31.9979 -65.5526 48.16 147 1601.8
2 35.4805 -63.4559 48.16 147 310.25
2 31.9979 -58.7826 49.44 148 492.8
2 35.4805 -56.6859 46.88 146 42.6
2 1.63117 -43.1461 73.76 167 54.55
2 4.91818 -38.4723 76.32 169 75.4
I have written a program that skips the entire header with line = raw_dat.readlines()[7:], reads the file until it encounters the magic_number, and then breaks out of the loop:
file = 'Runnumber169raw10.txt'
magic_number = '12345678'
event1 = []
x1 = []
y1 = []
z1 = []
tb1 = []
q1 = []
Xnoselection = []
X = []
distanceradius = 0
with open(file, 'r') as raw_dat:
    line = raw_dat.readlines()[7:]
    for lines in line:
        lines.split()
        print(lines)
        if lines.split()[0] == magic_number:
            break
The issue is that the loop stops at the break statement, which prevents it from reading any further. The break statement is necessary for our purposes, because we want to analyze the data as follows: read through the file, stop when the magic_number is encountered, store the values that belong to the current first-column value, and then continue reading after the magic_number. That last step is where I am stuck: continuing to read the file and storing the values that belong to the next first-column value. Note that this is just a test file; the real file has 10000 events (the first column runs from 0 to 10000). I have also tried Pandas:
import numpy as np
import pandas as pd

data = pd.read_csv('Runnumber169raw10.txt', sep = '\t', skiprows = 5)
event_series = pd.Series(data['eventno.'])
x_series = pd.Series(data['X'])
y_series = pd.Series(data['Y'])
z_series = pd.Series(data['Z'])
tb_series = pd.Series(data['Tb'])
q_series = pd.Series(data['Q'])
event_data = event_series[event_series == '0']
x_data = x_series[event_series == '0']
y_data = y_series[event_series == '0']
z_data = z_series[event_series == '0']
tb_data = tb_series[event_series == '0']
q_data = q_series[event_series == '0']
event = np.array(event_data)
x = np.array(x_data)
y = np.array(y_data)
z = np.array(z_data)
tb = np.array(tb_data)
q = np.array(q_data)
The problem with Pandas is that it searches the entire file for rows whose first-column value is 0 (in this case). That is not how I want to read the file. To clarify, I want to read through the file until the magic_number is encountered, stop reading, store the values that belong to the current first-column value, and then continue reading after the magic number and repeat. Can anyone offer any suggestions regarding this?
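To make the behaviour I am after concrete, here is a minimal sketch of it, not a tested solution for the real file. It replaces the break with a generator that hands back one block of rows each time a magic-number line is reached and then keeps reading. The names read_events and MAGIC_NUMBER and the inline sample are made up for illustration, and it assumes every non-separator line holds only whitespace-separated numbers.

```python
from typing import Iterable, Iterator, List

MAGIC_NUMBER = "12345678"  # separator token from the sample file


def read_events(lines: Iterable[str],
                magic: str = MAGIC_NUMBER) -> Iterator[List[List[float]]]:
    """Yield one list of numeric rows per block, splitting at magic-number lines."""
    block: List[List[float]] = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if fields[0] == magic:
            if block:  # skip empty blocks caused by consecutive separators
                yield block
                block = []
            continue  # keep reading instead of breaking
        block.append([float(value) for value in fields])
    if block:  # the last block has no trailing separator
        yield block


# Hypothetical stand-in for the file contents after the header.
sample = """\
0 -82.871 2.52531 36.64 138 96.05
0 -76.1014 2.52577 35.36 137 83.9
12345678 1
12345678 2
2 25.2279 -72.3226 48.16 147 221.55
""".splitlines()

events = list(read_events(sample))
for rows in events:
    event_no = int(rows[0][0])  # first column of the first row in the block
    print(event_no, len(rows))
```

With a real file, the header would still have to be skipped before the lines reach read_events, for example by passing itertools.islice(raw_dat, 7, None) if the header really is 7 lines long. Because the generator reads line by line, only one block has to be held in memory at a time, which may matter for the 10000-event file.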