I have four files, each containing 153 data points. Each data point consists of 3 lines, i.e.:
File 1:
datapoint_1_name
datapoint_1_info
datapoint_1_data_file1
datapoint_2_name
datapoint_2_info
datapoint_2_data_file1
datapoint_3_name
datapoint_3_info
datapoint_3_data_file1
File 2:
datapoint_1_name
datapoint_1_info
datapoint_1_data_file2
datapoint_2_name
datapoint_2_info
datapoint_2_data_file2
datapoint_3_name
datapoint_3_info
datapoint_3_data_file2
File 3:
datapoint_1_name
datapoint_1_info
datapoint_1_data_file3
datapoint_2_name
datapoint_2_info
datapoint_2_data_file3
datapoint_3_name
datapoint_3_info
datapoint_3_data_file3
File 4:
datapoint_1_name
datapoint_1_info
datapoint_1_data_file4
datapoint_2_name
datapoint_2_info
datapoint_2_data_file4
datapoint_3_name
datapoint_3_info
datapoint_3_data_file4
and so on.
The data in all files is the same except for the third line of each group. I am trying to merge these files so that the output contains the datapoint_name and datapoint_info from just the first file, followed by the third line (datapoint_data) from each of the four files, like so:
Output:
datapoint_1_name
datapoint_1_info
datapoint_1_data_file1
datapoint_1_data_file2
datapoint_1_data_file3
datapoint_1_data_file4
datapoint_2_name
datapoint_2_info
datapoint_2_data_file1
datapoint_2_data_file2
datapoint_2_data_file3
datapoint_2_data_file4
datapoint_3_name
datapoint_3_info
datapoint_3_data_file1
datapoint_3_data_file2
datapoint_3_data_file3
datapoint_3_data_file4
I've tried the script below in Python (I've replaced the actual pattern matching with 'some pattern' in these lines; I've verified that the patterns match the intended lines correctly):
output_file = "combined_sequences_and_data2.txt"
with open(output_file, 'w') as output:
    combined_data = []
    with open('file1', 'r') as file:
        for line in file:
            line = line.strip()
            if line.startswith('some pattern'):
                combined_data.append(line)
            elif line.isalpha():
                combined_data.append(line)
            elif line.startswith('some pattern'):
                combined_data.append(line)
    with open('file2', 'r') as file:
        for line in file:
            line = line.strip()
            if line.startswith('some pattern'):
                combined_data.append(line)
    with open('file3', 'r') as file:
        for line in file:
            line = line.strip()
            if line.startswith('some pattern'):
                combined_data.append(line)
    with open('file4', 'r') as file:
        for line in file:
            line = line.strip()
            if line.startswith('some pattern'):
                combined_data.append(line)
    # Write the combined data to the output file
    output.write('\n'.join(combined_data) + '\n')
This doesn't run at all; it just freezes, and I can't work out where.
I also tried a bash script with awk:
#!/bin/bash
file1="filename"
file2="filename"
file3="filename"
file4="filename"
group_size=3
line_count=1
while read -r line; do
    if [ $line_count -le $group_size ]; then
        group_lines[$line_count]=$line
        line_count=$((line_count + 1))
    fi
    if [ $line_count -gt $group_size ]; then
        for i in "${group_lines[@]}"; do
            echo "$i"
        done
        awk 'NR == 3' "$file2"
        awk 'NR == 3' "$file3"
        awk 'NR == 3' "$file4"
        line_count=1
        unset group_lines
    fi
done < "$file1"
This one is closer to working, but it doesn't step through the 3rd lines of the remaining three files: for every data point in file 1 it just prints the same line (line 3 of files 2, 3 and 4) over and over.
You don't need to examine the file contents at all, since you know that the values you're interested in come in groups of 3. Therefore, you can simply read all four files in lock-step, three lines at a time, and reassemble each group.
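For example, here is a minimal sketch in Python, assuming the four inputs are named file1 … file4, every file contains complete, well-formed 3-line groups in the same order, and the output name combined.txt is just a placeholder:

from itertools import islice

input_files = ["file1", "file2", "file3", "file4"]  # placeholder names
output_file = "combined.txt"                        # placeholder name

def groups_of_three(path):
    # Yield each data point of a file as a list of 3 stripped lines.
    with open(path) as fh:
        while True:
            group = [line.strip() for line in islice(fh, 3)]
            if not group:
                return
            yield group

with open(output_file, "w") as out:
    # Walk all four files in lock-step, one 3-line group at a time.
    for groups in zip(*(groups_of_three(p) for p in input_files)):
        name, info, data_file1 = groups[0]   # name/info kept from file 1 only
        out.write(name + "\n")
        out.write(info + "\n")
        out.write(data_file1 + "\n")
        for group in groups[1:]:             # data line from files 2-4
            out.write(group[2] + "\n")

Note that zip() stops at the shortest input, so a truncated file would silently shorten the output; if that matters, compare the number of groups per file (it should be 153 everywhere) before merging.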