Python/Django- Save csv file data with messy rows

434 views Asked by At

I have a csv file with the following structure:

A, 10
B, 11
C, 8
D, 12
A, 21
B, 7
D, 22
D, 15
C, 111
D, 50
A, 41
B, 32
C, 19
D, 11

I want to read the entire file, and save the data on the second column, if the row is like A, B, C, D format. I have a list:

my_list = [A, B, C, D]

And I check each 4 row, if it is in my_list format, then read and save to the database like:

with open('csv_file.csv', 'rb') as csvfile:
        the_file = csv.reader(csvfile.read().splitlines())

        for idx, row in enumerate(islice(zip(*[the_file]*4), 100)):
            my_model = Django_model()
            if row[0][0] == my_list[0]:
                if row[0][0] == my_list[0] and row[1][0] == my_list[1] and \
                    row[2][0] == my_list[2] and row[3][0] == my_list[3]:
                    my_model.a_field = row[0][1]
                    my_model.b_field = row[1][1]
                    my_model.c_field = row[2][1]
                    my_model.d_field = row[3][1]
                    my_model.save()

The fact is that it is working if and ONLY if the structure of the row is the same as my_list. But when it gets to the messy part (A, B, D, D, C, D) it will not read the rows properly, and therefore many rows are skipped.

The question is that, how can I jump to the next interesting row (which follows my_list format) and read it? Meanwhile save the the skipped rows in another list?

I have heard that Pandas could help, but I have been through the documentation, and I couldn't find a way to solve this case.

1

There are 1 answers

7
Mohammad Yusuf On BEST ANSWER

You can extract the pattern and corresponding values like this:

import pandas as pd
import re

df = pd.read_csv('/home/yusuf/Desktop/c1', header=None, names=['c1','c2'])
l1=[]
for a in re.finditer('ABCD', ''.join(df.c1.tolist())):
    l1.append(range(a.start(),a.end()))
l2 = [b for a in l1 for b in a]
print df[df.index.isin(l2)], '\n'
print df[~df.index.isin(l2)]

Output:

   c1  c2
0   A  10
1   B  11
2   C   8
3   D  12
10  A  41
11  B  32
12  C  19
13  D  11

  c1   c2
4  A   21
5  B    7
6  D   22
7  D   15
8  C  111
9  D   50