Concatenate column values in csv based on grouping condition

740 views Asked by At

I have a csv that is like the below (Note: values in Name column are not limited i.e. not only ABC and DEF):

Name, Type, Text 
ABC, Type A, how
ABC, Type A, are
ABC, Type A, you
ABC, Type B, Your
ABC, Type B, Name?
DEF, Type A, I
DEF, Type A, am
DEF, Type A, good
DEF, Type B, I'm
DEF, Type B, Terminator
... and more 

I want to create another csv file like below (i.e. Group Text column based on Type column for every Name column):

Name, Type, Text
ABC, Type A, how are you
ABC, Type B, Your Name?
DEF, Type A, I am good
DEF, Type B, I'm Terminator
..till the end

I am trying to write a python script. My attempt as below:

TypeList = ['Type A','Type B']
with open("../doc1.csv", encoding='utf-8', newline='', mode="r") as myfile:
    
    g = csv.reader(myfile)

    with open("../doc2.csv", encoding='utf-8', newline='', mode="w") as myfile:
        h = csv.writer(myfile)
        h.writerow(["Name","Text"])

        for row in g:
            if TypeList[0] in row[1]:    
               Concatenatedtext[0]= Concatenatedtext[0] + ' ' + row[1]

Can someone please help me out with this mess?

1

There are 1 answers

0
snakecharmerb On

Grouping csv rows together is a task for the itertools.groupby function.

itertools.groupby accepts a key function that defines matching rows, and emits the key (here, the name and type) and the group (the rows that match) for each match found.

The operator.itemgetter function can be used to create the key function.

import csv
import itertools
import operator

# A function that gets the Name and Type values for each row:
# this is used to group the rows together.
key_func = operator.itemgetter(0, 1)

with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    # Skip header row
    next(reader)
    for key, group in itertools.groupby(reader, key=key_func):
        text = ' '.join(cell[2] for cell in group)
        print([key[0], key[1], text])

Output:

['ABC', ' Type A', ' how  are  you']
['ABC', ' Type B', ' Your  Name?']
['DEF', ' Type A', ' I  am  good']
['DEF', ' Type B', " I'm  Terminator"]