Fill pandas dataframe within a for loop

I am working with Amazon Rekognition to do some image analysis. With a simple Python script, I get, at every iteration, a response of this type (example for the image of a cat):

{'Labels':
            [{'Name': 'Pet', 'Confidence': 96.146484375, 'Instances': [],
              'Parents': [{'Name': 'Animal'}]}, {'Name': 'Mammal', 'Confidence': 96.146484375,
                                                 'Instances': [], 'Parents': [{'Name': 'Animal'}]},
             {'Name': 'Cat', 'Confidence': 96.146484375.....

I collected all the attributes I need in a list, which looks like this:

[Pet, Mammal, Cat, Animal, Manx, Abyssinian, Furniture, Kitten, Couch]

Now, I would like to create a dataframe where the elements in the list above appear as columns and the rows take values 0 or 1.

I created a dictionary to which I add the elements of the list, so I get {'Cat': 1}. When I then try to add it to the dataframe, I get the following error: TypeError: Index(...) must be called with a collection of some kind, 'Cat' was passed.

Not only that, but I also cannot seem to add the information from different images to the same dataframe. For example, if I only insert the data in the dataframe (as rows, not columns), I get a series with n rows containing the n elements (identified by Amazon Rekognition) of only the last image, i.e. I start from an empty dataframe at each iteration. The result I would like to get is something like:

Image   Human   Animal  Flowers     etc...
Pic1    1        0       0  
Pic2    0        0       1  
Pic3    1        1       0  

For reference, this is the code I am using now (I should add that I am working in a piece of software called KNIME, but this is just Python):

from pandas import DataFrame
import pandas as pd
import boto3

fileName=flow_variables['Path_Arr[1]']  #This is just to tell Amazon the name of the image
bucket= 'mybucket'
client=boto3.client('rekognition', region_name = 'us-east-2')

response = client.detect_labels(Image={'S3Object':
{'Bucket':bucket,'Name':fileName}})


data = [str(response)]  # This is what I inserted in the first cell of this question

d= {}
for key, value in response.items():
    for el in value:
        if isinstance(el,dict):
            for k, v in el.items():
                if k == "Name":
                    d[v] = 1
                    print(d)
                    df = pd.DataFrame(d, ignore_index=True)

print(df)
output_table = df

I am definitely getting it wrong, both in the for loop and when adding things to my dataframe, but nothing really seems to work!

Sorry for the very long question, I hope it was clear! Any ideas?

There is 1 answer

D-E-N On

I am not sure this answers your question completely, because I do not know what your data can look like, but it should be a good step in the right direction. I added the same data multiple times, but the approach should be clear.

import pandas as pd

response = {'Labels': [{'Name': 'Pet', 'Confidence': 96.146484375, 'Instances': [], 'Parents': [{'Name': 'Animal'}]},
                       {'Name': 'Cat', 'Confidence': 96.146484375, 'Instances': [{'BoundingBox':
                                                                                      {'Width': 0.6686800122261047,
                                                                                       'Height': 0.9005332589149475,
                                                                                       'Left': 0.27255237102508545,
                                                                                       'Top': 0.03728689253330231},
                                                                                  'Confidence': 96.146484375}],
                        'Parents': [{'Name': 'Pet'}]
                        }]}


def handle_new_data(response_data: dict, image_name: str) -> pd.DataFrame:
    # One row per image: the image name plus a 1 for every detected label.
    d = {"Image": image_name}
    # Read only the "Labels" list, so other keys in the response cannot add rows.
    for el in response_data.get("Labels", []):
        if isinstance(el, dict) and "Name" in el:
            d[el["Name"]] = 1
    # Wrapping the dict in a list makes pandas build a single-row frame.
    return pd.DataFrame([d])


# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once.
frames = [handle_new_data(response, name)
          for name in ("image1", "image2", "image3", "image4")]
df_all = pd.concat(frames, ignore_index=True)
# A label missing from an image shows up as NaN; replace it with 0.
df_all = df_all.fillna(0)
print(df_all)
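
To get closer to the 0/1 table sketched in the question, one option is to collect one plain dict per image and build the dataframe a single time at the end. The following is only a minimal sketch: the helper name labels_to_row and the responses dict are hypothetical stand-ins for the real detect_labels results, which I assume always carry a 'Labels' list of dicts with a 'Name' key, as in the sample above.

```python
import pandas as pd


def labels_to_row(response: dict, image_name: str) -> dict:
    """Return one row: the image name plus a 1 for every detected label."""
    row = {"Image": image_name}
    for label in response.get("Labels", []):
        row[label["Name"]] = 1
    return row


# Hypothetical stand-ins for real detect_labels responses, one per image.
responses = {
    "Pic1": {"Labels": [{"Name": "Human"}]},
    "Pic2": {"Labels": [{"Name": "Flowers"}]},
    "Pic3": {"Labels": [{"Name": "Human"}, {"Name": "Animal"}]},
}

# Collect all rows first, then create the dataframe once.
rows = [labels_to_row(resp, name) for name, resp in responses.items()]

# Labels that an image lacks come out as NaN, so fill them with 0
# and convert the label columns to integers.
table = pd.DataFrame(rows).set_index("Image").fillna(0).astype(int)
print(table)
```

Building the frame once from a list of dicts avoids growing a dataframe inside the loop, which is what made each iteration start from scratch in the question. In the KNIME script, the result would presumably be assigned to output_table, with table.reset_index() if Image should be a regular column.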