List append dictionary - handling missing data

13 views. Asked by user16760825 on 28 March 2024 at 08:59

I have a script that reads an HTML element tree and builds a list of dictionaries from it. The problem is that when an element has no value, the script fills in the last known value instead.

However, if I call record_dict.clear() after the first layer ('records in journal'), a missing value comes out as 'NaN', which is what I want.

When I try the same thing in the full script, it doesn't work. That makes sense, because the full script combines values from different layers of the data.
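The carry-over can be reproduced without any XML. The following is a minimal sketch with made-up rows (the names `rows`, `shared`, `carried`, and `fresh` are illustrative, not from the script): reusing one dictionary keeps keys from earlier iterations, while building a new dictionary per row leaves missing keys absent.

```python
# Hypothetical data: the second row has no 'bankAccNr'.
rows = [{"jrnID": "1", "bankAccNr": "12"}, {"jrnID": "2"}]

# One dictionary reused across iterations, as record_dict is in the script.
shared = {}
carried = []
for row in rows:
    shared.update(row)             # keys missing from this row keep their old value
    carried.append(shared.copy())

# A new dictionary per row: keys missing from the row stay missing.
fresh = []
for row in rows:
    fresh.append(dict(row))

print(carried)
print(fresh)
```

In `carried`, the second entry inherits 'bankAccNr' from the first row; in `fresh` it does not.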

To make this clear, here are the scripts and their output.

The full script is:

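import pandas as pd

# `journals` and `ns` (the namespace prefix stripped from tags) are defined earlier and not shown.
transactions_df = pd.DataFrame()
total_recordstotal = list()
record_dict = dict()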

for journal in journals:
    #record_dict = pd.DataFrame([dict()])    
    for records in journal:
        
        if len(records) == 0:
            # Rename 'desc' to 'journal desc', then strip the namespace from the tag
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns, '')
            
            #columnvalues = [x for x in records.text if x is not None]
            columnvalues = records.text
      
            record_dict[columnnames] = columnvalues
            
            
        
        else :
            for record in records: 
                if len(record) == 0:
                    # Strip the namespace, then rename 'nr' to 'boeknr'
                    columnnames = record.tag.replace(ns, '').replace('nr', 'boeknr')
                    columnvalues = record.text
                    record_dict[columnnames] = columnvalues 
  
                else:

                    for subfields in record: 
                        if len(subfields) == 0:
                            columnnames = subfields.tag.replace(ns,'')
                            columnvalues = subfields.text
                            record_dict[columnnames] = columnvalues

                        else: 

                            for subfields_1 in subfields:
                                if len(subfields_1) == 0:
                                    columnnames = subfields_1.tag.replace(ns,'')
                                    columnvalues = subfields_1.text
                                    record_dict[columnnames] = columnvalues
                                else:
                                    print('another sublayer!')

                    total_recordstotal.append(record_dict.copy())

If I run the script on only the first layer of data, I get the following output, in which every jrnID should have its own bankAccNr:

for journal in journals:  
    for records in journal:    
        if len(records) == 0:
            # Rename 'desc' to 'journal desc', then strip the namespace from the tag
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns, '')
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
            
    
    total_records1.append((record_dict.copy()))

jrnID	bankAccNr
1	NaN
2	12
3	12
4	12
5	22
6	22
7	33
8	33
9	33
10	33

The output looks right when I change the code to:

for journal in journals:  
    for records in journal:    
        if len(records) == 0:
            # Rename 'desc' to 'journal desc', then strip the namespace from the tag
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns, '')
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
            
    
    total_records1.append((record_dict.copy())) 
    record_dict.clear()

jrnID	bankAccNr
1	NaN
2	12
3	NaN
4	NaN
5	22
6	NaN
7	33
8	NaN
9	NaN
10	NaN

However, this is only one layer of the tree, and I have to read more layers and combine them. My question is therefore: how can I apply the solution above to the full script shown earlier?
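One possible approach, sketched here under assumptions, is to keep a separate dictionary per layer and merge them into a fresh dictionary for each output row, so nothing needs to be cleared. The sample XML, the namespace `urn:example`, the tag names `transaction`, `trLine`, and `amnt`, and the helper `leaf_fields` are all hypothetical, because the real file is not shown. The column renaming from the full script is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file and its namespace are not shown in the question.
ns = "{urn:example}"
xml_text = """
<journals xmlns="urn:example">
  <journal>
    <jrnID>1</jrnID>
    <transaction>
      <nr>100</nr>
      <trLine><amnt>5.00</amnt></trLine>
    </transaction>
  </journal>
  <journal>
    <jrnID>2</jrnID>
    <bankAccNr>12</bankAccNr>
    <transaction>
      <nr>200</nr>
      <trLine><amnt>7.50</amnt></trLine>
      <trLine><amnt>2.50</amnt></trLine>
    </transaction>
  </journal>
  <journal>
    <jrnID>3</jrnID>
    <transaction>
      <nr>300</nr>
      <trLine><amnt>1.00</amnt></trLine>
    </transaction>
  </journal>
</journals>
"""
journals = ET.fromstring(xml_text)


def leaf_fields(element):
    """Return {tag: text} for every descendant of element that has no children."""
    return {
        e.tag.replace(ns, ""): e.text
        for e in element.iter()
        if e is not element and len(e) == 0
    }


rows = []
for journal in journals:
    # Journal-level fields are rebuilt for every journal, so nothing carries over.
    journal_fields = {c.tag.replace(ns, ""): c.text for c in journal if len(c) == 0}
    for transaction in (c for c in journal if len(c) > 0):
        # Transaction-level fields are rebuilt for every transaction.
        transaction_fields = {
            c.tag.replace(ns, ""): c.text for c in transaction if len(c) == 0
        }
        for line in (c for c in transaction if len(c) > 0):
            # Merge the three layers into a fresh dict per output row.
            rows.append({**journal_fields, **transaction_fields, **leaf_fields(line)})

for row in rows:
    print(row)
```

Because each layer's dictionary is created inside its own loop, a journal without a bankAccNr simply has no such key in its rows. Passing `rows` to `pd.DataFrame(rows)` then leaves NaN in the cells for those missing keys.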
