To read an HTML tree, I have a script that walks the element tree. The problem is that when the script builds the list of records, it fills in a missing value with the last known value instead of leaving it empty.
If I call record_dict.clear() after the first layer (the 'records in journal' loop), the script gives 'NaN' for a missing value, which is what I want.
When I try the same thing in the full script, it doesn't work. That makes sense, because the full script combines values from different layers of the data, so clearing the dictionary also throws away the values from the upper layers.
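The carry-over can be reproduced in isolation. The snippet below is only a minimal sketch: the XML is invented sample data without a namespace, and the helper name `read_journals` is hypothetical. It shows that reusing one dictionary keeps the previous journal's bankAccNr, while clearing it after each append does not.

```python
import xml.etree.ElementTree as ET

# Invented sample data: only the second journal lacks bankAccNr.
xml_text = """
<journals>
  <journal><jrnID>1</jrnID><bankAccNr>12</bankAccNr></journal>
  <journal><jrnID>2</jrnID></journal>
</journals>
"""
journals = ET.fromstring(xml_text)


def read_journals(journals, clear_after_append):
    """Hypothetical helper: one row per journal, built from a shared dict."""
    record_dict = {}
    rows = []
    for journal in journals:
        for records in journal:
            if len(records) == 0:  # leaf element: it has no children
                record_dict[records.tag] = records.text
        rows.append(record_dict.copy())
        if clear_after_append:
            record_dict.clear()  # forget this journal's values
    return rows


stale = read_journals(journals, clear_after_append=False)
clean = read_journals(journals, clear_after_append=True)
print(stale[1])  # journal 2 still carries bankAccNr from journal 1
print(clean[1])  # journal 2 has no bankAccNr key at all
```

A missing key in a row is what ends up as NaN once the list of dictionaries is turned into a DataFrame.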
To make everything clear, I have the following scripts and examples:
The full script is:

    import pandas as pd

    transactions_df = pd.DataFrame()
    total_recordstotal = list()
    record_dict = dict()
    for journal in journals:
        #record_dict = pd.DataFrame([dict()])
        for records in journal:
            if len(records) == 0:
                columnnames = records.tag.replace(ns,'')
                columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
                #columnvalues = [x for x in records.text if x is not None]
                columnvalues = records.text
                record_dict[columnnames] = columnvalues
            else:
                for record in records:
                    if len(record) == 0:
                        columnnames = record.tag.replace(ns,'')
                        columnnames = record.tag.replace('nr','boeknr')
                        columnnames = columnnames.replace(ns,'')
                        columnvalues = record.text
                        record_dict[columnnames] = columnvalues
                    else:
                        for subfields in record:
                            if len(subfields) == 0:
                                columnnames = subfields.tag.replace(ns,'')
                                columnvalues = subfields.text
                                record_dict[columnnames] = columnvalues
                            else:
                                for subfields_1 in subfields:
                                    if len(subfields_1) == 0:
                                        columnnames = subfields_1.tag.replace(ns,'')
                                        columnvalues = subfields_1.text
                                        record_dict[columnnames] = columnvalues
                                    else:
                                        print('another sublayer!')
                        total_recordstotal.append(record_dict.copy())

If I run the script on only the first layer of data, I get the following. Every jrnID should have only its own bankAccNr, but missing values are filled with the value of a previous journal:

    for journal in journals:
        for records in journal:
            if len(records) == 0:
                columnnames = records.tag.replace(ns,'')
                columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
                columnvalues = records.text
                record_dict[columnnames] = columnvalues
        total_records1.append((record_dict.copy()))

| jrnID | bankAccNr |
|---|---|
| 1 | NaN |
| 2 | 12 |
| 3 | 12 |
| 4 | 12 |
| 5 | 22 |
| 6 | 22 |
| 7 | 33 |
| 8 | 33 |
| 9 | 33 |
| 10 | 33 |
The output looks right when I change the code to:

    for journal in journals:
        for records in journal:
            if len(records) == 0:
                columnnames = records.tag.replace(ns,'')
                columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
                columnvalues = records.text
                record_dict[columnnames] = columnvalues
        total_records1.append((record_dict.copy()))
        record_dict.clear()

| jrnID | bankAccNr |
|---|---|
| 1 | NaN |
| 2 | 12 |
| 3 | NaN |
| 4 | NaN |
| 5 | 22 |
| 6 | NaN |
| 7 | 33 |
| 8 | NaN |
| 9 | NaN |
| 10 | NaN |
However, this is only one layer of the tree, and I have to read several layers and combine them into one row. My question is therefore: how can I apply this fix to the full script shown above?
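One direction that might work, given here only as a minimal sketch under assumptions and not as a tested solution for the real file: instead of one shared dictionary that is cleared, build a fresh dictionary per layer that starts from a copy of the parent's values. Values then flow down from a journal to its children, but never sideways to the next sibling. The element names `transaction`, `trLine`, `accID` and `docRef`, the sample data, and the helpers `strip_ns` and `flatten` are all hypothetical; the renaming of columns such as 'journal desc' and 'boeknr' is left out, so equal tag names in different layers would overwrite each other here.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the real file has a namespace and more fields.
xml_text = """
<journals>
  <journal>
    <jrnID>1</jrnID>
    <bankAccNr>12</bankAccNr>
    <transaction>
      <nr>100</nr>
      <trLine><accID>8000</accID><docRef>A1</docRef></trLine>
      <trLine><accID>1300</accID></trLine>
    </transaction>
  </journal>
  <journal>
    <jrnID>2</jrnID>
    <transaction>
      <nr>200</nr>
      <trLine><accID>4000</accID></trLine>
    </transaction>
  </journal>
</journals>
"""


def strip_ns(tag):
    """Drop a leading '{namespace}' part from a tag, if there is one."""
    return tag.rsplit('}', 1)[-1]


def flatten(element, inherited, rows):
    """Append one row for every element that has only leaf children."""
    current = dict(inherited)  # fresh dict: parent values only
    nested = []
    for child in element:
        if len(child) == 0:  # leaf: store its text as a column
            current[strip_ns(child.tag)] = child.text
        else:
            nested.append(child)
    if not nested:
        rows.append(current)  # deepest layer reached: emit the row
    for child in nested:
        flatten(child, current, rows)  # go one layer deeper


rows = []
for journal in ET.fromstring(xml_text):
    flatten(journal, {}, rows)

for row in rows:
    print(row)
```

Because every layer works on its own copy, no clear() call is needed, and the recursion handles any number of sublayers. Passing `rows` to pd.DataFrame should give NaN wherever a key is missing from a row.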