I am writing a Python script for data preprocessing. The data is read and stored within the script as a multidimensional array consisting of data points similar to the ones below.
[['United', '-27.654379', '152.917741', 'e10', '1459', '2019-03-18'],
['United', '-27.654379', '152.917741', 'e10', '1449', '2019-03-19']]
I need to remove rows from the array that share a date with an earlier row, so that
[['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16']]
would become
[['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16']]
My current method (shown below) appears to identify and remove entries with duplicate dates, but some duplicates still remain in the output.
    for line in Data_text:
        for row in Data_text:
            if line[5] == row[5]:
                Data_text.remove(row)
Any insight into the faults in my algorithm and/or a better way of doing it would be greatly appreciated.
Using pure Python, you can leverage the power of `set` to work in this case:
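A minimal sketch of that idea, assuming the date is always at index 5 and that the first row seen for each date is the one to keep. The helper name `drop_duplicate_dates` and its `date_index` parameter are made up for illustration, and the sample rows are taken from the question.

```python
def drop_duplicate_dates(rows, date_index=5):
    """Return a new list keeping only the first row seen for each date."""
    seen_dates = set()   # dates already encountered
    unique_rows = []     # rows kept, in their original order
    for row in rows:
        date = row[date_index]
        if date not in seen_dates:
            seen_dates.add(date)
            unique_rows.append(row)
    return unique_rows


Data_text = [
    ['Costco', '-27.213607', '152.996416', 'e10', '1237', '2019-03-16'],
    ['United', '-25.607894', '150.367213', 'e10', '1297', '2019-03-16'],
    ['United', '-27.654379', '152.917741', 'e10', '1459', '2019-03-18'],
]

Data_text = drop_duplicate_dates(Data_text)
print(Data_text)
```

This builds a new list instead of calling `remove` on `Data_text` while looping over it. Removing items from a list during iteration makes the loop skip elements, which is the likely reason some duplicates survive in the original approach. The nested loop also compares each row with itself, so it can delete the first occurrence of a date as well as the repeats.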