Data traceability: identifying which data was used in a calculation


We need to perform certain calculations on a set of transactions using custom logic (to be written in Java or Python).

The calculations will be performed on transactions for a specific period (e.g. 1st Jan to 31st 2017) and as at the time of calculation (e.g. 31-Jan-2018). Users can add (or cancel) back-dated transactions at any time. There will be hundreds of thousands of transactions, and a calculation run can be performed multiple times for the same period. The business therefore needs to know which transactions were used for which calculation run.
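To illustrate what "for a period, as at a point in time" means here, a minimal Python sketch follows. The field names (value_date, recorded_at, cancelled_at), the Transaction class and the select_as_at function are all hypothetical and only assume that each transaction records when it was entered and, if applicable, when it was cancelled; our real data model may differ.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Transaction:
    # Hypothetical shape of a transaction record.
    transaction_id: int
    value_date: date                 # business date, may be back-dated
    recorded_at: datetime            # when the user actually entered it
    cancelled_at: Optional[datetime] = None  # when it was cancelled, if ever


def select_as_at(
    transactions: Iterable[Transaction],
    period_start: date,
    period_end: date,
    as_at: datetime,
) -> List[Transaction]:
    """Return the transactions a run would see for the period, as known at as_at."""
    selected = []
    for tx in transactions:
        in_period = period_start <= tx.value_date <= period_end
        known = tx.recorded_at <= as_at
        # A cancellation only counts if it had happened by the as-at time.
        live = tx.cancelled_at is None or tx.cancelled_at > as_at
        if in_period and known and live:
            selected.append(tx)
    return selected
```

Two runs over the same period with different as-at times can therefore return different sets of transactions, which is why each run needs its own record of what it used.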

Does anyone know of any tools that can assist with this kind of data traceability, i.e. identifying the data that was used for a specific calculation?

I suspect this is difficult for any generic tool, since only our custom code knows which data it has used.

We are thinking of storing the transactions (just their identifiers) used by each calculation run in a database, which the business can then query with data visualisation tools. Given the volume of transactions, inserting that many records will take time (possibly hours), but that is acceptable.
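A minimal sketch of that idea in Python, using the standard sqlite3 module purely for illustration. The table names (calc_run, calc_run_transaction), column names and helper functions are hypothetical, and a production database would likely differ; the point is only the shape: one row per run, plus one link row per transaction identifier the run used.

```python
import sqlite3
from datetime import date, datetime
from typing import Iterable, List


def create_schema(conn: sqlite3.Connection) -> None:
    # One row per calculation run, and one link row per (run, transaction).
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS calc_run (
            run_id       INTEGER PRIMARY KEY AUTOINCREMENT,
            period_start TEXT NOT NULL,
            period_end   TEXT NOT NULL,
            as_at        TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS calc_run_transaction (
            run_id         INTEGER NOT NULL REFERENCES calc_run(run_id),
            transaction_id INTEGER NOT NULL,
            PRIMARY KEY (run_id, transaction_id)
        );
        """
    )


def record_run(
    conn: sqlite3.Connection,
    period_start: date,
    period_end: date,
    as_at: datetime,
    transaction_ids: Iterable[int],
) -> int:
    """Store a run and the identifiers it used in a single database transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO calc_run (period_start, period_end, as_at) VALUES (?, ?, ?)",
            (period_start.isoformat(), period_end.isoformat(), as_at.isoformat()),
        )
        run_id = cur.lastrowid
        # executemany batches the link rows instead of issuing one call per row.
        conn.executemany(
            "INSERT INTO calc_run_transaction (run_id, transaction_id) VALUES (?, ?)",
            ((run_id, tx_id) for tx_id in transaction_ids),
        )
    return run_id


def transactions_for_run(conn: sqlite3.Connection, run_id: int) -> List[int]:
    """Answer the business question: which transactions did this run use?"""
    rows = conn.execute(
        "SELECT transaction_id FROM calc_run_transaction "
        "WHERE run_id = ? ORDER BY transaction_id",
        (run_id,),
    )
    return [row[0] for row in rows]
```

Writing all link rows for a run inside one database transaction, rather than committing row by row, is usually what keeps this kind of bulk insert tolerable, though I have not measured it at our volumes.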

I would appreciate it if anyone who has faced a similar problem could share their experience and how they resolved it. I am not sure whether there is a standard pattern for this, as it is probably not a common problem.

Thanks

