Best way to compare two large files on multiple columns

Asked by Smogger on 05 May 2022 at 19:04 (251 views)

I am working on a feature that lets users upload two CSV files, write rules to compare the rows, and output the result to a file.

Both files can have any number of columns, and the column names are not fixed.

Currently, I read the files into two separate arrays and compare the rows based on the condition given in the rule.

This works for small files, but for large ones the comparison takes a lot of time and memory.

Is there a better way, for example using a database to store and query this schema-less data?
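
To make the database idea concrete, here is a minimal sketch of one possible direction, not a tested solution: since the column names are only known at upload time, the table can be created from the CSV header itself. It assumes SQLite via Python's standard `sqlite3` module, and the helper names `quote_ident` and `load_csv` are hypothetical. It also assumes the header has no duplicate names and every row has as many fields as the header.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an identifier so arbitrary header text can be used as a column name."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, table, fileobj):
    """Create `table` from the CSV header row and stream the data rows into it.

    Hypothetical helper: columns are declared without a type, so every value
    is stored as the text read from the file.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ", ".join(quote_ident(name) for name in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    # executemany consumes the reader lazily, so the file is never held in a list.
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader
    )
    conn.commit()
    return header


# Demo with the File1 sample; pass a file path to connect() for on-disk storage.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("type,id,date,amount\nA,1,12/10/2005,500\nB,2,12/10/2005,500\n")
columns = load_csv(conn, "file1", sample)
row_count = conn.execute('SELECT COUNT(*) FROM "file1"').fetchone()[0]
```

With both files loaded this way, the comparison could be pushed into SQL queries instead of looping over in-memory arrays. Because values are stored as text here, numeric comparisons would need an explicit `CAST` in the query.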

Example Data:

File1
type id  date       amount
A    1   12/10/2005 500
B    2   12/10/2005 500

File2
type id  date       amount
A    1   12/10/2005 500
B    2   12/10/2005 500
A    1   12/10/2005 500

Rule1  File1.type == File2.type && File1.amount == File2.amount

Rule2  File1.id == GroupBy(File2.id) && File1.amount == File2.TotalAmount

The match condition is: Rule1 or Rule2
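
For illustration, this is roughly how the two rules could look as a single SQL query. It is only a sketch under assumptions: it uses SQLite through Python's `sqlite3` module, loads the sample rows inline, and reads Rule2 as "group File2 by id, sum the amounts, and compare that total to File1.amount". The table names, the index name, and the `rule1`/`rule2` output columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file1 (type TEXT, id TEXT, date TEXT, amount REAL);
    CREATE TABLE file2 (type TEXT, id TEXT, date TEXT, amount REAL);
    INSERT INTO file1 VALUES
        ('A', '1', '12/10/2005', 500),
        ('B', '2', '12/10/2005', 500);
    INSERT INTO file2 VALUES
        ('A', '1', '12/10/2005', 500),
        ('B', '2', '12/10/2005', 500),
        ('A', '1', '12/10/2005', 500);
    -- An index on the compared columns avoids a full scan of file2 per file1 row.
    CREATE INDEX idx_file2_type_amount ON file2 (type, amount);
""")

MATCH_SQL = """
WITH totals AS (
    -- Assumed meaning of GroupBy(File2.id) and File2.TotalAmount
    SELECT id, SUM(amount) AS total_amount
    FROM file2
    GROUP BY id
)
SELECT f1.type,
       f1.id,
       EXISTS (SELECT 1 FROM file2 f2
               WHERE f2.type = f1.type AND f2.amount = f1.amount) AS rule1,
       EXISTS (SELECT 1 FROM totals t
               WHERE t.id = f1.id AND t.total_amount = f1.amount) AS rule2
FROM file1 f1
ORDER BY f1.id
"""

# A File1 row counts as matched when Rule1 or Rule2 holds.
results = [
    {"type": t, "id": i, "rule1": bool(r1), "rule2": bool(r2), "matched": bool(r1 or r2)}
    for t, i, r1, r2 in conn.execute(MATCH_SQL)
]

for row in results:
    print(row)
```

On the sample data, both File1 rows satisfy Rule1, and only id 2 satisfies Rule2 under this reading, because File2 has two rows for id 1 that total 1000. Amounts are stored as REAL to keep the sketch short; exact equality on floating-point money values is fragile, so integer minor units may be safer in practice.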


There are 0 answers