Thanks for reading this.
Here's my goal: I have two Datatables I've read in from Excel Worksheets. The Datatables have the same schema (Columns A, B, C, ... in Datatable1 has same kind of data as Column A, B, C, ... in Datatable2).
I need to compare data in the tables by arbitrary columns (i.e. for comparison only Column A and C matter, but I need to keep data in Column A, B, C, ..., N).
Since I'm reading these in from Excel Worksheets, the schema can't be expected. For example, if I load a different set of Worksheets, the comparison columns may differ. For this reason, I can't use LINQ, which is like a hard coded SQL statement.
I need to perform the equivalent of a FULL OUTER JOIN. I'm trying to show all data, including missing data from either datatable which doesn't appear in the other datatable.
I've read a little bit about DataRelations, but I'm not sure how to use them.
Please provide example code.
Thanks in advance!
Given a pair of
DataTable
s with an arbitrary number of columns, and given a function that can create a grouping value of a reasonable type from each of the twoDataTable
s, you can use Linq to do most of the work for you.Let's start with a function to extract the join key from the
DataTable
s. It'd be nice to just return anobject[]
, but they don't compare well. We can do it with aTuple<object, object>
however - those work nicely for this purpose. And if you need more columns you can just add more columns :PNow the join. We can't do a full outer join directly, but we can do an outer join both ways and
Union
the results:Next up you have to create a suitable output format - a
DataTable
that has all the rows from both sources, plus a field to hold some info about the key:And finally, fill the table from the query:
Of course we could short out a couple of those operations to get a single Linq query that does 99% of the work for us. I'll leave that to you to play with if it sounds like fun.
Here's the full method, done as an extension with a generic function for the join key generator, making it reasonably generic:
Now, IF the tables have the same schema (which the above doesn't care about), you can do almost exactly the same thing - modify the result table generation to just clone one of the tables, then add some merge logic in the load loop.
Here's a Gist of the above with testing and verification that it's doing what it says. Drop that in your compiler and see what you get out.