C# Full Outer Join Two DataTables Without LINQ

3.7k views Asked by At

Thanks for reading this.

Here's my goal: I have two Datatables I've read in from Excel Worksheets. The Datatables have the same schema (Columns A, B, C, ... in Datatable1 has same kind of data as Column A, B, C, ... in Datatable2).

I need to compare data in the tables by arbitrary columns (i.e. for comparison only Column A and C matter, but I need to keep data in Column A, B, C, ..., N).

Since I'm reading these in from Excel Worksheets, the schema can't be expected. For example, if I load a different set of Worksheets, the comparison columns may differ. For this reason, I can't use LINQ, which is like a hard coded SQL statement.

I need to perform the equivalent of a FULL OUTER JOIN. I'm trying to show all data, including missing data from either datatable which doesn't appear in the other datatable.

I've read a little bit about DataRelations, but I'm not sure how to use them.

Please provide example code.

Thanks in advance!

1

There are 1 answers

5
Corey On

Given a pair of DataTables with an arbitrary number of columns, and given a function that can create a grouping value of a reasonable type from each of the two DataTables, you can use Linq to do most of the work for you.

Let's start with a function to extract the join key from the DataTables. It'd be nice to just return an object[], but they don't compare well. We can do it with a Tuple<object, object> however - those work nicely for this purpose. And if you need more columns you can just add more columns :P

// Produces a JoinKey as Tuple containing columns 'A' and 'C' (0 and 2)
public Tuple<object, object> GetJoinKey(DataRow row)
{
    return Tuple.Create(row[0], row[2]);
}

Now the join. We can't do a full outer join directly, but we can do an outer join both ways and Union the results:

// given DataTables table1 & table2:
var outerjoin = 
    (
        from row1 in table1.AsEnumerable()
        join row2 in table2.AsEnumerable() 
            on GetJoinKey(row1) equals GetJoinKey(row2)
            into matches
        from row2 in matches.DefaultIfEmpty()
        select new { key = GetJoinKey(row1), row1, row2 }
    ).Union(
        from row2 in table2.AsEnumerable()
        join row1 in table1.AsEnumerable()
            on GetJoinKey(row2) equals GetJoinKey(row1)
            into matches
        from row1 in matches.DefaultIfEmpty()
        select new { key = GetJoinKey(row2), row1, row2 }
    );

Next up you have to create a suitable output format - a DataTable that has all the rows from both sources, plus a field to hold some info about the key:

DataTable result = new DataTable();
// add column for string value of key:
result.Columns.Add("__key", typeof(string));
// add columns from table1:
foreach (var col in table1.Columns.OfType<DataColumn>())
    result.Columns.Add("T1_" + col.ColumnName, col.DataType);
// add columns from table2:
foreach (var col in table2.Columns.OfType<DataColumn>())
    result.Columns.Add("T2_" + col.ColumnName, col.DataType);

And finally, fill the table from the query:

var row1def = new object[table1.Columns.Count];
var row2def = new object[table2.Columns.Count];
foreach (var src in outerjoin)
{
    // extract values from each row where present
    var data1 = (src.row1 == null ? row1def : src.row1.ItemArray);
    var data2 = (src.row2 == null ? row2def : src.row2.ItemArray);

    // create row with key string and row values
    result.Rows.Add(new object[] { src.key.ToString() }.Concat(data1).Concat(data2).ToArray());
}

Of course we could short out a couple of those operations to get a single Linq query that does 99% of the work for us. I'll leave that to you to play with if it sounds like fun.

Here's the full method, done as an extension with a generic function for the join key generator, making it reasonably generic:

public static DataTable FullOuterJoin<T>(this DataTable table1, DataTable table2, Func<DataRow, T> keygen)
{
    // perform inital outer join operation
    var outerjoin = 
        (
            from row1 in table1.AsEnumerable()
            join row2 in table2.AsEnumerable() 
                on keygen(row1) equals keygen(row2)
                into matches
            from row2 in matches.DefaultIfEmpty()
            select new { key = keygen(row1), row1, row2 }
        ).Union(
            from row2 in table2.AsEnumerable()
            join row1 in table1.AsEnumerable()
                on keygen(row2) equals keygen(row1)
                into matches
            from row1 in matches.DefaultIfEmpty()
            select new { key = keygen(row2), row1, row2 }
        );

    // Create result table
    DataTable result = new DataTable();
    result.Columns.Add("__key", typeof(string));
    foreach (var col in table1.Columns.OfType<DataColumn>())
        result.Columns.Add("T1_" + col.ColumnName, col.DataType);
    foreach (var col in table2.Columns.OfType<DataColumn>())
        result.Columns.Add("T2_" + col.ColumnName, col.DataType);

    // fill table from join query
    var row1def = new object[table1.Columns.Count];
    var row2def = new object[table2.Columns.Count];
    foreach (var src in outerjoin)
    {
        // extract values from each row where present
        var data1 = (src.row1 == null ? row1def : src.row1.ItemArray);
        var data2 = (src.row2 == null ? row2def : src.row2.ItemArray);

        // create row with key string and row values
        result.Rows.Add(new object[] { src.key.ToString() }.Concat(data1).Concat(data2).ToArray());
    }

    return result;
}

Now, IF the tables have the same schema (which the above doesn't care about), you can do almost exactly the same thing - modify the result table generation to just clone one of the tables, then add some merge logic in the load loop.

Here's a Gist of the above with testing and verification that it's doing what it says. Drop that in your compiler and see what you get out.