How to remove specific rows from data in SSIS

134 views Asked by At

I am working on a SSIS package.

I have imported data from multiple sources.

Then I used UnionAll to combine the data.

I have an XML file, this file contains the rows that I have to remove. There is not any ID column. the XML file looks like:

<Suppliers>
    <Supplier SupplierName="ACCO UK LTD" Entity="Department for Business, Innovation and Skills" ExpenseArea="Finance, Commercial & Digital Transformation - Estates"/>
    <Supplier SupplierName="ADAM PHONES LTD" Entity="Department for Business, Innovation and Skills" ExpenseArea="Finance, Commercial & Digital Transformation - Digital Directorate"/>
    <Supplier SupplierName="Alexander Bryson" Entity="Department for Business, Innovation and Skills" ExpenseArea="Government Office for Science - Government Office for Science"/>
    <Supplier SupplierName="Auto Mobile Valet Limited" Entity="Department for Business, Innovation and Skills" ExpenseArea="Skills, Deregulation and Local Growth - Better Regulation Delivery Office"/>
</Suppliers>

I have 80 rows.

I have imported this XML file in XML component. But I don't know what to do next. I know about conditional split but how can I use it in my case? Or any alternatives?

Thank you.

1

There are 1 answers

1
billinkc On BEST ANSWER

You have 4 options here: Conditional Split, custom Script Component, Merge or a Lookup.

Conditional Split is going to be challenging on a number of fronts but the dynamic (I assume) nature of the filtering XML document is gonna make crafting the correct expression hard.

Custom Script component, as long as you can code in a VB/C#, you can do whatever you want. You'll need to either make it an asynchronous component and not send the matching rows onto the pipeline. Were it me, I'd add another column onto the pipeline from the Script Component as a boolean flag include/exclude. And then use a Conditional Split to shunt the unwanted rows to the bit bucket.

A Merge Sort probably works here but it's going to require sorted inputs and that's generally a performance killer so we're going to skip that and go to my preferred method.

Lookup component allows you to have 3 different routes of data out of it.

Match output. The match output handles the rows in the transformation input that match at least one entry in the reference dataset.

No Match output. The no match output handles rows in the input that do not match at least one entry in the reference dataset. If you configure the Lookup transformation to treat the rows without matching entries as errors, the rows are redirected to the error output. Otherwise, the transformation would redirect those rows to the no match output.

Error output. If there is no matching entry in the reference dataset, no join occurs. By default, the Lookup transformation treats rows without matching entries as errors. However, you can configure the Lookup transformation to redirect such rows to a no match output.

What I propose is that you're going to match your union all'ed data to the columns from the XML file. Anything that matches, the default, is going to go to the bit bucket aka nowhere. We'll set the No Match condition to Redirect rows to no match output

enter image description here

Flow the rows from the No Match output path into your destination. Easy peasy

Setup

To make this work, the Lookup needs the data to come from an OLE DB Connection manager or a Cached Connection Manager---and that's the one you'll want to use.

Add a precursor Data Flow that loads the XML into a Cache Connection Manager. You'll likely have an XML Source and shove one/all of those fields into the Cache Transform (under Other Transforms) specifying key columns.

References