I am working on a SSIS package.
I have imported data from multiple sources.
Then I used UnionAll
to combine the data.
I have an XML file, this file contains the rows that I have to remove. There is not any ID column. the XML file looks like:
<Suppliers>
<Supplier SupplierName="ACCO UK LTD" Entity="Department for Business, Innovation and Skills" ExpenseArea="Finance, Commercial & Digital Transformation - Estates"/>
<Supplier SupplierName="ADAM PHONES LTD" Entity="Department for Business, Innovation and Skills" ExpenseArea="Finance, Commercial & Digital Transformation - Digital Directorate"/>
<Supplier SupplierName="Alexander Bryson" Entity="Department for Business, Innovation and Skills" ExpenseArea="Government Office for Science - Government Office for Science"/>
<Supplier SupplierName="Auto Mobile Valet Limited" Entity="Department for Business, Innovation and Skills" ExpenseArea="Skills, Deregulation and Local Growth - Better Regulation Delivery Office"/>
</Suppliers>
I have 80 rows.
I have imported this XML file in XML component. But I don't know what to do next. I know about conditional split but how can I use it in my case? Or any alternatives?
Thank you.
You have 4 options here: Conditional Split, custom Script Component, Merge or a Lookup.
Conditional Split is going to be challenging on a number of fronts but the dynamic (I assume) nature of the filtering XML document is gonna make crafting the correct expression hard.
Custom Script component, as long as you can code in a VB/C#, you can do whatever you want. You'll need to either make it an asynchronous component and not send the matching rows onto the pipeline. Were it me, I'd add another column onto the pipeline from the Script Component as a boolean flag include/exclude. And then use a Conditional Split to shunt the unwanted rows to the bit bucket.
A Merge Sort probably works here but it's going to require sorted inputs and that's generally a performance killer so we're going to skip that and go to my preferred method.
Lookup component allows you to have 3 different routes of data out of it.
What I propose is that you're going to match your union all'ed data to the columns from the XML file. Anything that matches, the default, is going to go to the bit bucket aka nowhere. We'll set the No Match condition to
Redirect rows to no match output
Flow the rows from the No Match output path into your destination. Easy peasy
Setup
To make this work, the Lookup needs the data to come from an OLE DB Connection manager or a Cached Connection Manager---and that's the one you'll want to use.
Add a precursor Data Flow that loads the XML into a Cache Connection Manager. You'll likely have an XML Source and shove one/all of those fields into the Cache Transform (under Other Transforms) specifying key columns.
References