I'm following this excellent post by Rainer Stropek on (1) creating a custom SSIS data flow component and (2) pulling data from Azure table storage: http://www.software-architects.com/devblog/2010/11/12/Custom-SSIS-Data-Source-For-Loading-Azure-Tables-Into-SQL-Server
I'll paste the code that is relevant to anyone attempting to answer my question:
// Table service context created from the storage account (referenced below as this.context)
GenericTableContext context = new GenericTableContext(cloudStorageAccount.TableEndpoint.AbsoluteUri, cloudStorageAccount.Credentials);

public override void PrimeOutput(int outputs, int[] outputIDs, PipelineBuffer[] buffers) {
    IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
    PipelineBuffer buffer = buffers[0];
    foreach (var item in this.context.CreateQuery<GenericEntity>("SomeTable")) {
        buffer.AddRow();
        for (int x = 0; x < columnInformation.Count; x++) {
            var ci = (ColumnInfo)columnInformation[x];
            var value = item[ci.ColumnName].Value;
            if (value != null) {
                buffer[ci.BufferColumnIndex] = value;
            } else {
                buffer.SetNull(ci.BufferColumnIndex);
            }
        }
    }
    // Signal that this output has produced all of its rows
    buffer.SetEndOfRowset();
}
Now the Question:
My data set is very large; let's say a million records. I have an SSIS package in its simplest form, i.e. my custom data source writing to a flat file destination, with no transformations.
As I run the package, I can see records being written to the CSV file in batches of roughly 10,000 records (DefaultBufferMaxRows).
My problem is that, although these records are written out, they still seem to remain in memory, which ultimately causes my package to slow down and grind to a halt once all the memory on the server is used up.
Now, I'm no SSIS expert, but I'm assuming the records I write into the PipelineBuffer are not being released after they are consumed. The word "buffer" implies to me that it can be "flushed", but I could not find any API to do this.
So I guess my question is: how can I release the memory allocated to records that have already been written to the flat file, so there is enough memory left for the remaining records?
Thank you, Nandun
Each buffer will flush, or more precisely be reused, once it has completed its journey through the pipeline, so SSIS manages that memory for you in this situation. You do have to make sure the buffer row count and memory size are set sensibly, though (you can turn on the BufferSizeTuning logging event to see whether this is what's happening to your package). Typically people adjust the memory upwards and fail to realise that the setting applies per component in the data flow, and the package then spends a significant amount of time trying to over-allocate huge amounts of RAM.
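Just to illustrate what I mean by sizing the buffers (an untested sketch; the package path, class name and values here are placeholders, and you would normally just set these properties on the Data Flow Task in the designer): the DefaultBufferMaxRows / DefaultBufferSize knobs can also be reached through the SSIS object model, and SSIS recycles buffers of that size as they drain rather than growing memory for the whole table.

using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class BufferTuning {
    static void Main() {
        Application app = new Application();
        // "MemoryHog.dtsx" is a placeholder path for your package
        Package package = app.LoadPackage(@"C:\Packages\MemoryHog.dtsx", null);

        foreach (Executable exec in package.Executables) {
            TaskHost host = exec as TaskHost;
            if (host == null) continue;
            MainPipe pipe = host.InnerObject as MainPipe;
            if (pipe == null) continue;              // not a data flow task

            // Rows per buffer and the per-buffer memory cap; SSIS reuses
            // these buffers as they drain, so keep them modest.
            pipe.DefaultBufferMaxRows = 10000;
            pipe.DefaultBufferSize = 10 * 1024 * 1024;   // 10 MB
        }

        app.SaveToXml(@"C:\Packages\MemoryHog.dtsx", package, null);
    }
}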
It is also possible, if your component is a destination, that you are not releasing the connection: you need to use a connection manager, or implement the resource-management code yourself to clean up in the script component.
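For the custom-component case, the usual pattern (a rough sketch, not your actual code; the class and field names are made up) is to take the connection in AcquireConnections and hand it back in ReleaseConnections, so the runtime can clean it up instead of the component holding onto it:

using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

public class AzureTableSource : PipelineComponent {
    private object rawConnection;

    public override void AcquireConnections(object transaction) {
        // Ask the component's connection manager for the connection
        IDTSConnectionManager100 cm =
            ComponentMetaData.RuntimeConnectionCollection[0].ConnectionManager;
        rawConnection = cm.AcquireConnection(transaction);
    }

    public override void ReleaseConnections() {
        // Hand the connection back so the runtime can dispose of it;
        // never releasing it here is one way a component leaks resources.
        if (rawConnection != null) {
            ComponentMetaData.RuntimeConnectionCollection[0].ConnectionManager
                .ReleaseConnection(rawConnection);
            rawConnection = null;
        }
    }
}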