C#: Merging multiple memory streams and returning them via FileStreamResult shows only the last stream


I am trying to merge multiple PDF files into one memory stream and return it to a web application with FileStreamResult.

My code looks like this:

string contentType = "application/pdf";
                //---save as fileStream Excel
                if (filePath == null)
                    filePath = new DirectoryInfo(Server.MapPath("~") + ConfigurationManager.AppSettings["SomeTemplate"].ToString()).ToString();

                var sumPdfStream = new MemoryStream();
                
                string fileDownloadName = "SomeFile.pdf";

                //Generating Excel file for every offer/file, getting each offer through ID from string offerIDs, comma separated values
                for (int i = 0; i < offerIDs.Split(',').Length - 1; i++)
                {
                    var pdfStream = new MemoryStream();
                    var fileStream = new MemoryStream();
                    var offerID = Convert.ToInt32(offerIDs.Split(',')[i]);
                    fileStream = GenerateExcel(offerID, ref offerNumber, filePath);
                    //string fileDownloadName = "Ponudba_" + offerNumber + "_" + DateTime.Now.ToShortDateString().Replace(".", "-").Replace("/", "-") + ".pdf";

                    Spire.Xls.Workbook workbook = new Spire.Xls.Workbook();

                    workbook.LoadFromStream(fileStream, ExcelVersion.Version2013);
                    workbook.SaveToStream(pdfStream, Spire.Xls.FileFormat.PDF);

                    pdfStream.CopyTo(sumPdfStream);
                    pdfStream.Close();
                    pdfStream.Dispose();
                    fileStream.Close();
                    fileStream.Dispose();
                    //sumPdfStream.Write(pdfStream.GetBuffer(), (int)sumPdfStream.Position, (int)pdfStream.Length);
                }


                //sumPdfStream.Seek(0, 0);
                sumPdfStream.Position = 0;
                
                var fsr = new FileStreamResult(sumPdfStream, contentType);
                //sumPdfStream.Close();
                fsr.FileDownloadName = fileDownloadName;
                //fsr.FileStream.Seek(0, 0);
                return fsr;

Basically, I call some code that generates an Excel file for each offer (with data from the database), then use a licensed copy of Spire to convert the Excel file to PDF and write it to a MemoryStream. Since there can be any number of offers/PDF files, I am trying to merge them into one stream with the CopyTo method.

When I debug my code with a breakpoint on the line pdfStream.CopyTo(sumPdfStream), I can see that my sumPdfStream object, which is meant to hold all the PDFs, is growing.

At the end I create a FileStreamResult from sumPdfStream, which holds all the PDFs. Again, when I debug the code everything seems OK (the FileStreamResult is bigger than a single PDF), but when the FileStreamResult is returned to the web application, only the last PDF shows.

What am I doing wrong? Why does the FileStreamResult show only the last PDF, although there seems to be much more inside it?

SmolkoMatic answered (accepted answer):

Following a comment from Derek Pollard, I solved this problem by merging the Excel workbooks into one and creating a single PDF from it.

Everything is done with in-memory stream objects; no physical files are created, saved or deleted. Once that is done I send the PDF to the view with a FileStreamResult object. The browser then opens a new window, or you can save the file to your downloads.

For merging the Excel streams and converting the resulting stream to PDF I used the licensed version of the Spire.XLS libraries.

Here is the code:

            string contentType = "application/pdf";
            string offerNumber = "";
            //---save as fileStream Excel
            if (filePath == null)
                filePath = new DirectoryInfo(Server.MapPath("~") + ConfigurationManager.AppSettings["PonudbaTemplate"].ToString()).ToString();

            var sumPdfStream = new MemoryStream();
            string fileDownloadName = "blah.pdf";

            Spire.Xls.Workbook finalWorkbook = new Spire.Xls.Workbook();
            finalWorkbook.Worksheets.Clear();

            for (int i = 0; i < offerIDs.Split(',').Length - 1; i++)
            {
                var offerID = Convert.ToInt32(offerIDs.Split(',')[i]);
                // GenerateExcel returns the stream, so no placeholder MemoryStream is allocated first
                var fileStream = GenerateExcel(offerID, ref offerNumber, filePath);

                Spire.Xls.Workbook workbook = new Spire.Xls.Workbook();

                workbook.LoadFromStream(fileStream, ExcelVersion.Version2013);
                for (int x = 0; x < workbook.Worksheets.Count; x++)
                {
                    //fit to one page width
                    workbook.Worksheets[x].PageSetup.FitToPagesWide = 1;
                    workbook.Worksheets[x].PageSetup.FitToPagesTall = 0;
                }

                
                workbook.ConverterSetting.SheetFitToPage = false;


                workbook.DocumentProperties.Author = NeoAms.GlobalVariables.NeoGlobal.Authentication.CurrentUserName;
                workbook.DocumentProperties.Company = "blah";
                workbook.DocumentProperties.Subject = "blahblah";
                workbook.DocumentProperties.Title = fileDownloadName;

                foreach (Spire.Xls.Worksheet sheet in workbook.Worksheets)
                {
                    //Copy each worksheet from the current workbook to the new workbook
                    finalWorkbook.Worksheets.AddCopy(sheet, WorksheetCopyType.CopyAll);
                }
            }

            finalWorkbook.SaveToStream(sumPdfStream, Spire.Xls.FileFormat.PDF);
            sumPdfStream.Position = 0;
            
            var fsr = new FileStreamResult(sumPdfStream, contentType);
            fsr.FileDownloadName = fileDownloadName;

            return fsr;
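
A side note on the loop bound: `offerIDs.Split(',').Length - 1` presumably relies on offerIDs ending with a trailing comma (an assumption, the question does not say so). If the string has no trailing comma, the last offer would be skipped. A minimal sketch of parsing the IDs once, up front, so that either form works; the class and method names here are hypothetical and not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class OfferIdParser
{
    // Splits a comma-separated ID string once and skips empty or blank
    // entries, such as the one produced by a trailing comma.
    public static List<int> Parse(string offerIDs)
    {
        var ids = new List<int>();
        string[] parts = offerIDs.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            if (string.IsNullOrWhiteSpace(part))
            {
                continue; // e.g. "12, ,15" contains a blank entry
            }
            ids.Add(int.Parse(part.Trim(), CultureInfo.InvariantCulture));
        }
        return ids;
    }

    public static void Main()
    {
        // Hypothetical input with a trailing comma; prints 12, 15 and 27.
        foreach (int id in Parse("12,15,27,"))
        {
            Console.WriteLine(id);
        }
    }
}
```

The loop in the answer could then iterate over the parsed list instead of calling Split on every pass.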

K J answered:

When you merge two PDF files by simple concatenation you get a working file, but Adobe Acrobat should display it as if it were only the first one, since the first file takes priority and is a complete file in its own right. In the combined file you can see that it is simply one page of one whole file.

Both files are valid on their own, but only one is displayed because a reader relies on the address index (the xref table) whenever it can, that is, whenever the index is not corrupted.
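
To make the point above concrete, here is a minimal sketch of what appending streams produces. It uses two hypothetical stand-in byte arrays with elided bodies rather than real Spire output, and the class and method names are made up for illustration. The combined buffer grows exactly as seen in the debugger, yet it still holds two complete documents, each with its own header, startxref and %%EOF marker:

```csharp
using System;
using System.IO;
using System.Text;

public static class ConcatDemo
{
    // Counts non-overlapping occurrences of an ASCII marker in a byte array.
    public static int CountMarker(byte[] data, string marker)
    {
        byte[] m = Encoding.ASCII.GetBytes(marker);
        int count = 0;
        for (int i = 0; i + m.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < m.Length; j++)
            {
                if (data[i + j] != m[j]) { match = false; break; }
            }
            if (match) { count++; i += m.Length - 1; }
        }
        return count;
    }

    // Appends each document to one MemoryStream, byte for byte.
    public static byte[] Concatenate(params byte[][] documents)
    {
        using (var sum = new MemoryStream())
        {
            foreach (byte[] doc in documents)
            {
                sum.Write(doc, 0, doc.Length);
            }
            return sum.ToArray();
        }
    }

    public static void Main()
    {
        // Hypothetical stand-ins for two generated PDFs (bodies elided).
        byte[] first = Encoding.ASCII.GetBytes("%PDF-1.7\n...\nstartxref\n100\n%%EOF\n");
        byte[] second = Encoding.ASCII.GetBytes("%PDF-1.7\n...\nstartxref\n200\n%%EOF\n");

        byte[] merged = Concatenate(first, second);

        Console.WriteLine(merged.Length == first.Length + second.Length); // True
        Console.WriteLine(CountMarker(merged, "%PDF-"));                  // 2
        Console.WriteLine(CountMarker(merged, "%%EOF"));                  // 2
    }
}
```

The size check passing is exactly what makes the result look correct in the debugger, even though no single index describes the pages of both documents.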


It is possible to add a new index that points to all the objects in both files, but the object numbers of the second file would also have to be made unique. That would be fairly easy for a proof of concept, but not for a good working solution.

A merge utility would simply remove the first index and renumber the objects of the second page, but that also requires the start of the file index, such as the Pages object below, to be modified (or a similar adjustment).

2 0 obj <</Type/Pages/MediaBox [ 0 0 200 200 ]/Count 1/Kids [ 3 0 R ]>>endobj

So the upshot is that the two files need to be written out file-system fashion, with file addresses, and their contents then interleaved into another file, unit.pdf (usually a new filename). The file addresses can then be recalculated for the new interleaved file. Note that this is a very inefficient use of valuable memory, since in theory all three files need to be in memory at the same time while the utility seeks through the first two files and packs their objects into the bigger joint file. A merging library may do better by working in chunks, and may even be faster working from disk in parts when the utility needs that memory for itself (or its threads).


%PDF-1.7
1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj
2 0 obj <</Type/Pages/MediaBox[0 0 200 200]/Count 2/Kids[3 0 R 6 0 R]>>endobj
3 0 obj <</Type/Page/Parent 2 0 R/Resources <</Font <</F1 4 0 R >>>>/Contents 5 0 R>>endobj
4 0 obj <</Type/Font/Subtype/Type1/BaseFont/Times-Roman>>endobj
5 0 obj <</Length 44>>
stream
BT 70 50 TD /F1 12 Tf (Hello, World 1) Tj ET
endstream
endobj

6 0 obj <</Type/Page/Parent 2 0 R/Resources <</Font <</F1 4 0 R >>>>/Contents 8 0 R>>endobj
7 0 obj <</Type/Font/Subtype/Type1/BaseFont/Times-Roman>>endobj
8 0 obj <</Length 44>>
stream
BT 70 50 TD /F1 12 Tf (Hello, World 2) Tj ET
endstream
endobj

xref
0 9
0000000000 65535 f 
0000000009 00000 n 
0000000054 00000 n 
0000000132 00000 n 
0000000224 00000 n 
0000000288 00000 n 
0000000381 00000 n 
0000000473 00000 n 
0000000537 00000 n 
trailer
<</Size 9/Root 1 0 R>>
startxref
630
%%EOF
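
The address recalculation for a file like the one above can be sketched as follows. This is only an illustration under stated assumptions, not how any particular merge library works: the objects are assumed to be already renumbered, pure ASCII, in order starting at object 1, and each terminated by a newline. XrefBuilder and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class XrefBuilder
{
    // Assembles header + objects and appends an xref table whose offsets
    // are computed from the actual position of each object.
    public static string Build(string header, IList<string> objects, int rootObject)
    {
        var sb = new StringBuilder(header);
        var offsets = new List<int>();
        foreach (string obj in objects)
        {
            offsets.Add(sb.Length); // ASCII only, so char count equals byte count
            sb.Append(obj);
        }

        int xrefStart = sb.Length;
        int size = objects.Count + 1; // plus the free entry for object 0
        sb.Append("xref\n0 " + size + "\n");
        sb.Append("0000000000 65535 f \n");
        foreach (int offset in offsets)
        {
            // Each entry is exactly 20 bytes: 10-digit offset, generation, flag, EOL.
            sb.Append(offset.ToString("D10") + " 00000 n \n");
        }
        sb.Append("trailer\n<</Size " + size + "/Root " + rootObject + " 0 R>>\n");
        sb.Append("startxref\n" + xrefStart + "\n%%EOF\n");
        return sb.ToString();
    }

    public static void Main()
    {
        // Only the first two objects of the listing, for brevity, so the
        // output is not a complete, viewable PDF.
        var objects = new List<string>
        {
            "1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj\n",
            "2 0 obj <</Type/Pages/MediaBox[0 0 200 200]/Count 2/Kids[3 0 R 6 0 R]>>endobj\n",
        };
        // The xref entries come out as offsets 9 and 54, matching the table above.
        Console.WriteLine(Build("%PDF-1.7\n", objects, 1));
    }
}
```

Passing all eight objects, including the blank separator lines as part of the preceding object strings, should reproduce the remaining offsets in the same way.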

Edit

I introduced redundant bloat in my optimisation: object 7 is the font for the second page, yet I had already set the second page to use the same font object as the first page.

7 0 obj <</Type/Font/Subtype/Type1/BaseFont/Times-Roman>>endobj

In a real file the fonts would add to the bloat, since they are often repeated yet have different font contents.

To avoid doubling the output size, the file contents should be merged at the source, so that only one minimal PDF is generated. In Excel this can be done by merging the outputs onto one spread-out printing sheet and then printing that sheet as multiple pages, so that only one set of fonts is needed. A bonus of having one source file is that you can then add features like "Bookmarks" or a "Table of Contents".