Download and Unzip XML file


I would like to unzip and parse an XML file located here.

Here is my code:

HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = new CookieContainer(),
    UseCookies = true,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
    // | DecompressionMethods.None,
};

using (var http = new HttpClient(handler))
{

    var response =
         http.GetAsync(@"https://login.tradedoubler.com/report/published/aAffiliateEventBreakdownReportWithPLC_806880712_4446152766894956100.xml.zip").Result;

    Stream streamContent = response.Content.ReadAsStreamAsync().Result;

    using (var gZipStream = new GZipStream(streamContent, CompressionMode.Decompress))
    {
        var settings = new XmlReaderSettings()
        {
            DtdProcessing = DtdProcessing.Ignore
        };

        var reader = XmlReader.Create(gZipStream, settings);
        reader.MoveToContent();

        XElement root = XElement.ReadFrom(reader) as XElement;
    }
}

I get an exception on XmlReader.Create(gZipStream, settings):

The magic number in GZip header is not correct. Make sure you are passing in a GZip stream

To double check that I am getting properly formatted data from the web, I grab the stream and save it to a file:

byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
File.WriteAllBytes(@"C:\temp\1111.zip", byteContent);

When I examine 1111.zip, it is a well-formed zip file containing the XML that I need.

I was advised here that I do not need GZipStream at all, but if I remove the compression stream from the code completely and pass streamContent directly to the XML reader, I get an exception:

"Data at the root level is invalid. Line 1, position 1."

Whether or not I decompress the stream, I still fail to parse this file. What am I doing wrong?


There are 2 answers

0
Dan Wilson On BEST ANSWER

The file in question is encoded in PKZip format, not GZip format.

GZipStream cannot read it, so you'll need a different API to decompress it, such as System.IO.Compression.ZipFile.

You can typically tell the encoding by the file extension. PKZip files often use .zip while GZip files often use .gz.

See: Unzip files programmatically in .net
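
When the extension is not trustworthy, the leading bytes of the payload are a more reliable hint: GZip data starts with 0x1F 0x8B, while PKZip archives start with the ASCII letters "PK" (0x50 0x4B). The following is only a minimal sketch of that check; the class and method names are hypothetical and not part of any library:

```csharp
using System;

public static class ArchiveSniffer
{
    // Hypothetical helper: classifies a payload by its first two bytes.
    // Returns "gzip", "zip", or "unknown".
    public static string DetectFormat(byte[] data)
    {
        if (data == null || data.Length < 2)
            return "unknown";

        // GZip streams begin with the magic number 0x1F 0x8B.
        if (data[0] == 0x1F && data[1] == 0x8B)
            return "gzip";

        // PKZip records begin with the signature bytes 'P' 'K'.
        if (data[0] == 0x50 && data[1] == 0x4B)
            return "zip";

        return "unknown";
    }
}
```

Calling ArchiveSniffer.DetectFormat(byteContent) on the downloaded bytes from the question should report "zip", which would explain why GZipStream rejects the header.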

0
Nino On

After you save the stream to a local folder, unzip it with the ZipFile class. Something like this:

    byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
    string filename = @"C:\temp\1111.zip";
    File.WriteAllBytes(filename, byteContent);

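    string destinationDir = @"c:\temp";
    // Assumed name of the XML entry inside the archive; adjust to the real one.
    string xmlFilename = "report.xml";

    // Note: throws an IOException if an extracted file already exists in destinationDir.
    System.IO.Compression.ZipFile.ExtractToDirectory(filename, destinationDir);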

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(Path.Combine(destinationDir, xmlFilename));

    //xml reading goes here...
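
If writing to disk is not desirable, the same idea should also work in memory with ZipArchive. The sketch below assumes the archive contains at least one entry whose name ends in .xml and simply takes the first one; the class and method names are hypothetical, and the DtdProcessing.Ignore setting is carried over from the question's code:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ZippedXml
{
    // Hypothetical helper: parses the first .xml entry found in a PKZip payload.
    public static XElement LoadFirstXml(byte[] zipBytes)
    {
        using (var memory = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(memory, ZipArchiveMode.Read))
        {
            // Assumption: the report is the first entry with an .xml extension.
            ZipArchiveEntry entry = archive.Entries.FirstOrDefault(
                e => e.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase));

            if (entry == null)
                throw new InvalidDataException("The archive contains no .xml entry.");

            var settings = new XmlReaderSettings
            {
                DtdProcessing = DtdProcessing.Ignore
            };

            // Open() returns a stream that decompresses the entry on the fly.
            using (Stream entryStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(entryStream, settings))
            {
                return XElement.Load(reader);
            }
        }
    }
}
```

With the question's code, this could be called as ZippedXml.LoadFirstXml(byteContent) right after ReadAsByteArrayAsync, with no GZipStream involved.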