I would like to unzip and parse an XML file; its URL appears in the code below.
Here is my code:
HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = new CookieContainer(),
    UseCookies = true,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
    // | DecompressionMethods.None,
};
using (var http = new HttpClient(handler))
{
    var response = http.GetAsync(@"https://login.tradedoubler.com/report/published/aAffiliateEventBreakdownReportWithPLC_806880712_4446152766894956100.xml.zip").Result;
    Stream streamContent = response.Content.ReadAsStreamAsync().Result;
    using (var gZipStream = new GZipStream(streamContent, CompressionMode.Decompress))
    {
        var settings = new XmlReaderSettings()
        {
            DtdProcessing = DtdProcessing.Ignore
        };
        var reader = XmlReader.Create(gZipStream, settings);
        reader.MoveToContent();
        XElement root = XElement.ReadFrom(reader) as XElement;
    }
}
I get an exception on XmlReader.Create(gZipStream, settings):
The magic number in GZip header is not correct. Make sure you are passing in a GZip stream
To double check that I am getting properly formatted data from the web, I grab the stream and save it to a file:
byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
File.WriteAllBytes(@"C:\temp\1111.zip", byteContent);
When I examine 1111.zip, it appears to be a well-formed zip file containing the XML that I need.
I was advised that I do not need GZipStream at all, but if I remove the compression stream from the code completely and pass streamContent directly to the XML reader, I get an exception:
"Data at the root level is invalid. Line 1, position 1."
Whether or not I decompress the stream, I still fail to parse this file. What am I doing wrong?
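The file in question is in PKZip format, not GZip format. GZipStream only reads GZip data, which is why it rejects the header.
You'll need a different class to decompress it, such as System.IO.Compression.ZipFile, or System.IO.Compression.ZipArchive if you want to read directly from a stream.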
You can typically tell the encoding by the file extension. PKZip files often use .zip, while GZip files often use .gz.
See: Unzip files programmatically in .net
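To make the suggestion concrete, here is a minimal sketch of reading the XML out of a PKZip archive with ZipArchive. It assumes the archive contains at least one entry whose name ends in .xml. The class name ZipXmlExample and the helpers ReadFirstXmlEntry and BuildSampleZip are hypothetical names made up for this illustration, and the sample archive is built in memory so the code runs without touching the network. On .NET Framework you would also need a reference to the System.IO.Compression assembly.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class ZipXmlExample
{
    // Hypothetical helper: opens a PKZip archive from a stream and parses
    // the first entry whose name ends in ".xml".
    public static XElement ReadFirstXmlEntry(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.Entries
                .First(e => e.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase));

            var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
            using (Stream entryStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(entryStream, settings))
            {
                return XElement.Load(reader);
            }
        }
    }

    // Hypothetical helper: builds a one-entry archive in memory for the demo.
    public static MemoryStream BuildSampleZip(string entryName, string xml)
    {
        var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        using (var writer = new StreamWriter(archive.CreateEntry(entryName).Open(), Encoding.UTF8))
        {
            writer.Write(xml);
        }
        buffer.Position = 0; // rewind so the archive can be read back
        return buffer;
    }

    public static void Main()
    {
        using (MemoryStream zip = BuildSampleZip("report.xml", "<report><row id=\"1\" /></report>"))
        {
            XElement root = ReadFirstXmlEntry(zip);
            Console.WriteLine(root.Name);
        }
    }
}
```

In the question's code, the stream returned by response.Content.ReadAsStreamAsync() could be passed to ReadFirstXmlEntry in place of the GZipStream wrapper. The AutomaticDecompression setting on the handler concerns HTTP content encoding rather than the format of the downloaded file, so it should not need to change.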