Exceptions with DateTime parsing in RSS feed in C#

7.8k views Asked by At

I'm trying to parse Rss2, Atom feeds using SyndicationFeedFormatter and SyndicationFeed objects. But I'm getting XmlExceptions while parsing DateTime field like pubDate and/or lastBuildDate.

Wed, 24 Feb 2010 18:56:04 GMT+00:00 does not work

Wed, 24 Feb 2010 18:56:04 GMT works

So, it's throwing due to the timezone field.

As a workaround, for familiar feeds I would manually fix those DateTime nodes - by catching the XmlException, loading the Rss into an XmlDocument, fixing those nodes' value, creating a new XmlReader and then returning the formatter from this new XmlReader object (code not shown). But for this approach to work, I need to know beforehand which nodes cause exception.

        SyndicationFeedFormatter syndicationFeedFormatter = null;
        XmlReaderSettings settings = new XmlReaderSettings();
        using (XmlReader reader = XmlReader.Create(url, settings))
        {
            try
            {
                syndicationFeedFormatter = SyndicationFormatterFactory.CreateFeedFormatter(reader);
                syndicationFeedFormatter.ReadFrom(reader);
            }
            catch (XmlException xexp)
            {
                // fix those datetime nodes with exceptions and read again.
            }
        return syndicationFeedFormatter;
    }

rss feed: http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&q=test&cf=all&output=rss

exception detials:

XmlException Error in line 1 position 376. An error was encountered when parsing a DateTime value in the XML.
at System.ServiceModel.Syndication.Rss20FeedFormatter.DateFromString(String dateTimeString, XmlReader reader)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml(XmlReader reader, SyndicationFeed result) at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom(XmlReader reader) at ... cs:line 171

<rss version="2.0">
  <channel>
    ...
    <pubDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</pubDate>
    <lastBuildDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</lastBuildDate> <-----exception
    ...
    <item>
      ...
      <pubDate>Wed, 24 Feb 2010 16:17:50 GMT+00:00</pubDate>
      <lastBuildDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</lastBuildDate>
    </item>
    ...
  </channel>
</rss>

Is there a better way to achieve this? Please help. Thanks.

2

There are 2 answers

0
Hussein Alrubaye On

to convert PublishDate in RSS to your computer datetime you could write this lines

  string dateStr = item.PublishDate.ToString("ddd MMM dd HH:mm:ss zzzz yyyy");
                    DateTime PostDate = DateTime.ParseExact(dateStr, "ddd MMM dd HH:mm:ss zzzz yyyy", CultureInfo.InvariantCulture);
2
James Lawruk On

Here is my hacky workaround for reading Google News RSS feeds.

string xml;
using (WebClient webClient = new WebClient())
{
    xml = Encoding.UTF8.GetString(webClient.DownloadData(url));
}
xml = xml.Replace("+00:00", "");
byte[] bytes = System.Text.UTF8Encoding.ASCII.GetBytes(xml);  
XmlReader reader = XmlReader.Create(new MemoryStream(bytes));
SyndicationFeed feed = SyndicationFeed.Load(reader);