I want to be able to store media RSS and iTunes podcast RSS feeds into the database. The requirement here is that I don't want to miss out on ANY element or its attributes in the feed. It would make sense to find all most common elements in the feed and have them stored in database as separate columns. The catch here is that there can be feed specific elements that may not be standard. I want to capture them too. Since I don't know what they can be, I won't have a dedicated column for them.
Currently I have 2 tables called feeds and feed_entries. For RSS 2.0 tags like enclosures, categories, I have separate tables that have associations with feeds/feed_entries. I am using feedzirra for parsing the feeds. Feedzirra requires us to know the elements in the feed we want to parse and hence we would not know if feed contains elements beyond what feedzirra can understand.
What would be the best way to go about storing these feeds in the database and not miss single bit of information? (Dumping of the whole feed into the database as is won't work as we want to query most of the attributes). What parser would be the best fit? Feedzirra was chosen for performance, however, getting all data in the feed into the database is a priority.
Update
I'm using MySQL as the database.
Store the XML as a CLOB, most databases have XML processing extensions that allow you to include XPath type queries as part of a SELECT statement.
Otherwise if your DBMS does not support XML querying, use your languages XPath implementation to query the CLOB. You will probably need to extract certain elements into table columns for speedy querying.