I am using the RSS DataImportHandler (DIH) in Solr to index various RSS feeds. The issue I am facing is with a date field: the pubDate returned by the RSS feeds is not in the format Solr expects, so I get an exception when I start Solr with this rss-data-config.xml file. How can I convert the RSS date into the date format Solr expects from within rss-data-config.xml?
In schema.xml I have defined pubDate as a date field.
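For reference, such a declaration might look like the following. It assumes the `date` fieldType from Solr's stock example schema, and the `indexed`/`stored` values are illustrative assumptions. A field of that type expects ISO 8601 UTC values such as `2011-01-19T10:30:00Z`, not the RFC 822 strings that RSS uses.

```xml
<!-- Hypothetical declaration: indexed/stored settings are assumptions. -->
<field name="pubDate" type="date" indexed="true" stored="true"/>
```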
Here is what my rss-data-config.xml looks like:
```xml
<dataConfig>
  <dataSource type="URLDataSource" name="dsurl"/>
  <dataSource type="JdbcDataSource" name="dsdb" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/HCDACoreDB" user="root" password="CDA@318"/>
  <document>
    <entity name="rssimports"
            pk="link"
            url="${dataimporter.request.feedurl}"
            processor="XPathEntityProcessor"
            forEach="/rss | /rss/channel | /rss/channel/item"
            transformer="HTMLStripTransformer"
            dataSource="dsurl">
      <field column="source" xpath="/rss/channel/title" commonField="true" dataSource="dsurl"/>
      <field column="source-link" xpath="/rss/channel/link" commonField="true" dataSource="dsurl"/>
      <field column="Source-desc" xpath="/rss/channel/description" commonField="true" dataSource="dsurl"/>
      <field column="title" xpath="/rss/channel/item/title" dataSource="dsurl"/>
      <field column="link" xpath="/rss/channel/item/link" dataSource="dsurl"/>
      <field column="description" xpath="/rss/channel/item/description" stripHTML="true" dataSource="dsurl"/>
      <field column="pubDate" xpath="/rss/channel/item/pubDate" dataSource="dsurl"/>
      <field column='${dataimporter.functions.formatDate('${dataimporter.request.pubDate}', 'EEE, dd MMM YYYY HH:mm:ss z')}' name="pubDate"/>
      <field column="guid" xpath="/rss/channel/item/guid" dataSource="dsurl"/>
      <field column="content" xpath="/rss/channel/item/content" dataSource="dsurl"/>
      <field column="author" xpath="/rss/channel/item/creator" dataSource="dsurl"/>
      <entity name="feedcategory"
              query="select category.CategoryName from feeds, category where feeds.FeedUrl = '${dataimporter.request.feedurl}' AND feeds.FeedCategory = category.CategoryId"
              processor="SqlEntityProcessor"
              dataSource="dsdb">
        <field column="CategoryName" name="category" dataSource="dsdb"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```
Please help.
You want to set up the DateFormatTransformer to parse the RSS date into a value Solr can index as a date.
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
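Here is a minimal sketch of that change for the config in the question. It assumes the feeds emit RFC 822 dates such as `Tue, 10 Jun 2003 04:00:00 GMT`. The change has three parts:

* Add DateFormatTransformer to the entity's `transformer` list.
* Give the pubDate field a `dateTimeFormat` pattern.
* Remove the `formatDate` line.

As the DIH documentation describes it, `dataimporter.functions.formatDate` formats an existing date into a string for use in queries or URLs. It does not parse incoming text, and `${dataimporter.request.pubDate}` would refer to a request parameter rather than the pubDate column.

The pattern uses `yyyy` rather than `YYYY`, because in Java's SimpleDateFormat `Y` means week year. The `locale` attribute is an assumption about your Solr version: newer DIH releases support it, while older ones may fall back to the JVM default locale. That fallback matters when parsing English day and month names.

```xml
<!-- Sketch: only the parts of the rssimports entity that change are shown. -->
<entity name="rssimports"
        pk="link"
        url="${dataimporter.request.feedurl}"
        processor="XPathEntityProcessor"
        forEach="/rss | /rss/channel | /rss/channel/item"
        transformer="DateFormatTransformer,HTMLStripTransformer"
        dataSource="dsurl">
  <!-- Other fields and the nested feedcategory entity stay as they were. -->

  <!-- Parses RFC 822 dates such as "Tue, 10 Jun 2003 04:00:00 GMT".
       locale="en" (assumed supported by your DIH version) makes EEE/MMM
       match English day and month names. -->
  <field column="pubDate"
         xpath="/rss/channel/item/pubDate"
         dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss zzz"
         locale="en"/>

  <!-- The formatDate(...) field line is removed entirely. -->
</entity>
```

When parsing, SimpleDateFormat's `z` also accepts numeric offsets such as `+0000`, and `dd` accepts single-digit days. Feeds using either variant should therefore match this pattern. A feed with a genuinely different layout would need its own pattern.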