Google Groups RSS feed has truncated descriptions

I'm trying to analyze the sentiment of posts in the Google Group forum I run. To get the forum content, I found two methods:

1. Web scraping Google Groups with Selenium, but this is unreliable because Google changes the class names often.
2. Using the RSS feed.

The second method seemed like a good option, but the descriptions in the RSS feed are truncated. Is there a way to get the complete description without truncation? Or is there another way to get the content of a public Google Group?
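For reference, here is a minimal sketch of reading such a feed with Python's standard-library `xml.etree.ElementTree`. It assumes an RSS 2.0 feed where each post is an `<item>` with a `<description>` (an Atom feed would use `<entry>` and `<summary>` instead), and the sample feed is made up for illustration. It also shows why parsing alone cannot help: the `<description>` text is all the client receives, so if the server truncates it, the rest of the post is simply not in the feed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss_items(rss_xml: str) -> List[Dict[str, str]]:
    """Extract title, link and description from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    # In RSS 2.0, <item> elements live under <rss><channel>; iter() finds them at any depth.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Only what the server put here is available; truncated text stays truncated.
            "description": item.findtext("description", default=""),
        })
    return items


# Hypothetical sample feed, not real Google Groups output.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>example-group</title>
<item><title>Hello</title><link>https://example.com/t/1</link>
<description>First few words of the post...</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], "->", entry["description"])
```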

Accepted answer (by goofy):

For anyone facing a similar problem (scraping the content of a Google Group): I came across a Python package called gg_scraper (version 0.10.0), written by Matěj Cepl, which downloads the content of a Google Group into mbox files. I then converted these mbox files into JSON files for my own use.
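The mbox-to-JSON step can be done with the standard library alone. The following is a minimal sketch using the `mailbox`, `email` and `json` modules; it does not call gg_scraper itself, the output field names (`from`, `subject`, `date`, `message_id`, `body`) and file paths are illustrative choices of mine, and it keeps only `text/plain` parts of each message.

```python
import json
import mailbox
from email.header import decode_header, make_header
from email.message import Message
from typing import Dict, List


def header_text(msg: Message, name: str) -> str:
    """Return a header as plain text, decoding RFC 2047 encoded words if present."""
    value = msg.get(name)
    if value is None:
        return ""
    return str(make_header(decode_header(str(value))))


def message_body(msg: Message) -> str:
    """Concatenate the text/plain parts of a message into one string."""
    chunks = []
    # walk() yields the message itself and, for multipart messages, every sub-part.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes with transfer encoding undone
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name: fall back to UTF-8
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def mbox_to_records(mbox_path: str) -> List[Dict[str, str]]:
    """Read an mbox file and return one dict per message."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            {
                "from": header_text(msg, "From"),
                "subject": header_text(msg, "Subject"),
                "date": header_text(msg, "Date"),
                "message_id": header_text(msg, "Message-ID"),
                "body": message_body(msg),
            }
            for msg in box
        ]
    finally:
        box.close()


def mbox_to_json(mbox_path: str, json_path: str) -> None:
    """Write the messages of an mbox file to a JSON array."""
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(mbox_to_records(mbox_path), fh, ensure_ascii=False, indent=2)
```

The `body` field of each record is then the full post text that a sentiment analyzer can consume, without the truncation of the RSS descriptions.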