I scraped reviews from a website where the pros and cons are listed separately from each other. I scraped them as lists because that looked like the best way to avoid repeating the same review (user, date, etc.) ten times just to separate the individual pros/cons.
Thanks to pandas I now have a well-structured dataset, but I don't know how to work with the pros/cons as lists. The problem is the next preprocessing step (removing punctuation etc.) before I start using the fastText language model: should I remove the [] of the pros/cons lists and the separation between the items? What should I do? Or does it make no difference to fastText later?
For example: review #1: product - user - date - pros['useful', 'nice price'] - cons['bad design']
I'm not that familiar with fastText yet, but I'm afraid of getting bad results from 'useful nice price'. If you also have any tips for working with fastText, I would be grateful.
Thank you!
I'm a little confused as to what exactly your question is. You are right that you should get rid of any brackets. fastText can handle the data as one string, as it has built-in subword processing. So, for example, you could pass the pros as "useful nice price" and you should be able to get the results you're looking for. Alternatively, you could break it down yourself and tokenize the data before passing it to fastText.
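To make the first option concrete, here is a minimal sketch of turning a pros/cons list into one clean string. The helper name `flatten_points` and the sample `review` dict are made up for illustration; your column names and the exact cleaning rules you need may differ.

```python
import re
import string

def flatten_points(points):
    """Join a list of short phrases into one lowercase string without punctuation."""
    text = " ".join(points).lower()
    # Replace every ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    # Collapse runs of whitespace left over from the replacement.
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical review record mirroring the example in the question.
review = {"pros": ["useful", "nice price"], "cons": ["bad design"]}

pros_text = flatten_points(review["pros"])
cons_text = flatten_points(review["cons"])
print(pros_text)
print(cons_text)
```

If the lists live in a pandas column, something like `df["pros"].apply(flatten_points)` should give you a column of plain strings, assuming each cell really holds a Python list and not its string representation.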
With more details about the project I can help you a lot more. I assume the goal here is to train a model to recognize reviews as positive or negative.
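If that assumption is right, a rough sketch of preparing training data could look like the following. fastText's supervised mode expects one example per line, with the label prefixed by `__label__`. Treating every pro as "positive" and every con as "negative" is my guess at your labelling scheme, and `to_fasttext_lines` is a hypothetical helper name, not part of fastText.

```python
def to_fasttext_lines(pros, cons):
    """Build fastText-style training lines: '__label__<label> <text>'."""
    lines = []
    for phrase in pros:
        # Assumption: each pro counts as one positive example.
        lines.append(f"__label__positive {phrase.lower()}")
    for phrase in cons:
        # Assumption: each con counts as one negative example.
        lines.append(f"__label__negative {phrase.lower()}")
    return lines

lines = to_fasttext_lines(["useful", "nice price"], ["bad design"])
print("\n".join(lines))
```

Written to a text file, lines like these could then be given to `fasttext.train_supervised(input=...)` from the official Python module. Whether you keep each point as its own example or merge all pros of a review into one line is a design choice that depends on what you want to predict.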
fastText isn't difficult at all. Since you're using Python, check out this link; once you're familiar with that page, you should find everything you're looking for.
Again, let me know if you have more details or questions. Good luck!
FastText + Python