I am trying to create a multiclass classifier to identify topics of Facebook posts from a group of parliament members.
I'm using SimpleTransformers to put together an XLM-RoBERTa-based classification model. Is there any way to add an embedding layer with metadata to improve the classifier? (For example, adding the political party to each Facebook post, together with the text itself.)
If you have a lot of training data, I would suggest adding the metadata to the input string (probably separated with `[SEP]` as another sentence) and just training the classifier. The model is certainly strong enough to learn how the metadata interacts with the input sentence, given that you have enough training examples (my guess is tens of thousands might be enough).
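For example, a minimal sketch with SimpleTransformers (the columns, labels, and texts here are made up; also note that XLM-RoBERTa's actual separator token is `</s>`, which plays the role of BERT's `[SEP]`):

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

SEP = "</s>"  # XLM-RoBERTa's separator token

# Hypothetical data: post text, party metadata, integer topic label
df = pd.DataFrame({
    "post":  ["We will lower taxes next year.", "New funding for public schools."],
    "party": ["Party A", "Party B"],
    "topic": [0, 1],
})

# SimpleTransformers expects "text" and "labels" columns;
# append the party as a second "sentence" after the separator
train_df = pd.DataFrame({
    "text": df["post"] + f" {SEP} " + df["party"],
    "labels": df["topic"],
})

model = ClassificationModel(
    "xlmroberta", "xlm-roberta-base",
    num_labels=2,
    use_cuda=False,  # set True if you have a GPU
)
model.train_model(train_df)
```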
If you do not have enough data, I would suggest running the XLM-RoBERTa only to get the features, independently embedding your metadata, concatenating the features, and classifying with a multi-layer perceptron. This is probably not doable in SimpleTransformers, but it should be quite easy with Hugging Face's Transformers if you write the classification code directly in PyTorch.
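Roughly like this (an untested sketch; the class name, layer sizes, and counts are placeholders):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MetadataClassifier(nn.Module):
    def __init__(self, num_parties, num_topics, meta_dim=32):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("xlm-roberta-base")
        self.encoder.requires_grad_(False)  # frozen feature extractor
        self.party_emb = nn.Embedding(num_parties, meta_dim)
        hidden = self.encoder.config.hidden_size  # 768 for the base model
        self.mlp = nn.Sequential(
            nn.Linear(hidden + meta_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_topics),
        )

    def forward(self, input_ids, attention_mask, party_ids):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        text_feat = out.last_hidden_state[:, 0]  # <s> token as sentence feature
        feats = torch.cat([text_feat, self.party_emb(party_ids)], dim=-1)
        return self.mlp(feats)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = MetadataClassifier(num_parties=5, num_topics=10)
batch = tokenizer(["We will lower taxes next year."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"],
               party_ids=torch.tensor([2]))
```

Only the embedding and the MLP are trained here, so this needs far fewer examples than fine-tuning the whole model.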