I am using IBM Watson Personality Insights in the context of an academic research project.
From the analyses so far, I have noticed that there are often large differences between the raw and percentile scores for the same tweets, and in several cases the scores are even at opposite ends of the scale (e.g., agreeableness raw score: 0.21 and agreeableness percentile score: 0.76). Moreover, at the aggregate level for my sample population, the variance within the personality traits is much higher for percentile scores and very low for raw scores (all observations within a range of 0.1-0.2 per trait).
I understand that the percentiles are normalized scores and that the two scores are interpreted differently. My question is which score is typically used by researchers who want to apply them in regression analysis (e.g., an individual's personality traits as predictors of success)? In the papers I have seen that apply Personality Insights, the authors do not discuss which score they use. It would be great if you could share your thoughts on this, along with any research that discusses its approach to Personality Insights in more detail.
Thanks a lot in advance for your guidance!
You are correct that the scores have different interpretations. The raw scores are the unnormalized values, whereas the percentile is the raw score normalized against a larger population. Although a trait's raw score can in principle range from 0 to 1, in practice this isn't always the case, and the scores may be concentrated in a narrower band. This is why, in the example you give above, a small change in the raw score can produce a much bigger change in the percentile score.
Note that to calculate the percentiles we ran the profiles for a larger population (hundreds of thousands of profiles), where you'd observe these trends that may not show up in a smaller sample.
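To make the raw-versus-percentile relationship concrete, here is a minimal sketch. It is not the service's actual normalization: the reference population is simulated, and the mean of 0.20, the spread of 0.02, and the helper name `empirical_percentile` are all assumptions chosen only for illustration.

```python
import bisect
import random


def empirical_percentile(raw_score, sorted_reference):
    """Return the fraction of reference scores that are <= raw_score."""
    rank = bisect.bisect_right(sorted_reference, raw_score)
    return rank / len(sorted_reference)


random.seed(0)

# ASSUMPTION: a simulated reference population whose raw scores cluster
# in a narrow band around 0.20. The real reference data is not public here.
reference = sorted(random.gauss(0.20, 0.02) for _ in range(100_000))

# Two raw scores that differ by only 0.02 ...
low = empirical_percentile(0.19, reference)
high = empirical_percentile(0.21, reference)

# ... land far apart once ranked against the population.
print(f"raw 0.19 -> percentile {low:.2f}")
print(f"raw 0.21 -> percentile {high:.2f}")
```

Because most of the simulated population sits within a few hundredths of 0.20, a tiny raw difference moves a profile past a large share of the population, which is the same effect as the 0.21 versus 0.76 example in the question.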
As for your other question, which score to use very much depends on your goal. In general, most people use the percentile score, as that gives you an idea of how a given group of people compares with the population at large. For instance, if I'm interested in how one group compares to another, using the percentile scores makes it easier to understand the differences intuitively (an agreeableness difference of 25 percentile points is a lot easier to interpret than a raw difference of 0.1, where you won't know whether that is significant or not). On the other hand, the raw scores are mostly used when you are creating a larger model and are using the PI score as one of the features. In that case it is typically helpful to use the raw scores, as you'd draw your own conclusions from the larger model.