Similar Questions but Different Response Setups in Survey Data Sets


I need some advice on prepping and cleaning my data. I have two survey data sets (2020 and 2021).

The 2021 survey added some questions and changed the wording of others, but most questions are the same in both years. I manually went through both data sets to identify columns that capture the same information, and I recorded each matching pair of columns in a reference key.
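One way to put a reference key like this to work is to store it as a mapping from each 2021 column name to its 2020 equivalent, rename the 2021 columns, and stack the two years with a `survey_year` column. The sketch below assumes pandas and uses made-up column names (`q2_satisfaction`, `q3_overall_satisfaction`, etc.); the real names would come from your own key.

```python
import pandas as pd

# HYPOTHETICAL reference key: 2021 column name -> matching 2020 column name.
# In practice you would load the key you built by hand (e.g. from a CSV).
REFERENCE_KEY = {
    "q3_overall_satisfaction": "q2_satisfaction",
    "q7_years_employed": "q5_tenure",
}


def stack_surveys(df_2020: pd.DataFrame, df_2021: pd.DataFrame) -> pd.DataFrame:
    """Rename 2021 columns to their 2020 names, then stack both years."""
    renamed_2021 = df_2021.rename(columns=REFERENCE_KEY)
    # Keep only the columns that exist in both years after renaming.
    shared = [col for col in df_2020.columns if col in renamed_2021.columns]
    return pd.concat(
        [
            df_2020[shared].assign(survey_year=2020),
            renamed_2021[shared].assign(survey_year=2021),
        ],
        ignore_index=True,
    )


# Tiny made-up example data.
df_2020 = pd.DataFrame({"q2_satisfaction": [4, 5], "q5_tenure": [1, 3]})
df_2021 = pd.DataFrame(
    {"q3_overall_satisfaction": [3], "q7_years_employed": [2], "q9_new_question": ["x"]}
)
combined = stack_surveys(df_2020, df_2021)
```

This drops 2021-only questions such as `q9_new_question`; if you want to keep them, concatenate the full frames instead and let pandas fill the missing 2020 values with NaN.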


Among the matched questions, I've identified a couple that are very similar in content but use completely different response options. Can I recode the responses so they are comparable enough to include in a merged dataset without distorting the data? If so, how should I go about it?

I've attached screenshots of the questions from both surveys; the similar questions are highlighted in green.
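Since the actual response options are only in the screenshots, here is a sketch under assumed response sets: suppose 2020 used a five-point agreement scale and 2021 used a 0 to 10 rating. A common approach is to keep the raw answers and add a new column that collapses both onto a shared, coarser scale. The column names, labels, and the 0-4 / 5 / 6-10 cut-offs are all hypothetical and would need to be chosen and documented for your real questions.

```python
import pandas as pd

# HYPOTHETICAL 2020 response labels mapped onto a shared three-level scale.
MAP_2020 = {
    "Strongly disagree": "negative",
    "Disagree": "negative",
    "Neutral": "neutral",
    "Agree": "positive",
    "Strongly agree": "positive",
}


def harmonize_2021(score):
    """Collapse a HYPOTHETICAL 0-10 rating into the same three levels.

    The cut-offs (0-4, 5, 6-10) are an assumption, not a standard.
    """
    if pd.isna(score):
        return pd.NA
    if score <= 4:
        return "negative"
    if score == 5:
        return "neutral"
    return "positive"


# Made-up example answers; the raw columns are kept untouched.
df_2020 = pd.DataFrame({"q4_raw": ["Agree", "Strongly disagree", "Neutral"]})
df_2021 = pd.DataFrame({"q6_raw": [9, 5, 2]})

# Add a harmonized column with the same name in both years.
df_2020["q_harmonized"] = df_2020["q4_raw"].map(MAP_2020)
df_2021["q_harmonized"] = df_2021["q6_raw"].apply(harmonize_2021)

# Any 2020 label missing from MAP_2020 (e.g. a typo) becomes NaN; surface those.
unmapped = df_2020.loc[
    df_2020["q_harmonized"].isna() & df_2020["q4_raw"].notna(), "q4_raw"
]
```

Collapsing scales loses detail, so the harmonized column is only as comparable as the mapping behind it; keeping the raw columns lets you revisit the cut-offs later.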

Would measuring the similarity between the two questions' wording (for example, with cosine similarity) be a way to do this?
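For context, cosine similarity compares the wording of two questions, so it can help suggest which columns belong together in the reference key; it does not by itself reconcile different response options. A minimal stdlib sketch using plain bag-of-words counts is below; the question texts, column names, and the 0.5 threshold are made up for illustration, and suggestions should still be reviewed by hand.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def suggest_matches(
    questions_2020: dict[str, str],
    questions_2021: dict[str, str],
    threshold: float = 0.5,
) -> dict[str, str]:
    """Suggest a 2020 column for each 2021 column with similar enough wording."""
    suggestions = {}
    if not questions_2020:
        return suggestions
    for col_21, text_21 in questions_2021.items():
        score, col_20 = max(
            (cosine_similarity(text_21, text_20), col_20)
            for col_20, text_20 in questions_2020.items()
        )
        if score >= threshold:
            suggestions[col_21] = col_20
    return suggestions


# HYPOTHETICAL question texts keyed by column name.
q20 = {
    "q2_satisfaction": "How satisfied are you with your job?",
    "q5_tenure": "How many years have you worked here?",
}
q21 = {
    "q3_overall_satisfaction": "Overall, how satisfied are you with your current job?",
    "q9_remote": "Do you work remotely?",
}
matches = suggest_matches(q20, q21)
```

Raw word counts treat "work" and "worked" as different words, which is one reason a threshold like this needs tuning against pairs you have already matched manually.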

Also, would Python or SQL be easier for this?
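Either tool can express the same recoding: in SQL it is typically a `CASE` expression per year combined with `UNION ALL`. The sketch below runs that SQL through Python's built-in `sqlite3` module on an in-memory database, so both sides are visible; the table names, columns, and mappings are the same hypothetical ones as above, not taken from the actual surveys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# HYPOTHETICAL tables holding the raw answers for each year.
conn.executescript("""
CREATE TABLE survey_2020 (respondent_id INTEGER, q4_raw TEXT);
CREATE TABLE survey_2021 (respondent_id INTEGER, q6_raw INTEGER);
INSERT INTO survey_2020 VALUES (1, 'Agree'), (2, 'Strongly disagree');
INSERT INTO survey_2021 VALUES (1, 9), (2, 5);
""")

# Recode each year onto the shared scale and stack them with a year tag.
rows = conn.execute("""
SELECT 2020 AS survey_year, respondent_id,
       CASE q4_raw
           WHEN 'Strongly disagree' THEN 'negative'
           WHEN 'Disagree' THEN 'negative'
           WHEN 'Neutral' THEN 'neutral'
           WHEN 'Agree' THEN 'positive'
           WHEN 'Strongly agree' THEN 'positive'
       END AS q_harmonized
FROM survey_2020
UNION ALL
SELECT 2021, respondent_id,
       CASE WHEN q6_raw IS NULL THEN NULL  -- keep missing answers missing
            WHEN q6_raw <= 4 THEN 'negative'
            WHEN q6_raw = 5 THEN 'neutral'
            ELSE 'positive'
       END
FROM survey_2021
ORDER BY survey_year, respondent_id
""").fetchall()
conn.close()
```

If the data already lives in a database, doing the recode in SQL keeps it close to the source; for one-off, exploratory cleaning, pandas is often more convenient. Either way the hard part is deciding the mapping, not the tool.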

[2020 Question and Response 1](https://i.stack.imgur.com/Wz1Re.png)
[2020 Question and Response 2](https://i.stack.imgur.com/tNESF.png)
[2021 Question and Response 1](https://i.stack.imgur.com/B3voa.png)
[2021 Question and Response 2](https://i.stack.imgur.com/2GHne.png)
