I want to compare what students answered in a multiple-answer test and see which students deviate by only one answer. I've already converted the answers (e.g. ABCDE) into hits or misses (e.g. 00101). The test has 45 questions, so the answer and hit strings are long (e.g. 100000000000000010000010010001001000000000100). Base R can't handle these strings as numbers, because a double can't represent a 45-digit integer exactly.
I have texts_1 with answers that got n questions right, and texts_2 with answers that got n+1 questions right. Then I compare every line in texts_1 with every line in texts_2 to look for strings with one character of difference.
One way to do this is with adist.
if (adist(texts_1[line_1], texts_2[line_2]) == 1) { ... }
If the result is 1, I know there's only one difference between the two strings. This works, but the problem is that adist is very slow, and I have thousands of comparisons to do. It took 4 hours to make 20000 x 30000 comparisons.
My idea was to treat the hits-and-misses strings as numbers and subtract them. If the result is a power of 10, I know that only one question is different, e.g. 1101 - 1001 = 100, which is a power of 10. However, R can't deal with numbers this big. Is there a package that lets me work with binary numbers this large, including subtraction and division? Note that some of the numbers have leading zeros.
tl;dr: How to subtract 001111111111111101111011111111111111111111111 - 001111111111111101111011111111111110111111111 in R? And then check if the answer is a power of 10?
If I understand the problem correctly, this might help.
If these are your tests
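As a minimal sketch with made-up data (the five-question strings and the names `to_bits`, `m1` and `m2` are assumptions for illustration, not taken from the question), the tests could be kept as character vectors and converted to 0/1 matrices. Keeping them as strings preserves leading zeros and avoids the precision problem entirely.

```r
# Hypothetical sample data: 5-question hit/miss strings
# (the real strings would have 45 characters).
# texts_1 holds tests with n hits, texts_2 tests with n + 1 hits (here n = 2).
texts_1 <- c("00101", "11000", "01010")
texts_2 <- c("00111", "11100", "10101", "01011")

# Split each string into single characters and stack them into an
# integer matrix: one row per test, one column per question.
to_bits <- function(x) {
  matrix(as.integer(unlist(strsplit(x, ""))), nrow = length(x), byrow = TRUE)
}

m1 <- to_bits(texts_1)
m2 <- to_bits(texts_2)
```

This assumes every string has the same length; if not, `matrix()` would recycle or warn rather than fail cleanly.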
Getting the tests that only differ by one
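One possible way to do this without calling adist per pair is to count mismatches for all pairs at once with matrix products. The sketch below is an assumption about how this could be done, not necessarily the fastest approach; `to_bits`, `hamming_matrix` and the sample strings are hypothetical names and data.

```r
# Hypothetical sample data (real strings would be 45 characters long).
texts_1 <- c("00101", "11000", "01010")
texts_2 <- c("00111", "11100", "10101", "01011")

# Convert equal-length 0/1 strings to an integer matrix, one row per string.
to_bits <- function(x) {
  matrix(as.integer(unlist(strsplit(x, ""))), nrow = length(x), byrow = TRUE)
}

# Number of differing positions for every pair (row i of a, row j of b).
# A position differs when a is 1 and b is 0, or a is 0 and b is 1,
# so the two cross products below count both kinds of mismatch.
hamming_matrix <- function(a, b) {
  m1 <- to_bits(a)
  m2 <- to_bits(b)
  tcrossprod(m1, 1 - m2) + tcrossprod(1 - m1, m2)
}

d <- hamming_matrix(texts_1, texts_2)

# TRUE where texts_1[i] and texts_2[j] differ in exactly one question.
differ_by_one <- d == 1
```

For 20000 x 30000 comparisons the full matrix of doubles would need roughly 4.8 GB, so it would probably make sense to process `texts_1` in chunks of a few thousand rows.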
Getting the test numbers
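Given a logical matrix that marks the pairs differing by one answer, base R's `which()` with `arr.ind = TRUE` returns the row and column indices of the TRUE cells. The matrix below is hand-written, hypothetical data standing in for the result of the comparison step, and the column names `line_1` and `line_2` are chosen here to echo the question.

```r
# Hypothetical comparison result: TRUE where texts_1[i] (rows) and
# texts_2[j] (columns) differ in exactly one position.
differ_by_one <- matrix(
  c(TRUE,  FALSE, TRUE,  FALSE,
    FALSE, TRUE,  FALSE, FALSE,
    FALSE, FALSE, FALSE, TRUE),
  nrow = 3, byrow = TRUE
)

# Row/column indices of every TRUE cell, one pair per row of the result.
pairs <- which(differ_by_one, arr.ind = TRUE)
colnames(pairs) <- c("line_1", "line_2")

# Sort by the index into texts_1 so matches for the same test are adjacent.
pairs <- pairs[order(pairs[, "line_1"]), , drop = FALSE]
```

Each row of `pairs` can then be used to look up the original strings, for example `texts_1[pairs[, "line_1"]]`.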