Sorry if this is off-topic, but here is your chance to reduce the number of "homework" questions on this site :-)
I'm teaching a C programming class where the students work on a small library of numeric routines in C. This year, the source files from several groups of students had significant amounts of code duplication in them.
(Down to identically misspelled printf debug statements. I mean, how dumb can you be.)
I know that Git can detect when two source files are similar to each other beyond a certain threshold, but I never managed to get that to work on two source files that are not in a Git repository.
Keep in mind that these are not particularly sophisticated students. It is unlikely that they would go to the trouble of changing variable/function names.
Is there a way I can use Git to detect significant and literal code duplication, a.k.a. plagiarism? Or is there some other tool you could recommend for that?
Why use git at all? A simple but effective technique would be to compare the sizes of the diffs between all of the different submissions, and then to manually inspect and compare those with the smallest differences.
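As a rough illustration of that idea, here is a minimal Python sketch, not a finished tool. It assumes each submission has already been read into a string and is keyed by a group name; the helper names `diff_size` and `rank_pairs` are made up for this example. It counts the lines that differ between every pair of submissions and sorts the pairs so the most similar ones come first.

```python
import difflib
import itertools


def diff_size(a_text, b_text):
    """Count the lines that differ between two texts (hypothetical helper)."""
    a_lines = a_text.splitlines()
    b_lines = b_text.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Lines dropped from the first text plus lines added in the second.
            changed += (i2 - i1) + (j2 - j1)
    return changed


def rank_pairs(submissions):
    """Return (diff_size, name_a, name_b) tuples, smallest difference first.

    `submissions` is assumed to be a dict mapping a group name to source text.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in itertools.combinations(
        sorted(submissions.items()), 2
    ):
        pairs.append((diff_size(text_a, text_b), name_a, name_b))
    pairs.sort()
    return pairs


if __name__ == "__main__":
    # Toy data standing in for real student files.
    demo = {
        "group1": 'int x;\nprintf("helo");\n',
        "group2": 'int x;\nprintf("helo");\n',
        "group3": "double y;\nreturn 0;\n",
    }
    for size, name_a, name_b in rank_pairs(demo):
        print(size, name_a, name_b)
```

The pairs at the top of the list are the ones worth inspecting by hand. Raw diff size favors short files, so dividing by the combined line count of the two files may give a fairer ranking when submissions vary a lot in length.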