MOSS is a well-known server for checking software plagiarism. It lets teachers send homework submissions, calculates the similarity between different submissions, and highlights code blocks that are very similar. Here is an example of the results of the comparison. As you can see, it is very simple: it consists of an index HTML file listing the suspected files, with links to separate HTML files for each comparison.
The results are kept on the MOSS website for two weeks. I would like to download all the results to my computer, so that I can view them later. I use this command on Linux:
wget -mkEpnp http://moss.stanford.edu/results/5/7683916027631/index.html
What I get is the following:
As you can see, only the index.html file is downloaded. The other files that are linked from index.html, e.g. match0.html and match1.html, are not downloaded.
I tried to mirror the same website with a different tool, Web HTTrack, but got exactly the same result: only the index file is mirrored, and not the match files.
The HTML looks very simple, so I cannot figure out why the mirroring does not work. What can I do to correctly mirror the results?
P.S. In case it is relevant, the robots.txt file contains the following:
User-agent: *
Disallow: /
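When downloading recursively, wget obeys robots.txt by default, and here "Disallow: /" forbids crawling every linked page. You need to tell wget to ignore the robots.txt file, e.g.: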
wget -r -l 1 -e robots=off http://moss.stanford.edu/results/1/XXXXXXXXXX/
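To also keep the link conversion and extension handling from the command in the question, the two commands can probably be combined. The following is only a sketch: the function name mirror_moss and the WGET override variable are hypothetical names introduced here for illustration, and the exact set of options is an assumption about what you want rather than a tested recipe for MOSS.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget or MOSS): fetch one results page
# and the pages it links to, ignoring robots.txt.
mirror_moss() {
    url="$1"
    # -r -l 1        recurse, but only one level below the index page
    # -e robots=off  do not honour robots.txt ("Disallow: /" on MOSS)
    # -k             convert links so the local copy can be browsed offline
    # -E             save HTML pages with an .html extension
    # -p             also fetch page requisites such as images
    # -np            never ascend to the parent directory
    # WGET can be overridden (e.g. WGET=echo) to preview the command.
    ${WGET:-wget} -r -l 1 -e robots=off -k -E -p -np "$url"
}

# Usage (placeholder URL taken from the answer above):
#   mirror_moss http://moss.stanford.edu/results/1/XXXXXXXXXX/
```

Setting WGET=echo before calling the function prints the command instead of running it, which is a cheap way to check the options before hitting the server.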