I'm trying to write a word-processor program for processing huge files. Now whenever the user closes a file, I would prompt them "Do you want to save the file" if the file has been modified.
I'm implementing this using a dirty bit that is set whenever the user does any write operations.
However, this has the limitation that the file would be seen as dirty when it is in fact not dirty. For example, if the user types a character and deletes it, the file has not changed. However my "dirty-bit" implementation thinks that it has changed.
What's the best way in terms of speed, to detect if a file has really changed?
A full bit-for-bit comparison of the entire file is too slow. (Comparing the file hash is also too slow because the whole file needs to be processed to compute the hash. Doing a length comparison first before comparing the values works when the lengths are different, but fails when they aren't, like in my example above.)
Since this is a word processor program, it can have a history of actions as well. You can maintain 2 stack, one for historical actions (changes that have already been incorporated), and another for future actions (changes that had been applied, but now have been reverted in a linear fashion).
For example, every character typed in sequence can be one item in the actions stack, and delting it back could be equivalent to popping that action from stack of history items to stack of future actions (In case you need to redo the actions).
Now, whenever the history actions' stack is not empty, you prompt the user to close the file on closure.
For sake of simplicity, you can have limited number of history items (say last 100 actions). Then, since each add/subtract to document is happening with every user action, there is hardly any delay, and figuring out whether the stack is empty or not is an O(1) operation.