When using code from unknown third parties on GitHub, I always review it first to make sure it contains no obvious backdoors that could compromise the security of my system.
The specific state of the repository I review is usually identified by a git tag and a commit hash. As we know, a git tag can easily be changed to point to a different commit, so downloading the source code again and trusting it based on the version tag is definitely not secure.
My question is: when doing a fresh download of the source code, can I trust that if I check out a specific commit by its full commit hash, I get 100% the same code I reviewed before?
The focus of this question is not on the probability of SHA-1 collisions occurring at all (finding some collision is a lot easier than producing data that matches a specific, given SHA-1 hash, which is hopefully still practically impossible at the moment), but on whether each and every file is part of this SHA-1 sum, so that any change would always produce a different hash.
In short: yes.
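On this page you can see how this SHA-1 sum is formed. It is composed of:
- the SHA-1 of the source tree of the commit
- the SHA-1 of the parent commit(s)
- the author information (name, email and timestamp)
- the committer information (name, email and timestamp)
- the commit message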
So every change in every file is included in the calculation of the commit's SHA-1. AFAIK you can trust that any change to any file will always produce a different SHA-1, unless someone manages to construct a SHA-1 collision.
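To make the composition of a commit hash concrete, here is a minimal Python sketch of how the SHA-1 of an unsigned commit could be computed from its fields (tree, parents, author, committer, message). It assumes git's plain object layout: a `commit <size>\0` header, then the header lines, a blank line and the message. The identity, timestamp and message are hypothetical; 563ccb5109fbf0a01d99517ca1dbe15db349592d is only reused from the EDIT below as a sample tree hash.

```python
import hashlib


def git_commit_id(tree: str, parents: list, author: str, committer: str, message: str) -> str:
    """Sketch of how git hashes an (unsigned) commit object.

    author/committer are "Name <email> <unix-timestamp> <tz-offset>" strings.
    """
    headers = [f"tree {tree}"]
    headers += [f"parent {p}" for p in parents]
    headers.append(f"author {author}")
    headers.append(f"committer {committer}")
    # A blank line separates the header lines from the commit message.
    body = ("\n".join(headers) + "\n\n" + message).encode("utf-8")
    # The hashed data is "commit <size in bytes>\0" followed by the body.
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


# Hypothetical identity and message, for illustration only.
who = "Jane Doe <jane@example.com> 1500000000 +0200"
original = git_commit_id("563ccb5109fbf0a01d99517ca1dbe15db349592d", [], who, who, "Initial commit\n")
# Same metadata, but pointing at a different tree (here: git's empty tree).
tampered = git_commit_id("4b825dc642cb6eb9a060e54bf8d69288fbee4904", [], who, who, "Initial commit\n")
print(original != tampered)  # True
```

Because the tree hash is one of the hashed lines, swapping in different file contents (and therefore a different tree) changes the commit hash even when every other field stays identical.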
EDIT: I started working through one of my commits:
gives:
Now I can look at the tree object that the commit points to:
git cat-file -p 563ccb5109fbf0a01d99517ca1dbe15db349592d
And I can continue deeper:
git cat-file -p d8fe4fa70f618843e9ab2df67167b49565c71f25
(which prints the content of my .gitignore file), or
git cat-file -p 256db03954535d25d5f340603e707207170f199c
(which lists the content of my "spec" directory).
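At the bottom of this chain, a file's (blob's) hash depends only on its bytes. A minimal Python sketch, assuming the content is exactly what is stored in the repository (no line-ending or other filters applied): git hashes a `blob <size>\0` header followed by the raw content. The file name is not part of the blob hash; it is recorded in the tree instead. The `.gitignore` contents below are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 git assigns to a file whose stored content is `content`."""
    # The hashed data is "blob <size in bytes>\0" followed by the raw file bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
# Hypothetical .gitignore contents: one added character gives an unrelated hash.
print(git_blob_id(b"node_modules\n"))
print(git_blob_id(b"node_modules/\n"))
```

For unfiltered content, `git hash-object` computes the same kind of value, so this can be cross-checked against a real repository.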
So, as you can see, the content of each and every file is recursively included: first in the SHA-1 of the file itself (its blob), then in the SHA-1 of every tree that contains it, up to the root source tree, and finally in the SHA-1 of the commit.
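The middle step, where a tree hashes the mode, name and SHA-1 of every entry it contains, can be sketched in Python as follows. This is a minimal sketch assuming only the modes `100644` (regular file) and `40000` (directory); the two entries reuse the .gitignore and spec hashes shown above, with their modes guessed. Since the full listing of my tree isn't shown here, the result is not expected to reproduce 563ccb5109fbf0a01d99517ca1dbe15db349592d.

```python
import hashlib


def git_tree_id(entries):
    """Sketch of how git hashes a tree object.

    entries: iterable of (mode, name, hex_sha1) tuples, e.g.
    ("100644", ".gitignore", "<40 hex chars>") or ("40000", "spec", "<40 hex chars>").
    """
    def sort_key(entry):
        mode, name, _ = entry
        # git sorts entries by name, comparing a directory as if its name ended in "/".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" followed by the 20 raw bytes of the child's SHA-1.
        body += mode.encode() + b" " + name.encode("utf-8") + b"\x00" + bytes.fromhex(hex_id)
    # The hashed data is "tree <size in bytes>\0" followed by the entries.
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


spec = ("40000", "spec", "256db03954535d25d5f340603e707207170f199c")
before = git_tree_id([("100644", ".gitignore", "d8fe4fa70f618843e9ab2df67167b49565c71f25"), spec])
# Same tree, but .gitignore replaced by an empty file (git's empty-blob hash).
after = git_tree_id([("100644", ".gitignore", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"), spec])
print(git_tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
print(before != after)  # True
```

Changing a single file changes its blob hash, which changes every tree on the path up to the root, which in turn changes the commit hash.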