I have the following understanding of git add file and git checkout -- file (but I am not sure if it is correct).

Whenever we edit files with a text editor, we do it in the working directory. Each time, we can move the file to the so-called staging area by executing git add file_name. If we edit the file again (after git add), we change the file in the working directory, so the working directory has the file in a "new" state while the staging area still has it in the "old" state. When we use git add again, we bring the file in the staging area to the "new" state (the state from the working directory).

If we do git checkout -- file_name, I assume that we take the file from the staging area and use it to overwrite the file in the working directory. In this way we can bring the file in the working directory back to the "old" state. Is that correct?

What is also not clear to me is whether we copy or move the file from the staging area. In other words, does git checkout -- file change the state of the file in the staging area? Can we say that after git checkout -- file, the file in the staging area goes back to the state it previously had in the staging area?
It's almost, but not quite, that symmetric.
It's true that git add file copies the file to the stage (aka "index"). However, the way it does so is a bit weird.

Inside a git repo, everything is stored as a git "object". Each object has a unique name, its SHA-1 (those 40-character strings like 753be37fca1ed9b0f9267273b82881f8765d6b23, which is from an actual .gitignore I have here). The name is constructed by computing the hash of the file's contents (more or less: there's some gimmicking to make sure you don't make a file out of a directory tree or commit and cause a hash collision, for instance). Git assumes that, no matter the contents, the SHA-1 will be unique: no two different files, trees, commits, or annotated tags will ever hash to the same value.

Files (and symbolic links) are objects of type "blob". So a file that's in the git repo is hashed, and somewhere git has a mapping from "file named .gitignore" to "hash value 753be37fca1ed9b0f9267273b82881f8765d6b23".

In the repo, directory trees are stored as objects of type "tree". A tree object contains a list of names (like .gitignore), modes, object types (another tree or a blob), and SHA-1s. A commit object gets you (or git) a tree object, which eventually gets you the blob IDs.
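To make the "gimmicking" a little more concrete: in the default SHA-1 object format, git hashes a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the raw content. The sketch below computes object names that way in plain Python. hash_object here is a name made up for the example, not a git API, and it ignores filters such as line-ending conversion. Because the type word is part of the hashed bytes, a blob and a tree built from identical bytes still get different names.

```python
import hashlib


def hash_object(data: bytes, obj_type: str = "blob") -> str:
    """Return the SHA-1 object name git would assign to `data`.

    Git hashes the header "<type> <size-in-bytes>\\0" followed by the raw
    content; putting the type in the header keeps a blob and a tree with
    the same bytes from sharing a name.
    """
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same bytes that `echo hello | git hash-object --stdin` hashes.
    print(hash_object(b"hello\n"))        # ce013625030ba8dba906f756967f9e9ca394464a
    # Empty content: only the type word in the header differs.
    print(hash_object(b"", "blob"))       # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(hash_object(b"", "tree"))       # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

On a real repository, comparing the result with git hash-object on the same file is a quick way to check the sketch.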
The staging area ("index"), on the other hand, is simply a file, .git/index. This file contains¹ the name (in a funny, slightly compressed form that flattens out directory trees), the "stage number" in the case of merge conflicts, and the SHA-1. The actual file contents are, again, a blob in the git repo. (Git does not store directories in the index: the index only has actual files, using that flattened format.)

So, when you do git add file_name, git does this (more or less, and I'm deliberately glossing over filters):

1. Compute the hash of the contents of file_name (git hash-object -t blob).
2. If that object is not already in the repo, write it into the repo (the -w option to hash-object).
3. Update .git/index (or $GIT_INDEX_FILE) so that it has the mapping under the name file_name to the name that came out of git hash-object. This is always a "stage 0" entry (which is the normal, no-merge-conflict version).

Thus, the file isn't really "in" the staging area; it's really "in" the repo itself! What's in the staging area is the name-to-SHA-1 mapping.
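The three git add steps can be modeled with a few dictionaries. This is a hypothetical toy: ToyRepo, hash_blob, and the dictionary layout are invented for illustration and are not git internals, and it assumes a single regular file, no filters, and the SHA-1 object format. The index key is the full path ("src/a.txt"), mirroring the flattened names, and re-adding a changed file replaces the mapping while the earlier blob stays in the object store.

```python
import hashlib
from typing import Dict, Tuple


def hash_blob(data: bytes) -> str:
    """SHA-1 object name of a blob: the hash of b"blob <size>\\0" + content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class ToyRepo:
    """Hypothetical in-memory stand-in for the pieces discussed above.

    objects  : blob SHA-1 -> content        (plays the object store)
    index    : (path, stage) -> blob SHA-1  (plays .git/index)
    worktree : path -> content              (plays the working directory)
    """

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.index: Dict[Tuple[str, int], str] = {}
        self.worktree: Dict[str, bytes] = {}

    def add(self, path: str) -> str:
        """Sketch of `git add path` for one regular file (no filters)."""
        data = self.worktree[path]
        # Step 1: hash the contents as a blob.
        sha = hash_blob(data)
        # Step 2: store the blob only if the object store lacks it.
        self.objects.setdefault(sha, data)
        # Step 3: map the full, flattened path to the blob as a stage-0 entry,
        # dropping any conflict stages (1-3) the path may have had.
        for stage in (1, 2, 3):
            self.index.pop((path, stage), None)
        self.index[(path, 0)] = sha
        return sha


if __name__ == "__main__":
    repo = ToyRepo()
    repo.worktree["src/a.txt"] = b"old\n"
    first = repo.add("src/a.txt")
    repo.worktree["src/a.txt"] = b"new\n"
    second = repo.add("src/a.txt")
    # The index holds only the latest mapping for the path...
    print(repo.index)
    # ...while both blobs remain in the object store.
    print(first in repo.objects, second in repo.objects)
```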
By contrast, git checkout [<tree-ish>] -- file_name does this:

- If given a <tree-ish> (commit name, tree-object ID, etc.; basically anything git can resolve to a tree), do the name lookup in the tree found by converting the argument to a tree object. Using the object ID thus located, update the hash in the index, as stage 0. (If file_name names a tree object, git recursively handles all the files in the directory the tree represents.) By creating stage 0 entries, any merge conflicts on file_name are now resolved.
- Otherwise, do the name lookup in the index (not sure what happens if file_name is a directory; probably git reads the working directory). Convert file_name to an object ID (which will be a blob by this point). If there is no stage-0 entry, error out with the "unmerged" message, unless given the -m, --ours, or --theirs option. Using -m will "un-merge" the file (remove the stage 0 entry and re-create the conflicted merge²), while --ours and --theirs leave any stage 0 entry in place (a resolved conflict stays resolved).
- In any case, if this has not yet errored out, use the blob SHA-1(s) thus located to extract the repo copy (or copies, if file_name names a directory) into the working directory.

So, the short version is "yes and no": git checkout sometimes modifies the index, and sometimes only uses it. However, the file itself is never stored in the index, only in the repo. If you git add a file, change it some more, and git add it again, this leaves behind what git fsck will find as a "dangling blob": an object with no reference.

¹ I'm deliberately omitting a lot of other stuff in the index that is there to make git perform well and to allow --assume-unchanged, etc. (These are not relevant to the add/checkout action here.)

² This re-creation respects any change to merge.conflictstyle, so if you decide you like diff3 output and already have a conflicted merge without the diff3 style, you can change the git config and use git checkout -m to get a new working-directory merge with the new style.
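The two checkout cases, and the dangling-blob remark, can be sketched with the same kind of toy model. Everything here is hypothetical: trees are simplified to flat path-to-SHA-1 dictionaries, -m/--ours/--theirs, directories, and filters are not modeled, and "dangling" only means "referenced by neither the index nor the trees passed in", which is much narrower than what git fsck really checks (it works from refs and commit history as well).

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

Tree = Dict[str, str]  # hypothetical flattened tree: path -> blob SHA-1


class UnmergedError(Exception):
    """Raised when the index has no stage-0 entry for the requested path."""


def hash_blob(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class ToyRepo:
    """Hypothetical in-memory model: object store, index, working directory."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}          # blob SHA-1 -> content
        self.index: Dict[Tuple[str, int], str] = {}  # (path, stage) -> SHA-1
        self.worktree: Dict[str, bytes] = {}         # path -> content

    def store(self, data: bytes) -> str:
        """Write a blob into the object store (if absent) and return its name."""
        sha = hash_blob(data)
        self.objects.setdefault(sha, data)
        return sha

    def checkout(self, path: str, tree: Optional[Tree] = None) -> None:
        """Sketch of `git checkout [<tree-ish>] -- path` for a single file."""
        if tree is not None:
            # <tree-ish> given: look the name up in the tree and write the
            # result into the index as stage 0, resolving any conflict.
            sha = tree[path]
            for stage in (1, 2, 3):
                self.index.pop((path, stage), None)
            self.index[(path, 0)] = sha
        else:
            # No <tree-ish>: only read the index; it is left unchanged.
            sha = self.index.get((path, 0))
            if sha is None:
                raise UnmergedError(f"path '{path}' is unmerged")
        # Either way, copy the blob's content into the working directory.
        self.worktree[path] = self.objects[sha]

    def dangling_blobs(self, *trees: Tree) -> Set[str]:
        """Blobs referenced by neither the index nor any of the given trees."""
        reachable = set(self.index.values())
        for tree in trees:
            reachable.update(tree.values())
        return set(self.objects) - reachable


if __name__ == "__main__":
    repo = ToyRepo()
    staged = repo.store(b"staged\n")
    repo.index[("f.txt", 0)] = staged
    repo.worktree["f.txt"] = b"edited, not staged\n"

    repo.checkout("f.txt")          # like `git checkout -- f.txt`
    print(repo.worktree["f.txt"])   # the staged content is back
    print(repo.index == {("f.txt", 0): staged})  # the index was not touched

    head: Tree = {"f.txt": repo.store(b"committed\n")}
    repo.checkout("f.txt", head)    # like `git checkout HEAD -- f.txt`
    print(repo.index[("f.txt", 0)] == head["f.txt"])  # the index was updated
    print(repo.dangling_blobs(head) == {staged})      # old staged blob orphaned
```

Running it shows the question's two scenarios side by side: checkout without a tree-ish copies from the index and leaves the index alone, while checkout with a tree-ish (here a stand-in for HEAD) rewrites the index entry first.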