In Git, what is the purpose of the index?

I understand that the typical Git workflow is a three-step process: (modify files in the work tree) -> (modify the index with git add/rm/etc.) -> (run git commit).

However, why doesn't Git just treat the work tree as the staging area? E.g., you modify a file and it's automatically "staged" for commit unless you specifically tell git not to stage it. This is more of an "opt-out" approach rather than "opt-in", which makes sense to me because 99% of the files in your work tree will be committed. It would also render the whole git stash mechanism redundant, as you could simply create a temporary branch from your work tree rather than save a stash to apply later on.

If there's a valid reason for the separation between the work tree and the index, I'd love to hear it... perhaps my confusion is stemming from the fact that I haven't fully got my head around Git as yet.

There are 2 answers

Bart Jedrocha answered:

The Git site actually has a good explanation of what the staging area is, the reason why it exists, the benefits it offers, and a way to bypass it when making your commits.

Taken from http://git-scm.com/about/staging-area

Staging Area

Unlike the other systems, Git has something called the "staging area" or "index". This is an intermediate area where commits can be formatted and reviewed before completing the commit.

One thing that sets Git apart from other tools is that it's possible to quickly stage some of your files and commit them without committing all of the other modified files in your working directory or having to list them on the command line during the commit.

This allows you to stage only portions of a modified file. Gone are the days of making two logically unrelated modifications to a file before you realized that you forgot to commit one of them. Now you can just stage the change you need for the current commit and stage the other change for the next commit. This feature scales up to as many different changes to your file as needed.

Of course, Git also makes it easy to ignore this feature if you don't want that kind of control — just add a '-a' to your commit command in order to add all changes to all files to the staging area.
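
A small sketch of that shortcut, with made-up file names in a throwaway repository: git commit -a picks up the modification to a file Git already tracks, but a brand-new file that was never added is still left out, so "all files" here is best read as "all tracked files".

```shell
#!/bin/sh
# Sketch: what "git commit -a" does and does not pick up.
set -eu

repo=$(mktemp -d)                      # throwaway repository for the demo
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"    # hypothetical identity
git config user.email "user@example.com"

echo "version one" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "version two, edited" > tracked.txt   # modify a tracked file, no git add
echo "brand new" > untracked.txt           # new file, never added

# -a stages modified and deleted tracked files, then commits.
git commit -q -a -m "Commit with -a"

committed=$(git diff --name-only HEAD~1 HEAD)   # files in the new commit
leftover=$(git status --porcelain)              # what is still outstanding
echo "committed: $committed"
echo "leftover:  $leftover"
```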

Hope this helps. Cheers!

torek answered:

Mercurial is basically "git without an index", and as such, is proof that it can be done. Any "why" question falls into a lot of grey areas. But this is what the index provides:

  • Very fast "git commit": the index already contains the next commit; it merely needs to be re-formatted into tree objects and a final commit object.

  • A staging area, so that you can make the commit not match the working directory. (Not everyone believes this is a good thing.) In Mercurial, to commit files A and C while omitting the work version of B, you must issue the commit command with the names of all the files to include (or exclude) spelled out all at once. In git, you can set up a stage, run git diff --cached (or --staged), decide if it's right or needs tweaking, git add or git reset to adjust the stage, run another git diff --cached, and so on. (In Mercurial I've found it's by far easiest to achieve the same thing by moving all the "unwanted" changes out of the repository area, hg commit, then move those changes back.)

  • Ease of amending unpublished commits. Having gotten used to the staging area and amending process in git, when I did an amend in Mercurial I was surprised to find my entire current work tree became the new amended commit. (I should not have been surprised, but I was! Again this goes back to the different philosophy: in hg, you move the "not ready yet" parts out of the repo entirely, lest they sneak in.)

  • Tricky hacks. (Again, not everyone believes this is a good thing.) In particular you can set bits in the index like "assume unchanged" or "intent to add", which affect future commits (again because the index is, in a sense, "the next commit being built").

  • A way to hold on to, and hence easily access, files being merged, when there is a merge conflict.
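
The first two points can be sketched roughly as follows. The files A, B and C come from the example above; the repository, file contents and identity settings are invented for the demo. The script builds a stage, inspects it with git diff --cached, takes B back out with git reset, and then uses git write-tree to show that the index can already be written out as the tree the commit will use.

```shell
#!/bin/sh
# Sketch: build a stage, review it, adjust it, then commit it.
set -eu

repo=$(mktemp -d)                      # throwaway repository for the demo
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"    # hypothetical identity
git config user.email "user@example.com"

# Three tracked files: A, B and C.
for f in A B C; do echo "original $f" > "$f"; done
git add A B C
git commit -q -m "Add A, B and C"

# Modify all three in the work tree.
for f in A B C; do echo "edited $f, work in progress" > "$f"; done

# First attempt at a stage: B was not meant to go in.
git add A B C
first_stage=$(git diff --cached --name-only | paste -s -d ' ' -)

# Adjust: take B back out of the index; the work tree copy is untouched.
git reset -q -- B
second_stage=$(git diff --cached --name-only | paste -s -d ' ' -)

# The index can be written out as a tree object before any commit exists.
staged_tree=$(git write-tree)

git commit -q -m "Update A and C"
committed_tree=$(git rev-parse 'HEAD^{tree}')

still_modified=$(git diff --name-only)  # B is still waiting in the work tree
echo "first stage:  $first_stage"
echo "second stage: $second_stage"
echo "same tree:    $staged_tree = $committed_tree"
echo "not yet committed: $still_modified"
```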

That last one deserves some extra explanation. Suppose you are merging old-fix into feature, and in feature, a file was renamed to FA. The merge realizes that it needs to take the file (under its old name) from branch old-fix and FA from feature, and merge them. But there is a merge conflict, so the version control system stops and needs your help to complete the merge.

Now suppose you want to look at the version of FA in old-fix, comparing it to FA in feature. If you literally check out branch old-fix, there is no file named FA! But git stores it in the index so that you can see it, without having to know what the old name was, since the index is constructing the next commit (which retains the new FA name).

You can also look at the feature version, although of course that's easier since you know it's named FA. But it's there in the index. In addition, the common (base) version (which is also under the same old name as in old-fix) is in the index, as noted in gitrevisions:

A colon, optionally followed by a stage number (0 to 3) and a
colon, followed by a path, names a blob object in the index at the
given path. A missing stage number (and the colon that follows it)
names a stage 0 entry. During a merge, stage 1 is the common
ancestor, stage 2 is the target branch's version (typically the
current branch), and stage 3 is the version from the branch which
is being merged.

That is, :1:FA is the common ancestor for file FA, :2:FA is feature's version, and :3:FA is old-fix's version.
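
Here is a rough way to try this out in a throwaway repository. The old file name (old-name), the file contents and the identity settings are all made up for the demo. It also assumes Git's rename detection pairs old-name with FA, which it should here because only one line out of twenty differs.

```shell
#!/bin/sh
# Sketch: inspect index stages 1-3 after a conflicted merge across a rename.
set -eu

repo=$(mktemp -d)                      # throwaway repository for the demo
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"    # hypothetical identity
git config user.email "user@example.com"

# Common ancestor: the file still has its old (hypothetical) name.
seq 1 20 | sed 's/^/line /' > old-name
git add old-name
git commit -q -m "Base"
git branch old-fix
git checkout -q -b feature

# feature: rename the file to FA and edit line 5.
git mv old-name FA
sed '5s/.*/line 5 (feature)/' FA > FA.tmp && mv FA.tmp FA
git commit -q -a -m "Rename to FA and edit"

# old-fix: edit the same line under the old name.
git checkout -q old-fix
sed '5s/.*/line 5 (old-fix)/' old-name > old-name.tmp && mv old-name.tmp old-name
git commit -q -a -m "Fix under the old name"

# Merge old-fix into feature; the conflict makes git merge exit non-zero.
git checkout -q feature
git merge --no-edit old-fix > /dev/null 2>&1 || true

# List the unmerged index entries as stage:path pairs.
stages=$(git ls-files -u | awk '{print $3 ":" $4}' | paste -s -d ' ' -)
base_line=$(git show :1:FA | sed -n 5p)     # common ancestor
ours_line=$(git show :2:FA | sed -n 5p)     # feature (current branch)
theirs_line=$(git show :3:FA | sed -n 5p)   # old-fix (branch being merged)
echo "stages: $stages"
echo "base:   $base_line"
echo "ours:   $ours_line"
echo "theirs: $theirs_line"
```

Note that none of the git show commands needed the old file name: all three versions are reachable through the index under FA.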

All of this fine control leads beginners (and sometimes even experts) into mistakes, so Mercurial's index-less approach may suit your work better. However, git commit -a gives you essentially the same behavior as Mercurial, so you can often ignore the index until you want it.