Let's say I've written a method in the feature1 branch, and after some time I realize that I need this piece of code in another branch, feature2, as well.
So I just copy/paste the code from feature1 into feature2, and work continues on both branches simultaneously. I cannot merge feature1 into feature2, because then the reviewers of feature2 would have to check the changes from feature1 as well.
Then I ask reviewers to review both features.
Assume feature1 is merged into master and then I want to merge feature2 into master as well. But because of the copy/paste I get a merge conflict, so I have to ask for reviews again.
This is not a problem per se. But is there a way to avoid this conflict?
Your question starts with some incorrect assumptions:
Let's take a look at what it means to merge in Git, and some of the common things that go wrong.
Git is about commits
Those new to Git often think Git is about files or branches, but it's not. Git is about commits. A commit holds files—each commit has a full snapshot of every file, in fact—and we organize and find our commits using branches, so files and branches have a part. But the heart and soul of Git is the commit.
Git stores these commits in a big database full of Git "objects". To be exact, there are four kinds of objects internally: blob, tree, commit, and annotated tag. For the most part, though, humans deal only with the commit objects. The commit is, as it were, the unit of work in a Git repository, so that's the level at which we deal with Git.
Unfortunately for us humans, Git's commits are numbered, with big ugly random-looking numbers that have no rhyme or reason.¹ They look like 1bcf4f6271ad8c952739164d160e97efd579424f, for instance. Humans can't deal with these, so we just don't. Git helps us out by adding, separately from the big database of commits and other objects, a smaller database of names, including branch and tag names. A name like refs/heads/main or refs/heads/master is a branch name, and it turns into the big ugly hash ID for us. So we can give Git a branch name, and Git will fish out the right hash ID and use that to fish out the right commit.

That's how and why we can use names like feature1 and feature2. These names mean less than nothing to Git. Git doesn't really need them and does not care how we spell them or what we do with them (we can rename them whenever we like, for instance); it just provides them for us to use so that we don't have to memorize hash IDs. Git turns the names into hash IDs, finds the commits by their hash IDs, and gets to work. So Git isn't using your branch name at this point: it's only using the commit itself, which Git found by its hash ID.

This is how and why Git is all about the commits. We use branch names, but Git mostly doesn't. I say mostly because we're about to hit the point where Git does use branch names, to keep them up to date for us.
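To make the two databases concrete, here is a toy Python model, not Git's actual code: one dictionary stands in for the object database (hash ID to object) and another for the name database (full ref name to hash ID). The one detail borrowed from real Git is the hash input: Git hashes an object as SHA-1 over a header made of the object type, a space, the size in bytes, and a NUL byte, followed by the content. The helper names store and resolve are hypothetical, chosen for this sketch.

```python
import hashlib

# Toy object database: hash ID -> (object type, raw content).
objects: dict[str, tuple[str, bytes]] = {}

# Toy name database: full ref name -> hash ID.
refs: dict[str, str] = {}


def store(kind: str, body: bytes) -> str:
    """Store an object and return its hash ID.

    Mimics Git's object hashing: SHA-1 over "<kind> <size>", a NUL byte,
    and then the content itself.
    """
    header = f"{kind} {len(body)}".encode() + b"\0"
    hash_id = hashlib.sha1(header + body).hexdigest()
    objects[hash_id] = (kind, body)
    return hash_id


def resolve(name: str) -> str:
    """Turn a short or full branch name into the hash ID it holds."""
    full = name if name.startswith("refs/") else "refs/heads/" + name
    return refs[full]


# The body here is placeholder text, not a real commit's format.
tip = store("commit", b"pretend snapshot and metadata\n")
refs["refs/heads/main"] = tip
kind, body = objects[resolve("main")]  # name -> hash ID -> object
print(kind, tip)
```

Renaming a branch in this model would just move a key in refs; nothing in objects changes.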
¹ Technically, they're just outputs from a cryptographic hash function. Traditionally, Git uses SHA-1, but Git now supports SHA-256 as well. There's ongoing work on making this more useful: for now, if you're a Git user as opposed to a Git developer, you'll most likely just be using SHA-1.
What's in a commit
Remember that each commit in Git has one of those big ugly hash IDs. These are unique to that particular commit: no other Git commit, anywhere, is ever allowed to use that hash ID again.² So if we take any two Git repositories out on leashes to the Git-repository-park (like taking the dog to the dog park), they can go sniff each other and decide which commits one has that the other doesn't, just by looking at hash IDs. Then one repository can get the other's commits, knowing from the hash IDs alone which commits it is missing.
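A rough sketch of the dog-park comparison, using made-up short hash IDs: because a hash ID names the same commit in every repository, a set difference over the IDs is enough to decide which commits to send. Git's real fetch and push protocols negotiate this more cleverly; the function names here are hypothetical.

```python
# Two toy repositories: hash ID -> commit. IDs and contents are made up.
repo_a = {"a1f3": "commit 1", "b2e4": "commit 2", "c3d5": "commit 3"}
repo_b = {"a1f3": "commit 1", "b2e4": "commit 2", "d4c6": "commit 4"}


def missing_from(receiver: dict[str, str], sender: dict[str, str]) -> set[str]:
    """Hash IDs the sender has that the receiver lacks.

    No commit contents are compared: the IDs alone are enough,
    because one ID always means one particular commit.
    """
    return set(sender) - set(receiver)


def fetch(receiver: dict[str, str], sender: dict[str, str]) -> None:
    """Copy only the commits the receiver is missing."""
    for hash_id in missing_from(receiver, sender):
        receiver[hash_id] = sender[hash_id]


print(missing_from(repo_a, repo_b))  # {'d4c6'}
fetch(repo_a, repo_b)
print(sorted(repo_a))  # ['a1f3', 'b2e4', 'c3d5', 'd4c6']
```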
We won't worry about this exchange-of-commits stuff here—that's the distributed part of Git being a distributed version control system—but it's important to keep this "unique hash ID" thing in mind as we look at what's inside any one given commit:
Each commit has a full snapshot of every source file, frozen for all time. If you have that commit, you have that snapshot. (If you don't have that commit, you might still have that snapshot—two or more commits might have the same snapshot—but you can't tell for sure: you have to have the commit.) The files in these snapshots are stored cleverly, with compression and de-duplication, so that the repository doesn't grow tremendously fat as we add new commits: most files are either total duplicates, or near-duplicates, of some earlier file, and this gets compressed down to nothing at all (a duplicate) or very little storage. We'll skip all the details, even though they're mesmerizing.
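The exact-duplicate half of that de-duplication falls out of hashing: identical content gets an identical ID, so it only needs to be stored once. The toy below (plain SHA-1 of the content, without Git's header) shows two snapshots sharing one stored file. The near-duplicate delta compression Git performs in pack files is not modeled.

```python
import hashlib

# Toy file store: content hash -> content. Each distinct content is kept once.
blobs: dict[str, bytes] = {}


def save_snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Freeze a snapshot as path -> content hash, storing only new contents."""
    snapshot = {}
    for path, content in files.items():
        hash_id = hashlib.sha1(content).hexdigest()  # toy hash, no Git header
        blobs.setdefault(hash_id, content)  # an identical file is reused, not copied
        snapshot[path] = hash_id
    return snapshot


first = save_snapshot({"README": b"hello\n", "main.py": b"print(1)\n"})
second = save_snapshot({"README": b"hello\n", "main.py": b"print(2)\n"})
print(len(blobs))  # 3: both snapshots point at the same README content
```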
Meanwhile, each commit stores some metadata, or information about the commit itself. This includes the name and email address of the author of the commit, for instance.
The metadata in any one commit aren't necessarily huge (or small—your commit log message goes here, so if you write a really big one, it's here, occupying that space). But besides your name and email address, Git adds, to each commit, a list of parent commit hash IDs. This list is usually exactly one entry long.
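A commit, then, is a read-only record of a snapshot plus metadata, including that parent list. The frozen dataclass below is a simplified, hypothetical stand-in (real commit objects record a tree ID, author and committer lines with timestamps, and the message, in a specific text format). What it does show is that the hash ID is computed over everything, the parents included, so the same snapshot with a different parent is a different commit.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Commit:
    snapshot: str              # stand-in for the snapshot's hash ID
    author: str
    message: str
    parents: tuple[str, ...]   # usually exactly one parent hash ID

    def hash_id(self) -> str:
        """Toy hash over every field, the parent list included."""
        text = "\n".join([self.snapshot, self.author, self.message, *self.parents])
        return hashlib.sha1(text.encode()).hexdigest()


me = "A U Thor <author@example.com>"
root = Commit("snap-1", me, "first commit", ())
child = Commit("snap-2", me, "second commit", (root.hash_id(),))

# Same snapshot and message but a different parent: a different commit.
other = Commit("snap-2", me, "second commit", ("0" * 40,))
print(child.hash_id() != other.hash_id())  # True
```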
What this means is that, given the latest commit, we can have Git work backwards to find the second-latest commit, all on its own. Let's draw this. Suppose the latest commit has some big ugly hash ID that we won't try to guess but will just call H, for "hash". We'll draw it like this:

                <-H

Commit H has a little arrow sticking out of it in our drawing. In reality, commit H has a hash ID in its metadata, and that hash ID is the hash ID of the commit that comes before H. Let's call that commit G and draw it in:

            <-G <-H

Of course, commit G is a commit, so it has a hash ID "arrow" like H's. Commit G's arrow points to the commit that comes before G, which we'll call F:

    ... <-F <-G <-H

Commit F has an arrow that points to its parent, and so on. So all we have to do, to have Git find all the commits, is somehow have Git find the latest commit H.

Well, we just said earlier that a branch name like main or feature1 stores a hash ID. So this name points to H, just like H points to G, and so on:

    ... <-F <-G <-H   <-- main

One of the tricks that Git has to use, to keep the hash IDs working, is that all parts of any commit are frozen for all time. That includes the hash IDs that point backwards to previous commits. So H will always point to G, which will always point to F, and so on. As such, I get to be a little lazy about drawing the arrows that connect commits to each other.

This is not the case for branch names. The arrows in a branch name move.
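Here is the backward walk as a small sketch, with single letters standing in for hash IDs and a dictionary standing in for the parent links (F is treated as the very first commit so the walk ends). It follows only first parents, roughly what git log --first-parent does. The second half only adds a commit and moves the name; F, G, and H are never touched.

```python
# Toy history: commit -> list of parent commits. Letters stand in for hash IDs.
parents = {"F": [], "G": ["F"], "H": ["G"]}
branches = {"main": "H"}  # a branch name holds one commit's ID


def walk_back(name: str) -> list[str]:
    """Start at the commit a name points to and follow first parents backward."""
    result = []
    commit = branches[name]
    while commit is not None:
        result.append(commit)
        # A commit with no parents ends the walk.
        commit = parents[commit][0] if parents[commit] else None
    return result


print(walk_back("main"))  # ['H', 'G', 'F']

# Adding a commit leaves F, G and H untouched; only the name's arrow moves.
parents["I"] = ["H"]
branches["main"] = "I"
print(walk_back("main"))  # ['I', 'H', 'G', 'F']
```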
² This constraint is relaxed a bit for two Git repositories that never meet. As long as they don't meet, the two separate repositories are allowed to re-use a hash ID by accident. This doesn't really happen in practice anyway, and besides, it's humans who control which repositories eventually meet. Git doesn't know what those crazy humans will do in the future, so Git just tries to ensure that every commit gets a totally unique hash ID.
Making a new commit
To make a new commit, we check out some branch with git checkout, or use git switch to "switch to" the branch, thus "checking it out", to the same effect. Git remembers which branch name we used by attaching the special name HEAD to one of the branch names in the repository. At this particular point we only have one name, main, so there's not much need for it, but we have this:

    ...--G--H   <-- main (HEAD)

Let's create a new branch name now, though. Let's create the name feature1. This name must point to some existing commit. We can pick any commit in the repository, but typically we'll pick the latest main-branch commit (or maybe the latest develop-branch commit or something, but for now we only have main anyway). So the new name feature1 will also point to commit H:

    ...--G--H   <-- feature1, main (HEAD)

Note how all the commits are on both branches. Both names select commit H right now. That's about to change, though.

We now use git switch feature1 or git checkout feature1 to select the name feature1, with which to select commit H. This changes our picture:

    ...--G--H   <-- feature1 (HEAD), main

We have not changed commits, so we are working with the same files, but we have changed which branch name we are using to find commit H.

Now we do our usual thing of modifying and git add-ing and git commit-ing. When Git is done making the new commit, the new commit holds a new snapshot of all of the files (frozen, compressed, de-duplicated, and read-only), and the new commit, which we will call commit I, has commit H as its parent:

    ...--G--H--I

But (here's Git's little magic trick) Git has stored I's hash ID in the current branch, the name to which HEAD is attached. So if we include the branch names in the picture, we now have this:

    ...--G--H   <-- main
             \
              I   <-- feature1 (HEAD)

New commit I is only on feature1 right now. Commits up through H continue to be on both branches. If we make another new commit J, we get:

    ...--G--H   <-- main
             \
              I--J   <-- feature1 (HEAD)

If we now git switch main or git checkout main, we get:

    ...--G--H   <-- main (HEAD)
             \
              I--J   <-- feature1

Git will remove, from our work area, the files from commit J, and put in place the files from commit H instead. (We haven't covered the working tree and Git's index here, and for space reasons, we won't.)

Let's now make a second branch name, feature2, that also points to commit H, and then switch to feature2:

    ...--G--H   <-- feature2 (HEAD), main
             \
              I--J   <-- feature1

As we make new commits on feature2, they cause feature2 to grow, just as happened with feature1:

              I--J   <-- feature1
             /
    ...--G--H   <-- main
             \
              K--L   <-- feature2 (HEAD)

So that's a branch
This is really what branches, in Git, are about. We call the latest commit, as found by some branch name, the tip commit of the branch. (That's an official Git term.) We call that commit plus some string of earlier commits "the branch", and we also call the name "the branch". So when someone says "branch feature1", they might mean:
- the name feature1; or
- commit J, the tip commit of feature1; or
- commits I-J, which are only on feature1 right now; or
- all the commits ending at J, including H and other commits that are also on main;

or perhaps some other thing. The word branch, in Git, is rather badly overused, and it's often a good idea to be more specific (you can say "branch name" or "tip commit" or "set of commits", for instance).
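The whole sequence described in this section (create a name, attach HEAD with a switch, make commits) fits in a small class. It is a hypothetical model, not how Git is written: it tracks only names and parent links, skips snapshots and the working tree entirely, and takes each new commit's ID as an argument instead of computing a hash.

```python
class ToyRepo:
    """Hypothetical model: branch names, HEAD, and parent links only."""

    def __init__(self) -> None:
        self.parents: dict[str, list[str]] = {"H": []}  # H: our starting tip
        self.branches: dict[str, str] = {"main": "H"}   # name -> tip commit
        self.head = "main"                              # name HEAD is attached to

    def create_branch(self, name: str) -> None:
        # A new name just points at the current commit; nothing is copied.
        self.branches[name] = self.branches[self.head]

    def switch(self, name: str) -> None:
        # Re-attach HEAD (the real command also swaps the working-tree files).
        self.head = name

    def commit(self, new_id: str) -> None:
        # The new commit's parent is the current tip, and the branch
        # that HEAD is attached to moves forward to the new commit.
        self.parents[new_id] = [self.branches[self.head]]
        self.branches[self.head] = new_id


repo = ToyRepo()
repo.create_branch("feature1")
repo.switch("feature1")
repo.commit("I")
repo.commit("J")
repo.switch("main")
repo.create_branch("feature2")
repo.switch("feature2")
repo.commit("K")
repo.commit("L")
print(repo.branches)  # {'main': 'H', 'feature1': 'J', 'feature2': 'L'}
```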
Merging
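When we have diverging branches like the above (feature1 and feature2 diverge from commit H and end at commits J and L respectively), we often later want to combine work. That is, given:

              I--J   <-- feature1
             /
    ...--G--H
             \
              K--L   <-- feature2

we'd like to get a single commit M that has, as its snapshot, a set of files that:

- match the files in commit H, except that
- they also include the changes made in commits I and J, and
- they include the changes made in commits K and L too.

We often achieve this in Git using git merge.

To run git merge, we:

- check out one of the two tip commits, usually by its branch name, with git switch or git checkout; and
- run git merge and give it the other commit's hash ID, usually by branch name.

So we run git switch feature1 && git merge feature2, or maybe git switch feature2 && git merge feature1.

When we do this, Git will:

- find the current commit (using HEAD and the branch name);
- find the other commit (using the name we gave to git merge);
- find the best common starting point of the two commits, called the merge base;
- compare the merge base against each of the two tip commits, combine the changes, and make a new merge commit.

Our goal, remember, is to combine work. Commits, however, don't contain work. They contain snapshots: complete archives of the entire source.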
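Finding the "best" common starting point can be sketched like this: collect every ancestor of each tip, intersect the two sets, and keep only the common ancestors that no other common ancestor descends from. The toy uses the feature1/feature2 graph with letters for hash IDs; git merge-base computes this kind of answer far more efficiently and has rules for histories where more than one best base exists.

```python
from collections import deque

# The graph from the text: feature1 is I-J, feature2 is K-L, both forked at H.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}


def ancestors(commit: str) -> set[str]:
    """The commit itself plus everything reachable through parent links."""
    seen: set[str] = set()
    queue = deque([commit])
    while queue:
        c = queue.popleft()
        if c not in seen:
            seen.add(c)
            queue.extend(parents[c])
    return seen


def merge_bases(a: str, b: str) -> set[str]:
    """Common ancestors that are not themselves ancestors of another common one."""
    common = ancestors(a) & ancestors(b)
    return {
        c for c in common
        if not any(c in ancestors(other) for other in common if other != c)
    }


print(merge_bases("J", "L"))  # {'H'}
```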
So, by finding a "best" common starting point (which in this case is obviously commit H), Git can simply compare the files in commit H with those in commit J to see what changed on feature1.

The output of this comparison is a file-by-file list of the files that changed between the two commits, with a line-by-line set of changes for each changed file. Files that didn't change at all, that stayed the same from H to J, aren't mentioned. That's what you'll see if you run git diff on commits H and J, and that's what git merge will see.

Having figured out which files changed, and what changed in them, from H to J, Git now runs the same kind of comparison from commit H to commit L. As before, this finds out which files were changed and what changed within those files, line by line.

The git merge command now combines the changes. If "we" (H-vs-J, if we're on feature1 now) touched some file and "they" (H-vs-L) didn't, Git keeps our changes. If we didn't touch the file but they did, Git keeps their changes. If we both touched the file, Git tries to combine our changes.

You get a merge conflict if and when we and they made different changes to the same source lines. You also get a merge conflict if we and they touched two line ranges that "touch at the edges" (abut). All this means is that Git is not sure how to combine these changes. Your job as the programmer is to provide the correct combination.
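The file-level rules just described can be sketched as a three-way comparison of snapshots, with each snapshot modeled as a dict from path to contents. This is a simplification: where both sides changed the same file, real Git goes on to combine the changes line by line, whereas this toy simply reports a conflict. Note the a.py case: both sides made the identical change, so there is nothing to disagree about. Git's line-level merge likewise accepts identical changes made on both sides.

```python
def merge_snapshots(base: dict[str, str], ours: dict[str, str], theirs: dict[str, str]):
    """File-level three-way merge of path -> contents.

    Returns (merged snapshot, list of conflicted paths). Simplified: Git would
    try a line-by-line combination before declaring a conflict.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:       # identical on both sides (changed the same way, or not at all)
            result = o
        elif o == b:     # we didn't touch it, they did: keep theirs
            result = t
        elif t == b:     # they didn't touch it, we did: keep ours
            result = o
        else:            # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:  # None means the file doesn't exist on that side
            merged[path] = result
    return merged, conflicts


base = {"a.py": "v1", "b.py": "v1", "c.py": "v1"}
ours = {"a.py": "v2", "b.py": "v1", "c.py": "ours"}
theirs = {"a.py": "v2", "b.py": "v3", "c.py": "theirs"}
print(merge_snapshots(base, ours, theirs))
# ({'a.py': 'v2', 'b.py': 'v3'}, ['c.py'])
```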
That's what a merge conflict is about: Git isn't sure if taking the changes line-by-line is right. If you don't get a merge conflict, Git is sure that taking the changes line-by-line is right, even if it isn't actually right. Git is not smart here: Git is following ridiculously simple rules about text lines.
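To see how a clean merge can still be wrong, here is a line-by-line version of the same rules. It only handles the simplest case, where neither side inserted or deleted lines, so line i of each version lines up with line i of the others (real Git uses a diff to match lines up, and also treats abutting changes as conflicts, which this toy does not). In the example, our side renames a function and their side adds a call to the old name. The changed lines never collide, so the merge is clean, yet the merged program would fail with a NameError if run.

```python
def merge_lines(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Line-by-line three-way merge, for the simple case where no lines were
    inserted or deleted on either side. Raises ValueError on a conflict."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles equal-length versions")
    merged = []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t or t == b:   # same line on both sides, or only we changed it
            merged.append(o)
        elif o == b:           # only they changed it
            merged.append(t)
        else:                  # both changed this line differently
            raise ValueError(f"conflict at line {number}")
    return merged


base = ["def total(xs):", "    return sum(xs)", "", "print('ready')"]
ours = ["def add_up(xs):", "    return sum(xs)", "", "print('ready')"]         # we rename
theirs = ["def total(xs):", "    return sum(xs)", "", "print(total([1, 2]))"]  # they call it

result = merge_lines(base, ours, theirs)
# No conflict, but the merged code calls total(), which no longer exists.
print("\n".join(result))
```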
Once you fix the merge conflicts, or if Git has no merge conflicts, Git makes a new commit as usual. The one thing that is special about this new commit M is that instead of just the one parent J, it has a second parent, L: the commit we said that Git should merge. Git stores the new merge commit's hash ID into the current branch name as usual, so we get:

              I--J
             /    \
    ...--G--H      M   <-- feature1 (HEAD)
             \    /
              K--L   <-- feature2

Because commit M connects backwards to both commits J and L, commits K-L, which used to be only on feature2, are now on both branches. Commits I-J-M are still only on feature1 here because L is still the tip commit of feature2, and Git can only work backwards, not forwards. So from L we go backwards to K, then H, then G, never seeing commits I-J-M.

Trivial merges
Sometimes we make merges that are really easy for Git:

    ...--G   <-- main (HEAD)
          \
           H--I   <-- feature
We run git switch main and then git merge --no-ff feature (the --no-ff is required to make this act like GitHub's "merge" button; it defeats a short-cut that Git normally takes here). Git finds the common starting commit, but that's commit G, which is also the tip commit of main. So a full merge consists of:

- comparing the snapshot in G against the snapshot in G to see what changed (nothing);
- comparing the snapshot in G against the one in I to see what changed;
- applying both sets of changes to the snapshot in G, getting the same snapshot that is in I already; and
- making a new merge commit, with G and I as its two parents.

The result looks like this:

    ...--G------M   <-- main (HEAD)
          \    /
           H--I   <-- feature

(I called the merge commit M again, for Merge; in reality it gets a unique hash ID, like every commit.) The snapshot in M is guaranteed to match the snapshot in I, because G-vs-G never has any changes to add, while G-vs-I always has exactly the changes that produce the I snapshot.

If we don't prevent Git from doing so, Git will turn this trivial merge into a fast-forward operation, which isn't really a merge at all. Instead of a new commit, we just get this:

    ...--G--H--I   <-- feature, main (HEAD)

That is, Git just scoots the name main forward two hops, like fast-forwarding a tape recorder. It's literally just a checkout that drags the name (in this case, main) forward. Git swaps out the commit-G files in our working tree for the commit-I files. No merge is needed, so no merge happens, and so no merge conflict can happen.
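The fast-forward decision comes down to one ancestry test: if the current tip is an ancestor of the commit being merged, the name can simply slide forward. The sketch below uses the main/feature graph from this section, models --no-ff as a flag, and leaves out combining snapshots. The function names and the merge commit ID "M" are placeholders for illustration, not real Git identifiers.

```python
# Graph from this section: main is at G, feature is at I (with H in between).
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"]}
branches = {"main": "G", "feature": "I"}


def is_ancestor(a: str, b: str) -> bool:
    """True if commit a is b itself or can be reached from b via parent links."""
    stack = [b]
    while stack:
        c = stack.pop()
        if c == a:
            return True
        stack.extend(parents[c])
    return False


def merge(current: str, other: str, no_ff: bool = False) -> None:
    """Toy merge of branch `other` into branch `current` (snapshots not modeled)."""
    ours, theirs = branches[current], branches[other]
    if is_ancestor(theirs, ours):
        return  # already up to date: nothing to do
    if is_ancestor(ours, theirs) and not no_ff:
        branches[current] = theirs  # fast-forward: just slide the name
        return
    # A real merge (or a forced one): a new commit with two parents.
    merge_id = "M"  # placeholder ID; Git would compute a hash
    parents[merge_id] = [ours, theirs]
    branches[current] = merge_id


merge("main", "feature")
print(branches["main"])  # I: fast-forward, no new commit

branches["main"] = "G"  # put main back where it was
merge("main", "feature", no_ff=True)
print(branches["main"], parents["M"])  # M ['G', 'I']
```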
--no-ff(no fast forward) and the merge happens and you get a new merge commit. Sometimes you want this (for release tagging purposes for instance) and sometimes you don't care. To know whether you want it, you need to know that Git is all about commits. A new commit gets a new, unique hash ID, which we can tell apart from every other commit. "Re-use" a commit like fast-forward does and we don't get a new commit and therefore it's the same old commit as before.Cherry-picking
Suppose we have:

              I--J   <-- br1
             /
    ...--G--H
             \
              K--L   <-- br2 (HEAD)
We suddenly realize that commit J, say, fixed a nasty bug that we also need fixed in br2. We could copy and paste the code changes from that commit, but if that commit exactly fixes the bug, it would be nice if we could get Git to compare the commit before that commit (commit I) to that commit to see what changed. That is, we'd like Git to diff the snapshots in I and J to see which files had what changes made.

Given that Git can do that easily, we have Git do it. Then we have Git apply those same changes to our current versions of those files in commit L. We could have Git just literally make the same changes to the same lines, but what happens if, say, the fix for thing.py is on line 45 in their version, but we added a new function near the top and the fix goes on line 70 in our version of thing.py?

Well, we can have Git apply the fix a lot more cleverly. If we have Git diff commit I's version of thing.py against commit L's version of thing.py, that will show our added function, and show that what was line 45 is now line 70. So Git will be able to apply their change to line 70, which is the correct line.

But hang on a minute. We're having Git compare the file in snapshot I to the file in snapshot J and also to the file in snapshot L. What were we doing a moment ago with git merge? We were doing the exact same thing: merge compares snapshots and combines changes.

That's exactly how cherry-pick works: it's literally a merge operation, with the merge base being forced. We're cherry-picking from commit J. Commit J's parent is commit I. So Git uses commit I as the merge base, commit J as "their" branch-tip commit, and our current commit L as "our" commit. Git makes the usual diffs, and then combines work as usual. The thing that's not like a merge is that, once Git is done with the combining-work part, Git makes an ordinary (non-merge) commit:

              I--J   <-- br1
             /
    ...--G--H
             \
              K--L--N   <-- br2 (HEAD)

New commit N will make the same changes to L that I-vs-J made to I, adjusted as necessary. The cherry-pick code uses the merge engine to achieve the "adjusted as necessary" part.

Cherry-picking can therefore get merge conflicts during the cherry-pick operation! That's normal, and as with git merge, it is nothing to be afraid of: you, as the programmer, merely need to supply the correct result. Whatever you tell Git is the correct result, Git will believe you: that's what goes into the new commit's snapshot.

If you had to modify the code when you cherry-picked it, it's very likely that you'll get a merge conflict later if you merge J and N (that is, the two branches), for instance. That's because we took a change (to some set of lines) and modified the change, so later, when Git goes to combine changes, it will see slightly different changes that affect the "same lines", as it were. We'll have to resolve another merge conflict later. There's no guarantee that we won't have to resolve a merge conflict later even if this doesn't happen, though. Mostly, we just let the merge conflicts happen as they do, and fix them up manually. That's part of the job of being a programmer.
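Cherry-pick as "a merge with the base forced to the picked commit's parent" can be sketched with a small file-level three-way merge function. The snapshots are hypothetical: J fixes thing.py, our tip L changed other.py, and the two branches are assumed to have forked at H. The last lines show the later merge of the two branches: an unmodified pick merges cleanly because both sides carry the same fix, while a pick that was edited along the way conflicts. As before, real Git works line by line within each file.

```python
def three_way(base: dict[str, str], ours: dict[str, str], theirs: dict[str, str]) -> dict[str, str]:
    """File-level three-way merge of path -> contents; raises ValueError on a conflict."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:   # same change on both sides, or they didn't touch it
            m = o
        elif o == b:           # only they touched it
            m = t
        else:                  # both touched it, differently
            raise ValueError(f"conflict in {path}")
        if m is not None:      # None means the file is absent (deleted)
            merged[path] = m
    return merged


# Hypothetical snapshots. J fixes thing.py; our tip L changed other.py.
snap = {
    "H": {"thing.py": "buggy", "other.py": "v1"},
    "I": {"thing.py": "buggy", "other.py": "v1"},
    "J": {"thing.py": "fixed", "other.py": "v1"},
    "L": {"thing.py": "buggy", "other.py": "v2"},
}
parent_of = {"J": "I"}


def cherry_pick(commit: str, current: str) -> dict[str, str]:
    """A merge whose base is forced to the picked commit's parent."""
    return three_way(base=snap[parent_of[commit]], ours=snap[current], theirs=snap[commit])


snap["N"] = cherry_pick("J", "L")  # N's snapshot; N itself has just one parent, L
print(snap["N"])  # {'other.py': 'v2', 'thing.py': 'fixed'}

# Later merge of the two branches: base H, tips J and N.
print(three_way(snap["H"], snap["N"], snap["J"]))  # same fix on both sides: clean

# Had we edited the fix while picking it, the same merge would conflict.
edited_n = {"thing.py": "fixed, edited", "other.py": "v2"}
try:
    three_way(snap["H"], edited_n, snap["J"])
except ValueError as err:
    print(err)  # conflict in thing.py
```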