This post is specifically about a particular command in git. If you are new to git or don’t know what git is, then this post may not make much sense to you. Now that the statutory warning is out of the way, let’s carry on. This is actually a continuation of my previous blog post. The command we are going to talk about is ‘git pull’. This command brings the changes in the remote repository to where you keep your own code. It does this by bringing the local copy of the remote repository up to date first, and then merging the changes into your own branch and possibly your working copy. A lot of people use it very frequently without thinking about the possible side effects. When I first started out with git, I didn’t really concern myself with these side effects either. But programming purists would say that ‘git pull’ is risky and should be used with caution. Why is that?
They say history repeats itself, but what if it’s nonlinear?
One of the biggest advantages of having a version control system is the history of the code. We can go to any point in time and revert the code back to that state. Every time you commit the code, it saves that version so that you can access it later. By default, the ‘git pull’ command is equivalent to running ‘git fetch’ followed by ‘git merge’. It means that you bring your local copy of the remote branches up to date first, and then merge them into your current branch. Let’s say you have made some commits and you haven’t pushed them yet, and in the meantime someone else has pushed new commits to the remote. In that case, the merge part of ‘git pull’ results in a nonlinear history. Now why is that? Well, since your branch and the remote branch have each gained commits the other doesn’t have, git can’t simply fast-forward your branch to the current version on the remote repository. It has to create a merge commit that joins the two lines of work.
Okay so what if we have a nonlinear history? What’s the problem here? The thing is that this whole nonlinear history situation makes it more difficult to review the code history because it’s harder to see how a merge affects the code base. Also, it may disrupt continuous integration schemes. If an automated build relies on following the first-parent path of the history, it may stop working because there is no simple sequential relationship between commits anymore.
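To see the nonlinear history for yourself, here is a minimal sketch that builds a throwaway remote and two clones in a temporary directory. The names alice and bob, the file names, and the commit helper are all made up for illustration. It runs the two steps of a default ‘git pull’ by hand and counts the merge commits that end up on bob’s branch.

```shell
#!/bin/sh
# Sketch: fetch + merge with unpushed local commits creates a merge commit.
# The repo names (alice, bob) and file names are hypothetical.
set -eu
work=$(mktemp -d); trap 'rm -rf "$work"' EXIT
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore personal and system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {  # usage: commit <repo> <file> <message>
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

# A shared remote holding one initial commit on 'main'.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
commit "$work/alice" base.txt "base"
git -C "$work/alice" push -q -u origin main
git clone -q "$work/remote.git" "$work/bob"

# Alice pushes a commit; Bob commits locally without pushing.
commit "$work/alice" alice.txt "alice: upstream change"
git -C "$work/alice" push -q origin main
commit "$work/bob" bob.txt "bob: unpushed change"

# The two steps a default 'git pull' performs.
git -C "$work/bob" fetch -q origin
git -C "$work/bob" merge -q --no-edit origin/main

git -C "$work/bob" log --graph --oneline
merge_count=$(git -C "$work/bob" rev-list --merges --count HEAD)
echo "merge commits on bob's branch: $merge_count"
```

The graph printed at the end should show the two lines of work joined by a merge commit, which is exactly the fork in the history that the section describes.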
Reading reviews makes you thin-skinned
One of the most common practices in software development is code review. Whenever somebody makes some changes to the code, someone else should review it before it’s accepted. This makes the code more robust against bugs. Let’s say that you are in the middle of making some changes and someone else wants you to review some of their commits. Now the ‘git pull’ merge or rebase operation modifies the working directory and index. This means that your working directory and index must be clean before you can pull.
That seems like a straightforward thing, right? We can use ‘git stash’ and then ‘git pull’. This way, we can review that person’s changes. Now what do you do when you’re done reviewing? To get back to where you were, you have to undo the merge created by ‘git pull’ and apply the stash. We need something that doesn’t modify the working directory, like ‘git fetch --all -p’. With that, you can pause what you’re doing and review someone else’s commits without worrying about stashing or finishing up the commit you’re working on. If we use ‘git pull’, we don’t have that flexibility.
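Here is a minimal sketch of that review workflow, again with throwaway repositories whose names (alice, bob) and files are invented for the example. Bob has an uncommitted edit, fetches Alice’s work with ‘git fetch --all -p’, and looks at it through the upstream shorthand @{u}. The script compares ‘git status’ before and after to check that the fetch left his work in progress alone.

```shell
#!/bin/sh
# Sketch: 'git fetch' lets you review incoming commits with a dirty working tree.
# The repo names (alice, bob) and file names are hypothetical.
set -eu
work=$(mktemp -d); trap 'rm -rf "$work"' EXIT
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore personal and system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {  # usage: commit <repo> <file> <message>
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

# A shared remote holding one initial commit on 'main'.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
commit "$work/alice" base.txt "base"
git -C "$work/alice" push -q -u origin main
git clone -q "$work/remote.git" "$work/bob"

# Bob is in the middle of an edit; Alice pushes something for review.
echo "work in progress" >> "$work/bob/base.txt"
commit "$work/alice" alice.txt "alice: please review"
git -C "$work/alice" push -q origin main

before=$(git -C "$work/bob" status --porcelain)
git -C "$work/bob" fetch -q --all -p            # updates origin/* only
after=$(git -C "$work/bob" status --porcelain)

# Review what is upstream but not yet in Bob's branch.
git -C "$work/bob" log --oneline 'HEAD..@{u}'
git -C "$work/bob" diff --stat 'HEAD...@{u}'
incoming=$(git -C "$work/bob" rev-list --count 'HEAD..@{u}')
echo "incoming commits: $incoming"
```

No stash and no undo are needed afterwards, because nothing in the working directory or index was touched.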
There are good surprises and there are bad surprises
Another risk associated with ‘git pull’ is the surprise element that comes with it. There’s no way to predict what the working directory or index will look like until ‘git pull’ is done. It’s like jumping into a dark well and hoping that you won’t land on something really sharp and pointy! There might be merge conflicts that you have to resolve before you can do anything else. Merge conflicts refer to changes made by different people on the same part of the same file. Resolving them is usually not a smooth process. The pull might also introduce a large log file in your working directory because someone accidentally pushed it, or it might rename a directory you are working in, etc. The command ‘git remote update -p’ or ‘git fetch --all -p’ allows you to look at other people’s commits before you decide to merge or rebase, allowing you to form a plan before taking action.
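One way to form that plan is to compare the files changed on each side before merging anything. The sketch below is a rough file-level heuristic, not a real conflict detector: a file touched on both sides is only a candidate for a conflict. The repositories, file names, and the accidental debug.log are all invented for the example.

```shell
#!/bin/sh
# Sketch: after a fetch, list files changed both locally and upstream.
# File-level heuristic only; repo and file names are hypothetical.
set -eu
work=$(mktemp -d); trap 'rm -rf "$work"' EXIT
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore personal and system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {  # usage: commit <repo> <file> <message>
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

# A shared remote holding one initial commit on 'main'.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
commit "$work/alice" shared.txt "base"
git -C "$work/alice" push -q -u origin main
git clone -q "$work/remote.git" "$work/bob"

# Alice edits shared.txt and pushes a log file by accident.
commit "$work/alice" shared.txt "alice: edit shared"
commit "$work/alice" debug.log "alice: oops, a log file"
git -C "$work/alice" push -q origin main

# Bob edits the same file and adds one of his own, without pushing.
commit "$work/bob" shared.txt "bob: edit shared"
commit "$work/bob" notes.txt "bob: add notes"

git -C "$work/bob" fetch -q origin

# A...B diffs from the merge base of A and B up to B.
git -C "$work/bob" diff --name-only 'HEAD...@{u}' | sort > "$work/theirs.txt"
git -C "$work/bob" diff --name-only '@{u}...HEAD' | sort > "$work/mine.txt"

echo "files changed upstream:"; cat "$work/theirs.txt"
overlap=$(comm -12 "$work/mine.txt" "$work/theirs.txt")
echo "changed on both sides: $overlap"
```

The upstream list is also where an unexpected file such as the log would show up, before it ever lands in your working directory.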
Come hither, my remote branch, let me rebase onto you
Whenever we use ‘git pull’ while we have unpushed local commits and the remote has moved on, there is a merge commit that gets introduced. This is not exactly desirable, because it doesn’t really tell us anything. Each commit in your repo should indicate a specific change made to the codebase. Merge commits like these are noise that you just weed out. So what people do is they use ‘git pull’ to bring in the latest changes, and then they use ‘git rebase’ to eliminate the merge commit that ‘git pull’ introduced. Pretty neat! And we don’t have to worry about running two commands every time. Git has some config options, such as ‘pull.rebase’, where we can tell ‘git pull’ to perform a rebase instead of a merge.
If you have an unpushed merge that you want to preserve, neither a rebase-pull nor a merge-pull followed by a rebase will work out of the box. Why is that? Well, this is because ‘git rebase’ flattens merges unless you pass the --preserve-merges option (newer versions of Git replace it with --rebase-merges). In older versions of Git, the rebase-pull operation couldn’t be configured to preserve merges, and a merge-pull followed by a merge-preserving ‘git rebase’ won’t eliminate the merge caused by the merge-pull.
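As a rough illustration of the config route, the sketch below sets ‘pull.rebase’ in a throwaway clone and then runs a plain ‘git pull’ on a diverged branch. The repositories and file names are invented for the example, and the setting is made per-repository here only so the demo doesn’t touch your real configuration. Passing --rebase to ‘git pull’ on the command line has the same effect for a single pull.

```shell
#!/bin/sh
# Sketch: with pull.rebase set, 'git pull' replays local commits instead of merging.
# The repo names (alice, bob) and file names are hypothetical.
set -eu
work=$(mktemp -d); trap 'rm -rf "$work"' EXIT
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore personal and system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {  # usage: commit <repo> <file> <message>
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

# A shared remote holding one initial commit on 'main'.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
commit "$work/alice" base.txt "base"
git -C "$work/alice" push -q -u origin main
git clone -q "$work/remote.git" "$work/bob"

# The branches diverge: one pushed commit, one unpushed commit.
commit "$work/alice" alice.txt "alice: upstream change"
git -C "$work/alice" push -q origin main
commit "$work/bob" bob.txt "bob: unpushed change"

# Tell 'git pull' to rebase instead of merge (set per-repo for this demo).
git -C "$work/bob" config pull.rebase true
git -C "$work/bob" pull -q

git -C "$work/bob" log --graph --oneline
merge_count=$(git -C "$work/bob" rev-list --merges --count HEAD)
commit_count=$(git -C "$work/bob" rev-list --count HEAD)
echo "commits: $commit_count, merge commits: $merge_count"
```

Bob’s commit ends up on top of Alice’s in a straight line, so there is no merge commit to weed out later.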
What goes around, comes around
Let’s say someone on the team rebases a branch and force pushes it. Rebasing is the process of moving a branch to a new base commit, which rewrites its commits. Now you may ask, why would we want to force push a rewritten branch? In general, this shouldn’t happen. But it’s sometimes necessary when you accidentally commit and push something. Once the branch is rewritten, anybody who clones the repo afterwards will not be subjected to your mistake. It’s a good thing to have! The trouble starts for people who already have the old version. The merge done by ‘git pull’ will merge the new version of the upstream branch into the old version that still exists in your local repository. The commits that were supposed to disappear are now part of your history again, and if you push the result, they come right back to the remote.
Also, ‘git pull’ doesn’t prune remote tracking branches corresponding to branches that were deleted from the remote repository. For example, if someone deletes branch ‘mybranch’ from the remote repo, you’ll still see origin/mybranch. This leads to users accidentally resurrecting killed branches because they think they’re still active.
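The stale branch is easy to reproduce. In the sketch below (throwaway repositories, with invented names alice and bob), a branch is created on the remote and then deleted. A plain ‘git fetch’, which is the fetch that ‘git pull’ performs by default, keeps the leftover origin/mybranch around, while ‘git fetch --prune’ removes it.

```shell
#!/bin/sh
# Sketch: a deleted remote branch lingers locally until a pruning fetch.
# The repo names (alice, bob) are hypothetical.
set -eu
work=$(mktemp -d); trap 'rm -rf "$work"' EXIT
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore personal and system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A shared remote holding one initial commit on 'main'.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
echo base > "$work/alice/base.txt"
git -C "$work/alice" add base.txt
git -C "$work/alice" commit -q -m "base"
git -C "$work/alice" push -q -u origin main
git clone -q "$work/remote.git" "$work/bob"

has_branch() {  # does bob still have origin/mybranch?
  git -C "$work/bob" show-ref --verify --quiet refs/remotes/origin/mybranch
}

# Alice creates 'mybranch' on the remote; Bob fetches and sees it.
git -C "$work/alice" push -q origin main:refs/heads/mybranch
git -C "$work/bob" fetch -q origin

# Alice deletes the branch from the remote.
git -C "$work/alice" push -q origin --delete mybranch

# A plain fetch does not remove the stale remote-tracking branch.
git -C "$work/bob" fetch -q origin
if has_branch; then stale_after_fetch=yes; else stale_after_fetch=no; fi

# A pruning fetch does.
git -C "$work/bob" fetch -q --prune origin
if has_branch; then stale_after_prune=yes; else stale_after_prune=no; fi

echo "origin/mybranch after plain fetch:   $stale_after_fetch"
echo "origin/mybranch after pruning fetch: $stale_after_prune"
```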
I understand what the problem is, but is there an alternative here?
A lot of people use ‘git pull’ and it just works fine in most cases. This is because the team is usually small or people work on totally different parts of the codebase. But as the team grows and more people start working on the same part, it becomes extremely critical to manage your codebase diligently. Instead of ‘git pull’, git experts say that it is better to create and use the following ‘git up’ alias:
git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'
This alias downloads all of the latest commits from all upstream branches (pruning the dead branches) and tries to fast-forward the local branch to the latest commit on the upstream branch. If successful, then there were no local commits, so there was no risk of merge conflict. The fast-forward will fail if there are local unpushed commits, giving you an opportunity to review the upstream commits before taking action. This still modifies your working directory in unpredictable ways, but only if you don’t have any local changes. Unlike ‘git pull’, ‘git up’ will never drop you to a prompt expecting you to fix a merge conflict.
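Here is a minimal sketch of both outcomes, using throwaway repositories with invented names (alice, bob). The alias is the same as the one above, but it is set in the demo clone’s own config rather than with --global, purely so that running the script leaves your real configuration alone.

```shell
#!/bin/sh
# Sketch: 'git up' fast-forwards when it can and refuses when it cannot.
# The repo names (alice, bob) and file names are hypothetical.
set -eu
work=$(mktemp -d); trap 'rm -rf "$work"' EXIT
export HOME="$work" GIT_CONFIG_NOSYSTEM=1   # ignore personal and system git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {  # usage: commit <repo> <file> <message>
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

# A shared remote holding one initial commit on 'main'.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
commit "$work/alice" base.txt "base"
git -C "$work/alice" push -q -u origin main
git clone -q "$work/remote.git" "$work/bob"

# Same alias as in the post, but local to the demo repo instead of --global.
git -C "$work/bob" config alias.up '!git remote update -p; git merge --ff-only @{u}'

# Case 1: Bob has no local commits, so the branch just moves forward.
commit "$work/alice" a1.txt "alice: first change"
git -C "$work/alice" push -q origin main
if git -C "$work/bob" up; then case1=fast-forwarded; else case1=refused; fi

# Case 2: Bob has an unpushed commit, so the fast-forward is refused.
commit "$work/bob" bob.txt "bob: unpushed change"
commit "$work/alice" a2.txt "alice: second change"
git -C "$work/alice" push -q origin main
head_before=$(git -C "$work/bob" rev-parse HEAD)
if git -C "$work/bob" up; then case2=fast-forwarded; else case2=refused; fi
head_after=$(git -C "$work/bob" rev-parse HEAD)

echo "case 1: $case1"
echo "case 2: $case2"
```

In the second case bob’s branch stays exactly where it was, and the fetched commits are waiting in origin/main to be reviewed before he decides to merge or rebase.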
———————————————————————————————————
Excellent, finally I understand why I should use git fetch and then merge instead of git pull, a friend of mine told me but I didn’t get it very well, now I’m gonna apply this. Thank you