"Please Let Me Merge Before I Start Crying": And Other Things I've Said at The Git Terminal
Transcript
This transcript was generated automatically and may contain errors.
As I've grown into my role as a data scientist, I've often said this phrase whenever I felt victimized by Git. But today, I come to you from the other side of the elusive second learning curve that comes with using Git. We all face that initial challenge of just learning to use Git. But then there's an additional mountain to climb: learning how to handle merges and conflicts so you can use Git collaboratively.
This talk is geared toward those of you who might feel stuck right here in the middle. Maybe you've figured out the basics of Git and have even done a few merges already, but you might be a little hesitant, like I once was, to deal with merge conflicts. Understanding Git merges and conflicts felt impossible at first. But it all started to make sense when I realized that using Git collaboratively is a lot like driving.
The road trip analogy
So let me give an example. Last spring, I took a long road trip that I've done many times. However, this time was different because it was the first time I was doing the drive with my teething baby and my well-meaning, but slightly overbearing mother, Hinto. I love you, mom. For me, this scenario was similar to my experiences using Git collaboratively.
Just like I make compulsive packing lists before my road trips (yes, this is real), I also make compulsive task lists to prepare myself for Git merges. Getting my precious cargo from point A to point B on this trip felt a lot like getting my precious code from my local computer to the Git repository via a Git merge. My supportive husband, who I worked well with when parenting on this trip, could be any one of the awesome co-workers I work with outside of Git. Even having my mother in the car, providing maybe a little too much love and care to my baby, felt a bit like Git trying to help me with my code but actually creating more work, for justified reasons.
Now, needless to say, I was stressed on this trip, y'all. I knew how to take care of my baby and how to drive, but I suddenly felt like I had to learn these things all over again. And honestly, I felt the same way with Git. I realized that to work collaboratively with Git, I needed a better understanding of it, because that lack of knowledge led me to say things like, please let me merge before I start crying.
Git basics
Phrases like these stem from desperation and frustration because we're trying to work with something we don't understand. So before we talk about merges and conflicts and collaboration, let's talk about some quick things we should all understand about Git.
So first things first, Git is not the same thing as GitHub, and these terms are not interchangeable. Now, later on, I will briefly touch on how I use GitHub in my own personal workflows, but know for now that Git is the software that allows us to implement version control within our work, and GitHub is the web-based developer platform that uses the Git software.
Now, the fact that a lot of people mix up these terms just shows how many reliable ways we have to interact with Git. And if you're an R user like myself, there are three main ways: the command line interface (the terminal), your favorite IDE, whether that's RStudio or Positron with extensions, or a third-party client like GitHub Desktop. What's important is that you can use whatever works for you. GitHub is not the only developer platform out there, and your data science card is not going to be revoked for not exclusively using the terminal, okay? Just do what makes sense for your current skill set and situation.
What is a Git merge?
So, a Git merge is when we join two or more development histories, also known as branches, together. Merges come in different flavors, but we usually run into them when moving work between our local computers and a remote repository: when our pushed work gets merged into a shared branch, or the reverse, when we pull new work down from a remote repository, since a pull fetches that work and typically merges it into our local branch. No matter the type of merge, the action of merging is important because it allows us to safely combine work when collaborating with others. And honestly, at a high level, that's all a Git merge is.
And I don't know, I think that seems simple enough. So, if a Git merge is simple in theory, why is a Git merge scary? Well, I'm going to argue that it's not. Like, no, really, it's not scary. Now, I will say that if you feel you are afraid of Git merges, might I suggest that you're actually afraid of Git merge conflicts. And if you are, that's okay, this is a judgment-free zone. But I believe in order to start conquering our fears, we have to learn about what we're dealing with. So, let's take a moment to understand what exactly a Git merge conflict is.
So, a Git merge conflict occurs when competing changes are made to the same line of a file, or when someone edits a file and someone else deletes that same file. To simplify this even further, we can think of there being only two types of conflicts: a content conflict, which is a conflict in the actual lines of code, or a structural conflict, meaning the conflict involves the modification or deletion of a file itself.
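To make the idea of a content conflict concrete, here is a minimal Python sketch of the three-way comparison at its heart. It is a simplified illustration, not Git's actual merge algorithm: it assumes the common base version, your version (ours), and the other branch's version (theirs) all have the same number of lines, and the function name classify_lines is hypothetical. A line only conflicts when both sides changed it away from the base, and changed it differently.

```python
from typing import List


def classify_lines(base: List[str], ours: List[str], theirs: List[str]) -> List[str]:
    """Classify each line of a simplified three-way merge.

    Hypothetical illustration, not Git's algorithm: assumes all three
    versions have the same number of lines, so line i in one version
    corresponds to line i in the others.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")

    results = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            # Both sides agree, including the case where neither changed it.
            results.append("unchanged" if o == b else "same change")
        elif o == b:
            # Only their side changed this line, so theirs merges cleanly.
            results.append("take theirs")
        elif t == b:
            # Only our side changed this line, so ours merges cleanly.
            results.append("take ours")
        else:
            # Both sides changed the same line in different ways.
            results.append("CONFLICT")
    return results


base = ["a = 1", "b = 2", "c = 3"]
ours = ["a = 1", "b = 20", "c = 30"]
theirs = ["a = 10", "b = 2", "c = 31"]
print(classify_lines(base, ours, theirs))  # → ['take theirs', 'take ours', 'CONFLICT']
```

Lines that only one side touched merge cleanly, which is why most merges never need your help at all.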
A real-life merge conflict example
And honestly, that doesn't seem too scary at first glance either. I mean, there are only two types of conflicts to worry about. How bad could it be, right? But there has to be a reason people are so afraid of merge conflicts. So, let's look at a simple scenario, similar to something that happened at my job, to see whether these things are really scary.
So, one day, someone asked me to help resolve a Git merge conflict. Here's what happened: two coworkers had created two different local branches off of the same master branch. At my job, all of our repositories have a simple text configuration file with data cut dates and options in it. So, when both of these coworkers started on their branches, they had the same version of the configuration file.
Time goes by, and coworker number one finishes their work, modifies the configuration file, and pushes it up to the repository, where it merges in with no problem. Unfortunately, coworker number two has entered iteration hell, still working with the original version of the configuration file. More time passes, coworker number two reaches the depths of iteration hell, and they finally come out with their own changes to the configuration file. Of course, when they try to merge in all that work, they get a merge conflict.
Okay, so we have the scenario. Let's see what this conflict looks like in the code. Here we have both versions of the file: the copy from master and the copy from HEAD, which is the branch coworker number two has checked out and is trying to merge. As you can see, we have a content conflict on line five because the two versions have different dates for the main data cut date. So, when coworker number two attempted to merge in their work, the resulting conflict looked like this.
So, if this is giving you anxiety, that's okay. What may help is knowing that every conflict, no matter how simple or complex, has only three components. At the top of the conflict is the starting conflict marker, a row of less-than signs, labeled with the branch or commit you are on (usually HEAD). Under that is the conflicted code from your branch. Next, you'll see the conflict divider, a row of equals signs, which separates your version of the code from the version on the branch you're merging with; in this case, that's the master branch. After that conflicted code, you'll see a row of greater-than signs, which marks the end of the conflict. Any code below that is code Git was able to merge without a conflict, so it's left alone until you reach another conflict marker.
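The three components described above can also be picked out programmatically. The Python sketch below is a hypothetical helper (find_conflicts is not part of Git or RStudio) that walks a conflicted file and returns each conflict's two sides along with their labels. The configuration contents and dates are made up for illustration. The sketch assumes Git's default conflict style, so it does not handle the extra base section that the diff3 or zdiff3 styles add.

```python
from typing import List, Tuple

START, DIVIDER, END = "<<<<<<<", "=======", ">>>>>>>"


def find_conflicts(text: str) -> List[Tuple[str, List[str], List[str], str]]:
    """Return (our_label, our_lines, their_lines, their_label) for each conflict."""
    conflicts = []
    state = "outside"  # "outside", "ours" (above the divider), or "theirs"
    for line in text.splitlines():
        if state == "outside" and line.startswith(START):
            # Start marker, labeled with the branch you're on.
            our_label, ours, theirs = line[len(START):].strip(), [], []
            state = "ours"
        elif state == "ours" and line == DIVIDER:
            # Divider between your code and the other branch's code.
            state = "theirs"
        elif state == "theirs" and line.startswith(END):
            # End marker, labeled with the other branch.
            conflicts.append((our_label, ours, theirs, line[len(END):].strip()))
            state = "outside"
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        # Lines outside a conflict were merged cleanly, so skip them.
    if state != "outside":
        raise ValueError("conflict block was never closed")
    return conflicts


# Hypothetical config file after a conflicted merge (contents are made up).
conflicted = """\
project = quarterly_report
<<<<<<< HEAD
data_cut_date = 2024-03-31
=======
data_cut_date = 2024-06-30
>>>>>>> master
include_archived = false
"""

for our_label, ours, theirs, their_label in find_conflicts(conflicted):
    print(our_label, ours)      # → HEAD ['data_cut_date = 2024-03-31']
    print(their_label, theirs)  # → master ['data_cut_date = 2024-06-30']
```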
Resolving conflicts
Now that we know what happened, what the conflict looked like, and the parts of a conflict, how do we resolve it? Well, when coworker number two came to me and asked me that, I asked them, well, which date do you want? And their response was, oh. Coworker number two didn't realize that they just had to decide which date needed to move forward. They saw the conflict markers, got spooked, and broke the first rule of dealing with conflicts, which is: don't panic.
Now, I know that's easier said than done, but the biggest reason not to panic is that conflicts usually aren't that big of a deal. I know the one I've shown is simple, but a lot of my conflicts at work look like this. Another reason not to panic is that, hopefully, you are the expert on the code you're writing. Think back to the example. Coworker number two enlisted me for help, but they were actually the expert. They knew the answers needed to move forward. I didn't.
Now, I will acknowledge that some people panic when they have a conflict because they realize they can't do much with Git while it's in a conflicted state. If you use RStudio like myself, we actually have two ways to walk away from a conflict if we need to. To my knowledge, there are no pretty icons for aborting a merge in RStudio, so you would have to open the Terminal tab and type git merge --abort. That acts as a time machine, taking you right back to the point before you tried to merge, and any work you had committed at that point will be safe. (Git's documentation does warn that uncommitted changes present when the merge started can't always be reconstructed, so commit or stash before merging.) You can also use a third-party client. This is a snapshot of GitHub Desktop: you can simply open it up and hit the Abort merge button.
Now, when you're finally ready to tackle your conflicts, the second thing you should do is assess the damage. Like I said, most conflicts are not that bad; they can be very simple. We just need more information about them, and we can get it in various ways. You can use the terminal, obviously: type git status, and you'll get a cute little report showing you the unmerged files, the staged changes, and so on. You can also use the RStudio interface. If you open the Git pane, you'll see pretty little icons next to each changed file, and if you don't know what those icons mean, here's a rundown of them. When you're looking for unmerged files, look for the U icon, because unmerged files are the files that have a merge conflict in them. That way you can quickly scan the list and see which files need attention. You can also use a third-party client: just as it makes aborting a merge easy, it can also help you resolve your conflicts. And if you're lucky, you can even resolve the conflict on your developer platform. If the platform lets you open the conflict in the web browser, that means your conflict is simple.
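If you'd rather script the damage assessment, git status has a --porcelain option whose output is meant to be read by programs, and it marks unmerged paths with two-letter codes such as UU (both modified) or DU (deleted by us). The Python sketch below is hypothetical: unmerged_files is not a Git function, it takes that output as a string rather than calling Git, the sample output is invented, and it assumes simple file names (Git may quote unusual paths unless you also pass -z).

```python
from typing import Dict

# Two-letter XY codes that `git status --porcelain` uses for unmerged paths.
UNMERGED_CODES = {
    "DD": "both deleted",
    "AU": "added by us",
    "UD": "deleted by them",
    "UA": "added by them",
    "DU": "deleted by us",
    "AA": "both added",
    "UU": "both modified",
}


def unmerged_files(porcelain_output: str) -> Dict[str, str]:
    """Map each conflicted path to a plain-English description of its state."""
    conflicts = {}
    for line in porcelain_output.splitlines():
        # Each line is "XY PATH": a two-letter code, a space, then the path.
        code, path = line[:2], line[3:]
        if code in UNMERGED_CODES:
            conflicts[path] = UNMERGED_CODES[code]
    return conflicts


# Made-up output from `git status --porcelain` during a conflicted merge.
sample = """\
UU config.txt
M  R/clean_data.R
?? notes.md
DU old_report.Rmd
"""

print(unmerged_files(sample))
# → {'config.txt': 'both modified', 'old_report.Rmd': 'deleted by us'}
```

The sample separates a content conflict (config.txt, changed on both sides) from a structural one (old_report.Rmd, deleted on our side while the other side kept it).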
Now, after you've made sure you're not panicking and you've gathered information about the conflict, the next step is to just choose your own adventure. What do I mean by that? Well, think back to the example. When I asked coworker number two which date they wanted, they didn't realize an important concept: Git is smart enough to throw an error and insert conflict markers, but it's not smart enough to resolve your conflicts for you. When you get a conflict, you are in control. You can choose either your code or their code.
Now, thinking back to my own experiences with conflicts, I felt that the conflict markers and code always made the choice obvious. It was obvious that Git wanted me to pick one or the other, mine or theirs. But something I had a hard time wrapping my head around was what Git actually expected of me. Like, literally, how do you want me to pick my code or their code? The answer is to manually edit the file. We're not going to talk about how long it took me to realize that, but we will at least see an example of what that looks like.
This is from the example we just talked about. All Git wants you to do is literally remove what you don't want, keep what you do, and remove the conflict markers. That's all it wants. Then you save the file, stage it again, hopefully write a nice, informative commit message, and you're done.
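For the simplest case, where you want one side for every conflict in a file, the manual edit can be sketched in Python. This is an illustration under stated assumptions, not a tool Git or RStudio provides: resolve_all is a hypothetical name, the configuration contents are made up, and it assumes Git's default conflict style. For whole files, Git itself offers git checkout --ours or git checkout --theirs followed by the file path.

```python
def resolve_all(text: str, keep: str) -> str:
    """Resolve every conflict in `text` by keeping one side: "ours" or "theirs"."""
    if keep not in ("ours", "theirs"):
        raise ValueError("keep must be 'ours' or 'theirs'")
    resolved = []
    section = None  # None outside a conflict, otherwise "ours" or "theirs"
    for line in text.splitlines(keepends=True):
        bare = line.rstrip("\r\n")
        if section is None and bare.startswith("<<<<<<<"):
            section = "ours"       # drop the start marker
        elif section == "ours" and bare == "=======":
            section = "theirs"     # drop the divider
        elif section == "theirs" and bare.startswith(">>>>>>>"):
            section = None         # drop the end marker
        elif section is None or section == keep:
            resolved.append(line)  # cleanly merged lines, plus the side we keep
        # Any other line belongs to the side we're discarding.
    if section is not None:
        raise ValueError("conflict block was never closed")
    return "".join(resolved)


# Hypothetical conflicted config file (contents are made up).
conflicted = """\
project = quarterly_report
<<<<<<< HEAD
data_cut_date = 2024-03-31
=======
data_cut_date = 2024-06-30
>>>>>>> master
include_archived = false
"""

print(resolve_all(conflicted, keep="theirs"))
```

After writing the resolved text back to the file, you would still stage it with git add and commit it.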
I never realized that's all Git wanted me to do. The process here is very primitive. Because Git is relying on you to tell it how to move forward, you can literally fling yourself into the void. You can write something completely new; it doesn't have to be one version or the other. You can even leave the conflict markers in if you want. Your code won't work, but you can do it. Hell, you can even throw in some ASCII art for a little razzle-dazzle if you want, okay? When you have a conflict to resolve, anything goes.
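Because Git will let you stage and commit a file that still contains conflict markers, it's worth checking before you commit. Git's own git diff --check (or git diff --cached --check for staged changes) warns about leftover conflict markers. The Python sketch below is a hypothetical standalone version of the same idea (leftover_markers is not a Git or RStudio function). It flags any line that starts with exactly seven <, |, =, or > characters followed by a space or the end of the line, so it may occasionally flag a legitimate line that merely looks like a marker.

```python
import re
from typing import List, Tuple

# Exactly seven <, |, =, or > characters, then a space or the end of the line.
# (The ||||||| marker only appears with the diff3 or zdiff3 conflict styles.)
MARKER = re.compile(r"^(<{7}|\|{7}|={7}|>{7})( |$)")


def leftover_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line that looks like a conflict marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if MARKER.match(line)
    ]


# A half-resolved (made-up) file where the end marker was forgotten.
half_resolved = "data_cut_date = 2024-06-30\n>>>>>>> master\ninclude_archived = false\n"
print(leftover_markers(half_resolved))  # → [(2, '>>>>>>> master')]
```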
So you don't have to be scared when you get a conflict, but you do have to be careful, because that is a lot of power. Now, some of you are thinking, oh, well, Megan, you used an overly easy example, and my response is: 20 minutes ain't a lot of time. Y'all can come up here next class and do a more complicated one, all right? But believe me, no matter how complicated the conflict is, the process is going to be the same.
Merge conflicts are communication problems
Now, what makes that process hard or not actually has little to do with Git and everything to do with how you prep and plan your work. When you finally realize that and plan accordingly, you'll start to say things like, maybe this won't be so bad. And as you become receptive to exploring Git merge conflicts, you may realize that merge conflicts actually aren't Git problems. They're communication problems, workflow problems, even knowledge gap problems.
Now, you can probably never completely avoid conflicts while working with others, but you can lessen the frequency and severity of them. Now, unfortunately, I can only give the highlights of these important concepts, but all the links in my slides are active, and you can even scan this QR code to take you directly to the SOPS repository and get all these resources I'm about to fly through.
The first thing we want to touch on is communication. What I think of first is talking with others, and this can be as basic as asking the who, what, when, where, and why questions about what you're working on. Those basic questions encourage explicit conversations that can reduce miscommunication, and those conversations also let your team plan your workflow and the code review process you hopefully have in place.
Communication in your code is also very important, and that can show up as naming and styling conventions and consistent formatting. Some people might be like, oh, that sounds kind of nitpicky. And you know what sucks? Having merge conflicts because someone indented something differently than you or added whitespace here and there. So it is kind of important.
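Here's why whitespace matters so much: Git compares lines as exact text, so two lines that differ only in indentation or trailing spaces count as different lines. If one person reindents a line that someone else edited, Git sees competing changes to the same line. The Python sketch below shows the effect with a made-up line of R code and a hypothetical normalize function standing in for a team formatting convention (tabs expanded to two spaces, trailing whitespace removed). In practice, a shared code formatter does this job.

```python
def normalize(line: str, tab_width: int = 2) -> str:
    """Hypothetical team convention: tabs become spaces, trailing whitespace goes."""
    return line.expandtabs(tab_width).rstrip()


# The same line of R code, typed by two different people (made-up example).
mine = "\tx <- clean_data(raw)  "   # tab indent, trailing spaces
theirs = "  x <- clean_data(raw)"   # two-space indent

print(mine == theirs)                        # → False: Git sees two different lines
print(normalize(mine) == normalize(theirs))  # → True: same text once normalized
```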
Another part of communication is developer platforms. If you use them, leverage them. Now, my team uses GitHub heavily, and honestly, I wish I could just give a whole talk on just this topic, but these are all things that I use in my everyday toolkit. So if you use GitHub and you don't know what some of these things are or you're not using them fully, I strongly encourage you to go and explore these later.
Workflow before, during, and after coding
Next is probably the most important aspect of conflict mitigation: addressing your workflow. You can think of a workflow in three parts: what you should be doing before you start coding, during your coding sessions, and after your coding sessions. First off, before you begin, don't be bullied into getting flustered by emergencies. Okay, I feel very strongly about this. I will die on this hill: emergencies are not real. Okay, I don't care.
Once you're not flustered, check your Git environment. That means checking whether you have any staged changes or commits that were left behind, or any stashes you might have forgotten about. You should also check the branch status. If you're going to work on a branch that someone else has been working on, talk to that person and make sure it's ready to be pulled down from the repository. You should also be mindful of branch drift: if you created your branch off of master or main a long time ago, the longer it's been, the higher your chances of having a conflict. So consider whether it's easier to just start a new branch. And probably the best way to avoid a lot of conflicts is to simply pull the latest changes before doing anything. This would actually have saved coworker number two in our example.
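One way to put a number on branch drift is git rev-list --left-right --count main...HEAD (run after git fetch, using origin/main if your team's main branch lives on a remote). It prints two counts separated by a tab: commits on main that your branch doesn't have, then commits on your branch that main doesn't have. The Python sketch below parses that output. The function names are hypothetical, and the threshold of 20 commits in drift_advice is an arbitrary example of a team rule of thumb, not official guidance.

```python
from typing import Tuple


def parse_drift(rev_list_output: str) -> Tuple[int, int]:
    """Parse the output of `git rev-list --left-right --count main...HEAD`.

    Left count: commits on main that this branch lacks (behind).
    Right count: commits on this branch that main lacks (ahead).
    """
    behind, ahead = (int(part) for part in rev_list_output.split())
    return behind, ahead


def drift_advice(behind: int, max_behind: int = 20) -> str:
    """Hypothetical rule of thumb; the default threshold is arbitrary."""
    if behind == 0:
        return "up to date with main"
    if behind <= max_behind:
        return "pull in the latest main before continuing"
    return "consider starting a fresh branch from main"


# Made-up output from Git: 37 commits behind main, 4 ahead.
behind, ahead = parse_drift("37\t4\n")
print(behind, ahead)         # → 37 4
print(drift_advice(behind))  # → consider starting a fresh branch from main
```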
While you're coding, you should commit often. If you're working on large tasks, it's absolutely acceptable to break them down into as many commits as you want. But you should also use commit amendments (git commit --amend) to fold small fixes into your previous commit, ideally before you've pushed it, since amending a pushed commit rewrites shared history. Amending keeps your Git history clean for yourself and for others. You should also push thoughtfully, meaning you should really only push changes when they're at a good enough spot to potentially be merged into your master or main branch. At the same time, you need to be mindful of your branch's drift risk. In other words, don't let perfectionism hurt you: your work may never be perfect, but it may be good enough to merge in if it's documented and annotated properly.
Stashes are also great for unfinished work, or for those fake emergencies I hear about. You can also use stashes for work that doesn't yet meet your team's workflow standards.
So after you code, a big way to avoid conflicts is to leave no trace behind. When you're done, make sure you've pushed all your commits and stashed any unfinished work you're not going to push. Before sending your code to someone else, whether for review or to work on, check it yourself. You are reviewer number one. And if you can, sleep on it. Doing this will help ensure that your work is clear, reproducible, and documented, so that anyone can pick it up, run it, and understand it.
Keep learning and building confidence
But sometimes you just need to learn more about Git. You gotta get good. When that happens, I suggest you learn what you need to, how you need to, and don't let yourself get overwhelmed by all the resources out there. You'll know when you've reached the point where you need to seek out more information, because sometimes you really do just need to dig deeper into Git.
If you're an RStudio user like myself and you're not using the terminal, just the Git pane, you're very limited in what you can do. So you may find you need to seek out that information and learn more of what you can do in the terminal.
Okay, I know this was a lot, but my hope is that you'll gradually improve your communication, your workflows, and your overall knowledge, so you can get to a point where you see a merge conflict and say, I got this. Because just like with parenting, traveling, and most difficult things in life, there's no tried-and-true method to success that anyone can just follow. You'll find plenty of resources explaining the basic concepts of merge conflict resolution, but it's a tricky thing to teach, because while most conflicts are simple, all of them require some contextual knowledge. The only reliable way to learn how to resolve merge conflicts is through preparation, practice, and exposure.
But for those of you that have been afraid of merge conflicts or are still afraid, I know that this chaotic scene is probably what it feels like when you run into a conflict right now. But if you take the time to let the dust settle and regain your bearings, you and your team can start preparing to work more efficiently. And eventually, each time you run into a merge conflict, it will be less scary.
And suddenly, you'll realize that you don't have to plead with Git to let you merge anymore, because you have confidence, caution, and care. You have the power to control the journey your work takes to its final destination. Thank you.
