Instruction
Quick! What did you wear February 2, 2022? Can you recall every piece of that outfit exactly?
Or what if your favorite cousin said, “I loved the cake you made for the party last month! Can you make the exact same cake for my birthday?”
Or if you recently painted your room but regret it and want to change it back, could you find the original color exactly?
Without a record-keeping system, it would be difficult to do any of these perfectly. You’d have to remember exactly what you wore, the exact recipe, or the exact color of paint.
Granted, it’s far too tedious to record every detail and activity in your life, but these details are extremely important in a software development project. Fortunately, there’s a way to easily track these details that’s flexible and simple: Use a version control system.
Definition and Benefits
A version control system is software that records and organizes all changes made to a project. Version control systems work on files. These can be text files, such as code or plain text, or binary files, such as images or executables. Although version control is commonly associated with software development projects, you can use it with any project made of text and binary files.
For Solo Developers
Using version control lets you track the history of changes to documents. It allows you to revert to previous versions of any document after an error is introduced or an unintended change happens. It uses “branches,” or alternate copies of changes to files, so you can experiment with ideas without changing the “main,” “stable,” or “live” branch of your project before you’re ready to do so.
For Teams
Version control also allows teams to work on the same files safely. If two or more people change the same file or even the same lines of code, version control allows you to safely merge changes together. Teams often use different branches on a project to work in tandem and then merge into the main branch when ready.
Version control can also enforce a structured approach to development through meaningful commit messages, organized development on branches, and peer code reviews on merging.
Types of Version Control Systems
Version control systems come in two types:
- Centralized Version Control Systems
- Decentralized Version Control Systems, also called distributed version control systems
Both offer the benefits of version control but use different architectures, and each has its own advantages and challenges. You might see either type depending on where you work or what’s common in your industry.
Centralized Version Control Systems (CVCS)
In a centralized system, all users connect to a central server, where the document repository is located. Users must be connected to the server to see the version history or to commit new changes. This may require an internet connection or connection to a local intranet or virtual private network (VPN) at all times.
This allows for centralized control over who can make what kinds of changes to the codebase. First-time users might find it easier to understand how the system works, because there’s usually only one codebase and no branches. Although some CVCS do support a form of “branching,” it’s usually less complex than branching in decentralized systems.
Because there is only one copy of the repository, centralized systems can manage large codebases efficiently. However, this also means the server is a single point of failure in a centralized system. If the server is down, no one can browse the full code base or commit changes.
Subversion and Perforce are two common centralized version control systems.
Decentralized Version Control Systems (DVCS)
The main difference with a decentralized system is that each user has a complete copy of the repository stored locally on their computer. This includes all of the project’s change history too. Users can work with the full code base even when they aren’t connected to a server or network. It’s also easier for users to create branches and merge changes because everything is local to their own computer.
Decentralized systems offer more redundancy because each user has the full code repository. Consequently, there’s not a single point of failure. Even if the host server goes down, users can still browse the full file history and commit changes locally. They would only have to wait to “push” those changes to the host when it’s working again.
However, this redundancy does come with a cost. Each user has the entire code base and history on their computer locally, so this takes up much more disk space.
First-time users might also find decentralized systems’ use of branching and merging more difficult to understand. Because creating branches is easy to do, there are often many branches on large projects.
Two of the most popular decentralized systems are Mercurial and Git, the one you’ll use in this module.
Git for Version Control
Git was developed by Linus Torvalds in 2005 for use by the Linux kernel development team. The team had been using a commercial DVCS called BitKeeper. However, some people in the community were upset that an open source project like Linux was not using an open source version control system. In 2005, the company that made BitKeeper also withdrew the free license the kernel developers had been using, so let’s be honest, money also played a part in the decision!
Linus’s key goals for the new system included speed, support for extensive branching, and a completely distributed design.
Since its development, Git has gone on to become the most widely used version control system in mobile software development and open-source projects. You’re sure to encounter it, so it’s a great idea to develop a good understanding of it!
Using Git
Git is included alongside the basic operating system on many computers. Git doesn’t run all of the time; it runs only when you issue commands to it. When Git manages a directory, that directory contains a hidden folder named .git, which holds the entire repository. This hidden folder, and often one or two hidden files, is the only sign that Git is managing the changes in a directory. If you delete the folder, Git will no longer manage the directory, and the history of the changes is lost. The only way to get it back is to copy the repository from somewhere else.
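To see that hidden folder for yourself, here is a minimal sketch you can run in Terminal. It assumes Git is already installed. The temporary directory and the variable names demo and inside are placeholders chosen for this illustration, and the hidden folder Git creates is named .git.

```shell
# Create a throwaway directory and let Git start managing it.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

# A plain `ls` shows nothing here; `ls -a` also lists hidden entries such as .git.
ls -a

# Ask Git whether this directory is managed. It prints "true".
inside="$(git rev-parse --is-inside-work-tree)"
echo "Managed by Git: $inside"

# Deleting the hidden folder removes the repository and all of its history.
rm -rf .git
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  echo "Still managed by Git"
else
  echo "No longer managed by Git"
fi
```

Because the directory is a throwaway, deleting the folder here is harmless. In a real project, that last step would discard the local history.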
It’s time to examine some of the concepts and vocabulary you’ll encounter when working with Git.
Client
The client is the tool you use to interact with the Git system. On its own, Git is a command line driven system. However, there are many different graphical Git clients that make it easier to work with Git. Most modern integrated development environments (IDEs) also have graphical support for Git built in.
In these lessons, you’ll primarily learn basic Git commands in the Terminal, but you’ll also use the github.com website and GitHub Desktop for a graphical experience.
GitHub Desktop is not a full-featured Git client. Rather, its primary purpose is to be a desktop utility for use with github.com. So there are a number of Git commands that it does not support. However, it’s enough to show you the common ways graphical clients interact with Git.
It’s important to remember that all graphical Git clients are just executing the standard Git commands under the hood. So feel free to pick Git software that you like and learn to use it. Or use Terminal exclusively. The choice is yours to make.
In either case, you should know at least some Terminal commands, because not all commands are supported by graphical software, or you might be using an automated workflow that requires Terminal commands in scripts.
In the demo video for this lesson, you’ll see both Terminal and GitHub Desktop in use. If you’d like to follow along, you can install GitHub Desktop using the link above. If you have difficulties, don’t worry. The next lesson will give you a step-by-step guide to configuring Git and GitHub Desktop on your Mac.
Repository
Together, the files that make up a project and the history of all their changes are called a repository. On your computer, it’s contained in that hidden Git folder. This is the local repository. If you’re the only person working on a project, this might be the only repository. More often, there is a shared location where you and other collaborators keep a copy of the repository. This is the remote repository. When you use a platform like GitHub, Bitbucket, or Azure to host your code, the copy at the host is the remote repository.
Working Directory
The directory that has your project and most recent changes is the working directory. Changes you make to the files in this directory are not part of the repository until you commit them. You can make any changes you want, including deleting files, without impacting the repository. When you switch the repository to a different branch or check out a particular commit, the files in the working directory change to match.
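As a rough illustration, the sketch below creates a file in a new working directory and asks Git what it sees. The directory is a throwaway temporary folder, and the file name notes.txt is a placeholder chosen for this example.

```shell
# Start a fresh repository in a throwaway directory.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

# Create a file in the working directory. It is not part of the repository yet.
echo "Hello, Git" > notes.txt

# --short prints one line per changed file; "??" marks a file Git is not tracking.
git status --short
```

With a default Git configuration, the last command prints ?? notes.txt, which shows that the file exists in the working directory but has not been committed.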
Commit
When you want to add files, delete files, or preserve changes to files you’ve made in the working directory, you commit those changes. This makes them part of the repository and its change history.
When you commit changes, Git requires you to write a short note about the changes. This is the commit message. Writing a meaningful commit message is important to describe to other users (and your future self! ;]) what you did. If those changes cause problems, you can easily revert them.
Most developers make many commits a day as they work on something. However, that doesn’t mean you should just make a commit every few minutes. All of the changes in a commit should support a coherent change to the project. So you might make a commit after you’ve completed writing a particular function, added a button to your app’s interface, or fixed a bug. This makes the history of the repository easier to understand and helps future developers when they’re tracking down a bug or trying to understand how the project has evolved.
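Here is a small sketch of what committing can look like in Terminal. The file name, the commit messages, and the author details are placeholders for this example. It also uses git add, a staging step that tells Git which changes to include in the next commit. The last step undoes the most recent commit with git revert, which is one of several ways to back out a change.

```shell
# Start a fresh repository in a throwaway directory.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

# Identify the author for this demo repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

# Make a change in the working directory, stage it, and commit it with a message.
echo "Hello, Git" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes file with a greeting"

# Make and commit a second, separate change.
echo "A second line" >> notes.txt
git add notes.txt
git commit --quiet -m "Add a second line to notes"

# Show the history, newest commit first, one line per commit.
git log --oneline

# Undo the most recent commit by adding a new commit that reverses it.
git revert --no-edit HEAD

# The file is back to its first version, and the history now has three commits.
cat notes.txt
git log --oneline
```

Note that the revert does not erase anything. It adds a third commit that reverses the second, so the full history stays available.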
Branching
Usually, the first thing you do when you’re about to start working with a project managed by Git is to create a branch. Once you’ve made a branch, any changes you commit to the repository exist only on this branch. This gives you the freedom to make any changes you need without worrying that you’re going to affect the other branches in the repository. When you’re ready to apply the changes from one branch to another branch, you merge the branches.
All repositories start with one branch, usually called main, that is a clean version of the project. The other branches can all trace their origin to main, and when changes are ready to be released, they’ll be merged into main. Sometimes, you then delete the other branch; other times, you’ll continue to work with it, merging to main periodically.
Some people create a new branch for each new feature. Some teams might create a new branch for each work task. Some developers create a new branch every morning. There isn’t a “right” way to create branches, and every team decides on a branching and merging strategy that works for them.
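The following sketch walks through one possible branch-and-merge cycle. It is only an illustration, not a recommended strategy. The branch name add-farewell, the file name, and the author details are placeholders. The git symbolic-ref line is there only to make sure the first branch is called main on any Git version.

```shell
# Start a fresh repository in a throwaway directory.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Name the first branch main, whatever this Git installation's default is.
git symbolic-ref HEAD refs/heads/main

# Make the first commit on main.
echo "Hello, Git" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes file"

# Create a branch and switch to it.
git checkout --quiet -b add-farewell

# Commit a change that exists only on the new branch.
echo "Goodbye, Git" >> notes.txt
git commit --quiet -am "Add a farewell line"

# Switch back to main. The working directory changes to match: one line only.
git checkout --quiet main
cat notes.txt

# Merge the branch into main, then delete the branch.
git merge --quiet add-farewell
git branch -d add-farewell

# main now contains both lines.
cat notes.txt
```

Deleting the branch after merging is a choice made for this sketch. As noted above, some teams keep working on a branch and merge it again later.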
Distributed Teams
When you’re working with local and remote repositories, there are a few more things Git can do. When you copy a remote repository to your computer, you are making a clone. As you’ve seen already, this will be a copy of the entire repository and its history. When there are changes to the remote repository, you can pull them to your local copy. When you have changes that you want to send to the remote repository, you push them.
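To make clone, push, and pull concrete, the sketch below simulates a remote on a single computer. A bare repository named remote.git stands in for a host such as GitHub, and the directories seed, alice, and bob are hypothetical names for the example, as are the author details. With a real host, you would clone from the URL the host gives you instead.

```shell
work="$(mktemp -d)"
cd "$work"

# Setup: build a one-commit repository and make a bare copy of it.
# The bare copy, remote.git, plays the role of the remote repository.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "Hello, Git" > seed/notes.txt
git -C seed add notes.txt
git -C seed -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add notes file"
git clone --quiet --bare seed remote.git

# Clone: two collaborators each copy the whole repository and its history.
git clone --quiet remote.git alice
git clone --quiet remote.git bob

# Alice commits a change locally, then pushes it to the remote.
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "A line from Alice" >> notes.txt
git commit --quiet -am "Add a line from Alice"
git push --quiet origin main

# Bob pulls the new commit from the remote into his local copy.
cd ../bob
git pull --quiet origin main
cat notes.txt
```

In each clone, origin is the name Git gives by default to the repository it was cloned from.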
Feeling a bit overloaded with all of the terms and concepts?
Don’t worry! You’ll learn more about each of them and get hands-on practice throughout this module.