3. Git 101
Programming Project 2022/23

3.1. Introduction

Learning goals

In this module, we will learn about

  • version control systems,
  • Git and its importance,
  • basic Git commands,
  • working with branches, and
  • working with remote repositories.

Sources

The material in this module has been adapted from:

  1. Jon Loeliger, Matthew McCullough. Version Control with Git, 2nd Edition, 2012, O'Reilly Media, Inc., ISBN 9780596520120.

  2. Scott Chacon, Ben Straub. Pro Git, 2nd Edition, 2014, Apress.

Think about how you code

Do you just open an IDE and start coding? Where do you save your source files?

Also, what would happen if:

  • you permanently delete a file by mistake?
  • you change your code a lot and then regret your decisions?
  • you need to work with other developers, but:
    • you are all working on the same part of a system?
    • you are in different places?
  • you want to publish your code for others to reuse?
  • your hard drive or SSD fails?

To avoid the disastrous situations implied by these questions, version control systems were invented!

VCS: Version Control System

  • A version control system (VCS) is software that manages and tracks different versions of files over time.
  • Changes are registered with metadata, which usually includes:
    • an author,
    • a timestamp, and
    • an explanatory message.
  • A VCS can track any type of file, but it is mostly used for source code.
  • Its main features are the following.
    • Allowing users to develop and maintain a repository of content.
    • Providing access to historical editions of each file.
    • Recording all changes in a log.
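To make these ideas concrete, here is a minimal sketch of what a VCS keeps track of. It is a toy in-memory model, not how Git or any real VCS is implemented; the names `Change`, `ToyRepository`, `record`, and `history_of` are made up for this illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One registered change: metadata plus the file contents at that point."""
    author: str
    timestamp: datetime
    message: str
    files: dict  # file name -> contents


@dataclass
class ToyRepository:
    """Hypothetical repository that only lives in memory."""
    log: list = field(default_factory=list)

    def record(self, author, message, files):
        """Register a change together with its metadata."""
        change = Change(author, datetime.now(timezone.utc), message, dict(files))
        self.log.append(change)
        return change

    def history_of(self, name):
        """Return every recorded edition of one file, oldest first."""
        return [c.files[name] for c in self.log if name in c.files]


repo = ToyRepository()
repo.record("Ada", "Add greeting", {"hello.txt": "Hello"})
repo.record("Ada", "Make greeting friendlier", {"hello.txt": "Hello, world!"})

# The log lists all changes; history_of gives access to older editions.
for change in repo.log:
    print(change.author, "-", change.message)
print(repo.history_of("hello.txt"))
```

The three features listed above map to the three parts of the sketch: the repository holds the content, `history_of` returns historical editions, and `log` records every change.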

VCS Types

Local Version Control Systems

Figure: local version control system (from https://git-scm.com/)

  • Main benefit
    • Being able to do version control.
  • Main drawbacks
    • No support for collaboration.
    • Single point of failure.

Centralized Version Control Systems

Figure: centralized version control system (from https://git-scm.com/)

  • Main benefits
    • Being able to do version control.
    • Supports collaboration.
  • Main drawbacks
    • Single point of failure.
    • The full project history is stored only on the server.

Distributed Version Control Systems

Figure: distributed version control system (from https://git-scm.com/)

  • Main benefits
    • Being able to do version control.
    • Supports collaboration.
    • Redundancy.
    • Availability.

Popular Version Control Systems

Many VCSs have been developed over the years.

Git has overwhelmingly superseded the other systems and has become the de facto standard in the software industry.

Here is some data from OpenHub, a public directory of free and open-source software, on the adoption of version control systems.

And here is some data from Google Trends.

Git is a free and open-source VCS

  • It was created by Linus Torvalds.
  • See Git's first commit here.
  • It is a distributed version control system.
  • Every clone contains the full history of the repository.
  • Most operations work without a network connection.

About Git

Before we get our hands dirty, let's take a whirlwind tour of how Git works.

How Git stores data

Traditional VCSs store data as a list of file-based changes, an approach known as delta-based version control:

Figure: delta-based version control.

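Git instead treats its data as a stream of snapshots: each commit records the complete state of every file at that moment, and a file that has not changed is stored as a link to the identical version already saved.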

Figure: data stored as a stream of snapshots.
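The difference between the two storage models can be sketched in a few lines of Python. This is a toy illustration only: the helpers `make_delta`, `apply_delta`, and `checkout_from_deltas` are hypothetical, and neither Git nor any delta-based VCS encodes its data this way.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn `old` into `new` (both are lists of lines)."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse lines old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))   # only the new lines are stored
        # "delete" needs no entry: deleted lines are simply not copied
    return ops


def apply_delta(old, delta):
    """Rebuild the newer version from the older one plus a delta."""
    new = []
    for op in delta:
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new


versions = [
    ["print('hi')"],
    ["print('hi')", "print('bye')"],
    ["print('hello')", "print('bye')"],
]

# Snapshot style: keep the complete contents of every version.
snapshots = [list(v) for v in versions]

# Delta style: keep the first version plus one delta per later version.
base = versions[0]
deltas = [make_delta(old, new) for old, new in zip(versions, versions[1:])]


def checkout_from_deltas(index):
    """Replay deltas on top of the base until we reach version `index`."""
    lines = list(base)
    for delta in deltas[:index]:
        lines = apply_delta(lines, delta)
    return lines


# Both models can reproduce any version; they differ in what is stored.
print(checkout_from_deltas(2) == snapshots[2])
```

Note the trade-off the sketch exposes: with snapshots, reading any version is a single lookup, while with deltas, an older base has to be replayed forward step by step.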

Almost every operation is local

Most operations in Git need only local files and resources. For example,

  • browsing the history of a project,
  • comparing different file states, and
  • creating branches of work.

When you clone a Git repository, you get its full history!

Git has integrity

  • Everything in Git is checksummed before it is stored and is then referenced by that checksum.

  • It's impossible to alter the contents of a file without Git knowing.

  • Git uses SHA-1 for checksumming. This hashing algorithm produces a 40-character string of hexadecimal characters (0–9 and a–f), calculated from the contents of a file or directory structure in Git.

  • A SHA-1 hash looks something like this:

    24b9da6552252987aa493b52f8696cd6d3b00373
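As a rough illustration of how such a checksum comes about, the sketch below computes the SHA-1 that Git assigns to the contents of a single file (a "blob"). Git hashes a short header, `blob <size>` followed by a NUL byte, together with the file contents. The function name `git_blob_hash` is our own; for the same bytes, the result should match what `git hash-object` prints.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 Git would use to reference a file with these contents."""
    # Git prepends a header with the object type and the size in bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"hello world\n")
changed = git_blob_hash(b"hello world!\n")

print(original)             # 40 hexadecimal characters
print(original == changed)  # even a one-character change gives a new hash
```

Because the hash depends only on the contents, any alteration to a stored file changes its checksum, which is how Git notices it.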

Git generally only adds data

  • When you do actions in Git, nearly all of them only add data to the Git database.

  • You can lose or mess up changes you haven’t committed yet, but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.

  • This makes using Git a joy because we know we can experiment without the danger of severely screwing things up.
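The "only adds data" idea can be sketched as a store in which objects are added and read but never changed or removed. This is a simplified model, not Git's actual object database: the class `AppendOnlyStore` is hypothetical, and it hashes the raw contents directly, without the header Git uses.

```python
import hashlib


class AppendOnlyStore:
    """Hypothetical store: objects can be added and read, never edited or deleted."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        """Store the content under its checksum and return that checksum."""
        key = hashlib.sha1(content).hexdigest()
        # Storing the same content twice is harmless: same key, same bytes.
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


store = AppendOnlyStore()
v1 = store.put(b"first draft\n")
v2 = store.put(b"second draft\n")  # a new version adds a new object...

print(store.get(v1))               # ...and the old one is still retrievable
print(len(store))
```

Since nothing is ever overwritten, an experiment that goes wrong can always be abandoned in favour of an earlier stored version.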

File states

  • tracked: Git is aware of the file.
    • modified: you have changed the file but have not committed it to your database yet.
    • staged: you have marked a modified file in its current version to go into your next commit snapshot.
    • committed: your data is safely stored in your local database.
  • untracked: Git is not "watching" the file.



From this StackOverflow question.
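The file states above can be read as a small state machine. The sketch below is a simplified model under our own assumptions: the action names `add`, `commit`, and `edit` loosely correspond to `git add`, `git commit`, and changing a file in an editor, and the names `FileState`, `TRANSITIONS`, and `next_state` are made up for this illustration. In real Git, a file can also be staged and modified at the same time, which this model ignores.

```python
from enum import Enum


class FileState(Enum):
    UNTRACKED = "untracked"
    MODIFIED = "modified"
    STAGED = "staged"
    COMMITTED = "committed"


# Allowed transitions, keyed by (current state, action).
TRANSITIONS = {
    (FileState.UNTRACKED, "add"): FileState.STAGED,     # start tracking the file
    (FileState.MODIFIED, "add"): FileState.STAGED,      # mark changes for the next commit
    (FileState.STAGED, "commit"): FileState.COMMITTED,  # store the snapshot
    (FileState.COMMITTED, "edit"): FileState.MODIFIED,  # change a committed file
}


def next_state(state, action):
    """Return the state reached by performing `action`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot '{action}' a file that is {state.value}") from None


# Walk a new file through a typical life cycle.
state = FileState.UNTRACKED
for action in ["add", "commit", "edit", "add", "commit"]:
    state = next_state(state, action)
    print(f"{action:>6} -> {state.value}")
```

Trying an action that is not in the table, such as committing an untracked file, raises an error in this model, which mirrors the rule that a file must be staged before it can go into a commit.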