GIT - Commands & HowTo's


Git : (rename+commit then change+commit) vs (rename+change then commit)

Looks like this makes no difference anymore on modern versions of Git (but it used to ! 👴)... or I'm missing something

rename+commit then change+commit :
workDir=$(mktemp --tmpdir -d testGit.XXXXXXXX); echo "$workDir"; cd "$workDir";
git init
echo 'hello world' > file1; git add file1; git commit -m 'commit 1'
git mv file1 file2; git commit -m 'rename'
echo "Don't panic" >> file2; git add file2; git commit -m 'commit 2'

git log file2
# lists 'commit 2' + 'rename' commits

git log --follow file2
# lists all commits

cd ..; [ -d "$workDir" ] && rm -rf "$workDir"

rename+change then commit :
workDir=$(mktemp --tmpdir -d testGit.XXXXXXXX); echo "$workDir"; cd "$workDir";
git init
echo 'hello world' > file1; git add file1; git commit -m 'commit 1'
echo "Don't panic" >> file1; git mv file1 file2; git add file2; git commit -m 'commit 2'

git log file2
# lists 'commit 2' only

git log --follow file2
# lists all commits

cd ..; [ -d "$workDir" ] && rm -rf "$workDir"

Delete the deleted files

Foreword

As answered to my question, anything that is git added to a repository, then git rm'ed anytime later will live forever in the Git history. This is usually no big trouble... unless these are :
  • large files
  • binary files
    • they have a lower compression ratio than pure text files
    • ... or can't even be compressed anymore since they are already compressed (details : 1, 2)
  • or any file with a long list of changes : every time a file is committed, whatever change it received, Git stores a new full version of the file in its history (packing may later delta-compress these versions)
    • this helps making things like checkout fast
    • but costs in storage space
These can be (some of) the reasons why your Git repository is fat. To make it lighter, you have the possibility to delete the deleted files.
  • As said above, any file that got a git rm + git commit is not really deleted and can actually be retrieved from the Git history. The method below helps deflating the repository by actually deleting files, i.e. files affected by the commands below will be taken out of the history and will be gone forever.
  • Some deleted files are still worth keeping in the history : it's up to you to choose wisely which ones to keep or remove, rather than blindly using the whole list of deleted files.

Procedure

Before going further, you'll need git filter-repo.

List deleted files

  1. git filter-repo --analyze
    you may re-run this command with :
    git filter-repo --analyze --force
  2. this creates ./.git/filter-repo/analysis/path-deleted-sizes.txt looking like :
    === Deleted paths by reverse accumulated size ===				git filter-repo did everything for us 
    Format: unpacked size, packed size, date deleted, path name(s)
          857646     618500 2018-06-04 bar/baz/foo.pdf
          857646     618500 2018-06-04 foo/bar/foo/bar.pdf
          857646     618500 2018-06-04 foo/bar/baz/foobar.pdf			plenty of documentation I deleted because it was obsolete 
          513453     444710 2016-11-16 bar/foo/baz.pdf
          513453     444710 2016-11-16 foo/bar/foo/foofoo.pdf
          376869     337648 2016-11-16 bar/foo/barfoo.pdf
          376869     337648 2016-11-16 foo/bar/foo/bazbaz.pdf
         1556701     327981 2016-11-15 bar/baz/bazfoo.eps
         1556701     327981 2016-11-15 foo/bar/baz/barbar.eps
       225216471     317154 2020-09-23 foo/foo.xml					has a very long history of changes, but compresses well
          296129     293152 2020-06-16 foo/bar/foobaz.odt
          
          (~3000 more lines)

Now delete the deleted files (details)

The idea is to run one git filter-repo command for every fileToRemove.
And should you need some weapons of mass deletion, here are some one-liners that may prove useful :
Even though the commands are listed in a way suggesting they might be chained, you may not want to apply them to all lines of path-deleted-sizes.txt :
  • some deleted files are worth staying in the Git history
  • on my PC, running a single git filter-repo fileToRemove command takes 7-10 seconds
  • Edit .git/filter-repo/analysis/path-deleted-sizes.txt so that it only lists the deleted files you want to permanently remove from the Git history
  • Effectively sort the deleted files by decreasing unpacked size :
    sort -k1nr .git/filter-repo/analysis/path-deleted-sizes.txt > file1
  • Keep file names only :
    awk '{$1=$2=$3=""; print}' file1 > file2
  • Turn every line of the resulting file into a git filter-repo fileToRemove command :
  • Finally delete the deleted files :
    bash file2
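
The per-file command itself is missing from this page, so here is a minimal sketch of one possible way to do it, not necessarily the original one-liners. It assumes git filter-repo's --invert-paths and --paths-from-file options; the function names and pathsToRemove.txt are hypothetical, and entries listing several path names (renames) are not handled.

```shell
#!/usr/bin/env bash

# Hypothetical helper : print only the path column of path-deleted-sizes.txt.
# Drops the 2 header lines and blank lines, then strips the 3 leading columns
# (unpacked size, packed size, date deleted). Paths containing spaces are kept whole.
extractDeletedPaths() {
	sed -E \
		-e '1,2d' \
		-e '/^[[:space:]]*$/d' \
		-e 's/^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+[^[:space:]]+[[:space:]]+//' \
		"$1"
}

# Hypothetical helper : remove every path listed in a file (one path per line)
# from the whole history, in a single git filter-repo run. This can NOT be undone.
deleteDeletedFiles() {
	git filter-repo --invert-paths --paths-from-file "$1"
}

# Usage, from the root of the repository (review pathsToRemove.txt before the last step !) :
#   extractDeletedPaths .git/filter-repo/analysis/path-deleted-sizes.txt > pathsToRemove.txt
#   deleteDeletedFiles pathsToRemove.txt
```

Passing the whole list at once should avoid paying the 7-10 seconds per file, since the history is rewritten in one pass.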

What happens when I cherry-pick then rebase ?

Situation

  1. Here's a very classic situation, I have 2 branches :
       A---B---C---D master
                    \
                     X---Y---Z feature
  2. I cherry-pick Z :
       A---B---C---D---Z' master
                    \
                     X---Y---Z feature
  3. I rebase feature on master, expecting :
       A---B---C---D---Z' master
                        \
                         X---Y---Z feature
  4. ... and I get :
       A---B---C---D---Z' master
                        \
                         X---Y feature
    Looks like Git detected that Z and Z' make the same changes, hence drops Z from the rebased branch. How can I confirm ?

Solution

As said in the rebase manual : "Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..upstream are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped)."
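
To confirm this on a scratch repository, here is a minimal sketch that replays the situation above. The commitFile helper, the file names and the test identity are hypothetical; git cherry is used to report which commits of feature already have an equivalent patch on master (they are marked with "-").

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q
git symbolic-ref HEAD refs/heads/master          # make sure the first branch is called master
git config user.name 'Test User'; git config user.email 'test@example.com'

# Hypothetical helper : one commit per letter, each adding its own file
commitFile() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for c in A B C D; do commitFile "$c"; done
git checkout -q -b feature
for c in X Y Z; do commitFile "$c"; done

git checkout -q master
git cherry-pick feature > /dev/null              # creates Z' on master

# '+' : only on feature, '-' : an equivalent patch already exists on master
cherryReport=$(git cherry -v master feature)
echo "$cherryReport"

git checkout -q feature
git rebase -q master
rebasedCommits=$(git log --format=%s master..feature)
echo "$rebasedCommits"
```

If the reasoning above holds, Z is the only commit flagged "-" before the rebase, and it no longer shows up in master..feature afterwards.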

What's the difference between a merge request (MR) and a pull request (PR) ?

Stuff that is common to both types of requests :

  • In both cases, you have a local repository on your workstation. You commit and pull there and it is up-to-date. The difference is mostly in the context :
    • how / why / when you interact with a repository which isn't yours
    • and "how" your commits reach such repository
  • Since there's the word request, you can guess you're actually asking someone to let your commits in.
  • Both involve actions in a web UI such as GitLab or GitHub.

merge request (aka MR) :

At work, there's a shared Git repository where you can push any branch (mostly feature branches, actually) except the master branch. This is because the development team leaders / code quality specialists want to review every commit before accepting it on master. Thus, they make sure every commit meets the internal coding standards (and succeeds at the CI tests, of course).

So, when the development of your feature is done and you feel it's ready to join the master branch, you ask —via the web UI— your colleagues to merge your branch.

  • if your commits are accepted, an actual merge is performed (and your feature branch may optionally be deleted once merged)
  • otherwise, they'll ask you to fix things and to make a new merge request

pull request (aka PR) :

  1. There's this very interesting project on GitHub you wish to give a try / have a look at, so you fork it.
    There is no git fork command. Forking is a "special clone" you can do on GitHub, which actually clones the remote repository on the remote server (not locally like a regular clone).
  2. This forked repository is yours, you can work normally with it (usually starting by making a local clone on your workstation).
  3. You can, of course, push / pull / merge as you like since this is yours.
  4. You can also receive ("pull") the commits made on the original repository after you forked it. (commands ?)
  5. But what if you want to share your work / contribute to this original repository ? This is what the pull request is for : you ask the owner of the original repository to pull commits from your repository into their own.
  6. At this time, it's very likely there will be some discussion / comments between you and them before they actually pull your commits (coding standards, code quality, ...)
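
About receiving the commits made on the original repository after the fork : a common convention (an assumption here, not something GitHub enforces) is to declare the original repository as a second remote named upstream. The sketch below plays the whole scenario with local directories standing in for GitHub; original, fork.git, myClone and the emptyCommit helper are hypothetical names.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"

# Hypothetical helper : empty commit with a throwaway identity
emptyCommit() { git -C "$1" -c user.name='Test User' -c user.email='test@example.com' commit -q --allow-empty -m "$2"; }

git init -q original                             # the original project
git -C original symbolic-ref HEAD refs/heads/master
emptyCommit original 'commit 1'

git clone -q --bare original fork.git            # stands for the "fork" button of the web UI
git clone -q fork.git myClone                    # local clone of my fork (remote "origin")
emptyCommit original 'commit 2'                  # the original project moves on after the fork

cd myClone
git remote add upstream "$workDir/original"      # 2nd remote : the original repository
git fetch -q upstream                            # receive its new commits
git merge -q --ff-only upstream/master           # bring them into my local master
git push -q origin master                        # and publish them to my fork
```

The pull request itself is still opened from the web UI, once the commits to share are pushed to the fork.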

Git repeatedly prompts for credentials

Situation

I have a script (an Ansible Galaxy Makefile, actually) that gets stuff from a list of Git repositories (pull or clone ? Whatever...), via HTTPS. Running this prompts for my username and password for every repository it has to get stuff from, which is rather long / annoying / inefficient / error-prone / I-want-to-stop-that!!!

Solution

Ask Git to cache your credentials :
git config --global credential.helper cache
If the default 15 minutes aren't enough :
git config --global credential.helper "cache --timeout=3600"
Thus, you'll only be prompted once.

Alternate solution

Append to ~/.gitconfig :
[credential]
	helper = cache --timeout=3600
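
Both solutions should be equivalent. A minimal way to check, without touching the real ~/.gitconfig, is to ask git config to write into a scratch file (scratchConfig is a hypothetical name) and look at the result :

```shell
#!/usr/bin/env bash
set -e
scratchConfig=$(mktemp)

# Same setting as above, but written to a scratch file instead of ~/.gitconfig
git config --file "$scratchConfig" credential.helper 'cache --timeout=3600'

# Shows the [credential] section generated by the command
cat "$scratchConfig"

# Reads the value back, as Git would
helper=$(git config --file "$scratchConfig" --get credential.helper)
echo "$helper"
```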

How to create SSH keys for GitHub ?

  1. Read this tutorial
  2. Test SSH connection to GitHub :
  3. Append to ~/.ssh/config :
    Host github.com
    	# not an example, this MUST be "git"
    	User		git
    	IdentityFile	/home/stuart/.ssh/github
  4. Then, you can create a new repository and store it on GitHub

What is a Git fast-forward ?
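
This section has no content on this page, so here is only a minimal sketch under the usual definition (stated as an assumption) : a merge is a fast-forward when the tip of the current branch is an ancestor of the branch being merged, so Git just moves the branch pointer forward and creates no merge commit. Branch names and the test identity are hypothetical.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Test User'; git config user.email 'test@example.com'

git commit -q --allow-empty -m 'commit 1'
git checkout -q -b feature
git commit -q --allow-empty -m 'commit 2'        # feature is now 1 commit ahead of master

git checkout -q master
git merge --ff-only feature                      # refuses to do anything but a fast-forward

masterTip=$(git rev-parse master)
featureTip=$(git rev-parse feature)
mergeCommits=$(git rev-list --merges --count master)
echo "master=$masterTip feature=$featureTip merge commits=$mergeCommits"
```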


git diff and file permissions

What does Git know / track about file permissions ? Let's experiment (or jump to the conclusion) :

Increasing permissions :

workDir='/tmp/testGit'; mkdir "$workDir"; cd "$workDir"; git init; myFile="$workDir/test.txt"; echo 'hello world' > $myFile; chmod 000 $myFile; ls -l $myFile
Just creating our testing environment.
---------- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
git add $myFile; git commit $myFile -m 'hello'
Our 1st commit :
[master (root-commit) e13e5bb] hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 test.txt
Committed with 644 permission whereas the file actually has 000.
chmod u+r $myFile; ls -l $myFile; git diff
-r-------- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod u+w $myFile; ls -l $myFile; git diff
-rw------- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod u+x $myFile; ls -l $myFile; git diff
-rwx------ 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
diff --git a/test.txt b/test.txt
old mode 100644
new mode 100755
With the executable bit, the file is now considered having 755 permission whereas it actually has 700.
Let's commit it to go further : git add $myFile; git commit $myFile -m "u+x"
[master 93560cf] u+x
 0 files changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 test.txt
chmod g+r $myFile; ls -l $myFile; git diff
-rwxr----- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod g+w $myFile; ls -l $myFile; git diff
-rwxrw---- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod g+x $myFile; ls -l $myFile; git diff
-rwxrwx--- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod o+r $myFile; ls -l $myFile; git diff
-rwxrwxr-- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod o+w $myFile; ls -l $myFile; git diff
-rwxrwxrw- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)
chmod o+x $myFile; ls -l $myFile; git diff
-rwxrwxrwx 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

Setting permission bits individually :

  1. Let's start by creating our test file :
    myOtherFile="$workDir/test2.txt"; echo 'blah blah blah' > $myOtherFile; chmod 000 $myOtherFile; git add $myOtherFile; git commit $myOtherFile -m 'Blah'
    [master 891cf3b] Blah
     1 files changed, 1 insertions(+), 0 deletions(-)
     create mode 100644 test2.txt
  2. Then, let's toggle permission bits one by one and see what Git detects :
    for person in u g o; do
    	for permission in r w x; do
    		echo $person+$permission
    		chmod $person+$permission $myOtherFile
    		ls -l $myOtherFile
    		git diff $myOtherFile
    		chmod $person-$permission $myOtherFile
    		echo
    	done
    done
    u+r
    -r-------- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    u+w
    --w------- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    u+x
    ---x------ 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    diff --git a/test2.txt b/test2.txt
    old mode 100644
    new mode 100755
    
    g+r
    ----r----- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    g+w
    -----w---- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    g+x
    ------x--- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    o+r
    -------r-- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    o+w
    --------w- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
    
    o+x
    ---------x 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

Conclusion

Git can only store two types of modes: 755 (executable) and 644 (not executable). If your file was 444 Git would store it as 644. (source)
Git is a content tracker, where content is de facto defined as "whatever is relevant to the state of a typical sourcecode tree". Basically, this is just files' data and "executable" attribute. (source)
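
To look at the mode Git actually records, rather than inferring it from git diff, here is a minimal sketch using git ls-files -s, which prints the mode stored in the index. It assumes a filesystem where Git honours the executable bit (core.fileMode enabled, the usual case on Linux); the file name is hypothetical.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q
echo 'hello world' > test.txt

chmod 600 test.txt; git add test.txt
modeBefore=$(git ls-files -s test.txt | cut -d' ' -f1)   # 1st column = mode stored by Git

chmod 700 test.txt; git add test.txt
modeAfter=$(git ls-files -s test.txt | cut -d' ' -f1)

echo "600 is stored as $modeBefore, 700 is stored as $modeAfter"
```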

Git hooks

Git hooks :

Available hooks (source) :

hook name    | is run at          | trigger
post-receive | a local repository | when the local repository is the destination of a git push

Hook execution fails with fatal: not a git repository: '.' (source) :

So far this is wizardry to me, but the solution is to unset the GIT_DIR variable. Suggested way of proceeding :
OLD_GIT_DIR="$GIT_DIR"
unset GIT_DIR

(part of the script where taking actions on a repo occur)

# unset also drops the "exported" attribute, hence the export when restoring
export GIT_DIR="$OLD_GIT_DIR"
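
A minimal sketch to demystify this, assuming (as the error message suggests) that Git exports GIT_DIR=. while the hook runs : any git command run from another directory then looks for the repository metadata in that directory itself. The otherRepo name is hypothetical, and the subshell is just an alternative to saving / restoring the variable.

```shell
#!/usr/bin/env bash
workDir=$(mktemp -d)
git init -q "$workDir/otherRepo"                 # a regular repository the hook wants to act on

export GIT_DIR=.                                 # simulates the environment of the hook
cd "$workDir/otherRepo"

# With GIT_DIR=. Git expects the metadata in the current directory, not in ./.git
if git rev-parse --git-dir > /dev/null 2>&1; then withGitDir='ok'; else withGitDir='failed'; fi

# In a subshell, unset only affects the commands between the parentheses
if ( unset GIT_DIR; git rev-parse --git-dir > /dev/null 2>&1 ); then withoutGitDir='ok'; else withoutGitDir='failed'; fi

unset GIT_DIR
echo "with GIT_DIR : $withGitDir, without GIT_DIR : $withoutGitDir"
```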

How are Git commit IDs generated ?

In order to find out, let's build a repo and commit some stuff :
testDir='/tmp/test'; testFile='test.txt'; mkdir -p "$testDir"; cd "$testDir"; git init; echo 'hello world' > "$testFile"; git add "$testFile"; git commit -m 'Hello to the world.'; echo 'hello everybody' >> "$testFile"; git add "$testFile"; git commit -m 'Hello to people.'; git show
Here's our 2nd commit, git show returns :
commit 93ce9bef143d57b6c0133d659db0c3030c24f75f
Author: Thomas ANDERSON <thomas.anderson@metacortex.com>
Date:	Mon Dec 15 17:54:22 2014 +0100

	Hello to people.
So, how is this commit ID generated : 93ce9bef143d57b6c0133d659db0c3030c24f75f ?
Try this :
(printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | sha1sum
This should output the exact commit ID we're talking about : 93ce9bef143d57b6c0133d659db0c3030c24f75f

More details :

The commit ID is the sha1sum of :
  • the string commit [length of commit metadata]NULL
  • then commit metadata itself, being :
    • the tree ID
    • the parent ID (none for the very first commit, several for a merge commit)
    • the string author firstname lastname email timestamp timezone
    • the string committer firstname lastname email timestamp timezone
    • (a blank line)
    • the string commit message
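
To check this on any repository, here is a minimal sketch comparing the hand-computed hash with what Git reports. It assumes a repository using the default SHA-1 object format and the sha1sum utility; variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q
git -c user.name='Thomas ANDERSON' -c user.email='thomas.anderson@metacortex.com' \
	commit -q --allow-empty -m 'Hello to the world.'

# The commit metadata : tree, (parents), author, committer, blank line, message
git cat-file commit HEAD

# Size of the metadata in bytes (tr removes the padding some wc implementations add)
size=$(git cat-file commit HEAD | wc -c | tr -d ' ')

# sha1sum of "commit <size>NULL" followed by the metadata
computedId=$( (printf 'commit %s\0' "$size"; git cat-file commit HEAD) | sha1sum | cut -d' ' -f1)
actualId=$(git rev-parse HEAD)

echo "computed : $computedId"
echo "actual   : $actualId"
```

Since the author and committer lines carry timestamps, two commits with the same content and message but made at different times should still get different IDs.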

Fine, but why is this important ?

This shows that the commit message is used to compute the commit ID. So if a commit is amended, and the commit message changes, so will the commit ID. But, if the previous commit ID has already been pushed to a remote, this breaks one of Git's rules :

Only rewrite that part of history which you alone possess (source) : don't amend your last commit if you've already pushed it (source) !


My Git beginner's guide

I'm afraid this article will only be a collection of links (but a good link is worth 1000 words)

Git bare repositories

Git has "regular" and bare repositories :

"regular" repositories :

  • contain a working copy (i.e. all the files handled by Git)
  • have a .git sub-directory for the metadata
  • can NOT be pushed to
    You may, actually, push to a non-bare repository, but this requires extra precautions and is not recommended.

bare repositories :

  • have no working copy, just the metadata
  • have no .git sub-directory. Actually, they don't need it since bare repositories only contain metadata : what is stored in .git in non-bare repositories is found one level higher in bare repositories
  • are directories named, by convention, myRepoName.git
  • CAN be pushed to

How to create a bare repository.

How to convert a "regular" repository into a bare repository ?

The idea is to (source) :
  1. Rename the repository directory to append .git to its name : myRepoName.git
  2. Delete the working copy
  3. Move one level up the contents of the .git subdirectory (and delete .git once empty)
  4. Make Git aware of the change : git config --bool core.bare true
  5. You may have some remotes to update

It is also possible to proceed with git clone --bare.
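
A minimal sketch of both operations on scratch repositories : creating a bare repository from scratch with git init --bare, and converting a "regular" one. The conversion does the steps above in a slightly different order (moving .git out first, which covers the renaming and the "one level up" steps at once); fresh.git and the test identity are hypothetical.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"

# Creating a bare repository from scratch
git init -q --bare fresh.git
freshIsBare=$(git -C fresh.git rev-parse --is-bare-repository)

# A "regular" repository to convert
git init -q myRepoName
git -C myRepoName -c user.name='Test User' -c user.email='test@example.com' \
	commit -q --allow-empty -m 'commit 1'

mv myRepoName/.git myRepoName.git                    # the metadata becomes the repository itself
rm -rf myRepoName                                    # delete the working copy
git -C myRepoName.git config --bool core.bare true   # make Git aware of the change

convertedIsBare=$(git -C myRepoName.git rev-parse --is-bare-repository)
convertedLog=$(git -C myRepoName.git log --format=%s)
echo "fresh.git bare : $freshIsBare, myRepoName.git bare : $convertedIsBare, history : $convertedLog"
```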


Can not clone behind a proxy

Situation

Trying to git clone fails miserably :
git clone git@github.com:/everzet/capifony.git
Cloning into 'capifony'...
ssh: connect to host github.com port 22: Connection timed out
fatal: The remote end hung up unexpectedly
git clone git://github.com/everzet/capifony.git
Cloning into 'capifony'...
fatal: unable to connect to github.com:
github.com[0: 204.232.175.90]: errno=Connection timed out
git clone https://github.com/ehashman/ansiblefest.git
Cloning into 'ansiblefest'...
fatal: unable to access 'https://github.com/ehashman/ansiblefest.git/': Failed to connect to 10.2.0.20 port 3128: Connection timed out
This is a special case because :
  • I already did the proxy configuration described below
  • this was on an old VM, and 10.2.0.20 isn't the proxy to use anymore

Solution

  1. define the http_proxy environment variable :
    http_proxy=http://user:password@host:port
  2. make Git aware of it :
    • git config --global http.proxy "$http_proxy"
    • OR add into ~/.gitconfig :
      [http]
      	proxy = http://user:password@host:port
  3. clone via HTTP : git clone http://github.com/everzet/capifony.git
  4. do the same for https

Git glossary

For details : man gitglossary
bare
Git bare repositories
FETCH_HEAD
several definitions until I find one that summarizes them all
  • the SHAs of branch/remote heads that were updated during the last git fetch (source)
  • a short-lived reference (i.e. a pointer) to keep track of what has just been fetched from the remote repository by git fetch (source)
  • FETCH_HEAD records the branch which you fetched from a remote repository with your last git fetch invocation (source)
feature branch
TODO:
fork
TODO:
GIT_AUTHOR_DATE, GIT_COMMITTER_DATE
Unsurprisingly, both are dates handled by Git, and the difference between them is the difference between an author and a committer :
author
the person who wrote the code
committer
the person who committed the code on behalf of the author (and who is often the author himself)
This is important because Git allows rewriting history, or applying patches on behalf of another person, which is what is done by a maintainer with code written by a contributor on an open source project.
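
A minimal sketch of a maintainer committing a contributor's work, with both identities and both dates forced through environment variables. The names, emails and timestamps are made up; the dates use Git's internal "unix timestamp + timezone" format.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q

# The author wrote the code, the committer recorded it in the repository later
GIT_AUTHOR_NAME='Contributor' GIT_AUTHOR_EMAIL='contributor@example.com' \
GIT_AUTHOR_DATE='1400000000 +0000' \
GIT_COMMITTER_NAME='Maintainer' GIT_COMMITTER_EMAIL='maintainer@example.com' \
GIT_COMMITTER_DATE='1500000000 +0000' \
	git commit -q --allow-empty -m 'patch applied by the maintainer'

# %an / %at : author name / date, %cn / %ct : committer name / date (unix timestamps)
summary=$(git log -1 --format='%an %at | %cn %ct')
echo "$summary"
```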
HEAD
  • HEAD is the commit on top of which git commit would make a new one. (details)
  • Refers to a named branch, which in turn refers to a commit (a branch is updated after each commit to point to the latest commit).
  • HEAD is often considered as the latest commit of the current branch, which is partially true since HEAD can actually point to any commit. In such case (pointing to a specific commit instead of pointing to a named branch), we would be in detached HEAD mode (details 1, 2).
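
A minimal sketch to see both states of HEAD on a scratch repository : git symbolic-ref tells which branch HEAD refers to, and fails once HEAD points directly to a commit (detached HEAD). Variable names and the test identity are hypothetical.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q
git symbolic-ref HEAD refs/heads/master
git -c user.name='Test User' -c user.email='test@example.com' commit -q --allow-empty -m 'commit 1'

attachedTo=$(git symbolic-ref HEAD)              # HEAD refers to a named branch
echo "HEAD -> $attachedTo"

git checkout -q --detach                         # HEAD now points directly to the commit
if git symbolic-ref -q HEAD > /dev/null; then detached='no'; else detached='yes'; fi
headContent=$(cat .git/HEAD)                     # a commit ID instead of "ref: refs/heads/..."
echo "detached : $detached, .git/HEAD contains : $headContent"
```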
head
  • a reference to a commit object (i.e. a commit ID)
  • Each head has a name (branch name, tag name, ...). By default, there is a head in every repository called master.
  • A repository can contain any number of heads.
  • At any given time, one head is selected as "the current head". This one is aliased to HEAD (always in capitals).
  • References, heads or branches can be considered like post-it notes stuck onto commits in the commit history. Usually they point to the tip of series of commits, but they can be moved around with Git commands (checkout, revert, ...)
index (aka "cache" or "staging area")
This is where you place files before committing them into the Git repository. You can imagine it works like this :
  • the working area is the plant floor where your product is manufactured
  • the index is the "packaging / shipping" floor : this is where you bring the product itself, some accessories and shipment documents. You pack everything in a box and ship the whole package.
  • well, now, "shipping" is actually what is made with a commit
Adding file(s) to the index is made with git add.
The index is a binary file, usually found at .git/index.
reflog
Reference logs :
  • the history of HEAD values
  • record when the tips of branches and other references were updated in the local repository
  • are useful in various Git commands to specify the old value of a reference. For example :
    • HEAD@{2} means where HEAD used to be two moves ago
    • master@{one.week.ago} means where master used to point to one week ago in this local repository
    (details)
This information belongs to a local repository and is not carried by git clone : the reflog of a cloned repository will have a single entry saying (source) :
commitId HEAD@{0}: clone: from [source repository]
remote
Any Git repository you "synchronize" yours with, via push / pull commands. This is typically a GitLab or GitHub repository.
remote-tracking branch (sources : 1, 2)
upstream
Due to the decentralized nature of Git, all repositories are born equal, and none has a central or server role more than any other. However, developer teams need some kind of central point to act as a reference for the whole team. This central repository is called the upstream repository; it is typically the one developers interact with via GitLab.
Being an upstream repository is more an organization role than a technical functionality.
upstream branch
Checking out a local branch from a remote-tracking branch automatically creates what is called a tracking branch (and the branch it tracks is called an upstream branch). There are several ways to associate a local branch with a remote branch.

What's the point of defining an upstream branch ?

Adding a remote tracking branch means that Git then knows what you want to do the next time you'll git fetch, git pull or git push. It assumes that you want to keep the local branch and the remote branch it is tracking in sync and does the appropriate thing to achieve this. (source)
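
A minimal sketch of three usual ways to get an upstream branch, on scratch repositories : cloning, git branch --set-upstream-to, and git push -u. Repository names, branch names and the test identity are hypothetical; the @{upstream} notation is used to read the result back.

```shell
#!/usr/bin/env bash
set -e
workDir=$(mktemp -d); cd "$workDir"
git init -q remoteRepo
git -C remoteRepo symbolic-ref HEAD refs/heads/master
git -C remoteRepo -c user.name='Test User' -c user.email='test@example.com' \
	commit -q --allow-empty -m 'commit 1'

# 1. cloning : the local master is set up to track origin/master
git clone -q remoteRepo localRepo
cd localRepo
upstreamOfMaster=$(git rev-parse --abbrev-ref 'master@{upstream}')

# 2. explicitly, for an existing local branch
git branch feature
git branch --set-upstream-to=origin/master feature > /dev/null
upstreamOfFeature=$(git rev-parse --abbrev-ref 'feature@{upstream}')

# 3. while pushing a new branch for the first time
git checkout -q -b feature2
git push -q -u origin feature2 > /dev/null
upstreamOfFeature2=$(git rev-parse --abbrev-ref 'feature2@{upstream}')

echo "master -> $upstreamOfMaster, feature -> $upstreamOfFeature, feature2 -> $upstreamOfFeature2"
```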
working tree (aka "working copy")
These are the files you're working on and that are tracked by Git.