GIT - Httqm's Docs

Looks like this makes no difference anymore on modern versions of Git (but it used to ! 👴)... or I'm missing something

rename+commit then change+commit

workDir=$(mktemp --tmpdir -d testGit.XXXXXXXX); echo "$workDir"; cd "$workDir";
git init
echo 'hello world' > file1; git add file1; git commit -m 'commit 1'
git mv file1 file2; git commit -m 'rename'
echo "Don't panic" >> file2; git add file2; git commit -m 'commit 2'

git log file2					lists 'commit 2' + 'rename' commits

git log --follow file2				lists all commits

cd ..; [ -d "$workDir" ] && rm -rf "$workDir"

rename+change then commit

workDir=$(mktemp --tmpdir -d testGit.XXXXXXXX); echo "$workDir"; cd "$workDir";
git init
echo 'hello world' > file1; git add file1; git commit -m 'commit 1'
echo "Don't panic" >> file1; git mv file1 file2; git add file2; git commit -m 'commit 2'

git log file2					lists 'commit 2' only

git log --follow file2				lists all commits

cd ..; [ -d "$workDir" ] && rm -rf "$workDir"

Foreword

As answered to my question : anything that is git added to a repository, then git rm'ed anytime later will live forever in the Git history. This is usually no big trouble... unless these are :

large files
binary files
- they have a lower compression ratio than pure text files
- ... or can't even be compressed anymore since they are already compressed (details : 1, 2)
or any file with a long list of changes : every time a file is committed —and whatever change it received— Git stores the whole file in its history
- this helps making things like checkout fast
- but costs in storage space

These can be (some of) the reasons why your Git repository is fat. To make it lighter, you have the possibility to delete the deleted files.

As said above, any file that got a git rm + git commit is not really deleted and can actually be retrieved from the Git history. The method below helps deflating the repository by actually deleting files, i.e. files affected by the commands below will be taken out of the history and will be gone forever.
Some deleted files are still worth keeping in the history : it's up to you to choose wisely which ones to keep or remove, rather than blindly using the whole list of deleted files.
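To make the "not really deleted" claim concrete, here is a minimal sketch in a throwaway repository. The file name doc.txt and the demo identity are placeholders chosen for the illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
# throwaway identity so that 'git commit' works without any global Git config
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

git init -q "$sandbox"
echo 'big obsolete documentation' > "$sandbox/doc.txt"
git -C "$sandbox" add doc.txt
git -C "$sandbox" commit -q -m 'add doc'
git -C "$sandbox" rm -q doc.txt
git -C "$sandbox" commit -q -m 'remove doc'

# the file is gone from the working tree ...
[ ! -e "$sandbox/doc.txt" ] && echo 'doc.txt is no longer in the working tree'
# ... but its content can still be read from the previous commit
recovered=$(git -C "$sandbox" show HEAD~1:doc.txt)
echo "recovered from history: $recovered"
rm -rf "$sandbox"
```

As long as a commit that contained the file is reachable, Git keeps its content, which is why the repository does not shrink after a plain git rm.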

Procedure

Before going further, you'll need git filter-repo.

List deleted files

cd root/dir/of/git/repository
git filter-repo --analyze
you may re-run this command with :
git filter-repo --analyze --force

this creates ./.git/filter-repo/analysis/path-deleted-sizes.txt looking like :

=== Deleted paths by reverse accumulated size ===				git filter-repo did everything for us 
Format: unpacked size, packed size, date deleted, path name(s)
      857646     618500 2018-06-04 bar/baz/foo.pdf
      857646     618500 2018-06-04 foo/bar/foo/bar.pdf
      857646     618500 2018-06-04 foo/bar/baz/foobar.pdf			plenty of documentation I deleted because it was obsolete 
      513453     444710 2016-11-16 bar/foo/baz.pdf
      513453     444710 2016-11-16 foo/bar/foo/foofoo.pdf
      376869     337648 2016-11-16 bar/foo/barfoo.pdf
      376869     337648 2016-11-16 foo/bar/foo/bazbaz.pdf
     1556701     327981 2016-11-15 bar/baz/bazfoo.eps
     1556701     327981 2016-11-15 foo/bar/baz/barbar.eps
   225216471     317154 2020-09-23 foo/foo.xml					has a very long history of changes, but compresses well
      296129     293152 2020-06-16 foo/bar/foobaz.odt
      
      (~3000 more lines)

Now delete the deleted files (details)

The idea is to run, for every fileToRemove :

git filter-repo --invert-paths --path fileToRemove

And should you need some weapons of mass deletion, here are some one-liners that may prove useful :

Even though commands are listed in a way suggesting they might be chained, you may not want to apply these on all lines of path-deleted-sizes.txt

some deleted files are worth staying in the Git history
on my PC, running a single git filter-repo fileToRemove command takes 7-10 seconds

Edit .git/filter-repo/analysis/path-deleted-sizes.txt so that it only lists the deleted files you want to permanently remove from the Git history.
i.e. remove from this file :
- the 2 header lines
- any file you'd like to keep in the history
compute the amount of saved space :
- size in bytes :
  bc <<< $(awk '{printf $2"+"}' .git/filter-repo/analysis/path-deleted-sizes.txt | sed -r 's/.$//')
- size in MiB (with Awk doing all the job) :
  awk 'BEGIN {sum=0} {sum+=$2} END {print sum/1024/1024}' .git/filter-repo/analysis/path-deleted-sizes.txt
Effectively sort the deleted files by decreasing unpacked size :
sort -k1nr .git/filter-repo/analysis/path-deleted-sizes.txt > file1
Keep file names only :
awk '{$1=$2=$3=""; print}' file1 > file2
Turn every line of the resulting file into a git filter-repo fileToRemove command :
sed -r -i 's/^ *(.*)$/git filter-repo --invert-paths --path "\1"/' file2
Finally delete the deleted files :
bash file2
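Since each git filter-repo run takes several seconds, a possible alternative is to hand it all the paths at once. The sketch below only builds the list of paths from a small, made-up copy of the edited path-deleted-sizes.txt and prints the command instead of running it. It assumes the installed git filter-repo supports the --paths-from-file option; pathsToRemove.txt is a name picked for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)

# hypothetical, already edited copy of path-deleted-sizes.txt (header lines removed)
cat > "$sandbox/path-deleted-sizes.txt" <<'EOF'
      857646     618500 2018-06-04 bar/baz/foo.pdf
      513453     444710 2016-11-16 bar/foo/baz.pdf
   225216471     317154 2020-09-23 foo/foo.xml
EOF

# drop the 3 leading columns (unpacked size, packed size, date deleted), keep the path only
sed -E 's/^ *[0-9]+ +[0-9]+ +[0-9-]+ +//' "$sandbox/path-deleted-sizes.txt" > "$sandbox/pathsToRemove.txt"
cat "$sandbox/pathsToRemove.txt"

pathCount=$(wc -l < "$sandbox/pathsToRemove.txt" | tr -d ' ')
firstPath=$(head -n 1 "$sandbox/pathsToRemove.txt")

# the destructive rewrite itself is only printed here, not executed
echo "git filter-repo --invert-paths --paths-from-file $sandbox/pathsToRemove.txt"
```

Unlike the awk one-liner, the sed expression leaves spaces inside file names untouched.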

Here's a very classic situation, I have 2 branches :

   A---B---C---D master
                \
                 X---Y---Z feature

I cherry-pick Z :

   A---B---C---D---Z' master
                \
                 X---Y---Z feature

I rebase feature on master :

   A---B---C---D---Z' master
                    \
                     X---Y---Z feature

... and I get :

   A---B---C---D---Z' master
                    \
                     X---Y feature

Looks like Git detected that Z and Z' do the same changes, hence hides Z. How can I confirm ?

As said in the rebase manual : Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..upstream are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped).
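One way to confirm this is to reproduce the situation in a throwaway repository and ask git cherry, which flags with a "-" the commits whose patch already exists upstream. This is a sketch : the helper names g and commitFile are made up for the example, and every commit simply adds its own file.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

g() { git -C "$sandbox" "$@"; }
# create one file and commit it, using the same name for the file and the commit message
commitFile() { echo "$1" > "$sandbox/$1.txt"; g add "$1.txt"; g commit -q -m "$1"; }

git init -q "$sandbox"
for c in A B C D; do commitFile "$c"; done
mainBranch=$(g symbolic-ref --short HEAD)    # master or main, depending on the Git setup

g checkout -q -b feature
for c in X Y Z; do commitFile "$c"; done

g checkout -q "$mainBranch"
g cherry-pick feature > /dev/null            # picks Z (the tip of feature), creating Z'

# before the rebase : '+' = only on feature, '-' = same patch already on the main branch
g cherry -v "$mainBranch" feature
alreadyUpstream=$(g cherry "$mainBranch" feature | grep -c '^-')

g checkout -q feature
g rebase -q "$mainBranch" 2> /dev/null
remaining=$(g rev-list --count "$mainBranch"..feature)
echo "commits left on feature after the rebase : $remaining"
rm -rf "$sandbox"
```

Here git cherry marks Z with a "-", and only X and Y remain on feature once the rebase is done.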

Stuff that is common to both types of requests :

In both cases, you have a local repository on your workstation. You commit and pull there and it is up-to-date. The difference is mostly in the context :
- how / why / when you interact with a repository which isn't yours
- and "how" your commits reach such repository
Since there's the word request, you can guess you're actually asking someone to let your commits in.
Both involve actions in a web UI such as GitLab or GitHub.

merge request (aka MR) :

At work, there's a shared Git repository where you can push any branch (mostly feature branches, actually) except the master branch. This is because the development team leaders / code quality specialists want to review every commit before accepting it on master. Thus, they make sure every commit meets the internal coding standards (and succeeds at the CI tests, of course).

So, when the development of your feature is done and you feel it's ready to join the master branch, you ask —via the web UI— your colleagues to merge your branch.

if your commits are accepted, an actual merge is performed (and your feature branch may optionally be deleted once merged)
otherwise, they'll ask you to fix things and to make a new merge request

pull request (aka PR) :

There's this very interesting project on GitHub you wish to give a try / have a look at, so you fork it.
There is no git fork command. Forking is a "special clone" you can do on GitHub, which actually clones the remote repository on the remote server (not locally like a regular clone).
This forked repository is yours : you can work normally with it (usually starting by making a local clone on your workstation).
You can, of course, push / pull / merge as you like since this is yours.
You may also receive ("pull") the commits made on the original repository anytime after you forked it.
- there are 3 instances of the Git repository :
  1. the original repository of the project you're interested in, which belongs to other developers, and stored on GitHub
  2. your fork, also stored on GitHub
  3. a local copy, found on your workstation, and synchronized with your fork via push / pull / merge
- to keep your repository updated with the original one, you'll have to :
  1. pull commits from the original repository to the repository on your workstation
  2. push commits from your workstation to your fork
- there _may_ be a way to pull commits from the original repository to your fork (i.e. GitHub to GitHub), but I've not searched / tried
- to make this work seamlessly, I strongly suggest the use of branches
But what if you want to share your work / contribute to this original repository ? This is what the pull request is for : you ask the owner of the original repository to pull commits from your repository to his own.
At this time, it's very likely there will be some discussion / comments between you and them before they actually pull your commits (coding standards, code quality, ...)
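The two-step update of a fork described above can be rehearsed locally. In this sketch, two bare repositories stand in for the GitHub-hosted original and fork. The remote name upstream is a common convention, and the directory names are arbitrary.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

# 1. the original project (with a first commit pushed by its maintainer)
git init -q --bare "$sandbox/original.git"
git clone -q "$sandbox/original.git" "$sandbox/maintainer" 2> /dev/null
echo 'v1' > "$sandbox/maintainer/README"
git -C "$sandbox/maintainer" add README
git -C "$sandbox/maintainer" commit -q -m 'initial commit'
git -C "$sandbox/maintainer" push -q origin HEAD

# 2. your fork, then 3. your local clone, which also knows the original as "upstream"
git clone -q --bare "$sandbox/original.git" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/work"
git -C "$sandbox/work" remote add upstream "$sandbox/original.git"
branch=$(git -C "$sandbox/work" symbolic-ref --short HEAD)

# meanwhile, the original project moves on
echo 'v2' >> "$sandbox/maintainer/README"
git -C "$sandbox/maintainer" commit -q -a -m 'upstream change'
git -C "$sandbox/maintainer" push -q origin HEAD

# step 1 : original -> workstation ; step 2 : workstation -> fork
git -C "$sandbox/work" pull -q --ff-only upstream "$branch"
git -C "$sandbox/work" push -q origin "$branch"

originalTip=$(git -C "$sandbox/original.git" rev-parse "$branch")
forkTip=$(git -C "$sandbox/fork.git" rev-parse "$branch")
echo "original : $originalTip"
echo "fork     : $forkTip"
rm -rf "$sandbox"
```

Both tips are identical at the end, meaning the fork has caught up with the original repository.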

I have a script (an Ansible Galaxy Makefile, actually) that gets stuff from a list of Git repositories (pull or clone ? Whatever...), via HTTPS. Running this prompts for my username and password for every repository it has to get stuff from, which is rather long / annoying / inefficient / error-prone / I-want-to-stop-that!!!

Ask Git to cache your credentials : git config --global credential.helper cache
If the default 15 minutes aren't enough : git config --global credential.helper "cache --timeout=3600"

Thus, you'll only be prompted once.

Append to ~/.gitconfig :

[credential]
	helper = cache --timeout=3600

Read this tutorial
Test SSH connection to GitHub :
- ssh -T git@github.com
- ssh -i ~/.ssh/github -T git@github.com

Append to ~/.ssh/config :

Host github.com
	User		git				not an example, this MUST be "git"
	IdentityFile	/home/stuart/.ssh/github

Then, you can create a new repository and store it on GitHub

A fast-forward is one method to execute a merge :
- either when explicitly running git merge
- or implicitly with a git pull = git fetch + git merge FETCH_HEAD
available methods to proceed to a merge include (details) :
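- requiring a fast-forward, and aborting if it's not possible (git merge --ff-only)
- forbidding a fast-forward, i.e. always creating a merge commit (git merge --no-ff)
- going best-effort : fast-forward when possible, otherwise create a merge commit (git merge --ff, the default behavior)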
git merge goes fast-forward when a branch can be advanced along a linear sequence. This happens whenever you pull changes that build directly on top of the same commit you have as your most recent commit. In other words, there was never any divergence or simultaneous commits created in parallel in multiple repositories. If there had been parallel commits, then git merge would actually introduce a new merge commit to tie the two commits together.
When a non-fast-forward merge occurs, there is always the possibility that a conflict occurs. In this case, git merge will leave conflict markers in the files and instruct you to resolve the conflicts. When you are finished, you would issue a git commit -a to create the merge commit.
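The difference between a fast-forward and a merge commit shows in the number of parents of the resulting HEAD. A small sketch, with made-up branch names and helper functions :

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

g() { git -C "$sandbox" "$@"; }
commitFile() { echo "$1" > "$sandbox/$1.txt"; g add "$1.txt"; g commit -q -m "$1"; }
# number of words of "commitId parentId..." for HEAD : 2 = one parent, 3 = two parents
headWords() { g rev-list --parents -n 1 HEAD | wc -w | tr -d ' '; }

git init -q "$sandbox"
commitFile A
mainBranch=$(g symbolic-ref --short HEAD)

# case 1 : the main branch did not move while 'feature' got a commit => fast-forward
g checkout -q -b feature
commitFile B
g checkout -q "$mainBranch"
g merge -q --ff-only feature
ffWords=$(headWords)

# case 2 : same kind of situation, but a merge commit is explicitly requested
g checkout -q -b feature2
commitFile C
g checkout -q "$mainBranch"
g merge -q --no-ff -m 'merge feature2' feature2
noFfWords=$(headWords)

echo "after --ff-only : $ffWords words, after --no-ff : $noFfWords words"
g log --oneline --graph
rm -rf "$sandbox"
```

After the fast-forward, HEAD is simply commit B (a single parent). After --no-ff, HEAD is a new merge commit with two parents.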

What does Git know / track about file permissions ? Let's experiment (or jump to the conclusion) :

Increasing permissions :

workDir='/tmp/testGit'; mkdir "$workDir"; cd "$workDir"; git init; myFile="$workDir/test.txt"; echo 'hello world' > $myFile; chmod 000 $myFile; ls -l $myFile

Just creating our testing environment.

---------- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt

git add $myFile; git commit $myFile -m 'hello'

Our 1st commit :

[master (root-commit) e13e5bb] hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 test.txt

Committed with 644 permission whereas the file actually has 000.

chmod u+r $myFile; ls -l $myFile; git diff

-r-------- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod u+w $myFile; ls -l $myFile; git diff

-rw------- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod u+x $myFile; ls -l $myFile; git diff

-rwx------ 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
diff --git a/test.txt b/test.txt
old mode 100644
new mode 100755

With the executable bit, the file is now considered having 755 permission whereas it actually has 700.

Let's commit it to go further : git add $myFile; git commit $myFile -m "u+x"

[master 93560cf] u+x
 0 files changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 test.txt/

chmod g+r $myFile; ls -l $myFile; git diff

-rwxr----- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod g+w $myFile; ls -l $myFile; git diff

-rwxrw---- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod g+x $myFile; ls -l $myFile; git diff

-rwxrwx--- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod o+r $myFile; ls -l $myFile; git diff

-rwxrwxr-- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod o+w $myFile; ls -l $myFile; git diff

-rwxrwxrw- 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

chmod o+x $myFile; ls -l $myFile; git diff

-rwxrwxrwx 1 kevin developers 12 Dec 23 11:37 /tmp/testGit/test.txt
(no diff seen by Git)

Setting permission bits individually :

Let's start by creating our test file :
myOtherFile="$workDir/test2.txt"; echo 'blah blah blah' > $myOtherFile; chmod 000 $myOtherFile; git add $myOtherFile; git commit $myOtherFile -m 'Blah'
```
[master 891cf3b] Blah
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 test2.txt
```

Then, let's toggle permission bits one by one and see what Git detects :

for person in u g o; do
	for permission in r w x; do
		echo $person+$permission
		chmod $person+$permission $myOtherFile
		ls -l $myOtherFile
		git diff $myOtherFile
		chmod $person-$permission $myOtherFile
		echo
	done
done

u+r
-r-------- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

u+w
--w------- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

u+x
---x------ 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt
diff --git a/test2.txt b/test2.txt
old mode 100644
new mode 100755

g+r
----r----- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

g+w
-----w---- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

g+x
------x--- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

o+r
-------r-- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

o+w
--------w- 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

o+x
---------x 1 kevin developers 15 Dec 23 12:55 /tmp/testGit/test2.txt

Conclusion

Git can only store two types of modes: 755 (executable) and 644 (not executable). If your file was 444 Git would store it as 644. (source)
Git is a content tracker, where content is de facto defined as "whatever is relevant to the state of a typical sourcecode tree". Basically, this is just files' data and "executable" attribute. (source)
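The mode Git actually records can be read directly from the index with git ls-files -s. The sketch below assumes a filesystem that honors the executable bit (core.fileMode left to its usual value of true). stagedMode is a helper name invented for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
git init -q "$sandbox"
myFile="$sandbox/test.txt"
echo 'hello world' > "$myFile"

# print the mode stored in the index for test.txt (1st field of 'git ls-files -s')
stagedMode() { git -C "$sandbox" ls-files -s test.txt | cut -d' ' -f1; }

chmod 600 "$myFile"; git -C "$sandbox" add test.txt
modeWithoutX=$(stagedMode)

chmod 700 "$myFile"; git -C "$sandbox" add test.txt
modeWithX=$(stagedMode)

echo "600 on disk is staged as $modeWithoutX"
echo "700 on disk is staged as $modeWithX"
rm -rf "$sandbox"
```

Only the owner's executable bit makes a difference : the read / write bits and the group / others bits never reach the repository.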

Git hooks :

are scripts that Git executes before or after events such as: commit, push, and receive
are a Git built-in feature : nothing to download / install
are run locally
appear as scripts found in .git/hooks

Available hooks (source) :

hook name	is run at	trigger
post-receive	a local repository	when the local repository is the destination of a git push
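To watch a post-receive hook fire, here is a sketch where a bare repository receives a push and its hook logs every updated reference. The repository names and the hook.log file are arbitrary choices for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

git init -q --bare "$sandbox/central.git"
hookLog="$sandbox/hook.log"
hook="$sandbox/central.git/hooks/post-receive"
mkdir -p "$sandbox/central.git/hooks"

# install the hook : it lives in the repository that RECEIVES the push
cat > "$hook" <<EOF
#!/bin/sh
# Git writes one "oldId newId refName" line per updated ref on the hook's stdin
while read -r oldId newId refName; do
	echo "\$refName now points to \$newId" >> "$hookLog"
done
EOF
chmod +x "$hook"

# push a commit from a clone : this is what triggers the hook
git clone -q "$sandbox/central.git" "$sandbox/work" 2> /dev/null
echo 'hello world' > "$sandbox/work/file.txt"
git -C "$sandbox/work" add file.txt
git -C "$sandbox/work" commit -q -m 'first commit'
git -C "$sandbox/work" push -q origin HEAD

hookOutput=$(cat "$hookLog")
echo "$hookOutput"
rm -rf "$sandbox"
```

The hook must be executable, otherwise Git ignores it.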

Hook execution fails on fatal: not a git repository: '.' (source) :

So far this is wizardry to me, but the solution is to unset the GIT_DIR variable. Suggested way of proceeding :

OLD_GIT_DIR=$GIT_DIR
unset GIT_DIR

(part of the script where taking actions on a repo occur)

GIT_DIR=$OLD_GIT_DIR

In order to find out, let's build a repo and commit some stuff :

testDir='/tmp/test'; testFile='test.txt'; mkdir -p "$testDir"; cd "$testDir"; git init; echo 'hello world' > "$testFile"; git add "$testFile"; git commit -m 'Hello to the world.'; echo 'hello everybody' >> "$testFile"; git add "$testFile"; git commit -m 'Hello to people.'; git show

Here's our 2nd commit, git show returns :

commit 93ce9bef143d57b6c0133d659db0c3030c24f75f
Author: Thomas ANDERSON <thomas.anderson@metacortex.com>
Date:	Mon Dec 15 17:54:22 2014 +0100

	Hello to people.

So, how is this commit ID generated : 93ce9bef143d57b6c0133d659db0c3030c24f75f ? Try this :

(printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | sha1sum

This should output the exact commit ID we're talking about : 93ce9bef143d57b6c0133d659db0c3030c24f75f

More details :

The commit ID is the sha1sum of :

the string commit [length of commit metadata]NULL
then commit metadata itself, being :
- the tree ID
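- the parent ID(s) : none for a root commit, several for a merge commit
- the string author, followed by the author's name, email, and the authoring timestamp + timezone
- the string committer, followed by the committer's name, email, and the commit timestamp + timezone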
- (a blank line)
- the string commit message

Fine, but why is this important ?

This shows that the commit message is used to compute the commit ID. So if a commit is amended, and the commit message changes, so will the commit ID. But, if the previous commit ID has already been pushed to a remote, this breaks one of Git's rules :

Only rewrite that part of history which you alone possess (source) : don't amend your last commit if you've already pushed it (source) !
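Both claims (the ID is a SHA-1 of the commit metadata, and amending the message changes it) can be checked in a throwaway repository. This sketch assumes a repository using the default SHA-1 object format and an available sha1sum command. computeHeadId is a helper written for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

g() { git -C "$sandbox" "$@"; }
# recompute the ID of HEAD by hand : sha1( "commit <size>" + NUL + commit metadata )
computeHeadId() {
	local size
	size=$(g cat-file commit HEAD | wc -c | tr -d ' ')
	{ printf 'commit %s\0' "$size"; g cat-file commit HEAD; } | sha1sum | cut -d' ' -f1
}

git init -q "$sandbox"
echo 'hello world' > "$sandbox/test.txt"
g add test.txt
g commit -q -m 'Hello to the world.'

idBefore=$(g rev-parse HEAD)
computedBefore=$(computeHeadId)

# same content, only the commit message changes
g commit -q --amend -m 'Hello to everybody.'
idAfter=$(g rev-parse HEAD)
computedAfter=$(computeHeadId)

echo "before amend : $idBefore (recomputed : $computedBefore)"
echo "after amend  : $idAfter (recomputed : $computedAfter)"
rm -rf "$sandbox"
```

The recomputed value matches Git's own ID each time, and the two IDs differ although the committed file is the same.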

I'm afraid this article will only be a collection of links (but a good link is worth 1000 words)

MUST-READ for beginners : Getting Started - Git Basics
Some basic Git vocabulary
For those who need sketches to get things :
- start with this one : A Visual Git Reference
- animates as you type : Visualizing Git Concepts with D3
- the all-in-one : The Git cheatsheet
- Rachel M. Carmena has great sketches to teach Git.
Documentations I wrote about Git :
- A training session I made for colleagues (in french) : Git - Présentation basique et non-exhaustive
- My humble Git tutorial
Some extra man pages :
Git high-level porcelain commands :
- git-scm.com
- kernel.org

Git has "regular" and bare repositories :

"regular" repositories :

contain a working copy (i.e. all the files handled by Git)
have a .git sub-directory for the metadata
can NOT be pushed to
You may, actually, push to a non-bare repository, but this requires extra precautions and is not recommended.

bare repositories :

have no working copy, just the metadata
have no .git sub-directory. Actually, they don't need it since bare repositories only contain metadata : what is stored in .git in non-bare repositories is found one level higher in bare repositories
are directories named, by convention, myRepoName.git
CAN be pushed to

How to create a bare repository.

How to convert a "regular" repository into a bare repository ?

The idea is to (source) :

Rename the repository directory by appending .git to its name : myRepoName.git
Delete the working copy
Move one level up the contents of the .git subdirectory (and delete .git once empty)
Make Git aware of the change : git config --bool core.bare true
You may have some remotes to update

It is also possible to proceed with git clone --bare.
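The manual conversion steps listed above can be sketched as follows, on a throwaway repository. Here the order is slightly rearranged : the .git directory is moved out and renamed in one go, which gives the same result. There is no remote to update in this example.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

# a "regular" repository with one commit
repo="$sandbox/myRepoName"
git init -q "$repo"
echo 'hello world' > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m 'first commit'
idBefore=$(git -C "$repo" rev-parse HEAD)

# the metadata becomes myRepoName.git, the working copy is deleted
mv "$repo/.git" "$repo.git"
rm -rf "$repo"
# make Git aware of the change
git -C "$repo.git" config --bool core.bare true

isBare=$(git -C "$repo.git" rev-parse --is-bare-repository)
idAfter=$(git -C "$repo.git" rev-parse HEAD)
echo "bare : $isBare, HEAD : $idAfter"
rm -rf "$sandbox"
```

The history is untouched : HEAD points to the same commit before and after the conversion.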

Trying to git clone fails miserably :

git clone git@github.com:/everzet/capifony.git

Cloning into 'capifony'...
ssh: connect to host github.com port 22: Connection timed out
fatal: The remote end hung up unexpectedly

git clone git://github.com/everzet/capifony.git

Cloning into 'capifony'...
fatal: unable to connect to github.com:
github.com[0: 204.232.175.90]: errno=Connection timed out

git clone https://github.com/ehashman/ansiblefest.git

Cloning into 'ansiblefest'...
fatal: unable to access 'https://github.com/ehashman/ansiblefest.git/': Failed to connect to 10.2.0.20 port 3128: Connection timed out

This is a special case because :

I already did the proxy configuration described below
this was on an old VM, and 10.2.0.20 isn't the proxy to use anymore

define the http_proxy environment variable :

http_proxy=http://user:password@host:port

make Git aware of it :
- git config --global http.proxy $http_proxy
- OR add into ~/.gitconfig :
```
[http]
	proxy = http://user:password@host:port
```
clone via HTTP : git clone http://github.com/everzet/capifony.git
do the same for https

For details : man gitglossary

bare

Git bare repositories

branch

A line of development. There are some special branches :

master or main : default branch of a Git repository (What's the difference ?)
feature branch : a branch that was created for all the changes related to the development of a specific feature

(see also git branch)

FETCH_HEAD

several definitions until I find one that summarizes them all

the SHAs of branch/remote heads that were updated during the last git fetch (source)
a short-lived reference (i.e. a pointer) to keep track of what has just been fetched from the remote repository by git fetch (source)
FETCH_HEAD records the branch which you fetched from a remote repository with your last git fetch invocation (source)

fork

TODO:

GIT_AUTHOR_DATE, GIT_COMMITTER_DATE

Unsurprisingly, both are dates handled by Git, and the difference between them lies in the difference between an author and a committer :

author: the person who wrote the code
committer: the person who committed the code on behalf of the author (and who is often the author himself)

This is important because Git allows rewriting history, or applying patches on behalf of another person, which is what is done by a maintainer with code written by a contributor on an open source project.
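As an illustration, both dates (and both identities) can be set through environment variables when committing, then read back with git log. The names and dates below are invented for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
# the contributor wrote the code, the maintainer commits it later
export GIT_AUTHOR_NAME='Contributor' GIT_AUTHOR_EMAIL='contributor@example.com'
export GIT_COMMITTER_NAME='Maintainer' GIT_COMMITTER_EMAIL='maintainer@example.com'

git init -q "$sandbox"
echo 'patch' > "$sandbox/file.txt"
git -C "$sandbox" add file.txt
GIT_AUTHOR_DATE='2020-01-01 12:00:00 +0000' \
GIT_COMMITTER_DATE='2021-06-15 12:00:00 +0000' \
	git -C "$sandbox" commit -q -m 'apply the patch of a contributor'

# %an / %ad : author name / date ; %cn / %cd : committer name / date
people=$(git -C "$sandbox" log -1 --format='%an / %cn')
dates=$(git -C "$sandbox" log -1 --date=short --format='%ad %cd')
echo "$people"
echo "$dates"
rm -rf "$sandbox"
```

A plain git log only shows the author and the author date. git log --format=fuller displays both pairs.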

HEAD

HEAD is the commit on top of which git commit would make a new one. (details)
Refers to a named branch, which in turn refers to a commit (a branch is updated after each commit to point to the latest commit).
HEAD is often considered as the latest commit of the current branch, which is partially true since HEAD can actually point to any commit. In such case (pointing to a specific commit instead of pointing to a named branch), we would be in detached HEAD mode (details 1, 2).
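A quick way to tell both situations apart is git symbolic-ref, which succeeds only when HEAD refers to a branch. A minimal sketch in a throwaway repository :

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

g() { git -C "$sandbox" "$@"; }
git init -q "$sandbox"
echo 'one' > "$sandbox/file.txt"; g add file.txt; g commit -q -m 'commit 1'
echo 'two' >> "$sandbox/file.txt"; g commit -q -a -m 'commit 2'

# normal situation : HEAD refers to a branch, which refers to a commit
branch=$(g symbolic-ref --short HEAD)
headOnBranch=$(g symbolic-ref HEAD)
echo "HEAD -> $headOnBranch"

# detached HEAD : HEAD now holds a commit ID directly
g checkout -q --detach HEAD~1
if g symbolic-ref -q HEAD > /dev/null; then detached='no'; else detached='yes'; fi
echo "detached : $detached"
rm -rf "$sandbox"
```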

head

a reference to a commit object (i.e. a commit ID)
Each head has a name (branch name, tag name, ...). By default, there is a head in every repository called master.
A repository can contain any number of heads.
At any given time, one head is selected as "the current head". This one is aliased to HEAD (always in capitals).
References, heads or branches can be considered like post-it notes stuck onto commits in the commit history. Usually they point to the tip of a series of commits, but they can be moved around with Git commands (checkout, revert, ...)

index (aka "cache" or "staging area")

This is where you place files before committing them into the Git repository. You can imagine it works like this :

the working area is the plant floor where your product is manufactured
the index is the "packaging / shipping" floor : this is where you bring the product itself, some accessories and shipment documents. You pack everything in a box and ship the whole package.
well, now, "shipping" is actually what is made with a commit

Adding file(s) to the index is made with git add.

The index is a binary file, usually found at .git/index.

reflog

Reference logs :

the history of HEAD values
record when the tips of branches and other references were updated in the local repository
are useful in various Git commands to specify the old value of a reference. For example :
- HEAD@{2} means where HEAD used to be two moves ago
- master@{one.week.ago} means where master used to point to one week ago in this local repository
(details)

This information belongs to a local repository and is not carried by git clone : the reflog of a cloned repository will have a single entry saying (source) :

commitId HEAD@{0}: clone: from [source repository]
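Both points can be observed in a throwaway repository : HEAD@{1} resolves to the previous position of HEAD, and a fresh clone starts with a single reflog entry. The directory names repo and clone are arbitrary.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

g() { git -C "$sandbox/repo" "$@"; }
git init -q "$sandbox/repo"
echo 'one' > "$sandbox/repo/file.txt"; g add file.txt; g commit -q -m 'commit 1'
firstId=$(g rev-parse HEAD)
echo 'two' >> "$sandbox/repo/file.txt"; g commit -q -a -m 'commit 2'

# HEAD@{1} : where HEAD was one move ago, i.e. on 'commit 1'
previousHead=$(g rev-parse 'HEAD@{1}')
g reflog

# the reflog is not carried over by 'git clone'
git clone -q "$sandbox/repo" "$sandbox/clone"
cloneEntries=$(git -C "$sandbox/clone" reflog | wc -l | tr -d ' ')
echo "reflog entries in the clone : $cloneEntries"
rm -rf "$sandbox"
```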

remote

Any Git repository you "synchronize" yours with, via push / pull commands. This is typically a GitLab or GitHub repository.

remote-tracking branch (sources : 1, 2)

a remote-tracking branch is a local branch that represents a remote branch
the label of a remote-tracking branch (i.e. what it points to) is updated only with network operations such as clone, fetch, pull and push
remote-tracking branches are automatically created on repositories started with git clone. When manually linking a local and a remote repository with git remote add, you'll have to explicitly declare remote-tracking branch(es)
How to track a remote branch ?

upstream

Due to the decentralized nature of Git, all repositories are born equal, and none has a central or server role more than any other. However, developer teams need some kind of central point to act as a reference for the whole team. This central repository is called the upstream repository, it is typically the one developers interact with via GitLab.

Being an upstream repository is more an organization role than a technical functionality.

upstream branch

Checking out a local branch from a remote-tracking branch automatically creates what is called a tracking branch (and the branch it tracks is called an upstream branch). There are several ways to associate a local branch with a remote branch.

What's the point of defining an upstream branch ?

Adding a remote tracking branch means that Git then knows what you want to do the next time you'll git fetch, git pull or git push. It assumes that you want to keep the local branch and the remote branch it is tracking in sync and does the appropriate thing to achieve this. (source)
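One of the ways to declare an upstream branch is git push -u. In the sketch below, a local bare repository (named central.git for the example) plays the role of the remote, and @{upstream} is then resolved to show what the local branch tracks.

```shell
#!/usr/bin/env bash
set -euo pipefail
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

git init -q --bare "$sandbox/central.git"
git clone -q "$sandbox/central.git" "$sandbox/work" 2> /dev/null
g() { git -C "$sandbox/work" "$@"; }

echo 'hello world' > "$sandbox/work/file.txt"
g add file.txt
g commit -q -m 'first commit'
branch=$(g symbolic-ref --short HEAD)

# -u (--set-upstream) : push AND remember origin/<branch> as the upstream of the local branch
g push -q -u origin HEAD

upstream=$(g rev-parse --abbrev-ref '@{upstream}')
echo "$branch tracks $upstream"
g status -sb
rm -rf "$sandbox"
```

From then on, a bare git pull or git push on this branch knows which remote branch to talk to.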

working tree (aka "working copy")

These are the files you're working on and that are tracked by Git.

Git : (rename+commit then change+commit) vs (rename+change then commit)

rename+commit then change+commit

rename+change then commit

Delete the deleted files

Foreword

Procedure

List deleted files

Now delete the deleted files (details)

What happens when I cherry-pick then rebase ?

Situation

Solution

What's the difference between a merge request (MR) and a pull request (PR) ?

Stuff that is common to both types of requests :

merge request (aka MR) :

pull request (aka PR) :

Git repeatedly prompts for credentials

Situation

Solution

Alternate solution

How to create SSH keys for GitHub ?

What is a Git fast-forward ?

git diff and file permissions

Increasing permissions :

Setting permission bits individually :

Conclusion

Git hooks

Available hooks (source) :

Hook execution fails on fatal: not a git repository: '.' (source) :

How are Git commit IDs generated ?

More details :

Fine, but why is this important ?

My Git beginner's guide

Git bare repositories

"regular" repositories :

bare repositories :

How to convert a "regular" repository into a bare repository ?

Can not clone behind a proxy

Situation

Solution

Git glossary

What's the point of defining an upstream branch ?