- how is the git commit hash computed?
- then, what is the “true” commit hash?
- what about the next commit?
- what about the promise of content addressable storage?
- gitlab in the way?
- what does fsck say?
- a change of algorithm with retro compatible support?
- a move to sha256?
- is gpgsig causing the issue?
- why did it not appear when running git cat-file in my first repository?
- beware git-replace
Why do the same git commits not have the same hash?
To avoid survivorship bias, I kept the whole analysis, but it mostly shows that I did not know about git-replace; the relevant part is at the end.
After running git-filter-repo to replace an email address across a whole repository, I found that it had changed the hash of the first commit, even though that commit was not supposed to change.
This does not make sense: because git is content addressable, a commit that is exactly the same must have the same hash.
This commit, before running git-filter-repo had the hash 4c10dbdd7ca357948521ad741182e119dd45d963 and after git-filter-repo, it had the hash bcefe2e6d93da969156da74a6d1df99483e8251e.
I immediately asked git what differs between those two commits, identified by two distinct hashes.
git cat-file shows enough information to see exactly what differs between two commits. Its output contains the metadata, including the tree hash, the committer and author emails, and the dates. And it is actually from the output of git cat-file that the hash is computed (see below).
This means that, for two hashes A and B, git cat-file A == git cat-file B => A == B.
diff -u <(git cat-file -p 4c10dbdd7ca357948521ad741182e119dd45d963) <(git cat-file -p bcefe2e6d93da969156da74a6d1df99483e8251e)
The fact that the diff is empty implies that those commits have the same hash, while obviously they don’t.
Then what the hell is going on?
how is the git commit hash computed?
External reference: https://gist.github.com/masak/2415865
I need to dig a little bit into how the hash is computed to find out why the hashes might differ.
I thought it was something like these things that went into a commit:
$ git cat-file commit HEAD
[…]
it turns out there is also a NUL-terminated header that gets appended to this, containing the word “commit”, and the length in bytes of all of the above information
[…]
Put this header and the rest of the information together
[…]
…and what you get hashes to the right sha1!
$ (printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | sha1sum
As I expected, it is actually the content of git cat-file that is used to compute the hash. Taking the code from the gist, I have the following function to compute some hashes.
compute () {
    local commit="${1}"
    {
        # header: object type, content length in bytes, then a NUL byte
        printf "commit %s\0" $(git cat-file commit "${commit}" | wc -c)
        git cat-file commit "${commit}"
    } | sha1sum | cut -f1 -d' '
}
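The same header-plus-content recipe applies to the other git object types, not only to commits. As a minimal sketch (the name object_hash is made up for this illustration, and it assumes sha1sum and mktemp are available), the function below hashes whatever arrives on stdin as if it were a git object of the given type:

```shell
# Hash the content read from stdin the way git hashes an object of type $1.
# object_hash is a hypothetical helper, not a git command.
object_hash() {
    local type="${1}" tmp
    tmp="$(mktemp)"
    cat > "${tmp}"                      # keep the content: it is read twice
    {
        # header: "<type> <size in bytes>" followed by a NUL byte
        printf '%s %s\0' "${type}" "$(wc -c < "${tmp}" | tr -d ' ')"
        cat "${tmp}"
    } | sha1sum | cut -d' ' -f1
    rm -f "${tmp}"
}

printf 'hello\n' | object_hash blob
# prints ce013625030ba8dba906f756967f9e9ca394464a
```

In a sha1 repository, the value printed for this input should be the one that git hash-object --stdin reports for the same six bytes.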
then, what is the “true” commit hash?
Which of 4c10dbdd7ca357948521ad741182e119dd45d963 and bcefe2e6d93da969156da74a6d1df99483e8251e is the correct hash of my first commit?
compute 4c10dbdd7ca357948521ad741182e119dd45d963
compute bcefe2e6d93da969156da74a6d1df99483e8251e
It looks like the correct hash is bcefe2e6d93da969156da74a6d1df99483e8251e (the one produced after git-filter-repo ran). The fact that compute finds the same value for both commits also shows that I was right to guess that they were the same commit (actually, looking at the code, it should be obvious, but still…).
It also appears that the “correct” sha1 was produced by git-filter-repo. Then where the hell does 4c10dbdd7ca357948521ad741182e119dd45d963 come from?
what about the next commit?
Was only the first commit miscalculated? Are the next hashes different only because they have a different parent?
Let’s take a look at some of the next commits to see how it goes.
compare () {
local old="${1}"
local new="${2}"
old_hash="$(compute "${old}")"
new_hash="$(compute "${new}")"
if [ "${old_hash}" == "${old}" ]
then
echo "old"
elif [ "${new_hash}" == "${new}" ]
then
echo "new"
else
echo "neither ${old} -> ${old_hash}, ${new} -> ${new_hash}"
fi
}
{
compare 3b3832c38937af16d3c4b5b5b30d960621584c13 2dc165f712788c95aeffbabd29e52738721f7044
compare b239c45ad3b3468bbdccb4738432471bf352cd88 001553e349bd3179aa34a0ce5d16f1ce721b6d06
compare f9c88063f2561fdaa0d0abcfd0c3a8c9144e2330 bead4ecbae9275913409e9b7fad25538abe1e032
compare 8f4647284698245f7e7db8881d54cef3060ccfa0 bc41628b1be91bc9e10f42b2ec014d8c670c83bd
compare b20339d21fcd1dc43b2db4be71a6e956f14f87da 757d373434f0a1a9882498ea387229a5016d6ffb
compare da0abf68c1f0cbdd66dc4b8b588550e0f2cb13c9 cf5a4ef6e60e947a6039c7c1c845677442eda22d
compare 50a5e45c316eaee0c7671c73ec98429c7fa22280 25372939c6c1dc685ae5613b9820045ba3fa74af
compare 4f1962f7a9ab85e10972e78387993d0287a8e741 e9e6566aea3236db1dac062fe9f3933ff529f88d
compare ad082ff26b13212e4231a1ad69bdde7680a2622a b56e9274d29f524d8bdd7224ce816a847e4dc927
compare 16134d0a0238d112b56d7c1741902c5bf0595f51 07655986600d6ff3685f1565c82ffb2bca9fec82
compare 33ad3afe29f66de89d3fd02352891166feb0aae6 dba364babf158247ac6fbc59fc05ac329e7e4f8c
compare 71b5587fe2dd2a0a409e975b32f65d0338e8acf9 e80955f1022c3ca1b3ffe79607067a17ed343b79
compare ce59e7d9be2fa343b1f4e4719a782851d3466d5c daf17752b253828e3a1230721ada7d457279fd78
compare 9974db6909049e6875dac301358cb5fa4db6ba8e 8bcc30988fef6ac29ae8aa70a8ae7c3bfa7076ca
compare 79f380d8d1ab728d2b273e160ac7bc4462828160 1f9bf3340055dea47080836380f2170982e9a419
compare fb1bf1c940006c71eba57dc296e0d664168581e3 e68bb46ae543db3f3cbae9a78c530ceb93d4927f
compare f254874518247f1b38e2ca355d4bee56712ed419 3cfb8638bfd9d32f67ccf33f52e9e01bf94b1890
compare dabe4a8215d312c57e6da60f143981a5c519c23e cf559bc775574bf4ea50cda6aaad3fe8dcb3711f
compare 163b39dc747211c110fa964cc9a94531313bf52b 9a18f7ddf13d249d834c1d24fbc8b77dc58f2f0a
compare f6210a78e75dc5aa657a3140c682e4767c86a2b8 142fe0bf373b6738592cc1bd1eeebc01bc82c502
compare a1220e55a68438038c6fcb40ca069314602da0f4 54fb8a496a6ff353ef2d408fa57e1eb10d980c10
compare b42c1152a4299edfa70fd889bd25a4d0dbcdb317 be66eaacd75c62bbc6bc8486e7b244773fb99bb7
compare 7c3b9599f5e118df5472d2269a656d43535c3769 b7de8468df616bb19e580ba1bbae051a00c00c39
compare 1ded7f9f7ada98b001b7ef7a7610bc0d427cca8d 5fda9cee37868993af14ff6e67281d3ddf5db704
compare ab4116d05049799e148ddff4c4ac8e9ef422c126 2bd1043f7ac6a3f625ecfcd0c673e26466164dd3
compare 9214b68a3a2c227b5339724be52ebf93644136d1 4ffe06c0bc6f834b1aad616235a3e992645a5622
compare 16947ecd3ead8a7d678404aecfea86f5d066c79c f99eaee0af2ec7a867d8176af2c954d6c6b239fe
compare 555a23f3ca1e1d85ed6a5e8f38541f7a578e63de a7ca84cacb015c62c35eb74debcd77a544ea1bce
compare 789a99911f61bdb6dd47a9a4d3459ec0e5741609 a6c87c4681777a1079839adff5d020096db5bdf6
compare cf45694669be7458cd55444ca1ceb862529886bc cf3a2cc1f2a904eadc5b9f1b6dc479b127b84acf
compare 749eae84bbef56fb2fcbc81c60a4043454bc2f8d f0a8f3e6e88b1951b71d041befd514aff245c1e0
compare 55521cf4aada3062b7aecb075df5867d023de117 9bf851f83d6573c148a818a3fd7823ad85e8b1dc
compare cf1d231aa4b9839be361a1995e971a15f44012b3 5dd7e7d3a3eb70b2b438d9c41e15b1f54415a9cf
compare ea6df5b9670e632da902793508101a376618d171 9097af0e7cd1f00e2b22b347decd3a8585e5b2e0
compare 11662d898d51f9ca8cc2b9765e8d758601d57ca1 1d31ae207748a2484df1697264b412fc4a15a0cc
compare e84d73bff8f842d328e05f993dcc32921a95476b c6fafab5c5066ec05bc849a9b8871500fba99c75
compare 1d3aae362e97a12e0559a6dc1d8b682cdc7ab397 1fe0fac6cd184909a50a2b4d9789ad6146052d21
compare 64b5d98b43aefcdd81637f1bc91183e1483c0e17 147da931c2c96f858e87a20d1b22d5e6a1344bae
compare 5ddc0a6cd5629397de33ecd31c38810953653677 ef7a698a82140b5330df889112f467ef999d51c5
compare 97a80154e8ce6a6a3b027fa0a731a2e1c0d06b1a e7348d4ceb8273d19406650b0266d7903a3d7160
compare e89c883754427c0a6d597f90dccb43304b654fb7 c049e38d81645889d9463648f030750f3c11c101
compare a33c65989d571e882211969a4d43e796b585f02f 23f2746ea80e320e17cf4c75797098805a94df66
compare bb9ef4ea3fbf808c8c4e41619f9e586e23616655 77e131d03013866b6c96633d172b9553b4393737
compare 665b3eea270ab249b0a9ced06c3752749793c05b a7adc4bed1036ce59bebe2659a0e03e5f8103f15
compare cbf79ce8f08da927d560e8d0ab09720ffa937a12 22df456599763dc5871c70623ee8d4b9db47cc27
compare fcf111040f94a33869e76f3f95c75ff9c436c058 5c61ca749a3632fa97147cfb97c2e050b82288dd
compare 45e7db124614de7a4c0f0bc10f443b2e18af3318 ac3b5770940f2552a14694b5ed178d5b787169b2
compare 24d2851a6e44d3b0024c800488e43146fae54c97 e119237c29232fca2b1d8461bdc791a06db75481
compare 13ac5c989afc29cbf79a5313cf92b501aee4024a c147cc4f2e727538479c29d196447bdd50a9cfde
compare f2d06e39e1ec38edef9d3e1f950095c2385665a7 d08a27616f9095e33711f0fb348c5de9bb82cf2b
compare 64c75de2dd861d7618eb507416c43cc87e5a322e b184e73f66740325bd14e00a21f68f0e7aee95ca
compare b70a6749eec07762f941036bcc41b15d9381fc6e 477c6267e0b81c183e1549567321b743e171387f
compare 892c1a0a644241093831dda558d01988b7ffabaf 51febc6ee3e23a0fac317985d7b14a277b96204e
compare ad40d5ee9d6849d7cacd3281c082d4c77f268ab3 40bcb22592d8a4b66bc05f783b7d85a6f72d75b5
compare a65b41683f8d6672dc0e96869b3e57efa07113d6 8aa90d8200b9477136f74e6c00e45c4570d60a28
compare 93d8d472c7199c208728e8f7dbde6a7a8e46ffb3 975f24cb6a13e59fd151adffae4f222838cdc94c
compare 2cce9dc110a7aef6cddf8318fe4a13c11a416b4b 5f775d7c21c5efd50da0d49eac7d6c5e9a4fe9da
compare 40d723d13e35b35102fd56aece6767f5781a4d1d 32a39c4361028931fc3a8f211fe583f6085f8494
compare fe8290a1498e6f11c62c73b917e2a70910d89358 2bd879a4b36c13ab868875948b8046b858aaa8d7
compare 285cba189904bb6d2b6bee41afedf1250b93a50d 66a0a695b935cd91ce62ad88ea17a2a820851c68
compare 3417ab05886dbdc70e9f66941d00a298a3e8a039 be1e907be594c320c3d4e72b3e91e66eb4351330
compare 17bf8af2dfb5d49d84b82aa21d20f4081f98cf5e be0fc1d9991c8660c371a73d32f1598b39324f19
compare 8bbe642011a2bc0d23f20df6427ec96a0194404c eb3c892d3fb36f87c2963bffb5b48a77704d253f
compare f489e253724727a908301577b17439673edb848c 26e11dcfb0662270c5514d4ab2118bbd1c5fcc8f
compare bb2f578e2d35961b786279e4c358a30fb006c1f4 5e5ac0ed0cd6b289a44097ba1be3a69283b4fb80
compare f32da423739c282b48aa8cba0258eee1afc8f229 ef2ca6e34c93df1f65806ea2819118df2a8151d0
compare 102dbdef3d8ac31f951b462faf922662d049a733 b63d580f8b02eafc1ad4de751216f3dea67d6e7f
compare 41d3041a4b36a174dd988b3cf425c4ee4bb5e4c8 5598ad24d6cd9e11b1731d362f2cf0b77fd0a1af
} | sort | uniq -c
68 new
It looks as if my repository contained commits whose hashes were computed with a different hash algorithm. Yet, considering that the gist is 10 years old, the algorithm does not seem to have changed recently.
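The pairs of hashes above were collected by hand. The git-filter-repo documentation describes a commit-map file in $GIT_DIR/filter-repo/ that records how each commit was rewritten: a header line, then one old hash and one new hash per line. Assuming that layout, a small sketch (changed_pairs is a hypothetical helper) lists only the commits whose hash changed:

```shell
# Print "old new" for every commit whose hash changed.
# Assumed layout of the map: one header line, then "<old> <new>" per line.
changed_pairs() {
    local map="${1:-.git/filter-repo/commit-map}"
    awk 'NR > 1 && $1 != $2 { print $1, $2 }' "${map}"
}

# Demo on a synthetic map with shortened, made-up hashes
demo_map="$(mktemp)"
printf '%s\n' 'old new' 'aaaa bbbb' 'cccc cccc' > "${demo_map}"
changed_pairs "${demo_map}"     # prints: aaaa bbbb
rm -f "${demo_map}"
```

Each printed pair could then be fed to compare, for example with changed_pairs | while read -r old new; do compare "${old}" "${new}"; done.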
what about the promise of content addressable storage?
I tend to trust content addressable storage, like git or ipfs, because of the promise that you will always find the content, given the hash.
Now that I can see two hashes identifying the same commit, how can I trust one or the other? Now that I have pushed the new repository with the new hashes to the remote, the old hashes, like 4c10dbdd7ca357948521ad741182e119dd45d963, will eventually disappear. Does this mean that all my previous communications that included a git hash have become obsolete?
Now, I feel like git might not be as trustworthy as I thought…
gitlab in the way?
This repository was hosted by gitlab. Maybe gitlab does some stuff to change the commit hashes?
After force pushing the content to gitlab, I got a fresh clone, and the hashes obtained after git-filter-repo are still there. So I guess gitlab is not the culprit here.
what does fsck say?
The git-fsck help says:
git-fsck tests SHA-1 and general object sanity,
So it should notice the mismatch and complain.
git fsck --strict
(256/256)
Checking object directories: 100% (256/256), done.
(0/2478)
Checking objects: 100% (2478/2478), done.
dangling tag 4c13828c2d8ee3a30d21910a97a93c759f897db1
dangling tag c229d644acc0890696333022f355b689302db48d
dangling tag 4dcaf994d6edc84d3b4acf01d0deb2332168e26a
It does not seem to find anything wrong with those objects. So git knows how to compute this other hash, and there is definitely something going on there.
a change of algorithm with retro compatible support?
We know for sure that git-filter-repo outputs hashes that use the same algorithm as the gist.
Maybe git-filter-repo uses an old algorithm and git is retro compatible.
This is quite simple to test: create a new commit with git and see whether its hash also follows the algorithm of the gist.
echo "hello" > a
git add a
git commit -m "try a new commit from git itself and not git-filter-repo"
hash="$(git rev-parse HEAD)"
echo "${hash}"
compute "${hash}"
1 file changed, 1 insertion(+)
create mode 100644 a
ae3bd673558021739efdf054c8565d297b9c6e66
ae3bd673558021739efdf054c8565d297b9c6e66
We can reject the hypothesis of a difference of algorithm between git and git-filter-repo.
a move to sha256?
It could be due to the change of hash algorithm to sha256.
I don’t really believe this, because the gist is 10 years old and my git is recent enough.
Also, all the facts are against this hypothesis.
First, according to the man page of git config:
extensions.objectFormat Specify the hash algorithm to use. The acceptable values are sha1 and sha256. If not specified, sha1 is assumed. It is an error to specify this key unless core.repositoryFormatVersion is 1.
Note that this setting should only be set by git-init(1) or git-clone(1). Trying to change it after initialization will not work and will produce hard-to-diagnose issues.
Then
git config core.repositoryFormatVersion
0
In addition, the commits I create with git are compatible with the gist, using sha1.
Finally, even bootstrapping a new repository uses sha1.
pushd "$(mktemp -d)" > /dev/null
{
git init
git config extensions.objectFormat
}
popd > /dev/null
Initialized empty Git repository in /home/sam/tmp/tmp.B6lPLJivt2/.git/
0
Those may not be enough to definitively reject the idea that my git was somehow using sha256 at some point. But they are strong enough not to invest much effort in that direction.
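Another quick hint is the length of the identifiers themselves: a full sha1 object ID has 40 hexadecimal characters, a sha256 one has 64. A rough sketch of that check (guess_hash_algo is a made-up name, and it only looks at the length, not at the characters):

```shell
# Guess the hash algorithm from the length of a full (unabbreviated) object ID.
guess_hash_algo() {
    case "${#1}" in
        40) echo "sha1" ;;
        64) echo "sha256" ;;
        *)  echo "unknown" ;;
    esac
}

guess_hash_algo 4c10dbdd7ca357948521ad741182e119dd45d963   # prints: sha1
guess_hash_algo bcefe2e6d93da969156da74a6d1df99483e8251e   # prints: sha1
```

On a reasonably recent git, git rev-parse --show-object-format should report the same information directly for the current repository.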
is gpgsig causing the issue?
I remembered that I sign all my commits, but git-filter-repo did not.
I should have seen the gpgsig header in the git cat-file output of the initial commit. It is strange that I did not.
Cloning the repository in a fresh location shows something interesting.
OTHER_LOCATION="$(mktemp -d)"
git clone . "${OTHER_LOCATION}"
diff -u <(git -C "${OTHER_LOCATION}" cat-file -p 4c10dbdd7ca357948521ad741182e119dd45d963) <(git -C "${OTHER_LOCATION}" cat-file -p bcefe2e6d93da969156da74a6d1df99483e8251e)
( cd "${OTHER_LOCATION}" && compute 4c10dbdd7ca357948521ad741182e119dd45d963 )
( cd "${OTHER_LOCATION}" && compute bcefe2e6d93da969156da74a6d1df99483e8251e )
Cloning into '/home/sam/tmp/tmp.OusIb8K4iX'...
done.
--- /dev/fd/63 2021-10-14 10:27:54.713320894 +0200
+++ /dev/fd/62 2021-10-14 10:27:54.713320894 +0200
@@ -1,17 +1,6 @@
tree 732db1c7196d230d9d5175342c8109c38edc81c7
1623168292 +0200
1623168292 +0200
-gpgsig -----BEGIN PGP SIGNATURE-----
-
- iQFIBAABCgAyFiEEWZO+etpl4tkGzlw2ddI87XQ5EGoFAmC/lSQUHGtvbnViaW5p
- eEBnbWFpbC5jb20ACgkQddI87XQ5EGrjCQgAoBRkhuBK4JtGDpdW+3OYwKwhd6Qk
- kVITkX+TgRcrb2kYG5EF3cqyEI6XECWBKeFdD4liJhnRv03J08uTqauv3+epmHqG
- Upgc04EVBmQ2cm5/kLDw22YCPXqJ5g1pRHgt8lxctAv7xo21haX/+SlpMP+lW/6w
- m1Xe9vNUqIpnt2agQTigf3RUYKz7Uikzgnb9gm7rdlYz7B8HAc848DYieKTOPkJJ
- P6MnZWE931dB5FQAyDuhGX7xmW+EJly5gEPOpBH3ZO/ebwnJRvFffNmlixApVj6v
- Ddd/qaAtjqG2g/Xcm3KJ2wTZ11yw94vYQulC3qi1OoTCJtPWplrs6l+7kA==
- =vepN
- -----END PGP SIGNATURE-----
First commit
4c10dbdd7ca357948521ad741182e119dd45d963
bcefe2e6d93da969156da74a6d1df99483e8251e
Here we go. The difference appears to be explained by the presence of the gpgsig.
Also, the algorithm now returns the correct value for both commits.
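Since the signature is part of the hashed content, it can be handy to test for it without reading a diff. A minimal sketch, assuming the raw commit text is piped in (has_gpgsig is a hypothetical helper, not a git command):

```shell
# Succeed when the commit text on stdin carries a gpgsig header.
# Only the header block (everything before the first empty line) is inspected,
# so a commit message that merely mentions "gpgsig" does not count.
has_gpgsig() {
    awk '
        /^$/       { exit }              # first empty line ends the headers
        /^gpgsig / { found = 1; exit }
        END        { exit !found }
    '
}

# Demo on two made-up commit texts
printf 'tree abc\ngpgsig -----BEGIN PGP SIGNATURE-----\n\nFirst commit\n' | has_gpgsig && echo "signed"
printf 'tree abc\n\ngpgsig only in the message\n' | has_gpgsig || echo "not signed"
```

In a repository, the input would come from something like git cat-file commit followed by the hash, piped into has_gpgsig.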
why did it not appear when running git cat-file in my first repository?
Even running git show --show-signature 4c10dbdd7ca357948521ad741182e119dd45d963 shows the signature in a fresh clone but not in my initial one.
git show --show-signature 4c10dbdd7ca357948521ad741182e119dd45d963 | grep -i "signature"
git -C "${OTHER_LOCATION}" show --show-signature 4c10dbdd7ca357948521ad741182e119dd45d963 | grep -i "signature"
gpg: Signature made Tue Jun 8 18:04:52 2021 CEST
git verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963
git -C "${OTHER_LOCATION}" verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963
gpg: Signature made Tue Jun 8 18:04:52 2021 CEST
gpg: using RSA key 5993BE7ADA65E2D906CE5C3675D23CED7439106A
gpg: issuer "konubinix@gmail.com"
That is even stranger. All the git commands behave as if the commit had no signature at all, except that the hash shows that it is a signed commit.
are the refs messing up with the signature?
One of the main differences between a clone and the initial repository is the references. Let’s try putting the references back in the fresh clone.
git pack-refs
git -C "${OTHER_LOCATION}" pack-refs
cp ".git/packed-refs" "${OTHER_LOCATION}/.git/packed-refs"
git -C "${OTHER_LOCATION}" verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963
And now by removing the refs.
rm "${OTHER_LOCATION}/.git/packed-refs"
git -C "${OTHER_LOCATION}" verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963
gpg: Signature made Tue Jun 8 18:04:52 2021 CEST
gpg: using RSA key 5993BE7ADA65E2D906CE5C3675D23CED7439106A
gpg: issuer "konubinix@gmail.com"
The signature is back.
It looks like this is a good path to understanding the issue.
Let’s try putting the refs one by one to find out which one causes the issue.
can_see_signature ( ) {
git -C "${OTHER_LOCATION}" verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963 2>&1|grep -q signature
}
rm "${OTHER_LOCATION}/.git/packed-refs"
cat .git/packed-refs | while read line
do
echo "${line}" >> "${OTHER_LOCATION}/.git/packed-refs"
if ! can_see_signature
then
echo "${line}"
break
fi
done
bcefe2e6d93da969156da74a6d1df99483e8251e refs/replace/4c10dbdd7ca357948521ad741182e119dd45d963
It is not very surprising to find a ref related to the troublesome commit.
can_see_signature ( ) {
git -C "${OTHER_LOCATION}" verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963 2>&1|grep -q signature
}
rm "${OTHER_LOCATION}/.git/packed-refs"
can_see_signature && echo "I can see the signature" || echo "I cannot see the signature"
echo "bcefe2e6d93da969156da74a6d1df99483e8251e refs/replace/4c10dbdd7ca357948521ad741182e119dd45d963" > "${OTHER_LOCATION}/.git/packed-refs"
can_see_signature && echo "I can see the signature" || echo "I cannot see the signature"
I can see the signature
I cannot see the signature
Is it the fact that the commit hash is part of the name of a ref that causes the issue?
can_see_signature ( ) {
git -C "${OTHER_LOCATION}" verify-commit 4c10dbdd7ca357948521ad741182e119dd45d963 2>&1|grep -q signature
}
test_signature ( ){
can_see_signature && echo "I can see the signature" || echo "I cannot see the signature"
}
echo "# Without any ref"
rm "${OTHER_LOCATION}/.git/packed-refs"
test_signature
echo "# Trying appending to the ref"
echo "bcefe2e6d93da969156da74a6d1df99483e8251e refs/replace/4c10dbdd7ca357948521ad741182e119dd45d963_andsomethingelse" > "${OTHER_LOCATION}/.git/packed-refs"
test_signature
echo "# Trying appending to the ref without underscore"
echo "bcefe2e6d93da969156da74a6d1df99483e8251e refs/replace/4c10dbdd7ca357948521ad741182e119dd45d963andsomethingelse" > "${OTHER_LOCATION}/.git/packed-refs"
test_signature
echo "# Trying prefixing to the ref"
echo "bcefe2e6d93da969156da74a6d1df99483e8251e refs/replace/something_4c10dbdd7ca357948521ad741182e119dd45d963" > "${OTHER_LOCATION}/.git/packed-refs"
test_signature
echo "# Trying removing part of the hash"
echo "bcefe2e6d93da969156da74a6d1df99483e8251e refs/replace/4c10dbdd7ca357948521ad741182e119dd45d96" > "${OTHER_LOCATION}/.git/packed-refs"
test_signature
echo "# Trying pointing to another hash"
echo "bcefe2e6d93da969156da74a6d1df99483e8251f refs/replace/4c10dbdd7ca357948521ad741182e119dd45d963" > "${OTHER_LOCATION}/.git/packed-refs"
test_signature
# Without any ref
I can see the signature
# Trying appending to the ref
I cannot see the signature
# Trying appending to the ref without underscore
I cannot see the signature
# Trying prefixing to the ref
I can see the signature
# Trying removing part of the hash
I can see the signature
# Trying pointing to another hash
I cannot see the signature
It appears to have something to do with references that look like refs/replace/XX.
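To see every ref of that shape in a repository, git for-each-ref can be given the refs/replace/ prefix. The sketch below builds a throwaway repository with two empty commits and one such ref, then lists it; the helper g and the identity passed with -c are placeholders chosen for this illustration.

```shell
# Throwaway repository; identity and signing are set inline so that the
# sketch does not depend on the user's configuration.
repo="$(mktemp -d)"
g() {
    git -C "${repo}" -c user.name="Test" -c user.email="test@example.com" \
        -c commit.gpgsign=false "$@"
}
g init --quiet
g commit --quiet --allow-empty -m "first"
first="$(g rev-parse HEAD)"
g commit --quiet --allow-empty -m "second"
second="$(g rev-parse HEAD)"

# Create a ref named after the first commit that points to the second one
g update-ref "refs/replace/${first}" "${second}"

# List every ref under refs/replace/ together with the object it points to
g for-each-ref --format='%(refname) %(objectname)' refs/replace/
```

Run in the original repository, the last command alone would be enough to reveal refs like the one found above.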
beware git-replace
every time you refer to this object, pretend it’s a different object
Looking up refs/replace/XX, I learned about git replace and found out that git commands may show information about a commit other than the one referenced.
For instance, let’s use a new repository in which I create two commits that respectively add a file named “a” with the content “a” and a file named “b” with the content “b”.
TEMP_DIR="$(mktemp -d)"
pushd "${TEMP_DIR}" > /dev/null
{
git init
echo a > a
git add a
git commit -m "Adding a"
REF_ADDING_A="$(git rev-parse HEAD)"
echo b > b
git add b
git commit -m "Adding b"
REF_ADDING_B="$(git rev-parse HEAD)"
}
popd > /dev/null
Initialized empty Git repository in /home/sam/tmp/tmp.MTOlVoCLPn/.git/
[develop (root-commit) c0ec27c] Adding a
1 file changed, 1 insertion(+)
create mode 100644 a
[develop de6f218] Adding b
1 file changed, 1 insertion(+)
create mode 100644 b
When I take a look at the commit that added a and the commit that added b, it is no surprise that git shows the correct commits.
show_a_and_b () {
pushd "${TEMP_DIR}" > /dev/null
{
echo "####### Showing the commit that added A: ${REF_ADDING_A}"
git show "${REF_ADDING_A}"
echo "####### Showing the commit that added B: ${REF_ADDING_B}"
git show "${REF_ADDING_B}"
}
popd > /dev/null
}
show_a_and_b
####### Showing the commit that added A: c0ec27c33608d091fde5964804d396a3900c94f4
commit c0ec27c33608d091fde5964804d396a3900c94f4
Author: Samuel Loury <konubinixweb@gmail.com>
Date: Tue Oct 19 10:14:52 2021 +0200
Adding a
diff --git a/a b/a
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/a
@@ -0,0 +1 @@
+a
####### Showing the commit that added B: de6f218ea53534e59f9ab006a30a7047bd5333ca
commit de6f218ea53534e59f9ab006a30a7047bd5333ca (HEAD -> develop)
Author: Samuel Loury <konubinixweb@gmail.com>
Date: Tue Oct 19 10:14:52 2021 +0200
Adding b
diff --git a/b b/b
new file mode 100644
index 0000000..6178079
--- /dev/null
+++ b/b
@@ -0,0 +1 @@
+b
And, also without surprise, computing the hashes will find the correct hashes.
pushd "${TEMP_DIR}" > /dev/null
{
compute c0ec27c33608d091fde5964804d396a3900c94f4
compute de6f218ea53534e59f9ab006a30a7047bd5333ca
}
popd > /dev/null
c0ec27c33608d091fde5964804d396a3900c94f4
de6f218ea53534e59f9ab006a30a7047bd5333ca
Now, let’s add a special ref named after REF_ADDING_A that points to REF_ADDING_B, and see what happens.
pushd "${TEMP_DIR}" > /dev/null
{
mkdir -p ".git/refs/replace/"
echo "${REF_ADDING_B}" > ".git/refs/replace/${REF_ADDING_A}"
}
popd > /dev/null
show_a_and_b
####### Showing the commit that added A: c0ec27c33608d091fde5964804d396a3900c94f4
commit c0ec27c33608d091fde5964804d396a3900c94f4 (replaced)
Author: Samuel Loury <konubinixweb@gmail.com>
Date: Tue Oct 19 10:14:52 2021 +0200
Adding b
####### Showing the commit that added B: de6f218ea53534e59f9ab006a30a7047bd5333ca
commit de6f218ea53534e59f9ab006a30a7047bd5333ca (HEAD -> develop)
Author: Samuel Loury <konubinixweb@gmail.com>
Date: Tue Oct 19 10:14:52 2021 +0200
Adding b
Now you can see that even though git shows the correct hashes for a and b (respectively c0ec27c33608d091fde5964804d396a3900c94f4 and de6f218ea53534e59f9ab006a30a7047bd5333ca), the content is actually that of de6f218ea53534e59f9ab006a30a7047bd5333ca in both cases.
And now, computing the hashes gives the same hash (the one of adding b) for both.
pushd "${TEMP_DIR}" > /dev/null
{
compute c0ec27c33608d091fde5964804d396a3900c94f4
compute de6f218ea53534e59f9ab006a30a7047bd5333ca
}
popd > /dev/null
de6f218ea53534e59f9ab006a30a7047bd5333ca
de6f218ea53534e59f9ab006a30a7047bd5333ca
By the way, creating such a ref is easier with the replace command.
I can remove the previously created replacement with:
pushd "${TEMP_DIR}" > /dev/null
{
git replace -d "${REF_ADDING_A}"
compute c0ec27c33608d091fde5964804d396a3900c94f4
compute de6f218ea53534e59f9ab006a30a7047bd5333ca
}
popd > /dev/null
Deleted replace ref 'c0ec27c33608d091fde5964804d396a3900c94f4'
c0ec27c33608d091fde5964804d396a3900c94f4
de6f218ea53534e59f9ab006a30a7047bd5333ca
And I can put it back.
pushd "${TEMP_DIR}" > /dev/null
{
git replace "${REF_ADDING_A}" "${REF_ADDING_B}"
compute c0ec27c33608d091fde5964804d396a3900c94f4
compute de6f218ea53534e59f9ab006a30a7047bd5333ca
}
popd > /dev/null
de6f218ea53534e59f9ab006a30a7047bd5333ca
de6f218ea53534e59f9ab006a30a7047bd5333ca
Actually, the git-filter-repo documentation explains that it uses git-replace to make old links still work while now showing the new content.
Additionally, several concerns are handled automatically (many of these can be overridden, but they are all on by default):
[…]
• creating replace-refs (see git-replace(1)) for old commit hashes, which if manually pushed and fetched will allow users to continue to refer to new commits using (unabbreviated) old commit IDs
[…]
Parent rewriting
--replace-refs {delete-no-add, delete-and-add, update-no-add, update-or-add, update-and-add}
Replace refs (see git-replace(1)) are used to rewrite parents (unless turned off by the usual git mechanism); this flag specifies what do do with those refs afterward. Replace refs can either be deleted or updated to point at new commit hashes. Also, new replace refs can be added for each commit rewrite. With update-or-add, new replace refs are only added for commit rewrites that aren’t used to update an existing replace ref. default is update-and-add if $GIT_DIR/filter-repo/already_ran does not exist; update-or-add otherwise.
Man 1 git-filter-repo
That looks to me like a legitimate use of git-replace.
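For someone who would rather not get those replace refs at all, the --replace-refs flag quoted above can be passed explicitly. The exact invocation used for this repository is not shown here, so the following is only a sketch: the addresses are placeholders, the function name is made up, and it assumes the rewrite is done with the --mailmap option of git-filter-repo.

```shell
# Hypothetical addresses, for illustration only
old_mail="old@example.com"
new_mail="new@example.com"

# mailmap line in the "<proper@email> <commit@email>" form
mailmap_line="$(printf '<%s> <%s>' "${new_mail}" "${old_mail}")"

# Rewrite the address and ask git-filter-repo to delete existing replace
# refs without adding new ones (delete-no-add, from the excerpt above).
# The function is only defined here, not called: it rewrites history.
rewrite_mail_without_replace_refs() {
    local mailmap
    mailmap="$(mktemp)"
    printf '%s\n' "${mailmap_line}" > "${mailmap}"
    git filter-repo --mailmap "${mailmap}" --replace-refs delete-no-add
    rm -f "${mailmap}"
}

echo "${mailmap_line}"    # prints: <new@example.com> <old@example.com>
```

The price of that choice is that the old hashes stop resolving to anything, so it trades the confusion described here for broken old links.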
Finally, the documentation of git replace tells how to disable this behavior.
It is possible to disable use of replacement references for any command using the --no-replace-objects option just after git.
For example if commit foo has been replaced by commit bar:
$ git --no-replace-objects cat-file commit foo
shows information about commit foo, while:
$ git cat-file commit foo
shows information about commit bar.
The GIT_NO_REPLACE_OBJECTS environment variable can be set to achieve the same effect as the --no-replace-objects option.
Man 1 git-replace
Let’s try, for example, using GIT_NO_REPLACE_OBJECTS and see whether the commit still gets replaced.
export GIT_NO_REPLACE_OBJECTS=1
pushd "${TEMP_DIR}" > /dev/null
{
compute c0ec27c33608d091fde5964804d396a3900c94f4
compute de6f218ea53534e59f9ab006a30a7047bd5333ca
}
popd > /dev/null
c0ec27c33608d091fde5964804d396a3900c94f4
de6f218ea53534e59f9ab006a30a7047bd5333ca
It looks like it works. Hopefully, the next time I have an issue with two apparently identical commits, I will remember to export this variable during my investigation.
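An alternative that does not rely on remembering the variable is to bake the option into the helper itself. A minimal sketch of such a variant, using the --no-replace-objects option quoted above (compute_raw is a name made up for this illustration):

```shell
# Same recipe as compute, but always reading the real object and ignoring
# any refs/replace/ entry. Must be run from inside the repository.
compute_raw() {
    local commit="${1}"
    {
        # header: "commit <size in bytes>" followed by a NUL byte
        printf 'commit %s\0' \
            "$(git --no-replace-objects cat-file commit "${commit}" | wc -c | tr -d ' ')"
        git --no-replace-objects cat-file commit "${commit}"
    } | sha1sum | cut -f1 -d' '
}
```

With this variant, a replaced commit should hash back to its own ID, whether or not GIT_NO_REPLACE_OBJECTS is set.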
In conclusion, I no longer have the feeling that two identical commits have different hashes, and I can continue thinking that git is one of the most incredible tools :-).