base64 and base58 are generally presented as being similar
Base58 is just another encoding format (with 58 characters instead of 64
It is similar to Base64 but has been modified to
Essentially Base58 is similar to the ubiquitous Base64 minus several characters that could confuse readers later when read back into a wallet.
Base58 is a binary-to-text encoding scheme. It is similar to Base64 but has been modified to avoid both non-alphanumeric characters and letters which might look ambiguous when printed.
There’s also Base58, which is often used for bitcoin addresses, which omits a few characters that people might type wrong. For example, the ‘0’ and ‘O’. Or ‘I’ and ‘l’.
— https://www.quora.com/What-are-some-alternatives-to-using-base64-encoding
Even the IETF draft, if not read carefully, may give this impression:
The Base58 encoding scheme is similar to the Base64 encoding scheme in that it can translate any binary data to a text string. It is different from Base64 in that the conversion alphabet has been carefully picked to work well in environments where a person, such as a developer or support technician, might need to visually confirm the information with low error rates.
They indeed have similarities
Indeed, both follow essentially the same logic: decompose a number in a large base and use an alphabet to represent each digit.
# https://datatracker.ietf.org/doc/html/rfc4648#section-4
b64alphabet = {0:"A", 1:"B", 2:"C", 3:"D", 4:"E", 5:"F", 6:"G", 7:"H", 8:"I", 9:"J", 10:"K", 11:"L", 12:"M", 13:"N", 14:"O", 15:"P", 16:"Q", 17:"R", 18:"S", 19:"T", 20:"U", 21:"V", 22:"W", 23:"X", 24:"Y", 25:"Z", 26:"a", 27:"b", 28:"c", 29:"d", 30:"e", 31:"f", 32:"g", 33:"h", 34:"i", 35:"j", 36:"k", 37:"l", 38:"m", 39:"n", 40:"o", 41:"p", 42:"q", 43:"r", 44:"s", 45:"t", 46:"u", 47:"v", 48:"w", 49:"x", 50:"y", 51:"z", 52:"0", 53:"1", 54:"2", 55:"3", 56:"4", 57:"5", 58:"6", 59:"7", 60:"8", 61:"9", 62:"+", 63:"/",}
# https://tools.ietf.org/id/draft-msporny-base58-01.html#rfc.section.2
b58alphabet = {0:"1", 1:"2", 2:"3", 3:"4", 4:"5", 5:"6", 6:"7", 7:"8", 8:"9", 9:"A", 10:"B", 11:"C", 12:"D", 13:"E", 14:"F", 15:"G", 16:"H", 17:"J", 18:"K", 19:"L", 20:"M", 21:"N", 22:"P", 23:"Q", 24:"R", 25:"S", 26:"T", 27:"U", 28:"V", 29:"W", 30:"X", 31:"Y", 32:"Z", 33:"a", 34:"b", 35:"c", 36:"d", 37:"e", 38:"f", 39:"g", 40:"h", 41:"i", 42:"j", 43:"k", 44:"m", 45:"n", 46:"o", 47:"p", 48:"q", 49:"r", 50:"s", 51:"t", 52:"u", 53:"v", 54:"w", 55:"x", 56:"y", 57:"z",}
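To decompose a number, we simply have to apply Euclidean division repeatedly (what this post calls the Euclidean algorithm): the remainder modulo the base is the next digit, starting with the least significant one, and we continue with the quotient.
def euclid(number, alphabet, base):
    # Print the digits of number in the given base, least significant first.
    while number > 0:
        print(alphabet[number % base])
        number = number // base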
Let’s use the following number as an example to illustrate this decomposition.
number = 0b101010010101101010100110
Then, decomposing in base64 gives.
euclid(number, b64alphabet, 64)
m
q
V
q
The digits are printed least significant first, so reading them from bottom to top gives the base64 representation of the number: qVqm
And indeed:
import base64
print(base64.b64encode((number).to_bytes(3, 'big')).decode())
qVqm
The same holds in base58:
euclid(number, b58alphabet, 58)
import base58
print(base58.b58encode((number).to_bytes(3, 'big')).decode())
T
H
t
y
ytHT
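The two decompositions above can be wrapped into one helper that returns the digits in reading order instead of printing them backwards. This is only a minimal sketch: the name to_base and the string form of the alphabets (B64, B58) are illustrative choices, not something the libraries expose.

```python
import string

# Alphabets as plain strings: the index of a character is its digit value.
# Same content as b64alphabet and b58alphabet above, written more compactly.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def to_base(number, alphabet):
    """Return the digits of a non-negative integer, most significant first."""
    base = len(alphabet)
    if number == 0:
        return alphabet[0]
    digits = []
    while number > 0:
        # divmod gives the quotient and the remainder of the Euclidean division.
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


number = 0b101010010101101010100110
print(to_base(number, B64))  # qVqm
print(to_base(number, B58))  # ytHT
```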
But they have different properties
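It is then surprising to find out that appending to a string leaves the beginning of its base64 representation mostly unchanged: every complete group of 3 input bytes keeps its 4 output characters, and only the characters of the last, incomplete group change.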
print(base64.b64encode(b"a").decode())
print(base64.b64encode(b"ab").decode())
print(base64.b64encode(b"abc").decode())
print(base64.b64encode(b"abcd").decode())
print(base64.b64encode(b"abcde").decode())
YQ==
YWI=
YWJj
YWJjZA==
YWJjZGU=
Appending to a string does change its whole base58 representation, though.
print(base58.b58encode(b"a").decode())
print(base58.b58encode(b"ab").decode())
print(base58.b58encode(b"abc").decode())
print(base58.b58encode(b"abcd").decode())
print(base58.b58encode(b"abcde").decode())
2g
8Qq
ZiCa
3VNr6P
BzFRgmr
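The base64 behaviour seen above can be checked mechanically. The following sketch, which only relies on the standard base64 module, verifies that the characters produced by complete 3-byte groups survive when more bytes are appended. The variable names are illustrative.

```python
import base64

data = b"abcde"
full = base64.b64encode(data).decode()

for n in range(1, len(data) + 1):
    encoded = base64.b64encode(data[:n]).decode()
    # Each complete group of 3 input bytes yields exactly 4 output characters.
    stable = 4 * (n // 3)
    # Those characters never change when more bytes follow.
    assert full.startswith(encoded[:stable])
    print(n, encoded, "stable prefix:", repr(encoded[:stable]))
```

No such check holds for base58, because every appended byte changes the value of the whole number and therefore potentially every digit.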
Because they are actually different algorithms
This is because, even though they look alike, base64 reads the input bits in groups of 6 and zero-pads the last group so that the total number of bits is a multiple of 6. It then appends = characters so that the output length is a multiple of 4, which signals the padding. Base58 is conceptually simpler: it just shows the mathematical representation of the number in base 58.
In the example of the number above, it worked nicely because the number did not need padding. Indeed, I used a 3-octet (24-bit) number, and 24 is the least common multiple of 6 and 8.
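To make the padding rule concrete, here is a minimal sketch of base64 written as plain bit grouping and compared against the standard library. The function name b64_by_grouping is hypothetical, and real implementations work on bytes rather than on a string of bits; this form is only meant to mirror the explanation.

```python
import base64
import string

B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_by_grouping(data: bytes) -> str:
    # Write the input as one long bit string, 8 bits per byte.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-pad on the right so that the length is a multiple of 6 bits.
    bits += "0" * (-len(bits) % 6)
    # Read the 6-bit groups from left to right and map each to a character.
    text = "".join(B64[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # '=' fills the output up to a multiple of 4 characters.
    return text + "=" * (-len(text) % 4)


for sample in (b"a", b"ab", b"abc", b"abcd", b"abcde"):
    assert b64_by_grouping(sample) == base64.b64encode(sample).decode()
    print(sample, b64_by_grouping(sample))
```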
Also, base64 does not use the Euclidean algorithm: it groups the bits in clusters of 6 from left to right. Base58, on the other hand, follows the Euclidean algorithm and artificially prefixes a 1 for each leading zero byte (0x00) of the input.
This means that, for the same number written with a different number of leading zeros, the base64 representation may not look like the output of the Euclidean algorithm at all.
euclid(number, b64alphabet, 64)
print(base64.b64encode((number).to_bytes(1 + 3, 'big')).decode())
m
q
V
q
AKlapg==
But if the leading zeros add up to a common multiple of 8 and 6 bits (3 octets = 3*8 bits = 24 bits = 4*6 bits), we can see the decomposition again, prefixed with A (the base64 character for 0).
euclid(number, b64alphabet, 64)
print(base64.b64encode((number).to_bytes(3 + 3, 'big')).decode())
m
q
V
q
AAAAqVqm
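In base58, by contrast, the result looks much like what we expect from a positional base: the digits stay the same, and each leading zero byte only adds a 1 in front.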
euclid(number, b58alphabet, 58)
print(base58.b58encode((number).to_bytes(4, 'big')).decode())
print(base58.b58encode((number).to_bytes(6, 'big')).decode())
T
H
t
y
1ytHT
111ytHT
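The leading 1 characters above follow from how base58 treats leading zero bytes. A minimal sketch of the encoding, assuming the Bitcoin-style alphabet listed earlier (the function name b58_by_division is hypothetical), could look like this:

```python
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58_by_division(data: bytes) -> str:
    # Interpret the whole input as one big-endian integer.
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(B58[remainder])
    # Leading zero bytes do not change the integer, so they are counted
    # separately and each one is written as '1' (the digit for 0).
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(digits))


number = 0b101010010101101010100110
print(b58_by_division(number.to_bytes(3, "big")))  # ytHT
print(b58_by_division(number.to_bytes(4, "big")))  # 1ytHT
print(b58_by_division(number.to_bytes(6, "big")))  # 111ytHT
```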
base58 is more like decimal while base64 is more like hexadecimal
In that sense, I would say that base58 is much more similar to “decimal” and “binary” representations, while base64 looks more like “hexadecimal” representations.
Indeed, writing the same number in binary with a different number of leading zeros gives the same decimal value, much like base58.
print( 0b101010010101101010100110)
print( 0b0101010010101101010100110)
print( 0b00101010010101101010100110)
print(0b000101010010101101010100110)
11098790
11098790
11098790
11098790
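In hexadecimal, on the other hand, the result changes completely unless the number of added zeros is a multiple of the digit size, just like in base64.
I don’t know of a way to dump a hexadecimal representation of a binary literal that keeps the leading zeros in Python, but in theory, reading the bits in groups of 4 from the left and zero-padding the last group, the result would look like this: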
0b101010010101101010100110 -> 0xa95aa6
0b0101010010101101010100110 -> 0x54ad530
0b00101010010101101010100110 -> 0x2a56a98
0b000101010010101101010100110 -> 0x152b54c
0b0000101010010101101010100110 -> 0x0a95aa6 (same as the first, because we added 4 bits, the size of one hexadecimal digit)
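One possible way to reproduce the lines above is to keep the bits as a string, since an integer literal drops its leading zeros, and to group them by 4 from the left, the way base64 groups by 6. This is only a sketch, and hex_left_aligned is a hypothetical helper, not a built-in.

```python
def hex_left_aligned(bits: str) -> str:
    # Zero-pad on the right so that the length is a multiple of 4 bits.
    padded = bits + "0" * (-len(bits) % 4)
    # Read the 4-bit groups from left to right, one hexadecimal digit each.
    digits = (f"{int(padded[i:i + 4], 2):x}" for i in range(0, len(padded), 4))
    return "0x" + "".join(digits)


value = "101010010101101010100110"
for zeros in range(5):
    bits = "0" * zeros + value
    print(f"0b{bits} -> {hex_left_aligned(bits)}")
```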