Konubinix' site

Base64 and Base58 Are Not as Similar as It May Seem

base64 and base58 are generally presented as being similar things.

Base58 is just another encoding format (with 58 characters instead of 64

https://qvault.io/cryptography/base64-vs-base58-encoding/#:~:text=Base64%20is%20one%20of%20the,to%20Bitcoin%20and%20other%20cryptocurrencies.

It is similar to Base64 but has been modified to

https://en.bitcoinwiki.org/wiki/Base58

Essentially Base58 is similar to the ubiquitous Base64 minus several characters that could confuse readers later when read back into a wallet.

https://crypto.bi/base58/

Base58 is a binary-to-text encoding scheme. It is similar to Base64 but has been modified to avoid both non-alphanumeric characters and letters which might look ambiguous when printed.

https://monerodocs.org/cryptography/base58/

There’s also Base58, which is often used for bitcoin addresses, which omits a few characters that people might type wrong. For example, the ‘0’ and ‘O’. Or ‘I’ and ‘l’.

https://www.quora.com/What-are-some-alternatives-to-using-base64-encoding

Even the IETF draft, if not read carefully, may give this impression:

The Base58 encoding scheme is similar to the Base64 encoding scheme in that it can translate any binary data to a text string. It is different from Base64 in that the conversion alphabet has been carefully picked to work well in environments where a person, such as a developer or support technician, might need to visually confirm the information with low error rates.

https://tools.ietf.org/id/draft-msporny-base58-01.html

They indeed have similarities

And indeed, they both follow basically the same logic: decomposing a number in a large base and using some alphabet to represent its digits.

# https://datatracker.ietf.org/doc/html/rfc4648#section-4
b64alphabet = {0:"A", 1:"B", 2:"C", 3:"D", 4:"E", 5:"F", 6:"G", 7:"H", 8:"I", 9:"J", 10:"K", 11:"L", 12:"M", 13:"N", 14:"O", 15:"P", 16:"Q", 17:"R", 18:"S", 19:"T", 20:"U", 21:"V", 22:"W", 23:"X", 24:"Y", 25:"Z", 26:"a", 27:"b", 28:"c", 29:"d", 30:"e", 31:"f", 32:"g", 33:"h", 34:"i", 35:"j", 36:"k", 37:"l", 38:"m", 39:"n", 40:"o", 41:"p", 42:"q", 43:"r", 44:"s", 45:"t", 46:"u", 47:"v", 48:"w", 49:"x", 50:"y", 51:"z", 52:"0", 53:"1", 54:"2", 55:"3", 56:"4", 57:"5", 58:"6", 59:"7", 60:"8", 61:"9", 62:"+", 63:"/",}
# https://tools.ietf.org/id/draft-msporny-base58-01.html#rfc.section.2
b58alphabet = {0:"1", 1:"2", 2:"3", 3:"4", 4:"5", 5:"6", 6:"7", 7:"8", 8:"9", 9:"A", 10:"B", 11:"C", 12:"D", 13:"E", 14:"F", 15:"G", 16:"H", 17:"J", 18:"K", 19:"L", 20:"M", 21:"N", 22:"P", 23:"Q", 24:"R", 25:"S", 26:"T", 27:"U", 28:"V", 29:"W", 30:"X", 31:"Y", 32:"Z", 33:"a", 34:"b", 35:"c", 36:"d", 37:"e", 38:"f", 39:"g", 40:"h", 41:"i", 42:"j", 43:"k", 44:"m", 45:"n", 46:"o", 47:"p", 48:"q", 49:"r", 50:"s", 51:"t", 52:"u", 53:"v", 54:"w", 55:"x", 56:"y", 57:"z",}

To decompose a number, we simply divide it repeatedly by the base (Euclidean division) and collect the remainders.

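def euclid(number, alphabet, base):
    # Repeated Euclidean division: each remainder is one digit,
    # printed from the least significant to the most significant.
    while number > 0:
        print(alphabet[number % base])
        number = number // base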

Let’s use the following number as an example to illustrate this decomposition.

number = 0b101010010101101010100110

Decomposing it in base64 then gives:

euclid(number, b64alphabet, 64)
m
q
V
q

Reading the digits from bottom to top (the least significant digit is printed first), the base64 representation of the number is qVqm.

And indeed:

import base64
print(base64.b64encode((number).to_bytes(3, 'big')).decode())
qVqm
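Since euclid prints one digit per line, least significant first, here is a minimal sketch that collects the digits into a string instead. The helper name to_base and the string form of the alphabet are my own assumptions for illustration; they are not part of the original post or of any library.

```python
import base64

# RFC 4648 base64 alphabet as a string: the index of a character is its digit value.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def to_base(number, alphabet):
    """Hypothetical helper: write a non-negative integer in base len(alphabet)."""
    base = len(alphabet)
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])  # least significant digit first
    # Reverse so the most significant digit comes first; 0 maps to the first symbol.
    return "".join(reversed(digits)) or alphabet[0]


number = 0b101010010101101010100110
print(to_base(number, B64_ALPHABET))
print(base64.b64encode(number.to_bytes(3, "big")).decode())
```

Both lines should agree here only because the number fills exactly 3 octets.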

And in base58, it is similar

euclid(number, b58alphabet, 58)
import base58
print(base58.b58encode((number).to_bytes(3, 'big')).decode())
T
H
t
y
ytHT

But they have different properties

It is then astonishing to find out that appending to a string does not change the beginning of its base64 representation: at most the last group of 4 characters changes.

print(base64.b64encode(b"a").decode())
print(base64.b64encode(b"ab").decode())
print(base64.b64encode(b"abc").decode())
print(base64.b64encode(b"abcd").decode())
print(base64.b64encode(b"abcde").decode())
YQ==
YWI=
YWJj
YWJjZA==
YWJjZGU=
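To make the observation measurable, the following sketch counts how many leading characters each base64 output shares with the previous one. The helper common_prefix_length is a hypothetical name introduced here; only base64.b64encode comes from the standard library.

```python
import base64


def common_prefix_length(a, b):
    """Number of leading characters that two strings share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


data = b"abcde"
previous = ""
for end in range(1, len(data) + 1):
    encoded = base64.b64encode(data[:end]).decode()
    # Print the encoding and how much of the previous one it keeps.
    print(encoded, common_prefix_length(previous, encoded))
    previous = encoded
```

Every complete group of 3 input bytes yields 4 output characters that later bytes can no longer affect, so the shared prefix keeps growing with the input.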

Doing the same with base58, on the other hand, changes the whole representation.

print(base58.b58encode(b"a").decode())
print(base58.b58encode(b"ab").decode())
print(base58.b58encode(b"abc").decode())
print(base58.b58encode(b"abcd").decode())
print(base58.b58encode(b"abcde").decode())
2g
8Qq
ZiCa
3VNr6P
BzFRgmr

Because they are actually different algorithms

This is because, even though they look alike, base64 zero-pads the input when needed so that its number of bits is a multiple of 6, and appends the character = to the output to signal that padding. Base58 is much simpler: it just shows the mathematical representation of the number in base 58.

In the example of the number above, it worked nicely because the number did not need padding. Indeed, I used a 3-octet (24-bit) number, and 24 is the least common multiple of 6 and 8.

Also, base64 does not use Euclidean division at all: it groups the bits into clusters of 6 from left to right. Base58, on the other hand, performs the Euclidean division and artificially prefixes a 1 for each leading zero byte (0x00) of the input.
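The grouping described above can be sketched as follows. This is an illustration of the idea under my own naming (b64_by_grouping is hypothetical), not how the standard library implements it; the result is only compared against base64.b64encode.

```python
import base64

# RFC 4648 base64 alphabet; the index of a character is its 6-bit value.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_by_grouping(data):
    """Sketch of base64: split the bit stream into 6-bit groups, left to right."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-pad on the right so that the bit count is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    encoded = "".join(
        B64_ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)
    )
    # "=" pads the output to a multiple of 4 characters.
    return encoded + "=" * (-len(encoded) % 4)


number = 0b101010010101101010100110
for size in (3, 4, 6):
    data = number.to_bytes(size, "big")
    print(b64_by_grouping(data), base64.b64encode(data).decode())
```

No division by 64 appears anywhere: the leading zero bytes simply shift where the 6-bit boundaries fall.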

That means that, for the same number written with a different number of leading zero bytes, the base64 representation may not look like the Euclidean decomposition at all.

euclid(number, b64alphabet, 64)
print(base64.b64encode((number).to_bytes(1 + 3, 'big')).decode())
m
q
V
q
AKlapg==

But if the number of leading zero bits is a common multiple of 8 and 6 (3 octets = 3*8 bits = 24 bits = 4*6 bits), we can see the decomposition again, prefixed with one A (the base64 digit for 0) per group of 6 zero bits.

euclid(number, b64alphabet, 64)
print(base64.b64encode((number).to_bytes(3 + 3, 'big')).decode())
m
q
V
q
AAAAqVqm

In base58, on the other hand, the result looks much more like what we expect from a positional base.

euclid(number, b58alphabet, 58)
print(base58.b58encode((number).to_bytes(4, 'big')).decode())
print(base58.b58encode((number).to_bytes(6, 'big')).decode())
T
H
t
y
1ytHT
111ytHT
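For comparison, here is a minimal sketch of the base58 behaviour seen above, written without the base58 package. The function name b58_sketch is hypothetical, and I am assuming the Bitcoin-style alphabet listed earlier; it is meant to illustrate the logic, not to replace the library.

```python
# Same alphabet as b58alphabet above, as a string indexed by digit value.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58_sketch(data):
    """Sketch of base58: one big integer, divided repeatedly by 58."""
    # Treat the whole input as a single big-endian integer.
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(B58_ALPHABET[remainder])
    # Leading zero bytes vanish in the integer, so each one is written as "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


number = 0b101010010101101010100110
for size in (3, 4, 6):
    print(b58_sketch(number.to_bytes(size, "big")))
```

Because the whole input is one integer, appending a byte multiplies it by 256 and changes every digit, which is why the base58 output has no stable prefix.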

base58 is more like decimal while base64 is more like hexadecimal

In that sense, I would say that base58 is much more similar to “decimal” and “binary” representations, while base64 looks more like “hexadecimal” representations.

Indeed, writing the same number with a different number of leading zeros, in binary, gives the same result, more like base58.

print(   0b101010010101101010100110)
print(  0b0101010010101101010100110)
print( 0b00101010010101101010100110)
print(0b000101010010101101010100110)
11098790
11098790
11098790
11098790

In hexadecimal, by contrast, the result is totally changed unless the number of leading zero bits is a multiple of the digit size (4 bits), just as with base64.

I don’t know how to dump, in Python, the hexadecimal representation of a binary representation while keeping the leading zeros, but theoretically the result would look like this:

    0b101010010101101010100110 -> 0xa95aa6
   0b0101010010101101010100110 -> 0x54ad530
  0b00101010010101101010100110 -> 0x2a56a98
 0b000101010010101101010100110 -> 0x152b54c
0b0000101010010101101010100110 -> 0x0a95aa6 (same as the first, because we added 4 bits, the size of one hexadecimal digit)
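One possible way to reproduce this table is to keep the bits as a string, so that the leading zeros are not lost, and to group them by 4 from the left. The helper hex_left_to_right is a hypothetical name of mine, and padding with zeros on the right is an assumption chosen to mirror what base64 does.

```python
def hex_left_to_right(bits):
    """Group a bit string by 4 from the left, zero-padding on the right."""
    padded = bits + "0" * (-len(bits) % 4)
    digits = (f"{int(padded[i:i + 4], 2):x}" for i in range(0, len(padded), 4))
    return "0x" + "".join(digits)


bits = "101010010101101010100110"
for zeros in range(5):
    padded_bits = "0" * zeros + bits
    # Right-align the binary column, as in the table above.
    print(("0b" + padded_bits).rjust(30), "->", hex_left_to_right(padded_bits))
```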