How Do I Store My Files?


tl;dr: I use ipfs as a backend.

When I have a file that matters to me, I run this command on it:

ipfs add --quieter --pin=false "myfile"

This gives me its ipfs cid.

I use this so often that I created the alias ipfa to avoid typing the whole command each time.

I don’t use pins (see ipfs pinning sucks), and have a postgresql database of all the cids that matter to me.

If the file matters to me, it must be referred to in at least one of my source systems:

  1. either it is a bill, then its cid belongs to the metadata of the transaction in my ledger,
  2. or it is a photograph and goes to my personal gallery,
  3. or it is a file artifact of a concept, hence it goes in my second brain.

Then, I wrote a program that parses all those systems and syncs this information with the postgresql database.
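The sync step could be sketched as follows. This is only an illustration, not the actual program: it assumes the source systems can be read as plain text and that cids can be spotted with a regular expression. The names `CID_PATTERN`, `extract_cids` and `diff_cids` are hypothetical, and the pattern is a heuristic that only covers the common CIDv0 (`Qm...`) and base32 CIDv1 (`b...`) forms.

```python
import re

# Heuristic pattern (an assumption, not a full cid parser):
# - CIDv0: "Qm" followed by 44 base58btc characters
# - CIDv1 in base32: "b" followed by at least 58 characters of [a-z2-7]
CID_PATTERN = re.compile(
    r"\b(Qm[1-9A-HJ-NP-Za-km-z]{44}|b[a-z2-7]{58,})\b"
)


def extract_cids(text: str) -> set[str]:
    """Return every string in `text` that looks like a cid."""
    return set(CID_PATTERN.findall(text))


def diff_cids(referenced: set[str], stored: set[str]) -> tuple[set[str], set[str]]:
    """Compare cids referenced by the source systems with those in the database.

    Returns (to_add, to_remove): cids to insert into the database and
    cids that nothing refers to anymore.
    """
    return referenced - stored, stored - referenced
```

Running `extract_cids` over every ledger entry, gallery item and note, then feeding the union to `diff_cids` together with the cids already in the database, would give the rows to insert and the rows to drop.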

Then, another program ensures that, for each cid in the postgresql database, at least two nodes of my private ipfs cluster get it (as in ipfs get). This ensures duplication and mitigates risks of loss of data.
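As an illustration only (this is not the actual program), the core decision of such a replication pass could look like the sketch below. The function name `plan_allocations`, its parameters, and the policy of preferring the nodes with the most free disk space are all assumptions; the note does not say how nodes are chosen.

```python
def plan_allocations(
    current: dict[str, set[str]],
    free_space: dict[str, float],
    min_replicas: int = 2,
) -> dict[str, list[str]]:
    """Decide which extra nodes should fetch each cid.

    `current` maps each cid to the nodes already holding it.
    `free_space` maps each node name to its free disk space (any unit).
    Returns, for each under-replicated cid, the nodes that should fetch it.
    """
    plan: dict[str, list[str]] = {}
    for cid, holders in current.items():
        missing = min_replicas - len(holders)
        if missing <= 0:
            continue  # already replicated enough
        # Prefer the nodes with the most free space that do not hold the cid yet
        # (hypothetical policy, chosen only for the sake of the example).
        candidates = sorted(
            (node for node in free_space if node not in holders),
            key=lambda node: free_space[node],
            reverse=True,
        )
        plan[cid] = candidates[:missing]
    return plan
```

Each node listed in the returned plan would then be asked to fetch the cid (as in ipfs get), and the allocation recorded in the database.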

Let’s see an example.

Imagine I’m interested in storing a file containing only the text “hello world”.

I just need to get its cid:

echo "hello world" | ipfa

And voilà! Because the cid is now written in this note, it will automatically be stored on at least two of my private nodes.

No need to think about where to store the file (with a path) or how to name it.

The relevant database information is now:

clk docs file --cid show --field cid --field added --field allocations
cid                                                   added                             allocations
----------------------------------------------------  --------------------------------  -------------
                                                      2021-10-08 23:45:05.726129+02:00  barberry kpi

Here, kpi and barberry are the names of the two ipfs nodes of my cluster that have been asked to duplicate the file.

If I later remove the cid from this note, its content will eventually disappear as well. The problem of old files lingering in exotic locations is therefore gone.

For instance, as long as I want to “remember” the RC car wheel I made, the wheel model will be available. Clicking on the RC car wheel article and getting a 404 error would not make sense, as the file's existence is intrinsically linked to the existence of the article itself. Conversely, if I remove the article and thus “forget” about the RC car wheel, the Blender file will also disappear.

You might argue that the file could still linger because of some old, obsolete note. Indeed, but keeping a second brain up to date is a separate task that I already take care of (otherwise the whole second brain would be pointless).

I only think about high-level concepts, not about storing files. File storage is only a side effect of my thinking and consumes no cognitive load at all. The question is no longer “where the hell did I put this file?”; retrieving a file is now more about “what was this file about?”. Of course, the fact that all my notes form a complex network helps a lot, for it allows me to think about a related topic and jump from note to note until I reach the one with the right concept, and hence the needed file.

For instance, I might want to get the “RC car wheel blender model” or “the model that I used as an example in the article about how I store files”.

With a bit of tooling, I can easily monitor what is going on on the workers as well as the available disk space.

clk docs worker-status
ipfsworker.barberry.df        : 1030.3297691345215
ipfsworker.blueberry.df        : 922.028621673584
ipfsworker.boysenberry.df        : 1424.7212677001953
ipfsworker.boysenberry.current   :

Here, for instance, I can see that boysenberry is currently getting the file to its local store.
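To build further monitoring on top of this output, it could be parsed into a dictionary. The sketch below assumes the `ipfsworker.<node>.<field> : <value>` layout shown above; the function name `parse_worker_status` is hypothetical and values are kept as strings, since the units are not stated here.

```python
def parse_worker_status(output: str) -> dict[str, dict[str, str]]:
    """Parse lines like 'ipfsworker.<node>.<field> : <value>' into a nested dict."""
    status: dict[str, dict[str, str]] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        parts = key.strip().split(".")
        if not sep or len(parts) != 3 or parts[0] != "ipfsworker":
            continue  # skip anything that does not look like a status line
        _, node, field = parts
        # An empty value (for example an idle worker) is stored as "".
        status.setdefault(node, {})[field] = value.strip()
    return status
```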