Understanding the IPFS White Paper part 2

Mon 21 August 2017

This article is part 5 of the Blockchain train journal, start reading here: Catching the Blockchain Train.

The IPFS White Paper: IPFS Design

The IPFS stack is visualized as follows:

or with more detail:

I borrowed both images from presentations by Juan Benet (the BDFL of IPFS).

The IPFS design in the white paper goes more or less through these layers, bottom-up:

The IPFS Protocol is divided into a stack of sub-protocols responsible for different functionality:

  1. Identities - manage node identity generation and verification.
  2. Network - manages connections to other peers, uses various underlying network protocols. Configurable.
  3. Routing - maintains information to locate specific peers and objects. Responds to both local and remote queries. Defaults to a DHT, but is swappable.
  4. Exchange - a novel block exchange protocol (BitSwap) that governs efficient block distribution. Modelled as a market, weakly incentivizes data replication. Trade Strategies swappable.
  5. Objects - a Merkle DAG of content-addressed immutable objects with links. Used to represent arbitrary data structures, e.g. file hierarchies and communication systems.
  6. Files - versioned file system hierarchy inspired by Git.
  7. Naming - A self-certifying mutable name system.

Here's my alternative naming of these sub-protocols:

  1. Identities: name those nodes
  2. Network: talk to other clients
  3. Routing: announce and find stuff
  4. Exchange: give and take
  5. Objects: organize the data
  6. Files: uh?
  7. Naming: adding mutability

Let's go through them and see if we can increase our understanding of IPFS a bit!

Identities: name those nodes

IPFS is a P2P network of clients; there is no central server. These clients are the nodes of the network and need a way to be identified by the other nodes. If you just number the nodes 1, 2, 3, ... anyone can add a node with an existing ID and claim to be that node. To prevent that, some cryptography is needed. IPFS does it like this:

  • generate a PKI key pair (public + private key)
  • hash the public key
  • the resulting hash is the NodeId

All this is done during the init phase of a node: ipfs init stores the resulting keys in ~/.ipfs/config and returns the NodeId.

When two nodes start communicating the following happens:

  • exchange public keys
  • check if: hash(other.PublicKey) == other.NodeId
  • if so, we have identified the other node and can e.g. request data objects
  • if not, we disconnect from the "fake" node
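
The check above can be sketched in a few lines of Python. This is a minimal illustration, not how IPFS implements it: I assume SHA-256 as the hash, use random bytes as a stand-in for a real public key, and the names node_id and verify_peer are made up for this example.

```python
import hashlib
import os


def node_id(public_key: bytes) -> str:
    """Derive a NodeId as the hex SHA-256 digest of the public key bytes."""
    return hashlib.sha256(public_key).hexdigest()


def verify_peer(claimed_id: str, public_key: bytes) -> bool:
    """Accept the peer only if hash(public_key) matches the NodeId it claims."""
    return node_id(public_key) == claimed_id


# Stand-in for a real public key: random bytes, NOT an actual key pair.
honest_key = os.urandom(32)
honest_id = node_id(honest_key)

# An impostor claims the honest NodeId but can only present its own key.
impostor_key = os.urandom(32)

print(verify_peer(honest_id, honest_key))    # the honest peer is accepted
print(verify_peer(honest_id, impostor_key))  # the impostor is rejected
```

The point is that a NodeId cannot be claimed without presenting the key that hashes to it; a real node would additionally have to prove it holds the matching private key.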

The actual hashing algorithm is not specified in the white paper. Read the note about that here:

Rather than locking the system to a particular set of function choices, IPFS favors self-describing values. Hash digest values are stored in multihash format, which includes a short header specifying the hash function used, and the digest length in bytes.



This allows the system to (a) choose the best function for the use case (e.g. stronger security vs faster performance), and (b) evolve as function choices change. Self-describing values allow using different parameter choices compatibly.

These multihashes are part of a whole family of self-describing hashes, and it is brilliant, check it out: multiformats.
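
To make the "short header" concrete, here is a small sketch that builds a multihash by hand for SHA-256. The function code 0x12 and the base58 alphabet come from the multiformats and Bitcoin conventions; the helper names are my own, and this only illustrates the layout, it is not a replacement for a real multihash library.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def multihash_sha256(data: bytes) -> bytes:
    """Prefix the digest with <function code><digest length in bytes>."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def base58_encode(raw: bytes) -> str:
    """Encode bytes with the Bitcoin-style base58 alphabet."""
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Leading zero bytes are represented by leading '1' characters.
    padding = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * padding + encoded


mh = multihash_sha256(b"hello")
print(hex(mh[0]), mh[1])   # function code and digest length
print(base58_encode(mh))   # a 46-character string starting with "Qm"
```

The two header bytes 0x12 0x20 are the reason the base58 hashes in this post all start with "Qm". Note that this hashes raw bytes only; the hash that ipfs add reports is computed over the serialized object, so it will not match the output of this sketch.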

Network: talk to other clients

The summary is this: IPFS works on top of any network (see the image above).

Interesting here is the network addressing to connect to a peer. IPFS uses multiaddr formatting for that. You can see it in action when starting a node:

Swarm listening on /ip4/

Swarm listening on /ip4/

Swarm listening on /ip4/

Swarm listening on /ip6/2a02:1234:9:0:21a:4aff:fed4:da32/tcp/4001

Swarm listening on /ip6/::1/tcp/4001

API server listening on /ip4/

Gateway (read-only) server listening on /ip4/
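
A multiaddr in its text form is a chain of protocol/value pairs. The sketch below splits one of the addresses above into those pairs. It is a simplification under my own assumptions: real multiaddrs also have a binary encoding, and some protocols carry no value, which this toy parser (parse_multiaddr is a hypothetical name) does not handle.

```python
from typing import List, Tuple


def parse_multiaddr(addr: str) -> List[Tuple[str, str]]:
    """Split a textual multiaddr into (protocol, value) pairs."""
    parts = addr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating protocol/value components")
    # Even positions are protocol names, odd positions are their values.
    return list(zip(parts[0::2], parts[1::2]))


print(parse_multiaddr("/ip6/::1/tcp/4001"))
# [('ip6', '::1'), ('tcp', '4001')]
```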

Routing: announce and find stuff

The routing layer is based on a DHT, as discussed in the previous episode, and its purpose is to:

  • announce that this node has some data (a block as discussed in the next chapter), or
  • find which nodes have some specific data (by referring to the multihash of a block), and
  • if the data is small enough (<= 1KB) the DHT stores the data as its value.
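
The three tasks above can be mimicked with a toy, single-process stand-in for the DHT. Everything here is hypothetical (class name, method names, the in-memory dictionaries); a real DHT spreads these records over many nodes, which this sketch ignores completely.

```python
from collections import defaultdict

MAX_INLINE_VALUE = 1024  # bytes; the "small enough" threshold from the list above


class ToyRouting:
    """In-memory stand-in for the routing layer; not a distributed table."""

    def __init__(self):
        self.providers = defaultdict(set)  # block key -> peers that have it
        self.values = {}                   # block key -> small inline value

    def provide(self, key: str, peer_id: str) -> None:
        """Announce that peer_id has the block identified by key."""
        self.providers[key].add(peer_id)

    def find_providers(self, key: str) -> list:
        """Return the peers that announced they have the block."""
        return sorted(self.providers.get(key, set()))

    def put_value(self, key: str, value: bytes) -> None:
        """Store small data directly in the table."""
        if len(value) > MAX_INLINE_VALUE:
            raise ValueError("value too large to store inline")
        self.values[key] = value

    def get_value(self, key: str):
        """Return the inline value, or None if there is none."""
        return self.values.get(key)


routing = ToyRouting()
routing.provide("block-1", "peer-A")
routing.provide("block-1", "peer-B")
routing.put_value("tiny", b"hello")
print(routing.find_providers("block-1"))  # ['peer-A', 'peer-B']
print(routing.get_value("tiny"))          # b'hello'
```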

The command line interface and API don't expose the complete routing interface as specified in the white paper. What does work:

# tell the DHT we have this specific content:
$ ipfs dht provide QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG

# ask for peers who have the content:
$ ipfs dht findprovs QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG

# Get all multiaddr's for a peer
$ ipfs dht findpeer QmYebHWdWStasXWZQiXuFacckKC33HTbicXPkdSi5Yfpz6

ipfs dht put and ipfs dht get only work for IPNS records in the API. Maybe storing small data on the DHT itself was not implemented (yet)?

Exchange: give and take

Data is broken up into blocks, and the exchange layer is responsible for distributing these blocks. It looks like BitTorrent, but it's different, so the protocol warrants its own name: BitSwap.

The main difference is that whereas in BitTorrent blocks are traded with peers looking for blocks of the same file (torrent swarm), in BitSwap blocks are traded cross-file. So there is one big swarm for all IPFS data.

BitSwap is modeled as a marketplace that incentivizes data replication. The way this is implemented is called the BitSwap Strategy, and the white paper describes a feasible strategy and also states that the strategy can be replaced by another strategy. One such bartering system can be based on a virtual currency, which is where FileCoin comes in.

Of course, each node can decide on its own strategy, so the generally used strategy must be resilient against abuse. When most nodes are set up to have some fair way of bartering it will work something like this:

  • when peers connect, they exchange which blocks they have (have_list) and which blocks they are looking for (want_list)
  • to decide if a node will actually share data, it will apply its BitSwap Strategy
  • this strategy is based on previous data exchanges between these two peers
  • when peers exchange blocks they keep track of the amount of data they share (builds credit) and the amount of data they receive (builds debt)
  • this accounting between two peers is kept track of in the BitSwap Ledger
  • if a peer has credit (shared more than received), our node will send the requested block
  • if a peer has debt, our node will share or not share, depending on a deterministic function where the chance of sharing becomes smaller when the debt is bigger
  • a data exchange always starts with the exchange of the ledger; if it is not identical, our node disconnects
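
The accounting in this list can be sketched as follows. The debt ratio and the sigmoid are the example strategy given in the white paper (r = bytes_sent / (bytes_recv + 1) and P(send | r) = 1 - 1 / (1 + exp(6 - 3r))); the Ledger class itself is my own illustration and not the go-ipfs implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Ledger:
    """Our node's bookkeeping for one peer."""
    bytes_sent: int = 0  # bytes we sent to the peer
    bytes_recv: int = 0  # bytes we received from the peer

    def debt_ratio(self) -> float:
        """How much the peer owes us relative to what it gave us."""
        return self.bytes_sent / (self.bytes_recv + 1)

    def send_probability(self) -> float:
        """Probability of sending a block; drops as the peer's debt grows."""
        r = self.debt_ratio()
        return 1 - 1 / (1 + math.exp(6 - 3 * r))


generous = Ledger(bytes_sent=0, bytes_recv=2883738)
freeloader = Ledger(bytes_sent=4_000_000, bytes_recv=1_000_000)

print(generous.debt_ratio(), generous.send_probability())      # ratio 0.0, probability close to 1
print(freeloader.debt_ratio(), freeloader.send_probability())  # ratio near 4, probability close to 0
```

With this function a peer is still served almost always while its debt ratio stays small, and the probability falls off quickly once the ratio passes 2, where it is exactly one half.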

So this is set up kind of cool I think: game theory in action! The white paper further describes some edge cases like what to do if I have no blocks to barter with? The answer is simply to collect blocks that your peers are looking for, so you have something to trade.

Now let's have a look at how we can poke around in the innards of the BitSwap protocol.

The command-line interface has a section block and a section bitswap; those sound relevant :)

To see bitswap in action, I'm going to request a large file Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t which is a video (download it to see what video!):

# ask for the file
$ ipfs get Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t

# in a separate terminal, after requesting the file, I inspect the "bitswap wantlist"
$ ipfs bitswap wantlist

# find a node where we have debt
$ ipfs dht findprovs Qmdsrpg2oXZTWGjat98VgpFQb5u1Vdw5Gun2rgQ2Xhxa2t

# try one to see if we have downloaded from that node
$ ipfs bitswap ledger QmSoLMeWqB7YGVLJN3pNLQpmmEk35v6wYtsMGLzSr5QBU3
Ledger for <peer.ID SoLMeW>
Debt ratio: 0.000000
Exchanges:  11
Bytes sent: 0
Bytes received: 2883738

Thank you QmSoLMeWqB7YGVLJN3pNLQpmmEk35v6wYtsMGLzSr5QBU3; what a generous peer you are!

Now, have a look at the block commands:

# Let's pick a block from the wantlist above
$ ipfs block stat QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff
Key: QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff
Size: 262158

$ ipfs block get QmYEqofNsPNQEa7yNx93KgDycmrzbFkr5oc3NMKXMxx5ff > slice_of_a_movie
# results in a binary file of 262 KB

We'll have another look at how blocks fit in in the next chapter.

The three layers of the stack we described so far (network, routing, exchange) are implemented in libp2p.

Let's climb up the stack to the core of IPFS...

Objects: organize the data

Now it gets fascinating. You could summarize IPFS as: Distributed, authenticated, hash-linked data structures. These hash-linked data structures are where the Merkle DAG comes in (remember our previous episode?).

To create any data structure, IPFS offers a flexible and powerful solution:

  • organize the data in a graph, where we call the nodes of the graph objects
  • these objects can contain data (any sort of data, transparent to IPFS) and/or links to other objects
  • these links - Merkle Links - are simply the cryptographic hash of the target object

This way of organizing data has a couple of useful properties (quoting from the white paper):

  1. Content Addressing: all content is uniquely identified by its multihash checksum, including links.
  2. Tamper resistance: all content is verified with its checksum. If data is tampered with or corrupted, IPFS detects it.
  3. Deduplication: all objects that hold the exact same content are equal, and only stored once. This is particularly useful with index objects, such as git trees and commits, or common portions of data.
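
These properties are easy to reproduce with a toy object store. The sketch below is an assumption-heavy illustration: it serializes objects as JSON and hashes them with plain SHA-256, whereas IPFS uses its own object format and multihashes. The names store, put_object and get_object are made up.

```python
import hashlib
import json

store = {}  # hash -> serialized object; stand-in for a node's local storage


def put_object(data: str, links: dict) -> str:
    """Store an object and return its address (the hash of its content)."""
    serialized = json.dumps({"data": data, "links": links}, sort_keys=True).encode()
    key = hashlib.sha256(serialized).hexdigest()
    store[key] = serialized
    return key


def get_object(key: str) -> dict:
    """Fetch an object and verify that it still matches its address."""
    serialized = store[key]
    if hashlib.sha256(serialized).hexdigest() != key:
        raise ValueError("object does not match its hash")
    return json.loads(serialized)


# Two files with the same content, a directory, and a root directory.
baz_a = put_object("baz\n", {})
baz_b = put_object("baz\n", {})
bar = put_object("", {"baz": baz_a})
foo = put_object("", {"bar": bar, "baz": baz_b})

print(baz_a == baz_b)  # True: identical content gets the identical address
print(len(store))      # 3: the duplicate file is stored only once
```

Because each link is the hash of its target, changing a leaf changes the hash of every object above it, all the way up to the root.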

To get a feel for IPFS objects, check out this objects visualization example.

Another nifty feature is the use of Unix-style paths, where a path into a Merkle DAG has the structure:

/ipfs/<hash-of-object>/<name-path-to-object>

We'll see an example below.
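
Before the real thing, here is a toy version of how such a path can be resolved: start at the object named by the hash and follow one named link per path component. As before this is only a sketch with made-up helpers (put_object, resolve) and JSON plus SHA-256 standing in for the real object format.

```python
import hashlib
import json

store = {}  # hash -> serialized object


def put_object(data: str, links: dict) -> str:
    """Store an object and return the hash that addresses it."""
    serialized = json.dumps({"data": data, "links": links}, sort_keys=True).encode()
    key = hashlib.sha256(serialized).hexdigest()
    store[key] = serialized
    return key


def resolve(path: str) -> dict:
    """Walk /ipfs/<hash>/<name>/<name>/... by following named links."""
    parts = path.strip("/").split("/")
    if parts[0] != "ipfs":
        raise ValueError("path must start with /ipfs/")
    key = parts[1]
    for name in parts[2:]:
        key = json.loads(store[key])["links"][name]
    return json.loads(store[key])


leaf = put_object("baz\n", {})
root = put_object("", {"bar": put_object("", {"baz": leaf}), "baz": leaf})

print(resolve(f"/ipfs/{root}/bar/baz")["data"] == "baz\n")  # True
```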

This is really all there is to it. Let's see it in action by replaying some examples from the quick-start:

$ mkdir foo
$ mkdir foo/bar
$ echo "baz" > foo/baz
$ echo "baz" > foo/bar/baz
$ tree foo/
├── bar
│   └── baz
└── baz
$ ipfs add -r foo
added QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR foo/bar/baz
added QmWLdkp93sNxGRjnFHPaYg8tCQ35NBY3XPn6KiETd3Z4WR foo/baz
added QmeBpzHngbHes9hoPjfDCmpNHGztkmZFRX4Yp9ftKcXZDN foo/bar
added QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm foo

# the last hash is the root-node, we can access objects through their path starting at the root, like:
$ ipfs cat /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm/bar/baz

# To inspect an object identified by a hash, we do
$ ipfs object get /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm

# The above object has no data (except the mysterious \u0008\u0001) and two links

# If you're just interested in the links, use "refs":
$ ipfs refs QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm

# Now a leaf object without links
$ ipfs object get /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm/bar/baz


# The string 'baz' is somewhere in there :)

The Unicode characters that show up in the data field are the result of serialization of the data. IPFS uses protobuf for that I think. Correct me if I'm wrong :)

At the time I'm writing this there is an experimental alternative for the ipfs object commands: ipfs dag:

$ ipfs dag get QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm

$ ipfs dag get /ipfs/QmdcYvbv8FSBfbq1VVSfbjLokVaBYRLKHShpnXu3crd3Gm/bar/baz


We see a couple of differences there, but let's not get into that. Both outputs follow the IPFS object format from the white paper. One interesting bit is the "Cid" that shows up; this refers to the newer Content IDentifier.

Another feature that is mentioned is the possibility to pin objects, which results in storage of these objects in the file system of the local node. The current Go implementation of ipfs stores it in a leveldb database under the ~/.ipfs/datastore directory. We have seen pinning in action in a previous post.

The last part of this chapter mentions the availability of object level encryption. This is not implemented yet: status wip (Work in Progress; I had to look it up as well). The project page is here: ipfs keystore proposal.

The ipfs dag command hints at something new...

Intermission: IPLD

If you studied the images at the start of this post carefully, you are probably wondering, what is IPLD and how does it fit in? According to the white paper, it doesn't fit in, as it isn't mentioned at all!

My guess is that IPLD is not mentioned because it was introduced later, but it more or less maps to the Objects chapter in the paper. IPLD is broader, more general, than what the white paper specifies. Hey Juan, update the white paper will ya! :-)

If you don't want to wait for the updated white paper, have a look here: the IPLD website (Inter Planetary Linked Data), the IPLD specs and the IPLD implementations.

And this video is an excellent introduction: Juan Benet: Enter the Merkle Forest.

But if you don't feel like reading/watching more: IPLD is more or less the same as what is described in the "Objects" and "Files" chapters here.

Moving on to the next chapter in the white paper...

Files: uh?

On top of the Merkle DAG objects IPFS defines a Git-like file system with versioning, with the following elements:

  • blob: there is just data in blobs and it represents the concept of a file in IPFS. No links in blobs
  • list: lists are also a representation of an IPFS file, but consisting of multiple blobs and/or lists
  • tree: a collection of blobs, lists and/or trees: acts as a directory
  • commit: a snapshot of the history in a tree (just like a git commit).
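
To keep the four element types apart, here is how I picture them as plain Python data classes. This is only my reading of the white paper: the class and field names are hypothetical, and the objects reference each other directly, where IPFS would link them by hash.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Blob:
    """A file (or a chunk of one): just data, no links."""
    data: bytes


@dataclass
class FileList:
    """A file made up of several blobs and/or nested lists, in order."""
    parts: List[Union[Blob, "FileList"]]


@dataclass
class Tree:
    """A directory: names mapped to blobs, lists or other trees."""
    entries: Dict[str, Union[Blob, FileList, "Tree"]]


@dataclass
class Commit:
    """A snapshot of a tree, pointing back at the previous snapshot."""
    tree: Tree
    parent: Optional["Commit"] = None
    message: str = ""


def read_file(node: Union[Blob, FileList]) -> bytes:
    """Reassemble a file by concatenating its parts in order."""
    if isinstance(node, Blob):
        return node.data
    return b"".join(read_file(part) for part in node.parts)


big_file = FileList([Blob(b"ab"), FileList([Blob(b"cd")])])
first = Commit(tree=Tree({"big": big_file}), message="first version")
print(read_file(big_file))  # b'abcd'
```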

Now I hear you thinking: aren't these blobs, lists, and trees the same things as what we saw in the Merkle DAG? We had objects there with data, with or without links, and nice Unix-like file paths.

I heard you thinking that because I thought the same thing when I arrived at this chapter. After searching around a bit I started to get the feeling that this layer was discarded and IPLD stops at the "objects" layer, and everything on top of that is open to whatever implementation. If an expert is reading this and thinks I have it all wrong: please let me know, and I'll correct it with the new insight.

Now, what about the commit file type? The title of the white paper is "IPFS - Content Addressed, Versioned, P2P File System", but the versioning hasn't been implemented yet it seems.

There is some brainstorming going on about versioning here and here.

That leaves one more layer to go...

Naming: adding mutability

Since links in IPFS are content-addressed (a cryptographic hash over the content identifies the block or object), data is immutable by definition. Content can only be replaced by a new version, which therefore gets a new "address".

The solution is to create "labels" or "pointers" (just like git branches and tags) to immutable content. These labels can be used to represent the latest version of an object (or graph of objects).
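
A tiny sketch of that idea: the content store is immutable and addressed by hash, while a separate table of named pointers is the only thing that ever changes. The names used here (add, publish, resolve, "my-site") are made up, and the signing that makes such a pointer trustworthy is deliberately left out.

```python
import hashlib

content_store = {}  # address (hash) -> immutable content
pointers = {}       # mutable label -> address of the latest version


def add(content: bytes) -> str:
    """Store content under the hash of its bytes and return that address."""
    address = hashlib.sha256(content).hexdigest()
    content_store[address] = content
    return address


def publish(name: str, address: str) -> None:
    """Point the label at a (new) address; the old content stays untouched."""
    pointers[name] = address


def resolve(name: str) -> bytes:
    """Follow the label to whatever content it currently points at."""
    return content_store[pointers[name]]


v1 = add(b"version 1")
publish("my-site", v1)
v2 = add(b"version 2")
publish("my-site", v2)

print(v1 != v2)            # True: new content means a new address
print(resolve("my-site"))  # b'version 2'
print(content_store[v1])   # b'version 1' is still reachable by its hash
```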

In IPFS this pointer can be created using the Self-Certified Filesystems I described in the previous post. It is named IPNS and works like this:

  • The root address of a node is /ipns/<NodeId>
  • The content it points to can be changed by publishing an IPFS object to this address
  • By publishing, the owner of the node (the person who knows the secret key that was generated with ipfs init) cryptographically signs this "pointer".
  • This enables other users to verify the authenticity of the object published by the owner.
  • Just like IPFS paths, IPNS paths also start with a hash, followed by a Unix-like path.
  • IPNS records are announced and resolved via the DHT.

I already showed the actual execution of the ipfs name publish command in the post Getting to know IPFS.

This chapter in the white paper also describes some methods to make addresses more human-friendly, but I'll leave that in store for the next episode which will be hands-on again. We gotta get rid of these hashes in the addresses and make it all work nicely in our good old browsers: Ten terrible attempts to make IPFS human-friendly

Let me know what you think of this post by tweeting to me @pors!