Design Requirements for Portal SubNetworks
At its core, the Portal Network is a specialized storage network for Ethereum's data. It is a distributed peer-to-peer network in which all nodes participate in storing and serving data. The ongoing health of the network depends on our protocol designs adhering to the following requirements.
Provable Data Only
All data stored in the network must be cryptographically anchored so that it can be proven to be correct. For example, a block header is addressed by its block hash; upon retrieval, the header can be proven to be the correct data by computing keccak(rlp(header)) and verifying that the result matches the expected hash.
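As a rough sketch, the check described above fits in a few lines of Python. The eth-hash package (with a hashing backend installed) and the function shape are illustrative assumptions, not part of the Portal specification.

```python
from eth_hash.auto import keccak  # assumed dependency, purely for illustration


def is_valid_header(encoded_header: bytes, expected_block_hash: bytes) -> bool:
    """Return True if the RLP-encoded header hashes to the block hash it was addressed by."""
    return keccak(encoded_header) == expected_block_hash
```

A node that receives a header over the network can run this check locally before storing or re-serving the data, so invalid payloads are dropped at the edge.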
This requirement ensures that the network can only be used to store the intended data, which protects it against both spam and certain categories of denial-of-service attacks. Without this requirement, the network could be abused by people attempting to store other types of data, or attacked by people trying to fill up the network's storage capacity with garbage data.
Canonical Data Only
All data stored in the network must be canonically anchored. This requirement is similar to the previous "Provable Data Only" requirement, but it is more specific: data must not only be provable, it must also be canonical. An example of data that this rule excludes is the header of a non-canonical block; such blocks are often referred to as ommers (the gender-neutral term for aunt or uncle).
An example of how we enforce this is our use of double batched merkle log accumulators. These allow us to easily prove that an individual block header is part of the canonical chain of headers.
Easy To Prove
Some things are provable, but not easy to prove. An example of this is proving that any individual block header is canonical. The naive approach to proving that a header is part of the canonical chain is to trace its parent chain backwards in time to either genesis or to another historical header that is trusted. In practice, this is an arduous process that requires downloading and verifying very large amounts of information.
The Portal Network solves this problem by borrowing from the Beacon chain specifications and using double batched merkle log accumulators. These accumulators allow users to verify that a header is canonical by checking a small merkle proof instead of verifying the entire historical chain of hashes.
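To make the idea concrete, here is a minimal sketch of merkle branch verification in Python. It uses sha256 as a stand-in hash and generic parameter names; the real accumulator proofs follow the SSZ merkleization rules in the specification rather than this simplified form.

```python
from hashlib import sha256
from typing import Sequence


def verify_merkle_branch(leaf: bytes, branch: Sequence[bytes], index: int, root: bytes) -> bool:
    """Walk a merkle branch from a leaf up to the expected root.

    Bit i of `index` says whether the node at depth i is a right (1) or left (0)
    child, which determines the order in which siblings are concatenated before hashing.
    """
    node = leaf
    for sibling in branch:
        if index & 1:
            node = sha256(sibling + node).digest()
        else:
            node = sha256(node + sibling).digest()
        index >>= 1
    return node == root
```

The proof a client needs to check is just the branch of sibling hashes, whose size grows logarithmically with the number of headers the accumulator covers.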
In general, this principle must apply to all data in our networks.
Amplification Attacks
As we look at the various proof requirements, one "trap" that is easy to fall into is the introduction of what is referred to as an "amplification attack". Suppose that we were to introduce a new type of data B that, in order to prove its canonicalness, requires a node to fetch a different piece of data A from the network. If the relative sizes are such that B is smaller than A, then we will also be introducing an attack vector.
A bad actor can send out an invalid payload for B. The receiving node would then need to fetch A from the network in an attempt to verify the B payload it received. The verification will ultimately fail for B, but the damage to the network occurs because the bad actor has been able to send a small payload which results in the recipients sending out larger payloads. This allows someone to execute a denial-of-service attack on our network by sending out a lot of small payloads which in turn cause the receiving nodes to send out even larger payloads, effectively amplifying the bandwidth the attacker is able to consume.
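One way to keep this property in view when designing a new data type is to estimate its amplification factor: the bandwidth a recipient must spend to validate a payload divided by the bandwidth the sender spent delivering it. The function and sizes below are hypothetical placeholders used only to illustrate the ratio.

```python
def amplification_factor(payload_size: int, validation_traffic: int) -> float:
    """Bandwidth a node spends validating a payload relative to what the sender spent.

    Values above 1.0 mean an attacker can make the network do more work than they do,
    which is the signature of an amplification attack.
    """
    return validation_traffic / payload_size


# Hypothetical example: a 100 byte payload of B forces the recipient to fetch a
# 10,000 byte piece of data A before the payload can be rejected.
print(amplification_factor(payload_size=100, validation_traffic=10_000))  # 100.0
```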
Evenly Distributed Data
The Portal Network is a fancy DHT. This DHT has a 256 bit address space, and all content within the portal network has a content-id which dictates its location in this address space. Data in the portal network is required to be evenly distributed across this address space. This is how we ensure that individual nodes in the network are able to effectively choose how much data they are willing to store for the network.
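A straightforward way to achieve this even distribution is to derive the content-id by hashing the content key, since a cryptographic hash spreads its outputs uniformly across the 256 bit space regardless of how the keys themselves cluster. The sketch below uses sha256 purely for illustration; the exact derivation for each sub-network is defined in its specification.

```python
from hashlib import sha256


def content_id(content_key: bytes) -> int:
    """Map a content key to a position in the 256 bit DHT address space.

    Because the hash output is uniformly distributed, content lands evenly across
    the address space even when the underlying keys (block hashes, numbers, etc.)
    are clustered.
    """
    return int.from_bytes(sha256(content_key).digest(), "big")
```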
If our addressing scheme did not evenly distribute the data, we would end up with areas of the address space containing significantly more data than other areas of the network. We refer to such areas as "hot spots". Nodes in the DHT that found themselves very close to these hot spots would end up being responsible for a disproportionately large amount of data.