Introduction
S5 is a decentralized network that puts you in control of your data and identity.
At its core, it is a content-addressed storage network similar to IPFS, but with some new concepts and ideas to make it more efficient and powerful.
This website is hosted on S5.
All relevant code can be found here: https://github.com/s5-dev
You can join the Discord Server for updates and discussion: https://discord.gg/Pdutsp5jqR
Discussion will be moved to a decentralized chat app powered by S5 when it's ready :)
Concepts
This section explains some basic concepts used in S5:
Content-addressed data
Cryptographic hashes
Cryptographic hash functions are algorithms that map files or data of any size to a fixed-length hash value.
- They are deterministic, meaning the same input always results in the same hash
- It is infeasible to generate a message that yields a given hash value (i.e. to reverse the process that generated the given hash value)
- It is infeasible to find two different messages with the same hash value
- A small change to a message should change the hash value so extensively that it appears uncorrelated with the old hash value
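The properties above can be observed with any cryptographic hash function. The following is a minimal sketch that uses SHA-256 from Python's standard library as a stand-in, because BLAKE3 is not part of the standard library. The helper names `digest` and `differing_bits` are made up for this illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest (a stand-in for BLAKE3 here)."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = digest(b"hello world")
second = digest(b"hello world")   # same input, so the same hash
changed = digest(b"hello worle")  # last byte differs by a single bit

print(first == second)                   # deterministic
print(len(first) // 2)                   # fixed-length output in bytes
print(differing_bits(first, changed))    # many of the 256 bits change
```

Running it on inputs of very different sizes still yields 32-byte digests, which is what makes a hash usable as a compact address for arbitrary data.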
The BLAKE3 hash function
BLAKE3 is a cryptographic hash function that is:
- Much faster than MD5, SHA-1, SHA-2, SHA-3, and BLAKE2.
- Secure, unlike MD5 and SHA-1. And secure against length extension, unlike SHA-2.
- Highly parallelizable across any number of threads and SIMD lanes, because it's a Merkle tree on the inside.
- Capable of verified streaming and incremental updates, again because it's a Merkle tree.
- A PRF, MAC, KDF, and XOF, as well as a regular hash.
- One algorithm with no variants, which is fast on x86-64 and also on smaller architectures.
Content-addressing
Content-addressing means that instead of addressing data by its location (for example with protocols like HTTP/HTTPS), data is referenced by its cryptographic hash. This lets you verify that you actually received the data you were looking for without trusting anyone except the person who gave you the hash. It also makes all files immutable by default.
Verified streaming
To make verified streaming of large files possible, S5 uses the Bao implementation for BLAKE3 verified streaming. As mentioned earlier, BLAKE3 is a merkle tree on the inside - this makes it possible to verify the integrity of small parts of a file without having to download and hash the entire file first.
By default, S5 appends some layers of the Bao hash tree to every stored file that is larger than 256 KiB. With the default layers, it's possible to verify chunks with a minimum size of 256 KiB from a file. For example, if you are streaming a large video file, your player only needs to download the first 256 KiB of the file before it can show the first frame. The overhead of storing the tree is a bit less than 256 KiB per 1 GiB of stored data.
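The overhead figure can be checked with a little arithmetic. This is a rough sketch, assuming that the stored tree consists of one parent node per pair of children and that each parent node holds two 32-byte hashes (the layout Bao uses). The exact bytes S5 stores (a length header, for example) may differ slightly, and `tree_overhead` is a name invented for this example.

```python
KIB = 1024
GIB = 1024 ** 3

CHUNK_SIZE = 256 * KIB            # smallest verifiable unit with the default layers
HASH_SIZE = 32                    # BLAKE3 default output size in bytes
PARENT_NODE_SIZE = 2 * HASH_SIZE  # assumption: a parent stores both child hashes


def tree_overhead(file_size: int) -> int:
    """Estimate the bytes needed to store the hash tree for a file."""
    chunks = max(1, -(-file_size // CHUNK_SIZE))  # ceiling division
    parents = chunks - 1  # a binary tree with n leaves has n - 1 parent nodes
    return parents * PARENT_NODE_SIZE


overhead = tree_overhead(GIB)
print(overhead)        # 262080 bytes
print(overhead / KIB)  # 255.9375 KiB, a bit less than 256 KiB
```

A 1 GiB file splits into 4096 chunks of 256 KiB, which need 4095 parent nodes of 64 bytes each. Files no larger than a single chunk need no stored tree at all under these assumptions.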
CIDs (Content identifiers)
Hash values produced by the BLAKE3 hash function have a size of 32 bytes, for example c4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95 (hex-encoded).
Instead of using this value directly, S5 prepends two additional bytes to reference raw files:
0x26 cidTypeRaw: This CID contains a raw file without any additional metadata
0x1f mhashBlake3Default: This CID contains a BLAKE3 hash of the file with the default 256-bit output size
You can find a list of all up-to-date magic bytes here: lib5:constants.dart
In addition to these two magic bytes, the size of the file (in bytes) is encoded with little-endian encoding and appended to the hash bytes.
For example, a file with 18657 bytes would be encoded like this:
0x26  0x1f  0xc4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95  0xe148
type  hash  blake3-256-hash                                                     filesize
So the length of a raw file CID depends on the filesize:
- Files with a size of less than 256 bytes have a 35-byte CID
- Files with a size of less than 64 KiB have a 36-byte CID
- Files with a size of less than 16 MiB have a 37-byte CID
- Files with a size of less than 4 GiB have a 38-byte CID
- ...
- Files with a size of less than 16384 PiB have a 42-byte CID
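The layout described above can be sketched in a few lines. This is an illustration rather than the lib5 implementation: the function names are invented here, and encoding a size of 0 as a single zero byte is an assumption the text does not cover.

```python
CID_TYPE_RAW = 0x26          # cidTypeRaw
MHASH_BLAKE3_DEFAULT = 0x1F  # mhashBlake3Default


def encode_size(size: int) -> bytes:
    """Encode a file size as little-endian bytes without trailing zero bytes."""
    # Assumption: an empty file (size 0) still uses one size byte.
    length = max(1, (size.bit_length() + 7) // 8)
    return size.to_bytes(length, "little")


def raw_cid(blake3_hash: bytes, size: int) -> bytes:
    """Build the bytes of a raw file CID: two magic bytes, hash, file size."""
    if len(blake3_hash) != 32:
        raise ValueError("expected a 32-byte BLAKE3 hash")
    prefix = bytes([CID_TYPE_RAW, MHASH_BLAKE3_DEFAULT])
    return prefix + blake3_hash + encode_size(size)


example_hash = bytes.fromhex(
    "c4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95"
)
cid = raw_cid(example_hash, 18657)
print(cid.hex())  # 261f + hash + e148
print(len(cid))   # 36
```

18657 is 0x48e1, so its little-endian encoding is the two bytes e1 48, which gives the 36-byte CID from the example. Each additional size byte extends the CID by one byte, up to 42 bytes for an 8-byte size.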
Encoding the CID bytes to a human-readable form
S5 uses the multibase standard for encoding the CID bytes. Basically the first character indicates how the bytes are encoded, here's a list of which ones are supported by S5:
base32, b, rfc4648 case-insensitive - no padding
base58btc, z, base58 bitcoin
base64url, u, rfc4648 no padding
By default, base58btc with the z prefix is used for newly uploaded files because it's short and easy to copy.
So the CID from the example earlier would be encoded like this:
base58btc: zHnq5PTzaLbboBEvLzecUQQWSpyzuugykxfmxPv4P3ccDcGwnw
base32: beyp4jut7qbqtylp5ytm5ae5uhqmbk5xcdt44eylcsvsg34anwcp33fpbja
base64url: uJh_E0n-AYTwt_cTZ0BO0PBgVduIc-cJhYpVkbfANsJ-9leFI
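The three encodings of the example CID can be reproduced with a short sketch. The function names are invented for this illustration, and the base58 routine is a plain textbook implementation rather than the code S5 uses.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def multibase_base58btc(data: bytes) -> str:
    """Encode bytes as base58 (bitcoin alphabet) with the 'z' multibase prefix."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = B58_ALPHABET[remainder] + digits
    # Leading zero bytes are represented by leading '1' characters.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "z" + "1" * leading_zeros + digits


def multibase_base32(data: bytes) -> str:
    """Encode bytes as lowercase RFC 4648 base32 without padding, prefix 'b'."""
    return "b" + base64.b32encode(data).decode("ascii").rstrip("=").lower()


def multibase_base64url(data: bytes) -> str:
    """Encode bytes as RFC 4648 base64url without padding, prefix 'u'."""
    return "u" + base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


cid_bytes = bytes.fromhex(
    "261f"
    "c4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95"
    "e148"
)
print(multibase_base58btc(cid_bytes))
print(multibase_base32(cid_bytes))
print(multibase_base64url(cid_bytes))
```

All three strings decode to the same 36 bytes. A reader only needs to look at the first character to know which decoder to apply.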
Media types
To make deduplication as efficient as possible, raw files on S5 do not contain any additional metadata like filenames or media types.
You can append a file extension to your CID to stream/share a single file with the correct content type, for example zHnq5PTzaLbboBEvLzecUQQWSpyzuugykxfmxPv4P3ccDcGwnw.txt.
For other use cases, you should use one of the metadata formats.
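As an illustration of the file extension convention, the sketch below splits such a name into the CID and a media type. How an S5 node or gateway actually maps extensions to content types is not described here, so the lookup via Python's `mimetypes` table and the function name `split_cid_and_media_type` are assumptions.

```python
import mimetypes
from typing import Optional, Tuple


def split_cid_and_media_type(name: str) -> Tuple[str, Optional[str]]:
    """Split 'CID.ext' into the CID and a guessed media type (None if unknown)."""
    # Encoded CIDs never contain a dot, so the first dot starts the extension.
    cid, dot, extension = name.partition(".")
    if not dot:
        return cid, None
    media_type, _encoding = mimetypes.guess_type("file." + extension)
    return cid, media_type


cid, media_type = split_cid_and_media_type(
    "zHnq5PTzaLbboBEvLzecUQQWSpyzuugykxfmxPv4P3ccDcGwnw.txt"
)
print(cid)
print(media_type)  # text/plain
```

The CID itself stays the same for every extension, which is why the raw file can be deduplicated regardless of how it is shared.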
Registry
The S5 registry is a decentralized key-value store. A registry entry looks like this:
class SignedRegistryEntry {
  // public key with multicodec prefix
  pk: Uint8Array;
  // revision number of this entry, maximum is (256^8)-1
  revision: int;
  // data stored in this entry, can have a maximum length of 48 bytes
  data: Uint8Array;
  // signature of this registry entry
  signature: Uint8Array;
}
Every registry entry has a 33-byte key.
The first byte indicates which type of public key it is, by default 0xed for ed25519 public keys.
The other 32 bytes are the ed25519 public key itself.
Every update to a registry entry must contain a signature created by the ed25519 keypair referenced in the key.
Nodes only keep the highest revision number and reject updates with a lower number.
Because the data has a maximum size of 48 bytes, most types of data can't be stored directly in it.
For this reason, registry entries usually contain a CID that points to the actual data.
The data bytes for registry entries which refer to a CID look like this:
0x5a       0x261fc4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95e148
link type  CID bytes
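The key layout, the CID link data and the revision rule can be sketched together. This is a simplified in-memory model, not the node's implementation: all names are invented for the example, signature verification is left out because it needs an ed25519 library, and treating an equal revision as rejected is an assumption (the text only says lower revisions are rejected).

```python
from dataclasses import dataclass
from typing import Dict, Optional

KEY_TYPE_ED25519 = 0xED  # first byte of the 33-byte key
LINK_TYPE_CID = 0x5A     # link type byte; the constant name is hypothetical
MAX_DATA_LENGTH = 48


def registry_key(ed25519_public_key: bytes) -> bytes:
    """Build the 33-byte registry key: type byte followed by the public key."""
    if len(ed25519_public_key) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    return bytes([KEY_TYPE_ED25519]) + ed25519_public_key


def cid_link_data(cid_bytes: bytes) -> bytes:
    """Build registry data that refers to a CID."""
    data = bytes([LINK_TYPE_CID]) + cid_bytes
    if len(data) > MAX_DATA_LENGTH:
        raise ValueError("registry data is limited to 48 bytes")
    return data


@dataclass
class RegistryEntry:
    pk: bytes
    revision: int
    data: bytes
    # The signature field is omitted in this sketch.


class RegistryStore:
    """Keeps only the entry with the highest revision for each key."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, RegistryEntry] = {}

    def put(self, entry: RegistryEntry) -> bool:
        current = self._entries.get(entry.pk)
        # Assumption: an equal revision does not replace the stored entry.
        if current is not None and entry.revision <= current.revision:
            return False
        self._entries[entry.pk] = entry
        return True

    def get(self, pk: bytes) -> Optional[RegistryEntry]:
        return self._entries.get(pk)


cid = bytes.fromhex(
    "261fc4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95e148"
)
key = registry_key(bytes(32))  # placeholder public key made of zero bytes
store = RegistryStore()
accepted_first = store.put(RegistryEntry(key, 1, cid_link_data(cid)))
accepted_stale = store.put(RegistryEntry(key, 0, b"stale"))
accepted_newer = store.put(RegistryEntry(key, 2, cid_link_data(cid)))
print(accepted_first, accepted_stale, accepted_newer)  # True False True
```

Even the longest raw file CID (42 bytes) fits into the 48-byte limit together with the link type byte.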
Subscriptions
Nodes can subscribe to specific entries on the peer-to-peer network to get new updates in realtime.
Peer-to-peer
S5 uses a peer-to-peer network to find providers who serve a specific file. Unlike IPFS, S5 does NOT transfer or exchange file data between peers directly. Instead, the p2p network is only meant to find storage locations and download links for a specific file hash. This has some advantages:
- Because only small queries for hashes (34 bytes) and responses (only a short download link, usually less than 256 bytes) are sent over the p2p network, it's extremely lightweight and very scalable.
- Existing highly optimized HTTP-based software and infrastructure can be used for the file delivery itself, reducing costs significantly and making downloads more efficient. This also keeps peers lightweight.
- Because S5 uses the HTTP/HTTPS protocol (support for more is planned), existing download links or file mirrors can be directly provided on S5 without needing to re-upload them, even if the one who provides them on the network is not the same one hosting them.
Peer discovery
Right now S5 uses a configurable list of initial peers with their connection strings (protocol, IP address, port) to connect to the network. After connecting to a new peer, peers send a list of all other peers they know about to the new peer.
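The exchange of peer lists can be modelled with a toy example. This sketch only illustrates the idea and does not reflect the actual wire protocol: the `Peer` class, its methods and the `tcp://` connection strings (which use documentation-only IP addresses) are all hypothetical.

```python
from typing import Iterable, List, Set


class Peer:
    """Toy model of a node that learns about other peers when it connects."""

    def __init__(self, address: str, initial_peers: Iterable[str] = ()) -> None:
        self.address = address
        self.known: Set[str] = set(initial_peers)

    def accept(self, newcomer: "Peer") -> List[str]:
        """Handle an incoming connection and return the peers known so far."""
        shared = sorted(self.known - {newcomer.address})
        self.known.add(newcomer.address)
        return shared

    def connect(self, other: "Peer") -> None:
        """Connect to another peer and merge the peer list it sends back."""
        self.known.add(other.address)
        self.known.update(other.accept(self))


# Hypothetical connection strings for the example.
bootstrap = Peer("tcp://203.0.113.1:4444", ["tcp://203.0.113.2:4444"])
new_node = Peer("tcp://198.51.100.7:4444")

new_node.connect(bootstrap)
print(sorted(new_node.known))
print(sorted(bootstrap.known))
```

After a single connection to one initial peer, the new node already knows every peer the initial peer knew about, so the configured list only needs a few reachable entries.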
Supported P2P Protocols
- Custom TCP (authenticated, but not encrypted)
Planned P2P Protocols
- QUIC+TLS or nQUIC
- WebSocket
- WebTransport
Node/peer IDs
Every node has a unique (random) ed25519 keypair. This keypair is used to sign specific responses like provide operations, which contain a specific storage location and download link for a queried hash. Because the message itself contains the signature, all peers can also relay queries and responses without being trusted to not tamper with them.
Node scores
Every node keeps a local score for every other node/peer it knows of. This score is calculated based on the number of valid and useful responses by a node compared to the number of bad or invalid responses. The score also depends on the total number of responses, so a node with 1000 correct and 50 wrong responses has a better score than a node with 5 correct out of only 5 total responses for example.
The algorithm can be found here: lib5:score.dart
Node scores are used to decide which download links to try first if multiple are available for the same file hash.
Install the S5 node
Right now the only supported way to run an S5 node is using a container runtime like Docker or Podman.
You can install Docker on most operating systems using the instructions here: https://docs.docker.com/engine/install/
If you are on Linux you can use the convenience script: curl -fsSL https://get.docker.com | sudo sh
Podman is a popular alternative to Docker, but it might be harder to install on non-Linux systems. You can find instructions for it here: https://podman.io/getting-started/installation
Run S5 using Docker
Before running this command, you should change the paths ./s5/config and ./s5/db to a storage location of your choice.
docker run -d \
  --name s5-node \
  -p 127.0.0.1:5050:5050 \
  -v ./s5/config:/config \
  -v ./s5/db:/db \
  --restart unless-stopped \
  ghcr.io/s5-dev/node:0.10.0
This will only bind your node to localhost, so you will need a reverse proxy like Caddy to access it from the internet.
If you instead want to expose the HTTP API port to the entire network, you can set -p 5050:5050
If something seems to not work correctly, you can view the logs with docker logs -f s5-node
config path
This path will be used to generate and load the config.toml file. You will need to edit that file to configure stores and other options.
db path
This path is used for storing small key-value databases that hold state relevant for the network and node. Do not use a slow HDD for this.
(optional) cache path
The cache stores large file uploads and different downloads/streams. You can use a custom cache location by adding -v ./s5/cache:/cache to your command.
(optional) data path
If you are planning to store uploaded files on your local disk, you should prepare a directory for that and specify it with -v ./s5/data:/data
Using Sia
If you want to use S5 with an instance of renterd running on the same server, you should add the --network="host" flag to grant S5 access to the renterd API.
Stop the container
docker container stop s5-node
Remove the container
docker container rm s5-node
Alternative: Using docker-compose
Create a file called docker-compose.yml with this content:
version: '3'
services:
  s5-node:
    image: ghcr.io/s5-dev/node:0.10.0
    volumes:
      - ./path/to/config:/config
    ports:
      - "5050:5050"
    restart: unless-stopped
The same configuration options apply as with plain Docker/Podman. Run it with docker-compose up -d
S5 Config
You can edit the config.toml file to configure your S5 node. You can apply changes with docker container restart s5-node
This page describes the available sections in the config.
keypair
The seed is generated on first start and is used for signing messages on the network. You should keep it private.
store
Check out the Stores documentation for configuring different object stores.
accounts
You can enable the accounts system by adding this part to your config:
[accounts]
enabled = true
[accounts.database]
path = "/db/accounts"
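Registrations are disabled by default. To enable them, add alwaysAllowedScopes to the [accounts] table. If that table already exists in your config, put the key there instead of repeating the [accounts] header, because TOML does not allow the same table to be defined twice: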
[accounts]
alwaysAllowedScopes = [
'account/login',
'account/register',
's5/registry/read',
's5/metadata',
's5/debug/storage_locations',
's5/debug/download_urls',
's5/blob/redirect',
]
cache
Configure a custom cache path with path. You likely don't need this if you are using Docker.
database
Configure a custom database path. You likely don't need this if you are using Docker.
http.api
domain: Configure this value to match the domain you are using to access your node
port: The port the HTTP API should bind to
p2p.peers
List of initial peers used for connecting to the p2p network.
Caddy reverse proxy
Caddy is an easy to use reverse proxy with automatic HTTPS.
You can install it by following the instructions over at https://caddyserver.com/docs/install
You'll also need a domain name with A and AAAA records pointed to your server.
You should also make sure that your firewall doesn't block the ports 80 and 443.
Configuration
With the default S5 port of 5050, you can configure your /etc/caddy/Caddyfile like this:
YOUR.DOMAIN {
    reverse_proxy localhost:5050
}
On Debian and Ubuntu you can run sudo systemctl restart caddy to restart Caddy after editing the Caddyfile.
Don't forget to configure http.api.domain in your S5 config.toml after setting up a domain and reverse proxy!
Stores
The S5 network and its nodes support multiple storage backends.
S3 is the easiest to set up, Sia is the cheapest option.
Local stores all files on your server directly, so that usually only makes sense for a home NAS use case or a small number of files.
Arweave provides permanent storage for a high price.
S3-compatible providers
Any cloud provider that supports the S3 protocol can be used. See https://s3.wiki for the cheapest ones.
Configuration
[store.s3]
accessKey = "YOUR_ACCESS_KEY"
bucket = "YOUR_BUCKET_NAME"
endpoint = "YOUR_S3_ENDPOINT"
secretKey = "YOUR_SECRET_KEY"
Local
Stores uploaded files on the local filesystem.
Configuration
[store.local]
path = "/data" # If you are using the Docker container
[store.local.http]
bind = "127.0.0.1"
port = 8989
url = "http://localhost:8989"
By default, files will only be available on your local node.
To make them available on the entire network, you have to forward your port so that it is reachable from the internet and then update the url to the URL at which your computer is reachable from the internet.
Sia Network
The Sia network provides decentralized and redundant data storage.
You will need a fully configured local instance of renterd: https://github.com/SiaFoundation/renterd
Warning: Both renterd and this integration are still experimental. Please report any bugs you encounter.
Configuration
[store.sia]
workerApiUrl = "http://localhost:9980/api/worker"
apiPassword = "test"
downloadUrl = "https://dl.YOUR.DOMAIN"
Using Caddy as a reverse proxy for Sia downloads
This configuration requires a version of Caddy with https://github.com/caddyserver/cache-handler. If you don't want to cache Sia downloads, you can remove the first 4 lines and the cache directive.
/etc/caddy/Caddyfile:
{
    order cache before rewrite
    cache
}
dl.YOUR.DOMAIN {
    uri strip_suffix /
    header {
        Access-Control-Allow-Origin *
    }
    cache {
        stale 6h
        ttl 24h
        default_cache_control "public, max-age=86400"
        nuts {
            path /tmp/nuts
        }
    }
    rewrite * /api/worker/objects/1{path}
    reverse_proxy {
        to localhost:9980
        header_up Authorization "Basic OnRlc3Q=" # Change this to match your renterd API key
    }
}
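The Authorization value in the Caddyfile above is the HTTP Basic encoding of an empty username and the password test, matching the apiPassword from the example config. A small sketch for generating the value for your own password (the function name is made up for this example):

```python
import base64


def renterd_basic_auth(api_password: str) -> str:
    """Build an HTTP Basic Authorization value with an empty username."""
    credentials = ":" + api_password  # "username:password" with no username
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + encoded


print(renterd_basic_auth("test"))  # Basic OnRlc3Q=
```

Replace the quoted string after header_up Authorization with the output for your own renterd API password.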
Arweave
Arweave is expensive, but provides permanent storage for a one-time payment. Check out https://www.arweave.org/
Disabled right now
Tools
This section contains some useful tools for working with S5
cid.one
https://cid.one/ is a CID explorer for the S5 network.
It supports raw CIDs, all of the metadata formats, resolver CIDs and (soon) encrypted CIDs.
Here are some examples:
Raw file: https://cid.one/#uJh9dvBupLgWG3p8CGJ1VR8PLnZvJQedolo8ktb027PrlTT5LvAY
Resolver CID: https://cid.one/#zrjD7xwmgP8U6hquPUtSRcZP1J1LvksSwTq4CPZ2ck96FHu
Media Metadata: https://cid.one/#z5TTvXtbkQk9PTUN8r5oNSz5Trmf1NjJwkVoNvfawGKDtPCB
Web App Metadata: https://cid.one/#blepzzclchbhwull3is56zvubovg7j3cfmatxx5gyspfx3dowhyutzai
s5.cx
s5.cx is a web-based tool to securely stream files of any size directly from the S5 network. File data is NOT proxied by the s5.cx server.
It works by using a service worker that intercepts all raw file requests, fetches the file data from a host on the S5 network and verifies the integrity using BLAKE3/bao in Rust compiled to WASM and running directly inside of the service worker.
The service worker code can be used by any web app to easily stream files from S5 without needing any additional code or libraries in your project. A repository with setup instructions will be published soon.
The service worker is already being used by https://tube5.app/.
Here's an example file: https://s5.cx/uJh9dvBupLgWG3p8CGJ1VR8PLnZvJQedolo8ktb027PrlTT5LvAY.mp4
Metadata formats
This section contains documentation for all metadata formats used and supported by S5.
All formats have a JSON representation for easy creation, debug purposes and editing.
All formats also have a highly optimized serialization representation based on https://msgpack.org/ used for storing them on S5 including (optional) signatures and timestamp proofs.
JSON Schemas for all formats are available here: https://github.com/s5-dev/json-schemas
Web App metadata
Metadata format used for web apps stored on S5. This docs website is hosted using it.
Example
Web App Metadata: https://cid.one/#blepzzclchbhwull3is56zvubovg7j3cfmatxx5gyspfx3dowhyutzai
Fields
Full JSON Schema: https://schema.sfive.net/web-app-metadata.json
Web-based viewer: https://json-schema.app/view/%23?url=https%3A%2F%2Fschema.sfive.net%2Fweb-app-metadata.json
Directory metadata
Work-in-progress, will be used to store directory trees in Vup. Supports advanced sharing capabilities and is fully end-to-end-encrypted by default.
Media metadata
Very flexible metadata format used for almost any more advanced content/media structure.
Can be used for videos, images, music, podcasts, profiles, lists and more!
Already being used by Tube5.
Example
Media Metadata: https://cid.one/#z5TTvXtbkQk9PTUN8r5oNSz5Trmf1NjJwkVoNvfawGKDtPCB
Fields
Full JSON Schema: https://schema.sfive.net/media-metadata.json
Web-based viewer: https://json-schema.app/view/%23?url=https%3A%2F%2Fschema.sfive.net%2Fmedia-metadata.json