Introduction

S5 is a decentralized network that puts you in control of your data and identity.

At its core, it is a content-addressed storage network similar to IPFS, but with some new concepts and ideas to make it more efficient and powerful.

This website is hosted on S5.

All relevant code can be found here: https://github.com/s5-dev

You can join the Discord Server for updates and discussion: https://discord.gg/Pdutsp5jqR

Discussion will be moved to a decentralized chat app powered by S5 when it's ready :)

Concepts

This section explains some basic concepts used in S5:

Content-addressed data

Cryptographic hashes

A cryptographic hash function is an algorithm that maps files or data of any size to a fixed-length hash value. Cryptographic hash functions have the following properties:

They are deterministic, meaning the same input always results in the same hash
It is infeasible to generate a message that yields a given hash value (i.e. to reverse the process that generated the given hash value)
It is infeasible to find two different messages with the same hash value
A small change to a message should change the hash value so extensively that it appears uncorrelated with the old hash value

The BLAKE3 hash function

BLAKE3 is a cryptographic hash function that is:

Much faster than MD5, SHA-1, SHA-2, SHA-3, and BLAKE2.
Secure, unlike MD5 and SHA-1. And secure against length extension, unlike SHA-2.
Highly parallelizable across any number of threads and SIMD lanes, because it's a Merkle tree on the inside.
Capable of verified streaming and incremental updates, again because it's a Merkle tree.
A PRF, MAC, KDF, and XOF, as well as a regular hash.
One algorithm with no variants, which is fast on x86-64 and also on smaller architectures.

Content-addressing means that data is referenced by its cryptographic hash instead of by its location (as with protocols like HTTP/HTTPS). This lets you verify that you received exactly the data you were looking for, without trusting anyone except the person who gave you the hash. Other benefits include highly efficient caching (because file blobs are immutable by default) and automatic deduplication of data.

Verified streaming

To make verified streaming of large files possible, S5 uses the Bao implementation for BLAKE3 verified streaming. As mentioned earlier, BLAKE3 is a Merkle tree on the inside, which makes it possible to verify the integrity of small parts of a file without having to download and hash the entire file first.

By default, S5 stores some layers of the Bao hash tree next to every stored file that is larger than 256 KiB (same path, but with an .obao extension). With these default layers, a file can be verified in chunks with a minimum size of 256 KiB. For example, when streaming a large video file, your player only needs to download the first 256 KiB of the file before it can show the first frame. Storing the tree costs a bit less than 256 KiB per 1 GiB of stored data.
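As a rough sanity check of the "a bit less than 256 KiB per 1 GiB" figure, the sketch below estimates the tree size. It rests on assumptions that are not taken from the S5 source: the stored layers form a binary tree whose leaves are 256 KiB chunk groups, every parent node takes 64 bytes (two 32-byte BLAKE3 chaining values, as in Bao), and header bytes are ignored. The function names are hypothetical.

```rust
// Rough estimate of the stored tree size for 256 KiB verifiable chunks.
// Assumptions: binary tree with 256 KiB leaf groups, 64-byte parent nodes,
// no header bytes counted.

const GROUP_SIZE: u64 = 256 * 1024; // minimum verifiable slice (256 KiB)
const PARENT_NODE_SIZE: u64 = 64; // two 32-byte BLAKE3 chaining values

/// Number of 256 KiB leaf groups needed to cover `file_size` bytes (at least one).
fn leaf_groups(file_size: u64) -> u64 {
    if file_size == 0 {
        1
    } else {
        (file_size + GROUP_SIZE - 1) / GROUP_SIZE
    }
}

/// Any binary tree with n leaves has n - 1 parent nodes.
fn estimated_tree_size(file_size: u64) -> u64 {
    (leaf_groups(file_size) - 1) * PARENT_NODE_SIZE
}

fn main() {
    let one_gib: u64 = 1 << 30;
    let tree = estimated_tree_size(one_gib);
    println!(
        "{} leaf groups, {} bytes of tree ({} bytes below 256 KiB)",
        leaf_groups(one_gib),
        tree,
        GROUP_SIZE - tree
    );
}
```

Under these assumptions a 1 GiB file has 4096 leaf groups and 4095 parent nodes, i.e. 262,080 bytes, which is just under 256 KiB (262,144 bytes).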

CIDs (Content identifiers)

See /spec/blobs.html for up-to-date documentation on how S5 calculates CIDs.

Media types

To make deduplication as efficient as possible, raw files on S5 do not contain any additional metadata like filenames or media types. You can append a file extension to your CID to stream/share a single file with the correct content type, for example zHnq5PTzaLbboBEvLzecUQQWSpyzuugykxfmxPv4P3ccDcGwnw.txt.

For other use cases, you should use one of the metadata formats.

Registry

The S5 registry is a decentralized key-value store. A registry entry looks like this:

class SignedRegistryEntry {
  // public key with multicodec prefix
  pk: Uint8Array;

  // revision number of this entry, maximum is (256^8)-1
  revision: int;

  /// data stored in this entry, can have a maximum length of 48 bytes
  data: Uint8Array;

  /// signature of this registry entry
  signature: Uint8Array;
}

Every registry entry has a 33-byte key. The first byte indicates which type of public key it is, by default 0xed for ed25519 public keys. The other 32 bytes are the ed25519 public key itself.

Every update to a registry entry must contain a signature created by the ed25519 keypair referenced in the key.

Nodes only keep the highest revision number and reject updates with a lower number.
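The revision rule can be sketched as a small in-memory store. This is an illustration, not S5 node code: `RegistryStore` and `try_insert` are hypothetical names, signature verification is assumed to have happened already, and rejecting an update with an equal revision is an assumption (the text only says that lower revisions are rejected).

```rust
use std::collections::HashMap;

/// Simplified registry entry; the signature is assumed to be verified already.
struct RegistryEntry {
    pk: [u8; 33], // 0xed prefix + 32-byte ed25519 public key
    revision: u64,
    data: Vec<u8>, // at most 48 bytes
}

/// Hypothetical in-memory registry that keeps only the newest entry per key.
#[derive(Default)]
struct RegistryStore {
    entries: HashMap<[u8; 33], RegistryEntry>,
}

impl RegistryStore {
    /// Returns true if the entry was stored. Assumption: equal revisions are
    /// rejected too, so only a strictly higher revision replaces an entry.
    fn try_insert(&mut self, entry: RegistryEntry) -> bool {
        if entry.data.len() > 48 {
            return false;
        }
        let is_newer = self
            .entries
            .get(&entry.pk)
            .map_or(true, |existing| entry.revision > existing.revision);
        if is_newer {
            self.entries.insert(entry.pk, entry);
        }
        is_newer
    }
}

fn main() {
    let mut store = RegistryStore::default();
    let mut pk = [0u8; 33];
    pk[0] = 0xed;
    println!("{}", store.try_insert(RegistryEntry { pk, revision: 2, data: vec![1] })); // true
    println!("{}", store.try_insert(RegistryEntry { pk, revision: 1, data: vec![0] })); // false
}
```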

Because the data has a maximum size of 48 bytes, most types of data can't be stored directly in it. For this reason, registry entries usually contain a CID which then contains the data itself. The data bytes for registry entries which refer to a CID look like this:

0x5a  0x261fc4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95e148
type  link CID bytes
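A minimal sketch of building such data bytes and checking them against the 48-byte limit, using the example bytes above. The helper names are hypothetical, and the meaning of the 0x5a type byte beyond this example is not specified here.

```rust
const MAX_REGISTRY_DATA_LEN: usize = 48;

/// Registry data that links to a CID: one type byte followed by the CID bytes.
fn registry_link_data(type_byte: u8, cid: &[u8]) -> Result<Vec<u8>, String> {
    let mut data = Vec::with_capacity(1 + cid.len());
    data.push(type_byte);
    data.extend_from_slice(cid);
    if data.len() > MAX_REGISTRY_DATA_LEN {
        return Err(format!("{} bytes exceed the 48-byte registry limit", data.len()));
    }
    Ok(data)
}

/// Lowercase hex decoding, only for this example.
fn hex_decode(hex: &str) -> Vec<u8> {
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).expect("valid hex"))
        .collect()
}

fn main() {
    let cid = hex_decode("261fc4d27f80613c2dfdc4d9d013b43c181576e21cf9c2616295646df00db09fbd95e148");
    let data = registry_link_data(0x5a, &cid).expect("fits in 48 bytes");
    println!("{} of 48 bytes used, first byte {:#04x}", data.len(), data[0]);
}
```

The 36-byte CID plus the type byte uses 37 of the 48 available bytes.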

Subscriptions

Nodes can subscribe to specific entries on the peer-to-peer network to get new updates in realtime.

Peer-to-peer

S5 uses a peer-to-peer network to find providers who serve a specific file. Unlike IPFS, S5 does NOT transfer or exchange file data between peers directly. Instead, the p2p network is only used to find storage locations and download links for a specific file hash. This has some advantages:

Because only small queries for hashes (34 bytes) and responses (only a short download link, usually less than 256 bytes) are sent over the p2p network, it's extremely lightweight and very scalable.
Existing highly optimized HTTP-based software and infrastructure can be used for the file delivery itself, which reduces costs significantly, makes downloads more efficient and keeps peers lightweight.
Because S5 uses the HTTP/HTTPS protocol (support for more is planned), existing download links or file mirrors can be provided on S5 directly without needing to re-upload them, even if the one who provides them on the network is not the same one hosting them.

Peer discovery

Right now S5 uses a configurable list of initial peers with their connection strings (protocol, ip address, port) to connect to the network. After connecting to a new peer, peers send a list of all other peers they know about to the new peer.

Supported P2P Protocols

WebSocket (wss://)

Planned P2P Protocols

iroh QUIC Connections (https://iroh.computer/docs/layers/connections)

Node/peer IDs

Every node has a unique (random) ed25519 keypair. This keypair is used to sign specific responses like provide operations, which contain a specific storage location and download link for a queried hash. Because the message itself contains the signature, all peers can also relay queries and responses without being trusted to not tamper with them.

Node scores

Every node keeps a local score for every other node/peer it knows of. This score is calculated from the number of valid and useful responses by a node compared to the number of bad or invalid responses. The score also depends on the total number of responses: for example, a node with 1000 correct and 50 wrong responses has a better score than a node with 5 correct out of only 5 total responses.
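One way to get this "more evidence wins" behaviour is to rank nodes by a lower confidence bound on their share of good responses. The sketch below uses the Wilson score interval; this is a swapped-in, well-known technique chosen for illustration, not necessarily the formula in lib5's score.dart, and `node_score` is a hypothetical name.

```rust
/// Lower bound of the Wilson score interval (z = 1.96, about 95% confidence)
/// for the share of good responses. Illustrative stand-in, not lib5's formula.
fn node_score(good: u64, bad: u64) -> f64 {
    let n = (good + bad) as f64;
    if n == 0.0 {
        return 0.0; // no history yet
    }
    let z: f64 = 1.96;
    let z2 = z * z;
    let p = good as f64 / n;
    let center = p + z2 / (2.0 * n);
    let margin = z * ((p * (1.0 - p) + z2 / (4.0 * n)) / n).sqrt();
    (center - margin) / (1.0 + z2 / n)
}

fn main() {
    let busy = node_score(1000, 50);
    let new_node = node_score(5, 0);
    println!("1000 good / 50 bad: {:.3}", busy);
    println!("5 good / 0 bad:     {:.3}", new_node);
}
```

With this formula the node with 1000 good and 50 bad responses scores roughly 0.94, while the node with 5 out of 5 scores roughly 0.57, because five responses are too little evidence to rule out a much lower success rate.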

The algorithm can be found here: lib5:score.dart

Node scores are used to decide which download links to try first if multiple are available for the same file hash.

Specification

Blobs

As explained in /concepts/content-addressed-data.md, S5 uses content-addressing for all data and files, i.e. for any blob of bytes.

IPFS introduced the concept of Content Identifiers (CIDs) to provide a standardized and future-proof way to refer to content-addressed data. Unfortunately, "IPFS CIDs are not file hashes": IPFS splits files into many small chunks to make verified streaming of file slices possible without downloading the entire file first. As a result, these CIDs will never match the file's "true" hash, such as the output of sha256sum.

Fortunately, there has been some innovation in the space of cryptographic hash functions recently! Namely BLAKE3, which is based on the more well-known BLAKE2 hash function. Apart from being very fast and secure, its most distinctive feature is that its internal structure is already a Merkle tree. So instead of having to build a Merkle tree yourself (that's what IPFS does, CIDs point to the hash of a Merkle tree), BLAKE3 already takes care of that. As a result, CIDs using BLAKE3 are always consistent (for example with running b3sum on your local machine) and work with files of pretty much any size, while still supporting verified streaming (at any chunk size, down to 1024 bytes). So there's no longer a need to split files bigger than 1 MiB into multiple chunks.

You can also check out the documentation of Iroh, another content-addressed data system, which explains this in a more in-depth way: https://iroh.computer/docs/layers/blobs

Cool, but why yet another new CID format?

With bigger blobs and no extra metadata (the unaltered input bytes are always the source of a CID hash, so something like UnixFS is no longer used), there's a need to know the file size of a Blob CID. So S5 continues to use (and stays fully compatible with) BLAKE3 IPFS CIDs, with limited compatibility for other hash functions like sha256, when the blob size doesn't matter, but introduces a new CID format for use cases where it does.

Other protocols like the AT Protocol (used in Bluesky) solve this by using JSON maps for referencing blobs which contain both the IPFS CID and the blob size in an extra field. But I feel like there's value in having a compact format for representing an immutable sequence of bytes including its hash, so here we are.

IPFS CIDs can be easily converted to S5 Blob CIDs if you know their file/blob size in bytes. If the IPFS CID is using the "raw binary IPLD codec", this operation is lossless. S5 Blob CIDs can always be converted to IPFS CIDs, but if the blob is bigger than 1 MiB it likely won't work with most IPFS implementations. S5 Blob CIDs can be losslessly converted to Iroh-compatible CIDs and back (assuming you keep the blob size somewhere or do a BLAKE3 size proof using Iroh).
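The conversion between IPFS and S5 CIDs can be sketched for the simplest case: a binary CIDv1 with the raw codec (0x55) and a 32-byte BLAKE3 (0x1e) or sha2-256 (0x12) multihash. The function names are hypothetical, only single-byte varints are handled (enough for these codes), and multibase string decoding is left out.

```rust
/// Trims trailing zero bytes from the little-endian size, as in the S5 Blob CID format.
fn trimmed_le_size(size: u64) -> Vec<u8> {
    let mut bytes = size.to_le_bytes().to_vec();
    if let Some(pos) = bytes.iter().rposition(|&b| b != 0) {
        bytes.truncate(pos + 1);
    }
    bytes
}

/// Binary CIDv1 (version 0x01, raw codec 0x55, multihash code, length 0x20, digest)
/// plus the blob size -> S5 Blob CID bytes.
fn ipfs_raw_cid_to_s5(ipfs_cid: &[u8], blob_size: u64) -> Option<Vec<u8>> {
    match ipfs_cid {
        [0x01, 0x55, code @ (0x1e | 0x12), 0x20, digest @ ..] if digest.len() == 32 => {
            let mut out = vec![0x5b, 0x82, *code]; // S5 magic bytes + multihash indicator
            out.extend_from_slice(digest);
            out.extend_from_slice(&trimmed_le_size(blob_size));
            Some(out)
        }
        _ => None, // other versions, codecs (e.g. dag-pb) or hash lengths are not handled
    }
}

/// S5 Blob CID bytes -> binary raw CIDv1 (the size is dropped).
fn s5_to_ipfs_raw_cid(s5_cid: &[u8]) -> Option<Vec<u8>> {
    match s5_cid {
        [0x5b, 0x82, code @ (0x1e | 0x12), rest @ ..] if rest.len() > 32 => {
            let mut out = vec![0x01, 0x55, *code, 0x20];
            out.extend_from_slice(&rest[..32]);
            Some(out)
        }
        _ => None,
    }
}

fn main() {
    let digest = [0xabu8; 32]; // placeholder digest for illustration
    let mut ipfs = vec![0x01, 0x55, 0x1e, 0x20];
    ipfs.extend_from_slice(&digest);

    let s5 = ipfs_raw_cid_to_s5(&ipfs, 13).expect("raw BLAKE3 CIDv1");
    let back = s5_to_ipfs_raw_cid(&s5).expect("plaintext S5 Blob CID");
    println!("S5 CID is {} bytes, round trip ok: {}", s5.len(), back == ipfs);
}
```

The size is the only extra information needed in the IPFS-to-S5 direction, which is why the conversion is lossless only when you know it.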

The S5 Blob CID format

S5 Blob CIDs always start with two magic bytes.

The first one is 0x5b and indicates that the CID is an S5 blob CID.

The second one is 0x82 and indicates that it is a plaintext blob. 0x83 is reserved for encrypted blobs (the spec for them is still WIP).

Byte	Meaning
0x5b	S5 Blob CID magic byte
0x82	S5 Blob Type Plaintext (Unencrypted, just a simple blob)

As a nice side effect of picking exactly these two bytes, all S5 Blob CIDs start with the string "blob" when encoded as base32 (multibase). All S5 CID magic bytes are carefully picked to not collide with any existing magic bytes in the multicodec table (https://github.com/multiformats/multicodec).

After the two magic bytes, a single byte indicates which cryptographic hash function was used to derive a hash from the blob bytes. All S5 implementations should use 0x1e (for BLAKE3), but SHA256 is also supported for compatibility reasons. SHA256 should only be used for small blobs imported from other systems, like IPFS or the AT Protocol.

Byte	Meaning
0x1e	multihash blake3
0x12	multihash sha2-256

After the single multihash indicator byte, the 32 hash bytes follow. (S5 Blob CIDs always use the default hash output length, 32 bytes, for both blake3 and sha2-256. If the need for a different output length emerges in the future, a new possible value for the hash byte could be added)

Finally, the size (in bytes) of the blob is encoded as a little-endian byte array; trailing zero bytes are trimmed, and the remaining bytes are appended to the CID bytes. In Rust, this could look like the following (a full example of calculating a CID in Rust is at the bottom of this page):

fn main() {
    let blob_size: u64 = 100_000_000_000; // 100 GB (you would usually use .len() or similar)
    let mut cid_size_bytes = blob_size.to_le_bytes().to_vec();
    // trim trailing zero bytes (the most significant ones, since this is little-endian)
    if let Some(pos) = cid_size_bytes.iter().rposition(|&x| x != 0) {
        cid_size_bytes.truncate(pos + 1);
    }
    println!("{:x?}", cid_size_bytes); // prints [0, e8, 76, 48, 17]
}

Putting all of this together, this is what the S5 Blob CID of the string Hello, world! looks like in hex:

5b 82 1e ede5c0b10f2ec4979c69b52f61e42ff5b413519ce09be0f14d098dcfe5f6f98d 0d
PREFIX   BLAKE3 HASH (from b3sum)                                         SIZE
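The layout above can be reproduced with a short, std-only sketch. Computing BLAKE3 itself needs a crate (for example `blake3`), so the digest is copied from the b3sum output shown above; `s5_blob_cid` and the hex helpers are hypothetical names.

```rust
/// S5 Blob CID bytes: magic bytes, multihash indicator, 32-byte digest, trimmed LE size.
fn s5_blob_cid(hash_code: u8, digest: &[u8; 32], size: u64) -> Vec<u8> {
    let mut cid = vec![0x5b, 0x82, hash_code];
    cid.extend_from_slice(digest);
    let mut size_bytes = size.to_le_bytes().to_vec();
    if let Some(pos) = size_bytes.iter().rposition(|&b| b != 0) {
        size_bytes.truncate(pos + 1);
    }
    cid.extend_from_slice(&size_bytes);
    cid
}

fn hex_decode_32(hex: &str) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).expect("valid hex");
    }
    out
}

fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // BLAKE3 hash of "Hello, world!" as printed by b3sum (copied from the example above)
    let digest = hex_decode_32("ede5c0b10f2ec4979c69b52f61e42ff5b413519ce09be0f14d098dcfe5f6f98d");
    let cid = s5_blob_cid(0x1e, &digest, b"Hello, world!".len() as u64);
    println!("{}", hex_encode(&cid));
}
```

The 13-byte string produces a single size byte, 0x0d, at the end.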

So the length of an S5 Blob CID depends on the file size:

Files with a size of less than 256 bytes have a 36-byte CID
Files with a size of less than 64 KiB have a 37-byte CID
Files with a size of less than 16 MiB have a 38-byte CID
Files with a size of less than 4 GiB have a 39-byte CID
...
Files with a size of less than 16384 PiB have a 43-byte CID
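The list above follows directly from the layout: 35 fixed bytes plus however many bytes the trimmed size needs. A small sketch (the function name is hypothetical) that mirrors the trimming rule of the Rust snippet earlier on this page:

```rust
/// Byte length of an S5 Blob CID for a blob of `size` bytes:
/// 2 magic bytes + 1 multihash byte + 32 hash bytes + trimmed little-endian size.
fn s5_blob_cid_len(size: u64) -> usize {
    let size_len = if size == 0 {
        8 // the trimming snippet above keeps all eight bytes when every byte is zero
    } else {
        8 - size.leading_zeros() as usize / 8
    };
    3 + 32 + size_len
}

fn main() {
    for size in [255u64, 256, (1 << 16) - 1, 1 << 16, (1 << 32) - 1, u64::MAX] {
        println!("size {:>20} -> {}-byte CID", size, s5_blob_cid_len(size));
    }
}
```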

S5 Blob CIDs DO NOT contain a blob or file's media type, encoding or purpose. The reason for this is that it would no longer result in fully deterministic CIDs, because for example the media type could be interpreted differently by different applications or libraries.

Encoding the S5 Blob CID bytes to a human-readable string

S5 uses the multibase standard for encoding CIDs, just like IPFS, Iroh and the AT Protocol.

S5 implementations MUST support the following self-identifying base encodings:

character,  encoding,           description
f,          base16,             Hexadecimal (lowercase)
b,          base32,             RFC4648 case-insensitive - no padding
z,          base58btc,          Base58 Bitcoin
u,          base64url,          RFC4648 no padding

For the string Hello, world!, these would be the S5 Blob CIDs in different encodings:

base16:    f5b821eede5c0b10f2ec4979c69b52f61e42ff5b413519ce09be0f14d098dcfe5f6f98d0d
base32:    blobb53pfycyq6lwes6ogtnjpmhsc75nucnizzye34dyu2cmnz7s7n6mnbu
base58:    zhJTU2Mz5tATfj9rc5xorsXiadvYq3idS4CznEfW9Zg9zfksX2
base64url: uW4Ie7eXAsQ8uxJecabUvYeQv9bQTUZzgm-DxTQmNz-X2-Y0N
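Base16, base32 and base64url all map a fixed number of bits to one character, so one small std-only encoder covers three of the four required prefixes. This is a sketch with hypothetical function names; base58btc needs big-number division and is left out (`multibase` returns None for 'z').

```rust
const HEX: &[u8] = b"0123456789abcdef";
const BASE32: &[u8] = b"abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648, lowercase
const BASE64URL: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encodes `bytes` with `bits` bits per character (4, 5 or 6), without padding.
fn encode_bits(bytes: &[u8], bits: u32, alphabet: &[u8]) -> String {
    let mask = (1u32 << bits) - 1;
    let mut out = String::new();
    let mut buffer: u32 = 0; // only the lowest `pending` bits are still unconsumed
    let mut pending: u32 = 0;
    for &byte in bytes {
        buffer = (buffer << 8) | u32::from(byte);
        pending += 8;
        while pending >= bits {
            pending -= bits;
            out.push(alphabet[((buffer >> pending) & mask) as usize] as char);
        }
    }
    if pending > 0 {
        // leftover bits are padded with zeros on the right
        out.push(alphabet[((buffer << (bits - pending)) & mask) as usize] as char);
    }
    out
}

/// Multibase string for 'f' (base16), 'b' (base32) and 'u' (base64url).
fn multibase(prefix: char, bytes: &[u8]) -> Option<String> {
    let body = match prefix {
        'f' => encode_bits(bytes, 4, HEX),
        'b' => encode_bits(bytes, 5, BASE32),
        'u' => encode_bits(bytes, 6, BASE64URL),
        _ => return None, // base58btc ('z') is not covered by this sketch
    };
    Some(format!("{}{}", prefix, body))
}

fn main() {
    let hex = "5b821eede5c0b10f2ec4979c69b52f61e42ff5b413519ce09be0f14d098dcfe5f6f98d0d";
    let cid: Vec<u8> = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).unwrap())
        .collect();
    for prefix in ['f', 'b', 'u'] {
        println!("{}", multibase(prefix, &cid).unwrap());
    }
}
```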

Calculating the S5 Blob CID of any file using standard command line utils

Step 1: Calculate the BLAKE3 hash of your file (might need to install b3sum). You could also use sha256sum instead (and then put 0x12 as the hash prefix in step 3)

b3sum file.mp4

Step 2: Encode the size of your file as little-endian hex and trim trailing zero bytes (tac is from GNU coreutils)

wc -c < file.mp4 | xargs printf "%016x" | tac -rs .. | sed -E 's/(00)*$//'; echo

Step 3: Add the multibase prefix and magic bytes

Characters	Purpose
f	multibase prefix for Hexadecimal (lowercase)
5b	S5 Blob CID magic byte
82	S5 Blob Type Plaintext (Unencrypted, just a simple blob)
1e	multihash blake3

Now, put it all together (the zeros will be your hash and the 654321 suffix your file size):

f5b821e + BLAKE3_HASH + SIZE_BYTES = f5b821e0000000000000000000000000000000000000000000000000000000000000000654321

That's it, you can now use that CID to trustlessly stream exactly that file from the S5 Network!

Calculating an S5 Blob CID in Rust (using only top 100 crates)

use data_encoding::BASE32_NOPAD; // 2.5.0
use sha2::{Digest, Sha256}; // 0.10.8

fn main() {
    let blob = b"Hello, world!";
    
    let cid_prefix_bytes = vec![
        0x5b, // S5 Blob CID magic byte
        0x82, // S5 Blob Type Plaintext (Unencrypted, just a simple blob)
        0x12, // multihash sha2-256
    ];
    
    let sha256_hash_bytes = Sha256::digest(blob).to_vec();
    
    let blob_size = blob.len() as u64;
    let mut cid_size_bytes = blob_size.to_le_bytes().to_vec();
    if let Some(pos) = cid_size_bytes.iter().rposition(|&x| x != 0) {
        cid_size_bytes.truncate(pos + 1);
    }
    
    let cid_bytes = [cid_prefix_bytes, sha256_hash_bytes, cid_size_bytes].concat();
    
    println!("b{}", BASE32_NOPAD.encode(&cid_bytes).to_lowercase());
}

Registry

This specification is still a work-in-progress draft. If you spot any issues or have suggestions on how it could be improved, please create an issue here: https://github.com/s5-dev/docs/issues

The registry is a distributed key-value store on the S5 Network and makes mutable data structures possible by being a pointer to immutable blobs. At the moment, all keys are ed25519 public keys and each entry must be signed by the corresponding private key. All keys on the network are prefixed with the 0xed byte (indicating ed25519) to make future additions to the set of supported algorithms easy (especially quantum-safe ones). Each entry also has a 64-bit unsigned integer revision number. Nodes should drop any (valid) registry entries they receive if their revision number is lower than that of an entry for the same public key they already know about.

Registry entries should be stored by the nodes receiving them for as long as possible. To make this easier, registry entries can hold a maximum of 48 bytes of data. While small, this is more than enough to store a Blob CID or a simple hash reference, so it shouldn't limit possible use cases.

The code examples below are written in the Dart programming language. The final spec will use Rust ones.

The registry uses little-endian byte encoding for the u64 revision.

Registry Entry structure

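import 'dart:typed_data';

class SignedRegistryEntry {
  /// public key with multicodec prefix (0xed for ed25519)
  final Uint8List pk;

  /// revision number of this entry, maximum is (256^8)-1
  final int revision; // must be an unsigned 64-bit integer.

  /// data stored in this entry, can have a maximum length of 48 bytes
  final Uint8List data;

  /// signature of this signed registry entry
  final Uint8List signature;

  SignedRegistryEntry({
    required this.pk,
    required this.revision,
    required this.data,
    required this.signature,
  });

  // serialize() below is also a method of this class
}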

Relevant Constants

const recordTypeRegistryEntry = 0x07;

Serializing a registry entry (for wire transport)

Uint8List serialize() {
    return Uint8List.fromList([
        recordTypeRegistryEntry, // 0x07 (constant)
        ...pk,
        ...encodeEndian(revision, 8), // this uses little-endian encoding
        data.length, // a single byte for the data length
        ...data,
        ...signature,
    ]);
}
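The same wire layout, together with the bytes that signRegistryEntry signs, can be sketched in Rust using only the standard library. This is a hypothetical mirror of the Dart code, not a reference implementation: ed25519 signing and verification need a crypto library and are left out, and `deserialize` is an added helper for illustration.

```rust
const RECORD_TYPE_REGISTRY_ENTRY: u8 = 0x07;
const MAX_DATA_LEN: usize = 48;

/// Rust mirror of the Dart SignedRegistryEntry (hypothetical, for illustration).
struct SignedRegistryEntry {
    pk: [u8; 33],        // 0xed prefix + 32-byte ed25519 public key
    revision: u64,       // encoded little-endian
    data: Vec<u8>,       // at most 48 bytes
    signature: [u8; 64], // ed25519 signature over signed_message()
}

impl SignedRegistryEntry {
    /// 0x07 | pk (33) | revision (8, LE) | data length (1) | data | signature (64)
    fn serialize(&self) -> Vec<u8> {
        assert!(self.data.len() <= MAX_DATA_LEN, "registry data is limited to 48 bytes");
        let mut out = Vec::with_capacity(1 + 33 + 8 + 1 + self.data.len() + 64);
        out.push(RECORD_TYPE_REGISTRY_ENTRY);
        out.extend_from_slice(&self.pk);
        out.extend_from_slice(&self.revision.to_le_bytes());
        out.push(self.data.len() as u8);
        out.extend_from_slice(&self.data);
        out.extend_from_slice(&self.signature);
        out
    }

    /// Bytes covered by the signature: 0x07 | revision (8, LE) | data length | data.
    fn signed_message(&self) -> Vec<u8> {
        let mut msg = vec![RECORD_TYPE_REGISTRY_ENTRY];
        msg.extend_from_slice(&self.revision.to_le_bytes());
        msg.push(self.data.len() as u8);
        msg.extend_from_slice(&self.data);
        msg
    }

    /// Inverse of serialize(); returns None for malformed input.
    fn deserialize(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 1 + 33 + 8 + 1 + 64 || bytes[0] != RECORD_TYPE_REGISTRY_ENTRY {
            return None;
        }
        let data_len = bytes[42] as usize;
        if data_len > MAX_DATA_LEN || bytes.len() != 43 + data_len + 64 {
            return None;
        }
        let mut pk = [0u8; 33];
        pk.copy_from_slice(&bytes[1..34]);
        let mut revision = [0u8; 8];
        revision.copy_from_slice(&bytes[34..42]);
        let mut signature = [0u8; 64];
        signature.copy_from_slice(&bytes[43 + data_len..]);
        Some(SignedRegistryEntry {
            pk,
            revision: u64::from_le_bytes(revision),
            data: bytes[43..43 + data_len].to_vec(),
            signature,
        })
    }
}

fn main() {
    let mut pk = [0u8; 33];
    pk[0] = 0xed;
    let entry = SignedRegistryEntry { pk, revision: 258, data: vec![0x5a, 1, 2], signature: [0u8; 64] };
    let wire = entry.serialize();
    let parsed = SignedRegistryEntry::deserialize(&wire).expect("well-formed entry");
    println!("{} bytes on the wire, revision {}", wire.len(), parsed.revision);
    println!("signed message: {:02x?}", entry.signed_message());
}
```

Note that the public key is not part of the signed message; it is bound to the entry because verification uses it as the ed25519 key.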

Signing a registry entry

Future<SignedRegistryEntry> signRegistryEntry({
  required KeyPairEd25519 kp,
  required Uint8List data,
  required int revision,
  required CryptoImplementation crypto,
}) async {
  final list = Uint8List.fromList([
    recordTypeRegistryEntry, // 0x07 (constant)
    ...encodeEndian(revision, 8),
    data.length, // 1 byte
    ...data,
  ]);

  final signature = await crypto.signEd25519(
    keyPair: kp, // parameter name as declared in CryptoImplementation.signEd25519
    message: list,
  );

  return SignedRegistryEntry(
    pk: kp.publicKey,
    revision: revision,
    data: data,
    signature: Uint8List.fromList(signature),
  );
}

Verifying a registry entry

Future<bool> verifyRegistryEntry(
  SignedRegistryEntry sre, {
  required CryptoImplementation crypto,
}) {
  final list = Uint8List.fromList([
    recordTypeRegistryEntry, // 0x07 (constant)
    ...encodeEndian(sre.revision, 8),
    sre.data.length, // 1 byte
    ...sre.data,
  ]);
  return crypto.verifyEd25519(
    publicKey: sre.pk.sublist(1), // drop the 0xed prefix byte
    message: list,
    signature: sre.signature,
  );
}

Streams

This specification is still a work-in-progress draft. If you spot any issues or have suggestions on how it could be improved, please create an issue here: https://github.com/s5-dev/docs/issues

S5 Streams are nearly identical to the registry (see registry.md), the main difference being that S5 Nodes are expected to store old stream messages, not just the latest one as with the registry. The serialized data structure of stream messages is identical to that of registry entries, except that the constant prefix byte is 0x08 instead of 0x07 (used by the registry) and the u64 revision number uses big-endian encoding.

In addition to storing previous stream messages, it's also possible to query them using range requests. For example, "Please send me all stream messages with this seq/ts or greater, including any messages you'll receive in the future".

The revision number of a stream message is a u64 and is usually composed of a u32 unix timestamp and a u32 sequence number. This is however application-specific, so you can also use a slightly bigger timestamp and some random bytes to build the revision number if you prefer. S5 Nodes only care about the complete u64 revision number for ordering and range requests.
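A minimal sketch of the timestamp/sequence layout and of a node answering a range request. `stream_revision` and `revisions_since` are hypothetical names, and keeping messages in a `BTreeMap` ordered by revision is an assumption for illustration. It also shows one property of big-endian encoding worth knowing (not a stated rationale of the spec): the encoded bytes sort in the same order as the numbers, which little-endian bytes do not.

```rust
use std::collections::BTreeMap;

/// One possible revision layout from the text: u32 unix timestamp in the upper
/// 32 bits, u32 sequence number in the lower 32 bits.
fn stream_revision(unix_ts: u32, seq: u32) -> u64 {
    (u64::from(unix_ts) << 32) | u64::from(seq)
}

/// Hypothetical node-side range query: all stored messages with revision >= `from`.
fn revisions_since(store: &BTreeMap<u64, Vec<u8>>, from: u64) -> Vec<u64> {
    store.range(from..).map(|(rev, _msg)| *rev).collect()
}

fn main() {
    let mut store = BTreeMap::new();
    for (ts, seq) in [(1_700_000_000, 0), (1_700_000_000, 1), (1_700_000_060, 0)] {
        store.insert(stream_revision(ts, seq), b"message".to_vec());
    }
    let from = stream_revision(1_700_000_000, 1);
    println!("{:?}", revisions_since(&store, from));

    // Big-endian bytes compare like the numbers; little-endian bytes do not (256 vs. 1).
    let (a, b) = (stream_revision(1_700_000_000, 1), stream_revision(1_700_000_060, 0));
    println!("{}", a.to_be_bytes() < b.to_be_bytes());
    println!("{}", 256u64.to_le_bytes() < 1u64.to_le_bytes());
}
```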

When serialized, stream messages can also contain the blob bytes referenced by the hash in the stream message to optimize lookup latency for small data packets.

Accounts on Storage Nodes

Many use cases of S5 require storing more data than can fit on a user's local device, or users might simply want to use a hosted provider to store their data.

Fortunately, S5 makes this mostly trustless by using integrity verification (blake3 hashes) for all file uploads and downloads, so your storage provider can do very little to tamper with your files.

That said, you can have a higher risk of losing access to your data when using an external storage provider. They could for example simply delete your files, revoke your access, be hacked or fall victim to a datacenter fire. Even when using an S5 Node on your local NAS as a storage provider, some of these risks still apply.

So to prevent losing access to your data in these cases, S5 tries to make it as easy as possible to keep your data mirrored on multiple independent storage providers. One part of this is using content-addressing and a decentralized network, which makes it possible for you to quickly pin files on additional storage services without needing to re-upload them (because the storage providers can communicate with each other directly via the S5 Network).

S5 provides an anonymous accounts system, which makes it easy to manage and authenticate on many different storage providers (S5 Nodes) at once with minimal metadata. A storage provider only gets a randomly generated pubkey from you when logging in, which is different for every provider you sign up to. Some providers might require additional data (like an email address) to sign up, but that's not part of the S5 protocols.

Account Tiers

This part is not written yet.

Relevant HTTP APIs

See https://github.com/s5-dev/S5/blob/main/lib/service/accounts.dart for the current reference implementation, more detailed API documentation will be added to this spec in the future.

`/s5/account/register`

This endpoint (first GET for the challenge, then POST with signed payload to register) can be used to register a new account using a specific ed25519 public key. How that key is generated is up to the implementation and use case, but for apps using the S5 Identity system it's described in identity.md.

`/s5/account/login`

This endpoint (first GET for the challenge, then POST with signed payload to login) can be used to generate a new authentication token for a specific account. The token can then be used to upload blobs, list pinned blobs, pin new blobs, read stats and delete blobs.

Authentication

Calls to the S5 APIs below can be authenticated either by passing the token in the header `Authorization: Bearer AUTH_TOKEN_HERE` or, for simplicity, as the query parameter `?auth_token=AUTH_TOKEN_HERE`.

`/s5/account/stats`

This endpoint requires authentication in the form of a token. It returns account details and statistics related to storage usage.

Upload

S5 Nodes provide two HTTP-based APIs for uploading files, depending on the file size and whether you need resumable uploads.

The simple one for small files

Small files can be uploaded with a single HTTP POST request, using the default file upload form field:

curl -X POST "https://S5_NODE_URL/s5/upload" -F "file=@example.txt"

Please note that S5 Nodes reachable on the Internet (i.e. not on localhost or your local network) usually require authentication for uploading files; see accounts.md for details.

curl -X POST "https://S5_NODE_URL/s5/upload?auth_token=AUTH_TOKEN_HERE" -F "file=@example.txt"

TUS for larger files

For larger (resumable) file uploads, S5 uses the https://tus.io/ protocol.

Before starting an upload, you need to calculate the blake3 hash of the file locally. When creating the tus upload using the initial POST request, you need to pass the hash as metadata. With most tus libraries, you would pass the hash like this:

{
    "hash": "BASE64URL_ENCODE(0x1e + HASH_HERE)"
}

It's available on the /s5/upload/tus endpoint and also usually requires authentication.
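A std-only sketch of computing the `hash` metadata value: 0x1e (multihash blake3) followed by the 32-byte BLAKE3 hash, encoded as base64url. The function names are hypothetical, and the BLAKE3 hash itself would come from a BLAKE3 library or b3sum. Since 33 bytes encode to exactly 44 characters, padding does not matter here. The tus protocol additionally base64-encodes every `Upload-Metadata` value on the wire, which tus client libraries usually do for you.

```rust
const BASE64URL: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// RFC 4648 base64url without padding.
fn base64url_nopad(bytes: &[u8]) -> String {
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        // 1 input byte -> 2 chars, 2 bytes -> 3 chars, 3 bytes -> 4 chars
        for i in 0..chunk.len() + 1 {
            out.push(BASE64URL[((n >> (18 - 6 * i)) & 0x3f) as usize] as char);
        }
    }
    out
}

/// Value for the tus "hash" metadata field: 0x1e + 32-byte BLAKE3 hash, base64url-encoded.
fn tus_hash_metadata(blake3_hash: &[u8; 32]) -> String {
    let mut prefixed = Vec::with_capacity(33);
    prefixed.push(0x1e);
    prefixed.extend_from_slice(blake3_hash);
    base64url_nopad(&prefixed)
}

fn main() {
    // BLAKE3 hash of "Hello, world!" (from the Blobs section); compute your own with b3sum
    let hex = "ede5c0b10f2ec4979c69b52f61e42ff5b413519ce09be0f14d098dcfe5f6f98d";
    let mut hash = [0u8; 32];
    for (i, byte) in hash.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).unwrap();
    }
    println!("{{\"hash\": \"{}\"}}", tus_hash_metadata(&hash));
}
```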

API Interface

This specification is still a work-in-progress draft. If you spot any issues or have suggestions on how it could be improved, please create an issue here: https://github.com/s5-dev/docs/issues

S5 contains a lot of features. To keep the system modular, and to make it easier to implement only some parts of the spec in a new programming language or environment, most systems use two standard API interfaces. If you implement these, you can pass them directly to components like the FS.

S5APIProvider

This is the base API that must be supported by all implementations. Some of them might add additional methods for extra features or convenience.

TODO Add purpose/context field to upload methods.

typedef OpenReadFunction = Stream<List<int>> Function([int? start, int? end]);

abstract class S5APIProvider {
  /// Blocks until the S5 API is initialized and ready to be used
  Future<void> ensureInitialized();

  /// Upload a small blob of bytes
  ///
  /// Returns the Raw CID of the uploaded raw file blob
  ///
  /// Max size is 10 MiB, use [uploadBlobWithStream] for larger files
  Future<BlobCID> uploadBlob(Uint8List data);

  /// Upload a raw file
  ///
  /// Returns the Raw CID of the uploaded raw file blob
  ///
  /// Does not have a file size limit and can handle large files efficiently
  Future<BlobCID> uploadBlobWithStream({
    required Multihash hash,
    required int size,
    required OpenReadFunction openRead,
  });

  /// Downloads a full blob into memory; only use this for blobs smaller than 1 MB
  Future<Uint8List> downloadBlob(Multihash hash, {Route? route});

  /// Downloads a slice of a blob to memory, from `start` (inclusive) to `end` (exclusive)
  Future<Uint8List> downloadBlobSlice(
    Multihash hash, {
    required int start,
    required int end,
    Route? route,
  });

  Future<void> pinHash(Multihash hash);

  Future<void> unpinHash(Multihash hash);

  Future<SignedRegistryEntry?> registryGet(
    Uint8List pk, {
    Route? route,
  });
  Stream<SignedRegistryEntry> registryListen(
    Uint8List pk, {
    Route? route,
  });
  Future<void> registrySet(
    SignedRegistryEntry sre, {
    Route? route,
  });

  Stream<SignedStreamMessage> streamSubscribe(
    Uint8List pk, {
    int? afterTimestamp,
    int? beforeTimestamp,
    Route? route,
  });
  Future<void> streamPublish(
    SignedStreamMessage msg, {
    Route? route,
  });

  CryptoImplementation get crypto;
}

The Route? route argument is currently unused.

CryptoImplementation

abstract class CryptoImplementation {
  Uint8List generateSecureRandomBytes(int length);

  Future<Uint8List> hashBlake3(Uint8List input);

  Uint8List hashBlake3Sync(Uint8List input);

  Future<Uint8List> hashBlake3File({
    required int size,
    required OpenReadFunction openRead,
  });

  Future<bool> verifyEd25519({
    required Uint8List publicKey,
    required Uint8List message,
    required Uint8List signature,
  });

  Future<Uint8List> signEd25519({
    required KeyPairEd25519 keyPair,
    required Uint8List message,
  });

  Future<KeyPairEd25519> newKeyPairEd25519({
    required Uint8List seed,
  });

  Future<Uint8List> encryptXChaCha20Poly1305({
    required Uint8List key,
    required Uint8List nonce,
    required Uint8List plaintext,
  });

  Future<Uint8List> decryptXChaCha20Poly1305({
    required Uint8List key,
    required Uint8List nonce,
    required Uint8List ciphertext,
  });
}

Identity System

This specification is still a work-in-progress draft. If you spot any issues or have suggestions on how it could be improved, please create an issue here: https://github.com/s5-dev/docs/issues

There are two types of identity: private and public. Currently, S5 only supports private identities. This means that at the moment, the S5 root identity (stored in the seed phrase) is only used to derive private secrets, like the ones managing access and encryption for the private file system. While you can share directories from your private file system with others, or even create a public file system, S5 does not manage a single public user identity key that links to the entry points of these or stores other public data.

Instead, for all use cases which require public identities (for example publishing your favorite files in a public file system under your name), S5 will use the AT Protocol (built by Bluesky). The AT Protocol uses DIDs for public identity identifiers, and S5 will likely store private key material used for your public AT Protocol identities in the future, making it very easy to sign in to both (S5 and ATProto) at once using only your seed phrase. Your public AT Protocol identity (which might also be used as your social media presence) will then support S5-specific features like posting files or directories hosted on S5.

Regarding authentication, it's also important to note that almost all operations which only involve reading data (like listing directories or downloading a file) don't require an identity on S5.

Seed Phrases

S5 uses a custom algorithm for seed phrases:

The wordlist consists of 1024 unique words: https://github.com/s5-dev/lib5/blob/main/lib/src/seed/wordlist.dart

Every word on the list has a unique 3-letter prefix - this makes it possible to change a word in your seed phrase to whatever you want, as long as the first 3 characters stay the same.

So every word contains 10 bits of entropy, which means that we need 13 words for our 128 bits of entropy. Finally, there are two extra words containing a checksum (blake3), which should make it easier to recover your identity if you lost some words of your seed phrase. As a result, S5 seed phrases are always 15 words long.
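The word-count arithmetic can be checked with a short sketch. The constants come from the text; the bit layout in `entropy_to_word_indices` (most significant bit first, last word zero-padded) is a HYPOTHETICAL choice for illustration only, since the actual algorithm in lib5's seed.dart is not specified here and may order or pad the bits differently. Checksum words are not computed.

```rust
const WORDLIST_SIZE: u32 = 1024; // words in the S5 wordlist
const ENTROPY_BITS: u32 = 128;
const CHECKSUM_WORDS: u32 = 2;

/// log2(1024) = 10 bits of entropy per word.
fn bits_per_word() -> u32 {
    WORDLIST_SIZE.trailing_zeros()
}

/// Words needed to carry the entropy: ceil(128 / 10) = 13.
fn entropy_words() -> u32 {
    (ENTROPY_BITS + bits_per_word() - 1) / bits_per_word()
}

/// HYPOTHETICAL bit layout: most significant bit first, last word zero-padded.
fn entropy_to_word_indices(entropy: &[u8; 16]) -> Vec<u16> {
    let bits = bits_per_word();
    let mask = (1u32 << bits) - 1;
    let mut indices = Vec::new();
    let (mut buffer, mut pending) = (0u32, 0u32);
    for &byte in entropy {
        buffer = (buffer << 8) | u32::from(byte);
        pending += 8;
        while pending >= bits {
            pending -= bits;
            indices.push(((buffer >> pending) & mask) as u16);
        }
    }
    if pending > 0 {
        indices.push(((buffer << (bits - pending)) & mask) as u16);
    }
    indices
}

fn main() {
    let total = entropy_words() + CHECKSUM_WORDS;
    println!("{} bits per word, {} entropy words, {} words in total", bits_per_word(), entropy_words(), total);
    println!("word indices: {:?}", entropy_to_word_indices(&[0xa5; 16]));
}
```

Thirteen words carry 130 bits, so 2 bits of the last entropy word are unused before the two checksum words follow.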

The current implementation of seed generation and validation is available at https://github.com/s5-dev/lib5/blob/main/lib/src/seed/seed.dart, the algorithm will be added to this spec page in the future (still subject to small changes).

Derivation

After the root identity secret is available (from the seed phrase), it's used to derive a number of keys. S5 applications are expected to only (securely) store the derived keys they actually need for their features (like the file system or accounts) and drop all other keys, including the root identity secret.

The derivation algorithm is described in key-derivation.md, and the derivation paths for different use cases are implemented in https://github.com/s5-dev/lib5/blob/main/lib/src/identity/identity.dart. The paths will be added to this spec once the final decision on which ones are actually needed (and which ones can be removed due to no active use) is made.

Key Derivation

S5 uses a very simple and fast (but secure) key derivation function for identity, accounts and the file system (only scoped to a single directory).

The base key is always random; in the file system it's randomly generated for every newly created directory, for accounts it's derived in multiple steps from the identity root secret (which is randomly generated).

fn derive_hash(base: &[u8; 32], tweak: &[u8; 32]) -> [u8; 32] {
    let mut hasher = blake3::Hasher::new();
    hasher.update(base);
    hasher.update(tweak);
    *hasher.finalize().as_bytes()
}

fn derive_hash_string(base: &[u8; 32], tweak: &[u8]) -> [u8; 32] {
    derive_hash(base, blake3::hash(tweak).as_bytes())
}

fn derive_hash_int(base: &[u8; 32], tweak: u16) -> [u8; 32] {
    let mut tweak_bytes = [0u8; 32];
    tweak_bytes[..2].copy_from_slice(&tweak.to_le_bytes());
    derive_hash(base, &tweak_bytes)
}

Encryption

This specification is still a work-in-progress draft. If you spot any issues or have suggestions on how it could be improved, please create an issue here: https://github.com/s5-dev/docs/issues

S5 supports different types of encryption, used in the file system (file-system.md) and other parts of the spec.

It ensures secure end-to-end encryption when users need it, for example in their private file system or when sending messages over a stream.

Supported algorithms

ID	Cipher
2	AES-GCM (AES-256-GCM)
4	XChaCha20-Poly1305

S5 implementations must support both AES-GCM and XChaCha20-Poly1305 for immutable and mutable encrypted blobs.

Immutable Encrypted Blobs

Immutable Encrypted Blobs are used for file versions in the S5 file-system.md.

They have some parameters:

cipher: The cipher used by the blob
chunk size: The chunk size used for encrypting the blob (must be a power of 2 and larger than 1024 bytes). By default, S5 uses 256 KiB chunks.
encrypted blob hash: The BLAKE3 hash of the encrypted blob
key: The encryption key used for the blob
padding: How much padding is added to the end of the blob (before encryption).
plaintext blob hash: The BLAKE3 hash of the blob, before encryption and padding
plaintext blob size: The size of the blob, before encryption and padding

The implementation itself is pretty simple: Every chunk is encrypted with the key, using the chunk index in the blob as a nonce (little-endian encoded). This is secure, because a new encryption key is randomly generated for every blob (and thus file version).
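As a rough illustration, the following Rust sketch computes the chunk boundaries and per-chunk nonces of an immutable encrypted blob without performing any encryption. It assumes that the little-endian chunk index is zero-extended to the cipher's full nonce length (24 bytes for XChaCha20-Poly1305, 12 bytes for AES-256-GCM) and that chunks are cut from the padded plaintext. Both are assumptions, not confirmed details of the spec, and all function names are hypothetical.

```rust
use std::ops::Range;

/// Nonce lengths of the two supported ciphers (IDs 2 and 4 in the table above).
const AES_256_GCM_NONCE_LEN: usize = 12;
const XCHACHA20_POLY1305_NONCE_LEN: usize = 24;

/// Hypothetical helper: checks the chunk size rule from the parameter list
/// (a power of 2 and larger than 1024 bytes).
fn is_valid_chunk_size(chunk_size: u64) -> bool {
    chunk_size.is_power_of_two() && chunk_size > 1024
}

/// Hypothetical helper: builds the nonce for the chunk at `index`.
/// ASSUMPTION: the little-endian index is zero-extended to the full nonce length.
fn chunk_nonce(index: u64, nonce_len: usize) -> Vec<u8> {
    let mut nonce = vec![0u8; nonce_len];
    nonce[..8].copy_from_slice(&index.to_le_bytes());
    nonce
}

/// Hypothetical helper: byte ranges of the chunks of a padded plaintext
/// (plaintext blob size + padding). In this sketch the last chunk may be shorter.
fn chunk_ranges(padded_size: u64, chunk_size: u64) -> Vec<Range<u64>> {
    (0..padded_size)
        .step_by(chunk_size as usize)
        .map(|start| start..(start + chunk_size).min(padded_size))
        .collect()
}
```

Because every blob gets a fresh random key, reusing small counter values as nonces across different blobs never repeats a (key, nonce) pair, which is the property AEAD ciphers like AES-GCM and XChaCha20-Poly1305 require.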

There's no spec (or need) for a way to encode immutable blobs as CIDs yet, because they only appear in the S5 file system file version objects, which are specifically designed to support the metadata parameters needed for encrypted blobs.

Mutable Encryption

Mutable Encryption can be used when a key needs to be re-used for multiple revisions of a metadata file (like a directory) or messages (in a stream).

Payloads are currently limited to 1 MiB in size and padding is used by default to obfuscate the true data size.

XChaCha20-Poly1305 is the default cipher for mutable encrypted blobs.

Directory CIDs with an encryption key

If you share a directory with someone using a CID, it usually consists of the bytes 5d ed 32_BYTE_PUBKEY. This indicates that it points to an S5 Directory metadata file (0x5d, see file-system.md) and that it's mutable, using an ed25519 pubkey (0xed) that points to a registry entry.

But if the directory is encrypted, you also need a key. In that case, a CID is encoded like this:

5d 5e e4 32_BYTE_ENCRYPTION_KEY ed 32_BYTE_PUBKEY

0x5d   S5 Directory CID magic byte
0x5e   Directory uses mutable encryption
0xe4   Mutable encryption uses the XChaCha20-Poly1305 cipher (see "Supported algorithms")
       In the case of AES-256-GCM, it would instead be 0xe2

32_BYTE_ENCRYPTION_KEY   This is the 32-byte key needed to decrypt the directory metadata with XChaCha20-Poly1305

ed 32_BYTE_PUBKEY   The ed25519 pubkey pointing to the registry entry is the same as with non-encrypted directories.
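The byte layout above can be assembled and taken apart with a few lines of Rust. This is a minimal sketch working on raw CID bytes before multibase encoding. The byte values are the ones listed above, while the function names are hypothetical and the parser only accepts the two cipher bytes mentioned (0xe4 and 0xe2).

```rust
/// Byte values from the encrypted directory CID layout above.
const DIRECTORY_MAGIC: u8 = 0x5d;
const MUTABLE_ENCRYPTION: u8 = 0x5e;
const CIPHER_XCHACHA20_POLY1305: u8 = 0xe4;
const CIPHER_AES_256_GCM: u8 = 0xe2;
const ED25519_PUBKEY: u8 = 0xed;

/// Hypothetical helper: raw bytes of an encrypted directory CID (68 bytes).
fn encrypted_directory_cid(cipher: u8, key: &[u8; 32], pubkey: &[u8; 32]) -> Vec<u8> {
    let mut cid = Vec::with_capacity(3 + 32 + 1 + 32);
    cid.extend_from_slice(&[DIRECTORY_MAGIC, MUTABLE_ENCRYPTION, cipher]);
    cid.extend_from_slice(key);
    cid.push(ED25519_PUBKEY);
    cid.extend_from_slice(pubkey);
    cid
}

/// Hypothetical helper: the inverse, returning (cipher byte, key, pubkey).
fn parse_encrypted_directory_cid(cid: &[u8]) -> Option<(u8, [u8; 32], [u8; 32])> {
    if cid.len() != 68
        || cid[0] != DIRECTORY_MAGIC
        || cid[1] != MUTABLE_ENCRYPTION
        || cid[35] != ED25519_PUBKEY
    {
        return None;
    }
    if cid[2] != CIPHER_XCHACHA20_POLY1305 && cid[2] != CIPHER_AES_256_GCM {
        return None;
    }
    let mut key = [0u8; 32];
    key.copy_from_slice(&cid[3..35]);
    let mut pubkey = [0u8; 32];
    pubkey.copy_from_slice(&cid[36..68]);
    Some((cid[2], key, pubkey))
}
```

Note that the last 33 bytes (0xed followed by the pubkey) are identical to the tail of a non-encrypted mutable directory CID, so a client can reuse the same registry lookup once the key has been split off.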

Encryption (in Dart)

TODO: Does it make sense to use a magic byte prefix for encrypted mutable files, or would that be a risk?

const encryptionNonceLength = 24;
const encryptionOverheadLength = 16;

Future<Uint8List> encryptMutableBytes(
  Uint8List data,
  Uint8List key, {
  required CryptoImplementation crypto,
}) async {
  final lengthInBytes = encodeEndian(data.length, 4); // 4 bytes

  final totalOverhead =
      encryptionOverheadLength + lengthInBytes.length + encryptionNonceLength;

  final finalSize =
      padFileSizeDefault(data.length + totalOverhead) - totalOverhead;

  // Prepend the data size and append the padding bytes
  data = Uint8List.fromList(
    lengthInBytes + data + Uint8List(finalSize - data.length),
  );

  // Generate a random nonce.
  final nonce = crypto.generateRandomBytes(encryptionNonceLength);

  // Encrypt the data.
  final encryptedBytes = await crypto.encryptXChaCha20Poly1305(
    key: key,
    plaintext: data,
    nonce: nonce,
  );

  // Prepend the nonce to the final data.
  return Uint8List.fromList(nonce + encryptedBytes);
}

Decryption (in Dart)

const encryptionKeyLength = 32;

Future<Uint8List> decryptMutableBytes(
  Uint8List data,
  Uint8List key, {
  required CryptoImplementation crypto,
}) async {
  if (key.length != encryptionKeyLength) {
    throw 'wrong encryptionKeyLength (${key.length} != $encryptionKeyLength)';
  }

  // Validate that the size of the data corresponds to a padded block.
  if (!checkPaddedBlock(data.length)) {
    throw "Expected parameter 'data' to be padded encrypted data, length was '${data.length}', nearest padded block is '${padFileSizeDefault(data.length)}'";
  }

  // Extract the nonce.
  final nonce = data.sublist(0, encryptionNonceLength);

  final decryptedBytes = await crypto.decryptXChaCha20Poly1305(
    key: key,
    nonce: nonce,
    ciphertext: data.sublist(encryptionNonceLength),
  );

  final lengthBytes = decryptedBytes.sublist(0, 4);
  final length = decodeEndian(lengthBytes);

  return decryptedBytes.sublist(4, length + 4);
}
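The framing that encryptMutableBytes and decryptMutableBytes agree on can be illustrated without any cryptography. The Rust sketch below only builds and strips the plaintext that would be passed to the cipher. It assumes that lib5's encodeEndian/decodeEndian use little-endian byte order, and it takes the target blob size as a parameter instead of calling a padding function. The helper names are hypothetical.

```rust
/// Sizes used by encryptMutableBytes/decryptMutableBytes above.
const NONCE_LEN: usize = 24; // encryptionNonceLength
const TAG_LEN: usize = 16; // encryptionOverheadLength (authentication tag)
const LEN_PREFIX: usize = 4; // encoded data length

/// Hypothetical helper: builds the plaintext that gets encrypted
/// (4-byte length prefix, data, zero padding).
/// `padded_total` is the size of the final nonce + ciphertext, which the Dart
/// code obtains as padFileSizeDefault(data.length + 44).
/// ASSUMPTION: the length prefix is little-endian.
fn frame_plaintext(data: &[u8], padded_total: usize) -> Vec<u8> {
    assert!(
        padded_total >= NONCE_LEN + TAG_LEN + LEN_PREFIX + data.len(),
        "padded_total is too small for this payload"
    );
    let framed_len = padded_total - NONCE_LEN - TAG_LEN;
    let mut out = Vec::with_capacity(framed_len);
    out.extend_from_slice(&(data.len() as u32).to_le_bytes());
    out.extend_from_slice(data);
    out.resize(framed_len, 0); // zero padding, like Uint8List(finalSize - data.length)
    out
}

/// Hypothetical helper: reverses frame_plaintext after decryption.
fn unframe_plaintext(plaintext: &[u8]) -> Option<&[u8]> {
    let p = plaintext.get(..LEN_PREFIX)?;
    let len = u32::from_le_bytes([p[0], p[1], p[2], p[3]]) as usize;
    plaintext.get(LEN_PREFIX..LEN_PREFIX + len)
}
```

For a 5-byte payload, padFileSizeDefault(5 + 44) is 4096, so the framed plaintext is 4056 bytes and the final nonce + ciphertext is exactly 4096 bytes, which is what checkPaddedBlock expects on the decryption side.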

Padding

S5 uses "pad blocks" for padding, with the algorithm taken from the Sia Skynet project.

/// MIT License
/// Copyright (c) 2020 Nebulous

/// To prevent analysis that can occur by looking at the sizes of files, all
/// encrypted files will be padded to the nearest "pad block" (after encryption).
/// A pad block is minimally 4 kib in size, is always a power of 2, and is always
/// at least 5% of the size of the file.
///
/// For example, a 1 kib encrypted file would be padded to 4 kib, a 5 kib file
/// would be padded to 8 kib, and a 105 kib file would be padded to 112 kib.
/// Below is a short table of valid file sizes:
///
/// ```
///   4 KiB      8 KiB     12 KiB     16 KiB     20 KiB
///  24 KiB     28 KiB     32 KiB     36 KiB     40 KiB
///  44 KiB     48 KiB     52 KiB     56 KiB     60 KiB
///  64 KiB     68 KiB     72 KiB     76 KiB     80 KiB
///
///  88 KiB     96 KiB    104 KiB    112 KiB    120 KiB
/// 128 KiB    136 KiB    144 KiB    152 KiB    160 KiB
///
/// 176 KiB    192 KiB    208 KiB    224 KiB    240 KiB
/// 256 KiB    272 KiB    288 KiB    304 KiB    320 KiB
///
/// 352 KiB    ... etc
/// ```
///
/// Note that the first 20 valid sizes are all a multiple of 4 KiB, the next 10
/// are a multiple of 8 KiB, and each 10 after that the multiple doubles. We use
/// this method of padding files to prevent an adversary from guessing the
/// contents or structure of the file based on its size.
///
/// @param initialSize - The size of the file.
/// @returns - The final size, padded to a pad block.

int padFileSizeDefault(int initialSize) {
  final kib = 1 << 10;
  // Only iterate to 53 (the maximum safe power of 2).
  for (var n = 0; n < 53; n++) {
    if (initialSize <= (1 << n) * 80 * kib) {
      final paddingBlock = (1 << n) * 4 * kib;
      var finalSize = initialSize;
      if (finalSize % paddingBlock != 0) {
        finalSize = initialSize - (initialSize % paddingBlock) + paddingBlock;
      }
      return finalSize;
    }
  }
  // Prevent overflow.
  throw "Could not pad file size, overflow detected.";
}

bool checkPaddedBlock(int size) {
  final kib = 1024;
  // Only iterate to 53 (the maximum safe power of 2).
  for (int n = 0; n < 53; n++) {
    if (size <= (1 << n) * 80 * kib) {
      final paddingBlock = (1 << n) * 4 * kib;
      return size % paddingBlock == 0;
    }
  }
  throw "Could not check padded file size, overflow detected.";
}
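For implementations outside Dart, here is a minimal Rust port of the two padding functions above, offered as a sketch rather than a reference implementation. It uses u64 sizes and checked multiplication, and returns None where the Dart version throws on overflow.

```rust
const KIB: u64 = 1 << 10;

/// Sketch of padFileSizeDefault: rounds `initial_size` up to the next pad block.
/// Returns None where the Dart version throws (overflow).
fn pad_file_size_default(initial_size: u64) -> Option<u64> {
    for n in 0..53u32 {
        // Sizes up to 80 KiB * 2^n use a pad block of 4 KiB * 2^n (5% of that limit).
        let limit = (1u64 << n).checked_mul(80 * KIB)?;
        if initial_size <= limit {
            let block = (1u64 << n) * 4 * KIB;
            // Round up to a multiple of the block (no-op if already aligned).
            return Some((initial_size + block - 1) / block * block);
        }
    }
    None
}

/// Sketch of checkPaddedBlock: true if `size` lies exactly on a pad block boundary.
fn check_padded_block(size: u64) -> Option<bool> {
    for n in 0..53u32 {
        let limit = (1u64 << n).checked_mul(80 * KIB)?;
        if size <= limit {
            let block = (1u64 << n) * 4 * KIB;
            return Some(size % block == 0);
        }
    }
    None
}
```

The examples from the doc comment hold: 1 KiB pads to 4 KiB, 5 KiB to 8 KiB and 105 KiB to 112 KiB.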

File System (FS5)

This specification is still a work-in-progress draft. If you spot any issues or have suggestions on how it could be improved, please create an issue here: https://github.com/s5-dev/docs/issues

The S5 file system (FS5) is a decentralized, end-to-end-encrypted (if needed), content-addressed, versioned file system built on the primitives described in the other S5 specifications.

This specification is not complete yet. It will be updated shortly, together with the Dart reference implementation.

Directory CIDs

S5 supports sharing and referencing directories using CIDs. They are currently either 34 bytes long (for non-encrypted directories) or longer (for encrypted directories, see <encryption.md>).

The first byte is always 0x5d.

The second byte is either 0x1e (blake3) for immutable directories, or 0xed (ed25519 pubkey for the registry) if you need a mutable pointer.

Finally, the last 32 bytes are either the blake3 hash or the ed25519 pubkey, depending on the previous byte.

The CID bytes are then encoded using multibase, see <blobs.md> for details on that.

When listing a directory, you first check whether it's immutable. If it is, you download the metadata blob from the network using the blake3 hash. If it isn't, you first resolve the ed25519 pubkey to a blake3 hash using the registry (see <registry.md>) and then download the metadata blob.
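The listing decision can be sketched in Rust as a small classifier over the raw (already multibase-decoded) bytes of a non-encrypted directory CID. The enum and function names are hypothetical; encrypted CIDs are longer and are simply rejected by this sketch.

```rust
/// What a client has to do next to list a directory.
#[derive(Debug, PartialEq)]
enum DirectoryLink {
    /// 0x1e: download the metadata blob by its BLAKE3 hash directly.
    Immutable([u8; 32]),
    /// 0xed: resolve the ed25519 pubkey via the registry first.
    Mutable([u8; 32]),
}

/// Hypothetical helper: classifies a 34-byte, non-encrypted directory CID.
fn parse_directory_cid(cid: &[u8]) -> Result<DirectoryLink, String> {
    if cid.len() != 34 {
        return Err(format!("expected 34 bytes, got {}", cid.len()));
    }
    if cid[0] != 0x5d {
        return Err(format!("not a directory CID (first byte 0x{:02x})", cid[0]));
    }
    let mut value = [0u8; 32];
    value.copy_from_slice(&cid[2..]);
    match cid[1] {
        0x1e => Ok(DirectoryLink::Immutable(value)),
        0xed => Ok(DirectoryLink::Mutable(value)),
        other => Err(format!("unknown link type 0x{:02x}", other)),
    }
}
```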

S5 directories can also be end-to-end-encrypted, as described in <encryption.md>.

FS5 Directory Schema

The schema is defined in Rust. See the documentation of the msgpack_schema crate for details on what the annotations mean.

use std::collections::BTreeMap;

use msgpack_schema::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[untagged]
struct FS5Directory {
    file_signature: String, // constant "FS5.io" (magic bytes)
    header: FS5DirectoryHeader,
    directories: BTreeMap<String, DirectoryReference>,
    files: BTreeMap<String, FileReference>,
}

#[derive(Deserialize, Serialize)]
struct FS5DirectoryHeader {
    // 1: info string
    // 2: created_ts

    #[tag = 4] // TODO check
    #[optional]
    previous_version: Option<Vec<u8>>,

    #[tag = 6] // TODO check
    #[optional]
    share_state: Option<FS5DirectoryHeaderShareState>,

    // TODO optional HAMT sharding or B-Tree info
    // TODO 0x07: registry entry pointing to this
    // TODO 0x08: stream pointing to this? for incremental changes
}

#[derive(Deserialize, Serialize)]
struct FS5DirectoryHeaderShareState {
    #[tag = 1]
    #[optional]
    ro_shared_ts: Option<u64>,

    #[tag = 2]
    #[optional]
    rw_shared_ts: Option<u64>,
}

Directory Reference

#[derive(Deserialize, Serialize)]
struct DirectoryReference {

    #[tag = 1]
    link: Vec<u8>, // always 32 bytes
    // TODO What if we want to encrypt the link?

    //#[tag = 2]
    //link_type: u8, // either 0xed pubkey or 0x1e static

    #[tag = 3]
    created_ts: u64,

    #[tag = 4]
    name: String,

    // ====================================

    #[tag = 4] // TODO: duplicate tag, 4 is already used by `name`
    // {key_id: {how_to_get_there: value}}
    keys: DirectoryKeySet,

    // ====================================

    #[tag = 14] // 0x0e
    #[optional]
    ext: Option<BTreeMap<String, msgpack_value::Value>>,
}

File Reference

#[derive(Deserialize, Serialize)]
struct FileReference {
    #[tag = 1]
    file: FileVersion,

    #[tag = 3]
    created_ts: u64,

    #[tag = 14] // 0x0e
    #[optional]
    ext: Option<BTreeMap<String, msgpack_value::Value>>,
}

#[derive(Deserialize, Serialize)]
struct FileVersion {
    #[tag = 1]
    size: u64,
    #[tag = 3]
    created_ts: u64,

    // TODO is_deleted: Option<bool>,

    // TODO Encryption parameters (maybe support multiple blobs)

    #[tag = 0x1e]
    hash_blake3: Vec<u8>,

    #[tag = 14] // 0x0e
    #[optional]
    ext: Option<BTreeMap<String, msgpack_value::Value>>,
}

FS5 URI Format

fs5://DIRECTORYCID/path/to/file.txt

Example (long because hex-encoded): fs5://f5ded0000000000000000000000000000000000000000000000000000000000000000/video.mkv
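A minimal Rust sketch of splitting such a URI into the raw directory CID bytes and the path follows. It only decodes the lowercase hex ('f') multibase form used in the example. Other multibase encodings, percent-encoded paths and encrypted CIDs are out of scope, and the function name is hypothetical.

```rust
/// Hypothetical helper: splits an fs5:// URI into (raw CID bytes, path).
/// Only the hex ('f') multibase prefix is decoded in this sketch.
fn parse_fs5_uri(uri: &str) -> Result<(Vec<u8>, String), String> {
    let rest = uri.strip_prefix("fs5://").ok_or("missing fs5:// scheme")?;
    // Everything up to the first '/' is the multibase-encoded directory CID.
    let (cid_str, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let hex = cid_str
        .strip_prefix('f')
        .ok_or("only the hex ('f') multibase prefix is handled in this sketch")?;
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("invalid hex CID".to_string());
    }
    // Decode two hex digits per byte.
    let cid = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect::<Result<Vec<u8>, String>>()?;
    Ok((cid, path.to_string()))
}
```

The resulting 34 bytes can then be classified as an immutable (0x1e) or mutable (0xed) directory CID as described under "Directory CIDs".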

Guides

Setup with Sia

Deploy a personal S5 Node with renterd storage on Debian

In this guide, you'll learn how to deploy a production-ready S5 Node backed by Sia renterd storage.

You can then use it with the Vup Cloud Storage app or just play with the S5 API directly to upload and manage files of any size!

Requirements

Domain Name (just a subdomain works too)
Debian VPS (x86 or arm64) with 8+ GB of RAM and 128+ GB of free disk space (16+ GB of RAM are better for performance)
Some SC (siacoin) for forming contracts on the network and renting storage

If you're looking for affordable providers with these specs, I found the new Netcup ARM Servers to be a pretty good choice (https://www.netcup.de/vserver/arm-server/)

7 EUR/month for 8 GB of RAM
12 EUR/month for 16 GB of RAM

Install Sia `renterd`

Check out the official Sia docs for detailed instructions with screenshots: https://docs.sia.tech/renting/setting-up-renterd/linux/debian

Or just connect to your Debian VPS over SSH and copy-paste these commands:

sudo curl -fsSL https://linux.sia.tech/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/siafoundation.gpg
echo "deb [signed-by=/usr/share/keyrings/siafoundation.gpg] https://linux.sia.tech/debian $(. /etc/os-release && echo "$VERSION_CODENAME") main" | sudo tee /etc/apt/sources.list.d/siafoundation.list
sudo apt update
sudo apt install renterd
cd /var/lib/renterd

Run renterd version to verify it was installed correctly.

Configure Sia renterd

Run cd /var/lib/renterd, then sudo renterd config and follow the instructions. Please choose a secure password for the renterd admin UI! You can use pwgen -s 42 1 to generate one.

Type yes when you're asked if you want to configure S3 settings.

Keep the S3 Address at the default setting. (You won't need it for this guide, but S3 support is very useful for many other potential use cases for your renterd node.) It might also make sense to write down the generated S3 credentials if you want to use them later.

Finally, start renterd using sudo systemctl start renterd

Then you can re-connect to your VPS using ssh -L localhost:9980:localhost:9980 IP_ADDRESS_OR_DOMAIN to create a secure SSH tunnel to the renterd web UI. After connecting, you can open http://localhost:9980/ in your local web browser and authenticate with the previously set API password.

In the web UI, follow the step-by-step welcome guide and set everything up.

Configure the storage settings
Fund your wallet (https://docs.sia.tech/renting/transferring-siacoins)
Create a new bucket with the name s5 on http://localhost:9980/files (top right button)
Wait for the chain to sync
Wait for storage contracts to form

Install and setup Caddy (reverse proxy)

So first, you'll need to decide on two domains you want to use - one for the S5 node (important) and one for the download proxy to your renterd node (less important).

For example if you own the domain example.com, you could run the S5 Node on s5.example.com and the download proxy on dl.example.com.

You should then add DNS A records pointing to the IP address of your VPS for both of these subdomains. Of course AAAA records are nice too if you have IPv6.

Then install Caddy by following these instructions: https://caddyserver.com/docs/install#debian-ubuntu-raspbian

Then edit the Caddy config with nano /etc/caddy/Caddyfile to something like this:

You can generate the TODO_PASTE_HERE part by base64-encoding the string :APIPASSWORD (a colon followed by your Sia renterd API password), for example on https://gchq.github.io/CyberChef/#recipe=To_Base64('A-Za-z0-9%2B/%3D')

S5_API_DOMAIN {
  reverse_proxy 127.0.0.1:5050
}

DOWNLOAD_PROXY_DOMAIN {
  uri strip_suffix /
  header {
    Access-Control-Allow-Origin *
  }
  rewrite * /api/worker/objects/s5{path}?bucket=s5
  @download {
    method GET
  }
  reverse_proxy @download {
    to 127.0.0.1:9980
    header_up Authorization "Basic TODO_PASTE_HERE"
  }
}
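If you'd rather not paste your API password into a website, the TODO_PASTE_HERE value can also be computed locally. The Rust sketch below uses only the standard library and implements plain RFC 4648 base64. It assumes, as the ":APIPASSWORD" input above suggests, that renterd expects HTTP Basic auth with an empty username. The function names are hypothetical.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        // A chunk of k bytes produces k + 1 symbols, then '=' up to 4 characters.
        for i in 0..4usize {
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3f) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Hypothetical helper: the full header value for the Caddyfile.
/// ASSUMPTION: empty username, so the encoded string is ":" + password.
fn basic_auth_header(api_password: &str) -> String {
    format!("Basic {}", base64_encode(format!(":{}", api_password).as_bytes()))
}
```

For the password test, this produces Basic OnRlc3Q=, the same value used in the Sia store Caddyfile example.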

Then restart Caddy with systemctl restart caddy

Install and set up S5

First, install Podman using this command: sudo apt-get -y install podman

Create some needed directories:

mkdir -p /s5/config
mkdir -p /s5/db
mkdir -p /tmp/s5

If you're not root, you might need to run sudo chown -R $USER /tmp/s5 /s5 to set permissions correctly.

Then start an S5 node with this command (the /s5/ directories created above must already exist):

podman run -d \
--name s5-node \
--network="host" \
-v /s5/config:/config \
-v /s5/db:/db \
-v /tmp/s5:/cache \
--restart unless-stopped \
ghcr.io/s5-dev/node:latest

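Edit the generated config file with nano /s5/config/config.toml

Add these config entries there, replacing S5_API_DOMAIN and DOWNLOAD_PROXY_DOMAIN with the domains you chose above (APIPASSWORD is the password you used to log in to the renterd web UI):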

[http.api]
domain = 'S5_API_DOMAIN'

[accounts]
enabled = true
[accounts.database]
path = "/db/accounts"

[store.sia]
workerApiUrl = "http://127.0.0.1:9980/api/worker"
bucket = "s5"
apiPassword = "APIPASSWORD"
downloadUrl = "https://DOWNLOAD_PROXY_DOMAIN"

Then run podman restart s5-node to restart the S5 Node.

You can visit https://S5_API_DOMAIN/s5/admin/app in your web browser to create and manage accounts manually. The API key for your node can be retrieved by running journalctl | grep 'ADMIN API KEY'

Using your new S5 Node for Vup Storage

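Edit /s5/config/config.toml again and extend the existing [accounts] section like this (TOML doesn't allow the same table to be defined twice):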

[accounts]
authTokensForAccountRegistration = ["INSERT_INVITE_CODE_OF_CHOICE_HERE"]
enabled = true

Then run podman restart s5-node to restart the S5 Node.

Now you can use the "Register on S5 Node" button in the Vup "Storage Service" settings. Enter the domain of your node and the invite code you just configured, and you should be good to go! You'll likely want more than 10 GB of storage, so use the Admin Web UI to set a higher tier for your newly created account.

Setup With Sia

Please follow this guide instead: deploy-renterd.html

Sia is a decentralized, affordable and secure cloud storage platform. You can use it as a storage backend for your S5 Node.

First, you'll need a fully configured instance of renterd (the new Sia renter software) running somewhere. Here's a great guide which shows you how to set one up easily on the Sia testnet: https://blog.sia.tech/sia-innovate-and-integrate-christmas-2023-hackathon-9b7eb8ad5e0e

Next, you need to set up an S5 Node using the instructions available at /install/index.html

For configuring the S5 Node to use your Sia renter node, you will need to add this section to your config.toml:

[store.s3]
accessKey = "MY_ACCESS_KEY" # Replace this with the access key from your renterd.yml
bucket = "sfive" # Or just "default"
endpointUrl = "YOUR_S3_ENDPOINT_URL" # http://localhost:7070 if you followed the Sia renterd testnet guide
secretKey = "MY_SECRET_KEY" # Replace this with the secret key from your renterd.yml

And then restart the node with docker container restart s5-node

You might also want to enable the accounts system on your node if it's available on the internet or if you want to use it with Vup, see /install/config/index.html for details.

Tools

This section contains some useful tools for working with S5.

cid.one

https://cid.one/ is a CID explorer for the S5 network.

It supports raw CIDs, all of the metadata formats, resolver CIDs and (soon) encrypted CIDs.

Here are some examples:

Raw file: https://cid.one/#uJh9dvBupLgWG3p8CGJ1VR8PLnZvJQedolo8ktb027PrlTT5LvAY

Resolver CID: https://cid.one/#zrjD7xwmgP8U6hquPUtSRcZP1J1LvksSwTq4CPZ2ck96FHu

Media Metadata: https://cid.one/#z5TTvXtbkQk9PTUN8r5oNSz5Trmf1NjJwkVoNvfawGKDtPCB

Web App Metadata: https://cid.one/#blepzzclchbhwull3is56zvubovg7j3cfmatxx5gyspfx3dowhyutzai
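The example CIDs above start with different characters because S5 encodes CIDs with multibase, whose first character names the encoding. The following Rust sketch recognizes only the four prefixes relevant here, with names taken from the multibase specification. The function name is hypothetical and no decoding is performed.

```rust
/// Hypothetical helper: names the multibase encoding of a CID string
/// based on its first character.
fn multibase_encoding(cid: &str) -> Option<&'static str> {
    match cid.chars().next()? {
        'f' => Some("base16 (lowercase hex)"),
        'b' => Some("base32 (lowercase, no padding)"),
        'z' => Some("base58btc"),
        'u' => Some("base64url (no padding)"),
        _ => None,
    }
}
```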

s5.cx

s5.cx is a web-based tool to securely stream files of any size directly from the S5 network. File data is NOT proxied by the s5.cx server.

It works by using a service worker that intercepts all raw file requests, fetches the file data from a host on the S5 network and verifies the integrity using BLAKE3/bao in Rust compiled to WASM and running directly inside of the service worker.

The service worker code can be used by any web app to easily stream files from S5 without needing any additional code or libraries in your project. A repository with setup instructions will be published soon.

The service worker is already being used by https://tube5.app/.

Here's an example file: https://s5.cx/uJh9dvBupLgWG3p8CGJ1VR8PLnZvJQedolo8ktb027PrlTT5LvAY.mp4

Install the S5 node

Right now the only supported way to run an S5 node is using a container runtime like Docker or Podman.

You can install Docker on most operating systems using the instructions here: https://docs.docker.com/engine/install/

If you are on Linux you can use the convenience script: curl -fsSL https://get.docker.com | sudo sh

Podman is a popular alternative to Docker, but it might be harder to install on non-Linux systems. You can find instructions for it here: https://podman.io/getting-started/installation

Run S5 using Docker

Before running this command, you should change the paths ./s5/config and ./s5/db to a storage location of your choice.

docker run -d \
--name s5-node \
-p 127.0.0.1:5050:5050 \
-v ./s5/config:/config \
-v ./s5/db:/db \
--restart unless-stopped \
ghcr.io/s5-dev/node:latest

This will only bind your node to localhost, so you will need a reverse proxy like Caddy to access it from the internet.

If you want to expose the HTTP API port to the entire network instead, use -p 5050:5050

If something doesn't seem to work correctly, you can view the logs with docker logs -f s5-node

config path

This path is used to generate and load the config.toml file. You will need to edit that file to configure stores and other options.

db path

This path is used for storing small key-value databases that hold state relevant for the network and node. Do not use a slow HDD for this.

(optional) cache path

The cache stores large file uploads and different downloads/streams. You can use a custom cache location by adding -v ./s5/cache:/cache to your command.

(optional) data path

If you are planning to store uploaded files on your local disk, you should prepare a directory for that and specify it with -v ./s5/data:/data

Using Sia

If you want to use S5 with an instance of renterd running on the same server, you should add the --network="host" flag to grant S5 access to the renterd API.

Stop the container

docker container stop s5-node

Remove the container

docker container rm s5-node

Alternative: Using docker-compose

Create a file called docker-compose.yml with this content:

version: '3'
services:
  s5-node:
    image: ghcr.io/s5-dev/node:latest
    volumes:
      - ./path/to/config:/config
      - ./path/to/db:/db
    ports:
      - "5050:5050"
    restart: unless-stopped

The same configuration options apply as with plain Docker/Podman. Run it with docker-compose up -d

S5 Config

You can edit the config.toml file to configure your S5 node. You can apply changes with docker container restart s5-node

This page describes the available sections in the config.

keypair

The seed is generated on first start. Keep it private, because it's used for signing messages on the network.

http.api

domain: Set this to the domain you use to access your node. For example, if you configured your domain example.com to be reverse-proxied to your S5 Node Docker container using Caddy, nginx or similar, you should set this to example.com

port: The port the HTTP API binds to and is available on (you should usually keep the default)

store

Check out the Stores documentation for configuring different object stores.

accounts

You can enable the accounts system by adding this part to your config:

[accounts]
enabled = true
[accounts.database]
path = "/db/accounts"

Registrations are disabled by default. You can enable them by adding alwaysAllowedScopes to your existing [accounts] section (don't define [accounts] a second time):

[accounts]
alwaysAllowedScopes = [
    'account/login',
    'account/register',
    's5/registry/read',
    's5/metadata',
    's5/debug/storage_locations',
    's5/debug/download_urls',
    's5/blob/redirect',
]

Advanced

cache

Configure a custom cache location with the path option. You likely don't need this if you are using Docker.

database

Configure a custom database path. You likely don't need this if you are using Docker.

p2p.peers

List of initial peers used for connecting to the p2p network.

Caddy reverse proxy

Caddy is an easy-to-use reverse proxy with automatic HTTPS.

You can install it by following the instructions over at https://caddyserver.com/docs/install

You'll also need a domain name with A and AAAA records pointed to your server.

You should also make sure that your firewall doesn't block ports 80 and 443.

Configuration

With the default S5 port of 5050, you can configure your /etc/caddy/Caddyfile like this:

YOUR.DOMAIN {
  reverse_proxy localhost:5050
}

On Debian and Ubuntu you can run sudo systemctl restart caddy to restart Caddy after editing the Caddyfile.

Don't forget to configure http.api.domain in your S5 config.toml after setting up a domain and reverse proxy!

Stores

The S5 network and nodes support multiple different storage backends.

S3 is the easiest to set up; Sia is the cheapest option.

Local stores all files on your server directly, so that usually only makes sense for a home NAS use case or a small number of files.

Arweave provides permanent storage for a high price.

S3-compatible providers

Any cloud provider that supports the S3 protocol. See https://s3.wiki for the cheapest ones.

Configuration

[store.s3]
accessKey = "YOUR_ACCESS_KEY"
bucket = "YOUR_BUCKET_NAME"
endpointUrl = "YOUR_S3_ENDPOINT_URL"
secretKey = "YOUR_SECRET_KEY"

Local

Stores uploaded files on the local filesystem.

Configuration

[store.local]
path = "/data" # If you are using the Docker container

[store.local.http]
bind = "127.0.0.1"
port = 8989
url = "http://localhost:8989"

By default, files are only available on your local node. To make them available on the entire network, forward the port so it's reachable from the internet, and then set url to the URL at which your computer is reachable from the internet.

Sia Network

The Sia network provides decentralized and redundant data storage.

This page shows how to use Sia with the native integration. In most cases, you should follow this guide for the S3-based integration instead: /guide/setup-with-sia.html

You will need a fully configured local instance of renterd: https://github.com/SiaFoundation/renterd

Warning: Both renterd and this integration are still experimental. Please report any bugs you encounter.

Configuration

[store.sia]
workerApiUrl = "http://localhost:9980/api/worker"
apiPassword = "test"
downloadUrl = "https://dl.YOUR.DOMAIN"

Using Caddy as a reverse proxy for Sia downloads

This configuration requires a Caddy build that includes https://github.com/caddyserver/cache-handler. If you don't want to cache Sia downloads, you can remove the first 4 lines and the cache directive.

/etc/caddy/Caddyfile:

{
    order cache before rewrite
    cache
}

dl.YOUR.DOMAIN {
  uri strip_suffix /

  header {
    Access-Control-Allow-Origin *
  }

  cache {
    stale 6h
    ttl 24h
    default_cache_control "public, max-age=86400"
    nuts {
      path /tmp/nuts
    }
  }

  rewrite * /api/worker/objects/1{path}

  reverse_proxy {
    to localhost:9980
    header_up Authorization "Basic OnRlc3Q=" # Base64 of ":" + your renterd API password (this example encodes ":test")
  }
}

Arweave

Arweave is expensive, but provides permanent storage for a one-time payment. Check out https://www.arweave.org/

Disabled right now

Metadata formats

This section contains documentation for all metadata formats used and supported by S5.

All formats have a JSON representation for easy creation, debugging and editing.

All formats also have a highly optimized serialization based on MessagePack (https://msgpack.org/), which is used for storing them on S5, including optional signatures and timestamp proofs.

JSON Schemas for all formats are available here: https://github.com/s5-dev/json-schemas

Web App metadata

Metadata format used for web apps stored on S5. This docs website is hosted using it.

Example

Web App Metadata: https://cid.one/#blepzzclchbhwull3is56zvubovg7j3cfmatxx5gyspfx3dowhyutzai

Fields

Full JSON Schema: https://schema.sfive.net/web-app-metadata.json

Web-based viewer: https://json-schema.app/view/%23?url=https%3A%2F%2Fschema.sfive.net%2Fweb-app-metadata.json

Directory metadata

Work-in-progress, will be used to store directory trees in Vup. Supports advanced sharing capabilities and is fully end-to-end-encrypted by default.

Media metadata

A very flexible metadata format used for almost any advanced content or media structure.

Can be used for videos, images, music, podcasts, profiles, lists and more!

Already being used by Tube5.

Example

Media Metadata: https://cid.one/#z5TTvXtbkQk9PTUN8r5oNSz5Trmf1NjJwkVoNvfawGKDtPCB

Fields

Full JSON Schema: https://schema.sfive.net/media-metadata.json

Web-based viewer: https://json-schema.app/view/%23?url=https%3A%2F%2Fschema.sfive.net%2Fmedia-metadata.json

S5 Network Docs