Hyper Hyper Space is a data sync engine designed for authority decentralization. This means that each device acts as a sovereign node, verifying the validity of the information it receives, and ensuring that any local changes are also verifiable by others.
Throughout 2025 we will develop a new major version of Hyper Hyper Space (version 3). The NLNet Foundation will support this work through its NGI Zero Core fund.
Following our previous analysis, we'll have two main goals:
We'll also adopt a modular approach, where each component in the system can be replaced (transport, peer discovery, log storage, etc.), instead of targeting web browsers as in previous versions of the engine.
The plan for version 3 follows. Any feedback will be warmly received; collaborations or contributions would be especially welcome.
Work on v3.0 will be divided into four stages:
We'll start by refactoring the Merkle-DAG based log. In the current version, in addition to the log itself being a DAG, each log entry is composed of a DAG of its own, which may overlap with the DAGs in other entries. We'll simplify this by making the payload of each entry opaque, deferring data deduplication to higher layers. A compact header will carry the necessary information about the topology of the DAG; this header will later be used during over-the-wire reconciliation to figure out deltas. Finally, we'll add support for fast local comparisons of DAG entries, enabling their use as Merkle-clocks. Finding common ancestry (a fast "meet" operation in the semilattice formed by the partially ordered Merkle-timestamps in the log) should also be supported.
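As a rough sketch of these ideas, the following toy log keeps payloads opaque, stores topology in a minimal header (hash plus parent hashes), and computes a "meet" as the set of maximal common ancestors of two entries. All type and function names here are illustrative, not the actual v3 API:

```typescript
import { createHash } from "node:crypto";

type Hash = string;

interface LogEntry {
  hash: Hash;          // hash over payload + parents
  parents: Hash[];     // the DAG topology lives in this compact header
  payload: Uint8Array; // opaque to the log layer
}

function makeEntry(payload: Uint8Array, parents: Hash[]): LogEntry {
  const h = createHash("sha256");
  h.update(payload);
  for (const p of parents) h.update(p);
  return { hash: h.digest("hex"), parents, payload };
}

// Ancestor closure of an entry (including itself), the basis for
// Merkle-clock comparisons.
function ancestors(log: Map<Hash, LogEntry>, start: Hash): Set<Hash> {
  const seen = new Set<Hash>();
  const stack = [start];
  while (stack.length > 0) {
    const h = stack.pop()!;
    if (seen.has(h)) continue;
    seen.add(h);
    for (const p of log.get(h)!.parents) stack.push(p);
  }
  return seen;
}

// "Meet": the maximal common ancestors of two entries, i.e. the common
// ancestors that are not themselves ancestors of another common ancestor.
function meet(log: Map<Hash, LogEntry>, a: Hash, b: Hash): Set<Hash> {
  const ancB = ancestors(log, b);
  const common = [...ancestors(log, a)].filter((h) => ancB.has(h));
  const dominated = new Set<Hash>();
  for (const h of common) {
    for (const anc of ancestors(log, h)) if (anc !== h) dominated.add(anc);
  }
  return new Set(common.filter((h) => !dominated.has(h)));
}
```

For two entries that both descend from a single root, the meet is just that root; a production version would avoid recomputing ancestor closures, but the semantics are the same.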
In the v2 engine, state could either be generated by re-playing the entire log in memory, or a snapshot of the latest state could be loaded into memory directly. There are two important changes in state management in v3: first, state will be materialized directly on the storage medium, alongside the log, without needing to load the entire log into RAM. Second, to implement co-transactions later on, we need to be able to instantiate the state at any point in the log (a.k.a. time traveling).
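The time-traveling idea can be illustrated with a toy log whose payloads are counter increments: the state "at" any entry is obtained by folding its ancestor closure. A commutative counter keeps the replay order irrelevant; real CRDT operations would be replayed in topological order. Names are illustrative:

```typescript
type Hash = string;

interface Entry {
  parents: Hash[];
  increment: number; // the toy payload: a counter increment
}

// Materialize the counter's state at an arbitrary point of the log by
// replaying every entry at or below `head` exactly once.
function materializeAt(log: Map<Hash, Entry>, head: Hash): number {
  const seen = new Set<Hash>();
  const stack = [head];
  let total = 0;
  while (stack.length > 0) {
    const h = stack.pop()!;
    if (seen.has(h)) continue;
    seen.add(h);
    const e = log.get(h)!;
    total += e.increment;
    stack.push(...e.parents);
  }
  return total;
}
```

Picking different heads yields the state at different points in history, which is exactly what co-transaction evaluation will need later.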
The unit of state materialization should be a composition by nesting of several basic CRDTs. Each of these units will share a single history log, and they will reference each other to create the full application data model, which will support constraints via co-transactions. We'll implement this new state storage system (supporting state materialization and time traveling) and port the basic CRDTs from v2 (sets, sequences, registers, counters).
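A minimal sketch of such a unit, under invented names: operations all come from one shared log and address a target CRDT nested inside the unit by path (here, last-writer-wins registers only):

```typescript
// An op from the shared history log, addressed to one nested CRDT by path.
type Op = { path: string; ts: number; value: unknown };

// One of the basic CRDTs ported from v2: a last-writer-wins register.
class LwwRegister {
  value: unknown;
  ts = 0;
  apply(op: Op): void {
    if (op.ts > this.ts) {
      this.ts = op.ts;
      this.value = op.value;
    }
  }
}

// A materialization unit: basic CRDTs composed by nesting, all fed from
// the single log the unit shares.
class Unit {
  private fields = new Map<string, LwwRegister>();

  apply(op: Op): void {
    let r = this.fields.get(op.path);
    if (!r) {
      r = new LwwRegister();
      this.fields.set(op.path, r);
    }
    r.apply(op);
  }

  get(path: string): unknown {
    return this.fields.get(path)?.value;
  }
}
```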
We'll next apply a simplifying refactor to the sync engine, following the refactor of the Merkle-DAG. In particular, now that log entry payloads are opaque to the log, all the logic for diffing the contents of log entries will be eliminated. A spec for this simplified version of the sync protocol will be produced. Peer discovery & identity verification will be delegated to plug-ins and made fully modular. Simple versions inspired by the ones in v2 will be provided as plug-ins for v3.
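The plug-in seams might look roughly like this; these interfaces and the trivial static discovery implementation are assumptions for illustration, not the actual v3 API:

```typescript
// Pluggable transport: how bytes reach a peer.
interface Transport {
  send(peer: string, msg: Uint8Array): void;
  onMessage(cb: (peer: string, msg: Uint8Array) => void): void;
}

// Pluggable peer discovery: who to sync a given topic with.
interface PeerDiscovery {
  findPeers(topic: string): string[];
}

// Pluggable identity verification: is this peer who it claims to be?
interface IdentityVerifier {
  verify(peerId: string, proof: Uint8Array): boolean;
}

// A trivial discovery plug-in backed by a static peer list, in the spirit
// of shipping simple v2-inspired implementations as v3 plug-ins.
class StaticDiscovery implements PeerDiscovery {
  constructor(private peers: Record<string, string[]>) {}

  findPeers(topic: string): string[] {
    return this.peers[topic] ?? [];
  }
}
```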
Finally, the new state-materializing store will be augmented with a co-transactional engine. The materialized state will track which co-transactions are actually valid (i.e. have all their preconditions satisfied), and time traveling over the log will also be supported for co-transaction evaluation. When materializing state upon advancing the log, any states that should not have been observed will be detected (with the help of the new lattice-like operations on history now provided by the log). Time traveling will then be used to detect precondition invalidation and, if necessary, to replay the relevant portion of the log to arrive at the new correct state.
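A toy model of precondition invalidation, with hypothetical names: each co-transaction carries a precondition over the materialized state, and replaying the log skips any co-transaction whose precondition no longer holds once concurrent history is merged in:

```typescript
type State = Map<string, number>;

// A co-transaction: a state delta guarded by a precondition.
interface CoTx {
  key: string;
  delta: number;
  pre: (s: State) => boolean;
}

// Replay a (merged, ordered) log from scratch, dropping co-transactions
// whose preconditions fail -- the invalidation-and-replay mechanism in
// miniature.
function replay(ops: CoTx[]): State {
  const s: State = new Map();
  for (const op of ops) {
    if (!op.pre(s)) continue; // invalidated: precondition not satisfied
    s.set(op.key, (s.get(op.key) ?? 0) + op.delta);
  }
  return s;
}
```

For example, a withdrawal whose "sufficient balance" precondition held against the state its author observed may fail against the merged state, and is then dropped during replay.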
At this point, the 3.0 engine will be operational!
The objective of adapters is to enable p2p synchronization of data within traditional storage systems. Our first adapter will target relational databases: relation-based schemas, if constructed with some care, are a good fit for synchronization using Hyper Hyper Space.
Since we want to generate a self-verifiable log from the database contents, all roles, permissions, access controls and other security-related information must be self-contained in the database itself (as tabular data, not as configuration in the native database permissions system). Furthermore, all identities will be based on asymmetric cryptography, row identifiers will be random (UUIDs or similar), and we'll need to add columns that help the adapter detect changes, attribute them to a given crypto id, and report the status of each row with respect to synchronization.
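As an illustration, the extra per-row metadata might be shaped like this (every column name here is an assumption, not a fixed schema):

```typescript
import { randomUUID } from "node:crypto";

// Illustrative shape of the sync-metadata columns added to each exported
// table: a random row identifier, attribution to a crypto id, and the
// row's status with respect to synchronization.
interface SyncColumns {
  row_id: string;                                    // random UUID identifier
  author_id: string;                                 // crypto id changes are attributed to
  sync_status: "synced" | "pending" | "rolled_back"; // row status w.r.t. sync
}

// New local rows start out as pending until the adapter imports them
// into the verifiable log.
function newRow(authorId: string): SyncColumns {
  return { row_id: randomUUID(), author_id: authorId, sync_status: "pending" };
}
```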
Development will proceed in three stages:
First we'll develop a CRDT that will wrap the contents of a single table, represented as a collection of named records. The values will initially be LWW pointers to the usual SQL-supported datatypes (specific CRDTs for text, counters and application-defined types could be added later). This CRDT will depend co-transactionally on a second schema-definition CRDT, which will provide schema versioning. All updates on the table will have a precondition stating that they respect the constraints present in the schema. This way, when the schema changes, the change will be transparent to concurrent updates that conform to it (for example, when adding a new optional field), but concurrent updates that do not (when a new constraint has been added) will automatically trigger the co-transactional invalidation mechanism. Inter-table consistency conditions will also be enforced co-transactionally (to write to a table, a corresponding row must be present in the table where permissions are stored, etc.).
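A toy version of the table-wrapping idea, under invented names: named records of LWW cells, where each write's precondition is checked against a drastically simplified schema (just a set of allowed columns):

```typescript
// A write to one cell of a named record, tagged with an LWW timestamp.
interface CellWrite {
  row: string;
  col: string;
  value: unknown;
  ts: number;
}

// Toy table CRDT: LWW cells gated by a schema precondition. In the real
// design the schema lives in a separate schema-definition CRDT that this
// one depends on co-transactionally.
class TableCrdt {
  private cells = new Map<string, { value: unknown; ts: number }>();

  constructor(public columns: Set<string>) {}

  // Returns false when the schema precondition fails, standing in for the
  // co-transactional invalidation mechanism.
  apply(w: CellWrite): boolean {
    if (!this.columns.has(w.col)) return false;
    const key = `${w.row}/${w.col}`;
    const cur = this.cells.get(key);
    if (!cur || w.ts > cur.ts) this.cells.set(key, { value: w.value, ts: w.ts });
    return true;
  }

  get(row: string, col: string): unknown {
    return this.cells.get(`${row}/${col}`)?.value;
  }
}
```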
Using the table-wrapping CRDT, we'll build the actual adapter: a service that detects changes in the database and imports them into the verifiable log within the sync system, and conversely applies the changes received through sync to the local database. Whenever there are conflicts that lead to co-transactional rollbacks, the adapter must both update the corresponding status in the database view, and provide this information to the application via a stream and/or a polling endpoint.
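The rollback-reporting side of such an adapter could be sketched as follows (hypothetical names; the real adapter would also update the row's status in the database view):

```typescript
// A co-transactional rollback the application needs to hear about.
interface Rollback {
  rowId: string;
  reason: string;
}

// Fan rollbacks out both as a push stream (subscribers) and through a
// polling endpoint that drains the pending queue.
class RollbackReporter {
  private pending: Rollback[] = [];
  private subscribers: ((r: Rollback) => void)[] = [];

  report(r: Rollback): void {
    this.pending.push(r);
    for (const s of this.subscribers) s(r);
  }

  subscribe(cb: (r: Rollback) => void): void {
    this.subscribers.push(cb);
  }

  poll(): Rollback[] {
    const p = this.pending;
    this.pending = [];
    return p;
  }
}
```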
Finally, we'll create a small configuration tool that can generate instances of both the sync engine and the schema of the exported database (using SQL DDL). This tool should also be able to process patches that change both the schema definition CRDTs and the schema in the relational database (again, using SQL DDL). This tool will have a command-line version, and a programmatic one that can be used to execute database updates when software is updated on-device.
When generating SQL DDL, the tool should, when possible, generate constraints in the relational database to match the consistency conditions applied by the replication mechanism (using foreign keys, check constraints, etc.), to attempt to make invalid changes fail locally, before entering the sync engine. This will not be exhaustive, though, and is provided just to make development easier.
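For example, a hypothetical generator could mirror a replication-side consistency condition ("a referenced row must exist") as a SQL foreign key, so that a locally invalid write fails at the database before reaching the sync engine:

```typescript
interface ColumnDef {
  name: string;
  type: string;
  notNull?: boolean;
}

interface ForeignKey {
  column: string;
  refTable: string;
  refColumn: string;
}

// Emit CREATE TABLE DDL with foreign-key constraints that mirror the
// replication mechanism's consistency conditions (illustrative sketch).
function createTableDdl(table: string, cols: ColumnDef[], fks: ForeignKey[]): string {
  const parts = cols.map((c) => `${c.name} ${c.type}${c.notNull ? " NOT NULL" : ""}`);
  for (const fk of fks) {
    parts.push(`FOREIGN KEY (${fk.column}) REFERENCES ${fk.refTable}(${fk.refColumn})`);
  }
  return `CREATE TABLE ${table} (\n  ${parts.join(",\n  ")}\n);`;
}
```

As the text notes, this mirroring cannot be exhaustive; it is a local convenience, not the source of truth for validity.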
At this point, a developer should be able to define a schema that includes all the application invariants (in particular, those related to permissions and access control) and obtain the SQL DDL for the relational database view schema. A synchronization instance that can be used with the adapter will be initialized automatically. If, later on, the schema needs to be modified, the tool should be able to generate SQL DDL and operations on the schema-definition CRDTs that can be shipped to clients as software is updated, and applied automatically.
Applications depend on media files (images, videos, attachments, etc.) that are usually not stored within a database. For simplicity, we want these files to be delivered and synchronized using Hyper Hyper Space.
This adapter will provide a configuration interface that will define access control permissions for a key -> blob store, and an API to fetch & share these blobs, as well as export them to the local filesystem for display in the application.
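The blob API might look roughly like this in-memory sketch (names are assumptions; a real implementation would persist blobs, sync them, and enforce the configured access control):

```typescript
// Minimal key -> blob store interface, as the media adapter might expose it.
interface BlobStore {
  put(key: string, data: Uint8Array): void;
  get(key: string): Uint8Array | undefined;
}

// In-memory stand-in used here purely for illustration.
class MemoryBlobStore implements BlobStore {
  private blobs = new Map<string, Uint8Array>();

  put(key: string, data: Uint8Array): void {
    this.blobs.set(key, data);
  }

  get(key: string): Uint8Array | undefined {
    return this.blobs.get(key);
  }
}
```

Exporting to the local filesystem for display would then be a thin layer over `get`.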
At this point, a complete and easy-to-use back-end for distributed applications should be operational. NGI Zero Core will facilitate a security audit, and we'll seek partnerships with both application developers and application-building tools.
This article has been archived in this repo. Feel free to add issues or submit corrections.