Problems.vc

After the Platforms: The Agent Substrate, Part III of IV

The Shape the Problem Wants

If you were starting from scratch, what would the data substrate underneath an agent-era economy actually have to do? Three fields converge on one shape: a metagraph with cryptographic provenance and capability-mediated access.

The last piece ended with Alice and Bob trapped in agent-mediated verification loops across systems that cannot prove anything to anyone. This piece asks the design question that implies.

If you were starting from scratch, with no legacy vendors to accommodate and no business model to preserve, and your job was to build the data substrate underneath commerce in an agent-era economy, what would it actually have to do?

The answer is not a preference. It is a set of properties that emerge from what the problem is, and the properties are more constrained than they look. Once you write them down and look at what each one implies, you are pointed at a specific object. Three different fields converge on the same object from three different starting points. The object is not contingent. It is structural.

Here is how it falls out.

The first property: every claim carries proof of where it came from

Go back to piece 2. The core friction in every scene was the same. Alice's agent produces an artifact. The counterparty's agent receives it. Neither side has any way to independently verify that the artifact corresponds to anything true. So trust is substituted for verification, and the volume of unverified assertion compounds faster than any party, human or agent, can review it.

The first property falls out of this directly. Every claim about Alice's company has to carry proof of where it came from. Not a reputation chain ("Alice said so, and Alice seems honest"), which is what PDFs give you, and which agents weaken rather than strengthen because agents can hallucinate. A cryptographic chain: this P&L line was signed at this moment by this accounting system, which was signed by this company's identity, which is derived from a keypair this company controls. The chain bottoms out at a primary attestation, a signature by a key that any counterparty can verify without anyone else's cooperation.

This is cryptographic provenance. Everything else in the design falls out of what it requires.

It requires, first, that claims be signed at the point of origin. Not later, by a trust intermediary. At the moment the claim is produced, by the system that produced it. An accounting system that books a line of revenue signs the claim as it is booked. An HR system that records a hire signs the claim at the moment of hire. A bank that sees a transaction signs its own view of that transaction. Every claim in the system is born with a signature. Claims without signatures are artifacts, not claims.

It requires, second, that identity be something a counterparty can verify without relying on a registry that a third party controls. If Alice's counterparty has to ask a platform "is this Alice's key?" then the platform is the trust anchor, not the cryptography, and we are back to the situation piece 1 described. So identity has to be derivable from the key itself, not assigned from above. A company's identity is its public key. Its key is its key. No registry can revoke it.
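A minimal sketch of signing at the point of origin and verifying without a registry, using only the Python standard library. It is an illustration under stated assumptions, not a recommended implementation: it uses Lamport one-time signatures (a simple hash-based scheme in which each key may sign exactly one message) as a stand-in for whatever multi-use signature scheme a real system would choose. The function names, the claim fields, and the idea of hashing the public key into a short identifier are all hypothetical.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Lamport keypair: 256 pairs of random secrets and the hashes of those secrets."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(sha256(a), sha256(b)) for a, b in private]
    return private, public


def identity(public) -> str:
    """Hypothetical identity scheme: a hash of the public key, nothing assigned from above."""
    return hashlib.sha256(b"".join(a + b for a, b in public)).hexdigest()


def message_bits(message: bytes):
    """The 256 bits of the message digest, most significant bit first."""
    digest = sha256(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit. A Lamport key must sign only one message."""
    return [private[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Anyone holding the public key can run this check; no third party is consulted."""
    if len(signature) != 256:
        return False
    bits = message_bits(message)
    return all(sha256(signature[i]) == public[i][bits[i]] for i in range(256))


# The accounting system signs the claim at the moment it books it.
private_key, public_key = keygen()
claim = b'{"account":"revenue","amount_cents":125000,"booked_at":"2025-01-31"}'
signature = sign(private_key, claim)

claim_ok = verify(public_key, claim, signature)
tampered_ok = verify(public_key, claim.replace(b"125000", b"925000"), signature)
issuer_id = identity(public_key)
```

Here `claim_ok` is true and `tampered_ok` is false: changing a single digit changes the digest, so the revealed secrets no longer match the public key. The counterparty needs only the claim, the signature, and the public key.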

It requires, third, that a signed claim can cite other signed claims, and that citations are themselves structural. If a P&L line cites the revenue transactions it summarizes, the citation has to be verifiable. You cannot just say "summary of transactions 1 through 4317"; you have to point at those transactions such that anyone can fetch them and verify the summary. This pushes the representation toward content-addressed pointers: every claim has a canonical identifier derived from its content, and citations are references to those identifiers.

So the first property, provenance, pulls three more properties into place. Signing at the source. Identity derived from keys. Claims referenced by content address.
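The citation requirement can be sketched in a few lines. This is an assumption-laden illustration: the claim fields, the in-memory `store`, and the choice of SHA-256 over sorted-key JSON as the canonical encoding are all hypothetical, and signatures are left out to keep the focus on content addressing.

```python
import hashlib
import json


def content_id(claim: dict) -> str:
    """Identifier derived from content: hash of a deterministic JSON encoding."""
    encoded = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


store = {}  # stand-in for wherever claims can be fetched from


def publish(claim: dict) -> str:
    cid = content_id(claim)
    store[cid] = claim
    return cid


# Three revenue transactions, each addressed by its own content.
tx_ids = [
    publish({"type": "transaction", "seq": seq, "amount_cents": amount})
    for seq, amount in enumerate([40000, 35000, 50000])
]

# The P&L line cites the transactions it summarizes by content address.
summary_id = publish(
    {"type": "pnl_line", "account": "revenue", "total_cents": 125000, "cites": tx_ids}
)


def verify_summary(cid: str) -> bool:
    """Fetch every cited claim, check it still hashes to its address, re-add the total."""
    summary = store[cid]
    if content_id(summary) != cid:
        return False
    total = 0
    for cited in summary["cites"]:
        claim = store.get(cited)
        if claim is None or content_id(claim) != cited:
            return False
        total += claim["amount_cents"]
    return total == summary["total_cents"]
```

A reader who calls `verify_summary(summary_id)` is not trusting the summary. They are recomputing it from claims whose addresses would change if a single byte of them changed.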

The second property: access is a grant, not a platform intermediation

Given provenance, the next question is how a counterparty reads the claims.

The answer piece 1 described as capability-mediated access is what drops out once you take provenance seriously. Alice's counterparty does not need a platform in the middle. The counterparty's agent needs a signed grant from Alice saying "you may read this slice of my claims for this period of time for this purpose, and here is the signature proving I authorized this." The grant is itself a signed claim. The counterparty's agent presents the grant to Alice's systems, or to a node Alice has delegated to hold her claims, and the systems verify the grant cryptographically before returning the requested data.

The grant is bounded. Bounded by scope (what claims it covers), by time (how long it is valid), and by purpose (what the counterparty is allowed to do with the claims once it has them, at least as a declared intent that downstream violations can be reasoned about). The grant is revocable. At any point Alice can publish a revocation, also signed, and any subsequent attempt to exercise the grant fails.
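The bounds on a grant translate into a short check. The sketch below is hypothetical throughout: the `Grant` fields, the `authorize` function, and the use of Unix timestamps are illustrative choices. It deliberately omits verifying the signature on the grant and on the revocation, which a real node would do before any of these checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    grant_id: str
    grantee: str          # identity of the counterparty's agent
    scope: frozenset      # claim types the grant covers
    purpose: str          # declared intent
    not_before: int       # Unix seconds
    expires_at: int       # Unix seconds


def authorize(grant, requester, claim_type, purpose, now, revoked) -> bool:
    """Return True only if the request falls inside every bound of the grant."""
    if grant.grant_id in revoked:       # Alice has published a revocation
        return False
    if requester != grant.grantee:      # the grant was issued to someone else
        return False
    if claim_type not in grant.scope:   # outside the slice Alice shared
        return False
    if purpose != grant.purpose:        # not the declared purpose
        return False
    return grant.not_before <= now < grant.expires_at


grant = Grant(
    grant_id="g-1",
    grantee="bank-agent",
    scope=frozenset({"pnl_line", "transaction"}),
    purpose="loan-underwriting",
    not_before=1_700_000_000,
    expires_at=1_700_086_400,  # one day later
)
revoked = set()

allowed = authorize(grant, "bank-agent", "pnl_line", "loan-underwriting", 1_700_000_100, revoked)
out_of_scope = authorize(grant, "bank-agent", "payroll", "loan-underwriting", 1_700_000_100, revoked)
expired = authorize(grant, "bank-agent", "pnl_line", "loan-underwriting", 1_700_086_400, revoked)

revoked.add("g-1")
after_revocation = authorize(grant, "bank-agent", "pnl_line", "loan-underwriting", 1_700_000_100, revoked)
```

Only the first request succeeds. The purpose check here can only compare declared intent; what the counterparty does with the data afterward is outside what this code can enforce.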

This replaces the OAuth-to-a-platform pattern. It also replaces the PDF-export pattern. It is a different trust model from either. The platform does not hold Alice's data and mediate access to it. Alice holds her data and grants access directly. The platform, if one exists, is a service Alice uses to host her node, not a custodian with a commercial stake in gatekeeping her counterparties.

The third property: the representation is native to structure, not flattened to rows

Provenance creates claims about claims. A P&L line cites revenue transactions. A revenue transaction cites a customer record. A customer record cites the entity that vouched for it. Each citation is itself a signed claim, which is itself cited by other claims. The structure is recursive.

A flat table cannot hold this. Foreign keys between tables can express some of it, but the moment a citation needs its own properties (who cited this, when, with what confidence), the foreign key is inadequate. You end up with a join table, which is a relation pretending not to be a relation. You end up with many join tables. You end up with a schema that works around the fact that your representation does not treat relations as first-class.

The property that falls out is that edges, in the graph sense, have to be first-class. An edge between two claims has its own identity, its own properties, its own participation in other edges. An edge is not a pointer from A to B; it is itself an object that can be referenced, signed, and reasoned about directly. Once edges are first-class, the representation is no longer a graph in the traditional sense. It is a hypergraph where edges can themselves be vertices in other edges. A graph about graphs. A metagraph.
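To make "edges as first-class objects" concrete, here is a small sketch in which an edge gets its own content-derived identifier and can therefore be the endpoint of another edge. The `Metagraph` class, its method names, and the example labels are hypothetical, chosen only to illustrate the idea.

```python
import hashlib
import json


def content_id(obj: dict) -> str:
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Metagraph:
    def __init__(self):
        self.nodes = {}  # id -> node body
        self.edges = {}  # id -> {"label", "source", "target", "props"}

    def add_node(self, body: dict) -> str:
        nid = content_id({"node": body})
        self.nodes[nid] = body
        return nid

    def add_edge(self, label: str, source: str, target: str, props=None) -> str:
        # An endpoint may be a node OR a previously added edge.
        for endpoint in (source, target):
            if endpoint not in self.nodes and endpoint not in self.edges:
                raise KeyError(f"unknown endpoint: {endpoint}")
        edge = {"label": label, "source": source, "target": target, "props": props or {}}
        eid = content_id({"edge": edge})
        self.edges[eid] = edge
        return eid

    def edges_about(self, object_id: str) -> list:
        """Edges that have the given node or edge as an endpoint."""
        return [
            eid for eid, e in self.edges.items()
            if object_id in (e["source"], e["target"])
        ]


g = Metagraph()
pnl = g.add_node({"type": "pnl_line", "total_cents": 125000})
tx = g.add_node({"type": "transaction", "amount_cents": 125000})
auditor = g.add_node({"type": "entity", "name": "Example Auditor"})

# The citation is an object with its own properties...
cites = g.add_edge("cites", pnl, tx, {"confidence": "high"})
# ...and another edge can point at the citation itself, not at either endpoint.
attests = g.add_edge("attests", auditor, cites, {"at": "2025-02-01"})
```

In a relational schema the `attests` row would need a join table that references another join table. Here it is one more edge whose target happens to be an edge.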

The fourth property: aggregation is an opt-in commons

The fourth property is less forced by provenance than by what happens when the first three are in place.

Once data lives with its owners and access is granted rather than mediated, the aggregate intelligence that currently accrues privately to silo operators (what products work, how companies move, what patterns predict outcomes) is no longer captured by default. Silo operators capture it today because every read flows through them. If reads flow directly between owners and counterparties under capability grants, nobody is in the middle to aggregate by default.

This is a feature, not a bug. But it implies that the aggregate intelligence, which is genuinely valuable, has to be reconstituted intentionally rather than taken by capture. Participants who want to benefit from aggregate intelligence contribute to it: they commit anonymized slices of their data to a shared pool under cryptographic guarantees that individual values cannot be reconstructed. In exchange, they read back what the aggregate tells them. The pool is a commons, not a product. It is opt-in at every moment. It is not owned by anyone.
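One way such a guarantee could work is additive secret sharing; this piece does not commit to a specific mechanism, so treat the choice as an assumption. Each participant splits its value into random shares, sends one share to each of several aggregators, and only the sum across all participants is ever reconstructed. The modulus, function names, and sample values below are illustrative, and the privacy argument assumes at least one aggregator does not collude with the others.

```python
import secrets

MODULUS = 2**61 - 1  # values and their total are assumed to be smaller than this


def split(value: int, n_shares: int) -> list:
    """Split a value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(all_shares: list) -> int:
    """all_shares[p][k] is participant p's share sent to aggregator k."""
    n_aggregators = len(all_shares[0])
    # Each aggregator sees only one share per participant and publishes a partial sum.
    partials = [
        sum(shares[k] for shares in all_shares) % MODULUS
        for k in range(n_aggregators)
    ]
    # Combining the partial sums reveals the total and nothing else.
    return sum(partials) % MODULUS


# Three participants contribute a private figure (say, monthly churned accounts).
private_values = [120, 340, 95]
all_shares = [split(v, n_shares=3) for v in private_values]
pool_total = aggregate(all_shares)  # equals sum(private_values)
```

Each contributor reads back `pool_total`, which equals the plain sum of the three inputs, while no single aggregator ever held a share that carries information about an individual value on its own.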

This is a different social arrangement from the one we have now, where aggregation is a byproduct of platform custody. Here it is a deliberate public good, contributed to by participants who value what the aggregate offers.

The fifth property: the connective tissue is protocol, not product

The last property is what ties the others together.

If any single actor controls the connective tissue (the content-addressing scheme, the capability grammar, the provenance format, the aggregation protocol), then that actor is the new platform, and the default outcome of piece 1 reasserts itself under a new name. So the connective tissue has to be nobody's in particular. It has to be an open standard, self-hosting, with protocol-level commitments that no single implementor can unilaterally change.

This is a discipline, not a technical requirement. The cryptographic primitives work equally well in a proprietary implementation. What keeps the commons open is that the protocol is published, the implementations are interoperable, exits are protocol-faithful (a participant can take their signed claims and move to a different implementation without loss), and governance of the protocol is distributed across participants rather than held by any one of them.

Without this property, the first four collapse back into the default. With it, the first four compose into something structurally different from what exists today.

Why these five properties, and not others

The properties above were not picked off a menu. They fell out of what piece 2 described. You can get to them from the other direction too.

Three fields converge on the same set of requirements, each from a different starting point.

From the expressiveness side: any domain where relations have structure, which is every domain worth representing, needs a representation in which relations are first-class. Knowledge representation researchers have been saying this in various forms for half a century. The expressive adequacy of a representation is bounded by what it can reason about without loss. Flat tables lose information the moment relations get non-trivial. Ordinary graphs lose information the moment relations need to participate in other relations. The minimum adequate representation for any domain with interesting structure is a hypergraph with reflexive edges. This is not contingent; it is a statement about what the world contains.

From the compression side: shared structure in any corpus wants to be factored into shared objects rather than duplicated across records. Information theory has a clean statement of this. The most compressible representation of a corpus is one in which recurring sub-structure is encoded once and referenced, rather than inlined. In representations where edges are first-class and identities are content-addressed, compression is automatic: two claims that cite the same upstream claim share a pointer, not a copy. Compression is not a feature you add. It is a property of the shape.

From the coordination side: agents acting independently across organizations cannot coordinate synchronously. They need representations in which their actions compose without blocking on a central authority. Monotonic, coordination-free data structures give you this. They are content-addressed, append-only, and compose by union rather than by agreement. Distributed systems research has converged on this pattern independently of knowledge representation, but the shape it produces is the same shape: content-addressed, relational, provenance-carrying.
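The coordination argument, and the compression one before it, can both be seen in a grow-only store keyed by content address. The sketch below is illustrative: `content_id`, `as_store`, and `merge` are hypothetical names, and the store is a plain dictionary. Because a key is a hash of its value, two nodes can never disagree about what a key means, so merging is set union with no conflict resolution.

```python
import hashlib
import json


def content_id(claim: dict) -> str:
    encoded = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def as_store(claims: list) -> dict:
    """Append-only store: content address -> claim."""
    return {content_id(c): c for c in claims}


def merge(a: dict, b: dict) -> dict:
    """Union of two stores. Equal keys imply equal values, so nothing can conflict."""
    return {**a, **b}


# Two organizations record claims independently, with no coordination.
shared_customer = {"type": "customer", "name": "Acme"}
alice_node = as_store([shared_customer, {"type": "invoice", "amount_cents": 40000}])
bob_node = as_store([shared_customer, {"type": "payment", "amount_cents": 40000}])

merged = merge(alice_node, bob_node)

order_independent = merge(alice_node, bob_node) == merge(bob_node, alice_node)
idempotent = merge(merged, alice_node) == merged
```

The merged store holds three claims, not four: the customer record both sides cited is stored once. Merge order does not matter and re-merging changes nothing, which is what lets agents exchange state without agreeing on a sequence first.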

Three fields. Three different motivations. One geometric object at the end: a reflexive directed hypergraph in which edges are first-class, identifiers are content-addressed, claims are signed, and access is mediated by capabilities rather than custody. Call it a metagraph. It is not a preference. It is the attractor the problem converges on from any honest starting point.

Built for the agent as primary reader

There is one more observation worth making, and it is the one that makes the shape urgent rather than merely elegant.

Every graph database in commercial use today was designed to be queried by a human, through a tool a human operates, in service of a human decision. The affordances of those systems (the query languages, the visualization interfaces, the schema tooling) are all shaped by the assumption that a human sits at the end of the query loop, asserting meaning from context the system does not hold.

Agents are not humans. They cannot assert meaning from context the system does not hold. They operate on what the representation gives them. When the representation is lossy (flat rows with implicit relations, foreign keys with implicit semantics, field names that mean subtly different things in different vendors), the agent has to infer, and the inference is where hallucination lives. When the representation is structurally adequate (first-class relations, content-addressed identity, provenance carried through the structure), the agent does not infer, because there is nothing left to infer. It reads.

The shape the problem converges on is, as it turns out, the shape that machines need to read without hallucinating. Not because it was designed for machines in any particular sense. Because the properties that prevent hallucination in a machine reader are the same properties that give a human reader provenance, expressiveness, and compositional access. The agent era is not a new set of requirements. It is the old set of requirements finally taken seriously, because the old reader was generous and the new reader is not.

Graph databases were built to be sold to humans for human use. The shape this piece has been describing is what you get when you build for agents as the primary reader and let humans benefit downstream. The two audiences no longer diverge.

The window

The shape is visible. Content-addressed storage is production-ready. Cryptographic identity has mature standards. Capability-based access control has working implementations. Reflexive hypergraph representations are well-studied. Every primitive required to build the shape exists.

What does not yet exist, commercially at scale, is the composition of the primitives into something coherent. That composition is the window piece 1 referenced. The window is open because the primitives are ready and the composition has not happened. It is closing because the default is compounding. Every agent-mediated transaction that ships without this shape adds to the verification debt: the ambient pain grows, but so does the cost of switching. The window is not open forever.

Someone is going to build this. The properties are forced, the attractor is real, the primitives are mature. The question is who does it, with what commitments, and on what timeline. Those are the questions that determine which world on the far side of the transition we end up in.

The next piece in this series answers those questions.