Authority vs. Execution: Proving What’s Actually Running

“We deployed the new config.”

That statement has two possible meanings:

  1. We sent the config to the cluster
  2. The cluster is actually running the config

Hexarch treats these as separate states, tracked separately, verified separately.

The Data Model

The GatewayNode interface captures both states:

interface GatewayNode {
  id: string;
  clusterId: string;
  status: 'Ready' | 'Starting' | 'Error' | 'Divergent';

  // Authority (desired)
  desiredSnapshotId: string;

  // Execution (applied)
  appliedSnapshotId: string;

  // Reconciliation
  lastHeartbeat: string;
  lastSyncError?: string;
  reconcileReason?: string;

  // Runtime metrics
  metrics: {
    cpu: number;
    memory: number;
    tps: number;
  };
}

The gap between desiredSnapshotId and appliedSnapshotId is the drift. If they match, the node is in sync. If they don’t, the node is divergent.
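
Drift detection reduces to a single comparison. A minimal sketch (the `isDivergent` helper and the narrowed `NodeSyncState` interface are illustrative, not part of the Hexarch codebase):

```typescript
interface NodeSyncState {
  desiredSnapshotId: string;
  appliedSnapshotId: string;
}

// A node is divergent when the snapshot it reports running
// differs from the snapshot authority assigned to it.
function isDivergent(node: NodeSyncState): boolean {
  return node.appliedSnapshotId !== node.desiredSnapshotId;
}
```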

The Cluster Manager UI

In ClusterManager.tsx, the UI renders both states side-by-side:

// Authority (Desired)
<div className="fleet-state fleet-state--authority">
  <p className="fleet-state__value">{cluster.desiredSnap}</p>
  <span className="fleet-state__hash">{cluster.desiredHash}</span>
  <button type="button">Force Global Snapshot</button>
</div>

// Execution (Applied)
<div className="fleet-state fleet-state--execution">
  <span className="fleet-state__pill">{syncedCount} / {totalNodes} NODES VERIFIED</span>
  <button type="button">Runtime Audit Logs</button>
</div>

Operators see both simultaneously. No guessing about what’s actually running.

Cohesion Metrics

Cohesion is the percentage of nodes running the desired configuration:

const cohesionPercent = nodes.length === 0
  ? 0
  : Math.round(
      (nodes.filter(n => n.appliedSnapshotId === desiredSnapshotId).length / nodes.length) * 100
    );

The UI color-codes this:

  • Green (100%): All nodes match authority
  • Yellow (50-99%): Some nodes are syncing or recovering
  • Red (<50%): Significant drift
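
The thresholds above can be sketched as a small mapping function (`cohesionColor` is an illustrative name, not taken from the codebase):

```typescript
type CohesionColor = 'green' | 'yellow' | 'red';

// Map a cohesion percentage to the UI's traffic-light scheme:
// 100% is green, 50–99% is yellow, below 50% is red.
function cohesionColor(percent: number): CohesionColor {
  if (percent === 100) return 'green';
  if (percent >= 50) return 'yellow';
  return 'red';
}
```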

Node Status Matrix

Each node appears in a runtime matrix:

{nodes.map(node => (
  <tr key={node.id}>
    <td>{node.id}</td>
    <td>
      <Badge status={node.status === 'Ready' ? 'healthy' : 'degraded'}>
        {node.status}
      </Badge>
    </td>
    <td>
      <Badge status={node.appliedSnapshotId === desiredSnapshotId ? 'synced' : 'divergent'}>
        {node.appliedSnapshotId === desiredSnapshotId ? 'IN SYNC' : 'DIVERGENT'}
      </Badge>
    </td>
    <td>{node.lastSyncError || '—'}</td>
    <td>
      <Button size="sm" onClick={() => forceSync(node.id)}>
        Force Sync
      </Button>
    </td>
  </tr>
))}

When a node is divergent, the lastSyncError shows why:

  • RELOAD_FAILURE: Incompatible Filter Signature
  • NATS_TIMEOUT: Failed to receive snapshot
  • JVM_MEMORY_PRESSURE: OOM during policy load
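
These error codes could be modeled as a union type so the status matrix can render them exhaustively. A sketch under that assumption (the actual types.ts may represent lastSyncError as a free-form string):

```typescript
type SyncErrorCode =
  | 'RELOAD_FAILURE'
  | 'NATS_TIMEOUT'
  | 'JVM_MEMORY_PRESSURE';

// Human-readable descriptions for the node status matrix.
// Record<SyncErrorCode, string> forces a label for every code,
// so adding a new code without a label is a compile error.
const SYNC_ERROR_LABELS: Record<SyncErrorCode, string> = {
  RELOAD_FAILURE: 'Incompatible Filter Signature',
  NATS_TIMEOUT: 'Failed to receive snapshot',
  JVM_MEMORY_PRESSURE: 'OOM during policy load',
};
```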

Force Reconciliation

When you click “Force Sync” on a node—or “Force Global Snapshot” on a cluster—the UI requires justification:

const [justificationTarget, setJustificationTarget] = useState<string | null>(null);
const [reason, setReason] = useState('');

function handleForceSync(nodeId: string) {
  setJustificationTarget(nodeId);
  // Modal opens, user must enter reason
}

function confirmForceSync() {
  if (!reason.trim()) return;

  // Record the override with justification
  api.forceReconcile({
    target: justificationTarget,
    reason: reason,
    actor: currentUser.id
  });

  setJustificationTarget(null);
  setReason('');
}

The justification is recorded in the audit log. You can’t force-sync without explaining why.
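
The override payload sent to the API could be typed along these lines (a sketch; the field names mirror the forceReconcile call above, and the validation helper is hypothetical):

```typescript
interface ForceReconcileRequest {
  target: string; // node or cluster id being overridden
  reason: string; // operator-supplied justification
  actor: string;  // who issued the override
}

// Reject empty or whitespace-only justifications before the
// request ever reaches the API, mirroring the UI guard.
function validateReconcileRequest(req: ForceReconcileRequest): boolean {
  return req.reason.trim().length > 0 && req.target.length > 0;
}
```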

Cryptographic Verification

In the current types.ts model, snapshots point to artifacts and artifacts carry a hash:

interface ConfigArtifact {
  id: string;
  hash: string;
  // ...
}

interface ConfigSnapshot {
  id: string;
  artifactId: string;
  // ...
}

The Cluster Manager demo UI also surfaces a desiredHash field on the cluster data to make verification visible.

Important nuance: the GatewayNode interface reports desiredSnapshotId and appliedSnapshotId (authority vs. execution), but it does not include a node-reported artifact hash. The UI already distinguishes authority from execution and surfaces verification affordances; full end-to-end cryptographic attestation, however, is currently a demo/UX representation rather than a strict node-level hash comparison in the typed model.
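
For concreteness, here is what strict verification could look like if a node also reported the hash of the artifact it loaded. This is a hypothetical extension (the `AttestingNode` interface and `isVerified` helper do not exist in the current typed model):

```typescript
interface ConfigArtifact { id: string; hash: string; }
interface ConfigSnapshot { id: string; artifactId: string; }

// Hypothetical extension: a node that reports the hash of the
// artifact it actually loaded, not just the snapshot id.
interface AttestingNode {
  appliedSnapshotId: string;
  appliedArtifactHash: string;
}

// A node is verified when the snapshot id matches AND the hash it
// reports matches the hash recorded for that snapshot's artifact.
function isVerified(
  node: AttestingNode,
  snapshot: ConfigSnapshot,
  artifacts: Map<string, ConfigArtifact>
): boolean {
  const artifact = artifacts.get(snapshot.artifactId);
  return (
    node.appliedSnapshotId === snapshot.id &&
    artifact !== undefined &&
    node.appliedArtifactHash === artifact.hash
  );
}
```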

The Three Clusters Demo

The UI ships with demo data showing three clusters:

const INITIAL_CLUSTERS = [
  {
    id: 'eks-prod-us',
    status: 'HEALTHY',
    environment: 'PRODUCTION',
    cohesion: 100,
    nodes: [/* all synced */]
  },
  {
    id: 'gke-stage-eu',
    status: 'SYNCING',
    environment: 'STAGING',
    cohesion: 75,
    nodes: [/* one degraded */]
  },
  {
    id: 'azure-dev-west',
    status: 'DIVERGENT',
    environment: 'DEVELOPMENT',
    cohesion: 33,
    nodes: [/* network issues */]
  }
];

Production is healthy. Staging has a node catching up. Development has drift. This is realistic—and the UI makes it visible at a glance.
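
The demo data sets each cluster's status explicitly, but the statuses are consistent with a simple derivation from cohesion. A hypothetical helper illustrating that relationship (`deriveStatus` is not part of the codebase, and the real status may be computed differently):

```typescript
type ClusterStatus = 'HEALTHY' | 'SYNCING' | 'DIVERGENT';

// Derive a headline status from cohesion, matching the three demo
// clusters: 100% reads as healthy, partial sync as syncing, and a
// majority of drifted nodes as divergent.
function deriveStatus(cohesion: number): ClusterStatus {
  if (cohesion === 100) return 'HEALTHY';
  if (cohesion >= 50) return 'SYNCING';
  return 'DIVERGENT';
}
```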

Why This Separation Matters

Without authority/execution separation:

  • “Deployed” means “sent somewhere”
  • Drift is invisible until something breaks
  • Incident response starts with “what’s actually running?”

With Hexarch:

  • “Deployed” means “verified running on N% of nodes”
  • Drift is visible immediately in the cohesion metric
  • Every node’s state is provable via hash verification

Try It

The Cluster Manager is available at /cluster. Explore the three demo clusters, inspect node-level status, and see how the authority/execution split makes fleet state legible.

When you click “Force Sync,” notice the justification requirement. That’s not a UI annoyance—it’s an audit trail.