Here’s a design problem I don’t have a good answer to, and I’m skeptical of people who claim they do.
As AI systems become more agentic — capable of taking sequences of actions over longer horizons — the standard design response is to add human checkpoints. Require approval before irreversible actions. Surface intermediate states for review. Build in confirmation steps at consequential decision points.
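The checkpoint pattern described above can be sketched in a few lines. This is a hypothetical illustration, not any particular framework's API; `Action` and `run_with_checkpoints` are names invented here, and a real system would gate on richer criteria than a single `irreversible` flag.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    name: str
    irreversible: bool  # consequential actions pause for human review

def run_with_checkpoints(actions: List[Action],
                         approve: Callable[[Action], bool]) -> List[str]:
    """Execute actions in order, pausing for approval before irreversible ones."""
    executed = []
    for action in actions:
        if action.irreversible and not approve(action):
            continue  # human rejected this action: skip it
        executed.append(action.name)
    return executed
```

The friction lives entirely in `approve`: if it's a real person reviewing context, oversight is genuine; if the user wires it to always return `True`, the checkpoint still "exists" while doing nothing.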
This is the right instinct. It’s also producing systems that users are actively circumventing. The checkpoints create friction. Users who trust the system — whether that trust is calibrated or not — find the friction annoying and develop workarounds. They batch-approve without reviewing. They configure systems to minimize interruptions. They create the appearance of human oversight while functionally removing it.
So we have two failure modes. Fully automated systems that users over-trust. Human-in-the-loop systems that users route around. The equilibrium — the design that preserves genuine human judgment without creating friction that defeats the purpose — is something I haven’t seen built successfully yet.
My current thinking, which I hold with low confidence: the right intervention point isn’t the action level, it’s the goal level. Requiring human approval of every action is too granular to be sustainable. Requiring human specification of the goal, the constraints, and the acceptable failure modes before the agent begins might be the right level of abstraction. The human isn’t reviewing every step — they’re specifying the boundary conditions, and the system is constrained to operate within them.
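A minimal sketch of that goal-level shape, under loose assumptions: the human writes a `TaskSpec` once, up front, and the system checks every proposed action against it mechanically, with no per-action human review. All names here (`TaskSpec`, `within_bounds`, the `max_cost` budget standing in for an acceptable failure mode) are illustrative, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TaskSpec:
    goal: str
    constraints: List[Callable[[dict], bool]]  # predicates every action must satisfy
    max_cost: float  # one acceptable-failure bound, e.g. a spending budget

def within_bounds(spec: TaskSpec, action: dict, spent: float) -> bool:
    """True if the proposed action stays inside the human-specified boundary."""
    if spent + action.get("cost", 0.0) > spec.max_cost:
        return False  # would exceed the pre-approved budget
    return all(check(action) for check in spec.constraints)
```

For example, a spec like `TaskSpec(goal="book travel", constraints=[lambda a: a.get("type") != "delete_data"], max_cost=500.0)` lets the agent purchase freely within budget while categorically ruling out one class of action. The human effort moves from reviewing steps to enumerating constraints, which is exactly where the difficulty discussed next comes in.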
The problem with this: it requires users to think carefully about failure modes before they’ve seen the system fail, which is asking people to reason about scenarios they haven’t experienced. That’s a known cognitive difficulty. I don’t have a clean solution to it. I’m watching how researchers approach it and expecting to update.