Training-Time Alignment Sets the Floor. Interaction Design Shapes the Ceiling. Neither Is Sufficient.

Roya Montgomery

Dec 21, 2024

I’ve been arguing that alignment emerges in the interaction layer — that the frontier isn’t better training-time constitutions but better runtime scaffolding and interaction contracts that shape behavior in deployment. The VQA workflow complicated that belief in ways I want to be honest about.

In practice, even with well-designed interaction contracts and explicit prompt constraints, behavior drifted in ways I didn’t predict. Small shifts in prompt phrasing created large variance in output character. Context window interactions were non-intuitive — information earlier in context influenced outputs in ways that weren’t legible from the interface design. Model updates changed behavior subtly in ways that weren’t announced and weren’t immediately detectable.

More fundamentally: there were cases where the model’s internal representations overpowered surface-level scaffolding. Constraints I’d built into the interaction design produced the intended behavior in most cases but failed in specific edge cases, and nothing told the user that the constraint had failed. The failure was silent, which is the worst kind.
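To make the idea of a silent failure concrete, here is a minimal sketch of a post-hoc check that turns a constraint violation into a visible signal. It is not the code from the workflow described here. Every name in it (Constraint, CheckedOutput, check_output, and the two example constraints) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout: none of these come from the actual workflow.


@dataclass
class Constraint:
    name: str
    holds: Callable[[str], bool]  # True when the output satisfies the constraint


@dataclass
class CheckedOutput:
    text: str
    violations: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.violations


def check_output(text: str, constraints: list[Constraint]) -> CheckedOutput:
    """Run every constraint against a model output and record which ones failed."""
    violations = [c.name for c in constraints if not c.holds(text)]
    return CheckedOutput(text=text, violations=violations)


# Example constraints (assumptions, for illustration only).
constraints = [
    Constraint("cites_source", lambda t: "[source:" in t),
    Constraint("under_50_words", lambda t: len(t.split()) <= 50),
]

result = check_output("The image shows a red car.", constraints)
if not result.ok:
    # Surface the failure to the user instead of passing the output through.
    print("Constraint check failed:", ", ".join(result.violations))
```

A check like this only catches failures that can be written as a predicate over the output. It makes some failures loud, but it does nothing about the underlying tendency that produced them.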

This forced a more honest position: you cannot fully compensate for weak underlying alignment through UX design. The interaction layer can shape behavior within a range, but the range is set by the model. If the model has systematic tendencies — toward overconfidence, toward sycophantic agreement, toward confident confabulation — interaction design can dampen those tendencies but not eliminate them.

My revised view: training-time alignment sets a floor, and interaction design cannot meaningfully compensate for a model that falls below it. Interaction design shapes how much of the model’s actual capability and alignment is accessible and usable. Both matter and neither is sufficient. The field tends to argue about which one is primary. The more useful question is how they interact: where interaction design is load-bearing and where it’s decorative.
