As someone who was thrown into AI governance at the peak of its hype cycle, I expected my biggest concern to be compliance theater. And it still is…
But the more I stress-test AI systems, the more I realize the real danger isn’t just regulatory lip service.
It’s the slow erosion of human autonomy, so gradual and imperceptible that by the time we recognize it, we’ll have already handed over control.
Right now, AI governance is largely focused on “aligned AI”: ensuring AI systems behave in ways that are beneficial to humanity.
But we’re ignoring a more fundamental issue:
How do we ensure that humans retain the ability to decide?
Because alignment alone doesn’t guarantee autonomy.
In fact, if we don’t design autonomy-first, alignment might accelerate our loss of control.
The First Legal Glimpse into the Autonomy Problem: Article 22 GDPR
Europe’s closest attempt at an autonomy safeguard is Article 22 of the GDPR, which grants individuals:
“The right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”
This was a first attempt at keeping human decision-makers in the loop, but it has a fatal flaw:
Humans are terrible at overriding AI recommendations.
A recent European Commission study tested this in high-stakes fields like credit lending and recruitment. The findings were damning, to say the least:
People blindly follow AI-generated suggestions, even when AI is biased.
Human oversight didn’t mitigate discrimination; it perpetuated it.
Professionals prioritized company interests over fairness, exposing a gap between theoretical governance and real-world behavior.
If human oversight is this weak at regulating AI in simple tasks, how do we expect it to resist AGI-level influence?
The entire premise of “humans in the loop” as a safeguard against AI is fundamentally flawed. Humans are predictable, influenceable, and easily bypassed. Will that eventually make us the weakest link in AI oversight?
Autonomy by Design vs. Privacy by Design
Privacy professionals are familiar with Privacy by Design:
Data protection embedded from the start rather than retrofitted.
Ensuring user data control throughout the system’s lifecycle.
Preventing privacy from being an afterthought.
Now, apply that to Autonomy:
Protecting human decision-making from the start, not just user data.
Ensuring AI never quietly dictates our choices under the guise of convenience.
Giving humans control over their AI-generated profile, inferences, and cognitive environment.
The truth is that we have already lost control over our data (see the “Can LLMs Unlearn?” series). The next step is losing control over our capacity to make choices.
The Alignment Problem: A Necessary but Incomplete Solution
AI alignment is about ensuring AI acts according to what humans intend for it to do, in ways beneficial to humanity.
AI Safety is broadly divided into two pillars:
1. Alignment: Ensuring AI wants to do what’s best for humans. This mostly concerns ML engineering, with AI companies dedicating entire teams to alignment work.
2. Control: Ensuring humans can regulate and override AI. This is where regulators and policy-makers focus.
For a deeper dive into the basics, the post “What is the alignment problem” is a great place to start.
But who defines “beneficial”?
Right now, AI governance is centered on ensuring that AI follows corporate objectives and governmental policies, and that it aligns with social norms.
At first glance, this sounds right.
But what happens when those in power change?
Can AI Be Too Aligned? The “Preemptive Obedience” Trap
Alignment does not guarantee autonomy. It guarantees obedience.
If alignment is solved before Autonomy by Design, then aligned AI will serve whoever defines “beneficial” first.
Alignment focuses on AI doing what humans want.
AI already predicts what we want before we do.
It adapts to our moods, detects our intentions, and optimizes for engagement.
If AI perfectly aligns with our behavior, it can reinforce our biases before we even recognize them, shape our perception of reality without us realizing it, and remove friction from decision-making until we stop making real decisions.
Alignment, without autonomy, creates “preemptive obedience”: humans never push back because the AI is always “right”.
The alignment problem remains one of the most critical challenges in AI governance. Ensuring that AI systems reliably act in ways beneficial to humanity is not just a technical necessity; it’s an existential one.
However, Autonomy by Design is not a subcategory of alignment; it is a distinct and equally urgent challenge.
In practice, Autonomy by Design sits on the other branch of AI Safety: “Control”. In theory, we should aim to apply similar principles to both sides.
Alignment ensures that AI does what humans intend, but autonomy ensures that humans retain the ability to challenge, override, and think independently of AI systems.
If alignment is solved without autonomy safeguards, we risk creating AI that never disobeys, but neither do we. True AI safety requires both.
Not as competing objectives, but as complementary safeguards against a future where AI is either uncontrollable or unquestioned.
What the main components of an autonomy-first system should be
In the coming weeks, I will expand on how to integrate “Autonomy by Design” into our AI Governance frameworks, using an approach similar to the one we’d take for Privacy by Design (or treating it as a “logical extension” of Privacy by Design in the context of corporate AI Governance).
But here is an initial breakdown of its indispensable factors.
1. Visible AI Profiles:
Users should have access to the inferences that AI makes about them, which shape their personalized experience (“profiling”), especially if those inferences have an impact on decisions affecting them.
Inferred data is shaping user experiences more than explicit data. AI doesn’t just process what we input, it constantly builds a hidden profile of who we are, predicting our preferences, behaviors, and even emotions.
Unlike explicit user-provided data, inferred data is often invisible to the user, yet it determines what content we see, what recommendations we get, and how AI interacts with us.
The Current Problem:
Most users are completely unaware that an AI profile of them exists. Unlike browser cookies or ad trackers (which at least offer minimal visibility through privacy settings), inferred AI data remains opaque and unchallenged. This means:
AI may reinforce biases or incorrect assumptions about users without them knowing.
Users may be profiled inaccurately, leading to unfair treatment in hiring, lending, insurance, or content curation.
The AI-driven digital experience comes to be trusted without question, reducing user agency.
The Solution:
AI systems should include a user-facing AI Profile Dashboard (a minimal sketch follows this list), allowing users to:
See all inferences AI has made about them.
Understand how those inferences shape their experience (e.g., “We recommend this content because you prefer [X]”).
Edit or remove incorrect, outdated, or unwanted inferences.
Set boundaries on what types of profiling AI can perform.
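To make this more concrete, below is a minimal sketch of the data model such a dashboard could sit on. It is illustrative only: the `Inference` and `AIProfileDashboard` names, fields, and methods are assumptions for this example rather than any existing API, and a real system would also need authentication, audit logging, and a way to propagate edits back into the models that consume these inferences.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Inference:
    """A single assumption the system has made about a user."""
    inference_id: str
    attribute: str                     # e.g. "prefers_short_videos"
    value: str                         # e.g. "true"
    source: str                        # which model or signal produced it
    created_at: datetime
    used_for: list[str] = field(default_factory=list)  # e.g. ["feed_ranking"]


class AIProfileDashboard:
    """User-facing view of inferred data: see, explain, remove, restrict."""

    def __init__(self) -> None:
        self._inferences: dict[str, Inference] = {}
        self._blocked_attributes: set[str] = set()  # profiling the user opted out of

    def list_inferences(self) -> list[Inference]:
        """See all inferences the system currently holds about the user."""
        return list(self._inferences.values())

    def explain(self, inference_id: str) -> str:
        """Explain how a given inference shapes the user's experience."""
        inf = self._inferences[inference_id]
        uses = ", ".join(inf.used_for) or "nothing yet"
        return (f"We inferred '{inf.attribute} = {inf.value}' from {inf.source}; "
                f"it currently influences: {uses}.")

    def remove(self, inference_id: str) -> None:
        """Delete an incorrect, outdated, or unwanted inference."""
        self._inferences.pop(inference_id, None)

    def block_attribute(self, attribute: str) -> None:
        """Set a boundary: the system may no longer profile this attribute."""
        self._blocked_attributes.add(attribute)
        # Also drop anything already inferred for the blocked attribute.
        self._inferences = {k: v for k, v in self._inferences.items()
                            if v.attribute != attribute}

    def add_inference(self, inf: Inference) -> bool:
        """System-side entry point: respects user-set boundaries before storing."""
        if inf.attribute in self._blocked_attributes:
            return False               # the user's boundary wins over the model
        self._inferences[inf.inference_id] = inf
        return True
```

The key design choice is that user-set boundaries are enforced at the point where inferences are stored, so a blocked category never enters the profile in the first place.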
2. Real-Time Consent for Inferences:
Users should be informed when AI makes new assumptions, and be able to accept, reject, or modify them.
Users should not just have access to their AI profile; they should have control over its evolution.
If AI makes a new assumption about a user, the user should have the opportunity to challenge or refine that inference before it influences their experience.
The Current Problem:
AI systems silently make new inferences about users without notifying them.
Users have no way to contest misleading or harmful assumptions.
AI-driven profiling can escalate into hyper-personalization that traps users in engagement loops (filter bubbles, political polarization, addictive content consumption).
The Solution:
A real-time consent mechanism (sketched after this list) where users are prompted when a significant new inference is made, with options to:
Approve the inference if it is accurate and beneficial.
Reject or modify it if it misrepresents their preferences or intent.
Delay decision-making until they have more context.
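As a rough illustration, the sketch below assumes a hypothetical `ConsentGate` that holds any inference above a significance threshold in a pending state until the user approves, rejects, or defers it. The class names, the significance score, and the threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"   # the inference is accurate and may be used
    REJECT = "reject"     # the inference is discarded
    DEFER = "defer"       # the user wants more context; it stays inactive


@dataclass
class PendingInference:
    attribute: str
    value: str
    significance: float   # 0..1: how strongly this would change personalization


class ConsentGate:
    """Holds significant new inferences until the user has ruled on them."""

    def __init__(self, significance_threshold: float = 0.5) -> None:
        self.threshold = significance_threshold
        self.pending: list[PendingInference] = []
        self.active: list[PendingInference] = []

    def submit(self, inference: PendingInference) -> None:
        """Minor inferences apply silently; significant ones await consent."""
        if inference.significance >= self.threshold:
            self.pending.append(inference)   # the user will be prompted
        else:
            self.active.append(inference)

    def resolve(self, inference: PendingInference, decision: Decision) -> None:
        """Apply the user's choice before the inference influences anything."""
        self.pending.remove(inference)
        if decision is Decision.APPROVE:
            self.active.append(inference)
        elif decision is Decision.DEFER:
            self.pending.append(inference)   # stays inactive until revisited
        # REJECT: the inference is simply dropped


# Example: a new assumption is proposed and the user rejects it.
gate = ConsentGate()
proposal = PendingInference("political_leaning", "left", significance=0.8)
gate.submit(proposal)
gate.resolve(proposal, Decision.REJECT)
```

How “significance” is scored is the hard part: set the threshold too low and users drown in prompts; set it too high and meaningful inferences slip through silently.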
3. Override Mechanisms:
AI systems should assist and empower users, not coerce them into predetermined choices. Even in perfectly aligned AI systems, humans must retain the final say over their decisions.
The Current Problem:
AI systems default to high-optimization, often steering users toward specific outcomes (e.g., maximizing ad revenue, engagement, or efficiency).
Users often mistake AI suggestions for absolute recommendations, leading to automation bias (blindly trusting AI).
Over time, the ability to critically evaluate AI-driven decisions erodes, making users dependent on automation.
The Solution:
A built-in override mechanism (sketched below) ensuring that:
AI suggests but does not force decisions.
Users can override AI-driven recommendations with minimal friction.
AI provides alternative choices, rather than a single best answer.
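Here is a minimal sketch of the “suggest, don’t force” pattern, under the assumption of a simple ranked-options interface; `Option`, `recommend`, and `decide` are hypothetical names for this example, not a reference to any real recommender API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    name: str
    score: float       # the model's preference, shown to the user but not binding
    rationale: str     # why the model ranks it this way


def recommend(options: list[Option], top_n: int = 3) -> list[Option]:
    """Present several ranked alternatives instead of a single 'best' answer."""
    return sorted(options, key=lambda o: o.score, reverse=True)[:top_n]


def decide(options: list[Option], user_choice: Optional[str] = None) -> Option:
    """The user's explicit choice always overrides the model's ranking."""
    if user_choice is not None:
        for option in options:
            if option.name == user_choice:
                return option                      # override with minimal friction
    return recommend(options, top_n=1)[0]          # otherwise the top suggestion applies


candidates = [
    Option("plan_a", 0.91, "highest predicted engagement"),
    Option("plan_b", 0.74, "lower engagement, lower cost"),
    Option("plan_c", 0.55, "most conservative option"),
]
print([o.name for o in recommend(candidates)])         # AI suggests alternatives
print(decide(candidates, user_choice="plan_c").name)   # the user overrides: plan_c
```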
4. Transparency on Cognitive Steering:
If AI is shaping a user’s behavior (e.g., engagement optimization), this should be explicit, not hidden.
Every major AI-driven platform optimizes for engagement, behavior shaping, and long-term retention. While this is not inherently bad, users should be aware when their actions are being subtly influenced.
The Current Problem:
AI systems nudge user behavior without disclosing it.
Users don’t realize they are being steered toward certain content, beliefs, or behaviors.
This leads to entrenched filter bubbles, manipulation risks, and reinforcement of biases.
The Solution:
AI systems should have transparent indicators showing users when their behavior is being shaped (a sketch follows the list below).
Explainability labels: “Your feed is optimized for engagement” or “This recommendation is based on inferred interests.”
Customizable settings: Users can opt for a neutral feed instead of an engagement-maximized feed.
Bias disclosure: If AI is prioritizing certain types of content, users should be made aware.
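As one possible shape for this, the sketch below attaches a disclosure label to the ranking step itself and exposes a “neutral” mode; `FeedItem`, the two modes, and the label wording are assumptions for illustration, not a description of any existing platform.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    title: str
    relevance: float          # plain topical relevance to the user's stated interests
    engagement_score: float   # predicted clicks / watch time


def rank_feed(items: list[FeedItem], mode: str = "engagement") -> tuple[list[FeedItem], str]:
    """Rank the feed and return a disclosure label describing how it was shaped."""
    if mode == "neutral":
        ranked = sorted(items, key=lambda i: i.relevance, reverse=True)
        label = "Your feed is ordered by topical relevance only (neutral mode)."
    else:
        ranked = sorted(items, key=lambda i: i.engagement_score, reverse=True)
        label = "Your feed is optimized for engagement based on inferred interests."
    return ranked, label


items = [
    FeedItem("Long-form explainer", relevance=0.9, engagement_score=0.4),
    FeedItem("Outrage clip", relevance=0.3, engagement_score=0.95),
]
feed, disclosure = rank_feed(items, mode="neutral")
print(disclosure)                        # shown to the user alongside the feed
print([item.title for item in feed])
```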
Conclusion: Losing Human Autonomy is the bigger risk.
I’ll come out and say this outright: without alignment, Autonomy by Design is unlikely to become as viable a framework as Privacy by Design is today.
A system that isn’t aligned with human values can’t effectively police itself, let alone be trusted to provide transparency to users about cognitive steering.
Without a well-defined alignment framework, AI could reinforce harmful biases while believing it is neutral. It could “rationalize” bias as statistical optimization rather than moral failure.
But, what happens when AI is perfectly aligned but humans stop thinking critically?
What happens when humans trust AI’s reasoning over their own in every domain? (as we saw before, this is already happening).
AI alignment answers what AI should do. Autonomy by Design answers what humans must retain capacity over.
If we don’t balance both, we get either:
An AGI nightmare: AI that ignores human values. Or;
A slow erosion of autonomy: AI that never disobeys, but neither do we.
This is not a future problem.
This is happening now.
We need to keep working on alignment. But if we focus exclusively on it, by the time we solve that problem, it will be too late to ask whether we agree.
Just a very clear note:
I'm not dismissing the importance of alignment work.
It's more about the Governance action gap I see as a lawyer/DPO.
And it is definitely something that people like me (non-engineers) should really be learning about.
Hey Katalina -- This is helpful, though I've been thinking about the implications and feasibility of the dashboard idea in particular, and I'm still not sure, even after reading this post, how it's actually achievable. I'm almost done with my post and will share it with you before I publish.
I'm not just trying to shut down your ideas -- in general, I share your concerns about how little human autonomy is being considered at all -- but part of what makes this whole debate so challenging is that the complexity of systems (algorithms, models, and 'AI' more broadly) often gets reduced to generalities, and those generalities are what guide the rules being made.
Ignoring complexity is one reason I think that "privacy theatre" is so rampant. It's frustrating because we really should be precise when it comes to developing solutions to large-scale problems like ensuring human autonomy or privacy.
I use an analogy in my post, where I suggest substituting 'AI' with 'people' and 'inference/visibility profiles' with 'mind-reading'. In isolation, gaining visibility of inferences for some AI models/people might be achievable, and useful for individuals.
But 'AI', like algorithms or people, isn't a singular thing; often multiple models and systems feed into one another, just as inferences about you may be collective. And when you look at all the systems, models, and algorithms that make behavioral and other inferences about us, which often change dynamically (for a bunch of reasons), it quickly starts to get overwhelming.
Imagine you develop a skill to read minds and deduce the inferences of your friends and random strangers. Even if you can focus on just the thoughts and inferences related to you, there are still lots -- and I'm not sure most of them are all that helpful to know. I argue that the same is true with regard to most AI systems, and that it would quickly become an overwhelming, intrusive nightmare to make people responsible for monitoring, assessing, consenting, or objecting to inferential decisions made about them.
Anyway, I'll share the larger post with you first, and I'll be curious to see how you consider the problem after reading it. You might also get some value out of reading my post on fractal complexity.