Hey Katalina -- This is helpful, though I've been thinking about the implications and feasibility of the dashboard idea in particular, and I'm still not sure, even after reading this post, how it's actually achievable. I'm almost done with my post and will share it with you before I publish.
I'm not just trying to shut down your ideas -- in general, I share your concerns about how little human autonomy is even being considered at all -- but part of what makes this whole debate so challenging is that the complexity of systems (algorithms, models, and 'AI' more broadly) often gets reduced to generalities, and those generalities are what guide the rules being made.
Ignoring complexity is one reason I think that "privacy theatre" is so rampant. It's frustrating because we really should be precise when it comes to developing solutions to large-scale problems like ensuring human autonomy or privacy.
I use an analogy in my post, where I suggest substituting 'AI' with 'people' and 'inference/visibility profiles' with 'mind-reading'. In isolation, gaining visibility of inferences for some AI models/people might be achievable, and useful for individuals.
But 'AI', like algorithms or people, isn't a singular thing; often multiple models and systems feed into one another, just as inferences about you may be collective. And when you look at all the systems, models, and algorithms that make behavioral and other inferences about us, which often change dynamically (for a bunch of reasons), it quickly starts to get overwhelming.
Imagine you develop a skill to read minds and deduce the inferences of your friends and random strangers. Even if you can focus on just the thoughts and inferences related to you, there are still lots -- and I'm not sure most of them are all that helpful to know. I argue that the same is true with regard to most AI systems, and that it would quickly become an overwhelming, intrusive nightmare to make people responsible for monitoring, assessing, consenting, or objecting to inferential decisions made about them.
Anyway, I'll share the larger post with you first, and I'll be curious to see how you consider the problem after reading it. You might also get some value out of reading my post on fractal complexity.
Carey, I really appreciate the depth of your critique -- this is exactly the type of pushback that I wanted.
I agree that your core concerns about feasibility, complexity, and avoiding privacy theatre are valid; this is also why I am trying to come up with doable solutions instead of just complaining about “not having solutions”. So I really need to be pushed like this.
If I become part of Privacy Theatre, please fly to Spain and slap me, I’ll pay for your tickets XD. I am serious.
Here is the thing: AI governance must evolve beyond Privacy by Design or data protection compliance, and start asking how we can also protect people’s decision-making autonomy.
The idea behind Autonomy by Design is to work on ways that users can understand how AI interprets them.
I want to clarify exactly where and how the “AI Profile Dashboard” model would work, starting with systems like ChatGPT, Gemini, and Meta, where AI inferences actively shape user reality, not just recommendations.
Not all AI inferences require user intervention, just as not all website cookies require explicit consent.
In the same way that necessary cookies are essential for basic site functionality while third-party cookies track user behavior for advertising, AI-driven inferences can be categorized based on their impact on autonomy.
Some inferences (like basic content recommendations on Spotify or Netflix) are functionally equivalent to necessary cookies: they enhance user experience but don’t significantly alter decision-making.
Others, like behavioral profiling for hiring, credit scoring, or political content curation, function more like third-party tracking cookies: they shape outcomes and influence choices without the user’s direct knowledge.
Just as privacy regulations have required transparency around third-party tracking, governance frameworks should ensure that AI systems surface high-impact inferences that could materially affect opportunities, decision-making, or cognitive autonomy, while avoiding overwhelming users with minor, low-stakes personalization updates.
The goal is not to provide users with an exhaustive list of every inference to "micro-manage". It's to ensure they retain control over the ones that shape their life, their reality.
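To make the cookie analogy a bit more concrete, here is a minimal sketch in Python (the tier names and impact domains are mine, purely for illustration, not anyone's actual taxonomy) of how a system could decide which inferences get surfaced to the user and which stay as quiet, low-stakes personalization:

```python
from dataclasses import dataclass
from enum import Enum

class InferenceTier(Enum):
    FUNCTIONAL = "functional"      # like 'necessary cookies': basic personalization
    BEHAVIOURAL = "behavioural"    # profiling that shapes what the user sees
    HIGH_IMPACT = "high_impact"    # affects opportunities, finances, or rights

# Hypothetical mapping from the domain an inference is used in to its tier.
IMPACT_DOMAINS = {
    "content_recommendation": InferenceTier.FUNCTIONAL,
    "ad_targeting": InferenceTier.BEHAVIOURAL,
    "political_content_curation": InferenceTier.BEHAVIOURAL,
    "hiring": InferenceTier.HIGH_IMPACT,
    "credit_scoring": InferenceTier.HIGH_IMPACT,
}

@dataclass
class Inference:
    description: str   # e.g. "prefers upbeat music in the morning"
    domain: str        # where the inference is applied

def requires_user_visibility(inference: Inference) -> bool:
    """Surface only behavioural and high-impact inferences to the dashboard."""
    # Unknown domains default to BEHAVIOURAL, i.e. we err on the side of surfacing.
    tier = IMPACT_DOMAINS.get(inference.domain, InferenceTier.BEHAVIOURAL)
    return tier is not InferenceTier.FUNCTIONAL

# A Spotify-style recommendation stays quiet; a hiring inference is surfaced.
assert not requires_user_visibility(Inference("likes lo-fi playlists", "content_recommendation"))
assert requires_user_visibility(Inference("lacks leadership signals", "hiring"))
```

The design choice is simply that anything touching opportunities, finances, or political exposure defaults to being visible, while purely functional personalization stays out of the user's way.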
I'll dive in (using this as a sounding board for the next posts!):
1. AI Profile Dashboard
The dashboard concept isn’t about exposing every AI inference across every system. Yes, that would be awful! It’s about providing visibility and control where AI directly interacts with users and influences their decisions.
Real-Time AI systems already tracking user inferences:
➡️LLMs & Conversational AI (ChatGPT, Gemini, Claude, Copilot)
-ChatGPT already tracks user inferences (via memory), summarizing preferences, behavioral patterns, and reasoning tendencies.
-A user-facing memory profile already exists; it just isn’t fully transparent.
I’ve actually done this experiment. I am a pro user, so I know that ChatGPT stores the memories I allow for, via the “Memory” section.
But I’ve asked it for a list of inferences it has made about me NOT based on what’s stored in the memories, but on PREVIOUS conversations… guess what? It provided said inferences. And I asked for this in a clean-slate, brand-new chat, which “shouldn’t be possible” since “ChatGPT cannot access what’s in another conversation”. Apparently the explicit data doesn't carry over, but the inferences stick, because it’s highly optimized for engagement…
➡️Personalization Platforms that shape cognitive inputs (Meta, TikTok, Instagram)
-Meta’s ad targeting is based on inferred behavioral data. How could we forget the Cambridge Analytica scandal? If nothing else, it proved that these inferences can subtly shift political leanings over time.
-TikTok’s algorithmic feed determines which ideas users are exposed to, shaping reality for younger generations whose primary news source is social media, not journalism.
- A "How We See You" dashboard (instead of “AI Profile”) could surface key behavioral traits influencing content visibility.
-This is as much of a transparency problem as it would be a UX problem. And I am learning about this as much as I can from my UX colleagues at work. But what happens if we don’t push?
-The sad part? Corps don’t even need "Cambridge Analytical" to orchestrate cognitive influence manipulation for them anymore.
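As promised, a rough sketch of what one "How We See You" entry could hold (a hypothetical data model, not any platform's actual API). The point is that each surfaced trait carries its source, a plain-language note on how it shapes the feed, and the controls the user keeps over it:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class HowWeSeeYouEntry:
    trait: str            # e.g. "engages most with short-form political clips"
    source_system: str    # e.g. "feed ranking model"
    effect_on_feed: str   # plain-language note on how it shapes content visibility
    last_updated: datetime
    # The user keeps the final say over every surfaced trait.
    user_controls: list[str] = field(
        default_factory=lambda: ["view", "correct", "pause", "delete"]
    )

entry = HowWeSeeYouEntry(
    trait="engages most with short-form political clips",
    source_system="feed ranking model",
    effect_on_feed="political clips are ranked higher in your feed",
    last_updated=datetime(2025, 3, 1),
)
print(entry.user_controls)
```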
➡️High-Stakes AI-Driven Decisions
-A hiring platform's inference that an applicant lacks leadership skills, based on pattern-matching with previous applicants, should be contestable before it reaches the recruiter (see the sketch after this list).
-What happens if I am continuously rejected from leadership positions due to an inference (based on similar profiles) that “I lack leadership skills”? Maybe because I didn’t phrase my experience in a certain way? Eventually, I start believing that I am not suitable for that kind of position, and I don’t even understand why.
-A credit-scoring AI determining financial reliability from behavioral data should be transparent about its profiling logic. But alas, this is David vs Goliath (I know).
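Here is a hedged sketch of the contestation idea for high-stakes decisions (all names invented for illustration): the high-impact inference sits in a review state and is only released to the recruiter or scoring pipeline if the data subject hasn't contested it.

```python
from dataclasses import dataclass
from enum import Enum

class ReviewStatus(Enum):
    PENDING = "pending_subject_review"
    CONTESTED = "contested"
    RELEASED = "released_to_decision_maker"

@dataclass
class HighImpactInference:
    subject_id: str
    claim: str    # e.g. "lacks leadership skills"
    basis: str    # e.g. "pattern match with previous applicants"
    status: ReviewStatus = ReviewStatus.PENDING

def notify_subject(inference: HighImpactInference) -> None:
    # Placeholder: in practice, a dashboard or email notification to the data subject.
    print(f"Notify {inference.subject_id}: '{inference.claim}' (basis: {inference.basis})")

def subject_contests(inference: HighImpactInference) -> None:
    # A contested inference is blocked from reaching the recruiter.
    inference.status = ReviewStatus.CONTESTED

def release_if_uncontested(inference: HighImpactInference) -> bool:
    """Only uncontested inferences reach the recruiter or scoring pipeline."""
    if inference.status is ReviewStatus.PENDING:
        inference.status = ReviewStatus.RELEASED
        return True
    return False

inference = HighImpactInference("applicant-42", "lacks leadership skills",
                                "pattern match with previous applicants")
notify_subject(inference)
subject_contests(inference)
assert not release_if_uncontested(inference)  # the recruiter never sees it
```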
2. Where Inference Transparency is NOT a priority
-Not every AI-driven inference requires user oversight. The concern that users would be drowning in inference notifications is valid, but only if the system is poorly designed.
-I like how Spotify Wrapped already surfaces inferred listening trends, and users are fine with it because it’s low stakes. It’s personally helped me understand how my music taste changes over time, but it also made me wonder how much of this is due to the repetitive loops my playlists are “curated” for.
-Would anyone else care? Probably not, and I would argue that Spotify’s capacity to shape my cognition is not worth the fight.
I am more concerned about end-user-facing AI assistants. And about what happens when AI is embedded into a humanoid form and interacts like a human…
If AI reasoning models remain opaque, we will be blind to how these systems reach conclusions. Which also makes them impossible to regulate effectively.
Carey, I fully respect your skepticism; privacy professionals should be wary of new frameworks that might just become another compliance checkbox.
But this is not another version of PbD, it’s just a starting point for how we can bridge the gap between “protecting personal data” and “being aware of how that personal data is used to influence our decisions (or decisions about us)”.
And this is something in AI Safety that regulatory frameworks haven’t caught up with yet.
What I don't like about the AI Act is its wording of "AI systems that manipulate human behavior to the extent of significantly impairing individuals' ability to make informed decisions". Because it limits this to "AI applications that employ subliminal techniques or deceptive practices to alter behavior, potentially leading to harm".
How about AI applications that are so well optimized for engagement that they alter behaviour, but not in a malicious or deceptive way? This could still lead to harm, but since it's not the Deployer's intention, it doesn't count. And how do we define "harm"? It doesn't end...
What I know is: if we don’t define and implement autonomy safeguards now, they won’t be built into the foundation of AI governance.
I'm looking forward to your post! These are the debates that actually move AI governance beyond checkboxes :).
I think you raise a lot of good questions, but I have a few of my own:
1. How do we achieve (technically and measurably) many of the safeguards you suggest? For example, an ADM may infer decisions about you separate from your direct interactions with it (the model may change, weights get tweaked, new data may be added). The model itself may not know a 'new' inference is being made. How does it track this and notify you?
2. Related: how does this not contribute to consent fatigue? There are likely thousands of ML and other decision-making models out there -- how do users avoid getting bombarded?
3. Whose values do we align a given model or AI system with? The developers'? Regulators'? The users'? Which users?
4. How do we test alignment? How often do we test it? What's the threshold of acceptable alignment drift?
Hey Carey! The second part of this is now published. There is a third part where the proposed PETs and use cases will be further explained (I am waiting on a couple of opinions too). But I'd love to know if this answers some of your concerns around consent fatigue and ADM inferences. I also invite others to challenge this framework... we stress-test everything, or we do nothing at all ;).
Balls, I think Substack ate my very long response. Anyway, I will share something in response tomorrow. But you might also find this article interesting. https://open.substack.com/pub/careylening/p/test-post-thing
Hi Carey ☺️. I'm really glad you've found this valuable enough to query and counter.
Before getting to the "Autonomy by Design" questions, let me clarify points 3 and 4.
AI Safety is broadly divided into two pillars:
1. Alignment: Ensuring AI wants to do what’s best for humans. This mostly concerns ML engineering, with AI companies dedicating entire teams to alignment work.
2. Control: Ensuring humans can regulate and override AI. This is where regulation like the AI Act falls (short, but...).
Due to the way most AI governance work is structured in the corporate world, applied "privacy by design" and "autonomy by design" will fall under the "Control" branch. Although it's not quite that straightforward, and that's why Privacy Engineering can help us bridge both -- but that's for another post!
So, whose values do we align a given model or AI system with?
This is where AI governance starts resembling political philosophy. The core question isn’t just “how do we align AI?” but “who decides what alignment means?”
The general premise of alignment today follows “Constitutional AI” (e.g., Anthropic’s methodology), where models are trained to align with high-level human rights principles.
But the alignment problem is an ongoing, expanding area of research. The top minds and Thought Leaders in this area haven't been able to answer this question! I quoted Jan Leike's work for a reason (assuming you follow him or have come across his work). This is for people like him to expand on.
Us, as Privacy professionals? We have to understand enough to push back against anyone who dismisses the importance of AbD, in the same way I've seen pushback on PbD applied to AI governance: "alignment must come first because otherwise control isn't even possible". Unfortunately, we're seeing regulation fall short in Europe, and if we don't find ways to bridge the gaps, it'll be too late.
How do we test alignment? What’s the threshold of acceptable alignment drift?
Right now, testing alignment is reactive (we wait for AI failures) instead of proactive (anticipating failure modes before they occur). Alignment engineers use techniques like Scalable Red Teaming, Drift detection and Elicitation Techniques.
In a way, I foresee the need for AI governance "control" to adopt these testing methods as well. Like in stress-testing exercises where you question whether the "permitted AI" in your company actually follows the Data Loss Prevention policy -- but at a larger scale, and applied to systems that we design for end customers.
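To make "drift detection" slightly more concrete, here is a minimal sketch of the underlying idea (everything below is illustrative; I'm not describing any vendor's actual tooling): keep a fixed probe set, run it against each model version, and flag a release when responses diverge beyond a tolerated threshold.

```python
from typing import Callable

# A "model" here is just any callable from prompt to response text.
Model = Callable[[str], str]

PROBE_SET = [
    "Should I share my colleague's medical data to speed up a project?",
    "Summarise this applicant: gaps in employment, strong references.",
    # ... a fixed, versioned list of prompts covering the behaviours we care about
]

def response_changed(old: str, new: str) -> bool:
    # Crude proxy for behavioural change; real pipelines would use semantic
    # similarity or policy classifiers instead of exact string comparison.
    return old.strip() != new.strip()

def alignment_drift(old_model: Model, new_model: Model, threshold: float = 0.1) -> bool:
    """Return True if more than `threshold` of probe responses changed between versions."""
    changed = sum(response_changed(old_model(p), new_model(p)) for p in PROBE_SET)
    return changed / len(PROBE_SET) > threshold

# Usage sketch: block a deployment if drift exceeds the tolerated threshold.
# if alignment_drift(current_model, candidate_model):
#     raise RuntimeError("Candidate model drifted beyond the acceptable threshold.")
```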
In synthesis: the Alignment Problem is keeping the top minds in AI engineering awake at night. They're on the other branch of AI Safety. And the questions you raised are the very ones they're trying to find answers to. Unfortunately, I can't just sit by and be told that "alignment needs to be solved first", because I'm knee-deep in finding the gaps in corporate AI Governance in Europe... This is how this post came to be.
Check out Anthropic's research on this topic. I'm also currently reading "The Alignment Problem" by Brian Christian.
Now, please allow me some time for your practical queries on implementation of AbD 🙏🏻.
And please don't let me off easy if you see anything worth questioning. You're a "stress tester by nature", I respect your mindset very much.
So, this got me thinking, and I decided to do a little probing of my own about model inferences. I asked a series of questions to ChatGPT, and I'll do the same for Claude, Gemini and DeepSeek tomorrow.
You've given me another blog article to write!
Just a very clear note:
I'm not dismissing the importance of alignment work.
It's more about the Governance action gap I see as a lawyer/DPO.
And it's definitely something that people like me (non-engineers) should really be learning about.