Media Summary: The previous video explained why it's *possible* for trained models to end up with the wrong goals, even when we specify the ... This "Alignment" thing turns out to be even harder than we thought.

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think... - Detailed Analysis & Overview

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...
The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment
We Were Right! Real Inner Misalignment
[25/34] Deceptive Alignment
Alignment faking in large language models

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

The previous video explained why it's *possible* for trained models to end up with the wrong goals, even when we specify the ...

The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

This "Alignment" thing turns out to be even harder than we thought. Links: The Paper: https://arxiv.org/pdf/1906.01820.pdf ...

We Were Right! Real Inner Misalignment

... Alignment Problem: Mesa-Optimizers and Inner Alignment: https://youtu.be/bJLcIBixGj8

[25/34] Deceptive Alignment

But there is also the other problem, uniquely applicable to future AGIs, general beyond a certain level, a problem more sinister ...

Alignment faking in large language models

Most of us have encountered situations where someone appears to share our views or values, but is in fact only pretending to do ...