Deceptive Misaligned Mesa Optimisers It

Media Summary: The previous video explained why it's *possible* for trained models to end up with the wrong goals, even when we specify the ... This "Alignment" thing turns out to be even harder than we thought.

# Links
The Paper: ...
Alignment Problem: Mesa-Optimizers and Inner Alignment:


But there is also the other problem, uniquely applicable to future AGIs, general beyond a certain level, a problem more sinister ...

Most of us have encountered situations where someone appears to share our views or values, but is in fact only pretending to do ...