Media Summary: Host Jeremie Harris and the latest guest on the podcast ... Part 1 of a series of talks in which researcher ... “We purposely build or discover situations where models might be behaving in misaligned ways”
Evan Hubinger The Inner Alignment - Detailed Analysis & Overview
AI systems are increasingly embedded in our workplaces and our homes. They judge our skills, our values, and sometimes our ...
Ep 111: Might we be surrounded with undetected minds?
If an AI system learned a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training ...
Most of us have encountered situations where someone appears to share our views or values, but is in fact only pretending to do ...