AI Alignment (2024 Jun) Project

Avoiding jailbreaks by discouraging their representation in activation space

Guido Ernesto Bergman • Top submission • October 2024