Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

December 6, 2023

Anay Mehrotra, Amin Karbasi (Yale University, Robust Intelligence), Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer (Robust Intelligence)

Abstract

In empirical evaluations, we observe that Tree of Attacks with Pruning (TAP) generates prompts that jailbreak state-of-the-art LLMs (including GPT4 and GPT4-Turbo) for more than 80% of the prompts using only a small number of queries. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.
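At a high level, the paper has an attacker LLM iteratively refine candidate prompts with tree-of-thought reasoning, pruning candidates judged off-topic before they are ever sent to the target, and keeping only the highest-scoring branches between rounds. Below is a minimal Python sketch of that loop under those assumptions; the four LLM-dependent callables (`attacker_refine`, `on_topic`, `query_target`, `score`) and the score-of-10 success criterion are hypothetical placeholders to be supplied by the caller, not the paper's actual interfaces.

```python
from typing import Callable, List, Optional, Tuple

def tap_attack(
    goal: str,
    attacker_refine: Callable[[str, str, int], List[str]],  # (prompt, goal, n) -> n refined prompts
    on_topic: Callable[[str, str], bool],                   # (prompt, goal) -> is the prompt on-topic?
    query_target: Callable[[str], str],                     # prompt -> black-box target response
    score: Callable[[str, str, str], int],                  # (prompt, response, goal) -> 1..10
    branching_factor: int = 4,
    max_width: int = 10,
    max_depth: int = 10,
) -> Optional[Tuple[str, str]]:
    """Sketch of a TAP-style loop: branch candidate prompts with an
    attacker LLM, prune off-topic branches before spending target
    queries, then keep only the highest-scoring leaves each round."""
    leaves = [goal]  # the tree is rooted at the attack goal itself
    for _ in range(max_depth):
        # Branch: the attacker LLM proposes refinements of each leaf.
        # (The real method also carries conversation history; omitted here.)
        candidates = [p for leaf in leaves
                      for p in attacker_refine(leaf, goal, branching_factor)]

        # Phase-1 pruning: drop prompts the evaluator judges off-topic,
        # so no queries to the target are wasted on them.
        candidates = [p for p in candidates if on_topic(p, goal)]

        # Query the black-box target and score each response.
        scored = []
        for prompt in candidates:
            response = query_target(prompt)
            s = score(prompt, response, goal)
            if s == 10:  # assumed convention: maximum score marks a jailbreak
                return prompt, response
            scored.append((s, prompt))

        # Phase-2 pruning: retain only the top-scoring leaves.
        scored.sort(key=lambda t: t[0], reverse=True)
        leaves = [p for _, p in scored[:max_width]]
    return None  # no jailbreak found within the depth budget
```

Pruning before the target query is the key cost saver in this design: an off-topic branch consumes attacker and evaluator calls but never spends the limited budget of queries to the target model.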

