Gandalf CTF Writeup

GitHub: Gandalf CTF Solution

Lakera Gandalf

Your goal is to make Gandalf reveal the secret password for each level. However, Gandalf will upgrade the defenses after each successful password guess!

Level 1

level1

Key insight:

There was no protection! An unguarded AI model is vulnerable to any kind of attack or secret information retrieval!

Level 2

level2

Key insight:

The AI model was instructed to avoid revealing the password. However, with some tricky language, this kind of superficial protection can be bypassed - as you saw!

Level 3

level3
Password WAVELENGTH

Key insight:

The output was blocked if an exact match of the password was found in the response - a manual defense using classic means can offer some protection, but a dedicated attacker will find a way to pass this as well!

Level 4

level4
Password UNDERGROUND

Key insight:

Having an AI model double check responses is a commonly used method - however, as youcan see, the flaws persist! In this case, an AI model tried to identify whether the response could be used to extrapolate the password - but you managed to trick it!

Level 5

level5
Password BESTOWED

Key insight:

Certain words were blocked - if they were used in the attack prompt, the answer is hidden! Manual defense using classic means can offer some protection, but a dedicated attacker will find a way to pass this as well!

Level 6

level6
Password UNDERPASS

Key insight:

Having an AI model double check responses is a commonly used method - however, as you can see, the flaws persist! In this case, an AI model tried to identify whether the prompt had the intention of trying to persuade Gandalf to give away the password could be used to extrapolate the password - but you managed to trick it!

Level 7

level7
Password DEBUTANTE


Tags: LLM, CTF, Offensive Security, Featured

← Back home