Solving CTFs with AI Agents

Its 2025. The dawn of AGI is upon us. Binaries are just tokens. Code is just tokens. Symbolic execution is a brittle memory of the past. Manual reverse engineering is an ancient art form. Embrace the future… embrace … tokens.

ctf

nfuncs

I played DEF CON 33 Quals this year with SuperDiceCode (placing 2nd thanks to my strong teammates!).

I spent a lot of time on the nfuncs challenge, written by fish, a known symbolic execution lover and one of the core devs of angr. (note: this writeup may be blasphemous for those who prefer static analysis tools with strong guarantees, you may want to avert your eyes)

The n<challenge> naming scheme is a callback to the original ncuts challenge. It hints that the objective here is not to solve one hard problem, but rather to automatically solve thousands of mini problems. In previous years, these mini problems have all been fairly easy to solve by hand, but it would be too time consuming and tedious to solve them all.

Unsurprisingly, this year was the same, where the objective was to decode a huge binary consisting of a Matryoshka-like unpacking process.

This writeup summarizes my attempt at solving the challenge in a fairly general way using an o3-mini powered agent. This solution was enough for nfuncs1, but proved too slow for nfuncs2. We realized the encoding scheme was repetitive enough to solve with a heuristic-based script too late, and thus only managed to decode 25% of part 2 by the end of the CTF.

However, I think this type of solution path is interesting. o3-mini was more capable than I expected, and as more models come out, costs and latencies are reduced, I think these types of solutions will become much more common for automation tasks.

Overview

The challenge provides us with a 3 GB Windows executable (which Windows 11 even refuses to run).

After loading it into Binary Ninja (hint: disable automatic linear sweep!), we see that most of the binary is just random data, except for a few functions including this one:

Function: sub_140001480 :: Initial unpacker function
1400014aaenum PAGE_PROTECTION_FLAGS var_24
1400014aaVirtualProtect(sub_140001510, 0xdaf0, PAGE_EXECUTE_READWRITE, &var_24)
1400014b6int64_t var_20 = -0x6e7019b8ad0c0e43
1400014bbint64_t i = 1
1400014c0*sub_140001510 = 0xe8
1400014e0do
1400014d3    *(i + sub_140001510) ^= *(&var_20 + zx.q(i.d & 7))
1400014d6    i += 1
1400014e0while (i != 0xdaf0)
1400014e0
1400014f2VirtualProtect(sub_140001510, 0xdaf0, var_24, nullptr)
140001500return sub_140001510()

This function decrypts sub_140001510 by first calling VirtualProtect to make memory writable. It then decrypts the memory using an XOR key and calls VirtualProtect again to revert memory to read-only. Finally, it jumps to the newly decrypted function.

Decoding this by hand, we uncover sub_140001510 :

Function: sub_140001510 :: Function 1
140001520int64_t _DstBuf = 0
14000152cint64_t s
14000152c__builtin_memset(&s, 0, 0x100)
140001694int64_t* i_5 = &s
1400016a1int64_t* i = &s
1400016aevoid var_20
1400016aedo
1400016a4    *i = 0
1400016a7    i += 1
1400016aewhile (i != &var_20)
1400016ae
1400016b0int64_t* i_1 = &s
1400016bddo
1400016b3    *i_1 = 1
1400016b6    i_1 += 1
1400016bdwhile (i_1 != &var_20)
1400016bd
1400016bfint64_t* i_2 = &s
1400016ccdo
1400016c2    *i_2 = 2
1400016c5    i_2 += 1
1400016ccwhile (i_2 != &var_20)
1400016cc
1400016e7char var_121
1400016e7for (int64_t i_3 = 1; i_3 != 0x101; i_3 += 1) (&var_121)[i_3] = i_3.b
1400016e9int64_t* i_4 = &s
140001706do
1400016fc    *i_4 = 2 - &s + i_4.b
1400016ff    i_4 += 1
140001706while (i_4 != &var_20)
140001706
14000171edo
140001715    *i_5 = 0xd - zx.b(&s) + i_5.b
140001717    i_5 += 1
14000171ewhile (i_5 != &var_20)
14000171e
14000173a_read(0, &_DstBuf, 1)
14000174f_read(0, &_DstBuf:1, 1)
140001764_read(0, &_DstBuf:2, 1)
140001779_read(0, &_DstBuf:3, 1)
14000178e_read(0, &_DstBuf:4, 1)
1400017a3_read(0, &_DstBuf:5, 1)
1400017b8_read(0, &_DstBuf:6, 1)
1400017cd_read(0, &_DstBuf:7, 1)
14000185fif(*(&s + zx.q(_DstBuf.b)) != 0x96 || *(&s + zx.q(_DstBuf:1.b)) != 0x5d || *(&s + zx.q(_DstBuf:2.b)) != 0x5b || *(&s + zx.q(_DstBuf:3.b)) != 0x54 || *(&s + zx.q(_DstBuf:4.b)) != 0x1a || *(&s + zx.q(_DstBuf:5.b)) != 0x17 || *(&s + zx.q(_DstBuf:6.b)) != 0x27 || *(&s + zx.q(_DstBuf:7.b)) != 0x17)
140001861    int16_t _Buffer_1 = 0x283a
140001868    var_121 = 0
140001872    return puts(&_Buffer_1)
14000185f
14000187eint16_t _Buffer = 0x293a
140001885char var_135_1 = 0
14000188fputs(&_Buffer)
1400018c8int64_t var_130 = rol.q(ror.q(rol.q(0x9d6c1c5b + _DstBuf, 0xd) ^ 0xdeadbeefcafebabe, 7) ^ 0x123456789abcdef0, 5)
1400018e7enum PAGE_PROTECTION_FLAGS var_134
1400018e7VirtualProtect(sub_14000f000, 0x12280, PAGE_EXECUTE_READWRITE, &var_134)
140001914for (int64_t i_6 = 0; i_6 != 0x12280; i_6 += 1) *(i_6 + sub_14000f000) ^= *(&var_130 + zx.q(i_6.d & 7))
140001930VirtualProtect(sub_14000f000, 0x12280, var_134, nullptr)
140001936return sub_14000f000()

This function does some preparation, then calls _read 8 times, and validates the 64-bit value that is read in, printing either “:)” ( 0x293a ) or “:(” ( 0x283a ) depending on whether the value is correct.

In this case, the lookup table s performs a mapping from x to x + 0xd. In the main branch at 14000185f , we see each byte, mapped through this lookup table, results in the sequence 965d5b541a172717. Subtracting 0xd from each byte, we get the input \x89PNG\x0d\x0a\x1a\x0a (which happens to be the start of a PNG file).

Next, on line 1400018c8 , it performs some bit manipulation on this value to generate an XOR key, and uses it to decrypt the next function.

Looking at the next function, we see a similar pattern:

Function: sub_14000f000 :: Function 2
14000f01cchar _DstBuf
14000f01c_read(0, &_DstBuf, 1)
14000f02echar _DstBuf_1
14000f02e_read(0, &_DstBuf_1, 1)
14000f040char _DstBuf_2
14000f040_read(0, &_DstBuf_2, 1)
14000f052char _DstBuf_3
14000f052_read(0, &_DstBuf_3, 1)
14000f064char _DstBuf_4
14000f064_read(0, &_DstBuf_4, 1)
14000f076char _DstBuf_5
14000f076_read(0, &_DstBuf_5, 1)
14000f088char _DstBuf_6
14000f088_read(0, &_DstBuf_6, 1)
14000f09achar _DstBuf_7
14000f09a_read(0, &_DstBuf_7, 1)
14000f0e1if((_DstBuf | _DstBuf_1 | _DstBuf_2) != 0 || _DstBuf_3 != 0xd || _DstBuf_4 != 0x49 || _DstBuf_5 != 0x48 || _DstBuf_6 != 0x44 || _DstBuf_7 != 0x52)
14000f0e3    int16_t _Buffer_1 = 0x283a
14000f0ea    char var_11_1 = 0
14000f0f4    return puts(&_Buffer_1)
14000f0e1
14000f100int16_t _Buffer = 0x293a
14000f107char var_25_1 = 0
14000f111puts(&_Buffer)
14000f148int64_t var_20 = rol.q(ror.q(rol.q(_DstBuf.q + 0x241c85de, 0xd) ^ 0xdeadbeefcafebabe, 7) ^ 0x123456789abcdef0, 5)
14000f167enum PAGE_PROTECTION_FLAGS var_24
14000f167VirtualProtect(sub_140021280, 0x631e, PAGE_EXECUTE_READWRITE, &var_24)
14000f194for (int64_t i = 0; i != 0x631e; i += 1) *(i + sub_140021280) ^= *(&var_20 + zx.q(i.d & 7))
14000f1b0VirtualProtect(sub_140021280, 0x631e, var_24, nullptr)
14000f1b6return sub_140021280()

Here, on line 14000f0e1 , we find a direct comparison to 0000000d49484452 (\x00\x00\x00\rIHDR), a continuation of the PNG file, describing the size and type of the first chunk (the IHDR chunk).

We also see the same bit manipulation pattern as before on line 14000f148 , where the XOR key is generated.

By the fourth function, we see some different structures:

Function: sub_14002759e :: Function 4
1400275c2char _DstBuf
1400275c2_read(0, &_DstBuf, 1)
1400275d4char _DstBuf_1
1400275d4_read(0, &_DstBuf_1, 1)
1400275e6char _DstBuf_2
1400275e6_read(0, &_DstBuf_2, 1)
1400275f8char _DstBuf_3
1400275f8_read(0, &_DstBuf_3, 1)
14002760achar _DstBuf_4
14002760a_read(0, &_DstBuf_4, 1)
14002761cchar _DstBuf_5
14002761c_read(0, &_DstBuf_5, 1)
14002762echar _DstBuf_6
14002762e_read(0, &_DstBuf_6, 1)
140027640char _DstBuf_7
140027640_read(0, &_DstBuf_7, 1)
140027687if(_DstBuf != 8 || _DstBuf_1 != 6 || (_DstBuf_2 | _DstBuf_3 | _DstBuf_4) != 0 || _DstBuf_5 != 0x8d || _DstBuf_6 != 0x77 || _DstBuf_7 != 0xe)
140027689    int16_t _Buffer_1 = 0x283a
140027690    char var_19_1 = 0
14002769a    return puts(&_Buffer_1)
140027687
1400276a6int16_t _Buffer = 0x293a
1400276adchar var_35_1 = 0
1400276b7puts(&_Buffer)
1400276bdint64_t var_30 = 0
1400276c6int64_t var_28_1 = 0
1400276e7sub_1400273a0(&var_30, _DstBuf.q + 0x6274eb3f)
140027703enum PAGE_PROTECTION_FLAGS var_34
140027703VirtualProtect(sub_14003a3be, 0x1b692, PAGE_EXECUTE_READWRITE, &var_34)
140027730for (int64_t i = 0; i != 0x1b692; i += 1) *(i + sub_14003a3be) ^= *(&var_30 + zx.q(i.d & 0xf))
14002774cVirtualProtect(sub_14003a3be, 0x1b692, var_34, nullptr)
140027752return sub_14003a3be()

Here there is a similar user input check at 140027687 . But now, the XOR key is 16 bytes (indicated by the 0xf at 140027730 ). Also the XOR key is now generated by a cryptographic-looking subroutine sub_1400273a0 , seen below:

Function: sub_1400273a0 :: Function 4 XOR key generator
1400273b2uint64_t result = arg2
1400273b5int64_t s
1400273b5__builtin_memset(&s, 0, 0x100)
140027504int32_t rdx = 1
140027509int32_t r8 = 1
140027573do
140027513    int32_t rcx_1 = (r8 * 2) ^ r8
140027516    r8.b s>>= 7
14002751e    r8 = (r8 & 0x1b) ^ rcx_1
140027524    uint64_t rdx_1 = zx.q(rdx ^ (rdx * 2))
14002752d    int32_t rcx_4 = (rdx_1 << 2).d ^ rdx_1.d
140027534    int32_t rdx_4 = rcx_4 << 4 ^ rcx_4
140027538    int32_t rcx_5
140027538    rcx_5.b = rdx_4.b s>> 7
14002753e    rdx = rdx_4 ^ (rcx_5 & 9)
14002756b    *(&s + zx.q(r8.b)) = rol.b(rdx.b, 1) ^ rol.b(rdx.b, 4) ^ rdx.b ^ rol.b(rdx.b, 2) ^ rol.b(rdx.b, 3) ^ 0x63
140027573while (r8.b != 1)
140027573
140027575s.b = 0x63
140027579char* i = arg1
14002759ado
140027583    char rdx_6 = *(&s + zx.q(result.b))
140027587    *i = rdx_6
140027590    result = result u>> 4 ^ zx.q(rdx_6)
140027593    i = &i[1]
14002759awhile (i != &arg1[0x20])
14002759a
14002759dreturn result

In general, we observe that the binary contains functions which read 8 bytes, validate the user input, and then use the input somehow to generate an XOR key (either 8 or 16 bytes), which is then used to decrypt the next function:

Importantly, we can also tell if the XOR key is correct by looking at whether the decompiled function code is sensible. For example, the following is a function decoded with an incorrect XOR key:

Function: sub_1400aa04e :: Incorrect XOR key
1400aa04eint64_t rbx
1400aa04erbx:1.b = 0xca
1400aa050*(rbx - 0x7024a7ab)
1400aa05aint64_t rax
1400aa05a*(*(rax - 0x1a))()
1400aa05cundefined

In particular, we will check if there is a call to VirtualProtect as an oracle for whether the XOR key is correct.

Thinking with tokens

From manually decoding the first few functions, we learn that the overall objective of the challenge is to decode each function consecutively, solving 8-bytes of a PNG file along the way. There is some variety in the function templates, but overall they have a similar structure.

A year or two ago, I might have tried dynamic tracing or symbolic execution to hook the functions and then solve for the user input. But that sounds boring…

Instead of all that, we’re going to build an agent that will solve each function for us!

Behold:

agent

Flow

Based on our initial observations, most functions are self-contained and consist of a separate user-input check, followed by an XOR key derivation. We split the flow up into two main steps:

  1. In the first step, we instruct the LLM to analyze the _read calls, deduce the constraints on the user input, and determine the correct 64-bit integer.
  2. In the second step, we then instruct the LLM to figure out the XOR key derivation and produce an XOR key (along with its size).

We can then validate the output of the agent by performing the decryption and checking if the decompiled function is sensible (i.e. contains a call to VirtualProtect ).

In both steps, we also provide the LLM with the ability to invoke the run_python tool which will run arbitrary Python code in a jupyter-notebook style cell. LLMs are not good at precise mathematical reasoning, but are capable of writing Python programs to compute things when instructed to do so.

Prompts

The full first prompt is the following:

You are an expert CTF player.
Your task is to understand a function in a decompiled binary for a CTF challenge.

The function consists of two steps:
1. 8-bytes of user input are read into a buffer and treated as a 64-bit little-endian integer.
2. This integer is manipulated in a series of arithmetic operations, to produce an XOR key.
3. The XOR key is used to deobfuscate another part of memory.

For now, work on the first step:
1. Identify the `_read` calls to find where the user input is read into a buffer.
2. There should be 8 bytes read, but the buffer is treated as a single 64-bit little-endian integer.
3. Determine the correct user input (as a 64-bit integer).

You have access to the `run_python` tool, which executes a python notebook cell and returns the output.
Do not attempt to perform any calculations yourself, let the `run_python` tool do that.
Do not perform any kind of brute force search, the validation is quite simple, just figure out how to recover the user input directly.

Each byte of the user input should be constrained to a specific value.
If the partial condition is (x == ) or (x != ), then that byte should be .
You can perform a search if you need to, but check each byte individually on the partial condition.

When you have identified the correct user input, run the `UserInput` tool to submit your answer.
Provide the user input as a little-endian hex integer with the 0x prefix.

[CODE]
{code}
[/CODE]

We inject the HLIL (as text) inside the [CODE] tags. Also if the function has any subroutine calls, we include their HLIL inside a [HELPER_FUNCTIONS] tag appended to the end of the prompt.

Once the Agent performs the UserInput call, indicating they have recovered the user input, we then append the second prompt:

Now that you have identified the correct user input, you need to determine the XOR key.

The XOR key will be used in the loop after the VirtualProtect call.

Work step by step:
1. Analyze the code to identify how the user input is used to calculate the XOR key.
2. Carefully examine the logic and translate it to equivalent python code 1 to 1.
3. Use the `run_python` tool to run the python code and calculate the XOR key.

If there are any helper functions, be careful to keep track how the function arguments and return values are used.

When you have identified the correct XOR key, run the `XORKey` tool to submit your answer.
Provide the XOR key as a little-endian hex integer with the 0x prefix.
Also return the key_size (number of bytes) as an integer.
Check the & condition inside the unpacking loop to determine how many bytes are used.

If it is helpful, here is the same code represented in MLIL.
Make sure to pay attention to the number and order of arithmetic operations.

[MLIL]
{mlil}
[/MLIL]

We observed that sometimes, the agent wasn’t quite able to translate nested expressions correctly into python so we also provide the MLIL representation of the function (where each expression like rol and ror is on a new line) to improve reasoning ability.

Examples

Here are some example traces of the agent solving the first 5 functions with a variety of schemes:

Func 1: Func 1 - Shifted input + Bit manipulation XOR

Func 2: Func 2 - Direct input + Bit manipulation XOR

Func 3: Func 3 - Shifted input + Bit manipulation XOR

Func 4: Func 4 - Direct input + Subroutine S-box XOR key expansion (agent was able to discover S-box on its own)

Func 5: Func 5 - Bit checking input + Subroutine S-box XOR key expansion (agent wrote a python-based solver to solve the user input check and then used the same S-box key expansion to solve the XOR key)

What was really cool to see was that the o3-mini was able to deduce that a function like the following was implementing AES S-box generation and then using that to expand the user input into a 16-bit XOR key. (See Func 4 and 5 above for an example)

Function: sub_1400273a0 :: Function 4 XOR key generator
1400273b2uint64_t result = arg2
1400273b5int64_t s
1400273b5__builtin_memset(&s, 0, 0x100)
140027504int32_t rdx = 1
140027509int32_t r8 = 1
140027573do
140027513    int32_t rcx_1 = (r8 * 2) ^ r8
140027516    r8.b s>>= 7
14002751e    r8 = (r8 & 0x1b) ^ rcx_1
140027524    uint64_t rdx_1 = zx.q(rdx ^ (rdx * 2))
14002752d    int32_t rcx_4 = (rdx_1 << 2).d ^ rdx_1.d
140027534    int32_t rdx_4 = rcx_4 << 4 ^ rcx_4
140027538    int32_t rcx_5
140027538    rcx_5.b = rdx_4.b s>> 7
14002753e    rdx = rdx_4 ^ (rcx_5 & 9)
14002756b    *(&s + zx.q(r8.b)) = rol.b(rdx.b, 1) ^ rol.b(rdx.b, 4) ^ rdx.b ^ rol.b(rdx.b, 2) ^ rol.b(rdx.b, 3) ^ 0x63
140027573while (r8.b != 1)
140027573
140027575s.b = 0x63
140027579char* i = arg1
14002759ado
140027583    char rdx_6 = *(&s + zx.q(result.b))
140027587    *i = rdx_6
140027590    result = result u>> 4 ^ zx.q(rdx_6)
140027593    i = &i[1]
14002759awhile (i != &arg1[0x20])
14002759a
14002759dreturn result

It was able to then generate Python code like the following to solve it:

def aes_sbox():
    # Standard AES S-box
    return [
        0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
        0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
        0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
        0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
        # (... rest of the S-box)
    ]


def generate_key(user_input):
    # user_input is a 64-bit integer, little endian representation was given.
    # Step 1: compute arg2 = user_input + 0x6274eb3f
    arg2 = user_input + 0x6274eb3f
    result = arg2
    s = aes_sbox()
    out_bytes = []
    # Generate 32 bytes as per loop
    for i in range(32):
        idx = result & 0xff
        b = s[idx]
        out_bytes.append(b)
        result = (result >> 4) ^ b
    return out_bytes

# Given user input:
user_input = 0x0e778d0000000608

key_schedule = generate_key(user_input)
# The XOR loop later uses only 16 bytes: index = i mod 16 into key_schedule
xor_key = key_schedule[:16]

# Represent xor_key as a little-endian hex integer.
# Since xor_key is a sequence of 16 bytes in order, that order is little-endian order.

val = 0
for i, b in enumerate(xor_key):
    val |= (b << (8 * i))

print(f"XOR key (little-endian hex): 0x{val:032x}")
print(f"key_size: {len(xor_key)}")

Takeaways

During the actual CTF, we used this agent to solve around the first 100 functions (left it running overnight Saturday), at a cost of around $20.

However, after examining the decoded functions, we realized that many of them were repetitive enough to solve with a heuristic-based script and ended up switching to that for the rest of the challenge which was much faster (and cheaper).

The agent did obviate the need to actually manually reverse engineer any decoding schemes. I ended up just stealing one its solutions for the S-box key expansion and for the bit checking user input (Func 4 above).

Later in part two of the challenge, a few new schemes were introduced, including an embedded RC4-based user input checker, which o3-mini was able to solve by recognizing the RC4 pattern and writing the correct Python code to decode it.

Code

You can find the code for the agent here: hgarrereyn/nfuncs-agent.

This is messy CTF proof-of-concept code, but it may serve useful as an example.