Sunday 7 January 2024

[re:educate] How to Pwn Buffer Overflow - Overwriting Variable

This article was written by an MCC alumnus, Wan Muhammad Khairuddin from Universiti Teknikal Malaysia Melaka (UTeM). Wan is a skilled CTF player with a good understanding of the reverse engineering and pwn categories. He has noticed a growing number of CTF fans among students and would like to share the basics of solving challenges in the pwn category.

Have you ever heard of buffer overflow? Have you tried to exploit one? Or perhaps you have no idea what it is or what you can do with it?

Buffer overflow is a common challenge in Capture the Flag (CTF) competitions under the binary exploitation, or pwn, category. If you have ever played a CTF, you might have encountered this category without knowing how to solve it.

In this article, we will explore a vulnerability called stack buffer overflow and how to exploit it. Let's start with the basics of memory and the stack, and what happens at a low level when a program is running. With this information, we will be able to understand how to detect this vulnerability, what causes it, and how we can exploit it to overwrite a variable's value.

Intro

A buffer overflow condition exists when a program attempts to put more
data in a buffer than it can hold or when a program attempts to put
data in a memory area past a buffer. — OWASP Foundation

Okay, what is that?

Buffer overflow is a vulnerability found in software or a compiled binary, caused by the use of an insecure function that takes user input. Most of the time, the binary is written in C and the developer used a function such as gets(), strcpy() or scanf() (with an unbounded %s) to read or copy the user input.

What is wrong with these functions?

We often use these functions in school projects to collect user input, so we might not be aware that they can cause a security problem.

Insecure function

As mentioned, gets(), strcpy() and scanf() are insecure functions. Why? Let’s talk about that.

These functions are not secure because they store user input in a variable without checking how much the variable can hold. In other words, they can take a long user input and store it in a variable that is smaller than the input.

For example, we may create a variable that stores a string in 24 characters (bytes), including the terminating null byte.

char name[24]; // variable to store name. only 24 characters !!

Then we use the insecure function gets() to take user input and store it inside the name variable.

gets(name); // take user input and store it in name variable

The gets() function does not know that the name variable can only store 24 characters. It will take as much input as the user types until they press Enter, and store all of it starting at name. The same goes for the other functions mentioned, such as strcpy() or scanf() with an unbounded %s.

Now that we know the behaviour of these insecure functions, what happens to that long user input? Well, it will be stored in the name variable, BUT only the first 24 characters fit, while the remaining characters overflow into the adjacent memory region!

In other words, it will overflow and overwrite other variables. This is what we call a stack buffer overflow. So, if we enter 100 “A” characters, only 24 of them will be stored in name and the other 76 will overflow into the neighbouring memory region.

Overwrite variable

Now that we know this bug allows our user input to overwrite other data in memory, we can take advantage of it to overwrite other variables that exist in the program.

Let’s look at the code below as an example:

#include <stdio.h>
#include <string.h>

int main() {
    char name[]="Mike"; // name variable contains "Mike"
    char buffer[10];

    printf("Enter a string: ");
    gets(buffer); // take user input

    printf("name variable contains: %s\n", name); // print name variable
    return 0;
}

The code above has two variables: name, which has the value “Mike”, and buffer, which can only store 10 characters. The code then asks for user input using the insecure gets() function and stores it in the buffer variable. Finally, the code prints the name variable. If we run the program and enter “hello” as user input, what will the output be?

Correct, it will print out “Mike”.

But what if we enter 16 "A"s, “AAAAAAAAAAAAAAAA”? Do not be surprised that the name variable no longer stores “Mike”. Now the output will be “AAAAAA”. Mind blown!

What??

To understand that, we need to understand the “stack”, the memory region that stores local variables and return addresses. Every function call in a program has its own memory layout, which we call a “stack frame”. Inside this stack frame, the values of the variables are stored on top of each other. For example, the main() function has 2 variables, so their values are stored inside main()’s stack frame. The stack frame looks like the following:

+-----------------------+
|     return address    | // ignore this for now
+-----------------------+
|function's base address| // ignore this for now
+-----------------------+
|         Mike          | <- value for name variable
+-----------------------+
|                       | <- value for buffer variable. only 10 characters!!
+-----------------------+
|                       |

As the diagram above shows, the values of the variables are stored in memory one on top of another. When we enter “hello”, it is stored below the name variable. But when we enter “AAAAAAAAAAAAAAAA”, only 10 "A"s fit inside the buffer variable, while the rest overflow into the name variable and overwrite its value.

It is worth noting that input is written from bottom to top (from lower to higher memory addresses), which is why the "A"s overflow upwards and not downwards. Now if we enter “AAAAAAAAAACCCCCC”, the output that we will get is “CCCCCC”. You can try running this and see it for yourself, though the exact result depends on how your compiler arranges the variables on the stack.

+-----------------------+
|     return address    | // ignore this for now
+-----------------------+
|function's base address| // ignore this for now
+-----------------------+
|        CCCCCC         | <- value for name variable
+-----------------------+
|      AAAAAAAAAA       | <- value for buffer variable. only 10 characters!!
+-----------------------+
|                       |

The challenge

Now that we understand the theory, let’s try a challenge. Without looking at the solution, let’s make the following program print “You win!”.

#include <stdio.h>

int main() {
    int secret = 0x12341234; // Initialize the secret variable
    char text[] = "I love cats";
    char buffer[10];
    
    puts("Enter: ");
    gets(buffer);

    if (secret == 0x4f4f4f4f) {
        printf("You win!\n");
    } else {
        printf("You lose :( your secret is %x, not 0x4f4f4f4f\n", secret);
    }

    return 0;
}

Tip: Find out how many characters we need to enter to completely fill the buffer variable and overwrite the text variable before we can overwrite secret.

Solution

Got it? Would you like to verify that your solution is correct? Let’s dive in together.

To solve this challenge, we need to build our own exploit payload. To build the payload, we have to calculate the sizes of buffer and text.

size of buffer = 10
size of text = 11 # including spaces
size of buffer + size of text + 1 char for null byte = 22

So, our payload needs 22 characters to fill the buffer variable and completely overwrite the text variable (assuming, as in the diagrams, that the variables sit next to each other with no padding in between). But this is not enough since our main goal is to overwrite secret, so we need to add another 4 characters (0x4f4f4f4f is 4 bytes, or 4 characters). So what does 0x4f translate to as an ASCII character? It’s ‘O’, so 0x4f4f4f4f is “OOOO”. Finally, our payload should look like this:

AAAAAAAAAAAAAAAAAAAAAAOOOO

where ‘A’ is a junk character that we can replace with any other character.

Got it? Congratulations!

Conclusion

Stack buffer overflow is an impactful vulnerability, and we can do more with it than just overwrite a variable. An attacker can exploit it to achieve Remote Code Execution (RCE). This article explains only a small part of the buffer overflow attack; there is more to it, such as overwriting the return address to change the execution flow of the vulnerable program.

But before we go further towards achieving RCE, it is crucial to have a basic understanding of things such as what the stack looks like. Being able to visualise the stack is very helpful when exploiting this vulnerability.

That’s all, thank you!
