So far I have referred to 'flow control data'. In this post I want to explain what that really means.
A program, like our authentication example, is not just a big list of instructions, one following after the other. A program makes decisions: it compares data and performs different tasks according to the result. This is achieved through what are called jumps. A jump is a point where the program either 'jumps' to another part of the program or carries on as normal; a conditional jump makes that choice based on some program data. These jumps are sometimes called branches.
A very common type of branch is the function call. When writing a program it is often useful to compartmentalise the semantic purpose of the program into little chunks. These little chunks are called functions. For example, you might have a function that calculates a Christmas bonus based on the salary and performance of your employees. Rather than writing the same calculation over and over for each employee, you would simply write a Christmas bonus function, and call that function with information from each employee. That way you only have to write the calculation once.
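As a rough illustration of the Christmas bonus idea, here is a minimal Python sketch. The formula (a performance rating from 1 to 5 treated as a percentage of salary) and the employee figures are invented purely for the example.

```python
def christmas_bonus(salary: int, rating: int) -> int:
    """Return the bonus in whole pounds.

    Assumption for this sketch: the rating (1 to 5) is used directly
    as a percentage of the salary.
    """
    return salary * rating // 100


# Hypothetical employees: (name, salary, performance rating).
employees = [
    ("Alice", 40000, 5),
    ("Bob", 30000, 2),
]

# The calculation is written once and called once per employee.
bonuses = {name: christmas_bonus(salary, rating) for name, salary, rating in employees}
```

Each call to `christmas_bonus` is a jump into the function and, once it finishes, a jump back to the line that called it.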
These function calls happen all the time in pretty much every program ever written. We are interested in exactly how a function is called in a running process. When a program reaches a function call, the first thing it does is make a note of what is called a return address. This is necessary because when the function completes (in our example, when the Christmas bonus has been calculated) the process needs to return control to whatever the program was doing just before the function call.
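To make the idea of a return address concrete, here is a toy model in Python. It is not how a real processor is programmed: the instruction names (`work`, `call`, `ret`, `halt`) and the tiny program are made up for this sketch, and the 'addresses' are just positions in a list.

```python
def run(program):
    """Run a toy program and return the order in which instructions executed."""
    pc = 0                 # index of the instruction being executed
    return_addresses = []  # notes made at each call: where to resume afterwards
    trace = []
    while True:
        op, target = program[pc]
        trace.append(pc)
        if op == "call":
            return_addresses.append(pc + 1)  # resume just after the call
            pc = target                      # jump into the function
        elif op == "ret":
            pc = return_addresses.pop()      # jump back to the caller
        elif op == "halt":
            return trace
        else:                                # "work": carry on as normal
            pc += 1


program = [
    ("work", None),  # 0
    ("call", 4),     # 1: call the function that starts at index 4
    ("work", None),  # 2: execution resumes here after the function returns
    ("halt", None),  # 3
    ("work", None),  # 4: body of the function
    ("ret", None),   # 5: return to the stored return address
]

trace = run(program)
```

The trace shows control going 0, 1, into the function at 4 and 5, then back to 2, the instruction just after the call.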
So when the function completes, it checks the return address and jumps back to the calling procedure, just after its own function call, so that the program can continue on as it was. This return address is the target of most classic buffer overflow attacks. When we overflow our buffer as in Part 2, it is often the case that a return address (of some kind) is overwritten by the overflowing data. It is overwritten with whichever part of our input data happens to land on the location in memory where the return address was stored.
When the function completes and attempts to jump back to the calling procedure, it reads the (now incorrect) value stored in the return address and jumps to this new memory location. This is why the program crashes with a segfault: the program tries to jump to an address that is outside its allocated memory, and the operating system will not allow it. (Accidentally overwriting a return address has only a very small chance of producing a value that does not cause this kind of crash, as the memory allocated to a process is comparatively tiny in relation to the total potential address space.)
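Here is a small Python simulation of that crash. It only models memory as a byte array, it does not touch real process memory. The 20-byte buffer is the username buffer from Part 2; the 8-byte address size, the address values and the 'allocated' range are all assumptions chosen for the sketch.

```python
BUFFER_SIZE = 20          # username buffer, as in the Part 2 example
ADDRESS_SIZE = 8          # assumed size of a stored return address
PROCESS_START = 0x400000  # hypothetical range of memory the process owns
PROCESS_END = 0x500000


def make_frame(return_address: int) -> bytearray:
    """Model the buffer with the return address stored right after it."""
    return bytearray(BUFFER_SIZE) + bytearray(return_address.to_bytes(ADDRESS_SIZE, "little"))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy the input byte by byte without checking it against BUFFER_SIZE."""
    for i, byte in enumerate(data):
        frame[i] = byte


def read_return_address(frame: bytearray) -> int:
    stored = frame[BUFFER_SIZE:BUFFER_SIZE + ADDRESS_SIZE]
    return int.from_bytes(stored, "little")


def can_jump(address: int) -> bool:
    """Stand-in for the operating system's check; False models a segfault."""
    return PROCESS_START <= address < PROCESS_END


frame = make_frame(0x400123)
unchecked_copy(frame, b"A" * 28)  # 20 bytes fill the buffer, 8 more spill over
corrupted = read_return_address(frame)
crashes = not can_jump(corrupted)
```

After the copy the stored return address is made entirely of `A` bytes (0x41), which falls far outside the range the toy process owns, so the jump is refused.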
So what we can now do, as a hacker, is supply a long username (taking our authentication example from Part 2) that causes a carefully chosen address to be written over the return address. Whatever we point that address at is what the program will run once it jumps there. But what do we jump to? This is the tricky part.
We now have the power to run any set of instructions we like, provided they can already be found somewhere in the memory available to the process. So we could jump to some other procedure and possibly cause some harm, but we are still confined to executing instructions that were already present in the program.
Or are we? Remember that we have just written a great big long username into memory, and only the tail end of it is actually doing anything useful (the part that has been carefully crafted with our desired return address). We can use the rest of it to store a sequence of instructions, and then point the return address to the beginning of that sequence. Now we have total control of the process!
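The layout of such an input can be sketched in Python as plain byte arithmetic. Everything here is a stand-in: the buffer address is hypothetical, the 8-byte little-endian address format is an assumption, and the 'instructions' are just placeholder text rather than real machine code.

```python
BUFFER_SIZE = 20             # username buffer, as in the Part 2 example
ADDRESS_SIZE = 8             # assumed size of a stored return address
BUFFER_ADDRESS = 0x7FFC0000  # hypothetical address where the buffer starts


def build_input(instructions: bytes) -> bytes:
    """Lay out: instructions, padding up to the buffer size, then the new address."""
    if len(instructions) > BUFFER_SIZE:
        raise ValueError("instructions do not fit in the buffer")
    padding = b"\x00" * (BUFFER_SIZE - len(instructions))
    # The tail end lands on the return address and points back at the
    # start of the buffer, where the instructions now sit.
    new_return_address = BUFFER_ADDRESS.to_bytes(ADDRESS_SIZE, "little")
    return instructions + padding + new_return_address


toy_input = build_input(b"INSTRUCTIONS")  # placeholder bytes, not real code
tail_address = int.from_bytes(toy_input[BUFFER_SIZE:], "little")
```

The point of the sketch is only the shape of the data: the front of the input carries the instructions and the last few bytes carry the address of the front.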
The sequence of instructions is called the shellcode, because classically it has been used to spawn a local or remote shell. Writing shellcode from scratch is quite complex and involved, and I plan to address it in a later part of this tutorial (I highly recommend The Shellcoder's Handbook for those who want to learn more). For now just be convinced that we can A) cause the remote process to jump to any location in memory that we choose, and B) write a sequence of instructions to memory to execute any (albeit small) program we like.
Monday, 28 March 2011
Shellcode Tutorial - Part 2
So we've got a very blurry image of what's going on. In this post I will try to solidify that a little.
In practical terms, a buffer overflow vulnerability appears when a programmer forgets to check the length of user-supplied data. So let's say you're logging into a website, and whoever programmed the process that authenticates the username and password you supply forgot to check the lengths of these two things. You could supply a really long username, thousands of characters long. But why is this a problem?
This is a problem because of the way the computer (that's the remote server doing the authentication, not the computer in front of you) stores program data. It must 'write down' the username you give it, and to do that it has to allocate a certain amount of space for it. This amount of space is hard-coded, and most programmers will think something along the lines of: allocate 20 characters' worth of space for the username, and write whatever the user types in to that space. So if we type in a username longer than 20 characters, what happens? Well, we crash the program. Our long username gets written to the allocated space, fills it, and continues to overflow into other areas of memory, overwriting whatever was previously stored there. This is almost always catastrophic for the running program and will usually cause it to crash with a segfault.
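The overflow itself can be pictured with a small Python model. This is only a simulation over a byte array (Python checks bounds for you, so the raw problem normally shows up in lower-level languages); the 20-byte size comes from the example above, and the `NEIGHBOUR` bytes stand in for whatever happens to be stored next to the username.

```python
USERNAME_SIZE = 20  # space the programmer set aside for the username


def new_memory() -> bytearray:
    """The username slot followed directly by some other data."""
    return bytearray(USERNAME_SIZE) + bytearray(b"NEIGHBOUR")


def store_unchecked(memory: bytearray, username: bytes) -> None:
    """The mistake: write every byte without looking at the length."""
    for i, byte in enumerate(username):
        memory[i] = byte


def store_checked(memory: bytearray, username: bytes) -> None:
    """The fix: refuse anything longer than the space set aside."""
    if len(username) > USERNAME_SIZE:
        raise ValueError("username too long")
    store_unchecked(memory, username)


memory = new_memory()
store_unchecked(memory, b"A" * 24)  # 4 bytes more than the slot can hold
neighbour_after = bytes(memory[USERNAME_SIZE:])

try:
    store_checked(new_memory(), b"A" * 24)
    rejected = False
except ValueError:
    rejected = True
```

With the unchecked version the first four bytes of the neighbouring data are replaced by `A`s; with the length check in place the same input is simply rejected.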
So how do hackers exploit this simple little programming error? Well, remember that in Part 1 I said the fundamental issue is that program data (in our example, the username) and flow control data are stored side by side in memory. Flow control data is information that the program uses internally. It is not data that any user, or even the programmer, is likely to see, but a running program needs it to keep track of what it is doing. So when our long username overflows its buffer and writes to other areas of memory, sometimes it will overwrite flow control data. It is in these cases that a skilled hacker can take control of the process, steering it in exactly the direction he wants.
Shellcode Tutorial - Part 1
This is the first of several posts I am going to write, giving a (hopefully) quite easy explanation of a buffer overflow vulnerability, exploit, and payload (that last part is what people call the shellcode, but we'll get to that).
The fundamental problem at the root of buffer overflow vulnerabilities is that in any computer that operates a stack (and that's just about any machine you could care to think of), program variables are mixed in with flow control data. This means that program data (anything from names and email addresses to the number of cats in New York City) is stored alongside flow control data (memory addresses, jumps, library calls etc).
So what does this all mean? It means that if you can jiggle around the program data in a program running on a remote machine, you might just be able to cause it to write over some 'nearby' flow control data, and if you do it just right, you can take control of the process entirely.
That's all for Part 1, just a very abstract and brief overview of what happens during a buffer overflow attack (an attack, by the way, that has been at the heart of many major computer breaches).
Home of MySQL Hacked.....BY SQL INJECTION!
As ironic as it sounds, the homepage of MySQL (mysql.com) has been broken into via a web script vulnerability on their website. The script allowed unsanitised input to be passed to the database and processed as SQL! Talk about putting your foot in it!
Hello World
Welcome to Silver Shadow Musings, a place where I hope to share my thoughts and expertise in the world of Information Security, Ethical Hacking and Penetration Testing!