0 - INTRODUCTION
- Format string vulnerabilities arise when user-supplied data reaches a program in a position where it is interpreted as a "format string". Format strings are the character sequences used by input and output functions to specify how a set of data is converted to and from a string of characters.
- The vulnerability therefore results from the program interpreting input data as instructions or commands of the format language itself, rather than as plain data. Possible consequences of an attack include the execution of arbitrary code, reading or dumping the stack (disclosure of protected information) and denial of service, all of which affect the security and stability of the system.
- Although C and C++ are especially prone to format string attacks, other programming languages such as Perl, PHP, Java and Ruby are also vulnerable to them.
- The links below offer information about format string attacks in languages other than C/C++:
http://www.drdobbs.com/security/programming-language-format-string-vulne/197002914
http://www.drdobbs.com/security/programming-language-format-string-vulne/197002914?pgno=2
http://www.drdobbs.com/security/programming-language-format-string-vulne/197002914?pgno=3
http://www.drdobbs.com/security/programming-language-format-string-vulne/197002914?pgno=4
- The C language has been used to solve this exercise because it is traditionally the language that has suffered the most format string attacks.
- To explain the attack, the following concepts should be noted:
a) Format function: a function of the ANSI C standard library that converts primitive variables of the language into a human-readable string representation. printf, fprintf, sprintf, etc. are examples of format functions in C. These functions are variadic, that is, they accept a variable number of arguments. As will be seen below, the arguments are of two types: on the one hand, the argument that specifies the output format; on the other hand, the values to be formatted.
b) Format string: the argument of the format function that specifies the format. It is an ASCIIZ string (terminated by the character with ASCII code 0) composed of ordinary text and format parameters, and its purpose is to specify and control how the variables are represented. For example: printf("Today is November %d.\n", 22);
c) Conversion character: the parameter that defines the type of conversion performed by the format function. For example: %d (integer: reads an integer argument), %f (floating point: reads a real number), %c (char: a single character), %s (string: reads a pointer and prints the character string stored at that memory address), %x (reads a value, taken from the stack on 32-bit x86, and prints it as a hexadecimal number), etc.
- As mentioned above, the attack is carried out by supplying maliciously crafted input that the program does not adequately validate, so that the format function behaves differently than expected.
- Compared with stack overflow attacks, both attacks make malicious use of the stack. However, while a stack overflow attack is specifically designed to overwrite the contents of the stack and force the program to execute arbitrary instructions, a format string attack uses C conversion characters such as %s and %x so that the format function interprets values already on the stack as if they were the arguments those conversion characters expect.
- A stack overflow attack usually alters the program's control flow, so that it executes instructions leading to outputs different from its original purpose. The format string attacks studied in this exercise, by contrast, aim either at disclosing information stored in certain memory locations or at causing a denial of service (DoS). Both cases are examined in detail in the following example.
1 - FORMAT STRING ATTACK - EXAMPLE - Disclosure of information and DoS
- To illustrate the format string attack, a simple program (fs.c) written in C will be used.
- The original purpose of the program is simply to print on the screen the argument entered by the user on the command line.
- However, the program contains proprietary information ("INFORMACION 1" and "INFORMACION 2") that can be disclosed through a format string attack.
- Editing and compiling the program in a Linux environment:
- Let's examine the program's code. Two pointers (s1 and s2) are defined, pointing to memory locations that store certain reserved information:
- In addition, an instruction has been included whose only aim is to print the argument argv[1] supplied by the user on the command line:
printf(argv[1]);
- Thus, the program is not at all intended to "reveal" the information stored in s1 and s2, but simply to print on the screen the argument entered by the user.
- However, let's see how manipulating the command-line input can produce a result very different from the expected one.
- First, let's enter an ordinary string as the command-line argument. It is printed on the screen, in accordance with the program's intended purpose:
- Now let's see what happens when the user enters conversion characters as the argument.
- Entering %s:
- Entering %s%s:
- Entering HOLA%s%s:
- It can be observed that entering the conversion character %s on the command line causes information disclosure (information leakage), so the program behaves differently from its original purpose.
- As defined in the program, the pointers s1 and s2 are stored on the stack in consecutive positions, next to the argument argv[1] passed to printf from the command line.
- Thus, on receiving the conversion character %s as input, printf takes the nearest word on the stack as the address of a string and prints that string. On receiving %s%s, it does the same with the two consecutive pointers stored on the stack.
- To verify these concepts and analyze the contents of the stack, the GNU debugger gdb will be used on the program.
- The disassembly of the main function shows that the call to printf occurs at memory address 0x08048433:
- A breakpoint is set just before the call to printf, at the address 0x08048433 found above:
- The program is run with the input argument HOLA:
- The contents of the stack are examined:
- The arguments received by the printf routine are stored at the lowest memory addresses of the stack:
- From low to high memory, the stack has the following contents, with esp pointing to address 0xbffff598, whose content refers to the string HOLA entered on the command line:
- This explains why, as printf reads the arguments stored on the stack, they are printed consecutively on the screen. In this case only HOLA is printed, because the program was run with the gdb command (gdb) run HOLA:
- Now, what would happen if the input contained many more %s parameters than there are meaningful string pointers on the stack? printf would start interpreting unrelated stack words as addresses, printing strange characters:
- Finally, entering even more %s conversion characters makes the program fail (segmentation fault), because it tries to access invalid addresses. For example:
- The same result can be seen in gdb:
- The program terminates with a SIGSEGV signal, which indicates an invalid memory access (segmentation fault).
- In this last case the attack results in a denial of service (DoS), since the program crashes without fulfilling the purpose for which it was written.
- Another useful conversion character for format string attacks is %x, which reads values from the stack and prints them in hexadecimal.
- Let's see what happens when the program is run in gdb with four %x conversion characters:
- At that point in the execution, the contents of the stack are:
- The program runs to completion, and its output matches the contents of the stack, except for the first memory location (whose content is explained next):
- The first stack location (the lowest memory address) holds the format string argument itself, that is, it refers to the conversion characters %x%x%x%x entered as the parameter:
- The same result can be observed by running the program directly from the command line with four %x conversion characters written in the form 0x%08x. The result is a dump of the stack contents.
- That is, the values stored on the stack, several of them memory addresses, are obtained in hexadecimal format:
- Since the program's output matches the contents of the stack, it can be concluded that the information disclosure attack has been successful.