Security-specific Programming Errors (Part 1)
Thomas Biege
Table of Contents
- Introduction/Motivation
- C
Introduction/Motivation
"What does security have to do with me as a
programmer?" "All I want is to finish my program
quickly and I want it to have as many functions as possible."
Statements like these are typical of many (but not all!)
programmers. Such a pragmatic and shortsighted approach isn't entirely
unfounded. Development teams in commercial as well as non-profit
software projects (for example, KDE vs. GNOME, kernel development,
etc.) face high pressure and expectations. Mammoth software products
have to be shipped in record time, simply because the manager who
negotiated the contract with the customer lacked the experience to
assess the time frame realistically. Or a fight with the competition
has to be won. As a result, security problems are a recurring feature
of large and small programs alike. These problems generate costs not
only within the company doing the development, but also further down
the line, at the customer's site.
These costs are incurred by the following activities:
- Programmers need to be pulled out of current projects to
eliminate the security flaw
- Customers have to be notified
- The new product has to be made available to the customer
- The customer has to replace the old product. This can lead to
errors which, in turn, represent more work for in-house support
staff
- The image of the company suffers, resulting in long-term
damage (consider the attacks on Microsoft's own network using
Microsoft products)
- Further, products for banks and insurance companies generally
need to be checked by an external (and very expensive)
consultant
A large portion of these costs can be avoided by making security an
issue from the very outset (as part of the QA process). To achieve
this, the developers need to know all about the vulnerabilities and
characteristics of the programming language they are using as well as
how to avoid them and work around them.
In special cases - financial and insurance products, privileged
server programs etc. - an additional review of the source code by two
experienced people, usually programmers, should be carried out. These
'reviewers' should not be involved in the project and they should
review the code in turn (four eyes are better than two) to make sure
that no errors have found their way into the program.
Armed with the knowledge presented in this article and the source
code review, a lot of the errors should disappear from the software
product, thus keeping any follow-on costs to a minimum.
C
C is one of the most commonly used programming languages. It is
used to program the kernels of operating systems as well as small
privileged system tools, server daemons and graphical interfaces
(C/C++).
This is why most security flaws are found in programs written in
C. This does not mean that C is an "insecure" language per se, but
that C is very widespread, powerful and flexible.
A lot of the security risks described in this section also apply to
other programming languages as they are dependent on the operating
environment.
Buffer Overflow
Buffer overflows are one of the main reasons for security
vulnerabilities and program crashes. A buffer overflow occurs whenever
data from an untrusted source such as a keyboard, network or user file
is stored in a fixed-size chunk of memory without array bounds
checking.
The consequences of buffer overflows depend largely on the storage
type. However, let us first briefly consider the memory management
process of programs on Intel-based processors in order to better
understand the processes in the following attacks.
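Local (automatic) variables are stored on the stack, while
dynamically allocated memory (e.g., obtained with malloc(3)) comes
from the heap; global and static variables live in separate data
segments. In the classic layout, the stack and the heap share one
region of memory, with the heap growing upwards from the bottom and
the stack growing downwards from the top.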
Every function has its own stack frame in which it stores its
local variables; this also covers variables declared in blocks
enclosed in the braces { and } within the function. In addition to
local data, some CPU registers need to be saved on the stack before
the function's machine code is executed. This makes it possible to
return to the calling function when the function is done.
- First, the function parameters are pushed onto the stack before the
routine is called.
- The assembler instruction CALL saves the current position in the
machine code, represented by CS:IP, on the stack as the return
address. The called function then saves the BP register of the
calling function, because it needs BP for its own stack frame.
- Finally, the function sets up its stack frame for the local
variables.
When the function is done, it restores the saved BP register and
returns to its caller with RET, which takes the saved CS:IP off the
stack. Execution of the machine code thus resumes at the point where
the saved CS:IP points, i.e., after the CALL instruction.
Let us now consider the consequences.
Stack. Buffer overflows on the stack can be exploited in
three ways.
The contents of variables above the variable in question on the
stack can be overwritten with any kind of data by the attacker. A
classic example of a security vulnerability that can be exploited is
password-based authentication. The password is first retrieved from a
local database and stored in a variable. Later on, the user is
prompted to enter the password and the program compares the two
strings.
For example:
[...]
/* that's our secret phrase */
char origPassword[12] = "Secret\0";
char userPassword[12];
[...]
gets(userPassword); /* read user input */
[...]
if(strncmp(origPassword, userPassword, 12) != 0)
{
printf("Password doesn't match!\n");
exit(-1);
}
[...]
/* give user access to everything */
[...]
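If the user now enters more than 12 characters, he will overwrite
the contents of origPassword[] (assuming the compiler places
origPassword[] directly above userPassword[] on the stack; both arrays
are 12 bytes, a multiple of the 32-bit alignment, so no padding lies
between them). Thus, if he enters 'opensesame!!opensesame!!', both
userPassword[] and origPassword[] contain the same string
(opensesame!!), strncmp(3) returns 0 and the check is passed.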
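A safer variant of the check could look like the following minimal sketch. The helper check_password() is hypothetical, and passing the stored password in as an argument (instead of reading it from a database) is an assumption for illustration. It reads with fgets(3), which never writes past the buffer, rejects lines that did not fit instead of silently truncating them, and compares the complete strings.

```c
#include <stdio.h>
#include <string.h>

/*
** Hypothetical helper: reads one line from 'in' into a 12-byte buffer
** and compares it with 'expected'. Returns 1 on a match, 0 otherwise.
*/
int check_password(FILE *in, const char *expected)
{
    char userPassword[12];
    char *nl;

    /* fgets() writes at most sizeof(userPassword) bytes, NUL included */
    if (fgets(userPassword, sizeof(userPassword), in) == NULL)
        return 0;

    nl = strchr(userPassword, '\n');
    if (nl != NULL)
        *nl = '\0';               /* complete line: drop the newline */
    else if (!feof(in))
        return 0;                 /* line did not fit: reject, don't truncate */

    /* compare the complete strings, not just a fixed-length prefix */
    return strcmp(userPassword, expected) == 0;
}
```

With a 12-byte buffer, passwords of up to 10 characters plus the newline are accepted; anything longer is refused rather than compared in truncated form.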
Of course, not only the contents of variables but also the
registers saved on the stack can be overwritten. By entering even
more characters and overwriting the saved instruction pointer (IP),
the attacker can make RET at the end of the function continue
execution at any point in the program. Generally, however, the
program's own code is not used; instead, the CPU is fed with the
attacker's own machine code. To do this, the machine code is written
into the variable, i.e., onto the stack, and the saved IP is set to
the start address of the attacker's code. If the variable is too
small to hold the machine code, the code can still be stored in the
program environment, on the heap or elsewhere in the accessible
address space. When the function finishes, RET loads the IP register
of the CPU with the value from the stack that was set by the
attacker, and the computer faithfully executes the attacker's code.
Moreover, function pointers can be overwritten in order
to execute third-party code when the pointer is used. The principle is
the same: the attacker places his machine code in a global or local
variable or in the program environment (no overflow is required to do
this; all that is needed is a place where the code can be stored) and
makes the function pointer point to his code.
When the function pointer is used to call the function, it is not
the function code that is executed but the attacker's code
instead.
For example:
[...]
long (* funcptr)(const char *) = atol;
[...]
/*
** the attacker writes his code somewhere in the
** addressable memory
*/
[...]
/*
** thanks to an overflow in the program code, the attacker
** overwrites the value which (*funcptr) contains
** with the start address of his own code
*/
[...]
/*
** the function is called by the pointer and the third-
** party code is thus executed
*/
(*funcptr)(string);
[...]
Heap. Just like stack overflows, heap overflows can be used to
modify data and function pointers, thus altering the way the program
behaves to the attacker's advantage. The heap also offers the chance
to overwrite a jmp_buf variable used by setjmp(3) if it is stored
there. Among other things, the jump buffer contains the position in
the program code at which setjmp(3) was called. If this value is
overwritten with the start address of the attacker's machine code and
longjmp(3) is then called, the IP register is set to the beginning of
that code, which is thus executed.
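To illustrate the kind of heap layout that is at risk, here is a minimal sketch with a hypothetical struct session in which a name buffer is directly followed by a function pointer. The helper session_new() is an assumption for illustration; it copies the name with snprintf(3), so an over-long name is truncated instead of overwriting the neighbouring pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/*
** Hypothetical heap object: a name buffer directly followed by a
** function pointer, the kind of layout a heap overflow targets.
*/
struct session {
    char name[16];
    int (*handler)(void);
};

static int default_handler(void)
{
    return 42;
}

/* Allocates a session on the heap and stores 'name' in it. */
struct session *session_new(const char *name)
{
    struct session *s = malloc(sizeof(*s));

    if (s == NULL)
        return NULL;
    s->handler = default_handler;
    /*
    ** snprintf() writes at most sizeof(s->name) bytes including the
    ** terminating NUL, so an over-long name is truncated instead of
    ** spilling into the neighbouring handler pointer.
    */
    snprintf(s->name, sizeof(s->name), "%s", name);
    return s;
}
```

An unbounded strcpy(3) into s->name would instead let a long name overwrite handler, so that the next call through s->handler jumps wherever the attacker wants.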
Range. Integer overflows, i.e., exceeding the value range of
numeric variables, cause errors that are difficult to find. The code
snippets below illustrate the danger.
For example:
1.)
[...]
unsigned int uintAnzahl = GetZahlFromUser();
unsigned int uintGroesse = uintAnzahl * sizeof(struct
myStructure);
/*
** If the user enters a large enough number (e.g. UINT_MAX,
** defined in limits.h), the mathematically correct product
** is greater than UINT_MAX. The result wraps around (or is
** truncated when stored in the unsigned int), and
** uintGroesse ends up with a much smaller value.
*/
myStructureArray[i] = malloc(uintGroesse);
/*
** malloc(3) therefore allocates far less memory than
** uintAnzahl * sizeof(struct myStructure) bytes.
** Storing uintAnzahl structures in this smaller buffer
** inevitably leads to a buffer overflow.
*/
[...]
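One way to avoid the wrap-around is to check the multiplication before performing it. The following is a minimal sketch; the helper name alloc_array() is hypothetical, and it keeps the count in a size_t rather than an unsigned int as in the snippet above.

```c
#include <stdint.h>
#include <stdlib.h>

/*
** Hypothetical helper: allocates 'count' elements of 'size' bytes each,
** refusing requests whose total size does not fit into a size_t.
*/
void *alloc_array(size_t count, size_t size)
{
    /* count * size overflows exactly when count > SIZE_MAX / size */
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;
    return malloc(count * size);
}
```

calloc(3) in current C libraries typically performs an equivalent overflow check on its two arguments, so it is another option when zero-initialized memory is acceptable.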
2.)
[...]
unsigned int uintAnzahl = GetZahlFromUser();
myArray[i] = malloc(uintAnzahl + strlen("oops!"));
/*
** The same happens with addition: if uintAnzahl is close
** to UINT_MAX (and size_t is 32 bits wide), the sum wraps
** around and the buffer allocated by malloc(3) is too small.
*/
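The addition can be guarded the same way, by comparing against SIZE_MAX minus the fixed part before adding. A minimal sketch, with the helper name alloc_with_suffix() being hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
** Hypothetical helper: allocates 'count' bytes plus room for 'suffix',
** returning NULL instead of a too-small buffer if the sum would wrap.
*/
char *alloc_with_suffix(size_t count, const char *suffix)
{
    size_t extra = strlen(suffix) + 1;   /* +1 for the terminating NUL */

    /* count + extra wraps around exactly when count > SIZE_MAX - extra */
    if (count > SIZE_MAX - extra)
        return NULL;
    return malloc(count + extra);
}
```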
3.)
[...]
char Buffer[1024];
[...]
int intAnzahl = GetZahlFromUser();
[...]
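if(intAnzahl > (int)sizeof(Buffer))
{
fprintf(stderr, "Buffer too small!\n");
exit(-1);
}
/*
** If we specify -1 for intAnzahl, the signed comparison in
** the if condition is FALSE, but if we later pass intAnzahl
** to malloc(3), memcpy(3) or a similar function, -1 is
** converted to the unsigned type size_t and treated as a
** huge positive value. With a 32-bit size_t, -1
** corresponds to approx. 4 GB.
** (Without the cast, intAnzahl itself would be converted
** to size_t in the comparison and -1 would be caught.)
*/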
[...]
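The fix for the signed case is to reject negative values explicitly before the number is ever used as a size. A minimal sketch; the helper copy_checked() and its parameters are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helper: copies 'len' bytes from 'src' into 'dst', a
** buffer of 'dstsize' bytes, where 'len' comes from an untrusted int.
** Returns 0 on success, -1 if the length is rejected.
*/
int copy_checked(char *dst, size_t dstsize, const char *src, int len)
{
    /* reject negative values before they are converted to size_t */
    if (len < 0 || (size_t)len > dstsize)
        return -1;
    memcpy(dst, src, (size_t)len);
    return 0;
}
```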
The system/library calls and program constructs that most commonly
cause buffer overflows are listed below.
gets(3)
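gets(3) reads a line from stdin into a fixed-size buffer without any
way of limiting the length (it has since been removed from the C
standard with C11). The most famous bug of this kind was exploited by
the Morris Internet Worm in fingerd in order to execute commands on a
computer across the network.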
Wrong:
[...]
char HopeItFits[12];
[...]
while(gets(HopeItFits) != NULL)
{
puts(HopeItFits);
memset(HopeItFits, 0, sizeof(HopeItFits));
}
[...]
Right:
With fgets(3), data can be read safely because the amount read is
limited. If sizeof(HopeItFits), i.e., 12 bytes, is passed as the
size, fgets(3) reads at most 12-1 = 11 bytes and always appends the
terminating NUL character. This way, no problems occur when the
string is further processed with the str* functions from string.h.
[...]
char HopeItFits[12];
[...]
while(fgets(HopeItFits, sizeof(HopeItFits), stdin) != NULL)
{
puts(HopeItFits);
memset(HopeItFits, 0, sizeof(HopeItFits));
}
[...]
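fgets(3) keeps the newline and silently leaves the rest of an over-long line in the stream. A minimal sketch of a wrapper that strips the newline and reports truncation; the name read_line() and its return convention are assumptions for illustration:

```c
#include <stdio.h>
#include <string.h>

/*
** Hypothetical helper: reads one line from 'in' into buf (size >= 2
** bytes, including the NUL) and removes the newline.
** Returns 1 if the whole line fit, 0 if it was truncated (the rest of
** the line is then discarded), -1 if nothing could be read.
*/
int read_line(char *buf, size_t size, FILE *in)
{
    char *nl;
    int c;

    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        return 1;
    }

    /* no newline in buf: peek at the next character */
    c = getc(in);
    if (c == '\n' || c == EOF)
        return 1;        /* the text fit; only the newline (or EOF) was left */

    /* over-long line: skip the remainder up to and including '\n' */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    return 0;
}
```

Discarding the remainder keeps the next call from treating the tail of an over-long line as a new line of input.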
scanf(3)
Unless told otherwise, the scanf functions also read strings without
array bounds checking.
Wrong:
[...]
char HopeItFits[12];
[...]
while(scanf("%s", HopeItFits) == 1)
{
puts(HopeItFits);
memset(HopeItFits, 0, sizeof(HopeItFits));
}
[...]
Right:
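With the *scanf(3) functions, a maximum field width can be given in
the format string. For strings, this is written as %<width>s, e.g.
%11s. The width does not include the terminating NUL character,
which scanf(3) always appends, so it must be at most the buffer size
minus one. (Unlike printf(3), scanf(3) does not accept a precision
such as %.11s.)
[...]
char HopeItFits[12];
[...]
while(scanf("%11s", HopeItFits) == 1)
{
puts(HopeItFits);
memset(HopeItFits, 0, sizeof(HopeItFits));
}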
[...]
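Because the width must be written as a literal inside the format string, it can easily drift out of sync with the buffer size. A common trick is to build the format string from a constant with preprocessor stringization. In this minimal sketch, NAME_LEN, STR, XSTR and read_word() are hypothetical names, and sscanf(3) is used instead of scanf(3) only so that the example can work on a string; the format handling is the same.

```c
#include <stdio.h>

#define NAME_LEN 11               /* maximum number of characters */
#define STR(x)  #x
#define XSTR(x) STR(x)            /* expands NAME_LEN before stringizing */

/*
** Hypothetical helper: reads one whitespace-delimited word of at most
** NAME_LEN characters into buf, which must hold NAME_LEN + 1 bytes.
*/
int read_word(const char *input, char buf[NAME_LEN + 1])
{
    /* the format string becomes "%11s" at compile time */
    return sscanf(input, "%" XSTR(NAME_LEN) "s", buf);
}
```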
*sprintf(3)
The problem here is basically the same as with the scanf
functions. The bounds can be set either in the format tags (for
strings via the precision, e.g. %.11s) or by using snprintf(3) or
vsnprintf(3), which take the size of the destination buffer as their
second parameter.
Wrong:
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
/* overflows HopeItFits if more than 11 characters were read */
sprintf(HopeItFits, "%s", BigBadBuffer);
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
}
[...]
Right:
Format tags:
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
/* the precision .11 limits the output to 11 characters + NUL */
sprintf(HopeItFits, "%.11s", BigBadBuffer);
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0,
sizeof(BigBadBuffer));
}
[...]
snprintf(3):
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
snprintf(HopeItFits, sizeof(HopeItFits), "%s",
BigBadBuffer);
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0,
sizeof(BigBadBuffer));
}
[...]
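snprintf(3) also tells you whether truncation happened: with a C99-conforming implementation it returns the length the complete output would have had. (Some older C libraries returned -1 on truncation instead.) A minimal sketch; the helper copy_string() and its return convention are hypothetical:

```c
#include <stdio.h>

/*
** Hypothetical helper: formats 'src' into dst (size bytes) and reports
** truncation. Returns 1 if the complete string fit, 0 if it was cut
** off, -1 on an output error.
*/
int copy_string(char *dst, size_t size, const char *src)
{
    /* snprintf() returns the length the full result would have had */
    int n = snprintf(dst, size, "%s", src);

    if (n < 0)
        return -1;
    return (size_t)n < size;
}
```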
strcpy(3)/strcat(3)
With strcpy(3) and strcat(3), too, attention needs to
be paid to the size of the destination buffer.
Wrong:
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
strcpy(HopeItFits, BigBadBuffer);
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
}
[...]
Right:
The number of bytes to copy can be specified with strncpy(3)
and strncat(3). However, be careful: strncpy(3) copies at most the
number of bytes given as its third argument and does not
NUL-terminate the destination if the source string is that long or
longer (shorter sources are padded with NUL bytes). strncat(3), by
contrast, always adds a terminating NUL, but its third argument is the
maximum number of characters to append, not the size of the
destination buffer. These particular features must be taken into
account; strncpy(3)/strncat(3) therefore do not work like
fgets(3)!
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
strncpy(HopeItFits, BigBadBuffer,
sizeof(HopeItFits)-1);
HopeItFits[sizeof(HopeItFits)-1] = '\0';
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
}
[...]
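For strncat(3), the length argument has to be derived from the space that is still free in the destination. A minimal sketch; the helper append_string() is hypothetical:

```c
#include <string.h>

/*
** Hypothetical helper: appends src to the NUL-terminated string in dst,
** where dst is a buffer of 'size' bytes. strncat()'s third argument is
** the maximum number of characters to append, not the buffer size.
*/
void append_string(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 >= size)
        return;                       /* buffer already full */
    /* leave room for the terminating NUL, which strncat() always adds */
    strncat(dst, src, size - used - 1);
}
```

For example, appending "sesame!!opensesame!!" to a 12-byte buffer containing "open" yields "opensesame!", and further appends leave the full buffer unchanged.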
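strncpy(3)/strncat(3), when used incorrectly
A lot of programmers use strncat(3) or strncpy(3) and
think that they are on the safe side. However, they often forget the
particular characteristic of strncpy(3) (see above): if the source is
at least as long as the given length, the destination is left without
a terminating NUL. Later str* functions then read past the end of the
buffer, and a subsequent strcat(3) writes past it. Such off-by-one
errors often merely lead to a segmentation fault, although, depending
on the memory layout, they can also be exploitable.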
Wrong:
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
/* no '\0' in HopeItFits if 12 or more characters were read */
strncpy(HopeItFits, BigBadBuffer,
sizeof(HopeItFits));
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
}
[...]
Right:
[...]
char HopeItFits[12];
char BigBadBuffer[120];
[...]
while(scanf("%119s", BigBadBuffer) == 1)
{
strncpy(HopeItFits, BigBadBuffer,
sizeof(HopeItFits)-1);
HopeItFits[sizeof(HopeItFits)-1] = '\0';
[...]
memset(HopeItFits, 0, sizeof(HopeItFits));
memset(BigBadBuffer, 0, sizeof(BigBadBuffer));
}
[...]
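The difference between the two variants can be made visible by checking whether the buffer contains a NUL at all. In this minimal sketch, copy_wrong(), copy_right() and is_terminated() are hypothetical helpers that mirror the "Wrong" and "Right" snippets:

```c
#include <string.h>

/* Copies src into dst (size bytes) the way the "Wrong" example does. */
void copy_wrong(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size);          /* no NUL if strlen(src) >= size */
}

/* Copies src into dst the way the "Right" example does. */
void copy_right(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';             /* always terminate explicitly */
}

/* Returns 1 if buf holds a NUL within its first 'size' bytes. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

After copy_wrong() with a 24-character source and a 12-byte buffer, is_terminated() returns 0, so passing the buffer to strlen(3) or printf("%s") would read beyond its end.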
Reading in a loop while ignoring buffer lengths
Loops that read user input until a specific character (such as the
newline character '\n') is found in the input stream are
common.
Wrong:
[...]
int Byte, i;
char HopeItFits[12];
[...]
i = 0;
while((Byte = getc(stdin)) != EOF && Byte != '\n')
{
HopeItFits[i] = Byte;
[...]
i++;
}
[...]
To force a buffer overflow, all the attacker needs to do is enter
more than 12 bytes without a newline character.
Right:
[...]
int Byte, i;
char HopeItFits[12];
[...]
i = 0;
while((Byte = getc(stdin)) != EOF && Byte != '\n')
{
/* keep one byte free for the terminating NUL */
if(i >= (int)sizeof(HopeItFits) - 1)
{
fprintf(stderr, "Too much data read!\n");
return(-1);
}
HopeItFits[i++] = Byte;
[...]
}
HopeItFits[i] = '\0';
[...]
Of course, this can also be solved with strncat(3).
getwd(3)
The library function getwd(3) writes the name of the current
directory into the char array passed to it as an argument. Since
getwd(3) is not told how large the array is, a buffer overflow
occurs if the name does not fit. More recent implementations of
getwd(3) write a maximum of PATH_MAX characters to the array, so one
is safe with them when the array is PATH_MAX+1 bytes large.
By using getcwd(3) or get_current_dir_name(3), one can be sure that
one's program does not contain a buffer overflow due to
implementation discrepancies. With getcwd(3), however, caution is
called for: on old SunOS systems it is implemented by calling
popen("pwd"), which brings with it its own set of problems (see
section "Program Environment").
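Since getcwd(3) is told the buffer size and fails with ERANGE when the name does not fit, a robust caller can simply retry with a larger buffer. A minimal sketch, assuming a POSIX system; the helper current_dir() and the initial size of 64 bytes are hypothetical choices:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
** Hypothetical helper: returns the current working directory in a
** malloc'd buffer (to be freed by the caller), or NULL on error.
** The buffer is enlarged until getcwd() no longer fails with ERANGE.
*/
char *current_dir(void)
{
    size_t size = 64;

    for (;;) {
        char *buf = malloc(size);
        int err;

        if (buf == NULL)
            return NULL;
        if (getcwd(buf, size) != NULL)
            return buf;          /* getcwd() respected 'size' */
        err = errno;             /* save errno before calling free() */
        free(buf);
        if (err != ERANGE || size > SIZE_MAX / 2)
            return NULL;         /* real error, or size cannot grow further */
        size *= 2;
    }
}
```

This avoids relying on PATH_MAX, which is not guaranteed to bound the length of every path.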
And many more
There are a lot more functions that do not perform array bounds
checking. They depend on the operating system, the existing libraries
and the implementations.