In this chapter, we discuss the advantages of strong typing and how it can be used to avoid common implementation and maintenance issues.
The notions of tainted data and trusted data usually refer to data coming from the user vs. data coming from the application. Tainting is viral, in that any result of a computation where one of the operands is tainted becomes tainted too.
Various C/C++ static analyzers provide checkers for tainted data that help find bugs where data from the user serves to compute the size of an allocation, so that an attacker could use this to trigger a buffer overflow leading to an Elevation of Privilege (EoP) attack.
In Ada, the compiler can provide the guarantee that no such bugs have been
introduced by accident (although you can still bypass the rule if you
really want to, for example by using
address clause overlays), provided different types are used for tainted
and trusted data, with no run-time penalty. This can be done with many
types of data, including basic types like integers.
Let's say tainted data is of an integer type. The basic idea is to derive the trusted type from the tainted one, and to provide a function Value to get to the raw data inside a trusted value, like the following:
Notice that the implementation of
Value is just a type conversion:
Then, make sure the sensitive program uses trusted data:
Let's try to pass in data from the user to the sensitive program:
The compiler returns with a type error.
Now, this does not prevent us from doing useful computations on trusted data as easily as on tainted data, including initialization with literals, case statements, array indexing, etc.
Trusted_Value is a type derived from the tainted type
Integer), all operations allowed on tainted data are also allowed
on trusted data, but operations mixing them are not allowed.
Be aware that nothing prevents the program itself from converting between tainted data and trusted data freely, but this requires inserting an explicit conversion, which can be spotted during code reviews.
To completely prevent such unintended conversions (say, to facilitate
maintenance), the type used for trusted data must be made private, so that
only the package which defines it can convert to and from it. With
Trusted_Value being private, we should also provide a corresponding
function for each literal which we used previously, as well as the
operations that we'd like to allow on trusted values (note that for
efficiency all operations could be inlined):
The new implementation is as expected:
Of course, the client now needs to be adapted to this new interface:
That's it! No errors can result in tainted data being accidentally passed by the user where trusted data is expected, and future maintainers of the code won't be tempted to insert conversions when the compiler complains.
Input validation consists of checking a set of properties on the input which guarantee it is well-formed. This usually involves excluding a set of ill-formed inputs (black-list) or matching the input against an exhaustive set of well-formed patterns (white-list).
Here, we consider the task of validating an input for inclusion in an SQL command. This is a well-known defense against SQL injection attacks, where an attacker passes in a specially crafted string that is interpreted as a command rather than a plain string when executing the initial SQL command.
The basic idea is to define a new type
SQL_Input derived from type
Validate checks that the input is properly
validated and fails if not. Function
Valid_String returns the raw
data inside a validated string, as follows:
The implementation of
Validate simply checks that the input string
does not contain a dangerous character before returning it as an
Valid_String is a simple type conversion:
Now, this does not prevent future uses of such type conversions in the
program, whether malicious or unintended. To guard against such
possibilities, we must make type
SQL_Input private. To make sure we
do not ourselves inadvertently convert an input string into a valid one in
the implementation of package
Inputs, we use this opportunity to
SQL_Input a discriminated record parameterized by the
Each time we access field
Valid_Input, a discriminant check will be
performed to ensure that the operand of type
SQL_Input has been
validated. Observe the use of
Unbounded_String for the type of the
input component, which is more convenient and flexible than using a
Note in the implementation of
Validate, that instead of raising an
exception when the string cannot be validated, as in the first
implementation, here we create corresponding validated or invalid input
values based on the result of the check against dangerous characters.
Is_Valid function has been added to allow clients to query
validity of an
That's it! As long as this interface is used, no errors can result in improper input being interpreted as a command, while ensuring that future maintainers of the code won't inadvertently be able to insert inappropriate conversions.
Of course, this minimal interface does not really provide anything other
than the validation of the input. Simply having an
function to tell whether a string is valid input data would seem to give
you much the same functionality. However, you can now safely extend this
package with additional capabilities, such as transformations on valid SQL
inputs (for example, to optimize queries before sending them to the
database), or to resolve queries faster using a local cache, and so forth.
By using the private encapsulation, you are guaranteed that no client
package will tamper with the validity of the SQL inputs you are
Incidentally, the similar but distinct problem of input sanitization, where possibly invalid data is transformed into something that is known valid prior to use, can be handled in the same way.
In this section, we discuss an application that accesses a two-dimensional table. We first look into a typical implementation, and then discuss how to improve it with better use of strong typing.
Let's look at an application that declares a two-dimensional lookup table, retrieves a value from it an displays this value.
In this application, we use
Y as indices to access the
Tab table. We store the value in
V and display it.
In principle, there is nothing wrong with this implementation. Also, we're
already making use of strong typing here, since accessing an invalid
position of the array (say
Tab (6, 25)) raises an exception.
However, in this application, we're assuming that
X always refers
to the first dimension, while
Y refers to the second dimension.
What happens, however, if we write
Tab (Y, X)? In the application
above, this would still work because
Tab (5, 1) is in the table's
range. Even though this works fine here, it's not the expected behavior.
In the next section, we'll look into strategies to make better use of
strong typing to avoid this problem.
One could argue that the problem we've just described doesn't happen to competent developers, who are expected to be careful. While this might be true for the simple application we're discussing here, complex systems can be much more complicated to understand: they might include multiple tables and multiple indices for example. In this case, even competent developers might make use of wrong indices to access tables. Fortunately, Ada provides means to avoid this problem.
Using stronger typing¶
In the example above, we make use of the
Positive type, which is
already a constrained type: we're avoiding accessing the
using an index with negative values or zero. But we still may use indices
that are out-of-range in the positive range, or switch the indices, as in
Tab (Y, X) example we mentioned previously. These problems can
be avoided by defining range types for each dimension. This is the updated
Now, we not only avoid mistakes like
Tab (Y, X), but we also detect
them at compile time! This might decrease development time, since we don't
need to run the application in order to check for those issues.
Also, maintenance becomes easier as well. Because we're explicitly stating
the allowed ranges for
Y, developers can know how to
avoid constraint issues when accessing the
Tab table. We're also
formally indicating the expected behavior. For example, because we declare
X to be of
X_Range type, and that type is used in the first
Tab, we're documenting --- using the syntax of the Ada
language --- that
X is supposed to be used to access the first
Tab. Based on this information, developers that need
to maintain this application can immediately identify the purpose of
X and use the variable accordingly.
In this section, we discuss another example where the use of strong typing is relevant. Let's consider an application with the following requirements:
- The application receives the transmission of chunks of information.
- Each chunk contains two floating-point coefficients.
- Also, these chunks are received out of order, so that the chunk itself includes an index indicating its position in an ordered array.
- The application also receives a list of indices for the ordered array of chunks. This list --- a so-called selector --- is used to select two chunks from the array of ordered chunks.
- Due to external constraints, the application shall use the unordered
array; creating an array of ordered chunks shall be avoided.
- A function that returns an ordered array of chunks shall be available for testing purposes only.
- A function that returns the selected chunks shall be available for testing purposes only.
- A function that returns a mapping from the index of ordered chunks to the index of unordered chunks must be available.
For example, consider the following picture containing input chunks and a selector:
By using the mapping, we can select the correct chunks from the input (unordered) chunks. Also, we may create an array of ordered chunks for testing purposes.
Let's skip the discussion whether the design used in this application is good or not and assume that all requirements listed above are set on stone and can't be changed.
This is a typical specification of the main package:
And this is a typical specification of the
Test child package:
This is the corresponding body of the main package:
This is the corresponding body of the
Test child package:
Note that the information transmitted to the application might be
inconsistent due to errors in the transmission channel. For example, the
Chunk record) might be wrong. In a
real-world application, we should deal with those transmission errors.
However, for the discussion in this section, these problems are not
crucial, so that we can simplify the implementation by skipping error
Let's finally look at a test application that makes use of the package we've just implemented. In order to simplify the discussion, we'll initialize the array containing the unordered chunks and the selector directly in the application instead of receiving input data from an external source.
In this line of the test application, we retrieve the chunk using the index from the selector:
C1 : Chunk := C (M (S (I)));
C contains the unordered chunks and the index from
refers to the ordered chunks, we need to map between the ordered index
and the unordered index. This is achieved by the mapping stored in
If we'd use the ordered array of chunks, we could use the index from
S directly, as illustrated in the following function:
In this relatively simple application, we're already dealing with 3 indices:
- The index of the unordered chunks.
- The index of the ordered chunks.
- The index of the selector array.
The use of the wrong index to access an array can be a common source of issues. This becomes even more problematic when the application is extended and new features are implemented: the amount of arrays might increase and developers need to be especially careful not to use the wrong index.
For example, a mistake that developers can make when using the package
above is to skip the mapping and access the array of unordered chunks
directly with the index from the selector --- i.e.
C (S (I)) in the
test application above. Detecting this mistake requires extensive testing
and debugging, since both the array of unordered chunks and the array of
ordered chunks have the same range, so the corresponding indices can be
used interchangeably without raising constraint exceptions, even though
the behavior is not correct. Fortunately, we can use Ada's strong typing
to detect such issues in an early stage of the development.
Using stronger typing¶
In the previous implementation, we basically used the
for all indices. We can, however, declare individual types for each index
of the application. This is the updated specification of the main package:
By declaring these new types, we can avoid that the wrong index is used. Moreover, we're documenting --- using the syntax provided by the language --- which index is expected in each array or function from the package. This allows for better understanding of the package specification and makes maintenance easier, as well as it helps when implementing new features for the package.
This is the updated specification of the
Test child package:
Note that we also declared a separate type for the array of ordered
Ord_Chunks. This is needed because the arrays uses a
different index (
Ord_Chunk_Index) and therefore can't be the same
Chunks. For the same reason, we declared a separate type
for the array of selected chunks:
As a side note, we're now able to include a
Ord_Chunks that verifies that the index stored in the each chunk
matches the corresponding index of its position in the ordered array.
We also had to add a new private package that includes a function that
retrieves the range of an array of
Chunk type --- which are of
Chunk_Index type --- and converts the range using the
This is needed for example in the
Get_Mapping function, which has
to deal with indices of these two types. Although this makes the code a
little bit more verbose, it helps documenting the expected types in that
This is the corresponding update to the body of the main package:
This is the corresponding update to the body of the
This is the updated test application:
Apart from minor changes, the test application is basically still the same. However, if we now change the following line:
C1 : Chunk := C (M (S (I)));
C1 : Chunk := C (S (I));
The compiler will gives us an error, telling us that it expected the
Chunk_Index type, but found the
By using Ada's strong typing, we're detecting issues at compile time
instead of having to rely on extensive testing and debugging to detect
them. Basically, this eliminates a whole category of potential bugs
and reduces development time. At the same time, we're improving the
documentation of the source-code and facilitating further improvements
to the application.