Standard library: Strings
In previous chapters, we've seen source-code examples using the String
type, which is a fixed-length string type: essentially, it's an array
of characters. In many cases, this data type is good enough to deal with
textual information. However, there are situations that require more advanced
text processing. Ada offers alternative approaches for these cases:
Bounded strings: similar to fixed-length strings, bounded strings have a maximum length, which is set when the corresponding generic package is instantiated. However, bounded strings are not arrays of characters. At any time, they can contain a string of varying length, provided this length is less than or equal to the maximum length.
Unbounded strings: similar to bounded strings, unbounded strings can contain strings of varying length. In addition, they don't require a maximum length to be specified when a string is declared. In this sense, they are very flexible.
For further reading...
Although we don't specify a maximum length for unbounded strings, the limit is defined by the Reference Manual:
"An object of type Unbounded_String represents a String whose low bound is 1 and whose length can vary conceptually between 0 and Natural'Last."
Therefore, the implicit maximum length is Natural'Last. In
contrast, bounded strings have an explicit maximum length that is specified
when the Generic_Bounded_Length package is instantiated (as we'll
see later on).
Another difference between bounded and unbounded strings is the strategy that is used by the compiler to allocate memory for those strings. When using GNAT, bounded strings are allocated on the stack, while unbounded strings are allocated on the heap.
The following sections present an overview of the different string types and common operations for string types.
String operations
Operations on standard (fixed-length) strings are available in the
Ada.Strings.Fixed package. As mentioned previously, standard strings
are arrays of elements of Character type with a fixed length. That's
why this child package is called Fixed.
One of the simplest operations provided is counting the occurrences of
a substring in a string (Count) and finding their corresponding
indices (Index). Let's look at an example:
    with Ada.Strings.Fixed; use Ada.Strings.Fixed;
    with Ada.Text_IO;       use Ada.Text_IO;

    procedure Show_Find_Substring is
       S   : String := "Hello" & 3 * " World";
       P   : constant String := "World";
       Idx : Natural;
       Cnt : Natural;
    begin
       Cnt := Ada.Strings.Fixed.Count
         (Source  => S,
          Pattern => P);

       Put_Line ("String: " & S);
       Put_Line ("Count for '" & P & "': "
                 & Natural'Image (Cnt));

       Idx := 0;
       for I in 1 .. Cnt loop
          Idx := Index
            (Source  => S,
             Pattern => P,
             From    => Idx + 1);

          Put_Line ("Found instance of '"
                    & P & "' at position: "
                    & Natural'Image (Idx));
       end loop;
    end Show_Find_Substring;
We initialize the string S using a multiplication. Writing
"Hello" & 3 * " World" creates the string Hello World World World.
We then call the function Count to get the number of instances
of the word World in S. Next we call the function Index in a
loop to find the index of each instance of World in S.
That example looked for instances of a specific substring. In the next
example, we retrieve all the words in the string. We do this using
Find_Token and specifying whitespace as the separator. For example:
    with Ada.Strings;       use Ada.Strings;
    with Ada.Strings.Fixed; use Ada.Strings.Fixed;
    with Ada.Strings.Maps;  use Ada.Strings.Maps;
    with Ada.Text_IO;       use Ada.Text_IO;

    procedure Show_Find_Words is
       S : String := "Hello" & 3 * " World";
       F : Positive;
       L : Natural;
       I : Natural := 1;

       Whitespace : constant Character_Set :=
         To_Set (' ');
    begin
       Put_Line ("String: " & S);
       Put_Line ("String length: "
                 & Integer'Image (S'Length));

       while I in S'Range loop
          Find_Token
            (Source => S,
             Set    => Whitespace,
             From   => I,
             Test   => Outside,
             First  => F,
             Last   => L);

          exit when L = 0;

          Put_Line ("Found word instance at position "
                    & Natural'Image (F)
                    & ": '" & S (F .. L) & "'");
          --   & "-" & F'Img & "-" & L'Img

          I := L + 1;
       end loop;
    end Show_Find_Words;
We pass a set of characters to be used as delimiters to the procedure
Find_Token. This set is a member of the Character_Set type from the
Ada.Strings.Maps package. We call the To_Set function (from the
same package) to initialize the Whitespace set and then call
Find_Token in a loop, starting each search at index I, to find the
bounds of each word. We pass Outside to the Test parameter of the
Find_Token procedure to indicate that we're looking for characters that
are outside the Whitespace set, i.e. actual words. The First and
Last parameters of Find_Token are output parameters that indicate
the valid range of the substring. We use this information to display the
string (S (F .. L)).
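Because the delimiters are passed as a set, more than one separator character can be used. The sketch below is an illustration only: the input text and the procedure name Show_Find_Fields are hypothetical. It relies on the To_Set overload from Ada.Strings.Maps that takes a string and builds a set containing each of its characters.

```ada
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Strings.Maps;  use Ada.Strings.Maps;
with Ada.Text_IO;       use Ada.Text_IO;

procedure Show_Find_Fields is
   --  Hypothetical input, chosen only for illustration
   S : constant String := "alpha, beta;gamma";

   --  Space, comma and semicolon all act as separators
   Separators : constant Character_Set := To_Set (" ,;");

   F : Positive;
   L : Natural;
   I : Natural := S'First;
begin
   while I in S'Range loop
      Find_Token
        (Source => S,
         Set    => Separators,
         From   => I,
         Test   => Outside,
         First  => F,
         Last   => L);

      --  Last = 0 means that no further token was found
      exit when L = 0;

      Put_Line ("Found field: '" & S (F .. L) & "'");

      I := L + 1;
   end loop;
end Show_Find_Fields;
```

With this input, the loop should report the fields alpha, beta and gamma, skipping any run of separator characters between them.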
The operations we've looked at so far read strings, but don't modify them. We next discuss operations that change the content of strings:
| Operation | Description |
|---|---|
| Insert | Insert substring in a string |
| Overwrite | Overwrite a string with a substring |
| Delete | Delete a substring |
| Trim | Remove leading and/or trailing whitespace from a string |
All these operations are available both as functions and as procedures.
Functions create a new string, whereas procedures perform the operation in
place. A procedure raises an exception if the result doesn't fit the
constraints of the string. For example, if we have a string S containing
10 characters, inserting a string with two characters (e.g. "!!") into
it produces a string containing 12 characters. Since S has a fixed length,
we can't increase its size. One possible solution in this case is to
specify that truncation should be applied while inserting the substring.
This keeps the length of S fixed. Let's see an example that makes use
of both function and procedure versions of Insert, Overwrite, and
Delete:
    with Ada.Strings;       use Ada.Strings;
    with Ada.Strings.Fixed; use Ada.Strings.Fixed;
    with Ada.Text_IO;       use Ada.Text_IO;

    procedure Show_Adapted_Strings is
       S : String := "Hello World";
       P : constant String := "World";
       N : constant String := "Beautiful";

       procedure Display_Adapted_String
         (Source   : String;
          Before   : Positive;
          New_Item : String;
          Pattern  : String)
       is
          S_Ins_In : String := Source;
          S_Ovr_In : String := Source;
          S_Del_In : String := Source;

          S_Ins : String :=
            Insert (Source, Before, New_Item & " ");
          S_Ovr : String :=
            Overwrite (Source, Before, New_Item);
          S_Del : String :=
            Trim (Delete (Source,
                          Before,
                          Before + Pattern'Length - 1),
                  Ada.Strings.Right);
       begin
          Insert (S_Ins_In, Before, New_Item, Right);
          Overwrite (S_Ovr_In, Before, New_Item, Right);
          Delete (S_Del_In,
                  Before,
                  Before + Pattern'Length - 1);

          Put_Line ("Original: '" & Source & "'");
          Put_Line ("Insert: '" & S_Ins & "'");
          Put_Line ("Overwrite: '" & S_Ovr & "'");
          Put_Line ("Delete: '" & S_Del & "'");
          Put_Line ("Insert (in-place): '" & S_Ins_In & "'");
          Put_Line ("Overwrite (in-place): '" & S_Ovr_In & "'");
          Put_Line ("Delete (in-place): '" & S_Del_In & "'");
       end Display_Adapted_String;

       Idx : Natural;
    begin
       Idx := Index (Source => S, Pattern => P);

       if Idx > 0 then
          Display_Adapted_String (S, Idx, N, P);
       end if;
    end Show_Adapted_Strings;
In this example, we look for the index of the substring World and
perform operations on this substring within the outer string. The procedure
Display_Adapted_String uses both versions of the operations. For the
procedural version of Insert and Overwrite, we apply truncation to
the right side of the string (Right). For the Delete procedure, we
specify the range of the substring to remove; the remaining characters are
shifted left and the string is padded on the right with spaces, so its
length stays the same. For the function version of Delete, we also call
Trim, which removes the trailing whitespace.
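To make the exception case concrete, here is a minimal sketch of the 10-character scenario described earlier. The procedure name and the digit string are illustrative assumptions. It relies on the Drop parameter of the Insert procedure defaulting to Error, in which case Ada.Strings.Length_Error is raised when the result doesn't fit.

```ada
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Text_IO;       use Ada.Text_IO;

procedure Show_Insert_Truncation is
   --  Two 10-character strings (illustrative content)
   S1 : String := "0123456789";
   S2 : String := "0123456789";
begin
   --  With Drop => Right, the 12-character result is cut on
   --  the right, so S1 keeps its 10 characters.
   Insert (S1, 5, "!!", Right);
   Put_Line ("Drop => Right: '" & S1 & "'");

   --  Without a Drop argument, the default (Error) applies and
   --  the 12-character result doesn't fit into S2.
   begin
      Insert (S2, 5, "!!");
      Put_Line ("Drop => Error: '" & S2 & "'");
   exception
      when Length_Error =>
         Put_Line ("Drop => Error: Length_Error raised");
   end;
end Show_Insert_Truncation;
```

The first call should leave S1 holding the leftmost ten characters of the result, while the second call is expected to end up in the exception handler.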
Limitation of fixed-length strings
Using fixed-length strings is usually good enough for strings that are initialized when they are declared. However, as seen in the previous section, procedural operations on strings cause difficulties when done on fixed-length strings because fixed-length strings are arrays of characters. The following example shows how cumbersome the initialization of fixed-length strings can be when it's not performed in the declaration:
    with Ada.Text_IO; use Ada.Text_IO;

    procedure Show_Char_Array is
       S : String (1 .. 15);
       --  Strings are arrays of Character
    begin
       --  The literal must have exactly 15 characters:
       --  "Hello" followed by 10 spaces
       S := "Hello          ";
       --  Alternatively:
       --
       --  #1:
       --      S (1 .. 5)      := "Hello";
       --      S (6 .. S'Last) := (others => ' ');
       --
       --  #2:
       --      S := ('H', 'e', 'l', 'l', 'o',
       --            others => ' ');

       Put_Line ("String: " & S);
       Put_Line ("String Length: "
                 & Integer'Image (S'Length));
    end Show_Char_Array;
In this case, we can't simply write S := "Hello" because the
array of characters for the "Hello" literal has a different
length than the S string. Therefore, we need to include trailing
whitespace to match the length of S. As shown in the example, we could
use an exact range for the initialization (S (1 .. 5)) or use an
explicit array of individual characters.
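As an aside that goes beyond the example above, the Ada.Strings.Fixed package also provides a Move procedure that copies a shorter string into a longer one and pads the remainder. The following is a minimal sketch; the procedure name Show_Move is an assumption made for illustration.

```ada
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Text_IO;       use Ada.Text_IO;

procedure Show_Move is
   S : String (1 .. 15);
begin
   --  Copy "Hello" into S. By default, the text is
   --  left-justified and the remainder is padded with spaces.
   Move (Source => "Hello", Target => S);

   Put_Line ("String: '" & S & "'");
   Put_Line ("String Length: "
             & Integer'Image (S'Length));
end Show_Move;
```

This avoids counting trailing spaces by hand, but S is still a fixed-length array: its length remains 15 no matter what is copied into it.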
When strings are initialized or manipulated at run-time, it's usually better to use bounded or unbounded strings. An important feature of these types is that they aren't arrays, so the difficulties presented above don't apply. Let's start with bounded strings.
Bounded strings
Bounded strings are defined in the
Ada.Strings.Bounded.Generic_Bounded_Length package. Because
this is a generic package, you need to instantiate it and set the
maximum length of the bounded string. You can then declare bounded
strings of the Bounded_String type.
Both bounded and fixed-length strings have a maximum length that they can hold. However, bounded strings are not arrays, so initializing them at run-time is much easier. For example:
    with Ada.Strings;         use Ada.Strings;
    with Ada.Strings.Bounded;
    with Ada.Text_IO;         use Ada.Text_IO;

    procedure Show_Bounded_String is
       package B_Str is new
         Ada.Strings.Bounded.Generic_Bounded_Length (Max => 15);
       use B_Str;

       S1, S2 : Bounded_String;

       procedure Display_String_Info (S : Bounded_String) is
       begin
          Put_Line ("String: " & To_String (S));
          Put_Line ("String Length: "
                    & Integer'Image (Length (S)));
          --  String:
          --      S'Length => ok
          --  Bounded_String:
          --      S'Length => compilation error:
          --                  bounded strings are
          --                  not arrays!

          Put_Line ("Max. Length: "
                    & Integer'Image (Max_Length));
       end Display_String_Info;
    begin
       S1 := To_Bounded_String ("Hello");
       Display_String_Info (S1);

       S2 := To_Bounded_String ("Hello World");
       Display_String_Info (S2);

       S1 := To_Bounded_String
         ("Something longer to say here...",
          Right);
       Display_String_Info (S1);
    end Show_Bounded_String;
By using bounded strings, we can easily assign to S1 and S2
multiple times during execution. We use the To_Bounded_String and
To_String functions to convert, in the respective direction, between
fixed-length and bounded strings. A call to To_Bounded_String raises an
exception if the length of the input string is greater than the maximum
capacity of the bounded string. To avoid this, we can use the truncation
parameter (Right in our example).
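The difference between the two behaviours can be sketched as follows. The procedure name and the Long_Text constant are illustrative assumptions. The sketch relies on the Drop parameter of To_Bounded_String defaulting to Error, which propagates Ada.Strings.Length_Error when the input is too long.

```ada
with Ada.Strings;         use Ada.Strings;
with Ada.Strings.Bounded;
with Ada.Text_IO;         use Ada.Text_IO;

procedure Show_Bounded_Length_Error is
   package B_Str is new
     Ada.Strings.Bounded.Generic_Bounded_Length (Max => 15);
   use B_Str;

   Long_Text : constant String :=
     "Something longer to say here...";

   S : Bounded_String;
begin
   --  No truncation mode given: the default is Error, and the
   --  input is longer than Max_Length.
   begin
      S := To_Bounded_String (Long_Text);
   exception
      when Length_Error =>
         Put_Line ("Length_Error raised");
   end;

   --  With Right, the characters that don't fit are dropped
   --  from the right end instead.
   S := To_Bounded_String (Long_Text, Right);
   Put_Line ("Truncated: '" & To_String (S) & "'");
end Show_Bounded_Length_Error;
```

Whether to truncate or to handle the exception depends on the application: silent truncation is convenient for display text, but it can hide data loss elsewhere.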
Bounded strings are not arrays, so we can't use the 'Length
attribute as we did for fixed-length strings. Instead, we call the
Length function, which returns the length of the bounded string. The
Max_Length constant represents the maximum length of the bounded string
that we set when we instantiated the package.
After initializing a bounded string, we can manipulate it. For example, we
can append a string to a bounded string using Append or concatenate
bounded strings using the & operator. Like so:
    with Ada.Strings;         use Ada.Strings;
    with Ada.Strings.Bounded;
    with Ada.Text_IO;         use Ada.Text_IO;

    procedure Show_Bounded_String_Op is
       package B_Str is new
         Ada.Strings.Bounded.Generic_Bounded_Length (Max => 30);
       use B_Str;

       S1, S2 : Bounded_String;
    begin
       S1 := To_Bounded_String ("Hello");
       --  Alternatively:
       --
       --  S1 := Null_Bounded_String & "Hello";

       Append (S1, " World");
       --  Alternatively:
       --  Append (S1, " World", Right);

       Put_Line ("String: " & To_String (S1));

       S2 := To_Bounded_String ("Hello!");
       S1 := S1 & " " & S2;
       Put_Line ("String: " & To_String (S1));
    end Show_Bounded_String_Op;
We can initialize a bounded string with an empty string using the
Null_Bounded_String constant. Also, we can use the Append procedure
and specify the truncation mode like we do with the To_Bounded_String
function.
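As a small illustration of the truncation mode of Append, the sketch below uses a deliberately small maximum length. The value Max => 8 and the procedure name are assumptions chosen so that the appended text doesn't fit.

```ada
with Ada.Strings;         use Ada.Strings;
with Ada.Strings.Bounded;
with Ada.Text_IO;         use Ada.Text_IO;

procedure Show_Bounded_Append is
   package B_Str is new
     Ada.Strings.Bounded.Generic_Bounded_Length (Max => 8);
   use B_Str;

   S : Bounded_String := Null_Bounded_String;
begin
   Append (S, "Hello");

   --  "Hello World" would need 11 characters, but Max is 8.
   --  With Right, the characters that don't fit are dropped
   --  from the right end of the result.
   Append (S, " World", Right);

   Put_Line ("String: '" & To_String (S) & "'");
   Put_Line ("Length: " & Integer'Image (Length (S)));
end Show_Bounded_Append;
```

Without the Right argument, the second call to Append would use the default mode (Error) and raise an exception instead of truncating.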
Unbounded strings
Unbounded strings are defined in the Ada.Strings.Unbounded package.
This is not a generic package, so we don't need to instantiate it before
using the Unbounded_String type. As you may recall from the previous
section, bounded strings require a package instantiation.
Unbounded strings are similar to bounded strings. The main difference is that they can hold strings of any size and adjust according to the input string: if we assign, e.g., a 10-character string to an unbounded string and later assign a 50-character string, internal operations in the container ensure that memory is allocated to store the new string. In most cases, developers don't need to worry about these operations. Also, no truncation is necessary.
Initialization of unbounded strings is very similar to bounded strings. Let's look at an example:
    with Ada.Text_IO;           use Ada.Text_IO;
    with Ada.Strings;           use Ada.Strings;
    with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

    procedure Show_Unbounded_String is
       S1, S2 : Unbounded_String;

       procedure Display_String_Info (S : Unbounded_String) is
       begin
          Put_Line ("String: " & To_String (S));
          Put_Line ("String Length: "
                    & Integer'Image (Length (S)));
       end Display_String_Info;
    begin
       S1 := To_Unbounded_String ("Hello");
       --  Alternatively:
       --
       --  S1 := Null_Unbounded_String & "Hello";

       Display_String_Info (S1);

       S2 := To_Unbounded_String ("Hello World");
       Display_String_Info (S2);

       S1 := To_Unbounded_String
         ("Something longer to say here...");
       Display_String_Info (S1);
    end Show_Unbounded_String;
As with bounded strings, we can assign to S1 and S2 multiple times
during execution and use the To_Unbounded_String and To_String
functions to convert back and forth between fixed-length strings and
unbounded strings. However, in this case, truncation is not needed.
And, just like for bounded strings, you can use the Append procedure and
the & operator for unbounded strings. For example:
    with Ada.Text_IO;           use Ada.Text_IO;
    with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

    procedure Show_Unbounded_String_Op is
       S1, S2 : Unbounded_String := Null_Unbounded_String;
    begin
       S1 := S1 & "Hello";
       S2 := S2 & "Hello!";

       Append (S1, " World");
       Put_Line ("String: " & To_String (S1));

       S1 := S1 & " " & S2;
       Put_Line ("String: " & To_String (S1));
    end Show_Unbounded_String_Op;
In this example, we're concatenating the unbounded S1 and S2
strings with the "Hello" and "Hello!" strings, respectively.
Also, we're using the Append procedure, just like we did with bounded
strings.
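To illustrate how an unbounded string adjusts to its content, the following sketch assigns a 10-character string and then a 50-character string to the same object, printing the length each time. The procedure name and the string contents are illustrative assumptions.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

procedure Show_Unbounded_Growth is
   S : Unbounded_String;
begin
   --  An unbounded string that isn't explicitly initialized
   --  starts out empty.
   Put_Line ("Length:" & Integer'Image (Length (S)));

   --  Assign a 10-character string
   S := To_Unbounded_String ("0123456789");
   Put_Line ("Length:" & Integer'Image (Length (S)));

   --  Assign a 50-character string to the same object
   S := To_Unbounded_String (String'(1 .. 50 => 'x'));
   Put_Line ("Length:" & Integer'Image (Length (S)));

   --  Grow the string one character at a time
   for C in Character range 'a' .. 'e' loop
      Append (S, C);
   end loop;
   Put_Line ("Length:" & Integer'Image (Length (S)));
end Show_Unbounded_Growth;
```

No maximum length or truncation mode appears anywhere in this code; the storage needed for each new value is handled internally by the Unbounded_String type.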