Characters and Strings

Fortran’s support for characters and strings has evolved significantly over successive standards. The earliest versions of Fortran used Hollerith codes to represent characters. The CHARACTER type was introduced in the Fortran 77 standard, and in its original form it is essentially a fixed-length string. Variable-length strings were introduced in the Fortran 2003 standard.

Character Sets and Encodings

Like everything else in a computer, characters must be represented as sequences of 0s and 1s. A catalogue that maps characters to these bit patterns is called an encoding.

The basic character set used by Fortran is ASCII, the American Standard Code for Information Interchange, sometimes known internationally as US-ASCII. ASCII uses 7 bits per character, giving a total of 128 (2^7) characters. The first 32 codes are non-printing control characters that were mainly needed by the mechanical devices for which the encoding was developed. Only a few of them are still in common use, among them line feed, carriage return, and even “bell.”

Exercise:

program ringer
    print *, char(7)
end program
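The mapping between characters and their codes can be inspected with the intrinsics IACHAR, which returns the ASCII code of a character, and ACHAR, which converts a code back to a character. The short program below is a sketch; the B8.8 edit descriptor prints the same code as an 8-digit binary number so the bit pattern assigned by the encoding is visible.

```fortran
program ascii_codes
   implicit none
   character(len=*), parameter :: letters = "Az"
   integer :: i

   do i = 1, len(letters)
      ! IACHAR gives the ASCII code; B8.8 prints that code as 8 binary digits
      write (*, '(a, 1x, i0, 1x, b8.8)') letters(i:i), iachar(letters(i:i)), iachar(letters(i:i))
   end do

   ! ACHAR converts ASCII codes back to characters: 72 is 'H', 105 is 'i'
   write (*, '(a)') achar(72) // achar(105)
end program ascii_codes
```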

Most systems now store each character in 8 bits. The various 8-bit “extended ASCII” encodings double the number of available characters to 256, but the first 128 codes are the same as 7-bit ASCII. Extended ASCII was never standardized as a single encoding, though standards exist for Latin alphabets (ISO 8859-1, also called ISO Latin 1) and Cyrillic (ISO 8859-5). Many programming languages, including Fortran, still restrict the characters allowed in statements to the original ASCII set, even when a larger character set is the default and may be used in comments, character constants, and output.

Even extended ASCII has far too few characters to cover more than a handful of alphabets, much less other writing systems. Unicode was created to address this. Although Unicode has room for more than a million code points, its first 128 codes are still the 7-bit ASCII codes. Fortran 2003 added optional support for ISO/IEC 10646, the ISO character set kept synchronized with Unicode, as a character kind (ISO_10646) that can be used for character data such as output. Compilers are not required to provide this kind, and not all of them do.

The following example is adapted from the gfortran documentation. It may compile only with gfortran.

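program character_kind
   use iso_fortran_env, only: output_unit
   implicit none
   ! Kind numbers for ASCII and for ISO 10646 (UCS-4) characters
   integer, parameter :: ascii = selected_char_kind ("ascii")
   integer, parameter :: ucs4  = selected_char_kind ('ISO_10646')
   character(kind=ascii, len=26) :: alphabet
   character(kind=ucs4,  len=30) :: hello_world

   alphabet = ascii_"abcdefghijklmnopqrstuvwxyz"
   ! Code points U+4F60 and U+597D are the two characters of "ni hao"
   hello_world = ucs4_'Hello World and Ni Hao -- ' &
                 // char (int (z'4F60'), ucs4)     &
                 // char (int (z'597D'), ucs4)

   write (*,*) alphabet

   ! Reopen standard output with UTF-8 encoding so the UCS-4 characters print correctly
   open (output_unit, encoding='UTF-8')
   write (*,*) trim (hello_world)
end program character_kind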
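Because support for the ISO_10646 kind is optional, a program can check for it before relying on it. SELECTED_CHAR_KIND returns -1 when the requested kind is not available, so a minimal check might look like the sketch below (it is not part of the gfortran example).

```fortran
program check_ucs4
   implicit none
   ! SELECTED_CHAR_KIND returns -1 if the compiler does not provide the kind
   integer, parameter :: ucs4 = selected_char_kind('ISO_10646')

   if (ucs4 < 0) then
      write (*, '(a)') 'ISO_10646 not supported by this compiler'
   else
      write (*, '(a, i0)') 'ISO_10646 kind = ', ucs4
   end if
end program check_ucs4
```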

Fixed-Length Strings

Declare character variables with

CHARACTER(len=<N>) :: string

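N must be a constant: an integer literal or a named constant (PARAMETER). The length must be large enough to hold every character in the string, including whitespace such as spaces and tabs. A value longer than N is truncated on assignment, and a shorter value is padded on the right with blanks.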
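The sketch below illustrates padding and truncation, using a named constant for the length. Brackets are printed around each value so that trailing blanks are visible. Some compilers may warn about the truncation at compile time.

```fortran
program fixed_length
   implicit none
   integer, parameter :: n = 5          ! a named constant may give the length
   character(len=n) :: short_str, long_str

   short_str = "Hi"            ! shorter value: padded on the right with blanks
   long_str  = "Hello world"   ! longer value: truncated to n characters

   write (*, '(a)') '[' // short_str // ']'
   write (*, '(a)') '[' // long_str // ']'
end program fixed_length
```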

Arrays of character variables are permitted.

CHARACTER(len=3), DIMENSION(10) :: string

This is a 10-element, rank-1 array, each of whose elements is a character string of length 3.
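Every element of such an array has the same length. When an array constructor lists values of different lengths, a type specification such as `character(len=3) ::` (a Fortran 2003 feature) converts each value to that length. A short sketch:

```fortran
program char_array
   implicit none
   character(len=3), dimension(4) :: codes
   integer :: i

   ! The type specification makes every constructor value length 3;
   ! "UK" and "FR" are padded with a trailing blank
   codes = [character(len=3) :: "USA", "UK", "FR", "DEU"]

   do i = 1, size(codes)
      write (*, '(a)') '[' // codes(i) // ']'
   end do
end program char_array
```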

Variable-Length Strings

Variable-length strings are declared similarly to allocatable arrays.

CHARACTER(len=:), ALLOCATABLE :: string

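The string must be allocated before its value is referenced. Since Fortran 2003, intrinsic assignment to the whole variable allocates it automatically (or reallocates it) to the length of the value assigned. The string can also be allocated explicitly, with a length determined at run time: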

   num_chars=5
   allocate(character(len=num_chars) :: string)
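The sketch below contrasts the two ways of sizing a deferred-length string. It assumes a compiler that implements Fortran 2003 reallocation on assignment (gfortran does by default). Assigning to the substring `string(:)` keeps the current length, while assigning to the whole variable reallocates it.

```fortran
program varying
   implicit none
   character(len=:), allocatable :: string
   integer :: num_chars

   ! Explicit allocation with a length chosen at run time
   num_chars = 5
   allocate(character(len=num_chars) :: string)

   ! Assigning to the substring string(:) keeps the length; "abc" is padded to 5
   string(:) = "abc"
   write (*, '(i0)') len(string)

   ! Assigning to the whole variable reallocates it to the new length
   string = "Hello world"
   write (*, '(i0)') len(string)

   deallocate(string)
end program varying
```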

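An allocatable array of deferred-length strings is possible, but all of its elements share the same length. To hold strings of different lengths in one array, you must wrap the string in a derived type.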
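Here is one way to do this, as a sketch. The derived type name `varstring` is hypothetical, chosen for illustration; each element's component `s` is allocated by assignment to its own length, which assumes Fortran 2003 reallocation on assignment.

```fortran
program string_list
   implicit none
   ! Hypothetical wrapper type: each element carries its own allocatable string
   type :: varstring
      character(len=:), allocatable :: s
   end type varstring
   type(varstring), allocatable :: words(:)
   integer :: i

   allocate(words(3))
   words(1)%s = "a"
   words(2)%s = "Fortran"
   words(3)%s = "string"

   ! Each element reports its own length
   do i = 1, size(words)
      write (*, '(a, 1x, i0)') words(i)%s, len(words(i)%s)
   end do
end program string_list
```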

An allocatable string may be deallocated if necessary with the usual DEALLOCATE statement.

DEALLOCATE(string)

Before Fortran 2003, a separate part of the standard (ISO/IEC 1539-2) defined a module, iso_varying_string, that provides a varying-length string type. Most compilers now support the Fortran 2003 standard and therefore offer deferred-length strings, but iso_varying_string also provides a number of utility procedures, so it may still be worthwhile. We will discuss standardized modules later.

Substrings

As with arrays, slices, or substrings, may be extracted from strings. Characters are numbered starting from 1 at the left, and the upper bound of the range is included. If the lower bound is omitted, the substring starts at the first character; if the upper bound is omitted, it runs to the end of the string. To extract a single character, use its position as both bounds, for example message(5:5).

character(len=11) :: message

message="Hello world"
print *, message(1:5)," ",message(5:5)," ",message(7:)

This results in

 Hello o world
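A substring may also appear on the left side of an assignment, which changes only those characters. The sketch below also shows that a substring whose lower bound exceeds its upper bound is legal and simply has length zero.

```fortran
program substrings
   implicit none
   character(len=11) :: message

   message = "Hello world"
   ! Assign to a one-character substring: only that character changes
   message(1:1) = "J"
   write (*, '(a)') message

   ! Lower bound greater than upper bound: a legal, zero-length substring
   write (*, '(i0)') len(message(6:5))
end program substrings
```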

Concatenation

The only intrinsic operator that produces a string is concatenation, //. Try the following and see what is printed. Remember to add the program, implicit none, and end program statements, and to indent properly.

character(len=5)  :: word1, word2
character(len=20) :: message

word1="Hello"
word2="world"
message=word1//" "//word2
print *, message

Now try

print *, message//" today"

You should see

Hello world          today
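The extra blanks appear because message has a fixed length of 20. One alternative, sketched below under the assumption of Fortran 2003 reallocation on assignment, declares message as a deferred-length string so that it takes exactly the length of the concatenated value.

```fortran
program concat
   implicit none
   character(len=:), allocatable :: message

   ! The deferred-length variable takes the exact length of the value: 11
   message = "Hello" // " " // "world"
   write (*, '(a)') message // " today"
   write (*, '(i0)') len(message)
end program concat
```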

String Length

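A useful string function, especially for variable-length strings, is LEN(S), which returns the length of S. A fixed-length string always occupies its declared number of characters, so LEN returns the declared length no matter what was assigned. An assigned value is stored starting at the left of the field, and any remaining positions are filled with blanks. This can be changed with intrinsics such as ADJUSTL and ADJUSTR, and trailing blanks can be removed with TRIM or ignored with LEN_TRIM.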
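The following sketch compares LEN with LEN_TRIM and shows the effect of TRIM, ADJUSTL, and ADJUSTR on a 10-character field. Brackets are printed around each result so the blanks are visible.

```fortran
program length_demo
   implicit none
   character(len=10) :: field

   field = "  abc"                       ! stored as 2 blanks, "abc", 5 blanks
   write (*, '(i0)') len(field)          ! declared length: 10
   write (*, '(i0)') len_trim(field)     ! length ignoring trailing blanks: 5
   write (*, '(a)') '[' // trim(field) // ']'     ! trailing blanks removed
   write (*, '(a)') '[' // adjustl(field) // ']'  ! leading blanks moved to the end
   write (*, '(a)') '[' // adjustr(field) // ']'  ! trailing blanks moved to the front
end program length_demo
```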

Exercises

  • Declare character variables large enough to hold the indicated strings. Make full_title at least 5 characters longer than you think necessary.

title="Jaws"
subtitle="The Revenge"
print *,len(title)
full_title=title//":"//subtitle
print *, full_title
print *,len(full_title)
print *,full_title(2:4)

  1. Change “Jaws” to “Paws” in full_title.
  2. Make the strings variable sized. Use the len function.
Solution with variable strings.

program strings
implicit none
character(len=:), allocatable :: title, subtitle, full_title
integer                       :: nchars

   ! Assignment allocates each string to exactly the length of its value
   title="Jaws"
   subtitle="The Revenge"
   print *, len(title)

   ! Allocate full_title explicitly; the extra 1 is for the colon
   nchars=len(title)+len(subtitle)+1
   allocate(character(len=nchars) :: full_title)
   full_title=title//":"//subtitle
   print *, full_title
   print *, len(full_title)
   print *, full_title(2:4)

   ! Replace the first character so "Jaws" becomes "Paws"
   full_title="P"//full_title(2:)
   print *, full_title

end program strings
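For comparison, here is a sketch of the fixed-length version of the exercise. The lengths are chosen for these particular strings (full_title needs 16 characters, plus 5 to spare), and “Jaws” is changed to “Paws” by assigning to a one-character substring.

```fortran
program strings_fixed
   implicit none
   character(len=4)  :: title
   character(len=11) :: subtitle
   character(len=21) :: full_title     ! 16 characters needed, plus 5 to spare

   title = "Jaws"
   subtitle = "The Revenge"
   write (*, '(i0)') len(title)

   ! The unused positions of full_title are filled with trailing blanks
   full_title = title // ":" // subtitle
   write (*, '(a)') full_title
   write (*, '(i0)') len(full_title)
   write (*, '(a)') full_title(2:4)

   ! Change "Jaws" to "Paws" by assigning to a substring
   full_title(1:1) = "P"
   write (*, '(a)') full_title
end program strings_fixed
```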
