Characters and Strings
Fortran’s support for characters and strings has evolved significantly across successive standards. The earliest Fortran used Hollerith codes to represent characters. The character type was introduced in the Fortran 77 standard. The original character type is essentially a fixed-length string. Variable-length strings were introduced in the Fortran 2003 standard.
Character Sets and Encodings
Like everything else in the computer, characters must be represented by a sequence of 0s and 1s. A catalogue of these representations is usually called an encoding.
The basic character set used by Fortran is ASCII, the American Standard Code for Information Interchange. Internationally, it is sometimes known as US-ASCII. Originally, 7 bits were used to represent each character, resulting in a total of 128 (2^7) available characters. The first 32 are non-printing characters that were mainly needed by the mechanical devices for which the encoding was developed. Only a few non-printing characters are still used, among them line feed, carriage return, and even “bell.”
    program ringer
        print *, char(7)
    end program
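The mapping between characters and ASCII codes can be explored with the standard intrinsics iachar and achar. The following is a minimal sketch; the program and variable names are chosen only for illustration.

```fortran
program ascii_codes
    implicit none
    integer :: code
    character(len=1) :: letter

    code = iachar("A")            ! ASCII code of the character "A"
    letter = achar(code + 32)     ! character whose ASCII code is code + 32

    print *, code                 ! 65
    print *, letter               ! a
    print *, iachar("a") - iachar("A")   ! 32: offset between lower and upper case
end program ascii_codes
```

Unlike char and ichar, which use the processor's default character set, achar and iachar always refer to ASCII codes.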
ASCII is now usually stored in 8 bits. Extended ASCII uses the eighth bit to double the number of available characters, while the first 128 character codes remain the same as 7-bit ASCII. Extended ASCII was never as well standardized as ASCII, though standards exist for Latin alphabets (ISO 8859-1, also called ISO Latin 1) and Cyrillic (ISO 8859-5). Nearly all programming languages continue to restrict the characters allowed for statements to the original ASCII, even when larger character sets are the default and may be used for comments and output.
Even extended ASCII provides far too few characters for more than a handful of alphabets, much less other writing systems. Unicode was created to address this. Unicode defines more than a million code points, and the first 128 are still the 7-bit ASCII codes. Fortran supports Unicode through a character kind called ISO_10646, which can be used for character data and output, though not all compilers implement it.
Example from the gfortran documentation. It may compile only with gfortran.
    program character_kind
        use iso_fortran_env
        implicit none
        integer, parameter :: ascii = selected_char_kind ("ascii")
        integer, parameter :: ucs4  = selected_char_kind ('ISO_10646')

        character(kind=ascii, len=26) :: alphabet
        character(kind=ucs4,  len=30) :: hello_world

        hello_world = ucs4_'Hello World and Ni Hao -- ' &
                      // char (int (z'4F60'), ucs4)     &
                      // char (int (z'597D'), ucs4)

        open (output_unit, encoding='UTF-8')
        write (*,*) trim (hello_world)
    end program character_kind
Declare character variables with
CHARACTER(len=<N>) :: string
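The value of N must be an integer constant expression, such as a literal or a named constant, and it must be large enough to contain all characters in the string, including whitespace (tab character, space). A value shorter than N is padded on the right with blanks when it is assigned; a longer value is truncated.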
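A short sketch of how the declared length affects assignment. The named constant maxlen and the variable names are hypothetical, introduced only for this example.

```fortran
program char_decl
    implicit none
    integer, parameter :: maxlen = 8      ! hypothetical named constant used as the length
    character(len=maxlen) :: padded
    character(len=3)      :: cut

    padded = "abc"       ! shorter than 8: blanks are added on the right
    cut    = "abcdef"    ! longer than 3: only "abc" is kept

    print *, "[" // padded // "]"          ! [abc     ]
    print *, "[" // cut // "]"             ! [abc]
    print *, len(padded), len_trim(padded) ! 8 3
end program char_decl
```

The brackets are there only to make the trailing blanks visible in the output.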
Arrays of character variables are permitted.
CHARACTER(len=3), DIMENSION(10) :: string
This is a 10-element, rank-1 array, each of whose elements is a length-3 character.
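As an illustration, the sketch below fills part of such an array and picks out one element and one character of an element. The values are arbitrary, and the bracketed array constructor assumes a Fortran 2003 compiler.

```fortran
program string_array
    implicit none
    character(len=3), dimension(10) :: string

    string = "---"                         ! a scalar is assigned to all 10 elements
    string(1:3) = ["Jan", "Feb", "Mar"]    ! constructor items must all have the same length

    print *, size(string), len(string)     ! 10 3
    print *, string(2)                     ! Feb
    print *, string(2)(1:1)                ! F  (substring of one element)
end program string_array
```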
Variable-length strings are declared similarly to allocatable arrays.
CHARACTER(len=:), ALLOCATABLE :: string
The string must be allocated before it is referenced. It can be allocated explicitly in the executable code, as shown here, or automatically by assigning a value to it.
    num_chars=5
    allocate(character(len=num_chars) :: string)
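Under the Fortran 2003 rules, assigning to a deferred-length string also allocates it, or reallocates it if the length changes. The following sketch assumes a compiler that implements this behavior, as recent versions of gfortran do.

```fortran
program alloc_string
    implicit none
    character(len=:), allocatable :: string

    string = "Hello"               ! allocated on assignment with length 5
    print *, len(string)           ! 5

    string = string // " world"    ! reallocated to the new length
    print *, len(string)           ! 11

    deallocate(string)
    print *, allocated(string)     ! F
end program alloc_string
```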
Allocatable arrays of allocatable strings are possible, but will require creating a derived type.
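One possible way to do this is to wrap an allocatable string in a derived type and make an allocatable array of that type. In this sketch the type name varstring and the component name text are hypothetical.

```fortran
program ragged_strings
    implicit none

    type :: varstring                           ! hypothetical wrapper type
        character(len=:), allocatable :: text   ! each element has its own length
    end type varstring

    type(varstring), allocatable :: words(:)
    integer :: i

    allocate(words(3))
    words(1)%text = "a"
    words(2)%text = "bcd"
    words(3)%text = "efghi"

    do i = 1, size(words)
        print *, len(words(i)%text), words(i)%text
    end do
end program ragged_strings
```

The loop prints the lengths 1, 3, and 5, which a plain character array could not hold without padding every element to the same length.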
An allocatable string may be deallocated if necessary with the usual deallocate statement.
Prior to Fortran 2003, an auxiliary part of the standard defined a module iso_varying_string. Most compilers available now support the 2003 standard and so offer the standard variable-length string, but the iso_varying_string module provides a number of functions and may still be worthwhile. We will discuss standardized modules later.
Similar to arrays, slices or substrings may be extracted from strings. Characters are counted starting from 1 at the left and the upper bound of the range is included. If the lower bound is omitted, the first character is assumed. If the upper bound is omitted, characters from the specified character to the end of the string are included. To extract a single character, the range is from its position to the same position.
    character(len=11) :: message
    message="Hello world"
    print *, message(1:5)," ",message(5:5)," ",message(7:)
This results in
Hello o world
The only string operator is concatenation, //. Try this out and see what is printed. Remember to add the program, implicit none, and end program statements to your program, and indent properly.
    character(len=5)  :: word1, word2
    character(len=20) :: message
    word1="Hello"
    word2="world"
    message=word1//" "//word2
    print *, message
    print *, message//" today"
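You should see two lines:
Hello world
Hello world          today
The second line shows that message keeps its trailing blanks. It is always 20 characters long, so the 9 blanks that pad “Hello world” are included in the concatenation.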
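Because message is declared with len=20, its trailing blanks take part in any concatenation. The intrinsic function trim removes them. A small sketch comparing the two cases by length:

```fortran
program trim_concat
    implicit none
    character(len=20) :: message

    message = "Hello world"

    print *, len(message // " today")         ! 26: 20 characters plus 6
    print *, len(trim(message) // " today")   ! 17: 11 characters plus 6
    print *, trim(message) // " today"        ! Hello world today
end program trim_concat
```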
A useful string function, especially for variable-length strings, is len, which returns the length of a string.
A fixed-length string will always occupy the specified number of characters. The default is to left-justify nonblank characters in the field. This can be modified with the adjustr and adjustl functions, and trailing blanks can be removed with trim.
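As a rough illustration of how a fixed-length field is padded, the sketch below applies the standard intrinsics len, len_trim, adjustr, adjustl, and trim to a 10-character variable. The variable name field is arbitrary, and the brackets only make the blanks visible.

```fortran
program string_functions
    implicit none
    character(len=10) :: field

    field = "abc"

    print *, len(field)                    ! 10: the declared length
    print *, len_trim(field)               ! 3: length without trailing blanks
    print *, index(adjustr(field), "abc")  ! 8: "abc" now ends the field
    print *, "[" // adjustr(field) // "]"            ! right-justified
    print *, "[" // adjustl(adjustr(field)) // "]"   ! left-justified again
    print *, "[" // trim(field) // "]"               ! [abc]
end program string_functions
```

Note that adjustr and adjustl return a string of the same length as their argument; only trim returns a shorter one.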
- Declare character variables large enough to hold the indicated strings. Make full_title at least 5 characters longer than you think necessary.
    title="Jaws"
    subtitle="The Revenge"
    print *,len(title)
    full_title=title//":"//subtitle
    print *, full_title
    print *,len(full_title)
    print *,full_title(2:4)
- Change “Jaws” to “Paws” in full_title
- Make the strings variable sized. Use the len function to compute the length needed for full_title.
Solution with variable strings.
    program strings
        implicit none
        character(len=:), allocatable :: title, subtitle, full_title
        integer :: nchars

        ! Explicit allocation; the lengths match the strings assigned below
        allocate(character(len=4)  :: title)
        allocate(character(len=11) :: subtitle)
        title="Jaws"
        subtitle="The Revenge"

        ! One extra character for the ":" separator
        nchars=len(title)+len(subtitle)+1
        allocate(character(len=nchars) :: full_title)
        full_title=title//":"//subtitle
        print *, full_title
        print *, len(full_title)
        print *, full_title(2:4)

        ! Replace the first character to change "Jaws" to "Paws"
        full_title="P"//full_title(2:)
        print *, full_title
    end program