Character Variables
A char variable is 1 byte (8 bits). It represents a single character.
char letter;
Whether a plain char is signed or unsigned depends on the compiler and platform. C++ also provides the explicit signed char and unsigned char types, but they are not as commonly used. Since a signed char is effectively an 8-bit integer, it can be used in arithmetic expressions, and some programmers who must write for devices with limited memory use it to save space. Because its signedness is explicit, a signed char is also promoted to int the same way on every platform.
Another use of char is to act as a byte type, since C and older C++ standards do not provide one. Newer C++ standards (from C++17) and compilers that support them offer the type std::byte, but it is defined in terms of unsigned char and is a scoped enumeration (enum class), not a primitive type.
Byte types or their equivalents offer direct access to memory, which is organized by bytes.
C-Style Strings
C-style strings are actually arrays of individual characters. They must be declared with a fixed size, or allocated.
char cstr[8]="Hello";
The compiler automatically terminates each quoted string with an invisible “null” character ('\0'). If the initializing string does not fill the array, the remaining elements are set to null characters, not blanks. The string may not exceed the size of the array, and that size must account for the terminating null character, so
char greeting[5]="Hello";
will result in an error, but
char greeting[6]="Hello";
works.
A C-style string may only be initialized to a quoted string when it is declared.
char greeting[6];
greeting="Hello";
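is invalid, because an array cannot be assigned to after it has been declared. Use strcpy to copy a new value into it.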
C-Style Character Operations
C++ supports C-style string functions. Include the <cstring> header.
Function | Operation | Usage |
---|---|---|
strcpy | copy str2 to str1 | strcpy(str1,str2) |
strcat | concatenate str2 to str1 | strcat(str1,str2) |
strcmp | compare two strings | strcmp(str1,str2) |
strlen | length of string (excludes null) | strlen(str) |
Individual characters may be addressed using bracket notation. Each character is one item, and the count begins from zero and goes to strlen-1.
char greeting[8]="Hello";
std::cout<<greeting[0]<<"\n";
std::cout<<greeting[2]<<"\n";
Example:
#include <iostream>
#include <cstring>
int main() {
    char greeting[6]="Hello";
    char musical_instr[6]="Cello";
    char two_strings[13]="";
    // strcmp returns 0 only when the two strings are equal
    std::cout<<strcmp(greeting,musical_instr)<<"\n";
    // two_strings has room for both words plus the null terminator
    std::cout<<strcat(two_strings,greeting)<<"\n";
    std::cout<<strcat(two_strings,musical_instr)<<"\n";
    std::cout<<strlen(greeting)<<"\n";
    // greeting has room for only 6 bytes
    std::cout<<strcat(greeting,musical_instr)<<"\n";
    std::cout<<greeting<<"\n";
    std::cout<<strlen(greeting)<<"\n";
    // str also has room for only 6 bytes
    char str[6];
    strcpy(str,greeting);
    std::cout<<str<<"\n";
}
In the above code, pay attention to the lines
std::cout<<strlen(greeting)<<"\n";
std::cout<<strcat(greeting,musical_instr)<<"\n";
std::cout<<greeting<<"\n";
std::cout<<strlen(greeting)<<"\n";
char str[6];
strcpy(str,greeting);
std::cout<<str<<"\n";
What result did this code yield? On a Linux system with g++ the output was
5
HelloCello
HelloCello
10
HelloCello
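The string stored in greeting doubled in length (not counting the null terminator) even though the array was declared with size 6. The compiler did not check for this. The strcpy function then copied the result into another array of size 6.
Both operations write past the end of their arrays. The result is a buffer overflow, which is undefined behavior: here the program happened to print the expected text, but it could just as well have crashed or corrupted other variables.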
To see what can happen, compile and run the following code
#include <iostream>
#include <cstring>
int main(int argc, char **argv) {
    char user[6];
    char password[8];
    std::cout<<"Enter your user id: ";
    std::cin>>user;
    std::cout<<"Enter your password: ";
    std::cin>>password;
    if (std::strcmp(password,"Eleventy")==0) {
        std::cout<<"You have logged in\n";
    }
    else {
        std::cout<<"Incorrect password\n";
    }
}
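Type in a short username (any string), then type Eleventy as your password. It should work as expected. Now try typing a username that is longer than 10 characters and see what happens. (Compile with a standard earlier than C++20, for example with g++ -std=c++17. From C++20 on, the >> operator reads no more characters than fit in the array.)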
If using C-style strings and functions, guard against this by using
Function | Operation | Usage |
---|---|---|
strncpy | copy str2 to str1, max n bytes of str2 | strncpy(str1,str2,n) |
strncat | concatenate str2 to str1, max n bytes of str2 | strncat(str1,str2,n) |
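One way to ensure that n is correct is to use sizeof, which returns a value in bytes. This works only when str1 is a true array, not a pointer.
strncpy(str1,str2,sizeof(str1)-1);
str1[sizeof(str1)-1]='\0';
We must explicitly add the null character at the end of the target of the copy, because strncpy does not write one when str2 contains n or more characters. Without it, later operations on str1 would read past the end of the buffer.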
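The strncat function is more difficult to use correctly, since it appends up to n bytes from str2, plus a null terminator, regardless of the size of str1. Here n must be the space still free in str1, that is, sizeof(str1)-strlen(str1)-1.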
In general, it is best to avoid fixed-size char variables as much as possible, because C++ and C do not check C-style array bounds. Similar problems can occur with numerical arrays, but in those cases the result is typically a segmentation fault. Buffer overflows in character arrays can result in insecure programs.
Since we are programming in C++, not C, for most purposes it is better to use C++ strings (std::string), which do not have these disadvantages.