This is a short response to lovesegfault’s post of the same name, which I
recommend as the clearest explainer I’ve seen for this subtle detail of C.
I wanted to explain my own favourite distinction between pointers and arrays.
For expediency, I will assume general knowledge of C and
linkers/loaders/ELF.
Let’s start with a naive approach to defining a global string constant:
const char* kMyStringMut = "hello world";
If we run clang -c foo.c -o foo.o && objdump -x foo.o on this file, we can
observe two problems:
SYMBOL TABLE:
0000000000000000 l d .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 g O .data 0000000000000008 kMyStringMut
RELOCATION RECORDS FOR [.data]:
OFFSET TYPE VALUE
0000000000000000 R_X86_64_64 .rodata.str1.1
A quick explanation:
- .rodata.str1.1 contains the string’s actual bytes.
- kMyStringMut is an 8-byte slot for a 64-bit address.
- The relocation records ensure kMyStringMut will contain the address of .rodata.str1.1 at runtime.
But what is wrong with this picture? First of all, kMyStringMut has been
placed in .data. That isn’t read-only! We forgot to include the top-level
const for the global variable itself. It’s not a constant at all, and any code
in the program could reassign it with kMyStringMut = myNewString. Let’s fix
that:
const char* const kMyStringIndirect = "hello world";
SYMBOL TABLE:
0000000000000000 l d .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 g O .data.rel.ro 0000000000000008 kMyStringIndirect
RELOCATION RECORDS FOR [.data.rel.ro]:
OFFSET TYPE VALUE
0000000000000000 R_X86_64_64 .rodata.str1.1
There we go! It’s in .data.rel.ro, which is read-only as intended (once the
loader has applied the relocation, assuming the usual RELRO linker setting).
But, wait a minute, why does the ELF object file even store a pointer to the
string’s bytes, and not just the bytes themselves? Because C arrays are not
pointers. Let’s switch to an array:
const char kMyString[] = "hello world";
SYMBOL TABLE:
0000000000000000 g O .rodata 000000000000000c kMyString
Finally! It’s in .rodata, which is read-only, and the symbol directly labels
the string bytes (its size, 0xc, is the 11 characters plus the terminating
NUL), without any indirection through ELF relocations.
Of course, when you use kMyString in the program, the compiler will emit
references to this ELF symbol, and the linker/loader system will ensure they
resolve to the eventual runtime address of kMyString, but this is true of any
global variable, and is not unique to arrays at all. Arrays provide the most
direct, least foot-gun-prone global string constants.
In summary:
// Mutable pointer to immutable bytes.
const char* kMyStringMut = "hello world";
// Immutable pointer to immutable bytes.
const char* const kMyStringIndirect = "hello world";
// Immutable bytes.
const char kMyString[] = "hello world";