Scott Olson

C Arrays Are Not Pointers: An ELF’s Perspective

This is a short response to lovesegfault’s post of the same name, which I recommend as the clearest explainer I’ve seen for this subtle detail of C.

I wanted to explain my own favourite distinction between pointers and arrays. For expediency, I will assume general knowledge of C and linkers/loaders/ELF. Let’s start with a naive approach to defining a global string constant:

const char* kMyStringMut = "hello world";

If we run clang -c foo.c -o foo.o && objdump -x foo.o on this file, we can observe two problems:

SYMBOL TABLE:
0000000000000000 l    d  .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 g     O .data  0000000000000008 kMyStringMut

RELOCATION RECORDS FOR [.data]:
OFFSET           TYPE              VALUE
0000000000000000 R_X86_64_64       .rodata.str1.1

A quick explanation:

  1. .rodata.str1.1 contains the string’s actual bytes.
  2. kMyStringMut is an 8-byte slot for a 64-bit address.
  3. The relocation records ensure kMyStringMut will contain the address of .rodata.str1.1 at runtime.

But what is wrong with this picture? First of all, kMyStringMut has been placed in .data. That isn’t read-only! We forgot to include the top-level const for the global variable itself. It’s not a constant at all, and any code in the program could reassign it with kMyStringMut = myNewString. Let’s fix that:

const char* const kMyStringIndirect = "hello world";

SYMBOL TABLE:
0000000000000000 l    d  .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 g     O .data.rel.ro   0000000000000008 kMyStringIndirect

RELOCATION RECORDS FOR [.data.rel.ro]:
OFFSET           TYPE              VALUE
0000000000000000 R_X86_64_64       .rodata.str1.1

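There we go! It’s now in .data.rel.ro, which is read-only at runtime as intended. (Strictly, it is only writable while the loader applies its relocations; RELRO then write-protects it.)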

But wait a minute: why does the ELF object file store a pointer to the string’s bytes at all, rather than just the bytes themselves? Because we declared a pointer, and C arrays are not pointers. Let’s switch to an array:

const char kMyString[] = "hello world";

SYMBOL TABLE:
0000000000000000 g     O .rodata        000000000000000c kMyString

Finally! It’s in .rodata, which is read-only, and the symbol directly labels the string’s bytes, with no indirection through ELF relocations.

Of course, when you use kMyString in the program, the compiler will emit references to this ELF symbol, and the linker/loader system will ensure they resolve to the eventual runtime address of kMyString. But this is true of any global variable and is not unique to arrays at all. Arrays provide the most direct, least foot-gun-prone global string constants.

In summary:

// Mutable pointer to immutable bytes.
const char* kMyStringMut = "hello world";

// Immutable pointer to immutable bytes.
const char* const kMyStringIndirect = "hello world";

// Immutable bytes.
const char kMyString[] = "hello world";