May 2 2013

Static Cling

I was hanging out on the #macdev IRC channel on Freenode the other day when someone asked a question: “static has different meanings based on the context it is placed in, right?”. Indeed, it has different meanings. And yet it’s the same. Static is a C Koan.

An Ice Cream Koan

static controls scope, which is the visibility of an entity. It tells the compiler “Here is this thing that I’m using, but don’t let anyone else know about it.” It communicates that something, like a function or a variable, is an implementation detail and should not be made public.

What constitutes “public?” Stuff outside of a compilation unit.

What’s a compilation unit? It’s a term in C-style languages, which just refers to all the stuff that’s processed during a single invocation of the compiler, whether it’s gcc or clang. This is a typical compiler invocation:

clang -g -Wall -c thing1.m

The command tells the compiler to open up thing1.m, run it through the preprocessor, take that output and run it through the compiler, and save the compiled goodies to an object file named thing1.o. The preprocessed text that’s fed into the compiler is a compilation unit. It is possible for thing1.m to #import thing2.m, but the result is still just one compilation unit. That is (hopefully) a rare occurrence.

static controls the visibility of a symbol outside of the compilation unit that’s being processed. Things prefixed with static are not visible outside of that compilation unit. Things not prefixed with static are visible. That’s pretty much it.

So what does “visible” mean? It means that other code can call it (for functions) or access / change it (for variables). It also means the function or variable can be looked up by name using a function like dlsym. This meaning of visible is orthogonal to the idea that there are debugging symbols and that values are visible inside of the debugger. static has no control over that.

This visibility is also independent of the presence of a function prototype or a variable declaration in a header file. If some other piece of code knows the name of a visible function, it can access the non-static symbol, even if that code doesn’t pull in the proper header file. Or if the function isn’t in any header files at all. It’s what allows us to call private API.
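Here is a minimal sketch of that idea using the C library's atoi. The file name and the ParseWithoutHeader helper are made up for illustration. The only thing the code needs is a matching prototype, which it writes by hand instead of including stdlib.h:

```c
/* thing2.c (hypothetical file): note there is no #include <stdlib.h>. */

/* Hand-written prototype for the C library's atoi. The symbol is visible
   to the linker whether or not we pulled in the header that declares it. */
int atoi(const char *text);

/* Hypothetical helper that calls the function we declared ourselves. */
int ParseWithoutHeader(const char *text) {
    return atoi(text);
}
```

Hand-rolled prototypes are fragile: if the real signature differs from what you typed, nothing checks it for you. It does show that the header is a convenience, not a gatekeeper.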

Static functions

Functions, by default, are visible everywhere, and so can be called from anywhere. Here’s a function in thing1.m:

void VisibleFunction (void) {
    printf ("Hi!  I'm visible!\n");
}

It’s totally visible. This one will be hidden:

static void InvisibleFunction (void) {
    printf ("Hi!  I'm hidden!\n");
}

Don’t believe me? You can ask nm to display the symbol table for linky-things:

% clang -g -Wall -c thing1.m
% nm thing1.o
0000000000000248 s EH_frame0
0000000000000152 s L_.str
0000000000000000 T _VisibleFunction
0000000000000260 S _VisibleFunction.eh
                 U _printf

Each line is a different symbol used in the file. You can see VisibleFunction (with a leading underscore, which is Just The Way OS X does things). The T stands for “Defined in the Text section”. VisibleFunction is indeed defined, because it has a body of code. S is for other symbols. In this case, some exception handling jazz. U is for undefined. That means the linker needs to mop up things and put in an address where printf can be found. The lower-case s’s are other symbols, such as another exception handling thing and the string character constants used in the printf’s.

OS X also has otool, which does what nm does, and more. But nm is available on every Unix platform, so I’ll be using that.

Notice that InvisibleFunction is not in nm‘s output. It’s, well, invisible.

What are the implications of this? You can have multiple static functions with the same name in different compilation units. You don’t have to worry about your LogAllTheThings function being confused with someone else’s LogAllTheThings.
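As a rough sketch, one compilation unit might keep its logging helper private like this. Thing1DoWork and the "[thing1]" prefix are invented for the example. A second file could define its own static LogAllTheThings with a completely different body and the linker would never notice:

```c
#include <stdio.h>

/* Private helper: static keeps this name inside the compilation unit,
   so another file is free to define its own LogAllTheThings. */
static int LogAllTheThings(const char *message) {
    /* printf returns the number of characters written, or a negative
       value on failure. */
    return printf("[thing1] %s\n", message);
}

/* Hypothetical public entry point: not static, so other files can call it.
   Returns 1 if the log line was written, 0 otherwise. */
int Thing1DoWork(void) {
    return LogAllTheThings("working") > 0;
}
```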

What happens if you leave off the static? The function is no longer private – it’s now global. You could get linker errors if you have two object files that define the same function. Say I have a second file, thing2.m, which is just thing1.m copied and compiled:

% cp thing1.m thing2.m
% clang -g -Wall -c thing2.m

And just to be pedantic, here’s the nm of the file:

% nm thing2.o
0000000000000248 s EH_frame0
0000000000000152 s L_.str
0000000000000000 T _VisibleFunction
0000000000000260 S _VisibleFunction.eh
                 U _printf

It’s got VisibleFunction defined in its text segment, too. Trying to mash them together gives you a linker error:

% clang -g -Wall *.o -o static
duplicate symbol _VisibleFunction in:
    thing1.o
    thing2.o
ld: 1 duplicate symbol for architecture x86_64

If there’s no conflict detected by the linker, it just means you have a function that’s now visible.

Static “Module” Variables

I call variables that are declared outside of any functions “module variables”, to distinguish them from stuff that lives inside of a function. They’re also referred to as global variables, and have a lifespan of the duration of the program. Like with functions, any non-static variables in a compilation unit are visible to other compilation units. Here’s a variable declared like this, added to the top of thing1.m:

int foobage;

An nm of the resulting object file includes a new line:

% nm thing1.o
00000000000002c8 s EH_frame0
000000000000019c s L_.str
0000000000000000 T _VisibleFunction
00000000000002e0 S _VisibleFunction.eh
0000000000000198 C _foobage
                 U _printf

The capital C stands for a Common section symbol – it’ll be loaded and initialized to zero. Lines with a capital D mean a Data section symbol, which happens if you assign a value on the declaration line. Because this symbol is visible, anyone can access foobage and change it. Putting a static in front of a module variable declaration:

static int invisibleFoobage;

Keeps it hidden.

OBTW, static variables are initialized to zero at program launch.
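A small sketch of how a hidden module variable tends to get used: the variable stays private, and the file decides which operations on it to expose. BumpFoobage and CurrentFoobage are hypothetical names for this example.

```c
/* Private to this compilation unit, and zero when the program starts. */
static int invisibleFoobage;

/* Hypothetical public accessors: the only way outside code can reach
   the variable is through whatever functions this file chooses to offer. */
int BumpFoobage(void) {
    return ++invisibleFoobage;
}

int CurrentFoobage(void) {
    return invisibleFoobage;
}
```

Other files can call the two functions, but they cannot name invisibleFoobage directly, so every change to it goes through code you control.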

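What happens if you leave off the static? It becomes a true global variable. What happens if there’s a conflict, like thing1.o has a visible foobage variable, and thing2.o has its own, distinct visible foobage? For uninitialized variables like these, which land in the Common section, the linker will coalesce them. (Compilers that default to -fno-common emit a real definition instead, and the link fails with a duplicate symbol error, just as it does for functions.) Any changes inside of thing1.o to foobage will, in essence, be visible to the code in thing2.o. Mayhem could possibly ensue, especially if the types aren’t miscible.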

Static Function Variables

Just to make things even more subtly different-but-the-same, we’ve got static variables that live inside of functions:

void VisibleFunction (void) {
    printf ("Hi!  I'm visible!\n");
    static int g_force;
    printf ("Firey Phoenix number %d\n", g_force++);
}

Like static module variables, these are global variables. They’re initialized to zero and they hang around for the life of the program. Unlike the static module variables which have visibility in the entire compilation unit, static function variables only have visibility inside of their function or curly brace scope. Now the only code that can modify g_force is code inside of VisibleFunction. Of course, you can have code that takes the address of g_force and pass it to functions which can then turn around and change the value. But at least vending the pointer is under your control.
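Both behaviors can be sketched in a few lines. NextPhoenixNumber and VendCounter are made-up names. The first shows the value surviving between calls, the second shows a function deliberately handing out a pointer to its own static:

```c
/* Returns the current count and then increments it. g_force is created
   once, starts at zero, and keeps its value between calls. */
int NextPhoenixNumber(void) {
    static int g_force;
    return g_force++;
}

/* Vends a pointer to a function-scoped static. Nobody outside can name
   the variable, but anyone holding the pointer can read or change it. */
int *VendCounter(void) {
    static int counter;
    return &counter;
}
```

Successive calls to NextPhoenixNumber walk up from zero, and a write through the pointer from VendCounter is still there the next time you ask for it.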

What happens if you leave off the static? The variable is just a local variable now. It won’t be initialized to anything sane, and the value won’t persist from function (or method) call to call. Hopefully bugs will manifest themselves quickly.

Variable Declarations and Header Files

Why am I harping on the term “Compilation Unit” throughout this whole discussion? Why not just say “visibility in the source file”? The compilation unit is the aggregate of all the code that’s pulled in by the preprocessor, including all the header files that are directly (or indirectly) included. What happens if you have a header like this?

// things.h
void VisibleFunction (void);
int headerVariable;

Every compilation unit will get its own copy of the variable. Ordinarily the linker will sort things out, but you can run into problems when building shared libraries or plugins. Rather than resolving to the running process’s headerVariable at load time, code will happily use storage reserved for the shared library, leading to subtle bugs. I don’t like subtle edge cases.

What happens if the header declares the variable static-styles?

// things.h
void VisibleFunction (void);
static int headerVariable;

This means every compilation unit will get its own copy of headerVariable. This might not be a problem – if you’re expecting multiple compilation units to access the same memory location, you wouldn’t have made it static to start out with.

The downside is you can get a compiler warning about a static variable that’s not used. The compiler is thinking “You go to the trouble of declaring this variable. You say it’s static, so only code in this compilation unit can touch it. But no code actually touches it. It’s superfluous. So you may be doing something wrong.” And it complains:

./things.h:5:12: warning: unused variable 'headerVariable' [-Wunused-variable]
static int headerVariable;

Of course, you are driven to fix all your warnings. You can silence it by prefixing the variable declaration with __unused:

__unused static int headerVariable;

But now I have to ask, “What’s the point?” This is the time I’d make a comment in the code review system or drop a quick email asking “so, what does this really mean? What is it trying to accomplish?”

externinator

There are legitimate times you want to expose a global variable, such as Cocoa’s NSString constants that are used as dictionary keys (looking at you, NSURLLocalizedNameKey). If you have a plain old variable in a header file, every object file will get its own copy, leading to extra work for the linker to coalesce things. If you make it static, you can cause warnings.

There’s an additional keyword, extern, which tells the compiler “Hey, this thing? Trust me that it’ll actually be defined in a compilation unit somewhere. Don’t worry. It’ll be there at link time.” So, the declaration in things.h would look like

extern int headerVariable;

This fixes both of the problems. The definition should appear in just one place, and you don’t have any duplication issues to worry about.

Of course, someone, somewhere, will need to have a non-extern declaration of this, otherwise the linker will complain:

Undefined symbols for architecture x86_64:
  "_headerVariable", referenced from:
      _main in main.o
ld: symbol(s) not found for architecture x86_64

This means you’ve broken the promise that it’ll actually be defined somewhere. Typically, you’ll have a header file that’s the public interface to some .c or .m file. You’d have the extern variables in the header, and a non-extern declaration in the .c or .m file.
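Squashed into a single file for illustration, the two halves look roughly like this. In a real project the extern line would live in things.h and the definition in exactly one source file. The value 42 and the two accessor functions are assumptions made up for the example.

```c
/* What things.h would contain: a promise that the variable exists.
   No storage is reserved by this line. */
extern int headerVariable;

/* What exactly one .c or .m file would contain: the real definition,
   which reserves the storage and keeps the promise. */
int headerVariable = 42;

/* Hypothetical code standing in for "some other file" that only saw
   the extern declaration. */
int ReadHeaderVariable(void) {
    return headerVariable;
}

void WriteHeaderVariable(int value) {
    headerVariable = value;
}
```

Everyone who includes the header ends up reading and writing the same single variable, and deleting the definition line brings back the undefined symbol error.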

Of course, you really, really should think twice before making variables directly accessible.

Static Analysis

So, back to the original question, “static has different meanings based on the context it is placed in, right?”. At a high level, it means “consider this to be private”. Where the static is placed controls whether it’s controlling function visibility or variable visibility. There are also function statics, which restrict the visibility even more.

(Dig grungy details like this? You’ll love Advanced Mac OS X Programming : The Big Nerd Ranch Guide. It’s chock full of language and command goodness.)

3 Comments

  1. Jordan

    Aaand then there’s C++ static members, which mean “I’m just using this class as a namespace, really” and have nothing to do with linkage visibility. *sigh*

    • Yeah, I was avoiding talking about C++ :-) static isn’t so bad, but then the next logical question would be “tell me about the different meanings of virtual…”

  2. Javier Soto

    Correct me if I’m wrong, but there’s no difference between

    int variable;

    and

    extern int variable;

    extern is the default linkage.
