Dec 5 2013

Foggy Macro Breakdown

I’m a big fan of caveman debugging. That is, using log statements to gather information and visualize a program’s control flow. But I only use it very tactically: I need this one piece of information. Add a log or two and see what I get. Once I solve the problem I take out the logs before sending the code out for review and checking it in. I consider caveman debugging different from the logging an app does to report its activity or health. This should be designed like any other feature, and not be dependent upon developers randomly scattering log statements around the code base.

I have worked with some friends that caveman debug, and then leave them in there “just in case.” It’s your program, so knock yourself out. Recently, though, I was doing a pair with a friend and the question came up – “that log statement there. Where did it come from?”, because it had some interesting information.

My usual solution for that is to put a symbolic breakpoint on NSLog and see where it gets triggered. Unfortunately this doesn’t scale if you have thousands of lines of log output. Wouldn’t it be cool if we could rewrite NSLog to tell us where it was actually called from? It’s actually pretty easy – just drop this macro at the end of your precompiled header:

#define NSLog(format, ...) do { printf(stderr, "%s-%d: ", __FILE__, __LINE__); (NSLog)((format), ##__VA_ARGS__); } while (0)

Ok, that’s a little hard to read. You can also split it across multiple lines, but be sure to escape the newlines with backslashes so it appears as one line to the preprocessor. Makes it easier for us regular humans to read:

#define NSLog(format, ...) do {                      \
    fprintf(stderr, "%s-%d: ", __FILE__, __LINE__);  \
    (NSLog)((format), ##__VA_ARGS__);                \
} while (0)

Now every NSLog called from within our program’s code is prefixed with where it came from.

MJScoreView.m-108: 2013-12-04 14:08:47.945 MusicMonger[25078:c07] drawing measure 0 with width 513.628235
MJMetrics.m-453: 2013-12-04 14:08:47.947 MusicMonger[25078:c07] spacing 0.400000
MJScoreView.m-108: 2013-12-04 14:08:47.948 MusicMonger[25078:c07] drawing measure 1 with width 160.247223

“How does that work” was the next question asked, so I figured I’d write up how this macro works. There’s nothing really complicated about it – it’s just built from a number of common building blocks.

TL;DR: Macro hygiene 101, along with variadic argument handling.

Do Be Square Do

First on the list is the do { … } while (0) loop surrounding the body of the macro:

Macro do while

That is a bit of macro hygiene you should use for any macro that expands to multiple statements. It turns the macro body into a single statement.

Suppose you have this simpler macro:

#define LogError()          \
printf("Saw an error!\n");  \
error_count++;              \

And then you go to use it:

if (errorHappened) {
    LogError ();
}

Life is groovy. It expands into:

if (errorHappened) {
    printf("Saw an error!\n");
    error_count++;
}

Errors happen, lines get printed, and the error count increases. It passes all the unit tests, too!

Then your coworker who doesn’t brace their if statements uses your macro:

if (errorHappened)
    LogError ();

Suddenly the error counts are through the roof, but no logging. What happened? Here’s the macro expansion:

if (errorHappened)
    printf("Saw an error!\n");
    error_count++;

Which is actually interpreted by the compiler as

if (errorHappened)
    printf("Saw an error!\n");
error_count++;

The error count is always incremented, even when there’s no error. do { .. } while (0) turns the statement into a single expression:

#define LogError() do {         \
    printf("Saw an error!\n");  \
    error_count++;              \
} while (0)

Which expands to:

if (errorHappened) do {
    printf("Saw an error!\n");
    error_count++;
} while (0);

And happiness is restored. Note that you cannot just use braces because you’ll have a stray semicolon that can confuse if/else statements:

#define LogError() {            \
    printf("Saw an error!\n");  \
    error_count++;              \
}

if (errorHappened)
    LogError();
else
    HaveANiceDay();

This expands to

if (errorHappened) {
    printf("Saw an error!\n");
    error_count++;
}; else
    HaveANiceDay();

Using do { ... } while(0) is sometimes said to “eat the semicolon.”

No Argumenting!

Next up is the first macro argument. I surround its use by parens:

Macro argument

Macros do not obey the rules of the C language. They don’t obey many rules at all, being purely textual substitutions. You not only have to worry about what your macro will expand in to, you also have to worry about the different evil corner cases that programmers will create as they use your macro.

Consider this macro:

#define Square(x) x * x

Seems simple enough. The unit test passes:

assert (Square(5) == 25);

Then you start getting weird mathematical errors later one. “Dude, Square is returning something wrong for me. Can you take a look?” Here’s the code:

blah = Square(2+3);

Which evaluates to eleven. But it should go to twenty five. What happened? Remember that macros are dumb. It’s just text.

Square(2+3)

expands into

2 + 3 * 2 + 3

Once the textual substitution is done, the code is fed to the compiler which applies C’s operator precedence:

2 + (3 * 2) + 3

Which is 2 + 6 + 3, eleven.

You can fix this by wrapping your arguments with parens:

#define Square(x) (x) * (x)

Which expands to

(2+3) * (2+3)

That will yield 25. For even more safety, wrap the whole expression in parens too:

#define Square(x) ((x)*(x))

Not only does it look kind of like a drunk owl, it is a safer expression.

Statement side-effects are another problem with macro expansion. What does this mean?

Square(i++);

It actually doesn’t mean anything. It expands to

((i++) * (i++))

Which according to the C standards is undefined. You can’t have multiple pre- or post- increment operations within a single “sequence point”. Sequence points are kind of a rabbit hole to go down, but rest assured that i++ * i++ has an undefined value. Be warned that “seems to work the way I intend” is a member of the set of undefined behavior.

You can get similar bad behavior by using a function call as a macro argument:

Square(getchar());

Granted, squaring the next character from standard-in is nonsensical, but work with me. This expands to

( (getchar()) * (getchar()))

getchar() is now called twice, most likely with different values. If nothing else, you’ve unexpectedly eaten the next character.

You can work around this by evaluating the expression only once and assigning it to a variable declared in the do { ... } while (0) scope. There’s also a gcc extension (supported by clang) called statement expressions that avoid some of these pitfalls.

For this macro, I’m not using format twice, so I don’t care about possible multiple evaluations of the argument. I do surround it with parens just out of habit.

Dude, Where’s my Car?

The most useful part of the macro is printing the file and line information:

Macro file line

__FILE__ and __LINE__ are built-in macros provided by the preprocessor. They evaluate to the path of the currently compiling file (as a char *) and the line in question. __FILE__ and __LINE__ aren’t evaluated when the NSLog macro is defined, though, but when the NSLog macro actually used. Therefore, whenever this NSLog macro is expanded, __FILE__ and __LINE__ will be the location where the programmer put in an NSLog.

fprintf is used to send the text to the standard error stream, which is where NSLog sends its output. I did not add a newline at the end of this printf so that NSLog‘s output will be on the same line as the file and line.

There is no escape

About that other set of semirandom parentheses:

Macro nslog

The inner NSLog is surrounded by parentheses. This is a technique to prevent macro expansion of a symbol. Say you knew of the existence of this NSLog wrapper, and you didn’t want it used for whatever reason. Defeat the macro expansion like so:

NSLog (@"SMOOK %@, %d", @"bork", 23);
(NSLog) (@"HA-HA!");

This prints out:

where-log-macro.m-25: 2013-12-04 14:16:13.567 where-log-macro[25165:303] SMOOK bork, 23
2013-12-04 14:16:13.569 where-log-macro[25165:303] HA-HA!

I didn’t have to put NSLog in parens here because macros aren’t recursive, but I do it anyway as a signal to whomever is reading the macro “I want to use the symbol NSLog without any disturbance by the preprocessor.”

Vari-wha?

Now for the real fun part!

Macro variadic

NSLog is a “variadic” function, meaning it can take an arbitrary number arguments. In C, variadic functions have one or more required arguments, then zero or more optional arguments. NSLog processes the optional arguments by looking at the first (required) argument – the format string – and then using va_start / va_arg / va_end to locate and process the arguments based on the format specifiers in the format string.

Macros can be variadic too. You can’t access individual arguments in a variadic macro, but you can move them around in bulk.

The three dots in the macro definition argument list tells the preprocessor that this is a variadic macro, and so it should expect any number of arguments. These extra arguments are made available via the __VA_ARGS__ symbol, which will expand to be all the things.

Here’s a simple macro:

#define Blah(oop, ...)  printf(oop, __VA_ARGS__);

And you invoke it like this:

Blah ("hello %d\n", 23);

Which expands to

printf("hello %d\n", 23);

Like this:

Macro expand 1

Multiple extra arguments are treated the same:

Blah ("I %s to be %s %s\n", "seem", "a", "verb");

expands to

printf ("I %s to be %s %s\n", "seem", "a", "verb");

and prints out

I seem to be a verb

Here’s what the preprocessor is doing:

Macro expand 2

With these two test cases working correctly, it’s time to ship it! Woo!
There’s just one problem. You may notice that the NSLog-whacker doesn’t use plain old __VA_ARGS__, but instead has two pound signs in front of it:

(NSLog)((format), ##__VA_ARGS__);

That’s another gcc extension (that clang supports) which fixes a corner case:

Blah ("just a message\n");

This expands into:

Blah ("just a message", );

The comma is there because it’s there in the macro:

Macro expand 3

The two pound signs in front of ##__VA_ARGS__ tells the preprocessor “if there are no variadic arguments provided, then eat the previously occurring comma.” Yeah. It’s a narrow-focused feature, but it’s exactly what’s needed:

Macro expand 4

All Wrapped Up

And that’s the macro. Here it is, again, with the commentary I have in my mind when I read something like this:

#define NSLog(format, ...) do {                      \
    fprintf(stderr, "%s-%d: ", __FILE__, __LINE__);  \
    (NSLog)((format), ##__VA_ARGS__);                \
} while (0)

“pound-define NSLog, so replace all NSLogs with some stuff. Oh good it’s properly protected. Print out the current file and line when expanded, call the real NSLog, pass it the parameters and eat that pesky comma, then finally eat the semicolon.”

Great, now I’m hungry.

Is this something I’d leave in my project permanently? Nope. The real output from this is actually much more verbose because __FILE__ gives you the full path to the file (I edited it out for this posting). You’d would want to run it through a function to get the file name of the path. And in general, knowing the file and line of a log statement is not terribly interesting. If you really need to know that, you may want to reduce the logging you do, and perhaps design a better overall mechanism.

3 Comments

  1. Reid

    Great post! I do something pretty similar, which I found on Stack Overflow a while back and have found to be incredibly useful:

    #define RDB_DEBUG_LOG

    #ifdef RDB_DEBUG_LOG
    #define DLog( s, … ) NSLog( @” %@”, self, [[NSString stringWithUTF8String:__FILE__] lastPathComponent], __LINE__, [NSString stringWithFormat:(s), ##__VA_ARGS__] )
    #else
    #define DLog( s, … )
    #endif

    I put this in the precompiled header so that I can use it project-wide. The nice thing about this is that I can simply comment out the first #define, and all my DLog statements will be effectively turned off throughout the project…when it’s time for the code to go production, no need to remove any of them (and I can quickly turn them back on later when needed.) Definitely an improvement over NSLog.

  2. Juri Pakaste

    Another pretty useful built-in macro is __PRETTY_FUNCTION__ which gives you output like int main(int, char **) or +[MyObject foo]. Those can sometimes be a bit more descriptive than a file name.

    Of course there’s good reasons to ditch NSLog for something like Lumberjack, too. If you ever run Xcode on a laptop screen, it’s really nice to not output things like process name or date.

    • I always forget about pretty function. It’s nice.

      Many times when I’ve cavemanning, and the NSLog output is getting annoying, I replace NSLog with a quieter version: http://borkware.com/quickies/single?id=261

      I figured things like Lumberjack are awesome for when you’re designing the logging an app needs to produce for health / activity / in-the-field diagnostics. It’s fundamentally a strategic tool. Design it well, add it, and leave it. Real Caveman Debugging, at least for me, is add it, get the info, then get rid of it – extremely tactical. Rarely have I ever found someone else’s caveman debugging output useful when left in “just in case”.

Leave a Comment

Join the discussion. Do not worry, your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>