Jul 9 2013

Inside the Bracket, part 5 – Runtime API

Behold, the Objective-C Runtime API. Here are the tools to let you peer into the metadata that the Objective-C compiler keeps around. Here are the tools that let you manipulate the underlying data structures. Here be the gateway to information and control. Here also be some sharp corners.

There are four main categories in the Objective-C runtime API. You can query classes, objects, protocols, and properties for information about themselves, and you can dig into an object and retrieve its instance variables. You can manipulate objects and classes, such as adding new methods for classes, or redirecting existing methods to run different code. You can send messages with objc_msgSend and friends, and you can also associate objects together. This part (part 5) talks about introspection, and next time will talk about manipulation. I touched on objc_msgSend in part 1, and I’ll cover associated objects another time.

What are the kinds of things you can do with this API? I believe the intent of the API is to make the magic that Cocoa does possible. It also allows bridges to be built to different languages, using the runtime API to construct the glue that translates Objective-C classes and objects back and forth. The SenTestingKit unit test framework uses the runtime API to gather the test classes together, and to find the test methods. You can also patch methods into existing classes to change their behavior, which could be useful when writing unit tests.

I mainly use it for debugging, seeing what’s going on under the surface. Maybe I can temporarily swizzle in some caveman debugging into some existing code path. It’s also fun to just hack around with. Maybe add some class-info dumping code to a project, and see all the neat private classes floating around the app.

I like to think of the runtime API as the interface of last resort. If something can be done with NSObject, do it that way. I can only remember a couple of times where I’ve actually shipped code that used some of this API.

API Flavor

The Objective-C runtime API is, for the most part, consistent. When you get some data from a call that includes the word “copy” you are responsible for releasing that memory using free. If you get some information via a “get” call, you don’t need to worry about cleaning up after it.

There are a number of opaque “objects” that you work with, such as a Class, or a Protocol. The API uses a type_verbNouns naming style. class_getInstanceSize? That tells you that you’ll be passing it something class-like (of type Class). It will be returning a value, which you don’t need to clean up after, that happens to be the number of bytes that instances of this class will consume.

A rough corner: here are the types of the different kinds of “objects” you deal with:

  • class_ – takes a Class
  • ivar_ – takes an Ivar
  • sel_ – takes a SEL
  • protocol_ – takes a Protocol *pointer*
  • object_ – takes an id
  • property_ – takes an objc_property_t
  • method_ – takes a Method, which you get from classes
  • When you ask protocols for the list of their methods, they return a struct objc_method_description instead of a Method

The types that are passed and returned by the different calls aren’t always uniform in exposed type (such as one call taking a Class, and another taking Protocol pointers) or nomenclature (Class and Method vs objc_property_t). So, until you get comfortable with what each particular flavor of call takes as its first argument, you should keep the docs handy, or keep the header in /usr/include/objc/runtime.h handy. The online docs haven’t been updated since 10.6 and the headers have some newer calls added since then.

Lest you think the Objective-C runtime API designers are sloppy, there are pretty good reasons for the inconsistencies here. Protocols are declared as objects, and so need to be accessed with a pointer. Class/Method/Ivar were there from the beginning, but properties came later. The runtime API couldn’t really usurp the type name Property because it could have legitimately been used in existing client code. objc_property_t is much less likely to collide with an existing symbol. Methods are associated with a class, but that doesn’t make sense for protocols because they can be associated with many classes, so that’s why you just get a struct that describes the method.

Taking a Tour

The Objective-C runtime API is actually kind of hard to write about in an interesting way. You can pretty much say “Here’s all the information you can get out of the runtime. Go have fun!” I’ve written a little explorer tool that can be found at this gist. Run it with various arguments, like listclasses, and it’ll list all the classes that it can find, along with some information about them. You can also list protocols, or get detailed information about an individual class or protocol.

First off, here’s how you can list all the classes (from the ListClasses function in the sample). If you’re not sure where to start getting information, the calls prefixed with objc_ are good starting places. They’re usually the ultimate source of all truth, or at least where you can get ahold of every class and protocol the runtime knows about.

First, copy out all the classes:

    unsigned int classCount;
    Class *classes = objc_copyClassList (&classCount);

This is a standard idiom. Ask a call to Copy some kind of List. You pass it an integer’s address, and that piece of memory gets filled in with the count of classes. You are then handed back a dynamically allocated pointer to an array of Classes. They might just be pointers, they might be structures. They’re opaque. They’re my Classes.

You can either walk the array by index in a for loop, or you can scan the array and stop when you hit a NULL value at the end. The copy*List calls add that NULL on the end for your convenience. Because this memory is dynamically allocated, you need to free it when you’re done.

You can get the name of the class by using class_getName and passing it the Class in question. It’s a “get” call, so no need to worry about memory.

    const char *name = class_getName (someclass);

You’re not guaranteed any particular order for the returned information, but I like to see my classes sorted alphabetically. The array of classes you get is just a C array of things of type Class, so you can qsort it. Blocks are cool, so put the comparison function in a block:

    qsort_b (classes, classCount, sizeof(Class),
             ^(const void *thing1, const void *thing2) {
            return strcmp(class_getName(*((Class *)thing1)),
                          class_getName(*((Class *)thing2)));
        });

The casting and dereferencing games are necessary because the sort block is getting passed pointers to the array elements (pointers to Class values), rather than the Class values themselves.
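The same cast-then-dereference dance shows up with plain qsort too. Here is a minimal sketch of it in portable C. FakeClass and fake_getName are hypothetical stand-ins for Class and class_getName (they are not part of the runtime), there only so the snippet compiles without the Objective-C runtime around.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's opaque Class type. */
struct FakeClassRecord { const char *name; };
typedef struct FakeClassRecord *FakeClass;

/* Hypothetical stand-in for class_getName. */
static const char *fake_getName(FakeClass cls) {
    assert(cls != NULL);
    return cls->name;
}

/* qsort hands the comparator the address of each array element, so each
   argument is a pointer to a FakeClass, not a FakeClass itself. */
static int compare_by_name(const void *thing1, const void *thing2) {
    FakeClass left = *(const FakeClass *)thing1;
    FakeClass right = *(const FakeClass *)thing2;
    return strcmp(fake_getName(left), fake_getName(right));
}

/* Sort an array of FakeClass values alphabetically by name. */
static void sort_by_name(FakeClass *classes, size_t count) {
    qsort(classes, count, sizeof(FakeClass), compare_by_name);
}
```

Calling sort_by_name(list, count) on an array of three FakeClass values leaves them in alphabetical order by name. The only difference from the qsort_b version is that the comparison lives in a named function rather than a block.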

Then you can walk the list of classes and ask it stuff.

    for (int i = 0; i < classCount; i++) {
        const char *name = class_getName (classes[i]);

You can ask the class for its @properties. Or what instance variables it has. Or what protocols it adopts. The code here will just print out the number of dudes found, giving some summary information. These calls all follow the same pattern – class_copySomethingList.

        unsigned int propertyCount;
        objc_property_t *properties = class_copyPropertyList (classes[i], &propertyCount);
        free (properties);

        unsigned int ivarCount;
        Ivar *ivars = class_copyIvarList (classes[i], &ivarCount);
        free (ivars);

        unsigned int protocolCount;
        Protocol * __unsafe_unretained *protocols = class_copyProtocolList (classes[i], &protocolCount);
        free (protocols);

Why that __unsafe_unretained in the Protocol line, and not the other ones? Recall that Protocols are accessed by a pointer to a Protocol, rather than directly like a Class or Method. In <objc/runtime.h> a Protocol is forward-declared as a @class, so it has to be accessed by pointer. ARC is paranoid about this block of non-object memory that’s pointing to a bunch of pointers, so the __unsafe_unretained is there to tell ARC “It’s ok. Don’t worry. I trust this call not to let any protocols disappear out from underneath me.”

There are two kinds of methods – instance and class. You get at them with their own copyList call:

        unsigned int instanceMethodCount;
        unsigned int classMethodCount;
        Method *instanceMethods = class_copyMethodList (classes[i], &instanceMethodCount);
        Method *classMethods = class_copyMethodList (object_getClass(classes[i]), &classMethodCount);
        free (instanceMethods);
        free (classMethods);

And finally, print out a line per class, close the loop, and free the class list we got:

        printf ("    %d: %s, %d properties, %d ivars, %d protocols, "
                "%d instance methods, %d class methods\n",
                i, name, propertyCount, ivarCount, protocolCount,
                instanceMethodCount, classMethodCount);
    }

    free (classes);

Ever wondered what you get “for free” when you have a minimal program? It’s pretty scary / cool:

Got 673 classes
    0: CFXPreferencesCompatibilitySource, 0 properties, 1 ivars, 0 protocols, 8 instance methods, 0 class methods
    1: CFXPreferencesManagedSource, 0 properties, 0 ivars, 0 protocols, 1 instance methods, 0 class methods
    2: CFXPreferencesPropertyListSource, 0 properties, 9 ivars, 0 protocols, 9 instance methods, 0 class methods
    3: CFXPreferencesPropertyListSourceSynchronizer, 0 properties, 11 ivars, 0 protocols, 11 instance methods, 0 class methods
...
    670: __NSTaggedDate, 0 properties, 0 ivars, 0 protocols, 3 instance methods, 4 class methods
    671: __NSTimeZone, 0 properties, 4 ivars, 0 protocols, 10 instance methods, 5 class methods
    672: __NSXPCObjCServerClient, 0 properties, 6 ivars, 1 protocols, 5 instance methods, 1 class methods

That’s a lot of classes. It’s slightly inflated because the program has one class it uses for demonstration purposes.

It’s also a whole lot of information, even before digging into the details of, say, properties or Methods.

You can also ask a class for its version, with class_getVersion. Most classes have a zero version number, but some have a non-zero value. The version info is useful with object archives. If you see an older version of an object living in an archive, it might be a candidate that needs some special-case updating. Out of those 672 classes, I found six that had non-zero versions: NSAffineTransform, NSCountedSet, NSDateFormatter, NSMutableString, NSNumberFormatter, and NSString.

A class also knows how big each object instance is. It’s enough memory to hold all the instance variables, plus whatever padding is necessary to keep everything nice and aligned. class_getInstanceSize gives that size.

Property Rights

Classes and protocols can have @properties. The Objective-C runtime stores extra information about the properties (property properties) that you can query. Get the properties from the class like you saw above, by copying the property list:

    unsigned int propertyCount;
    objc_property_t *properties = class_copyPropertyList (classy, &propertyCount);
    printf ("* has %d properties\n", propertyCount);

And for fun, walk the array in memory.

    objc_property_t *propertyScan = properties;
    while (*propertyScan) {

There’s really not a lot to properties – just a name, and a string that describes everything about the property:

        const char *propertyName = property_getName (*propertyScan);
        const char *propertyAttributes = property_getAttributes (*propertyScan);
        printf ("    - %s -> %s\n", propertyName,
                ReadablePropertyAttributes(propertyAttributes));
        propertyScan++;
    }
    free (properties);

So, yeah. String-based APIs. I’m not a fan of them, but they are popular if you’re wanting to cram a lot of arbitrary chunks of information together in a form that’s both compact and human(?) readable. The strings look like this:

T^{CGRect={CGPoint=dd}{CGSize=dd}},N,V_boundsPointer
T@?,C,V_blockHead
Td,N,V_gilliganFactor
T@"NSURL",&,V_mikeysPonyFarm
T@"NSString",W,GsetBlah,SgetBlah:,V_scoobyDoobyDoo

When expanded, they actually mean something:

CGRect*, non-atomic, ivar: _boundsPointer
block, copy, ivar: _blockHead
double, non-atomic, ivar: _gilliganFactor
NSURL, retain/strong, ivar: _mikeysPonyFarm
NSString, weak, getter: setBlah, setter: getBlah:, ivar: _scoobyDoobyDoo

Here’s the @properties that led to these property attribute strings:

@property (assign, nonatomic) CGRect *boundsPointer;
@property (copy) void (^blockHead)(NSFileHandle *);
@property (assign, nonatomic) CGFloat gilliganFactor;
@property (strong) NSURL *mikeysPonyFarm;
@property (weak, setter=getBlah:, getter=setBlah) NSString *blah;

The blah property has a custom backing instance variable:

@implementation XXStuff
@synthesize blah = _scoobyDoobyDoo;

The strings are pretty straightforward – a comma-separated list of attributes, with a single character indicator of what they are, and optionally some extra information. “R” means it’s read-only, “C” for copy, G/S indicate custom getter and setter names (and yes the blah property above has a setter named getBlah:, just to be mean). The first element of the list, “T”, has the type that comes from the @encode compiler directive. The last thing on the list, “V”, is the name of the backing instance variable for the property. You can see the @synthesize changed the ivar for the blah property. You can see that the auto-synthesized instance variables are prefixed with underscores, as expected.
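Pulling one of these strings apart doesn’t take much code. What follows is a rough sketch in plain C, not the ReadablePropertyAttributes function from the gist. The names describe_property_attributes and append_part are made up for illustration. It assumes there are no commas inside the type encoding, and it leaves the “T” encoding undecoded rather than turning it into a readable type name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append label plus the first valueLen bytes of value, comma-separating entries. */
static void append_part(char *out, size_t outSize,
                        const char *label, const char *value, size_t valueLen) {
    size_t used = strlen(out);
    if (used + 1 >= outSize) return;             /* no room left */
    snprintf(out + used, outSize - used, "%s%s%.*s",
             used > 0 ? ", " : "", label, (int)valueLen, value);
}

/* Hypothetical helper: turn "Td,N,V_foo" into "type: d, non-atomic, ivar: _foo". */
static void describe_property_attributes(const char *attrs, char *out, size_t outSize) {
    assert(attrs != NULL && out != NULL && outSize > 0);
    out[0] = '\0';
    for (const char *scan = attrs; *scan != '\0'; ) {
        size_t len = strcspn(scan, ",");          /* length of this attribute */
        const char *value = scan + 1;             /* text after the indicator */
        size_t valueLen = len > 0 ? len - 1 : 0;
        switch (*scan) {
            case 'T': append_part(out, outSize, "type: ", value, valueLen); break;
            case 'G': append_part(out, outSize, "getter: ", value, valueLen); break;
            case 'S': append_part(out, outSize, "setter: ", value, valueLen); break;
            case 'V': append_part(out, outSize, "ivar: ", value, valueLen); break;
            case 'R': append_part(out, outSize, "read-only", "", 0); break;
            case 'C': append_part(out, outSize, "copy", "", 0); break;
            case '&': append_part(out, outSize, "retain/strong", "", 0); break;
            case 'W': append_part(out, outSize, "weak", "", 0); break;
            case 'N': append_part(out, outSize, "non-atomic", "", 0); break;
            default: break;                       /* indicators this sketch skips */
        }
        scan += len;
        if (*scan == ',') scan++;                 /* step over the separator */
    }
}
```

Feed it the string from property_getAttributes and a buffer, and you get a one-line description back. Decoding the type itself is the harder part, which is why this sketch passes it through untouched.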

String and String, what is string!

Property attributes aren’t the only string-based encoding. You can get method type encodings from Methods (via method_getTypeEncoding) and Protocols (via protocol_copyMethodDescriptionList, and then digging into the returned structure’s type field). These are very similar to the encoding strings you’d use for creating new invocations, with one small hitch. They’re full of numbers:

v24@0:8@16

The numbers are, from left to right, the total size of the arguments on the stack, followed by the offset of each argument. Or at least they were, in some distant time in the past. These days on the ARM and 64-bit x86 architectures many parameters get passed through registers, so these numbers are pretty wrong or meaningless.

Just strip the numbers out leaving you with a more readable string:

v@:@

Aside from looking like a dazed person wearing a Bajoran earring, this description means that the method returns void, takes an id and selector, which are our old friends self and _cmd, and takes an id argument. The typestring.m file in the gist has a function to pull apart a type string and turn it into something somewhat readable.
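Stripping the numbers is a one-loop job. This is a small sketch, not the code from typestring.m, and strip_encoding_numbers is a made-up name. It assumes the encoding has no array types, bitfields, or struct names containing digits, because those digits are part of the type and this naive version would eat them too.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy encoding into out, skipping the frame-size and offset digits.
   Output is truncated (and still NUL-terminated) if out is too small. */
static void strip_encoding_numbers(const char *encoding, char *out, size_t outSize) {
    assert(encoding != NULL && out != NULL && outSize > 0);
    size_t written = 0;
    for (const char *scan = encoding; *scan != '\0' && written + 1 < outSize; scan++) {
        if (!isdigit((unsigned char)*scan)) {
            out[written++] = *scan;
        }
    }
    out[written] = '\0';
}
```

Given v24@0:8@16 and a buffer of at least five bytes, the buffer ends up holding v@:@.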

Accessing Ivars

Instance variables are first-class citizens in the Objective-C runtime API, represented by the Ivar type. You ask a class for an Ivar with a particular name and then you can use that Ivar to get (or set) the actual value in an object.

Here’s a couple of properties that you saw before:

@property (strong) NSURL *mikeysPonyFarm;
@property (assign, nonatomic) CGFloat gilliganFactor;

You’d populate them normally:

    XXStuff *stuff = [[XXStuff alloc] init];
    stuff.mikeysPonyFarm = [NSURL URLWithString: @"http://www.fanfiction.net/cartoon/My-Little-Pony/1/0/0/1/0/0/0/0/0/1/0/"];
    stuff.gilliganFactor = 8.2213;

And here’s the info behind the properties:

double, non-atomic, ivar: _gilliganFactor
NSURL, retain/strong, ivar: _mikeysPonyFarm

Their backing instance variable names are just the underscore-prefixed version, but could be customized.

First, you get the Ivar from the class:

    Ivar ponyIvar = class_getInstanceVariable ([stuff class], "_mikeysPonyFarm");

Then you tell the object, “hey give me your info!”

    NSURL *url = object_getIvar (stuff, ponyIvar);

And you’re done. At least for objects. Non-object values are a different story.

Back in the old days you might write functions that return a sufficiently large number of bytes so that you could return any scalar type, whether integers, pointers, or floats, and you’d just cast or use union tricks to get the value you wanted. The realities of modern programming make that more difficult – how does ARC know when or when not to retain something coming through that return value? Think simple. object_getIvar returns Objective-C object pointers. ids. That’s it.

How do you get something else, like that CGFloat? You have to go where it lives by asking an Ivar for the byte offset from the beginning of the object.

    Ivar gilliganIvar = class_getInstanceVariable ([stuff class], "_gilliganFactor");
    ptrdiff_t offset = ivar_getOffset(gilliganIvar);

So the offset, in the case of this object, is 32. The first byte of the CGFloat sits 32 bytes from the beginning of the object. Now it’s just a matter of pointer math. First task is to get a naked pointer to the beginning of the object without ARC complaining or trying to step in:

    unsigned char *stuffBytes = (unsigned char *)(__bridge void *)stuff;

And then add the offset to the base address, cast-and-dereference to pull out a CGFloat’s worth of bytes.

    CGFloat gilligans = * ((CGFloat *)(stuffBytes + offset));

When printed out, you get the expected value of 8.221300:

    printf ("%f gilligans\n", gilligans);
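The offset arithmetic isn’t special to Objective-C. Here is a sketch of the same idea in plain C, with offsetof playing the role of ivar_getOffset. FakeStuff is a hypothetical struct standing in for an object’s memory, and its layout is not claimed to match the real XXStuff. One deliberate swap: it copies the bytes out with memcpy instead of the cast-and-dereference used above, which sidesteps alignment and aliasing worries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object's memory layout. */
typedef struct FakeStuff {
    void *isa;
    void *mikeysPonyFarm;
    double gilliganFactor;
} FakeStuff;

/* Read a double that lives `offset` bytes from the start of an object. */
static double read_double_at_offset(const void *object, ptrdiff_t offset) {
    assert(object != NULL && offset >= 0);
    const unsigned char *bytes = (const unsigned char *)object;
    double value;
    memcpy(&value, bytes + offset, sizeof value);  /* copy the raw bytes out */
    return value;
}
```

With a FakeStuff whose gilliganFactor is 8.2213, passing its address and offsetof(FakeStuff, gilliganFactor) hands back the same value that was stored.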

An Extra Bit Of Memory

A common trick in C-land is to declare a structure with a weird zero-length array at the end:

typedef struct ExpandOMatic {
    char *name;
    int footCount;
    float footSizes[0];
} ExpandOMatic;

Say you’re storing the shoe sizes for space aliens, some of whom can have an arbitrary number of feet. It’d be nice if you could store all the foot sizes in-line with the other information rather than having to spill out into an allocated array or list. By changing how big of a chunk of memory you ask for, you can hold on to the extra feet and access them via that array at the end.

The structure there is 16 bytes on a 64-bit system. For a creature with no feet, you’d just allocate the structure, which would have the name pointer and the foot count:

    ExpandOMatic *nofoot = malloc (sizeof(ExpandOMatic));
    nofoot->name = "nochan";
    nofoot->footCount = 0;

For a creature with 30 feet, you’d allocate more space, and use the array slot at the end to access the extra bytes:

    int footCount = 30;
    ExpandOMatic *lotsOfFeet = malloc (sizeof(ExpandOMatic) + sizeof(float) * footCount);
    lotsOfFeet->name = "footloose";
    lotsOfFeet->footCount = footCount;
    for (int i = 0; i < footCount; i++) {
        lotsOfFeet->footSizes[i] = random() % 15;
    }
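If you wrap the allocation in a function, the size calculation only has to be right in one place. Here’s a sketch using the C99 spelling of the trick, a flexible array member declared with empty brackets. FootRecord and foot_record_create are names invented for this example, and zeroing the sizes is just one reasonable choice for the initial state.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct FootRecord {
    const char *name;
    int footCount;
    float footSizes[];   /* C99 flexible array member */
} FootRecord;

/* Allocate a record with room for footCount sizes in the same block.
   Returns NULL if the allocation fails; the caller frees the result. */
static FootRecord *foot_record_create(const char *name, int footCount) {
    assert(footCount >= 0);
    FootRecord *record = malloc(sizeof *record +
                                sizeof record->footSizes[0] * (size_t)footCount);
    if (record == NULL) return NULL;
    record->name = name;
    record->footCount = footCount;
    for (int i = 0; i < footCount; i++) {
        record->footSizes[i] = 0.0f;   /* start every foot at size zero */
    }
    return record;
}
```

A single free on the returned pointer releases the header and all of the foot sizes together, since they live in one allocation.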

You can do similar things with objects. Make an object with extra bytes tacked on the end with class_createInstance, and get the pointer to those extra bytes by using object_getIndexedIvars. The code for this can be found at this gist.

There’s a class that holds extra stuff:

@interface Stuffage : NSObject {
    int _extraCount;
}
- (id) initWithExtraSpace: (int) extraCount;    
- (int) stuffAtIndex: (unsigned int) index;
@end // Stuffage

Say you wanted an extra 30,000 ints at the end of the object. Rather than using alloc, you can create an instance and initialize it to let the object know how many extra ints there will be.

    Stuffage *stuff = class_createInstance ([Stuffage class], sizeof(int) * 30000);
    stuff = [stuff initWithExtraSpace: 30000];

Here’s how the init goes, making a note of how many extra ints it has to play with, and initializing them.

- (id) initWithExtraSpace: (int) extraCount  {

    if ((self = [super init])) {
        _extraCount = extraCount;

        int *extra = object_getIndexedIvars (self);
        for (int i = 0; i < extraCount; i++) {
            extra[i] = i * 100;
        }
    }
    return self;
} // initWithExtraSpace

And then you can access them:

    printf ("%d %d %d %d\n",
            [stuff stuffAtIndex: 1],
            [stuff stuffAtIndex: 25],
            [stuff stuffAtIndex: 300],
            [stuff stuffAtIndex: 29999]);

Which prints out

100 2500 30000 2999900

Foundation uses this trick with some of its very performance-sensitive classes. If the data that backs an immutable class is known at the time of creation, it can be more efficient to have a single memory allocation keep that data.

There are some caveats. The first: don’t do this. It’s cool, and nifty information, but don’t do this. It’s a tool for the framework implementors or for folks making language bridges. For those of us out in application land, if you’re doing something so time critical that you’re thinking of inlining all your data in the object like this, Objective-C is probably not the right choice. You can always drop down to C or C++.

The second is: it doesn’t play well with ARC. In fact, it’s on the ARC verboten list, along with retain and release. If you try to use it in an ARC-enabled compilation unit, the compiler will just smack you down. You need to put this call into a -fno-objc-arc ghetto and do your own memory management.

The Ugly

With any API, especially one that’s been around for many OS revisions, and one that’s low-level, there are some warts here and there. It’s to be expected.

One of them is that category information is not available via the API. The information is present in the executable (in Mach-O Objective-C segments), as evidenced by DTrace including the category name in the probemod variable, but you can’t grovel around at runtime and see what the categories are.

There are some calls that are listed in the docs and header, but have big “hands off” signs. objc_get/setFutureClass has an intriguing name, but the only description is that it’s used by Foundation’s toll-free bridging, and “Do not call this function yourself.” objc_duplicateClass is used in KVO, and also says “Do not call this function yourself.”

object_getInstanceVariable is another way of accessing instance variables, but it is incompatible with ARC, and is not aware of weak references. Any use of this function also needs to be put into a -fno-objc-arc ghetto with proper memory management. It also doesn’t work for non-object types.

There are some functions that were only useful with Cocoa Garbage Collection – the IvarLayout calls – class_get/setIvarLayout and class_get/setWeakIvarLayout.

It’s not really a sharp corner, but there’s an old-school way (10.6 or prior) to get the class list. You use objc_getClassList, which returns the count of classes and fills in a buffer you give it:

    int classCount = objc_getClassList (NULL, 0);
    Class *classes = (__unsafe_unretained Class *) malloc (sizeof(Class) * classCount);
    int newClassCount = objc_getClassList (classes, classCount);
    assert (classCount == newClassCount);

That is kind of messy compared to objc_copyClassList – it’s inconsistent with similar calls, plus there’s a race condition there. Because the count, memory allocation, and filling of that memory are all distinct operations, it’s possible that another thread might have loaded a class causing the list of classes returned to be stale.

It might even be missing a class. Say you allocated space for 10 classes and a new one gets loaded: objc_getClassList fills in only 10 classes, but includes the new one. One old class, which still exists, is left out. There’s no locking that we can do around these calls to prevent that. By putting everything into one call for us, objc_copyClassList can use its own internal locking to prevent race conditions.

Various pieces of advice you see, even in the docs, say to pull in <Foundation/NSObjCRuntime.h> to get the runtime goodies. There’s also <objc/objc.h>. Those are out of date and don’t include the runtime calls. You need to explicitly pull in <objc/runtime.h> to get the runtime API.

Coming up

Next time – More of the runtime API. Yay!

2 Comments

  1. Chris Ryland

    Mark, nice work!

    You might be better to be slightly more general and paranoid and use something like

    ExpandOMatic *lotsOfFeet = malloc (sizeof(*lotsOfFeet) + sizeof((*lotsOfFeet).footSizes[0]) * footCount);
    .
