Preventing Surprisingly Large Objective-C Type Encodings

I’ve been writing code in Objective-C for more than 12 years now, yet it still surprises me. One example of something that caught me off guard recently is the combination of ObjC, C++ objects, and type encodings. Did you know these encodings can become thousands of characters long? This is obviously wasteful and bad for binary size and performance.

Type Encodings

If you’re familiar with Objective-C, you likely know about selectors. A selector only contains the method name, and the number of parameters follows from its colons. The type of each parameter is stored separately, in the method type encoding, which is a simple string. With C++ types, these strings can easily become lengthy and add to your binary size. Foundation’s NSMethodSignature can parse this string, but you don’t need to understand the encoding in detail to follow along.

Here’s the encoding of a method that returns a CGRect and takes no explicit arguments, such as a frame getter: {CGRect={CGPoint=dd}{CGSize=dd}}16@0:8

This is fairly straightforward. However, if we use a C++ type, it’s not quite as compact.

Here’s an example from our codebase: the encoding of a simple map from an enum to a string, std::unordered_map<PSPDFAnnotationPlaceholderState, NSString *>:

PSPDFAnnotationPlaceholder._actionsForStates: (3176) {unordered_map<PSPDFAnnotationPlaceholderState
  , NSString *, std::__1::hash<PSPDFAnnotationPlaceholderState>, std::__1::equal_to<PSPDFAnnotationPlaceholderState>,
  std::__1::allocator<std::__1::pair<const PSPDFAnnotationPlaceholderState, NSString *> > >="__table_"{__hash_table
    <std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, std::__1::__unordered_map_hasher
    <PSPDFAnnotationPlaceholderState, std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>,
    std::__1::hash<PSPDFAnnotationPlaceholderState>, true>, std::__1::__unordered_map_equal<PSPDFAnnotationPlaceholderState, std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, std::__1::equal_to<PSPDFAnnotationPlaceholderState>, true>, std::__1::allocator<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *> > >="__bucket_list_"{unique_ptr<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *> *[], std::__1::__bucket_list_deallocator<std::__1::allocator<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *> *> > >="__ptr_"{__compressed_pair<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *> **, std::__1::__bucket_list_deallocator<std::__1::allocator<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *> *> > >="__value_"^^{__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *>}"__value_"{__bucket_list_deallocator<std::__1::allocator<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *> *> >="__data_"{__compressed_pair<unsigned long, std::__1::allocator<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *> *> >="__value_"Q}}}}"__p1_"{__compressed_pair<std::__1::__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *>, std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> > 
>="__value_"{__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *>="__next_"^{__hash_node_base<std::__1::__hash_node<std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, void *> *>}}}"__p2_"{__compressed_pair<unsigned long, std::__1::__unordered_map_hasher<PSPDFAnnotationPlaceholderState, std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, std::__1::hash<PSPDFAnnotationPlaceholderState>, true> >="__value_"Q}"__p3_"{__compressed_pair<float, std::__1::__unordered_map_equal<PSPDFAnnotationPlaceholderState, std::__1::__hash_value_type<PSPDFAnnotationPlaceholderState, NSString *>, std::__1::equal_to<PSPDFAnnotationPlaceholderState>, true> >="__value_"f}}}

That’s more than 3,000 characters. Before I changed it, the worst-offending method type signature in our codebase contained 13,126 characters. That’s roughly 13 KB (!) for a single type. To understand how this encoding works in detail, I suggest reading Dave MacLachlan’s Objective C, Encoding and You article, which is what prompted me to investigate our type encoding sizes in the first place.

Scanning the Runtime

While you can run strings Foo.app/Foo | grep -e '{' on your app binary to find encodings, it’s not always easy to see which method or variable they belong to. So instead, I wrote a helper in Swift to find the largest type encodings of methods and instance variables. Querying the runtime in Swift is easier than I expected, but it’s still relatively messy, as you need to manage memory manually. Luckily, using it is simple: drop the file into your project and call RuntimeScanner().scan().

You will want to adjust the class filter to use your own prefix (search for hasPrefix("PSPDF")). Apple uses a lot of ObjC++ and has some large offenders, but those are something we can neither change nor want to see. In this example, I scanned for PSPDF and PDFC, as these are our internal prefixes. I deliberately skipped protocols, since we don’t use protocols that reference C++ classes.

Here’s an example output, sorted by encoding size, showing the class and the ivar or method name:

PSPDFDocumentProvider.coreDocumentProviderImpl: (13126) (...)
PSPDFAnnotationPlaceholder._actionsForStates: (3176) (...)
PSPDFTextSelectionView._cachedViewRectsHashMap: (2681) (...)

If you don’t see any output, you either have a pure Swift project without UIKit references, or (more likely) you forgot to change the prefix filter to match your prefix. Don’t try to fix every item on the list; pick the largest offenders for the most benefit. Properties are especially important, since each one shows up several times in the metadata: in the getter, in the setter, and in the property metadata itself. As such, they should be at the top of your list of things to change.

Mitigation Techniques

There are various ways you can hide C++ objects from the runtime. In most situations, I created a separate struct in the ObjC class implementation and moved the C++ objects into it. This new struct is connected to the class via a unique_ptr. That adds a level of indirection, but the pointer can be cached before entering any hot paths, so it shouldn’t cause any noticeable performance drawbacks.

I also used objc_metadata_hider_ptr and objc_metadata_hider_ref from the aforementioned blog post, and I converted some static methods into static functions to avoid the ObjC runtime completely. You’ll have to experiment a bit and be iterative in your approach to reduce the list.

It’s worth mentioning the LLVM code reviews D55544 and D55640 (the latter a clang-tidy addition). Either change would help find or mitigate these issues earlier. The new objc_direct Clang attribute will also make hiding specific metadata from the runtime much easier.

Result

To understand whether the changes have any effect, we can use the wonderful Bloaty McBloatface, a size profiler for binaries. Make sure to compile both binaries the very same way (I’m using our nightly CI process, pointed at my branch), and then strip away other architectures and bitcode.

Running file <binary> should report exactly one architecture. You probably want to compare arm64, since it’s the most relevant one, so the output should look like this: PSPDFKit-orig: Mach-O 64-bit dynamically linked shared library arm64. Note: If bloaty shows a large __LLVM segment, you didn’t strip bitcode.

Here’s the output when we compare the existing binary with the one after our optimizations using bloaty -d segments,sections PSPDFKit-smol -- PSPDFKit-orig:

     VM SIZE                                    FILE SIZE
 --------------                              --------------
  [ = ]       0 __LINKEDIT                       -48  -0.0%
      +0.0%     +48 [__LINKEDIT]                       0  [ = ]
      -0.0%      -8 Rebase Info                       -8  -0.0%
      -0.1%     -40 Function Start Addresses         -40  -0.1%
  -0.6% -96.0Ki __TEXT                       -96.0Ki  -0.6%
       +84% +10.9Ki [__TEXT]                     +10.9Ki   +84%
      +0.0%      +5 __TEXT,__objc_classname           +5  +0.0%
      -0.1%    -209 __TEXT,__objc_methname          -209  -0.1%
      -0.2%    -344 __TEXT,__unwind_info            -344  -0.2%
      -0.1%    -460 __TEXT,__gcc_except_tab         -460  -0.1%
      -0.1% -14.9Ki __TEXT,__text                -14.9Ki  -0.1%
      -5.9% -39.8Ki __TEXT,__cstring             -39.8Ki  -5.9%
     -41.4% -51.2Ki __TEXT,__objc_methtype       -51.2Ki -41.4%
  -0.5% -96.0Ki TOTAL                        -96.0Ki  -0.6%

You can see that my efforts here shaved off around 100 KB, or 0.5 percent of the binary. That’s not a lot, but it’s decent progress for a day’s worth of work. The PSPDFKit iOS SDK is embedded in thousands of apps, and the average person has at least three copies on a phone, so if you multiply these savings, it quickly adds up.
