Wednesday, June 28, 2017

Arena-Based Storage Management

This is the second blog post on compiler construction. You can read the first one here.

Calling malloc incurs the obligation of a subsequent call to free which can be costly and error prone (imagine forgetting to free or freeing in-use space).

malloc allocates based on object size. I used a storage management scheme that bases allocation on object lifetime instead:

  • Memory is allocated from arenas and entire arenas are deallocated at once.
  • Objects with the same lifetime are allocated from the same arena.
  • An arena is implemented as a linked list of contiguous blocks of memory.

The allocator acts like malloc except that it uses unsigned long instead of size_t (size_t is often defined to be unsigned int or unsigned long so we use unsigned long to account for all sizes) and takes an arena identifier from which space is allocated. The allocator returns a pointer suitably aligned to hold values of any type. A basic type is properly aligned if it is stored at an address that is a multiple of its size (this is how x86 and ARM processors lay out basic data types in memory, for example). In C, a structure gets the alignment of the member that has the widest alignment. Extra space may be allocated to account for the alignment requirements. In fact, reordering the members of a struct so that they are sorted by non-increasing alignment can reduce the memory footprint of a program significantly. You can read more about this in this excellent article by Eric Raymond: The Lost Art of C Structure Packing.

The deallocator recycles space by adding it to a list of free blocks so that the next time space needs to be allocated space from the free blocks is used. This technique renders the cost of allocation and deallocation negligible.

I recently learned that arena based storage management is used in Protocol Buffers, Google’s technology for serializing structured data.