Friday, December 7, 2007

Memory Management Mechanisms

Mechanism Versus Policy

Accessing and manipulating memory involves a lot of accounting work. Measures have to be taken to ensure that memory being accessed is valid and that it corresponds to actual physical storage. If memory protection mechanisms are in place, checks will also need to be performed by the processor to ensure that an executing task does not access memory locations that it should not. Memory protection is the type of service that multiuser operating systems are built upon. If virtual memory is being used, a significant amount of bookkeeping will need to be maintained in order to track which disk sectors belong to which task. It is more effort than you think, and all the steps must be completed flawlessly.


ASIDE

An arm-waving explanation is a proposition that has not been established using precise mathematical statements. Mathematical statements have the benefit of being completely unambiguous: They are either true or false. An arm-waving explanation tends to eschew logical rigor entirely in favor of arguments that appeal to intuition. Such reasoning is at best dubious, not only because intuition can often be incorrect, but also because intuitive arguments are ambiguous. For example, people who argue that the world is flat tend to rely on arm-waving explanations.

Memory Hierarchy

When someone uses the term "memory," they are typically referring to the data storage provided by dedicated chips located on the motherboard. The storage these chips provide is often referred to as Random Access Memory (RAM), main memory, and primary storage. Back in the iron age, when mainframes walked the earth, it was called the core. The storage provided by these chips is volatile, which is to say that data in the chips is lost when the power is switched off.

There are various types of RAM:

  • DRAM

  • SDRAM

  • SRAM

  • VRAM

Dynamic RAM (DRAM) has to be recharged thousands of times each second. Synchronous DRAM (SDRAM) is refreshed at the clock speed at which the processor runs most efficiently. Static RAM (SRAM) does not need to be refreshed like DRAM, and this makes it much faster. Unfortunately, SRAM is also much more expensive than DRAM and is used sparingly. SRAM tends to be used in processor caches and DRAM tends to be used for wholesale memory. Finally, there's Video RAM (VRAM), which is a region of memory used by video hardware.

Recent advances in technology and special optimizations implemented by certain manufacturers have led to a number of additional acronyms. Here are a few of them:

  • DDR SDRAM

  • RDRAM

  • ESDRAM

DDR SDRAM stands for Double Data Rate Synchronous Dynamic Random Access Memory. With DDR SDRAM, data is read on both the rising and the falling of the system clock tick, basically doubling the bandwidth normally available. RDRAM is short for Rambus DRAM, a high-performance version of DRAM sold by Rambus that can transfer data at 800 MHz. Enhanced Synchronous DRAM (ESDRAM), manufactured by Enhanced Memory Systems, provides a way to replace SRAM with cheaper SDRAM.

A bit is a single binary digit (i.e., a 1 or a 0). A bit in a RAM chip is basically a cell structure that is made up of, depending on the type of RAM, a certain configuration of transistors and capacitors. Each cell is a digital switch that can either be on or off (i.e., 1 or 0). These cells are grouped into 8-bit units called bytes. The byte is the fundamental unit for measuring the amount of memory provided by a storage device. In the early years, hardware vendors used to implement different byte sizes. One vendor would use a 6-bit byte and another would use a 16-bit byte. The de facto standard that everyone seems to abide by today, however, is the 8-bit byte.

There is a whole set of byte-based metrics to specify the size of a memory region:

1 byte = 8 bits

1 word = 2 bytes

1 double word = 4 bytes

1 quad word = 8 bytes

1 octal word = 8 bytes

1 paragraph = 16 bytes

1 kilobyte (KB) = 1,024 bytes

1 megabyte (MB) = 1,024KB = 1,048,576 bytes

1 gigabyte (GB) = 1,024MB = 1,073,741,824 bytes

1 terabyte (TB) = 1,024GB = 1,099,511,627,776 bytes

1 petabyte (PB) = 1,024TB = 1,125,899,906,842,624 bytes

Manual Memory Management

Replacements for malloc() and free()

Manual memory management dictates that the engineer writing a program must keep track of the memory allocated. This forces all of the bookkeeping to be performed when the program is being designed instead of while the program is running. This can benefit execution speed because the related bookkeeping instructions are not placed in the application itself. However, if a programmer makes an accounting error, they could be faced with a memory leak or a dangling pointer. Nevertheless, properly implemented manual memory management is lighter and faster than the alternatives.

In ANSI C, manual memory management is provided by the malloc() and free() standard library calls. There are two other standard library functions (calloc() and realloc()), but they resolve to calls to malloc() and free().

I thought that the best way to illustrate how manual memory management facilities are constructed would be to offer several different implementations of malloc() and free(). To use these alternative implementations, all you will need to do is include the appropriate source file and then call newMalloc() and newFree() instead of malloc() and free(). For example:

#include<stdio.h>
/* include the source file that provides newMalloc() and newFree() here */

void main()
{
char *cptr;
initMemMgr();

cptr = newMalloc(10);
if(cptr==NULL){ printf("allocation failed!\n"); }
newFree(cptr);

closeMemMgr();
return;
}

The remainder of this chapter will be devoted to describing three different approaches. In each case, I will present the requisite background theory, offer a concrete implementation, provide a test driver, and look at associated trade-offs. Along the way, I will also discuss performance measuring techniques and issues related to program simulation.

Automatic Memory Management


Garbage Collection Taxonomy

Taking out the trash is a dance with two steps:

  1. Identifying garbage in the heap

  2. Recycling garbage once it is found

The different garbage collection algorithms are distinguished in terms of the mechanisms that they use to implement these two steps. For example, garbage can be identified by reference counting or by tracing. Most garbage collectors can be categorized into one of these two types.

Reference counting collectors identify garbage by maintaining a running tally of the number of pointers that reference each block of allocated memory. When the number of references to a particular block of memory reaches zero, the memory is viewed as garbage and reclaimed. There are a number of types of reference counting algorithms, each one implementing its own variation of the counting mechanism (i.e., simple reference counting, deferred reference counting, 1-bit reference counting, etc.).

Tracing garbage collectors traverse the application run-time environment (i.e., registers, stack, heap, data section) in search of pointers to memory in the heap. Think of tracing collectors as pointer hunter-gatherers. If a pointer is found somewhere in the run-time environment, the heap memory that is pointed to is assumed to be "alive" and is not recycled. Otherwise, the allocated memory is reclaimed. There are several subspecies of tracing garbage collectors, including mark-sweep, mark-compact, and copying garbage collectors.

Miscellaneous Topics

Suballocators

Memory managers have to suffer through the slings and arrows of not knowing when or how much memory will be requested by an application. There are, however, some circumstances where you will know in advance how large an application's memory requests will be or how many there will be. If this is the case, you can construct a dedicated memory manager known as a suballocator to handle memory requests and reap tremendous performance gains.

A suballocator is an allocator that is built on top of another allocator. An example of where suballocators could be utilized is in a compiler. Specifically, one of the primary duties of a compiler is to build a symbol table. Symbol tables are memory-resident databases that serve as a repository for application data. They are typically built using a set of fixed-size structures. The fact that a symbol table's components are all fixed in size makes a compiler fertile ground for the inclusion of a suballocator. Instead of calling malloc() to allocate symbol table objects, you can allocate a large pool of memory and use a suballocator to allocate symbol table objects from that pool.

The following source code implements the SubAllocator class and a small test driver:

#include<stdio.h>
#include<stdlib.h>
#include<windows.h>

#define U4 unsigned long
#define U1 unsigned char

#define TRUE 1
#define FALSE 0

struct Indices
{
U1 free;
U4 index1;
U4 index2;
U4 index3;
};

class SubAllocator
{
private:
struct Indices *indices;
U4 size;
U4 lastAlloc;

public:
SubAllocator(U4 nElms);
~SubAllocator();
struct Indices *alloc();
void release(struct Indices *addr);
void printList();
};

SubAllocator::SubAllocator(U4 nElms)
{
U4 i;

size = nElms;
indices = (struct Indices*)malloc(size*sizeof(struct Indices));
if(indices==NULL)
{
printf("SubAllocator::SubAllocator(%lu) :",size);
printf("could not allocate list\n");
exit(1);
}

for(i=0;i<size;i++)
{
indices[i].free =TRUE;
}

lastAlloc = 0;
return;

} /*end constructor-----------------------------------------------*/

SubAllocator::~SubAllocator()
{
free(indices);
return;

} /*end destructor-----------------------------------------------*/

struct Indices* SubAllocator::alloc()
{
U4 i;
if(lastAlloc==size-1){ lastAlloc=0; }

for(i=lastAlloc;i<size;i++)
{
if(indices[i].free==TRUE)
{
indices[i].free=FALSE;
lastAlloc = i;
return(&indices[i]);
}
}

for(i=0;i<lastAlloc;i++)
{
if(indices[i].free==TRUE)
{
indices[i].free=FALSE;
lastAlloc = i;
return(&indices[i]);
}
}
return(NULL);

}/*end alloc-----------------------------------------------*/

void SubAllocator::release(struct Indices *addr)
{
//sanity check
if((addr>=&indices[0])&&(addr<=&indices[size-1]))
{
(*addr).free=TRUE;
}
else
{
printf("SubAllocator::release():");
printf("release failed, address out of bounds\n");
}
return;

}/*end release-----------------------------------------------*/

void SubAllocator::printList()
{
U4 i;
for(i=0;i<size;i++)
{
if(indices[i].free==FALSE)
{
printf("indices[%lu] ",i);
printf("[%lu, ",indices[i].index1);
printf("%lu,",indices[i].index2);
printf("%lu]\n",indices[i].index3);
}
else
{
printf("indices[%lu]=FREE\n",i);
}
}
return;

}/*end printList-----------------------------------------------*/

void main()
{
U4 ticks1,ticks2;
U4 nAllocations=1024;
U4 i;
SubAllocator *ptr;
struct Indices **addr;

ptr = new SubAllocator(nAllocations);
addr = (struct Indices**)malloc(nAllocations*sizeof(struct Indices*));

ticks1 = GetTickCount();

for(i=0;i<nAllocations;i++)
{
addr[i] = (*ptr).alloc();
if(addr[i]==NULL)
{
printf("addr[%lu]==NULL\n",i);
exit(1);
}
}

for(i=0;i<nAllocations;i++)
{
(*ptr).release(addr[i]);
}

ticks2 = GetTickCount();

delete(ptr);
free(addr);

printf("msecs=%lu\n",ticks2-ticks1);

return;

}/*end main-----------------------------------------------*/

When this application is executed, the following output is produced:

   msecs=0