Saturday 24 December 2011

Load in Place data structures and Pointer Fixups

PROLOGUE:

When I first started this blog post, my intention was to describe the process of loading full game assets using the Load in Place technique in C++. I quickly realised, however, that I probably needed a little more context before jumping into a full example, so what you're reading now is the result.

Please bear in mind all of this is done purely as an example to learn the core concepts; you would never use this in a real project. You would ideally want some nice way of encapsulating what is happening by providing friendly interfaces to handle loading for you, but I find these interfaces can often obscure what is actually happening when you are first learning this stuff, so I have omitted them for clarity. I will hopefully at some point in the future provide an example of such an interface to use all this code with! (We would also want a generic implementation capable of loading lots of different types of files, and the ability to load more than one object at once, but we'll worry about that stuff later.)

THE BEGINNING:

Simple File I/O in C:

I decided to use the file I/O in C for this example as opposed to C++. There are some differences and in retrospect C++ may have been a better call, but they achieve pretty much the same ends. I am sure if you're that way inclined you could port these examples to use fstream instead, and I may well revisit this in future to do just that.

Here is a simple example of saving a file with some data you have created:
(I suggest having a read through this article if you are not too familiar with File I/O in C:
http://www.cplusplus.com/doc/tutorial/files/ )

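#include <stdio.h>   // FILE, fopen, fwrite, fclose

class Foo {
public:
    int i;
    int j;
};

int main(int argc, char** argv)
{
    // Open (or create) the file for writing in binary mode.
    FILE* file;
    file = fopen("MyFile.bin", "wb");

    Foo myFoo;
    myFoo.i = 1;
    myFoo.j = 2;

    // Write the raw bytes of one Foo to disk, then close the file.
    fwrite(&myFoo, sizeof(Foo), 1, file);
    fclose(file);

    return 0;
}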

So what's going on here? First we create a file in the local directory called "MyFile.bin". The extension is used to identify the file as being in binary format. When we create the file we open it for writing in binary mode, which is what the "wb" represents: 'writing, binary'. We then fill in a very simple structure and write that structure to disk, making sure to close the file afterwards. We have just saved the state of our object, which is pretty neat. Imagine if that was the score in a given level or the amount of health we had left at a particular save. The cool thing is that getting that information back is super easy as well! All we need to do is this:

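#include <stdio.h>   // FILE, fopen, fread, fclose
#include <iostream>  // cout, endl

using namespace std;

class Foo {
public:
    int i;
    int j;
};

int main(int argc, char** argv)
{
    // Open the file we saved earlier for reading in binary mode.
    FILE* file;
    file = fopen("MyFile.bin", "rb");

    Foo myLoadedFoo;

    // Read the raw bytes back into a blank Foo, then close the file.
    fread(&myLoadedFoo, sizeof(Foo), 1, file);
    fclose(file);

    cout << "i: " << myLoadedFoo.i << " j: " << myLoadedFoo.j << endl;
    return 0;
}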

To get the data back we just read the file and fill in a blank struct of the type we saved. We open the file for reading in binary mode "rb - reading,binary", call fread instead of fwrite and fill in our struct. If we print out the contents you will find we have the output of 1 and 2 on screen. Pretty cool right?
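The two listings above assume that every file call succeeds. As a minimal sketch of the same save and load steps with the return values checked, something like the following could be used. The helper names saveFoo and loadFoo and their bool return convention are my own assumptions for illustration, not part of the original examples.

```cpp
#include <cassert>
#include <cstdio>

// Same plain-data layout as the Foo used in the post.
struct Foo {
    int i;
    int j;
};

// Hypothetical helper: writes one Foo to 'path'.
// Returns false if the file could not be opened, written or closed.
bool saveFoo(const char* path, const Foo& foo)
{
    std::FILE* file = std::fopen(path, "wb");
    if (file == nullptr) {
        return false;
    }
    // fwrite returns the number of complete items written (we asked for 1).
    const bool wroteAll = std::fwrite(&foo, sizeof(Foo), 1, file) == 1;
    const bool closed   = std::fclose(file) == 0;
    return wroteAll && closed;
}

// Hypothetical helper: reads one Foo from 'path' into 'out'.
// Returns false if the file could not be opened or was too short.
bool loadFoo(const char* path, Foo& out)
{
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr) {
        return false;
    }
    // fread returns the number of complete items read (we asked for 1).
    const bool readAll = std::fread(&out, sizeof(Foo), 1, file) == 1;
    std::fclose(file);
    return readAll;
}
```

Calling saveFoo("MyFile.bin", myFoo) followed by loadFoo("MyFile.bin", myLoadedFoo) does the same round trip as the two programs above, except that a missing or truncated file shows up as a false return value rather than as garbage in the struct.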

THE MIDDLE:

Where things get interesting is when we have pointers in those objects to some data elsewhere in memory. How do we store that dependency? If we were to use the method above with a pointer in Foo, like this:

class Foo {
public:
    int   i;
    int   j;
    char* p;
};

The method we just discussed will unfortunately not work. Let's see why. Take a look at this next example.

#include <stdio.h>   // FILE, fopen, fwrite, fclose
#include <string.h>  // strcpy

// Foo with the extra pointer member, as shown above.
class Foo {
public:
    int   i;
    int   j;
    char* p;
};

class MessageList {
public:
    char hello[6];
    char goodbye[8];
    char cheese[7];
};

int main(int argc, char** argv)
{
    // This is just an example of a standard resource
    // load you can think of happening in a resource manager.
    // This would probably also be loaded from a file at some
    // earlier stage in the program in a real world situation.
    MessageList myMessageList;

    // String literals are read-only, so point at them with const char*.
    const char* hello   = "Hello";
    const char* goodbye = "Goodbye";
    const char* cheese  = "Cheese";

    strcpy(myMessageList.hello,   hello);
    strcpy(myMessageList.goodbye, goodbye);
    strcpy(myMessageList.cheese,  cheese);
    // End of file load

    FILE* file;
    file = fopen("MyFile.bin", "wb");

    Foo myFoo;
    myFoo.i = 1;
    myFoo.j = 2;
    myFoo.p = myMessageList.cheese;

    // This writes the raw address held in myFoo.p to disk.
    fwrite(&myFoo, sizeof(Foo), 1, file);
    fclose(file);
    return 0;
}

Now there is a little bit more going on here, but most of it isn't really that important; it's just there to set up the example, so don't worry too much if you don't get it all. Basically, think of MessageList as a memory pool, which might easily contain all kinds of game assets like textures, models and sounds. I just set up some default strings and copy their values across into the structure, in the same way I would load a texture and then store it in a block of memory somewhere. The way you load and store these assets is up to you: you can store them dynamically or allocate them statically, just make sure the object you're saving (in this case Foo) has access to that memory chunk. So, going back to the example, I save the memory address of myMessageList.cheese in myFoo.p and then write the myFoo object to disk.

The problem with doing only this is that we cannot guarantee MessageList will be at the same address in memory when we try to load it as when we saved it previously (in fact, it is pretty much guaranteed this won't be the case). There is no way of getting from the address it resided at when you saved it to the address you actually want when you come to load the object, so that just won't work. What we need is a way of translating our pointer from its old address to the new address of MessageList when it is loaded. When we save Foo, if we know the start address of MessageList and the address of what the pointer in our object is pointing to inside MessageList, we can calculate the offset of that pointer by subtracting one from the other!
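As a quick sanity check of that subtraction on its own, here is a minimal sketch. The helper name offsetWithin is hypothetical, and the sketch assumes (as holds on mainstream platforms) that converting addresses to uintptr_t and subtracting them yields a distance in bytes. For the MessageList layout used in this post, cheese sits after hello (6 bytes) and goodbye (8 bytes), so its offset is 14 wherever the structure happens to live.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>  // std::uintptr_t

struct MessageList {
    char hello[6];
    char goodbye[8];
    char cheese[7];
};

// Hypothetical helper: how many bytes 'target' lies past 'base'.
// Both addresses are expected to be inside the same object.
std::uintptr_t offsetWithin(const void* base, const void* target)
{
    return reinterpret_cast<std::uintptr_t>(target)
         - reinterpret_cast<std::uintptr_t>(base);
}

// Two pools at different addresses give the same offset for the same member.
bool offsetIsAddressIndependent()
{
    MessageList first{};
    MessageList second{};
    return offsetWithin(&first, first.cheese) == offsetWithin(&second, second.cheese);
}
```

The standard offsetof macro gives the same number at compile time, so offsetWithin(&pool, pool.cheese) and offsetof(MessageList, cheese) agree. The run-time subtraction is what you need when all you have is a pointer and do not know which member it refers to.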

#include <stdint.h>  // uintptr_t
#include <stdio.h>   // FILE, fopen, fwrite, fclose
#include <string.h>  // strcpy

class Foo {
public:
    int   i;
    int   j;
    char* p;
    uintptr_t offsetOfPointer;
};

class MessageList {
public:
    char hello[6];
    char goodbye[8];
    char cheese[7];
};

int main(int argc, char** argv)
{
    MessageList myMessageList;

    // String literals are read-only, so point at them with const char*.
    const char* hello   = "Hello";
    const char* goodbye = "Goodbye";
    const char* cheese  = "Cheese";

    strcpy(myMessageList.hello,   hello);
    strcpy(myMessageList.goodbye, goodbye);
    strcpy(myMessageList.cheese,  cheese);

    FILE* file;
    file = fopen("MyFile.bin", "wb");

    Foo myFoo;
    myFoo.i = 1;
    myFoo.j = 2;
    myFoo.p = myMessageList.cheese;

    // Here we store the offset of the pointer, this should
    // really be stored in a pointer lookup table and not as
    // part of the object but I have left it here for simplicity.
    myFoo.offsetOfPointer = (uintptr_t)myFoo.p - (uintptr_t)&myMessageList;

    fwrite(&myFoo, sizeof(Foo), 1, file);
    fclose(file);
    return 0;
}

The line

myFoo.offsetOfPointer = (uintptr_t)myFoo.p - (uintptr_t)&myMessageList;

might look a bit confusing but it really isn't. uintptr_t is an unsigned integer type (declared in <stdint.h>) that is big enough to hold a pointer value, which is why we use it here rather than guessing at a size ourselves. (In this case we could probably just use 'long' and it would still work on many platforms, but not all: on 64-bit Windows, for example, long is only 32 bits wide.) We cast the memory addresses to integers, and because the object is contiguous in memory, subtracting one address from the other tells us how far, in bytes, the pointed-to data sits from the start of the data structure. So all we need to do is save the offset of the pointer, not the pointer itself, and use that offset to 'fix up' the pointer when we load the object again. Ideally you would store the pointer offsets in another file which is not part of the object you're interested in; you'd then load both at the same time and use the offsets stored in your pointer lookup table. I have just stuck the offset in the object for brevity.
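To give a rough idea of what a separate pointer lookup table might look like, here is a minimal sketch. Everything in it (the Fixup struct, its field names and the applyFixups function) is a hypothetical design of mine rather than an established format, and it assumes every pointer being patched is a char*. Each entry records where the pointer lives inside the object and where it should point inside the memory pool.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>  // std::uintptr_t
#include <cstring>  // std::memcpy, std::strcpy, std::strcmp
#include <vector>

struct Foo {
    int   i;
    int   j;
    char* p;
};

struct MessageList {
    char hello[6];
    char goodbye[8];
    char cheese[7];
};

// Hypothetical table entry: one pointer that needs patching after a load.
struct Fixup {
    std::uintptr_t fieldOffset;   // where the pointer lives inside the object
    std::uintptr_t targetOffset;  // where it should point inside the pool
};

// Patch every pointer listed in the table.
// Assumption: each patched field is a char* (as in Foo::p).
void applyFixups(void* object, void* pool, const std::vector<Fixup>& table)
{
    char* objectBytes = static_cast<char*>(object);
    char* poolBytes   = static_cast<char*>(pool);
    for (const Fixup& fixup : table) {
        char* target = poolBytes + fixup.targetOffset;
        // Copy the new address into the pointer field of the object.
        std::memcpy(objectBytes + fixup.fieldOffset, &target, sizeof(target));
    }
}
```

With a table like this the object itself no longer needs an offsetOfPointer member. For Foo, the table would hold the single entry { offsetof(Foo, p), offsetof(MessageList, cheese) }, and the table could be written to its own file alongside the object.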

THE PAYOFF:

Right, so we have saved our object and the offset of the pointer into the memory pool we are loading from (in this case MessageList). Let's see how we actually do the pointer fix-up.

#include <stdint.h>  // uintptr_t
#include <stdio.h>   // FILE, fopen, fread, fclose
#include <string.h>  // strcpy

class Foo {
public:
    int i;
    int j;
    char* p;
    uintptr_t offsetOfPointer;
};

class MessageList {
public:
    char hello[6];
    char goodbye[8];
    char cheese[7];
};

int main(int argc, char** argv)
{
    MessageList myMessageList;

    // String literals are read-only, so point at them with const char*.
    const char* hello   = "Hello";
    const char* goodbye = "Goodbye";
    const char* cheese  = "Cheese";

    strcpy(myMessageList.hello,   hello);
    strcpy(myMessageList.goodbye, goodbye);
    strcpy(myMessageList.cheese,  cheese);

    FILE* file;
    file = fopen("MyFile.bin", "rb");

    Foo myFoo;
    fread(&myFoo, sizeof(Foo), 1, file);

    // Here we cast MessageList to be an array of chars
    // in order to make sure when we use the offset to do
    // some pointer arithmetic, we are adding only
    // 1 byte (char == 1 byte) per offset so we move the correct
    // distance into the data structure for our pointer offset.
    char* data = (char*)&myMessageList + myFoo.offsetOfPointer;

    // We then cast the data back to the same type of the pointer.
    // In this case we actually stored a char* pointer so the next
    // step is a bit pointless, but imagine it was another user
    // defined type such as Model, you would just do
    // myFoo.model = (Model*)data;
    myFoo.p = (char*)data;

    fclose(file);
    return 0;
}


The lines you want to pay attention to are these ones:

char* data = (char*)&myMessageList + myFoo.offsetOfPointer;
myFoo.p = (char*)data;


What we are doing is casting the address of MessageList to a char*, which just means we can treat the structure as a chunk of bytes in memory that we can index into. When we calculated the offset before, that gave us the offset in bytes, so adding the offset to the start address of MessageList (after we have cast it to a char*) leaves us at the correct place in memory, which is exactly where our fixed-up pointer should point! The second line is actually redundant in this case, because we are dealing with char arrays for our string values: we don't need to cast the char* back to the type of our pointer because it's already a char*. But if our pointer was to another user-defined or fundamental type, such as Model or int, we would need to cast it back to the correct type before we could use it.
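Putting the save and load halves together, the whole round trip can be sketched as two small functions. The names saveFooWithOffset and loadFooAndFixUp are hypothetical, and clearing the pointer before writing is my own addition to make it obvious that the raw address on disk is never used. The pool passed to the load function can be a completely different MessageList from the one used when saving.

```cpp
#include <cassert>
#include <cstdint>  // std::uintptr_t
#include <cstdio>   // std::FILE, std::fopen, std::fread, std::fwrite, std::fclose
#include <cstring>  // std::strcpy, std::strcmp

struct Foo {
    int            i;
    int            j;
    char*          p;
    std::uintptr_t offsetOfPointer;
};

struct MessageList {
    char hello[6];
    char goodbye[8];
    char cheese[7];
};

// Hypothetical helper: records how far foo.p is into 'pool', then writes foo.
// 'foo' is taken by value so the caller's object is left untouched.
bool saveFooWithOffset(const char* path, Foo foo, const MessageList& pool)
{
    foo.offsetOfPointer = reinterpret_cast<std::uintptr_t>(foo.p)
                        - reinterpret_cast<std::uintptr_t>(&pool);
    foo.p = nullptr;  // the raw address is meaningless once it is on disk

    std::FILE* file = std::fopen(path, "wb");
    if (file == nullptr) {
        return false;
    }
    const bool wroteAll = std::fwrite(&foo, sizeof(Foo), 1, file) == 1;
    const bool closed   = std::fclose(file) == 0;
    return wroteAll && closed;
}

// Hypothetical helper: reads foo back and fixes up foo.p against 'pool'.
bool loadFooAndFixUp(const char* path, Foo& foo, MessageList& pool)
{
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr) {
        return false;
    }
    const bool readAll = std::fread(&foo, sizeof(Foo), 1, file) == 1;
    std::fclose(file);
    if (!readAll) {
        return false;
    }
    // Treat the pool as raw bytes and step 'offsetOfPointer' bytes into it.
    foo.p = reinterpret_cast<char*>(&pool) + foo.offsetOfPointer;
    return true;
}
```

After a successful load, foo.p points at the cheese member of whichever pool was passed in, regardless of where that pool sits in memory. For a pointer to another type, the last step would cast the computed address to that type instead of leaving it as a char*.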

And that is pretty much it! Once you have done that you have completely loaded your object and can do with it what you please. I must stress that this really is the simplest example of Load in Place, and hopefully now you can see the power and potential of such a technique. As I said before, creating a lovely interface to handle all this nitty gritty stuff for you is absolutely the way to go, and I will try and dream up something as an example in not too long.

I hope this has been of use to you and made pointers and loading data not quite so scary as it may have been before. (Also if I have got something wildly wrong please let me know!)

(Also check out these articles on Load In Place which go into much more detail and are great resources which helped me)

http://entland.homelinux.com/blog/2007/02/21/fast-file-loading-ii-load-in-place/
http://www.gamasutra.com/view/feature/3984/delicious_data_baking.php?print=1