IOTTMCO

Intuitively Obvious to the Most Casual Observer

Linking raw data with C code

Generally speaking, the Unix design philosophy dictates keeping most resources (images, help files, etc.) in external files. This is good advice when possible, but sometimes it’s much easier to package everything into one file: when writing an operating system kernel, say. The “standard” solution is to write a small bin2c program that turns the resource into a C source file containing an array of its bytes, like this:

char the_data[] = {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x00};
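For comparison, here is a minimal sketch of what such a converter boils down to. The function name bin2c, its signature, and the extra *_len constant it emits are illustrative choices, not the interface of any particular existing tool; a real program would wrap it in a main that opens the input file and chooses the array name.

```c
#include <stdio.h>

/* Hypothetical bin2c core: reads every byte of `in` and writes a C
 * array definition named `name` to `out`, plus a length constant,
 * since the array is not guaranteed to end in a zero byte.
 * Note: an empty input yields an empty initializer, which standard C
 * does not allow for an array of unknown size; a real tool would
 * need to special-case it. */
void bin2c(FILE *in, FILE *out, const char *name)
{
    int c;
    unsigned long count = 0;

    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        /* Comma after every element but the last; wrap every 12 bytes. */
        if (count > 0)
            fputs(",", out);
        fputs(count % 12 == 0 ? "\n    " : " ", out);
        fprintf(out, "0x%02x", c);
        count++;
    }
    fprintf(out, "\n};\nconst unsigned long %s_len = %luUL;\n", name, count);
}
```

Every byte goes through this text round trip and is then parsed and compiled back into the very same bytes, which is the redundancy the next paragraph complains about.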

This is the kind of solution one is most likely to find by searching Stack Overflow and similar sites. It works, of course, but it feels somewhat ugly, and for good reason: the compiler ends up reconstructing the original sequence of bytes verbatim to put in the executable. Why not put the bytes there directly?

That is, of course, possible. The idea is to create an object file containing the desired data, with symbols marking its beginning and end, which can then be referenced from linked C code (or assembly, or what have you). The standard GNU binutils include a program, objcopy, that supports exactly this (among many other useful tricks). From the man page:

-B bfdarch
--binary-architecture=bfdarch
    Useful when transforming a architecture-less input file into an object file.  In
    this case the output architecture can be set to bfdarch.  This option will be
    ignored if the input file has a known bfdarch.  You can access this binary data
    inside a program by referencing the special symbols that are created by the
    conversion process.  These symbols are called _binary_objfile_start,
    _binary_objfile_end and _binary_objfile_size.  e.g. you can transform a picture
    file into an object file and then access it in your code using these symbols.

How very convenient. It’s almost as if some developers went out of their way to keep others from having to resort to petty bin2c-style hacks. The invocation for a 32-bit x86 ELF target, then, is:

objcopy -I binary -O elf32-i386 -B i386 license.txt license.txt.o

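In this case, the relevant symbols will be _binary_license_txt_start, _binary_license_txt_end, and _binary_license_txt_size (objcopy builds them from the input file name, replacing every character that is not a letter or digit with an underscore). These are slightly unintuitive to use; in particular, they are not pointers. Rather, the *_start symbol labels the first byte of the data itself, and *_end labels the position just past the last byte, much like the following assembly: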

_binary_license_txt_start:
    db 0x48
    db 0x65
    db 0x6c
    db 0x6c
    db 0x6f
_binary_license_txt_end:

So, if you want a pointer to the data, declare the symbol (for example, extern char _binary_license_txt_start;) and take its address: &_binary_license_txt_start.
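A minimal sketch of reading the embedded file from C, assuming license.txt.o was produced by the objcopy invocation above and is linked into the program. The helper names license_data and print_license are hypothetical; the objcopy symbols are declared as single char objects, and only their addresses are ever used.

```c
#include <stddef.h>
#include <stdio.h>

/* Symbols defined by objcopy in license.txt.o. Each one marks a
 * position in the data, so we take addresses, never read values. */
extern const char _binary_license_txt_start;
extern const char _binary_license_txt_end;

/* Hypothetical helper: returns a pointer to the embedded bytes and
 * stores their count in *len. The data is NOT NUL-terminated. */
const char *license_data(size_t *len)
{
    const char *start = &_binary_license_txt_start;
    const char *end = &_binary_license_txt_end;

    *len = (size_t)(end - start);
    return start;
}

/* Hypothetical helper: writes the embedded file to `out` using the
 * explicit length. Returns 0 on success, -1 on a short write. */
int print_license(FILE *out)
{
    size_t len;
    const char *data = license_data(&len);

    return fwrite(data, 1, len, out) == len ? 0 : -1;
}
```

To try it, put this code plus a main that calls print_license(stdout) into a file (main.c is just an illustrative name) and link it with the object, e.g. gcc -m32 main.c license.txt.o. Because the object above is elf32-i386, it only links into a 32-bit build; for a 64-bit x86 build, regenerate it with -O elf64-x86-64 -B i386:x86-64. The length here is computed as end minus start, so the code does not depend on the *_size symbol at all.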

The *_size symbol is even more interesting. Because we’re in magic ELF-land, symbols can do weird things, like refer to addresses that aren’t in the object file at all. In this case, *_size is an absolute symbol whose address is N, where N is the size of the binary data. In other words, numerically, &_start + &_size = &_end (C won’t let you add two pointers, so in real code convert &_size to an integer type such as uintptr_t first).

One last cautionary note: objcopy won’t append a null byte, so don’t assume your data is NUL-terminated unless the file itself ends with one.
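If you do need a C string, one approach is to copy the bytes into a buffer with room for a terminator. blob_to_cstring is a hypothetical helper written for this sketch; it takes start and end pointers, so it works with any pair of *_start/*_end symbols.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the bytes in [start, end) into a new
 * heap buffer and appends a NUL, so the result can be used as a C
 * string. Returns NULL if allocation fails; the caller must free it. */
char *blob_to_cstring(const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    char *s = malloc(len + 1);

    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    s[len] = '\0';
    return s;
}
```

For the license example you would call blob_to_cstring(&_binary_license_txt_start, &_binary_license_txt_end) and free() the result when done. If the data itself contains zero bytes, string functions will still stop at the first one. Alternatively, append a zero byte to the file before running objcopy, and the terminator becomes part of the embedded data.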