Amiga Bitmap Fonts, Part 3: The Font Descriptor File

26 June 2021

In Part 2 of our exploration of Amiga fonts we looked at the format of the font contents file and found… well not much of interest. Let’s now turn our attention to the other file type - the font descriptor file.

The font descriptor filename is usually numeric: the number corresponds to the vertical size of the font in pixels. So sapphire/19 will contain a representation of a font with a height of 19 pixels.
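Because the size lives in the filename, a script that walks a font's directory can recover it from the path alone. Here's a small sketch of that. `parseFontPath` is a hypothetical helper, and it assumes descriptor filenames are purely numeric, which is the usual convention but not a guarantee.

```javascript
const path = require('path');

// Hypothetical helper: split a descriptor path such as 'fonts/sapphire/19'
// into the font name (the directory) and the pixel height (the filename).
// Assumes the filename is purely numeric, as is usual but not guaranteed.
const parseFontPath = (fontPath) => {
    const fileName = path.basename(fontPath);
    if (!/^\d+$/.test(fileName)) {
        throw new Error(`Not a numeric font descriptor filename: ${fontPath}`);
    }
    return {
        name: path.basename(path.dirname(fontPath)),
        ySize: Number(fileName)
    };
};

console.log(parseFontPath('fonts/sapphire/19')); // { name: 'sapphire', ySize: 19 }
```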

Once again we turn to our well-thumbed copy of the Amiga ROM Kernel Reference Manual: Libraries for details of the contents of a typical font file:

struct TextFont {
    struct Message tf_Message;  /* reply message for font removal */
    UWORD   tf_YSize;
    UBYTE   tf_Style;
    UBYTE   tf_Flags;
    UWORD   tf_XSize;
    UWORD   tf_Baseline;
    UWORD   tf_BoldSmear;       /* smear to affect a bold enhancement */

    UWORD   tf_Accessors;
    UBYTE   tf_LoChar;
    UBYTE   tf_HiChar;
    APTR    tf_CharData;        /* the bit character data */

    UWORD   tf_Modulo;          /* the row modulo for the strike font data   */
    APTR    tf_CharLoc;         /* ptr to location data for the strike font  */
                                /*   2 words: bit offset then size           */
    APTR    tf_CharSpace;       /* ptr to words of proportional spacing data */
    APTR    tf_CharKern;        /* ptr to words of kerning data              */
};

Breaking this down we have:

A Message struct, which the OS uses to link together the fonts in the system list. We don’t need this, so we’ll ignore it.
Next, a UWORD containing the vertical size, YSize.
Next, two UBYTEs holding the style bits and the flag bits.

So far this is all very similar to the font contents file, so I’m beginning to wish I hadn’t bothered with that file first.

Next we have XSize, the horizontal width of the font, which is useful for monospaced fonts. Now we’re getting somewhere.
Then the Baseline: the distance in pixels from the top row of the font to the row on which the base of each letter sits (except for descenders, of course!).
Then BoldSmear: when a font is made bold algorithmically, the amount of ‘boldening’ in pixels.
Accessors is an access count: the number of times the font is currently open. Not very useful for us really.
LoChar, the lowest character code represented in the font. For a nice complete font this is likely to be 32 (the space character).
HiChar, the highest character code represented in the font. For a complete font this is likely to be 255. The Amiga uses the ISO 8859-1 (Latin-1) character set, so codes above 127 go beyond US ASCII.
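The byte offsets used in the script below follow from these field sizes: a UBYTE is 1 byte, a UWORD 2 and an APTR 4, and every field here happens to land on a naturally aligned offset, so there is no padding. As a sketch, taking the script's offset of 78 for tf_YSize as given (it sits straight after the 20-byte Message struct), we can derive all the others. The names `SIZES`, `TEXTFONT_FIELDS` and `computeOffsets` are made up for illustration.

```javascript
// Field sizes in bytes on the 68000: UBYTE = 1, UWORD = 2, APTR = 4.
const SIZES = { UBYTE: 1, UWORD: 2, APTR: 4 };

// TextFont fields after tf_Message, in declaration order.
const TEXTFONT_FIELDS = [
    ['tf_YSize', 'UWORD'], ['tf_Style', 'UBYTE'], ['tf_Flags', 'UBYTE'],
    ['tf_XSize', 'UWORD'], ['tf_Baseline', 'UWORD'], ['tf_BoldSmear', 'UWORD'],
    ['tf_Accessors', 'UWORD'], ['tf_LoChar', 'UBYTE'], ['tf_HiChar', 'UBYTE'],
    ['tf_CharData', 'APTR'], ['tf_Modulo', 'UWORD'], ['tf_CharLoc', 'APTR'],
    ['tf_CharSpace', 'APTR'], ['tf_CharKern', 'APTR']
];

// Walk the fields, recording where each one starts.
// Assumption: tf_YSize is at byte 78 of the buffer once 32 bytes are sliced off.
const computeOffsets = (start) => {
    const result = {};
    let offset = start;
    for (const [name, type] of TEXTFONT_FIELDS) {
        result[name] = offset;
        offset += SIZES[type];
    }
    return result;
};

const offsets = computeOffsets(78);
console.log(offsets.tf_HiChar, offsets.tf_CharData, offsets.tf_CharKern); // 91 92 106
```

Every value this produces matches an offset hard-coded in the scripts in this post, which is a handy sanity check if you ever need to read a field the scripts skip.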

We can extract all the above data using a Node.js script similar to the one we used in Part 2: read the file into a Node.js Buffer, then read bytes and words at various positions to extract the data we need. Font files are actually AmigaDOS executables, and the first 32 bytes of the file are a hunk header rather than font data, so we’ll slice those off and work on the remainder of the file. That way the pointers stored in the struct match offsets in our buffer.

const fs = require('fs');
const path = require('path');
const BitArray = require('node-bitarray');
const _ = require('lodash');
const fontName = 'WebLight';
const fontSize = 32;
const rawFontFile = fs.readFileSync(
    path.join(__dirname, `../fonts/webcleaner/${fontName}/${fontSize}`)
);
// strip the first 32 bytes off to make the pointer locations accurate
const fontFile = rawFontFile.slice(32); 
const bitIsSet = (value, bit) => {
    return !!(value & (2 ** bit)); 
};
const expandStyle = (style) => ({
    value: style,
    normal: style === 0,
    underlined: bitIsSet(style, 0),
    bold: bitIsSet(style, 1),
    italic: bitIsSet(style, 2),
    extended: bitIsSet(style, 3),
    colorfont: bitIsSet(style, 6),
    tagged: bitIsSet(style, 7)
});
const expandFlags = (flags) => ({
    value: flags,
    disk: bitIsSet(flags, 1),
    proportional: bitIsSet(flags, 5),
    designed: bitIsSet(flags, 6)
});
// offsets are relative to the sliced buffer; the 68000 is big-endian,
// hence the *BE reads
const ySize = fontFile.readUInt16BE(78);
const style = fontFile.readUInt8(80);
const flags = fontFile.readUInt8(81);
const xSize = fontFile.readUInt16BE(82);
const baseline = fontFile.readUInt16BE(84);
const boldSmear = fontFile.readUInt16BE(86);
const accessors = fontFile.readUInt16BE(88);
const loChar = fontFile.readUInt8(90);
const hiChar = fontFile.readUInt8(91);
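Rather than discarding those first 32 bytes blindly, we could check them first. Here's a minimal sketch, assuming the font is a standard single-hunk AmigaDOS load file: it should start with the HUNK_HEADER id 0x3F3, an empty resident library list, a hunk table of size 1, and then the hunk's own type id (HUNK_CODE, 0x3E9) at byte 24. `stripHunkHeader` is a hypothetical helper, and accepting HUNK_DATA (0x3EA) as well is just defensive.

```javascript
const HUNK_HEADER = 0x3f3;
const HUNK_CODE = 0x3e9;
const HUNK_DATA = 0x3ea;

// Hypothetical helper: check that a buffer looks like a single-hunk
// AmigaDOS load file, then drop the 32 header bytes.
const stripHunkHeader = (buffer) => {
    if (buffer.length < 32 || buffer.readUInt32BE(0) !== HUNK_HEADER) {
        throw new Error('Not an AmigaDOS load file');
    }
    // bytes 4-7 end the (empty) resident library list; bytes 8-11 give the hunk count
    if (buffer.readUInt32BE(4) !== 0 || buffer.readUInt32BE(8) !== 1) {
        throw new Error('Expected exactly one hunk and no resident libraries');
    }
    // bytes 24-27: type id of the first (and only) hunk
    const hunkType = buffer.readUInt32BE(24);
    if (hunkType !== HUNK_CODE && hunkType !== HUNK_DATA) {
        throw new Error(`Unexpected hunk type 0x${hunkType.toString(16)}`);
    }
    return buffer.subarray(32);
};

// Usage with a synthetic 40-byte file: header plus an 8-byte hunk.
const demo = Buffer.alloc(40);
demo.writeUInt32BE(HUNK_HEADER, 0);
demo.writeUInt32BE(1, 8);            // one hunk
demo.writeUInt32BE(2, 20);           // its size: 2 longwords
demo.writeUInt32BE(HUNK_CODE, 24);
demo.writeUInt32BE(2, 28);
demo.writeUInt32BE(0x70ff4e75, 32);  // MOVEQ #-1,D0 / RTS
console.log(stripHunkHeader(demo).toString('hex')); // 70ff4e7500000000
```

`subarray` returns a view onto the same memory, just like the `slice` call in the script, so nothing is copied.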

And then we get to the fun bit - the character data.

How character data is represented

Let’s consider a small pixel font with three characters, A, B, and C:

.###....####.....###.
#...#...#...#...#...#
#####...####....#....
#...#...#...#...#...#
#...#...####.....###.

In a font descriptor file the characters are stored alongside each other as follows:

.###.####..###.
#...##...##...#
#########.#....
#...##...##...#
#...#####..###.

A font descriptor file also contains the equivalent of a .notdef character - a character which is displayed as a fallback when it is not defined in the font. We’ll design this as a rectangle and tack it on.

.###.####..###.#####
#...##...##...##...#
#########.#....#...#
#...##...##...##...#
#...#####..###.#####
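Since the notdef glyph is stored straight after the glyph for hiChar, finding the table entry for a character code is simple arithmetic. This sketch (`glyphIndex` is a hypothetical helper) sends every code outside loChar..hiChar to notdef. It doesn't attempt to detect codes inside the range that have no real glyph, since that isn't covered here.

```javascript
// Hypothetical helper: map a character code to its index in the
// charLoc/charSpace/charKern tables. Codes loChar..hiChar map to
// 0..(hiChar - loChar); anything outside that range falls back to the
// notdef glyph, which sits one entry after hiChar.
const glyphIndex = (charCode, loChar, hiChar) => {
    const notdefIndex = hiChar - loChar + 1;
    if (charCode < loChar || charCode > hiChar) {
        return notdefIndex;
    }
    return charCode - loChar;
};

// Our toy font covers 'A' (65) to 'C' (67)
console.log(glyphIndex(65, 65, 67)); // 0 ('A')
console.log(glyphIndex(67, 65, 67)); // 2 ('C')
console.log(glyphIndex(90, 65, 67)); // 3 ('Z' is missing, so notdef)
```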

Each row of pixels becomes a row of font data. The data is stored as UWORDs, so we’ll pad the end of each row with zeroes to get a multiple of 16 bits. Our first row then looks like this:

.###.####..###.#####............ (or in hex: 779DF000)
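That conversion is mechanical enough to script. Here's a sketch (`rowToHex` is a hypothetical helper) that turns a row of our diagram into the padded hex that would appear in the file, treating `#` as a set bit and `.` as a clear one.

```javascript
// Convert one row of our '.'/'#' diagram into the bytes stored in the
// file: '#' is a set bit, '.' a clear bit, and the row is padded with
// zero bits up to a whole number of 16-bit UWORDs.
const rowToHex = (row) => {
    const paddedLength = Math.ceil(row.length / 16) * 16;
    const bits = row.replace(/#/g, '1').replace(/\./g, '0').padEnd(paddedLength, '0');
    let hex = '';
    // each group of four bits becomes one hex digit
    for (let i = 0; i < bits.length; i += 4) {
        hex += parseInt(bits.slice(i, i + 4), 2).toString(16).toUpperCase();
    }
    return hex;
};

console.log(rowToHex('.###.####..###.#####')); // 779DF000
console.log(rowToHex('#...#####..###.#####')); // 8F9DF000
```

The length of the result in bytes (half the number of hex digits) is the row modulo: 4 for this font.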

Now we can explain tf_CharData in our struct. It’s a pointer (APTR) to the location in our file where the character data starts. Every row in our simple example is 32 bits, or 4 bytes, long. In our struct this corresponds to the modulo: the number of bytes that should be repeatedly added to the starting location to retrieve subsequent rows of the character data. For our example font, tf_Modulo would be 4.
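To make the modulo concrete, here's a sketch that reads single pixels straight out of the bytes. The five hex strings are the padded rows of the toy font above, worked out by hand in the same way as 779DF000. `getPixel` is a hypothetical helper, and it assumes, as the hex example implies, that the leftmost pixel is the most significant bit of each byte.

```javascript
// The five rows of our toy strike (A, B, C and notdef), each padded to
// 32 bits, so the modulo is 4 bytes. Rows are stored one after another.
const modulo = 4;
const charData = Buffer.from('779DF000' + '8C631000' + 'FFA11000' + '8C631000' + '8F9DF000', 'hex');

// Row y starts modulo * y bytes into the data; within a row, pixel x is
// in byte x >> 3, counting from the most significant bit (bit 7) down.
const getPixel = (data, rowModulo, x, y) => {
    const byte = data[(y * rowModulo) + (x >> 3)];
    return (byte >> (7 - (x & 7))) & 1;
};

// Print the strike back out as '.'/'#' (20 pixels wide, 5 rows)
for (let y = 0; y < charData.length / modulo; y += 1) {
    let line = '';
    for (let x = 0; x < 20; x += 1) {
        line += getPixel(charData, modulo, x, y) ? '#' : '.';
    }
    console.log(line);
}
```

This prints the same five rows as the combined diagram with the notdef rectangle.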

To be able to extract single characters, we need charLoc, a list of locations of each character. Each entry is two UWORDs: the bit offset of the character within each row, followed by its width in bits. For our simple font this is (in hex):

0000 0005  // 'A' starts at bit 0 and has length 5
0005 0005  // 'B' starts at bit 5 and has length 5
000A 0005  // 'C' starts at bit 10 and has length 5
000F 0005  // 'notdef' starts at bit 15 and has length 5

So if, for example, we want to render the character ‘C’, we need to take five bits starting at bit 10 of each row of character data.
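Here's that extraction as a sketch, with the charLoc table written out as the bytes it would occupy in the file. For readability the strike is kept as strings rather than bits; `extractGlyph` is a hypothetical helper.

```javascript
// The toy strike again, one string per pixel row.
const rows = [
    '.###.####..###.#####',
    '#...##...##...##...#',
    '#########.#....#...#',
    '#...##...##...##...#',
    '#...#####..###.#####'
];

// charLoc as stored in the file: per glyph, a UWORD bit offset followed
// by a UWORD width in bits (A, B, C, then notdef).
const charLoc = Buffer.from('0000000500050005000A0005000F0005', 'hex');

// Hypothetical helper: cut glyph number `index` out of every row.
const extractGlyph = (strikeRows, locations, index) => {
    const offset = locations.readUInt16BE(index * 4);
    const width = locations.readUInt16BE((index * 4) + 2);
    return strikeRows.map((row) => row.slice(offset, offset + width));
};

// 'C' is glyph 2 because loChar is 'A'
extractGlyph(rows, charLoc, 2).forEach((line) => console.log(line));
```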

Finally we have charSpace, the amount of space each character takes up in pixels, and charKern, the amount of kerning. But beware! Kerning is usually thought of in terms of the distance between pairs of characters. That’s not the case here: kerning is a per-character value and represents how much space should be added to or subtracted from the gap between this and the following character.
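Under that per-character reading, each character's horizontal advance is its spacing plus its own kerning value, so measuring a string is just a sum. This sketch shows the idea. The glyph values are made up for illustration, the object shape mirrors the glyphs produced by the script later in this post, and two behaviours are assumptions rather than anything documented above: codes outside loChar..hiChar use notdef, and a glyph without a spacing value falls back to xSize.

```javascript
// Hypothetical font data (made-up values for illustration).
const font = {
    xSize: 8,
    loChar: 65,
    hiChar: 67,
    glyphs: {
        65: { spacing: 6, kerning: 0 },
        66: { spacing: 6, kerning: -1 },
        67: { spacing: 5, kerning: 1 },
        notdef: { spacing: 6, kerning: 0 }
    }
};

// Width of a string in pixels: each character contributes its spacing
// plus its own (per-character, not pairwise) kerning value.
// Assumptions: out-of-range codes use notdef; missing spacing uses xSize.
const textWidth = (text, fontInfo) => {
    let width = 0;
    for (const ch of text) {
        const code = ch.charCodeAt(0);
        const inRange = code >= fontInfo.loChar && code <= fontInfo.hiChar;
        const glyph = fontInfo.glyphs[inRange ? code : 'notdef'];
        width += (glyph.spacing ?? fontInfo.xSize) + (glyph.kerning ?? 0);
    }
    return width;
};

console.log(textWidth('ABC', font)); // 17 (6 + 5 + 6)
console.log(textWidth('AZ', font));  // 12 (6 + 6 for notdef)
```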

Right, we have all the information we need. Let’s return to Node.js.

Extracting the character data

The beauty of Node.js is its rich ecosystem of modules, which means we don’t have to reinvent the wheel when writing code. We’re going to need to work on arrays of bits and fortunately a module exists to do just that, node-bitarray. It’s old but it does the job. We’re also going to use a common helper library, lodash, to work on our array of bits, chunking it down into individual rows.

Let’s see some code. Again, the whole script is available at https://github.com/smugpie/amiga-bitmap-font-tools/blob/main/node/readFontDescriptor.js to save some typing.

// loChar..hiChar inclusive, plus one extra "notdef" character
const charRange = hiChar - loChar + 2;
const fontDataStart = fontFile.readUInt32BE(92);      // tf_CharData
const modulo = fontFile.readUInt16BE(96);             // tf_Modulo: bytes per row
const locationDataStart = fontFile.readUInt32BE(98);  // tf_CharLoc
const spacingDataStart = fontFile.readUInt32BE(102);  // tf_CharSpace
const kerningDataStart = fontFile.readUInt32BE(106);  // tf_CharKern
const locationData = fontFile.slice(locationDataStart);
const kerningData = fontFile.slice(kerningDataStart);
const spacingData = fontFile.slice(spacingDataStart);
// the bitmap is ySize rows of modulo bytes each
const fontBitmapData = fontFile.slice(fontDataStart, fontDataStart + (modulo * ySize));
// flatten to an array of bits, then split into one array per pixel row
const fontBitArray = BitArray.fromBuffer(fontBitmapData).toJSON();
const fontBitmapRows = _.chunk(fontBitArray, modulo * 8);

In the above code snippet, we calculate the number of characters from the hiChar and loChar values, not forgetting that we also have an extra notdef character. Then we read the pointers to the character, location, spacing and kerning data, and convert our character data into rows of bits.
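If you'd rather avoid the two dependencies, the row-splitting can be sketched in plain Node.js. `bufferToBitRows` is a hypothetical helper. It takes the leftmost pixel to be the most significant bit of each byte, which it needs to for the glyphs to come out the right way round, so it should match what `BitArray.fromBuffer` plus `_.chunk` produce, though that's worth checking against a real font file.

```javascript
// Split a strike's raw bytes into pixel rows of 0/1 values without
// node-bitarray or lodash. Each row is `modulo` bytes; bits are taken
// most significant first, so the leftmost pixel is bit 7 of the first byte.
const bufferToBitRows = (data, modulo) => {
    const rows = [];
    for (let rowStart = 0; rowStart + modulo <= data.length; rowStart += modulo) {
        const row = [];
        for (const byte of data.subarray(rowStart, rowStart + modulo)) {
            for (let bit = 7; bit >= 0; bit -= 1) {
                row.push((byte >> bit) & 1);
            }
        }
        rows.push(row);
    }
    return rows;
};

// First two rows of the toy font from earlier (modulo 4)
const bitRows = bufferToBitRows(Buffer.from('779DF0008C631000', 'hex'), 4);
console.log(bitRows[0].slice(0, 5).join('')); // 01110 (top row of 'A')
```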

Let’s now work through our glyphs, dig out the subset of bits we need and create our font as JSON data…

const fontData = {
    name: `${fontName}${fontSize}`,
    ySize,
    flags: expandFlags(flags),
    style: expandStyle(style),
    xSize,
    baseline,
    boldSmear,
    accessors,
    loChar,
    hiChar,
    fontDataStart,
    locationDataStart,
    spacingDataStart,
    kerningDataStart,
    modulo,
    glyphs: {}
};
for (let i = 0; i < charRange; i += 1) {
    const charCode = loChar + i;
    // each charLoc entry is two UWORDs: bit offset, then width in bits
    const locationStart = locationData.readUInt16BE(i * 4);
    const bitLength = locationData.readUInt16BE((i * 4) + 2);
    // the entry after hiChar is the notdef glyph
    fontData.glyphs[charCode > hiChar ? 'notdef' : charCode] = {
        character: charCode > hiChar ? 'notdef' : String.fromCharCode(charCode),
        // kerning and spacing are one signed word per character
        kerning: kerningData.readInt16BE(i * 2),
        spacing: spacingData.readInt16BE(i * 2),
        locationStart,
        bitLength,
        // cut this glyph's columns out of every row of the strike
        bitmap: fontBitmapRows.map((row) => row.slice(locationStart, locationStart + bitLength))
    };
}
console.log(JSON.stringify(fontData));

Job done! Let’s try it out by extracting a character from the WebLight/32 font in the repository. What do we get for a lowercase ‘a’ (character code 97)? We’ll print each row of its bitmap, drawing set bits as ## and clear bits as two dots:

fontData.glyphs[97].bitmap.forEach((row) => {
    console.log(row.join('').replace(/1/g, '##').replace(/0/g, '..'));
});

............................
............................
............................
............................
............................
............................
............................
............................
............................
............................
............................
........##########..........
....##################......
..######################....
..########......########....
..######..........######....
..................######....
........################....
..######################....
..##############..######....
########..........######....
######..........########....
########......##########....
############################
..##############..##########
....##########......########
............................
............................
............................
............................
............................
............................

Success! We have our font in an interoperable data format, which means we can do some interesting things with it. For example, font editors tend to be scriptable, so could we recreate our fonts in a more usable font format? Stay tuned…