Wednesday, July 6, 2011

Anatomy of a HEX File

Before we can write a bootloader that reprograms our PIC for us we need to understand what a C18 compiled .hex file looks like. To do this I'm just using the MPLABX environment I set up on my Ubuntu dev machine. If we're going to look at a hex file then we need to get a hex file first. Let's keep it very simple. I created a new standalone project in MPLABX called HEXplorer. I added a single file named main.c with the following contents:
void main(void)
{
    return;
}

That's as simple as I know how to make it. It compiled fine and created a hex file that looks like this:

Twenty-three lines of instructions for a four-line program? This highlights a good point. C compilers are very nice utilities that let you write firmware in a high-level language, but you do pay a price for it. Another interesting thing to notice when you build this simple program is the RAM usage. After a clean build it showed 268 bytes, or 7% of total RAM, being used. Where is all that RAM going? Good question. If you look at section 3.3 (START-UP CODE) of the C18 User Guide you'll read that the default startup code that C18 creates initializes the stack pointers. Those stack pointers point to a stack that is, by default, 256 bytes in size. You can control that size with a custom linker file. More on that in a later post. In section 3.4 of the User Guide you see that the compiler-managed resources account for a minimum of 12 bytes. 256 + 12 accounts for our 268 bytes.

HEX Demystified

"I don't even see the code. All I see is blonde, brunette, redhead. Hey uh, you want a drink?"

So if you look at the Intel HEX page on Wikipedia (I recommend you read it before going any further) you'll see that there are multiple formats for HEX files, so the first thing we need to do is figure out which format we're looking at. That's easy: we go to the Project Properties in MPLABX, select the MPLINK item on the left, and see on the right that the HEX file format is INHX32.

Alright, so what do we know about that format? Well, the MPLINK documentation can tell us. This is what it says:

INTEL HEX 32 FORMAT
The extended 32-bit address hex format is similar to the hex 8 format, except that the extended linear address record is also output to establish the upper 16 bits of the data address. This is mainly used for 16-bit core devices since their addressable program memory exceeds 64 kbytes. Each data record begins with a 9-character prefix and ends with a 2-character checksum. Each record has the following format:
:BBAAAATTHHHH...HHCC
BB A two digit hexadecimal byte count representing the number of data bytes that will appear on the line.
AAAA A four digit hexadecimal address representing the starting address of the data record.
TT A two digit record type:
00 – Data record
01 – End of File record
02 – Segment Address record
04 – Linear Address record
HH A two digit hexadecimal data byte, presented in low byte/high byte combinations.
CC A two digit hexadecimal checksum that is the two's complement of the sum of all preceding bytes in the record.
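
To make the layout concrete, here is a rough sketch of how a bootloader host tool might split one line into those fields. The names hex_record and parse_hex_record are my own, not anything from MPLINK or the C18 libraries, and the sketch only slices the line apart; it stores the checksum without verifying it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One record, with fields named after the BB/AAAA/TT/HH/CC letters above.
   (hex_record is a hypothetical name used only for this sketch.) */
typedef struct {
    uint8_t  count;      /* BB: number of data bytes */
    uint16_t address;    /* AAAA: starting address of the data */
    uint8_t  type;       /* TT: record type */
    uint8_t  data[255];  /* HH...: data bytes, in file order */
    uint8_t  checksum;   /* CC: stored as-is, not verified here */
} hex_record;

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Value of the two hex digits at s, or -1 if either is invalid. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    int lo = (hi < 0) ? -1 : hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : ((hi << 4) | lo);
}

/* Split one line of a hex file into its fields.
   Returns 0 on success, -1 if the line is too short or not valid hex. */
int parse_hex_record(const char *line, hex_record *rec)
{
    size_t len = strlen(line);
    int prefix[4];
    int i, b;

    /* ':' plus 8 prefix digits plus 2 checksum digits is the shortest line. */
    if (len < 11 || line[0] != ':')
        return -1;

    for (i = 0; i < 4; i++) {
        prefix[i] = hex_byte(line + 1 + 2 * i);
        if (prefix[i] < 0)
            return -1;
    }
    rec->count   = (uint8_t)prefix[0];
    rec->address = (uint16_t)((prefix[1] << 8) | prefix[2]);
    rec->type    = (uint8_t)prefix[3];

    /* The line must be long enough to hold BB data bytes and the checksum. */
    if (len < 11 + 2 * (size_t)rec->count)
        return -1;

    for (i = 0; i < rec->count; i++) {
        b = hex_byte(line + 9 + 2 * i);
        if (b < 0)
            return -1;
        rec->data[i] = (uint8_t)b;
    }

    b = hex_byte(line + 9 + 2 * rec->count);
    if (b < 0)
        return -1;
    rec->checksum = (uint8_t)b;
    return 0;
}
```

Handing it an End of File record such as ":00000001FF" should give a count of 0, an address of 0x0000, a type of 0x01 and a stored checksum of 0xFF.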

So let's look at the first line in our compiled file:

:020000040000FA

Broken into its parts we have:
: - Indicates the start of the line
02 - Tells us that there are 0x02 or 2 bytes in the data segment
0000 - Will always be 0000 for record type 04.
04 - The record type is the Linear Address record. The data bytes represent the upper 16 bits of a 32 bit address.
0000 - The two data bytes. This makes sense because our program is small and will be under the 64KB (0xFFFF) range, so the upper 16 bits of the address should be all zeroes.
FA - The checksum for the line. We're going to ignore this for now.
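
If you don't want to ignore the checksum, it is easy to reproduce. This is a minimal sketch of the rule quoted from the MPLINK documentation (the two's complement of the sum of all preceding bytes); hex_checksum is a name I made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Two's complement of the 8-bit sum of every record byte that comes
   before the checksum (count, address, type and data bytes). */
uint8_t hex_checksum(const uint8_t *bytes, size_t n)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum = (uint8_t)(sum + bytes[i]);   /* wraps modulo 256 */

    return (uint8_t)(0u - sum);
}
```

For the line above the preceding bytes are 02 00 00 04 00 00. They add up to 0x06, and the two's complement of 0x06 is 0xFA, which matches the FA at the end of the record.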

Well, that's not horribly interesting since that is how all of our PIC18 programs are likely to start. So let's look at one more.

:0600000063EF00F01200A6

This one is more interesting.

: - Start of the line
06 - Data segment will contain 6 bytes of data.
0000 - The starting address of the data in the data segment
00 - The record type is a Data Record
63EF00F01200 - The actual data. We'll dissect that in a second.
A6 - The checksum.
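
The 0000 address in a data record is only the lower 16 bits. The upper 16 bits come from the most recent Linear Address record, so a loader has to remember them and glue the two halves together. A tiny sketch of that step, with hex_full_address being a hypothetical helper name:

```c
#include <stdint.h>

/* Combine the upper 16 bits from the latest type 04 record with the
   AAAA field of a data record to get the full 32-bit address. */
uint32_t hex_full_address(uint16_t upper16, uint16_t record_address)
{
    return ((uint32_t)upper16 << 16) | record_address;
}
```

With the two records we've looked at, the upper half is 0x0000 and the record address is 0x0000, so the data lands at address 0. A made-up upper half of 0x0001 with a record address of 0x0010 would land at 0x10010.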

Ok, so this time we're actually getting into the data that should be programmed onto our device at the reset vector (address 0x000). An important thing to remember here is how the data is organized. We read it from left to right, but the two bytes of each 16-bit instruction word are swapped. The MPLINK documentation describes the data as "A two digit hexadecimal data byte, presented in low byte/high byte combinations." That means that "63EF" is really "EF63" if we want to break it out into binary and figure out the opcode, which we obviously want to do, right? This is confusing at first and seems completely insane and backward, but it will help us later: when we program the PIC we write the LSB first and then the MSB, so we'll program the device in exactly the order the data appears in the hex file. It just needs to be swapped when we humans are looking at it.
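
If you'd rather let code do the swapping, something like the following sketch would do it. It assumes the data bytes have already been converted from hex text to raw bytes, and hex_bytes_to_words is just a name I picked.

```c
#include <stddef.h>
#include <stdint.h>

/* Join low-byte/high-byte pairs into 16-bit instruction words.
   Writes n_bytes / 2 words and returns that count; a trailing odd
   byte, if any, is ignored. */
size_t hex_bytes_to_words(const uint8_t *data, size_t n_bytes, uint16_t *words)
{
    size_t n_words = n_bytes / 2;
    size_t i;

    for (i = 0; i < n_words; i++) {
        /* The first byte of each pair is the low byte, the second the high byte. */
        words[i] = (uint16_t)(data[2 * i] | ((unsigned)data[2 * i + 1] << 8));
    }
    return n_words;
}
```

Fed the six bytes 63 EF 00 F0 12 00 from our data record, it produces the three words 0xEF63, 0xF000 and 0x0012.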

Dissecting the Data

So let's look at the first two bytes of data. Swapped, they read EF63. Broken into binary we have:

1110 1111 0110 0011

We take that information to the PIC18F27J53 datasheet, section 29.1, Table 29-2 which is the standard instruction set table. We just go down the instruction column until we find a match. Remember, some of the instructions have variable bits that will represent data for the instruction to use. That happens to be the case for our instruction which is the GOTO instruction. It has the format:

1110 1111 kkkk kkkk

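The k's represent the address to go to. Those are the remaining bits in our instruction word, “0110 0011”, or 0x63 in hex. GOTO is a two-word instruction, so the next word in the data (0xF000 after swapping) carries the upper address bits, which are all zero here. The literal counts 16-bit instruction words rather than bytes, so the byte address is 0x63 x 2 = 0xC6. If you want some practice try to figure out what the rest of the instructions are from our data segment. So, in review, the first instruction at the reset vector is going to be GOTO 0xC6.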
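
Here is a small sketch of that decoding step in C. It relies on one thing the text above doesn't spell out: the PIC18 GOTO is a two-word instruction whose second word is 1111 followed by the upper 12 address bits, and the combined 20-bit literal is a word address that gets doubled to form the byte address. The function name pic18_goto_target is made up for this example.

```c
#include <stdint.h>

/* Decode a PIC18 GOTO from its two (already byte-swapped) words.
   Returns the target byte address, or -1 if the words are not a GOTO. */
int32_t pic18_goto_target(uint16_t word1, uint16_t word2)
{
    uint32_t k;

    if ((word1 & 0xFF00u) != 0xEF00u)   /* first word: 1110 1111 kkkk kkkk */
        return -1;
    if ((word2 & 0xF000u) != 0xF000u)   /* second word: 1111 kkkk kkkk kkkk */
        return -1;

    /* Low 8 bits come from word1, upper 12 bits from word2. */
    k = ((uint32_t)(word2 & 0x0FFFu) << 8) | (uint32_t)(word1 & 0x00FFu);

    /* The literal counts instruction words; double it for the byte address. */
    return (int32_t)(k << 1);
}
```

For our first two words, 0xEF63 and 0xF000, the literal is 0x63 and the function returns 0xC6.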

I don't expect you to take my word for it. Let's have MPLABX confirm this. After building the simple HEXplorer program we can go to Window->PIC Memory Views->Memory View 2. This will display the PIC Program Memory in a tab in the bottom view area and you can see what the program memory should contain after programming the PIC with the HEXplorer hex file.

Well, you're welcome to go through the rest of the hex file and try to decode all of the instructions but I think we've got a solid enough understanding of the hex file format to continue working on our bootloader. The next piece of our problem is to "Understand how to write to the PIC's program memory space." That will be up next.


  1. hi i have a question for you,
    i work with PIC16F877A that has a flash memory of 14kb but i can load HEX files into it till the size of the hex file is 24 kb after that it throws error.....any idea how it is possible to load a 24kb hex file into a 14 kb microchip......any answers is greatly appreciated

    1. Sorry I didn't reply earlier to this. A hex file doesn't contain just the data that goes into program memory. There are records in the hex file like segment address, linear address, and end of file that occupy space in the hex file but have no effect on program memory size. Also remember that every data byte is written as two ASCII characters, and there is a bunch of overhead on each line in the hex file like the record type and checksum that also doesn't go into program memory. So the short answer is that while your hex file is 24kb in size, it doesn't have 24kb of data to go into the PIC. Hopefully that makes sense. Let me know if you have any other questions.

  2. Replies
    1. You're most welcome zewdu. Have a great day.

  5. Why is EF 63 disassembled to GOTO 0xC6? Where is 63, and where is 0xC6? Maybe it is the address of a byte, because 063h x 02h = 0C6h

    1. I can't believe I missed that. Nice catch. 63 definitely doesn't translate to 0xC6 like I said and you are right about multiplying by 0x02. I went through the process of trying this again with the XC8 compiler (C18 isn't available anymore) and found the same result on a simple program. If I take the GOTO address I extract from the hex file and multiply it by 0x02 I get the same address shown in the Program Memory view in MPLABX. Sorry for the misinformation. I still don't quite understand the reasoning behind multiplying by 0x02 though. The datasheet says the GOTO value from hex is a literal.

  6. I found an explanation in DS39632e:

    The Program Counter (PC) specifies the address of the
    instruction to fetch for execution...
    ...The PC addresses bytes in the program memory. To
    prevent the PC from becoming misaligned with word
    instructions, the Least Significant bit of PCL is fixed to
    a value of ‘0’. The PC increments by 2 to address
    sequential instructions in the program memory!

