Relearning MSX #44: Pointers, arrays and strings (Part 1)

Posted by in Development, How-to, MSX, Retro, Technology | May 24, 2016

Pointers are a special type of data in the C programming language (and also in C-like languages). In short, a pointer is just a variable that contains the memory address of some data in the computer. It’s a simple concept, but pointers are a frequent source of confusion when we’re learning the language. However, once understood they become a powerful tool that provides lots and lots of flexibility.

Today we’re starting a short series explaining what pointers are and how they can make our programs much more interesting. As always, don’t hesitate to ask in the comments below when anything isn’t clear.

Let’s begin.

The address operator (&) and the indirection operator (*)

Computers store all program data and variables in an area known as memory. In MSX computers memory is organized in 65,536 cells(*), each containing a number between 0 and 255 (known as a byte).

Each memory cell is identified by a number between 0 and 65,535 (0x0000 – 0xFFFF in hexadecimal). We call this the memory address.

It’s easy to visualize this concept if we think of the computer memory as a spreadsheet with 65,536 rows, such as this one:

Figure: Memory cells and data

All the variables, arrays, and code in our program are stored in these memory cells. In BASIC we didn’t usually need to know where the program stored variables in memory (although the VARPTR function returns the address of a variable). In C we will use these addresses quite often.

(*) These 65,536 memory cells represent the amount of memory the MSX processor is able to access at the same time. Most MSX computers have more memory than this amount. In a future post we will discuss the mechanism to access that memory.

Finding out the memory address of a variable

To find out the address of a variable we use the address operator, represented in C by the ampersand symbol (&). Prefixing a variable name with & returns the memory address of the variable instead of its contents. For example, if we have a variable i, the expression &i returns the memory address of the variable.

This also works with array elements. As an example, the expression &a[i] returns the memory address of the element at position i inside the array a.

The program below illustrates this:
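
A minimal sketch of such a program could look like the one below. It’s written in standard C so that it also compiles on a current computer (MSX-C expects the older K&R style), and the names print_addresses, i and a, as well as the values, are only illustrative. The addresses printed will vary from one machine to another.

```c
#include <assert.h>
#include <stdio.h>

int i = 1234;              /* a plain int variable   */
int a[3] = {10, 20, 30};   /* an array of three ints */

/* Prints each value next to the address where it is stored. */
void print_addresses(void)
{
    int n;

    printf("i    = %4d   &i    = %p\n", i, (void *)&i);
    for (n = 0; n < 3; n++) {
        printf("a[%d] = %4d   &a[%d] = %p\n", n, a[n], n, (void *)&a[n]);
    }

    /* Array elements sit at increasing addresses; assert() stops the
       program if this claim is ever false. */
    assert(&a[0] < &a[1] && &a[1] < &a[2]);
}
```

Calling print_addresses() from main prints four lines: one for i and one for each element of a.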

Arithmetic with addresses

The address values we get by using the & operator behave slightly differently from normal numerical values. Pay attention to the example program below and look at what happens when we add 1 and then 2 to the address of the int variable i:
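
One way to try this is sketched below in standard C. It uses a three-element array v instead of a single variable (an assumption made here so that every address we compute points inside a real object, which standard C requires), and the helper byte_distance is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Number of bytes from the address `from` to the address `to`. */
long byte_distance(const void *from, const void *to)
{
    return (long)((const char *)to - (const char *)from);
}

/* Prints &v[0], &v[0] + 1 and &v[0] + 2, and how far apart they are. */
void show_int_steps(void)
{
    int v[3] = {0, 0, 0};

    printf("&v[0]     = %p\n", (void *)&v[0]);
    printf("&v[0] + 1 = %p (+%ld bytes)\n",
           (void *)(&v[0] + 1), byte_distance(&v[0], &v[0] + 1));
    printf("&v[0] + 2 = %p (+%ld bytes)\n",
           (void *)(&v[0] + 2), byte_distance(&v[0], &v[0] + 2));

    /* Adding 1 always moves the address by the size of one int. */
    assert(byte_distance(&v[0], &v[0] + 1) == (long)sizeof(int));
}
```

With MSX-C, where an int takes two bytes, the two distances are 2 and 4. Most current desktop compilers use four-byte ints, so there the same code reports 4 and 8.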

In this example, when we add 1 to &i the address increases by 2, and when we add 2 to &i it increases by 4. There’s a reason for this, and it has to do with the type of the i variable.

The pointer data type

In this particular case the address increases by 2 bytes at a time because the compiler treats &i as a value of type pointer.

Values of type pointer always contain memory addresses. One way in which they behave differently from other values is that they can’t be multiplied or divided. Trying to do so will result in an error during compilation:
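
The sketch below shows the difference in standard C. The array v and the function name are illustrative assumptions, and the rejected lines are kept inside #if 0 so that the rest still compiles.

```c
#include <assert.h>
#include <stdio.h>

int v[2] = {7, 9};

/* Prints an address obtained by addition, the kind of arithmetic
   that pointers accept. */
void show_allowed_arithmetic(void)
{
    printf("&v[0]     = %p\n", (void *)&v[0]);
    printf("&v[0] + 1 = %p\n", (void *)(&v[0] + 1));   /* allowed */

#if 0   /* change the 0 to 1 and compilation fails on both lines */
    printf("%p\n", (void *)(&v[0] * 2));   /* pointers can't be multiplied */
    printf("%p\n", (void *)(&v[0] / 2));   /* pointers can't be divided    */
#endif

    /* Adding 1 to the address of v[0] gives the address of v[1]. */
    assert(&v[0] + 1 == &v[1]);
}
```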

The only arithmetic operations that pointers support are addition and subtraction. This means that, taking a given address &i as a reference point, we can obtain the addresses of other variables n positions before or after &i. For example:

&i + n (returns the address of the variable n positions after &i)

&i - n (returns the address of the variable n positions before &i)

It’s important to note that whether these operations return meaningful values or not depends on the type of the variables involved. For example, in MSX-C int variables occupy two bytes in memory, but char variables take up only one byte. This is why (assuming i is an int) &i + 1 is two bytes higher than &i. If we have a char variable c, the address returned by &c + 1 is only one byte higher than &c:
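
A minimal sketch of that comparison, in standard C and with hypothetical function names. Each helper measures, in bytes, how far the address moves when we add 1 to it:

```c
#include <assert.h>
#include <stdio.h>

/* Bytes between &i and &i + 1 when i is an int. */
long int_step(void)
{
    int i = 0;

    return (long)((char *)(&i + 1) - (char *)&i);
}

/* Bytes between &c and &c + 1 when c is a char. */
long char_step(void)
{
    char c = 'A';

    return (long)((&c + 1) - &c);
}

/* Prints both steps next to the size of each type. */
void compare_steps(void)
{
    printf("int : step = %ld, sizeof(int)  = %lu\n",
           int_step(), (unsigned long)sizeof(int));
    printf("char: step = %ld, sizeof(char) = %lu\n",
           char_step(), (unsigned long)sizeof(char));

    /* The step is always the size of the type the address refers to. */
    assert(char_step() == 1);
    assert(int_step() == (long)sizeof(int));
}
```

The char step is 1 with every C compiler. The int step is 2 with MSX-C and usually 4 on a current desktop machine.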

In the C programming language we call a pointer that contains the address of an int variable a pointer to int. Similarly, a pointer that contains the address of a char variable is a pointer to char, and so on.

Assigning pointers to variables

We can assign pointer data (which are just numbers representing a memory address) to a variable of type pointer. Pointer variables are defined by prefixing the variable name with an asterisk (*):

data_type_pointed *variable_name;

For example, we define a pointer variable called p that points to a value of type int like this:

int *p;

Remember the example before in which adding 1 to the address of a variable increased the address by two bytes. Pointers contain addresses, and because of that, they behave in exactly the same way:
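
A sketch of such a program in standard C follows. The value 42 and the function name are illustrative assumptions, and the addresses printed depend on the machine.

```c
#include <assert.h>
#include <stdio.h>

int i = 42;
int *p = &i;   /* p is a pointer to int, initialized with the address of i */

void show_pointer_variable(void)
{
    printf("&i    = %p\n", (void *)&i);
    printf("p     = %p\n", (void *)p);         /* same address as &i    */
    printf("p + 1 = %p\n", (void *)(p + 1));   /* one int further along */

    /* p is written without an asterisk in all three lines above. */
    assert(p == &i);
    assert(p + 1 == &i + 1);
}
```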

In the example above we have a pointer to int initialized to the address of int variable i. Printing the value of the pointer returns the address of i, and adding one to the pointer returns the address of one int after i. Notice that we don’t prefix the pointer variable with the asterisk.

We could have also used the increment or decrement operators with the exact same result:

p++;

Remember that different data types take up a different number of bytes in memory. So far we’ve only seen types that take either one byte (char) or two (int, unsigned), but in the near future we will see other data types that are bigger.

Pointer variables occupy two bytes themselves.
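
To check these sizes with a particular compiler we can ask for them with C’s sizeof operator. This is a minimal sketch in standard C, and the function name print_sizes is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Prints how many bytes each of the types seen so far occupies. */
void print_sizes(void)
{
    printf("char     : %lu byte(s)\n", (unsigned long)sizeof(char));
    printf("int      : %lu byte(s)\n", (unsigned long)sizeof(int));
    printf("unsigned : %lu byte(s)\n", (unsigned long)sizeof(unsigned));
    printf("int *    : %lu byte(s)\n", (unsigned long)sizeof(int *));

    /* Two facts that hold with every C compiler. */
    assert(sizeof(char) == 1);
    assert(sizeof(unsigned) == sizeof(int));
}
```

For MSX-C the sizes given in the text are 1, 2, 2 and 2. A current 64-bit desktop compiler typically reports 1, 4, 4 and 8.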

Accessing the data referenced by a pointer

To access the data referenced by a pointer we use the indirection operator (*) before the name of the pointer variable. This tells the program to retrieve the value in the memory address referenced by the pointer. We call this dereferencing the pointer. For example, the expression *p means “the value in the memory address p”. If p is a pointer to int, then *p will also be an int value.

This may be confusing, so let’s see a short example:
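
A minimal sketch in standard C, with an illustrative value and function name:

```c
#include <assert.h>
#include <stdio.h>

int i = 42;
int *p = &i;   /* p holds the address of i */

void show_dereference(void)
{
    printf("i  = %d\n", i);           /* the value of i               */
    printf("p  = %p\n", (void *)p);   /* the address stored in p      */
    printf("*p = %d\n", *p);          /* the int found at that address */

    /* Because p holds &i, *p and i are the same value. */
    assert(*p == i);
}
```

The first and third lines print 42. The second prints whatever address the compiler gave to i.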

We can use a dereferenced pointer in any situation where we can use an expression of the type the pointer points to. For example, a dereferenced pointer to int can appear anywhere an int would be acceptable. It is also possible to assign to it:
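
A short sketch of both uses in standard C. The values and the function name assign_through_pointer are illustrative assumptions:

```c
#include <assert.h>
#include <stdio.h>

/* Changes i only through the pointer p and returns the final value of i. */
int assign_through_pointer(void)
{
    int i = 10;
    int *p = &i;

    printf("i = %d\n", i);   /* prints "i = 10" */

    *p = 25;                 /* stores 25 at the address held in p, that is, in i */
    printf("i = %d\n", i);   /* prints "i = 25" */

    *p = *p + 1;             /* *p can be read and written like any int */
    printf("i = %d\n", i);   /* prints "i = 26" */

    assert(i == 26);
    return i;
}
```

The variable i is never assigned by name after its declaration, yet its value changes twice, because every write to *p lands in the memory cells where i lives.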

Don’t worry if you find this confusing at the beginning. It certainly took me some time to grasp the concept. Just remember the following: if p is a pointer to int:

  • int *p declares the pointer variable
  • p contains the memory address of an int value
  • *p is the int value at memory address p

From the compiler’s point of view, accessing the value referenced by a pointer is a two-step process: first, obtain the memory address stored in the pointer variable, and second, access that memory address to retrieve the value.
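
The two steps can be spelled out in C itself. This is only an illustration of the idea, not what MSX-C literally generates, and the names fetch, address and value are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Returns the same value as *p, with the two steps written separately. */
int fetch(int *p)
{
    int *address;
    int value;

    address = p;        /* step 1: read the address stored in the pointer variable */
    value = *address;   /* step 2: go to that address and read the int found there */
    return value;
}

void show_two_steps(void)
{
    int i = 42;

    printf("fetch(&i) = %d\n", fetch(&i));   /* prints "fetch(&i) = 42" */
    assert(fetch(&i) == i);
}
```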

Summary

In this post we’ve started learning about pointers. We’ve seen the address operator (&) and the pointer data type. We’ve seen the properties of pointer arithmetic (adding to or subtracting from a pointer to reach the next or previous object in a series). We’ve also seen how to declare pointer variables and access the value they reference using the indirection operator (*).

In the next post

In the next post we will see the relationship between pointers and arrays.


This series of articles is supported by your donations. If you’re willing and able to donate, please visit the link below to register a small pledge. Every little amount helps.

Javi Lavandeira’s Patreon page

4 comments on “Relearning MSX #44: Pointers, arrays and strings (Part 1)”

  1. Nice post again! While I was looking at some old code I saw:

    p1 = (char*)fp;
    p2 = (char*)(&fres);

    Is this the same as without the type hinting:
    p1 = *fp;
    p2 = *(&fres);

    The vars are declared like this:
    char *p1, *p2;
    FIBSUB *fp; /* FIBSUB is a struct */
    FIBENT fres; /* FIBENT is a struct */

    • No, they’re not the same thing. p1 and p2 are pointers to char, while fp is a pointer to FIBSUB and &fres is a pointer to FIBENT. They can’t be assigned to each other.

      Casting fp to a (char *) allows you to access every byte in the structure fp points to one by one, instead of taking it as a whole.

  2. Javi, I went through your entire postings on MSX in 2 days and now it’s time to wait for the next post :)
    Until then, congrats on this great material. It’s been some 15 years since I last fooled around with C, and your extremely didactical content is of great help.

    Cheers!
