Monthly Archives: April 2014

Matlab and integers

In computer programming, when a calculation involves variables of different types, we say that these calculations are performed in mixed-mode arithmetic. In most common typed programming languages, mixed-mode arithmetic involves promotion of variables in such a way as not to lose precision. The most common examples of mixed-mode arithmetic would be calculations involving integer and floating-point variables. As a rule, the integer variables are converted to floating-point values before the arithmetic operations are carried out. Consider for example the following C program:

#include <stdio.h>

int main(void)
{
    int two = 2;
    double x = 2.2;
    double y = two*x;   /* two is promoted to double before the multiplication */
    printf("%f\n",y);
    return 0;
}

When computing y, the integer two is converted to a double-precision value in order not to degrade the precision of the mixed-mode multiplication with the double-precision value x. The multiplication is then carried out, and the computer prints out 4.400000 (that is, 4.4, as expected). Fortran functions in an essentially identical way. Note that the promotion rules depend only on the types of the operands, not on the type of the variable to which the result will be assigned. If necessary, the result of the computation is converted to another type afterwards, just before storage.

Of all the computer languages I have worked with, Matlab is the first one that goes the other way. Consider the following Matlab code:

two = int8(2);
x = 2.2;
y = two*x

Just assigning a value to x, whether or not it has a decimal point, implicitly makes x a double, as reported by class(x). (Most of us who started out with Fortran eventually came to the conclusion that implicit typing is a great evil. Too bad the Matlab folks don’t feel the same way.) If you run this code in Matlab, you will get the following result: y = 4. If you query Matlab about the class of y, it will return int8. In other words, the product 4.4 has been rounded to the nearest integer so that the result can take the integer type of two. Not what most of us would expect…

This unusual behavior means that you have to be very careful when using integers in Matlab. It is very easy to write a line of mixed-mode arithmetic that won’t trigger any warnings, but that will generate results different from those expected by the programmer.
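
To make the behavior concrete, here is a minimal sketch comparing the default mixed-mode result with an explicit conversion to double. The variable names (y_default, y_double, y_up, y_sat) are mine, chosen for illustration. The last two lines show two related effects that are easy to miss: the result is rounded to the nearest integer rather than truncated, and values outside the range of the integer type saturate at the limit instead of wrapping around.

```matlab
two = int8(2);
x = 2.2;

% Default mixed-mode rule: the result takes the integer class.
y_default = two * x;          % 4.4 is rounded to int8(4)

% Converting the integer explicitly keeps the arithmetic in double.
y_double = double(two) * x;   % 4.4, class double

% Rounding is to nearest, so a result can be rounded up as well as down.
y_up = int8(3) * 2.6;         % 7.8 is rounded to int8(8)

% int8 covers -128..127; out-of-range results saturate rather than wrap.
y_sat = int8(100) * 2.2;      % 220 saturates to int8(127)

fprintf('%s %s\n', class(y_default), class(y_double));
```

None of these lines produces a warning, which is exactly why this kind of bug is hard to spot.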

So what is the rational response to Matlab’s unusual mixed-mode arithmetic rules? One might be tempted to simply avoid integers, but the fact is that there are good reasons to use integer variables from time to time. Here are a few guidelines that might help:

  1. Minimize the use of integers in Matlab programs. You probably don’t need an integer variable for a simple accumulator (e.g. a variable that counts how many times something happened). (Note the word “probably”. I know that clever people can cook up counterexamples. As always, you have to think a bit about what your data will look like. Since we’re talking about scientific programming here, we probably don’t need to worry about malicious users, although simple mistakes in the input are always a possibility and you do have to plan for those. But that’s a matter of validating the inputs to the program, not of what happens once reasonable inputs have passed through those filters.)

    You may not need an integer array either, even for data that are conceptually integers, provided you’re careful not to do things that are risky with floating-point numbers, such as comparing them for equality. (For a brief introduction to some of the issues you can run into with floating-point numbers, see http://floating-point-gui.de.)

    I have to say right up front that this is contrary to the advice I would give for any other computer language I know. We would normally think of appropriate typing of variables as a smart, safe practice that prevents, for example, variables representing whole numbers from accidentally being corrupted into non-integer values. In Matlab, though, the greater risk is that numbers are unintentionally rounded to integers during a mixed-mode calculation.
  2. Make a habit of searching for instances of any integer variables in your Matlab programs. Be aware that code such as
    x = int16(1);
    y = x;
    

    implicitly makes y of class int16. (Again with the implicit typing!) In fact,

    x = int16(1);
    y(1) = x;
    y(2) = 4.8
    

    generates the output

    y =
    
          1      5
    

    since the entire array y is typed int16 by the assignment of the first element, and the floating-point value 4.8 is converted to an integer (by rounding) prior to storage.

    Comment any implicitly typed integer variables to make their types clear to anyone reading the code. Look for places where integer variables, whether implicit or explicit, are involved in mixed-mode arithmetic and explicitly convert the integers to double in the arithmetic expressions (unless you require some other behavior, of course).

  3. Watch for whole-number outputs from programs whose outputs are not expected to be whole numbers. This can be a sign that a mixed-mode calculation somewhere has produced an integer-typed result.
  4. Do use integers for tags. Tags, in the sense I intend here, are numbers we associate with each of several possibilities, to be used perhaps in a case statement, or to symbolically represent positions in an array associated with particular objects. Here is a simple-minded example, which we might use in a program that handled employee data in a restaurant:
    Unclassified = int8(0);
    Cook = int8(1);
    ShiftSupervisor = int8(2);
    

    The idea is that we would have a field in the employee record that contained his or her job classification. Rather than store a word (where maintaining consistent spelling might be a long-term maintenance headache), we store a number that represents the job classification, and rather than using those numbers explicitly anywhere, we use the tags defined above, e.g.

    if (employee(i).jobclass == Cook)
    ...
    

    This makes the code easy to read and easy to maintain. Since we are going to compare tags from time to time for equality (as in the above code snippet), we need a type where these comparisons can be made exactly, i.e. an integer type.
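
As a rough illustration of how guidelines 2 to 4 might look in practice, consider the following sketch. The names count, scale, scaled and jobclass are hypothetical, and the isinteger check is just one possible guard, not a standard recipe.

```matlab
% Guideline 2: comment the type, and convert explicitly in arithmetic.
count = int16(7);                 % int16 on purpose (hypothetical counter)
scale = 0.35;
scaled = double(count) * scale;   % about 2.45, class double
naive = count * scale;            % int16(2): the fraction is lost

% Guideline 2: convert before storing, so that the first assignment
% does not make the whole array int16.
x = int16(1);
y(1) = double(x);
y(2) = 4.8;                       % y stays double: [1 4.8]

% Guideline 3: a cheap guard on a result that should not be integer-typed.
assert(~isinteger(scaled), 'scaled is unexpectedly of an integer class');

% Guideline 4: integer tags can be compared exactly.
Cook = int8(1);
jobclass = int8(1);               % stands in for employee(i).jobclass
isCook = (jobclass == Cook);
```

The explicit double(...) calls are a little noisy, but they document the intent and make the result independent of Matlab’s mixed-mode rule.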

I generally like Matlab because it provides a large set of functions to carry out just about every common numerical procedure you might run across, which is great from the point of view of not having to reinvent the wheel. It’s also a very easy programming environment for students to learn. Here though, we have run into two bad (in my opinion) language design decisions:

  1. Mixed-mode arithmetic rules that are likely to generate hard-to-find bugs because they violate most programmers’ expectations.
  2. Implicit typing, which was a common source of bugs in Fortran before the IMPLICIT NONE declaration was made standard in Fortran 90.

Both of these design decisions are built into Matlab and it would be hard to fix them without breaking a lot of existing code. I guess we’ll all have to learn to be extremely careful with integers in Matlab.