Thursday, February 11, 2010

Week 3: Schoolboy Math Goes Binary

Source: rm.2010021js
Test page: register-machine-test-2010-02-11.html
Language reference: assembly-language-reference.html

Changes:
This was Speed Week. I made three major improvements: two that affect the speed of running programs, and one that affects the speed of writing them.

- run_program() now dumps all output once at the end of the program (or when an error is thrown), rather than one line at a time. This dramatically speeds up output when multiple lines are being written (e.g., the ASCII chart and increment functions). A rough sketch of the buffering idea appears after this list.
- Line-number-based compiling has been replaced with label-based compiling ("jmp 0 label:" vs. "5 jmp 0 15"). Look at the sample programs on the test page to see how this simplifies program creation. Now if you want to adjust a program, you don't need to go through, renumber everything, and adjust your jump targets to match.
- Increment-based arithmetic has been replaced with real binary arithmetic. This is the largest speed improvement.
- The sample programs have been updated to the new format, and a couple new ones (factorials and the Fibonacci sequence) have been added.
- To illustrate the speed improvements, the word_length has been raised to 16 bits.
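
Here is a minimal sketch of the output-buffering idea from the first bullet. The names (write_line, flush_output, the "output" element) are illustrative, not the build's actual code; the point is just that lines pile up in an array and get written to the page in one shot instead of one DOM update per line.

var output_buffer = [];

var write_line = function(text) {
  output_buffer.push(text); // queue the line instead of updating the page now
}

var flush_output = function() {
  // one DOM update at the end of the run (or when an error is thrown)
  document.getElementById("output").innerHTML = output_buffer.join("<br>");
  output_buffer = [];
}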

I learned a lot this week about arithmetic, which is a strange thing for a 38-year-old computer programmer to say. Yes, I know how to do math by hand; they taught me all about it in elementary school (good old Mrs. Spake at Kernersville Elementary kept me from a life of crime, all things considered). The new functions that do arithmetic schoolboy-style were interesting to write, and ended up being pretty concise, if a little odd. The exception to the fun was division, however; that was downright painful.

Here is an examination of the four basic arithmetic functions, and how I am currently treating them.

Addition

Here is my old increment-based add function:

// Increment co_0 by co_1, set co_1 to 0
add = function() {
  while(co_read(1) != 0) {
    co_inc(0);
    co_dec(1);
  }
}

Until register 1 returns a 0 value, decrement it, and increment register 0, which will end up being the sum of both registers. Nothing to it. Move Suzy's apples to John's pile, and then John has all the apples.

The function is correct; the problem is speed. It always goes through regs[1] iterations: when register 1 starts out set to 250, the function takes 250 iterations. The increment function that add() calls, while not very expensive, iterates through each bit, least significant first, until it finds a zero. A little function analysis shows incrementing from 0 to 255 iterates through a total of 502 bits. This problem was compounded by my old multiplication function, which used a similar principle: repeatedly adding the multiplicand into a running total and decrementing the multiplier, until the multiplier reached 0. The total number of iterations over bits is roughly twice the product, e.g., 255x255 iterates over 129,540 bits!
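
As a sanity check on those numbers, here is a throwaway snippet (not part of the build) that simulates the old bit-walking increment and tallies how many bits it visits while counting from 0 up to 255:

var count_increment_bits = function(limit) {
  var total = 0;
  for (var n = 0; n < limit; n++) {
    // the old increment walks bits from least significant up, stopping at the first 0
    for (var b = 0; ; b++) {
      total++;
      if ( ((n >> b) & 1) == 0 ) { break }
    }
  }
  return total;
}
// count_increment_bits(255) returns 502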

It was clear after some basic testing that if I was going to work with numbers that aren't tiny, I had to do math more efficiently, and that's what I've taken steps to do during this week's build.

Here is the revised add function:

co_add = function() {
  // a = register 0's current bit, b = register 1's current bit, c = carry bit
  c = 0;
  for (var x in bit) {
    a = (regs[0] >> x) & 1;
    b = (regs[1] >> x) & 1;
    if (c != b) { regs[0] ^= bit[x] };
    if (a == b && a != c) { c = !c }
  }
  if (c != 0) { overflow = true }
}

First, binary addition works the same way it does in base 10, except you only have ones and zeroes. You can subtotal by iterating right to left over the digits, carries work the same, everything's the same. Here's binary 011 (decimal 3) + binary 010 (decimal 2) to illustrate:

Step 1     Step 2     Step 3

             1          1
  011        011        011
+ 010      + 010      + 010
-----      -----      -----
    1         01        101   (binary 101 = decimal 5)

Just like you remember from school, right? All the possibilities for subtotaling each column can be laid out in a simple matrix of eight options: two possibilities for each of the numbers being added, and two for whether there was a carry from the previous column. Here is the matrix of possibilities, the subtotals they produce, and whether or not the next column will need a carry:

Carry   Row 1   Row 2   Subtotal   New Carry
  0       0       0        0           0
  0       0       1        1           0
  0       1       0        1           0
  0       1       1        0           1
  1       0       0        1           0
  1       0       1        0           1
  1       1       0        0           1
  1       1       1        1           1

The matrix has two properties I stumbled across that I found surprising. First, if the current column's carry and row 2 values are the same, the subtotal will be the same as row 1; otherwise it will be the opposite. Second, if both rows have the same digit, the next column's carry is that digit; otherwise it remains whatever it was before. Look for yourself. It's pretty strange.

Those stumbled-upon properties combine to make it simple to calculate a sum in one pass through the digits, as well as to update row 1's values in place to make it the sum, rather than using a temporary variable. The meat happens here:

if (c != b) { regs[0] ^= bit[x] };
if (a == b && a != c) { c = !c }

The first line says "If the current value of the carry doesn't equal row 2's bit, flip row 1's bit." The second says "If rows one and two are equal, but do not match the current carry, flip the carry." Concise, but abstracted from what I would consider math, even though it is effectively adding the same way a third-grader does.
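
Don't take my word for it: a quick brute-force check (a throwaway, not something the register machine needs) runs all eight rows of the matrix through those two rules:

for (var c = 0; c <= 1; c++) {
  for (var a = 0; a <= 1; a++) {
    for (var b = 0; b <= 1; b++) {
      var subtotal = (a + b + c) & 1;   // the true column subtotal
      var carry_out = (a + b + c) >> 1; // the true carry into the next column
      // property 1: subtotal equals row 1's bit when carry == row 2, otherwise it flips
      // property 2: the carry flips only when the rows match and differ from the carry
      console.log(subtotal == ((c == b) ? a : 1 - a),
                  carry_out == ((a == b && a != c) ? 1 - c : c));
    }
  }
}
// logs "true true" for all eight combinations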

Subtraction

The rewrite for subtraction works much the same way as addition. In fact, the function is identical except for the line that manages the "borrow" flag (the carry flag from addition). Here is the matrix for subtraction:

Borrow   Row 1   Row 2   Subtotal   New Borrow
   0       0       0        0            0
   0       0       1        1            1
   0       1       0        1            0
   0       1       1        0            0
   1       0       0        1            1
   1       0       1        0            1
   1       1       0        0            0
   1       1       1        1            1

Subtotaling worked exactly the way it did for addition. Whether borrowing continued or not took a fair bit of analysis, though. It eventually jumped out at me that borrow tends to continue as it was, and in both cases where it switches, the previous value and row 1's bit matched. And when it changed, it changed to row 2's bit.

So with "c" being our borrow bit, and "a" and "b" being the row bits, managing the borrow flag can be accomplished with this simple statement:
if ( c == a ) { c = b };

Madness! Anyway, that makes the final subtract function:
co_subtract = function() {
  c = 0; // Borrow flag
  for (var x in bit) {
    a = (regs[0] >> x) & 1;
    b = (regs[1] >> x) & 1;
    if (c != b) { regs[0] ^= bit[x] };
    if ( c == a ) { c = b };
  }
  if (c != 0) { overflow = true }
}
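
A similar throwaway check (again, not part of the build) confirms the borrow rule against all eight rows of the subtraction matrix:

for (var c = 0; c <= 1; c++) {
  for (var a = 0; a <= 1; a++) {
    for (var b = 0; b <= 1; b++) {
      // the true new borrow when computing a - b with an incoming borrow of c
      var borrow_out = (a - b - c < 0) ? 1 : 0;
      console.log(borrow_out == ((c == a) ? b : c));
    }
  }
}
// logs true for all eight combinations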

Multiplication

Binary multiplication also works like decimal multiplication, but it has a fantastic simplicity to it that is really easy to code for. Take this example, multiplying 9 x 5 (binary 1001 x 101):
    1 0 0 1
  x   1 0 1
-----------
    1 0 0 1
  0 0 0 0
1 0 0 1
-----------
1 0 1 1 0 1

First, binary 101101 is indeed 45. Second, in every subtotal row one of two things happens: either the multiplicand comes down exactly as it is, or a row of zeroes comes down. So we need only initialize a subtotal to 0, then iterate over the multiplier: for every 1 bit, we add the multiplicand, left-shifted as appropriate, into the subtotal.

Here is what the basic rewrite looks like:
co_multiply = function() {
  regs[2] = regs[1];
  regs[1] = regs[0];
  regs[0] = 0;
  for (var x in bit) {
    if ( (regs[2] >> x) & 1 ) { co_add() }
    regs[1] <<= 1;
  }
}

All the register shuffling at the top is done to capitalize on the pre-existing addition function: it always adds register 1 into register 0, so regs[0] needs to start out empty and then have regs[1] added into it where appropriate. Inside the for loop, regs[1] gets left-shifted once per iteration.

This would work as-is, but I put in a couple additional safeguards. If either initial value is 0, we know the answer already, so there is no work to do. Also, I set the overflow flag if I'm about to left-shift off a 1, as the answer won't fit in the number of bits I'm working with. Here's the final function:
co_multiply = function() {
  if (regs[0] == 0) { return }
  if (regs[1] == 0) { regs[0] = 0; return }
  regs[2] = regs[1];
  regs[1] = regs[0];
  regs[0] = 0;
  for (var x in bit) {
    if ( (regs[2] >> x) & 1 ) { co_add() }
    if ( (regs[1] & highbit) != 0 ) { overflow = true }
    regs[1] <<= 1;
  }
}

I've pre-defined highbit as the highest value in the bit array, to speed up lookups. I'm considering rewriting that line as:
if ( (regs[1] >> (word_length - 1)) & 1 ) { overflow = true }

Some very unscientific test cases have shown that right-shifting followed by a bitwise AND with 00000001 is slightly faster than a bitwise AND with a higher value. Plus it looks cooler. And, you know, that's important, too.
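
If you want to run your own very unscientific test, a timing loop along these lines will do (illustrative only; it assumes word_length is 16 with highbit as the top bit's value, as in this week's build):

var word_length = 16;
var highbit = 1 << (word_length - 1);
var reps = 1000000, value = 46005, hits = 0; // 46005 has its high bit set

var start = new Date().getTime();
for (var i = 0; i < reps; i++) {
  if ( (value & highbit) != 0 ) { hits++ }
}
var and_ms = new Date().getTime() - start;

start = new Date().getTime();
for (var i = 0; i < reps; i++) {
  if ( ((value >> (word_length - 1)) & 1) != 0 ) { hits++ }
}
var shift_ms = new Date().getTime() - start;
// compare and_ms to shift_ms; the difference is small and varies by browser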

Division

There are no two ways about it: long division brings back bad memories from school for everyone. After tackling it in code at a low level, my belief is now that it is being taught completely wrong to our kids. Yes, they are taught a method that produces correct answers, but it is so abstracted from the real calculations going on that it might as well be a rain dance that magically gives answers. Consider the steps in the standard long-division technique for dividing 2825 by 25:
Step 1:   Step 2:   Step 3:   Step 4:   Step 5:
  __1__     __1__     __11_     __11_     __113
25)2825   25)2825   25)2825   25)2825   25)2825
   25      - 25      - 25      - 25      - 25
              3         32        32        32
                      - 25      - 25      - 25
                                   7         75
                                           - 75
                                              0

The first few things that jump out at me: in step 1, so far my answer is 1. My adult understanding of numbers can see that this is a 1 in the hundreds place, but my 8-year-old self had trouble with that.

In step 2, what exactly does that 3 represent? Or in step 3, what does 32 represent? By the time we get to step 4, there are a bunch of floating numbers abstracted from reality, and we just have to have faith in the magic system that gives us the answer.

What do I think would help? Well, let's start by filling in the blanks:
Step 1:   Step 2:   Step 3:   Step 4:   Step 5:
  __100     __100     __110     __110     __113
25)2825   25)2825   25)2825   25)2825   25)2825
   2500    - 2500    - 2500    - 2500    - 2500
              325       325       325       325
                      - 250     - 250     - 250
                                   75        75
                                           - 75
                                              0

So there it is. The "3" represents the first digit of "325", the difference of 2825 and 2500. So by step 2 we've said 2825 / 25 is 100, with a remainder of 325. What's left to do is to try to whittle down the remainder.

In the method with the blanks filled in, there is no mystic bringing down of digits for the next step; instead you do the complete subtraction, and what's left is the complete remainder. Maybe one day I'll flesh out what I think is a better approach to teaching division to kids, but for now let's stick to binary division in JavaScript, and what is *really* going on behind the scenes in long division.

In the binary world we are again aided by the fact that there are just ones and zeroes. When building our answer, we need only determine whether or not to put a one in the current bit. Consider 50 / 5, or binary 110010 / 101:
   ___1010
101)110010
  - 101000   (101 + 3 zeroes)
     01010
   - 00000   (000 + 2 zeroes)
      1010
    - 1010   (101 + 1 zero)
       000
     - 000   (000 + no zeroes)

And binary 1010 is indeed decimal 10. The approach I took to tell JavaScript to do this breaks down into two groups of steps, here calling the dividend N (numerator), the divisor D (denominator), the quotient Q, and x a counter for how far D has been left-shifted.

Group A:
A1: Set Q and x to 0.
A2: If D is greater than N, go to step B1
A3: Left-shift D
A4: Increment x
A5: Go to step A2

Group A left-shifts D until it is greater than N, incrementing x along the way. Group B does the real division work.

Group B:
B1: If D is greater than N, go to step B4
B2: Subtract D from N
B3: Add 1 left-shifted x times to Q
B4: If x is 0 or N is 0, Q is quotient, N is remainder, exit
B5: Right-shift D
B6: Decrement x
B7: Go to step B1

Group B starts out with D greater than N, so the first pass skips down and right-shifts D back once. From there it iterates: subtracting from the dividend, adding 1 bits to the quotient, and right-shifting D, until the remainder is 0 or D has been shifted back to where it started. (A standalone sketch of these steps, free of the register machinery, appears after the function below.)

The drawback to this function is that it can iterate through 2*word_length bits if the dividend is large and the divisor small. I'm sure it is possible to simplify the function and reduce its average bit iterations, but this second attempt at division is an order of magnitude faster than its predecessor. Here is the current function:
co_divide = function() {
  // If either input register is 0, stop and return zeroes or throw divide by zero error
  if(regs[1] == 0) { throw("Can't divide by 0") }
  if(regs[0] == 0) { return }

  regs[3] = 0; // Answer, ultimately. We build it up one bit at a time
  var x; // current bit

  // Left-shift denominator until it is >= numerator, or left-shifting will lose a 1
  for (x in bit) {
    a = greater(regs[0], regs[1]);
    if (a == null) { // registers are equal
      regs[0] = bit[x];
      regs[1] = 0;
      return;
    }
    if (a == false) { break } // Stop if left-shifted denominator is greater than numerator
    if ( (regs[1] & highbit) != 0 ) { overflow = true }
    regs[1] <<= 1;
  }

  // Subtract, right-shift, repeat until remainder or x = 0
  while (regs[0] != 0) {
    if(a != false) {
      co_subtract();
      regs[3] |= bit[x];
    }
    if (x == 0) { break }
    x = decrement(x);
    regs[1] >>= 1;
    a = greater(regs[0], regs[1]);
  }

  regs[1] = regs[0]; // Remainder
  regs[0] = regs[3]; // Quotient
}
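
To see the two groups of steps without the register machinery in the way, here is a standalone sketch that follows A1-A5 and B1-B7 literally on ordinary JavaScript numbers (an illustration only, not the build's co_divide):

var long_divide = function(N, D) {
  if (D == 0) { throw("Can't divide by 0") }
  var Q = 0, x = 0;                 // A1
  while (D <= N) {                  // A2: until D is greater than N...
    D <<= 1;                        // A3: left-shift D
    x++;                            // A4: increment x
  }                                 // A5: repeat
  while (true) {
    if (D <= N) {                   // B1: only subtract when D fits into N
      N -= D;                       // B2: subtract D from N
      Q += 1 << x;                  // B3: add 1, left-shifted x times, to Q
    }
    if (x == 0 || N == 0) { break } // B4: Q is the quotient, N the remainder
    D >>= 1;                        // B5: right-shift D
    x--;                            // B6: decrement x
  }                                 // B7: repeat
  return { quotient: Q, remainder: N };
}
// long_divide(50, 5) gives { quotient: 10, remainder: 0 }
// long_divide(2825, 25) gives { quotient: 113, remainder: 0 }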

Next steps:
I still need to look at floats, but I'm more interested in cleaning up the compiler to make it more like the other functions, and not rely on JavaScript text matching. I'm not sure how much progress I'll make on either of those by next week, but I'm having fun whipping up small programs using the new language style, so I'll at least add a few more of those to the test suite.
