Thursday, February 18, 2010

Week 4: How to write itoa, and the trouble with tech interviews

Source: rm.js
Test page: register-machine-test-2010-02-18.html
Language reference: assembly-language-reference.html

Changes:
- Fixed bug in division function
- All word characters (A-Z,a-z,0-9,_) can be used in labels, not just lowercase letters
- All assembly functions can use labels as the address part
- Added inp command to read from STDIN textarea
- Added a demo of subroutines, an itoa algorithm, and a STDIN reversal algorithm to the test suite
- Added right-shift and left-shift commands (lft, rgt)
- Got rid of "cor" command, replaced with unique commands for each math function (inc, dec, add, sub, mul, div)

This last change presents a question: Rework how commands are stored, or do something different with the address part of math commands? I decided for now to use the address part as an optional destination for the result of the math function. If the address is zero, the function works like its "cor" counterpart; if it is non-zero, memset is called and the result is copied to that address. This removes the need to follow up math functions with "cpo" commands to store the results. For example, here is how to add two numbers and store the result in address 50, the old way and the new:

Old:

mov 0 50
mov 1 75
cor 0 2 // Add
cpo 0 50 // Copy reg 0 to mem 50
end

New:

mov 0 50
mov 1 75
add 0 50 // Add reg 0 and 1, result to mem 50
end

Additionally, for the add and subtract commands, Register 1 is added to or subtracted from the register you specify. Previously this was hardcoded to just register 0.

My "Phase 1" builds have problems that keep creeping up, and I've been fighting code and feature bloat as much as I can. For example, I started out with two registers, and quickly bumped that up to four before the first build, and now I'm up to four user-addressable registers, another register for the jump-to address, one for the STDIN pointer, three flags used in math functions, and an overflow flag... that I don't really do anything with yet.

I've reached the point where the project is becoming fleshed out enough that I've started researching other assembly environments. Interestingly, some of what I came up with independently matches what exists out in the wild, such as my algorithm for incrementing binary values, my solution for an itoa (Integer to ASCII) algorithm, and my method of creating pseudo-subroutines, which matches the way Knuth's MIX environment solves the same problem. Specifically, an executable byte contains both a command and an address, and exists in a memory location that is user-writable. If the command is "jump", I can overwrite just the address part, and change where it jumps to from within the program. This can be used to tell a "subroutine" where to return to when it's finished.

Sticking to the "only low-level bit stuff" goal, I accomplished this by using a bitmask that is all ones over the command part of a byte, and all zeroes over the address part, using the following JavaScript statement:
mem[address] = (mem[address] & mask) | value; // update word part only

Imagine I have a three-bit command, 101, that corresponds to "JMP", and a four-bit address, 0011, that I want to replace with a new address of 0110. My bitmask will be 1110000. I start out by ANDing the byte and the mask:
  1 0 1 0 0 1 1
& 1 1 1 0 0 0 0
---------------
  1 0 1 0 0 0 0

This retains the "command" part of the byte, and zeroes out the "address" part. Then I OR the byte with the new address, in this case 0110:
  1 0 1 0 0 0 0
| 0 0 0 0 1 1 0
---------------
  1 0 1 0 1 1 0

So the old byte was 1010011, and by ANDing the mask then ORing the new value, I have a new byte 1010110, and I have updated where it will jump to.
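Translated into runnable JavaScript, the worked example above looks like this. The 7-bit layout and the names here are made up for illustration:

```javascript
// Hypothetical layout from the example above: a 7-bit instruction,
// 3-bit command in the high bits, 4-bit address in the low bits.
var COMMAND_MASK = parseInt("1110000", 2); // ones over the command part only

function setAddress(instruction, newAddress) {
  // AND keeps the command bits and zeroes the address bits,
  // then OR drops the new address into the cleared bits
  return (instruction & COMMAND_MASK) | newAddress;
}

var oldByte = parseInt("1010011", 2); // JMP (101) to address 0011
var newByte = setAddress(oldByte, parseInt("0110", 2));
console.log(newByte.toString(2)); // 1010110
```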

This feature of being able to update just the address part of a byte can be extended to other functions as well. I found a use for it when trying to create a better itoa function than my first attempt. The itoa algorithm is a common function in many languages that takes a number and converts it to characters, one digit at a time, and prints each digit out in turn. This may sound like so much fluff and wasted effort, but programs commonly need to do this kind of conversion. For example, in plain JavaScript:
var ans = 5000 / 20;
alert("The answer to five-thousand divided by twenty is " + ans + ".");



The "ans" variable is a number stored as a numeric value. What is being alerted is a string, so each digit of "250" must be extracted from the number and displayed, one at a time, as ASCII. The extraction of each digit and conversion to ASCII was fantastically painful... and fun!

My first attempt at this in my assembly language was only semi-functional, and I wasn't happy with it. The principle was to start out by dividing the number by 10,000 (since 65,535 is the largest number that fits in my 16-bit word length, a number won't have more than 5 digits). The integer portion of the quotient is the first digit in the number. Take that digit, add 48 to it (ASCII 48 being where "0" is located), and print that character. Then step the divisor down to 1,000, and divide the remainder by that. Repeat with 100, then 10, then 1. For example, divide 12,345 by 10,000 and you get 1 with a remainder of 2,345. Print out the 1, divide 2,345 by 1,000, and you get 2 r 345. Etc.

The first algorithm was not optimal, as you have to adjust the hardcoded 10,000 divisor if the word_length changes from 16 to something else, and it behaves poorly whenever 0 is encountered. Also, the output will always be 5 bytes, with leading zeroes on small numbers. Boo! After pondering for a while, I found a better solution and implemented it in my assembly language.

The new algorithm works by starting from the small side. Divide by 10, and the remainder is your last digit. Divide that quotient by 10, and you get your next-to-last digit. Repeat until you get 0, and then print out the characters you've extracted in reverse order. To do that in my assembly language, I had to modify cpy and cpo addresses during the program execution using the same trick used above to create pseudo-subroutines, which necessitated adding the "update value only" feature I mentioned above to all functions, not just to jumps.

The working algorithm is available on the test page linked to above. An unexpected bonus to this implementation is you can change what number you're dividing by to change the base of the result. Change it to eight and it converts the number to octal, two and it prints out binary. Neat.
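For readers who don't want to wade through the assembly version, here is the same small-side algorithm sketched in plain JavaScript. The optional base parameter is the bonus mentioned above; since digits map straight onto ASCII 48 and up, this sketch only handles bases 2 through 10:

```javascript
// Plain-JavaScript sketch of the small-side itoa: divide by the base,
// collect remainders (last digit first), then emit them in reverse.
function itoa(num, base) {
  base = base || 10;
  if (num === 0) { return "0"; }
  var digits = [];
  while (num !== 0) {
    digits.push(num % base);      // remainder is the current last digit
    num = Math.floor(num / base); // quotient moves on to the next digit
  }
  var out = "";
  for (var i = digits.length - 1; i >= 0; i--) {
    out += String.fromCharCode(48 + digits[i]); // ASCII 48 is "0"
  }
  return out;
}
console.log(itoa(250));    // "250"
console.log(itoa(250, 2)); // "11111010"
```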

So after I designed and tested this, I did a Google search for "itoa algorithm", and found out that this is basically the standard solution for rolling your own itoa function, and also that this is a common interview question for coders. In fact, there appears to be a multitude of fora dedicated to helping potential interviewees perform better when quizzed in tech interviews. This irritates me for two basic reasons: First, quizzing candidates with random algorithm tests is a completely crappy interview method. Second, most of the posts I came across appeared to be pleas for help in gaming the system.

Job seekers seem to be building lists of answers to potential interview questions rather than building their skill at coding and problem-solving. Take my two approaches to the itoa problem as an example. I hadn't "studied for the test", so my first implementation at an interview would have been scoffed at, and I wouldn't have gotten the job. I realized the problems with my algorithm, attacked it again, and wrote the optimal solution. This is my main approach to tech problem solving: take a stab at it to better learn the problem, then start over, using what you've learned to build a better solution. But I wouldn't have gotten the job; the unqualified flunky who cheated on the test would have. What a mess. Fortunately, I currently have a coding job I'm happy with, and I don't plan on trying to pass any arbitrary interview tests in the near future.

End rant.

Next steps:
- Add a "jump if flag is 1" command to branch on math and overflow flags
- Add logic commands such as xor, and, or, equals, not
- Work on functions that don't yet follow "bit manipulation only" rule
- Update test suite programs to read STDIN lines as integers for input instead of hardcoding the inputs.

Thursday, February 11, 2010

Week 3: Schoolboy Math Goes Binary

Source: rm.2010021js
Test page: register-machine-test-2010-02-11.html
Language reference: assembly-language-reference.html

Changes:
This was Speed Week. I made three major improvements, two that affect the speed of running programs, one that affects the speed of writing them.

- run_program() now dumps all output once at the end of the program (or when an error is thrown), rather than one line at a time. This dramatically speeds up output when multiple lines are being written (e.g., the ASCII chart and increment functions)
- Line-number based compiling has been replaced with label-based ("jmp 0 label:" vs. "5 jmp 0 15"). Look at the sample programs on the test page to see how this simplifies program creation. Now if you want to adjust a program, you don't need to go through and renumber everything and adjust your jump steps to match.
- Increment-based arithmetic has been replaced with real binary arithmetic. This is the largest speed improvement.
- The sample programs have been updated to the new format, and a couple new ones (factorials and the Fibonacci sequence) have been added.
- To illustrate the speed improvements, the word_length has been raised to 16 bits.

I learned a lot this week about arithmetic, which is a strange thing for a 38 year old computer programmer to say. Yes, I know how to do math by hand, they taught me all about it in elementary school (good old Mrs. Spake at Kernersville Elementary prevented me from a life of crime, all things considered). The new functions that do arithmetic schoolboy-style were interesting to write, and ended up being pretty concise, if a little odd. The exception to the fun was division, however; that was downright painful.

Here is an examination of the four basic arithmetic functions, and how I am currently treating them.

Addition

Here is my old increment-based add function:

// Increment co_0 by co_1, set co_1 to 0
add = function() {
  while (co_read(1) != 0) {
    co_inc(0);
    co_dec(1);
  }
}

Until register 1 returns a 0 value, decrement it, and increment register 0, which will end up being the sum of both registers. Nothing to it. Move Suzy's apples to John's pile, and then John has all the apples.

The function is correct; the problem is speed. It always goes through regs[1] iterations: when register 1 starts out set to 250, the function takes 250 iterations. The increment function that add() calls, while not very expensive, iterates through each bit, least significant first, until it finds a zero. A little function analysis shows incrementing from 0 to 255 iterates through a total of 502 bits. This problem was compounded by my old multiplication function, which used a similar principle: adding the multiplicand to itself, then decrementing the multiplier, and repeating until the multiplier is 0. The total number of iterations over bits is roughly twice the product, e.g., 255x255 iterates over 129,540 bits!
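The 502 figure can be sanity-checked with a quick throwaway script (ordinary JavaScript, not part of the build): each increment visits one bit per trailing 1 in the current value, plus the bit it finally flips from 0 to 1.

```javascript
// Tally how many bit positions the increment function visits while
// counting from 0 up to `limit` (i.e., `limit` individual increments).
function incrementBitCost(limit) {
  var total = 0;
  for (var n = 0; n < limit; n++) {
    var bits = 1;   // the bit that finally flips from 0 to 1
    var v = n;
    while (v & 1) { // each trailing 1 gets visited and flipped to 0 first
      bits++;
      v >>= 1;
    }
    total += bits;
  }
  return total;
}
console.log(incrementBitCost(255)); // 502
```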

It was clear after some basic testing that if I'm going to work with numbers that aren't tiny, I had to do math more efficiently, and that's what I've taken steps to do during this week's build.

Here is the revised add function:

co_add = function() {
  // a = register 0's current bit, b = register 1's current bit, c = carry bit
  c = 0;
  for (var x in bit) {
    a = (regs[0] >> x) & 1;
    b = (regs[1] >> x) & 1;
    if (c != b) { regs[0] ^= bit[x] };
    if (a == b && a != c) { c = !c }
  }
  if (c != 0) { overflow = true }
}

First, binary addition works the same way it does in base 10, except you only have ones and zeroes. You can subtotal by iterating right to left over the digits, carries work the same, everything's the same. Here's binary 011 (decimal 3) + binary 010 (decimal 2) to illustrate:

Step 1:   Step 2:   Step 3:

            1         1
  011       011       011
+ 010     + 010     + 010
-----     -----     -----
    1        01       101   (binary 101 = decimal 5)

Just like you remember from school, right? All the possibilities for subtotaling each column can be laid out in a simple matrix of eight options: two possibilities for each of the numbers being added, and two for whether there was a carry from the previous column. Here is the matrix of possibilities, the subtotals they produce, and whether or not the next column will need a carry:

Carry  Row 1  Row 2     Subtotal   New Carry
  0      0      0          0           0
  0      0      1          1           0
  0      1      0          1           0
  0      1      1          0           1
  1      0      0          1           0
  1      0      1          0           1
  1      1      0          0           1
  1      1      1          1           1

The matrix has two properties I stumbled across that I found surprising. First, if the current column's carry and row 2 values are the same, the subtotal will be the same as row 1; otherwise it will be the opposite. Second, if both rows have the same digit, the next column's carry is that digit; otherwise it remains whatever it was before. Look for yourself. It's pretty strange.

Those stumbled-upon properties combine to make it simple to calculate a sum in one pass through the digits, as well as to update row 1's values in place to make it the sum, rather than using a temporary variable. The meat happens here:

if (c != b) { regs[0] ^= bit[x] };
if (a == b && a != c) { c = !c }

The first line says "If the current value of carry doesn't equal row 2's bit, flip row 1's bit." The second says "If rows one and two are equal, but do not match the current carry, flip the carry." Concise, but abstracted from what I would consider math, even though it effectively adds the same way a third-grader does.
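Those two properties can be brute-force checked against an ordinary full adder. This throwaway snippet (not part of the build) returns true only if both hold for all eight rows of the matrix:

```javascript
// Compare the two "stumbled-upon" properties against a plain full adder
// for every combination of (carry, row 1 bit, row 2 bit).
function verifyAdderProperties() {
  for (var c = 0; c <= 1; c++) {
    for (var a = 0; a <= 1; a++) {
      for (var b = 0; b <= 1; b++) {
        var subtotal = (a + b + c) & 1;           // full-adder subtotal
        var newCarry = (a + b + c) >= 2 ? 1 : 0;  // full-adder carry out
        var prop1 = (c == b) ? a : a ^ 1; // carry == row 2 keeps row 1, else flips it
        var prop2 = (a == b) ? a : c;     // matching rows set the carry, else unchanged
        if (subtotal != prop1 || newCarry != prop2) { return false; }
      }
    }
  }
  return true;
}
console.log(verifyAdderProperties()); // true
```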

Subtraction:

The rewrite for subtraction works much the same way as addition. In fact, the function is identical except for the line that manages the "borrow" flag (the carry flag from addition). Here is the matrix for subtraction:

Borrow Row 1  Row 2     Subtotal   New Borrow
  0      0      0          0           0
  0      0      1          1           1
  0      1      0          1           0
  0      1      1          0           0
  1      0      0          1           1
  1      0      1          0           1
  1      1      0          0           0
  1      1      1          1           1

Subtotaling worked exactly the way it did for addition. Whether borrowing continued or not took a fair bit of analysis, though. It eventually jumped out at me that borrow tends to continue as it was, and in both cases where it switches, the previous value and row 1's bit matched. And when it changed, it changed to row 2's bit.

So with "c" being our borrow bit, and "a" and "b" being the row bits, managing the borrow flag can be accomplished with this simple statement:
if ( c == a ) { c = b };

Madness! Anyway, that makes the final subtract function:
co_subtract = function() {
  c = 0; // Borrow flag
  for (var x in bit) {
    a = (regs[0] >> x) & 1;
    b = (regs[1] >> x) & 1;
    if (c != b) { regs[0] ^= bit[x] };
    if (c == a) { c = b };
  }
  if (c != 0) { overflow = true }
}

Multiplication

Binary multiplication also works like decimal multiplication, but it has a fantastic simplicity to it that is really easy to code for. Take this example, multiplying 9 x 5 (binary 1001 x 101):
    1 0 0 1
  x   1 0 1
-----------
    1 0 0 1
  0 0 0 0
1 0 0 1
-----------
1 0 1 1 0 1

First, binary 101101 is indeed 45. Second, for all the subtotal rows, one of two things happens: either the multiplicand comes down exactly as it was, or a row of zeroes comes down. We need only initialize a subtotal to 0, then iterate over the multiplier's bits: for every 1, we add the multiplicand, left-shifted as appropriate, to the subtotal.

Here is what the basic rewrite looks like:
co_multiply = function() {
  regs[2] = regs[1];
  regs[1] = regs[0];
  regs[0] = 0;
  for (var x in bit) {
    if ( (regs[2] >> x) & 1 ) { co_add() }
    regs[1] <<= 1;
  }
}

All the register shifting at the top is done to capitalize on the pre-existing addition function. It always adds register 1 to register 0, so reg[0] needs to start out empty, then get reg[1] added to it where appropriate. Inside the for loop, reg[1] gets left-shifted once per iteration.

This would work as-is, but I put in a couple additional safeguards. If either initial value is 0, we know the answer already, so there is no work to do. Also, I set the overflow flag if I'm about to left-shift off a 1, as the answer won't fit in the number of bits I'm working with. Here's the final function:
co_multiply = function() {
  if (regs[0] == 0) { return }
  if (regs[1] == 0) { regs[0] = 0; return }
  regs[2] = regs[1];
  regs[1] = regs[0];
  regs[0] = 0;
  for (var x in bit) {
    if ( (regs[2] >> x) & 1 ) { co_add() }
    if ( (regs[1] & highbit) != 0 ) { overflow = true }
    regs[1] <<= 1;
  }
}

I've pre-defined highbit as the highest value in the bit array, to speed up lookups. I'm considering rewriting that line as:
if ( (regs[1] >> (word_length - 1)) & 1 ) { overflow = true }

Some very unscientific test cases have shown that right-shifting followed by a bitwise AND with 00000001 is slightly faster than a bitwise AND with a higher value. Plus it looks cooler. And, you know, that's important, too.

Division

There are no two ways about it: Long division brings back bad memories from school for everyone. After tackling it in code on a low level, my belief is now that it is being taught completely wrong to our kids. Yes, they are taught a method that produces correct answers, but it is abstracted from the real calculations going on so much that it might as well be a raindance that magically gives answers. Consider the steps in the standard long division technique of dividing 2825 by 25:
Step 1:   Step 2:   Step 3:   Step 4:   Step 5:
  __1__     __1__     __11_     __11_     __113
25)2825   25)2825   25)2825   25)2825   25)2825
   25      - 25      - 25      - 25      - 25
              3         32        32        32
                      - 25      - 25      - 25
                                   7        75
                                          - 75
                                             0

The first few things that jump out at me: In step 1, so far my answer is 1. My adult understanding of numbers can see that this is a 1 in the hundreds place. But my 8 year-old self had trouble with that.

In step 2, what exactly does that 3 represent? Or in step 3, what does 32 represent? By the time we get to step 4, there are a bunch of floating numbers abstracted from reality, and we just have to have faith in the magic system that gives us the answer.

What do I think would help? Well, let's start by filling in the blanks:
Step 1:   Step 2:   Step 3:   Step 4:   Step 5:
  __100     __100     __110     __110     __113
25)2825   25)2825   25)2825   25)2825   25)2825
   2500    - 2500    - 2500    - 2500    - 2500
              325       325       325       325
                      - 250     - 250     - 250
                                   75       75
                                          - 75
                                             0

So there it is. The "3" represents the first digit of "325", the difference of 2825 and 2500. So by step 2 we've said 2825 / 25 is 100, with a remainder of 325. What's left to do is to try to whittle down the remainder.

In the method with the blanks filled in, there is no mystic bringing down of digits for the next step; instead you do the complete subtraction, and what's left is the complete remainder. Maybe one day I'll flesh out what I think is a better approach to teaching division to kids, but for now let's stick to binary division in JavaScript, and what is *really* going on behind the scenes in long division.

In the binary world we are again aided by the fact that there are just ones and zeroes. When building our answer, we need only determine whether or not to put a one in the current bit. Consider 50 / 5, or binary 110010 / 101:
    ___1010
101)110010
  - 101000   (101 + 3 zeroes)
     01010
   - 00000   (000 + 2 zeroes)
      1010
    - 1010   (101 + 1 zero)
       000
     - 000   (000 + no zeroes)

And binary 1010 is indeed decimal 10. The approach I took to tell JavaScript to do this breaks down into two groups of steps, here calling the dividend N (numerator), the divisor D (denominator), and the quotient Q, with x as a counter for how far left-shifted D is.

Group A:
A1: Set Q and x to 0.
A2: If D is greater than N, go to step B1
A3: Left-shift D
A4: Increment x
A5: Go to step A2

Group A left-shifts D until it is greater than N, incrementing x along the way. Group B does the real division work.

Group B:
B1: If D is greater than N, go to step B4
B2: Subtract D from N
B3: Add 1 left-shifted x times to Q
B4: If x is 0 or N is 0, Q is quotient, N is remainder, exit
B5: right-shift D
B6: decrement x
B7: Go to step B1

Group B starts out with D greater than N, so it skips down to right-shifting it over once. Then it iterates through subtracting from the dividend, and adding 1 bits to the quotient, and right-shifting D until the remainder is 0, or D has been right-shifted back to where it started.

The drawback to this function is that it can iterate through 2*word_length bits if the dividend is large and the divisor small. I'm sure it is possible to simplify the function and reduce its average bit iterations, but this second attempt at division is an order of magnitude faster than its predecessor. Here is the current function:
co_divide = function() {
  // If either input register is 0, stop and return zeroes or throw divide by zero error
  if (regs[1] == 0) { throw("Can't divide by 0") }
  if (regs[0] == 0) { return }

  regs[3] = 0; // Answer, ultimately. We build it up one bit at a time
  var x; // current bit

  // Left-shift denominator until it is >= numerator, or left-shifting will lose a 1
  for (x in bit) {
    a = greater(regs[0], regs[1]);
    if (a == null) { // registers are equal
      regs[0] = bit[x];
      regs[1] = 0;
      return;
    }
    if (a == false) { break } // Stop if left-shifted denominator is greater than numerator
    if ( (regs[1] & highbit) != 0 ) { overflow = true }
    regs[1] <<= 1;
  }

  // Subtract, right-shift, repeat until remainder or x = 0
  while (regs[0] != 0) {
    if (a != false) {
      co_subtract();
      regs[3] |= bit[x];
    }
    if (x == 0) { break }
    x = decrement(x);
    regs[1] >>= 1;
    a = greater(regs[0], regs[1]);
  }

  regs[1] = regs[0]; // Remainder
  regs[0] = regs[3]; // Quotient
}
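For comparison, the two groups of steps translate almost line-for-line into a standalone JavaScript function. This sketch uses native comparison and subtraction operators, so it is not bound by the "bit manipulation only" rule:

```javascript
// Standalone translation of Groups A and B: returns [quotient, remainder].
function divide(n, d) {
  if (d === 0) { throw "Can't divide by 0"; }
  var q = 0;
  var x = 0;
  // Group A: left-shift the divisor until it exceeds the dividend
  while (d <= n) {
    d <<= 1;
    x++;
  }
  // Group B: shift back down, subtracting and setting quotient bits as we go
  while (x > 0) {
    d >>= 1;
    x--;
    if (d <= n) {
      n -= d;        // whittle down what's left of the dividend
      q |= 1 << x;   // this shift position gets a 1 in the quotient
    }
  }
  return [q, n];
}
console.log(divide(2825, 25)); // [113, 0]
```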

Next steps:
I still need to look at floats, but I'm more interested in cleaning up the compiler to make it more like the other functions, and not rely on JavaScript text matching. I'm not sure how much progress I'll make on either of those by next week, but I'm having fun whipping up small programs using the new language style, so I'll at least add a few more of those to the test suite.

Thursday, February 4, 2010

Week 2: Bit Twiddling

Source: rm20100203.js
Test page: register-machine-test-2010-02-03.html

Changes:
- Increment/decrement now using bit twiddling of integers, not strings
- Test suite page redesigned to display output in textarea rather than alert window
- Working memory handling, program executor, and compiler
- Test suite contains sample assembly programs for arithmetic and displaying an ASCII table

There is an inherent flaw in this project that is unavoidable: JavaScript does not provide direct access to memory. No matter what my design, at some point I'm going to have to emulate something a chip does to memory by using a JavaScript-provided construct. My first draft from last week used strings of zeroes and ones as virtual memory, and I set out to have the chip functions do all math through my wacky search/replace increment functions. As I built on last week's code and provided the interface to memory and assembly instructions, I could not seem to avoid doing this:
var memsize = Math.pow(2,word_length)-1;

...and a lot of this:
for (var x = 0; x < word_length; x++) ...

I ended up with chunks of code treating addresses as strings of zeroes and ones, and other chunks treating them as integer values. I also was letting more native JavaScript math slip in everywhere. I decided, then, to rewrite everything to use JavaScript integers to store values, and to be as much of a purist as possible in avoiding JavaScript math or high level compare functions. I also abandoned the idea of registers as objects, and just store them now as arrays of integers.

After some heavy rewriting, the only math my core functions now use is logical bitwise operators and testing for 0 and null. The program executor and compiler do no math, but use switch statements to check for more values than just 0. Future builds will be more true to my "purist" idea, but rather than taking weeks to build the perfect framework, I wanted to build something quickly that would work.

And work it does. The goal of being able to type in assembly commands, and have them interpreted by a virtual machine and executed, and send output to a textarea works famously, as shown in this screenshot:


The assembly language I invented is very simple, comprising only 8 commands, all following a standard format: [line] [command] [register] [word]. These get turned by the compiler into 13-bit integers (3 for the command, 2 for the register, 8 for the word) and inserted into my memory array, using "line" as the index. The executor then iterates over memory, extracts the command, register, and word from each integer, and then acts on them.
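The packing and unpacking amounts to a few shifts and masks. This is a sketch; the exact field order used by the real compiler is my assumption here:

```javascript
// Sketch of a 13-bit instruction encoding: 3 bits of command,
// 2 bits of register, 8 bits of word. Field order is assumed.
function pack(command, register, word) {
  return (command << 10) | (register << 8) | word;
}
function unpack(instruction) {
  return {
    command:  (instruction >> 10) & 7,  // top 3 bits
    register: (instruction >> 8) & 3,   // middle 2 bits
    word:     instruction & 255         // low 8 bits
  };
}
var decoded = unpack(pack(5, 2, 50));
console.log(decoded.command, decoded.register, decoded.word); // 5 2 50
```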

Here is an overview of the commands, where "r" is the specified register, "w" the word:
jmp - Jump to address w, unless register r is less than register 0 (If r = 0, always jump)
jmz - Jump to address w only if register r's value is 0
cpy - Copy memory address w to register r
cpo - Copy register r to address w
mov - Move the literal value w to register r
cor - Run core math function w. w = 0, increment register r. 1 = decrement r. 2 = add registers 0 and 1. 3 = subtract, 4 = multiply, 5 = divide
out - Copy address w to output device (getElementById("op").value, for now). Copy decimal value plus newline if r = 0, ASCII value otherwise
end - Stop program. Register and word are ignored, but must be included

This is a very limiting language, naturally, but I've been able to accomplish some fun stuff with it during my first week of playing with it. Here, for example, is a simple 11 line program to display an ASCII table, an improvement over the one that comes in the built-in test suite, as it won't get stuck in an infinite loop if you try to show all 256 ASCII characters:

0 mov 0 0 // Start
1 mov 1 255 // End
2 mov 2 32 // Space
3 cpo 2 51
4 cpo 0 50
5 out 1 50 // Character
6 out 1 51 // Space
7 out 0 50 // Number
8 cor 0 0 // Increment to next character
9 jmz 0 11 // Exit if counter overflows
10 jmp 1 4 // Go to loop if End >= current
11 end 0 0


You can cut and paste that directly into the "input" window on the test page I link to above, and click "Run program" to watch it work. It may take a few seconds, as my engine is still sort of slow. A last note about the program: I said "all 256 ASCII characters" above, but the JavaScript conversion function I'm using to extract characters is String.fromCharCode, which works with Unicode values. I tested this using 16 bit words and displaying characters greater than 256, and successfully output Unicode characters.

My functions now conceptually think of memory and registers as bits, my increment/decrement functions use xor-ing and flags to manipulate one bit at a time, and I assemble my cpu instructions using register shifting. Here are my old and new increment functions, to illustrate:

Old:
function increment(input) {
  // Last index of 0 becomes 1, everything after becomes 0
  var l = input.lastIndexOf("0") + 1;
  var a = input.substring(0, l);
  var b = input.substring(l);
  return a.replace(/0$/, "1") + b.replace(/1/g, "0");
}

New:
// The bit[] array has flags used in XOR/AND operations, starting at 1, then doubling
// i.e., bit[0] = binary 00000001, bit[1] = 00000010, bit[2] = 00000100, etc.
var bit = new Array(1, 2, 4, 8, 16, 32, 64, 128);
function increment(num) {
  for (var x in bit) {
    var tmp = num & bit[x]; // tmp will be 0 if current bit being examined is 0
    num = num ^ bit[x];     // This flips the current bit
    if (tmp == 0) { return num; } // Stop if we just flipped a 0 to a 1
  }
  overflow = 1;
  return num;
}

So, the new incrementing function works on the same basic principle, except it's more explicit, whereas the old function was more "tricky" with its use of regular expressions. The new function conceptually thinks of the number as binary, starting at the least significant bit and working up. Here is an example of how it works, incrementing 3 (binary 00000011) to 4 (00000100).

Step 1:
num    00000011                 00000011
bit[0] 00000001 & (AND)         00000001 ^ (XOR)
       --------                 --------
tmp    00000001         new num 00000010

tmp does not equal 0, so continue with new num 00000010 (decimal 2, yes I realize so far we're going the wrong way)

Step 2:
num    00000010                 00000010
bit[1] 00000010 &               00000010 ^
       --------                 --------
tmp    00000010         new num 00000000

tmp still isn't 0, so continuing with new num being 0. Looks bad, but have faith...

Step 3:
num    00000000                 00000000
bit[2] 00000100 &               00000100 ^
       --------                 --------
tmp    00000000         new num 00000100

OK, tmp is zero, so stop and return the new num value of 00000100 (decimal 4). Ta-da! increment(3) = 4, just like it is supposed to, and no actual arithmetic was done, just logical operations on bits and testing for 0. I'm building all the arithmetic the register machine does on top of that, up from first principles, so to speak; or rather, I'm pretending my increment function does what my theoretical cpu would do with bits. Decrementing works the same way, and I won't expound on it; the only difference is that we stop when tmp does not equal 0.
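For the curious, a decrement along the same lines might look like this (a sketch in the style of the increment function above, not the exact build code): flip bits from the least significant end, stopping once a 1 has been flipped to a 0.

```javascript
// Same bit array as the increment example; stubbed here to be self-contained.
var bit = [1, 2, 4, 8, 16, 32, 64, 128];
var overflow = 0;
function decrement(num) {
  for (var x in bit) {
    var tmp = num & bit[x];       // non-zero if the current bit is a 1
    num = num ^ bit[x];           // flip the current bit
    if (tmp != 0) { return num; } // stop once a 1 has become a 0
  }
  overflow = 1; // decrementing 0 wraps around to all 1s
  return num;
}
console.log(decrement(4)); // 3
```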

Enjoy toying with the test page, modifying my sample programs, and making your own. Watch out for infinite loops. I've had to kill my browser dozens of times while experimenting this week due to exceptions I didn't see. Remember, assembly is a low-level and completely unsafe environment to code in.

Next steps:
- Write assembly language reference and more sample programs
- Speed up binary arithmetic functions instead of relying on increment/decrement only
- Add label support to assembly compiler, to eliminate reliance on line numbers. (Making small changes to a program is painful, as you currently have to renumber everything.)
- Experiment with floating point numbers