Thursday, February 4, 2010

Week 2: Bit Twiddling

Source: rm20100203.js
Test page: register-machine-test-2010-02-03.html

Changes:
- Increment/decrement now using bit twiddling of integers, not strings
- Test suite page redesigned to display output in textarea rather than alert window
- Working memory handling, program executor, and compiler
- Test suite contains sample assembly programs for arithmetic and displaying an ASCII table

There is an inherent flaw in this project that is unavoidable: JavaScript does not provide direct access to memory. No matter what my design, at some point I'm going to have to emulate something a chip does to memory by using a JavaScript-provided construct. My first draft from last week was using strings of zeroes and ones to be virtual memory, and I set out to have the chip functions do all math through my wacky search/replace increment functions. As I built on last week's code and provided the interface to memory and assembly instructions, I could not seem to avoid doing this:
var memsize = Math.pow(2,word_length)-1;

...and a lot of this:
for (var x = 0; x < word_length; x++) ...

I ended up with chunks of code considering addresses as strings of zeroes and ones, and other chunks considering it with integer values. I also was letting more native JavaScript math slip in everywhere. I decided, then, to rewrite everything to use JavaScript integers to store values, and to be as much of a purist as possible in avoiding JavaScript math or high level compare functions. I also abandoned the idea of registers as objects, and just store them now as arrays of integers.

After some heavy rewriting, the only maths my math and core functions now use are logical bitwise operators, and testing for 0 and null. The program executor and compiler do no math, but use switch statements to check for more values than just 0. Future builds will be more true to my "purist" idea, but rather than taking weeks to build the perfect framework, I wanted to build something quickly that would work.

And work it does. The goal of being able to type in assembly commands, and have them interpreted by a virtual machine and executed, and send output to a textarea works famously, as shown in this screenshot:


The assembly language I invented is very simple, comprised of only 8 commands, all following a standard format: [line] [command] [register] [word]. These get turned by the compiler into 13 bit integers (3 for the command, 2 for the register, 8 for the word) and inserted in my memory array, using "line" as the index. The executor then iterates over memory, and extracts from the integers the command, register, and word, and then acts on them.

Here is an overview of the commands, where "r" is the specified register, "w" the word:
jmp - Jump to address w, unless register r is less than register 0 (If r = 0, always jump)
jmz - Jump to address w only if register r's value is 0
cpy - Copy memory address w register r
cpo - Copy register r to address w
mov - Move the literal value w to register r
cor - Run core math function w. w = 0, increment register r. 1 = decrement r. 2 = add regsters 0 and 1. 3 = subtract, 4 = multiply, 5 = divide
out - Copy address w to output device (getElementById("op").value, for now). Copy decimal value plus newline if r = 0, ASCII value otherwise
end - Stop program. Register and word are ignored, but must be included

This is a very limiting language, naturally, but I've been able to accomplish some fun stuff with it during my first week of playing with it. Here, for example, is a simple 11 line program to display an ASCII table, an improvement over the one that comes in the built-in test suite, as it won't get stuck in an infinite loop if you try to show all 256 ASCII characters:

0 mov 0 0 // Start
1 mov 1 255 // End
2 mov 2 32 // Space
3 cpo 2 51
4 cpo 0 50
5 out 1 50 // Character
6 out 1 51 // Space
7 out 0 50 // Number
8 cor 0 0 // Increment to next character
9 jmz 0 11 // Exit if counter overflows
10 jmp 1 4 // Go to loop if End >= current
11 end 0 0


You can cut and paste that directly into the "input" window on the test page I link to above, and click "Run program" to watch it work. It may take a few seconds, but my engine is still sort of slow. A last note about the program, I said "all 256 ASCII characters" above, however the JavaScript conversion function I'm using to extract characters is String.fromCharCode, which displays unicode values. I tested this using 16 bit words and displaying characters greater than 256, and successfully output unicode values.

My functions now conceptually think of memory and registers as bits, my increment/decrement functions use xor-ing and flags to manipulate one bit at a time, and I assemble my cpu instructions using register shifting. Here are my old and new increment functions, to illustrate:

Old:
function increment(input) {
// Last index of 0 becomes 1, everything after becomes 0
var l = input.lastIndexOf("0") + 1;
var a = input.substring(0,l);
var b = input.substring(l);
return a.replace(/0$/, "1") + b.replace(/1/g, "0");
}

New:
// The bit[] array has flags used in XOR/AND operations, starting at 1, then doubling
// i.e., bit[0] = binary 00000001, bit[1] = 00000010, bit[2] = 00000100, etc.
var bit = new Array(1, 2, 4, 8, 16, 32, 64, 128);
function increment(num) {
for (var x in bit) {
var tmp = num & bit[x]; // tmp will be 0 if current bit being examined is 0
num = num ^ bit[x]; // This flips the current bit
if (tmp == 0) { return num; } // Stop if we just flipped a 0 to a 1
}
overflow = 1;
return num;
}

So, the new incrementing function works on same basic principle, except it's more explicit, whereas the old function was more "tricky" with its use of regular expressions. The new function conceptually thinks of the number as binary, and starts at the least significant bit and works backwards. Here is an example of how it works, incrementing 3 (binary 00000011) to 4 (00000100).

Step 1:
num    00000011                 00000011
bit[0] 00000001 & (AND) 00000001 ^ (XOR)
-------- --------
tmp 00000001 new num 00000010

tmp does not equal 0, so continue with new num 00000010 (decimal 2, yes I realize so far we're going the wrong way)

Step 2:
num    00000010                 00000010
bit[1] 00000010 & 00000010 ^
-------- --------
tmp 00000010 new num 00000000

tmp still isn't 0, so continuing with new num being 0. Looks bad, but have faith...

Step 3:
num    00000000                 00000000
bit[2] 00000100 & 00000100 ^
-------- --------
tmp 00000000 new num 00000100

OK, tmp is zero, so stop and return the new num value of 00000100 (decimal 4). Ta-da! increment(3) = 4, just like it is supposed to, and no actual arithmetic was done, just logical operations on bits and testing for 0. I'm building all arithmetic the register machine does on top of that. Up from first principles, so to speak, or rather, I'm pretending my increment function does what my theoretical cpu would do with bits. Decrementing works the same way, and I won't expound on it. The only difference is we stop when tmp does not equal 0.

Enjoy toying with the test page, modifying my sample programs, and making your own. Watch out for infinite loops. I've had to kill my browser dozens of times while experimenting this week due to exceptions I didn't see. Remember, assembly is a low-level and completely unsafe environment to code in.

Next steps:
- Write assembly language reference and more sample programs
- Speed up binary arithmetic functions instead of relying on increment/decrement only
- Add label support to assembly copiler, to eleminate reliance on line numbers. (Making small changes to a program is painful, as you currently have to renumber everything.)
- Experiment with floating point numbers

No comments:

Post a Comment