Thursday, February 18, 2010

Week 4: How to write itoa, and the trouble with tech interviews

Source: rm.js
Test page: register-machine-test-2010-02-18.html
Language reference: assembly-language-reference.html

Changes:
- Fixed bug in division function
- All word characters (A-Z,a-z,0-9,_) can be used in labels, not just lowercase letters
- All assembly functions can use labels as the address part
- Added inp command to read from STDIN textarea
- Added a demo of subroutines, an itoa algorithm, and a STDIN reversal algorithm to the test suite
- Added right-shift and left-shift commands (lft, rgt)
- Got rid of "cor" command, replaced with unique commands for each math function (inc, dec, add, sub, mul, div)

This last change presents a question: Rework how commands are stored, or do something different with the address part of math commands? I decided for now to use the address part as an optional destination for the result of the math function. If the address is zero, the function works like its "cor" counterpart; if it is non-zero, memset is called and the result is copied to that address. This removes the need to follow up math functions with "cpo" commands to store the results. For example, add two numbers, store them in address 50, the old and the new way:

Old:

mov 0 50
mov 1 75
cor 0 2 // Add
cpo 0 50 // Copy reg 0 to mem 50
end

New:

mov 0 50
mov 1 75
add 0 50 // Add reg 0 and 1, result to mem 50
end

Additionally, for the add and subtract commands, Register 1 is added to or subtracted from the register you specify. Previously this was hardcoded to just register 0.

My "Phase 1" builds have problems that keep creeping up, and I've been fighting code and feature bloat as much as I can. For example, I started out with two registers, and quickly bumped that up to four before the first build, and now I'm up to four user-addressable registers, another register for the jump-to address, one for the STDIN pointer, three flags used in math functions, and an overflow flag... that I don't really do anything with yet.

I've reached the point where the project is becoming fleshed out enough that I've started researching other assembly environments. Interestingly, some of what I came up with independently matches what exists out in the wild, such as my algorithm for incrementing binary values, my solution for an itoa (Integer to ASCII) algorithm, and my method of creating pseudo-subroutines, which matches the way Knuth's MIX environment solves the same problem. Specifically, an executable byte contains both a command and an address, and exists in a memory location that is user-writable. If the command is "jump", I can overwrite just the address part, and change where it jumps to from within the program. This can be used to tell a "subroutine" where to return to when it's finished.

Sticking to the "only low-level bit stuff" goal, I accomplished this by using a bitmask that is all ones by the command part of a byte, and all zeroes by the address part, using the following JavaScript command:
mem[address] = (mem[address] & mask) | value; // update word part only

Imagine I have a three-bit command, 101, that corresponds to "JMP", and a four-bit address, 0011, that I want to replace with a new address of 0110. My bitmask will be 1110000. I start out by ANDing the byte and the mask:
  1 0 1 0 0 1 1
& 1 1 1 0 0 0 0
---------------
1 0 1 0 0 0 0

This retains the "command" part of the byte, and zeroes out the "address" part. Then I OR the byte with the new address, in this case 0110:
  1 0 1 0 0 0 0
| 0 0 0 0 1 1 0
---------------
1 0 1 0 1 1 0

So the old byte was 1010011, and by ANDing the mask then ORing the new value, I have a new byte 1010110, and I have updated where it will jump to.

This feature of being able to update just the address part of a byte can be extended to other functions as well. I found a use for it when trying to create a better itoa function than my first attempt. The itoa algorithm is a common function in many languages that takes a number and converts it to characters, one digit at a time, and prints each digit out in turn. This may sound like so much fluff and wasted effort, but programs commonly need to do this kind of conversion. For example, in plain JavaScript:
var ans = 5000 / 20;
alert("The answer to five-thousand divided by twenty is " + ans + ".");



The "ans" variable was a number stored as a numeric value. What is being alerted is a string, and each digit of "250" must be extracted from the number and displayed, one at a time, as ASCII. The extraction of each digit and conversion to ASCII was fantastically painful... and fun!

My first attempt at this in my assembly language was only semi-functional, and I wasn't happy with it. The principle was starting out dividing the number by 10,000 (since 65535 is the largest number that can fit in my 16 bit word length, there won't be more than 5 digits in a number). The integer portion of the quotient is the first digit in the number. Take that number and add 48 to it (ASCII 48 being where "0" is located), and print that character. Then step the divisor down to 1,000, and divide the remainder by that. Repeat, divide by 100, then 10, then 1. For example, divide 12,345 by 10,000 and you get 1 with a remainder of 2,345. Print out the 1, divide 2,345 by 1000, and you get 2 r 345. Etc.

The first algorithm was not optimal, as you have to adjust the hardcoded 10,000 divisor if the word_length changes from 16 to something else, and it behaves poorly whenever 0 is encountered. Also, the output will always be 5 bytes, with leading zeroes on small numbers. Boo! After pondering for a while, I found a better solution and implemented it in my assembly language.

The new algorithm works by starting from the small side. Divide by 10, and the remainder is your last digit. Divide that quotient by 10, and you get your next-to-last digit. Repeat until you get 0, and then print out the characters you've extracted in reverse order. To do that in my assembly language, I had to modify cpy and cpo addresses during the program execution using the same trick used above to create pseudo-subroutines, which necessitated adding the "update value only" feature I mentioned above to all functions, not just to jumps.

The working algorithm is available on the test page linked to above. An unexpected bonus to this implementation is you can change what number you're dividing by to change the base of the result. Change it to eight and it converts the number to octal, two and it prints out binary. Neat.

So after I designed and tested this, I did a Google search for "itoa algorithm", and found out that this is basically the standard solution for rolling your own itoa function, and also that this is a common interview question for coders. In fact, there appears to be a multitude of fora dedicated to helping potential interviewees perform better when quizzed in tech interviews. This irritates me for two basic reasons: First, random algorithm tests for candidates is a completely crappy interview method. Second, most of the posts I came across appeared to be pleas for help in gaming the system.

Job seekers seem to be building lists of answers to potential interview questions rather than building their skill at coding and problem-solving. So take my two approaches to the itoa problem. I hadn't "studied for the test", so my first implementation at an interview would have been scoffed at, and I wouldn't have gotten the job. I realized the problems with my algorithm, and attacked it again and wrote the optimal solution. This is my main approach to tech problem solving: take a stab at it to better learn the problem, then start over, using what you've learned to build a better solution. But I wouldn't have gotten the job, the unqualified flunky who cheated on the test would have. What a mess. Fortunately, I currently have a coding job I'm happy with, and I don't plan on trying to pass any arbitrary interview tests in the near future.

End rant.

Next steps:
- Add a "jump if flag is 1" command to branch on math and overflow flags
- Add logic commands such as xor, and, or, equals, not
- Work on functions that don't yet follow "bit manipulation only" rule
- Update test suite programs to read STDIN lines as integers for input instead of hardcoding the inputs.

No comments:

Post a Comment