More Machine Language For Beginners
Richard Mansfield
Assistant Editor
This article has two purposes: to provide a way of ensuring that private documents and programs cannot be seen or used by unauthorized persons, and to explain some aspects of machine language programming. Readers who are familiar with M.L. might wish to skip the second part of the article.
The BASIC Program
The BASIC listing of Security Lock (Program 1) will run on any version of the PET. The M.L. routine goes into a "safe" area in the second cassette buffer, common to all ROM sets in Commodore machines, including the new 8000 series. This area is "safe" because it is below BASIC programs and is not used by PET unless a second cassette machine is used.
The uses of "Security Lock" are explained within the program. It is not necessary to type in the entire program. Simply copy lines 120, 130, and the DATA lines from 1000 up.
The three-letter code can be changed, as described in the program, to any combination. An additional security measure — making it virtually impossible to break into a protected program — is not in the BASIC LISTing in Program 1. The reason that it cannot be illustrated is simple: the purpose of this technique is to prevent LISTings themselves from taking place.
We must describe how to do this since it cannot be demonstrated via a printout. First, when you include "Security Lock" within a program, you will be using a line similar to line 130 in Program 1 (REM statement removed). If you are calling the M.L. routine at the start of the program, you might type it in as line 1, thus: 1SYS867 (or, if you are not in the "graphics" mode, 1sys867).
Now, after the last character, type in a quotation mark and hit the RETURN key: 1SYS867"
Then, using the cursor control keys, move the cursor back up to a position directly following the quotation mark. Holding the SHIFT key down, press the INSERT key nine times. Then release the INSERT and the SHIFT keys and press the DELETE key nine times. You will see nine reverse-character "t"s, which represent nine automatic deletions. Then press the RETURN key to enter this line into the rest of the program.
As you can see, any attempt to LIST the program will now delete line 1 from view, as if it were not part of the program. A brief flash on the screen is the only clue that something exists there, yet this line will operate normally during a RUN of the program. To eliminate the flash, you can use the quote/delete rub-out further into the program (as in line 130, Program 1) where it will be unlikely to be noticed.
Before turning to some observations on M.L. programming, it might be worthwhile to mention one modification to "Security Lock" which may prove useful. The M.L. program always prints "code?" on the screen to remind you that it is the Lock hanging up the program, not an endless loop or a hardware failure. If you simply want to freeze a program or file, without giving a clue as to why it's locked, eliminate the prompt word in the M.L. routine by typing in the following and then hitting RETURN:
FOR I = 867 TO 880: POKE I,234: NEXT
This puts NOP (no operation) instructions into the routine, and when the SYS lands PET at 867, it slides up to 881 with no ill effects, where the input routine starts.
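The effect of that POKE loop can be modeled away from the PET. The Python sketch below is only an illustration: the `memory` bytearray and the `first_non_nop` helper are hypothetical stand-ins for PET RAM and for the processor stepping through the routine. It fills addresses 867 through 880 with 234 (the NOP opcode, $EA) and checks where execution entering at 867 would first meet something other than a NOP.

```python
NOP = 234  # $EA, the 6502 "no operation" opcode

# Hypothetical stand-in for PET RAM; 0xFF marks bytes we have not patched.
memory = bytearray([0xFF] * 1024)

# Equivalent of: FOR I = 867 TO 880: POKE I,234: NEXT
for address in range(867, 881):
    memory[address] = NOP

def first_non_nop(mem, start):
    """Return the first address at or after start that does not hold a NOP."""
    address = start
    while mem[address] == NOP:
        address += 1
    return address

patched = sum(1 for byte in memory if byte == NOP)
landing = first_non_nop(memory, 867)
print(patched, landing)
```

Fourteen cells are patched, and the first address left alone is 881, which is where the article says the input routine starts.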
If you save frequently-used routines on a "Utilities" tape or disk for easy appending to future programs, this routine, like all M.L. routines, cannot be SAVEd normally (as BASIC is SAVEd). The following procedure will save M.L. routines which can later be LOADed in the usual way. Go into the Machine Language Monitor by typing SYS 1024. (If you have an Original PET, follow the instructions which came with your MLM tape.) Immediately after the dot, where the cursor should have landed, type —
s"Security Lock",01,035a,0399
and hit RETURN (for tape). For disk: .s"0:Security Lock",08,035a,0399. Note that the upper limit address must be one higher than the actual upper limit which was 0398 hex.
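The "one higher" rule is easy to get wrong, so here is a small Python sketch of the arithmetic. The variable names are invented for this illustration; the addresses and the command text come from the instructions above.

```python
START = 0x035A      # first byte of the saved area
LAST_BYTE = 0x0398  # last byte the routine actually uses

end_address = LAST_BYTE + 1   # the monitor wants one past the last byte
length = end_address - START  # number of bytes that will be saved

tape_command = f'.s"Security Lock",01,{START:04x},{end_address:04x}'
print(tape_command, length)
```

Treating the upper limit as "one past the end" also means the byte count is a plain subtraction, with no extra 1 to add.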
The Four Types Of M.L. Listings
In books and magazines, machine language routines can be listed in four ways: as BASIC DATA statements (sometimes called a "BASIC loader"), as a memory dump, as a simple disassembly, and as an annotated assembly. This can be confusing to the novice, so the four Programs which accompany this article illustrate the four kinds of listings possible for the same M.L. program: Security Lock.
As a series of BASIC DATA statements (Program 1), M.L. code is a part of a larger BASIC program. This gives little or no information about the nature of the M.L. section. It is typed, then RUN, as a subroutine of the host program. And the reader is frequently cautioned to type the program exactly as it appears. This is because a single error in M.L. will usually crash the entire program. But much typing time can be saved, if the M.L. routine is all that's wanted from the BASIC program, by looking for three things: a READ-loop, a SYS, and DATA statements. Line 120 is a READ-loop which POKEs the M.L. routine into memory and line 130 contains the SYS to enter that routine at the proper address. In BASIC, the DATA for an M.L. routine are decimal numbers.
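What a READ-loop does can be sketched in Python. This is a model, not the listing from Program 1: the `memory` bytearray is a hypothetical stand-in for PET RAM, the load address of 858 ($035A) is assumed from where the other listings begin, and only the first six values are shown (the ASCII codes for CODE? followed by a zero).

```python
# First six DATA values only: the ASCII codes for "CODE?" plus a zero.
DATA = [67, 79, 68, 69, 63, 0]
LOAD_ADDRESS = 858  # $035A; assumed start of the POKE loop

memory = bytearray(65536)  # hypothetical stand-in for PET memory

# Equivalent of a BASIC READ-loop: read each datum, then POKE it into place.
for offset, value in enumerate(DATA):
    memory[LOAD_ADDRESS + offset] = value

table = bytes(memory[LOAD_ADDRESS:LOAD_ADDRESS + len(DATA)])
print(table)
```

The decimal 67 in the DATA line and the hexadecimal 43 in a memory dump are the same byte, written in two number bases.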
The next step up toward clarity, though the program's meaning is still not easily recognizable, is a "memory dump" (Program 2). (This is sometimes called a "hex dump.") It is a table of hexadecimal numbers. The first number is the address of the first datum on its line. In Program 2, the "dump" shows that address 035a contains a 43, address 035b contains 4f, etc. As before, to make such a program your own, you copy in the information, being careful to copy precisely. In this case, however, you must first enter the M.L. Monitor and then type:
.M 035a 0392 (RETURN key)
This will put a memory dump on screen of what currently exists in these memory cells. To put in the new data, you just type over what appears on the screen, observing the spaces between each hexadecimal number and hitting RETURN when each line has been changed.
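To show how an address and a row of bytes make up one line of a dump, here is a minimal Python sketch. The `hex_dump` function is hypothetical and its layout only approximates what the monitor prints; the six bytes are the CODE? table described in this article.

```python
def hex_dump(start, data, width=8):
    """Return dump lines: a four-digit hex address, then up to `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        cells = " ".join(f"{byte:02x}" for byte in row)
        lines.append(f"{start + offset:04x} {cells}")
    return lines

table = b"CODE?\x00"
lines = hex_dump(0x035A, table)
for line in lines:
    print(line)
```

The address at the left belongs to the first byte on its line. Each byte to the right sits one address higher than the one before it.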
A third type of M.L. printout is a list of each machine language instruction in terms of its function. This is a disassembly (Program 3), and it resembles a LISTing in BASIC, though in a highly abbreviated form. Any series of numbers can be examined by a disassembler, a program which translates raw data into M.L. instruction mnemonics. A disassembler can be found on pg. 81 of COMPUTE! #8. If the numbers are part of an M.L. routine, the disassembler will list them as in Program 3. If it cannot make sense of what it sees (if it is examining memory which contains BASIC code, for example), it will print a series of question marks.
A disassembler usually prints out four fields, or zones of information. It is easy to see that the first four characters in Program 3 represent memory addresses. This is the "address field" and is similar to the first four characters of Program 2, the memory dump, except that here the number of bytes in the second field, the "data field," can be 1, 2, or 3, so the numbers in the address field will increase irregularly. The second, "data field," also corresponds to Program 2's dump, but there is the same irregularity as different numbers group themselves together. This grouping is then translated in the third and fourth fields: the "instruction" field and the "operator" field (many assembler manuals call the latter the "operand"). These last two fields are "mnemonic" (easy to remember) representations of the information contained in the raw hex numbers of the "data field" which precedes them. The "instructions" tell the computer what to do and the "operators" tell the computer what to do it to. In the phrase "drive a car," drive is the instruction and car is the operator. In LDY #$00, LDY (load the Y register) is the instruction and #$00 (zero) is the operator. The same structure exists in BASIC: POKE 32768, 41 or PRINT "Hello."
The reason that the disassembly must group its information irregularly is that different instructions are designed to work with different sized operators. INY (increase the value of the Y register by 1) has no explicit operator since the "1" is implied within the instruction itself. LDY #$00 has a one-byte operator, 00, so it is two bytes long: LDY and 00. To instruct the computer to compare the number in the accumulator with the number in address $0360, we need three bytes: CMP plus two bytes to represent a number as large as 0360, since any one byte can only hold a number up to 255.
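The irregular grouping can be sketched in a few lines of Python. This is a toy, not a real disassembler: the `OPCODES` table covers only the three instructions just discussed, and the sample bytes and starting address are made up for the illustration rather than copied from Program 3. The 6502 stores a two-byte address with the low byte first, so $0360 appears in memory as 60 03.

```python
# Opcode -> (mnemonic, total instruction length in bytes).
OPCODES = {
    0xC8: ("INY", 1),    # implied: no operator bytes follow
    0xA0: ("LDY #", 2),  # immediate: one operator byte follows
    0xCD: ("CMP", 3),    # absolute: two operator bytes follow, low byte first
}

def split_instructions(start, data):
    """Group raw bytes into (address, mnemonic, operator bytes) rows."""
    rows = []
    offset = 0
    while offset < len(data):
        mnemonic, length = OPCODES[data[offset]]
        operator = bytes(data[offset + 1:offset + length])
        rows.append((start + offset, mnemonic, operator))
        offset += length
    return rows

sample = bytes([0xA0, 0x00, 0xC8, 0xCD, 0x60, 0x03])  # invented sample
rows = split_instructions(0x0363, sample)
for address, mnemonic, operator in rows:
    print(f"{address:04x} {mnemonic} {operator.hex()}")
```

The address column steps by 2, then 1, then 3, which is the irregular increase the address field shows in a disassembly.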
Full Source Code
Finally, Program 4 illustrates the clearest way that an M.L. program can be presented: as an annotated assembly listing. (It is also called "source code.") This contains within it the four fields of the disassembly, but adds three more fields — line numbers, labels, and comments.
Such listings represent the program rather elaborately by M.L. standards. They are written using an "assembler" program which accepts mnemonics such as INY, translates them, and puts them in memory. Assemblers are either "single-pass" (simple translators of mnemonics) or complex, label-oriented powerhouses. Unfortunately, the M.L. Monitor within PET does not contain a disassembler or an assembler, but the monitor can be made to include these functions (and others) with a program such as "Supermon" or "Extramon." For short routines, simple assemblers will work well. For larger jobs (an entire arcade game would be a large job) a power assembler is needed. To my knowledge, the most advanced assemblers available to PET users are "MAE" and "ASM/TED," written by Carl Moser. (Available from A.B. Computers or Eastern House Software.)
A printout from such an assembler is much easier to understand because it contains labels and comments. Using such an assembler, it is also easier to write large programs since some of the problems associated with programming in M.L. are handled automatically by the computer.
Mnemonics are easier to manipulate than numbers, but whole words (labels) are often an improvement over mnemonics, particularly when a program is lengthy. To clarify the additional fields found in large assembly programs, we can examine line 0180 (Program 4). It begins with two fields which are identical to the disassembly in Program 3, but the third field is a BASIC-like consecutive numbering of each line of the program. This allows the programmer to manipulate the instructions more easily, since renumbering can open up new space for additional instructions or whole sections of the program can be conveniently rearranged.
Following the line number is the label field, in this case the label, "START," since it is the beginning of the program. This is better than BASIC. Several locations (lines 350, 380, 410) are able to say IF THEN GOTO START, where a comparable BASIC program would need to use the line number instead of a word: IF THEN GOTO 180. This relieves the programmer of having to look up line addresses for his subroutines or major entry points as well as eliminating a frequent cause of errors.
The next two fields are the instruction and operator fields (as in a disassembly) except that now some of the operators have been replaced by labels. If, as in line 190, we see LDA (load accumulator) TEXT, we can find the value or meaning of the label, TEXT, in three ways: we can look over to the second field, where the raw numbers are; we can look earlier in the program where TEXT is defined (line 80); or we can look at the end of the program in the "Label File." TEXT refers to data which starts at address $035A and which line 80 defines as being the word "code?".
The last field holds comments which describe the function of the line where they appear (and sometimes, subsequent lines). The semi-colon is the same as REM — anything which follows it serves to document the program, but is ignored by the assembler. This makes later modifications easier, debugging faster, and also helps to reveal the meaning of the program to others.
To review, we see a progress toward clarity, from Program 1 to Program 4, largely due to the addition of new fields of information. Program 1 contains a single field: decimal data. Program 2 adds an address field. Program 3 adds fields three and four, a translation of the raw data from field two into instruction mnemonics and operators. And Program 4 adds line numbers, labels, and comments, for a total of seven fields. We have now examined the horizontal organization of an M.L. program, from its simplest form to its most complex. Using the most complex example (Program 4), let's twist ourselves sideways and go on to investigate the vertical organization of M.L. programs.
The Four Parts Of A Computer Program
All programs — in fact, all thinking — can be broken down into four essential parts: 1. Initialization and Protection, 2. Data Tables, 3. Main Loop, 4. Subroutines. Before learning a new word (thinking), a person must: 1. not be being shot to death, 2. have a dictionary, 3. start looking up the word, and 4. move his thumbs correctly, know or guess the spelling, keep his balance, etc. The order of these elements is important. Without protection, any M.L. routine between addresses 1024 and the screen RAM at 32768 can be overwritten by a BASIC program either by a LOAD or because BASIC puts some of its variables up at the top of available RAM where M.L. programmers like to stick routines.
Protection can be achieved by telling PET that its memory size has shrunk — changing the numbers in addresses 52 and 53 (134,135, in Original ROMs). Then all BASIC activity will be confined to RAM below the address resulting from (PEEK (52) + PEEK (53) * 256). Or, a short M.L. routine can be nestled into a space where BASIC doesn't usually go, such as the second cassette buffer. BASIC protects itself, so that is not of concern in BASIC programming.
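The two-byte arithmetic behind that PEEK formula can be checked with a short Python sketch. The helper names are invented for this example, and the new limit of 28672 ($7000) is a hypothetical value chosen only to make the numbers easy to follow.

```python
def join_bytes(low, high):
    """Rebuild an address the way PEEK(52) + PEEK(53) * 256 does."""
    return low + high * 256

def split_address(address):
    """Split an address into the low and high bytes to be POKEd."""
    return address % 256, address // 256

NEW_TOP = 28672  # $7000, a hypothetical new top of BASIC memory
low, high = split_address(NEW_TOP)  # values for addresses 52 and 53
print(low, high, join_bytes(low, high))
```

With this example limit, address 52 would receive 0 and address 53 would receive 112. BASIC would then stay below 28672, leaving the RAM above it for M.L. routines.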
A table is a collection of information (data) which the program will need. In Program 4, and in M.L. generally, the tables are placed at the beginning of the program (but sometimes at the end). It is good to get into a habit of keeping tables together and putting them at the start. In line 80, instead of an ordinary mnemonic, we have a pseudo-op, .BY, (pseudo-ops are preceded by a period). A pseudo-op is a request to the assembler program to perform some task for the programmer. In this instance, the programmer is requesting that an ASCII word, CODE?, be translated into bytes and stored to be used by the program later. Line 90 contains the pseudo-op, .DE, which defines the label, SEND-CHAR, as the address in BASIC ROM which prints a character to the screen. The .DS in line 110 tells the assembler to define some storage space, three cells large, called STORAGE which the program will later use as a place to hold the codeword PET.
A main loop is a series of steps which control the program as a whole. It is distinct from subroutines in that it calls subroutines; they do not call the main loop. In a complicated M.L. program, the main loop can be a series of JSR (Jump to Subroutine) instructions which defines the order in which subroutines are performed. In BASIC, it can take the form of an ON GOTO list of addresses, a series of GOSUBs, or a loop. In simpler programs, the main loop is often merely implicit: each subroutine is already arranged within the program in the desired order of execution, and the program runs more or less sequentially from start to finish.
In Program 4, the instructions break into two divisions: initialization and subroutine. Since it is a simple program, there is only a fragment of what would be a main loop in a larger program. The initialization zone is often at the start of a main loop, and sets up whatever preconditions the program will later expect (including protection). In this case, the word code? must be printed to the screen until the correct code is entered and the loop can be exited.
The phrase "cold start" refers to an entrance into a program at the very beginning of the initialization section. This will reset all flags, pointers, counters, etc. to their virgin condition. A "warm start" enters the main loop beyond initialization, so that various kinds of information, modified during a program RUN, are left undisturbed. There is some ambiguity to these terms since initialization is sometimes unnecessary, or is sometimes refreshed on every entrance to the main loop (the warm start and the cold start would then be identical), or other anomalies. It is valuable, however, to develop a sense that a program has two distinct active parts and one passive part. The main loop governs the action of the subroutines. Data tables are passive zones of information which perform no tasks. And, before all else, an M.L. program must protect itself from a BASIC invasion. (M.L. can also require protection from interruptions, but this concept is outside of the purview of this article.)
How Security Lock Works
The main loop begins at line 180 (Program 4) and sets the Y register to zero so that it can act as an "offset" to the address called "TEXT" (a little table holding the word "CODE?"). This is much simpler than it sounds. "Offset" means add this number to a fixed number. TEXT has already been defined as a fixed address (035A) which is the start of the table holding the word "CODE?" So, in line 190, we LDA (load the accumulator register, a temporary resting place for bytes of data) with whatever is in the address TEXT + Y. Since Y was just loaded with a zero (LDY #0), the byte that we are putting into the accumulator will be at $035A itself.
Line 200 tests to see if the whole word, CODE?, has been printed. BEQ means branch if you just loaded (LDA) a zero into the accumulator. But we didn't. Since address 035A has a 43 in it, now the accumulator also has the 43. We will only branch (go somewhere else) if it's a zero. So the branch is ignored and we continue to line 210 which jumps to a subroutine (JSR) in BASIC ROM which puts the character in the accumulator on the screen. The 43 (letter "C") will then appear and the control of the computer is returned to line 220, as in a BASIC RETURN command.
Line 220 increases (increments) the number in the Y register by 1 (INY). It was a zero, so now it's a 1. Then, like a GOTO, line 230 jumps (JMP) to the line we've labeled LOOP (line 190) where the value of TEXT and Y are again added together to give the address where we will find what to put into the accumulator. This time, however, since Y now equals 1, the "effective" address is 035B, where the letter "O" is waiting to be picked up. After looping this way for a while, increasing the address each time by increasing the value of Y, we will eventually pick up a zero which we thoughtfully placed in address 035F, the end of our TEXT table (see line 80). This is a "delimiter" to let the loop know that we are finished and that it should now BEQ (branch if equal to zero) to COMPARE, line 270.
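The loop just described can be traced in Python. This is a model of the logic only, not 6502 code: `print_prompt` is a hypothetical function, and a Python list takes the place of the ROM routine that puts a character on the screen. Each statement is marked with the instruction it imitates.

```python
TEXT = b"CODE?\x00"  # the table from line 80, ending in the zero delimiter

def print_prompt(table):
    """Walk the table with Y as the offset until the zero byte is loaded."""
    printed = []
    y = 0                       # LDY #0
    while True:
        a = table[y]            # LDA TEXT,Y  (effective address = TEXT + Y)
        if a == 0:              # BEQ COMPARE: leave when a zero was loaded
            break
        printed.append(chr(a))  # JSR to the character-output routine
        y += 1                  # INY
        # JMP LOOP happens here: control returns to the top of the while
    return "".join(printed), y

prompt, final_y = print_prompt(TEXT)
print(prompt, final_y)
```

Y ends at 5, so the last effective address is $035A + 5 = $035F, the cell that holds the delimiter.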
Here we load the accumulator with the code letters and put them into the previously defined (line 110) storage area within our tables. This time, rather than setting up a loop and a delimiter, we add the offset directly to the labels: STORAGE + 1, STORAGE + 2. At line 330 we again jump to a subroutine in BASIC ROM which will input a single letter from the human and leave it in the accumulator, returning from the subroutine to line 340. Here, we compare (CMP) this letter in the accumulator with the first letter in the STORAGE zone, a "P." If the accumulator does not match "P," then the instruction in line 350 (BNE, branch if not equal) takes effect and the computer is thrown back to START. If it was equal, we "fall through" to line 360 where the same comparison is done for "E." Any failure of equality causes a branch to START. If all three letters match, the instruction RTS (return from subroutine) puts us back into BASIC just beyond the SYS which threw us into the M.L. routine in the first place. SYS is merely a GOSUB to M.L. subroutines.
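The compare-and-branch logic can also be modeled in Python. This is a sketch under stated assumptions: `security_lock` is a hypothetical function, the keystrokes arrive as an ordinary string rather than from the ROM input routine, and the count of restarts exists only so the behavior can be inspected. If the keystrokes run out before the code is matched, Python raises StopIteration, whereas the real routine would simply keep waiting.

```python
def security_lock(keys, codeword="PET"):
    """Return how many times control was thrown back to START."""
    storage = list(codeword)  # STA STORAGE, STORAGE + 1, STORAGE + 2
    keys = iter(keys)
    restarts = 0
    while True:                         # START
        for expected in storage:
            typed = next(keys)          # JSR to the input routine
            if typed != expected:       # CMP, then BNE START
                restarts += 1
                break
        else:
            return restarts             # all three matched: RTS

print(security_lock("PET"), security_lock("PEXPET"))
```

A wrong letter at any position sends the user back to the first letter, so "PEX" followed by "PET" costs one restart before the lock opens.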