Threading Disassembler
Elmer N. Keil
This machine language disassembler follows a program's branches and jumps, rather than taking a linear path. Written in Microsoft BASIC for the Commodore PET, it will also work without changes on the 64 or on a VIC (with at least 16K memory expansion). With limited conversion, it should work on any 6502-based computer.
Most assemblers and disassemblers proceed through a machine language program in a linear fashion from the lowest to the highest address, which is fine as long as the program contains few jumps and branches.
However, when trying to find your way through complex routines such as the built-in ROM, a linear disassembler is almost useless. For example, the warm start entry point sets a flag in page zero; loads the accumulator and Y register; and jumps (JSR) about 1700 bytes away, only to jump immediately to another location 2400 bytes away. It settles down for ten instructions before going into several sets of compare-test-branch instructions which lead off in all directions. It can be frustrating to list long routines, only to find that the first instruction goes somewhere else. And trying to find your way back after all this jumping can be a real challenge. One solution is to use a threading disassembler, which follows the execution thread as it weaves through various parts of memory and keeps track of where to return after each jump.
An Efficient Structure
It is a common practice to place initialization at the end of a program so that an interpretive BASIC will not have to continually search past a block of one-time code. This program pushes that concept one step further by placing the main loop at the beginning and then jumping far back into the program and gradually working its way forward. This was an experiment, and the results are not clearly evident. The program stays ahead of my printer and can scroll listings off the CRT at a moderate rate (use the RVS key on the PET, or CTRL key on the VIC and 64, to slow the scrolling).
The program starts by initializing some variables and asking the user to select various options. Since this section is only used once, it has been placed at the end of the program so the main program loop will execute more efficiently.
The main loop gets an instruction, checks to see if it changes the program flow, decodes and formats it for printing, and then follows the flow. A dummy stack is maintained to keep track of the return points in much the same manner as the hardware stack.
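The idea of following the thread with a dummy stack can be sketched in a few lines of Python. This is only an illustration and not part of the BASIC program: the function name thread, the tiny DEMO memory image, and the rule that every other opcode is one byte long are all assumptions made to keep the example short. The return address handling (save PC+2 on JSR, resume at the saved value plus one on RTS) mirrors lines 1140 and 1230 of the listing.

```python
# Hypothetical sketch: follow JSR/RTS/JMP through a tiny 6502 image
# using a software return stack, in the spirit of the main loop.
JSR, RTS, JMP, NOP = 0x20, 0x60, 0x4C, 0xEA

def thread(memory, pc, max_steps=20):
    """Return the addresses visited while following the execution thread."""
    stack = []                      # plays the role of the SS array
    visited = []
    for _ in range(max_steps):
        op = memory[pc]
        visited.append(pc)
        if op == JSR:
            target = memory[pc + 1] + 256 * memory[pc + 2]
            stack.append(pc + 2)    # same value line 1140 saves
            pc = target
        elif op == RTS:
            if not stack:           # no stack entry for this RTS: stop
                break
            pc = stack.pop() + 1    # same arithmetic as line 1230
        elif op == JMP:
            pc = memory[pc + 1] + 256 * memory[pc + 2]
        else:
            pc += 1                 # simplification: all other opcodes are 1 byte
    return visited

# Demo image: JSR $0008 at 0, a subroutine (NOP, RTS) at 8, a final RTS at 3.
DEMO = [NOP] * 16
DEMO[0:3] = [JSR, 0x08, 0x00]
DEMO[3] = RTS
DEMO[9] = RTS

print(thread(DEMO, 0))  # prints [0, 8, 9, 3]
```

The listing order a linear disassembler would give (0, 3, 8, 9) differs from the order of execution shown here, which is the whole point of threading.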
Once started, the program will loop continuously until stopped. The loop contains a sequence which terminates the program when the Q (quit) key is pressed.
User's Choice
It is impossible for the program to know how to interpret the conditions associated with conditional branches. The program will display the branch destination and ask the user if the branch should be followed.
Although it is often possible to look at the preceding instructions and determine what conditions should exist, sometimes all you can do is take your best guess and see where it leads you.
The Start-Up Routines
Initialization, called by the GOSUB at line 40, clears the screen and homes the cursor for neatness. The pseudo stack pointer (SP), the stack array (SS), and the pseudo program counter (PC) are allocated. The array G0$ and the strings G1$, G2$, and GG$ are filled with the 6502 mnemonics (the mnemonic BAD represents invalid opcodes). Variable TP is set to the highest addressable memory location.
Since dividing by a power of two is the same as shifting a binary number to the right, variables B3 (set to 4) and B6 (set to 32) are used as divisors to shift the third or sixth bit, respectively, into the low-order bit position. LB (set to 256) is used to scale the high byte before adding it to the low byte when generating an address.
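A short Python sketch shows why these constants work. The values 4, 32, and 256 come from lines 2240 and 2250 of the listing; the function names are made up for this illustration.

```python
# Constants as set in the listing's initialization (lines 2240-2250).
B3, B6, LB = 4, 32, 256

def shift_right_by_division(value, divisor):
    """Integer division by a power of two discards the low-order bits."""
    return value // divisor

def make_address(low_byte, high_byte):
    """Combine two bytes, stored low byte first, into a 16-bit address."""
    return low_byte + LB * high_byte

# Dividing by 32 gives the same result as shifting right five places.
assert all(shift_right_by_division(v, B6) == v >> 5 for v in range(256))
# Dividing by 4 gives the same result as shifting right two places.
assert all(shift_right_by_division(v, B3) == v >> 2 for v in range(256))

print(hex(make_address(0x34, 0x12)))  # prints 0x1234
```

BASIC on these machines has no shift operator, so INT(X/32) is the usual substitute.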
Hex Lookup Table
HX$ is a lookup table of valid hexadecimal numbers. Variable OP is set to the screen device number, but it may be changed to the printer device number depending on the answer to the first question, PRINTER OUTPUT? OP is used only once in the opening of file number PR, and all writes to the listing device are to file PR. (VIC users may want to change some of the PRINT statements to better fit their 22-column screens.)
The second question, TITLE?, lets you identify the listing at some future time.
Although the program was intended as a threading disassembler, you can also use it as a standard block disassembler, depending on your answer to question three, BLOCK DISASSEMBLY? The program sets BD=0 for threading and BD=1 for block disassembly.
Select Decimal Mode
The normal input/output format for numbers is hexadecimal, but you can select decimal mode in answer to the fourth question, DECIMAL MODE? The program sets variable HX=1 for hexadecimal mode and HX=0 for decimal mode.
Finally comes the last and most important question, STARTING LOCATION? Respond with a decimal or hexadecimal number according to the mode selected. The program then prints the remaining header information and reminds you to press Q to quit at any time.
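In hexadecimal mode the starting location is checked one character at a time against the HX$ table (lines 2590-2660 of the listing). The Python sketch below follows the same steps. The name parse_hex is invented here, and returning None in place of printing a message and asking again is a simplification.

```python
HEX_DIGITS = "0123456789ABCDEF"   # same contents as HX$

def parse_hex(text):
    """Return the value of a hex string of up to four digits, or None if invalid."""
    if len(text) > 4:             # the listing answers "TOO BIG-TRY AGAIN"
        return None
    value = 0
    for ch in text:
        index = HEX_DIGITS.find(ch)
        if index < 0:             # the listing answers "INVALID HEX CHAR"
            return None
        value = value * 16 + index
    return value

print(parse_hex("C000"))  # prints 49152
```

Limiting the input to four digits guarantees that the result can never exceed 65535, the value held in TP.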
The main loop consists of lines 80-170. Line 80 looks for a keypress and will ignore anything other than a Q (including no keypress); a Q will break the loop and terminate the program at line 970. Line 90 PEEKs a byte from the location pointed to by the pseudo program counter (PC) and then calls a subroutine at 280 to convert the PC to a hexadecimal string. Line 100 combines the hex string with a decimal equivalent string and some blank spacers into PC$ for later printing.
Lines 110-130 calculate the three parts of the opcode that was PEEKed in line 90, and line 170 branches according to the type of opcode.
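The split performed by lines 110-130 can be tried out in Python. The function name split_opcode and the GROUP_ENTRY tuple are labels chosen for this sketch; the arithmetic and the four target line numbers are taken from lines 110-170 of the listing.

```python
B3, B6 = 4, 32   # divisors from the listing

# Targets of the ON(P3+1)GOTO in line 170, indexed by P3.
GROUP_ENTRY = (700, 1520, 1670, 1930)

def split_opcode(opcode):
    """Split an opcode byte into its 3-bit, 3-bit, and 2-bit fields."""
    p1 = opcode // B6          # top three bits: which instruction
    rest = opcode - p1 * B6
    p2 = rest // B3            # middle three bits: addressing mode
    p3 = rest - p2 * B3        # bottom two bits: opcode group
    return p1, p2, p3

p1, p2, p3 = split_opcode(0xA9)      # LDA immediate
print(p1, p2, p3, GROUP_ENTRY[p3])   # prints 5 2 1 1520
```

For $A9 the group field sends the program to line 1520, where P1=5 selects LDA from G1$ and P2=2 selects the immediate-mode routine.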
Converting The Opcodes
Lines 700-1470 process all of the branches, jumps, and other opcodes which change program flow. This is the heart of the program. Lines 700-710 do a table lookup for the opcode mnemonic and verify that it is a valid opcode. Line 720 tests for conditional branches and jumps to 1330 to process any it finds.
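The branch destination worked out at lines 1330-1350 is a signed offset from the address of the next instruction. Here is the same calculation as a Python sketch. The function name is hypothetical, and, like the listing, it does not wrap addresses at the ends of memory.

```python
def branch_target(pc, offset_byte):
    """Destination of a relative branch located at pc with the given operand byte."""
    # Operand bytes above 127 stand for negative offsets (two's complement).
    offset = offset_byte - 256 if offset_byte > 127 else offset_byte
    # The offset is counted from the instruction after the two-byte branch.
    return pc + 2 + offset

print(hex(branch_target(0x1000, 0x05)))  # prints 0x1007
print(hex(branch_target(0x1000, 0xFB)))  # prints 0xffd
```

An operand of $FE gives a branch to itself, which is a handy check when verifying the arithmetic by hand.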
All other opcodes which change program flow are detected in line 730, which could transfer control to line 760. Line 740 branches to the appropriate routine to process and format the operand according to the addressing mode indicated by the opcode.
Line 760 further checks for opcodes which change program flow; these are processed at line 1010. JMP is detected at line 770 and processed at line 820.
Creating The Mnemonics
Once the program flow has been processed, the opcodes are processed. The mnemonics are obtained by a table lookup, the addressing mode is determined, and the operand is formatted accordingly. Lines 350 and 370 represent subroutines for fetching one-byte and two-byte operands respectively.
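The table lookup packs eight three-letter mnemonics into one string and picks one out by position, as line 1520 does with MID$. A Python sketch of the group one lookup follows; the string is copied from line 2200, and the function name is an assumption for illustration.

```python
G1 = "ORAANDEORADCSTALDACMPSBC"   # group one mnemonics, as in line 2200

def group_one_mnemonic(p1):
    """Return the three-letter mnemonic selected by the opcode's top three bits."""
    start = p1 * 3                # MID$ counts from 1, Python slices from 0
    return G1[start:start + 3]

print(group_one_mnemonic(5))  # prints LDA
```

Packing the names this way uses one string per group where a 64-element array would otherwise be needed, which matters on a machine with a few kilobytes of free memory.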
Lines 250-310 represent a subroutine for converting the operand value to a character string, and lines 430-650 may add supplemental information to the operand string as well as generating a comment string CM$ to identify addressing mode. Line 210 prints the collection of information about this particular instruction and then jumps to line 80 to start the loop for the next instruction.
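The hexadecimal conversion at lines 280-310 repeatedly divides by 16 and looks up each remainder in HX$. The Python sketch below follows the same steps. The name to_hex is chosen for this illustration only.

```python
HEX_DIGITS = "0123456789ABCDEF"   # same contents as HX$

def to_hex(value):
    """Convert an integer to a hex string by repeated division by 16."""
    sign = ""
    if value < 0:                 # line 280 handles negative branch offsets
        value = -value
        sign = "-"
    digits = ""
    while True:
        quotient = value // 16
        digits = HEX_DIGITS[value - quotient * 16] + digits
        value = quotient
        if value == 0:            # always produces at least one digit
            break
    return sign + digits

print(to_hex(65535))  # prints FFFF
```

Each new digit is added to the front of the string because the remainders come out low digit first.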
If you don't have a printer, line 210 can be changed by dropping the blank spacer B2$ and the addressing mode comment CM$ from the end of the PRINT command. This will shorten the print line to under 40 characters and let you view more disassembled instructions at one time. Use the RVS or CTRL key to slow down the scrolling.
Threading Disassembler
Refer to "COMPUTE!'s Guide To Typing In Programs" before entering this listing.
40 GOSUB 2070 :rem 173
60 REM{10 SPACES}MAIN LOOP STARTS HERE
:rem 174
80 GET Z$:IF Z$=Q$ GOTO 970 :rem 152
90 B1=PEEK(PC):A=PC:GOSUB 280 :rem 193
100 PC$=RIGHT$(BL$+STR$(PC),5)+RIGHT$(BL$
+A$,6)+"{3 SPACES}" :rem 232
110 P1=INT(B1/B6):A=B1-P1*B6 :rem 33
120 P2=INT(A/B3) :rem 115
130 P3=A-P2*B3 :rem 227
150 REM{3 SPACES}ANALYZE OP CODE :rem 72
170 ON(P3+1)GOTO 700,1520,1670,1930
:rem 28
190 REM{3 SPACES}PRINT A DISASSEMBLED LIN
E :rem 228
210 PRINT#PR,PC$;OP$;B2$;LEFT$(ND$+BL$,14
);B2$;CM$:GOTO 80 :rem 53
230 REM{3 SPACES}CONVERT OPERAND :rem 163
250 IF HX=1 GOTO 280 :rem 7
260 A$=STR$(A):RETURN :rem 3
280 ZZ$="":A$="":IF A<0 THEN A=-A:A$="-"
:rem 241
290 J=INT(A/16):ZZ$=MID$(HX$,A-(J*16)+1,1
)+ZZ$ :rem 25
300 A=J:IF A>0 GOTO 290 :rem 167
310 A$=A$+ZZ$:RETURN :rem 184
330 REM{3 SPACES}GET OPERAND :rem 99
350 A=PEEK(PC+1):PC=PC+2:GOSUB 250:RETURN
:rem 224
370 A=PEEK(PC+1)+LB*PEEK(PC+2):PC=PC+3:GO
SUB 250:RETURN :rem 44
400 REM{3 SPACES}ADDRESSING MODES:rem 212
410 REM{8 SPACES}ZERO PAGE + INDEX
:rem 121
430 GOSUB 350:ND$=A$+",X":CM$="ZERO PAGE,
INDEX X":GOTO 210 :rem 2
450 GOSUB 350:ND$=A$+",Y":CM$="ZERO PAGE,
INDEX Y":GOTO 210 :rem 6
470 REM{8 SPACES}ZERO PAGE :rem 220
490 GOSUB 350:ND$=A$:CM$="ZERO PAGE":GOTO
210 :rem 25
510 REM{8 SPACES}ABSOLUTE + INDEX:rem 124
530 GOSUB 370:ND$=A$+",X":CM$="ABSOLUTE,I
NDEX X":GOTO 210 :rem 7
550 GOSUB 370:ND$=A$+",Y":CM$="ABSOLUTE,I
NDEX Y":GOTO210 :rem 11
570 REM{8 SPACES}ABSOLUTE :rem 223
590 GOSUB 370:ND$=A$:CM$="ABSOLUTE":GOTO
{SPACE}210 :rem 30
610 REM{8 SPACES}IMMEDIATE :rem 10
630 A=PEEK(PC+1):PC=PC+2:GOSUB 280
:rem 202
640 ND$="#"+A$:CM$="IMMEDIATE" :rem 130
650 GOTO 210 :rem 103
670 REM{7 SPACES}GROUP ZERO OP CODES
:rem 91
680 REM{8 SPACES}(SOME MOSTECH GROUP 3)
:rem 218
700 OP$=MID$(G0$(P1),P2*3+1,3) :rem 25
710 IF OP$=BD$ GOTO 1970 :rem 219
720 IF P2=4 GOTO 1330:{5 SPACES}REM
{13 SPACES}8 BRANCHES :rem 183
730 IF P1<4 GOTO 760:{6 SPACES}REM
{13 SPACES}SPECIAL FUNCTION :rem 117
740 ON(P2+1)GOTO 630,490,1720,590,1930,43
0,1720,530 :rem 56
760 IF P2=0 GOTO 1010:{5 SPACES}REM
{12 SPACES}BRK,JSR,RTI,RTS :rem 110
770 IF OP$="JMP" GOTO 820: REM{12 SPACES}
JMP :rem 48
780 ON(P2+1)GOTO 1930,490,1720,590,1930,4
30,1720,530 :rem 112
800 REM{4 SPACES}JUMPS HANDLED HERE
:rem 31
820 B1=PEEK(PC+1)+LB*PEEK(PC+2):A=B1
:rem 35
830 GOSUB 250:ND$=A$:CM$=BL$ :rem 33
840 IF(BD=1)AND(P1=2) THEN PC=PC+3:GOTO 1
170 :rem 176
850 IF P1=2 THEN PC=B1:GOTO 1170 :rem 202
860 ND$="( " + ND$ + " )" :rem 118
870 B1=PEEK(B1) + LB*PEEK(B1+1):A=B1:GOSU
B 250 :rem 220
880 PRINT#PR:PRINT#PR,"*** ENCOUNTERED IN
DIRECT JUMP" :rem 54
890 PRINT#PR,"{2 SPACES}THRU ";ND$;"
{2 SPACES}TO ";A$ :rem 89
900 IF(BD=1) THEN PC=PC+3:GOTO 1170
:rem 153
910 PRINT:PRINT"ENCOUNTERED INDIRECT JUMP
":PRINT" THRU ";ND$;" TO ";A$:rem 253
920 PRINT:PRINT"IS THIS VALID ?":INPUT A$
:rem 229
930 IF LEFT$(A$,1)=YA$ THEN PC=B1:GOTO117
0 :rem 54
940 PRINT#PR :rem 239
950 PRINT:PRINT"DO YOU WANT TO CONTINUE ?
":INPUT A$ :rem 118
960 IF LEFT$(A$,1)=YA$ THEN GOSUB 2320:GO
TO 80 :rem 220
970 CLOSE PR:END :rem 201
990 REM{5 SPACES}HANDLES{2 SPACES}BRK,JSR
,RTI, AND RTS :rem 146
1010 ON(P1+1)GOTO 1020,1120,1060,1210
:rem 92
1020 A=PC:GOSUB 250:PRINT#PR:PRINT#PR,"**
**{2 SPACES}BREAK AT ";A$ :rem 239
1030 PRINT:PRINT"ENCOUNTERED BREAK AT ";A
$ :rem 50
1040 GOTO 940 :rem 155
1060 A=PC:GOSUB 250:PRINT#PR:PRINT#PR,"**
**{2 SPACES}RTI AT ";A$ :rem 125
1070 PRINT:PRINT"ENCOUNTERED RTI AT ";A$
:rem 192
1080 GOTO 940 :rem 159
1100 REM{33 SPACES}STACK{2 SPACES}(JSR)
:rem 92
1120 A=PEEK(PC+1) + LB*PEEK(PC+2):rem 240
1130 LC=PC:IF(BD=1) GOTO 1150 :rem 50
1140 SP=SP+1:SS(SP)=PC+2 :rem 166
1150 PC=A:GOSUB 250:ND$=A$:CM$=BL$
:rem 152
1160 IF(BD=1) THEN PC=LC+3 :rem 136
1170 PRINT#PR,"-----":GOTO 210 :rem 114
1190 REM{33 SPACES}UNSTACK (RTS) :rem 18
1210 IF(BD=1) THEN PC=PC+1:GOTO 1240
:rem 192
1220 IF SP<1 GOTO 1270 :rem 103
1230 PC=SS(SP)+1:SP=SP-1 :rem 167
1240 PRINT#PR,"-----" :rem 106
1250 ND$=BL$:CM$=BL$:GOTO 210 :rem 80
1270 A=PC:GOSUB 250:PRINT#PR:PRINT#PR,"**
* RTS AT ";A$;" - STACK EMPTY"
:rem 17
1280 PRINT:PRINT"NO STACK ENTRY FOR RTS A
T ";A$ :rem 29
1290 GOTO 940 :rem 162
1310 REM{5 SPACES}BRANCHES - REL ADDR
:rem 26
1330 A=PEEK(PC+1) :rem 170
1340 IF A>127 THEN A=A-LB :rem 25
1350 B1= PC+2+A:ND$="*":IF A=>0 THEN ND$=
"*+" :rem 224
1360 GOSUB 250:ND$=ND$+A$:CM$=BL$ :rem 49
1370 A=B1:GOSUB 250:ND$=LEFT$(ND$+BL$,7)+
RIGHT$(BL$+A$,7) :rem 147
1380 A=PC:GOSUB 250 :rem 46
1390 PRINT :rem 90
1400 IF(BD=1) GOTO 1470 :rem 158
1410 PRINT OP$;"-- CONDITIONAL BRANCH ENC
OUNTERED" :rem 13
1420 PRINT" FROM ";A$;" TO ";ND$ :rem 127
1430 PRINT:PRINT"DO YOU WANT TO FOLLOW TH
E BRANCH ?" :rem 110
1440 INPUT A$ :rem 190
1450 IF LEFT$(A$,1)=YA$ THEN PC=B1:GOTO 1
170 :rem 100
1460 IF LEFT$(A$,1)=Q$ GOTO 970 :rem 71
1470 PC=PC+2:GOTO 210 :rem 146
1500 REM{6 SPACES}GROUP ONE OP CODES
:rem 38
1520 OP$=MID$(G1$,P1*3+1,3) :rem 120
1530 IF (P1=4)AND(P2=2) THEN OP$=BD$:GOTO
1970 :rem 205
1540 ON(P2+1)GOTO 1580,490,630,590,1620,4
30,550,530 :rem 55
1560 REM{8 SPACES}(INDIRECT,X) ADDRESSING
:rem 187
1580 GOSUB 350:ND$="( "+A$+",X )":CM$="IN
DEXED INDIRECT":GOTO 210 :rem 243
1600 REM{8 SPACES}(INDIRECT),Y{2 SPACES}A
DDRESSING :rem 183
1620 GOSUB 350:ND$="( "+A$+" ),Y":CM$="IN
DIRECT INDEXED":GOTO 210 :rem 239
1650 REM{9 SPACES}GROUP TWO OP CODES
:rem 68
1670 OP$=MID$(G2$,P1*3+1,3) :rem 127
1680 IF P1<4 GOTO 1870{10 SPACES}REM
{11 SPACES}SHIFTS AND ROTATES :rem 2
1690 ON(P2+1)GOTO 630,490,1710,590,1830,1
740,1770,1810 :rem 215
1710 OP$=MID$(GG$,(P1-4)*3+1,3) :rem 65
1720 ND$=BL$:CM$=BL$:PC=PC+1:GOTO 210
:rem 75
1740 IF P1<6 GOTO 450 :rem 32
1750 IF P1>5 GOTO 430 :rem 32
1770 OP$=MID$(GG$,P1*3+1,3) :rem 149
1780 IF OP$=BD$ GOTO 1970 :rem 19
1790 GOTO 1720 :rem 212
1810 IF P1=5 GOTO 550 :rem 31
1820 IF P1>5 GOTO 530 :rem 31
1830 OP$=BD$:GOTO 1970 :rem 186
1850 REM{10 SPACES}SHIFTS AND ROTATES
:rem 120
1870 ON(P2+1)GOTO 1830,490,1890,590,1830,
430,1830,530 :rem 169
1890 ND$=BL$:CM$=BL$:PC=PC+1:GOTO 210
:rem 83
1910 REM{5 SPACES}VOID GROUP CODE:rem 137
1930 OP$=BD$:GOTO 1970 :rem 187
1950 REM{5 SPACES}INVALID OP CODE:rem 116
1970 ND$=BL$:CM$="BAD OP CODE" :rem 102
1980 Z$="{2 SPACES}":FOR I=0 TO 10
:rem 172
1990 A=PEEK(PC+I):GOSUB 280:Z$=Z$+A$
:rem 37
2000 NEXT :rem 1
2010 PRINT#PR:PRINT#PR,PC$;Z$;" HEX"
:rem 161
2020 PC=PC+1:GOTO1170 :rem 191
2050 REM{22 SPACES}INITIALIZATION:rem 211
2070 CL$=CHR$(147):PRINTCL$:{2 SPACES}REM
{11 SPACES}CLEAR SCREEN AND HOME CUR
SOR :rem 64
2080 SP=0:DIM SS(50):{9 SPACES}REM
{11 SPACES}POINTER AND STACK:rem 210
2090 PC=0:{20 SPACES}REM{11 SPACES}PROGRA
M COUNTER :rem 33
2110 DIM G0$(7):{14 SPACES}REM{11 SPACES}
OP CODES :rem 236
2120 G0$(0)="BRKBADPHPBADBPLBADCLCBAD"
:rem 245
2130 G0$(1)="JSRBITPLPBITBMIBADSECBAD"
:rem 62
2140 G0$(2)="RTIBADPHAJMPBVCBADCLIBAD"
:rem 29
2150 G0$(3)="RTSBADPLAJMPBVSBADSEIBAD"
:rem 70
2160 G0$(4)="BADSTYDEYSTYBCCSTYTYABAD"
:rem 144
2170 G0$(5)="LDYLDYTAYLDYBCSLDYCLVLDY"
:rem 164
2180 G0$(6)="CPYCPYINYCPYBNEBADCLDBAD"
:rem 88
2190 G0$(7)="CPXCPXINXCPXBEQBADSEDBAD"
:rem 98
2200 G1$="ORAANDEORADCSTALDACMPSBC"
:rem 181
2210 G2$="ASLROLLSRRORSTXLDXDECINC"
:rem 33
2220 GG$="TXATAXDEXNOPTXSTSXBADBAD"
:rem 45
2230 TP=65535:{16 SPACES}REM{11 SPACES}ME
MORY ADDRESS LIMIT :rem 44
2240 B3=4:B6=32:{14 SPACES}REM{11 SPACES}
SHIFTS OP CODE RIGHT :rem 41
2250 LB=256:{18 SPACES}REM{11 SPACES}LEFT
BYTE MULTIPLIER :rem 181
2260 BL$="{14 SPACES}":YA$="Y":BD$="BAD":
B2$="{6 SPACES}" :rem 78
2270 HX$="0123456789ABCDEF":Q$="Q":rem 51
2280 OP=3:{20 SPACES}REM{11 SPACES}CRT DE
VICE RETURN WITHOUT GOSUB :rem 38
2290 PRINT"DO YOU WANT PRINTER OUTPUT ?":
INPUT A$ :rem 235
2300 IF LEFT$(A$,1)=YA$ THEN OP=4:
{5 SPACES}REM: PRINTER DEVICE RETURN
WITHOUT GOSUB :rem 176
2310 PR=5:OPEN PR,OP :rem 179
2320 PRINT:PRINT"WHAT IS A GOOD TITLE FOR
THIS ?":INPUT A$ :rem 168
2330 BD=0 :rem 187
2340 PRINT#PR:PRINT#PR :rem 167
2350 PRINT:PRINT"DEFAULT IS TO FOLLOW THE
PROGRAM THREAD :rem 8
2360 PRINT"DO YOU WANT A BLOCK DISASSEMBL
Y :rem 48
2370 INPUT Z$:IF LEFT$(Z$,1)<>YA$ GOTO 24
00 :rem 85
2380 BD=1:PRINT#PR,"{2 SPACES}BLOCK DISAS
SEMBLY OF":PRINT#PR,".. ";A$:rem 245
2390 GOTO 2410 :rem 206
2400 PRINT#PR,"{2 SPACES}THREADING DISASS
EMBLY OF":PRINT#PR,"{3 SPACES}";A$
:rem 143
2410 PRINT#PR :rem 25
2420 PRINT"DEFAULT IS HEX MODE":PRINT"DO
{SPACE}YOU WANT TO USE DECIMAL ?"
:rem 215
2430 HX=1:INPUT A$ :rem 6
2440 IF LEFT$(A$,1)=YA$ THEN HX=0:PRINT"D
ECIMAL MODE SELECTED" :rem 90
2450 PRINT"DISASSEMBLY TO START AT LOCATI
ON ?" :rem 58
2460 GOSUB 2560:PC=A:IF PC>TP GOTO 2450
:rem 166
2470 A=PC:GOSUB 250:PRINT#PR,"STARTING LO
CATION =";A$ :rem 205
2480 PRINT#PR :rem 32
2490 PRINT#PR,"LOC{12 SPACES}OP{5 SPACES}
OPERAND" :rem 23
2500 PRINT#PR :rem 25
2510 PRINT:PRINT" PRESS Q TO STOP AT ANY
{SPACE}TIME":PRINT :rem 154
2520 RETURN :rem 169
2540 REM{13 SPACES}SUBROUTINE TO GET STAR
TING LOCATION :rem 7
2560 IF HX=1 GOTO 2590 :rem 115
2570 INPUT A:RETURN :rem 185
2590 A=0:INPUT A$:IF LEN(A$)>4 THEN PRINT
"TOO BIG-TRY AGAIN":GOTO2590 :rem 16
2600 OK=1:FOR I=1 TO LEN(A$):Z$=MID$(A$,I
,1) :rem 91
2610 BAD=1:FOR J=1TO16:IF Z$<>MID$(HX$,J,
1) GOTO 2630 :rem 140
2620 BAD=0:A=A*16+J-1 :rem 91
2630 NEXT:IF(BAD=0)THEN NEXT:GOTO 2650
:rem 6
2640 PRINT:PRINT"INVALID HEX CHAR":OK=0:N
EXT :rem 40
2650 IF OK=1 THEN RETURN :rem 115
2660 GOTO 2590 :rem 215