Q L   H A C K E R ' S   J O U R N A L
===========================================
      Supporting All QL Programmers
===========================================
#23                            January 1996

The QL Hacker's Journal (QHJ) is published by Tim Swenson as a service to the QL Community. The QHJ is freely distributable. Past issues are available on disk, via e-mail, or via the Anon-FTP server, garbo.uwasa.fi. The QHJ is always on the lookout for article submissions.

      QL Hacker's Journal
      c/o Tim Swenson
      5615 Botkins Rd
      Huber Heights, OH 45424 USA
      (513) 233-2178
      swensontc@mail.serve.com
      http://www.serve.com/swensont/


EDITOR'S FORUM

Have you ever sat in front of the word processor, looking at a blank screen, and not really known what to type? This is how I feel as I try to write this article. I always write this article last and use it to cover any loose thoughts that I might have from the other articles. But not this time. I only have two things I want to pass on.

QHJ Freeware. I believe that the proper response to the phrase "Why doesn't someone ..." is "Yes, why don't you." A related axiom is the phrase "Why isn't there ..." So when I said "why isn't there a QL Freeware distributor in North America", guess what my answer was.

QHJ Freeware is a service for North American QL and Z88 users. I am collecting QL and Z88 freeware files and making them available for QL and Z88 users (QDOS disks only). Since most QHJ readers do not reside in North America, I'll keep it brief by saying that if you are interested in what files I have, send me an SASE and I'll send you the details. The list of files is also available on my Web Page:

      http://www.serve.com/swensont/

The only cost of the freeware is return postage for the blank disks you send. I am not trying to compete with the European Freeware distributors, which is why I'm limiting it to North Americans only. For others, I can put any files you ask for on one of the QL FTP servers.
Plus, all QHJ Freeware files are available on QBOX-USA at 810-254-9878 (300-14.4 baud, 81N, 24 hours a day).

The other thing I wanted to talk about is C68 version 4.20a. It has just been made available on ftp.nvg.unit.no so I now have a copy. Even though it's dated 20 Nov 95, it's brand new to me. The biggest thing I've seen with this version (at least what was put on the FTP server) is that the source files are not included. There are three runtime disks (Main System Disk, Boot + Extras Disk, and Utilities Disk) and two documentation disks (General & Main Programs and Utilities & Libraries). Since I am not a big user of C68, I can't tell you what the biggest changes are from the last version. North American QL users can get the whole package from QHJ Freeware.

So until next time, enjoy the issue.


SOFTWARE REUSE AND SSB

Through work I get a number of computer related magazines, either sent directly to me or received by the office. This gives me a chance to scan articles on a variety of subjects, some of which I find applicable to my computing. I like to try to apply different programming concepts I glean from these articles to my personal programming. Not being a developmental programmer at work (I mostly hack a bit in Perl, C Shell, Awk, or HTML), I am not able to professionally utilize articles on Software Engineering, Object Oriented Programming, Computer Aided Software Development, etc. Issues dealing with programming teams do not apply well to a single programmer.

While reading one magazine, I read a series of articles on Software Reuse. Software Reuse has been touted as one of the key benefits of Object Oriented Programming. Software Reuse is the concept that the code you write can be easily used by another programmer. This concept has been well used in the form of libraries consisting of "canned" functions and procedures. A good example is the standard library of C, which is a collection of "reused" routines.
Proponents of OOP have claimed that OOP can take software reuse beyond just the reuse of functions and procedures, a claim the author of the series did not believe. He stated that developers have a tendency to develop from scratch for three reasons:

   - The initial cost of investigating someone else's work is often high.
   - There is uncertainty as to whether the code will work.
   - There is uncertainty as to who will fix the code if it does not work.

This sort of thing is still happening. There is so much source code available on the Internet or on CD-ROM, but many programmers still write the simple stuff from scratch.

After reading the series, I pondered how software reuse could be utilized in my programming. Since, by default, I almost always write in Structured SuperBasic, I thought about how reuse can be implemented using it, especially in comparison to straight SuperBasic.

As I mentioned above, the primary way to implement software reuse is the reuse of procedures and functions. These procedures and functions should be written as solid little boxes of code. I mean solid in that they should be well tested and error free. I mean little in that they should not try to accomplish too much. They should be designed to perform one task well.

Each procedure or function can be stored in a separate file and pulled into your code by using the #INCLUDE feature of SSB. Implemented this way, the actual code of the routines does not clutter up your "real" code and they can look like library routines. You can MERGE code in straight SuperBasic, but with line numbers you always have to worry about any MERGEd routines having the same line numbers as others. With SSB, this problem goes away.

With the proper use of #INCLUDE, the C language concept of static variables can be implemented. Static variables are those variables used by routines that are "remembered" between times the routine is called. In a way static variables behave like global variables.
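For readers who have not met the C feature being borrowed here, a minimal sketch of a static variable in C may help. The function name running_sum and the variable total are my own, picked only for illustration. Note one difference from the SSB approach: in C a static variable declared inside a function cannot be seen from outside it, while it still keeps its value between calls.

```c
/* Illustrative sketch only: a running total kept in a C static
   variable.  running_sum() and total are hypothetical names. */
int running_sum(int in_var)
{
    static int total = 0;   /* initialised once, remembered between calls */

    total = total + in_var;
    return total;
}
```

Calling running_sum(5) and then running_sum(3) returns 5 and then 8, because total is not reset on the second call.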
When storing a routine in a separate file, one would normally just have the function or procedure code in the file. By having one or more variables written in the file, but outside the definition (scope) of the function or procedure, these variables become global or static. This is only true if the #INCLUDE statement is used outside of function or procedure definitions. An example is this:

   var = 0

   DEFine FuNction sum( in_var )
      var = var + in_var
      RETURN var
   END DEFine

Now the value of var will be retained between calls to the function sum(). Try to name your static/global variables with names that are unlikely to be used in the main program. If someone else #INCLUDEs this function and uses the variable var in their own program, the effect is totally lost and the function will not work.


RESPONSE TO DAY_OF_WEEK
By John Southern

While playing around with the function on paper, I found what appears to be a slight error. I am sure that I will not be the first to see this, but I will attempt to explain anyway.

The year part of the function, Y + Y/4 - Y/100 + Y/400, is correct, taking into account leap years, centuries and leap centuries. The month part, however, 2*m + 3*(m+1)/5, is fine in most cases except for the way rounding occurs.

Consider 30 June 1995. We get:

   30+2*6+3*7/5+1995+1995/4-1995/100+1995/400
   30+12+4.2+1995+498.75-19.95+4.9875
   2524.9875

The Modulus 7 of this is 4.9875. Rounded up this gives 5, which equals Friday.

Now consider 2 July 1995. We get:

   2+2*7+3*8/5+1995+1995/4-1995/100+1995/400
   2+14+4.8+1995+498.75-19.95+4.9875
   2499.5875

The Modulus 7 of this is 0.5875. Rounded up this gives us 1, which equals Monday. This should be 0 for Sunday.
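The two sums worked above can be reproduced mechanically. The C fragment below is only a sketch of the original formula as it is worked on paper here (real-number division, modulus 7, then rounding the remainder up); it is not the corrected function. The name worked_day_index is hypothetical, and the 0 = Sunday numbering is taken from the text above.

```c
/* Sketch of the formula as worked by hand in the text (the version
   with the rounding problem).  Returns 0 = Sunday ... 6 = Saturday.
   worked_day_index is a hypothetical name used for illustration. */
int worked_day_index(int d, int m, int y)
{
    double total, rem;
    long   weeks;
    int    idx;

    total = d + 2.0 * m + 3.0 * (m + 1) / 5.0
          + y + y / 4.0 - y / 100.0 + y / 400.0;

    weeks = (long)(total / 7.0);     /* whole weeks in the total      */
    rem   = total - 7.0 * weeks;     /* the fractional "modulus 7"    */

    idx = (int)rem;                  /* round the remainder upwards   */
    if (rem > idx)
        idx = idx + 1;

    return idx % 7;
}
```

With the day, month and year figures used in the two sums, worked_day_index(30, 6, 1995) gives 5 (Friday) and worked_day_index(2, 7, 1995) gives 1 (Monday), matching the hand calculation, including the wrong answer for the Sunday.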
To correct this, could I suggest the following function instead:

   DEFine FuNction day_of_week$(d,m,y)
      LOCal a
      IF m=1 OR m=2 THEN
         m=m+12
         d=d-1
      END IF
      a = INT((d+2.6*m+y+y/4-y/100+y/400+0.8125) MOD 7)
      IF a = 0 THEN RETurn "Sunday"
      IF a = 1 THEN RETurn "Monday"
      IF a = 2 THEN RETurn "Tuesday"
      IF a = 3 THEN RETurn "Wednesday"
      IF a = 4 THEN RETurn "Thursday"
      IF a = 5 THEN RETurn "Friday"
      IF a = 6 THEN RETurn "Saturday"
   END DEFine

The 2.6*m is the offset required for each month because months are not made up of an exact number of weeks. The 0.8125 is the initial adjustment to set Sundays correctly.


ARCHIVE INDENT

I have always thought that Archive is one of the underrated programming languages for the QL. When I first looked at Archive, it reminded me a lot of dBase III. What has kept me from really programming in Archive has been not having a big need to write a database application and not knowing Archive well enough to get what I wanted done.

Granted, Archive is not the best database programming language for the QL, but it has two distinct things going for it: it is adequate for most database needs, and EVERY QL user has a copy. If you were going to write a database for other QLers to use, you would do it in Archive.

Recently I did want to create a database application: putting the Master Sinclair E-Mail list in Archive and writing some procedures to generate the lists. The problem I ran into is the knowledge factor I mentioned above. Joining two tables in Archive is not too obvious from reading the limited QL guide. Bill Cable has written a series of columns for UPDATE on Archive, but I have yet to get the energy to sit down and read them. To make it easier, Bill said that he is working on putting all of the columns in one booklet. For those who don't know Bill Cable, he sells QL software as Wood & Wind Computing, and is probably THE North American Archive programmer.
One small nagging thing about programming in Archive is that the _prg files are not indented. When editing procedures in Archive, everything is indented, making the code easy to read. When you save the file and go to print it out, it's all left justified. The following SuperBasic program takes in an Archive _prg file and indents it.

I've played with Archive and it will accept an indented file when you LOAD a procedure, but it still saves it as left justified. This means that you can create your code in another editor, load it into Archive to test, run it through the indenter, and still have code that will work in Archive.

Archive's native indentation style is a little off the norm of indenting. It uses a style like this:

   proc dummy
      ---------------
      ---------------
      ---------------
      endproc

The more accepted style of indenting is like this:

   proc dummy
      --------------
      --------------
      --------------
   endproc

By uncommenting two lines of code (marked in the code) and deleting the same two lines later in the code, the program will support the native Archive indenting style. You will also need to delete the two IF statements with "else" in them (where the program decides when to indent or unindent). Archive does not normally unindent an ELSE statement.

   ## Archive Indent
   ## This program will take an Archive program
   ## and produce an indented version of the
   ## program.

   NONE  = 0
   PLUS  = 1
   MINUS = 2

   OPEN #3,con_300x200a75x0_32
   BORDER #3,2,4
   PAPER #3,0 : INK #3,4 : CLS #3
   PRINT #3,"Enter input file: (_prg) "
   INPUT #3,in_file$
   PRINT #3,"Enter output file: "
   INPUT #3,out_file$

   ## assume all input files have a _prg ext.
   in_file$ = in_file$&"_prg"

   OPEN #4,in_file$
   OPEN_NEW #5,out_file$

   ## indent$ holds how many spaces to indent.
   ## it will grow and shrink with the indenting.
   indent$ = ""

   REPEAT loop
      indent = NONE
      IF EOF(#4) THEN EXIT loop
      INPUT #4,in$

      ## To use the same indent style as Archive does,
      ## uncomment these lines and delete/comment the
      ## same lines below.
      ##
      ## ## ignore any leading blanks
      ## temp = first_char(in$)
      ## PRINT #5,indent$;in$( temp TO )

      first$ = first_word$(in$)
      IF first$ = "endproc" THEN indent = MINUS
      IF first$ = "endif" THEN indent = MINUS
      IF first$ = "endwhile" THEN indent = MINUS
      IF first$ = "endall" THEN indent = MINUS
      IF first$ = "endcreate" THEN indent = MINUS
      IF first$ = "else" THEN indent = MINUS

      ## if minus then shrink indent$
      ## (<= 3 so an unbalanced "end" cannot slice past the start)
      IF indent = MINUS THEN
         IF LEN(indent$) <= 3 THEN
            indent$ = ""
         ELSE
            indent$ = indent$( 1 TO LEN(indent$)-3)
         END IF
      END IF

      ## ignore any leading blanks
      temp = first_char(in$)
      PRINT #5,indent$;in$( temp TO )

      IF first$ = "proc" THEN indent = PLUS
      IF first$ = "if" THEN indent = PLUS
      IF first$ = "while" THEN indent = PLUS
      IF first$ = "all" THEN indent = PLUS
      IF first$ = "create" THEN indent = PLUS
      IF first$ = "else" THEN indent = PLUS

      ## increase indent$ by 3 spaces
      IF indent = PLUS THEN indent$ = indent$ & "   "

   END REPEAT loop

   CLOSE #5
   CLOSE #4
   CLOSE #3
   STOP

   ## Function first_word$
   ## returns the first word of a$, ignoring leading blanks.
   DEFine FuNction first_word$(a$)
      LOCal x
      REPEAT loop1
         ## nothing left to test on a blank line
         IF LEN(a$) = 0 THEN RETURN ""
         IF a$(1) = " " THEN
            a$ = a$( 2 TO )
         ELSE
            EXIT loop1
         END IF
      END REPEAT loop1
      x = " " INSTR a$
      IF x = 0 THEN RETURN a$
      RETURN a$( 1 TO x-1)
   END DEFine

   ## Function first_char
   ## returns the location of the first non-white
   ## space character.
   DEFine FuNction first_char (a$)
      LOCal count
      count=1
      REPEAT loop2
         ## stop at the end of a line that is all blanks
         IF count > LEN(a$) THEN RETURN 1
         IF a$(count) = " " THEN
            count = count + 1
         ELSE
            RETURN count
         END IF
      END REPEAT loop2
   END DEFine first_char


[ Editor's Note - The next couple of articles are based upon articles I've read and thought I would pass on to other programmers. I found the articles interesting and hope you might. I don't necessarily agree with the view points of the different authors, but I thought their view points were worth hearing. I hope that these ideas will spark other ideas. ]


NOTES ON PROGRAMMING IN C

While surfing on the Net (as the media likes to hype) I ran into a site that had some essays on programming.
One, "Notes on Programming in C" by Robert Pike, had some interesting points.

Procedure Names

Procedure names should reflect what they do; function names should reflect what they return. Functions are used in expressions, often in things like IF's, so they need to read appropriately.

   IF (CHECKSIZE(x))

is unhelpful because we can't deduce whether CHECKSIZE returns true on error or non-error, instead

   IF (VALIDSIZE(x))

makes the point clear and makes a future mistake in using the routine less likely.

Complexity

Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost be self-evident. Data structures, not algorithms, are central to programming.

Programming with Data

Algorithms, or details of algorithms, can often be encoded compactly, efficiently and expressively as data rather than, say, as lots of IF statements. The reason is that the complexity of the job at hand, if it is due to a combination of independent details, can be encoded. A classic example of this is parsing tables, which encode the grammar of a programming language in a form interpretable by a fixed, fairly simple piece of code. Finite state machines are particularly amenable to this form of attack, but almost any program that involves the 'parsing' of some abstract sort of input into a sequence of some independent 'actions' can be constructed profitably as a data-driven algorithm.

One of the reasons data-driven programs are not common, at least among beginners, is the tyranny of Pascal. Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored-program computer.
Code and data are the same, or at least they can be. How else can you explain how a compiler works?

Function Pointers

Another result of the tyranny of Pascal is that beginners don't use function pointers. (You can't have function-valued variables in Pascal.) Using function pointers to encode complexity has some interesting properties.

Some of the complexity is passed to the routine pointed to. The routine must obey some standard protocol - it's one of a set of routines invoked identically - but beyond that, what it does is its business alone. The complexity is distributed.

There is this idea of a protocol, in that all functions used similarly must behave similarly. This makes for easy documentation, testing, growth and even making the program run distributed over a network - the protocol can be encoded as remote procedure calls.

I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The O-O languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra.

Combining data-driven programs with function pointers leads to an astonishingly expressive way of working, a way that, in my experience, has often led to pleasant surprises. Even without a special O-O language, you can get 90% of the benefit for no extra work and be more in control of the result. I cannot recommend an implementation style more highly. All the programs I have organized this way have survived comfortably after much development - far better than with less disciplined approaches. Maybe that's it: the discipline it forces pays off handsomely in the long run.
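
To make the combination of data-driven code and function pointers a little more concrete, here is a small C sketch. It is my own illustration, not code from the essay: the names binary_op, op_entry, op_table and apply_op are all hypothetical. The "program" is really the table; the code that uses it is a fixed, simple loop, and every routine in the table obeys the same protocol (two ints in, one int out).

```c
#include <stddef.h>
#include <string.h>

/* The protocol: every operation takes two ints and returns an int. */
typedef int (*binary_op)(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* The data: each entry pairs a command name with a function pointer. */
struct op_entry {
    const char *name;
    binary_op   fn;
};

static const struct op_entry op_table[] = {
    { "add", op_add },
    { "sub", op_sub },
    { "mul", op_mul }
};

/* The fixed piece of code: look the name up in the table and call
   through the pointer.  Returns 0 on success, or -1 if the name is
   not in the table (in which case *result is left untouched). */
int apply_op(const char *name, int a, int b, int *result)
{
    size_t i;
    size_t n = sizeof op_table / sizeof op_table[0];

    for (i = 0; i < n; i++) {
        if (strcmp(op_table[i].name, name) == 0) {
            *result = op_table[i].fn(a, b);
            return 0;
        }
    }
    return -1;
}
```

Adding a new operation means writing one small function and adding one line to the table; apply_op itself, and anything that calls it, does not change. That is the sense in which the complexity is distributed.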


DATABASES AND BITMAPS

I read an interesting article that introduced the idea of using bitmaps in implementing a database. I don't know if this concept will be of use to any QL programmers, but I found it interesting just to know.

Conventional databases use B-trees and hashing to implement indexes. B-trees are just that, trees. An index is created in a tree structure, lumping like records together. Hashing uses a mathematical formula to distribute the records into an array, so that when you need to find them again, you just plug the record into the formula and it takes you to the proper place in the array.

These structures work well in transaction-oriented systems (find, edit, delete), but they begin to have problems when the number of conditions in a query increases. The more ANDs and ORs, the longer these structures take to search.

Enter the bitmap index. Bitmap indexes store all possible values for a field in a bitmap. Bitmaps can be ANDed and ORed quickly to return a result. Queries can be made with many conditions in the index without even looking at the actual data in the records.

Here is an example of a bitmap index. Using the illustration below, note that each record (1-7) has a 4-bit index which holds the color of the car. With four possible values for color, the appropriate bit is turned on for each color.

             1  2  3  4  5  6  7
            ---------------------
   Blue      1  0  0  0  0  1  0    Cars 1 and 6 are blue.
   Orange    0  1  0  0  0  0  0    Car 2 is orange.
   Green     0  0  0  0  1  0  1    Cars 5 and 7 are green.
   Red       0  0  1  1  0  0  0    Cars 3 and 4 are red.

Below is the query for "which cars are red, made in 1995, and cost $19,000." Each record has a bitmap index for each of the values. To get the result the three bitmaps are ANDed and the result shows in the Result column: Car #4. ANDing three 7-bit numbers can be done very quickly.
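
As a rough sketch of why this is fast, the same query can be written in C with one unsigned integer per bitmap. This is only my illustration of the idea, not how any particular database does it. The macro and function names (CAR, RED_CARS, YEAR_1995, PRICE_19K, match_all) are hypothetical, and I have assumed that bit n-1 of each value stands for car n. The red bits come from the color table above; the 1995 and $19K bits are read off the query table in this article.

```c
/* Bit n-1 of a bitmap stands for car n (cars 1 to 7). */
#define CAR(n)     (1u << ((n) - 1))

/* One bitmap per indexed value, copied from the article's tables. */
#define RED_CARS   (CAR(3) | CAR(4))    /* cars 3 and 4 are red      */
#define YEAR_1995  (CAR(4) | CAR(7))    /* cars 4 and 7 are 1995s    */
#define PRICE_19K  (CAR(2) | CAR(4))    /* cars 2 and 4 cost $19,000 */

/* A query with three conditions is a single expression: a bit
   survives only if it is set in all three bitmaps. */
unsigned match_all(unsigned a, unsigned b, unsigned c)
{
    return a & b & c;
}
```

match_all(RED_CARS, YEAR_1995, PRICE_19K) leaves only the bit for car 4 set, and no record was read to find that out. An OR query works the same way with | in place of &.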

          Red      1995     $19K    Result
         ----------------------------------
      1    0        0        0        0
      2    0        0        1        0
      3    1        0        0        0
      4    1  and   1  and   1   =    1
      5    0        0        0        0
      6    0        0        0        0
      7    0        1        0        0

Bitmap indexes have two major problems: they are hard to update and they can't handle high cardinality. Cardinality is the number of different values each field can have. If you have a database of cities, there are many possible values for the city, whereas in this car example there are only so many different colors a car can be. Bitmaps can take substantially longer to update than a plain B-tree.

As with most computer solutions, the best solution may be a combination of all structures. Using B-trees, hashing, and bitmap indexes in conjunction may work out to be the best approach.


DESKJET PRINT FILTER

Once I got my DeskJet printer, I wanted to be able to produce nice looking text output. Quill and other word processors support only monospace fonts. The word processors that do support proportionally spaced fonts are not exactly cheap. Being one to write my own print filters, I set out to write a print filter that will support word wrap and proportionally spaced fonts. dj_print is such a print filter.

I could not find any documentation giving me the physical size of the various characters, so I had to do some testing and a bit of guessing. These numbers are used in an array to add up how big a line is before word wrapping it. I have only done these calculations for 12 point, so the program will not work well for other sizes. Centering of text still needs to be worked on, esp. with different sizes. So this all turns out that the program only supports regular text at 12 point and centered text at 14 point (for nice headers).

As for the commands that dj_print supports, they are listed in the procedure command. Like my other print filters, dj_print uses dot commands, like .BD. They should be on a line by themselves.

Remember, this code is a project in the works.
In other words, it still needs some work, but will pretty much do the job.

   ** DJ Print  ssb 2.0
   ** This version supports proportional fonts

   OPEN #5,con_250x150A75x75_32
   PAPER #5,4 : INK #5,0 : BORDER #5,7
   CLS #5
   PRINT #5,"Enter Name of File to Print :"
   INPUT #5," ";infile$

   ** Default print size is 12 Point
   ** Default type is CG Times
   **    CG Times     ESC (s1p12v0s0b4101T
   **    Univers      ESC (s1p12v0s0b52T
   **    Times Roman  ESC (s1p12v0s0b5T
   **    Helvetica    ESC (s1p12v0s0b4T

   DIM word_array$(30,40)
   DIM letter_width(126)

   RESTORE
   FOR x = 32 TO 126
      READ temp
      letter_width(x) = temp
   NEXT x

   ** Letter Size Data for 12 Point letters
   DATA 3.8,5,5.5,12.5,8.3,14.2,14.2,3.8,6.2,6.2,9,14.2
   DATA 3.8,5.5,3.8,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3
   DATA 8.3,8.3,4.5,4.5,14.2,14.2,14.2,8.3,16.6,12.5,11.1,11.1
   DATA 12.5,11.1,10,14.2,14.2,5.5,7.1,12.5,11.1,16.6,12.5,12.5
   DATA 10,12.5,12.5,9,11.1,14.2,12.5,14.2,12.5,12.5,12.5,5,8.3
   DATA 5,8.3,8.3,3.8,8.3,9,7.1,9,8.3,5.8,7.6,9,4.5,4.5,8.3,4.5
   DATA 14.2,9,9,9,9,6.2,6.2,5,9,8.3,14.2,8.3,8.3,7.1,8.3,6.6
   DATA 8.3,14.2

   OPEN_IN #4,infile$
   OPEN #3,ser1

   ** Set Program Defaults
   YES = 1
   NO = 0
   type$="4101T"
   cde = NO
   center = NO
   margin = 500
   line_length = 0

   PRINT #3,CHR$(27);"(s1p12v0s0b";type$
   PRINT #3,CHR$(27);"&a10L";

   REPeat loop

      ** Get the next line of text.  EOF is tested before
      ** the INPUT so the last line of the file is not lost.
      IF EOF(#4) THEN EXIT loop
      INPUT #4,in$

      ** If the line is blank then output a CR/LF to end
      ** the last paragraph and another LF to create
      ** a space between the two paragraphs.
      IF LEN(in$) = 0 THEN
         PRINT #3,CHR$(13)
         PRINT #3,""
         line_length = 0
         END REPeat loop
      END IF

      ** Dot commands are handled by procedure command
      IF LEN(in$) >=3 AND in$(1) = "." THEN
         command
         END REPeat loop
      END IF

      ** If the line is to be Centered
      IF center = YES THEN
         word_size = 0
         ** Find out how wide line is
         FOR z = 1 TO LEN(in$)
            word_size=word_size+letter_width(CODE(in$(z)))
         NEXT z
         ** Determine how much to space over line
         temp = INT(((margin - word_size)/2)/3.8)
         ** Print the right number of spaces
         FOR z = 1 TO temp
            PRINT #3," ";
         NEXT z
         PRINT #3,in$;CHR$(13)
         END REPeat loop
      END IF

      ** If the line is Program Code
      IF cde = YES THEN
         ** do no processing on text
         PRINT #3,in$;CHR$(13)
         END REPeat loop
      END IF

      char = 1
      word_count = 1

      ** Split out all the words in a line into an array.
      ** Leading spaces and extra spaces between words are
      ** ignored.
      REPEAT word
         IF char > LEN(in$) THEN EXIT word
         char$ = in$(char)

         ** Take out leading spaces
         IF char$ = " " AND char=1 THEN
            ** If in$ ends in a space
            IF char+1 > LEN(in$) THEN EXIT word
            in$ = in$(char+1 TO )
            char = 1
            END REPEAT word
         END IF

         ** Find end of word by finding the next space
         IF char$ = " " THEN
            word_array$(word_count) = in$(1 TO char-1)
            IF char+1 > LEN(in$) THEN
               in$ = " "
            ELSE
               in$ = in$( char+1 TO )
            END IF
            word_count = word_count + 1
            char = 1
            ** Start again at the first character of what is
            ** left, so extra spaces between words are dropped
            END REPEAT word
         END IF

         ** Find end of word by the end of the string
         IF char = LEN(in$) THEN
            word_array$(word_count) = in$
            word_count = word_count + 1
            EXIT word
         END IF

         char = char + 1
      END REPEAT word

      ** Now go through each word
      FOR x = 1 TO word_count - 1
         word_size = 0
         word$ = word_array$(x)
         FOR z = 1 TO LEN(word$)
            word_size = word_size + letter_width(CODE (word$(z)))
         NEXT z
         IF line_length + word_size > margin THEN
            PRINT #3,CHR$(13)
            line_length = 0
         END IF
         PRINT #3,word$;" ";
         line_length = line_length + word_size + 3.8
      NEXT x

   END REPeat loop

   PRINT #3,CHR$(27);"&l0H"
   CLOSE #3
   CLOSE #4
   CLOSE #5

   DEFine PROCedure command

      cmd$ = in$(1 TO 3)

      ** Page Break
      IF cmd$=".pb" OR cmd$=".PB" THEN PRINT #3,CHR$(12);

      ** Bold
      IF cmd$=".bd" OR cmd$=".BD" THEN PRINT #3,CHR$(27); "(s3B";
      IF cmd$=".bo" OR cmd$=".BO" THEN PRINT #3,CHR$(27); "(s0B";

      ** Underline (fixed and floating)
      IF cmd$=".ul" OR cmd$=".UL" THEN PRINT #3,CHR$(27); "&d1D";
      IF cmd$=".uf" OR cmd$=".UF" THEN PRINT #3,CHR$(27); "&d3D";
      IF cmd$=".uo" OR cmd$=".UO" THEN PRINT #3,CHR$(27); "&d";CHR$(64);

      ** Italics
      IF cmd$=".it" OR cmd$=".IT" THEN PRINT #3,CHR$(27); "(s1S";
      IF cmd$=".io" OR cmd$=".IO" THEN PRINT #3,CHR$(27); "(s0S";

      ** Large Letters (14 Point)
      IF cmd$=".lg" OR cmd$=".LG" THEN PRINT #3,CHR$(27); "(s1p14v0s0b";type$;
      IF cmd$=".lo" OR cmd$=".LO" THEN PRINT #3,CHR$(27); "(s1p12v0s0b";type$;

      ** CG Times
      IF cmd$=".cg" OR cmd$=".CG" THEN
         type$="4101T"
         PRINT #3,CHR$(27);"(s1p12v0s0b";type$;
      END IF

      ** Univers
      IF cmd$=".uv" OR cmd$=".UV" THEN
         type$="52T"
         PRINT #3,CHR$(27);"(s1p12v0s0b";type$;
      END IF

      ** Times Roman
      IF cmd$=".tr" OR cmd$=".TR" THEN
         type$="5T"
         PRINT #3,CHR$(27);"(s1p12v0s0b";type$;
      END IF

      ** Helvetica
      IF cmd$=".hv" OR cmd$=".HV" THEN
         type$="4T"
         PRINT #3,CHR$(27);"(s1p12v0s0b";type$;
      END IF

      ** New Paragraph
      IF cmd$=".pp" OR cmd$=".PP" THEN
         PRINT #3,CHR$(13)
         line_length = 0
      END IF

      ** Tabs
      IF cmd$=".tb" OR cmd$=".TB" THEN
         ** We hope there is a number after .tb
         tab = in$(5 TO )
         IF tab = 1 THEN
            PRINT #3,CHR$(9);
         ELSE
            FOR z = 1 TO tab
               PRINT #3,CHR$(9);
            NEXT z
         END IF
      END IF

      ** Center Text
      IF cmd$=".ct" OR cmd$=".CT" THEN center=YES
      IF cmd$=".co" OR cmd$=".CO" THEN center=NO

      ** Program Code
      IF cmd$=".pc" OR cmd$=".PC" THEN
         PRINT #3,CHR$(27);"(s0p10h0s0b3T";
         cde=YES
      END IF
      IF cmd$=".po" OR cmd$=".PO" THEN
         PRINT #3,CHR$(27);"(s1p12v0s0b";type$;
         cde=NO
      END IF

   END DEFine command


STRIPHTML_C

With the popularity of the World Wide Web, more and more information is being formatted in HTML, the "language" of the Web. Since HTML is pure ASCII this is not a problem for people that don't have Web Browsers. But HTML commands can make text look very convoluted. The following C program will read a file with HTML commands, strip them out, and print out only the text information.
Since HTML commands all start with a less-than sign ( < ) and end with a greater-than sign ( > ), stripping out the HTML is relatively easy to do: as you read in characters and echo them to the output file, turn off echoing when you see a < and turn it back on when you see a >. The end result will be a regular ASCII file. It may not be formatted to look nice, but the HTML stuff will be gone.

   /* striphtml_c
      This program takes in a file with HTML commands and
      outputs a file with the HTML commands stripped out.
   */

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   #define YES 1
   #define NO  0

   int main(void)
   {
      char file1[30], file2[30];
      FILE *fd1, *fd2;
      int  c, html;     /* c is an int so it can hold EOF */

      /* fgets() is used in place of gets() so that a long
         name cannot overrun the buffer; the trailing newline
         is then removed. */
      printf("Enter Input File Name : \n");
      if (fgets(file1, sizeof file1, stdin) == NULL) exit(1);
      file1[strcspn(file1, "\n")] = '\0';

      printf("Enter Output File Name: \n");
      if (fgets(file2, sizeof file2, stdin) == NULL) exit(1);
      file2[strcspn(file2, "\n")] = '\0';

      fd1 = fopen(file1,"r");
      if (fd1 == NULL) {
         printf("Did not open file: %s\n",file1);
         exit(1);
      }
      fd2 = fopen(file2,"w");
      if (fd2 == NULL) {
         printf("Did not open file: %s\n",file2);
         exit(1);
      }

      /* html is YES while we are inside a < ... > command */
      html = NO;
      while (( c = getc(fd1)) != EOF) {
         if ( html == NO ) {
            if ( c == '<' ) html = YES;
            else putc(c,fd2);
         }
         if ( html == YES ) {
            if ( c == '>' ) html = NO;
         }
      }
      fclose(fd1);
      fclose(fd2);
      return 0;
   }

Example HTML file:

Title of Document

Level 1 Text

This is a paragraph. This is a paragraph. This is a paragraph that will wrap in the browser until the end paragraph marker.

The Paragraph marker is also used to create a blank line of text.