CompuCorder speech storage and output device. (evaluation) Bud Stolker.
When I first read the flyer announcing Computalker Consultants' newest product, CompuCorder, I thought back to that crisp winter evening in early 1979 when Oric, my Vector Graphic computer, spoke his first words. Installing the software had taken a month, and had exposed me for the first time to software patching and (shudder) assembly language.
Daily I called the Computalker lab across the continent, reporting on the progress of the previous night, and asking for new equates to assemble, new bytes to poke. My speech synthesizer just wouldn't speak. Desperately I hoped that I hadn't thrown away money on a technology that I might never comprehend, and on a device that might never work.
I needn't have worried. Like the Heathkit people, Computalker Consultants wouldn't let me fail. My progress was slow at first, then quicker as I grasped how the software worked. I remember vividly my sense of wonder when the mysterious black box mounted on my new circuit board finally cleared its electronic throat and spoke. "How do I sound, Boss?" it said. It was more of a mumble than a crisp question, but I imagined that I understood. What a proud papa was I that night!
Technology has come a long way since synthetic speech first appeared in the microcomputer world in the mid 70's. The development of reliable, inexpensive circuit boards for voice output by Votrax and Computalker Consultants amounted to a genuine breakthrough in both price and performance. The late 70's saw much of the hardware reduced in size from boards to chips. Today several manufacturers offer synthesizer chips, and devices ranging from elevators to automobiles are talking back to their operators.
Talks In Its Master's Voice
Now the makers of that original Computalker speech synthesizer [reviewed in Creative Computing, Sept.-Oct. 1978] have introduced CompuCorder, a new circuit that allows any S-100 computer to talk--or bellow or whisper--in its master's voice. The CompuCorder board, which retails for $295 plus $10 for CP/M-compatible driver software, can reproduce speech, as well as music and sound effects, with surprisingly high quality. By talking into a microphone, you can personalize the machine to speak in any voice, in any language, with any message. When Oric speaks now, he sounds just like The Boss. And once again I feel like a proud papa.
CompuCorder is essentially a solid-state tape recorder. It accepts sounds from a microphone or other audio input, stores them as data in random access memory (and on disk files if desired), and plays them back on demand through a conventional amplifier and speaker. The incoming sounds are converted to digital pulses by the on-board hardware, and stored in compressed format as individual bytes. For playback, the driver software sends the circuitry a byte at a time. The hardware converts the digital data back into analog waves which can then be amplified and sent to a speaker.
High-Fidelity Encoding Scheme
The encoding process, known as Continuously Variable Slope Delta modulation (CVSD), enables high fidelity reproduction of speech or other sounds. The concept is simple but powerful. As a person talks into a microphone, the mike generates high frequency electrical waves that may be seen on an oscilloscope. The idealized version of one cycle of these waves is the classic sine wave: a gradual slope up to a peak, down again and through the baseline to a bottom point, then back up again.
The CVSD technique closely approximates the patterns of speech by continuously correcting the voltage output of the board to mimic the original waveforms. The signal produced on playback hugs the slope of the original wave, dropping a notch when the slope rises higher than it should, or boosting the signal when it starts to fall below the value of the original speech curve. Each instruction becomes either an "up" or a "down," or in a digital computer, a high bit or a low bit.
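For readers who think better in code than in waveforms, the up/down idea can be sketched in a few lines of Python. This is only an illustration of the general CVSD principle, not the circuit on the CompuCorder board: the step sizes, the growth and decay factors, the three-bit run rule, and all of the names (SlopeTracker, encode, decode, pack_bits, test_tone) are assumptions chosen for the example.

```python
import math


class SlopeTracker:
    """Running estimate of the waveform, shared by encoder and decoder.

    All constants are illustrative assumptions, not CompuCorder values.
    """

    def __init__(self, min_step=0.01, max_step=0.3, growth=1.5, decay=0.8):
        self.estimate = 0.0
        self.step = min_step
        self.min_step = min_step
        self.max_step = max_step
        self.growth = growth
        self.decay = decay
        self.recent = []

    def update(self, bit):
        """Apply one "up" (1) or "down" (0) instruction; return the new estimate."""
        self.recent = (self.recent + [bit])[-3:]
        if len(self.recent) == 3 and len(set(self.recent)) == 1:
            # Three identical bits in a row: the slope is too steep, so widen the step.
            self.step = min(self.step * self.growth, self.max_step)
        else:
            # Bits are mixed: the estimate is hugging the curve, so narrow the step.
            self.step = max(self.step * self.decay, self.min_step)
        self.estimate += self.step if bit else -self.step
        return self.estimate


def encode(samples):
    """Turn a list of samples into one bit per sample."""
    tracker = SlopeTracker()
    bits = []
    for x in samples:
        bit = 1 if x > tracker.estimate else 0  # boost if we are below the curve
        bits.append(bit)
        tracker.update(bit)
    return bits


def decode(bits):
    """Rebuild an approximate waveform from the bit stream alone."""
    tracker = SlopeTracker()
    return [tracker.update(bit) for bit in bits]


def pack_bits(bits):
    """Pack 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the final byte with zeros
        value = 0
        for b in chunk:
            value = (value << 1) | b
        out.append(value)
    return bytes(out)


def test_tone(freq_hz=200, rate_hz=32000, seconds=1.0, amplitude=0.5):
    """Generate a sine wave to stand in for a voice signal."""
    count = int(rate_hz * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(count)]
```

Calling decode(encode(test_tone())) returns a waveform that stays close to the original sine, and pack_bits shows why one bit per sample still adds up quickly: a second of sound at 32,000 samples becomes 4,000 bytes.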
This technique can reproduce sounds with remarkable clarity, but at a price: it eats up lots of memory. At its highest bit rate (32 kilobits or 4K of memory per second), CompuCorder can only record 9.5 seconds of speech in a 48K CP/M system. And the speech files can be enormous. But in a computer with a 5mb hard disk, it is possible to record up to 20 minutes of continuous speech or 40 minutes with moderate frequency loss at the lowest (2K/second) data rate.
Other Methods Of Storing Speech
An alternative method of encoding speech is the one used by Texas Instruments in its Speak and Spell educational computer: linear predictive code (LPC). LPC-generated speech requires only a tenth as much memory as the CVSD method, but encoding the information in the first place requires a mainframe computer. In fact, LPC speech analysis on a microcomputer could require as much as a day's worth of computation per second of speech.
"Big deal," you may be saying, "I can record and store speech in my Apple for 40 bucks, and I don't need any extra hardware." That's true; The Voice from Muse records speech through the cassette port and outputs it to the built-in Apple speaker. But the technique used is different, and fidelity loss is inevitable. The Voice simply counts the number of times the waveform swings across the baseline from positive to negative, and then POKES the Apple speaker once for each boundary crossing. This form of frequency modulation ignores most of the speech information and produces a characteristically harsh rasp.
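A rough Python sketch shows why the baseline-crossing approach throws away so much. It is not the Muse program, just an illustration of the idea; the function names are made up for the example, and it treats every sign change (in either direction) as a crossing, which is a simplifying assumption.

```python
import math


def zero_crossings(samples):
    """Return the sample positions at which the waveform changes sign."""
    crossings = []
    previous_positive = samples[0] >= 0
    for i, x in enumerate(samples[1:], start=1):
        positive = x >= 0
        if positive != previous_positive:
            crossings.append(i)
            previous_positive = positive
    return crossings


def speaker_wave(crossings, length, start_high=True):
    """Rebuild a two-level wave by flipping the speaker at each crossing."""
    flips = set(crossings)
    level = 1 if start_high else -1
    wave = []
    for i in range(length):
        if i in flips:
            level = -level  # one "poke" of the speaker
        wave.append(level)
    return wave


def tone(amplitude, cycles=5, length=1000):
    """A sine wave standing in for a voice signal (illustrative only)."""
    return [amplitude * math.sin(2 * math.pi * cycles * n / length)
            for n in range(length)]
```

A shout (tone(1.0)) and a whisper (tone(0.1)) produce exactly the same list of crossings, so they play back identically. Only the timing of the crossings survives; loudness and the shape of the wave between crossings are lost.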
The Voice and several similar programs make use of an ingenious and inexpensive way to make a computer talk and they don't require a great deal of memory to operate. But without specially tuned supporting hardware, they just can't produce high fidelity.
CompuCorder, on the other hand, can detect and reproduce the fine, often redundant details of a speech signal that give it an indefinably rich quality.
System Can't "Understand" Speech
Voice recognition is, alas, impractical with the CompuCorder, even though it is clearly listening as you speak into the microphone. The CVSD technique generates a very compact coding, which effectively disguises such things as the ends of words, so there is no way to tell how long the word "hello" is, for example. This makes it very difficult to analyze the waveforms using a standard approach.
"Current methods of doing continuous speech recognition require tens of thousands of dollars worth of equipment," says Ron Anderson of Computalker. "I don't expect to see continuous recognition of unlimited vocabulary before the year 2000. It will take years of research and small improvements. Maybe it will have to wait for new hardware, like 100 MHz processor chips. But don't look for any breakthroughs soon."
The Japanese are having some success with speech recognition, but that is to be expected, according to Anderson. "The Japanese language lends itself phonetically to speech recognition by computer. It has a very regular structure, with only about 60 different syllables, while English has hundreds, with much more complex patterns of connected consonants. With a precise language like theirs, the solutions are almost trivial." Well, almost.
Developed For The Military
But the development of CompuCorder was guided by the need to reproduce speech, rather than to understand it. It was developed for the military for use in a battle game simulator. A computer could, for example, select pre-recorded messages for broadcast by walkie-talkie in response to changing conditions on the simulated battlefield. And CompuCorder can mimic anyone from a four-star general to the lowliest grunt.
There are other military applications, too: computerized air traffic control systems, cockpit instrument panels that vocally warn pilots of potential problems, and sophisticated tutoring machines with foreign language vocabularies stored on disk, to name just a few.
These are situations which demand a device that can do more than a traditional phoneme synthesizer. Speech output can be used effectively when machine operators are already overloaded with visual information, as is the case in the complex control room of a nuclear power station. When a large number of messages must be heard and understood the first time, the job calls for a high capacity random access stored speech device. A CompuCorder-equipped computer with 64K of memory and a 5mb hard disk can do such jobs as well as systems costing three times as much.
Because the capabilities of CompuCorder are greatly extended when used with a hard disk system, the manufacturer is promoting this device as a board-level component suitable for OEMs (Original Equipment Manufacturers). It is designed for folks who want to sell voice store-and-forward systems, paging systems, automatic announcing machines, and the like, where the presence of a high speed, high density disk drive is a given. But CompuCorder presents possibilities for the imaginative computer hobbyist with floppy disk drives as well.
A Singing Adventure Game?
Consider, for example, the radio amateur who wants an automatic repeater system. Or the hacker who wants to spice up his latest Adventure game with the creak of an opening door, the roar of an erupting volcano, or the siren song of a beautiful Lorelei. How about a really intelligent telephone answering machine, or a burglar alarm that can dial police and yell for help, or perhaps an alarm clock that sounds off with an appropriate reminder statement, selected from a repertoire of dozens--or hundreds--of messages?
All of these applications are possible with CompuCorder, but some will tax the computer--and its programmer--to the limit. The biggest problem is the enormous appetite of the device for memory. I have been working with CompuCorder for a month now, and have found it to perform adequately, given the constraints of my memory and disk capacity.
Variable Sampling Rate
The user must decide before he installs CompuCorder how much memory to allocate as a speech buffer. Four headers supplied with the system control the bit rate of the device, and therefore the fidelity and length of each message. Sample rates run from 10K to 32K bits per second. The higher the sample rate, the better the speech quality becomes, but the more memory must be dedicated as a speech buffer.
The 32K rate reproduces sounds as clear as you could want; the 10K rate is barely intelligible. For applications involving the telephone you would use the next-to-lowest rate, 16K bps, since Ma Bell limits her bandwidth anyway. Even so, in my 48K system, I was able to squeeze out only 19 seconds per message at this bit rate.
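The trade-off between bit rate and message length is simple arithmetic, sketched here in Python. The 38K speech buffer is not a published figure; it is inferred from the 9.5 seconds and 19 seconds quoted in this review, and treating K as 1,024 is likewise an assumption. The disk figure ignores directory and file overhead.

```python
K = 1024  # assumption: "K" means 1,024 for both bits and bytes


def bytes_per_second(bits_per_second):
    """One bit per sample, eight samples to the byte."""
    return bits_per_second / 8


def seconds_of_speech(storage_bytes, bits_per_second):
    """How long a message fits in the given amount of storage."""
    return storage_bytes / bytes_per_second(bits_per_second)


SPEECH_BUFFER = 38 * K    # inferred from the timings in this review, not a spec
HARD_DISK = 5 * K * K     # a 5-megabyte disk, ignoring file system overhead

if __name__ == "__main__":
    for rate in (32 * K, 16 * K):
        in_memory = seconds_of_speech(SPEECH_BUFFER, rate)
        on_disk = seconds_of_speech(HARD_DISK, rate) / 60
        print(f"{rate // K}K bps: {in_memory:.1f} s in memory, "
              f"about {on_disk:.0f} min on disk")
```

With these assumptions the memory figures come out at 9.5 and 19 seconds, and the disk figures land a little above 20 and 40 minutes, which is in line with the rounded numbers quoted for a hard disk system.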
The sampling rate is optimized for human speech, so don't get the impression that this is a poor man's digital sound studio. Really high fidelity music would require a higher bit rate. That could easily be achieved by changing a resistor or two on the removable headers, but again, as the sampling rate goes up, the length of the sound segment drops.
I would have preferred that the resistor headers be switch-selectable; instead, the user must remove the board from the computer and manually plug in the header of his choice. The assumption, I suppose, is that a user will stick with one bit rate for most of his applications.
Microphone And Amplifier Required
CompuCorder requires a quality amplifier to reproduce accurately the full range of the vocal tract. If the high frequency sibilants (sssss) don't come through, speech sounds a bit mushy. If the lows are cut off, speech sounds tinny. No amplifier is supplied with the system. This is not a major problem, though, since most people who have a computer probably have a high fidelity amp as well.
Two miniphone jacks on the board accept plugs for microphone input and amplifier output. Because the jacks are flush against the top of the board, I had a problem with cabling. My computer has a low-profile cabinet that would not close with cables connected to the CompuCorder. I went to the largest audio distributor in the city looking for right-angle miniphone plugs, but to no avail. When I am using the CompuCorder, therefore, I have to keep the lid of the computer open--an inelegant solution. I hope that on the next version of this board, Computalker Consultants will move the I/O jacks inboard.
CompuCorder occupies two consecutive ports on the computer bus, one for status and the other for data. The board is set up initially to use ports AC and AD hex. A dip switch allows for changing the port assignments, but such a change also requires modifying the software slightly.
CP/M Software Supplied
While the well documented software supplied with CompuCorder is easy enough to use, it does require some working knowledge of assembly language. Although my assembly language skills have not significantly improved since I patched in my original Computalker software three years ago, I had no particular difficulties.
The software consists of five machine language programs written for the Intel 8080 microprocessor and will therefore work on Z80s and 8085s as well. Each is assembled for use with the CP/M operating system, but the author provided a way to move the code easily from one operating system to another.
The input/output routines are contained in "universal I/O modules" that can be inserted into the source code before assembly. Computalker makes available drop-in modules for close to a dozen popular 8080-based systems. This is a smart approach to software portability, and one which I hope will catch on. Not everyone has or wants CP/M! I prefer the North Star Disk Operating System to CP/M, and was able to convert the main demo program to be North Star-compatible without trouble.
The main demonstration program, Corder, operates like a tape recorder. By typing R, you can record a speech sample. Since the program automatically allocates memory, it will not crash the computer if you talk too long. If you want to say only a word or two, you can type D for Done. Typing a P will play back the speech as often as you like.
Two other demo programs, Record and Speak, store sounds in diskfiles and retrieve them. Because CP/M accepts concatenated commands, creating and saving a speech file is as simple as typing RECORD MESSAGE, then speaking into the microphone. To retrieve the speech, the command is SPEAK MESSAGE. Because the diskfiles are potentially quite long, there may be a significant delay between typing the command and having it processed. My single density Shugart drives take as long as 15 seconds to load the Speak program and the Message file it needs. A Winchester disk would speed up the process considerably.
Computalker also provides subroutines for recording and speaking that can be used with any programming language that can call a machine language subroutine. I had no particular difficulty linking these programs to Basic, but it did require reassembling the subroutines to an unused corner of memory. I also had to reserve some space for a speech data buffer. A 48K system with an operating system, Basic, CompuCorder driver software and a speech buffer doesn't leave much room for anything else, so the Basic programs I wrote were necessarily very limited.
Any serious executive program would have to be written in assembly language and shoehorned into whatever space was available. This may or may not be a serious flaw, depending on the application at hand and the skills of the programmer. Come to think of it, my Computalker speech synthesizer uses lots of memory also: 22K just for the driver software and speech buffer in its most "intelligent" mode.
System Human-Engineered
I found this board extremely easy to use. The software worked on the first try, and the microphone (which the user must supply) is a natural as an input device. More important, using a mike eliminates the need to generate words or phonemes (pieces of words) through the software. Gone is the awkward build-a-word approach that required the programmer to work double duty as a phonetician. Now all you have to do is talk.
One problem with using a mike near the computer is the proximity of external, unwanted noise, both mechanical and electrical. The blower fan on the computer registered as white noise. My magnetic mike, sensitive to electrical fields, picked up hum near Oric's power transformer. A friend's condenser mike was less sensitive, but didn't sound as good.
I achieved best results by crawling under the far end of the computer table, cupping my hand around the mike and my mouth to acoustically seal them, and speaking softly yet distinctly. It was hard to give keyboard commands that way, but the sound quality was worth it. Different types of microphones would no doubt require other recording techniques.
In all, I am pleased with the performance of the board, and would not hesitate to recommend it to anyone who understands its limitations. Its full potential will not be realized unless you are willing to link it to a Winchester disk or perhaps a 5mb memory card.
For some folks, that isn't a problem. For others like myself, well, we can just keep waiting for those prices to fall ...
The Computalker people appear to be more interested in research and development than in marketing, so your local dealer may not be aware of this remarkable board. The Computalker staff will take direct product orders. I have found through experience that they support their customers after the sale with impressive expertise, courtesy, and prompt response. Their address is 1730 21st St., Santa Monica, CA 90404. (213)828-6546.
Products: Computalker Consultants Compucorder (synthesizer)