Simulations: put the real world in your computer. Barry Keating.
Simulations are models of one sort or another of real systems; they allow users to conduct trial-and-error experiments to predict the behavior of the actual system over time or to gain a better understanding of the behavior of the real system (a system is simply an assembly of interacting components and processes). When we observe the output of the trial-and-error experiments of the simulation (or model), it is very much like observing the real system.
The simplest simulations to understand are actually iconic models--physical representations of actual objects. These models have some (but not all) of the characteristics of the systems they represent; they have the most essential characteristics. A model airplane is a good example of an iconic model; the model airplane may look very much like a scaled-down version of the actual airplane, but it usually lacks some of the operating parts of the real airplane (e.g., landing gear, control surfaces, communications equipment). The model airplane, however, even though it lacks many of the features of the real thing, may be quite accurate in behaving like the real airplane in a wind tunnel. In fact, many wind tunnel tests of aircraft and airfoil design are not done on full-scale aircraft but rather on models.
For the aircraft designer the equations describing the aerodynamics of the airplane could be solved to predict the behavior of the aircraft under various conditions, or because those equations are probably difficult to formulate and solve, the scale model may be built and tested in the wind tunnel as an alternative. Wind tunnel tests of iconic models are physical simulations.
In some ways computer simulation is not much different from simulation using iconic models. We often use computer simulation when a mathematical solution to a problem is either impossible or difficult to formulate. The advantage to using a simulation from the point of view of the user is that the likely results of particular actions can be determined prior to actually trying the implementation. A simulation user can test several alternatives and choose the one that gives the best results; proposed solutions and policies can be compared in just a few minutes of computer time, while observation of real life results might take years to accomplish.
Iconic models, like the airplane model, are built because they are expected to behave like the real thing in most instances. Radio control modelers know that for the most part their aircraft behave with startling realism, often fooling passersby who believe they have seen actual aircraft. But radio control modelers also know that their aircraft have some peculiarities not found in their full-scale counterparts. Phugoid oscillation, the tendency of an aircraft to first zoom upward, stall, and descend rapidly over and over again, is much more pronounced in scale models than in actual aircraft. Modelers know this and adjust for it when flying their models; they realize the model is only an essential representation of reality, not a perfect representation. Likewise, computer simulations are only an essential representation of the reality they model.
The basics of simulation are incredibly simple, but actual applications, ranging from air traffic control to financial forecasting, can be quite complex. Computer simulations are mathematical models rather than iconic models like the model airplane; in these computer-based abstract simulations, sets of equations and mathematical relationships stand for the quantities and characteristics of the systems being modeled. The solutions to the equations form the output of the model and can be used to predict the behavior of the real system. Most often, the output is simply text or tables, but some simulations (even microcomputer simulations) use graphic output as a better way of describing results.
Reasons for Using Simulation
When a system cannot be studied directly, it may be studied using simulation because:
* The necessary resources to observe the actual system may not be available (e.g., to build and test many prototype aircraft could be too expensive in practice).
* A precise mathematical solution may not be possible to develop, or it might take too long to develop.
* It might be impossible to observe the results on an actual system (a materials supply system for a continuously operating process, for instance, cannot be used to test different supply rates).
* There may not be enough time to "wait and see" results from an actual system; speeded-up results may require the analyst to "telescope" time (e.g., to wait for acid rain to possibly eliminate the giant redwood forests may be an ineffective way to study the effects of such fallout on the forest).
Difficulties with Simulation
There are also instances in which simulations as a problem solving technique or as a teaching tool are probably inappropriate:
* Simulation does not yield exact answers. It deals with situations in which there is some uncertainty and so "answers" are approximations (in some cases, approximations that are not close enough to be useful).
* Creating a computer simulation of some situations may be quite expensive, and the model may be out of date by the time it is completed.
Simulation Basics
All simulation models are abstractions of the systems which they represent, and to build a simulation the user must decide which characteristics of the real system are essential to the model and thus must be taken into account.
The two branches of simulation in common use on the microcomputer (and mainframes as well) are discrete event simulation and continuous simulation.
A discrete event simulation is a system constructed by defining the events where changes in the state of the system may occur. The model becomes dynamic by producing changes in the state of the system according to some time-ordered sequence. Queuing (waiting-in-line) problems are most often simulated as discrete event simulations: Suppose a bank has a single teller window; will the line at the window grow in length or hover about some particular length?
Continuous simulations describe the behavior of a system with a set of equations so that the system changes continuously with respect to time. The simulation may consist of algebraic, difference, or differential equations and in such a way is able to change continuously with time. An example of continuous simulation would be a model of an automobile front end suspension system in which the dynamics of running over various curbs, rocks, and potholes could be examined.
While discrete event simulations are characterized by large blocks of time during which nothing happens, continuous simulations assume that there is no instant in which nothing is changing (e.g., the suspension of the automobile is constantly changing as the tire rolls over new terrain).
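To make the idea of a continuous simulation concrete, here is a minimal sketch in Python (not one of the article's Applesoft listings). It assumes the suspension can be approximated as a single mass on a spring and damper and steps the resulting difference equations through time in small increments. The function name, the parameter values, and the 5 cm curb are all illustrative assumptions, not data from any real automobile.

```python
def simulate_suspension(mass=250.0, stiffness=16000.0, damping=1500.0,
                        bump_height=0.05, dt=0.001, duration=3.0):
    """Step a one-wheel mass-spring-damper model through time.

    Every default value is an illustrative assumption.
    Returns a list of (time, body_height) pairs, heights in meters.
    """
    position = 0.0   # body height relative to its resting height
    velocity = 0.0   # body vertical speed
    history = []
    steps = int(round(duration / dt))
    for step in range(steps + 1):
        t = step * dt
        # Assumed road profile: the tire climbs a curb half a second in.
        road = bump_height if t >= 0.5 else 0.0
        history.append((t, position))
        # Newton's second law: the spring pulls the body toward the road
        # height while the damper resists the body's motion.
        acceleration = (stiffness * (road - position) - damping * velocity) / mass
        velocity += acceleration * dt   # update velocity first (semi-implicit Euler)
        position += velocity * dt       # then advance position with the new velocity
    return history


history = simulate_suspension()
peak = max(height for _, height in history)
print("peak body height (m):", round(peak, 4))
print("final body height (m):", round(history[-1][1], 4))
```

Unlike the bank-teller case, something changes at every time step here, so the program advances the clock in fixed small increments rather than jumping from event to event.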
Monte Carlo Simulations
Among the most frequently used kinds of discrete event simulations is the Monte Carlo simulation. The name dates back to World War II mathematicians John von Neumann and Stanislaw Ulam, who were trying to solve a problem at the Los Alamos Scientific Laboratory.
The problem they were working on was an extremely complicated one to answer and involved finding how far neutrons would travel through various materials. A trial-and-error solution would have been expensive and time consuming (there was a war on, and they needed the answers quickly). Their suggested solution was the equivalent of using a roulette wheel to determine step-by-step the probabilities of separate events and then merge them into a composite picture which gave them an approximate solution. At Los Alamos, von Neumann gave the secret work the code name "Monte Carlo," and this successful tool retains that name today.
Monte Carlo models have features that allow random events to be generated internally. In Basic, with the use of the RND (random) function, Monte Carlo simulations can easily be run on a microcomputer and are quite simple to construct.
Consider the Applesoft Basic simulation of tossing a coin in Listing 1. Line 130 generates 0's and 1's randomly, and we arbitrarily assign the occurrence of 0 to a "head" and the occurrence of a 1 to a "tail." Figure 1 shows the output to a single run of the program simulating ten tosses of the coin; each subsequent run would produce results which could be different from the six heads and four tails in our trial run. As we simulate many tosses of the coin, the number of heads and tails would approach the 50%-50% we would expect of a fair coin (change the number in line 120 to increase the number of tosses).
Of course we knew that a fair coin had exactly a 50% chance of giving a head on each toss, but what if we had not known the theoretical solution? That is where simulation can be of value; we could count up the results of thousands of actual coin tosses or we could simulate them in a few seconds on a microcomputer. The actual result in either case will rarely be exactly 50% heads and 50% tails, but the observed proportions will tend to approach those "true" probabilities as the number of simulated tosses increases. We have then performed a Monte Carlo simulation using a powerful tool that can be applied to many business situations, logistics problems, scheduling studies, and system design situations.
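For readers without the Applesoft listing at hand, the same experiment can be sketched in Python. This is an illustration of the idea rather than a transcription of Listing 1; the function name toss_coins and the seed value are arbitrary choices made here.

```python
import random


def toss_coins(tosses, rng):
    """Simulate fair-coin tosses; 0 stands for a head and 1 for a tail."""
    heads = 0
    for _ in range(tosses):
        if rng.randint(0, 1) == 0:   # randint includes both endpoints
            heads += 1
    return heads, tosses - heads


rng = random.Random(1985)   # fixed seed (arbitrary) so a run can be repeated
for tosses in (10, 100, 10000):
    heads, tails = toss_coins(tosses, rng)
    percent_heads = round(100.0 * heads / tosses, 1)
    print(tosses, "tosses:", heads, "heads,", tails, "tails,", percent_heads, "% heads")
```

Short runs can stray well away from an even split; the longer runs should settle closer to 50% heads, which is the convergence described above.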
Consider an actual situation in which simulation proved useful but in which the technique used was virtually identical to the coin toss situation. When the Dallas-Ft. Worth Airport was being constructed there was some question about how to construct the baggage handling facilities. Planners knew the approximate schedule of landings to be expected, the baggage capacity of the various aircraft, and other relevant pieces of information. They wished to build the luggage handling facilities in such a way as to minimize customer waiting at some reasonable cost.
A simulation model proved to be an ideal way to examine the effect of various luggage handling configurations on customer waiting time. With the model, planners could vary the arrival pattern of aircraft and see the results in passenger waiting time or they could vary the configuration of the aircraft landing (inserting many large aircraft one after another, for instance) and examine the likely waiting times. Clearly, this approach to planning facilities is superior to guessing at the results; of course, the results are only as good as the information fed into the simulation regarding aircraft arrivals, aircraft luggage capacity, occupancy rates, and so on.
An Example
Our example of simulation will be a discrete event simulation (a Monte Carlo simulation); we will examine bank customers arriving and being served by an automatic teller machine (ATM). Customers arrive at the ATM, wait for service if the machine is in use, are served, and then depart.
Customers arriving in the system when the ATM is in use wait in a single line in front of the machine. The arrival times of the customers and their service times are drawn from a probability distribution which we believe accurately describes the bank's customers. Our objective in running the simulation is to determine both how often a customer must wait longer than three minutes to be served and the average time a customer spends in line.
Because this simulation involves two instances of randomness in series (first the customer's arrival time and second his service time), we call this a two-stage or multiple-phase simulation. The coin toss simulation was a single-stage simulation, while the Dallas-Ft. Worth Airport simulation would obviously have been a many-phase simulation.
Our bank is unsure how many of its customers would use such a device and therefore how many of the ATMs to install. You, the manager, feel that customers would be quite annoyed at waiting longer than three minutes for service at the teller machine, and you suggest that a simulation of one teller machine might indicate whether the purchase of a second machine is necessary.
Gathering the Information. An analysis of the arrival pattern of 100 customers at an ATM at a branch of the downtown bank allowed the construction of the interarrival time frequency distribution in Table 1. Observations of 100 customers actually using the automated teller machines at the branch revealed the service time frequency distribution in Table 2.
The last column in each of these tables represents the set of assigned numbers (like the 0 for heads and 1 for tails) we will use to represent a particular category. For instance, in Table 1, 18% of the numbers between 0 and 99 (the numbers 0 through 17) were assigned to represent an arrival interval of 1 minute; 17% of the numbers (18 through 34) were assigned to represent an arrival interval of 2 minutes, and so on. When a particular random number is generated in our simulation, we will compare that number with the assigned numbers in column 4 to determine either when the customer is arriving (Table 1) or how long the customer uses the ATM (Table 2).
Running the Simulation. The form shown in Table 3 was constructed to allow the simulation to be run by hand. The form takes into account that the manager wants to look at both the arrival pattern of customers (which involves randomness) and the service time accounted for by each customer (which also involves randomness).
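The assigned-number scheme amounts to a running total of the percentages. The Python sketch below shows one way to build the ranges and look a random number up in them. Only the first two rows of the table (18% for one minute, 17% for two minutes) are taken from the text; the remaining rows are hypothetical values invented so that the percentages sum to 100, and the names used are likewise illustrative.

```python
from bisect import bisect_right

# (arrival interval in minutes, percent of observations).
# Rows 1 and 2 match the text; rows 3 through 6 are HYPOTHETICAL.
INTERARRIVAL_PERCENT = [(1, 18), (2, 17), (3, 25), (4, 20), (5, 12), (6, 8)]


def build_upper_bounds(table):
    """Return, for each row, one more than the last number assigned to it."""
    bounds = []
    running = 0
    for _, percent in table:
        running += percent
        bounds.append(running)
    if running != 100:
        raise ValueError("percentages must sum to 100")
    return bounds


def lookup(table, bounds, number):
    """Map a random number from 0 to 99 to the row whose range contains it."""
    return table[bisect_right(bounds, number)][0]


bounds = build_upper_bounds(INTERARRIVAL_PERCENT)
for number in (0, 17, 18, 34, 35):
    print(number, "->", lookup(INTERARRIVAL_PERCENT, bounds, number), "minute interval")
```

With these bounds, the numbers 0 through 17 map to a one-minute interval and 18 through 34 map to two minutes, just as in the assignment described above.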
To run the simulation, a set of random numbers between 0 and 99 is generated and placed in column 2 of Table 3; these numbers are used to determine the arrival pattern of the customers in the simulation. We have generated only ten numbers corresponding to ten different customers here but in actual practice a simulation run might include several thousand customers (many more than either of us would like to calculate by hand).
A second set of random numbers is generated and placed in column 4 of Table 3 to determine the service time for each of the ten customers. These random numbers must not be the same numbers used to derive the arrival intervals if we believe arrival times and length of service to be independent.
By comparing the random numbers in column 2 to the assigned numbers (column 4) of Table 1, we can generate the arrival intervals listed in column 3. The random numbers in column 4 are used in like manner to generate service times.
Column 5 of Table 3 is arrived at by taking the random numbers in column 4 and comparing them to the assigned numbers (column 4) of Table 2. In this way service times for our simulated ten customers are generated independently of their arrival times.
The First Customer. Column 6 of Table 3 is the column in which we actually begin the simulation run. Assume that the simulation starts at time 0. Row 1 of column 3 tells us that the first customer arrives four minutes after the last customer, but since there was no previous customer, we will take this to mean four minutes after the beginning of the simulation. The customer's arrival time will then be written in column 6 as 04 (or four minutes "into" the simulation).
Since there are no customers currently using the ATM, this first customer may be served immediately, so "time service begins" in column 7 is also 04.
By consulting "service time" in column 5 we see that this customer requires two minutes to be served, so that if service begins at 04 the customer is free to leave two minutes later at 06 (this is written in column 8).
Subsequent Customers. Row 2 of Table 3 represents the second simulated customer. Note that this customer arrives one minute after the previous customer (column 3) and so arrives at time 05 (column 6). Since customer #1 is still at the ATM (because customer #1 does not leave until 06) customer #2 must wait until #1 has departed. This means customer #2 must wait one minute (column 9). It is just such instances that we are attempting to observe through simulation. Note that in row 8 of Table 3, which represents the eighth customer, a bottleneck occurs when three long service times occur consecutively. The eighth customer winds up waiting four minutes.
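The bookkeeping in columns 6 through 9 follows one rule: service begins at the later of the customer's arrival time and the previous customer's departure time. A minimal Python sketch of that rule follows. The first customer's figures (arrival interval 4, service time 2) and the second customer's arrival interval of 1 come from the walk-through above; the second customer's service time of 3 is a hypothetical value, since the text does not state it.

```python
def run_single_server(interarrival_times, service_times):
    """Fill in arrival, service-start, departure, and waiting times by row."""
    rows = []
    clock = 0            # arrival time of the most recent customer
    server_free_at = 0   # time at which the ATM next becomes available
    for gap, service in zip(interarrival_times, service_times):
        clock += gap
        begins = max(clock, server_free_at)   # wait only if the ATM is busy
        leaves = begins + service
        rows.append({"arrives": clock, "begins": begins,
                     "leaves": leaves, "waits": begins - clock})
        server_free_at = leaves
    return rows


# Customer 2's service time (3) is assumed for illustration.
for row in run_single_server([4, 1], [2, 3]):
    print(row)
```

The first row reproduces the 04, 04, 06 sequence for customer #1, and the second shows customer #2 arriving at 05 and waiting one minute until the machine frees up at 06.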
From the simulation it appears that with one ATM there will be some waiting time for customers. Whether this waiting time, on the average, is acceptable to the bank depends on the bank's willingness to accept the seemingly small risk of a customer waiting longer than three minutes. In this abbreviated simulation only one of the ten customers waits longer than three minutes. Whether the bank accepts this 10% probability that a random customer will wait more than three minutes should be compared with the costs involved in buying a second ATM with a resulting drop in the probability of waiting longer than three minutes.
Accuracy. It is very dangerous to draw conclusions from truncated simulations. If we repeat the simulation many times, we can feel more confident of the accuracy of the results. We assumed that the variables in the simulation (arrival interval and service time) were independent of each other. If this is not true, then the simulation will provide poor results. Finally, we used discrete (as opposed to continuous) simulation. In actual practice continuously distributed variables might provide more accurate results.
Microcomputer Version
Almost any programming language can be used to write a simulation, but we will continue our extended example by using Applesoft Basic (see Listing 2). The subroutine at line 200 generates one random number for the arrival interval and a second (different) random number for the service time.
In lines 80 through 140, the first of these numbers is used to calculate the customer's arrival interval (J), and in lines 160 to 185 the second random number is used to assign the same customer's service time (K).
The customer's arrival time (IA) is calculated in line 195; waiting time (IWA) is calculated in line 200; leaving time (IO) is calculated in line 210; and the customer's time "in the system" is given in line 215.
Each customer is represented on one line of output like the program output in Figure 2 where 25 customers were run through the system and 3 customers waited longer than three minutes. Given only this evidence, we would conclude that the probability a customer would wait longer than three minutes would be: 3 occurrences / 25 customers x 100 = 12%
However, since we would like accuracy approaching the real world probability, we would be better off increasing the number of customers shuttled through the bank by increasing the number 25 in line 60 to simulate a much larger group of customers.
By changing the number of customers from the 25 in line 60 to 1000, I found that 51 customers had to wait longer than three minutes with the longest wait being ten minutes. In addition, I found the average waiting time to be .708 minutes. Because 51 of the 1000 customers waited longer than three minutes, the simulation suggests that there is approximately a 5% chance that a customer will wait longer than three minutes: 51 occurrences / 1000 customers x 100 = 5.1%
This result is quite different from the result obtained using only 25 customers and is characteristic of this type of simulation. Convergence on the "true" answer will occur as large runs of the simulation are attempted. Try several 1000-customer runs and you will quickly convince yourself that the true answer is closer to a 5% chance of a customer waiting longer than three minutes than to the 12% chance we estimated with a run of only 25 customers.
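The effect of run length is easy to try for yourself. The Python sketch below is not a translation of Listing 2; it is a compact stand-in that draws two independent random numbers per customer and tallies waits longer than three minutes. Apart from the first two rows of the arrival table, the frequency distributions are hypothetical, so the percentages it prints should not be expected to match the 12% and 5.1% figures reported above.

```python
import random

# (minutes, percent). Only the first two INTERARRIVAL rows come from the
# article; every other row in both tables is HYPOTHETICAL.
INTERARRIVAL = [(1, 18), (2, 17), (3, 25), (4, 20), (5, 12), (6, 8)]
SERVICE = [(1, 30), (2, 35), (3, 20), (4, 10), (5, 5)]


def draw(table, rng):
    """Pick a value by comparing a random number (0-99) to running totals."""
    number = rng.randint(0, 99)
    running = 0
    for value, percent in table:
        running += percent
        if number < running:
            return value
    raise ValueError("percentages must sum to 100")


def simulate_atm(customers, rng, limit=3):
    """Return (fraction waiting longer than limit, average wait in minutes)."""
    clock = 0        # arrival time of the current customer
    free_at = 0      # time at which the ATM next becomes available
    long_waits = 0
    total_wait = 0
    for _ in range(customers):
        clock += draw(INTERARRIVAL, rng)   # first random number: arrival
        begins = max(clock, free_at)
        wait = begins - clock
        total_wait += wait
        if wait > limit:
            long_waits += 1
        free_at = begins + draw(SERVICE, rng)   # second random number: service
    return long_waits / customers, total_wait / customers


rng = random.Random(60)   # arbitrary fixed seed so runs can be repeated
for customers in (25, 1000, 10000):
    share, average = simulate_atm(customers, rng)
    print(customers, "customers:", round(100 * share, 1),
          "% waited over 3 minutes; average wait", round(average, 3), "minutes")
```

Changing the seed and rerunning the 25-customer case a few times shows how much a short run can swing, while the longer runs should cluster more tightly.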
Simulation Languages and Tools
While simulations can be written in virtually any computer language, it is often easier to use one of the specialized languages or software packages currently available. For small, short simulations, Basic or Fortran can be used. The disadvantage to using these general purpose languages, however, is that simulations can be very complex, and it is quite easy inadvertently to adopt some bad assumptions.
Special purpose simulation languages, on the other hand, are specifically adapted to those situations that occur most often in modeling. They run faster and make the programming simpler, and the finished product is less likely to contain common errors. Using a special purpose simulation language can be like gaining years of experience quickly. Simulations from Actuarial Micro Software is one such package available both for the Apple II series and IBM computers.
Simulations is actually a combination of two separate packages: Monte Carlo Simulations and GASS. Monte Carlo Simulations is a general purpose simulator which incorporates statistical analysis as well as the ability to run a Monte Carlo type discrete simulation. The statistical analysis section allows the fitting of the proper statistical distribution to your raw data. That distribution is then used to generate the random events in the simulation. A set of results for a simulation run is presented in Table 4. This report gives a description of the results of a simulation.
The graph in Figure 3 displays the results of fitting a negative binomial distribution to a set of raw data. While the manual and the programs in Simulations are easy to use, they lack the power to perform multiple-phase operations like the one in the bank simulation above. This severely restricts the type of problem that can be handled with the package. As a teaching tool, however, Simulations is the best on the market for demonstrating Monte Carlo type simulations.
EZQ from Acme Software Arts is a package available only for the Apple II series computers and is definitely for those with some simulation experience. It is not designed to handle Monte Carlo type simulations but rather is oriented to solving differential, difference, and algebraic equations.
Dynamic simulations can be handled easily by EZQ. The author of the program, Gerald Gottlieb, sent us several articles from medical journals which describe the use of EZQ to run simulations of muscles, neuromuscular stimulation, and energy absorption of football helmets. This package requires good grounding in differential equations but can be invaluable for those who have expertise in that area.
EZQ provides both tabular and graphic output for easy analysis of results. An example of the graphic output is shown in Figure 4.
Slam II from Pritsker and Associates is a simulation language for the IBM PC which handles both discrete event and continuous simulation as well as any combination of the two. Mainframe simulation users will be familiar with Slam, the mainframe version of this language, which has been around since 1979. Slam is in wide use in industry for modeling production lines, transportation networks, communications networks, military operations, computer systems, and material handling configurations. Many universities also use the mainframe version as an instructional tool for neophytes to simulation; there are some fine teaching materials available for use with Slam.
The IBM PC version is relatively new and sure to catch the attention of old Slam users, because the commands are similar. The Slam II system starts by designing a network, or flow diagram, which graphically portrays the flow of entities (e.g., people, parts, information) through the system. The network is made up of nodes, and Slam II includes 20 different node types from which to choose. To analyze the model, the network is translated into a statement model which serves as an input file for Slam II. It is possible to write the statement form of the simulation directly, but most users will probably resort to the diagrammatic approach. Slam II for the IBM PC allows output to be written to DIF files so that users can manipulate output to create bar charts, pie charts, or plots with Lotus 1-2-3, VisiCalc, or any other software recognizing the DIF format.
Micro-Dynamo is a simulation language, available for both the IBM and Apple II series computers, that deals solely with dynamic simulations. The language will be familiar to some as the language used to model world resources by Jay Forrester in the World II Model. That simulation created quite a public debate because it illustrated the limits to economic growth imposed by natural resources and increasing pollution and overpopulation.
Figure 5 shows the "basic behavior" of Forrester's model as replicated in the Apple version of Micro-Dynamo. The model has fallen into some disrepute because of Forrester's generous assumptions, but that need not concern us here. In Figure 5, the NR curve is natural resources; the P curve is population; the QL curve is quality of life; and the POLR curve is pollution. Figure 5 presents the most powerful characteristic of Micro-Dynamo, its ability to display output with color graphics in easy-to-understand formats.
Dynamo is not a new simulation language; it dates back to 1958 and is in common use on mainframes. Those familiar with the mainframe version will see the similarity of the microcomputer version. The authors indicate that no special training in mathematics is necessary to use Dynamo, unlike the background required for EZQ. A knowledge of high school algebra is deemed sufficient for using the software. No knowledge of programming is necessary either, because Dynamo places the equations in proper order for processing.
Addison-Wesley Publishing, which produces Micro-Dynamo, also publishes a college textbook titled Computer Simulation, which uses extensive examples that can be programmed in Dynamo. The book is a good bet for those interested in dynamic simulations (note that the book does not treat Monte Carlo type simulations).
While most microcomputer users would not consider an integrated package to be a specialized simulation language, it is possible to perform Monte Carlo type simulations with some packages. A good choice for such a use would be SuperCalc3 for the IBM PC or SuperCalc3a for the Apple IIe enhanced with 128K.
The SuperCalc software from Sorcim/IUS includes a random function similar to that found in Basic. This allows a user to set up a spreadsheet calling random numbers at any point to replicate any probability distribution. Since many microcomputer users are more at home with spreadsheets than with any of the programming languages, this could be a definite advantage in setting up simulations.
SuperCalc also has the ability to graph output from a spreadsheet, so graphic output from a simulation is relatively simple using the SuperCalc software. The SuperCalc3 package includes such a simulation as a demonstration template. The demonstration is actually a blackjack game. A look at the formulas in the template will reveal the use of the random function to generate outcomes. The only bound to using SuperCalc for simulation of discrete events is your own imagination.