How To Evaluate Training Activities

In my earlier life, I was a professional evaluator of instructional materials. In recent years, clients have frequently hired me to evaluate training games and simulations. Every day, I evaluate my own games and training activities. So I figure I know a thing or two about instructional evaluation: how the principles and procedures for evaluating training games resemble those used in other types of evaluation, and how they are unique to this specialized area.

Here are some of my thoughts on the why, what, and when of evaluating training games (and other such activities).

Why: Proving and Improving Effectiveness

We conduct evaluation to provide information that supports decision making. There are two types of decisions to be made and, therefore, two goals for evaluating training games. The goal of one type of evaluation, called summative evaluation, is to prove the effectiveness of a game.

Summative questions that are usually asked by training managers and administrators include these types of yes/no items:

  • Should I use this game to train my nurses on patient-oriented medical records?

  • Do experts agree that this simulation faithfully reflects the reality of doing business with the Japanese?

  • Does the cost of this computer game fall within our training budget allocations?

  • Does this game do something that alternative training techniques could not?

The goal of the other type of evaluation, called formative evaluation, is to improve the effectiveness of a game. Formative questions that are usually asked by the game designers include such items as these:

  • Employees don’t like to fill out the inventory-control forms. Does this simulation game clearly reflect the importance of this activity without scaring off the players?

  • Players who score high in this simulation game perform miserably in real-life work situations. How can we tweak the game to increase transfer from the game to the job?

  • The level of competition in this game is so high that players focus on winning the game without trying to learn. How can we tone down the intensity of competition to better balance interest in the game and its instructional effectiveness?

  • When we conducted this game with a mixed group of Asian men and women, players did not participate with the level of enthusiasm exhibited by other groups. What adjustments do we have to make in order to ensure that the game is positively received by people from different cultures?

Of course, the same evaluation data may be used in both summative and formative decision making.

What: Expert Opinion and Player Reaction

Evaluation data can be collected from the opinions of experts and from the actual play of the game by participants. A training game that experts rate highly for design sophistication may be a flop with the players; on the other hand, a training game that excites players may teach inaccurate principles and inappropriate procedures. Player testing and expert review thus act as complementary sources of evaluation data, each checking and balancing the other. Both types are needed for formative and summative decision making.

Different types of experts are qualified to pass judgment on the worth of an instructional game. A subject-matter expert (SME) helps us evaluate the appropriateness and adequacy of the instructional content by checking such items as these:

  • Are the objectives of this game related to the training objectives?

  • Does winning the game reflect the attainment of the training objectives?

  • Does the game accurately and adequately simulate the reality that it is supposed to reflect?

  • Does the game present up-to-date and accurate information about the training topic?

A game-design expert provides valuable information on the playability and the potential interest level of a training game. This person focuses on the structure of the game and sequence of play and checks such items as these:

  • Does the game have the best structure for dealing with this type of training content?

  • Are the rules of the game fair and simple?

  • Is the pace of the game appropriate?

  • Are chance elements in the game appropriately controlled?

  • Can the game be played without the need for special supplies and equipment?

A target-population expert is knowledgeable about the types of people who will be playing the game. This person evaluates the feasibility of using the game with potential players by exploring these types of questions:

  • Will the potential players consider the game to be too frivolous?

  • Is the pace of the game suitable for the attention span of typical players?

  • Are the players familiar with the game materials (such as playing cards and dice)?

  • Does the game require certain behaviors (such as cross-gender touching) that are incompatible with the cultural values of the players?

  • Can the players afford the time and cost of the game?

While all these expert opinions are useful and important, the final proof of the effectiveness of the game depends on player behaviors and responses. The types of questions that are explored by observing players and interviewing them include the following:

  • How much do the players appear to enjoy the game?

  • What skills do players acquire by playing the game? Do these skills transfer to their workplace?

  • What attitude changes take place as a result of playing the game?

  • What insights do the players share during the debriefing?

  • What complaints do the players have about the game?

What: Side Effects and Main Effects

Because training activities provide a powerful holistic learning experience, they present a special hazard: The participants in games, simulations, roleplays, and other such activities learn much more than the designer intended. Recently, for example, I play-tested a realistic simulation game that vividly portrayed the plight of a terminally ill patient. It effectively helped the participants (newly hired health-care providers) empathize with the patient, which was one of the major training objectives. At the same time, the participants reported a strong feeling of futility and depression, and some of them began to seriously reconsider their career choice. This attitude change was definitely not a training objective. To remedy the situation, we toned down the intensity of the simulation and added several optimistic questions to the debriefing.

Unanticipated and undesirable consequences present a major challenge in training games. Questions about such side effects arise most often when simulation games are used in soft-skill areas. As an evaluator, I have to stay conscious of these effects because players focus on winning the game; with their defenses down due to this distraction, they are especially susceptible to unintended attitude changes.

A checklist for this area of evaluation includes the following types of items:

  • Does the simulation simplify the variables to such an extent that the player gets a distorted picture of reality?

  • Does the reduced risk in the game teach the player inappropriate behaviors that are likely to be punished by real-life consequences?

  • Do the chance elements in the game reduce the player's feeling of self-determination?

  • Does the high level of motivation reduce the player’s tolerance toward less exciting training activities?

  • Does the intense competition in the game create lasting feelings of hostility among the players?

This emphasis on side effects does not negate the need for measuring the main effects of a game. In some game-design circles, it has become fashionable to be vague and evasive about the training objectives of a game, under the rationalization that complex and affective outcomes are not easily measurable. That excuse no longer holds: the practice of criterion-referenced measurement (setting up behavioral goals and validly measuring their attainment) has been extended to complex cognitive and affective objectives. The use of performance tests and unobtrusive measures, many of them built into the game itself, enables the evaluator to measure primary outcomes such as these:

  • What principles do the players master? How far are they able to generalize them?

  • What is the probability of the players being able to apply newly learned principles to real-life situations?

  • What are the attitudinal objectives of the game? What behaviors during the game and after its conclusion indicate the achievement of these objectives?

  • Do the questions used in the game require such higher-order thinking skills as predicting, problem solving, critical thinking, inductive reasoning, and strategic planning?

  • How reliable is our measurement of the main effects of playing the game? With what degree of confidence can we assert the attainment of various objectives? (One way to quantify this is sketched after this list.)

  • How does the game incorporate measures of attitudes and values in the debriefing discussion?
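To make the reliability and confidence questions concrete, here is a minimal sketch of one way an evaluator might quantify attainment of a criterion-referenced objective. The scores and the mastery cut-off of 80 are invented for illustration, not data from an actual game; the sketch estimates the proportion of players who met the criterion, along with a 95% confidence interval for that proportion.

```python
import math

# Hypothetical post-game performance-test scores (0-100) for 20 players.
# Both the scores and the mastery cut-off are illustrative assumptions.
scores = [85, 92, 78, 88, 95, 70, 83, 90, 76, 89,
          91, 84, 68, 87, 93, 79, 86, 94, 81, 88]
CRITERION = 80  # assumed criterion-referenced mastery cut-off

n = len(scores)
passed = sum(score >= CRITERION for score in scores)
p = passed / n  # observed attainment rate

# 95% confidence interval for the attainment rate, using the Wilson score
# interval, which behaves better than the normal approximation with the
# small groups typical of play-testing.
z = 1.96
denom = 1 + z**2 / n
center = (p + z**2 / (2 * n)) / denom
margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))

print(f"{passed}/{n} players met the criterion ({p:.0%}).")
print(f"95% CI for the attainment rate: {center - margin:.0%} to {center + margin:.0%}")
```

A wide interval here is itself useful evaluation data: it tells us how cautiously we should assert that the game achieves its objective until more players have been tested.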

When: During Play and After

Another evaluation dimension is represented by the process-outcome distinction. Process evaluation (also known as level 1 evaluation) concentrates on the play of the game and concerns itself with these types of questions:

  • How long does the game last? How long do the players think it lasts? (A simple comparison of the two is sketched after this list.)

  • Which aspects of the game are too slow? Which aspects are too fast?

  • Do players demonstrate increased skills as the game progresses?

  • How do team members collaborate and consult among themselves?

  • How does the players' attention level fluctuate during different stages of the game?
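As a small illustration of the first pair of questions above, the following sketch compares the facilitator's measured duration against the players' estimates. The timing data are invented; the underlying idea is that players absorbed in a game tend to underestimate how long it lasted, so a mean perceived/actual ratio well below 1.0 suggests engagement, while a ratio above 1.0 suggests the game dragged.

```python
# Hypothetical timing data from one play-test session, in minutes:
# the facilitator's stopwatch reading and each player's estimate of
# how long the game lasted (both invented for illustration).
actual_minutes = 45
perceived_minutes = [30, 35, 40, 25, 50, 38, 33, 42]

ratios = [estimate / actual_minutes for estimate in perceived_minutes]
mean_ratio = sum(ratios) / len(ratios)
mean_perceived = sum(perceived_minutes) / len(perceived_minutes)

print(f"Actual duration: {actual_minutes} min")
print(f"Mean perceived duration: {mean_perceived:.1f} min")
print(f"Mean perceived/actual ratio: {mean_ratio:.2f}")
```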

Outcome evaluation (also known as level 2, 3, and 4 evaluation) involves measuring and judging what happens after, and as a consequence of, the play of the game. Questions asked in this type of evaluation include the following:

  • How do the players' attitudes toward the topic of the game affect their on-the-job performance?

  • What are the unanticipated negative consequences of playing the game?

  • What feelings and emotions dominate the players' debriefing discussions? How do these feelings carry over to real life?

  • What new principles do the players learn? How are these principles applied to the workplace?

Both … And …

We explored four polarized dimensions related to the evaluation of training activities:

  • Formative and summative evaluation

  • Expert opinion and player reaction

  • Main effect and side effect

  • Process data and outcome results

The question is not which type of evaluation we should undertake: formative or summative, expert or player, main effects or side effects, process or outcome. As with any polarity, the answer is always both: both formative and summative, both expert and player, both main and side effects, both process and outcome.

When you are evaluating your own training game, don’t forget to check out all of these dimensions.