How To Evaluate Training Games and Activities

In my earlier life, I was a professional evaluator of instructional materials. In recent years, clients have frequently hired me to evaluate training games and simulations. Every day, I evaluate my own games and training activities. So I figure that I know a thing or two about instructional evaluation: how the principles and procedures for evaluating training activities resemble those of other types of evaluation, and how they are unique to this specialized area.

Here are some of my thoughts on the why, what, and when of evaluating training activities.

Why: Proving and Improving Effectiveness

We evaluate in order to support our decision making. There are two types of decisions to be made and, therefore, two purposes for evaluating training activities. The purpose of one type of evaluation, called summative evaluation, is to prove the effectiveness of a training activity.

Training managers and administrators usually ask summative yes/no questions such as these:

Should I use this game to train my nurses on patient-oriented medical records?
Do experts agree that this roleplay faithfully reflects the reality of doing business with the Japanese?
Does the cost of this online simulation game fall within our training budget?
Does the training activity accomplish something that alternative training techniques cannot?

The goal of the other type of evaluation, called formative evaluation, is to improve the effectiveness of an activity. Game designers usually ask formative questions such as these:

Employees don’t like to fill out the inventory-control forms. Does this simulation game clearly reflect the importance of this activity without scaring off the players?
Players who score high in this simulation game perform miserably in real-life work situations. How can we tweak the game to improve transfer from the game to the job?
The level of competition in this game is so high that players focus on winning the game without trying to learn anything. How can we tone down the intensity of competition to better balance interest in the game and its instructional effectiveness?
When we conducted this game with a mixed group of Asian men and women, players did not participate with the level of enthusiasm exhibited by other groups. What adjustments should we make to ensure that the game is positively received by people from different cultures?

Of course, the same evaluation data may be used in both summative and formative decision making.

What: Expert Opinion and Player Reaction

Evaluation data can be collected from experts and from actual participants. A training activity that experts rate high on design sophistication may be a flop with the players. On the other hand, a training game that excites players may actually teach inaccurate principles and inappropriate procedures. Player testing and expert reviews are therefore complementary sources of evaluation data, and both are needed for formative and summative decision making.

Different types of experts are qualified to pass judgment on the value of an instructional activity. A subject-matter expert (SME) helps us evaluate the appropriateness and adequacy of the instructional content by checking such items as these:

Are the objectives of this activity related to the training objectives?
Does winning the game reflect the attainment of the training objectives?
Does the game accurately and adequately simulate the reality that it is supposed to reflect?
Does the activity present up-to-date and accurate information about the training topic?

A game-design expert provides valuable information on the playability and the potential interest level of a training game. This person focuses on the structure of the game and the sequence of play, and checks such items as these:

Does the game have the best structure for dealing with this type of training content?
Are the rules of the game fair and simple?
Is the pace of the game appropriate?
Can the game be played without the need for special supplies and equipment?

A target-population expert is knowledgeable about the types of people who will be playing the game. This person evaluates the feasibility of using the game with potential players by exploring these types of questions:

Will the potential players consider the game to be too frivolous?
Is the pace of the game suitable for the attention span of typical players?
Are the players familiar with the game materials (such as playing cards and dice)?
Does the game require certain behaviors (such as cross-gender touching) that are incompatible with the cultural values of the players?
Can the players afford the time and cost of the game?

While all these expert opinions are useful and important, the final proof of the effectiveness of the game depends on player behaviors and responses. Questions explored by observing and interviewing players include the following:

How much do the players appear to enjoy the game?
What skills do players acquire by playing the game? Do these skills transfer to their workplace situations?
What attitude changes take place as a result of playing the game?
What insights do the players share during the debriefing discussion after the play of the game?
What complaints do the players have about the game?

More What: Side Effects and Main Effects

Because training activities provide a powerful holistic learning experience, they present a special hazard: participants in games, simulations, roleplays, and other such activities learn much more than what the designer intended for them to learn. Recently, for example, I play-tested a realistic simulation game that vividly portrayed the plight of a terminally ill patient. It effectively helped participants (who were newly hired health-care providers) empathize with the patient, which was one of the major training objectives. At the same time, participants reported a very strong feeling of futility and depression, and many of them began to seriously reconsider their career choice. This attitude change was definitely not part of the training objectives. To remedy the situation, we toned down the intensity of the simulation and added several optimistic questions to the debriefing.

Unanticipated and undesirable consequences of participation in games present a major training challenge. Such effects are most often a concern when simulation games are used in soft-skill areas. As an evaluator, I have to be especially alert to these side effects because players focus on winning the game. Distracted by this goal, with their defenses down, they are especially susceptible to unintended attitude changes.

A checklist for this area of evaluation includes the following types of items:

Does the simulation simplify the variables to such an extent that the player gets a distorted picture of reality?
Does the reduced risk in the game teach the player inappropriate behaviors that are likely to be punished by real-life consequences?
Does the chance element in the game reduce the player's feelings of self-determination?
Does the high level of motivation reduce the player’s tolerance toward less exciting training activities?
Does the intense competition in the game create lasting feelings of hostility among the players?

This stress on side effects does not negate the need for measuring the main effects of a game. In some game-design circles, it has become fashionable to be vague and evasive about the training objectives of a game, on the grounds that complex and affective outcomes are not easily measurable. Yet the practice of criterion-referenced measurement (setting up behavioral goals and validly measuring their attainment) has been extended to complex cognitive and affective objectives. Performance tests and unobtrusive measures, many of them built into the game itself, enable the evaluator to measure primary outcomes such as these:

What principles do the players master? How far are they able to generalize them?
How likely are the players to apply newly learned principles to real-life situations?
What are the attitudinal objectives of the game? What behaviors during the game (and after its conclusion) indicate the achievement of these objectives?
Do the questions used in the game require such higher-order thinking skills as predicting, problem solving, critical thinking, inductive reasoning, and strategic planning?
How reliable is our measurement of main effects of playing the game? With what degree of confidence can we assert the attainment of various objectives?
How does the game incorporate measures of attitudes and values in the debriefing discussion?

When: During Play and After

Another evaluation dimension is the distinction between process and outcome. Process evaluation concentrates on the play of the game and concerns itself with these types of questions:

How long does the game last? How long do the players think it lasts?
Which aspects of the game are too slow? Which aspects are too fast?
Do players demonstrate increased skills as the game progresses?
How do team members collaborate and consult among themselves?
How does the players' attention level fluctuate during different stages of the game?

Outcome evaluation involves measuring and judging what happens after—and as a consequence of—the play of the game. Questions asked in this type of evaluation include the following:

How do the players' attitudes toward the topic of the game affect their on-the-job performance?
What are the unanticipated negative consequences of playing the game?
What feelings and emotions predominate in the players' debriefing discussions? How are these feelings carried over to real life?
What new principles do the players learn? How are these principles applied in the workplace?

Both ... And ...

We have explored four polarized dimensions related to the evaluation of training activities:

Formative and summative evaluation
Expert opinion and player reaction
Main effects and side effects
Process data and outcome results

The question is not which type of evaluation we should undertake: formative or summative, expert or player, main or side effects, and process or outcome. As in the case of any polarity, it is always both formative and summative, both expert and player, both main and side effects, and both process and outcome.

When you are evaluating your own training game, don’t forget to check out all of these dimensions.