Conducting In-House Play Testing
What makes some games "good" and others "bad"?
Is this subjectivity something you can beat through play testing? By constructing a solid play testing team and following these guidelines, you'll improve your game's chances.
LucasArts, says that game play is "an unknown quantity that we're all trying to know." In short, there is no definitive magic formula for good game play.
Just because this quality is elusive, however, doesn't mean that it should be shoved aside as irrelevant. The ultimate responsibility for great game play usually rests in the hands of the producer. Still, before the code is written, the designer is often the person most concerned with "playability." More often than not these days, the producer and the designer aren't the same person. Games can be tweaked, improved, and enhanced during the testing phase, but if the game's basic design is flawed, it's already too late.
A good design should directly address components that allow the flexibility of altering game play. One of the best examples of such a component was designed and built by programmer Andy Caldwell (now with Screaming Pink Inc.). Caldwell's belief that good tweaking tools "relieve the programmer's time to work on programming while other people tweak" paid off in Street Hockey '95, a 16-bit multitap SNES game that had great game play but unfortunately was undermarketed. Caldwell built the player attributes into a table structure and gave the testers access to the table. Each tester was assigned responsibility for the game play of certain characters. The designer and producer determined what each character's strength should be, and no other character could be tweaked higher in that category. The testers could open the table and change each character's attributes for shot accuracy, blocking ability, and so on. The testers then fed their improved attributes to the producer, who made sure that the testers weren't making "mega" players. The end result was that the programmer was able to concentrate on bug chasing and code performance improvements while the testers tweaked. Because of this tool, the game came in on time, under budget, and passed through Nintendo's approval system on the first pass. Everyone on the team was empowered to do what they do best.
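To make the idea of a tester-editable attribute table concrete, here is a minimal Python sketch. It is not Caldwell's actual tool, whose internals the article does not describe. The character names, attribute names, numeric values, and the functions `validate_table` and `propose_tweak` are all hypothetical. The one rule it tries to capture is the one stated above: no other character may be tweaked higher than a character in that character's designated strength.

```python
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    strength: str     # attribute chosen by the designer and producer (hypothetical field)
    attributes: dict  # attribute name -> value, editable by testers


def validate_table(table):
    """Return a list of rule violations (an empty list means the table is valid).

    Rule assumed from the article: no other character may exceed a
    character's value in that character's designated strength.
    """
    problems = []
    for owner in table:
        cap = owner.attributes[owner.strength]
        for other in table:
            if other is owner:
                continue
            if other.attributes[owner.strength] > cap:
                problems.append(
                    f"{other.name} exceeds {owner.name} in {owner.strength}"
                )
    return problems


def propose_tweak(table, name, attribute, value):
    """Apply a tester's change only if the table still passes validation."""
    character = next(c for c in table if c.name == name)
    previous = character.attributes[attribute]
    character.attributes[attribute] = value
    problems = validate_table(table)
    if problems:
        character.attributes[attribute] = previous  # roll back the rejected tweak
    return problems


# Hypothetical sample data, for illustration only.
table = [
    Character("Ace", "shot_accuracy", {"shot_accuracy": 90, "blocking": 50}),
    Character("Wall", "blocking", {"shot_accuracy": 70, "blocking": 85}),
]
```

With this sample table, a tester raising Wall's shot accuracy to 80 would be accepted, while raising it to 95 would be rejected and rolled back because it would exceed Ace's designated strength. A check like this only automates the cap; deciding whether a set of tweaks creates a "mega" player would still be the producer's call.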
Starting and Controlling the Test Process
Play testing is usually accomplished in one of two ways: bringing in consumers (temporary play testers) and observing them while they use the product, or sending out beta copies of the game and eliciting feedback via a questionnaire. Because conducting a wide-ranging beta test over the Internet is an article in itself, I'll discuss only in-house testing here. However, I do want to note that last fall I successfully used Internet Relay Chat (IRC) to conduct question and answer sessions with my external beta testers.
Conducting in-house play testing requires formal observation of temporary play testers playing the game over the course of several days. This type of testing shouldn't be confused with focus testing, which is conducted by your marketing team. The main purpose of in-house play testing is to put the game into the hands of each player and obtain individual feedback; marketing focus tests usually consist of showing the game to a group and obtaining group feedback. Sometimes people from an earlier marketing focus test might be invited back as temporary play testers, but usually these positions are filled through a variety of sources, such as recruiting friends of full-time testers, distributing flyers on local college campuses or at local arcades, posting notices on local Internet gaming bulletin boards, or advertising in local computer publications, such as The Computer-Edge in San Diego. Occasionally, good candidates can be found through temporary agencies, but most people don't boast of their gaming skills on resumes or job applications.
Wherever you decide to look for testers, make sure that you interview everyone before you hire anyone. Question interviewees about what types of games they most like to play. Don't hire somebody who only plays sports games to play test an RPG unless you want this individual to be one of those few purposefully hired to be unfamiliar with the genre.
The timing of play testing needs to be planned carefully. The game needs to be stable enough that the play tester doesn't spend too much time noting operational bugs, yet immature enough that effective changes can still be made to it. A minimum of one week's employment should be promised with the possibility of more. Since the hours that some play testers are available can vary, plan on double or late shifts for the regular testing staff during the weeks of play testing to accommodate testers whose schedules permit only evening participation.
The ratio of temporary play testers to full-time staff testers monitoring them should be no less than 1:1. Each staff tester should always be observing, answering questions, and noting the temporary play testers' questions. Here are some key things for staff members to look out for:
• Where do play testers seem to get stuck and ask for help from the staff? The staff testers working with the play testers need to rate each individual based upon their game skills. Although such ratings are somewhat subjective, if one play tester can't even get the game installed and everyone else can, that play tester probably doesn't possess adequate skills for the job. However, don't let this discourage you. Not everyone you bring in is going to live up to expectations.
• What kinds of features do the play testers have the most questions about? In the case of a sports game, set the game at the shortest playing time possible so that an entire game can be played in an hour or so. In the case of graphical adventure games that have a variety of different environments, be sure to spread the play testing across those various environments. Be sure coverage for the whole game — and not just the first part of the game's experience — is included in play testing. If there is a bonus environment that players can only get to after solving all the puzzles in other environments, provide shortcuts, jump codes, or previously saved games so that testers can jump to that bonus environment without having to solve everything else. Otherwise, what should be the best part of the game could turn out to be weak and bug laden.
• Do play testers get frustrated with the game easily? How closely does their frustration level relate to their skill level? Benchmarks need to be established prior to bringing in the play testers; additional benchmarks will be added as testing proceeds to measure key aspects of play testing. If the game uses puzzles, establish a minimum and a maximum amount of time for the play testers to solve each puzzle. If nobody can solve a certain puzzle in the expected minimum amount of time, don't stop the clock; let play testers continue until the maximum amount of time has expired. Find out if players really want to solve the puzzle or are becoming angry at their inability to solve it.
• Do play testers like the game? If they like the game, they'll be able to cite specific instances in the game that they liked or enjoyed. If they're bluffing, most likely they'll be unable to say any more than, "I just liked it."
When you have a significant number of play testers begging to have a copy to take home with them, you know you have a winner on your hands. But what if everyone seems to dislike the game? At this point in the schedule, too much money has been spent to throw it all away. It's time for the quality assurance (QA) manager to call a strategy meeting with testing, design, and production team members to review the usability test results.
• Are they complaining about the same things that earlier testers had noted in suggestion bug reports? If the play testers echo sentiments made during the earlier staff testing phase, and the items criticized were not fixed or changed, not enough attention has been paid to staff testers. These bugs will haunt you in product reviews after the game's been released.
• How long before the play testers become as bored with the game as the staff testers? A good testing schedule includes a lunch break after four hours and at least one 15-minute break every two hours. If the testers want to talk too much or need to take too many breaks, it could indicate that they are hitting the boredom stage. After a week of play testing (or at some other significant break during play testing), the play testers and their staff leaders should hold a group session to discuss the game. Prior to that, discussion between testers needs to be kept to a minimum so as not to alter opinions. During testing, the testers should be observed only; the producer and other "vested interests" shouldn't engage the testers in conversation, other than to ask questions, lest the testers be tainted by that interaction as well.
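The puzzle timing benchmarks mentioned in the list above (a minimum and a maximum time per puzzle) could be tracked with something like the following Python sketch. Everything named here is an assumption made for illustration: the `PuzzleBenchmark` class, the three result labels, the sample times, and the `needs_review` flag, which simply marks a puzzle that nobody solved within the expected minimum time. An unsolved attempt is recorded as `None`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class PuzzleBenchmark:
    name: str
    min_minutes: float  # expected solving time
    max_minutes: float  # point at which the observer stops the clock


def classify(benchmark: PuzzleBenchmark, solve_minutes: Optional[float]) -> str:
    """Label one play tester's attempt against the benchmark."""
    if solve_minutes is None or solve_minutes > benchmark.max_minutes:
        return "unsolved"
    if solve_minutes <= benchmark.min_minutes:
        return "within_minimum"
    return "within_maximum"


def summarize(benchmark: PuzzleBenchmark, results) -> dict:
    """Count attempts per label and flag puzzles nobody solved in the minimum time."""
    counts = Counter(classify(benchmark, minutes) for minutes in results)
    return {
        "within_minimum": counts["within_minimum"],
        "within_maximum": counts["within_maximum"],
        "unsolved": counts["unsolved"],
        # Hypothetical flag: at least one attempt, none within the minimum.
        "needs_review": bool(results) and counts["within_minimum"] == 0,
    }


# Hypothetical benchmark and observations, for illustration only.
rubbing_puzzle = PuzzleBenchmark("paper and pencil", min_minutes=5, max_minutes=20)
observed_minutes = [4, 12, None, 25]
summary = summarize(rubbing_puzzle, observed_minutes)
```

Numbers like these only show where players stall. Whether they still want to solve the puzzle or are simply getting angry is something the staff tester has to observe directly.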
Play Testing Goals
Play testing should provide the producer with as much information as possible for making the necessary game play tweaks. Testing needs to provide more information than just crash and lockup problems. The producer needs to hear opinions such as, "I think the game is boring because...." Bug reports should include a category for subjective feedback, perhaps in headings titled "Opinion" or "Comment." Remember, the testing department usually contains the highest ratio of gamers in the company. They are the ones who sit and test games all day — many go home and play games all night.
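As one possible shape for a bug report that carries subjective feedback alongside crash and lockup problems, consider the Python sketch below. The `BugReport` fields, the `Category` values other than "Opinion" and "Comment", and the sample reports are assumptions for illustration, not a description of any real bug-tracking system.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CRASH = "Crash"
    LOCKUP = "Lockup"
    OPINION = "Opinion"  # heading suggested in the article
    COMMENT = "Comment"  # heading suggested in the article


SUBJECTIVE = {Category.OPINION, Category.COMMENT}


@dataclass
class BugReport:
    tester: str
    category: Category
    summary: str


def subjective_feedback(reports):
    """Return only the reports a producer would read for game play opinions."""
    return [report for report in reports if report.category in SUBJECTIVE]


# Hypothetical sample reports, for illustration only.
reports = [
    BugReport("tester_a", Category.CRASH, "Game exits to the desktop on level load."),
    BugReport("tester_b", Category.OPINION, "The second level feels slow because of backtracking."),
    BugReport("tester_c", Category.COMMENT, "The inventory screen is hard to read."),
]
for_producer = subjective_feedback(reports)
```

Keeping opinions in the same report stream as defects means they are reviewed in the same weekly meetings and are less likely to be lost.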
The QA manager's primary objective is staffing each project with the right mix of play testing talent. Secondarily, the QA manager needs to ensure that the information flow remains constant, and pertinent, to the goals of the project. Often, QA managers' biggest obstacle is losing their best play testers to the production department.
Since turnover in the testing department can be fairly high, being able to identify and hire skilled testers is critical. The QA manager should look for excellent written and oral communication skills, the foremost prerequisite. I once made the mistake of hiring someone who couldn't write understandable bug reports. Even though this individual was a dedicated gamer with great ideas, it just didn't work out because this person couldn't communicate well.
Beyond communication skills, it helps for the tester to have a variety of experience in your game's genre. Also, throw in a few testers who know little or nothing about the genre, as this will broaden the insight you'll obtain about your title. Testers with less genre experience are often the ones who question the interface and yield improvements in areas where genre experts take things for granted.
The QA manager and the producer together need to choreograph a system of information sharing that will best help the project succeed. If you ask a tester and a producer why a game doesn't have good game play, you're liable to get two totally different answers. According to Paul Coletta, when testing says some aspect of the game is "wrong," the producer needs to interpret and evaluate whether that which is "wrong" affects the game's fun, pacing, or addictive qualities. Wayne Cline adds that the producer's biggest task is looking at testing reports and figuring out what will make the most impact on the game with the least disruption of the schedule.
The Dos and Don’ts of Managing Play Testing
DON’T BE DEFENSIVE ABOUT CRITICISM.
Some producers get too defensive about their game design and concept, and they miss out on the best evaluations testing can give. Every effort should be made to make the testers feel that their opinions are important. Otherwise, they might fail to convey that one comment that could make or break the playability of a game, simply because they feel that their opinions don't matter or that they'll offend someone by giving honest feedback.
On the other hand, there will always be testers who can't say anything nice and advocate an entire revamp of the game. (Hopefully, the game didn't get that far in production if it really is that bad.) Don't put up your defenses too quickly, and try not to take these comments as insults. Glean as much information as you can from these testers.
QA managers should instruct testers to be specific when wording their feedback about a game. For instance, my favorite bug report was one where the tester stated, "The pencil sucks." This was in reference to a puzzle in a graphical adventure game where the player needed to move a piece of paper over a rock and rub a pencil on it to get the clue. The real problem was that the pencil was not easily manipulated to do the rubbing. Had the tester been more specific, time wouldn't have been spent trying to decipher this cryptic comment and the problem would have been solved more quickly.
STAND BEHIND OPINIONS.
Testers should be taught to stick to their opinions, even if the producer tries to dissuade them from logging bug reports containing negative feedback. Some producers will go to great lengths to get their game through testing, but it's vital that the testing group report all issues they feel are important. Training testers to stick by their guns in the face of a direct challenge doesn't mean allowing them to become hostile. Testers who aren't perceived as thoughtful and helpful will get little cooperation from developers, ruining their chances to provide enough information or obtain enough support to do good work. According to James Bach, chief engineer with ST Labs, "Testers should be taught to give information, both positive and negative, without worrying about how developers will react to it." Furthermore, James advocates teaching testers that "the whole team owns quality, not just them. Testing is a process of revealing information that helps to make good decisions."
ENCOURAGE ESPRIT DE TESTING CORPS.
Naturally, the size of a testing group should correlate to the number of games the group is expected to test at once. Full-time testing teams generally consist of at least one lead, one assistant lead, and three to six full-time testers, depending on the type and complexity of the project.
Full-time testers need to have a sense of community as a testing group, and should have a dedicated testing lab. Testers need to be located together in an area that promotes communication and cross-training between testers, particularly in the games industry, where few testers are actually trained in software testing methodologies, and most of their training is obtained on the job. Physically locating testers with the project developers they are assigned to — and not with their fellow testers — could (and often does) hinder their objectivity. This doesn't mean that testers shouldn't have offices, just that their offices should be located near other testers. To counteract this separatism, testers (and particularly lead testers) need to be trained to work hard at developing strong communication with the developers whose products they are testing. They need to understand the basic architecture of the product they are testing to better find the bugs.
Ideally this community room will have all the necessary testing hardware. It can also double as a place to observe outside testers. A synergy of learning, communication, and discussion takes place in this setup. It promotes game-play-oriented comments and critique.
MIX UP THE HARDWARE.
Each project needs to be experienced on the minimum hardware configuration, as well as the closest thing possible to the maximum configuration and everything in between. The majority of testing needs to be conducted on the minimum configuration, because that is the promise to the customer. It's somewhat ghastly to see both "minimum" and "recommended" specifications on product boxes these days. What this dichotomy usually means is that the game will run on the minimum configuration, but if you want a decent experience, your machine had better have the recommended configuration. The difference between the two is causing a lot of unhappiness with customers.
Don't skimp on high-end testing either. Believe it or not, bugs can be found on the hottest machines around. I worked on one game that tested perfectly on the minimum specification, yet when customers attempted to install it on a machine that had 64MB RAM, the installer indicated that not enough memory was available to install the game. As it turned out, the game was looking for 8MB RAM, and it only looked at the last digit. So it only installed on 8MB machines.
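The original installer code is not available, so the following Python sketch is only a guess at the kind of mistake described: a memory check that compares the last digit of the installed RAM size against the 8MB minimum. The function names and the list of test configurations are hypothetical. The point is that running the same check across several configurations, not just the minimum one, exposes the disagreement immediately.

```python
MIN_RAM_MB = 8  # minimum specification from the anecdote


def buggy_has_enough_ram(installed_mb: int) -> bool:
    """Hypothetical reconstruction of the bug: only the last digit is compared."""
    last_digit = int(str(installed_mb)[-1])
    return last_digit >= MIN_RAM_MB


def has_enough_ram(installed_mb: int) -> bool:
    """Intended behavior: compare the whole value against the minimum."""
    return installed_mb >= MIN_RAM_MB


def find_disagreements(configs_mb):
    """Return the configurations where the buggy check and the intended check differ."""
    return [
        mb for mb in configs_mb
        if buggy_has_enough_ram(mb) != has_enough_ram(mb)
    ]


# Hypothetical hardware matrix spanning below-minimum to high-end machines.
TEST_CONFIGS_MB = [4, 8, 16, 32, 64]
failures = find_disagreements(TEST_CONFIGS_MB)  # [16, 32, 64]
```

On this matrix the buggy check passes at 8MB and wrongly rejects 16MB, 32MB, and 64MB, which matches the behavior customers reported. Testing only on the minimum configuration would never have shown it.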
KEEP THE EYES FRESH.
When staff testers look at nothing but the project to which they are assigned for weeks and weeks on end, they become blind to problems that they might otherwise notice. Therefore, it's useful to move testers around to other projects every now and then to gain a "fresh set of eyes." Sometimes, staff testers for one project can be used as temporary play testers for other projects. This is another reason for locating testers in a community area, rather than spreading them out all over a facility.
How often have we caught ourselves passing down a project legacy to new testers? By "project legacy," I mean the harmful folklore used as justification for not solving an often-cited problem. For instance, one project I worked on spanned four CD-ROMs. Each time the tester started up the game, she needed to insert disk one into the drive, then swap to a second disk to resume play where she had left off. The reason she had to endure this disk swapping hassle (so the "pat" answer goes) was that correcting it would require an engine fix, and the engine "couldn't be changed." But making a change to the engine was possible; it was just that the programmer didn't want to do it, the producer didn't insist on it, and testers didn't make an issue out of the problem. We all had passed down the legacy that the engine couldn't be changed. A bug report for this problem was never even written, so when weekly meetings were held to review the reports, it wasn't ever discussed formally. Simply put, because of this "legacy," we had our blinders on when it came to that problem. Of course, this product's number one complaint once it went to market was the disk swapping issue.
OBSERVE YOUR TESTERS.
The best producers spend time in the test lab — listening, not talking. They listen to the testers and they strive to derive and implement game play abstracts from the testers' concrete comments. As Cline says, "We know we have a good game if the testers are enthusiastic after weeks and weeks of play." However, I have seen producers who spend too much time with the testers. Often in these situations, each time a tester critiques an aspect of the game, the producer explains or defends why it is the way it is. The tester doesn't write up the problem because he believes it can't (or won't) be changed. Thus, new project legacies are born. Producers need to interpret and consider — not rationalize — any issues raised by the testers' comments.
REWARD YOUR TESTERS.
Everyone works better and harder if they believe their hard work will be rewarded. To some staff testers, that reward might be recognition. To others, cold hard cash. Since the varieties are about as abundant as the number of people on staff, it is often difficult to reward everyone adequately. Some of the best (and most difficult) rewards include: recommending a tester for a promotion in recognition of a job well done, supporting a deserving tester when he or she applies for another job in the company (representing a step up the ladder), and recommending a tester for monetary bonuses. One of the easiest rewards is to spring for a pizza lunch and have a lunchtime game tournament playing the latest hot title whenever specific weekly goals for testing teams are met. Over the last couple of years, lunchtime tournament favorites in my shop have included Descent, Duke Nukem, and Diablo competitions.
MAKE TESTERS AWARE OF THE COMPETITION.
Make time for testers to review and analyze competitive products that are similar in nature to the one that they're expected to be testing. Make your testers the experts on the genre! Not only will you get better information from the testers, they'll appreciate the chance to play another game.
It All Boils Down To Teamwork
It's difficult to achieve that delicate balance between developers and testers during play testing. The guidelines addressed here don't encompass everything a game developer or publisher might want to do to test game play, but they're a place to start. The most important aspect of successful play testing is encouraging teamwork among the testers and developers. Listen to the testers, create an environment that is pleasant to work in, continually learn more about the craft, and stay fresh and honest. Play testing can be an ordeal, but when testers and developers work together, games ship on schedule, under budget, and with great game play.
Jeanne Collins is a quality assurance manager at GTE's Intelligent Network Services Group. She is sometimes referred to as a "self-proclaimed evangelist for quality assurance in the gaming industry" and chairs sigTEST, a CGDA Affiliate group.
/BY JEANNE COLLINS, GAME DEVELOPER (Vol.4 Number 3) June 1997/