(another excerpt from my upcoming book: "Breaking Through to the Other side")
An example – testing interactive World Cup 2002 services
As the national football teams from thirty-two countries engaged in the World Cup 2002 in June 2002, several broadcasters launched interactive applications to stimulate the viewers’ appetite for coverage.
The offerings included:
• Multiple TV-streams: Examples included the Main picture (the general feed that analogue viewers would see), Team A and Team B (focusing on the coaches and substitutes on the benches), Tactical (seen from a very high angle from behind one goal) and Highlights (looping or repeating goals and other, well, highlights of the game so far).
• A mosaic channel composed of some of the channels above.
• Various services: a separate TV-domain with news, team standings and more; overlay match info showing team line-ups and player profiles; plus occasional quiz and poll overlays that, during the game, encouraged viewers to SMS a response to questions like “Who will win this game?” and “Who won the World Cup in 1998?”

Illustration XXII: A prototype of the Canal Digital mosaic channel for the World Cup 2002.
It includes from top left: the Main picture (the general feed that analogue viewers would see), Highlights, Team A (Argentina) and team B (Italy) and Tactical.
To test these applications, we had football enthusiasts watch games of their choice, in the company of their choice (three friends, a couple, two brothers, etc.), in a usability lab disguised as a living room. During each test, we provided the test persons with the above services from one of the two available broadcasters (Canal Digital and TDC Kabel TV), a brief introduction to the i-TV services and the remote control, and plenty of soft drinks and snacks.
The study encompassed:
• Seven games in the preliminary rounds (e.g. Italy-Ecuador and Denmark-Senegal);
• Seven different groups of different sizes (from two to four viewers);
• Varying age groups (from teenagers to a couple in their fifties); and
• Viewers with a range of experience using set-top boxes and interactive TV (some had never used it, some had, and some even owned the particular box).
The focus was to see if the viewers were interested in the interactive services and how and when they used them.
We learned that for these users, watching the World Cup was all about two things: engagement in and understanding of the game. Interactive services certainly had the potential to play a major role in both areas. The test persons said they expected a more intense sensory experience, easier access to detailed, up-to-date information about the game and the tournament, and more freedom to choose the game coverage. But as it turned out, the applications had only marginal relevance to the users’ interests, and so their use of the interactive services, while evident and engaged at first, tapered off during the match.
To increase the sensory experience, the enthusiasts wished to see the same game highlights again (and sometimes again and again), to see the game from different camera angles and to choose the type of commentator – some wanted the distanced, neutral tone of voice, some a commentator in another language or even from one of the competing nations, others wanted a specific commentator. They also wished they had access to a separate channel broadcasting the game with a fifteen to thirty second delay.
The enthusiasts had extensive knowledge of the participants in the games – players, substitutes, coaches, referees and commentators. Among other sources, this knowledge came from tele-text (also known as Text-TV), betting magazines, newspapers, and football shows. The interests of the enthusiasts would often include betting, polls and competitions on multiple media like the web and newspapers. (As yet, none of the interactive TV applications include these elements.)
When watching the games, the viewers would spend much time debating the strengths and possibilities of the teams in the tournament. They demanded facts to help them understand the game here and now – such as substitutions and bookings (warnings) – as well as the unfolding tournament – including the current standings, possible scenarios in case of goals in this match, etc. They did not mention any interest in statistical facts or World Cup trivia, such as who scored the fastest goal ever, how many corner kicks there are per match or the like. The overall message that emerged from our testing was: Practical information about the current tournament, please.
To sum up: the enthusiasts did not find the available services very interesting and they did not use them very much:
• The camera angles provided would rarely bring the viewers any closer to the game action than would the regular broadcast stream. Only the tactical angle gained some interest, and it was used for 2-3% of an average game. The highlight channel was a disappointment in the first part of the game, as the highlights were few and not instantly updated, for instance after a goal.
• The separate TV domain with news and standings was completely superfluous in the context of the game that was the focus of the test. Maybe in between games viewers would find it interesting, but the information was out of date compared to the many resources of news and analysis that these enthusiasts would seek.
• The quizzes and polls were dull and unengaging. What the test persons really wanted to do was to place bets and to check the odds on the games, and to order pizzas, beer or soft drinks. In any case, they did not want to miss one second of the game, so these services would only be of interest to them if designed to function without disturbing the game.
Most of these findings – about the nature of watching a game and following a tournament – could have been reached at an early stage of development of the World Cup i-TV services simply by applying generally accepted usability prototyping and iterative testing. This could have helped the broadcasters to provide much more interesting and appealing services to the viewers, increasing the general interest in interactive TV and maybe its penetration, and thereby serving the broadcasters’ (as well as the viewers’) interests in many ways.
By Thomas.
September, 1. 2004.
Permanent URL to Interactive TV user research -an example
Five to six test persons are commonly used for each round of usability testing. The yield from each additional test person (namely the number and the seriousness of the new problems revealed) usually decreases considerably after the third or fourth test person. The test persons then often begin to repeat problems that have already been uncovered by the previous test persons. The effect, of course, varies a lot from test person to test person and from web site to web site. For web sites with a wide target group, different segments can give very different results, and you will likely need three to four test persons from each segment in order to capture those differences.
To make sure that you collect enough data, it is usually a good idea to ask six test persons to the test, even though four are enough. The test persons may not show up for a variety of reasons. It is far preferable to carry out one test too many than risk having to cancel an otherwise well prepared test due to dropouts. As with focus groups and workshops, you can try to safeguard yourself against many dropouts by having substitutes ready.
An alternative to increasing the number of test persons is to use more than one test leader. Keep in mind that the test leader’s influences on test persons vary and lead to different results. Structuring the test by having one test leader test three test persons and another test leader test the remaining three can generate more insight than using the same test leader to test all six test persons.
I recently stumbled upon this reader comment by Jim Lewis on HFI's website, which seems relevant in this context.
"(...)Something that people always seem to forget (...) is that the use of small-sample usability tests is bound to iterative testing. If you're not iterating, then small samples don't make much sense. By the time you finish testing, you should have tested a relatively large number of participants -- not just five!"
Mr. Lewis also suggests a procedure to determine sample size (oh yes, the World is made of Math):
"Basically, the message was that the sample size you need depends on the problem discovery rate (p). If p is large, then you don't need very many people to discover the problems available for discovery. If p is small, then you need a larger sample. An ROI simulation indicated that the appropriate target for the proportion of problems to discover also depended on the value of p, with higher values having a break-even point at around 98% and lower values of p having a break-even point at around 86% (it's important to keep in mind that these values might depend on the assumptions made for the simulation -- but I think they are still informative). The data I reported in Lewis (1994) had p=.16, so to get to 86% problem discovery, I needed to run 12 participants (I actually ran 15)."
Interesting, but I really don't know what p is - is it any close to pi (hahaha)?
Sounds a bit like: if it is hard to find the problems you need more testing. "One size doesn't fit all but try twelve anyway".
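For what it's worth, the arithmetic behind the quoted numbers can be sketched in a few lines. In the quote, p is the problem discovery rate: roughly, the chance that a single test person runs into a given problem. The sketch below assumes the commonly used cumulative discovery model, 1 - (1 - p)^n, for the share of problems found after n test persons; the quote does not spell out the formula, so treat that model, and the function names (which are made up for this illustration), as assumptions.

```python
import math


def proportion_discovered(p: float, n: int) -> float:
    """Expected share of problems found after n test persons.

    Assumes each test person independently hits a given problem
    with probability p (the model is an assumption, see above).
    """
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which proportion_discovered(p, n) >= target."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must both lie strictly between 0 and 1")
    # Solve 1 - (1 - p)**n >= target for n, then round up to a whole person.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # The figures from the quote: p = .16 and a target of 86%.
    n = participants_needed(0.16, 0.86)
    print(n, round(proportion_discovered(0.16, n), 3))
```

With p = .16 and a target of 86%, this gives 12 test persons, which matches the number in the quote. A higher p brings the number down quickly, which is the "if problems are easy to find, you need fewer people" part of the message.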
My law? Start with a small sample (three), test, evaluate, continue testing if the evaluation results were poor, iterate. Good luck.
By Thomas.
August, 6. 2004.
Permanent URL to An old discussion still going strong: How many test persons do you need
Finding a usability partner in another country is not as easy as it sounds - especially not if you're looking for a reliable and resourceful professional (you probably are).
Here are some useful starting points:
Esomar - The World Association of Research Professionals
UPA – The Usability Professionals' Association
CHI and the local SIGs - (Special Interest Groups on) Computer-Human Interaction
Good luck!
By Thomas.
July, 1. 2004.
Permanent URL to Finding a usability partner in another country
Loved this: a simple solution to the problem of testing via the thinking-aloud protocol with test persons who don't like to offend (give negative feedback) or to admit that they cannot solve a task. Enter the Bollywood method:
Kath Straub:
"Chavan's Bollywood method derives from the Bollywood film genre, India's version of Hollywood movies, which are typically emotionally involved plots with great dramatic flourish. Within the usability-testing session, Chavan sets up a Bollywood scene -- the participant's beautiful, young, and innocent niece is about to be married. Erstwhile, the protagonist / usability-testing participant learns that the groom-to-be is a hit man! Worse yet, HE IS ALREADY MARRIED! The participant must deliver the evidence (and the wife) to the niece in person or she will never be convinced. No time to waste! Book that train ticket!
Chavan finds that participants who were previously reluctant to complete or comment on the task, willingly assumed this fantasy and with great excitement began the ticket-booking process. The fantasy situation provides license to communicate in a way that, under normal circumstances, would be culturally prohibited. Further, given the gravity of the situation, even minor usability challenges elicit clear and penetrating commentary."
By Thomas.
April, 5. 2004.
Permanent URL to The Bollywood approach to usability testing
Copyright 2004, Thomas Visby Snitker