Thursday, October 15, 2009

Value Sensitive Design to Escape the Cubicle (my VSD conceptual investigation)

Sociotechnical Problem Space

I’ve recently been doing some research on the importance of eliminating “sensory distraction” in virtual learning environments, allowing learners to become immersed in the learning experience/activity, thereby securing their focus and making the learning experience as effective as possible. Research has shown that the more a person is isolated from their physical environment during a virtual learning task, the more effective that learning task becomes.

As I pondered problems that might be investigated through the VSD lens, it occurred to me that the absence of sensory distraction is probably a good means of enhancing focus in any computer-mediated task. The majority of the work I do at my place of employment is done sitting in my cubicle, working on my computer, and it’s important that I focus on what I am doing, in order to meet deadlines and produce high-quality work. However, I often find that I have trouble focusing, due to the extremely high levels of sensory distraction that are inherent in a cubicle environment. Above me, fluorescent lights shine down with such intensity that I often find myself squinting to see my computer monitor. Behind me, I hear a multi-function printer printing and stapling a large document, one of hundreds that it will produce today (the printer is shared by the whole floor). All around me I hear the conversations, typing, coughing, opening and closing of drawers, ringing phones, opening and closing of doors, and miscellaneous other sounds produced by my cubicle neighbors. Every few minutes, someone walks within three feet of me, looking for someone, or just wandering the sea of cubicles. It’s difficult to ignore them, as the walls of my cubicle only reach to their shoulders…so I see them all. When considered in light of the recent research on the effects of sensory distraction on computer-mediated learning, it’s no surprise that I sometimes struggle to focus. In fact, it’s a miracle that I ever get anything done, let alone anything creative (considering my less than inspirational surroundings). I believe that the situation could be improved through some form of virtual work environment, designed to isolate the user from his/her physical surroundings and enhance immersion in the task(s) at hand. In the most general terms, I am focusing on computer use (as a primary job function) in a cubicle environment as my sociotechnical problem space.

Value(s) Implicated

I believe there are several human values that are potentially implicated in this investigation, including privacy, autonomy, trust, human welfare, and accountability. However, the value I would like to focus on is “calmness,” as described by Friedman in The Human-Computer Interaction Handbook. Friedman says that “the most potentially interesting…and profound change implied by the ubiquitous computing era is a focus on calm. If computers are everywhere, they had better stay out of the way…” (Friedman 1256). Friedman goes on to talk about the importance of designing so that people can remain serene and in control. I would expand this by saying that not only do computers need to stay out of the way, but technology in general. According to Friedman, “it is not surprising that we see an emerging body of work addressing the challenges of information overload and interruptability” (Friedman 1257). In a paper called The Coming Age of Calm Technology, Mark Weiser and John Brown talk about technology being the “enemy of calm” (Weiser 3). They question whether we can look to technology itself for a solution to the chaos, and ultimately tell us that “some technology does lead to true calm and comfort” (Weiser 3). They clarify this by saying, “There is no less technology involved in a comfortable pair of shoes, in a fine writing pen, or in delivering the New York Times on a Sunday morning, than in a home PC. Why is one often enraging, the others frequently encalming? We believe the difference is in how they engage our attention. Calm technology engages both the center and the periphery of our attention...” (Weiser 3).

Merriam-Webster defines calmness as the state of being “free from agitation, excitement, or disturbance.” I think that last piece is key to what I am investigating here. Merriam-Webster tells us that to disturb is to “interfere with,” “destroy the tranquility or composure of,” or “to put to inconvenience.” It is important for a variety of reasons that a workplace computer user be free from disturbance, as defined here, particularly in the problem space in question…when focused computer use is a primary job function and the user is working in an inherently non-calm environment. Clearly the sensory distractions described above would be classified as disturbances and are therefore contrary to the notion of calmness. It seems logical, then, that a tool designed to remove those distractions would promote calmness. However, any new technology introduced into this environment has the potential to add to that lack of calmness by itself becoming a disturbance. Therefore, I think that “calmness” would be a critical value to consider in the design of any system intended to overcome the challenges of sensory distraction in a cubicle environment.


Direct Stakeholders:
  1. The computer user/employee who is working in a cubicle environment.
    • The computer user/employee who uses the new tool/environment would be most directly affected. They would ideally benefit in the sense that their calmness, or the calmness of their work environment, would increase (since that is essentially the goal of the tool/environment). However, they could suffer a decrease in calmness if the tool/environment is poorly designed.
  2. The employer or managers over this employee.
    • The key intended consequence of this tool/environment would be an increase (in either speed or quality) of the employee’s work output, which in turn benefits the employer or manager. Therefore, the employer or manager will be directly affected by the results.
Indirect Stakeholders:
  1. Physically proximal computer users/employees.
    • Just as existing cubicle environments include many distractions originating from other cubicle dwellers in the neighboring area, any new technology introduced has the potential to affect neighboring users/employees (either in the form of increased distractions, or ideally, in the form of decreased distractions).
  2. Distant computer users/employees who must interact with the employee/user of the new system.
    • Many cubicle dwellers value the ability to quickly walk over to a co-worker and interact with them in person. A virtual work environment would be likely to change this interaction significantly.
  3. System designer.
    • As discussed in class, the system designer is a stakeholder in the sense that he/she is designing the system and attempting to consider human values that might be affected by his/her design choices.

Bricken, Meredith. “Virtual worlds: No interface to design.” Ed. M. Benedikt. Cyberspace: First steps. Cambridge: MIT Press, (1992).

Bronack, Stephen C., et al. “Designing Virtual Worlds to Facilitate Meaningful Communication: Issues, Considerations, and Lessons Learned.” Technical Communication 55.3 (2008): 261-267.

Bronack, Stephen C., et al. “Presence Pedagogy: Teaching and Learning in a 3D Virtual Immersive World.” International Journal of Teaching and Learning in Higher Education 20.1 (2008): 59-69.

“calmness.” Merriam-Webster Online Dictionary. 2009. Merriam-Webster Online. 15 Oct 2009.

Dickey, Michele D. “Three-dimensional virtual worlds and distance learning: two case studies of Active Worlds as a medium for distance education.” British Journal of Educational Technology 36.3 (2005): 439-451.

“disturb.” Merriam-Webster Online Dictionary. 2009. Merriam-Webster Online. 15 Oct 2009.

Franceschi, Katherine G., and Ronald M. Lee. “Virtual Social Presence for Effective Collaborative E-Learning.” Proceedings of the 11th Annual International Workshop on Presence. 2008.

Friedman, B., et al. “Human values, ethics, and design” The Human Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, 2nd Edition. Ed. Andrew Sears and Julie Jacko. New York: Lawrence Erlbaum, 2008. 1241-1266. Print.

Gabbard, Joseph L. A Taxonomy of Usability Characteristics in Virtual Environments. MS thesis. Virginia Polytechnic Institute and State University, 1997.

Martinez, Nicola. “Second Life: The Future of Communications?” Proceedings of the 55th Annual Conference of the Society for Technical Communication. 2008.

Padmanabhan, Poornima. “Exploring Human Factors in Virtual Worlds.” Technical Communication 55.3 (2008): 270-275.

Slater, Mel. “Measuring Presence: A Response to the Witmer and Singer Presence Questionnaire.” Presence: Teleoperators and Virtual Environments 8.5 (1999): 560-565.

Weiser, M., and John Brown. “The Coming Age of Calm Technology.” Beyond Calculation: The Next Fifty Years of Computing. Xerox PARC, 1996.

Witmer, Bob G., and Michael J. Singer. “Measuring Presence in Virtual Environments: A Presence Questionnaire.” Presence 7.3 (1998): 225-240.

Wednesday, October 14, 2009

Leveraging Cognitive Architectures in Usability Engineering

In chapter 5 of “The Human-Computer Interaction Handbook”, Byrne discusses how cognitive architectures can be applied in usability engineering (Byrne 95). He mentions that traditional engineering disciplines are grounded in quantitative theory. Engineers in these disciplines can refine their designs based on predictions derived from such theory. In contrast, usability engineers do not have these quantitative tools available, and therefore every design must be subjected to its own usability test (Byrne 95). Computational models, based on cognitive architectures, have the potential to give usability engineers quantitative tools similar to those available in traditional disciplines. With these tools, usability engineers could quantify the effects of changing attributes of a system (e.g., changing aspects of a user interface). In this paper, I discuss additional implications and applications of cognitive architectures in usability engineering.

Testing Economy
There are clear, positive implications to leveraging cognitive architectures in usability engineering. Most apparent is an economy in testing.

Software testing can be costly, especially if the software is intended for a large population of users. For example, if the software is to be used by a global audience, languages and other aspects of the target cultural ecosystems need to be considered, and testing must be duplicated to cover that variance among users. Testing can also be costly for large systems containing many functional points. To meet the goal of building a large and functionally accurate system, multiple usability tests are performed that iteratively shape the software being developed. Usability tests must be altered, and additional tests created, depending on how much the software changes between iterations. Therefore, in highly iterative development, the total cost of usability testing is multiplied by roughly the number of iterations the software has gone through. Another consideration is the administrative cost of usability testing, which includes items such as engineering test cases, writing test plans, distributing test plans, setting up security access for testers, and coordinating and tabulating test results.

Leveraging a cognitive architecture can help mitigate these costs if we can create realistic cognitive agents that model the user base. The cost of utilizing people to perform testing would be reduced, since cognitive agents could test in their place. For global applications with a disparate user base, user differences could be simulated; for example, varying cultural dimensions could be modeled within the agents. Costs from iterative development could also be avoided, as agents constructed for initial tests could simply be reused for subsequent testing. The overhead of administering tests would be lessened, since there would be no need to coordinate and distribute testing among large groups of users.
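To make the idea concrete, here is a deliberately toy Monte Carlo sketch (not a real cognitive architecture, and all parameters are made up) of how simulated agents with varied user characteristics could stand in for recruited testers:

```python
import random
import statistics

# Toy stand-in for a cognitive agent: each "agent" carries a couple of
# user-difference parameters (reading speed, familiarity) that we would
# otherwise have to recruit real, varied participants to cover.
def simulate_task_time(reading_speed, familiarity, n_steps=5, rng=None):
    """Simulated seconds to finish an n_steps task in the interface."""
    rng = rng or random
    total = 0.0
    for _ in range(n_steps):
        base = 4.0 / reading_speed                        # slower readers take longer
        retries = 0 if rng.random() < familiarity else 1  # unfamiliar users backtrack
        total += base * (1 + retries)
    return total

def run_simulated_test(n_agents=1000, seed=42):
    """Run the 'usability test' over a simulated user population."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_agents):
        agent = {
            "reading_speed": rng.uniform(0.5, 1.5),  # models user variance
            "familiarity": rng.uniform(0.2, 0.9),
        }
        times.append(simulate_task_time(**agent, rng=rng))
    # mean and 90th-percentile completion time for this interface design
    return statistics.mean(times), statistics.quantiles(times, n=10)[-1]

mean_time, p90_time = run_simulated_test()
```

Because the population is seeded, the same "test" can be rerun unchanged against each design iteration, which is exactly the reuse benefit described above.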

Social Networking System Application and Usability
In his paper, Dr. Ron Sun explains how CLARION, a cognitive architecture, can be applied to modeling social networks (Sun, 2006). CLARION is particularly well suited to model social networks since aspects of its various subsystems allow the creation of realistic social cognitive agents. CLARION includes a motivational subsystem that models needs, desires, and motivations within agents. More specifically, it can be used to model the physical and social motivations of the agent as the agent interacts with its environment (Sun, 2006). Additionally, the agent can understand other agents’ motivational structures, thereby promoting cooperation in a social setting. CLARION also includes a meta-cognitive subsystem that orchestrates the interaction of other subsystems within the architecture. This allows an agent to reflect on and modify its own behaviors, an ability that makes social interaction possible (Tomasello 1999). This “self-monitoring” allows agents to interact with each other more effectively by providing a means for the agent to alter behaviors that may prevent social interaction.
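As a loose illustration of that interplay (this is a toy sketch of the idea, not CLARION’s actual subsystems or API), consider an agent whose meta-cognitive monitor re-weights its motivational drives when feedback signals social conflict:

```python
# Toy illustration only: an agent with a motivational subsystem
# (competing drives) and a meta-cognitive monitor that re-weights
# those drives when its behavior blocks cooperation.
class ToyAgent:
    def __init__(self):
        # motivational subsystem: drive name -> strength in [0, 1]
        self.drives = {"acquire_resources": 0.9, "affiliate": 0.4}

    def choose_action(self):
        # act on whichever drive is currently strongest
        return max(self.drives, key=self.drives.get)

    def metacognitive_update(self, social_feedback):
        """Self-monitoring: if other agents signal conflict,
        dampen the selfish drive and boost the social one."""
        if social_feedback == "conflict":
            self.drives["acquire_resources"] *= 0.5
            self.drives["affiliate"] = min(1.0, self.drives["affiliate"] + 0.3)

agent = ToyAgent()
before = agent.choose_action()   # the resource drive dominates at first
agent.metacognitive_update("conflict")
after = agent.choose_action()    # self-monitoring shifts behavior toward affiliation
```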

With the ability to effectively model human social aspects, we can use cognitive architectures to perform usability analysis on systems that function within a large social setting (for example, a big city’s population). Traditional usability analysis of such systems’ effects might not be possible, because physically deploying a system to a large community of people faces several obstacles. First and foremost is the cost involved in usability testing. As mentioned in the section above, there are overhead costs such as coordinating testing and distributing test plans. Additionally, a system interface would have to be set up for each person in the community to simulate its effects precisely; we would thus incur major expenses without first understanding the potential benefits. Secondly, the actual coordination of usability testing in a large community would not be feasible, because recruiting the required number of individuals isn’t practical. Finally, there are temporal issues, since a social network matures slowly in real time. Using a cognitive architecture, we can construct a model of the social network that lets us avoid these pitfalls.

Along with mitigating the difficulties of usability analysis, there are other benefits to using cognitive architectures. Parameters of the simulated social network can quickly be changed to model real-life scenarios. (Parameters include community size, agent type distribution, and epoch length.) The beliefs, goals, and knowledge of the simulated people (cognitive agents) can also be modified. Finally, since the system is not deployed to actual users, changes never need to be coordinated and rolled out to a user community. These benefits allow the social model to be adjusted rapidly and at little cost when user requirements shift. Ultimately, being able to manage change effectively leads to a more usable software system.

Byrne, M. D. (2007). "Cognitive architecture." In Sears, A. & Jacko, J. (Eds.). The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, 2nd Edition. (pp. 93-114). Lawrence Erlbaum.

Sun, Ron (2006). “The CLARION cognitive architecture: Extending cognitive modeling to social simulation.” In: Ron Sun (ed.), Cognition and Multi-Agent Interaction. (pp. 1-26) Cambridge University Press, New York.

Tomasello, Michael (1999). The Cultural Origins of Human Cognition. Harvard University Press.

Tuesday, October 13, 2009

Human-computer capabilities and limitations

by David F. Bello

While it is of course relevant literature and a highly important field of inquiry within the study of human-computer interaction, I can't help but feel that computational imitations of human cognition are always already falsely deterministic of the way our human brains actually work. Perhaps exhibiting myself as a pessimist, I do not see how a discrete state machine of any sort might one day replicate the actual, internal functionality of a human brain. I am not a neuroscientist, nor have I ever claimed to be (except that one night at a bar in 2006, but let's not get into that...). It does, however, appear to me that the neurons in one's mind are not, and cannot be, fully represented by the electronic circuitry of a computer. The cognitive architectures discussed here act within software systems at a much higher level of abstraction than binary code and represent higher levels of cognitive activity than the fundamental units of neuron-to-neuron activity. Thus, any mapping of thought and code must take place in some middle ground between cognitive architecture and software architecture, the kind being sought in the 5th chapter of our textbook, "Cognitive Architectures," by Michael D. Byrne.

Douglas Hofstadter's books always come to mind when considering these more theoretical topics of HCI. For instance, in I Am a Strange Loop, he analyzes the structures of abstraction that neuroscientists study (from amino acids, to synapses, to columns and cortices, and finally hemispheres), and compares these to the cognitive structures which are the root of thought and language (from the concept "dog," to various types of memory, to memes and the ego, to a sense of humor, and finally the concept of "I") (Hofstadter 26). We don't always consider the relationship between these categories because they have not yet been discovered as such. In his other book, Gödel, Escher, Bach: An Eternal Golden Braid, he presents a fictional dialogue with an ant colony, whose individual units carry no meaning unto themselves, but serve to create meaning through large-scale systematic shifts in movement and action. This is related to the idea of software because, though we do not often comprehend the direct relationship between executing a program and the binary code beneath that layer of interface, it is always there. Just as it would be completely without meaning to see an ant walk up a stalk of grass, it is completely without meaning to see a single switch move from on to off and back without the fuller understanding of a complex system of computational programming in context. Neurology follows this paradigm as well: a single neuron has no meaning unto itself; rather, it is the entire system of neurons firing in real time and in context which provides meaning to the self and to the body.

But once we move beyond the levels of binary code and neural activity, there must be some middle ground through which computational cognitive models are able to accurately represent human cognition at an appropriate level. Mustn't there be? Perhaps software can be written to model the neural activity of the brain in its entirety. Or, moving upward through layers of mental abstraction, software that represents language and/or visual information in its entirety. Simpler than that, can we develop a program to accurately represent the entirety of human thought at the level of the self? That is, could a single person's neural activity ever be mapped completely into an implementation of computer software? If so, what are the implications for the relationships of death and birth, belief and memory, language and the senses? Can we ever provide a working cognitive model to "react" to a Beatles song or a painting by Goya?

To move away from my thus far tangential whimsy on the human condition, let us look at the more relevant limitations of human beings, rather than the limitations of computers to become like human beings. Cognitive Complexity Theory (CCT) perfectly exhibits the way that human cognition and computer software are layered by abstraction. The task mentioned in the chapter, deleting a word from a paragraph, is something to be done separately (computationally) from the interface itself. The interface, which in this case can be swapped out and replaced with an alternative means of interacting with the deletion of that word, exists "on top of" the underlying code representing the word, its placement, and the structures of both on the computer's hard drive and in its memory. Similarly, when we speak that paragraph out loud, we may be thinking in units of abstract thought, allowing our brain's language capacities to perform the work. Or we might be examining our phrases as units, rather than words; words, rather than syllables; and syllables, rather than phonemes. I consider this telescopic view of language much like the layers of abstraction in software: graphical user interface, code objects and classes, programming language architecture, operating system architecture, machine language, hardware processing, etc.

What does this matter? Well, for one, the way that we view computing systems informs our thoughts about software and what we can do with computers as users. Also, the very fact that we use computational cognitive models to gauge how we think about ourselves and our own thinking can yield great sociological inferences on not only how we use computers, but how we "use" everything else: how we interact with one another and our environment, how we learn and grow as human beings, how we believe and how we think, etc. It is important to know that there is much we don't understand both about the human mind and about the digital technology that surrounds us. While there are a great many programmers, fewer and fewer go deeper into the levels of abstraction in code. This is partly because of the growing complexity of software, but also because we are losing interest in the seemingly "mundane" aspects of binary coding, machine language, assembly code, etc., and are more interested in those higher-level programming systems which afford greater rewards, in the form of developing complete software objects in much shorter periods of time, and require a far shorter learning curve. Just as it is unnecessary for the average language user to think of every single phoneme that comes from their mouth and the oral and vocal structures behind that production of meaning, not every computer user is willing to take the time to understand what every element of hardware and software is doing while they check their email and add friends to their Facebook profiles. There is much being left behind, but there are also further directions to go. Unlike the human mind, the computer is forever extensible: capable of upgrades in memory, processing power, and storage far beyond those of the human mind. Where will these abstractions take us?

Byrne, M. D. (2007). "Cognitive architecture." In Sears, A. & Jacko, J. (Eds.). The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, 2nd Edition. (pp. 93-114). Lawrence Erlbaum.

Hofstadter, Douglas. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Basic Books, 1999.

Hofstadter, Douglas. I Am a Strange Loop. New York: Basic Books, 2007.

Back to the Drawing Board

Brian R Zaik

In their chapter of The Human-Computer Interaction Handbook, Welsh et al. bring up Fitts’ law, which helps HCI designers predict human perceptual-motor performance for users interacting with computer interfaces. Yet this and other insights were gained during the era of the two-dimensional, windowed graphical user interface. The future of interactive computing technology, as the authors note, seems especially bright when considering the kinds of haptic, augmented, virtual, and interactive interfaces currently under exploration at universities and in corporate labs all over the world. With such dramatic shifts in interface paradigms, Welsh et al. point out that we cannot continue to believe that the laws and theories of the past will readily apply to the future. I can only conclude that with each new domain of HCI we explore, we will uncover new challenges that fit the paradigm – challenges which will likely require us to augment our existing knowledge of human capabilities with new strategies.

It is sometimes difficult to get past the novelty of an interface to realize its faults when actually used by a human being. In fact, sometimes we never get it completely right. Alan Kay’s 1987 lecture from Apple included a video of Sketchpad, a computer-aided drawing system demonstrated in 1963. Watching the fluidity of the light pen’s interaction with the capable software in that video, my initial thought was, “Hey, this interface looks great!” But Kay quickly informs his audience that the light pen was a terrible input device, since the blood would drain out of the user’s fingers after only about 20 seconds of use (3). Examples like this remind us that no matter how interesting and intuitive an interface may appear at face value, we are ultimately restricted as interface designers by the mental and physical capabilities of our fragile minds and bodies. Unfortunately, new paradigms often bring with them a whole different set of challenges, and it is potentially dangerous to develop laws concerning human capabilities that are specifically suited to one interface paradigm.

Fitts' Law, as the four authors write, can be used as a predictive model of time to engage an object in an interface (4). They state that Fitts' Law is most useful when evaluating computer pointing devices, but should not be used as a crutch due to its limited usefulness with other types of interfaces. And they cite one type of interface for which laws like this just don't cut it: eye gazing interfaces.

I wanted to peer a little deeper into this type of experimental interface, so I tracked down the thesis of Arne John Glenstrup and Theo Engell-Nielsen, two former students at the University of Copenhagen (1). The two studied the implementation and practicality of eye gazing computer interfaces in 1995, and included a detailed analysis of the problems associated with such interfaces, as well as the potential opportunities afforded by this new realm of interaction. They cite two key problems with eye gazing interfaces that track users' eye movements for computer interaction:

  • The 'Midas Touch' problem: The user's eyes never leave the screen, and thus it becomes necessary to use a "clutch" control to engage and disengage eye-gaze control.

  • The one-way zoom problem: In order to create a visual hierarchy for the user, it is necessary to have a zoom-out function that allows a user to zoom out from an object on which they have focused.

Would Fitts' law (or Hick-Hyman, or any other insight covered in Welsh et al., for that matter) ever allow us to identify and understand these types of problems? The Danish students realized that the two aforementioned issues are truly problematic if eye gazing is used as a direct replacement for the mouse pointer, and we know that Fitts' law is particularly applicable to mouse pointer interfaces. Eye gazing UI is one example where our current body of knowledge about human capabilities (and interests, and favoritism of interface elements) is simply constrained by the types of hardware interfaces that were used as the method of computer interaction. This highlights the potential danger of conducting HCI research that relies too heavily on one particular type of UI paradigm. We can see here that the eye-gazing domain has changed so significantly from the old standard that many of the former insights into human capabilities fall short of predicting how humans will treat the eye-gazing interface. We need to develop new knowledge about this novel and unique paradigm.

But sometimes we can simply augment existing knowledge with what we already know to tackle new and distinct interface paradigms. I speak partially from my own experience when I discuss the difficulties of moving from one paradigm to another. I am a co-creator of the Concerto digital signage system (2), an open source advertising and communications medium that launched at RPI in 2008. We’ve delivered a fully functional digital signage package to the world, free of charge with openly distributed source code, and yet now our team of developers is exploring a whole set of new application areas for targeted digital advertising. Direct user interaction is an important and intriguing topic for digital signage in general – up until only recently, most signage platforms, Concerto included, have simply displayed visual messages about events, services, and other information on television screens. Direct user interaction holds the promise of pulling users closer to the advertising they see, and it can also change the “dumb” broadcasting medium into a dynamic platform for providing targeted messages on demand, depending on how individual users might interact with the medium.

With Concerto, gesture-based recognition and direct manipulation interfaces are both well worth exploring. These could extend Concerto’s capabilities to provide specific content on demand to users who stop in front of a Concerto unit and try to interact with it in certain ways. When people mostly use computers with mice and keyboards, the prospect of using hand gestures to virtually “flip” through screens of information seems foreign at first glance. Yet while these interfaces are relatively different from a mouse and keyboard, the insight we have from Fitts' Law may still be relevant. The key conclusions of that law state that "movement time must increase as the distance of the movement increase and/or the width of the target decreases" (4). If we mount a web cam on a Concerto screen and connect it to computer vision software to track hand movements of the user in free space, we’ll be interested in considering how far the user would be expected to move her hands in order to successfully complete a “flipping” gesture that would slide one full-screen message out for another new one. With direct manipulation interfaces, such as a multi-touch kiosk that could provide contextual content on demand in response to the user pressing the screen at arm level, the relative sizes of target objects must be designed to incorporate Fitts' conclusions. So while the interface paradigm is significantly distinct from a pointer-based interface, the insights of old can still be relevant. This is not to say that we are completely in the clear relying on old knowledge. Using hand gestures in front of a Concerto screen for an extended period of time may quickly fatigue the user, just like with the Sketchpad light pen. Only after entering this space and using physical gestures in such a way would we be able to accurately predict human limitations of that sort. 
So we need to augment the insights we gain from new interface domains with the knowledge that HCI researchers before us have recorded.
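One way to carry the old insight into the new domain is to invert Fitts' law and ask how large a gesture or touch target must be to meet a movement-time budget. A minimal sketch, using the common Shannon formulation with made-up coefficients (real values would have to be measured empirically for free-space gestures):

```python
def min_target_width(mt_budget, a, b, distance):
    """Invert Fitts' law (Shannon form: MT = a + b * log2(D/W + 1))
    to find the smallest target width W that keeps predicted movement
    time within mt_budget. a, b are empirically fitted coefficients."""
    # log2(D/W + 1) = (MT - a) / b  =>  W = D / (2**((MT - a)/b) - 1)
    return distance / (2 ** ((mt_budget - a) / b) - 1)

# Illustrative only: to keep an arm-level touch under 0.8 s at a reach
# of 600 px, with assumed coefficients a=0.2, b=0.15, the target must
# be at least 40 px wide.
w = min_target_width(0.8, a=0.2, b=0.15, distance=600)
```

A designer of a touch or gesture kiosk could run this over each screen layout to flag targets that are too small for their expected reach distance.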

Where do we go from here? It sure seems like we have a lot of work to do. But all is not lost – as we continue to explore new frontiers of HCI, we continue to build up a repository of insights into our own capabilities as human beings using a variety of different means of connecting with machines. It is our collective responsibility, as HCI adventurers in brave new worlds of computing, to study how human beings react to and interact with our new interfaces. Then, we must document them. It is clear that we will not be able to generalize many laws and observations of the past to virtual reality, augmented reality, three dimensions, and gaze-tracking systems. We may, however, be able to piggy-back off of existing knowledge to gain insights to help us deal with new, daunting challenges. We are fortunate to be involved in such a fast-changing, applications-focused field, and we should do our part to keep advancing knowledge forward, so that we may build a more rounded and versatile view of how our physical and mental characteristics affect our use of machines.


  1. Glenstrup, Arne J., and Theo Engell-Nielsen. "Eye Controlled Media: Present and Future State." Diss. University of Copenhagen, 1995. Denmark: DIKU, 1995. Datalogisk Institut. University of Copenhagen. Web. 12 Oct. 2009.

  2. "Home - Concerto Digital Signage." Concerto Digital Signage. July 2009. Web. 12 Oct. 2009.

  3. Kay, A. (1987). Doing with Images Makes Symbols. University Video Communications.

  4. Welsh, T. N., Chua, R., Weeks, D. J., & Goodman, D. (2007). Perceptual-motor interaction: Some implications for HCI. In Sears, A. & Jacko, J. (Eds.). The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, 2nd Edition. (pp. 27-42). Lawrence Erlbaum.

Monday, October 12, 2009

Perceptual-Motor Interaction

This essay is primarily concerned with the following reading: Welsh, T. N., Chua, R., Weeks, D. J., & Goodman, D. (2007). Perceptual-motor interaction: Some implications for HCI. In Sears, A. & Jacko, J. (Eds.), The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, 2nd Edition (pp. 27-42). Lawrence Erlbaum.


In the modern world of Adobe Flash and Microsoft Silverlight, we can often forget the very basic restrictions of our users’ operating environment. In ‘Perceptual-motor interaction’, the authors explore the often-foreign world of users’ physical actions and the cognitive processes underlying those actions. The article is concerned not only with how people move while they interact with computers, but why they move that way.

Current Understanding

Central to the authors’ description of the current understanding of perceptual-motor interaction are two ‘laws’: Fitts’ law and the Hick-Hyman law.

“The (Fitts’) law predicts pointing (movement) time as a function of the distance to and width of the target—where, in order to maintain a given level of accuracy, movement time must increase as the distance of the movement increases and/or the width of the target decreases.”(ibid. 28)
“the Hick-Hyman law predicts the decision time required to select a target response from a set of potential responses—where the amount of time required to choose the correct response increases with the number of possible alternative responses.”(ibid. 29)
Simply stated, the two laws are:
  1. Fitts’ Law – Closer, bigger items are faster to hit.
  2. Hick-Hyman – The more things you have to select from, the longer it takes to pick the right one.
Fitts’ law only describes the movement time after a target has been chosen, so in order to produce an accurate prediction of the total response time, the decision time needed to choose the target must be added to the time necessary to acquire and select it.

While Fitts’ law and the Hick-Hyman law are very good at predicting the time necessary to acquire a target, they do not explain why. As interfaces slip into augmented and virtual reality, more in-depth tools than these laws will be necessary to explain why an interface functions well for human beings.
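The two laws, and the addition described above, can be sketched numerically. This is a minimal illustration, not the authors' code; the a and b coefficients are hypothetical, and in practice they are fit empirically for a given device and user population.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Fitts' law (Shannon formulation): MT = a + b * log2(D/W + 1)."""
    return a + b * math.log2(distance / width + 1)

def hick_hyman_decision_time(n_choices, a=0.2, b=0.15):
    """Hick-Hyman law: RT = a + b * log2(n + 1)."""
    return a + b * math.log2(n_choices + 1)

def total_response_time(distance, width, n_choices):
    """Decision time (Hick-Hyman) plus movement time (Fitts)."""
    return hick_hyman_decision_time(n_choices) + fitts_movement_time(distance, width)

# A big, close target among few choices beats a small, distant
# target among many choices.
fast = total_response_time(distance=100, width=100, n_choices=2)
slow = total_response_time(distance=800, width=20, n_choices=16)
```

Both laws are logarithmic, which is why doubling a menu's length or halving a button's size costs far less than a constant-per-item model would suggest, yet still measurably slows users down.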


“Translation” refers to how the user processes perception and turns it into action. The key concept of translation appears to be layout: an interface where the items ‘on screen’ are arranged similarly to the user’s physical actions is much more effective than one that is visually unrelated. The article refers to this as ‘calibration’ and uses the example of a user trying to find the mouse cursor on a screen by waving the mouse around in a somewhat random pattern. Connections are made in the brain between the motion of the hand and the cursor on the screen.


Attention is a finite resource. The article states that “…humans, like computers, have a limited capacity to process information in that we can only receive, interpret, and act upon a fixed amount of information at any given moment.” (ibid. 30) Different parts of the interface divert and demand our attention at any given time.

According to the article there are three characteristics of attention:

  1. Attention is selective – only a small part of the information may be focused on at a single time.
  2. Attention is changeable – the focus of attention may be diverted from one thing to another.
  3. Attention is dividable – it is possible to ‘pay’ attention to multiple things at once.
Humans’ visual attention operates through the fovea and perifoveal areas of the retina. The fovea is the part of the eye that is responsible for direct vision, and the perifoveal areas are responsible for peripheral vision. Attention can be shifted back and forth between direct and peripheral vision, and both are useful from an interface perspective.

Endogenous attention shifts are the conscious shift of attention between two things, such as a user deciding that he is done reading for a while, and looking out the window instead. An exogenous shift of attention is initiated outside of the user, such as a flash of lightning drawing your eyes to the skyline.

While it would seem that exogenous shifts of attention are purely involuntary, they are actually slightly more selective than that. Random changes in the visual stimulus only capture a user’s attention if the change falls within the set of things for which the user is looking. Even when we are distracted, we filter what distracts us.

Inhibition of Return (IOR)

Cognitive forces are in place in the human mind that make it hard to get back to work. Inhibition of return (IOR) refers to the interference that cognition experiences after being distracted. Interestingly, while it is relatively easy to facilitate the transfer of attention from one object to another, it can take considerably longer to return to a previous point after distraction. Blink tags make a page incredibly difficult to read, because they repeatedly divert the user’s attention and cause them to stumble back to where they were.


The word ‘interaction’ in Human Computer Interaction implies an almost conversational interplay between human and machine. The action that the user employs to communicate with the computer is closely tied to the way that their attention is directed. The model of response activation holds that: “Responses are activated to attended stimuli regardless of the nature of attentional dedication. Each stimulus that matches the physical characteristics established in the response set captures attention and, as a result, activates an independent response process.”(ibid 34) If a user encounters a familiar stimulus, they will ‘remember’ and quickly transition to the set of responses that they have connected to that stimulus. Conversely, if the stimulus is unfamiliar, the responses will be sluggish and non-intuitive. It is similar to hearing three notes of a song and “knowing” the rest. A design that resonates with a user will dramatically increase their ability to interact with a system.

The context of the target is also important. Finding a Microsoft Word document in the midst of a folder containing interspersed Word and Excel documents is more difficult than simply finding the Word document amongst other Word documents. The mere presence of a single Excel document is more jarring than any of the ‘wrong’ Word documents.

Password Authentication – Human Capability vs. Security?

I first faced the challenge of balancing human capabilities with security requirements during a work project last year and have been interested in the topic ever since. A portion of the project was to design a login experience for users who would have to create new passwords meeting strict requirements, select security questions/answers and still protect the system from hackers or bots. This included a backup authentication process to verify the identity of users who have forgotten their username or password.

Our password requirements were an example of where security and human capability collided. The process of trying to come up with a password that had eight characters, one number, one capital letter, one symbol, no repeated character strings, and no repetition of the past three passwords, yet was also “easy” to remember, was expecting a lot of our customers. My team was able to mitigate some frustration and calls to the help desk by displaying the requirements and providing dynamic visual feedback as each requirement was met. Still, the process was frustrating for customers who have many other passwords to account for in their lives.
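The dynamic feedback described above amounts to checking each rule independently and reporting a per-rule status that the UI can render as it updates. A minimal sketch, assuming the rules listed in this post; the function name and exact rule set are illustrative, not the actual project code.

```python
import re

def check_password(password, previous_passwords=()):
    """Return a dict mapping each requirement to True/False, so the UI
    can light up each rule independently as the user types."""
    return {
        "at least 8 characters": len(password) >= 8,
        "contains a number": bool(re.search(r"\d", password)),
        "contains a capital letter": bool(re.search(r"[A-Z]", password)),
        "contains a symbol": bool(re.search(r"[^A-Za-z0-9]", password)),
        # (.)\1\1 matches any character repeated three times in a row
        "no character repeated 3+ times in a row": not re.search(r"(.)\1\1", password),
        "not one of the last 3 passwords": password not in previous_passwords[-3:],
    }
```

Returning the full rule-to-status map, rather than a single pass/fail boolean, is what makes the per-requirement visual feedback possible.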

Deborah S. Carstens, in Human and Social Aspects of Password Authentication, recognizes the need for reliable security practices by people and organizations. The level of security threats to computers and networks has increased along with our increased dependence on them in almost all aspects of our lives. Many companies and individuals rely on password authentication to verify the identity and access rights of users. Security practices may be necessary but are often complex and unrealistic because they don’t take into account human capabilities or cognitive theory principles. Carstens notes that it doesn’t have to be that way. “User memory overload can be minimized when all aspects of a password authentication system have been designed in a way that capitalizes on the way the human mind works and recognizes its limitations”.

Although there are other security techniques being implemented today such as biometric authentication, Carstens explains that research on passwords is still relevant. Most people are still entering passwords even if biometric scanners are available. And we all have multiple passwords to remember for many different systems – usually several at work, school, and personal use. Given that human memory is limited, there is a risk of information overload and there’s often a tradeoff between creating passwords that are “easy to remember” and ones that are truly secure.

When password requirements exceed human memory limitations, people are more likely to engage in practices that threaten security, such as creating easy-to-guess passwords or writing passwords down on paper and keeping that paper in an unprotected location.

In her research, she recommended several guidelines to create passwords that are both secure and do not exceed human capabilities. In addition, Carstens explained, users should be offered training on how to create passwords that are both secure and meaningful, which will aid in retention.

  • Passwords must be a combination of symbols, numbers, and letters

  • Passwords cannot use the same character more than twice

  • Passwords must not spell out words that are found in a dictionary or use a proper noun such as a name of a person, pet, place, or thing

  • Passwords cannot contain information easily accessible to the public such as a social security number, street address, family members’ birthdays, and wedding anniversary dates

  • Passwords contain two to four chunks of data and are 10 to 22 characters in length, depending on the character-length capabilities of any given system

Her last guideline was interesting to me and she explains it more thoroughly in her article. Essentially, she recommends that people use “chunking” along with strings of letters and numbers that represent a meaningful concept. This builds on a lot of past research, from Miller (1956) to Wickens (1992) to Proctor (2002) and her own research in 2006, with many others in between.

In practice, you would select for yourself a core “chunk” that you could include in all your passwords. This would be followed by a second chunk that would have meaning to you in relation to the application/system you need to access.

Using her examples, let’s say you start out with “Mb#=43” which translates to “my basketball number equals 43” and the second chunk would be “fiem” which translates to “Florida industrial engineering major”. These two chunks would be combined to serve as your password for accessing a university portal, “Mb#=43fiem”.

To access your social network accounts, you might select a second chunk like “GMHSV”, which translates to “go Madison High School Vikings”. You would add this to your “core chunk”, which results in “Mb#=43GMHSV”.
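Carstens' chunking scheme, using the examples above, reduces to simple string composition: one memorized core chunk plus one context chunk per system. A sketch (the chunks here are the post's own examples):

```python
# Core chunk shared across all passwords:
# "Mb#=43" -> "my basketball number equals 43"
CORE_CHUNK = "Mb#=43"

def chunked_password(context_chunk):
    """Combine the memorized core chunk with a system-specific chunk."""
    return CORE_CHUNK + context_chunk

# "fiem"  -> "Florida industrial engineering major" (university portal)
# "GMHSV" -> "go Madison High School Vikings" (social network)
university_portal = chunked_password("fiem")
social_network = chunked_password("GMHSV")
```

The memory load stays roughly constant as accounts accumulate, because the user memorizes one core chunk plus one short, meaningful association per system rather than an arbitrary string per system.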

This seemed overly cumbersome at first, but what is intriguing is that it attempts to find a meeting point between the need for secure authentication without exceeding human capabilities. She also references research that supports offering guidance to users on how to create those meaningful yet secure passwords.

I think back to my project and realize that although we provided dynamic feedback on how well a user’s chosen password met security requirements, we did not offer guidance on how to come up with a password that not only meets the requirements but can also be remembered. For me, it’s definitely worth further consideration and further review of related research.

Carstens, D. (2009). Human and Social Aspects of Password Authentication. In M. Gupta & R. Sharman (eds.), Social and Human Elements of Information Security: Emerging Trends and Countermeasures, (pp. 1-14).

Human Capabilities in HCI

Human sensation, perception, cognition, and action are all very tightly integrated. Traditionally, experts have separated the studies of these systems into an input class, which encompasses both sensation and perception, and an output class, consisting of the study of behavior and motor expression, with a space in between for processing (akin to the black box model). However, I would argue that they are so tightly coupled that it is impossible to draw a distinct line that separates them succinctly. Rather, these systems, only a small portion of which can be observed with the naked eye, flow smoothly into each other, reminding us how interdependent they really are. Perhaps the most important facet of these couplings is the feedback loop that exists between the input and output classes of study, and how that relationship both affects and is affected by the more cognitive systems like attention and memory. More specifically, these systems and their relationships play a vital role in how we go about creating technology in a symbiotic manner.

The paper "Perceptual-Motor Interaction: Some Implications for HCI," contributed by T. Welsh, R. Chua, D. Weeks, and D. Goodman, describes two theoretical and analytical frameworks for studying perceptual-motor interaction in HCI. In the first framework, the authors take the view of perceptual-motor interaction in the context of an information processing model. In this, they point to (pun unapologetically intended) how humans currently issue commands and select objects in user interfaces, traditionally using a mouse. They mention Fitts' Law, a predictive model of the time to engage a target, which essentially states that targets are easier to acquire if they are closer and bigger, and that the difficulty of accurately attaining a target is a function of its distance and width relative to the current position. Designers should be aware that commonly expected actions need to be larger in diameter and should be as close to the starting point as possible. Another design implication that falls out of Fitts' law concerns the 5 fastest locations that a mouse cursor can access at any given time (assuming the cursor starts at the center, these locations are the center and the 4 corners, since each corner has infinite target width in both the x and y axes). Some interfaces have taken this to heart (Google Chrome's tabbed browser features tabs that reach all the way to the top edge of the screen), and many have not. It should be noted that Fitts' Law has successfully been applied to direct manipulation interfaces, not only in the case of mapped interfaces requiring mouse interaction.
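The corner effect above can be made concrete with Fitts' index of difficulty. This is an illustrative sketch: the screen edge stops the cursor, so a corner target behaves as if its width were effectively unbounded, and the numbers below are stand-ins for that effect, not measurements.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits:
    ID = log2(D/W + 1)."""
    return math.log2(distance / width + 1)

# Same travel distance, but the corner target's effective width is
# huge because the cursor cannot overshoot past the screen edge.
ordinary_button = index_of_difficulty(distance=500, width=30)
corner_target = index_of_difficulty(distance=500, width=10_000)
```

Since movement time grows with ID, the corner's near-zero ID is why edge-anchored controls like Chrome's full-height tabs are so fast to hit.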

The authors also mention the Hick-Hyman law, which predicts the decision time required to select a target response from a set of potential responses. This law states that the amount of time it takes to make a choice increases with the number of possible choices. This has direct implications for design in that task completion time increases greatly when you provide more menu options, more buttons, more labels, etc. This is a common argument against scope creep. This problem is not only applicable to Human Computer Interaction, however. In this TED talk, Barry Schwartz describes the paradox of choice, and how even though it may seem to follow that we increase satisfaction by offering users more options, all it does is make it take longer to choose (inducing chooser's paralysis), and ends up making them feel worse afterwards (due to chooser's regret).

The paper goes on to describe attention, and how it affects perceptual-motor interaction due to its limiting effect on one's capacity to process information. The authors offer three important characteristics of attention:
  1. Attention is selective and allows only a specific subset of information to be processed
  2. Focus of attention can be shifted from one source of information to another
  3. Attention can be divided such that, within certain limitations, one may selectively attend to more than one source at a time (the cocktail party phenomenon)

Attention is closely associated with ethics in web design now, since many companies have learned how to manipulate users' attentional characteristics to get them to pay attention to their advertisements on websites. For instance, using movement, color, and other stimuli which break the barrier discussed in the divided attention (cocktail party) model, companies can lasso users' attention away from their original task material and toward their offers. Perhaps what web-savvy marketing companies didn't count on was a phenomenon known as "Banner Blindness." Essentially, users have rather quickly adapted to learn how advertisements look and behave on websites, to the extent that their attention has learned to block them out. In accordance with the first described characteristic of attention, users began to selectively ignore banner ads on websites. Lately, advertisers have had to become more clever in capturing the attention of website visitors, employing tactics like hidden ads, interactive ads, and fly-across ads.

The paper wraps up the discussion on perceptual-motor interaction in applied tasks by illustrating a few examples: remote endoscopic surgery, PDA operation, and eye-gaze vs. mouse interactions. The last example exemplifies the power that eye gaze technologies may have in future selection activities, citing that they may be more efficient than those of the traditional manual (mouse) systems. While supported by new and upcoming research activities, I would like to suggest caution in assuming superiority in everyday actions with eye gaze interfaces. While there are several critiques, the one that stands out to me is the Midas Touch problem. Manual movements are easy, voluntary, and natural. Conversely, eye gaze commands are not necessarily natural as interface commands, not easy to perform with high accuracy, and, most importantly, not always voluntary. Unlike a simple purposeful twist of a knob or a flick of a switch, the eyes are always "ON." How is an interface to know when to accept gaze as input, and when the user is just looking? This problem is a tricky one and needs to be addressed in any interface system where false positives are likely to be an issue (gestural interfaces, Brain Computer Interfaces).
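One common mitigation for the Midas Touch problem is a dwell-time filter: gaze only counts as a selection after it has rested inside a target for some minimum duration. A minimal sketch, where the 500 ms threshold and the sample format are illustrative assumptions:

```python
DWELL_THRESHOLD_MS = 500   # hypothetical minimum fixation time

def detect_selection(gaze_samples, target):
    """gaze_samples: list of (timestamp_ms, x, y) from the eye tracker.
    target: bounding box (x0, y0, x1, y1).
    Returns the timestamp at which a dwell-based selection fires,
    or None if the user was just looking."""
    x0, y0, x1, y1 = target
    dwell_start = None
    for t, x, y in gaze_samples:
        inside = x0 <= x <= x1 and y0 <= y <= y1
        if inside:
            if dwell_start is None:
                dwell_start = t            # gaze entered the target
            elif t - dwell_start >= DWELL_THRESHOLD_MS:
                return t                   # dwelled long enough: select
        else:
            dwell_start = None             # glance left the target: reset
    return None
```

The threshold embodies the trade-off directly: too short and stray glances trigger selections (the Midas Touch), too long and deliberate selections feel sluggish compared to a mouse click.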

The eye gaze vs. mouse investigation reveals the importance, and the difficulty, of trying to optimize for user productivity in HCI: finding ways to make interactions more efficient by taking advantage of the particular ways humans can interact. In the following concept video, the makers make a case for an alternative interface. The attempt here is to optimize our ability to interact with our fingers, while at the same time optimizing for ergonomic comfort and visibility, common critiques of current touchscreen technologies.

10/GUI from C. Miller on Vimeo.