Monday, October 12, 2009

Perceptual-Motor Interaction

This essay is primarily concerned with the following reading: Welsh, T. N., Chua, R., Weeks, D. J., & Goodman, D. (2007). Perceptual-motor interaction: Some implications for HCI. In Sears, A. & Jacko, J. (Eds.). The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, 2nd Edition. (pp. 27-42). Lawrence Erlbaum.


In the modern world of Adobe Flash and Microsoft Silverlight, it is easy to forget the basic constraints of our users’ operating environment. In ‘Perceptual-motor interaction’, the authors explore the foreign world of users’ physical actions and the cognitive processes underlying those actions. The article is concerned not only with how people move while they interact with computers, but with why they move that way.

Current Understanding

Central to the authors’ description of the current understanding of perceptual-motor interaction are two ‘laws’: Fitts’ law and the Hick-Hyman law.

“The (Fitts’) law predicts pointing (movement) time as a function of the distance to and width of the target—where, in order to maintain a given level of accuracy, movement time must increase as the distance of the movement increase and/or the width of the target decreases.”(ibid. 28)
“the Hick-Hyman law predicts the decision time required to select a target response from a set of potential responses—where the amount of time required to choose the correct response increases with the number of possible alternative responses.”(ibid. 29)
Simply stated, the two laws are:
  1. Fitts’ law – Closer, bigger items are quicker to point at.
  2. Hick-Hyman law – The more options you have to choose from, the longer it takes to pick the right one.

Fitts’ law describes only the movement time once a target has been chosen, so an accurate prediction of the total time needed to acquire a target must add the decision time (Hick-Hyman) to the movement time (Fitts).
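
To make the arithmetic concrete, here is a minimal Python sketch of the two laws and their sum. It assumes the original Fitts formulation, MT = a + b * log2(2D / W), and the Hick-Hyman form for equally likely alternatives, RT = a + b * log2(N). The function names and the constants a and b are illustrative placeholders, not values from the article; in practice they are fitted to measurements for a particular device and user population.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Movement time in seconds: MT = a + b * log2(2D / W).

    a and b are made-up illustrative constants, not measured values.
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return a + b * math.log2(2 * distance / width)


def hick_hyman_decision_time(n_alternatives, a=0.2, b=0.15):
    """Decision time in seconds for N equally likely choices: RT = a + b * log2(N).

    a and b are made-up illustrative constants, not measured values.
    """
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return a + b * math.log2(n_alternatives)


def total_selection_time(n_alternatives, distance, width):
    """Rough total: time to decide on a target plus time to move to it."""
    return (hick_hyman_decision_time(n_alternatives)
            + fitts_movement_time(distance, width))


if __name__ == "__main__":
    # Eight menu items, target 256 px away and 32 px wide.
    print(round(total_selection_time(8, 256, 32), 2))
```

With these placeholder constants, doubling the target width or halving the distance removes one "bit" of difficulty from the movement term, while doubling the number of alternatives adds one bit to the decision term.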

While Fitts’ law and the Hick-Hyman law are very good at predicting the time necessary to acquire a target, they do not explain why it takes that long. As interfaces move into augmented and virtual reality, tools more in-depth than these laws are needed to explain why an interface will function well for human beings.


“Translation” refers to how the user processes perception and turns it into action. The key concept of translation appears to be layout. An interface where the items ‘on screen’ are arranged similarly to the user’s physical action is much more effective than one whose layout is visually unrelated to that action. The article refers to this as ‘calibration’ and uses the example of a user attempting to find the mouse cursor on a screen by waving the mouse around in a somewhat random pattern. As the user does so, connections are made in the brain between the motion of the hand and the motion of the cursor on the screen.


Attention is a finite resource. The article states that “…humans, like computers, have a limited capacity to process information in that we can only receive, interpret, and act upon a fixed amount of information at any given moment.” (ibid. 30) Different parts of the interface divert and demand our attention at any given time.

According to the article there are three characteristics of attention:

  1. Attention is selective – only a small part of the information may be focused on at a single time.
  2. Attention is changeable – the focus of attention may be diverted from one thing to another.
  3. Attention is divisible – it is possible to ‘pay’ attention to multiple things at once.

Humans’ visual attention works through the fovea and the perifoveal areas of the retina. The fovea is the part of the eye that is responsible for direct vision, and the perifoveal areas are responsible for peripheral vision. Attention can be shifted back and forth between direct and peripheral vision, and both are useful from an interface perspective.

An endogenous shift of attention is a conscious shift of attention from one thing to another, such as a user deciding that he is done reading for a while and looking out the window instead. An exogenous shift of attention is initiated outside of the user, such as a flash of lightning drawing the user’s eyes to the skyline.

While it would seem that exogenous shifts of attention are involuntary, they are actually slightly more than that. Random changes in the visual stimulus only capture a user’s attention if the change falls within the set of things for which the user is looking. Even when we are distracted, we filter what distracts us.

Inhibition of Return (IOR)

Cognitive forces are in place in the human mind that make it hard to get back to work. Inhibition of return (IOR) refers to the interference that cognition experiences after being distracted. Interestingly, while it is relatively easy to facilitate the transfer of attention from one object to another, it can take considerably longer to return to a previous point after distraction. Blink tags make a page incredibly difficult to read, because they repeatedly divert the user’s attention and cause them to stumble back to where they were.


The word ‘interaction’ in Human Computer Interaction implies an almost conversational interplay between human and machine. The action that the user employs to communicate with the computer is closely tied to the way that their attention is directed. The model of response activation holds that: “Responses are activated to attended stimuli regardless of the nature of attentional dedication. Each stimulus that matches the physical characteristics established in the response set captures attention and, as a result, activates an independent response process.” (ibid. 34) If a user encounters a familiar stimulus, they will ‘remember’ and quickly transition to the set of responses that they have connected to that stimulus. Conversely, if the stimulus is unfamiliar, the responses will be sluggish and non-intuitive. It is similar to hearing three notes of a song and “knowing” the rest. A design that resonates with a user will dramatically increase their ability to interact with a system.

The context of the target is also important. Finding a Microsoft Word document in the midst of a folder containing interspersed Word and Excel documents is more difficult than simply finding the Word document amongst other Word documents. The mere presence of a single Excel document is more jarring than any of the ‘wrong’ Word documents.
