Spatial Navigation by Seated Users of Multimodal Augmented Reality Systems

When seated users of multimodal augmented reality (AR) systems attempt to navigate unfamiliar environments, they can become disoriented during their initial travel through a remote environment that is displayed for them via AR display technology. Even when the multimodal displays provide mutually coherent visual, auditory, and vestibular cues to the movement of seated users through a remote environment (such as a maze), those users may make errors in judging their own orientation and position relative to their starting point, and may have difficulty determining what moves would return them to that starting point. In a number of investigations using multimodal AR systems featuring realtime servo-controlled movement of seated users, the relative contribution of spatial auditory display technology was examined across a variety of spatial navigation scenarios. The results of those investigations have implications for the effective use of the auditory component of a multimodal AR system in applications supporting spatial navigation through a physical environment.


Introduction
Users of multimodal augmented reality (AR) systems can be given first-hand experiences of navigation through unfamiliar environments while they are seated comfortably in a fixed location, even though the AR display technology includes only a relatively small screen for delivering the visual component of the experience. In fact, visual stimuli displayed within a restricted spatial region (e.g., a display subtending less than 40° of visual angle horizontally) can fully engage human spatial orientation; nevertheless, spatial orientation can be usefully enhanced by including auditory stimuli coordinated with body-based sensory information (e.g., via vestibular, proprioceptive, and/or haptic stimulation). Since the early years of AR technology development in the early 1990s (and its conceptualization, as described in [1]), a central feature of AR systems has been the realtime tracking of the user's bodily position and motion, enabling the immediate spatial adjustment of both auditory and visual display components according to changes in the user's viewpoint. Recently, a great deal of interest has developed in AR systems that include servo-controlled movement of seated users to enable the superposition of additional translational and rotational components on the user's bodily motion, with the spatial adjustment of both auditory and visual components determined by the combined changes in the user's viewpoint (see, for example, [2], in which it is argued that translation combined with rotation makes for efficient virtual locomotion accompanied by perceived self-motion).
Whereas some AR display systems are developed for mobile applications, the systems discussed in the current paper are more appropriately described as "location-based." At the outset of this paper's treatment of AR-aided spatial navigation, some clarification of the use of the term AR in this context is required. The seated user of a multimodal AR system is surrounded by actual objects and events that are experienced first-hand (so to speak), while simultaneously being presented with virtual objects and events. Thus, users receive sensory inputs associated with objects and events that are actually present within their local physical environment, while also being presented with sensory inputs that depend upon computer mediation. Ideally, the computer-mediated sensory inputs depend upon a user's actions and local movements in just the same manner as they would were the virtual objects and events located within the physical environment through which that user moves. When what is displayed reacts to small changes in a user's spatial orientation in a natural manner, the virtual objects and events are experienced as if apprehended first-hand, which is the expected result given seamless superposition of the virtual upon the actual. Such seamless superposition of virtual objects and events upon actual environments requires timely coupling between tracked user motion and the display of virtual objects: computer-mediated sensory inputs must be modified according to motions caused by a user's motor commands (i.e., voluntary movements), completing a coherent experience.
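The coupling just described can be sketched as a pose composition: the rendering viewpoint is the servo-controlled platform pose with the tracked head pose superposed upon it. The following minimal Python sketch is a planar, yaw-only simplification; the Pose fields and the compose function are illustrative, not drawn from either system's actual software:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    yaw: float  # heading in degrees, counterclockwise
    x: float    # meters
    y: float    # meters

def compose(platform: Pose, head: Pose) -> Pose:
    """Superpose the tracked head pose (expressed in the platform's frame)
    on the servo-controlled platform pose, yielding the viewpoint from
    which the auditory and visual components should be rendered."""
    r = math.radians(platform.yaw)
    return Pose(
        yaw=(platform.yaw + head.yaw) % 360.0,
        x=platform.x + head.x * math.cos(r) - head.y * math.sin(r),
        y=platform.y + head.x * math.sin(r) + head.y * math.cos(r),
    )
```

Because the composition is recomputed each frame from tracked motion, a small voluntary head movement immediately shifts the rendered viewpoint, which is the sensorimotor coupling the text describes.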
The emphasis upon voluntary movements here is intended to underscore the importance of the active listening process in which seated users of AR systems typically engage. This process drives the spatial perception of users who experience the small changes in sensation expected with small changes in their orientation, even when those users are seated (i.e., even when users are not actually engaged in locomotion, but are executing voluntary shifts in head position and exhibiting typical head rotations). It has been well established that such voluntary movements do enable improved sound localization performance when listeners are presented with sound stimuli arriving from directions that otherwise would be ambiguous without those movements [3]. For example, the incidence angle of a sound arriving from a frontward versus a rearward direction is easily confused without head movements, but this front/back ambiguity is readily resolved by small turns of the listener's head. Small rolling of the listener's head also can aid in resolving above/below ambiguities [3]. This active listening process is important in spatial orientation and navigation because environmental sounds can serve as stable landmarks that keep individuals from losing their sense of direction and from "getting lost" in an unfamiliar environment. The following two sections of this paper discuss two fundamental factors in spatial orientation that provide the keys to successful spatial navigation by seated users of multimodal AR systems, the first being sensorimotor contingencies, and the second being the user's sense of direction.
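The front/back disambiguation described above can be illustrated with a simplified interaural time difference (ITD) model. In this sketch (a spherical-head approximation with an assumed head radius, not a model taken from [3]), a source 30° to the front-right and a source 150° to the rear-right produce identical ITDs, but after a small rightward head turn the two ITDs shift in opposite directions, resolving the ambiguity:

```python
import math

HEAD_RADIUS = 0.0875    # meters; an assumed average adult head radius
SPEED_OF_SOUND = 343.0  # m/s at room temperature

def itd(azimuth_deg):
    """Simplified spherical-head ITD for a horizontal-plane source.
    Azimuth 0 deg = straight ahead; positive azimuths lie to the right."""
    return (2.0 * HEAD_RADIUS / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))

# A frontward (30 deg) and a rearward (150 deg) source yield the same ITD,
# so they are indistinguishable to a motionless listener.
front, rear = itd(30), itd(150)

# After a 10 deg rightward head turn, the relative azimuths become 20 and
# 140 deg: the frontward ITD shrinks while the rearward ITD grows.
front_after, rear_after = itd(30 - 10), itd(150 - 10)
```

The opposite signs of the two ITD changes carry exactly the information that a small voluntary head turn contributes.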

Sensorimotor contingencies and AR
Indeed, it has long been argued that the most successful AR systems will support sensorimotor contingencies [4], as described in the above paragraph. Such an important point should be reinforced here, beginning with this introductory statement: variation in sensory stimulation should accompany those motor commands of the AR system user that result in bodily movements, changing sensory stimulation for the user in anticipated ways. The predictability of the movement-coupled changes in sensory stimulation creates a conscious experience of a simulated physical reality that obeys rules in ways that would be expected of an actual physical reality. Note that such natural sensorimotor contingencies support more than just successful local interactions with virtual objects, as these sensorimotor contingencies also contribute to the system user's sense of immersion and the experience of presence in an environment containing computer-mediated virtual objects and events. In the context of spatial navigation by seated users of AR systems, it is thought that the user's sense of being in a real place will influence other relevant subjective attributes, such as the user's sense of direction. It will furthermore be argued in this paper that spatial navigation through an unfamiliar environment is influenced strongly by the perception of virtual objects and events that behave in consistent and reliable ways.
It was probably Slater [4] who most clearly stated the related hypothesis that the illusion of being in a place combines with "plausibility" in producing realistic behavior in immersive virtual environments. Slater [4] termed the underlying illusion the Place Illusion (PI), and pointed out that this illusion is constrained by the sensorimotor contingencies afforded by the virtual reality system. The work described in the current paper was founded upon the hypothesis that PI, as defined by Slater [4], plays a central role in AR-based aids to spatial navigation. The reasoning supporting this hypothesis is based on the relative fragility of the user's sense of direction. What is meant here by the term fragility is that users often do lose their sense of direction when attempting to navigate through virtual environments, and that this happens in unexpected and perplexing ways, as described in the next section of this paper.

Sense of direction
Regarding the ways in which individuals lose their sense of direction, and with regard to the interesting experiences that can be associated with "getting lost," the interested reader is directed to the book "Inner Navigation" [5], which examines these observations in depth. In the introduction to the book [5, page 7], the co-author Don Norman explained as follows: "Once upon a time, a strange man wandered into my office at the University of California, San Diego. Erik Jonsson was his name, and navigation was his game. 'I think people have an inner compass,' he told me, 'and when it goes wrong, the most amazing things happen.' And for over an hour he told me tales of confusion…" The book the two co-authored [5] demonstrates that we humans do seem to exhibit an implicit sense of direction from which we benefit even though we may have no conscious awareness of it. Indeed, we appear to rely on an inner compass to let us know the proper direction in which to go to reach a desired destination, but if we are asked to explain how we know the proper direction, we typically have no clear answer. Rather, individuals can generate plausible explanations for their sense of direction, and do so even when their inner compass is failing to aid them in their spatial navigation. It is truly enigmatic, this apparently amodal sense that exists without an immediate external sensory stimulus (no visual or auditory landmark needs to be present to support its existence).
The grand questions that motivated the writing of the "Inner Navigation" book [5] were the same as those which motivated this paper: Why do we get lost and how do we find our way? These questions could be given more traction for advancing knowledge about spatial navigation if they were stated as empirical questions of what makes us get lost and what improves spatial navigation. Therefore, these two questions are framed here in terms of testable hypotheses that stimulated the current research: 1. Under what testing conditions does behavior reveal the failure of the putative inner compass to guide individuals to their target destination? 2. Under what stimulus conditions do individuals best find their way to that destination?
As is so often the case when such grand questions become research questions, the issue of individual differences between participants in the research must be addressed. A recent review by Condon [6] concluded that individual variation in measured sense of direction is well explained by variation in personality traits, most strongly the trait of conscientiousness (a psychological construct measured in terms of attention to detail, organization, and diligence). Even by the early 1980s, there was some consensus that individuals differ substantially both in their self-reported confidence in their own sense of direction and in how much they worry about becoming lost [7]. Since that time, a great deal of data has been collected (see, e.g., [8]) showing that individuals certainly differ in their abilities to indicate the cardinal directions (N, S, E, W).
Although individual differences in these abilities clearly exist [8], a comprehensive treatment of such individual differences in sense of direction is beyond the scope of this paper. There is, however, a clinical neurological syndrome that has been identified as topographical disorientation (TD), a disorder characterized by impaired orientation and navigation in real-world environments [9]. As a comprehensive review of the substantial literature on the neuroscientific study of spatial cognition also is beyond the scope of this paper, it is of value here simply to provide a link between the experience of getting lost and what is known about the neuroscience of spatial orientation; the reader is referred to the book by Dudchenko [9] for an exploration of that link in detail. For example, some individuals suffering from TD appear to have lost their inner compass after traumatic brain injuries, such as lesions to the temporal lobe of the cortex. Dudchenko's book [9], which addresses the question of why people get lost, gives a specific meaning to the term "lost" by contrasting it with its converse, the normal "spatial orientation" possessed by most people with a good sense of direction. So, when someone is unable to find their way in a large-scale space in which they might normally be expected to find their way, lay language may say they are simply lost, but as described above, the neurological literature uses the term TD to denote this inability to way-find in such spaces.
The remainder of this paper recounts the authors' investigations into spatial navigation experiences provided for participants who possess apparently normal spatial orientation, such as that possessed by most people with a good sense of direction, but who experience disorientation when presented with spatial navigation problems using the computer-mediated augmentation of directional cues that AR systems can provide.

Spatial navigation using AR
In the introduction of this paper, it was observed that in the early years of AR technology development, the literature (e.g., [1]) supported the notion that realtime tracking of the user's bodily motion was a central feature of AR systems. By enabling the immediate spatial adjustment of both auditory and visual display components according to changes in the user's viewpoint, AR systems create synthetic experiences that are superposed upon direct experience of the actual environment. Although it may be quite apparent to seated users of AR systems that they are not actually moving through space, the sensations they experience can be consistent with locomotion and can create strong self-motion illusions. It is very common for seated AR system users to report that they have experienced self-translation, even though they are simultaneously aware that the seat supporting their whole body clearly has not translated relative to its initial position in the actual environment. These illusions belong to what Robinette [1] termed synthetic experiences, a topic given thorough treatment in his 1992 paper presenting a taxonomy of such experiences. He intended the term synthetic experience to encompass a wide variety of media, including virtual environments, teleoperation, applications employing a Head-mounted Display (HMD), film, the telephone, video games, and many earlier media. It was meant to be synonymous with the term technologically-mediated experience; however, it should be understood as limited to reproduced sensory experience. Robinette [1] also excluded strictly verbal descriptions such as novels and oral story-telling, as these rely on the imagination of the recipient rather than on the virtual objects and events that AR systems can present to users as reproduced sensory experiences.
This distinction becomes a bit problematic when dealing with the subjective attribute termed sense of direction, since sense of direction can be experienced in the absence of sensory stimulation. So, when seated users of AR displays are said to experience an illusion of place within which they have a clear sense of direction, something rather enigmatic is being referred to, which is observed only in a user's responses in tasks that tap this amodal sense of direction, such as the following: after a potentially confusing navigation (such as navigation through a maze), the AR system user is asked to point in the direction of their point of origin. Such tasks are frequently used in tests of spatial orientation, and notable examples of in-depth research employing seated users of AR displays can be found in the work of Riecke and colleagues (see [10] for review). One investigation that is particularly relevant to the work described in the current paper asked whether active control and user-driven motion cueing can enhance self-motion perception.
Because the investigations described in the current paper employed motion platforms to create sensations of self-motion for seated users of AR systems, the work of Riecke et al. [11] has been very influential on the current work, posing, as it does, the question "To Move or Not to Move." From this point, the scope of this paper is limited to the discussion of two multimodal augmented reality (AR) systems that have been developed and investigated in the laboratories of the authors. Although both systems provided for active movement of seated users, neither system allowed for the testing of mobile applications, since both of these "seated AR" systems were locked in position within their respective laboratories. One system, the Schair [12], provided realtime rotation of the seated user about only a single vertical axis, and therefore could vary only the yaw angle of the user relative to an external reference direction. The other system, here identified as the DBOX platform, enabled realtime rotation of the seated user about the two axes other than the vertical axis, providing variation in the pitch and roll angles of the user. Originally developed to provide custom motion programs for multimedia display of action-oriented DVD films (see [13]), the DBOX platform also provided user translation along the vertical axis. Thus, for this latter system, one translational dimension of linear motion, conventionally termed heave, was combined with two dimensions of angular motion to enable a 3-Degree-Of-Freedom (3DOF) motion display. The two systems have in common that both provide body-based sensory information associated with self-movement.
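The difference between the two platforms' capabilities can be summarized in a small sketch. The degree-of-freedom sets below simply restate what the text says about each system; the names and the helper function are purely illustrative and do not reflect any actual vendor API:

```python
# Axes each platform can actually drive, per the descriptions above.
SCHAIR_DOF = {"yaw"}                   # single vertical-axis rotation
DBOX_DOF = {"pitch", "roll", "heave"}  # 3DOF: two rotations plus vertical translation

def reproducible(command, dof):
    """Keep only those components of a commanded motion that a given
    platform can display; the remaining components must be conveyed by
    the visual and auditory channels instead."""
    return {axis: value for axis, value in command.items() if axis in dof}
```

For example, a commanded motion containing yaw, pitch, and heave components reduces to yaw alone on the Schair, while the DBOX platform reproduces the pitch and heave components but not the yaw.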

Body-based sensory information
Sensory information associated with self-movement can be divided into three categories: external (vision, audition, somatosensory), internal (vestibular, kinesthetic), and efferent (efference copy, attention). However, in most cases several sensory sources simultaneously contribute to our spatial knowledge, and thus experimenters cannot examine them separately [14]. For that reason, the term "body-based sensory information" has been widely used in spatial cognition research, referring to the amalgam of vestibular, kinesthetic, proprioceptive, and haptic stimulation, but including, in addition to these afferent information sources, the influence of efferent information (associated with voluntary motor commands). When users locomote through an environment, their ability to update their own self-position and orientation with little cognitive load is described as automatic spatial updating in cognitive science [15]. In VR, however, when an abundance of naturalistic landmarks is provided, a participant's spatial updating does not depend upon physical motion cues [15]. Yet, if such visual landmarks are missing and people cannot automatically re-orient, body-based sensory information becomes more relevant. The next two sections of this paper present investigations featuring realtime rotation of seated users about body-centric axes, such as the vertical axis that extends along the midline from above to below the user.

Rotation of the seated user
One of the two systems discussed here, the Schair [12], rotated a seated AR system user about a single vertical axis, thereby varying only the yaw angle of the user relative to an external reference direction. And yet, in simulated navigation on a watery surface, the synthetic experience of speed-boat travel that was created was quite plausible. Interestingly, despite a noticeable lag in the servo-controlled rotation relative to the steering of the simulated speed boat, the illusion of proper sensorimotor coordination was not broken, as the experience of imperfect coupling between user commands and body-based sensation met the expectations of seated users as the Schair was turned to the left and right. In this case, the featured "kyotei" (Japanese racing motorboat) simulator was forgiving of lag between visual bearing and rotary motion platform yaw, since such delay could be plausibly attributed to an understeered tiller, maritime momentum, and sluggish "coming about." Although the Schair [12] did not directly present any body-based sensation intended to support immediate illusions of translation, the sensations of rotation were enough to convince the user of forward motion, and so the illusion of speed-boat travel was quite successful. Other research [2] that has examined the role of rotational information in supporting the translational component of self-motion has used similar configurations for simulating motion for the seated user, but allowed for comparison of control methods with varying levels of translational cues and control, such as leaning the upper body to locomote while sitting on the so-called "NaviChair." Results of studies using the "NaviChair" [2] suggest that alternative body-based display components tapping into the leaning of a seated user of the Schair [12] could enhance the spatial navigation experience of users.
An important question here regards how tight the command-response coupling must be for the sensorimotor contingencies to support the strongest illusions of place and maintain the most useful sense of direction. Similar questions can be asked regarding how tight the coupling must be to support effective self-motion illusions. These questions are addressed in the next section of this paper, which describes the outcomes of investigations featuring realtime pitching and rolling of seated users.

Pitching and rolling the seated user
The DBOX platform used in the current investigations enabled realtime rotation of the seated user about the two axes other than the vertical axis, providing variation in the pitch and roll angles of the user. Rotation about the interaural axis (the axis that passes through the user's ears) is typically named pitching, while rotation about the longitudinal axis (the axis that passes through the center of the user's head from anterior to posterior) is typically named rolling. It might aid the reader to visualize these two angular rotations through reference to the rotations typically experienced during vehicular travel. For example, when an airplane initiates a banking turn, passengers are briefly rolled to one side, and then they are rolled to the other side when the airplane is returned to straight-and-level flight.
In contrast to such brief variations in roll angle commonly experienced in airplane travel, when an automobile briefly accelerates in the forward direction, passengers briefly pitch backwards, and then pitch forwards if the brakes are applied abruptly. It was this braking scenario that inspired a study of self-motion perception [16] using the DBOX platform described in this section. First, it should be noted that self-motion typically is perceived when watching action-oriented films displaying first-person movement through an environment, such as that commonly shown during a high-speed car chase. Second, it should be noted that perceived self-motion typically is enhanced by pitching observers forward and back in their chairs as they watch a first-person high-speed car-chase scene in a film. This enhancement is readily demonstrated using the DBOX platform, and scenes depicting rapid accelerations and abrupt slamming on the brakes make for one of the preferred demonstration pieces for promoting the sale of such motion platforms. Finally, it should be noted that the soundtracks of action-oriented films displaying such scenes typically feature sound effects that support the illusions of self-motion for observers.
Multichannel film soundtracks often include sounds arriving from stationary sources in the car-chase scene that fly by the moving observer from frontward to rearward locations; such so-called "fly-by" sounds can be coordinated in time with whole-body motion delivered to the observer via the DBOX platform, and such a multimodal display does enhance the self-motion perception of seated users of the system. Indeed, observers report that custom-generated motion programs induce strong perceptions of self-motion, and controlled experimental studies [13] also have shown an increase in the reported sense of realism, sense of presence, and global preference for action-oriented films that include custom-designed motion-platform movement in multimodal playback. Prior to that study [13], there had been relatively little work done on the most effective integration of motion platforms into home-entertainment-oriented audiovisual content display, although there had been a significant body of research on how to create motion cues for users of flight simulation technology via vestibular stimulation [17]. It should also be noted that non-rotary motion platforms (like the DBOX platform that has been deployed in many cinemas internationally) employ the same "wash-out" of rapid acceleration that is used in flight simulation technology. Indeed, all such platforms need to drift surreptitiously back into a poised "cocked" position in order to service the anticipated jolt of the next rapid acceleration. Such surreptitious resets are similar to optical illusions of simultaneous contrast, wherein a gradient stealthily slides away from an extreme to allow a sudden snap back.
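The "wash-out" behavior mentioned above can be sketched as a first-order high-pass filter applied to the commanded acceleration: the onset of an acceleration passes through to the platform almost fully, after which the output decays so that the platform drifts back toward its neutral ("cocked") position. This is a minimal illustration of the principle only, not the filter actually used in the DBOX platform or in flight simulators, which typically employ more elaborate multi-channel washout schemes [17]:

```python
def washout(accel_commands, dt=0.01, tau=2.0):
    """First-order high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]),
    with a = tau / (tau + dt). Sustained commands wash out toward zero,
    so only acceleration onsets are reproduced by the platform."""
    a = tau / (tau + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in accel_commands:
        prev_out = a * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

# A sustained acceleration command: the initial jolt is reproduced almost
# fully, then the platform drifts back toward neutral over a few seconds.
response = washout([1.0] * 500)  # 5 seconds of samples at dt = 0.01 s
```

The slow decay back to zero is the "surreptitious reset" described in the text: with a sufficiently long time constant, the return drift stays below the vestibular detection threshold while re-arming the platform for the next onset.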
The question then should be raised regarding how tight the coupling must be between audio and motion components for these multimodal presentations to support the most effective self-motion illusions. A follow-up study using the DBOX platform [18] addressed this question directly. The precision of temporal synchrony between passive whole-body motion and auditory spatial information was determined for a number of simulated velocities via a multimodal time-order judgment task. That study's results suggested that sensory integration of auditory motion cues with whole-body movement cues could occur over an increasing range of intermodal delays as the velocity of virtual sound source motion was decreased. For simulated sound source velocities of less than 10 m/s, the platform motion could lag behind the audio motion by nearly 150 ms (see [18] for more details).
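As a rough illustration of that velocity-dependent tolerance, the reported anchor point (roughly 150 ms of tolerable platform lag below 10 m/s) can be encoded as a simple threshold function. The inverse scaling used above 10 m/s here is purely an assumption for illustration, not the fitted model from [18]:

```python
def lag_tolerated(source_velocity_mps, platform_lag_ms):
    """Hypothetical tolerance rule: below 10 m/s the platform may lag the
    audio by up to ~150 ms (the anchor point reported in [18]); above
    that, assume the tolerable lag shrinks inversely with velocity
    (an illustrative guess, not the study's fitted model)."""
    if source_velocity_mps < 10.0:
        max_lag_ms = 150.0
    else:
        max_lag_ms = 1500.0 / source_velocity_mps
    return platform_lag_ms <= max_lag_ms
```

Such a rule could serve as a scheduling check in a multimodal playback pipeline: a slow fly-by leaves generous headroom for platform latency, while a fast one demands much tighter audio-motion synchrony.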

Discussion
While systematic studies employing seated users of the two multimodal AR systems introduced in this paper are still incomplete, the experimental questions being addressed are worth discussing here. One such experimental question that specifically addresses the sensorimotor loop central to effective AR is the following: when attempting to navigate an unfamiliar environment, does the AR system user become less disoriented during simulated travel when multimodal displays provide mutually consistent visual, auditory, and vestibular cues to the movement of seated users through a remote environment (such as a maze)? There is clear anecdotal evidence that, with access to salient auditory landmarks, users make fewer errors in judging their own orientation and position relative to their starting point. Indeed, in a gaming scenario, users without salient auditory landmarks may have difficulty determining what moves to make in order to return themselves to their starting point. In contrast, when users are presented with an auditory landmark that provides a constant reminder of the direction in which their starting point is located, those users typically display much greater spatial navigation proficiency. A concrete example of this phenomenon is available to players of the 1996-released video game "Super Mario 64" for the Nintendo 64, which was the first game in the Super Mario series to feature full-3D gameplay.
The first author of this paper can recall a second-person experience of a failure in his own sense of direction while controlling the Mario character navigating through the "Toxic Maze" in that "Super Mario 64" game. What was most interesting about that experience at that time was that the failure in the player's sense of direction could be easily repaired by the use of stereo headphones rather than loudspeakers. This was because the game included interactive spatial audio that was effectively coupled with the motion of the Mario character through the game environment. The reader should forgive this paper's authors for now switching the language used in the remainder of this section to the first-person perspective. The following text is from the first author's notes documenting his realization at the time.

A first-person recollection
I had a realization about spatial auditory cues to self-motion that could aid in spatial orientation (and about how simulated sensory information might be employed to aid in spatial navigation through a maze). In the following example, I emphasize that changes in sensory information need not be experienced first-hand, since controller input directing a visualized avatar (e.g., the Mario character in the video game) can afford a very natural spatial navigation experience (still taking advantage of human capacities for sensorimotor integration). In the Castle Courtyard of the game was a fountain, the watery sound of which could be heard from a distance (even when you were inside the Toxic Maze!). Those who are familiar with the "Super Mario 64" game will no doubt appreciate the following recounting of my personal experience in spatial navigation within the game. My experiences revealed to me the following: without the sound of the fountain in the Castle Courtyard as a sonic beacon, I could not trust my sense of direction after making only a few turns in the Toxic Maze with only visual cues to changes in my orientation (in contrast to the related findings of Riecke et al. [14]). To navigate the maze and return to the fountain, I could turn towards its watery sound while inside the Toxic Maze, and I learned how to associate visual details with my orientation at each choice point (sometimes turning to take myself away from the fountain, putting it briefly behind me until I reached the appropriate point for the turn back towards the fountain).

Final observations
Given the above first-person recounting of the first author's spatial navigation experiences in a gaming context, a number of other research questions present themselves and will serve as a conclusion for this section. One obvious research question regards how important immediate vestibular sensation might be in closing the sensorimotor loop, since the motor commands (controlling the avatar) seemed to carry adequate self-motion information without the first-hand self-motion sensations that could be gained if the user were to execute the same turning motions physically.
A good complementary research question is just how much vestibular information alone could improve a user's spatial orientation if the user were actually (though perhaps passively) moved, without the accompanying motor commands that would naturally drive anticipated sensory changes as the sensorimotor loop is closed. It is tempting to hypothesize that visual and auditory cues to self-motion are adequate in this context, although there is evidence [19] that vibrotactile cues might also enhance the less powerful auditory-induced self-motion perception and illusion of place (with the associated sense of presence). What can be concluded about the role of the auditory cues to self-motion that were investigated in the studies described in this paper employing the Schair [12] and the DBOX platform [13][16][19]?
In those investigations using multimodal AR systems that featured servo-controlled movement of seated users, a clear interaction of the spatial auditory display with motion-platform-based user motion was revealed in a number of spatial navigation scenarios. The results of those investigations have clear implications for the effective use of the auditory component of a multimodal AR system in applications supporting spatial navigation through a physical environment. Further research is under way to test the relative salience of auditory landmarks in aiding the spatial navigation of seated users in second-person as well as first-person scenarios using two different tasks. One task is that which has been termed the "return to point-of-origin" task. The other uses a measure of "point-to-origin" skill to tap into a user's sense of direction as it develops through exposure to AR displays.