
3D Gaze Forecasting Research Advances Future AR Devices

New 3D Gaze Forecasting Might Improve AR Devices

Augmented reality devices, including smart glasses, may in the future be able to predict where users will direct their attention and render digital content before it is needed. Research led by Fiona Ryan, a PhD student in Georgia Tech’s School of Interactive Computing, focuses on tracking and forecasting user gaze from an egocentric perspective inside three-dimensional environments.

Current AR systems generally react to where a user is already looking. Ryan’s method is designed to give those systems advance information about likely future gaze behaviour, so they can prepare content before a user shifts their attention.

Ryan is the lead author of the paper Forecasting 3D Scanpaths in Egocentric Video, which is being presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in Denver. The research examines how future gaze patterns can be predicted in three dimensions from first-person video.

Previous research has focused on predicting user gaze in two-dimensional still images. According to the paper, this work is the first to address gaze forecasting with a three-dimensional model. It builds on the observation that people move through three-dimensional environments and view the world from changing perspectives, so the study predicts attention paths through space rather than across a flat image.

As the researchers describe it, the model follows a path of attention through three-dimensional space. Instead of treating gaze solely as points within a two-dimensional frame, the system tracks how a person’s gaze moves through a reconstructed environment as they interact with objects and move between locations.

Ryan conducted much of the research while interning at Meta. The project used data from Meta’s Aria Digital Twin dataset, which contains first-person video footage of users interacting with objects inside an apartment setting.

The dataset contains a detailed three-dimensional reconstruction of the environment. This allows researchers to establish a ground-truth three-dimensional gaze representation. Eye movements can then be traced and matched to points where gaze intersects with elements of the surrounding environment.
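
Matching gaze to the surrounding environment amounts to casting a ray from the eye along the gaze direction and finding the first surface it hits in the reconstruction. The sketch below illustrates that idea under stated assumptions: the scene is a list of labelled triangles, gaze is given as an origin plus a direction, and the intersection uses the standard Möller–Trumbore ray-triangle test. It is not the pipeline used by Meta or the paper; `gaze_point_3d`, `scene` and the toy geometry are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle_t(origin: Vec3, direction: Vec3, tri: Triangle,
                   eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: ray parameter t of the hit, or None if the ray misses."""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:  # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = _sub(origin, v0)
    u = _dot(s, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv_det
    return t if t > eps else None  # ignore surfaces behind the eye


def gaze_point_3d(origin: Vec3, direction: Vec3,
                  scene: Sequence[Tuple[str, Triangle]]) -> Optional[Tuple[str, Vec3]]:
    """Return (label, 3D point) of the nearest surface the gaze ray hits, or None."""
    best: Optional[Tuple[float, str]] = None
    for label, tri in scene:
        t = ray_triangle_t(origin, direction, tri)
        if t is not None and (best is None or t < best[0]):
            best = (t, label)
    if best is None:
        return None
    t, label = best
    point = (origin[0] + t * direction[0],
             origin[1] + t * direction[1],
             origin[2] + t * direction[2])
    return label, point


# Hypothetical toy scene: a 1 m x 1 m table top 0.75 m above a floor.
scene = [
    ("table", ((-0.5, 0.5, 0.75), (0.5, 0.5, 0.75), (0.5, 1.5, 0.75))),
    ("table", ((-0.5, 0.5, 0.75), (0.5, 1.5, 0.75), (-0.5, 1.5, 0.75))),
    ("floor", ((-5.0, -5.0, 0.0), (5.0, -5.0, 0.0), (5.0, 5.0, 0.0))),
    ("floor", ((-5.0, -5.0, 0.0), (5.0, 5.0, 0.0), (-5.0, 5.0, 0.0))),
]

# Eye at 1.6 m looking forward and down: the ray meets the table before the floor.
print(gaze_point_3d((0.0, 0.0, 1.6), (0.1, 1.0, -0.85), scene))
```

Keeping only the nearest hit matters because a single gaze ray can pass through several surfaces; in this toy scene it reaches the table at roughly (0.1, 1.0, 0.75) before it would reach the floor.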

A demonstration of the technology shows the software following a user as they move towards a table with a cup. After the user picks up the cup, the system predicts the direction in which the user will turn next.

The research is based on how people visually examine a scene. Rather than processing every detail at once, individuals focus on specific areas through a sequence of fixations. These patterns can be influenced by the task being performed. In the example described by the researchers, a person picking up a cup may first look at the object and then direct attention towards the location where it will later be placed.
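
A scanpath of this kind can be represented as an ordered list of fixations, each tied to an object and a 3D point. The sketch below is a hypothetical illustration, not the paper's representation: it assumes per-sample gaze targets are already labelled, merges consecutive samples on the same target with `itertools.groupby`, and drops runs shorter than an assumed 0.1 s threshold as glances in transit. This simple grouping is not a standard fixation-detection algorithm; `Fixation`, `to_scanpath` and the sample data are illustrative names.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Sample = Tuple[float, str, Vec3]  # (timestamp in seconds, target label, 3D gaze point)


@dataclass
class Fixation:
    target: str      # label of the object being looked at
    point: Vec3      # mean 3D gaze point over the fixation
    start: float     # start time in seconds
    duration: float  # seconds from the first to the last sample in the run


def to_scanpath(samples: Sequence[Sample], min_duration: float = 0.1) -> List[Fixation]:
    """Merge consecutive, time-ordered samples on the same target into fixations."""
    scanpath: List[Fixation] = []
    # groupby only groups *adjacent* samples, so returning to a target later
    # in the sequence starts a new fixation, as it should in a scanpath.
    for target, group in groupby(samples, key=lambda s: s[1]):
        run = list(group)
        start, end = run[0][0], run[-1][0]
        if end - start < min_duration:
            continue  # too brief: treat as a glance in transit, not a fixation
        n = len(run)
        mean = (sum(s[2][0] for s in run) / n,
                sum(s[2][1] for s in run) / n,
                sum(s[2][2] for s in run) / n)
        scanpath.append(Fixation(target, mean, start, end - start))
    return scanpath


# Hypothetical samples: look at the cup, glance at a wall, then look at the
# shelf where the cup will be placed.
samples = [
    (0.0, "cup", (0.1, 1.0, 0.80)),
    (0.1, "cup", (0.1, 1.0, 0.80)),
    (0.2, "cup", (0.1, 1.0, 0.80)),
    (0.3, "wall", (0.0, 3.0, 1.50)),
    (0.4, "shelf", (1.2, 0.5, 1.10)),
    (0.5, "shelf", (1.2, 0.5, 1.10)),
    (0.6, "shelf", (1.2, 0.5, 1.10)),
]
for f in to_scanpath(samples):
    print(f.target, f.start, round(f.duration, 2))
```

With these made-up samples the scanpath is cup followed by shelf, mirroring the task-driven order described above, while the single wall sample is discarded.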

According to the research, the software can predict gaze behaviour up to three seconds into the future on average, and in some cases as far as 10 seconds. The study indicates that this lead time could allow an AR system to render an enhanced environment proactively rather than responding only after a user has already shifted their attention.
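
To make the idea of a forecast horizon concrete, the sketch below uses the simplest possible baseline: it assumes the last observed gaze velocity persists and extrapolates 3D gaze points a few seconds ahead. This is a hypothetical illustration, not the paper's learned model; `forecast_constant_velocity`, `horizon_s` and `step_s` are assumed names.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def forecast_constant_velocity(history: Sequence[Tuple[float, Vec3]],
                               horizon_s: float, step_s: float) -> List[Tuple[float, Vec3]]:
    """Extrapolate future 3D gaze points, assuming the last observed velocity persists.

    history: time-ordered (timestamp in seconds, 3D gaze point) pairs.
    Returns (timestamp, predicted point) pairs every step_s seconds up to horizon_s.
    """
    if len(history) < 2:
        raise ValueError("need at least two observed samples")
    (t0, p0), (t1, p1) = history[-2], history[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    # Velocity estimated from the two most recent observations only.
    velocity = ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt, (p1[2] - p0[2]) / dt)
    forecast: List[Tuple[float, Vec3]] = []
    for k in range(1, int(round(horizon_s / step_s)) + 1):
        lead = k * step_s  # how far past the last observation this prediction lies
        point = (p1[0] + velocity[0] * lead,
                 p1[1] + velocity[1] * lead,
                 p1[2] + velocity[2] * lead)
        forecast.append((t1 + lead, point))
    return forecast


# Hypothetical gaze point drifting 1 m/s along x; forecast 3 s ahead in 1 s steps.
history = [(0.0, (0.0, 0.0, 1.0)), (1.0, (1.0, 0.0, 1.0))]
for t, point in forecast_constant_velocity(history, horizon_s=3.0, step_s=1.0):
    print(t, point)
```

A baseline like this cannot anticipate a task-driven jump, such as the shift from the cup to the place where it will be set down, which is the kind of behaviour a learned forecaster aims to capture.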

The current work focuses on shorter forecasting periods, but the research also considers the challenges of longer prediction windows. As forecasting extends further into the future, the number of possible outcomes grows, making reliable prediction more difficult. The study notes that potential future paths could diverge rapidly, which limits how far ahead gaze behaviour can reasonably be forecast from a short period of observed movement and attention.
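
A rough count shows why the space of possible futures grows so quickly. The sketch below treats each fixation as a choice among k candidate targets, so n fixations allow k to the power n target sequences. This is a deliberately crude upper bound, not an analysis from the study: the 300 ms mean fixation length and the five candidate targets are illustrative assumptions, and `candidate_scanpaths` is a hypothetical helper.

```python
def candidate_scanpaths(horizon_ms: int, mean_fixation_ms: int, targets_per_fixation: int) -> int:
    """Count distinct target sequences if each fixation may land on any of k targets.

    Real gaze is far from uniform, so this is an upper bound that only
    illustrates how quickly the space of possible futures grows.
    """
    n_fixations = horizon_ms // mean_fixation_ms  # whole fixations that fit in the horizon
    return targets_per_fixation ** n_fixations


# Illustrative assumptions: about 300 ms per fixation, 5 plausible targets each time.
for horizon_ms in (1000, 3000, 10000):
    print(horizon_ms / 1000, "s:", candidate_scanpaths(horizon_ms, 300, 5))
```

Under these assumptions a 3-second horizon already allows 9,765,625 sequences and a 10-second horizon allows 5^33, about 1.2 × 10^23. Cutting the plausible targets per step from five to two, for example by knowing the task, shrinks the 3-second count to 1,024.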

The paper is presented as a proof of concept, and the researchers note that further work remains to be done. One area of interest is incorporating different scenarios into future models to help reduce uncertainty and narrow the range of possible outcomes. The research also notes that some individuals maintain attention on a single object for extended periods and that understanding a person’s intended task could help identify likely future attention paths.

The study also identifies possible applications in robotics. The researchers suggest that information about where people direct their attention during tasks could be used to train algorithms for robotic systems. Examining how humans visually engage with tasks could support efforts to teach robots to perform similar activities.

The paper presents a three-dimensional approach to forecasting human gaze within real-world environments. It demonstrates how user attention can be predicted in advance and explores how such forecasting could be applied to augmented reality systems, while also highlighting its potential relevance for future robotics research.
