Abstract
Eye gaze is regarded as a promising interaction modality in extended reality (XR) environments. However, to avoid the Midas touch problem, determining selection intention typically relies on additional manual selection techniques, such as explicit gestures (e.g., controller or hand inputs) or dwell, which limit interaction. We present a machine learning (ML) model based on a Bayesian framework that predicts user selection intention in real time, with the distinction that all data used for training and prediction come from gaze alone. The model uses a Bayesian approach to transform gaze data into selection probabilities, which are then fed into an ML model to infer selection intention. In Study 1, we built a high-performance model that performs real-time inference from gaze data alone; this approach improved performance, validating the proposed method. In Study 2, we conducted a user study to evaluate a manual-free selection technique based on the prediction model. We also discuss the advantages of eliminating explicit gestures and potential applications.
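The abstract's gaze-to-probability step can be illustrated with a minimal sketch. This is not the authors' implementation: the features (dwell time on target, gaze dispersion), the Gaussian likelihoods, and the prior are all illustrative assumptions; the posterior would then serve as input to a downstream ML classifier.

```python
import math

def gaussian(x, mu, sigma):
    # Gaussian probability density, used here as an assumed likelihood model.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def selection_probability(dwell, dispersion, prior=0.1):
    """Posterior probability of selection intent via Bayes' rule.

    dwell: time (s) the gaze has rested on the candidate target (assumed feature).
    dispersion: spatial spread of recent gaze samples in degrees (assumed feature).
    prior: assumed prior probability that the user intends to select.
    """
    # Likelihoods of the observed features under "select" vs. "no select"
    # hypotheses; all parameters are illustrative assumptions.
    l_select = gaussian(dwell, 0.8, 0.3) * gaussian(dispersion, 0.5, 0.4)
    l_noselect = gaussian(dwell, 0.2, 0.3) * gaussian(dispersion, 1.5, 0.8)
    # Bayes' rule: P(select | features).
    num = l_select * prior
    return num / (num + l_noselect * (1 - prior))

# Long, stable fixation -> high posterior; this scalar (per frame or per
# window) could be one input feature for the intention classifier.
p = selection_probability(dwell=0.9, dispersion=0.4)
print(p)
```

In a real pipeline, such per-frame posteriors would be computed over a sliding window of gaze samples and passed to the trained model, so that no controller, hand gesture, or fixed dwell threshold is needed to confirm selection.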