ABSTRACT
The integration of Visual Question Answering (VQA) and eye gaze tracking enhances human-computer interaction systems and activity detection. The system processes an image and a question, and the user's eye gaze pattern yields the answer. As in a real-world scenario, the user views an image and a related question while their gaze data is recorded, and their answer is inferred from that data. The gaze is analyzed, the inferred answer is compared with the correct answer, and the user is shown whether it is right or wrong. This method illustrates how combining VQA and eye gaze tracking can increase user interaction and improve understanding of practical scenarios such as activity detection and education-based learning.
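
The gaze-to-answer step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each candidate answer corresponds to a rectangular region of the image and that the answer is taken to be the region with the longest total fixation time; the abstract does not specify the actual inference method. The names `Fixation`, `infer_answer`, `check_answer`, and `answer_regions` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fixation:
    """One gaze fixation (hypothetical record layout)."""
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float  # how long the gaze stayed at this point


def infer_answer(fixations, answer_regions):
    """Return the label of the region with the most dwell time, or None.

    answer_regions maps a label to a (left, top, right, bottom) box.
    Assumption: dwell time is a usable proxy for the intended answer.
    """
    dwell = defaultdict(float)
    for f in fixations:
        for label, (left, top, right, bottom) in answer_regions.items():
            if left <= f.x <= right and top <= f.y <= bottom:
                dwell[label] += f.duration_ms
                break  # count each fixation for at most one region
    if not dwell:
        return None  # no fixation landed inside any answer region
    return max(dwell, key=dwell.get)


def check_answer(fixations, answer_regions, correct_label):
    """Infer the answer from gaze and report whether it is correct."""
    inferred = infer_answer(fixations, answer_regions)
    return inferred, inferred == correct_label
```

For example, calling `check_answer` with fixations that dwell mostly inside the region of the correct label returns that label together with `True`; fixations that miss every region yield `None` and `False`.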
