This technical report describes the development of a novel attention detection system for virtual assistants (VAs). The project aims to make user interaction more natural by giving the assistant a computer vision interface. The system uses a multi-task cascaded convolutional neural network (MTCNN) for face detection, which achieves a true positive rate of 95.04%, and a convolutional neural network for attention classification, which reaches an accuracy of 97.2%. The report describes the methodology in detail, covering dataset generation, face detection, and attention classification. The attention detection pipeline is integrated into a web application that simulates a virtual assistant. The results show that the system performs well in controlled environments but faces challenges in unfamiliar conditions such as poor lighting or varying angles. Future work aims to improve the generalizability of the attention classifier by expanding the dataset and improving performance in diverse conditions.
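
The two-stage structure summarized above (face detection followed by attention classification) can be illustrated with a minimal Python sketch. It is not the report's implementation: the names `crop`, `detect_attention`, `detect_faces`, and `attention_score`, the `(x, y, width, height)` box format, the 0.5 threshold, and the rule that a single attentive face is enough are all assumptions made for illustration. The two models are passed in as plain callables so that any face detector and any classifier could stand in for them.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a frame is a 2-D grid of pixel values,
# and a box is (x, y, width, height) in pixel coordinates.
Frame = Sequence[Sequence[int]]
Box = Tuple[int, int, int, int]


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of the frame."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in frame[y:y + h]]


def detect_attention(
    frame: Frame,
    detect_faces: Callable[[Frame], List[Box]],
    attention_score: Callable[[List[List[int]]], float],
    threshold: float = 0.5,  # assumed decision threshold, not from the report
) -> bool:
    """Stage 1 finds face boxes; stage 2 scores each face crop.

    Returns True as soon as one face scores at or above the threshold
    (an assumed policy), and False if no face is found or none qualifies.
    """
    for box in detect_faces(frame):
        if attention_score(crop(frame, box)) >= threshold:
            return True
    return False
```

In a real deployment, `detect_faces` would wrap the face detector and `attention_score` would wrap the attention classifier. Keeping them as parameters means the cascade logic can be tested with simple stand-in functions before either model is available.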