Action Unit Detection

We propose a convolutional neural network approach to the fine-grained recognition problem of multi-view dynamic facial action unit detection. Our approach is holistic, efficient, and modular: new action units can easily be added to the overall system. It significantly outperforms the baseline of the FERA 2017 Challenge, with an absolute improvement of 14% on the F1 metric, and it compares favorably against the winner of the FERA 2017 Challenge. Moreover, we improve the state of the art on the challenging BP4D dataset with an F1 performance of 63%.
Figure 1. Overview of AUNets. Our system takes as input a video of a human head and computes its optical flow field. It predicts the viewpoint from which the video was taken, and uses this information to select and evaluate an ensemble of holistic action unit detectors that were trained for that specific view. Final AUNets predictions are then temporally smoothed.
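The pipeline in Figure 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict_view` and `detectors_by_view` are hypothetical stand-ins for the view classifier and the per-view ensembles of holistic AU detectors, and the temporal smoothing is assumed to be a simple centered moving average.

```python
import numpy as np

def smooth_scores(scores, window=3):
    """Centered moving average over time; keeps the sequence length.
    scores: array of shape (T, num_AUs) with per-frame detector outputs."""
    pad = window // 2
    padded = np.pad(scores, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(padded[:, j], kernel, mode="valid")
         for j in range(scores.shape[1])],
        axis=1,
    )

def run_aunets(frames, flows, predict_view, detectors_by_view, window=3):
    """Sketch of the Figure 1 pipeline: predict the viewpoint, select the
    ensemble trained for that view, score every frame, then smooth in time."""
    view = predict_view(frames, flows)          # viewpoint of the whole clip
    detectors = detectors_by_view[view]         # one holistic detector per AU
    scores = np.array(
        [[d(frame, flow) for d in detectors]
         for frame, flow in zip(frames, flows)]
    )                                           # shape (T, num_AUs)
    return smooth_scores(scores, window)        # temporally smoothed outputs
```

With dummy detectors, `run_aunets` returns one smoothed score sequence per action unit, which matches the per-AU prediction layout used in the tables below.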

Figure 2. Different arrangements for the optical flow.
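Two common ways of arranging optical flow as CNN input are illustrated below; this is an assumption for illustration, not a claim about which arrangements the paper evaluates. The flow field has two channels (horizontal and vertical displacement) and can either be concatenated with the RGB frame or kept as a separate stream.

```python
import numpy as np

# Dummy inputs: one RGB frame and its two-channel optical flow field.
rgb = np.zeros((224, 224, 3), dtype=np.float32)
flow = np.zeros((224, 224, 2), dtype=np.float32)

# Arrangement A: a single 5-channel tensor (RGB and flow stacked channel-wise),
# fed to one network whose first convolution accepts 5 input channels.
stacked = np.concatenate([rgb, flow], axis=-1)  # shape (224, 224, 5)

# Arrangement B: two separate streams (appearance and motion),
# fused later inside the network.
streams = (rgb, flow)
```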


Table 1. Comparison with state-of-the-art methods over BP4D dataset.
Table 2. Comparison with baseline [86] and official winner [82] of the FERA17 Challenge.
Figure 3. Zeiler’s method [49] for network visualization. The top row presents 5 different Action Units [7]. The heat maps highlight the most important regions in the human face for each specific Action Unit (blue: less important, red: more important).
Table 3. Quantitative results of our method (per-AU F1 scores, %).
AU    1     2     4     6     7     10    12    14    15    17    23    24    Av.
F1   53.4  44.7  55.8  79.2  78.1  83.1  88.4  66.6  47.5  62.0  47.3  49.7  63.0
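The per-AU scores above are standard F1 values on binary frame-level labels, i.e. the harmonic mean of precision and recall. A minimal implementation for a single action unit:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for binary labels (1 = AU active in the frame)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The "Av." column is the unweighted mean of the twelve per-AU F1 scores.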



@article{romero2017multiview,
  title={Multi-View Dynamic Facial Action Unit Detection},
  author={Romero, Andr{\'e}s and Le{\'o}n, Juan and Arbel{\'a}ez, Pablo},
  journal={arXiv preprint arXiv:1704.07863},
  year={2017}
}