Figure 1. Overview of AUNets. Our system takes as input a video of a human head and computes its optical flow field. It predicts the viewpoint from which the video was taken and uses this information to select and evaluate an ensemble of holistic Action Unit detectors trained for that specific view. The final AUNets predictions are then temporally smoothed.
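The pipeline in Figure 1 can be sketched at a high level as follows. This is a hedged, illustrative outline only: every function name (`estimate_optical_flow`, `predict_viewpoint`, `detect_aus`, `temporal_smoothing`) and the toy return values are placeholders, not the authors' actual implementation or API.

```python
# Illustrative sketch of the AUNets inference pipeline from Figure 1.
# All names and values here are hypothetical stand-ins.

def estimate_optical_flow(frames):
    # Placeholder: pairwise flow between consecutive frames.
    return [(a, b) for a, b in zip(frames, frames[1:])]

def predict_viewpoint(frames):
    # Placeholder: returns one of a fixed set of head poses.
    return "frontal"

def detect_aus(frame, flow, view):
    # Placeholder: per-frame AU scores from the view-specific ensemble.
    return {"AU12": 0.9}

def temporal_smoothing(scores, window=3):
    # Simple moving average over per-frame AU scores.
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - window // 2), min(len(scores), i + window // 2 + 1)
        smoothed.append({k: sum(s[k] for s in scores[lo:hi]) / (hi - lo)
                         for k in scores[i]})
    return smoothed

def aunets_predict(frames):
    flow = estimate_optical_flow(frames)
    view = predict_viewpoint(frames)  # selects the view-specific detectors
    per_frame = [detect_aus(f, fl, view) for f, fl in zip(frames, flow)]
    return temporal_smoothing(per_frame)
```

The key design choice the figure conveys is that viewpoint is predicted first, so only the detectors trained for that view are evaluated, and smoothing is applied last over the per-frame scores.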
Figure 2. Different arrangements for the optical flow input.
Results
Table 1. Comparison with state-of-the-art methods on the BP4D dataset.
Table 2. Comparison with the baseline [86] and the official winner [82] of the FERA17 Challenge.
Figure 3. Zeiler's method [49] for network visualization. The top row shows five different Action Units [7]; the heat maps highlight the most important facial regions for each specific Action Unit (blue: less important, red: more important).
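One way such importance heat maps can be produced is occlusion sensitivity, one of the visualization techniques from Zeiler and Fergus's work: slide an occluding patch over the input and record how much the detector's score drops. The sketch below is a minimal toy version; the `au_score` function is a purely illustrative stand-in for a trained AU detector, not the authors' model.

```python
import numpy as np

# Hedged sketch of occlusion sensitivity. The toy `au_score` stands in
# for a trained Action Unit detector and is purely illustrative.

def au_score(image):
    # Toy detector: responds to brightness in a fixed "mouth" region.
    return float(image[6:10, 4:12].mean())

def occlusion_heatmap(image, score_fn, patch=4, stride=2):
    """Slide a gray patch over the image; the score drop at each
    position indicates how important that region is to the detector."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.5  # gray occluder
            heat[i, j] = base - score_fn(occluded)    # drop in confidence
    return heat

img = np.ones((16, 16))
heat = occlusion_heatmap(img, au_score)
```

Regions whose occlusion causes the largest score drop (here, the toy "mouth" region) show up as the hottest cells of the map, matching the red regions in Figure 3.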