Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

In this paper, we address the problem of searching for action proposals in unconstrained video clips. Our approach starts by estimating the actionness of frame-level bounding boxes. It then aggregates the boxes belonging to the same actor across frames through linking, association, and tracking, producing spatially and temporally continuous action paths. To this end, we first propose a novel actionness estimation method that exploits both human appearance and motion cues. Then, as in [1], the association of action paths is formulated as a maximum set coverage problem, with the actionness estimates serving as a prior. To further improve performance, we design an improved optimization objective for this problem and provide a greedy search algorithm to solve it. Finally, a tracking-by-detection scheme refines the searched action paths. Extensive experiments on two challenging datasets, UCF-Sports and UCF-101, show that the proposed approach advances state-of-the-art proposal generation performance in terms of both accuracy and the number of proposals.
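The abstract casts path association as a maximum set coverage problem that is solved by greedy search. The sketch below shows the textbook greedy algorithm for weighted maximum coverage under a cardinality limit. It is not the paper's improved objective. It assumes that each candidate path is a set of per-frame box identifiers and that each box is weighted by its estimated actionness. The names `greedy_max_coverage`, `paths`, and `weight`, along with the toy data, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical element type: (frame index, box index within that frame).
Element = Tuple[int, int]


def greedy_max_coverage(
    paths: Sequence[FrozenSet[Element]],
    weight: Dict[Element, float],
    k: int,
) -> List[int]:
    """Select up to k paths greedily. Each step picks the path whose
    not-yet-covered boxes carry the largest total actionness weight."""
    covered: Set[Element] = set()
    chosen: List[int] = []
    for _ in range(k):
        best_idx, best_gain = -1, 0.0
        for i, path in enumerate(paths):
            if i in chosen:
                continue
            # Marginal gain: only boxes not covered by earlier picks count.
            gain = sum(weight[e] for e in path if e not in covered)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx < 0:  # no remaining path adds positive coverage
            break
        chosen.append(best_idx)
        covered |= paths[best_idx]
    return chosen


# Toy example: two actors over three frames, with hypothetical actionness scores.
weight = {
    (0, 0): 0.9, (1, 0): 0.8, (2, 0): 0.7,
    (0, 1): 0.2, (1, 1): 0.3, (2, 1): 0.6,
}
paths = [
    frozenset({(0, 0), (1, 0), (2, 0)}),  # actor A, total 2.4
    frozenset({(0, 0), (1, 0), (2, 1)}),  # mostly overlaps A, total 2.3
    frozenset({(0, 1), (1, 1), (2, 1)}),  # actor B, total 1.1
]

print(greedy_max_coverage(paths, weight, k=2))
```

With `k=2` the call returns `[0, 2]`. The second path has a higher standalone score than the third, but its marginal gain after the first pick is only 0.6, so the greedy step prefers the non-redundant third path. For a monotone coverage objective like this one, the greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal coverage.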

@inproceedings{li2016searching,
  title={Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking},
  author={Li, Nannan and Xu, Dan and Ying, Zhenqiang and Li, Zhihao and Li, Ge},
  booktitle={Asian Conference on Computer Vision},
  pages={384--399},
  year={2016},
  organization={Springer}
}

arXiv
Springer First Online: 10 March 2017

Figure: The framework of our action proposal generation approach.