Weijie Kong's Homepage

Nov 25, 2017

Visual object tracking

Learning Policies for Adaptive Tracking with Deep Feature Cascades
- Our fundamental insight is to take an adaptive approach, where easy frames are processed with cheap features (such as pixel values), while challenging frames are processed with invariant but expensive deep features.
- Formulate the adaptive tracking problem as a decision-making process.
- Train an agent to decide whether it can locate the object with high confidence at an early layer, or whether it should continue processing subsequent layers of the network.

Significantly reduces the feedforward cost.
Train the agent offline in a reinforcement learning fashion.
The major computational burden comes from the forward pass through the entire network, and it grows with deeper architectures.
However, when the object is visually distinct or barely moves, early layers are in most scenarios sufficient for precise localization - offering the potential for substantial computational savings.
The agent learns to find the target at each layer, and decides whether it is confident enough to output the result and stop there.
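
To make the early-stopping idea concrete, here is a minimal Python sketch of a feature cascade with early exit. It is not the paper's implementation: a fixed confidence threshold stands in for the learned agent, and the names `track_with_early_exit`, `make_toy_layer`, the `costs` values, and the scalar "difficulty" used as a frame are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]          # (x, y, width, height)
Layer = Callable[[float], Tuple[Box, float]]     # frame -> (box, confidence)


def track_with_early_exit(
    frame: float,
    layers: Sequence[Layer],
    costs: Sequence[float],
    threshold: float = 0.8,
) -> Tuple[Box, int, float]:
    """Run layers from cheap to expensive and stop at the first confident one.

    Assumption: a fixed threshold replaces the learned stopping policy.
    Returns the box, the index of the layer that produced it, and the cost spent.
    If no layer is confident, the output of the last layer is returned.
    """
    if not layers or len(layers) != len(costs):
        raise ValueError("layers and costs must be non-empty and equal in length")
    spent = 0.0
    for depth, (layer, cost) in enumerate(zip(layers, costs)):
        box, confidence = layer(frame)
        spent += cost
        if confidence >= threshold:
            break  # confident enough: skip the remaining, costlier layers
    return box, depth, spent


def make_toy_layer(power: float) -> Layer:
    """Toy stand-in for a feature layer; a higher power copes with harder frames."""
    def layer(difficulty: float) -> Tuple[Box, float]:
        confidence = max(0.0, min(1.0, 1.0 - difficulty / power))
        return (0.0, 0.0, 10.0, 10.0), confidence
    return layer


# Cheap pixel-like layer first, expensive deep layer last (made-up costs).
layers = [make_toy_layer(p) for p in (0.5, 1.0, 4.0)]
costs = [1.0, 5.0, 20.0]

_, easy_depth, easy_cost = track_with_early_exit(0.05, layers, costs)
_, hard_depth, hard_cost = track_with_early_exit(0.6, layers, costs)
print(easy_depth, easy_cost)  # easy frame exits at the first layer
print(hard_depth, hard_cost)  # hard frame runs the whole cascade
```

In this toy setup the easy frame pays only for the first layer, while the hard frame pays for all three, which is the source of the savings described above. In the actual method the stopping decision comes from a policy trained with reinforcement learning rather than from a hand-set threshold.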