Visual object tracking
Learning Policies for Adaptive Tracking with Deep Feature Cascades
- Key insight: take an adaptive approach, in which easy frames are processed with cheap features (such as pixel values) and challenging frames are processed with invariant but expensive deep features.
- Formulate the adaptive tracking problem as a decision-making process.
- Train an agent to decide whether the object can be located with high confidence at an early layer, or whether to continue processing the subsequent layers of the network.
- Stopping at an early layer significantly reduces the feed-forward cost.
- Train the agent offline in a reinforcement learning fashion.
- The major computational burden comes from the forward pass through the entire network, and it grows with deeper architectures.
- However, when the object is visually distinct or barely moves, early layers are usually sufficient for precise localization, which offers the potential for substantial computational savings.
- The agent learns to find the target at each layer, and decides whether it is confident enough to output the result and stop there.
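
The early-stopping idea in the notes above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `track_frame`, `threshold_policy`, and the toy stages are hypothetical names, and a hand-set confidence threshold stands in for the agent that the paper trains with reinforcement learning. Each stage is assumed to return a box estimate together with a confidence score.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]          # (x, y, width, height)
Stage = Callable[[object], Tuple[Box, float]]    # frame -> (box, confidence)
Policy = Callable[[int, float], bool]            # (depth, confidence) -> stop?


def threshold_policy(threshold: float) -> Policy:
    """Stand-in for the learned agent (assumption): stop once confidence is high enough."""
    def should_stop(depth: int, confidence: float) -> bool:
        return confidence >= threshold
    return should_stop


def track_frame(frame: object, stages: Sequence[Stage], should_stop: Policy) -> Tuple[Box, int]:
    """Run stages from cheapest to most expensive and exit as early as the policy allows.

    Returns the chosen box and the index of the stage that produced it.
    """
    if not stages:
        raise ValueError("stages must not be empty")
    last = len(stages) - 1
    for depth, stage in enumerate(stages):
        box, confidence = stage(frame)
        # The final stage always outputs, since there is nothing deeper to try.
        if depth == last or should_stop(depth, confidence):
            return box, depth
    raise AssertionError("unreachable: the last stage always returns")


# Toy stages with fixed outputs, ordered from cheap (pixels) to expensive (deep layers).
def pixel_stage(frame: object) -> Tuple[Box, float]:
    return (10.0, 10.0, 32.0, 32.0), 0.40

def mid_layer_stage(frame: object) -> Tuple[Box, float]:
    return (11.0, 9.0, 32.0, 32.0), 0.93

def deep_layer_stage(frame: object) -> Tuple[Box, float]:
    return (11.0, 9.5, 32.0, 32.0), 0.99


if __name__ == "__main__":
    stages = [pixel_stage, mid_layer_stage, deep_layer_stage]
    box, depth = track_frame(None, stages, threshold_policy(0.9))
    print(f"stopped at stage {depth} with box {box}")
```

With these toy stages and a threshold of 0.9, the loop stops at the second stage and never runs the deepest one, which is where the savings would come from. A learned policy would replace `threshold_policy` and could use more than a single confidence score to make its decision.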