Humans are very good at recognising novel objects at retinal locations other than the one they were trained at, across translations of roughly 9 to 18 degrees of visual angle.
Current CNN models, by contrast, show highly restricted on-line translation tolerance: their tolerance depends on the range of translations present in the training dataset.
What enables humans to achieve such extreme on-line translation tolerance is still an open question.
Since the human high-level visual system exhibits extreme on-line translation tolerance, performing the same cognitive tasks at the same level requires determining how on-line invariance could be integrated into vision models, e.g. CNN variants.
Using GAP in a CNN, the receptive fields of the neurons in the final layer cover 100% of the pixel space.
Similarly, large receptive fields in the human visual system give humans a greater degree of visual translation tolerance.
Notes:
GAP: Global Average Pooling is a pooling operation designed to replace the fully connected layers of classical CNNs. The idea is to generate one feature map for each category of the classification task in the last mlpconv layer. Global average pooling sums out the spatial information, so it is more robust to spatial translations of the input.
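A minimal NumPy sketch (illustrative, not from these notes) of why GAP yields translation tolerance: averaging each feature map over all spatial positions discards location, so a feature that merely moves within the map pools to the same value.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each feature map over its spatial dimensions.
    feature_maps: array of shape (channels, height, width).
    Returns a vector of shape (channels,), one value per map."""
    return feature_maps.mean(axis=(1, 2))

# Toy feature maps: a single activation 'blob' in otherwise empty maps.
fmap = np.zeros((3, 8, 8))
fmap[:, 1:3, 1:3] = 1.0  # object response at the top-left

# The same response pattern translated to another location.
shifted = np.roll(fmap, shift=(4, 4), axis=(1, 2))

# GAP sums out spatial position, so both versions pool identically.
print(np.allclose(global_average_pool(fmap),
                  global_average_pool(shifted)))  # True
```

Note this idealised invariance holds only while the feature stays fully inside the map; in a real CNN, border effects and earlier layers that lack full-image receptive fields limit it, which is why trained tolerance still matters.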
More Study Resources:
- The visual system supports online translation invariance for object identification, by Bowers, J., Vankov, I., & Ludwig, C., in Psychonomic Bulletin & Review.
- A quantifiable testing of global translation invariance in convolutional and capsule networks, by Qi, W.
- Extreme Translation Tolerance in Humans and Machines, by Blything, R., Vankov, I., Ludwig, C. J., & Bowers, J. S., in the 2019 Conference on Cognitive Computational Neuroscience.
- Early differential sensitivity of evoked-potentials to local and global shape during the perception of three-dimensional objects, by Leek, E. C., Roberts, M. V., Oliver, Z. J., Cristino, F., & Pegna, A., in Neuropsychologia.