Humans are very good at recognising a novel object learned at one retinal location when it is later shown at other locations, even at displacements of 9 to 18 degrees of visual angle.

Current CNN models show only highly restricted on-line translation tolerance, and the tolerance they do have depends on the translations present in their training dataset.

Fig 1: Behavioral studies of translation tolerance

What gives humans such extreme on-line translation tolerance is still an open question.

The human high-level visual system has extreme on-line translation tolerance. For vision models such as CNN variants to perform the same task at a comparable level, we need to determine how on-line invariance could be integrated into them.

When a CNN uses global average pooling (GAP), the receptive fields of the units in the model's final layer cover 100% of the pixel space.
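A rough way to see this is receptive-field arithmetic. The sketch below applies the standard recurrence for the theoretical receptive field of stacked layers to a hypothetical backbone (five stages of 3x3 convolution plus 2x2 pooling on a 224x224 input). The layer stack, input size and helper names are assumptions for illustration, not the models behind the figures. Treating GAP as one extra layer whose kernel spans the whole final feature map shows why its output depends on every input pixel.

```python
def receptive_field(layers):
    """Receptive-field size and cumulative stride ("jump") of the last layer's units.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Standard recurrence: r_out = r_in + (k - 1) * j_in, j_out = j_in * s.
    """
    r, j = 1, 1  # start from a single input pixel with unit spacing
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j


INPUT_SIZE = 224  # hypothetical input width/height in pixels

# Hypothetical backbone: five stages of a 3x3 conv (stride 1, 'same' padding)
# followed by a 2x2 max-pool with stride 2, so a 224-pixel input ends as a 7x7 map.
backbone = [(3, 1), (2, 2)] * 5
final_map = INPUT_SIZE // 2 ** 5  # 7

rf_conv, jump = receptive_field(backbone)

# GAP averages every position of the final map, so it behaves like one more
# layer whose kernel spans the whole 7x7 map.
rf_gap, _ = receptive_field(backbone + [(final_map, 1)])


def coverage(rf):
    # Fraction of the input width covered by one unit (capped at 100%).
    return min(rf, INPUT_SIZE) / INPUT_SIZE


print(f"final conv unit: {rf_conv}px wide, {coverage(rf_conv):.0%} of the input")
print(f"GAP unit:        {rf_gap}px wide, {coverage(rf_gap):.0%} of the input")
```

Running it prints a 94-pixel receptive field (about 42% of the input width) for a unit in the last convolutional map and 100% for the GAP unit. This is the theoretical receptive field; the effective one, where most of the weight is concentrated, is usually smaller.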

Fig 2: Comparison of accuracy scores using GAP (5) and without GAP (others)

Large receptive fields in the human visual system give humans a greater degree of visual translation tolerance.

Fig 3: Accuracy using GAP versus no GAP

Notes:

GAP: Global Average Pooling is a pooling operation designed to replace the fully connected layers of classical CNNs. The idea is to generate one feature map for each category of the classification task in the last mlpconv layer and then take the average of each feature map. Because global average pooling sums out the spatial information, it is more robust to spatial translations of the input.
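The note above can be illustrated with a minimal NumPy sketch, assuming a (C, H, W) array of last-layer feature maps with one map per category. The random feature maps, the 3-category setup and the fully connected weights `W` are hypothetical. Translating the response inside the map leaves the GAP output unchanged, while a fully connected layer applied to the flattened maps gives a different output.

```python
import numpy as np


def global_average_pool(feature_maps):
    """GAP: average each (H, W) feature map to a single number.

    feature_maps: array of shape (C, H, W) -> returns shape (C,).
    """
    return feature_maps.mean(axis=(1, 2))


rng = np.random.default_rng(0)

# Hypothetical last-layer output: one 8x8 map per category (3 categories),
# with the "object" response occupying a small patch near the top-left corner.
maps = np.zeros((3, 8, 8))
maps[:, 1:3, 1:3] = rng.random((3, 2, 2))

# Translate the same response 3 pixels down and 4 pixels right
# (it stays inside the map, so nothing is cropped).
shifted = np.roll(maps, shift=(3, 4), axis=(1, 2))

# Hypothetical fully connected layer on the flattened maps, for comparison.
W = rng.standard_normal((3, maps.size))


def fully_connected(x):
    return W @ x.ravel()


# GAP output is unchanged by the translation ...
print(np.allclose(global_average_pool(maps), global_average_pool(shifted)))  # True
# ... whereas the fully connected output changes with position.
print(np.allclose(fully_connected(maps), fully_connected(shifted)))
```

The invariance here is exact only because the shifted response stays inside the map. In a real CNN, borders, padding and pooling strides mean the feature maps themselves only approximately shift with the input.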

Fig 4: Global Average Pooling

More Study Resources:

  1. The visual system supports online translation invariance for object identification, by Bowers, J., Vankov, I., & Ludwig, C., in Psychonomic Bulletin & Review.
  2. A quantifiable testing of global translation invariance in convolutional and capsule networks, by Qi, W.
  3. Extreme Translation Tolerance in Humans and Machines, by Blything, R., Vankov, I., Ludwig, C. J., & Bowers, J. S., in 2019 Conference on Cognitive Computational Neuroscience.
  4. Early differential sensitivity of evoked-potentials to local and global shape during the perception of three-dimensional objects, by Leek, E. C., Roberts, M. V., Oliver, Z. J., Cristino, F., & Pegna, A., in Neuropsychologia.