Acalytica exists to help you understand your customers better. We serve academia by helping institutions understand their students better, and at the centre of it all lies understanding how they learn.
How do people learn? How can we enhance learning performance? These are some of the questions we have dedicated ourselves to helping solve.
When we learn something new, like a language or a musical instrument, we often seek challenges at the edge of our competence: not so hard that we are discouraged, but not so easy that we get bored. This simple intuition, that there is a sweet spot of difficulty, a 'Goldilocks zone', for motivation and learning, is at the heart of modern teaching methods and is thought to account for differences in infant attention between more and less learnable stimuli.
In the animal learning literature it is the intuition behind shaping and fading, whereby complex tasks are taught by steadily increasing the difficulty of a training task. It is also observable in the nearly universal 'levels' feature in video games, in which the player is encouraged, or even forced, to move to a higher level of difficulty once a performance criterion has been achieved. Similarly, in machine learning, steadily increasing the difficulty of training has proven useful for teaching large-scale neural networks in a variety of tasks, where it is known as 'Curriculum Learning' and 'Self-Paced Learning'.
Despite this long history of empirical results, it is unclear why a particular difficulty level may be beneficial for learning, or what that optimal level might be. In this paper we address this issue of optimal training difficulty for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be classified into one of two classes (e.g., cat or dog).
In particular, we focus on the class of stochastic gradient-descent based learning algorithms. In these algorithms, parameters of the model (e.g., the weights in a neural network) are adjusted based on feedback in such a way as to reduce the average error rate over time. That is, these algorithms descend the gradient of error rate as a function of model parameters. Such gradient-descent learning forms the basis of many algorithms in AI, from single-layer perceptrons to deep neural networks, and provides a quantitative description of human and animal learning in a variety of situations, from perception to motor control to reinforcement learning. For these algorithms, we provide a general result for the optimal difficulty in terms of a target error rate for training. Under the assumption of a Gaussian noise process underlying the errors, this optimal error rate is around 15.87%, a number that varies slightly depending on the noise in the learning process. That is, the optimal accuracy for training is around 85%. We show theoretically that training at this optimal difficulty can lead to exponential improvements in the rate of learning. Finally, we demonstrate the applicability of the Eighty Five Percent Rule to artificial one- and two-layer neural networks, and to a model from computational neuroscience that is thought to describe human and animal perceptual learning.
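As a quick sanity check on the quoted figure (a sketch, not code from the paper): under the Gaussian noise assumption stated above, 15.87% is the value of the standard normal cumulative distribution function evaluated at -1, which can be verified with nothing more than the Python standard library:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The optimal training error rate quoted above corresponds to Phi(-1).
optimal_error_rate = normal_cdf(-1.0)
optimal_accuracy = 1.0 - optimal_error_rate

print(f"optimal error rate: {optimal_error_rate:.4f}")  # ~0.1587, i.e. ~15.87%
print(f"optimal accuracy:   {optimal_accuracy:.4f}")    # ~0.8413, i.e. ~85%
```

This is why the rule is stated as "around 85%" rather than exactly 85%: the precise value is 1 - Φ(-1) ≈ 84.13%, and, as noted above, it shifts slightly with the noise in the learning process.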
You can read the full paper at https://www.nature.com/articles/s41467-019-12552-4