While I'm not familiar with any explicit statements regarding what a Multilayer Perceptron (MLP) cannot learn, I can provide some further detail on the positive statements you made about MLP capabilities:
An MLP with a single hidden layer is capable of what is commonly termed 'universal function approximation', i.e. it can approximate any bounded continuous function to an arbitrary degree of accuracy [Cybenko, 1989]. With two hidden layers, the boundedness restriction is removed [Cybenko, 1988].
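To make this concrete, here is a minimal sketch (assuming scikit-learn is available) that fits a single-hidden-layer MLP to a bounded continuous target, sin(x). Note that the theorem only guarantees that a good approximation *exists*; gradient-based training merely searches for one, so the error you actually get depends on the optimizer and seed:

```python
# Minimal sketch: a single-hidden-layer MLP approximating a bounded
# continuous function (here sin(x) on [0, 2*pi]).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(1000, 1))   # training inputs
y = np.sin(X).ravel()                           # bounded continuous target

# One hidden layer of 50 tanh units -- the setting of the theorem.
mlp = MLPRegressor(hidden_layer_sizes=(50,), activation='tanh',
                   max_iter=5000, random_state=0)
mlp.fit(X, y)

X_test = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
err = np.max(np.abs(mlp.predict(X_test) - np.sin(X_test).ravel()))
print(f"max abs error on [0, 2*pi]: {err:.4f}")  # small if training converged
```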
Subsequent work demonstrated that this holds for a wide range of activation functions; the activation must be nonlinear, however (more precisely, non-polynomial [Leshno et al., 1993]), since an MLP with purely linear activations collapses to a single linear map. Three-layer MLPs are also capable of representing any Boolean function (although they may require an exponential number of neurons), as sketched below.
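As an illustration of the Boolean case, the sketch below hard-codes weights for a one-hidden-layer network of threshold units that computes XOR, the standard example of a Boolean function a single perceptron cannot represent. The specific weight values are my own illustrative choice, not unique:

```python
# Minimal sketch: hand-set weights for a one-hidden-layer MLP with
# threshold (Heaviside) units that computes XOR.
import numpy as np

def step(z):
    return (z > 0).astype(int)  # Heaviside threshold activation

W1 = np.array([[1.0, 1.0],      # column 1 -> hidden unit 1: fires if a OR b
               [1.0, 1.0]])     # column 2 -> hidden unit 2: fires if a AND b
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])      # output: (a OR b) AND NOT (a AND b) == XOR
b2 = -0.5

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(np.array([a, b]) @ W1 + b1)   # hidden layer activations
    out = step(h @ W2 + b2)                # output unit
    print(f"XOR({a}, {b}) = {out}")
```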
See also this interesting answer on CS SE about other universal approximators.