While I'm not familiar with any explicit statements regarding what a Multilayer Perceptron (MLP) cannot learn, I can provide some further detail on the positive statements you made about MLP capabilities:
An MLP with a single hidden layer is capable of what is commonly termed 'universal function approximation', i.e. it can approximate any bounded continuous function to an arbitrary degree of accuracy [Cybenko, 1989]. With two hidden layers, the boundedness restriction is removed [Cybenko, 1988].
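To make this concrete, here is a minimal sketch (assuming scikit-learn is available) that fits a single-hidden-layer MLP to a bounded continuous target, sin(x). Note that the theorem only guarantees that a good approximation *exists*; gradient-based training merely searches for one, so the error you actually get depends on the optimizer and seed:

```python
# Minimal sketch: a single-hidden-layer MLP approximating a bounded
# continuous function (here sin(x) on [0, 2*pi]).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(1000, 1))   # training inputs
y = np.sin(X).ravel()                           # bounded continuous target

# One hidden layer of 50 tanh units -- the setting of the theorem.
mlp = MLPRegressor(hidden_layer_sizes=(50,), activation='tanh',
                   max_iter=5000, random_state=0)
mlp.fit(X, y)

X_test = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
err = np.max(np.abs(mlp.predict(X_test) - np.sin(X_test).ravel()))
print(f"max abs error on [0, 2*pi]: {err:.4f}")  # small if training converged
```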
Subsequent work demonstrated that this holds for a wide range of activation functions; the activation must be nonlinear, however (more precisely, non-polynomial [Leshno et al., 1993]), since an MLP with purely linear activations collapses to a single linear map. Three-layer MLPs are also capable of representing any Boolean function (although they may require an exponential number of neurons), as sketched below.
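As an illustration of the Boolean case, the sketch below hard-codes weights for a one-hidden-layer network of threshold units that computes XOR, the standard example of a Boolean function a single perceptron cannot represent. The specific weight values are my own illustrative choice, not unique:

```python
# Minimal sketch: hand-set weights for a one-hidden-layer MLP with
# threshold (Heaviside) units that computes XOR.
import numpy as np

def step(z):
    return (z > 0).astype(int)  # Heaviside threshold activation

W1 = np.array([[1.0, 1.0],      # column 1 -> hidden unit 1: fires if a OR b
               [1.0, 1.0]])     # column 2 -> hidden unit 2: fires if a AND b
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])      # output: (a OR b) AND NOT (a AND b) == XOR
b2 = -0.5

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(np.array([a, b]) @ W1 + b1)   # hidden layer activations
    out = step(h @ W2 + b2)                # output unit
    print(f"XOR({a}, {b}) = {out}")
```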
See also this interesting answer on CS SE about other universal approximators.