Formally, a single hidden layer is sufficient to approximate a continuous function to any desired degree of accuracy, so in that sense, you never need more than 1. This is called the Universal Approximation Theorem.
Finding the best topology for a given problem is an open research problem. As far as I know, there are few universal 'rules of thumb' for this.
For a given problem, one option is to apply a neuroevolutionary approach such as NEAT, which attempts to find a topology that works well for the problem at hand.