Deep networks differ from 'normal' networks in two main ways.
The first is that computational power and training datasets have grown immensely, so running larger networks is now both practical and statistically sound (that is, we have enough training examples that larger networks won't simply over-fit).
The second is that back propagation degrades as layers are added; the error gradient is attenuated each time it passes backward through a layer, so by the time one is about six layers deep there isn't much signal left to adjust the neuron weights. But one might reasonably expect the earlier neurons to be more important than the later ones, since they represent 'concepts' that are closer to the raw inputs.
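To see the attenuation concretely, here is a minimal numpy sketch (the depth of eight layers, the width of fifty units, and the sigmoid activations are arbitrary illustrations, not anything from a real system) that pushes an error signal backward through a stack of layers and prints how much of it survives at each one:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A toy fully-connected network: eight layers of fifty sigmoid units each.
# (Both numbers are arbitrary; they're just enough to show the effect.)
depth, width = 8, 50
weights = [rng.normal(0, 1 / np.sqrt(width), (width, width)) for _ in range(depth)]

# Forward pass on a random input, keeping each layer's activations.
a = rng.normal(size=width)
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: push an arbitrary error signal from the output toward the
# input and watch how much of it reaches each layer.
grad = rng.normal(size=width)
for layer in reversed(range(depth)):
    a = activations[layer + 1]
    grad = weights[layer].T @ (grad * a * (1 - a))  # chain rule through the sigmoid
    print(f"layer {layer}: gradient norm = {np.linalg.norm(grad):.2e}")
```

The printed norms shrink by a roughly constant factor per layer, which is the practical content of the 'about six layers' limit: the correction reaching the earliest weights is tiny compared to the one applied near the output.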
New training techniques sidestep this problem, typically by doing unsupervised learning on the raw inputs, creating higher-level 'concepts' that are then useful as inputs for supervised learning.
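A rough numpy sketch of one common version of this idea, greedy layer-wise autoencoder pretraining (the dataset, layer sizes, learning rate, and number of passes below are all made-up placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in data: 1000 'images' of 64 pixels each (real inputs would go here).
X = rng.random((1000, 64))

layer_sizes = [64, 32, 16]   # purely illustrative
encoders = []
inputs = X

# Greedy layer-wise pretraining: each layer is trained, without labels, as a
# one-hidden-layer autoencoder that tries to reconstruct its own input.
for n_in, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    W_enc = rng.normal(0, 0.1, (n_in, n_hid))
    W_dec = rng.normal(0, 0.1, (n_hid, n_in))
    lr = 0.1
    for _ in range(50):
        H = sigmoid(inputs @ W_enc)      # encode into 'higher-level' features
        R = sigmoid(H @ W_dec)           # try to reconstruct the input from them
        err = R - inputs
        # Gradients of the squared reconstruction error, via the chain rule.
        dR = err * R * (1 - R)
        dH = (dR @ W_dec.T) * H * (1 - H)
        W_dec -= lr * H.T @ dR / len(inputs)
        W_enc -= lr * inputs.T @ dH / len(inputs)
    encoders.append(W_enc)
    inputs = sigmoid(inputs @ W_enc)     # the learned features feed the next layer

# 'inputs' now holds unsupervised features for every example; the supervised
# step would train a classifier on these instead of on raw pixels, and
# optionally fine-tune the whole stack afterwards.
print("pretrained feature shape:", inputs.shape)
```

The point is only the structure: no labels are touched until the very end, so every layer gets a direct, local training signal rather than a gradient that has been squeezed through the whole stack.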
(For example, consider the problem of determining whether an image contains a cat, working from the raw pixels. The early layers of the network should be doing things like detecting edges, which one could expect to be shared across nearly all images and mostly independent of what the output layers are trying to compute, and which are therefore hard to train from a 'cat / not-cat' signal arriving many layers up.