The difference is mostly in the number of layers.
For a long time, the common belief was that "1-2 hidden layers are enough for most tasks". Using more than that was also impractical, because training neural networks can be very computationally demanding.
Nowadays, computers are far more powerful, so people have started using networks with more layers and found that they work very well for some tasks.
The word "deep" is there simply to distinguish these networks from the traditional, shallower ones.
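
To make the "number of layers" point concrete, here is a minimal sketch in plain Python. The helper names (`build_mlp`, `forward`, `num_hidden_layers`), the layer sizes, the random weights, and the `tanh` activation are all assumptions chosen for illustration, not a reference implementation. The only thing that differs between the two networks is how many hidden layers are listed.

```python
import math
import random

def build_mlp(layer_sizes, seed=0):
    """Return a list of (weights, biases) pairs, one per layer of connections.

    layer_sizes is [inputs, hidden_1, ..., hidden_k, outputs] (illustrative only).
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Pass x through every layer, applying tanh on hidden layers only."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # the last layer is the output layer
            x = [math.tanh(v) for v in x]
    return x

def num_hidden_layers(layers):
    """Every layer of connections except the last one feeds a hidden layer."""
    return len(layers) - 1

shallow = build_mlp([4, 8, 1])           # one hidden layer: the "traditional" kind
deep = build_mlp([4, 8, 8, 8, 8, 8, 1])  # five hidden layers: a "deep" network

x = [0.5, -0.2, 0.1, 0.9]
print(num_hidden_layers(shallow), forward(shallow, x))
print(num_hidden_layers(deep), forward(deep, x))
```

Both networks are evaluated by exactly the same code; the deep one just runs through more layers, which is also why it costs more to compute and to train.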