Convolutional neural networks are the leading type of feed-forward artificial neural network for image recognition. Can they be used for real-time image recognition on video (frame by frame), or does that take too much processing (assuming they're written in a C-like language)?
For example, classifying types of animals after training on a huge dataset.
We are getting there, with the usual trade-off between quality and speed.
For example, Lecture 8: Spatial Localization and Detection shows some benchmarks, reported as mAP (mean average precision, higher is better) and FPS (frames per second).
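To make the FPS figure concrete, here is a minimal sketch of how one might check whether a frame-by-frame classifier keeps up with a video stream. The names `measure_fps`, `latency_budget_ms`, `is_real_time`, and `dummy_classifier` are made up for illustration, and the dummy function only stands in for a real CNN forward pass. The 30 FPS default is an assumption about the video source, not a number from the lecture.

```python
import time


def measure_fps(process_frame, frames):
    """Average frames per second achieved by process_frame over frames."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")


def latency_budget_ms(video_fps):
    """Time available per frame, in milliseconds, at a given video frame rate."""
    return 1000.0 / video_fps


def is_real_time(model_fps, video_fps=30.0):
    """True if the model processes frames at least as fast as they arrive."""
    return model_fps >= video_fps


def dummy_classifier(frame):
    # Hypothetical stand-in for a CNN forward pass: just burns a little CPU.
    return sum(frame) % 10


if __name__ == "__main__":
    # Fake "frames": small lists of pixel values (assumption, not real video).
    frames = [list(range(1000)) for _ in range(200)]
    fps = measure_fps(dummy_classifier, frames)
    print(f"measured throughput: {fps:.1f} FPS")
    print(f"budget per frame at 30 FPS: {latency_budget_ms(30.0):.1f} ms")
    print(f"keeps up with 30 FPS video: {is_real_time(fps)}")
```

In practice you would swap `dummy_classifier` for your network's inference call and feed it decoded video frames. If the measured FPS falls below the video frame rate, the usual options are a smaller or faster model, a lower input resolution, or skipping frames, which is the quality versus speed trade-off mentioned above.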