One of the challenges of AI is defining intelligence. If we could define general intelligence precisely, we could program it into a computer. After all, an algorithm is a process so well defined that it can be run on a computer.
Narrow AI can be evaluated on its success at achieving goals in an environment. In domains such as computer vision and speech recognition, narrow AI algorithms are easy to evaluate because their outputs can be scored against labelled examples.
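Many universities curate narrow AI tests. Fei-Fei Li, a professor at Stanford who directs the Artificial Intelligence lab there, organises the annual ImageNet Challenge. In 2012 a team from Geoffrey Hinton's group at the University of Toronto (Alex Krizhevsky, Ilya Sutskever and Hinton) famously won the competition with a deep neural network, now known as AlexNet, that recognised pictures far more accurately than the competing entries. It did not yet beat humans: systems that matched the estimated human error rate on this benchmark only appeared a few years later.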
To my knowledge, the testers commonly use precision and recall as evaluation metrics.
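To make precision and recall concrete, here is a minimal sketch in Python. It is not the scoring code of any particular competition. The function name `precision_recall`, the `positive` parameter and the toy labels are my own assumptions for illustration. Precision is the fraction of predicted positives that are correct, and recall is the fraction of actual positives that were found.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one class, treated as the positive class.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: labelled positive and predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but labelled something else.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: labelled positive but predicted something else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumption: return 0.0 when a denominator is zero instead of raising.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: 1 = "image contains a cat", 0 = "no cat".
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the classifier makes three positive predictions, two of them correct, so precision is 2/3. It finds two of the four real positives, so recall is 1/2.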