<AI>Devspace

How can the generalization error be estimated?

Asked 3 weeks ago · 3 replies · 10.2K views

How would you estimate the generalization error? What are the methods of achieving this?

3 Answers

Generalization error is the error a model makes on data it has not seen before. To measure it, hold out a subset of your data and do not train your model on it. After training, evaluate the model's accuracy (or another performance measure) on that held-out subset. Because the model has never seen it, the result is an estimate of the generalization error. This held-out subset is called a test set.

Additionally, a second held-out subset, called a validation set, can be used for parameter selection. We can't use the training set for parameter tuning, since error on the training set does not measure generalization error, and we can't use the test set either, since the tuned parameters would then overfit the test data. That's why we need a third subset.
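To make the three-way split concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy data (`make_data`), the 1-D k-nearest-neighbour classifier, the 60/20/20 split, and the candidate values of `k`. The point is only the workflow: fit on the training set, pick `k` on the validation set, and touch the test set once at the end.

```python
import random

def make_data(n, seed=0):
    """Hypothetical toy data: x in [0, 1], label 1 if x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)

def error_rate(train, eval_set, k):
    """Fraction of eval_set points misclassified by k-NN fitted on train."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in eval_set)
    return wrong / len(eval_set)

data = make_data(600)
random.Random(1).shuffle(data)

# 60% train, 20% validation, 20% test (the proportions are an arbitrary choice).
train, val, test = data[:360], data[360:480], data[480:]

# Parameter selection uses the validation set only.
candidates = [1, 5, 15, 31]
best_k = min(candidates, key=lambda k: error_rate(train, val, k))

# The test set is used once, after tuning, to estimate generalization error.
test_error = error_rate(train, test, best_k)
print("chosen k:", best_k, "estimated generalization error:", test_error)
```

If you keep tuning after looking at `test_error`, the test set has effectively become a second validation set and the estimate is no longer unbiased.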

Finally, to obtain more reliable performance estimates, we can use many different train/test partitions and average the results. This is called cross-validation.
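As a rough sketch of how that averaging works, the snippet below implements k-fold cross-validation by hand. The model (a least-squares line), the synthetic data, and the helper names `fit_line`, `mse`, and `k_fold_cv` are all assumptions chosen to keep the example self-contained; in practice you would plug in your own model and metric.

```python
import random

def fit_line(points):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def mse(model, points):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

def k_fold_cv(points, k, seed=0):
    """Return (mean held-out MSE, list of per-fold MSEs) for k-fold cross-validation."""
    pts = points[:]
    random.Random(seed).shuffle(pts)
    folds = [pts[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    scores = []
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(fit_line(train), held_out))
    return sum(scores) / k, scores

# Hypothetical data: y = 2x + 1 plus Gaussian noise with standard deviation 0.5.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)))

cv_mse, fold_mses = k_fold_cv(data, k=5)
print("per-fold MSE:", fold_mses)
print("cross-validated MSE:", cv_mse)
```

Every point is held out exactly once, so the averaged score uses all of the data for testing without ever testing a model on points it was trained on.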

Error estimation is a subject with a long history. The test-set method is only one way to estimate generalization error. Others include resubstitution, cross-validation, bootstrap, posterior-probability estimators, and bolstered estimators. These and more are reviewed, for instance, in the book: Braga-Neto and Dougherty, "Error Estimation for Pattern Recognition," IEEE-Wiley, 2015.
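To illustrate two of the estimators named above, here is a small sketch comparing resubstitution (error measured on the training data itself) with a simple out-of-bag bootstrap estimate. It is not taken from the book; the toy data, the 1-nearest-neighbour classifier, and the helper `bootstrap_oob_error` are assumptions made for illustration, and the bootstrap variant shown is the plain out-of-bag average rather than a refined version such as .632.

```python
import random

def nn_predict(train, x):
    """Label of the training point nearest to x (1-nearest-neighbour, 1-D input)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, eval_set):
    """Fraction of eval_set points misclassified by 1-NN fitted on train."""
    return sum(nn_predict(train, x) != y for x, y in eval_set) / len(eval_set)

# Hypothetical toy data: label 1 if x > 0.5, with 20% label noise.
rng = random.Random(0)
data = []
for _ in range(150):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

# Resubstitution: evaluate on the same points used for training.
# For 1-NN each point is its own nearest neighbour, so this is optimistic.
resub = error_rate(data, data)

def bootstrap_oob_error(data, n_rounds, rng):
    """Average error on points left out of each bootstrap resample."""
    n = len(data)
    errors = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:  # skip the (very unlikely) round with no left-out points
            errors.append(error_rate(train, oob))
    return sum(errors) / len(errors)

oob = bootstrap_oob_error(data, n_rounds=50, rng=rng)
print("resubstitution error:", resub)
print("out-of-bag bootstrap error:", oob)
```

The gap between the two numbers shows why resubstitution alone is a poor guide for flexible models: a classifier that memorizes its training data can score perfectly on it while still making many errors on unseen points.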

Beyond empirical experiments, it's basically not possible to test this. Generalization bounds only apply if the process that generates your data actually satisfies the model assumptions, and you don't know that to be true.
