We typically think of machine learning models as fitting two different parts of the training data: the underlying generalizable truth (the signal) and the randomness specific to that particular dataset (the noise).
Fitting either part increases training-set accuracy, but fitting the signal also increases test-set accuracy (and real-world performance), while fitting the noise degrades both. So we use techniques like regularization and dropout to make the noise harder to fit, and the signal correspondingly more likely to be what gets learned.
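As a minimal sketch of how both pressures show up in practice (using PyTorch, which the original doesn't specify, and hyperparameters chosen only for illustration), each is roughly a one-line choice: dropout inside the model and weight decay on the optimizer.

```python
import torch
import torch.nn as nn

# A small MLP with dropout between layers; dropout randomly zeroes
# activations during training, which makes it harder for the network
# to memorize dataset-specific noise.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # illustrative rate; tune per problem
    nn.Linear(64, 2),
)

# weight_decay adds an L2 penalty on the weights (classic regularization),
# biasing the fit toward simpler functions that capture signal.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Dropout is only active in training mode; switch it off for evaluation.
model.train()   # dropout on
model.eval()    # dropout off
```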
Just increasing the amount of noise in the training data is one such approach, but it seems unlikely to be as useful as more targeted techniques. Compare random jitter to adversarial training, for example: the former improves robustness slowly and indirectly, whereas the latter improves it directly and dramatically, because it perturbs inputs precisely where the model is currently weakest.
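To make that contrast concrete, here is a sketch (PyTorch again, with FGSM standing in for the adversarial approach; `model`, a batch `x`, and labels `y` are assumed from context): random jitter perturbs inputs blindly, while the adversarial step perturbs them in exactly the direction the current model gets most wrong.

```python
import torch
import torch.nn.functional as F

def jittered(x, sigma=0.1):
    # Random jitter: add isotropic Gaussian noise, unaware of the model.
    return x + sigma * torch.randn_like(x)

def fgsm_adversarial(model, x, y, eps=0.1):
    # FGSM: perturb each input in the direction that maximally increases
    # the loss for the current model, so training on it targets the
    # model's actual weaknesses rather than arbitrary directions.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```

Training on `fgsm_adversarial(model, x, y)` batches is the standard adversarial-training loop; training on `jittered(x)` is plain noise augmentation.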