I. Cross Validation
Cross validation estimates the test error rate by holding out a subset of the training data from the fitting process and then evaluating the model on the held-out observations. There are three main approaches: the validation set approach, leave-one-out cross validation, and k-fold cross validation.
The validation set approach is the simplest of the three: we randomly split the observations into a training set and a validation set. We fit the model on the training set, and tune its parameters based on its performance on the validation set.
This method has two major drawbacks:
- The validation estimate of the test error rate is highly variable, depending on which observations end up in the validation set.
- Only a subset of the observations is used to fit the model. Models tend to perform worse when trained on fewer observations, so the validation set error tends to overestimate the test error of a model fit on the entire data set.
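The first drawback can be seen in a small experiment. The following is a minimal sketch, assuming simple linear regression on synthetic data; the helper names (`fit_line`, `mse`, `validation_mse`), the 50/50 split, and the data-generating line are all illustrative assumptions, not part of the original notes.

```python
import random

def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def mse(b0, b1, xs, ys):
    """Mean squared error of the line (b0, b1) on the given points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def validation_mse(xs, ys, rng, train_fraction=0.5):
    """Randomly split once, fit on the training part, score on the rest."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, valid = idx[:cut], idx[cut:]
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(b0, b1, [xs[i] for i in valid], [ys[i] for i in valid])

# Synthetic data (an assumption for illustration): y = 2 + 0.5 x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Ten different random splits of the same data give ten different estimates.
estimates = [validation_mse(xs, ys, rng) for _ in range(10)]
print("smallest estimate:", min(estimates))
print("largest estimate: ", max(estimates))
```

The spread between the smallest and largest estimate comes only from the choice of split, since the data never change.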
Leave-one-out cross validation
Leave-one-out cross validation (LOOCV) uses a single observation $(x_i, y_i)$ as the validation set, fits the model to the remaining $n - 1$ observations, then predicts the held-out observation. It repeats this process for every observation to calculate $\text{MSE}_i = (y_i - \hat{y}_i)^2$ for $i = 1, \dots, n$. The average of all of these errors is the LOOCV error:
$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \text{MSE}_i$$
LOOCV can be used with any kind of predictive model. It has very little bias and involves no randomness in how the data are split, but it can be computationally expensive because the model must be fit $n$ times.
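The LOOCV procedure can be sketched as follows, again assuming simple linear regression as the model. The function names and the small data set are hypothetical, chosen only to keep the example short.

```python
def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def loocv_mse(xs, ys):
    """Average squared error when each point is predicted by a model
    fit to all of the other points."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Hold out observation i and fit on the remaining n - 1.
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n

# Hypothetical data, roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print("LOOCV error:", loocv_mse(xs, ys))
```

Calling `loocv_mse` twice on the same data returns the same value, since no random split is involved, but the loop fits the model once per observation.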
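For least squares linear or polynomial regression, there is a shortcut that gives the LOOCV error at the same cost as a single model fit:
$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2$$
Here $\hat{y}_i$ is the $i$th fitted value from the least squares fit on all $n$ observations, and $h_i$ is the leverage of the $i$th observation.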
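As a check on the shortcut, the sketch below compares it with brute-force LOOCV for simple linear regression. It divides each residual from the single full fit by one minus that observation's leverage, where for one predictor the leverage is 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def loocv_brute_force(xs, ys):
    """Refit the model n times, once per held-out observation."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n

def loocv_shortcut(xs, ys):
    """Single fit on all data; inflate each residual by 1 / (1 - leverage)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b0, b1 = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx
        total += ((y - (b0 + b1 * x)) / (1.0 - leverage)) ** 2
    return total / n

# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print("brute force:", loocv_brute_force(xs, ys))
print("shortcut:   ", loocv_shortcut(xs, ys))
```

The two values agree up to floating point rounding, while the shortcut performs one fit instead of n.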
K-fold cross validation
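K-fold cross validation randomly divides the observations into $K$ folds of approximately equal size. Each fold serves in turn as the validation set, while the model is fit on the remaining $K - 1$ folds. The $K$ resulting error estimates are then averaged to calculate the K-fold cross validation error:
$$CV_{(K)} = \frac{1}{K} \sum_{i=1}^{K} \text{MSE}_i$$
where $\text{MSE}_i$ is now the mean squared error on the $i$th held-out fold.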
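K-fold cross validation is cheaper to compute than LOOCV ($K$ model fits instead of $n$), and its error estimate has somewhat higher bias but lower variance. The variance is lower because the models fit in LOOCV are trained on almost identical data, so their outputs are highly correlated with each other, while those of K-fold CV are somewhat less correlated, and the mean of highly correlated quantities has higher variance than the mean of less correlated ones. Because of this bias/variance tradeoff, K-fold cross validation often gives a more accurate estimate of the test error than LOOCV.
Note: K = 5 or K = 10 is typically used in practice, since these values have been shown empirically to yield test error estimates with neither excessively high bias nor very high variance.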
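A minimal sketch of K-fold cross validation, assuming simple linear regression and synthetic data. The name `k_fold_mse`, the way folds are formed (every K-th shuffled index), and the data-generating line are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def k_fold_mse(xs, ys, k, rng):
    """Return (cv_error, fold_errors) for K-fold cross validation."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    # Taking every k-th shuffled index gives fold sizes differing by at most 1.
    folds = [idx[j::k] for j in range(k)]
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold]
        fold_errors.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_errors) / k, fold_errors

# Synthetic data (an assumption for illustration): y = 2 + 0.5 x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

cv_error, fold_errors = k_fold_mse(xs, ys, k=5, rng=rng)
print("5-fold CV error:", cv_error)
```

Setting `k` equal to the number of observations puts one observation in each fold, which reduces this to LOOCV.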
Classification vs. Regression
The previous formulas assume that we’re dealing with regression and use mean squared error as the measure of performance. For classification, we can use $\text{Err}_i = I(y_i \neq \hat{y}_i)$, an indicator of whether the $i$th observation is misclassified, instead of MSE.
For example, the LOOCV error for logistic regression looks like this:
$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \text{Err}_i$$
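For classification, the LOOCV loop counts misclassified held-out observations instead of summing squared errors. The sketch below assumes a one-predictor logistic regression fit by plain gradient descent, which is a stand-in chosen to keep the example dependency-free rather than the usual fitting method. The step count, learning rate, 0.5 threshold, and data are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient descent
    on the average log loss; returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def loocv_error_rate(xs, ys):
    """Fraction of observations misclassified when each one is
    predicted by a model fit to all of the others."""
    n = len(xs)
    mistakes = 0
    for i in range(n):
        b0, b1 = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predicted = 1 if sigmoid(b0 + b1 * xs[i]) >= 0.5 else 0
        mistakes += int(predicted != ys[i])
    return mistakes / n

# Hypothetical data: class 0 at negative x, class 1 at positive x.
xs = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
rate = loocv_error_rate(xs, ys)
print("LOOCV misclassification rate:", rate)
```

The returned value is a proportion between 0 and 1, so it can be compared across models in the same way as a cross-validated MSE.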
II. Bootstrap
The bootstrap allows us to quantify the uncertainty associated with an estimator or model. It is particularly useful when there is no simple formula for the standard error of an estimator (e.g. the median), or when we can’t assume anything about the population distribution (e.g. normality).
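- Sample with replacement a large number of times, $B$, from a data set $Z$ of $n$ observations to obtain bootstrap data sets $Z^{*1}, \dots, Z^{*B}$, each of size $n$.
- For each sample, calculate the bootstrap estimates $\hat{\alpha}^{*1}, \dots, \hat{\alpha}^{*B}$.
- Then calculate the standard error:
$$SE_B(\hat{\alpha}) = \sqrt{\frac{1}{B - 1} \sum_{r=1}^{B} \left( \hat{\alpha}^{*r} - \frac{1}{B} \sum_{r'=1}^{B} \hat{\alpha}^{*r'} \right)^2}$$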
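The three steps above can be sketched as follows, here for the standard error of a sample median. The function name `bootstrap_se`, the number of resamples, and the synthetic skewed data are assumptions for illustration. `statistics.stdev` divides by the number of estimates minus one.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_resamples, rng):
    """Return (standard_error, estimates) from bootstrap resampling."""
    estimates = []
    for _ in range(n_resamples):
        # Step 1: resample with replacement, same size as the data.
        resample = rng.choices(data, k=len(data))
        # Step 2: recompute the estimator on the resample.
        estimates.append(estimator(resample))
    # Step 3: the sample standard deviation of the bootstrap estimates.
    return statistics.stdev(estimates), estimates

# Synthetic skewed data (an assumption for illustration).
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(100)]

se_median, _ = bootstrap_se(data, statistics.median, n_resamples=1000, rng=rng)
print("sample median:       ", statistics.median(data))
print("bootstrap SE(median):", se_median)
```

Any function of a sample can be passed as `estimator`, which is what makes the approach useful when no closed-form standard error is available.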
When choosing between models, the “one standard error rule” is often used: we choose the simplest (most parsimonious) model whose estimated error is no more than one standard error above that of the best performing model.
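A minimal sketch of the rule, assuming each candidate model is summarized by a complexity measure, an estimated error, and the standard error of that estimate. The tuple layout, the function name, and all numbers below are hypothetical.

```python
def one_standard_error_choice(candidates):
    """candidates: list of (complexity, error, standard_error) tuples.
    Returns the least complex candidate whose error is within one
    standard error of the lowest error."""
    best = min(candidates, key=lambda c: c[1])
    # Threshold uses the standard error of the best performing model.
    threshold = best[1] + best[2]
    eligible = [c for c in candidates if c[1] <= threshold]
    return min(eligible, key=lambda c: c[0])

# Hypothetical results: (number of predictors, CV error, standard error).
candidates = [
    (1, 0.50, 0.05),
    (2, 0.32, 0.04),
    (3, 0.30, 0.04),
    (4, 0.29, 0.04),
    (5, 0.31, 0.05),
]
chosen = one_standard_error_choice(candidates)
print("chosen model:", chosen)
```

In this made-up example the lowest error belongs to the four-predictor model, but the two-predictor model is within one standard error of it and is therefore the one selected.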
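Note that the probability that a bootstrap sample of size $n$ contains the $j$th observation is $1 - (1 - 1/n)^n$. Since $(1 - 1/n)^n \to 1/e$, this probability converges to $1 - 1/e \approx 0.632$ as $n \to \infty$.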
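The convergence can be checked numerically. This short sketch evaluates the inclusion probability for a few sample sizes and compares it with the limiting value; the function name is an assumption.

```python
import math

def prob_in_bootstrap(n):
    """Probability that a given observation appears at least once in a
    bootstrap sample of size n drawn from n observations."""
    return 1.0 - (1.0 - 1.0 / n) ** n

limit = 1.0 - math.exp(-1.0)
for n in (5, 100, 10000):
    print(n, prob_in_bootstrap(n))
print("limit:", limit)
```

For n = 5 the probability is 1 - 0.8^5 = 0.67232, already close to the limit, which means roughly a third of the observations are left out of any given bootstrap sample.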