### Contents

The purpose of this series of posts is to create a concise overview of important modelling concepts, and the intended audience is someone who has learned these concepts before, but would like a refresher on the most important bits (e.g. myself).

### I. Motivation

Statistical learning involves building models to understand data.

Why estimate the function, *f*, that connects the input and output?

**Prediction:** estimate the output with a black-box model of *f* in order to minimize the reducible error

**Inference:**

- which predictors are associated with the response?
- what is the relationship between the response and each predictor?
- can the relationship between *Y* and each predictor be adequately summarized using a linear equation?

How do we estimate *f*?

**Parametric Methods** make an assumption about the functional form of *f* (e.g. linear), then use training data to fit or train the model.

- Pro: simplifies the problem down to estimating a set of parameters, and results are easily interpretable
- Con: the chosen model will likely not match the unknown true form of *f*

**Non-parametric Methods** do not make explicit assumptions about the functional form of *f*.

- Pro: potential to accurately fit a wider range of possible shapes
- Con: a very large number of observations is needed to obtain an accurate estimate
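A minimal sketch of the tradeoff, assuming scikit-learn is available: fit a parametric model (linear regression, two parameters) and a non-parametric one (K-nearest neighbors, no assumed form) to the same data generated from a nonlinear *f*, and compare test error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # true f is nonlinear

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Parametric: assumes f is linear, so only an intercept and slope to estimate
linear = LinearRegression().fit(X_train, y_train)

# Non-parametric: no functional form assumed; needs enough nearby observations
knn = KNeighborsRegressor(n_neighbors=10).fit(X_train, y_train)

mse = lambda model: np.mean((y_test - model.predict(X_test)) ** 2)
print(f"linear test MSE: {mse(linear):.3f}")
print(f"knn test MSE:    {mse(knn):.3f}")
```

Because the true *f* here is far from linear, the non-parametric fit wins; with a linear *f* or much less data, the parametric model's lower variance would favor it instead.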

What are the two types of statistical learning?

**Supervised learning** has predictor measurements and associated response measurements. **Unsupervised learning** has observed measurements but no associated response; we often seek to understand relationships between variables or between observations.

- Note: semi-supervised learning applies when responses are observed for only a limited number of observations; these methods are outside the scope of this book.

How do we assess model accuracy?

**Bias-Variance Tradeoff** - The goal is to develop a model that balances inflexible methods (large bias, small variance) against flexible methods (small bias, large variance). The expected test MSE at a point $x_0$ decomposes as:

$$E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\epsilon)$$

**Regression: Mean squared error**

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{f}(x_i)\right)^2$$

**Classification: Error rate**

$$\frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$

where $I(y_i \neq \hat{y}_i)$ is an indicator variable that equals 1 if the prediction is incorrect and 0 if correct.

- Notes:
  - The Bayes classifier on average minimizes the test error rate by assigning each observation to the class $j$ for which $\Pr(Y = j \mid X = x_0)$ is largest.
  - The Bayes decision boundary is the separating boundary between classes (note: K-nearest neighbors often gets very close to the optimal Bayes classifier).
  - The Bayes error rate is the lowest possible test error rate: $1 - E\left[\max_j \Pr(Y = j \mid X)\right]$.
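A small simulation can make the Bayes classifier concrete. This sketch (NumPy only; the two-class Gaussian setup is an illustrative assumption) draws data where $\Pr(Y = j \mid X = x)$ is known exactly, applies the Bayes rule, and shows that any other decision boundary does worse:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y = rng.integers(0, 2, size=n)                   # two classes, equal priors
x = rng.normal(loc=np.where(y == 1, 1.0, -1.0))  # class means at -1 and +1, unit variance

# With equal priors and equal variances, Pr(Y=1 | x) > 0.5 exactly when x > 0,
# so the Bayes decision boundary is x = 0.
bayes_pred = (x > 0).astype(int)
bayes_error = np.mean(bayes_pred != y)           # error rate: (1/n) Σ I(y_i ≠ ŷ_i)

# Any other boundary can only do worse on average.
shifted_error = np.mean(((x > 0.5).astype(int)) != y)

print(f"Bayes error rate:  {bayes_error:.3f}")   # theoretical value is Φ(-1) ≈ 0.159
print(f"shifted boundary:  {shifted_error:.3f}")
```

In real problems the conditional distribution of *Y* given *X* is unknown, so the Bayes error rate is an unattainable benchmark; methods like K-nearest neighbors try to approximate the Bayes rule from data.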

### II. Methods

Details are covered in the linked posts.

**Regression:** Predicting or explaining a continuous (quantitative) output

- Simple Linear Regression
- Multiple Linear Regression
- K-Nearest Neighbors Regression

**Classification:** Predicting or explaining a categorical (qualitative) output

- K-Nearest Neighbors
- Logistic Regression
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis

**Resampling Methods:** Repeatedly drawing samples from the training data to obtain additional information about a fitted model (e.g. estimates of test error or parameter variability)

- Cross validation
- Bootstrap

**Linear Model Selection and Regularization:** Subset selection, shrinkage, and dimension reduction techniques

- Linear Regression with forward, backward, and best subset selection
- L2 Regularization (Ridge Regression)
- L1 Regularization (Lasso)
- Principal Components Regression
- Partial Least Squares

**Moving Beyond Linearity:** Removing the linearity assumption

- Polynomial Regression and Step Functions
- Regression Splines
- Smoothing Splines
- Local Regression
- Generalized Additive Models

**Tree-based Methods:** Stratifying or segmenting the predictor space into regions

- Regression and Classification Trees
- Bagging
- Random Forests
- Boosting

**Support Vector Machines:** Classifiers built around separating hyperplanes

- Maximal Margin Classifier
- Support Vector Classifier
- Support Vector Machines (linear, polynomial, and radial kernel)

**Unsupervised Learning:** Finding subgroups among variables, or grouping individuals according to observed characteristics

- Principal Components Analysis
- K-means Clustering
- Hierarchical Clustering