Linear Regression (LR) Models

  • is a type of continuous regression model whose function/estimator is linear with respect to the regression coefficients {𝜃0, ..., 𝜃𝑝}:
  • 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝒙) + ... + 𝜃𝑝𝑓𝑝(𝒙)
  • models the mean/expected response as a function of the regressors (where 𝑓𝑖(..) are feature functions):
    • 𝐄[𝑌|𝑋1=𝑥1, ..., 𝑋𝑘=𝑥𝑘] = ℎ(𝑥1, ..., 𝑥𝑘) = 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝑥1, ..., 𝑥𝑘) + ... + 𝜃𝑝𝑓𝑝(𝑥1, ..., 𝑥𝑘)
    • coefficient 𝜃0 represents the 𝑦-intercept, i.e. the predicted response when all feature functions 𝑓𝑖(..) evaluate to 0
    • coefficient 𝜃𝑖 represents the mean change in the dependent variable 𝑦 given a 1-unit change in the feature function 𝑓𝑖(𝑥1, ..., 𝑥𝑘) # for 1≤𝑖≤𝑝
  • is a type of level-level model (or even a level-log model when 𝑓𝑖(..) are log functions)
  • the dependent variable 𝑦 is the combination of the regression model and error
    • 𝑦 = 𝑦̂ + 𝑒
    • dependent variable = (constant + independent variables) + error
    • dependent variable = deterministic + stochastic
    • the deterministic component is the portion of the variation in the dependent variable that the independent variables explain. In other words, the mean of the dependent variable is a function of the independent variables. In a regression model, all of the explanatory power should reside here
    • the error is the difference between the observed value 𝑦 and the expected value 𝑦̂. The gap between the expected and observed values must not be predictable; in other words, no explanatory power should be left in the error. If you can use the error to make predictions about the response, your model has a problem. This is where residual plots play a role (see the sketch after this list)
    • ideally, the deterministic component of a regression model explains the dependent variable so well that only the intrinsically inexplicable portion of your study area is left for the error. If you can identify non-randomness in the error term, your independent variables are not explaining everything they could
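A minimal NumPy sketch of this decomposition, with made-up data: the deterministic component is recovered by a least-squares fit, and the residuals should carry no remaining structure (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate y = 2 + 3*x + e, where e is pure noise (the stochastic component)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 1, size=100)

# fit the deterministic component y_hat = theta0 + theta1*x by least squares
theta1, theta0 = np.polyfit(x, y, deg=1)
y_hat = theta0 + theta1 * x

# the error e = y - y_hat should look like unpredictable noise; any visible
# structure in a residual plot means explanatory power leaked into the error
residuals = y - y_hat
print(f"theta0 ≈ {theta0:.2f}, theta1 ≈ {theta1:.2f}, residual mean ≈ {residuals.mean():.3f}")
```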

LR - Steps

given sample/training data:

  • (𝑦1, 𝑥11, ..., 𝑥1𝑘) # sample 1
  • (𝑦2, 𝑥21, ..., 𝑥2𝑘) # sample 2
  • ...
  • (𝑦𝑛, 𝑥𝑛1, ..., 𝑥𝑛𝑘) # sample 𝑛

the task of Linear Regression:

  • choose line equation form, such as:
    • 𝐄[𝑌|𝑋1=𝑥1] = 𝑦̂ = ℎ(𝑥1) = 𝜃0 + 𝜃1𝑥1 # univariate linear regression
    • 𝐄[𝑌|𝑋1=𝑥1, 𝑋2=𝑥2] = 𝑦̂ = ℎ(𝑥1,𝑥2) = 𝜃0 + 𝜃1𝑥1 + 𝜃2𝑥2 # multivariate linear regression
    • 𝐄[𝑌|𝑋1=𝑥1, 𝑋2=𝑥2] = 𝑦̂ = ℎ(𝑥1,𝑥2) = 𝜃0 + 𝜃1𝑥1𝑥2 + 𝜃2𝑥1² + 𝜃3𝑥2 # multiple linear regression with interaction and polynomial terms
  • where:
    • 𝐄[𝑌|..] and 𝑦̂ and ℎ(..) - scalar response/dependent variable or hypothesis function conditional on 𝑥𝑖's
    • 𝑥𝑖 - regressors or explanatory/predictor/covariate/independent variables
    • 𝜃𝑖 - regression coefficients/weights
  • estimate/find the values of the regression coefficients 𝜃𝑖 which best fit the line equation to the data
  • determine whether it's a good fit (see the sketch after this list)
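The three steps above, as a minimal NumPy sketch on made-up data (the chosen form, coefficient values, and the R² check are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# sample/training data: n = 50 rows of (y_i, x_i1, x_i2), made up for illustration
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, size=50)

# step 1: choose the line equation form: y_hat = theta0 + theta1*x1 + theta2*x2
A = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept column

# step 2: estimate the coefficients theta_i that best fit the data (least squares)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

# step 3: determine whether it is a good fit (here via R^2)
y_hat = A @ theta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print("theta:", theta.round(2), "R^2:", round(r2, 4))
```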

LR - Types

Univariate Linear Regression

  • model form: 𝐄[𝑌|𝑋1=𝑥1] = ℎ(𝑥1) = 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝑥1)
  • example models:
    • 𝜃0 + 𝜃1𝑥1
    • 𝜃0 + 𝜃1𝑥1²

Multivariate Linear Regression

  • model form: 𝐄[𝑌|𝑋1=𝑥1, ..., 𝑋𝑘=𝑥𝑘] = ℎ(𝑥1, ..., 𝑥𝑘) = 𝑦̂ =:
    • 𝜃0 + 𝜃1𝑓1(𝑥1) + ... + 𝜃𝑘𝑓𝑘(𝑥𝑘)
    • 𝜃0 + 𝜃1𝑓1(𝑥1, ..., 𝑥𝑘) + ... + 𝜃𝑘𝑓𝑘(𝑥1, ..., 𝑥𝑘)
  • example models (see the sketch after this table):
    • 𝜃0 + 𝜃1𝑥1 + ... + 𝜃𝑘𝑥𝑘
    • 𝜃0 + 𝜃1𝑥1³ + ... + 𝜃𝑘 sin(𝑥𝑘)
    • 𝜃0 + 𝜃1𝑥1𝑥3 + ... + 𝜃𝑘𝑥4⁶𝑥𝑘
    • 𝜃0 + 𝜃1𝑥1𝑥𝑘-2𝑥𝑘 + ... + 𝜃𝑘𝑥𝑘
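What makes all of these *linear* regressions is linearity in the 𝜃𝑖, not in the 𝑥𝑖. A short NumPy sketch of building design matrices for two of the example models above (data is made up for illustration):

```python
import numpy as np

x1 = np.linspace(0, 1, 20)
x2 = np.linspace(1, 2, 20)

# univariate with a polynomial feature: y_hat = theta0 + theta1 * x1^2
A_uni = np.column_stack([np.ones_like(x1), x1 ** 2])

# multivariate with nonlinear feature functions:
# y_hat = theta0 + theta1 * x1^3 + theta2 * sin(x2)
A_multi = np.column_stack([np.ones_like(x1), x1 ** 3, np.sin(x2)])

# however nonlinear the feature functions f_i are, y_hat stays linear in the
# thetas, so ordinary least-squares machinery applies to both matrices
print(A_uni.shape, A_multi.shape)   # (20, 2) (20, 3)
```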

LR - Methods for Estimating Coefficients (𝜃𝑖)

Methods for estimating the unknown coefficients {𝜃0, ..., 𝜃𝑘} of 𝐄[𝑌|𝑋1=𝑥1, ..., 𝑋𝑘=𝑥𝑘] = ℎ(𝑥1, ..., 𝑥𝑘) = 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝑥1, ..., 𝑥𝑘) + ... + 𝜃𝑘𝑓𝑘(𝑥1, ..., 𝑥𝑘)

Method of Least Squares
(Gradient Descent)

  • idea: minimize squared error via GRADIENT DESCENT (see the sketch after this table)
  • need to choose learning rate 𝛼
  • need many iterations
  • works well even when the number of features is large

Method of Least Squares
(Projection Matrix - Normal Equation)

  • idea: minimize squared error via the NORMAL EQUATION (see the sketch after this table)
  • no need to choose learning rate 𝛼
  • no need to iterate
  • need to compute (𝑋ᵀ𝑋)⁻¹𝑋ᵀ or 𝑉𝐷⁻¹𝑈ᵀ
  • slow if the number of features is large because computing the inverse of the 𝑛×𝑛 matrix 𝑋ᵀ𝑋 is 𝑂(𝑛³)
Maximum Likelihood Estimation

  • idea: maximize the likelihood of the observed data; under Gaussian errors this is equivalent to least squares

MAP (Bayesian Linear Regression)

Newton-Raphson (N-R) Technique

  • idea: TODO
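A minimal NumPy sketch contrasting the two least-squares methods from this table (learning rate, iteration count, and data are illustrative assumptions; both should recover roughly the same 𝜃):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
A = np.column_stack([np.ones(len(X)), X])     # design matrix with intercept column
y = A @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=200)

# --- least squares via gradient descent ---
alpha, iters = 0.1, 2000                      # learning rate and iteration count
theta_gd = np.zeros(A.shape[1])
m = len(y)
for _ in range(iters):
    grad = (A.T @ (A @ theta_gd - y)) / m     # gradient of the mean squared error / 2
    theta_gd -= alpha * grad

# --- least squares via the normal equation ---
# theta = (X^T X)^(-1) X^T y; np.linalg.solve is preferred over an explicit inverse
theta_ne = np.linalg.solve(A.T @ A, A.T @ y)

print("gradient descent:", theta_gd.round(3))
print("normal equation: ", theta_ne.round(3))
```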

LR - Model Types

Linear Regression Models - take an input vector 𝑥∊ℝⁿ and predict the value of a scalar 𝑦∊ℝ as output (the function/estimator is linear wrt the regression coefficients {𝜃0, ..., 𝜃𝑝})

Ordinary Least Squares Regression

  • has several weaknesses, including sensitivity to both outliers and multicollinearity, and it is prone to overfitting

Stepwise Regression
Best Subsets Regression

Robust Regression

Ridge Regression

  • addresses multicollinearity (see the sketch after this table)
  • allows you to analyze data even when severe multicollinearity is present and helps prevent overfitting. This type of model reduces the large, problematic variance that multicollinearity causes by introducing a slight bias in the estimates. The procedure trades away much of the variance in exchange for a little bias, which produces more useful coefficient estimates when multicollinearity is present

Lasso Regression
(Least Absolute Shrinkage and Selection Operator)

  • performs variable selection that aims to increase prediction accuracy by identifying a simpler model. It is similar to Ridge regression, but its L1 penalty can shrink coefficients exactly to zero, which is what performs the variable selection
Elastic Net Regression
  • combines the Ridge (L2) and LASSO (L1) regularizers
Partial Least Squares (PLS) Regression
  • is useful when you have very few observations compared to the number of independent variables or when your independent variables are highly correlated. PLS reduces the independent variables to a smaller number of uncorrelated components, similar to Principal Component Analysis. Then, the procedure performs linear regression on these components rather than the original data. PLS emphasizes developing predictive models and is not used for screening variables. Unlike OLS, you can include multiple continuous dependent variables. PLS uses the correlation structure to identify smaller effects and model multivariate patterns in the dependent variables
Beta Regression
  • models variables within (0, 1) range
Dirichlet Regression
  • models compositional data
Loess Regression
  • smooths time series data
Isotonic Regression
  • for approximation of data that can only increase (typically cumulative data)
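A minimal scikit-learn sketch comparing OLS with the regularized models above on deliberately collinear made-up data (the alpha values are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(0, 0.01, size=100)     # severe multicollinearity
y = X[:, 0] + 2 * X[:, 2] + rng.normal(0, 0.1, size=100)

models = {
    "OLS": LinearRegression(),                         # unstable under collinearity
    "Ridge": Ridge(alpha=1.0),                         # L2 penalty shrinks coefficients
    "Lasso": Lasso(alpha=0.1),                         # L1 penalty zeroes some of them
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5), # mix of L1 and L2 penalties
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:>10}: {np.round(model.coef_, 2)}")
```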

LR - Methods for Determining How Well The Fitted Line Describes the Data

LR - Methods for Diagnosing Bias/Variance

LR - Subpages

LR - Resources