# Lasso introduction

In [None]:
# import some libraries

import numpy as np
from sklearn.linear_model import Lasso, Ridge, LinearRegression # not allowed in homework!

np.random.seed(0) # for reproducibility

# generate data:
n = 40
d = 1000
X = np.random.randn(n, d)
beta = np.zeros(d)
beta[0:4] = 3 * np.random.randn(4)
y = X @ beta + 3.5 + 0.2 * np.random.randn(n)

### Sparsity

Either write a function that computes the $\ell_0$ "norm" of a vector yourself, or look up a built-in NumPy function that does this for you.

For the vector $\vec \beta$ we just created, compute $\| \vec\beta \|_0$, $\| \vec\beta \|_1$, and $\| \vec\beta \|_2$:
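A minimal sketch of the three quantities, shown on a small made-up vector `v` rather than on $\vec\beta$ itself. The helper name `l0_norm` is hypothetical; `np.count_nonzero` and `np.linalg.norm` are the built-in NumPy routines assumed here.

```python
import numpy as np

def l0_norm(v):
    """Count the nonzero entries of v (the l0 "norm")."""
    return int(np.count_nonzero(v))

# Toy vector chosen for illustration only; it is not the beta from above.
v = np.array([3.0, 0.0, -4.0, 0.0])

l0 = l0_norm(v)               # number of nonzero entries
l1 = np.linalg.norm(v, 1)     # sum of absolute values
l2 = np.linalg.norm(v, 2)     # Euclidean length

print(l0, l1, l2)
```

The same three calls can be applied to `beta` once the data-generation cell has run.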

### Fitting an estimator

Scikit-learn makes it quick and easy to fit a number of linear models to your data.
For example, we can compute the ordinary least squares estimate $\vec \beta_{\rm OLS}$ with:

In [None]:
model = LinearRegression()
beta_OLS = model.fit(X, y).coef_
print(beta_OLS)

Is the answer close to the truth?
Why do you think OLS performs the way it does here?
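One way to probe these questions numerically is to look at the training residual, the number of nonzero coefficients, and the distance to the true $\vec\beta$. The sketch below regenerates the data with the same seed so that it runs on its own; inside the notebook, the existing `X`, `y`, and `beta` could be reused instead. The variable names `train_residual`, `n_nonzero`, and `est_error` are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Regenerate the data exactly as in the setup cell.
np.random.seed(0)
n, d = 40, 1000
X = np.random.randn(n, d)
beta = np.zeros(d)
beta[0:4] = 3 * np.random.randn(4)
y = X @ beta + 3.5 + 0.2 * np.random.randn(n)

model = LinearRegression().fit(X, y)
beta_OLS = model.coef_

# With far more features than samples (d > n), OLS can fit the training data exactly.
train_residual = np.linalg.norm(y - model.predict(X))
n_nonzero = np.count_nonzero(beta_OLS)
est_error = np.linalg.norm(beta_OLS - beta)

print("training residual norm:", train_residual)
print("nonzero coefficients:  ", n_nonzero, "of", d)
print("distance to true beta: ", est_error)
```

Comparing the training residual with the distance to the true coefficients is a hint about what goes wrong when $d > n$.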

Now, you can fit a ridge or lasso model using the same syntax.
Each adds a regularizer to the least-squares objective: $\lambda \| \vec\beta \|_2^2$ for ridge and $\lambda \| \vec\beta \|_1$ for lasso.
In `sklearn`, the hyperparameter $\lambda$ is called `alpha` to avoid confusion with Python's `lambda` keyword.

The following code uses ridge regression to estimate the coefficient vector:

In [None]:
model = Ridge(alpha=1.)
beta_ridge = model.fit(X, y).coef_
print(beta_ridge)

### Lasso estimator

Using the same syntax as for ridge, get the lasso estimate of the coefficient vector $\vec\beta$.
Use `alpha = 1.`.
Compare the sparsity to that of the OLS and ridge estimates, as well as the truth.
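A possible sketch of this comparison, assuming `Lasso` is called with the same `fit(...).coef_` pattern as `Ridge`. The data is regenerated with the same seed so that the snippet is self-contained, and the dictionary `sparsity` is a name introduced only for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, LinearRegression

# Regenerate the data exactly as in the setup cell.
np.random.seed(0)
n, d = 40, 1000
X = np.random.randn(n, d)
beta = np.zeros(d)
beta[0:4] = 3 * np.random.randn(4)
y = X @ beta + 3.5 + 0.2 * np.random.randn(n)

beta_OLS = LinearRegression().fit(X, y).coef_
beta_ridge = Ridge(alpha=1.).fit(X, y).coef_
beta_lasso = Lasso(alpha=1.).fit(X, y).coef_

# Count the nonzero entries of the truth and of each estimate.
sparsity = {
    "truth": int(np.count_nonzero(beta)),
    "OLS": int(np.count_nonzero(beta_OLS)),
    "ridge": int(np.count_nonzero(beta_ridge)),
    "lasso": int(np.count_nonzero(beta_lasso)),
}
for name, count in sparsity.items():
    print(f"{name:>5}: {count} nonzero coefficients out of {d}")
```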

### Hyperparameter tuning

Go ahead and play with various values of `alpha`.
It's best to tune hyperparameters on a logarithmic scale. The default was `alpha = 1.`, so try things like `1e-1, 1e-2, 1e1, 1e2` to cover multiple orders of magnitude.

Look for the shrinkage (the estimated coefficients are "shrunken" values of the truth) and sparsity effects.
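The sweep could be organized as a simple loop over a logarithmic grid, as sketched below. The grid, the `max_iter` setting, and the dictionaries `nonzeros` and `l1_norms` are choices made for this illustration and are not prescribed by the exercise. The data is regenerated with the same seed so that the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Regenerate the data exactly as in the setup cell.
np.random.seed(0)
n, d = 40, 1000
X = np.random.randn(n, d)
beta = np.zeros(d)
beta[0:4] = 3 * np.random.randn(4)
y = X @ beta + 3.5 + 0.2 * np.random.randn(n)

alphas = [1e-2, 1e-1, 1e0, 1e1, 1e2]  # logarithmic grid around the default
nonzeros = {}
l1_norms = {}
for alpha in alphas:
    # A larger max_iter gives coordinate descent more room at small alpha.
    coef = Lasso(alpha=alpha, max_iter=10_000).fit(X, y).coef_
    nonzeros[alpha] = int(np.count_nonzero(coef))       # sparsity effect
    l1_norms[alpha] = float(np.linalg.norm(coef, 1))    # shrinkage effect
    print(f"alpha={alpha:g}: {nonzeros[alpha]} nonzero, "
          f"l1 norm {l1_norms[alpha]:.3f}, first four {np.round(coef[:4], 2)}")

print("true first four:", np.round(beta[:4], 2))
```

Printing the first four estimated coefficients next to the true ones makes the shrinkage visible, while the nonzero counts track sparsity.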