Day89: Lasso

Posted by csiu on May 24, 2017 | with: 100daysofcode, Machine Learning, Kaggle

Yesterday I used L2 regularized. Today I want L1 regularization on the same dataset (with the same setup) as yesterday i.e. the Digit Recognizer MNIST data set available on Kaggle (with setup 80% train + 20% test).

The Jupyter Notebook for this little project is found here.

Lasso

The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. (Scikit-learn)

It seems one way to specify LASSO is to use sklearn.linear_model.Lasso, however the predictions do not map to a valid output class and the accuracy is 0.2226.

from sklearn.linear_model import Lasso

reg = Lasso(alpha = 0.1)
reg.fit(X_train, y_train)

reg.predict(X_test)
#> [ 2.38436186 -0.76427569  3.47591499 ...,  1.58217907  5.61635695   3.52332026]

Another way is to use sklearn.linear_model.LassoLars, which made the predictions all the same. The accuracy dropped to 0.0943.

from sklearn.linear_model import LassoLars

reg = LassoLars(alpha=.1)
reg.fit(X_train, y_train)

reg.predict(X_test)
#> [ 4.44324405  4.44324405  4.44324405 ...,  4.44324405  4.44324405  4.44324405]

With regards to the poor accuracy, I think LASSO is for regularization and model training (but not for classification).