Day05 brings another Kaggle competition, “Digit Recognizer”. The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is.
The Jupyter Notebook for this little project is found here.
Random Forest
Today, I think I’ll use a random forest classifier. A random forest classifier is like an election where the outcome with the most votes (from the ensemble of decision trees) is the predicted classification. Here is a minimal example of how this is done in Python:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc = fit(X_train, y_train)
rfc.predict(X_test)
Number of trees?
I split the train data into 2 sets: 80% for training and 20% for validation. I then (crudely) optimized for the number of trees (default = 10 trees) and decided that 30 trees is as good as 100 trees.
With more trees, comes greater training time.
Evaluation
Submitting the results to kaggle, we get a public score of “0.95871”.