Today I built my first neural network, using the Lasagne and Theano frameworks. The data comes from the Kaggle Digit Recognizer competition, where the goal is handwriting recognition on the famous MNIST dataset.
Architecture
- 1 input layer: 784 nodes (one node for each input feature)
- 1 hidden layer: 397 nodes (mean of 784 and 10)
- 1 output layer: 10 nodes (one for each output class)
- “activation”, i.e. nonlinearity, using `sigmoid` in the hidden layer and `softmax` in the output layer
- weight optimization using `adam` (the same optimizer as in scikit-learn)
- error for optimization measured by `categorical_crossentropy`
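For concreteness, here is a minimal sketch of how this architecture can be wired up in Lasagne; the variable names are my own, and data loading and the training loop are omitted:

```python
import theano
import theano.tensor as T
import lasagne

# symbolic inputs: flattened 28x28 images and integer class labels
input_var = T.matrix('inputs')
target_var = T.ivector('targets')

# the 784 -> 397 -> 10 architecture described above
network = lasagne.layers.InputLayer(shape=(None, 784), input_var=input_var)
network = lasagne.layers.DenseLayer(
    network, num_units=397, nonlinearity=lasagne.nonlinearities.sigmoid)
network = lasagne.layers.DenseLayer(
    network, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

# categorical cross-entropy loss, minimized with adam updates
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params)

# compiled training function: one call = one gradient step on a batch
train_fn = theano.function([input_var, target_var], loss, updates=updates)
```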
Two modes of the script
(1) `python $script -m inspect` to split train.csv into a training set and a validation set, for getting an idea of model accuracy, and
(2) `python $script -m predict` to train on all data in train.csv and to make predictions on test.csv.
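A rough sketch of how the two modes might be dispatched; the argument parsing and the 80/20 split ratio are assumptions for illustration, not necessarily the script's exact code:

```python
import argparse
import numpy as np
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('-m', '--mode', choices=['inspect', 'predict'], required=True)
args = parser.parse_args()

# Kaggle's train.csv: a 'label' column followed by 784 pixel columns
train = pd.read_csv('train.csv')
X = train.drop('label', axis=1).values.astype(np.float32) / 255.0
y = train['label'].values.astype(np.int32)

if args.mode == 'inspect':
    # hold out the last 20% of train.csv as a validation set (ratio assumed)
    split = int(0.8 * len(X))
    X_train, y_train = X[:split], y[:split]
    X_val, y_val = X[split:], y[split:]
    # ... train on (X_train, y_train), report accuracy on (X_val, y_val)
else:
    # train on all of train.csv and predict labels for test.csv
    X_test = pd.read_csv('test.csv').values.astype(np.float32) / 255.0
    # ... train on (X, y), write predictions for X_test
```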
Different activation functions
I also tried different nonlinearities in the hidden layer. The table below shows the prediction accuracy on the validation set:
| Nonlinearity | Validation accuracy |
| --- | --- |
| `lasagne.nonlinearities.sigmoid` | 0.925952380952 |
| `lasagne.nonlinearities.tanh` | 0.91556547619 |
| `lasagne.nonlinearities.rectify` | 0.915327380952 |
| `lasagne.nonlinearities.linear` | 0.868571428571 |
The sigmoid nonlinearity gave the highest validation accuracy, so it was used for the final predictions.
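The comparison itself can be scripted by rebuilding the network with each nonlinearity and scoring it on the held-out set. A sketch, reusing `input_var`, `target_var`, `X_val`, and `y_val` from the snippets above (the `validation_accuracy` helper is hypothetical, and training is elided):

```python
import theano
import theano.tensor as T
import lasagne

def validation_accuracy(nonlinearity):
    # Rebuild the 784 -> 397 -> 10 network with the given hidden nonlinearity.
    net = lasagne.layers.InputLayer((None, 784), input_var=input_var)
    net = lasagne.layers.DenseLayer(net, 397, nonlinearity=nonlinearity)
    net = lasagne.layers.DenseLayer(
        net, 10, nonlinearity=lasagne.nonlinearities.softmax)
    # ... train as in the earlier sketch, then score on the validation set:
    # accuracy = fraction of argmax predictions that match the true labels
    pred = lasagne.layers.get_output(net, deterministic=True)
    acc = T.mean(T.eq(T.argmax(pred, axis=1), target_var))
    acc_fn = theano.function([input_var, target_var], acc)
    return acc_fn(X_val, y_val)

for nonlin in (lasagne.nonlinearities.sigmoid, lasagne.nonlinearities.tanh,
               lasagne.nonlinearities.rectify, lasagne.nonlinearities.linear):
    print(nonlin, validation_accuracy(nonlin))
```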
Evaluation on Kaggle
Submitting the predictions to Kaggle gave a score of 0.94629.
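For reference, Kaggle expects the predictions as a CSV with ImageId and Label columns. A minimal sketch of writing that file, assuming `predicted_labels` holds one predicted digit per row of test.csv (the placeholder array below stands in for the trained network's output):

```python
import numpy as np
import pandas as pd

# placeholder: in the real script these come from the trained network
predicted_labels = np.zeros(28000, dtype=int)

submission = pd.DataFrame({
    'ImageId': np.arange(1, len(predicted_labels) + 1),  # 1-based row index
    'Label': predicted_labels,
})
submission.to_csv('submission.csv', index=False)
```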