Day 27: Simple neural network in Lasagne

Posted by csiu on March 23, 2017 | with: 100daysofcode, Machine Learning, Kaggle

Today I made my first neural network using the Lasagne and Theano frameworks. The data I am working with comes from the Kaggle Digit Recognizer competition, where the goal is handwriting recognition on the famous MNIST dataset.

Architecture

  • 1 input layer: 784 nodes (one node for each input feature)
  • 1 hidden layer: 397 nodes (mean of 784 and 10)
  • 1 output layer: 10 nodes (one for each output class)
  • “activation” (i.e. nonlinearity) using sigmoid in the hidden layer and softmax in the output layer
  • weight optimization using Adam (the same solver as in scikit-learn)
  • loss for optimization: categorical cross-entropy
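
A minimal sketch of this architecture in Lasagne (assuming Lasagne and Theano are installed; the variable names are illustrative, not the script's actual code):

```python
import theano
import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')     # flattened 28x28 pixel images
target_var = T.ivector('targets')  # digit labels 0-9

# 784 -> 397 -> 10: sigmoid hidden layer, softmax output layer
network = lasagne.layers.InputLayer(shape=(None, 784), input_var=input_var)
network = lasagne.layers.DenseLayer(
    network, num_units=397, nonlinearity=lasagne.nonlinearities.sigmoid)
network = lasagne.layers.DenseLayer(
    network, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

# categorical cross-entropy loss, minimized with Adam
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params)

train_fn = theano.function([input_var, target_var], loss, updates=updates)
```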

2 modes of the script

(1) python $script -m inspect splits train.csv into training and validation sets to get an idea of model accuracy, and

(2) python $script -m predict trains on all the data in train.csv and makes predictions on test.csv.
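
A hypothetical sketch of how the -m flag could be wired up with argparse (the post does not show the script's actual argument handling):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-m', '--mode', choices=['inspect', 'predict'],
                    default='inspect',
                    help='inspect: hold out a validation split from train.csv; '
                         'predict: train on all of train.csv and label test.csv')
args = parser.parse_args()

if args.mode == 'inspect':
    pass  # split train.csv, train, and report validation accuracy
else:
    pass  # train on everything and write predictions for test.csv
```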

Different activation functions

I also tried different nonlinearities in the hidden layer. The following is the prediction accuracy on the validation set:

Nonlinearity                      Validation accuracy
lasagne.nonlinearities.sigmoid    0.925952380952
lasagne.nonlinearities.tanh       0.91556547619
lasagne.nonlinearities.rectify    0.915327380952
lasagne.nonlinearities.linear     0.868571428571

The sigmoid nonlinearity gave the highest validation accuracy, so it was used for the final predictions.
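
Swapping the hidden-layer nonlinearity is a one-argument change. A sketch of the comparison loop, where build_and_evaluate is a hypothetical helper wrapping the layer definitions and training code shown earlier:

```python
import lasagne

for nonlinearity in (lasagne.nonlinearities.sigmoid,
                     lasagne.nonlinearities.tanh,
                     lasagne.nonlinearities.rectify,
                     lasagne.nonlinearities.linear):
    # build_and_evaluate is hypothetical: rebuild the network with this
    # hidden-layer nonlinearity, train it, and return validation accuracy
    print(nonlinearity, build_and_evaluate(hidden_nonlinearity=nonlinearity))
```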

Evaluation on Kaggle

Submitting the predictions to Kaggle, we get a score of 0.94629.