Day62: Data exploration pt1

Posted by csiu on April 27, 2017 | with: 100daysofcode, Kaggle

A new competition was announced on Twitter today, and I wanted to explore the data using a Kaggle kernel.


Using Kaggle kernels

Kaggle kernels, located in the “Kernels” section of a Kaggle competition, are used to experiment with a competition’s data and to share techniques. Reading through a couple of kernels written by other Kaggle users, I found them very conducive to learning new ways to handle, explore, and work with data.

I wanted to share a Kaggle kernel too.


My experience with Kaggle kernels

Today I thought I would explore the Sberbank dataset using a Kaggle kernel. I thought it would be simple, quick, and easy to do.

I was wrong.

The Kaggle kernel had its own keyboard shortcuts (different from the Jupyter Notebook ones I’m used to).

It was not simple.

The Kaggle kernel kept on disconnecting, reconnecting, and/or restarting (which slowed everything down).

It was not quick.

The Kaggle kernel would sometimes disconnect and wipe all the progress I had made, reverting to an earlier “saved point” (which I did not specify, so significant parts of my code went missing).

It was not easy.

The first two issues, I’m okay with. But after having to restart from scratch multiple times (and feeling worse and worse after each restart), I drew the line. This is the kernel I left off at. I am not particularly satisfied with this work (but I need sleep because I’ll be going on a road trip with a couple of friends), so I will do a proper analysis (away from Kaggle kernels) on my local machine and share that work.