Machine Learning hackathon failures and successes
Part 1: The Failure
I have been messing with natural language processing on and off since I was a wee lad, first in Lisp and Prolog (I was very impressed with Eliza). It was something I always loved to tinker with. A few years ago, machine learning and deep learning, concepts that had been academic since the 1970s, became actually useful in the real world. This was made possible by the power-to-cost ratio of modern GPUs and by several brilliant academic articles published since 2006, and it became “the thing”. Everybody is talking about it, everybody is looking to hire “data scientists”, and the technologies built on it are the most talked-about technologies out there: Siri, Alexa, Cortana, Google Assistant, various self-driving car projects, machine players beating experts at various games, rumors of data scientists tilting voters in favor of Brexit and possibly in the 2016 Trump/Clinton election.
It was about time I gave it another go, and then Tikal Knowledge announced a machine learning hackathon and asked employees to submit ideas. The technology was mature enough that an experienced developer could handle it, even without a PhD in mathematics, and natural language processing, my childhood love, has been one of the most lucrative subfields of deep learning. I submitted a modest idea: implement sentiment analysis, doing the learning on a computer and the evaluation on a mobile device. Google’s TensorFlow library enabled this, as it is supposed to run on iOS, Android and Raspberry Pi devices. Wow, neural networks on a mobile processor… on an IoT device… That’s the stuff science fiction is made of, and I wanted it, bad.
So with dreams of doing my bit in bringing about the global apocalypse and the robot-human war, I submitted the idea and was notified that it was among the eight ideas selected for the hackathon. But as I started learning a bit more about deep learning, I realized I had possibly bitten off more than I could chew. Nearly every lesson on YouTube quickly descended into math, a field I am very rusty in. I studied applied math, but that was 20 years ago, and the most mathematical thing I have used in my professional life is a bit of matrix multiplication when dealing with graphics and OpenGL. I understood the theory, but how do I put it into practice?
Compile the TensorFlow Android demo app, which is a picture classifier
Find some nice Python code that trains a sentiment analysis model with TensorFlow (there are quite a few), and train it with IMDB data from one of the many sources online
Modify the Android app to work with that model, porting the Python evaluation code to Java
As it turned out, putting it into practice was pretty hard.
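The plan splits the work in two: training happens on a computer, and only the trained model travels to the device, where it is evaluated. To make that split concrete, here is a toy sketch in plain Python. It is a stand-in, not TensorFlow: a bag-of-words logistic regression trained on four made-up reviews, with the model exported as JSON. All names and the training data are assumptions for illustration.

```python
import json
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=50, learning_rate=0.5):
    """'Computer' side: fit one weight per word with plain SGD.

    examples is a list of (text, label) pairs, label 1 = positive, 0 = negative.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            prediction = sigmoid(bias + sum(weights[t] for t in tokens))
            error = label - prediction
            bias += learning_rate * error
            for t in tokens:
                weights[t] += learning_rate * error
    return {"bias": bias, "weights": dict(weights)}


def export_model(model):
    """Serialize the trained model; this string is what would ship to the device."""
    return json.dumps(model)


def predict(model_json, text):
    """'Device' side: needs only the exported model, no training code or data."""
    model = json.loads(model_json)
    score = model["bias"] + sum(model["weights"].get(t, 0.0) for t in tokenize(text))
    return sigmoid(score)  # probability that the text is positive


# Made-up training data, standing in for the IMDB reviews.
TRAIN = [
    ("great movie loved it", 1),
    ("wonderful acting great story", 1),
    ("terrible movie hated it", 0),
    ("boring story awful acting", 0),
]

model_json = export_model(train(TRAIN))
print(predict(model_json, "wonderful great story"))
print(predict(model_json, "terrible boring acting"))
```

The point of the shape is that predict is small enough to port to Java by hand, which is the kind of porting the third step of the plan calls for, only with a real network instead of a word-weight table.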
Mission 1: compile the TensorFlow Android demo
First, install the requirements. The TensorFlow Android library is not distributed pre-built. OK, so let’s follow the instructions found in the GitHub repository. Easy enough…
Not so fast: in order to work with the Android port, you have to build TensorFlow from source, following this guide. First, do a git clone:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow.git
It will not build unless you have the Bazel build system installed. OK, so let’s see what that requires.
Up-to-date Xcode command line tools. Oops…
It requires macOS 10.12 Sierra. I have not been quick to update things ever since a Windows Vista upgrade broke the Symbian build system for a while. In my Symbian days there was a two-year period when new computers shipped with Vista, while the Symbian developer tools (Carbide.c++ and the Symbian SDK) worked only with XP, and laptop manufacturers did not supply XP drivers for new computers. It took Nokia two years before Carbide supported Vista out of the box. It was a total pain. So after that experience, I do not rush to upgrade the OS to the latest and greatest. I was still running El Capitan, but Bazel forced me to upgrade to Sierra, because it requires the latest Xcode tools, which require Sierra.
That took an entire evening. I still had a day job, so preparations for the hackathon could take place only in my free time, after work. Upgrading the JDK and the Xcode tools took another evening. OK, so now, in theory, I could compile TensorFlow on my MacBook, but the hackathon was fast approaching. The evening before the hackathon I started the build and sat down to watch an episode of Stranger Things. It has to be done in 40 minutes, we are not in the Middle Ages any more, right?
bazel build -c opt //tensorflow/examples/android:tensorflowdemo
Wrong. An episode of Stranger Things later, the build was still running. As team lead I had to get up early and be there ahead of time to organize things: I had to make sure that the entire team had places to sit, desk space, and access to power outlets and the network. So I let the build run overnight and went to bed hoping for the best. It seems I was too optimistic: it was still compiling in the morning, reporting over 1000 seconds of compile time for each file. WTF? I have a MacBook Pro, not an Altair 8800… What is going on? There was no choice, I had to go. I packed my laptop and let it compile during the commute.
When I arrived at the office, it was still compiling. Not a good sign. After a bit of investigation it turned out that Bazel is set up by default as if it were compiling on some huge build farm with a million cores: it ran over 220 threads, and they seemed to be deadlocked. It wasn’t compiling at all, but was stuck in an “after you, no, please, after you” deadlock.
After some more digging, I found that this is a common problem with Bazel, and that you can limit the number of threads it uses by adding the --jobs=x flag…
bazel build -c opt --jobs=10 //tensorflow/examples/android:tensorflowdemo
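Instead of hard-coding the number, a small wrapper can derive the limit from the machine it runs on. A minimal Python sketch: the helper name bazel_build_command and the one-job-per-logical-core default are assumptions for illustration, not something Bazel prescribes. Only the -c opt and --jobs flags come from the commands in this post.

```python
import os
import shlex


def bazel_build_command(target, jobs=None):
    """Return a `bazel build` argument list with an explicit --jobs limit."""
    if jobs is None:
        # Assumption: one job per logical core is a reasonable starting point.
        jobs = os.cpu_count() or 1
    return ["bazel", "build", "-c", "opt", "--jobs={}".format(jobs), target]


cmd = bazel_build_command("//tensorflow/examples/android:tensorflowdemo", jobs=10)
# Print the command instead of running it; pass cmd to subprocess.run to execute it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the arguments in a list means that -c and opt stay next to each other, which is easy to get wrong when editing the command line by hand.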
OK, with a limit of 10 threads it progressed much faster, 10 times faster, but it still took several hours. A snail’s pace, but at least it was a pace, not a standstill. While it was building, at least we had time to investigate the training options. One of the team members suggested a fallback position: use the Google Natural Language Processing API and implement a Node.js chat app that connects to it, so that at least we would have something to show and it wouldn’t be a total embarrassment. I told him to go for it, and continued to investigate various projects for sentiment analysis with TensorFlow.
We spent several hours on it, and lunchtime came. As Napoleon said, “an army marches on its stomach”, so we broke for lunch while the cursed Bazel build was still going…
After lunch, the build finally finished. Yippee! Let’s see it run…
adb install -r bazel-bin/tensorflow/examples/android/tensorflowdemo.apk
And ka-boom! A runtime exception: the app failed to load a native library, which was missing. We had about three hours left until the end of the hackathon, and we had wasted more than half of the day on a dead end. There was no way I would even get this to run in time, never mind modify it. It was time to cut our losses and pivot. Building the TensorFlow library from source was not a viable option given the time constraints.
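One quick check for this kind of failure is to look inside the APK, which is an ordinary zip archive, and see which native libraries it packaged for which CPU architecture. A minimal sketch using only the Python standard library: the function name is hypothetical, and it assumes the usual lib/<abi>/*.so layout inside an APK.

```python
import zipfile
from collections import defaultdict


def native_libs_by_abi(apk_path):
    """Map each ABI directory inside the APK (lib/<abi>/) to its .so file names."""
    libs = defaultdict(list)
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            parts = name.split("/")
            # Native libraries live exactly one level below lib/<abi>/.
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                libs[parts[1]].append(parts[2])
    return dict(libs)
```

Calling native_libs_by_abi on the built APK and comparing the keys with the output of adb shell getprop ro.product.cpu.abi would show whether a library for the device’s architecture was packaged at all. An empty result, or one without the device’s ABI, would point at the build rather than at the app code.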
Upcoming Part 2: Pivot and success