World Cup Out'n'About

When you hear 'Big Data' and 'Analytics' in the same sentence, it's usually about Storm and Hadoop. This time our team decided to try something different.

The task

We were asked to become data scientists for a day and analyze a large text file full of tweets. We were also asked to develop a web application to present the results. And, as usual, we had only 8 hours to do it.

The goal

Associating tweets with a country and showing them on a map seemed valuable to us. Check! Tracking tweet sources and calculating the TOP-5 most popular ways to post a tweet came up as the second idea. Then we got a bit carried away and decided to integrate a mobile application that pushes tweets to our system.

The team

2 server-side developers, 2 JS developers, 4 Android developers and no DevOps engineers.

Technology Stack

Although Storm and Cassandra seemed like a perfect match for the task, we decided to see how easily Spark and Mongo could tackle it. We chose Spring Boot as the fastest way to create and deploy an application server. We also agreed to use RESTful services so that our JS-based and Android clients would have a simple way to communicate with the server.


We decided to decouple the data analysis process from the rest of the system components and use Mongo as middleware for data exchange.
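To make the decoupling concrete, here is a minimal Java sketch under stated assumptions: an in-memory list stands in for the Mongo collection, and the names ResultStore, AnalysisJob and TweetQueryService are hypothetical. The point it illustrates is that the analysis side and the web side each depend only on the shared store, never on each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal sketch: an in-memory stand-in for the Mongo collection that sits
// between the analysis job and the web tier. All names here are hypothetical.
class DecoupledPipeline {

    // The only contract both sides share: a document store
    interface ResultStore {
        void insert(Map<String, Object> document);
        List<Map<String, Object>> findAll();
    }

    // Stand-in for a Mongo collection; thread-safe so writer and reader
    // can run independently of each other
    static class InMemoryResultStore implements ResultStore {
        private final List<Map<String, Object>> documents = new CopyOnWriteArrayList<>();
        public void insert(Map<String, Object> document) { documents.add(document); }
        public List<Map<String, Object>> findAll() { return new ArrayList<>(documents); }
    }

    // Writer side: the analysis job only knows how to write results
    static class AnalysisJob {
        private final ResultStore store;
        AnalysisJob(ResultStore store) { this.store = store; }
        void publish(String country, String text, String source) {
            store.insert(Map.<String, Object>of("country", country, "twit", text, "source", source));
        }
    }

    // Reader side: the REST layer only knows how to read results
    static class TweetQueryService {
        private final ResultStore store;
        TweetQueryService(ResultStore store) { this.store = store; }
        long countFor(String country) {
            return store.findAll().stream()
                    .filter(doc -> country.equals(doc.get("country")))
                    .count();
        }
    }

    public static void main(String[] args) {
        ResultStore store = new InMemoryResultStore();
        AnalysisJob job = new AnalysisJob(store);
        job.publish("brazil", "Go Brazil!", "Twitter for Android");
        job.publish("germany", "Tor!", "web");
        System.out.println(new TweetQueryService(store).countFor("brazil")); // prints 1
    }
}
```

Replacing InMemoryResultStore with a Mongo-backed implementation would not touch either side, which is the benefit of keeping the store as the only shared contract.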

Data Driver

We used Scala to develop the driver. The driver reads the tweets, tags each one with the countries it mentions, extracts its source, calculates the TOP-5 and sends the results to Mongo.
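The tagging rule is easiest to see outside Spark. Below is a plain-Java sketch, assuming tagging is a case-insensitive substring match against a list of lowercase country tags; the tag list is hypothetical. Each tweet produces one (country, tweet) pair per tag it mentions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: pair each tweet with every country tag it mentions
// (case-insensitive substring match). COUNTRY_TAGS is a hypothetical list;
// the driver's real tag set is not shown in the listing.
class CountryTagger {

    static final List<String> COUNTRY_TAGS =
            List.of("brazil", "germany", "argentina", "netherlands");

    // Same shape as the driver's flatMap: zero, one or several pairs per tweet
    static List<Map.Entry<String, String>> tag(List<String> tweets, List<String> tags) {
        return tweets.stream()
                .flatMap(tweet -> tags.stream()
                        .filter(tag -> tweet.toLowerCase(Locale.ROOT).contains(tag))
                        .map(tag -> Map.entry(tag, tweet)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tweets = List.of(
                "Brazil vs Germany tonight!",
                "Argentina scores",
                "No football today");
        for (Map.Entry<String, String> pair : tag(tweets, COUNTRY_TAGS)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

For the sample input, the first tweet yields two pairs (brazil and germany), the second yields one and the third none, so a tweet that mentions two teams is counted for both countries.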

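import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap
import org.bson.BasicBSONObject

val tweets = sc.textFile(path).flatMap(tweet => {
  //Create a (country tag, tweet) tuple for every wanted tag the tweet mentions;
  //TAGS is a hypothetical name, the tag collection is missing from the listing
  WorldCupDictionary.TAGS.filter(
    tag => tweet.toLowerCase.contains(tag)
  ).map(tag => (tag, tweet))
}).map(tuple => {
  //Convert the JSON string to a tweet object (field name -> value)
  (tuple._1, WorldCupDictionary.MAPPER.readValue(tuple._2,
    classOf[Object2ObjectOpenHashMap[String, Object]]))
}).map(tuple => {
  //Convert the tweet object to a World Cup object (a BSON document for Mongo)
  val map = new BasicBSONObject()
  try {
    map.put("country", tuple._1)
    map.put("twit", tuple._2.get("text").toString)
    //Strip the HTML anchor so only the client name is left
    map.put("source", tuple._2.get("source").toString.replaceAll(
      "(<a href=\"(.*)\">)|(</a>)", ""))
    map.put("userName", tuple._2.get("user"))
    //The argument to format(...) is cut off in the original listing; parsing
    //"created_at" with a hypothetical TWITTER_DATE_FORMAT is an assumption
    map.put("dateInt", WorldCupDictionary.CUP_DATE_FORMAT.format(
      TWITTER_DATE_FORMAT.parse(tuple._2.get("created_at").toString)))
  } catch { case ex: Exception => println(ex) }
  map
})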
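The driver listing stops before the TOP-5 step. One way to compute it is sketched below in plain Java rather than Spark, assuming the ranking is by raw tweet count per cleaned source: strip the anchor with the same regex the driver uses, count tweets per client, sort in descending order and keep five. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of the TOP-5 sources step (plain Java, not Spark): clean each raw
// "source" value, count tweets per client and keep the five most frequent.
class TopSources {

    // Same pattern the driver passes to replaceAll to strip the HTML anchor
    private static final String ANCHOR = "(<a href=\"(.*)\">)|(</a>)";

    static String clean(String rawSource) {
        return rawSource.replaceAll(ANCHOR, "");
    }

    static List<Map.Entry<String, Long>> top5(List<String> rawSources) {
        // client name -> number of tweets posted with it
        Map<String, Long> counts = rawSources.stream()
                .map(TopSources::clean)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                // most frequent first; ties broken alphabetically so the result is deterministic
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sources = List.of(
                "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
                "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
                "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
                "web");
        for (Map.Entry<String, Long> entry : top5(sources)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

For the sample in main this prints Twitter for Android: 2, then Twitter for iPhone: 1 and web: 1. In Spark the same idea would map each tweet to a (source, 1) pair, sum with reduceByKey and pick the winners with takeOrdered or top.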


REST Services

While Spring Boot creates the project skeleton and provides a predefined REST API, Spring Data generates all the necessary queries for us:

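import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.stereotype.Service;
// Spring Data generates the implementation of this interface at runtime
public interface TweetRepository extends MongoRepository<Tweet, String> {
}
@Service
public class TweetService {
  @Autowired
  private TweetRepository repository;
  public void save(Tweet twit) { repository.save(twit); }
  public List<Tweet> findAll() { return repository.findAll(); }
}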