World Cup Out'n'About

When you hear 'Big Data' and 'Analytics' in the same sentence, it's usually about Storm and Hadoop. This time our team decided to try something different.

The task

We were asked to become data scientists for a day and analyze a large text file filled with tweets. We were also asked to develop a web application for presenting the results. And, as usual, we only had 8 hours to accomplish it.

The goal

Associating tweets with a country and then showing them on a map seemed valuable to us. Check! Tracking tweet sources and calculating the TOP-5 most popular ways to post a tweet came up as the second idea. And then we got carried away a bit and decided to integrate a mobile application that pushes tweets to our system.

The team

2 server-side developers, 2 JS developers, 4 Android developers and no DevOps developers.

Technology Stack

Although Storm and Cassandra seemed like a perfect match for the task, we decided to see how easily Spark and Mongo could tackle it. We chose Spring Boot as the fastest way to create and deploy an application server. We also agreed to use RESTful services so that our JS-based and Android clients would have a simple way to communicate with the server.

Architecture

We decided to decouple the data analysis process from the rest of the system components and use Mongo as middleware for data exchange.
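To make the decoupling concrete, here is a minimal plain-Java sketch of the idea. Everything in it is hypothetical and not taken from the project: the `ResultStore` interface, the in-memory stand-in for Mongo, and the sample values exist only so the example runs without a database. The point is that the analysis side only writes and the serving side only reads, so neither needs to know about the other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecouplingSketch {

    // Hypothetical abstraction over the shared store; in the project this role is played by Mongo.
    public interface ResultStore {
        void save(Map<String, Object> document);
        List<Map<String, Object>> findAll();
    }

    // In-memory stand-in so the sketch runs without a database.
    public static class InMemoryStore implements ResultStore {
        private final List<Map<String, Object>> documents = new ArrayList<>();

        @Override
        public void save(Map<String, Object> document) {
            documents.add(document);
        }

        @Override
        public List<Map<String, Object>> findAll() {
            return new ArrayList<>(documents);
        }
    }

    // Analysis side: only writes results (sample values, not real data).
    public static void runAnalysis(ResultStore store) {
        Map<String, Object> result = new HashMap<>();
        result.put("country", "brazil");
        result.put("source", "Twitter for Android");
        store.save(result);
    }

    // Serving side: only reads what the analysis side left behind.
    public static int countResults(ResultStore store) {
        return store.findAll().size();
    }
}
```

With this split, the driver and the REST server can be developed, deployed and restarted independently, as long as both agree on the shape of the stored documents.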

Data Driver

We used Scala to develop the driver. The driver takes the tweets, tags them by country, marks their sources, calculates the TOP-5 and sends the results to Mongo.

import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap
import org.bson.BasicBSONObject

// sc is the SparkContext; every line of the input file is one tweet in JSON
val tweets = sc.textFile(path).flatMap(tweet => {
  // Create a (tag, tweet) tuple for every dictionary tag found in the tweet
  val lowerCased = tweet.toLowerCase
  WorldCupDictionary.DICTIONARY.filter(
    tag => lowerCased.contains(tag)
  ).map(tag => (tag, tweet))
}).map(tuple => {
  // Convert the JSON string to a tweet object
  (tuple._1, WorldCupDictionary.MAPPER.readValue(
    tuple._2,
    classOf[Object2ObjectOpenHashMap[String, Object]]))
}).map(tuple => {
  // Convert the tweet object to a World Cup object
  val map = new BasicBSONObject()
  map.put("country", tuple._1)
  map.put("twit", tuple._2.get("text").toString)
  // Strip the <a href="..."> wrapper around the client name
  map.put("source", tuple._2.get("source").toString.replaceAll(
    "(<a href=\"(.*)\">)|(</a>)", ""))
  map.put("userName", tuple._2.get("user")
    .asInstanceOf[java.util.Map[String, String]].get("screen_name"))
  try {
    // Store the tweet date as a number in the World Cup date format
    map.put("dateInt", WorldCupDictionary.CUP_DATE_FORMAT.format(
      WorldCupDictionary.TWITTER_DATE_FORMAT.parse(
        tuple._2.get("created_at").toString)
    ).toLong)
  } catch { case ex: Exception => println(ex) }
  map
}).cache()
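
The TOP-5 step is not part of the snippet above. As a rough illustration of the idea only, the following plain-Java sketch strips the anchor markup from the `source` field and returns the most frequent clients. It deliberately uses ordinary collections instead of Spark, the class `TopSources` and its methods are made up for this example, and the pattern is a slightly more general one than the driver's. In the real driver the same counting would presumably be done as a reduce over the cached RDD.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopSources {

    // Strip the <a ...> wrapper around the client name, whatever attributes it carries.
    public static String cleanSource(String rawSource) {
        return rawSource.replaceAll("<a [^>]*>|</a>", "");
    }

    // Return up to `limit` client names, most frequent first; ties are broken alphabetically.
    public static List<String> top(List<String> rawSources, int limit) {
        Map<String, Long> counts = rawSources.stream()
                .map(TopSources::cleanSource)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Calling `TopSources.top(sources, 5)` on the list of raw `source` values would give the TOP-5. The alphabetical tie-break is an arbitrary choice made here to keep the result deterministic.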

REST Services

While Spring Boot creates the skeleton of the project and provides a predefined full REST API, Spring Data generates all the necessary queries for us:

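import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;

// Spring Data generates the implementation of this interface at runtime
@Repository
public interface TweetRepository extends MongoRepository<Tweet, String> {
}

// Thin service layer on top of the generated repository
@Service
public class TweetService {
  @Autowired
  private TweetRepository repository;

  public void save(Tweet twit) {
    repository.save(twit);
  }

  public List<Tweet> findAll() {
    return repository.findAll();
  }
}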