Fuse Day 9/11 - An application on node.js & elastic search

On this fuse day, several of us learned how to program in node.js and how to use elastic search to store data.

The goal was to create an application that retrieves several RSS feeds, parses the items to extract entities (subjects), and indexes both the items and the entities so they can later be queried.

Since node.js is single-threaded, we split the work into two applications:

  • A poller that accepts a configuration of feed urls and extractors, periodically processes them to get items and entities, and pushes the results to elastic search (a possible configuration is sketched after this list).
  • A repository that exposes a REST API to retrieve the feed items mentioning an entity.
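
To make that concrete, here is a minimal sketch of such a configuration. The file name config.js matches the require in the poller code below, but the hashtag extractor and the feed url are made up for illustration:

// config.js - each feed pairs a url with a list of extractor functions;
// an extractor takes an item's text and returns an entity (or null)
function hashtags(text) {
	var match = /#(\w+)/.exec(text)
	return match ? {type: 'hashtag', value: match[1]} : null
}

exports.config = {
	feeds: [
		{url: 'http://example.com/rss', extractors: [hashtags]}
	]
}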

Some insights we gained:

  • We could not get node.js properly installed on Windows. While there is a pre-made executable for node.js itself, some modules are still built for Linux only, and others support only prior versions of node.js for which there is no binary. Trying to compile node.js (based on Cygwin) did not go well, producing link errors. Our only solution was to work with a virtual machine, based on instructions here. There were several things we had to work around: the OVF file seemed corrupt, so we used the VMDK image instead, and we had to define bridged networking since NAT did not work.
  • Installation of node.js itself and of its modules is done by downloading source and compiling it. We've become accustomed to pre-made binaries, even on Linux, so this felt like a step backwards.
  • Node.js is at version 0.5.6, but several prominent modules (e.g., express; we've also had issues with npm) still support only 0.4.11. This is surprising, since the project is so fresh that we expected all early adopters to keep up to date.
  • There isn't a lot of documentation, yet node.js is very easy to learn and work with. For elastic search there are several node.js clients; we found elastical to work well for us.
  • The module system is very nice. It is based on CommonJS, which means that even modules not intended for node.js can be used; for example, we've used underscore.js (see the small example after this list).
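
As a tiny illustration of the CommonJS convention (the file names greeter.js and main.js are hypothetical): a module assigns to exports, and a consumer pulls it in with require. This is also why a browser-oriented library like underscore.js, which detects exports, loads in node.js unchanged:

// greeter.js - assigning to exports makes this file a module
exports.greet = function(name) {
	return 'hello ' + name
}

// main.js - consume it with require, just as we did with underscore.js
var greeter = require('./greeter'),
    _ = require('./lib/underscore')

_.each(['node.js', 'elastic search'], function(subject) {
	console.log(greeter.greet(subject))
})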

Overall, it was a nice experience. After the initial integration problems, we were quite productive and managed to get something working within a very short time.

Here are some code snippets, to whet your appetite:

First, the code that polls the feeds and parses them:

var rss = require('easyrss'),
    config = require('./config').config,
    _ = require('./lib/underscore')

// Poll every configured feed and invoke callback once per item,
// after attaching the entities found by the feed's extractors
exports.poll = function(callback) {
	_.each(config.feeds, function(feed) {
		rss.parseURL(feed.url, function(posts) {
			_.each(posts, function(post) {
				// Run each extractor over the item text, keeping non-null results
				post.entities = _.reduce(feed.extractors, function(memo, extractor) {
					var extracted = extractor(post.description)
					if (extracted) memo.push(extracted)
					return memo
				}, [])
				callback(post)
			})
		})
	})
}
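
This is roughly how the poller's output was pushed into elastic search. A sketch assuming elastical's default client, with 'feeds'/'item' as illustrative index and type names and './poller' as the file holding the code above:

var elastical = require('elastical'),
    poller = require('./poller')

var elClient = new elastical.Client()  // defaults to localhost:9200

// Index every polled item; elastic search assigns a document id for us
poller.poll(function(post) {
	elClient.index('feeds', 'item', post, function(err, res) {
		if (err) console.log('failed to index item: ' + err)
	})
})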

Next, the code that creates an http server and exposes a REST API that queries an elastic search server:

var http = require('http'),
    elastical = require('elastical'),
    inspect = require('util').inspect;

var elClient = new elastical.Client();

http.createServer(function (req, response) {
	console.log('got request ' + req.url);

	var query = require('url').parse(req.url, true).query;

	// Match items whose extracted entities have the requested type,
	// e.g. /?type=hashtag (assumes extractors emit {type, value} objects)
	elClient.search({query: {
			query_string: {
				default_field: 'entities.type',
				query: query.type
			}
			//term: {title: "Welcome to my stupid blog"}
		}}, function (err, results, res) {

		response.writeHead(200, {'Content-Type': 'application/json'});
		response.write(JSON.stringify(results));
		response.end();

		console.log(inspect(err));
	});
}).listen(1337);
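
Querying it is then a plain HTTP GET. A quick smoke test using node's own http client (the 'hashtag' type matches the hypothetical extractor from the configuration sketch above):

var http = require('http');

http.get({port: 1337, path: '/?type=hashtag'}, function(res) {
	res.on('data', function(chunk) {
		console.log('items mentioning a hashtag: ' + chunk);
	});
});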