In this post I will describe an interesting architecture from one of the projects I once wrote, which has a lot in
common with Erlang's OTP Design Principles.

Introduction
 
A common approach to exposing a service to the web is to run it under an application/web server such as Tomcat or JBoss. If there are many different services that share common behavior (such as rate limiting, statistics and more), you end up either with a monster server that gathers lots of services, or with multiple application servers running on different ports that share some common jars but also duplicate code. OTP Design Principles can be viewed as an umbrella of services and features: a resource management server that can update and upgrade services, verify that the system is healthy and working well, provide statistics about any service/worker, etc.
The update/add/delete approach - what would you do if you wanted to update the data in these services at runtime on production systems? Would you cache the old data while a script creates the new version and then swap them? Or would you look up specific records inside the data and change them? Every service behaves differently and has its own data, so you would need to implement an update mechanism for every service. OTP design overcomes these problems with a replacement approach: simply load a new replacer JVM process that holds the new data version, or even a new code version as well.
Keep alive/healthy and working properly – what if a service is damaged and runs into a loop that exhausts the machine, or worse, contains bad code that crashes the whole JVM along with all the other services in it? With OTP design you can monitor all of the processes and verify not only that each service is available but also that it works well. In addition, every service runs in its own JVM, so if one crashes it won't bring down the whole system, and a new replacer will be launched.
Language selection flexibility - what if you want to write a service in Ruby? The OTP design architecture enables this: a service/worker can be written in any language.
Development effort/time – OTP helps you focus only on the business logic. When writing a new worker you don't have to be concerned with exposing the web service, etc.

 

High level design

 
This design is based on separation of concerns:
  1. Workers – every worker acts as a service; for example, a Lucene-based service that returns results based on its index. The worker receives its jobs, processes them and returns the results to the Client-Server Supervisor.
  2. Supervisors can be divided into two kinds: 
    1. Client-Server Supervisor – handles the client requests/responses/continuations, delivers a unique job to every worker and provides a variety of services such as statistics, rate limiting, async and sync request handling, priority jobs, health check/status, etc.
    2. Workers Supervisor executer – verifies that every worker works well and is responsible for replacing workers as needed. It can also have another role: downloading new versions (data or code) and replacing workers at runtime.

 

Diagram #1 demonstrates the communication between these modules, not the flow sequence, which is demonstrated in the PowerPoint below:


 

Diagram 1 – High level architecture

 

 

Job request handling - Client-Server Supervisor & its Workers
 
Workers
Every worker has a generic behavior (GenericWorker): signing up to its supervisors, periodically creating HTTP requests, receiving jobs to execute, notifying about failures, returning job results, working in multi-threaded mode, stopping the processing of an existing request, etc.
When writing a new service, the developer mainly focuses on the business logic and usually doesn't have to worry about the generic behavior.
The worker's HTTP request contains the results of the previous job in its body. The next job to process is delivered to the worker in the HTTP response.
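The loop described above can be sketched as follows. This is an illustrative sketch, not the project's actual API: `GenericWorkerSketch` and `JobTransport` are made-up names, and the HTTP call is abstracted behind an interface so the example runs without a real supervisor.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.UnaryOperator;

public class GenericWorkerSketch {
    /** Abstraction over the worker's HTTP call: the request body carries the
     *  previous job's results, the response body carries the next job. */
    interface JobTransport {
        String exchange(String previousResults); // returns next job, or null when none
    }

    /** The generic fetch-execute loop; businessLogic is the only part a
     *  service developer would actually write. */
    static int runLoop(JobTransport transport, UnaryOperator<String> businessLogic) {
        int executed = 0;
        String results = ""; // the first request carries no results, it just asks for a job
        String job;
        while ((job = transport.exchange(results)) != null) {
            results = businessLogic.apply(job);
            executed++;
        }
        // the exchange that returned null also delivered the final job's results
        return executed;
    }

    public static void main(String[] args) {
        Queue<String> jobs = new ArrayDeque<>(java.util.List.of("job-1", "job-2"));
        System.out.println(runLoop(prev -> jobs.poll(), String::toUpperCase)); // 2
    }
}
```

Note how delivering results and fetching the next job are a single round trip: one HTTP exchange per job, which is what keeps the worker thin.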
 
The Client-Server Supervisor has three different handlers: the first handles the workers, the second handles the client requests, and the third deals with status/statistics of all the queues/workers/services, including detailed information about all the workers.
The main functionality of the Client-Server is to map every kind of client service request to the appropriate service/worker. The server keeps no persistent state; it holds all queues in memory. There are two queues per service: the clientQ, which contains all waiting client requests, and the workerQ, which contains all the workers waiting to receive jobs for this service. All communication between client and server, and between workers and server, uses standard HTTP.
Flow description - when a client request arrives at the Client-Server Supervisor, a continuation for the request is initialized. If the service currently has available workers waiting in the workerQ, a "good" worker is taken from the queue and the job is created. Otherwise, if there are currently no available workers in the queue (they are all working on previous jobs), the request is added to the clientQ; when a worker comes back with the results of its previous job, the server first sends those results to the previous client, and then the worker takes the next job from the clientQ. The PowerPoint below demonstrates the request flow.
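The per-service pairing logic can be sketched like this, assuming the two in-memory queues described above; class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ServiceQueuesSketch {
    final Deque<String> clientQ = new ArrayDeque<>(); // waiting client requests
    final Deque<String> workerQ = new ArrayDeque<>(); // idle workers for this service

    /** A client request arrives: dispatch immediately if a worker is idle, else enqueue it. */
    String onClientRequest(String request) {
        String worker = workerQ.poll();
        if (worker != null) return "dispatch " + request + " -> " + worker;
        clientQ.add(request); // continuation stays open until a worker frees up
        return "queued " + request;
    }

    /** A worker comes back (its previous results already handled): take the next job or park. */
    String onWorkerArrival(String worker) {
        String request = clientQ.poll();
        if (request != null) return "dispatch " + request + " -> " + worker;
        workerQ.add(worker);
        return "parked " + worker;
    }

    public static void main(String[] args) {
        ServiceQueuesSketch s = new ServiceQueuesSketch();
        System.out.println(s.onClientRequest("req-1")); // queued req-1
        System.out.println(s.onWorkerArrival("w-1"));   // dispatch req-1 -> w-1
        System.out.println(s.onWorkerArrival("w-2"));   // parked w-2
        System.out.println(s.onClientRequest("req-2")); // dispatch req-2 -> w-2
    }
}
```

Whichever side arrives second completes the pair; at any moment at most one of the two queues is non-empty for a given service.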
 
The Workers Supervisor executer's main role is resource management: controlling the number of workers, starting/stopping workers, checking that they work well and replacing them when necessary, downloading the latest data/code version for each service/worker, preparing a new version and replacing the old worker, reverting to an old version, and keeping the X latest good versions. (A version can be data or code, referred to as updating or upgrading a service, respectively.)
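The replacement flow can be sketched as an ordered list of actions; this is only an illustration of the decision steps, not the project's code, and the names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplaceWorkerSketch {
    /**
     * Ordered actions for replacing a worker with a new data/code version.
     * A real supervisor would execute each step and monitor the outcome;
     * here we just return the plan.
     */
    static List<String> replace(String oldWorker, String newVersion, boolean replacerHealthy) {
        List<String> actions = new ArrayList<>();
        actions.add("prepare and start a replacer JVM with version " + newVersion);
        if (!replacerHealthy) {
            // validation failed: keep the old worker running, discard the replacer
            actions.add("kill replacer, keep " + oldWorker);
            return actions;
        }
        actions.add("stop routing new jobs to " + oldWorker);
        actions.add("let " + oldWorker + " finish its current jobs, then shut it down gracefully");
        return actions;
    }

    public static void main(String[] args) {
        replace("worker-1", "v2", true).forEach(System.out::println);
    }
}
```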
 
 

 

 
Disadvantages / problems in this approach:
1.      Lots of connections can be created on the server - every worker opens multiple connections to the Client-Server Supervisor. To avoid creating so many connections you can enable HTTP keep-alive and reuse connections. In addition, you might need to tune the machine's TCP configuration (http://publib.boulder.ibm.com/infocenter/cmgmt/v8r3m0/index.jsp?topic=/com.ibm.eclient.doc/trs40019.htm, http://smallvoid.com/article/winnt-tcpip-max-limit.html).
2.      This design is delicate regarding timings/clocks: timeouts on the server side (continuations) vs. read/connect timeouts on the client side.
The design in which a worker arrives at the Client-Server Supervisor and hangs there until it receives a job is problematic. For example:
say you set a 20-second read timeout on the worker's client side, and a client request arrives after 19 seconds, while the worker has already been hanging in the Client-Server for those 19 seconds. The server now has only 1 second to write the job to that worker. Without handling this kind of edge scenario you can lose requests. One way to deal with this issue is to release a worker if there is not enough time left to deliver the job; on the other hand, I would not recommend an endless (0) readTimeout or connectTimeout.
3.      It takes time to build such an application server that handles resource management well, but once the supervisors are written, developing new services/workers is very easy and fast.
4.      Latency - latency suffers a bit because every byte is written and read twice: the server reads the input stream and writes it to the worker's output stream; on the worker's side it is read again, and when there are results they are written back to the server. Latency increases by only a few milliseconds, depending on the job's content length.
5.      Separating the supervisors has advantages but also some disadvantages: the supervisor has to make decisions in edge communication scenarios.
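The timing guard from point 2 can be sketched as a simple check before dispatching: only hand a job to a parked worker if enough of its read-timeout window remains. The 20-second timeout comes from the example above; the 2-second safety margin is an assumed value.

```java
public class TimeoutGuardSketch {
    static final long WORKER_READ_TIMEOUT_MS = 20_000; // worker-side read timeout (from the example)
    static final long MIN_DELIVERY_WINDOW_MS = 2_000;  // safety margin (assumed value)

    /**
     * @param parkedForMs how long the worker has been waiting in the workerQ
     * @return true if the server still has a safe window to write the job;
     *         false means release the worker so it reconnects with a fresh timeout
     */
    static boolean safeToDispatch(long parkedForMs) {
        return WORKER_READ_TIMEOUT_MS - parkedForMs >= MIN_DELIVERY_WINDOW_MS;
    }

    public static void main(String[] args) {
        System.out.println(safeToDispatch(5_000));  // true: 15s left to deliver the job
        System.out.println(safeToDispatch(19_000)); // false: only 1s left, release the worker
    }
}
```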
 
What are the benefits of this approach?
 
  1. Development time decreases - provides a faster way to develop a new service. There is no point in wrapping every new service in a web/application server and implementing all of the above benefits for every service. The idea is to keep the service/worker thin (just the service core itself). 
  2. Every registered worker automatically receives sub-services from the client-server supervisor.
  3. Ability to make data and code updates, handled by the supervisor – workers can be replaced with newer versions.
  4. Rate limiting and resource management operations come for free for all services/workers.
  5. Watch over the workers/services and verify that they are up and working well.
  6. Information is exposed via an API, such as: last failed job time, number of successful jobs, total failed jobs, and how many client requests are waiting in the queue at a given moment. 
  7. Priority queue mechanism – the client-server supervisor distinguishes between normal and higher-priority requests in the client queue.
  8. Load/kill workers based on statistics – based on daily statistics, the supervisor can decide in advance when to load more workers. 
  9. Development language selection flexibility.
  10. Minimized risk if a crash occurs – workers sign up only once and don't need to be restarted if one of their supervisors is restarted.
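Benefit 7 (the priority queue) can be sketched as a clientQ ordered by priority first and arrival order second. The class, the numeric priority levels, and the request names are all illustrative, not from the project.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityClientQSketch {
    /** A waiting client request: seq preserves FIFO order within the same priority. */
    record Req(String id, int priority, long seq) {}

    private long seq = 0;
    // higher priority first; ties broken by arrival order
    final PriorityQueue<Req> clientQ = new PriorityQueue<>(
        Comparator.comparingInt(Req::priority).reversed().thenComparingLong(Req::seq));

    void add(String id, int priority) { clientQ.add(new Req(id, priority, seq++)); }

    /** Next request to dispatch to a freed-up worker, or null if the queue is empty. */
    String next() {
        Req r = clientQ.poll();
        return r == null ? null : r.id();
    }

    public static void main(String[] args) {
        PriorityClientQSketch q = new PriorityClientQSketch();
        q.add("normal-1", 0);
        q.add("urgent-1", 1);
        q.add("normal-2", 0);
        System.out.println(q.next()); // urgent-1
        System.out.println(q.next()); // normal-1
        System.out.println(q.next()); // normal-2
    }
}
```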

 

4 Comments

Sounds fascinating but I don't understand how it's done... Please explain more about "loading a new JVM replacer process" and "every service is on a different JVM". Erlang dynamic code loading, process spawning and inter-process communication are amazingly efficient and easy. You can create thousands of processes on a single machine (local or remote) and introduce each one to its owner (supervisor), update code without shutting down servers, etc. It's amazing you made it with Java...

Thanks. "Every service is on a different JVM" means that every worker runs on a different JVM, for example a multi-threaded worker that approaches the Client-Server-Supervisor and receives new jobs to execute. The Workers-Supervisor-executer can make decisions like replacing a current worker with a new one; this decision can be taken if there is a new data/code version for that kind of worker, or if the worker doesn't work well / crashed. This is where "loading a new JVM replacer process" comes into the picture:
1. The Workers-Supervisor-executer prepares a new worker.
2. The Workers-Supervisor-executer starts the replacer (a new JVM/worker) with the right configuration.
3. The Workers-Supervisor-executer verifies that the replacer signed up, passed validations and is working well.
4. a. If the replacer is OK, the Client-Server-Supervisor will not give any more jobs to the old worker.
   b. The Workers-Supervisor-executer will let the old worker finish its old/current jobs.
5. The old worker will gracefully shut down, and in case it doesn't, we all know kill -9 :) (In this kind of design you must be prepared for any kind of edge scenario.)
Hope it's clearer now. Oh, and one more small disadvantage: slightly larger memory consumption on the machine. The JVM itself consumes memory, so having multiple services/workers, each on a different JVM, consumes a bit more memory than if all were on one JVM. You are right, you can have multiple workers and their Workers-Supervisor-executer on the same machine and remotely connect to a Client-Server-Supervisor located on another machine, but there could be a latency cost in doing so.

That's an interesting idea; in this design there isn't pure bi-directional communication. Diagram #1 brief description:
1. The workers do not have an open port; the red arrow in diagram #1 is the execution of a new worker (starting a new JVM). The worker fires HTTP requests to receive jobs and also approaches its supervisors once, in order to sign up. Of course, every worker can have JMX...
2. The workers supervisor executer has an open port; it receives requests and also sends requests.
   A. Receives requests for a new service - let's say that you don't want to start all the workers when the system starts up. For example, you have 2 different machines: the first provides only billing services and the second provides only provisioning services. Both machines can have the same build with the same configuration, but each will start different services/workers. How? The billing client sends a request to the billing machine and the provisioning client sends a request to the provisioning machine. When a client request arrives at the client-server-supervisor, it realizes that there is no available service of this kind, and then it sends a request to the workers supervisor executer, which will start a new service/worker.
   B. Sends a status request - the workers supervisor executer sends a request to the client-server-supervisor and retrieves server load statistics, availability of the workers, whether they work well, etc…
   C. Downloading/pushing new versions – not shown in diagram #1.
3. The client server supervisor also receives requests on 3 different ports and sends requests (communicates with the workers supervisor executer or with an async client).