My thoughts about node.js
So, node.js is a hot topic now. I wanted to share some of my views about this platform, some misconceptions related to it (IMHO), and maybe help you decide when to use it. If you find any mistakes, please comment!
The first thing I want to discuss is the misconception (again, IMHO) that evented IO is inherently more performant than threaded IO. Here is what happens when a process does blocking IO (simplified):
- The kernel puts the process on a wait list. By "putting" I just mean pushing a reference to the data structure that represents the process onto a list. Nothing fancy. But it means the process will not be scheduled for CPU time.
- The kernel (through a driver) issues the IO command.
- The kernel handles interrupts from the IO device until the data is read/written.
- It moves the process (reference) back to the running list.
- When its turn comes, the process becomes "running": its stack pointer and registers are restored on the CPU, and the program counter is set so that execution resumes right where the IO call was made.
- The process then resumes running.
The same is true for native threads. A thread is essentially a process that shares an address space with other threads. So when a thread is blocked on IO, it does not consume CPU resources. In fact, you can think of the above steps as evented IO where the callback is "set the PC and resume the thread".
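To make the comparison concrete, here is a minimal sketch of the same read done in both styles in node.js. The scratch file and the function names `readBlocking` and `readEvented` are made up for illustration; only the `fs` calls are real.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical scratch file, created only for this illustration.
const file = path.join(os.tmpdir(), `io-demo-${process.pid}.txt`);
fs.writeFileSync(file, "hello");

// Blocking style: the calling thread waits until the data is available,
// then execution continues on the next line.
function readBlocking() {
  return fs.readFileSync(file, "utf8");
}

// Evented style: the call returns immediately, and the continuation is
// passed explicitly as a callback instead of being "the rest of the thread".
function readEvented(callback) {
  fs.readFile(file, "utf8", callback);
}

console.log("blocking:", readBlocking());
readEvented((err, data) => {
  if (err) throw err;
  console.log("evented:", data);
});
```

In both cases nothing spins on the CPU while the read is in flight. The difference is where the "what to do next" is stored: in a thread's saved stack and registers, or in a closure.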
So are threads better suited for IO? Well, it depends on whether you need threads.
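Threads consume memory. In the JVM, each thread reserves its own stack; the default size depends on the platform and JVM version, and is typically somewhere between a few hundred KB and 1MB (I'll use 512KB as a round figure). While this can be changed with -Xss, it cannot be reduced to 0.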
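As a back-of-the-envelope check, here is the arithmetic for a thread-per-connection server. The function name `stackReservationMiB` and the figure of 10,000 threads are my own assumptions for illustration, and the result is an upper bound on reserved stack space, not necessarily memory that is actually touched.

```javascript
// Upper bound on stack space reserved for `threads` threads, in MiB.
// Both arguments are illustrative assumptions, not measured values.
function stackReservationMiB(threads, stackKiB) {
  return (threads * stackKiB) / 1024;
}

// 10,000 threads with 512 KiB stacks each.
console.log(stackReservationMiB(10000, 512)); // 5000
```

So roughly 5GB of stack reservations for 10,000 concurrent connections, which is the kind of number that makes people look at evented IO.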
Another thing is that when working with threads, one has to be careful with shared resources, since concurrent modifications can cause corruption. In node.js, only one callback executes at any given time, so a callback is never interrupted halfway through an update to in-process state.
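Here is a small sketch of what that buys you. Each callback does an unsynchronized read-modify-write on a counter, which is the classic lost-update pattern under preemptive threads. The helper name `runCallbacks` is hypothetical.

```javascript
// Schedules n callbacks that each do an unsynchronized read-modify-write.
// With preemptive threads, two of them could read the same value and lose
// an update. Here callbacks run one at a time, so no lock is needed.
function runCallbacks(n, done) {
  let counter = 0;
  let pending = n;
  for (let i = 0; i < n; i++) {
    setImmediate(() => {
      const current = counter; // read
      counter = current + 1;   // write
      pending -= 1;
      if (pending === 0) done(counter);
    });
  }
}

runCallbacks(1000, (total) => console.log("counter:", total)); // counter: 1000
```

Note that this guarantee only covers a single callback. If an update is split across two callbacks with IO in between, other callbacks can still run in the gap.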
However, node.js runs your code on a single thread. What do you do if you have an 8-core machine and want to use all the cores? You run 8 node.js processes. But then, what happens if you do need a shared resource? Either concurrent modifications can happen after all (think of writing to the same file), or the resource must be something that guards against them (e.g. a database), and such guarded resources are just as available to thread-based applications.
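One way to run a process per core is node's built-in `cluster` module. This is only a sketch: the port number and the function name `start` are assumptions for illustration, and `cluster.isPrimary` needs a reasonably recent node.js (older versions call it `isMaster`).

```javascript
const cluster = require("cluster");
const http = require("http");
const os = require("os");

// Hypothetical port, chosen only for illustration.
const PORT = 8000;

function start() {
  if (cluster.isPrimary) {
    // One worker process per core. Each worker has its own heap and its
    // own event loop, so nothing in memory is shared between them.
    for (let i = 0; i < os.cpus().length; i++) {
      cluster.fork();
    }
  } else {
    // Workers created by cluster can all listen on the same port.
    http
      .createServer((req, res) => {
        res.end(`handled by pid ${process.pid}\n`);
      })
      .listen(PORT);
  }
}
```

Calling `start()` from the entry script forks the workers. Anything they need to share (a file, a cache, a counter) now lives outside the process, with all the coordination that implies.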
Another consequence of node.js' single-threadedness is that only one computation can progress at any given time. So if a node.js server does real computation, rather than just talking to a database or other servers, it cannot make progress on more than one request at a time. Of course you can spawn more node.js processes, but then the memory benefits of being single-threaded are gone.
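A minimal sketch of the effect: a timer that is due immediately still has to wait for a CPU-bound loop to finish. The names `burnCpu` and `demo` and the 50 ms figure are made up for illustration, with the busy loop standing in for heavy per-request computation.

```javascript
// Busy-wait that stands in for CPU-heavy request handling.
function burnCpu(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

function demo(done) {
  const events = [];
  // Scheduled first and due immediately, but it cannot run while the
  // loop below is holding the only thread.
  setTimeout(() => {
    events.push("timer fired");
    done(events);
  }, 0);
  burnCpu(50);
  events.push("computation finished");
}

demo((events) => console.log(events.join(" -> ")));
// computation finished -> timer fired
```

Replace the timer with "another client's request" and you get the picture: everyone waits behind the computation.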
Moreover, evented IO exists in the JVM world (and in the Ruby and Python worlds), just not for standard servlets. There are several HTTP server frameworks that offer custom interfaces suitable for evented IO. But there you also get threads, shared resources, etc.
So when is node.js useful (IMHO):
- The server does little logic and is mostly involved in fetching/pushing data from/to other servers. Then the performance hit doesn't really matter, and the async-first model out of the box is great for doing things right from the beginning.
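As a sketch of that kind of workload, here is a handler that does almost no work of its own and just combines the answers of two other services. The upstream calls `fetchUser` and `fetchOrders` are hypothetical and simulated with timers so the example runs on its own; in a real server they would be network requests.

```javascript
// Hypothetical upstream services, simulated with timers.
function fetchUser(id) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ id, name: "alice" }), 20);
  });
}

function fetchOrders(id) {
  return new Promise((resolve) => {
    setTimeout(() => resolve([{ orderId: 1 }]), 20);
  });
}

// The handler starts both requests at once and waits for both. While it
// waits, the single thread is free to serve other requests.
async function handleRequest(id) {
  const [user, orders] = await Promise.all([fetchUser(id), fetchOrders(id)]);
  return { ...user, orders };
}

handleRequest(7).then((result) => console.log(JSON.stringify(result)));
// {"id":7,"name":"alice","orders":[{"orderId":1}]}
```

Here the single thread is idle almost all the time, so being limited to one computation at a time costs next to nothing.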