Node.js is good for solving problems I don’t have
I have recently started programming with Node.js, and I like how simple and easy it is to write HTTP server code with it. But just because it’s easy doesn’t mean it’s appropriate for my needs or that it’s ready for prime time. What I have noticed in learning and using Node is that it was created primarily as a response to a problem that I just don’t have, and in fact one that most web applications shouldn’t have.
Node was created to provide an event-based web server programming model that better utilizes threads on the server, particularly when it comes to IO operations (like filesystem reads or database calls). So rather than a thread having to wait for an IO operation to finish before program execution continues, the thread registers a callback to be called when the IO operation is finished. This way the threads are able to serve more requests because they aren’t waiting for expensive operations to complete.
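To make that model concrete, here is a minimal sketch of two requests sharing a single thread. It is written in Python only for illustration, with asyncio standing in for Node’s event loop and coroutines standing in for callbacks. The handler names and delays are invented, and the sleep is just a stand-in for a database or filesystem call.

```python
import asyncio


async def handle_request(name, io_seconds, log):
    """Hypothetical request handler; the sleep simulates a slow IO call."""
    log.append(f"{name}: start")
    # While this request waits on "IO", the single thread is free to
    # pick up other requests instead of sitting idle.
    await asyncio.sleep(io_seconds)
    log.append(f"{name}: done")


async def serve(log):
    # Both requests are handled by the same thread and the same event loop.
    await asyncio.gather(
        handle_request("slow", 0.05, log),
        handle_request("fast", 0.01, log),
    )


def run_demo():
    log = []
    asyncio.run(serve(log))
    return log
```

Calling run_demo() returns the order in which things happened: the fast request starts and finishes while the slow one is still waiting, without a second thread being involved.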
Who has this problem? Whose web application performance bottleneck is that their threads are waiting for IO to complete? If this is your problem, then I don’t think you’ve got a very good web application implementation. Let me explain why.
HTML Generation is usually not the slowest part
Given that a web application is correct, available, and secure, users care most about speed. They don’t care about your hardware utilization or how many requests are handled per server, and they also don’t care how fast your web app is on average; they care about how fast it is for them. Page load time and time-until-usable are what users are concerned about.
Looking at cnn.com, there are 89 requests, totalling a little less than 1MB, which took 4.24s to load for me. Of those 89 requests, 3 were HTML requests from cnn.com (1 for the HTML page and 2 for weather). The HTML from cnn.com is about 30KB…out of 1MB! So if you want to speed up your site, where is the best place to focus? HTML from the server that makes up 3% of the total weight, 3% of the total number of requests, and 9% of the total page load time? Or would you focus on reducing the number of requests, the size of assets, and the caching of those assets?
Cnn.com’s HTML took 336 ms to get to me. Let’s say you made that 10x faster. You would then have reduced the total page time by 300 ms, or about 7% of the total page load time, and you would still have a page load of about 4 seconds. You could have a 1 ms HTML response time and still have a slow site. The HTML generation and return time is usually not where the problem is for web application performance.
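The arithmetic can be checked with a few lines of Python. This is a rough sketch that assumes the HTML time simply adds to the total page load time, which is a simplification; the function name is made up, and the numbers are the ones measured above.

```python
def page_load_after_speedup(total_ms, html_ms, speedup):
    """Return (new total load time in ms, fraction of the total saved).

    Assumes the HTML response time is a purely additive part of the
    total page load time, which is a simplification.
    """
    saved_ms = html_ms - html_ms / speedup
    return total_ms - saved_ms, saved_ms / total_ms


# Numbers from the cnn.com measurement: 4.24 s total, 336 ms for the HTML.
new_total_ms, fraction_saved = page_load_after_speedup(4240.0, 336.0, 10)
```

With those inputs, a 10x faster HTML response saves about 302 ms, roughly 7% of the total, and leaves the page at about 3.94 seconds. Pushing the HTML all the way down to 1 ms (a speedup of 336x) still leaves about 3.9 seconds.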
Most of the assets on a web page are static (meaning they don’t change per request), so they can be served by a cache server (so the origin server isn’t hit) and by the browser cache (so not even the cache server is hit). The origin server can generate the HTML and serve up the static assets if needed, but it really shouldn’t do that very often because the browser cache and cache servers should be serving them. So what you really need is a content server that is geared toward HTML generation, whether static or dynamic. The origin server then generates dynamic but cacheable HTML (like templated but little-changing info pages) and handles dynamic but non-cacheable HTML (like search).
The content server should need to do hardly any IO. Why would an HTML content server need to write to the filesystem? Even if it does, why does the web visitor need to wait on the result of that file write operation before seeing the server response? If you really need to write to the filesystem, spawn a thread or offload that operation to something else that can queue up write operations. Your content server doesn’t need to do it; it just needs to invoke something else to do it.
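Here is one way the “queue it up and let something else write” idea might look, sketched in Python with a background worker thread. The function names are hypothetical, and the write_fn callable is only a stand-in for whatever really performs the file write.

```python
import queue
import threading


def start_writer(write_queue, write_fn):
    """Start a background worker that drains the queue.

    write_fn is a stand-in for the real (slow) filesystem write.
    """
    def worker():
        while True:
            item = write_queue.get()
            if item is None:  # sentinel value: stop the worker
                break
            write_fn(item)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_request(page_name, write_queue):
    """Hypothetical request handler for the content server."""
    # Hand the write off to the worker and move on; the visitor does not
    # wait for the write to finish before getting the response.
    write_queue.put(f"served {page_name}")
    return f"<html><body>{page_name}</body></html>"
```

The request handler only enqueues a message and returns the HTML. The worker thread, or in a real deployment perhaps a separate service, does the writing on its own schedule.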
If your content server is serving up dynamic content, what else can it be doing before it gets the data from the database? It’s primarily going to be formatting and creating presentation using the data from the DB, and if it has something it can be doing in the meantime I’m arguing it should be doing it. Something else can communicate with other services and cache HTML fragments or whatever. All the content server does is process content, so if it has to wait for the data, it waits.
But why would there be any IO for data that takes much time at all? If the data is so far removed from the presentation engine (the content server) that it blocks for any noticeable amount of time, you’ve got a problem with data retrieval. The answer isn’t to create a callback for when the data finally arrives from the DB; the answer is to fix the problem of data coming back so slowly from the DB.
Functional programming facilitates optimized and parallelized execution
One of the reasons I like functional programming is that the execution engine is able to parallelize function calls, because functions only operate on data coming into the function and only output a result. Functions don’t change properties or state on objects in memory. Since there’s no shared state or objects that can be accessed by two different processes, all operations are threadsafe. Better yet, with lazy evaluation like what MarkLogic does for many things, you can capture the result of a function call in a variable, but the execution engine doesn’t need to actually make the function call until you access something on that variable, which could be at any later point in your program. In fact, if you never access the variable, the execution engine may never actually call the function that returns the value for that variable. Order of execution becomes much less important because the functions have no side effects and can be executed whenever the execution engine decides. The execution of one function does not affect another, so you can execute them all at once, or whenever resources are available. With Node, you’d be writing code to do all that: optimizing the method calls yourself. Instead, use a functional language and let the execution engine do it for you.
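Both ideas can be emulated by hand in Python, which may help show what the execution engine is doing for you. This is only a sketch: the function and class names are invented, it says nothing about how MarkLogic implements this internally, and Python does none of it automatically. The thread pool here demonstrates that pure functions are safe to run in any order, not that Python will run them faster.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Pure function: depends only on its input and changes no shared state."""
    return len(text.split())


def count_all(documents):
    # Because word_count has no side effects, the calls can run in any
    # order, or at the same time, without affecting each other.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(word_count, documents))


class Lazy:
    """Hand-rolled lazy value: the function runs only on first access."""

    def __init__(self, fn):
        self._fn = fn
        self._evaluated = False
        self._value = None

    def get(self):
        if not self._evaluated:
            self._value = self._fn()
            self._evaluated = True
        return self._value
```

Wrapping a call as Lazy(lambda: count_all(docs)) captures the result in a variable without doing any work. The counting happens the first time get() is called, and never happens at all if nothing asks for the value.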
The problem I have is processing a lot of data quickly
I have megabytes and gigabytes of data to query and format for display on a web page. I need to be able to find a needle in a haystack and transform it into presentation quickly, for every request. First I need to get the speed down for just one user because that is as fast as I can go (unless another user were to cache it). Then I need that speed to remain fairly constant at scale, both with web traffic and amount of content. I am less concerned about how many requests each server can handle because I can scale horizontally if needed for both traffic and content size. With MarkLogic I have extremely fast access to the content I need. There’s no IO blocking to speak of. Even if there were, the execution engine will do some optimizing so parts of my code can execute in parallel. I spend time reducing query times, not coding callbacks for them.
Node enthusiasts are front-end coders not wanting to do server coding
I have used JavaScript for over 15 years. I learned it before I learned Java. It’s really not too bad. I think what has happened in the web developer community is that some people who know front-end programming have gotten all excited that they can use their front-end skills to program the server. In fact, they think that they can even move a lot of processing that used to be done on the server up into the browser, using the programming languages and techniques they are used to, and all of a sudden it’s revolutionary and cutting edge. That’s a big reason CouchDB gained popularity: there was no need for server programming. With HTML5, some have the idea that we hardly even need a backend service at all, just something to persist some state once in a while.
So the Node community has tried to sell Node as solving a fundamental problem with server programming (blocking IO calls), but that’s really not the problem with web page speed or even server speed, especially per request. I think the real reason is that they are mostly novices who want to use JavaScript on the server side, but they use the “blocking” argument to convince others. All the Node enthusiasts I know, some personally, are not very skilled server programmers but have pretty strong front-end skills. This revolution is more about front-end coders not having to deal with the server side than about any breakthrough in how to do the server side. And the exuberance and arrogance from enthusiasts is meant to shame non-enthusiasts into thinking they’re old school, antiquated, or unable to learn new things, that this is the future and in a few years we’ll all be programming in JavaScript, and that if you don’t get on board you’ll be out of a job (I heard this first-hand). Node has to be adopted, otherwise all these front-end coders will have to learn server programming.
There are lots of things I like about Node, just not the community. I plan on using Node for easy HTTP server programming and for handling a large number of connections. But I need a Big Data server and a content server to generate dynamic and personalized HTML and to handle search. I’ll offload the HTML assets and cache as much as possible to cache servers, and I’ll optimize the front-end code to increase performance. Blocking calls, including IO, are just not one of my problems.