Node.js is good for solving problems I don’t have
I have recently started programming with Node.js and I like how simple and easy it is to write HTTP server code with it. Just because it’s easy doesn’t mean it’s appropriate for my needs or that it’s ready for prime time. What I have noticed in learning and using Node is that it was created primarily as a response to a problem that I just don’t have, or in fact that most web applications shouldn’t have.
Node was created to provide an event-based web server programming model that better utilizes threads on the server, particularly when it comes to IO operations (like filesystem reads or database calls). So rather than a thread having to wait for an IO operation to finish before program execution continues, the thread registers a callback to be called when the IO operation is finished. This way the threads are able to serve more requests because they aren’t waiting for expensive operations to complete.
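To make the model concrete, here is a minimal sketch of one thread overlapping several IO waits. It uses Python’s asyncio event loop as a stand-in for Node’s (an assumption for illustration only; Node itself uses JavaScript callbacks). The names fake_io, handle_requests, and run_demo are hypothetical, and asyncio.sleep stands in for a real filesystem read or database call.

```python
import asyncio
import time


async def fake_io(name, delay):
    # Hypothetical stand-in for a filesystem read or database call.
    # While this coroutine waits, the single thread is free to run others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def handle_requests():
    # Three "requests" wait on IO at the same time, all on one thread.
    return await asyncio.gather(
        fake_io("req-1", 0.1),
        fake_io("req-2", 0.1),
        fake_io("req-3", 0.1),
    )


def run_demo():
    # Returns the results plus the wall-clock time the whole batch took.
    start = time.perf_counter()
    results = asyncio.run(handle_requests())
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Run back to back, the three 0.1-second waits would take about 0.3 seconds; overlapped on the event loop, the batch finishes in roughly the time of one wait. No individual request gets faster, which is the point the rest of this post turns on.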
Who has this problem? Whose web application performance bottleneck is that their threads are waiting for IO to complete? If this is your problem, then I don’t think you’ve got a very good web application implementation. Let me explain why.
HTML Generation is usually not the slowest part
Given that a web application is correct, available, and secure, users care most about speed. They don’t care about your hardware utilization or how many requests are handled per server, and they also don’t care how fast your web app is on average, they care about how fast it is for them. Page load time and time-until-usable are what users are concerned about.
Looking at cnn.com, there are 89 requests, totalling a little less than 1MB, which took 4.24s for it to load for me. Of those 89 requests, 3 were HTML requests from cnn.com (1 for the HTML page and 2 for weather). The HTML from cnn.com is about 30KB…out of 1MB! So if you want to speed up your site, where is the best place to focus? HTML from the server that makes up 3% of the total weight, 3% of the total number of requests, and 9% of the total page load time? Or would you focus on reducing the number of requests, the size of assets, and the caching of those assets?
Cnn.com’s HTML took 336 ms to get to me. Let’s say you made that 10x faster. You would have then reduced the total page time by 300 ms or about 7% of the total page load time and still get about 4 seconds for page load. You could have a 1 ms HTML response time and still have a slow site. The HTML generation and return time is usually not where the problem is for web application performance.
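The arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures from my own page load (4.24 s total, 336 ms for the HTML), and it assumes, as the paragraph does, that every millisecond saved on the HTML comes straight off the total. The function name is hypothetical.

```python
TOTAL_PAGE_LOAD_MS = 4240.0   # measured total page load time from the text
HTML_RESPONSE_MS = 336.0      # measured time for the HTML response


def page_time_after_speedup(total_ms, html_ms, speedup):
    """Return (ms saved, new total ms, percent of total saved).

    Assumes the HTML time is fully on the critical path, so any saving
    reduces the total page load time one for one.
    """
    saved_ms = html_ms - html_ms / speedup
    new_total_ms = total_ms - saved_ms
    percent_saved = saved_ms / total_ms * 100
    return saved_ms, new_total_ms, percent_saved


if __name__ == "__main__":
    saved, new_total, pct = page_time_after_speedup(
        TOTAL_PAGE_LOAD_MS, HTML_RESPONSE_MS, 10
    )
    print(f"saved {saved:.1f} ms ({pct:.1f}%), new total {new_total / 1000:.2f} s")
```

With a 10x speedup the saving is about 302 ms, a little over 7% of the total, and the page still takes just under 4 seconds.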
Most of the assets on a web page are static (meaning they don’t change per request) so they can be served by a cache server (so the origin server isn’t hit) and by the browser (so not even the cache server is hit). The origin server can generate the HTML and serve up the static assets if needed, but it really shouldn’t do that very often because the browser cache and cache servers should be serving them. So then what you really need is a content server that is geared toward HTML generation, whether it be static or dynamic. So you have the origin server generating dynamic but cacheable HTML (like templated but little-changing info pages) and handling dynamic but non-cacheable HTML (like search).
The content server should need to do hardly any IO. Why would an HTML content server need to write to the filesystem? Even if it does, why does the web visitor need to wait on the result of that file write operation before seeing the server response? If you really need to write to the filesystem, spawn a thread or offload that operation to something else that can queue up write operations. Your content server doesn’t need to do it; it just needs to invoke something else to do it.
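As a rough sketch of what “offload that operation to something else that can queue up write operations” could look like, here is a background writer built on Python’s standard queue and threading modules. WriteQueue, handle_request, and demo are hypothetical names made up for this illustration, and a real deployment might use a separate service rather than an in-process thread.

```python
import os
import queue
import tempfile
import threading


class WriteQueue:
    """Queue file writes and perform them on a background thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, path, text):
        # Returns immediately; the caller never waits on the disk.
        self._jobs.put((path, text))

    def _drain(self):
        # Runs forever on the worker thread, appending queued text to files.
        while True:
            path, text = self._jobs.get()
            try:
                with open(path, "a", encoding="utf-8") as f:
                    f.write(text)
            finally:
                self._jobs.task_done()

    def wait_until_flushed(self):
        # Only needed for shutdown or tests, not in the request path.
        self._jobs.join()


def handle_request(writes, log_path, visitor):
    # Queue the write, then build the response without waiting for it.
    writes.submit(log_path, f"visit from {visitor}\n")
    return f"<html><body>Hello, {visitor}</body></html>"


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "visits.log")
    writes = WriteQueue()
    html = handle_request(writes, log_path, "alice")
    writes.wait_until_flushed()
    with open(log_path, encoding="utf-8") as f:
        return html, f.read()
```

The visitor’s response is built as soon as the write is queued; only the demo waits for the flush, so that it can read the file back.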
If your content server is serving up dynamic content, what else can it be doing before it gets the data from the database? It’s primarily going to be formatting and creating presentation using the data from the DB, and if it has something it can be doing in the meantime I’m arguing it should be doing it. Something else can communicate with other services and cache HTML fragments or whatever. All the content server does is process content, so if it has to wait for the data, it waits.
But why would there be any IO for data that takes much time at all? If the data is so far removed from the presentation engine (the content server) that it blocks for any noticeable amount of time, you’ve got a problem with data retrieval. The answer isn’t to create a callback for when the data finally arrives from the DB; the answer is to fix the problem of data coming back so slowly from the DB.
Functional programming facilitates optimized and parallelized execution
One of the reasons I like functional programming is because the execution engine is able to parallelize function calls because functions only operate on data coming into the function and only output a result. Functions don’t change properties or state on objects in memory. Since there’s no shared state or objects that can be accessed by two different processes, all operations are threadsafe. Better yet, with lazy evaluations like what MarkLogic does for many things, you can capture the result of a function call in a variable, but the execution engine doesn’t need to actually make the function call until you access something on that variable, which could be at any later point in your program. In fact, if you never access the variable the execution engine may never actually call the function that returns the value for that variable. Order of execution becomes much less important because the functions have no side effects and can be executed whenever the execution engine decides. The execution of one function does not affect another, so you can execute them all at once, or whenever resources are available. With Node, you’d be writing code to do all that: optimizing the method calling yourself. Instead, use a functional language and let the execution engine do it for you.
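The lazy-evaluation idea can be sketched in a few lines of Python. This is not how MarkLogic or XQuery implement it; Lazy, word_count, and demo are hypothetical names for illustration, and in a functional language the execution engine does this for you instead of an explicit wrapper.

```python
class Lazy:
    """Defer a function call until its result is first needed, then cache it."""

    def __init__(self, fn, *args):
        self._fn = fn
        self._args = args
        self._evaluated = False
        self._value = None

    def get(self):
        if not self._evaluated:
            self._value = self._fn(*self._args)
            self._evaluated = True
        return self._value


def word_count(text):
    # Pure function: the result depends only on the argument.
    return len(text.split())


def demo():
    calls = []

    def traced_word_count(text):
        # The call log is demo bookkeeping so we can see when evaluation happens.
        calls.append(text)
        return word_count(text)

    used = Lazy(traced_word_count, "find a needle in a haystack")
    unused = Lazy(traced_word_count, "never read")  # never evaluated below
    calls_before = list(calls)   # nothing has run yet
    first = used.get()           # evaluation happens here
    second = used.get()          # cached, so no second call
    return calls_before, first, second, calls
```

Because word_count has no side effects, it makes no difference when (or whether) each call actually runs; the value that is never read is never computed.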
The problem I have is processing a lot of data quickly
I have megabytes and gigabytes of data to query and format for display on a web page. I need to be able to find a needle in a haystack and transform it into presentation quickly, for every request. First I need to get the speed down for just one user because that is as fast as I can go (unless another user were to cache it). Then I need that speed to remain fairly constant at scale, both with web traffic and amount of content. I am less concerned about how many requests each server can handle because I can scale horizontally if needed for both traffic and content size. With MarkLogic I have extremely fast access to the content I need. There’s no IO blocking to speak of. Even if there were, the execution engine will do some optimizing so parts of my code can execute in parallel. I spend time reducing query times, not coding callbacks for them.
Node enthusiasts are front-end coders not wanting to do server coding
I have used Javascript for over 15 years. I learned it before I learned Java. It’s really not too bad. I think what has happened in the web developer community is that some people who know front-end programming have gotten all excited that they can use their front-end skills to program the server. In fact, they think that they can even move a lot of processing that used to be done on the server up into the browser, using the programming languages and techniques they are used to, and all of a sudden it’s revolutionary and cutting edge. That’s a big reason CouchDB gained popularity, because there was no need for server programming. With HTML5, some have the idea that we hardly even need a backend service at all, just to persist some state once in a while.
So the Node community has tried to sell Node as solving a fundamental problem with server programming (blocking IO calls) but that’s really not the problem with web page speed or even server speeds, especially per request. I think the real reason is that they are mostly novices that want to use Javascript for the server side but they use the “blocking” argument to convince others. All the Node enthusiasts I know, some personally, are not very skilled server programmers but have pretty strong front-end skills. This revolution is more about front end coders not having to deal with the server side than any breakthroughs about how to do the server side. And the exuberance and arrogance from enthusiasts is meant to shame non-enthusiasts into thinking they’re old school, antiquated, or unable to learn new things, that this is the future and in a few years we’ll all be programming in Javascript and if you don’t get on board you’ll be out of a job (I heard this first-hand). Node has to be adopted, otherwise all these front-end coders will have to learn server programming.
But there are lots of things I like about Node, but not the community. I plan on using Node for easy HTTP server programming and for handling a large number of connections. But I need a Big Data server and a content server to generate dynamic and personalized HTML and to handle search. I’ll offload the HTML assets and cache as much as possible to cache servers, and I’ll optimize the front-end code to increase performance. Blocking calls, including IO, are just not one of my problems.
You’re quite off the mark. The problem node solves is that of concurrency (see http://www.kegel.com/c10k.html). As a bonus it uses a widely known, web-focused scripting language.
“Whose web application performance bottleneck is that their threads are waiting for IO to complete?” – every application that has massive concurrency or uses realtime communication – think all kinds of messaging apps, twitter, chat, any system that provides push notifications, collaborative editing, etc. It’s also a perfect fit for other kinds of non-http servers (IRC, jabber, local routing/messaging).
In short, it enables the realtime web with a very low barrier to entry. You can solve the same problem using Erlang, the new Java 7 async APIs, EventMachine, Twisted, or whatever; that doesn’t make them any less worthy.
There are lots of ways to deal with concurrency, and some of them don’t use any blocking. And concurrency only really becomes a problem when you are trying to have multiple processes use the same shared resource. There are techniques to reduce the number of things that are shared, or that need locks.
Event-based programming also does nothing to speed up a single request. It only can minimize how much multiple requests interfere with each other and slow each down. But it doesn’t speed any one of them up. A single request will only be as fast as the code to retrieve and format the data. Events don’t change that.
So rather than trying to get requests to all run the same speed (which is what Node does), why not try to make an individual request return faster and then work on trying to maintain that speed at scale (which is what I think MarkLogic does)?
“Event-based programming also does nothing to speed up a single request. It only can minimize how much multiple requests interfere with each other and slow each down. But it doesn’t speed any one of them up. A single request will only be as fast as the code to retrieve and format the data. Events don’t change that.”
You’re finally making a valid point! Great! Now why is this a complaint against node?
You’re complaining that node does exactly what node is meant to do. How foolish and silly.
“I have used Javascript for over 15 years.”
Have you really started working on Javascript at the very early stages?
“it was renamed JavaScript in a joint announcement with Sun Microsystems on December 4, 1995″
“JScript was included in Internet Explorer 3.0, released in August 1996″
Source: http://en.wikipedia.org/wiki/JavaScript
Yes. Since 1996 or 1997
Haters gonna hate
> HTML Generation is usually not the slowest part
Irrelevant, node is about reducing blocking IO bottlenecks not a blazingly fast HTML templating system
> Functional programming facilitates optimized and parallelized execution
Agreed, use erlang or haskell instead. But good luck with that, those communities are nowhere near as large as node’s and have a much steeper learning curve.
> Node enthusiasts are front-end coders not wanting to do server coding
That’s just an insult. I don’t know anyone like that. Try joining the right sub communities, like the node IRC room or github.
It’s not irrelevant. That’s a very important use case for a lot of web developers, perhaps the most important: delivering HTML. And oftentimes that HTML needs to be dynamically generated. If I can’t deliver dynamic HTML fast, then nothing else the server does is going to be of much use.
> If I can’t deliver dynamic HTML fast, then nothing else the server does is going to be of much use.
Node is not just about CRUD applications or dynamically generating HTML. In fact, if you’re trying to use Node just to make a CRUD app, stop right now. That problem has been solved a long time ago. There are many, many uses for Node. Want to write a quick CLI program? Want to send data to your users RIGHT NOW? Want to make a realtime front-end for your Arduino project? Want to do this all with Open Source, easy-to-understand JavaScript?
That is what Node is, not some new sexy way to write your CMS.
> But there are lots of things I like about Node, but not the community
You give no reasons for your opinion, let alone a compelling fact or anecdote. Did someone mistreat you? I can assure you that the Node community is filled with kind, amazing people.
The fact that you think this is what node.js is intending to do is hilarious. Why would anybody write yet another language to output html templates? That has nothing to do with node.
It’s clear you missed the paradigm completely. I’m sorry that happened to you but that doesn’t mean you need to hate on others because they know how to comprehend the difference between a template generation language and a non-blocking IO language.
Sorry that you’ve only ever worked on a wordpress blog but some of us really do have IO issues. Load is a concern for most of the world. Let’s try this again.
Additionally, have you ever noticed how the default in node is to not even listen to HTTP requests? Notice how you have to explicitly do web development stuff and it doesn’t just happen out of the box? That’s because it’s not an HTTP server. It’s a non-blocking IO framework.
Way to miss the point.
> Node is not just about CRUD applications or dynamically generating HTML. In fact, if you’re trying to use Node just to make a CRUD app, stop right now. That problem has been solved a long time ago.
Ok then. Then I’ll expect that I shouldn’t hear much about node.js as an html server
“Ok then. Then I’ll expect that I shouldn’t hear much about node.js as an html server”
I hadn’t heard much about it until you came here bitching that it wasn’t a great HTML server. That’s like complaining that your motorcycle only has 2 wheels.
> Did someone mistreat you? I can assure you that the Node community is filled with kind, amazing people.
Actually I was using node and I tweeted to my 20+ followers an observation about node that somehow got noticed by some people in the node community. They weren’t exactly nice. This coupled with experiences with other node enthusiasts I know prompted me to write a blog post on it. The reaction to this blog post pretty much validates my opinions about the community. Very defensive.
But like I said, I still plan on using it for some things.
“They weren’t exactly nice. This coupled with experiences with other node enthusiasts I know prompted me to write a blog post on it.”
He asked you to expand. Think you could do that for us?
“The reaction to this blog post pretty much validates my opinions about the community. Very defensive.”
You’re making absurd arguments based on nonsense for a viewpoint nobody ever argued except you. Of course they’re defensive — we already have enough problems with the rest of the world mistaking this as another PHP. We don’t need idiot blogger-gone-developers adding to the problem.
“idiot blogger-gone-developers adding to the problem.”
Isn’t this the type of arrogant personal attack, made because he has another opinion and experience, that he is saying is a problem?
I feel your emotions, but leave the “idiot” out.
Not surprising some people are defensive.
For the problems you mention, other technologies already excel at that and they aren’t problems node.js solves more gracefully.
Node.js is however awesome for other use cases you don’t mention such as real-time communications, background asynchronous processing/queueing/dispatching, non-blocking API wrapper, etc. For these, node.js is much simpler and extremely efficient compared to technologies that compete for these use cases.
I’m not a front-end developer and play more with the back-end and I’ve found node.js technically beautiful for some problems. Not every problem, though.
Twice now in this thread someone mentions that node is for real-time applications. This is nonsense! You cannot have any guarantees on response time in the presence of cooperative concurrency. This point has been made many times. Check out Erlang!
And what did you do with Erlang?
nodejs is also for my daily scripting needs. easy to use with linux or mac.
I am using it for some automated scripts, data fetching or any other daily scripting needs.
You do realize node.js is an event based I/O framework, not a web framework, right? Some halfwits like to make this last bit the main selling point, but in the real world, the first definition is actually the true definition.
Yeah, I’m also using node.js for my realtime browser game where I know PHP simply wouldn’t be fast enough and would hog too much memory.
If you’re going to do a company site or something like that you really don’t want to use node but I’m using it because it’s blazingly fast and that’s exactly what I need for my real time game.
I thought the main use of node was to stitch together multiple backend rest calls into one HTML page. This isn’t done afaik for speed but because it is a way to do it easily by people that might only know JavaScript.
I don’t quite understand your statements:
1) Want to send data to your users RIGHT NOW?
How can Node speed up a MySQL query? Your callback will be executed sometime in the future and will return the data at that time, not now.
2) Is it really really hard real-time or are we talking about an “illusion” that is real-time (soft real-time, OS dependent)?
3) Easy to understand JS? Arguable. Very very very arguable. You can easily shoot yourself in the foot in many many many many different ways using JS. It took Crockford _years_ to be good at the language (and he only focuses on a subset of the language with lots of “Don’t Do That”).
I like the CLI perspective, but the rest are “meh” until the libraries are all there.
hater.
> The fact that you think this is what node.js is intending to do is hilarious. Why would anybody write yet another language to output html templates? That has nothing to do with node.
I don’t know, you better tell all the people doing just that:
What companies are using Node.js in production?: http://www.quora.com/Node-js/What-companies-are-using-Node-js-in-production (note all the webservers usage)
What it’s like building a real website in Node.js: http://www.travisglines.com/web-coding/what-its-like-building-a-real-website-in-node-js (again with the webserver)
Q to Ryan Dahl: For what kinds of applications do you see people using Node.js? What are the use cases that make Node.js shine? “…I think Node can be used in a lot of different ways, even for a traditional sort of request-response, response-database query-response sort of websites, but also maybe for like sensor networks or something like that.” http://www.infoq.com/interviews/node-ryan-dahl (someone should tell him it’s ridiculous for him to think that).
stackoverflow q: What is node.js? Accepted A: “Web development in a dynamic language (JavaScript) on a VM that is incredibly fast (V8)….Using JavaScript on a web server as well as the browser reduces the impedance mismatch…”
In this node.js thread “[nodejs] the old template story back and forth” (http://osdir.com/ml/NodeJS/2011-11/msg01132.html) I didn’t see anyone mention that it’s ridiculous for any templating to be done in node
We need to get the word out to these people that doing any dynamically generated HTML with node.js is ridiculous.
Wow, point & match.
http://i.imgur.com/1OLYw.png
1. The assumption in Node is that HTML Generation IS THE fastest part, which makes your whole argument a straw man. What node offloads is the IO, which is the same principle on which Rails or any other dynamic language based framework was originally conceived.
2. Functional Programming
Have you written any web apps in your favorite “functional” language or framework? Which one do you use?
Most web apps need to pull some data before they can proceed further. Functional goodness will only help if the algorithm itself can be made parallel.
3. Processing a lot of data quickly
This has nothing to do with Node anyway. A good programmer would fetch just the data required for the page. And the expensive queries run on the database, not within Node. If you are building a typical web app by bringing all data to Node, then you are doing it wrong.
And what you are doing is probably wrong in any framework you choose.
Of course, there are exceptions based on what your app does.
4. Node enthusiasts, Front-end coders. etc
No comment.
Please refrain from such posts until you have sufficiently experimented with Node.
Sorry to say but your post stinks of ignorance. Node is a system programming platform, web development happens to be just one of its many capabilities.
Generally speaking, the techniques you would use to optimize the speed of one single request are not the same techniques you would use to maximize concurrency & throughput under load. So if your top priority is one super-fast request, then yeah, don’t bother with callbacks on I/O. But if you want that one super-fast request to scale to 10,000 very fast simultaneous requests you are going to have to make more efficient use of server resources. I defy you to find a single large-scale production system that doesn’t make use of something like Node’s asynch callback model. It’s the reason ALL APIS IN WINDOWS 8 THAT TAKE MORE THAN 50MS ARE ASYNCH ONLY. They are not going to even give you the option to make those calls synchronously.
You are right that you should make all the parts of your system as fast as possible, cache like crazy, and wring out all inefficiencies. But if your attitude towards thread utilization is “if it has to wait for the data, it waits”, you are needlessly putting an upper limit on how much of the CPU your app can use. I don’t care how fast MarkLogic is. Unless your app is running inside a MarkLogic thread, getting any data from it, even freshly cached, is orders of magnitude slower than anything you do in-process, and your thread is sitting there idle for that time.
If Node switched from Javascript to pretty much any other language tomorrow, I suspect 90% of people would stay with it. If it switched to a synchronous model, 100% of people would leave it. That’s the value. If you think MarkLogic’s speed & your ability to write fast code means the asynch/callback model is stupid, good luck. I look forward to inheriting your customers when their business grows by a factor of 50.
Now the interesting thing is that while I have implemented services on Node, I doubt anyone commenting here has implemented anything on MarkLogic. This post is not Node vs MarkLogic, although MarkLogic is my platform of choice and is the focus of this blog. I was really commenting on Node in general, but I think some info on MarkLogic would help.
> I don’t care how fast MarkLogic is. Unless your app is running inside a MarkLogic thread, getting any data from it, even freshly cached, is orders of magnitude slower than anything you do in-process, and your thread is sitting there idle for that time.
Actually my code does run in a MarkLogic thread. This is why we call implementations on MarkLogic “two-tiered” because there is no separation between the data-retrieval and app code in terms of process threads. The application code and the data queries execute together. You could think of it like returning dynamic HTML straight from Oracle using PL/SQL. That may sound painful, but imagine the concept and imagine that it was done extremely well. This is as if all your data, up to petabytes, is all already in memory in your application code’s process.
Data selection is extremely fast because MarkLogic indexes practically everything about the data: values, entities, relationships, even value ranges and word stemming. In fact, the sign of a good MarkLogic implementation is that all data selection is done via the indexes, not from reading the documents in the DB.
Data retrieval is extremely fast for several reasons. 1) You may be able to get your data straight from the indexes, never having to even get the document. 2) Your document may already be cached in one of several caches, thus avoiding a disk hit. 3) You can scale horizontally using hundreds of hosts so that the data retrieval can be parallelized. 4) Even when you do hit disk, the data fragment is small so not a lot needs to be pulled off disk.
MarkLogic uses Multiversion Concurrency Control (MVCC – http://en.wikipedia.org/wiki/Multiversion_concurrency_control) to run lock free, for both reads and writes (although you can get a lock if you want). Writes are journaled, but changes are only written to disk when the server decides to, not during your code’s execution in returning a response. So there’s no IO blocking in your app code even during a write. MarkLogic also implements a functional language, XQuery, which is inherently threadsafe since there are no mutable objects and the only shared resource is the data in the DB, which uses MVCC (see how this comes together?).
I ran some tests a while ago where I created a data set in memory and the same data set in a MarkLogic DB. I created queries against the data, copied, and manipulated the data but the performance was the same whether the data was in memory (my code) or in the DB. Until we are talking massive amounts of data, I couldn’t create a non-trivial data set in memory that would perform faster than putting it in the DB (think about that). This is also why I was able to search through 15,000 WURFL records in realtime, for each request, and have it measure 0 seconds when profiled (http://xquerywebappdev.wordpress.com/2011/08/19/the-1-millisecond-page/).
It turns out that this is actually a simpler programming model than 3-tiered OO platforms. I have been involved with helping several programmers new to MarkLogic and they pick it up fairly quickly. It’s just very different than the dominant 3-tiered OO model that everyone is used to. I suggest reading Jason Hunter’s paper “Inside MarkLogic” for more explanation on the concepts (http://www.scribd.com/doc/40204435/Inside-Marklogic-Server).
So what if you could write servers that effectively had all data in memory (scalable to petabytes), that did little or no disk IO per request, that indexed all data and all relationships among the data, that implemented a functional programming language along with MVCC which all but does away with concurrency issues, and which is highly-suited for X/HTML manipulation? Now wouldn’t that be cool?
I haven’t even mentioned word stemming, geospatial queries, document level security, alerting, or database replication/failover. And I haven’t mentioned how the military, big publishers, big financial firms, and other big names use this. So it’s gone the distance.
You completely missed the point and advantages of node.js, it has nothing to do with user experience.
I will admit that it does sound like Node isn’t an appropriate fit for your problem space. But that’s a huge jump from ignorant and deliberately inflammatory comments like “Node enthusiasts are front-end coders not wanting to do server coding”. Some are, sure. And I could write “MarkLogic enthusiasts are lightweights who throw tons and tons of money at scaling because they can’t write real programs” (my firm used MarkLogic a few years back and dumped it because the costs rose so quickly as data volume grew). Or MarkLogic developers are “mostly novices that don’t want to think about the CPU so they use the ‘slow data retrieval’ argument to convince others.”
I don’t think either of those statements is true, and I know some VERY serious backend developers who love Node despite the fact that Javascript is the language they’re least comfortable with.
Keep it about the facts and people won’t write things about the arrogance and limited technical skills of the MarkLogic developer community.
Well you can still make the claim about MarkLogic developers. My claims about Node enthusiasts weren’t the reason node isn’t my choice for lots of things. My post has two elements to it: Node solves problems I don’t have, and I think Node is popular with front-end devs not wanting to do server coding.
But state your case if you think MarkLogic devs are light-weights.
I think your first point: Node solves problems I don’t have, is legitimate, but is undermined by your very subjective, unsubstantiated, and inflammatory second point. You were a lot clearer in your comment responses on how MarkLogic avoids I/O blocking, IPC, etc. to provide crazy high performance than you were in your original post. I would suggest that expanding on that in the main article would have been a much better use of space than conjecture/trolling based on “all the Node enthusiasts I know”.
I don’t think ML devs are lightweights, and I’m emphatically not making that claim. I’m pointing out that any MarkLogic deficiencies or limitations I’d care to write about would not be taken seriously if I seemed to be suggesting that I was smart for not using it, and you must be a lightweight if you did. I learned a lot about ML from your post. I learned nothing about the pros/cons of Node/asynch/Javascript.
I think you’re missing the point of node entirely. Its main advantage (at least in my mind) is for providing real-time features to websites via streaming connections. It is vastly better for this purpose than threaded servers. If you try to use a threaded server for streaming connections, you end up with hundreds or thousands of threads using memory and scheduler resources, even though they aren’t actually doing anything 99% of the time.
I thoroughly enjoyed reading this comment thread. I make my living doing MarkLogic development, and have completed a handful of projects using Node (all but one were web apps). All I have to contribute to this mostly-dead discussion is that a really fun project would be providing RESTful access to my content via MarkLogic, and using Node to provide that content to the front-end. I have never used anything more enjoyable than MarkLogic for querying content, although CouchDB and MongoDB come close in all aspects except searching, and I have been mesmerized by Node – I’m just not sure if it is a passing fancy or a lasting love.
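A small sketch of the idea, using Python’s asyncio as a stand-in for Node’s event loop (an assumption for illustration; idle_connection, hold_connections, and demo are made-up names, and asyncio.sleep stands in for a socket that is open but silent). It only counts threads; it does not measure memory.

```python
import asyncio
import threading


async def idle_connection(opened):
    # Stand-in for a streaming connection that is open but mostly silent.
    opened.append(1)
    await asyncio.sleep(0.05)


async def hold_connections(count):
    threads_before = threading.active_count()
    opened = []
    tasks = [asyncio.create_task(idle_connection(opened)) for _ in range(count)]
    await asyncio.sleep(0)  # yield once so every task starts and reaches its wait
    open_now = len(opened)
    extra_threads = threading.active_count() - threads_before
    await asyncio.gather(*tasks)
    return open_now, extra_threads


def demo(count=2000):
    # Returns (connections held open at once, threads added to hold them).
    return asyncio.run(hold_connections(count))
```

Two thousand simulated connections are held open at once without adding a single thread, whereas a thread-per-connection server would need one thread for each.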