A frequent meme in both the Software Engineering class I teach and during my open office hours (when any student can come for advice or discussion on anything) is what appears to be a premature fear of lack-of-scalability when developing Internet services/Web apps. This concern is often hard to disentangle from techno-fetishism—the often irrational desire to use the latest rockstar tech, without really understanding whether it solves a problem you have, but providing the flimsy rationalization that “we need to use it for {scalability | dependability | having a good user experience | responsiveness}.”
There are various manifestations of this meme, including:
There are various manifestations of this meme, including:
- "We want to use {Mongo | Cassandra | your favorite NoSQL DB} because relational databases don't scale."
- "We want to write our server in Node.js because synchronous/blocking/task-parallel appservers don't scale." (Weaker version: replace "don't scale" with "require more servers".)
- "Our app is going to be huge, and we'd like your advice on how to achieve high scalability and 24x7 dependability across our soon-to-be-worldwide user base."
These thoughts are well-intentioned; many of these students are old enough to remember how MySpace crashed and burned in part because of its inability to get these things right, leaving a vacuum into which Facebook immediately stepped, and the rest is history, etc.
But since that time, the tools ecosystem has come a long way, computers have gotten way faster, and server cycles and storage have dropped in price by one or more orders of magnitude.
But since that time, the tools ecosystem has come a long way, computers have gotten way faster, and server cycles and storage have dropped in price by one or more orders of magnitude.
My challenge is therefore to help these students understand two things clearly:
- No technology will help you if your fundamental design prevents scaling (your app is poorly factored and/or cannot be divided into smaller services that can be scaled separately).
- If your app is well-factored, these days you can get surprisingly far with conventional RDBMS-backed stacks on commodity hardware.
Ozan Onay makes this point well in his post "You are not Google," which incidentally includes quotes from my Berkeley colleague Prof. Joe Hellerstein. As Onay points out:
"As of 2016, Stack Exchange [the forum system of which StackOverflow is just one part] served 200 million requests per day, backed by just four SQL servers: a primary for Stack Overflow, a primary for everything else, and two replicas."
In other words, the highest-traffic interactive Web app used by developers all over the world uses the most mundane (in many developers’ view) of technologies—most SQL servers support this kind of master/slave replication by simply tweaking a config file, no special app programming required. Similarly, as of a couple of years ago, all of Pivotal Tracker was hosted on a single RDBMS instance.
A more sarcastic view of the same meme is put forth in one of my favorite xtranormal movies on Node.js. Go ahead, watch it. It’s only a few minutes long. You’ll thank me someday.
The basic message is the same: Most new technologies, languages, frameworks, and so on evolved to fill a specific technical need, and especially if they evolved in a large company, they probably fill a technical need that you don't have, and that you may never have.
There is a subtext to the message too: it is relatively rare that a new tool or framework embodies a fundamentally new idea about software engineering. The event-loop framework used by Node.js has been around since the beginning of time; indeed, the original Mac OS required all apps to be written this way (in part because the original Mac used a microprocessor that didn’t support virtual memory, making true preemptive multitasking impossible), which was super painful. OS designers created threads to make this pain go away; threading systems do exactly what programmers are expected to do in Node. And modern threading systems running on modern hardware are damn fast.
There is a subtext to the message too: it is relatively rare that a new tool or framework embodies a fundamentally new idea about software engineering. The event-loop framework used by Node.js has been around since the beginning of time; indeed, the original Mac OS required all apps to be written this way (in part because the original Mac used a microprocessor that didn’t support virtual memory, making true preemptive multitasking impossible), which was super painful. OS designers created threads to make this pain go away; threading systems do exactly what programmers are expected to do in Node. And modern threading systems running on modern hardware are damn fast.
Finally, and most importantly, no tool or framework can save you from a bad design. Some frameworks try to help you by effectively legislating some design decisions out of existence (e.g. Rails all but mandates a Model-View-Controller architecture), but there's still plenty of room to shoot yourself in the foot design-wise.
That will be your real obstacle to scaling and dependability—not the fact that you didn't go with Node.js or write your front end using React. So you can still wear that Node t-shirt, but please consider wearing it ironically.
That will be your real obstacle to scaling and dependability—not the fact that you didn't go with Node.js or write your front end using React. So you can still wear that Node t-shirt, but please consider wearing it ironically.