What is Larry Thinking? #69 => Going Big, Micro Version, Part 1


About This Newsletter

This is the second of a two-part newsletter on “going big”. By “going big” I mean how one transitions from a Web site with little to moderate traffic to one that can handle tons of traffic. The previous newsletter looked at going big from the macro perspective: theory, implementation, hardware, and networking. In this newsletter, I’ll look at the micro perspective: how to write code that scales well. And, as it turns out, this newsletter once again grew too big, so this is part one of the two parts that make up part two of the two-part series. (Huh?) In this newsletter, I’ll mostly focus on code. The next will mostly focus on the database.

Before going into details, I’m going to define what it means to be a “big” site. As I said in the previous newsletter, it actually depends upon the kind of content and activity the site has: X number of video requests is far more demanding than the same X number of mostly text pages. Likewise, X number of WordPress page requests is far more demanding than the same X number of static HTML page requests. For the purposes of this discussion, let’s say that “big” is a site that gets in the broad neighborhood of 100,000 to 500,000 pageviews per day. At that point (if not before), you’ll need more than one server to handle the load. (As a counterpoint, on the highest end, Netflix sometimes requires up to 20,000 servers at a single time.)

As always, questions, comments, and all feedback are much appreciated. And thanks for your interest in what I have to say and do!

What Were You Thinking? => Going Big, Macro Version

After my previous newsletter, Ross shared with me his own experiences when it came to hosting, going big, and jumping the gun. Ross had an interesting experience, learned a lot, and was nice enough to let me share his story here (which I’ve hopefully not garbled too badly).

Ross had created an online game, and when he went from the private testing stage to the public beta, the amount of activity quickly spiked. Ross suddenly found that his Virtual Private Server (VPS) account was insufficient to handle the load, and as income was already coming in at a great pace, Ross quickly upgraded to two dedicated servers: one web server and one database server.

Unfortunately for Ross, both the activity and income then dropped pretty quickly, leveling off just above break-even over the course of the first year. Both continued to decline over the next couple of years.

In hindsight, Ross says he should have realized sooner that the upgraded two-server hosting was unnecessary and too expensive, but it took three years before he downgraded to a single VPS again.

Ross also discovered after the fact that one of the driving forces behind the initial need for a more powerful hosting solution was inefficient code, particularly a lack of proper indexes on the database. Fixing those problems later resulted in a much better-performing site. Ross’s realization is timely for me, as this particular newsletter focuses on coding for a site that could go big!

Fortunately, Ross still feels pretty good about his experience, appreciating what he learned, the money that did come in, and the people he met along the way. In his most recent email to me, Ross said:

…[the] next time around…I’ll be able to do more, faster, with fewer mistakes. I’ve been able to pass on some of the lessons learned to friends and family, which is satisfying, and if there’s some wisdom that helps your readers, too, that’s even better.

Thanks, Ross!

What is Larry Thinking? => Two Keywords for “Big” Programming

If I were to summarize how to program a site for a potentially “big” scale, it’d come down to two words: optimal and flexible.

Optimal means that the site should perform as well as possible. This should seem obvious, of course, but it bears repeating. Entire books could be written on coding optimally, but a few rules of thumb include:

Don’t create variables or functions you don’t need
Minimize interactions with the file system
Minimize interactions with the database
Minimize networking
Limit the amount of information retrieved from the database
Watch usage of loops and recursion
Minimize and compress the amount of data transferred to the browser
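To make a couple of these rules concrete, here’s a minimal sketch in Python with the built-in sqlite3 module. The authors/posts schema and the function names are hypothetical. The slow version runs one query per post and pulls back every column. The fast version gets the same title and author pairs with a single JOIN that selects only the needed columns and LIMITs the rows. The count_queries() helper is also my own illustration: it uses sqlite3’s trace callback to count the statements each approach sends to the database.

```python
import sqlite3


def setup():
    # Hypothetical schema and data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                            title TEXT, body TEXT);
    """)
    conn.executemany("INSERT INTO authors (id, name, bio) VALUES (?, ?, ?)",
                     [(i, f"Author {i}", "A long bio...") for i in range(1, 4)])
    conn.executemany("INSERT INTO posts (author_id, title, body) VALUES (?, ?, ?)",
                     [(i % 3 + 1, f"Post {i}", "A long body...") for i in range(10)])
    conn.commit()
    return conn


def titles_slow(conn):
    # One query for the posts, then one more per post (the "N + 1" pattern),
    # and SELECT * drags back body and bio columns that are never used.
    result = []
    for post in conn.execute("SELECT * FROM posts ORDER BY id").fetchall():
        author = conn.execute("SELECT * FROM authors WHERE id = ?",
                              (post[1],)).fetchone()
        result.append((post[2], author[1]))
    return result


def titles_fast(conn, limit=5):
    # One query, only the needed columns, and only as many rows as needed.
    sql = """SELECT p.title, a.name
             FROM posts AS p JOIN authors AS a ON a.id = p.author_id
             ORDER BY p.id LIMIT ?"""
    return conn.execute(sql, (limit,)).fetchall()


def count_queries(conn, func, *args):
    # sqlite3 calls the trace callback with the SQL of each statement it runs.
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        func(conn, *args)
    finally:
        conn.set_trace_callback(None)
    return len(statements)
```

With ten posts, the slow version sends eleven statements and the fast one sends a single statement. On a real site, where the database is often on another server, each of those round trips also costs network time.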

The catch is that you need to avoid premature optimization. Don’t overthink performance while you’re doing the initial programming! The goal at that stage is simply to write code using best practices. Then, when the site is done, profile and benchmark what you’ve created to see how it performs, and only then attempt to fix the bottlenecks you find.
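As an illustration of the “benchmark, then profile” step, here’s a minimal Python sketch using the standard timeit and cProfile modules. The banned-ID filter is a made-up stand-in for page code. One version does its membership tests against a list, and the other against a set. benchmark() measures how long each version takes. profile() reports which functions the time is spent in, and that report is how you’d find such a bottleneck in the first place.

```python
import cProfile
import io
import pstats
import timeit

# Hypothetical data: IDs a page must filter out.
BANNED_LIST = list(range(0, 4000, 2))
BANNED_SET = set(BANNED_LIST)


def filter_with_list(ids):
    # "in" on a list scans it element by element: O(n) per lookup.
    return [i for i in ids if i not in BANNED_LIST]


def filter_with_set(ids):
    # Same result, but set lookups are O(1) on average.
    return [i for i in ids if i not in BANNED_SET]


def benchmark(func, ids, repeat=3, number=5):
    # Best of several runs, which smooths out noise from other processes.
    return min(timeit.repeat(lambda: func(ids), repeat=repeat, number=number))


def profile(func, ids, top=5):
    # cProfile records time per function; pstats formats the report.
    profiler = cProfile.Profile()
    profiler.enable()
    func(ids)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    ids = list(range(1000))
    print("list:", benchmark(filter_with_list, ids))
    print("set: ", benchmark(filter_with_set, ids))
    print(profile(filter_with_list, ids))
```

Both versions return the same result. The point is that you measure first and only then change code, rather than guessing where the time goes.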

Next, there’s the issue of writing flexible code. To understand the importance of flexible code as it pertains to “going big”, one has to understand the two ways in which a site may need to grow:

Scale to handle more traffic
Add and change features and functionality

A site can often be scaled to handle more traffic simply by throwing more resources at it: better hosting, a better server, more memory, more servers, a Content Delivery Network (CDN), etc. You generally don’t have to do anything with your code to make it scalable, although as Ross discovered, inefficient code can force you to spend more on resources sooner than you otherwise would. It’s the second issue, the ability to add and change features and functionality, that’s most relevant to your programming.

Some years ago I saw a presentation by a YouTube engineer who said that it was more important that they could fix problems quickly than that the site itself ran quickly. I’ve heard similar thoughts expressed by many other programmers: a developer’s time is precious, but a site can often be made to run faster merely by throwing an extra $100 worth of RAM into a server.

The point of flexible code is that the site has to be written in such a way that it’s quick and easy for you to fix problems, add features, and make other significant changes. If the code is hard to update, then updates either won’t happen or won’t happen quickly enough (and therefore, users will leave).

As an example, I’m currently working on a customer’s site that has some terrible legacy code. The site works well enough, but the customer needs to add features and improve performance. The performance problems can largely be remedied by fixing several issues in the database:

A lack of indexes
Inconsistent column definitions
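The first of those problems, missing indexes, is easy to see for yourself. Here’s a minimal sketch using Python’s built-in sqlite3 module and a hypothetical orders table. SQLite’s EXPLAIN QUERY PLAN shows a full table scan for a lookup by customer_id. After a single CREATE INDEX, the same query is run as an index search. Other databases have their own EXPLAIN variants, and the exact wording of the plan differs between versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN describes how SQLite intends to run a query;
    # the last column of each row is a human-readable step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical table; note the consistent INTEGER type for customer_id.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER NOT NULL, total REAL NOT NULL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 500, i * 1.5) for i in range(10000)])

lookup = "SELECT * FROM orders WHERE customer_id = ?"
before = query_plan(conn, lookup, (42,))   # expect a full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, lookup, (42,))    # expect a search via the index

print(before)
print(after)
```

The query returns the same rows either way. The index only changes how much of the table the database has to read to find them.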

The database issues are not too difficult to solve. However, each new feature is taking twice as long as it should because the code is a mess. One main script is over 3,000 lines long! And because the site is actively used (and currently profitable), I can’t make the large, sweeping changes I’d like to without risking harm to the business. Hence, baby steps are required, changes are being made very slowly, and the client is paying extra because this code is so tedious to work with.

So how do you write flexible code? I’ll answer that specific question next, but first, there’s a conflict to be understood: flexible code is at odds with optimal code. Generally speaking, if you make your code more flexible, it won’t perform as well.

As a specific example, many developers use database-agnostic interfaces, such as PDO in PHP. PDO gives you a consistent interface regardless of the database in use. PDO also makes it much easier to switch to a different database application, should that need arise. But the added interface layer drags down performance. Moreover, by being database agnostic, you can’t write your code to take advantage of features and capabilities specific to any one database application. You’ve greatly increased flexibility, but hurt optimization.
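Python’s DB-API plays roughly the role that PDO plays in PHP. The following minimal sketch uses it with a hypothetical settings table to show the trade-off. The portable version sticks to plain SQL and generic cursor calls, so it needs up to two statements to “insert or update” a setting. The SQLite-specific version does the same job in one statement using SQLite’s upsert syntax (SQLite 3.24 and later; PostgreSQL has similar syntax, and MySQL uses a different one). That gain in efficiency ties the code to particular databases.

```python
import sqlite3


def save_setting_portable(conn, key, value):
    # Database-agnostic: plain SELECT/UPDATE/INSERT through the generic
    # DB-API cursor, at the cost of an extra statement per call.
    # Even this assumes the "?" placeholder style; some drivers use "%s".
    cur = conn.cursor()
    cur.execute("SELECT 1 FROM settings WHERE name = ?", (key,))
    if cur.fetchone():
        cur.execute("UPDATE settings SET value = ? WHERE name = ?", (value, key))
    else:
        cur.execute("INSERT INTO settings (name, value) VALUES (?, ?)", (key, value))
    conn.commit()


def save_setting_sqlite(conn, key, value):
    # Database-specific: one upsert statement, but only on databases
    # that understand ON CONFLICT ... DO UPDATE.
    conn.execute(
        "INSERT INTO settings (name, value) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET value = excluded.value",
        (key, value),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
```

Both functions leave the table in the same state. Which one to choose is exactly the flexibility-versus-optimization balance described here.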

In this particular example, I’m generally of the opinion that one should just write the code to a specific database application. In almost 14 years of Web development, I’ve never once switched underlying database applications on an existing site. But the main point is that you have to find a balance between writing optimal code and writing flexible code. Fortunately, most of the best practices you’ll see online and read about in books hit this mark pretty well.

Q&A => How do you write flexible code?

I forget who asked me this and when, but the question concerned what, exactly, flexible code is and how one writes it. There are two ways you can make your code flexible:

In the code itself
Using a good workflow

The first hallmark of flexible code is modularization: dividing code into discrete, independent blocks. At the beginner level, using multiple files is a modular approach. One file might connect to the database, another might be a configuration file, and another might contain useful functions. The primary files then include these as necessary.

Defining your own functions for certain routines is also a modular approach. With both of these examples, the benefit is that you can edit one piece of code in one location and have site-wide impacts. That’s flexible code.
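Here’s a minimal Python sketch of that idea. The price formatting lives in one function, format_price(), that every page uses. The two “page” functions are hypothetical stand-ins for separate scripts. On a real site, the helper would sit in its own file that each script includes or imports. If you decide to change the rounding or the currency display, you edit one function and every page follows.

```python
from decimal import Decimal, ROUND_HALF_UP


# In a real site, helpers like this would live in their own shared file.
def format_price(amount, symbol="$"):
    # Round to cents in one place (half-up, via Decimal, to avoid float
    # surprises) so every page displays prices identically.
    cents = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{symbol}{cents:,.2f}"


# Two hypothetical "pages" that both rely on the shared helper.
def product_page(name, price):
    return f"<h1>{name}</h1><p>Price: {format_price(price)}</p>"


def cart_summary(prices):
    return f"<p>Total: {format_price(sum(prices))}</p>"
```

For example, product_page("Widget", 1234.5) produces a price of “$1,234.50”, and the cart total uses exactly the same formatting.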

Second, create variables for values repeatedly used throughout a site. This could be the site’s name or URL, an email address, and so on. You might not think you’ll ever change a site’s URL, but if you do, having assigned that value to a variable that’s used throughout the site will make that change a snap.
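A minimal Python sketch of this approach follows. The constant names and values are placeholders. Links, titles, and the contact address are all built from values defined once, so moving the site to a new domain means editing one line.

```python
# Site-wide values defined once; names and values are placeholders.
SITE_NAME = "Example Site"
SITE_URL = "https://www.example.com"
ADMIN_EMAIL = "admin@example.com"


def absolute_url(path):
    # Build every link from SITE_URL, tolerating stray slashes on either side.
    return f"{SITE_URL.rstrip('/')}/{path.lstrip('/')}"


def page_title(title):
    return f"{title} | {SITE_NAME}"


def contact_footer():
    return f'Questions? Email <a href="mailto:{ADMIN_EMAIL}">{ADMIN_EMAIL}</a>'
```

In practice these would usually go in the configuration file mentioned earlier, so every script reads the same values.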

Third, and on the more advanced level, Object-Oriented Programming creates much more flexible code. It’s modular by definition. Sweeping changes can be made in single locations, including adding features. I’m not an “OOP is always better” kind of developer, but I don’t think there’s a question that OOP makes for more maintainable code when it comes to large sites. Well, good OOP, that is.
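As a small illustration of what “sweeping changes in a single location” can look like, here’s a minimal Python sketch. The Page, ArticlePage, and ProductPage classes are hypothetical. The shared layout lives in the base class’s render() method, and each page type supplies only its own content. Adding a site-wide feature, such as a new stylesheet link or an analytics snippet, means editing Page.render() alone. The sketch skips HTML escaping for brevity.

```python
class Page:
    """Base class: the shared layout lives here, once."""

    def __init__(self, title):
        self.title = title

    def content(self):
        # Each page type must supply its own body.
        raise NotImplementedError

    def render(self):
        # Change the layout (or add a site-wide feature) here, and every
        # subclass picks it up without being edited.
        return (f"<html><head><title>{self.title}</title></head>"
                f"<body>{self.content()}</body></html>")


class ArticlePage(Page):
    def __init__(self, title, body):
        super().__init__(title)
        self.body = body

    def content(self):
        return f"<article>{self.body}</article>"


class ProductPage(Page):
    def __init__(self, title, price):
        super().__init__(title)
        self.price = price

    def content(self):
        return f"<p>Price: ${self.price:.2f}</p>"
```

ArticlePage("Hello", "Hi there").render() and ProductPage("Widget", 9.5).render() produce different bodies wrapped in the same layout.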

You can also create flexible code by creating good workflows. In other words, flexibility isn’t just demonstrated by the code itself, but also by how you generate the code. For example, mastering your development tools is crucial. More specifically, if you start using technologies like CoffeeScript (a language that compiles to JavaScript) and Sass, LESS, or Compass (CSS preprocessors and frameworks), edits can be made in one location and reflected everywhere. I’ll talk a bit more about these later in this newsletter.

The final component of your workflow should be version control. There are lots of arguments for version control, and being able to quickly implement code changes is just one. Being able to immediately revert bad code changes is another! For large sites, the more advanced solution is continuous integration.

Again, much of what I’m talking about here is just a case of best practices: the kinds of things you should do whether or not your site would ever go big. The question is really how far you take any single approach.

Q&A => What do you think about Sass and Compass?

A thousand years ago or so, Jamie asked for my thoughts on Sass and Compass. Jamie said:

I’ve used Sass (SCSS) for a few projects, and Compass for one or two, and could never go back to vanilla CSS after doing so. I’m now looking forward to trying Sass’s “Sass” syntax in future projects.

At the time Jamie asked, I wasn’t using CSS at a level where I could appreciate these tools. And, to be frank, that’s still true. As a backend developer, I don’t do much with CSS, aside from the occasional edit of existing values to change a font’s size, a background color, or the height of a header. That’s changed somewhat thanks to Twitter Bootstrap. When I’m using it for a site, I end up doing more CSS edits, simply to customize the default Bootstrap look into something slightly more original (slightly). Bootstrap uses LESS, so I’ve started using LESS more myself (“LESS more” HA!). Coupled with CodeKit (for the Mac), it makes for a great workflow to quickly develop sites.

Personally, I still don’t have much need for most of the features of these CSS frameworks. But they are great and totally useful. If you’re curious, I’d start out with something basic like LESS. If you use CSS a lot, then you may want to consider Sass.

On the Web => CoffeeScript at Dropbox

CoffeeScript is a language used to simplify the writing of JavaScript. One writes code in CoffeeScript and then compiles it, with the resulting output being usable JavaScript. The arguments for using CoffeeScript are:

Less code to write
Easier to maintain
Results in better JavaScript

I personally hadn’t been sold on CoffeeScript until I read about how Dropbox used it to rewrite all of its JavaScript in a single week. Three employees performed that task, converting over 23,000 lines of JavaScript into 18,000+ lines of CoffeeScript without affecting any of the functionality.

Larry Ullman’s Book News => “The Yii Book”

There was not as much progress on “The Yii Book” last month as I would have liked, but I’m hoping to change that in May. I rearranged the table of contents somewhat; these changes came about organically as the book progressed. I did not drop the modules chapter, but I originally had one chapter on using extensions, one on modules (writing and using), and one on writing extensions. It just made more sense to have one chapter on using extensions (including modules) and one on writing extensions (including modules). So there’s that. I also changed the order of the other chapters in Part 3 a bit.

I’ve finally completed Chapter 15, “Internationalization,” and Chapter 16, “Leaving the Browser.” I’m sending those to the tech editors after this newsletter goes out, with an update to follow sometime thereafter.

It’s May, so I’m really bearing down to push through finishing this ASAP. I know I say this every time, but I really do appreciate your patience with this. You’ve been great. Truly. The writing itself hasn’t taken that much longer than expected, but everything else surrounding self-publishing has been a major drain on my time. I’ll write about all this when it’s behind me.

Also, in case you missed it, the first public preview of Yii 2 came out earlier this month. I’ve written about how I expect that to impact “The Yii Book” in this blog post.

Thanks to everyone for their interest in this book and to everyone who has already purchased a copy!
