The Death of the Data Center. Part 8 – Software Optimization

April 30th, 2009

Let’s start with the statement I made in the last posting:

Executing cpu instructions is what a data center does.

This is the point of the whole series of articles I’ve produced here. If you build a car factory it’s in order to produce cars (of a given quality) as economically as possible. If you build a power plant it’s in order to generate power as economically as possible. If you build a scaled out data center, it is in order to execute cpu instructions as economically as possible.

Building such assets is a risky business. If you’re in a competitive market of any kind you can only control the price to a limited degree.  After that, profitability depends on sales success and cost control.

So a company will build a massively scaled data center (or several) with the specific goal of keeping throughput costs as low as possible. Whether it’s Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS), the metric that is going to matter is the cost of executing each instruction. Other metrics matter too, namely the cost of managing each byte of data stored and the cost of each byte of information transmitted or received by the data center, but they are the same kind of metric.
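
To make those three metrics concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name DataCenterYear, the function unit_costs, the way costs are split between compute, storage and network, and all of the figures. It shows only the shape of the calculation, not real numbers from any data center.

```python
from dataclasses import dataclass


@dataclass
class DataCenterYear:
    """Hypothetical annual figures for one data center (all values are assumptions)."""
    compute_cost_usd: float       # share of annual cost attributed to executing instructions
    storage_cost_usd: float       # share attributed to managing stored data
    network_cost_usd: float       # share attributed to data transmitted or received
    instructions_executed: float  # total instructions executed in the year
    bytes_stored: float           # average bytes under management
    bytes_transferred: float      # total bytes in and out over the year


def unit_costs(year: DataCenterYear) -> dict:
    """Return the three unit-cost metrics in readable units."""
    trillion = 1e12
    gigabyte = 1e9
    return {
        "usd_per_trillion_instructions":
            year.compute_cost_usd / (year.instructions_executed / trillion),
        "usd_per_gb_stored":
            year.storage_cost_usd / (year.bytes_stored / gigabyte),
        "usd_per_gb_transferred":
            year.network_cost_usd / (year.bytes_transferred / gigabyte),
    }


if __name__ == "__main__":
    # Made-up figures, chosen only to keep the arithmetic easy to follow.
    example = DataCenterYear(
        compute_cost_usd=1_000_000,
        storage_cost_usd=500_000,
        network_cost_usd=200_000,
        instructions_executed=1e18,
        bytes_stored=1e15,
        bytes_transferred=4e15,
    )
    for name, value in unit_costs(example).items():
        print(f"{name}: {value:.4f}")
```

Any optimization that lowers the numerator (cost) or raises the denominator (work done) for the same service improves the metric, which is why the nuts and bolts matter.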

I’m deliberately framing this in technical terms, because if you are going to optimize the throughput of a data center, you need to look at every possible strategy and it starts with the nuts and bolts. In fact, it’s mostly about nuts and bolts. So I’ve created a nuts and bolts list of ten possible areas where you might have a technical axe to grind:

  1. The cpu chips: Why not have your own chips designed? The division between machine instructions on the chip and software (compiled instructions sent to the chip) evolved from the need for flexibility, but in the scaled up data center you may not care much for flexibility at all. If it’s a PaaS data center or a SaaS data center, why not burn the application (or some of it) onto the chip? It will go a lot faster if you do.
  2. The OS: Why not pay a team of software engineers to optimize the OS you’re using for your area of application? This is particularly easy if you use Linux or FreeBSD, because there are lots of software engineers around who know those environments. I discussed this a little in the last posting in respect of removing unnecessary bits. However, you may wish to go a little further than that, since each OS instance will be communicating with other instances across a vast network. You might like to include specific scale out capabilities.
  3. The App: Again for PaaS and SaaS, why not optimize the apps too? They might be written in an inefficient programming language; well, rewrite them in an efficient one. It might even be possible to tinker with the compiler, or to isolate routines that would be done better in Assembler and have someone translate them to a lower level. Remember that you’re building for a specific target environment.
  4. Real 64 bit architecture. A 64 bit address space is very big (2^64 bytes, which is 16 exbibytes or roughly 18 exabytes). It can certainly directly address all the memory on 50,000 or 100,000 servers, plus a vast array of solid state memory too. We don’t have OSes that are built to do that, but if you are already writing the software at a low level, then why not write the OS to do some direct addressing across the whole data center – including through the network switches? This will save an immense number of cpu cycles.
  5. Compress the Data: Data compression saves space, but usually at the expense of a few cpu cycles, so there’s a trade-off to calculate here. Depending on circumstances you might be able to build a really impressive compression capability. There might even be the possibility of using solid state storage rather than hard disk. Solid state uses less electricity. The whole equation here may be very complex and is highly application dependent.
  6. Data Architecture. You can go with any of the free databases (MySQL, PostgreSQL, Firebird, etc.), but you may be inclined to use your own data structures. All the commonly used databases, whether free or paid for, were, yet again, built for the average circumstance. You have the opportunity to build a data service for a specific circumstance. Remember, for example, that Google uses MapReduce for some of its data. If all the data is going to be held in addressable solid state storage, then any existing database is definitely the wrong product and will be horribly inefficient, because it assumes the data comes from a spinning disk. An in-memory structure will be far more efficient.
  7. Back-up, Disaster Recovery, Follow the Sun: The 3 data center strategy with mirroring (which I spoke of in The Death of the Data Center (Part 4 – Power Distribution and Cooling)) looks promising as a means of never needing to have back-ups (or UPS). It looks really inexpensive for disaster recovery, and if you have a global business you may even be able to load balance across 3 data centers in the US, Europe and the Far East. In many instances the workload will follow the sun.
  8. System Management: There’s a potentially big win in the system management area. Part of root cause analysis and maintenance can be carried out preemptively by service management processors. Because you control the OS you can also insert an agent in every instance of the OS – an agent that gathers the data you need (and no other data). You will end up with a purpose built Configuration Management Database (CMDB) that actually works. You can ensure purity of software across the data center and you can also choose to upgrade the whole environment only every 18 months, say.
  9. Security: You can bolt the whole data center down with a set of closely controlled permissions. The main worry is that someone (external or internal) manages to run a rogue process that does something nefarious. But if there’s only one closely controlled mechanism for loading any executable, that’s never going to happen. You can even design that into the OS and the system management capabilities. Security is always an afterthought in corporate IT, but it’s actually possible to write it into the application.
  10. Client caching: Put as much processing as possible on the customer’s client device (PC, Mac or even smartphone) so that it doesn’t get executed in the data center. Naturally you will put the interface on the client, but you can certainly design the architecture so that it makes maximum use of the client. The beauty of this is that the customer probably won’t mind, because most client devices have cpu cycles to spare.
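
The compression trade-off in item 5 is one of the few on this list you can put rough numbers on straight away. The following is a minimal sketch, not a benchmark: it uses zlib from the Python standard library, and the function names compression_tradeoff and worth_compressing, the sample data and the two prices are all hypothetical. It also ignores the cpu cost of decompressing the data each time it is read, which a real calculation would have to include.

```python
import time
import zlib


def compression_tradeoff(data: bytes, level: int = 6) -> dict:
    """Measure the space saved and the cpu time spent compressing one payload."""
    if not data:
        raise ValueError("data must not be empty")
    start = time.process_time()          # cpu time, not wall-clock time
    compressed = zlib.compress(data, level)
    cpu_seconds = time.process_time() - start
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round trip failed")
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(data),
        "cpu_seconds": cpu_seconds,
    }


def worth_compressing(stats: dict,
                      usd_per_cpu_second: float,
                      usd_per_byte_stored: float) -> bool:
    """True when the storage saving outweighs the cpu spent (both prices are assumptions)."""
    saved = (stats["original_bytes"] - stats["compressed_bytes"]) * usd_per_byte_stored
    spent = stats["cpu_seconds"] * usd_per_cpu_second
    return saved > spent


if __name__ == "__main__":
    # Highly repetitive sample data; real application data will compress less well.
    sample = b"GET /index.html HTTP/1.1\r\n" * 10_000
    stats = compression_tradeoff(sample)
    print(stats)
    # Hypothetical prices, for illustration only.
    print(worth_compressing(stats, usd_per_cpu_second=1e-5, usd_per_byte_stored=1e-10))
```

The answer flips depending on the prices and on how compressible the data is, which is exactly why the equation is so application dependent.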

I have not compiled this list with the idea that all scaled out data centers will, in the future, adopt all these technical tactics. All I’m demonstrating here is that there are many areas of attack where the designers of a data center can reduce the number of cpu cycles required to carry out recurring tasks. The motivation to do this will be very high. In the last 20 years we’ve watched cpus grow increasingly powerful and seen their power squandered by programmers who no longer cared to write efficient applications.

Well, the motivation for efficiency has now returned.

Also:

The Death of the Data Center: The Model
The Death of the Data Center: Location, Location, Location
The Death of the Data Center: Power
The Death of the Data Center: Cooling
The Death of the Data Center: Networking
The Death of the Data Center: Server Hardware
The Death of the Data Center: The Software
The Death of the Data Center: Software Optimization
