Trends & Tech

A Terrible Thing to Waste

"Waste not, want not" might be an old saying, but its application to CPUs will help us make the most of processing.

by Paul Rolich

We live in an age and a world where squandering of natural resources is the standard. Fossil fuel will be depleted before the next millennium; the polar ice caps will have melted by that time; we have littered this planet from one end to the other. During my time at sea in the Navy, I rarely was able to view an ocean unscarred by floating man-made detritus. There is no need to carry that same philosophy of abuse into the world of data processing. I like my solutions to be clean and elegant. Wasting hardware in applications where it is ill used just goes against my grain. It doesn't make any difference whether you are running the IT shop for a billion-dollar insurance carrier or a midsize agency–there always are efficiencies to be gained by judicious use of computer hardware. Moore's Law may imply we always will have a new pool of computing resources to play in, but that doesn't mean you always need the biggest, newest pool.

The current state of computer hardware is a sliding window with ever-changing parameters. I remember (painfully) paying $750 of my own money not too long ago for a 16 MB stick of RAM. I spent many days thinking about that expense before I took the plunge. For that much money today, I can purchase a killer off-the-shelf system with 32 times as much RAM–and that is not even using inflation-adjusted dollars. The relatively low cost of hardware has created a generation of lazy programmers and bloated software, both rarely concerned about optimization. Yet that overpriced, RAM-loving office suite you are so dependent on, even as it slows down everything else on your computer, is not managing to use all the CPU cycles available to it. Even on hard-running 24/7 servers, it is estimated that average real CPU utilization is somewhere around 30 percent. What about that other 70 percent? Is there any way we can get at that resource? Sure there is, but it will take a lot of hard work and cooperation among CPU makers, hardware manufacturers, OS vendors, and software developers. Let's take a quick look at ways we can optimize CPU use right now.

Big Machines

I love commuting. I speed but constantly am being passed by monster trucks that clearly are not designed for highways or commuting. (Don't let the leather and the chrome fancies fool you.) They come blasting by at 85 mph with huge off-road tires whining in agony; 6+ liter V8s running at 4500 RPM because they are not geared for highway use; air conditioners struggling to drive off the engine heat; and wind buffeting everywhere from the nonaerodynamic bodies (the truck bodies, not the drivers). What is our problem? Why do we want to buy big, powerful machines and then not use them for the purpose for which they were intended? We do the same things with computers. Do you really need a 3.1 GHz, 500 MB RAM, MT-processor machine to send e-mail and run the occasional spreadsheet? Look at it from a business point of view. Suppose every June your claims department handled double the number of claims that it does any other month. Would you double the claims staff so that it can work efficiently in June and then look for work the rest of the year? Probably not–that is, not unless the bottom line is unimportant to you.

Multitasking

I used to run a sort routine on an IBM XT with a few megs of RAM using Lotus 1-2-3 v. 2.0. There were about 30K line items, and the sort would take all day. There was not even a hint of multitasking. All that poor machine did all day was compare and move, over and over and over again. The processor probably was idle most of that time. Moving all that data around was the real resource killer in those days–slow, minuscule RAM work spaces, almost nonexistent cache, and no VM made for very slow data-intensive processing. On the other hand, that machine was a killer when recalculating spreadsheets–it was darn good at floating-point math.

Cooperation?

Enter Windows 3.x and early versions of the Mac OS, and we enter the world of cooperative multitasking. The theory is that an individual program or process will run for a while, check the queue to see who else is in line, and then relinquish its spot. All running programs must cooperate and agree to share processor time. OK, now we all know how that works. Cooperation in a queue is not an innate human quality. You see the sign: "Right Lane Closed 1/2 Mile Ahead–MERGE LEFT NOW." That is a signal for all those pickup drivers we saw earlier to dash into the right lane and pass everyone already in line, clogging up the whole mess. Same thing goes for software developers. Who in their right mind would release software that willingly would give up CPU time to another program? Of course you had to do it, but that didn't mean you had to do it fairly.
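
The cooperative scheme can be sketched in a few lines. This toy scheduler (the task names and step counts are invented for the illustration) hands the CPU to each task only when the previous one politely yields:

```python
# A toy cooperative scheduler, in the spirit of Windows 3.x: each task runs
# until it voluntarily yields, and a misbehaving task could starve the rest.

def make_task(name, steps):
    """A 'program' that does a bit of work, then yields control."""
    def task():
        for i in range(steps):
            yield f"{name} step {i}"   # the polite moment of cooperation
    return task()

def run_cooperative(tasks):
    """Round-robin over tasks; each gets the CPU only when others yield."""
    log = []
    while tasks:
        task = tasks.pop(0)
        try:
            log.append(next(task))     # let the task run one slice
            tasks.append(task)         # send it to the back of the queue
        except StopIteration:
            pass                       # task finished; drop it
    return log

log = run_cooperative([make_task("A", 2), make_task("B", 2)])
# The tasks interleave only because each one yields: A, B, A, B
```

A task that simply never yields inside its loop would freeze every other task in the queue, which is exactly why a single hung program could lock up a cooperative OS.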

A new generation of OSs introduced preemptive multitasking. Windows 95, OS/2, UNIX, and later versions of the Mac OS gave the power to the operating system. The operating system would assign each running process a slice of CPU time. The running process did not need to have any knowledge of any other process running on the machine. As far as it is concerned, it has sole access to the CPU, RAM, VM, hardware devices, etc. This is pretty cool for software vendors. They don't need to be concerned with multitasking at all, as it is totally transparent. In fact, it even allowed developers to spawn multiple processes or threads at the same time from a running program. I can create two processes to handle a particularly time-consuming task, and the OS may give my program more CPU time because it sees two processes from my program instead of one. This does not guarantee efficiency–it only guarantees your program may get more time that it can squander inefficiently.
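
That transparency is easy to see from code. In the sketch below (standard library only; the worker function is an invented stand-in for a time-consuming task), neither thread checks a queue or volunteers the CPU–the OS preempts and resumes them on its own:

```python
# A minimal sketch of preemptive scheduling from the program's point of view.
# crunch() is an invented stand-in for real work; the threads never cooperate
# explicitly -- the operating system time-slices them.
import threading

def crunch(label, results):
    # Some CPU-bound busywork standing in for real processing.
    results[label] = sum(i * i for i in range(100_000))

results = {}
threads = [threading.Thread(target=crunch, args=(name, results))
           for name in ("worker-1", "worker-2")]
for t in threads:
    t.start()   # hand the thread over to the OS scheduler
for t in threads:
    t.join()    # wait for both time-sliced workers to finish
```

One caveat for this particular sketch: CPython's global interpreter lock keeps these two threads from executing Python bytecode truly in parallel, so for genuinely CPU-bound work you would spawn separate processes instead.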

Swapping out processes is not just a matter of slipping in and out of a queue. Each process has an entire "state" or "context" it runs in (all those things we talked about, like memory, CPU state, etc.) that must be saved and then restored when that process gets a new time slice. Preemptive multitasking gives the impression many processes are running on the same box at once, since we have fast machines with lots of RAM and good VM, but it still is a very inefficient process. A typical CPU can execute three instructions per cycle–something that rarely occurs. In fact, all available CPU cycles rarely are used. There simply are too many roadblocks on a preemptive multitasking machine to get enough instructions to that CPU queue. Think of the queue as a series of little bins (three abreast) on a rapidly moving conveyor belt. A single supervisor or controller fills the bins with CPU instructions as they flow by. A fully utilized CPU would have all the bins filled all the time.
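
The conveyor-belt picture can be put into toy numbers. Assuming a three-wide issue queue, utilization is just the fraction of bins the supervisor manages to fill each cycle; the instruction stream below is invented for the illustration:

```python
# Toy model of the conveyor belt: three bins pass by each cycle, and the
# front end delivers some number of instructions (often fewer than three,
# sometimes none at all during a stall).
ISSUE_WIDTH = 3

def utilization(instructions_per_cycle):
    """Fraction of issue slots actually filled over the run."""
    filled = sum(min(n, ISSUE_WIDTH) for n in instructions_per_cycle)
    return filled / (ISSUE_WIDTH * len(instructions_per_cycle))

# Stalls (the zeros) and narrow issue leave most bins empty:
u = utilization([3, 1, 0, 2, 0, 1])   # 7 of 18 slots filled
```

Even this cartoon shows why "three instructions per cycle" is a ceiling, not an average: a few stalled cycles drag the whole run well below full utilization.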

More CPUs!!!

Machines still were not running fast enough to meet the ever-increasing demands, so Symmetric Multiprocessing (SMP) machines came to the rescue. An SMP machine has multiple processors, and any idle processor can be used to run any process. That means multithreaded apps now really can execute quicker. I can spawn multiple threads for a processor-intensive application, and each thread will be able to run independently. All we have done is thrown another engine into the mix, though. Each processor on an SMP machine still is restricted and throttled by the operating system as it preemptively schedules processes out to the individual CPUs. Using our conveyor belt analogy, each individual belt still is feeding mostly empty bins to its CPU. So, now we have a faster machine running multiple inefficiently used processors.
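
Spawning one chunk of work per processor looks something like this. It is a minimal sketch using Python's standard multiprocessing module; the sum-of-squares workload and the two-way split are invented for the example, and the "fork" start method assumes a Unix-like host:

```python
# On an SMP box the OS can place each spawned process on a different CPU,
# so a CPU-bound job really can finish sooner.
import multiprocessing as mp

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=2):
    step = -(-n // workers)                 # ceiling division: chunk size
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    ctx = mp.get_context("fork")            # assumption: Unix-like system
    with ctx.Pool(workers) as pool:
        # One chunk per worker process; the OS spreads them across CPUs.
        return sum(pool.map(partial_sum, chunks))
```

The speedup is real only when the chunks are genuinely CPU-bound; for small jobs, the cost of creating the processes swamps any gain, which is the article's point about throwing engines at the problem.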

Both Intel with its Xeon processors and IBM with the PowerPC G5 chip introduced a concept called thread-level parallelism (TLP) on a single processor. Called Hyper-Threading by Intel and Simultaneous Multithreading by IBM, it is a form of simultaneous multithreading (SMT) technology that allows multiple processes to run simultaneously on one processor. Taking the conveyor belt one step further, we can imagine two supervisors dumping instructions into the bins instead of one. Thus, the heart of the CPU–the execution unit–is able to achieve greater utilization. This is accomplished by sharing some resources on the processor and replicating others. The expensive core parts of the processor–the execution units and caches–are shared between the two threads, while the architectural state (registers and instruction pointers) is replicated so each thread appears to have its own processor. The remaining bits (queues, buffers, etc.) are either shared or partitioned. The shared and replicated resources work independently, feeding instructions to the working bits.

This was and is a very nice enhancement to processor efficiency, but it still doesn't make for 100 percent utilization. Intel admits to maybe a 30 percent increase in efficiency. Scale that back a bit for reality, and it still is a nice little chunk of power to grab from a single CPU. Unfortunately, very little software is written to take advantage of SMT. In the first place, most software isn't even aware it is running in an SMT environment instead of an SMP environment. In a multiprocessor world, it makes sense to spawn multiple floating-point-intensive processes because they will run on separate processors. In a hyperthreaded world, it would not make sense, because two threads hammering the same scarce execution resource simultaneously on one CPU may be less efficient than a single thread. Enabling software to take full advantage of SMT machines means dropping down to assembly code to query the processor directly and determine just what it is capable of doing. The machine BIOS hides a lot of information from the OS, so it is likely your application may not be able to distinguish an SMP box from an SMT box. Of course, most of us are running multiple-processor hyperthreaded servers these days, which adds to the confusion.
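
You don't always need assembly to see the distinction, though. On Linux, /proc/cpuinfo reports both the logical processors and the physical package and core each belongs to. A sketch that parses a fabricated two-logical-CPU, one-core excerpt of that file:

```python
# Count logical vs. physical CPUs from /proc/cpuinfo-style text. The sample
# below is fabricated for the example: two logical processors that share one
# physical core -- the signature of an SMT (hyperthreaded) chip.
SAMPLE_CPUINFO = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 0
"""

def count_cpus(cpuinfo_text):
    logical = 0
    physical = set()
    for block in cpuinfo_text.strip().split("\n\n"):   # one block per logical CPU
        fields = dict(line.split(":", 1) for line in block.splitlines())
        fields = {k.strip(): v.strip() for k, v in fields.items()}
        logical += 1
        physical.add((fields["physical id"], fields["core id"]))
    return logical, len(physical)

logical, physical = count_cpus(SAMPLE_CPUINFO)
# Two logical processors on one physical core: SMT, not SMP
```

Two distinct core IDs instead would indicate an SMP (or multicore) box, which is exactly the difference a thread-spawning strategy should care about.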

Enter the Dragon

The latest and greatest from our friends at Intel is Hyper-Threaded dual-core technology. A dual-core processor is a single physical package that contains two microprocessors. These beasts will share some resources, such as high-level cache. So now we need to write software that is optimized to use simultaneous multiple threads on a single CPU as well as to use multiple processors that share some common resources. Then we will have multiple dual-core processor machines. I get freaked out now when I look at Performance Monitor. I don't know if I could handle that.

I should have been a pair of ragged claws scuttling across the floors of silent seas.

OK, maybe I am a nut. Why should I care about efficient use of CPUs or RAM or anything else? After all, it's not my money, and it isn't even expensive–buy the biggest and best box you can and load it up. Nobody seems to care about global warming, so that would push computer efficiency way down the list of things we should worry about. Maybe I care because I have been around computers for a long time. I remember reading Claude Shannon's works on information theory while I was cranking out assembly code on an IBM mainframe and being struck by the beautiful simplicity of computers and data processing. Unlike most science, there is not just a better but a best way to do things, and I think we have lost sight of that elegance.
