Last year I saw a picture of a distinguished-looking man with gray hair and a beard. There was a caption that read: “I don't always test my code, but when I do, I do it in production.” At the time I had no idea that this was a well-known image of an advertising persona called the “most interesting man in the world.” I guess I don't watch enough television—or drink enough beer.

I do remember that I was slightly amused and slightly disgusted. In my experience there have been far too many times when an application was not thoroughly tested before being promoted to production. Those applications often failed and had to be rolled back for remediation and proper testing. No real application team would test in production, right?

Ummm… Wrong.

In fact we test in production all the time. And we do it intentionally. Even well-disciplined, well-managed application development teams must rely on a production release to fully vet and test their code. I am not talking about unit testing, or functionality testing, or integration testing, or system testing, or even performance testing.

We still need to do all those things, and if we do them correctly we won't have any bugs or code defects when we go to production. But it is impossible to validate the actual performance of a complex system in a lower life cycle. The key word here is test. You can test and retest, but all you are really doing is validating the performance and functionality of the application in a testing environment.

Performance Testing

I was once involved in rolling out a large intranet. The production infrastructure was replicated server-for-server in an environment for performance testing. We knew exactly what our anticipated load would be. We knew when the load would ramp up and ramp down. We knew the transactions that would be performed, so we created test scripts and scenarios to replicate that load.

We created 25K test users to run those scripts. After six weeks of performance testing we gave up. We were never able to perform repeatable performance tests to validate our required SLAs. Some tests were fine; other identical tests were not. We tweaked the test scripts. We ran tests at different times of the day. We rewrote the test scripts. We used different machines to generate the loads. We validated that there were no code defects or bugs.

The system design was more than adequate. It was sized to manage 100 percent of peak annual load at 50 percent utilization. The decision was finally made to move the portal to production. We called it a pilot. Traffic was gradually moved to the new production environment. The production farm ran as designed at loads we were never able to sustain in the performance environment. Two identical farms produced different results under load.

Huh?

So what happened? How can two farms that are identical perform differently? The answer is obvious: The performance testing infrastructure and the production infrastructure were not really identical; they just looked that way. On paper everything matched, but in reality we had a lot of differences. Most of the servers were virtualized—so that while they had identical specifications we were never 100 percent certain that the physical hosts and storage matched.

The database servers were indistinguishable physical machines, but the storage was never validated to be identical. The load balancers were not exactly the same. The NICs were not exactly the same. The storage virtualization fabric was not exactly the same. In fact, except for the server specifications, nothing really matched production.

In retrospect we now know that we can use that environment for performance testing, but only to establish a reference benchmark. We know the metrics on the production farm at 1,000 transactions per second. We are able to load the test farm until those same metrics are achieved.

Testing new code in the test environment now provides something we can extrapolate to predict expected results in production. So something like 350 transactions per second in the test farm equals a load of 1,000 transactions per second in production. It is not the best way to test, but it provides some comfort level.
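
That extrapolation is simple arithmetic, but writing it down keeps everyone honest. A minimal sketch in Python, using the hypothetical benchmark figures above (350 transactions per second in the test farm matching the metrics seen at 1,000 in production):

    # Sketch: extrapolate a test-farm throughput number to an expected
    # production figure using a previously established benchmark ratio.
    # The figures below are the hypothetical numbers from the article.

    TEST_FARM_BENCHMARK_TPS = 350.0    # test farm load at the reference metrics
    PRODUCTION_BENCHMARK_TPS = 1000.0  # production load at the same metrics

    def projected_production_tps(observed_test_tps: float) -> float:
        """Scale an observed test-farm throughput by the benchmark ratio."""
        return observed_test_tps * (PRODUCTION_BENCHMARK_TPS / TEST_FARM_BENCHMARK_TPS)

    # Example: new code sustains 280 transactions per second in the test farm,
    # which extrapolates to roughly 800 in production.
    print(round(projected_production_tps(280.0)))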

Test Scripts

Even if the environments were truly identical, we are still stuck with the limitations of test scripts. Test scripts are able to create load, but they are never able to duplicate the kind of load that occurs when the end user is a human being. I can structure a test script in such a way that I can almost guarantee good results. I can also create scripts that will almost certainly bring the farm down in a minute.

What I can't do is replicate human-generated site traffic. Maybe you have had better luck than I have, but I have yet to find an algorithm that can manage load test scripts to simulate actual use. That is why I generally design websites to handle so many transactions per second. I can then create synthetic events to generate however many transactions I need to validate the design.
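
To make "synthetic events" concrete, here is a minimal sketch of that kind of load generator in Python. The URL, target rate, and duration are hypothetical placeholders, and a real effort would use a proper load-testing tool; the point is only that a fixed, designed transaction rate is easy to produce, while realistic human traffic is not.

    # Minimal synthetic load generator: fire a fixed number of transactions
    # per second at a target URL and report crude latency numbers.
    # The endpoint and rate below are hypothetical placeholders.
    import threading
    import time
    import urllib.request

    TARGET_URL = "http://example.com/app/transaction"  # hypothetical endpoint
    TARGET_TPS = 50           # designed transactions per second to validate
    DURATION_SECONDS = 60     # how long to sustain the load

    latencies = []
    lock = threading.Lock()

    def one_transaction():
        start = time.time()
        try:
            urllib.request.urlopen(TARGET_URL, timeout=10).read()
        except Exception:
            pass              # a real test would count and report failures
        with lock:
            latencies.append(time.time() - start)

    for _ in range(DURATION_SECONDS):
        for _ in range(TARGET_TPS):
            threading.Thread(target=one_transaction, daemon=True).start()
        time.sleep(1)         # crude pacing: one batch of requests per second

    time.sleep(15)            # allow in-flight requests to finish
    if latencies:
        print(f"{len(latencies)} transactions, "
              f"average {sum(latencies) / len(latencies):.3f}s")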

Data

Then there is data. You simply don't have the same data in your lower life cycles that you do in production. Test data is not real data. It may look like real data, but it isn't. Production data is sometimes replicated for pre-production or staging, but never for testing. Scary IT campfire stories abound of the fate that befell organizations (or CIOs) that mixed production data into test systems. Too many things can go wrong to ever consider that option.

Even if you completely isolate your test life cycles so that they can never find their way past their firewalls, using production data in test is not a good idea. Software testing is often vended out or handled by contractors or offshore teams. You cannot move PCI or regulated data from a secure production environment to a test environment that is surrounded by less rigor, compliance, and security.

Integration

Even bad software developers and teams know that they must test integration with other applications and data sources before deeming their application production-ready. But successful integration with other test environments does not guarantee final success. There is so much more to rolling out good applications than writing code and integrating.

How many times have you seen a team stumble when all the unit testing and integration is done and it's time for production? Little things like developer accounts embedded in connection strings creep out of the woodwork. All those little tricks that the development team hacked together to get the application running come back to haunt them.

We all know that properly designed applications use parameters that can be defined at run time so that they can run in any supported environment. We also know that these things are (almost) always put off in the initial rush to get something working. Ideally this would have all been detected in the progression through the development, quality assurance, and performance life cycles, but none of those environments are in the same domain as production, so all bets are off. That is why you never let your development team install and configure code releases through the testing environments. Dedicated build teams discover these things early on and prevent developers from hacking an application or environment to make a delivery date.
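
As a sketch of what run-time parameters look like in practice: instead of a developer account baked into a connection string, the application reads environment-specific values at startup, so the same build runs unchanged in every supported environment. The variable names here are hypothetical.

    # Sketch: build the database connection string from the environment at
    # run time instead of hard-coding a developer account in the source.
    # The variable names are hypothetical placeholders.
    import os

    def build_connection_string() -> str:
        host = os.environ["APP_DB_HOST"]          # differs per environment
        name = os.environ["APP_DB_NAME"]
        user = os.environ["APP_DB_USER"]          # never a personal developer account
        password = os.environ["APP_DB_PASSWORD"]  # injected by the build/release team
        return f"Server={host};Database={name};User Id={user};Password={password};"

    # The same artifact can then be promoted from development to QA to
    # production; only the environment variables change.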

Pilot

And that is why we pilot—even if all of our prior testing was flawless; even if we have zero defects; even if we have passed all test cases with flying colors. The production world with real users and real data is a new world. A well-thought-out pilot process does a number of things. First, it allows the user community to adapt to the new application. Too often the folks who sign off on UAT aren't the same ones that use the application day in and day out. It also provides a rigorous workout of the application.

It is not possible to posit enough test cases to cover all the edge cases that actual users will create. Those edge cases will eventually prove the value of the application. Pilot is also the first real opportunity to fully test all those integrations we thought were working in pre-production. The customer master database in pre-production had 2 million dummy records. The real customer master has 150 million actual records (some of which you strongly suspect are bogus). A complex transaction that took a fraction of a second in pre-production is now running for a second and a half. Pilot allows us to identify these bottlenecks.

Infrastructure

Pilot is also the first real opportunity you have to test your environment. When you first built out your production infrastructure you specified everything you needed, including the IOPS for your database storage as well as the number and capacity of LUNs. Your servers were delivered with the requested number of cores and RAM.

Are you sure you really were provided what was specified? Data centers use virtualization managers for everything from server to storage. You may have had 2,000 IOPS for LUN 22 on day 1. What do you have on day 90? Or day 900? Virtualization allows optimal use of all available assets, but it also allows for overutilization of available assets.

On day 1 your application may have been the only one using the physical storage. On day 90 you may now be sharing that storage with a digital asset management application. The application team needs to check what their actual throughput and response times are. Run low-level tests from your servers to the storage and back.
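
As a crude illustration of that kind of low-level check, the sketch below writes and reads a large file on the LUN in question and reports the throughput it actually gets today, so the numbers can be compared with what was recorded on day 1. The mount point and file size are hypothetical placeholders, and a real exercise would use a dedicated I/O benchmarking tool; note too that the read pass may be served partly from cache.

    # Crude sequential write/read throughput probe against a mounted LUN.
    # The path and size are hypothetical; a real test would use a dedicated
    # I/O benchmarking tool, and the read pass may be served partly from cache.
    import os
    import time

    TEST_FILE = "/mnt/lun22/throughput_probe.bin"  # hypothetical mount point
    SIZE_MB = 512
    CHUNK = b"\0" * (1024 * 1024)                  # 1 MB per write

    start = time.time()
    with open(TEST_FILE, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())                       # force the data out to storage
    write_secs = time.time() - start

    start = time.time()
    with open(TEST_FILE, "rb") as f:
        while f.read(1024 * 1024):
            pass
    read_secs = time.time() - start

    os.remove(TEST_FILE)
    print(f"write {SIZE_MB / write_secs:.0f} MB/s, read {SIZE_MB / read_secs:.0f} MB/s")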

The same rules apply to virtualized servers. All physical hosts are not created equal, and not all hypervisors are managed properly. Do you really have access to all eight cores 24×7? And what do those cores look like traced back to the physical device? Were you actually allocated a single processor with four hyper-threaded cores? Are you consuming half of a hyper-thread from eight different processors? These configurations are not equal.

Application owners must benchmark machine metrics when they are confident they are being provided maximum resources from the various virtualized fabrics. And they need to repeat those benchmark tests throughout the life of the application.
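
A trivial example of a repeatable benchmark of that kind: time a fixed CPU-bound workload and log the result, so drift in the underlying virtual hardware shows up when the same test is run months later. Everything in the sketch is a hypothetical placeholder; the point is only that the identical test is run the same way every time.

    # Trivial repeatable CPU benchmark: time a fixed workload and append the
    # result to a log so changes in the underlying hardware become visible.
    import platform
    import time

    def fixed_workload() -> float:
        start = time.perf_counter()
        total = 0
        for i in range(10_000_000):   # identical work on every run, by design
            total += i * i
        return time.perf_counter() - start

    elapsed = fixed_workload()
    record = f"{time.strftime('%Y-%m-%d %H:%M:%S')} {platform.node()} {elapsed:.3f}s"
    with open("cpu_benchmark.log", "a") as log:   # hypothetical log location
        log.write(record + "\n")
    print(record)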

So do we test in production? Of course we do. But we are looking for different defects in our production testing. You still need to do all the necessary work in lower life cycles and only promote code that has been rigorously tested and is defect-free. Code that is promoted to production must be fully functional and have been proven through testing to satisfy all business requirements. We aren't expecting to find any surprises or discover hidden defects in production, but we do need this final validation that it runs just as well on busy interstates and crowded city streets as it did on the test track.

Please address comments, complaints, and suggestions to the author at [email protected].
