Last year I saw a picture of a distinguished-looking man with gray hair and a beard. There was a caption that read: "I don't always test my code, but when I do, I do it in production." At the time I had no idea that this was a well-known image of an advertising persona called the "most interesting man in the world." I guess I don't watch enough television, or drink enough beer.

I remember I was slightly amused and slightly disgusted. In my experience there have been too many times when an application was not thoroughly tested before being promoted to production. Those applications often failed and had to be rolled back for remediation and re-testing. No application team would test in production, right?

Ummm… Wrong.

In fact, we test in production all the time. Even well-disciplined, well-managed application development teams must rely on a production release to fully vet and test their code. I am not talking about unit testing, functionality testing, integration testing, system testing, or even performance testing.

We still need to do all those things, and if we do them correctly we won't have any bugs or code defects when we go to production. But it is impossible to validate the actual performance of a complex system in a lower life cycle. The key word here is test. You can test and retest, but all you are really doing is validating the performance and functionality of the application in a testing environment.

Performance Testing

I was once involved in rolling out a large intranet. The production infrastructure was replicated server-for-server in an environment for performance testing. We knew exactly what our anticipated load would be. We knew when the load would ramp up and ramp down. We knew the transactions that would be performed, so we created test scripts and scenarios to replicate that load.

We created 25K test users to run those scripts. After six weeks of performance testing we gave up. We were never able to perform repeatable performance tests to validate our required SLAs. Some tests were fine; other identical tests were not. We tweaked the test scripts. We ran tests at different times of the day. We rewrote the test scripts. We used different machines to generate the loads. We validated that there were no code defects or bugs.

The system design was more than adequate. It was sized to manage 100 percent of peak annual load at 50 percent utilization. The decision was finally made to move the portal to production. We called it a pilot. Traffic was gradually moved to the new production environment. The production farm ran as designed at loads we were never able to sustain in the performance environment. Two identical farms produced different results under load.

Huh?

So what happened? How can two identical farms perform differently? The answer is obvious: the performance testing infrastructure and the production infrastructure were not really identical; they just looked that way. On paper everything matched, but in reality we had a lot of differences. Most of the servers were virtualized, so while they had identical specifications we were never 100 percent certain that the physical hosts and storage matched.

The database servers were indistinguishable physical machines, but the storage was never validated to be identical. The load balancers were not exactly the same. The NICs were not exactly the same. The storage virtualization fabric was not exactly the same. In fact, except for the server specifications, nothing really matched production.

In retrospect we now know that we can use that environment for performance testing, but only to establish a reference benchmark. We know the metrics on the production farm at 1,000 transactions per second. We are able to load the test farm until those same metrics are achieved.

Testing new code in the test environment now provides something we can extrapolate to expected results in production. So something like 350 transactions per second in the test farm equates to a load of 1,000 transactions per second in production. Not the best way to test, but it provides some comfort level.
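
To make that extrapolation concrete, here is a minimal sketch using the hypothetical benchmark ratio above (350 TPS in the test farm matching the metrics of 1,000 TPS in production); the numbers and names are illustrative, not from any real tooling:

```python
# A minimal extrapolation sketch. The benchmark numbers are the
# hypothetical ones from the text: 350 TPS on the test farm produced
# the same resource metrics as 1,000 TPS in production.
TEST_BENCHMARK_TPS = 350
PROD_BENCHMARK_TPS = 1000

def projected_production_tps(observed_test_tps: float) -> float:
    """Scale an observed test-farm throughput to its production equivalent."""
    return observed_test_tps * (PROD_BENCHMARK_TPS / TEST_BENCHMARK_TPS)

# Example: new code sustains 280 TPS on the test farm before metrics degrade,
# which projects to roughly 800 TPS in production.
print(f"Projected production capacity: {projected_production_tps(280):.0f} TPS")
```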

Test Scripts

Even if the environments were truly identical, we are still stuck with the limitations of test scripts. Test scripts are able to create load, but they are never able to duplicate the kind of load that occurs when the end user is a human being. I can structure a test script in such a way that I can almost guarantee good results. I can also create scripts that will almost certainly bring the farm down in a minute.

What I can't do is replicate human-generated site traffic. Maybe you have had better luck than I have, but I have yet to find an algorithm that can manage load-test scripts to simulate actual use. That is why I generally design websites to handle a specified number of transactions per second. I can then create synthetic events to generate however many transactions I need to validate the design.
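
For illustration only, a synthetic event generator can be as simple as a loop that fires a target number of requests each second; the URL, rate, and duration below are hypothetical, and dedicated tools such as JMeter or Locust do this with far more control:

```python
# Illustrative synthetic load generator: fires roughly TARGET_TPS
# requests per second against a hypothetical endpoint.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_TPS = 50                          # assumed design target
URL = "https://example.com/api/health"   # hypothetical endpoint
DURATION_SECONDS = 60

def hit(url: str) -> int:
    """Issue one request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status

with ThreadPoolExecutor(max_workers=TARGET_TPS) as pool:
    deadline = time.monotonic() + DURATION_SECONDS
    while time.monotonic() < deadline:
        tick = time.monotonic()
        for _ in range(TARGET_TPS):
            pool.submit(hit, URL)
        # Sleep out the remainder of the one-second window.
        time.sleep(max(0.0, 1.0 - (time.monotonic() - tick)))
```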

Data

Then there is data. You simply don't have the same data in your lower life cycles that you do in production. Test data is not real data. It may look like real data, but it isn't. Production data is sometimes replicated for pre-production or staging, but never for testing. Scary IT campfire stories abound about the fate that befell organizations (or CIOs) that mixed production data into test systems. Too many things can go wrong to ever consider that option.

Even if you completely isolate your test life cycles so that they can never find their way past their firewalls, using production data in test is not a good idea. Software testing is often vended out or handled by contractors or offshore teams. You cannot move PCI or other regulated data from a secure production environment to a test environment that is surrounded by less rigor, compliance, and security.

Integration

Even bad software developers and teams know that they must test integration with other applications and data sources before deeming their application production-ready. But successful integration with other test environments does not guarantee final success. There is so much more to rolling out good applications than writing code and integrating.

How many times have you seen a team stumble when all the unit testing and integration is done and it's time for production? Little things like developer accounts embedded in connection strings creep out of the woodwork. All those little tricks that the development team hacked together to get the application running come back to haunt them.

We all know that properly designed applications use parameters that can be defined at run time so that they can run in any supported environment. We also know that these things are (almost) always put off in the initial rush to get something working. Ideally this would all have been detected in the progression through the development, quality assurance, and performance life cycles, but none of those environments are in the same domain as production, so all bets are off. That is why you never let your development team install and configure code releases through the testing environments. Dedicated build teams discover these things early on and prevent developers from hacking an application or environment to make a delivery date.
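
As a sketch of that run-time-parameter pattern (the variable name is hypothetical, not from the article), the connection string lives in the deployment environment, never in the code:

```python
# Illustrative run-time configuration: the connection string comes from the
# deployment environment, so the same build runs unchanged in dev, QA, and
# production. The variable name APP_DB_CONNECTION is hypothetical.
import os

def get_connection_string() -> str:
    conn = os.environ.get("APP_DB_CONNECTION")
    if not conn:
        # Fail fast instead of silently falling back to a developer account.
        raise RuntimeError("APP_DB_CONNECTION is not set in this environment")
    return conn
```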

Pilot

And that is why we pilot, even if all of our prior testing was flawless; even if we have zero defects; even if we have passed all test cases with flying colors. The production world, with real users and real data, is a new world. A well-thought-out pilot process does a number of things. First, it allows the user community to adapt to the new application. Too often the folks who sign off on UAT aren't the same ones who use the application day in and day out. It also provides a rigorous workout of the application.

It is not possible to posit enough test cases to cover all the edge cases that actual users will create. Those edge cases will eventually prove the value of the application. Pilot is also the first real opportunity to fully test all those integrations we thought were working in pre-production. The customer master database in pre-production had two million dummy records. The real customer master has 150 million actual records (some of which you strongly suspect are bogus). A complex transaction that ran in under a second in pre-production is now running for a second and a half. Pilot allows us to identify these bottlenecks.

Infrastructure

Pilot is also the first real opportunity you have to test your environment. When you first built out your production infrastructure, you specified everything you needed, including the IOPS for your database storage as well as the number and capacity of LUNs. Your servers were delivered with the requested number of cores and RAM.

Are you sure you were really provided what was specified? Data centers use virtualization managers for everything from servers to storage. You may have had 2,000 IOPS for LUN 22 on day one. What do you have on day 90? Or day 900? Virtualization allows optimal use of all available assets, but it also allows for over-utilization of those assets.

On day one your application may have been the only one using the physical storage. On day 90 you may now be sharing that storage with a digital asset management application. The application team needs to check what the throughput and response times are.
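
One lightweight way to keep an eye on this is a periodic timed write-and-read probe; this is only a sketch with a hypothetical mount point, and a real benchmark tool such as fio gives far more rigorous numbers:

```python
# Rough storage spot-check: time a synchronous 1 MiB write and read.
# The path is a hypothetical mount point for the LUN in question. Note the
# read may be served from the page cache, so watch trends, not single runs.
import os
import time

PATH = "/data/app/iops_probe.bin"
BLOCK = os.urandom(1024 * 1024)

start = time.perf_counter()
with open(PATH, "wb") as f:
    f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())  # force the block out to storage
write_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
with open(PATH, "rb") as f:
    f.read()
read_ms = (time.perf_counter() - start) * 1000

os.remove(PATH)
print(f"1 MiB write: {write_ms:.1f} ms, read: {read_ms:.1f} ms")
```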

The same rules apply to virtualized servers. All physical hosts are not created equal, and not all hypervisors are managed properly. Do you really have access to all eight cores 24×7? And what do those cores look like traced back to the physical device? Were you actually allocated a single processor with four hyper-threaded cores? Are you consuming half of a hyper-thread from eight different processors? These are not equal.
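
From inside the guest you can at least verify what the operating system believes it was given; this sketch uses the third-party psutil package, and since hypervisor overcommit is invisible from inside the guest, treat the output as a starting point rather than proof:

```python
# What does the guest OS think it has? psutil (pip install psutil)
# distinguishes logical CPUs (hyper-threads) from physical cores.
import psutil

print(f"Logical CPUs:   {psutil.cpu_count(logical=True)}")
print(f"Physical cores: {psutil.cpu_count(logical=False)}")

# Sustained per-CPU utilization hints at contention on the host, though
# steal time (on Linux, the 'st' column in vmstat) is the more direct signal.
print(psutil.cpu_percent(interval=1, percpu=True))
```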

So do we test in production? Of course we do, but we are looking for different defects in our testing. You still need to do the necessary work in lower life cycles and only promote code that has been rigorously tested. Code that is promoted to production must be fully functional and proven to satisfy business requirements. We aren't expecting surprises or hoping to discover hidden defects, but we do need this final validation that the application runs just as well on busy interstates and crowded city streets as it did on the test track.
