WHY AVERAGES ARE MEDIOCRE FOR HEALTHCARE PLANNING... SOME STATISTICAL CONSIDERATIONS FOR POPULATION
The recent $1.5B Powerball® jackpot prompted people across the US to talk about statistics, even if they did so unwittingly. “The jackpot is so big compared to the number of possible combinations that it is almost a good investment,” was a common water cooler refrain. While the merits and inequities of lotteries should be debated, the need for managers to properly understand statistics and probability is a sure bet. Perhaps nowhere is this need stronger than in provider-based population health management.
Historically, when health care organizations took care of patients they received a fee for each service rendered. This fee was roughly aligned with their costs. Profit was not guaranteed, but it was much less risky than in today’s environment. Now providers are often paid a fixed amount for a patient population. In exchange they are responsible for providing all the care that population requires. If the total costs of providing that care are less than the payment received, then the providers make money. If the total costs end up being more they lose. Other models are variations on this theme. Many are a hybrid of receiving payment for services rendered and sharing in overall population savings and overruns when compared to benchmarks. All of these permutations of risk-contracts require providers to make bets. They are betting that on average they can keep costs down, and that the average payment will be adequate. As they do so, it is helpful to keep the following 7 concepts of statistics, probability and benchmarking in mind. All of them suggest that “averages” may not be the best tool for running their businesses.
#1. With small numbers averages are as much the exception as the rule.
“I very much want to have at least one son and one daughter, so I will only need to have three kids, four at the most, to be safe.” Of course, with three children there is a 25% chance of not achieving the desired goal of at least one boy and one girl. With four kids, the chances of not having a boy and girl are still higher than 10%. Even with seven kids, there is still a roughly 1.6% chance of not having at least one of each. (With seven kids, there is a 100% certainty that life is going to be complicated, but I digress.) What this trivial example suggests is that even when the expected outcomes are well known (in this case each child has a 50/50 chance of being either a boy or girl), the actual outcome may differ substantially from what was “supposed” to happen.
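The family-size arithmetic above can be checked in a few lines. This is a minimal sketch that assumes each child is an independent 50/50 draw, as the example does; the function name is illustrative.

```python
def prob_missing_a_sex(n_children: int, p_boy: float = 0.5) -> float:
    """Probability that all n children are boys or all are girls."""
    return p_boy ** n_children + (1 - p_boy) ** n_children


for n in (3, 4, 7):
    print(f"{n} children: {prob_missing_a_sex(n):.1%} chance of not having both")
# 3 children: 25.0% chance of not having both
# 4 children: 12.5% chance of not having both
# 7 children: 1.6% chance of not having both
```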
The implication for population managers is that sample size matters. Taking risks on small populations, even if their expected costs are known, is risky. Instead of 3 patients developing a devastating stroke this year, 6 may. These 3 extra patients, given their hefty price tag, could matter a lot to a provider under a risk contract. To make matters worse, small populations can live within large ones. For example, labor & delivery costs are a major health care expense for the non-elderly. 2011 data from Truven Health Analytics suggest that employers pay over $130 per-member-per-year (PMPY) across their entire population for vaginal and cesarean deliveries.[i] This spend is obviously concentrated on the 2% of the population that are pregnant women. And within this small population there is real variation. The cost from birth through the first year of life for a healthy full-term baby is around $5,085 (2011). For premature and/or low birth weight babies, which are roughly 1-in-10 per the CDC,[ii] the costs were more than 10X that, at $55,393.[iii] Understanding this small population of pregnant women, and the potential variation that exists within it, is critical to effectively taking financial risk on the population as a whole.
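To see how much the size of the sub-population matters, the birth-cost figures above can be turned into a rough variability estimate. The sketch below assumes every birth is an independent draw with a 10% chance of the higher cost, a simplification not made in the sources cited; the function names are illustrative.

```python
import math

P_PRETERM = 0.10        # roughly 1-in-10 births, per the text
COST_FULL_TERM = 5_085  # first-year cost, healthy full-term baby (2011 $)
COST_PRETERM = 55_393   # first-year cost, premature/low-birth-weight baby


def expected_cost_per_birth() -> float:
    """Blended average first-year cost across both kinds of birth."""
    return (1 - P_PRETERM) * COST_FULL_TERM + P_PRETERM * COST_PRETERM


def relative_spread(n_births: int) -> float:
    """Standard deviation of total cost divided by expected total cost.

    Assumes independent births (an assumption made for illustration).
    """
    sd_one = (COST_PRETERM - COST_FULL_TERM) * math.sqrt(P_PRETERM * (1 - P_PRETERM))
    return sd_one / (expected_cost_per_birth() * math.sqrt(n_births))


print(f"Expected cost per birth: ${expected_cost_per_birth():,.0f}")
for n in (20, 200, 2_000):
    print(f"{n:>5} births: spread is {relative_spread(n):.0%} of expected cost")
```

Under these assumptions, a group with 20 births a year faces a spread of roughly a third of its expected cost, while 2,000 births bring that down to about 3%.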
Key takeaway(s): Managing risk at its core is a game that favors the large players. Be mindful that natural variations in system performance can be significant, especially when the sample is small. Understanding key sub-populations, e.g., those with high variability potential, is also crucial.
#2. The first set of data may not be all that indicative of the norm.
An alien from outer space comes to Earth to scout for her race’s ultimate planned colonization of the planet. The first place she visits is a punk rock concert. The alien dutifully notes that over 60% of Earthlings have dangerously spiked hair. She further observes most seem to have an incredible tolerance for pain as demonstrated by the metal objects they put through parts of their faces. When her home planet receives the report the alien is recalled. The invasion of Earth is scrubbed. There are softer targets out there. This is an example of the fallacy of hasty generalization. The sample was too small and too narrow to be indicative of the population as a whole. Had the alien left the concert and gone across the street to the Whole Foods® market or the high-end day spa, her perceptions of Earthlings would have changed markedly.[iv]
In moving to risk-based payment models healthcare providers need to make sure the data they are using is indicative of the entire population. This has at least two elements. It needs to be broad and it needs to be multi-year. Initial pilots may have demonstrated that a health system could reduce the costs of certain populations, e.g., reducing Emergency Department use among a targeted set of “frequent-flyers.” However, the ability to scale this program to other patients is not certain. Perhaps these patients were preferentially selected for one reason or another, or at minimum all “opted-in.” Perhaps the geography from which they were pulled had other viable alternatives to the ER which other groups may not. Perhaps the fact it was a pilot, and hence benefited from extra resources and management oversight, made a difference. More data is better. In particular, multi-year data is useful for removing potential one-time aberrations. For example, Boston, Massachusetts saw record snowfall in the winter of 2014/2015. Brigham & Women’s Hospital reported $10M in lost revenue related to the weather from cancelled surgeries, reduced admissions, and less utilization of out-patient services.[v] Just sampling one year, 2014/2015, or the difference between that year and the previous one, can lead to some very misleading observations.
Key takeaway(s): When projecting costs and revenues try to use as much data as possible, including historical data. Continually push your organization for more data, judiciously investing in it as needed. Further, attempt to understand the limits of any data sets and projections off of that data, including any skews or biases that reduce the data’s applicability going forward.
#3. Averages hide the tails. In population management, the tail often wags the dog.
Even when averages are of a large population, and samples are taken over a long period of time, they still may not matter that much. Before turning to healthcare, let us look to China. 200 million Chinese are considered poor by the standard international poverty measure of subsisting on $1.25 a day or less. More than 82 million live on less than one dollar a day.[vi] At the same time more than 300 million Chinese own cars,[vii] and China now boasts the largest number of billionaires in the world.[viii] Clearly marketers don’t think about the “average Chinese consumer” when they plan product launches. The average is not that important. The clusters of people with similar behaviors, and the size of those clusters, are the key to understanding the market dynamics.
The same applies to healthcare, perhaps even more so. To put it bluntly, for population managers the average patient does not matter. In healthcare many cost distributions are heavily skewed rather than evenly spread around a mean. The top 1% of spenders account for more than 20% of total spending, and the top 5% account for almost half.[ix] This is old news. However, less obvious is that the same dynamic exists within specific sub-populations as well. For example, the average cost of a patient with chronic kidney disease (CKD) is $15,204 per year. The top 10% of these patients spend on average $80,923 per year.[x] Clearly not all CKD patients are the same, not all will require the same level of intervention, and not all have the same short-term financial potential under a risk-based contract. It is hard to save $10,000 a year on a patient whose total healthcare bill is $7,000 per year. Saving $10,000 a year on an $80,000-a-year chronically ill patient, while not easy, is at least mathematically possible. Chronic kidney disease is not unique. The 5M congestive heart failure (CHF) patients show similar patterns. Medicare patients with CHF cost on average 300% of what the typical Medicare beneficiary costs.[xi] However, this pattern is hardly uniform. Later-stage Class III & IV CHF patients see several-fold greater utilization compared to those with Class I & II disease.[xii]
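The two CKD figures quoted above are enough to back out what the other 90% of patients cost and how concentrated the spend is. A short worked calculation, using only those two numbers (the variable names are illustrative):

```python
MEAN_ALL = 15_204    # average annual cost, all CKD patients (from the text)
MEAN_TOP10 = 80_923  # average annual cost, costliest 10% (from the text)
TOP_FRACTION = 0.10

# Share of total CKD spend attributable to the costliest 10% of patients.
top_spend_share = TOP_FRACTION * MEAN_TOP10 / MEAN_ALL

# Average cost of the remaining 90%, implied by the overall mean.
mean_rest = (MEAN_ALL - TOP_FRACTION * MEAN_TOP10) / (1 - TOP_FRACTION)

print(f"Top 10% of patients account for {top_spend_share:.0%} of spend")
print(f"Remaining 90% average ${mean_rest:,.0f} per year")
# Top 10% of patients account for 53% of spend
# Remaining 90% average $7,902 per year
```

In other words, the "average" CKD patient at $15,204 describes almost no one: most patients cost about half that, and a small group costs more than five times as much.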
Averages masking the underlying market dynamics is not unique to patient costs. Imagine a walk-in clinic that operates three exam rooms. On average each room can serve 3 patients an hour. Thus the center’s total capacity is 9 patients per hour. On average 7.5 patients walk in its doors each hour. The patient experience should be great: practically no wait for anyone. It can serve 9 an hour, and sees an average demand of 7.5 per hour. Well, not exactly. Actually, not at all. Sometimes things are slow and rooms sit idle. Sometimes it is busier than usual and queues form. These busy times are not just rare catastrophes; they include any hour when 10 or 11 people walk in. If the average is 7.5 patients per hour, 11 is hardly a huge spike. If 11 patients an hour walk in for three hours straight, the waiting room is going to get full. A simple mathematical model of this clinic suggests the average wait-time for a patient is 28 minutes. There is a 16% chance the patient will wait more than an hour before being seen. On the other side, about 4% of the time the place will be void of any patients.[xiii]
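The clinic figures can be reproduced with the standard M/M/c queuing formulas (the model named in the endnote). The sketch below assumes patients arrive at random at a steady average rate (Poisson arrivals) and that visit lengths are exponentially distributed; the function and key names are illustrative, not taken from the calculator the endnote cites.

```python
import math


def mmc_metrics(arrival_rate: float, service_rate: float, servers: int,
                t_hours: float = 1.0) -> dict:
    """Steady-state M/M/c metrics. Rates are per hour."""
    offered_load = arrival_rate / service_rate
    utilization = offered_load / servers
    if utilization >= 1:
        raise ValueError("Demand meets or exceeds capacity; the queue never settles.")

    # Term for the states in which every room is busy.
    busy_term = offered_load ** servers / (math.factorial(servers) * (1 - utilization))
    p_empty = 1 / (sum(offered_load ** k / math.factorial(k) for k in range(servers))
                   + busy_term)
    p_wait = busy_term * p_empty  # Erlang C: chance an arriving patient must wait

    spare_capacity = servers * service_rate - arrival_rate
    return {
        "p_empty": p_empty,
        "p_wait": p_wait,
        "mean_wait_min": 60 * p_wait / spare_capacity,
        "p_wait_over_t": p_wait * math.exp(-spare_capacity * t_hours),
    }


clinic = mmc_metrics(arrival_rate=7.5, service_rate=3, servers=3)
print(f"Average wait: {clinic['mean_wait_min']:.0f} minutes")
print(f"Chance of waiting over an hour: {clinic['p_wait_over_t']:.0%}")
print(f"Clinic completely empty: {clinic['p_empty']:.1%} of the time")
```

With three rooms, 3 patients per room per hour, and 7.5 arrivals per hour, this yields an average wait of about 28 minutes, roughly a 16% chance of waiting more than an hour, and an empty clinic a little over 4% of the time.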
On net, planning solely based on averages simply does not work, whether it is the “average” Chinese consumer, the average CHF patient, or the average volume a provider can serve. Managers need to understand the variation around the average, especially the ends of the spectrum, i.e., the “tails,” if they are to make effective decisions.
Key Takeaway(s): Attempt to find the micro-segments of populations that typically drive revenues and/or costs in healthcare. More generally, evaluate the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles, alongside the average, in any planning work.
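A percentile profile of this kind takes only a few lines to produce. The sketch below uses made-up, log-normally distributed costs purely for illustration; the distribution and its parameters are assumptions, not data from any source cited here.

```python
import random
from statistics import mean, quantiles

rng = random.Random(0)

# Hypothetical annual costs per patient: skewed, synthetic data for illustration only.
costs = [rng.lognormvariate(8.0, 1.5) for _ in range(10_000)]

# quantiles(..., n=100) returns 99 cut points; cuts[p - 1] is the p-th percentile.
cuts = quantiles(costs, n=100)
profile = {p: cuts[p - 1] for p in (5, 10, 25, 50, 75, 90, 95)}

print(f"Mean: ${mean(costs):,.0f}")
for p, value in profile.items():
    print(f"{p:>2}th percentile: ${value:,.0f}")
```

On skewed data like this the mean sits well above the median (the 50th percentile), which is exactly the gap that planning from the average alone hides.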
#4. Random is not always random.
The previous example of wait-times at a walk-in clinic shows how normal variations around an average can create some seemingly strange and unwanted system behavior. Often, however, variations in the real world are not that random.
One example is the recent Chikungunya outbreak in Puerto Rico. Healthcare providers often have to predict how many hospital admissions for infectious illness they will see in the upcoming year. From past data they can deduce a reasonable range for known entities like the common cold, influenza, etc. More often than not that predicted range will be correct. But then sometimes it can be quite wrong. From mid-2014 to early 2015 more than 27,000 Puerto Ricans were infected with Chikungunya, a mosquito-borne illness that causes fever, joint pain, and skin rashes.[xiv] Initial hospitalization rates were 13% of those infected.[xv] The issue is that infectious disease does not randomly happen to people. If your neighbor has it, you are more likely to get it, and then transmit it to others.
Similar non-random variations in population health costs can be seen in chronic illness. This is especially true as new technologies enter the market. While Hepatitis C transmission rates are relatively easy to predict using past years’ data, as are hospitalization rates, their costs are not always so. Specifically, the cost to treat Hep C patients skyrocketed when Sovaldi®, Harvoni® and other new agents hit the market. In 2014 Medicare spent $4.5B on new Hep C meds, up from $286M spent in 2013 on earlier-generation therapies.[xvi] While an extreme example, it is not the only one. PCSK9 inhibitors can reduce blood lipid levels, but of course will come at a steep price. The disclaimer so often used in financial performance, “past performance is no guarantee of future results,” applies equally well to population health. The only solution is to understand both the past data as well as when structural changes or one-off events can make that data no longer meaningful.
Key takeaway(s): Actively brainstorm ways the future may look very different from the past, including the impact of new technologies, patient disease burden, and market changes.
#5. Variables with some randomness tend to regress to the mean.
Perhaps the most commonly thought of statistical concept in population management is regression to the mean. In a nutshell, when one measures something that has an element of randomness to it, extreme values are not likely to be repeated. Imagine a population health manager sampling the average annual healthcare costs of 100 non-insulin-taking diabetics. He then pulls out the 10 costliest for enrollment in a new program. (He may not have picked the 10 costliest, but the 10 “highest-risk” patients, 8 of whom happen to be the costliest.) He then enrolls those selected in a care management program focused on lifestyle change, medication adherence, some nifty iPhone app, etc. Lo and behold, next year the patients’ costs come down. Success! Or is it?
Well, the program was likely good for the patient, but what about those paying the bills? The nature of type-II diabetes is that the costs in any one year have an element of randomness to them. By cherry-picking last year’s high-cost patients there is a strong chance that many will not repeat the same level of resource utilization even without an intervention. By analogy, just because you won a lot of money at a roulette table last night does not mean you should plan on winning again today. Regression to the mean does not mean the program is ineffective. However, it does mean the simple experiment described does not demonstrate it to be effective. (Al Lewis’s book Why Nobody Believes the Numbers provides a detailed explanation of regression to the mean in population health and how to mitigate its effects.)
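The effect is easy to reproduce with a toy simulation in which no program exists at all. The sketch below assumes each patient's annual cost is a persistent personal baseline plus independent year-to-year noise; the dollar amounts and the normal distributions are invented for illustration and are not calibrated to real diabetes costs.

```python
import random
from statistics import mean


def simulate(n_patients=100, n_selected=10, trials=500, seed=42):
    """Average year-1 and year-2 cost of the patients who were costliest in year 1.

    No intervention is modeled, so any drop is pure regression to the mean.
    """
    rng = random.Random(seed)
    year1_means, year2_means = [], []
    for _ in range(trials):
        # Persistent part of each patient's cost (hypothetical values).
        baseline = [rng.gauss(8_000, 2_000) for _ in range(n_patients)]
        # Each year adds fresh, independent noise on top of the baseline.
        year1 = [b + rng.gauss(0, 4_000) for b in baseline]
        year2 = [b + rng.gauss(0, 4_000) for b in baseline]
        # Pick the costliest patients based on year 1 only.
        top = sorted(range(n_patients), key=year1.__getitem__, reverse=True)[:n_selected]
        year1_means.append(mean(year1[i] for i in top))
        year2_means.append(mean(year2[i] for i in top))
    return mean(year1_means), mean(year2_means)


y1, y2 = simulate()
print(f"Selected group, year 1 average: ${y1:,.0f}")
print(f"Selected group, year 2 average: ${y2:,.0f} (no program applied)")
```

The selected group's costs fall sharply in year 2 even though nothing was done for them, while staying above the population-wide average because part of their cost is persistent. A comparison group selected the same way is one common way to separate a program's real effect from this artifact.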
Regression to the mean occurs anytime there is some randomness to the variable being measured. That is, what happened last year is not guaranteed to happen again. If the example experiment were looking at just drug costs instead of total healthcare costs, and the target population were insulin-taking diabetics rather than patients on oral medications, then a significant year-over-year change would be more meaningful. (One would want to correct for the recent spike in insulin prices.) This is because this year’s insulin drug-spend is likely correlated with last year’s drug-spend. It is less random.
Regression to the mean applies beyond care management evaluation. Providers who outperform their peers in revenues, complication rates, etc., even on a risk-adjusted basis, may not really be any better than their colleagues. There is an element of chance in any complex system. People will have good years and bad ones.
Key takeaway(s): When evaluating performance of an intervention, a program, a clinician or clinical team, etc., be sure to note that chance is always a factor. Above or below average performance this year may not imply similar performance next year. If at all possible attempt to control for regression to the mean.
#6. Rate of change can be more important than the current state.
Yogi Berra said “it is tough to make predictions, especially about the future.” This is in part because things change. Sometimes they change rapidly. In 2005 fewer than 5M Americans owned a smartphone. By 2010 the number surpassed 60M. By 2013, it was over 140M.[xvii] It would not have seemed unreasonable for executives in 2010 to say, “Mobile is important, but still more people use their computers to access information. Let’s focus our limited R&D efforts on the desktop.” However, if the lead time to develop and deploy a consumer-facing care management platform is 48 months, then whoops. In fact, as of March 2015, 11.3% of consumers were “mobile-only” internet users, beating for the first time “desktop-only” users at 10.6%.[xviii] While few things change as rapidly as consumer technology, even non-technology phenomena can move pretty quickly. In 1994 China’s GDP was under $600B, making it roughly 8th in the world. It was behind such economic titans as Canada, the UK and Italy. Fast forward 20 years and its GDP grew to $10T, squarely planting it in the number two slot, more than double third-place Japan.[xix] This was the result of a roughly 16% average annual growth rate.
Any executive, healthcare or otherwise, should monitor growth/decline rates of any key performance indicator. Ophthalmologists are acutely aware of how quickly things can move. For example, prices can fall quickly. Medicare reimbursement for a type of hospital-based cataract procedure fell from $1,925 in 1986 to $773 in 1999,[xx] a negative 7% year-over-year change. Thus in the early 1990s it was crucial to understand the revenues and costs of the procedure before making eye-care investment decisions, as well as the speed at which these variables were moving. Price is not the only variable that shows such swings. Virtual healthcare visits, almost unheard of 20 years ago, are becoming the norm. Many of Kaiser Permanente's regional systems are performing greater than 50% of visits virtually (through mobile, secure messaging, or video), per a Dec. 4, 2014 Modern Healthcare article. In Kaiser Permanente Northern California, virtual visits had grown from 4.1 million in 2008 to about 10.5 million by 2013. This is a ~21% compound annual growth rate. Virtual visits are expected to exceed in-person ones by 2016 in Kaiser’s Northern California operations.[xxi]
Key takeaway(s): When using data to benchmark performance and/or make predictions, take a hard look not only at the numbers today, but how the numbers are changing. A 7% plus annual growth or decline rate can quickly make the future look very different from the present.
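Growth and decline rates like these come from the compound annual growth rate formula, which is worth keeping at hand. The sketch below applies it to the cataract and virtual-visit figures quoted in this section; the function names are illustrative.

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between a start and an end value."""
    return (end / start) ** (1 / years) - 1


def years_to_double(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


# Cataract reimbursement: $1,925 (1986) to $773 (1999), 13 years.
print(f"Cataract reimbursement: {cagr(1_925, 773, 13):.1%} per year")

# Kaiser Northern California virtual visits: 4.1M (2008) to 10.5M (2013), 5 years.
print(f"Virtual visits: {cagr(4.1, 10.5, 5):.1%} per year")

# How fast does a 7% annual growth rate double a number?
print(f"Doubling time at 7%: {years_to_double(0.07):.1f} years")
```

The cataract figures work out to a decline of just under 7% a year, the virtual-visit figures to growth of roughly 21% a year, and a steady 7% rate doubles (or, in decline, roughly halves) a number in about a decade.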
#7. Why aspire to be average?
The final reason I’ll suggest why simple averages fail population managers is more psychological than mathematical. Averages let managers off the hook too easily. Many providers and insurers often benchmark themselves against the mean. “Our surgical complication rate is X% better than average,” “our avoidable admissions for COPD are Y% less than the average,” “our risk-adjusted PMPM spend is Z% less than average,” etc. Big deal. Well, in fairness it is a big deal. Getting those results typically requires a lot of hard work by dedicated executives, clinicians, and staff. But then again, who wants to be average, or work for an average organization? Benchmark to the 90th percentile, or better yet, best-in-class. It is hard for individuals or organizations to achieve the goals they set for themselves. It is much harder to do markedly better than them. Collins and Porras at Stanford School of Business suggest that a common thread of great companies is their setting of clear, yet very Big Hairy Audacious Goals, or B-HAGs.[xxii] Take for example The Villages Health, a multi-specialty group located in The Villages, the largest 55-and-over community in the US. Their mission is clear, “to make The Villages America’s healthiest home town.”[xxiii] Benchmarking to the mean simply does not get the job done.
Key takeaway(s): Use data and benchmarking not just for better decision making, but as a motivational tool.
Data, data-benchmarking, and quantitative projection are the lifeblood of health systems trying to profit from risk-based payments. Unfortunately, the tool clinicians know as the gold standard for data-driven decision making (the double-blinded, placebo-controlled, randomized study) is not practical to use. Thus providers are forced to rely on the natural experiments the environment affords them. They must look at past data, selectively invest in surveys/studies to quantify select areas of high interest, and then make bets. Doing this better than average will at minimum require health systems to look beyond simple averages in their planning.
[i] Truven Health Analytics. What are the Leading Drivers of Employer Healthcare Spending Growth? Research Brief.
[ii] CDC www site.
[iii] March of Dimes. Premature Babies Cost Employers $12.7 Billion Annually. March of Dimes Releases New Report about the High Cost of Preterm Birth. Feb 7, 2014.
[iv] In full disclosure, the author has never been to a punk rock concert and the example may be entirely erroneous. The concept of Hasty Generalization remains valid.
[v] Massachusetts Emergency Management Agency. Attachment A: 2015 Severe Winter Weather Pattern Impacts - Supplemental Information, March 27, 2015. Found online.
[vi] WSJ Blog. More Than 82 Million Chinese Live on Less Than $1 a Day. Oct 15, 2014.
[vii] WSJ. China Soon to Have Almost as Many Drivers as U.S. Has People. Nov 28 2014.
[viii] Fortune. China now has more billionaires than the U.S. Oct 15, 2015.
[ix] NIHCM. The Concentration of Healthcare Spending. NIHCM Foundation Data Brief. July 2012.
[x] Avalere. Analysis of the Accuracy of the CMS-Hierarchical Condition Category Model. Jan 2016.
[xi] Milliman. The High Cost of Heart Failure for the Medicare Population: An Actuarial Cost Analysis. Feb 2015.
[xii] Ahmed A et al. Am Heart J. 2006;151(2):444-450.
[xiii] This example uses the M/M/c queuing model as found at: http://www.supositorio.com/rcalc/rcalclite.htm.
[xiv] Fox News Latino. Puerto Rico hit by 27,000 chikungunya cases since May. Feb 6, 2015.
[xv] USA Today. Chikungunya has sickened more than 10,000 in Puerto Rico. Dec 4, 2014.
[xvi] The Washington Post. New hepatitis C drugs are costing Medicare billions. Mar 9, 2015.
[xvii] Heidi Cohen.com. WWW Site: 67 Mobile Facts To Develop Your 2014 Budget.
[xviii] Comscore. Number of Mobile-Only Internet Users Now Exceeds Desktop-Only in the U.S. Apr 28, 2015.
[xx] Ophthalmology Management. Article of Oct. 1, 1999.
[xxi] Modern Healthcare. Kaiser virtual-visits growth shows the technology's potential. Dec 4, 2014.
[xxii] Collins, Jim. Built to Last: Successful Habits of Visionary Companies. Harper Collins.
[xxiii] Discussions with management. Note, the author is a consultant to The Villages Health.