Project FDA Report

No. 4 June 2011

Blue Pill or Red Pill: The Limits of Comparative Effectiveness Research

Tomas J. Philipson, University of Chicago

Eric Sun, Stanford University

Executive Summary

Comparative Effectiveness Research (CER) measures the effects of different drugs or other treatments on a population, with the goal of finding out which ones produce the greatest benefits for the most patients. Used properly, CER gives the patient, doctor, and payer hard information from thousands, or even millions, of cases, saving them time and money that otherwise would be spent on a trial-and-error quest for the right treatment.

Public and private payers for health care hope to use CER to cut costs without reducing quality of care. Great expectations have been placed on this approach. "If there's broad agreement … [that] the blue pill works better than the red pill," President Obama has said, "and it turns out the blue pills are half as expensive as the red pill, then we want to make sure that doctors and patients have that information available to them."

The potential short-term savings are significant. For example, antipsychotic drugs represent one of the largest and fastest-growing expenses for Medicaid. In 2005, a CER analysis of antipsychotic drugs found little difference between the effectiveness of older, cheaper antipsychotics and that of more expensive "second-generation" drugs. We determined that if reimbursement policies had been changed in response and Medicaid had stopped paying for the more costly drugs, it would have saved $1.2 billion out of the $5.5 billion that it spent on these medications in 2005. However, the consequences of this policy shift would have been worse mental health for many thousands of people, resulting in higher costs to society that would equal or outweigh any savings in Medicaid costs.

This result seems counterintuitive: How can it be that, when a CER study shows no difference between two drugs, limiting coverage for the more expensive drug could actually increase costs? The answer is that in most CER studies, it is the drug or treatment with the larger average effect on an entire population that "wins." In the president's hypothetical, the blue pills are "just as effective" as the red ones because, on average, they do as much good for patients. But the average patient is not the same as any particular individual patient. Declaring a treatment most effective based on an average is a medical and an economic error, for two reasons.

First, individuals differ from one another and from population averages. Therefore, what may be on average a "winning" therapy may simply not work for a large number of patients. Conversely, a drug that is less effective on average may still be the best, or only, choice for a sizable proportion of patients.

The second reason is the dependence in patient responses across therapies. Dependence, for any individual patient, is the degree to which response to one treatment predicts response to another. Dependence varies from illness to illness and from drug to drug but is often an important aspect of finding treatments that work. One cannot know in advance, as a general rule, that Drug A's failure guarantees the failure of Drug B. Yet a reimbursement policy based on CER could well make this error: by refusing to reimburse Drug B on the grounds that Drug A is "more effective," such a policy assumes that failure with Drug A will predict failure with Drug B.

To understand the effect of these points on costs, we looked at the real-world consequences of applying CER results to the antipsychotics we mentioned. These drugs are one of the largest classes of medication for Medicaid patients, and the program's expenditures on antipsychotics are among its fastest-growing: they rose from $1 billion in 1995 to over $5.5 billion in 2005.

In 2005, a national CER study, the Clinical Antipsychotic Trials in Intervention Effectiveness (CATIE), compared the effects of cheaper, first-generation antipsychotics with those of drugs discovered later. The CATIE study found that second-generation antipsychotics were no more effective at treating schizophrenia symptoms than first-generation drugs. Naturally, this led to calls for Medicaid to limit reimbursement for second-generation antipsychotics. As this debate continues, we set out to answer a simple empirical question: Would reimbursement policies based on the CATIE actually save money on health-care costs? Or would the effects of difference and dependence undo the cost savings?

We found that the latter is the case. Our analysis focused on antipsychotic coverage for roughly 250,000 non-elderly adult Medicaid enrollees with schizophrenia. First, we considered an extreme case: denial of all coverage for second-generation antipsychotics, on the grounds that the cheaper first-generation drugs are just as effective. We found that this hypothetical policy would save Medicaid $1.2 billion, compared with full coverage. However, we estimate that it would reduce patient health by 13,138 quality-adjusted life years (QALYs) because of reduced health among patients who were not responsive to first-generation antipsychotics (nearly 75 percent of whom would have responded to second-generation drugs) and who, because of the restrictive policy, received no other drug therapy. Given that QALYs are typically valued at $100,000, this suggests that the savings from denying coverage for second-generation antipsychotics ($1.2 billion) would be outweighed by the costs of reduced health for patients ($1.3 billion).

The second hypothetical policy we considered would cover perphenazine and risperidone (which are available in less costly generic forms) but exclude olanzapine (which is not). This policy would save Medicaid $500 million annually but reduce health by 10,146 QALYs, mainly because of reduced health among patients who are unresponsive to either risperidone or perphenazine and who receive no therapy for six months or longer because of the restrictive policy. At a value of $100,000 per QALY—again, the typical value assumed in the scholarly literature and by many payers—the health loss is nearly double the savings to Medicaid. Even at a value of $50,000 per QALY, such a policy would only "break even." Therefore, using the CATIE findings to support restrictive coverage policies would not be cost-effective. It would limit freedom of choice for doctors and patients and yield no compensating savings.

We do not suggest that CER be dropped from the tool kit of private and public payers who want to cut costs while maintaining quality. On the contrary: we know that CER will become only more important to policymakers in the future. The 2009 federal stimulus law allocated $1 billion for CER programs, and the 2010 health-care overhaul created an institute to promote CER and disseminate the results of this research to doctors and payers. The 2010 law also rescinds a prohibition on the use of CER for coverage decisions by Medicare. In the meantime, insurance companies and other private payers are also on the bandwagon. A recent survey found 85 percent of such organizations expecting that CER will soon be used to justify changes in reimbursement policies.

Our results suggest that CER will not fulfill its promise unless it is implemented differently by researchers and understood differently by policymakers. Simply put, seeking the treatment that is most effective on average will not improve health or save money. However, CER can be conducted in a way that takes difference and dependence into account and measures their effect. If CER is applied in this way—as a tool for matching individual patients to the best treatments for those individuals—it will realize its potential to reduce costs without inhibiting freedom of choice for doctors and patients.

About the Authors

TOMAS J. PHILIPSON is chairman of the Manhattan Institute's Project FDA. A managing director at Precision Health Economics, Philipson is also the Daniel Levin Chair in Public Policy at the Irving B. Harris Graduate School of Public Policy Studies and a member of the Department of Economics at the University of Chicago. In 2003 and 2004, Philipson served in the U.S. government as senior economic adviser to the commissioner of the Food and Drug Administration (FDA), and from 2004 to 2005 he was senior economic adviser to the administrator of the Centers for Medicare and Medicaid Services. He is the recipient of several international and national awards, including the Kenneth Arrow Award of the International Health Economics Association in 2000 and 2006 (for best paper in health economics). Philipson is a co-editor of the journal Forum for Health Economics & Policy of Berkeley Electronic Press and is on the editorial boards of the journal Health Economics and the European Journal of Health Economics. Philipson earned his undergraduate degree in mathematics at Uppsala University, in Sweden, and his M.A. and Ph.D. in economics from the University of Pennsylvania.

ERIC SUN is a resident in the department of anesthesiology at Stanford University and a visiting fellow at the Bing Center for Health Economics at the RAND Corporation. His research has examined the costs and benefits of medical research and development, the role of the FDA and product liability in ensuring drug safety, and the economics of global public health. Sun's work has been published in the Journal of Health Economics, American Journal of Managed Care, Journal of Public Economics, Health Affairs, Health Economics, Health Services Research, and the BE Press Forum for Health Economics & Policy. He holds an A.B. in molecular biology from Princeton University, an M.D. from the University of Chicago, and a Ph.D. in business economics, also from Chicago.

Introduction

In their quest to rein in costs without compromising quality, public and private payers for health care lately have placed hope in Comparative Effectiveness Research (CER): studies that compare alternative treatments for a given condition, with the aim of finding those that provide the most benefit to the most patients. In its report to the president and Congress, the Federal Coordinating Council for Comparative Effectiveness Research explains: "The purpose of this research is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances." The intuition of these comparisons (often randomized clinical trials, though they are sometimes observational studies) is simple: doctors and patients can save time and money, as they go through the trial-and-error process of finding the right treatment, by knowing what is most effective for the whole population. CER is expected to raise the total health benefit and lower total spending by increasing the amount of useful information about each potential treatment for a given disease, be it diabetes, heart and lung ailments, schizophrenia, or a host of other conditions.

Although CER mainly compares treatments based on clinical utility, it is a small leap from clinical comparisons to economic ones. (In fact, the council report lists the ability to reduce costs as one criterion for judging the merit of potential CER studies.) Yet American policymakers were long reluctant to make this leap because of fears that CER-type evaluations would limit access to treatment, reduce doctors' autonomy, and even lead to rationed care.[1] In recent years, though, pressure to reduce costs has overcome this reluctance.[2] In an interview with ABC News, for example, President Obama signaled his support for a CER-based approach to cost control. "If there's broad agreement … [that] the blue pill works better than the red pill," he said, "and it turns out the blue pills are half as expensive as the red pill, then we want to make sure that doctors and patients have that information available to them."

The change in attitude has expressed itself in recent legislation and regulation. For example, the 2009 stimulus bill (the American Recovery and Reinvestment Act) provided nearly $1 billion in funding for CER. The act allocated $400 million to the Office of the Secretary in HHS, $400 million to the National Institutes of Health, and $400 million to the Agency for Healthcare Research and Quality. The 2010 health-care reform bill (the Patient Protection and Affordable Care Act) provides for the creation of a private, nonprofit organization, the Patient-Centered Outcomes Research Institute. This institute is tasked with identifying CER priorities, funding CER studies, and disseminating the results of CER to payers and physicians. Crucially, the bill also gives Medicare authority to incorporate CER into coverage determinations. Though the details remain to be worked out, this change in the law ensures that CER will become an important factor in health-care expenditures.

Private payers have been no less interested in CER's potential to save money without hurting quality. A recent survey by the health-care consulting firm Xcenda found that 81 percent of payers believe that the importance of CER will increase in the next two years. And 85 percent foresee situations in which CER will be used to justify shifting the expense burden of costlier treatments onto patients.

In this broad-based acceptance of CER as a factor in payment decisions, its benefits—improved treatments for patients at less cost to payers—have been more often assumed than proved. The implications of CER and its strengths and weaknesses have only recently begun to attract scholarly attention.[3] But it is already clear that CER approaches will have the effect of shifting demand from the therapeutic "losers"—drugs and treatments shown to be less effective, or equally effective but more costly—toward the "winners."

It seems intuitive that these "winners" will provide more health benefits to society and, when cost is taken into account, that shifting to these favored treatments will save society money on health-care costs. Unfortunately, we have found that this intuition is wrong. CER as it is usually implemented will not have this positive health impact and may even lead to greater costs to society. CER is a promising method for matching patients to the right treatments, but it will have to be applied differently by researchers and understood differently by policymakers, if it is to fulfill its promise.

How CER Works

In most CER studies, it is the drug or treatment with the larger average effect on an entire population that "wins." In the president's hypothetical, the blue pills are "just as effective" as the red ones because, on average, they do as much good for patients. But just as the average human being theoretically has one ovary and one testicle, so the average patient is not the same as any individual patient. And declaring a treatment most effective based on an average is a medical and economic error. There are two reasons that average effectiveness cannot be equated with "best."

First, individuals differ from one another and from population averages. Therefore, what may on average be a "winning" therapy may simply not work for a large number of patients. Conversely, a drug that is less effective on average may still be the best, or only, choice for a sizable proportion of patients (see Appendix). CER researchers have recently attempted to address this problem by breaking down populations by gender, age, ethnicity, or other relevant categories. But these divisions into subgroups do not address the fundamental difficulty: treatment is a matter of matching an individual to a therapy, and average performance is often uninformative about how to make that match.
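To see how this can happen, consider a minimal numerical sketch. The population shares and benefit figures below are hypothetical, chosen only for illustration:

    # Hypothetical population with two response types. Each row:
    # (share of patients, benefit from Drug A, benefit from Drug B).
    population = [
        (0.70, 0.8, 0.5),   # majority: Drug A works better
        (0.30, 0.1, 0.6),   # sizable minority: only Drug B helps much
    ]

    avg_a = sum(share * a for share, a, _ in population)
    avg_b = sum(share * b for share, _, b in population)
    share_b_best = sum(share for share, a, b in population if b > a)

    print(f"Average benefit, Drug A: {avg_a:.2f}")           # 0.59, the CER "winner"
    print(f"Average benefit, Drug B: {avg_b:.2f}")           # 0.53, the CER "loser"
    print(f"Patients for whom B beats A: {share_b_best:.0%}")  # 30%

Here Drug A "wins" the population-wide comparison, yet a coverage rule built on that average would steer 30 percent of patients away from the therapy that serves them best.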

The second reason that average effectiveness cannot be equated with "best" is rooted in the dependence in patient responses across therapies. Dependence, for any individual patient, is the degree to which response to one treatment predicts response to another. Dependence varies from illness to illness and from drug to drug but is often an important aspect of finding treatments that work. One cannot know in advance, as a general rule, that Drug A's failure guarantees the failure of Drug B. Yet a reimbursement policy based on CER could well make this error: by refusing to reimburse Drug B on the grounds that Drug A is "more effective," such a policy assumes that failure with Drug A will predict failure with Drug B.

There are cases in which dependence is nearly complete: response to one vaccine, for example, may perfectly predict response to another. Usually, however, dependence is more complex and varies from illness to illness and from drug to drug. It cannot be ignored. On the contrary: in most cases, there is no hope of finding the optimal therapy for a given patient without knowing the differences in treatment effects across patients and the dependence in effects across treatments. Yet this is not the orientation of current CER studies, which identify and compare simple average treatment effects in a population. Thus, using average treatment effects to identify "winners" could actually worsen patient health by reducing freedom to choose the best therapies for an individual patient.
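A small sketch makes the stakes concrete. The response rates below are hypothetical; the point is that two drugs with identical average effectiveness can imply very different second-line prospects, depending on how strongly responses are linked:

    # Both drugs help 60 percent of patients on average; the rates and
    # joint probabilities are invented for illustration.
    p_a = 0.60   # response rate to Drug A
    p_b = 0.60   # response rate to Drug B

    def p_b_given_a_fails(p_both_respond):
        """P(respond to B | fail A) for a given joint response probability."""
        return (p_b - p_both_respond) / (1.0 - p_a)

    # Strong dependence: everyone who responds to B also responds to A.
    print(f"{p_b_given_a_fails(0.60):.2f}")   # 0.00: A's failure dooms B
    # Independence: response to A says nothing about response to B.
    print(f"{p_b_given_a_fails(0.36):.2f}")   # 0.60: B is still worth trying

Under strong dependence, denying coverage for Drug B after failure on Drug A costs patients little; under independence, it strands the 60 percent of A's failures who would have responded to B.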

An Illustration: The CATIE Trial

To illustrate these points, consider the real-world effect of a CER study of antipsychotic drugs. Used to alleviate symptoms of psychosis in schizophrenia, bipolar disorder, and other mental illnesses, these medications consist of a first generation of drugs discovered in the 1950s, including chlorpromazine and haloperidol (called "typical antipsychotics"); and drugs discovered in later decades (the "atypical antipsychotics"), including risperidone and olanzapine. Antipsychotics are one of the largest classes of drugs for Medicaid patients and a growing part of the program's expenses: Medicaid expenditures on antipsychotics increased from $1 billion in 1995 to over $5.5 billion in 2005.[4]

In 2005, a national CER study, the Clinical Antipsychotic Trials in Intervention Effectiveness (CATIE), compared typical and atypical antipsychotics using the "gold standard" of medical studies, the randomized clinical trial (RCT).[5] (In RCTs, patients are assigned at random to take a drug or placebo; bias effects are avoided because neither patients nor researchers know which drug is being taken by which patient.)

Typical as well as atypical antipsychotics are used to control symptoms of schizophrenia. While the typical antipsychotics (e.g., haloperidol and perphenazine) are cheaper than the atypical antipsychotics (e.g., olanzapine and quetiapine), many of which are available only in branded form, the typical drugs generally have more severe side effects, including diabetes, sexual dysfunction, and motion impairment. The CATIE study confirmed this side-effect difference but found that second-generation antipsychotics are no more effective at treating schizophrenia symptoms than traditional antipsychotics.[6] Subsequent cost-effectiveness analysis using those results concluded, therefore, that first-generation antipsychotics were cost-saving: they delivered the "same" health benefit for less expense.[7]

Those results, unsurprisingly, led to calls for public payers to limit their coverage of second-generation antipsychotics.[8] (Antipsychotics were among the fastest-growing pharmaceutical expenditures in Medicaid in the late 1990s and early 2000s.)[9] This argument has been adopted by some influential media outlets and pharmacy benefit managers. In an editorial, for example, the New York Times held that "the nation is wasting billions of dollars on heavily marketed drugs that have never proved themselves in head-to-head competition against cheaper competitors."[10] There has been considerable policy debate on whether the evidence generated by the CATIE should be used as a basis for limiting reimbursement or coverage for atypical antipsychotics—in other words, whether coverage and reimbursement should be responsive to the CER generated by the CATIE.[11]

We recently set out to answer a fundamental question at the heart of this debate: Would using the CATIE to guide Medicaid reimbursement policy actually result in cost savings? Because the CATIE followed individual patients across treatments, we were able to assess not only average effects but also individual differences in drug response. That assessment has led us to conclude that the answer to our question is no. Using the CATIE to guide Medicaid reimbursement would not save American society money. Rather, what was gained in lower Medicaid payments would be lost in forgone wages, tax payments, and other costs: the consequences of poorer mental health among Medicaid recipients.

A distinctive feature of the CATIE was its crossover approach, in which patients who discontinued their first drug assignment were given an alternate drug. Therefore, unlike typical randomized trials, the CATIE provides data on how individual patients responded to alternate therapies. This gave us a way to reanalyze the individual-level CATIE data. We found that optimal therapy varies significantly across patients—for example, nearly 75 percent of patients who failed to respond to first-generation antipsychotics would respond to second-generation antipsychotics.[12] Thus, while there may have been no significant difference between first- and second-generation antipsychotics on average, a substantial proportion of patients would benefit more from second-generation antipsychotics than from first-generation ones.
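A back-of-the-envelope sketch shows what this conditional response rate implies for a first-generation-only coverage rule. The 75 percent figure comes from the reanalysis described above; the first-generation response rate is our own placeholder assumption, used only to make the arithmetic concrete:

    p_respond_first_gen = 0.40           # ASSUMPTION for illustration only
    p_second_given_first_fails = 0.75    # from the CATIE reanalysis above

    share_treated_full_coverage = (
        p_respond_first_gen
        + (1 - p_respond_first_gen) * p_second_given_first_fails
    )
    share_treated_first_gen_only = p_respond_first_gen

    print(f"Effective therapy found, full coverage:  {share_treated_full_coverage:.0%}")   # 85%
    print(f"Effective therapy found, first-gen only: {share_treated_first_gen_only:.0%}")  # 40%

Whatever the true first-line response rate, the logic is the same: restricting coverage to the CER "winner" leaves most of its nonresponders, three-quarters of whom could have been helped, without an effective drug.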

In light of these differences in patient responses, we analyzed how health and costs among Medicaid patients would be affected if the program used the CATIE study to guide coverage policy. Our analysis focused on antipsychotic coverage for roughly 250,000 non-elderly adult Medicaid enrollees with schizophrenia.[13] We considered coverage for three drugs: perphenazine, a first-generation antipsychotic; and risperidone and olanzapine, two second-generation antipsychotics. These three drugs were chosen because they account for 70 percent of antipsychotic prescriptions in the United States. If Medicaid were to provide coverage for all three drugs, we estimate annual costs to be $4.5 billion.

We examined two potentially restrictive coverage policies that might be adopted in response to the CATIE findings. First, we considered an extreme case: denial of all coverage for second-generation antipsychotics, on the grounds that the cheaper first-generation drugs are just as effective. (Such denial is not legal under current law but, as we have noted, already has advocates; in the current climate of enthusiasm for CER as a cost-cutting measure, future changes to the law are certainly possible.) We found that this hypothetical policy would save Medicaid $1.2 billion, compared with full coverage. However, we estimate that it would reduce patient health by 13,138 quality-adjusted life years (QALYs) because of reduced health among the patients who were not responsive to first-generation antipsychotics and who, because of the restrictive policy, received no other drug therapy. Given that QALYs are typically valued at $100,000, this suggests that the savings from denying coverage for second-generation antipsychotics ($1.2 billion) would be outweighed by the costs of reduced health for patients ($1.3 billion).

The second hypothetical policy we considered would cover perphenazine and risperidone (which are available in less costly generic forms) but exclude olanzapine (which is not). This policy would save Medicaid $500 million annually but reduce health by 10,146 QALYs, mainly because of reduced health among patients who are unresponsive to either risperidone or perphenazine and who receive no therapy for six months or longer because of the restrictive policy. At a value of $100,000 per QALY—again, the typical value assumed in the scholarly literature and by many payers—the health loss is nearly double the savings to Medicaid. Even at a value of $50,000 per QALY, such a policy would only "break even."
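The arithmetic behind these comparisons is straightforward and can be reproduced directly from the figures quoted above:

    # Savings and QALY losses as quoted in the text.
    policies = {
        "deny all second-generation coverage": (1.2e9, 13_138),
        "exclude olanzapine only":             (0.5e9, 10_146),
    }

    for value_per_qaly in (100_000, 50_000):
        print(f"At ${value_per_qaly:,} per QALY:")
        for name, (savings, qaly_loss) in policies.items():
            health_cost = qaly_loss * value_per_qaly
            net = savings - health_cost
            print(f"  {name}: saves ${savings / 1e9:.1f}B, "
                  f"health loss ${health_cost / 1e9:.2f}B, net ${net / 1e9:+.2f}B")

At $100,000 per QALY, both policies are net losers; the olanzapine exclusion roughly breaks even only if a QALY is valued at $50,000.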

These results reveal the economic consequences of the two facts described above: individual differences in treatment response, and the limited ability of response to one treatment to predict response to another. They follow from the fact that treatments labeled "losers" by a CER study may nonetheless benefit significant numbers of patients who would not be helped by the "winner" of the trial. The CATIE study found no differences between first- and second-generation antipsychotics on average, but a significant number of individual patients would benefit from second-generation drugs and not from first-generation medications. Therefore, using the CATIE findings to support restrictive coverage policies would not be cost-effective. It would limit freedom of choice for doctors and patients and yield no compensating savings to society.

Improving CER Evidence Metrics and Reimbursement Strategies

How can CER methods be improved to better serve the goals of cost control and quality? As we discussed, the traditional metric from a randomized clinical trial—the average response to treatment—is limited in its ability to answer the clinically relevant question of how best to match individual patients to available treatments. To do this, studies must provide insight into, first, individual differences in response and, second, dependence (the extent to which an individual's response to one treatment predicts response to another).

Therefore, we reject the simpleminded notion that CER can find the right, cost-effective "blue pill" for every patient and every condition and eliminate the costly, less effective "red pill." If, as seems very likely, health-coverage decisions will soon be influenced by CER, then CER must be implemented differently and used more insightfully by policymakers. Our recommendations are:

1. Coverage policies should reflect information about difference and dependence effects, not CER population-wide averages. Specifically, public and private payers should never deny coverage for the so-called losers of CER studies based solely on average effects. Instead, they should use information on differences (the variation in response to a given drug from patient to patient) as well as dependence (the variation in each individual's response to different drugs). This information should then be used to find the most cost-effective therapies for each patient—by informing the trial-and-error sequence through which a doctor tries first one treatment and then the next.

For example, "prior authorization" insurance policies now aim to provide this kind of guidance by requiring failure on one therapy before they will authorize reimbursement for a second (usually more expensive) treatment. With better information about dependence effects, this type of policy could be expanded to save costs. A policy could, for instance, use data on differences and dependencies to specify, for a given condition, precisely which initial treatments should be tried, and then map subsequent steps based on nonresponse (essentially adding an economic perspective to the sequence tree in the Appendix); a sketch of such a sequencing rule follows this paragraph. As we have stated, dependence is not a major issue for some diseases and therapies: in heart disease, for example, patients who fail to respond to a first drug are unlikely to do better on a second. In those instances, an informed reimbursement policy could limit payments for second and third treatments and save costs without reducing the overall health of the population. Well-designed protocols could also be built into clinical software (for example, e-prescribing programs that write prescriptions) in order to further extend the impact of effectiveness research on cost.
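As a rough illustration of that sequencing logic (all costs, response rates, and the dollar value per response below are hypothetical), a payer could compare the expected net value of authorizing or declining a second step:

    # Hypothetical step-therapy comparison. Each step is a drug with a cost
    # and a response rate conditional on all earlier drugs having failed.
    VALUE_PER_RESPONSE = 100_000   # assumed dollar value of a treatment response

    def expected_net_value(steps):
        """Expected value minus cost of trying drugs in order until one works.

        steps: list of (name, cost, P(respond | all earlier drugs failed)).
        """
        p_reached = 1.0   # probability the patient is still unresponsive
        net = 0.0
        for _name, cost, p_respond in steps:
            net += p_reached * (p_respond * VALUE_PER_RESPONSE - cost)
            p_reached *= 1 - p_respond
        return net

    first_step = [("generic drug", 1_000, 0.50)]
    low_dependence = first_step + [("branded drug", 8_000, 0.75)]
    high_dependence = first_step + [("branded drug", 8_000, 0.05)]

    print(expected_net_value(first_step))        # 49,000: generic only
    print(expected_net_value(low_dependence))    # 82,500: second step pays off
    print(expected_net_value(high_dependence))   # 47,500: second step is a net loss

In the low-dependence case, covering the second step adds expected value; in the high-dependence case, it subtracts value. That is precisely the distinction that average-effect CER cannot reveal.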

2. Going forward, CER should be designed and implemented differently from the way it has been to date. Of course, effective policies of this sort do not depend only on policymakers making the right use of CER results. They will also require changes in the conventions of CER itself, so that more studies supply the information that policymakers need. Hence, we also recommend that funders promote and support the more useful form of CER trial: not the kind that seeks only "winners" and "losers" in average effects but rather the kind that tracks individual differences and dependence in treatment effects.

Examples of CER techniques that do this include "crossover designs," such as the CATIE, in which patients are switched from one treatment arm to another.[14] Another approach incorporates "adaptive assignments,"[15] in which patients are switched between arms based on their treatment responses. In both designs, the switching of individual patients between trial "arms" provides information on the way a single drug produces different responses in different individuals and how, for any given individual, reaction to one drug predicts reaction to another.
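A stylized sketch of the estimate such designs make possible: because each nonresponder is observed on a second drug, conditional response rates can be computed directly (the records below are mock data, not CATIE results):

    # Mock crossover records, one per patient:
    # (responded to first assigned drug?, responded to switch drug or None).
    # Patients who responded to their first drug were never switched.
    records = [
        (True, None), (True, None), (False, True), (False, True),
        (False, True), (False, False), (True, None), (False, True),
    ]

    first_rate = sum(1 for first, _ in records if first) / len(records)
    switched = [second for first, second in records if not first]
    rescue_rate = sum(1 for second in switched if second) / len(switched)

    print(f"First-drug response rate: {first_rate:.0%}")                     # 38%
    print(f"P(respond to second drug | first failed): {rescue_rate:.0%}")    # 80%

A conventional parallel-arm trial reports only the first number for each arm; the second number, the one that matters for step-therapy and coverage decisions, requires following individual patients across treatments.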

Another way in which well-designed CER can provide fine-grained information about difference and dependence is by taking into account the consequences of side effects, unpleasant reactions, and patient preference. It is a fact of life that a drug may appear more cost-effective after a clinical trial than it is in real-world conditions. This is because many randomized clinical trials involve measures to keep patients compliant. Outside the controlled conditions of the trial, though, a drug that is unfamiliar to patients—or that causes sexual dysfunction or provokes nausea—may well be less cost-effective because patients do not take it as frequently as they would a less troublesome alternative. These effects on compliance with a drug protocol are themselves prone to exhibit differences among individuals and dependence across treatments. Therefore, in order to provide information on what therapies are likely to be effective for an individual patient, future CER studies should also measure differences and dependence in adverse effects and compliance. As the FDA plays an important role in regulating randomized clinical trials, the agency should encourage the collection of these types of data for drugs that it must approve.
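A toy calculation (with invented numbers, and assuming for simplicity that benefit scales linearly with adherence) shows how compliance differences can reverse a trial-based ranking:

    # "benefit" is the per-year health gain at full adherence; real-world
    # benefit is assumed, simplistically, to scale with the adherence rate.
    drugs = {
        "trial winner (more side effects)": (0.30, 0.55),
        "trial loser (better tolerated)":   (0.25, 0.85),
    }

    for name, (benefit, adherence) in drugs.items():
        print(f"{name}: trial benefit {benefit:.2f}, "
              f"real-world benefit {benefit * adherence:.3f}")
    # 0.30 * 0.55 = 0.165 versus 0.25 * 0.85 = 0.212: the ranking flips.

The trial "winner" delivers less health in practice because fewer patients stay on it, which is exactly the kind of difference a CER study that measures compliance could capture.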
