In health care, second opinions exist for a reason. Presented with identical medical conditions, some doctors will choose aggressive treatments while others will show more restraint.
The standard view among health care economists is that these differences reflect varying individual preferences or practice styles of doctors. If that’s right, these differences underscore the importance of medical guidelines. By standardizing care, the thinking goes, guidelines reduce waste and improve patient health.
But a new Stanford study, published in the May issue of The Quarterly Journal of Economics, challenges the premise that approaches to patient care are mainly a matter of practitioner preference. It suggests that policies may miss an important driver of differences in patient care: physician skill.
By analyzing more than 4 million potential cases of pneumonia, the study’s co-authors — David Chan and Matthew Gentzkow, both senior fellows at the Stanford Institute for Economic Policy Research (SIEPR), and Chuan Yu, a former pre-doctoral fellow at SIEPR and now a Stanford PhD student — find causal evidence that skill also explains why patients with the same condition can receive different diagnoses. Their study compares the rates at which radiologists, who are medical doctors, diagnose pneumonia against how many cases they fail to identify. Skill — defined in the study as diagnostic accuracy — accounted for nearly 80 percent of the variation of missed cases across radiologists.
In one of its most surprising findings, the study suggests that conventional wisdom about physicians who diagnose more aggressively is wrong. It seems likely that doctors with higher diagnosis rates are erring on the side of caution and would therefore miss fewer cases, but the opposite turns out to be true: Radiologists who were more likely to conclude that a patient has pneumonia were also more likely to miss positive cases.
This indicates that differences in physician skill — and not just physician preferences — play an important role in patient care. According to research cited in the paper, diagnostic errors account for as much as 17 percent of poor patient outcomes in U.S. hospitals, and 9 percent of deaths.
Chan, an associate professor of health policy at the School of Medicine, says the research suggests that the current narrative around problems in the U.S. health care system is incomplete.
“Researchers and policymakers have often focused on inefficiency in health care stemming from misaligned incentives in areas around, for instance, insurance or litigation issues,” Chan says. “People haven’t paid as much attention to a source of inefficiency being about providers not knowing what to do.”
The research casts doubt on the extent to which one-size-fits-all solutions can rein in U.S. health care costs, which, at $4 trillion annually, amount to 18 percent of gross domestic product.
“We might think that having measures aimed at creating more uniform practices means everybody makes the same decision, and outcomes improve as a result,” Gentzkow says. “But we now know that some doctors are really good at diagnosing whereas others have a hard time — and will diagnose a lot of patients — because they just aren’t sure.”
With such variation in diagnostic skills, relying solely on standardized approaches to equalize patient care — and to reduce the vast differences in health care costs across regions and providers — is not only insufficient, but potentially counterproductive, according to the researchers.
Gentzkow, the Landau Professor of Technology and the Economy at the School of Humanities and Sciences, says their study highlights the importance of policies that better support training of health care providers.
“For a doctor to not do treatments and not order tests is often what takes a lot of skill,” Gentzkow says. “You have to be a very good doctor to know when not to do things.”
The notion that some doctors are better than others may sound obvious, but it turns out to be hard to prove empirically.
Research has often assumed that individual preferences are significant factors in decision-making, not only among health care providers but also among judges, teachers, managers, and police officers. Indeed, such preferences might lead providers to over-treat or over-diagnose, but few studies have tackled the challenge of examining the role that skill also plays in correct or incorrect medical decisions.
“There is a general sense that diagnosis is very important, but it is just very hard to measure the quality of making diagnoses,” Chan says. In developing a framework that maps out the roles of skill as well as preferences in diagnoses and missed cases, “this paper is a step toward filling that gap.”
Chan and Gentzkow didn’t set out to study diagnostic skill. They initially wanted to examine the role of communication between different medical providers and how that affects patient care. Chan is a health economist whose research focuses on labor and productivity in the delivery of health care; he is also a practicing physician at the Department of Veterans Affairs (VA). Gentzkow studies industrial organization, political economy and the impact of communication across a variety of settings, including social media.
For what was supposed to be a study of health care communication, they zeroed in on the interaction between radiologists, who are tasked with reading chest X-rays, and the physicians who interpret the results and decide on a treatment. As a first step, however, they needed to find out if radiologists accurately interpret X-rays.
Pneumonia cases were ideal to study because the condition, if misdiagnosed initially, eventually shows clear symptoms. This allowed Chan and Gentzkow to study overall diagnosis rates among radiologists and the extent to which they missed positive cases during a patient’s first visit.
But with the striking discovery that radiologists with the highest diagnosis rates were also missing the most cases, Chan and Gentzkow knew they had tapped a source for new empirical evidence of the role that skill plays in physician care.
Ultimately, their analysis included 4.67 million chest X-rays taken in VA emergency rooms between 1999 and 2015. Because radiologists are randomly assigned cases, Chan and Gentzkow could use a natural experiment to infer the causal effects of radiologists on diagnostic decisions and missed cases, building on an approach pioneered by Guido Imbens, a SIEPR senior fellow and newly minted Nobel Laureate. The researchers labeled a pneumonia case as missed if a patient returned to the ER within 10 days of receiving a negative diagnosis and was then found to be positive.
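As a rough illustration of the labeling rule described above, the following Python sketch tabulates a diagnosis rate and a miss rate for each radiologist. It is not the authors' code or their statistical model. The `Visit` record, its field names, and the choice to express the miss rate as a share of all of a radiologist's cases are assumptions made for this example; only the 10-day return window comes from the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

MISS_WINDOW_DAYS = 10  # return window stated in the article


@dataclass
class Visit:
    """One first-visit chest X-ray read (hypothetical record layout)."""
    radiologist: str
    diagnosed: bool  # True if pneumonia was diagnosed at the first visit
    # Days until a later ER visit with a positive diagnosis; None if there was none.
    days_to_positive_return: Optional[int]


def is_missed(visit: Visit) -> bool:
    """A case counts as missed if it was read as negative and the patient
    came back positive within the return window."""
    return (
        not visit.diagnosed
        and visit.days_to_positive_return is not None
        and visit.days_to_positive_return <= MISS_WINDOW_DAYS
    )


def rates_by_radiologist(visits):
    """Return {radiologist: {"diagnosis_rate": ..., "miss_rate": ...}}.

    Both rates are shares of all cases read by that radiologist
    (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [cases, diagnosed, missed]
    for visit in visits:
        tally = counts[visit.radiologist]
        tally[0] += 1
        tally[1] += int(visit.diagnosed)
        tally[2] += int(is_missed(visit))
    return {
        name: {"diagnosis_rate": diagnosed / cases, "miss_rate": missed / cases}
        for name, (cases, diagnosed, missed) in counts.items()
    }
```

A simple tabulation like this only describes the raw rates. The causal conclusions reported in the study rest on the random assignment of cases to radiologists and on the researchers' framework for separating skill from preferences, neither of which is reproduced here.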
The results, the researchers say, show that standardizing care may be ineffective or even counterproductive because doctors vary in their diagnostic skill to begin with. Radiologists diagnose the potentially fatal lung infection at far different rates, relative to their caseloads, and differences in skill account for 39 percent of that variation. Differences in skill also account for 78 percent of the variation in missed cases.
The study’s authors find that the most skilled radiologists tended to be older and more experienced. These radiologists also wrote shorter reports and spent more time generating them. This, according to the researchers, suggests that effort contributes to what counts as clinical skill.