Mobility Data Used to Respond to COVID-19 can Leave out Older and Non-White People

Information on individuals' mobility - where they go as measured by their smartphones - has been used widely in devising and evaluating ways to respond to COVID-19, including how to target public health resources. Yet little attention has been paid to how reliable these data are and what sorts of demographic bias they possess. A new study tested the reliability and bias of widely used mobility data, finding that older and non-White voters are less likely to be captured by these data. Allocating public health resources based on such information could cause disproportionate harms to high-risk elderly and minority groups.

The study, by researchers at Carnegie Mellon University (CMU) and Stanford University, appears in the Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, a publication of the Association for Computing Machinery.

"Older age is a major risk factor for COVID-19-related mortality, and African-American, Native-American, and Latinx communities bear a disproportionately high burden of COVID-19 cases and deaths," explains Amanda Coston, a doctoral student at CMU's Heinz College and Machine Learning Department, who led the study as a summer research fellow at Stanford University's Regulation, Evaluation, and Governance Lab. "If these demographic groups are not well represented in data that are used to inform policymaking, we risk enacting policies that fail to help those at greatest risk and further exacerbating serious disparities in the health care response to the pandemic."

During the COVID-19 pandemic, mobility data have been used to analyze the effectiveness of social distancing policies, illustrate how people's travel affects transmission of the virus, and probe how different sectors of the economy have been affected by social distancing. Yet despite the high-stakes settings in which this information has been used, independent assessments of the data's reliability are lacking.

In this study, the first independent audit of demographic bias of a smartphone-based mobility dataset used in the response to COVID-19, researchers assessed the validity of SafeGraph data. This widely used mobility dataset contains information from approximately 47 million mobile devices in the United States. The data come from mobile applications, such as navigation, weather, and social media apps, where users have opted in to location tracking.

When COVID-19 began, SafeGraph released much of its data for free as part of the COVID-19 Data Consortium to enable researchers, nonprofits, and governments to gain insight and inform responses. As a result, SafeGraph's mobility data have been used widely in pandemic research, including by the Centers for Disease Control and Prevention, and to inform public health orders and guidelines issued by governors' offices, large cities, and counties. Researchers in this study sought to determine whether SafeGraph data accurately represent the broader population.

SafeGraph has reported publicly on the representativeness of its data. But the researchers suggest that an independent audit was necessary because the company's analysis examined demographic bias only at Census-aggregated levels and did not address demographic bias for inferences specific to places of interest (e.g., voting places).

A major challenge in conducting such an audit is the lack of demographic information: SafeGraph data do not contain demographics such as age and race. In this study, researchers showed how administrative data can provide the demographic information necessary for a bias audit, supplementing the information gathered by SafeGraph. They used North Carolina voter registration and turnout records, which typically include information on age, gender, and race, as well as voters' travel to a polling location on Election Day. Their data came from a private voter file vendor that combines publicly available voter records. In all, the study included 539,000 voters from North Carolina who voted at 558 locations during the 2018 general election. The researchers deemed this sample highly representative of all voters in that state.

The study identified a sampling bias in the SafeGraph data that under-represents two high-risk groups, which the authors called particularly concerning in the context of the COVID-19 pandemic. Specifically, older and minority voters were less likely to be captured by the mobility data. This could lead jurisdictions to under-allocate important health resources, such as pop-up testing sites and masks, to vulnerable populations.

"While SafeGraph information may help people make policy decisions, auxiliary information, including prior knowledge about local populations, should also be used to make policy decisions about allocating resources," suggests Alexandra Chouldechova, assistant professor of statistics and public policy at CMU, who coauthored the study.

The authors also call for more work to determine how mobility data can be made more representative, including asking firms that provide this kind of data to be more transparent about the sources of their data (e.g., identifying which smartphone applications were used to access the information).

Among the study's limitations, the authors note that in the United States, voters tend to be older and include more White people than the general population, so the study's results may underestimate the sampling bias in the general population. Additionally, since SafeGraph provides researchers with an aggregated version of the data for privacy reasons, researchers could not test for bias at the individual voter level. Instead, the authors tested for bias at physical places of interest, finding evidence that SafeGraph is more likely to capture traffic to places frequented by younger, largely White visitors than to places frequented by older, largely non-White visitors.
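To make the idea of a place-level audit concrete, here is a minimal sketch in Python. It is not the authors' method or code. It assumes that, for each polling place, one has a voter count from administrative records, a device count from the mobility data, and the share of voters aged 65 or older. The records, field layout, and function names below are hypothetical, and the numbers are invented toy values, not results from the study.

```python
from math import sqrt

# Hypothetical toy records, NOT data from the study. Each tuple is:
# (place id, voters per administrative records,
#  devices seen in mobility data, share of voters aged 65+)
PLACES = [
    ("A", 1000, 60, 0.10),
    ("B", 800, 40, 0.20),
    ("C", 1200, 48, 0.30),
    ("D", 900, 27, 0.40),
]


def coverage_rate(voters, devices):
    """Devices observed per recorded voter at one place."""
    return devices / voters


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Coverage at each place, and the matching share of older voters.
rates = [coverage_rate(voters, devices) for _, voters, devices, _ in PLACES]
older_share = [share for *_, share in PLACES]

# A negative value means places with more older voters are covered less well.
r = pearson(older_share, rates)
print(f"correlation between share aged 65+ and coverage: {r:.2f}")
```

In this toy example, coverage falls as the share of older voters rises, which is the kind of pattern a place-level audit looks for. A real analysis would need to account for other factors that affect how many devices appear at a location.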

More generally, the study shows how administrative data can be used to overcome the lack of demographic information, which is a common hurdle in conducting bias audits.

Amanda Coston, Neel Guha, Derek Ouyang, Lisa Lu, Alexandra Chouldechova, Daniel E Ho.
Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy.
FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021. doi: 10.1145/3442188.3445881
