In the early months of the pandemic, interest in mobility data surged as society became fascinated with both the impact of lockdown measures and how closely people adhered to them. New research from Carnegie Mellon and Stanford suggests that although this mobility data is widely used, it may contain demographic biases that limit its reliability.
“Older age is a major risk factor for COVID-19-related mortality, and African-American, Native-American, and Latinx communities bear a disproportionately high burden of COVID-19 cases and deaths,” the researchers explain. “If these demographic groups are not well represented in data that are used to inform policymaking, we risk enacting policies that fail to help those at greatest risk and further exacerbating serious disparities in the health care response to the pandemic.”
Reliable data
The researchers focused their attention on the widely used SafeGraph dataset that contains information from around 47 million mobile devices from across the United States. The data is collected from a range of mobile apps, including social media, navigation, and weather, whenever users have opted in to have their location tracked.
At the start of the pandemic, SafeGraph released its data as part of the Covid-19 Data Consortium to allow researchers and policymakers to make more informed decisions. As such, the data has been widely used, including by the Centers for Disease Control and Prevention.
The researchers wanted to gauge just how reliable the data is. They argue that it has only been assessed at a Census-aggregated level, which leaves potential demographic biases at specific places of interest unaddressed.
Of course, a major obstacle is that the SafeGraph data doesn’t contain key demographic information such as age, gender, and race, so the researchers used administrative data to plug the gaps. This data came from voter registration records in North Carolina, which covered 539,000 people who voted at 558 locations during the 2018 election.
Sampling biases
The analysis reveals sampling biases that under-represent two groups at particularly high risk from Covid-19. The researchers found that both older and minority voters were less likely to be captured in the SafeGraph mobility data, which may result in an under-allocation of vital health resources to those communities.
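To make the idea of such an audit concrete, here is a minimal sketch in Python. It is not the researchers' actual method, and every number in it is invented for illustration. It assumes that, for each polling location, we know the true number of voters from administrative records, the number of devices the mobility data recorded there, and the share of voters aged 65 or over. The names `locations`, `coverage_rate`, and `pearson` are hypothetical.

```python
from statistics import mean

# Hypothetical polling locations (all values invented for illustration):
# voters        = people recorded as voting there in administrative data
# devices       = devices the mobility dataset observed at that location
# share_65_plus = fraction of those voters aged 65 or over
locations = [
    {"voters": 1000, "devices": 60, "share_65_plus": 0.10},
    {"voters": 1200, "devices": 60, "share_65_plus": 0.20},
    {"voters": 800, "devices": 32, "share_65_plus": 0.30},
    {"voters": 1500, "devices": 45, "share_65_plus": 0.40},
]


def coverage_rate(location):
    """Devices observed per actual voter at one location."""
    return location["devices"] / location["voters"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rates = [coverage_rate(loc) for loc in locations]
shares = [loc["share_65_plus"] for loc in locations]

# A negative correlation would suggest that locations with more older
# voters are captured less well by the mobility data.
corr = pearson(shares, rates)
print(f"Correlation between share aged 65+ and coverage: {corr:.2f}")
```

In this toy data, coverage falls as the share of older voters rises, so the correlation is negative. A real audit would also need to account for other factors that affect coverage, such as how many people pass through a location for reasons other than voting.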
“While SafeGraph information may help people make policy decisions, auxiliary information, including prior knowledge about local populations, should also be used to make policy decisions about allocating resources,” the researchers say.
The authors hope their findings will prompt further research to make similar mobility data more representative in future. They also hope that highlighting these biases will encourage greater transparency about the sources of data, so that such biases can be uncovered sooner.