If I asked you to provide a COVID-19 statistic, I bet you could.
I bet many of you could report the current number of cases in your state, or at least you’d know what the tally was sometime within the past week. During the past few days, it’s likely you heard the statistics from television, the internet, or someone you know. Learning the new number of cases, deaths, and recoveries has become a somber routine for people all over the world.
On a personal scale, the statistics help us understand what the virus means for our routines and relationships. On a broader scale, the statistics are being used to predict the spread of the virus and model future health threats. This information is critical for community planning and emergency resource allocation to help contain the crisis.
Data in a time of pandemic
As a GIS professional, I’ve been thinking about the data challenges posed by this pandemic—and there are many. First, data is coming from more than 200 countries, and because the situation is evolving rapidly, it needs to be updated daily. In addition, the data is not entirely reliable: COVID-19 testing methods are still in development and produce inaccurate results roughly a third of the time. Moreover, many infected people are never tested because their symptoms are mild.
These data challenges—disparate source data, serious time crunch, and data unreliability—would be daunting even in the best of times. So what systems are in place to confront the data challenges of COVID-19?
In the US, the Centers for Disease Control and Prevention (CDC) developed a “COVID-19 Case Report Form” to facilitate standardized reporting from hospitals and local health departments. In Europe, the European Centre for Disease Prevention and Control (ECDC) completes a daily screening and data validation of reported cases from up to 500 data sources. Additionally, ECDC receives updates from countries in the European Union (EU) and the European Economic Area (EEA), as well as from the World Health Organization (WHO) and other international stakeholders.
At Mead & Hunt, we address some of these same data challenges, albeit with different stakes. We have had much success addressing these challenges as well as some valuable lessons learned. I’d like to share a few of them here.
Disparate data sources
Some time ago, our team was tasked with creating a GIS dataset for a particular asset for the entire US. To create this nationwide data, we had to take input data from all 50 states and synthesize the information into a single shapefile. The input data came to us in dozens of different formats with varying and/or inconsistent data attribution.
Since this dataset requires periodic updates, we developed recommended format standards for data submission and provided them as a “how-to” document. To offer states some flexibility and to accommodate differing staff and software resources among state agencies, the standards cover both table and shapefile submissions. With these documents, we hope to improve the quality of future input data and streamline the update process.
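For readers curious about what that kind of synthesis can look like, here is a minimal sketch using the open-source geopandas library. It is not our production workflow: the file names, field mappings, and target schema below are made up for illustration, and it assumes each state submits either a shapefile or a table of coordinates.

```python
from pathlib import Path

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical target schema that every state submission is mapped onto.
TARGET_FIELDS = ["STATE", "ASSET_ID", "ASSET_TYPE", "geometry"]

# Example per-state column mappings; real submissions vary far more widely.
FIELD_MAPS = {
    "WI": {"AssetNum": "ASSET_ID", "Type": "ASSET_TYPE"},
    "CA": {"asset_id": "ASSET_ID", "asset_class": "ASSET_TYPE"},
}


def load_submission(path: Path, state: str) -> gpd.GeoDataFrame:
    """Read one state's submission (shapefile or CSV) and map it to the target schema."""
    if path.suffix == ".shp":
        gdf = gpd.read_file(path)
    else:
        # Tabular submissions carry lat/lon columns instead of geometry.
        df = pd.read_csv(path)
        gdf = gpd.GeoDataFrame(
            df,
            geometry=[Point(xy) for xy in zip(df["lon"], df["lat"])],
            crs="EPSG:4326",
        )
    gdf = gdf.rename(columns=FIELD_MAPS.get(state, {}))
    gdf["STATE"] = state
    gdf = gdf.to_crs("EPSG:4326")  # reproject everything to one CRS before merging
    return gdf[[c for c in TARGET_FIELDS if c in gdf.columns]]


# Merge every submission into one nationwide layer and write a single shapefile.
paths = sorted(Path("submissions").glob("*.shp")) + sorted(Path("submissions").glob("*.csv"))
pieces = [load_submission(p, p.stem.split("_")[0].upper()) for p in paths]  # e.g. "wi_assets" -> "WI"
nationwide = gpd.GeoDataFrame(pd.concat(pieces, ignore_index=True))
nationwide.to_file("nationwide_assets.shp")
```

In practice, the value lies less in the merge itself than in publishing the target schema up front, which is exactly what the submission standards aim to do.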
Serious time crunch
Operating under a time crunch is part and parcel of the consulting world. It is our job to deliver quality products within timeframes that are often challenging. I have learned that it is important to recognize the difference between a challenging deadline and an unrealistic one. Sometimes it is necessary to tell a client that a deadline is unrealistic rather than rush out a deliverable without leaving time for proper quality control.
Unreliable data
Over the past decade, we have witnessed the growth of free, publicly available GIS datasets. This trend has made it easier to quickly obtain data for project analysis. What we have learned, however, is that free, publicly available data does not always suit our project needs.
For example, vehicular crash data compiled at a state or county level is generally reliable for identifying crash trends for large areas. However, applying this same data source to a specific traffic corridor, intersection, or other small area may be too generalized to extract meaningful information—or worse, it may be spatially inaccurate to the point of producing significant errors in the analysis.
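To make the scale problem concrete, here is a rough sketch of clipping statewide crash points to a corridor buffer with geopandas. The file names, coordinate system, and buffer width are hypothetical, not from an actual project; the point is simply that if the crash points carry positional error comparable to the buffer width, the resulting counts are not trustworthy.

```python
import geopandas as gpd

# Hypothetical inputs: statewide crash points and a single corridor centerline.
crashes = gpd.read_file("statewide_crashes.shp").to_crs("EPSG:26916")    # projected CRS in meters
corridor = gpd.read_file("corridor_centerline.shp").to_crs("EPSG:26916")

# Buffer the centerline by about 30 m (~100 ft) and keep only crashes inside it.
corridor_buffer = corridor.geometry.buffer(30).unary_union
corridor_crashes = crashes[crashes.within(corridor_buffer)]

print(f"{len(corridor_crashes)} of {len(crashes)} crashes fall within the corridor buffer")
# If the source points were geocoded only to the nearest intersection or milepost,
# their positional error can exceed the buffer width, and this count means little.
```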
To avoid data that is unreliable or unfit for our projects, our team develops a data needs matrix for all projects. This matrix includes data source and data development methodologies for each data element. The matrix is then vetted by our GIS analysts, engineers, and project managers.
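As a simple illustration, a data needs matrix can be as lightweight as a small table listing each data element, its source, and how it will be developed or verified. The entries below are made up, not drawn from an actual project:

```python
import pandas as pd

# Hypothetical data needs matrix: one row per data element the project requires.
data_needs = pd.DataFrame(
    [
        {"element": "Crash locations", "source": "State DOT crash database",
         "methodology": "Filter to study years; spot-check positions against aerial imagery"},
        {"element": "Roadway centerlines", "source": "County GIS open data portal",
         "methodology": "Verify currency; conflate to the project corridor"},
        {"element": "Parcel boundaries", "source": "County assessor shapefile",
         "methodology": "Use for context mapping only, not survey-grade work"},
    ]
)

# Export for review by GIS analysts, engineers, and project managers.
data_needs.to_csv("data_needs_matrix.csv", index=False)
```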
Key takeaways
In many ways, the challenges of COVID-19 are not comparable to our everyday data tasks. In hospitals and medical centers, professionals are literally putting their lives on the line at the point where the source data is collected. Despite these fundamental differences, many of the data challenges are relatable, and we now have a unique opportunity to see how they are being addressed during a time of crisis.
Through global collaboration among hospitals, health agencies, governments, and academic institutions, we are amassing data and lessons learned from all over the world. Thanks to the efforts of brave and capable people, this data and analysis is being used to improve our emergency response to COVID-19 and to defend against future pandemics.