Abstract

This study has two purposes. The first is to describe a project that compares the numbers of COVID-19 cases and deaths in the United States reported by four online trackers: those maintained by USAFacts, the New York Times (NYT), Johns Hopkins University, and the COVID Tracking Project. The second is to present results for the first five months of 2020 (January 22 to May 31, 2020). The project is ongoing and will be updated regularly as new data from each tracker become available. Over the period examined, the NYT has reported more cases than any of the other three trackers since late March/early April, and the COVID Tracking Project has reported fewer deaths than any of the other three trackers since mid-March. It is hoped that the discrepancies identified by this project will provide avenues for research on their causes.

Introduction

This study describes an ongoing, regularly updated project that I am conducting to compare the number of COVID-19 cases and deaths in the United States reported by four different online trackers. In doing this, I hope to provide evidence either for or against the hypothesis that there are systematic differences between the values reported by the different trackers.
The four United States-specific COVID-19 datasets I compare are from USAFacts, the New York Times (hereafter NYT), Johns Hopkins University (hereafter JHU), and the COVID Tracking Project (hereafter simply "the four trackers"). Their start dates differ slightly: three of the datasets begin on January 22, 2020, whereas the NYT dataset begins a day earlier (January 21, 2020).
The total number of cases in the United States reported by each tracker was compared for each date from the first date covered by all four trackers (i.e. January 22, 2020) to the last day of May (i.e. May 31, 2020). This comparison was done to shed light on the extent to which the numbers of cases reported by the four trackers differed and how these differences changed over time. Below, I give the URLs from which I obtained the data from each source used in this study.
  1. The COVID Tracking Project data was obtained from this link: https://covidtracking.com/data/us-daily/
  2. The NYT data was obtained from this link: https://github.com/nytimes/covid-19-data/blob/master/us.csv
  3. The JHU data was obtained from this link: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports_us
  4. Finally, the USAFacts data was obtained from this link for cases: https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv and this link for deaths: https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv
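
The restriction to dates covered by all four trackers can be illustrated with a minimal Python sketch. It assumes that each tracker's national totals have already been loaded into a dictionary mapping dates to cumulative counts. The function names (common_dates, align) and all of the numbers are hypothetical illustrations, not the code or data used in this study.

```python
from datetime import date

def common_dates(series_by_tracker):
    """Return the sorted dates that appear in every tracker's series."""
    date_sets = [set(series) for series in series_by_tracker.values()]
    return sorted(set.intersection(*date_sets))

def align(series_by_tracker, start, end):
    """Return [(date, {tracker: count})] for shared dates within [start, end]."""
    return [
        (d, {name: series[d] for name, series in series_by_tracker.items()})
        for d in common_dates(series_by_tracker)
        if start <= d <= end
    ]

# Made-up cumulative case counts for illustration only (NOT real tracker data).
# The NYT series starts one day earlier than the others, as in the text.
toy = {
    "USAFacts": {date(2020, 1, 22): 10, date(2020, 1, 23): 12},
    "NYT": {date(2020, 1, 21): 9, date(2020, 1, 22): 10, date(2020, 1, 23): 13},
    "JHU": {date(2020, 1, 22): 10, date(2020, 1, 23): 12},
    "COVID Tracking Project": {date(2020, 1, 22): 11, date(2020, 1, 23): 12},
}

# Keep only dates shared by all four trackers, from January 22 to May 31, 2020.
aligned = align(toy, date(2020, 1, 22), date(2020, 5, 31))
for day, counts in aligned:
    print(day.isoformat(), counts)
```

In this toy example the NYT-only date (January 21) is dropped because the other three series do not include it, which mirrors the choice of January 22 as the first date of comparison.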

Results

Comparing the number of cases across the four trackers

The total number of cases over time reported by each of the four trackers is shown graphically in Figure 1. It is clear that all four match up very closely, as would be expected if they are all largely successful in their shared goal of measuring the same underlying value.
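
One possible way to put a number on how closely the trackers agree on a given date is the relative spread, (max - min) / max, together with the name of the tracker reporting the highest value. The Python sketch below is only an illustration of that idea: the metric, the function names (relative_spread, highest_tracker), and the counts are assumptions of this example, not the method or data used in this study.

```python
def relative_spread(values):
    """Return (max - min) / max for one date's counts; 0.0 if all are zero."""
    hi, lo = max(values.values()), min(values.values())
    return 0.0 if hi == 0 else (hi - lo) / hi

def highest_tracker(values):
    """Return the sorted name(s) of the tracker(s) reporting the largest count."""
    hi = max(values.values())
    return sorted(name for name, count in values.items() if count == hi)

# Made-up counts for a single date (NOT real tracker data).
toy_day = {
    "USAFacts": 980,
    "NYT": 1000,
    "JHU": 990,
    "COVID Tracking Project": 950,
}

spread = relative_spread(toy_day)   # (1000 - 950) / 1000
leaders = highest_tracker(toy_day)  # tracker(s) with the largest count
print(f"relative spread: {spread:.3f}, highest: {leaders}")
```

A spread near zero on every date would correspond to curves that overlap almost entirely, while a tracker that appears consistently among the highest or lowest would point to a systematic difference of the kind hypothesized in the Introduction.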