COVID-19 Data Cleaning Update

Last Updated: 04/26/2021

On April 26, 2021, New Jersey implemented an automated method to remove duplicate case reports, which produced a one-time drop of 10,442 in the cumulative total of confirmed COVID-19 cases.

During Monday's COVID-19 briefing, Dr. Ed Lifshitz, Medical Director of the Communicable Disease Service at the NJ Department of Health, explained:

Every day, thousands of entries related to COVID-19 are made into the Department's database. Reports are received from hundreds – or even thousands – of laboratories, local health departments and healthcare providers. While checks are in place to minimize errors, inevitably a small number of inappropriate cases – overwhelmingly duplicates – are created. As part of an ongoing process, the Communicable Disease Service routinely reviews the database to correct these errors. This is reflected in our daily numbers where we typically subtract out a small number (usually in the double digits) from our previously reported total cumulative cases.

Today we are subtracting 10,442 confirmed cases, leaving today's total confirmed cumulative cases lower than yesterday's, and I just wanted to take a moment to explain this one-time drop.

"Cleaning" of data – a process by which we continually review cases and data in our system for duplicates and errors – has been performed by the Communicable Disease Service since the beginning of the outbreak. Duplicates are identified and merged.

This cleaning has been done manually and has been effective; however, given the large number of cases in the system, it has not been possible to identify all duplicate entries and manually correct them each day.

CDS has worked with Health Information Technology staff to develop an automated method to remove these duplicate reports.

We applied this automated method to the database yesterday, and this one-time drop of ~1.2% in the cumulative number of cases reported was the result.
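As a rough consistency check on the two figures quoted above (10,442 cases removed and a drop of ~1.2%), the arithmetic can be sketched as follows. This is only an illustration: the statement does not give the cumulative total itself, so the implied total below is an estimate, and the 1.15% to 1.25% rounding band and all variable names are assumptions made for the example.

```python
# Figures taken from the statement.
removed = 10_442          # duplicate confirmed cases removed
rounded_share = 0.012     # "~1.2%" of the cumulative case count

# Assumption: "~1.2%" is a value rounded to one decimal place,
# so the true share lies somewhere between 1.15% and 1.25%.
low_share, high_share = 0.0115, 0.0125

# Implied cumulative total before the removal (an estimate only).
implied_total = removed / rounded_share
implied_low = removed / high_share    # larger share -> smaller total
implied_high = removed / low_share    # smaller share -> larger total

print(f"Implied total: about {implied_total:,.0f}")
print(f"Plausible range: {implied_low:,.0f} to {implied_high:,.0f}")
```

Whether the percentage is measured against the total before or after the removal makes little difference at this level of rounding.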

Moving forward, this automated process will be performed on a weekly basis to supplement the manual daily data cleaning. Since this cleaning will happen weekly, variations in the cumulative case counts will be small.
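The statement does not describe how the automated method decides that two reports are duplicates. Purely as an illustration of the general "identify and merge" idea, a minimal sketch might look like the following. The field names (`name`, `dob`, `specimen_date`), the exact-match rule after normalization, and the function `merge_duplicates` are all hypothetical and are not the Department's actual criteria or code.

```python
from collections import defaultdict

def merge_duplicates(reports, key_fields=("name", "dob", "specimen_date")):
    """Group reports that share a normalized key and merge each group into one record.

    Hypothetical matching rule: reports are duplicates when every key field
    matches after trimming whitespace and lowercasing.
    """
    groups = defaultdict(list)
    for report in reports:
        key = tuple(str(report.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(report)

    merged = []
    for group in groups.values():
        record = {}
        for report in group:
            for field, value in report.items():
                # Keep the first non-empty value seen for each field.
                if value not in (None, "") and record.get(field) in (None, ""):
                    record[field] = value
        merged.append(record)
    return merged

# Made-up example records, not real data.
reports = [
    {"name": "Pat Doe", "dob": "1980-01-02", "specimen_date": "2021-04-01", "lab": "Lab A"},
    {"name": "pat doe ", "dob": "1980-01-02", "specimen_date": "2021-04-01", "lab": "", "county": "Essex"},
    {"name": "Sam Roe", "dob": "1975-06-30", "specimen_date": "2021-04-03", "lab": "Lab B"},
]

cleaned = merge_duplicates(reports)
removed = len(reports) - len(cleaned)
print(f"Removed {removed} duplicate report(s); {len(cleaned)} case(s) remain.")
```

Running a pass like this on a regular schedule keeps each individual correction small, which matches the expectation above that weekly cleaning will cause only small variations in the cumulative count.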