SMART POOLING: AI POWERED COVID-19 TESTING

M. ESCOBAR, G. JEANNERET, L. BRAVO-SÁNCHEZ, A. CASTILLO, C. GÓMEZ, D. VALDERRAMA, M. F. ROA, J. MARTÍNEZ, J. MADRID-WOLFF, M. CEPEDA, M. GUEVARA-SUAREZ, O. L. SARMIENTO, A. L. MEDAGLIA, M. FORERO-SHELTON, M. VELASCO, J. M. PEDRAZA-LEAL, S. RESTREPO, AND P. ARBELÁEZ

MEDRXIV PREPRINT, 2020

Abstract

Massive molecular testing for COVID-19 has been identified as fundamental to moderating the spread of the disease. Pooling methods can enhance testing efficiency, but they are viable only at very low prevalences of the disease. We propose Smart Pooling, a machine learning method that uses sociodemographic data from patients to increase the efficiency of pooled molecular testing for COVID-19 by arranging samples into all-negative pools. We show efficiency gains of 42% with respect to individual testing at disease prevalence of up to 25%, a regime in which two-step pooling offers marginal efficiency gains. Additionally, we calculate the possible efficiency gains of one- and two-dimensional two-step pooling strategies and present the optimal strategies for disease prevalences up to 25%. We discuss the practical limitations of conducting pooling in the laboratory.

Pooling strategies


Pooling methods make it possible to test several patients with fewer reagents by combining their samples in a single test tube. These methods are effective when the prevalence of the disease is low, so that the probability of all samples in a pool being negative is high, but they fail as the prevalence increases, as shown in Figure 1. Pooling efficiency, defined as the number of patients tested per detection kit consumed, is our main tool for quantitatively comparing pooling protocols.

Figure 1. Two level pooling with a high prevalence of the disease.
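The dependence of efficiency on prevalence can be illustrated with the textbook model of two-step pooling. The sketch below is an idealization, not the calculation used in the study: it assumes samples are independent, the test is perfect, and every member of a positive pool is retested individually. The function names and the `max_pool_size` bound are hypothetical, and its values need not match the figures reported on this page.

```python
def two_step_efficiency(prevalence: float, pool_size: int) -> float:
    """Expected patients tested per kit under idealized two-step pooling.

    Assumes independent samples and a perfect test (illustrative only).
    """
    if pool_size < 2:
        return 1.0  # a pool of one is just individual testing
    # Probability that every sample in the pool is negative.
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size patients, plus one retest
    # per patient whenever the pool comes back positive.
    tests_per_patient = 1.0 / pool_size + (1.0 - p_pool_negative)
    return 1.0 / tests_per_patient


def best_pool_size(prevalence: float, max_pool_size: int = 32) -> int:
    """Pool size with the highest expected efficiency (1 = no pooling)."""
    sizes = range(1, max_pool_size + 1)
    return max(sizes, key=lambda k: two_step_efficiency(prevalence, k))


if __name__ == "__main__":
    for prevalence in (0.01, 0.05, 0.10, 0.25):
        k = best_pool_size(prevalence)
        eff = two_step_efficiency(prevalence, k)
        print(f"prevalence={prevalence:.2f} pool_size={k} efficiency={eff:.2f}")
```

Under these assumptions the best pool size shrinks and the achievable efficiency drops toward 1 as prevalence grows, which is the failure mode that Figure 1 depicts.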

Smart pooling


Smart Pooling is a machine learning method that enhances the efficiency of pooled testing strategies. Smart Pooling exploits clinical or demographic information about samples to estimate their probability of testing positive for COVID-19. As Figure 2 shows, our method uses these probabilities to arrange all samples into pools that maximize the efficiency of the testing phase. That is, we group samples so that those likely to be positive are excluded from the pooling process and evaluated in single tests, thus reducing the number of tests used for the same number of samples.

Figure 2. Smart pooling arranges samples into pools that maximize the probability of being all-negative.
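One simple way to turn predicted probabilities into an arrangement is to sort the samples by risk, send the riskiest ones to individual tests, and pool the rest in order. The sketch below illustrates that idea only. The function name `arrange_pools`, the fixed `pool_size`, and the `individual_threshold` cutoff are assumptions made for this example, not parameters taken from the study.

```python
from typing import List, Sequence, Tuple


def arrange_pools(
    probabilities: Sequence[float],
    pool_size: int = 4,                 # hypothetical fixed pool size
    individual_threshold: float = 0.5,  # hypothetical risk cutoff
) -> Tuple[List[List[int]], List[int]]:
    """Return (pools, individual) as lists of sample indices.

    Samples whose predicted probability reaches the threshold are tested
    individually; the rest are pooled in order of increasing risk, so the
    first pools are the most likely to be all-negative.
    """
    # Sample indices ordered from lowest to highest predicted risk.
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    low_risk = [i for i in order if probabilities[i] < individual_threshold]
    individual = [i for i in order if probabilities[i] >= individual_threshold]
    # Cut the low-risk ordering into consecutive pools.
    pools = [
        low_risk[start:start + pool_size]
        for start in range(0, len(low_risk), pool_size)
    ]
    return pools, individual


if __name__ == "__main__":
    predicted = [0.9, 0.1, 0.05, 0.7, 0.2, 0.3]
    pools, individual = arrange_pools(predicted, pool_size=2)
    print("pools:", pools)
    print("individual:", individual)
```

Grouping neighbors in the sorted order keeps likely positives together, or out of pools entirely, so fewer pools are contaminated by a single positive sample.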

Pipeline


Figure 3 shows the pipeline for Smart Pooling. Samples and data are collected from patients. The Smart Pooling model processes these data and returns an arrangement with the probability that each sample tests positive. In the lab, samples are pooled based on this arrangement. Subsequently, samples from positive pools are tested individually. Finally, the diagnosis of each sample is fed back to the Smart Pooling platform. This enlarges the dataset and allows for continuous learning.

Figure 3. Smart Pooling pipeline.
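The laboratory stage of the pipeline can be simulated to see how many kits a given arrangement consumes. The sketch below assumes a perfect test and that a positive pool triggers one individual test per member. The names `count_tests` and `efficiency`, and the toy labels, are hypothetical and serve only to illustrate the bookkeeping.

```python
from typing import List, Sequence


def count_tests(
    pools: List[List[int]],
    individual: List[int],
    is_positive: Sequence[bool],
) -> int:
    """Kits consumed by a two-step protocol for a given arrangement."""
    tests = len(individual)  # one kit per individually tested sample
    for pool in pools:
        if not pool:
            continue
        tests += 1  # the pooled test itself
        # A pool of one already yields the individual result.
        if len(pool) > 1 and any(is_positive[i] for i in pool):
            tests += len(pool)  # retest every member of a positive pool
    return tests


def efficiency(n_samples: int, n_tests: int) -> float:
    """Patients tested per kit consumed."""
    return n_samples / n_tests


if __name__ == "__main__":
    truth = [True, False, False, True, False, False]
    # Risk-aware arrangement: both positives tested individually.
    informed = count_tests([[2, 1], [4, 5]], [3, 0], truth)
    # Arrangement that ignores risk: positives land in two pools.
    naive = count_tests([[0, 1], [2, 3], [4, 5]], [], truth)
    print(efficiency(len(truth), informed), efficiency(len(truth), naive))
```

In this toy case the risk-aware arrangement uses 4 kits for 6 samples, while the arrangement that ignores risk uses 7, more than individual testing would.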

Results


We had access to complementary information for each patient: sex; age; date of onset of symptoms; date of the medical consultation; initial patient classification; the patient's occupation; affiliation to the health system; travel (international or domestic); comorbidities; symptoms; and whether they had been in contact with a confirmed or suspected COVID-19 case. We collected 2,068 samples from April 18th to July 15th, 2020.

Smart Pooling achieves higher efficiency than both two-step pooling and individual testing at every prevalence.

At a 20% prevalence, with 25,000 testing kits:

METHOD              EFFICIENCY  PATIENTS
Individual testing  100%        25,000
Two-step pooling    101%        25,250
Smart pooling       155%        38,750
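The patient counts in the table follow directly from the definition of efficiency as patients tested per kit: patients = kits × efficiency. A minimal check of that arithmetic, using only the numbers reported above (the function name `patients_tested` is hypothetical):

```python
KITS = 25_000  # testing kits available, as stated above

# Efficiency at 20% prevalence, as reported in the table (100% = 1.00).
EFFICIENCY_BY_METHOD = {
    "Individual testing": 1.00,
    "Two-step pooling": 1.01,
    "Smart pooling": 1.55,
}


def patients_tested(kits: int, efficiency: float) -> int:
    """Patients that can be tested with a given number of kits."""
    return round(kits * efficiency)


if __name__ == "__main__":
    for method, eff in EFFICIENCY_BY_METHOD.items():
        print(f"{method}: {patients_tested(KITS, eff):,} patients")
```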

Conclusions


Smart Pooling enhances, but does not replace, molecular testing. Smart Pooling uses artificial intelligence to enhance the performance of well-established diagnostics. It is an example of how data-driven models can complement, not replace, high-confidence molecular methods.

Smart Pooling could ease access to large-scale testing. Adopting Smart Pooling could make massive testing more accessible and larger in scale. In the case of Colombia, this could mean testing 45,000 samples daily, instead of 25,000. If deployed globally, Smart Pooling has the potential to empower humanity to respond to the COVID-19 pandemic. It is an example of how artificial intelligence can be put to work for social good.
