MEDRXIV PREPRINT, 2020
Pooling methods make it possible to test several patients with fewer reagents by combining their samples in a single test tube. These methods are effective when the prevalence of the disease is low, so that the probability of all samples in a pool being negative is high, but they fail as the prevalence increases, as shown in Figure 1. Pooling efficiency, defined as the number of patients tested per detection kit consumed, is our main tool for quantitatively comparing pooling protocols.
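To make the dependence of pooling efficiency on prevalence concrete, the following is a minimal sketch of the expected efficiency of classic two-step pooling: one test per pool, followed by individual retests of every sample in a positive pool. It assumes that samples are independent, that every sample has the same probability of being positive, and that the test is error-free. The function name and parameters are illustrative and are not taken from the Smart Pooling code.

```python
def two_step_efficiency(prevalence: float, pool_size: int) -> float:
    """Expected patients tested per kit under two-step pooling.

    Assumptions (not from the source): independent samples, identical
    prevalence for every sample, and a test with no false results.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        # A pool of one is just individual testing: one kit per patient.
        return 1.0

    # Probability that at least one sample in the pool is positive.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size

    # One kit for the pooled test, plus pool_size retests if it is positive.
    expected_kits = 1.0 + pool_size * p_pool_positive

    return pool_size / expected_kits


if __name__ == "__main__":
    for prevalence in (0.01, 0.05, 0.10, 0.20, 0.50):
        print(prevalence, round(two_step_efficiency(prevalence, pool_size=10), 2))
```

Under these assumptions the efficiency equals the pool size when the prevalence is zero and drops below 1 (worse than individual testing) once most pools are expected to be positive.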
Smart Pooling is a machine learning method that enhances the efficiency of pooling testing strategies. Smart Pooling exploits clinical or demographic information of samples to estimate their probability of testing positive for COVID-19. As Figure 2 shows, our method uses these probabilities to arrange all samples into pools that maximize the efficiency of the testing phase. That is, we group samples such that those likely to be positive are excluded from the pooling process and evaluated in single tests, thus reducing the number of tests used for the same number of samples.
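The arrangement step can be illustrated with a small sketch. It sorts samples by predicted probability, sends those above a cutoff to individual tests, and pools the rest in consecutive groups. The cutoff `individual_threshold`, the fixed `pool_size`, and the function names are hypothetical choices made for illustration; the text above does not specify how Smart Pooling selects them.

```python
from typing import List, Sequence, Tuple


def arrange_samples(
    probabilities: Sequence[float],
    pool_size: int = 5,
    individual_threshold: float = 0.5,
) -> Tuple[List[List[int]], List[int]]:
    """Return (pools, individual) as lists of sample indices.

    Hypothetical rule: samples at or above the threshold are tested
    individually; the rest are pooled in order of increasing probability.
    """
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    individual = [i for i in order if probabilities[i] >= individual_threshold]
    poolable = [i for i in order if probabilities[i] < individual_threshold]
    pools = [poolable[s:s + pool_size] for s in range(0, len(poolable), pool_size)]
    return pools, individual


def count_tests(
    pools: Sequence[Sequence[int]],
    individual: Sequence[int],
    is_positive: Sequence[bool],
) -> int:
    """Count kits used by two-step pooling on the given arrangement."""
    tests = len(individual)  # one kit per individually tested sample
    for pool in pools:
        tests += 1  # one kit for the pooled test
        if len(pool) > 1 and any(is_positive[i] for i in pool):
            tests += len(pool)  # retest every sample in a positive pool
    return tests


if __name__ == "__main__":
    # Toy data, invented for this example only.
    probabilities = [0.9, 0.1, 0.05, 0.8, 0.2, 0.1, 0.02, 0.3]
    is_positive = [True, False, False, True, False, False, False, False]
    pools, individual = arrange_samples(probabilities, pool_size=3)
    kits = count_tests(pools, individual, is_positive)
    print("pools:", pools, "individual:", individual)
    print("efficiency:", len(probabilities) / kits)
```

In this toy case both positive samples receive high predicted probabilities, so they are tested individually and neither pool needs retesting. With less accurate predictions, positives would fall into pools and trigger retests, lowering the efficiency.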
Figure 3 shows the pipeline for Smart Pooling. Samples and data are collected from patients. The Smart Pooling model processes these data and returns an arrangement with the probability that each sample tests positive. In the lab, samples are pooled based on this arrangement. Subsequently, samples from positive pools are tested individually. Finally, the diagnosis of each sample is fed back to the Smart Pooling platform. This enlarges the dataset and allows for continuous learning.
We had access to complementary information for each patient, such as: sex; age; date of onset of symptoms; date of the medical consultation; initial patient classification; information about the patient's occupation; affiliation to the health system; travel history (international or domestic); comorbidities; symptoms; and whether they had been in contact with a confirmed or suspected COVID-19 case. We collected 2,068 samples from April 18th to July 15th, 2020.
Smart Pooling achieves a higher efficiency gain than both two-step pooling and individual testing at every prevalence.
At a 20% prevalence, with 25,000 testing kits:
Smart Pooling enhances, but does not replace, molecular testing. Smart Pooling uses artificial intelligence to enhance the performance of well-established diagnostics. It is an example of how data-driven models can complement, not replace, high-confidence molecular methods.
Smart Pooling could ease access to large-scale testing. Adopting Smart Pooling could make massive testing more accessible and larger in scale. In the case of Colombia, this could mean testing 45,000 samples daily, instead of 25,000. If deployed globally, Smart Pooling has the potential to empower humanity to respond to the COVID-19 pandemic. It is an example of how artificial intelligence can be put to work for social good.