Clinical Decision Support System for COVID-19 patients management

May 2, 2020

Everybody has head about COVID

The first case of COVID-19 was detected in Wuhan, China, on December 31st, 2019. Soon after, the WHO has characterized this viral disease as a global pandemic. Many governments have imposed a lockdown to protect the NHS from being overwhelmed and limit the spread of the disease. Since then, the COVID-19 has seriously impacted businesses and the economy. It also has been had serious implications for people’s livelihoods and health. For example, my 6 months of industrial placement as a Data Scientist in Airbus has been canceled.

The Goal

EPIC IMPOC is a clinical decision support system used in the NHS to improve the management of infectious diseases by facilitating data collection, infection diagnostics, and antimicrobial therapy advice at the point of care.

In the midst of the SARS-CoV-2 outbreak, we have implemented a new module for EPIC IMPOC (Enhanced, Personalized, and Integrated Care for Infection Management at the Point of Care), specifically to provide aid to clinicians to diagnose COVID-19 patients and treat them by retrieving similar cases.

The Final Solution

This module is a new component of EPIC IMPOC that focuses specifically on integrating current medical knowledge of COVID-19 and patient data with an inference engine to predict the likelihood of infection.

Additionally, it includes a patient retrieval system that is packaged in a desktop-friendly panel for clinicians within EPIC IMPOC to identify patients with similar severity levels, symptoms, and medical histories in the UK.

This EPIC IMPOC module is one of the first planned to undergo clinical trials in the NHS, starting in early July 2020. This solution will be a crucial part of the Imperial College Healthcare NHS Trust’s COVID-19 response to the second wave in the pandemic. The Trust’s five hospitals currently treat more than 1.125 million patients and employ 11,800 people annually.

Technical Overview and Learning outcomes

Data Preprocessing and Pipeline

During this project, I have preprocessed the data using Pandas library and selected the most important features using Random Forests. I have also built a data pipeline that includes: data filtering, data imputation, and data scaling that maximizes the Probabilistic Inference Module and the Case Base Reasoning system performances.

Probabilistic Inference Module

I have also worked on the Probabilistic Inference module estimates the probability of COVID-19 infection in a patient based on key pathology data. I have evaluated different machine learning algorithms using mainly Scikit Learn library, and the following algorithms were selected: LightGBM, Random Forest Classifier, and Decision Tree Classifier.

LightGBM Performance in Predicting SARS-COV-2 Infection Based on Patients' Blood Tests

Case Base Reasoning System

Finally, I have implemented the Case Base Reasoning system that retrieves similar patients and provide them to the clinicians to support them with treatment prescription using, on the one hand, the pathology profile, and on the other hand, the vital signs, symptoms, radiology results and medical history of the patient.

The CBR cycle - the 4 phases of the CBR cycle according to Aamodt and Plaza

This has been done by computing the weighted distance between a given patient and the remainder of cases and selecting the most similar cases. The system performance was evaluated using the patient outcome to verify and investigate patient similarities.

To evaluate the quality of the retrieved patients, a simple voting system was used to predict the patients' outcomes. The percentage of retrieved patients that have been admitted to the ICU/died was compared to a predefined threshold. In fact, after observing that the classes are highly imbalanced, the threshold has been lowered to maximize performance.

Comparaison of the CBR Performance in Death prediction for Different Distance Metrics

Comparaison of the CBR Performance in ICU Prediction for Different Distance Metrics

Finally, I have worked with the “Back-end Django Application Team” in order to integrate the machine learning models and the data pipeline with the Django Application.

Clinical Decision Support System for COVID-19 patients management

A Clinical Decision Support System (CDSS) that helps clinicians to diagnose COVID-19 patients and treat them by retrieving similar cases

Clinical Decision Support System for COVID-19 patients management

Everybody has head about COVID

The Goal

The Final Solution

Technical Overview and Learning outcomes

Data Preprocessing and Pipeline

Probabilistic Inference Module

LightGBM Performance in Predicting SARS-COV-2 Infection Based on Patients' Blood Tests

Case Base Reasoning System

The CBR cycle - the 4 phases of the CBR cycle according to Aamodt and Plaza

Workflow of the Supervised CBR

Comparaison of the CBR Performance in Death prediction for Different Distance Metrics

Comparaison of the CBR Performance in ICU Prediction for Different Distance Metrics