Development of the machine learning (ML) module
For Signals
Proprietary algorithms for classifying ECG signals were developed and implemented, based on Lempel–Ziv complexity (1976), an information-theoretic measure that serves as an efficient estimator of entropy. The proposed approach assumes that disease information is reflected in the biosignal. Available methods for signal encoding and digitization (Zhang et al., 2001; Abasolo et al., 2011) were reviewed, and a novel encoding method was proposed that accounts for signal fluctuations and extensively exploits biosignal variability. This method, termed the Fluctuation Addressing Encoding Method (FAEM), considers the oscillations of successive ECG measurements around the signal’s mean value.
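The underlying idea can be sketched as follows. This is a minimal illustration, not the proprietary implementation: `mean_threshold_encode` is a simplified, hypothetical stand-in for FAEM (it only binarizes samples around the signal mean, without FAEM's fuller use of fluctuations), while `lz76_complexity` implements the standard Lempel–Ziv (1976) exhaustive parsing.

```python
import numpy as np

def mean_threshold_encode(signal):
    """Binarize a signal by whether each sample lies above the signal mean.
    (A simplified stand-in for the proprietary FAEM encoding.)"""
    signal = np.asarray(signal, dtype=float)
    return (signal > signal.mean()).astype(int)

def lz76_complexity(symbols):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing:
    higher counts indicate a less compressible, more irregular sequence."""
    s = ''.join(map(str, symbols))
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # extend the phrase while it can still be copied from the prefix seen so far
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

# toy usage: a noisy sine wave yields a higher complexity than a clean one
t = np.linspace(0, 4 * np.pi, 200)
clean = lz76_complexity(mean_threshold_encode(np.sin(t)))
noisy = lz76_complexity(
    mean_threshold_encode(np.sin(t) + np.random.default_rng(1).normal(0, 0.5, 200)))
```

The classic worked example from Lempel and Ziv's paper, the string `0001101001000101`, parses into six phrases (`0|001|10|100|1000|101`), which the function above reproduces.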
Machine learning methods were used to identify the optimal parameters for the proposed estimator. Recurrent Neural Networks (RNNs), capable of modeling the sequential nature of ECG signals, were employed for ECG signal classification, which is crucial for tracking changes over time.
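To illustrate how a recurrent network consumes an ECG sample by sample, here is a toy, untrained Elman-style RNN forward pass in NumPy. The project's actual architecture and trained parameters are not specified here; all weights below are random placeholders, and the hidden size and class count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_forward(x, Wxh, Whh, Why, bh, by):
    """One forward pass of a simple (Elman) RNN over a 1-D signal.
    The hidden state h carries information forward through time,
    which is what lets the model track changes across the sequence."""
    h = np.zeros(Whh.shape[0])
    for x_t in x:                       # step through the sequence in order
        h = np.tanh(Wxh * x_t + Whh @ h + bh)
    return Why @ h + by                 # class logits from the final hidden state

H, C = 8, 2                             # hidden size, number of classes (arbitrary)
Wxh = rng.normal(size=H)
Whh = rng.normal(size=(H, H)) * 0.1     # small recurrent weights keep tanh stable
Why = rng.normal(size=(C, H))
bh, by = np.zeros(H), np.zeros(C)

logits = rnn_forward(np.sin(np.linspace(0, 6, 50)), Wxh, Whh, Why, bh, by)
```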
The algorithms were developed and evaluated using ECG signals sourced from retrospective public databases and project partners. A database was created during the project’s first stage, containing signals from the following sources:
– National Sleep Research Resource (Sleep Heart Health Study): (1) 6,441 patients over 40 years old and (2) 3,295 patients
– MIT-BIH Normal Sinus Rhythm Database (nsrdb): 18 ECG signals (healthy individuals)
– Project partners’ ECG signals:
– PEACS BV, Netherlands (study on the impact of physical and mental stress on heart health and the occurrence of Microvascular Coronary Diseases, MCD): 20 patients
– Additional signals from the MIT-BIH Arrhythmia Database: 47 patients
Two additional databases were analyzed:
– PTB-XL (standard 12-lead ECG recordings from the Physikalisch-Technische Bundesanstalt)
– CPSC (10,330 standard 12-lead ECG recordings used in the “China Physiological Signal Challenge”).
Based on the PTB-XL database, distributions of all ECGs marked as normal and free from any abnormalities by independent experts were created. For each ECG signal from both databases, a measure of conformity with the distribution of normal ECGs was determined (separately for each segment), both in terms of the potential location in milliseconds from activation and the potential value in millivolts. These measures were used to develop a model for detecting abnormalities in the ECG signal. Logistic regression was employed due to the interpretability of all independent variables, which is challenging to achieve with typical machine learning and artificial intelligence models. To reduce the impact of borderline cases on the model’s predictive capabilities, between 0 and 10% of outlier observations from the normal ECG distribution were removed.
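The modelling step can be sketched in Python. This is a minimal, hypothetical illustration only: the conformity measures are replaced by synthetic features, and `fit_logistic` is a plain gradient-descent logistic regression, chosen (as in the text) because every coefficient remains directly interpretable.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression fitted by plain gradient descent.
    Each weight in w is the log-odds contribution of one feature,
    which keeps the model interpretable."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(abnormal)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# synthetic stand-in for the per-segment conformity measures:
# normal ECGs (y=0) and abnormal ECGs (y=1) drawn with shifted feature means
X = np.r_[rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))]
y = np.r_[np.zeros(200), np.ones(200)]

w, b = fit_logistic(X, y)
acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
```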
In total, 14,381 ECGs were used to construct the normal WaveECG and PathECG distributions and fit the model, and 6,320 independent ECGs were used for validation. The QRS model achieved an AUC of 82.1% (80.3%–84.0%) for men, 86.1% (84.7%–87.6%) for women, and 84.3% (83.1%–85.5%) for both genders combined; the corresponding sensitivity and specificity were 70.7% (69.1%–72.3%) and 77.1% (72.5%–81.3%) for men, 69.1% (67.2%–71.0%) and 80.8% (77.2%–84.1%) for women, and 71.4% (70.2%–72.6%) and 77.8% (74.8%–80.4%) for both genders combined.
The P-wave model’s performance was significantly lower, with an AUC of 65.4% (62.8%–68.0%) for women and 66.8% (63.9%–69.6%) for men. The only significant predictor was WaveECG: a 10-percentage-point increase in WaveECG corresponded to a 19.9% reduction in the likelihood of an abnormal ECG for women’s ECGs evaluated according to male criteria, and a 23% reduction for men’s ECGs evaluated according to criteria for both genders.
A thorough evaluation of the impact of removing outlier observations from the normal ECG distribution on the predictive ability of the analyzed parameters was also conducted. It was shown that asymmetrical removal of outliers significantly increases the model’s predictive ability, especially for the T-wave, as demonstrated by an AUC of 81.5% in detecting abnormal ECGs. The greatest asymmetry was observed for path-ST, which increased the AUC from 55% to 69%.
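Asymmetric outlier removal of the kind described above can be sketched as follows. This is a generic, hypothetical illustration (the project's actual trimming fractions per parameter are not reproduced here): unlike symmetric trimming, the lower and upper fractions need not be equal.

```python
import numpy as np

def asymmetric_trim(x, lower_frac, upper_frac):
    """Drop `lower_frac` of the lowest and `upper_frac` of the highest
    observations; the two fractions may differ, so the trim is asymmetric."""
    lo, hi = np.quantile(x, [lower_frac, 1.0 - upper_frac])
    return x[(x >= lo) & (x <= hi)]

# toy usage: cut 10% of low outliers but only 2% of high outliers
x = np.arange(100.0)
trimmed = asymmetric_trim(x, 0.10, 0.02)
```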
Based on the confusion matrix, accuracy (ACC), precision (PR), recall (RC), F1-score (F1), and specificity (SP) were calculated for all classes in the training and test sets. ROC curves were presented along with AUC values and 95% confidence intervals (CI) for each class.
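The metrics listed above follow directly from one-vs-rest confusion-matrix counts; a minimal sketch (standard textbook definitions, not project-specific code):

```python
def class_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and specificity for one class,
    computed from its one-vs-rest confusion-matrix counts."""
    total = tp + fp + fn + tn
    acc = (tp + tn) / total
    pr = tp / (tp + fp) if tp + fp else 0.0          # precision
    rc = tp / (tp + fn) if tp + fn else 0.0          # recall (sensitivity)
    f1 = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0          # specificity
    return {"ACC": acc, "PR": pr, "RC": rc, "F1": f1, "SP": sp}

# toy usage with made-up counts
m = class_metrics(tp=90, fp=10, fn=10, tn=90)
```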
Four ROC curves for the classification model demonstrate performance for each class:
– Normal Sinus Rhythm/NSR [AUC=0.99; 95% CI: (98.33%, 98.96%)]
– Left Bundle Branch Block/LBBB [AUC=0.99; 95% CI: (99.84%, 99.91%)]
– Right Bundle Branch Block/RBBB [AUC=0.99; 95% CI: (98.06%, 98.22%)]
– Undefined/NA [AUC=0.96; 95% CI: (95.68%, 96.87%)]
For Images
The databases used for algorithm development were expanded with data from a database covering 53 men and 27 women (82 3D CT scans with abdominal contrast) from the National Institutes of Health Clinical Center (NIH) [https://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU] and a database from the Memorial Sloan Kettering Cancer Center (New York, NY, USA), containing 281 contrast-enhanced abdominal CT scans.
The use of Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) of the Long Short-Term Memory (LSTM) type, and autoencoders was also proposed. Convolutional networks were selected as effective for analyzing spatial patterns. For medical image segmentation, such as CT scans, convolutional neural networks like 3D-UNET and the You Only Look Once (YOLO) version 7 object detection algorithm were proposed.
The segmentation model’s performance, expressed as the Dice score on the test database (independent of the training database), was 0.73 (95% CI: 0.68–0.77) for healthy individuals and 0.72 (95% CI: 0.70–0.74) for cancer patients. The IoU score was 0.58 (95% CI: 0.53–0.63) and 0.57 (95% CI: 0.54–0.60), respectively. Voxel-level AUC values were 0.9865 (95% CI: 0.98646–0.98658) and 0.9898 (95% CI: 0.98976–0.98989), respectively.
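Both overlap scores above have standard definitions for binary masks; a minimal sketch (generic definitions, not the project's evaluation pipeline):

```python
import numpy as np

def dice_iou(pred, truth):
    """Dice score and IoU for two binary segmentation masks.
    Empty-vs-empty masks are scored as perfect agreement (1.0)."""
    pred, truth = np.asarray(pred).astype(bool), np.asarray(truth).astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    dice = 2.0 * inter / denom if denom else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

# toy usage: masks agreeing on 1 of 3 foreground voxels
dice, iou = dice_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

Note the two scores are monotonically related for any pair of masks: IoU = Dice / (2 − Dice), which is why they rank models identically even though Dice is numerically higher.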