Affiliation:
2Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
ORCID: https://orcid.org/0000-0001-6874-9331
Affiliation:
1Department of Industrial Engineering, University of Houston, Houston, TX 77004, USA
Email: yxiang4@central.uh.edu
ORCID: https://orcid.org/0000-0003-0696-2924
Explor Med. 2025;6:1001347 DOI: https://doi.org/10.37349/emed.2025.1001347
Received: January 30, 2025 Accepted: May 28, 2025 Published: July 17, 2025
Academic Editor: Ienglam Lei, Mayo Clinic, USA
The article belongs to the special issue Artificial Intelligence and Machine Learning in Cardiovascular Medicine
Cardiovascular diseases (CVDs) are a leading cause of mortality globally, necessitating innovative approaches for improved diagnosis, prognosis, and treatment. Recent advances in artificial intelligence (AI) and machine learning (ML) have revolutionized cardiovascular medicine by leveraging vast multi-modal datasets—including genetic markers, imaging, and electronic health records (EHRs)—to provide patient-specific insights. This review highlights the transformative potential of AI applications, such as AI-enabled electrocardiograms (ECGs) and deep learning (DL)-based analysis, in enhancing diagnostic and prognostic accuracy and personalizing patient care. Notable progress includes predictive models for a variety of CVDs, including ischemic heart disease, atrial fibrillation, and heart failure, with performance metrics significantly surpassing traditional methods. Emerging technologies, such as explainable AI, large language models, and digital-twin technologies, further expand the horizons of precision cardiology. This paper also discusses challenges facing AI and ML applications in CVDs and promising future directions.
Cardiovascular disease (CVD) continues to pose a significant global health challenge, accounting for nearly one-third of all deaths worldwide, making it one of the leading causes of mortality and imposing substantial burdens on healthcare systems [1, 2]. Early, accurate detection, diagnosis, and treatment of CVDs are crucial to preventing severe complications, reducing mortality rates, and improving the overall quality of life for patients [3, 4]. With developments in machine learning (ML) and artificial intelligence (AI), their integration into medicine has become a significant factor in assisting medical professionals and other specialists. Examples include assisting specialists with complex operations, improving decision-making processes, and enabling precise identification of CVDs through imaging. The goal is to reduce the risks associated with complex procedures, enhance understanding of CVDs and individual patient behavior, and improve computer-assisted diagnosis. Research in AI, ML, and deep learning (DL) focuses on cost-effective, rapid, and non-invasive methods for accurately detecting CVDs, evaluated using performance metrics such as sensitivity, recall, accuracy, F1-score, precision, and specificity [5]. With rising prevalence and escalating costs, the need for innovative diagnostic, prognostic, and therapeutic strategies has never been greater [6]. AI and ML are transforming cardiovascular medicine, leveraging vast datasets to enable precise and personalized care that was previously unattainable [7].
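As a concrete illustration of the evaluation metrics listed above, the short sketch below computes them for a toy set of predictions; the labels and outputs are invented for illustration and do not come from any cited study.

```python
# Minimal sketch: computing the evaluation metrics named above with scikit-learn.
# The labels and predictions are illustrative toy values, not study data.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = CVD present, 0 = absent
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy    :", accuracy_score(y_true, y_pred))
print("precision   :", precision_score(y_true, y_pred))
print("recall/sens.:", recall_score(y_true, y_pred))   # sensitivity equals recall
print("specificity :", tn / (tn + fp))                 # derived from the confusion matrix
print("F1-score    :", f1_score(y_true, y_pred))
```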
Given the global burden of CVDs, there is a critical need for innovative diagnostic methods that are both scalable to large populations and tailored to individual patients. AI and ML present transformative opportunities by rapidly analyzing multimodal data such as genetic markers, imaging, and electronic health records (EHRs) to uncover complex patterns, enabling faster, more accurate, and patient-specific diagnoses compared to traditional approaches. By leveraging these advanced technologies, healthcare systems can not only detect CVD earlier but also prioritize preventive care and personalized interventions, significantly reducing disease progression and improving outcomes such as mortality across diverse populations. Traditional approaches to diagnosing and managing CVD often rely on population-level data, which limits their applicability to individual patients. AI, however, allows for patient-specific predictions and interventions by uncovering complex relationships within multimodal datasets, such as genetic markers, imaging data, and EHRs [6].
Although diagnostic and prognostic applications of AI and ML in cardiovascular medicine differ in focus (diagnosis identifies existing disease, whereas prognosis predicts future outcomes), both rely heavily on supervised learning techniques [8–12]. In both tasks, labeled clinical data trains models to recognize complex and subtle patterns within multimodal datasets such as electrocardiograms (ECGs), imaging, and genetic profiles. Diagnosis frames pattern recognition as distinguishing between disease and non-disease states, while prognosis extends it to predicting disease progression, risk stratification, or treatment response. Thus, whether detecting current abnormalities or forecasting future events, many AI and ML approaches treat cardiovascular prediction as a pattern recognition problem rooted in supervised learning frameworks. Emerging technologies such as digital twin simulations and large language models (LLMs) further expand the horizon of AI applications in cardiology. Digital twins enable real-time, personalized modeling of cardiac function and responses to treatment, while LLMs provide enhanced clinical decision support by synthesizing unstructured medical data [13]. These advancements not only improve diagnostic accuracy but also facilitate tailored treatment strategies, such as optimizing cardiac resynchronization therapy (CRT) or identifying candidates for advanced interventions [1].
Despite these advancements, challenges persist, including data bias, lack of standardization, and limited external validation of AI models. Addressing these barriers is crucial for the safe and equitable integration of AI into clinical practice. Regulatory frameworks, such as the upcoming EU Artificial Intelligence Act, aim to establish robust guidelines to ensure the efficacy and safety of AI-driven tools in healthcare [14]. Advancements in AI have revolutionized clinical decision support, particularly in intensive care units, where timely decisions can significantly impact patient outcomes. However, challenges such as the “black box” nature of DL models, dataset biases, and the need for interpretability persist [15]. Removing these barriers is crucial for successfully deploying LLMs in cardiovascular medicine.
This review surveys the most recent advances in AI and ML in cardiovascular medicine, examining their applications in diagnosis, prognosis, and treatment guidance across prevalent CVDs. Additionally, it highlights emerging technologies, addresses current challenges, and outlines future directions for integrating AI into cardiology. Figure 1 summarizes the diagnostic and prognostic techniques discussed below.
Figure 1. Flowchart showing the breakdown of the prognosis and diagnosis techniques discussed. AI: artificial intelligence; ML: machine learning; ECG: electrocardiogram; CNN: convolutional neural network; CAD: coronary artery disease; DL: deep learning; NLP: natural language processing; EHR: electronic health record; HFpEF: heart failure with preserved ejection fraction; AS: aortic stenosis; MR: mitral regurgitation; HF: heart failure; XGBoost: Extreme Gradient Boosting
Atrial fibrillation (AF), a common cardiac arrhythmia characterized by rapid and irregular heartbeats, is traditionally diagnosed using prolonged monitoring tools such as Holter devices or implantable loop recorders. These methods, while effective, are often resource-intensive and inconvenient. AI-enabled ECGs offer a transformative alternative by detecting AF from a single 10-second, 12-lead ECG, identifying subtle patterns associated with AF even during sinus rhythm, achieving an area under the curve (AUC) of 0.87–0.90 [16]. In one study, an AI-enabled ECG model developed using 649,931 ECGs achieved an AUC of 0.87 in detecting AF during sinus rhythm. When multiple ECGs per patient were analyzed, the model’s AUC improved to 0.90, with a sensitivity of 82.3% and a specificity of 83.4%, showcasing its robustness in identifying AF-related structural changes invisible to human interpretation. This level of diagnostic precision is particularly valuable in detecting AF during sinus rhythm or asymptomatic episodes, where traditional monitoring might fail. The high sensitivity and specificity of AI models allow for early identification, enabling timely interventions to mitigate complications like stroke and systemic embolism [16, 17]. Other works that use DL methods for AF diagnosis can be found in [18–21].
The integration of AI into wearable devices, such as smartwatches, has further advanced AF diagnostics. Smartwatch-based ECG systems employing convolutional neural networks (CNNs) have demonstrated sensitivities exceeding 97%, enabling real-time, non-invasive arrhythmia monitoring in ambulatory settings. For instance, studies show that AI-driven algorithms in consumer-grade devices provide detection rates comparable to clinical-grade equipment, significantly enhancing accessibility for early diagnosis and intervention. Additionally, multi-class DNN models trained on over 91,231 single-lead ECGs can classify up to 12 cardiac rhythm categories with a mean F1 score of 0.837, surpassing the performance of experienced cardiologists [22]. AI-ECGs are instrumental in guiding timely and personalized treatment strategies. In AF, early detection through AI-ECGs supports the initiation of anticoagulation therapy, thereby reducing the risk of stroke in patients with undiagnosed AF. This targeted approach mitigates the bleeding risks associated with empiric anticoagulation by ensuring that only those with confirmed or high-risk AF receive treatment [16].
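The cited smartwatch and single-lead studies do not detail their architectures here, but a minimal 1D convolutional network of the kind used for rhythm classification might look like the sketch below; the layer sizes, sampling rate, window length, and 12-class output are illustrative assumptions rather than any published design.

```python
# Minimal sketch of a 1D CNN rhythm classifier; hyperparameters are assumptions.
import torch
import torch.nn as nn

class RhythmCNN(nn.Module):
    def __init__(self, n_classes: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                 # collapse the time axis
        )
        self.classifier = nn.Linear(64, n_classes)   # one score per rhythm class

    def forward(self, x):                            # x: (batch, 1, samples)
        return self.classifier(self.features(x).squeeze(-1))

# One 30-second single-lead strip sampled at 200 Hz (random stand-in signal).
ecg = torch.randn(1, 1, 200 * 30)
logits = RhythmCNN()(ecg)                            # shape: (1, 12)
```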
Jo et al. [23] developed a DL model specifically designed to detect AF from standard ECG waveforms, with an emphasis on clinical explainability. Using a large, high-quality dataset of labeled ECGs from the Sejong Medical Center, their model demonstrated exceptional diagnostic performance across multiple ECG formats. These results were consistent across external validation cohorts, including datasets from different countries, devices, and sampling rates, highlighting the model’s robustness and broad applicability. This work is particularly significant given the increasing reliance on portable and wearable ECG technologies, which often operate with fewer leads and under less-controlled environments. As healthcare systems move toward more decentralized and continuous monitoring approaches, high-accuracy, explainable AI (XAI) models like this one offer a promising solution for earlier detection, triaging, and management of AF, potentially reducing adverse outcomes and healthcare burdens.
Ischemic heart disease (IHD), primarily caused by the narrowing or blockage of coronary arteries, poses significant risks such as myocardial infarction (heart attack) and heart failure (HF). Early and accurate diagnosis is critical to mitigating these risks, yet traditional methods, such as stress tests and coronary angiography, can be resource-intensive, invasive, and inaccessible to many populations [24, 25].
AI and ML-based ECG analysis effectively identified patients with aortic stenosis (AS) in a study of 258,607 adults [26]. Moderate to severe AS was present in 3.7% of participants. The AI-ECG achieved high diagnostic accuracy. Adding age and sex improved performance and further increased it in patients without hypertension. Notably, false-positive AI-ECGs were associated with double the risk of developing moderate or severe AS over 15 years. The algorithm in [26] shows promise for screening AS in primary care, pending further validation in diverse populations.
A model trained using paired 12-lead ECG and echocardiogram data from 44,959 patients achieved high performance (an AUC of 0.93, with sensitivity and specificity of 86.3% and 85.7%, respectively) [26]. This demonstrates AI’s capability to detect conditions such as left ventricular systolic dysfunction noninvasively. Another study using AI to analyze ECG patterns achieved an AUC of 0.9954 for detecting ST-segment elevation myocardial infarction (STEMI), outperforming experienced cardiologists in diagnostic accuracy [7].
AI models that integrate multimodal data, such as echocardiography and laboratory biomarkers, further enhance diagnostic precision. For example, by combining ECG findings with echocardiographic data, AI systems improve the identification of structural abnormalities like left ventricular dysfunction and coronary artery blockages. Models analyzing both modalities have shown an AUC of 0.918 in detecting cardiac contractile dysfunction, underscoring their diagnostic value [27]. When applied to asymptomatic populations, AI-driven tools identified patients at four times the risk of developing ventricular dysfunction, even when their initial ejection fraction (EF) appeared normal. This highlights AI’s ability to detect preclinical abnormalities that could lead to future cardiac events [28].
AI integration into widely available technologies such as wearable devices has democratized access to diagnostic tools: smartwatch-based ECG systems utilizing CNNs have achieved sensitivities exceeding 97% in detecting arrhythmias and IHD-related abnormalities. These tools enable real-time, non-invasive monitoring in ambulatory settings, making early detection more accessible, especially in resource-limited environments [27, 28].
The integration of AI into myocardial perfusion imaging (MPI) has significantly enhanced the diagnostic accuracy of IHD. Otaki et al. [29] introduced coronary artery disease (CAD)-DL, an explainable DL model that analyzes stress perfusion, wall motion, and wall thickening maps from SPECT imaging, along with patient age, sex, and cardiac volumes. A key strength of CAD-DL lies in its explainability. By generating attention and probability maps, the model visually highlights myocardial regions contributing to its predictions, allowing clinicians to verify and interpret results in real time. These visual aids enhance trust, making CAD-DL a practical second-reader tool. With results generated in under 12 seconds on standard clinical workstations, this model is not only accurate but also deployment-ready, showcasing how AI can meaningfully integrate into and improve routine cardiovascular diagnostics.
HF is a condition in which the heart’s ability to pump blood effectively is compromised, often assessed through left ventricular EF [30, 31]. HF typically manifests with either reduced or preserved EF. AI-enabled ECGs have significantly enhanced the diagnostic process by identifying low EF with high sensitivity and specificity, outperforming traditional biomarkers like B-type natriuretic peptide (BNP). These models analyze ECG signals alongside echocardiographic data to detect subclinical left ventricular dysfunction, providing an early and scalable diagnostic solution. This capability is critical for identifying asymptomatic patients who may benefit from early therapeutic interventions [27, 28].
AI-enabled diagnostic systems have shown remarkable efficacy in identifying early signs of cardiac contractile dysfunction, a precursor to HF. In one study, CNNs trained on 44,959 annotated 12-lead ECGs achieved an AUC of 0.93, with sensitivity, specificity, and accuracy scores exceeding 85%. These models demonstrated robust performance in prospective cohorts, successfully screening patients for left ventricular dysfunction across diverse populations [32].
In another study, natural language processing (NLP) was used to analyze EHRs and identify patients with HF with preserved EF (HFpEF) who were previously undiagnosed [1]. In that study of 3,727 patients, only 8.3% had a clinician-assigned HFpEF diagnosis, but the NLP tool found that 75% met the European Society of Cardiology criteria for the condition. These undiagnosed patients had higher hospitalization rates and mortality, highlighting the importance of accurate diagnosis. NLP enhances efficiency by automating data analysis, enabling earlier interventions, and uncovering hidden high-risk cases, showcasing its potential to transform cardiovascular care.
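The internals of the NLP tool in that study are not described here; as a loose illustration of the general idea, the sketch below pulls an ejection fraction value out of free-text notes with a simple pattern match. Real pipelines handle negation, units, synonyms, and the full diagnostic criteria.

```python
# Illustrative only: extract an EF value from a free-text note and flag preserved EF.
import re

note = ("Echo today: LVEF 55%. Exertional dyspnea, elevated NT-proBNP, "
        "left atrial enlargement noted.")

ef_match = re.search(r"(?:LVEF|EF)\s*(?:of\s*)?(\d{1,2})\s*%", note, re.IGNORECASE)
if ef_match:
    ef = int(ef_match.group(1))
    if ef >= 50:   # preserved EF; a real tool would check full HFpEF criteria
        print(f"EF {ef}%: preserved EF, flag chart for HFpEF criteria review")
```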
DL models have also shown great potential in HF diagnosis, particularly for HFpEF. An ensemble neural network model analyzing ECG data predicted HFpEF with remarkable accuracy [32]. The study stratified patients into high- and low-risk groups, identifying a 33.6% likelihood of developing HFpEF within 24 months for high-risk individuals compared to 8.4% for low-risk individuals. Sensitivity maps revealed the model’s focus on R waves in the QRS complex and T waves, offering insights into key predictive features [32, 33].
In HF management, AI-guided identification of low EF allows for timely pharmacological intervention, including ACE inhibitors and beta-blockers, which slow disease progression. AI-ECGs also assist in selecting candidates for advanced therapies, such as CRT or implantable cardioverter-defibrillators (ICDs), based on their likelihood of benefit. For example, studies show that AI guidance in primary care clinics led to a higher frequency of follow-up echocardiograms in patients flagged by the model, optimizing diagnostic and treatment resources by prioritizing high-risk cases for more thorough evaluations [28, 33].
AI-based models, including logistic regression (LR), have been applied in recent years to diagnose congestive HF (CHF). Son et al. [34] used LR-based decision-making models to predict CHF in patients with dyspnea, generating decision rules through a rough set (RS)-based model. The RS model outperformed the LR model, achieving greater accuracy. Similarly, random forest (RF) models demonstrated nearly 100% accuracy in CHF classification. Masetic and Subasi [35] tested various classifiers, including support vector machines (SVM), artificial neural networks (ANN), and k-nearest neighbors (k-NN), ultimately selecting RF for its superior accuracy and specificity using long-term ECG data. Additionally, Wu et al. [36] identified CHF over six months before clinical diagnosis by analyzing 179 variables from EHRs with LR and boosting methods, while SVM underperformed due to data imbalance.
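A minimal sketch of this kind of classifier comparison, using synthetic tabular data in place of the ECG and EHR variables used in the cited studies, might look as follows.

```python
# Hedged sketch: compare common classifiers with 5-fold cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic, mildly imbalanced data standing in for clinical CHF features.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=7),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:>20s}: mean CV AUC = {auc:.3f}")
```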
Recent advances in AI and ML have significantly improved the detection and diagnosis of valvular heart diseases (VHD). In VHD, such as AS and mitral regurgitation (MR), accurate diagnosis is critical for effective management. AI models have demonstrated high accuracy in diagnosing AS using ECGs [27]. For MR, AI tools incorporate clinical and imaging data to identify phenotypes that correlate with disease severity and surgical outcomes. These advancements provide a non-invasive diagnostic framework for early detection and stratification of patients requiring intervention.
Techniques such as phonocardiogram (PCG) signal analysis using DL have also demonstrated exceptional accuracy in classifying normal and diseased heart sounds. The modified Xception network model, for instance, achieved an accuracy of 99.45%, with sensitivity and specificity of 98.5% and 98.7%, respectively. This performance surpasses other CNN-based models, including VGG16, DenseNet121, and InceptionNet, in both accuracy and efficiency [37].
In one framework, Yang et al. [38] developed a three-stage DL system for evaluating VHD. The framework performed tasks such as echocardiographic view classification, VHD screening, and segmentation for severity metrics. It used a CNN to detect regurgitant lesions by analyzing color Doppler flow in the left atrium and left ventricular outflow tract (LVOT). For stenotic lesions, the algorithm employed Doppler metrics to compute valve areas and pressure gradients. The framework demonstrated high diagnostic accuracy, with an AUC of 0.91–0.97 across various VHD types, and showed strong performance in a prospective dataset of over 1,300 cases. ML has also been applied to AS severity classification [39].
Sengupta et al. [40] utilized supervised and unsupervised ML to stratify patients into high or low AS severity groups. Using echocardiographic data from 1,052 patients, the algorithm accurately classified 99% of cases with definitive echocardiographic features and improved discrimination and reclassification over conventional methods. Validation showed correlations with other AS markers, including calcium scores and biomarkers such as BNP, optimizing the timing for aortic valve replacement (AVR).
Commercial AI systems have also been developed for AS detection. For example, one system identified moderate or severe AS using echocardiograms without Doppler data, achieving sensitivities and specificities of 91% and 94%, respectively. These advancements suggest growing potential for AI in automating diagnostic workflows and expanding accessibility [41]. NLP has been applied to extract valve severity from echocardiogram reports, addressing challenges with structured data and inaccuracies in ICD coding. Solomon et al. [42] developed an NLP algorithm with a 99% positive predictive value (PPV) and negative predictive value (NPV) for identifying AS. Among patients classified with AS by the algorithm, only 64.6% had a corresponding ICD code. The NLP-derived parameters aligned with physician-assessed severity, highlighting its potential to improve diagnosis recording, care pathways, and adherence to clinical guidelines [43].
Multimodal data integration has also been used for VHD diagnostics. For example, combining echocardiographic data with PCG signals and other biomarkers allows AI systems to refine risk stratification. In a recent study, DL models analyzing 12-lead ECGs achieved AS diagnostic accuracies exceeding 88%, while CNNs trained on over 70,000 ECGs demonstrated internal and external AUCs of 0.816 and 0.877, respectively, for diagnosing MR [44].
El-Sofany et al. [45] proposed a comprehensive ML framework that integrates both public (Cleveland Heart Disease Dataset) and private clinical datasets collected from Egyptian hospitals. The study evaluated ten ML classifiers—including Extreme Gradient Boosting (XGBoost), Random Forest, Support Vector Machine, and LR—combined with three distinct feature selection strategies: chi-square (χ²), ANOVA, and mutual information. Among the evaluated models, XGBoost using the SF-2 feature subset and a dataset balanced with the synthetic minority oversampling technique (SMOTE) achieved the highest diagnostic performance. These results not only surpass many previous benchmarks in heart disease classification, but also demonstrate the model’s ability to identify early-stage heart disease with high confidence.
A key strength of this study lies in its generalizability and adaptability. By incorporating a novel dataset from a population often underrepresented in digital health research, the model accounts for diverse demographic and clinical variations. Additionally, the authors developed a mobile application that operationalizes the best-performing model for real-time symptom-based prediction, enabling individuals—especially those in remote or resource-constrained settings—to screen for heart disease risk using accessible tools. This combination of high-performance prediction, cross-regional data inclusion, and practical application signifies a meaningful step toward scalable, AI-driven cardiac diagnostics in global health settings.
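As a hedged sketch of the pipeline style just described (chi-square feature selection, SMOTE balancing, and an XGBoost classifier), the snippet below wires the steps together on synthetic data; the feature count and hyperparameters are assumptions, not the values reported by El-Sofany et al.

```python
# Hedged sketch: feature selection + SMOTE + XGBoost inside a sampler-aware pipeline.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline                # applies SMOTE only to training folds
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler        # chi-square scoring needs non-negative inputs
from xgboost import XGBClassifier

# Synthetic, imbalanced tabular data standing in for the heart-disease features.
X, y = make_classification(n_samples=1000, n_features=25, weights=[0.85, 0.15],
                           random_state=0)

pipe = Pipeline(steps=[
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(score_func=chi2, k=10)),
    ("balance", SMOTE(random_state=0)),
    ("model", XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                            eval_metric="logloss")),
])

print("mean CV AUC:", cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean().round(3))
```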
These advancements not only reduce diagnostic errors but also facilitate personalized management plans by identifying phenotypic subgroups with distinct prognoses. This enables tailored surgical interventions and improved post-surgical risk assessments.
Prognosis in cardiology encompasses the prediction of a disease’s likely trajectory, including risks of progression, recurrence, or adverse outcomes. Traditional prognostic models, such as the Framingham risk score (FRS) and Global Registry of Acute Coronary Events (GRACE), rely on structured clinical and demographic data to estimate long-term risks [46–48]. However, these models often generalize predictions based on population averages, limiting their applicability to individual patients. The emergence of AI in cardiology has redefined prognosis by enabling patient-specific predictions through advanced data analysis.
AI algorithms excel in detecting early warning signs of cardiovascular conditions, offering new avenues for preventive cardiology. For instance, AI-enabled ECGs have demonstrated high accuracy in predicting AF by identifying subtle structural changes, such as atrial enlargement or fibrosis, that precede clinical manifestations. These capabilities are particularly valuable for patients with cryptogenic strokes, where undiagnosed AF is a significant risk factor for recurrence. By stratifying these patients into high-risk categories, AI-enabled ECGs facilitate timely interventions, such as anticoagulation therapy, to prevent further complications [6, 16].
Multimodal ML models further personalize treatment by incorporating data from wearable devices, lab results, and EHRs. These models analyze patient-specific factors, such as physical activity levels or genetic predispositions, to recommend lifestyle modifications and treatment adjustments that are tailored to the individual’s risk profile. For instance, patients with high cardiovascular risk due to genetic and clinical factors may benefit from personalized preventive measures, including dietary adjustments or specific exercise regimens, guided by the ML model [6]. In the subsections that follow, we will review the use of AI and ML in the prognosis and diagnosis of several common CVDs, including IHD, VHD, and AF.
Accurate risk prediction is essential for early intervention, especially in individuals with subclinical disease. Traditional prognostic models such as the FRS and the GRACE score have been widely used to assess cardiovascular risk. Recent studies report that the FRS achieves an AUC of approximately 0.82 in certain cohorts [49], while the GRACE score demonstrates an AUC of 0.839 for in-hospital mortality prediction in acute coronary syndrome (ACS) patients [50]. However, these models are based on population-level data and often lack the granularity needed for personalized risk estimation.
The integration of AI and ML has transformed risk prediction in IHD by leveraging high-dimensional, multimodal data. For example, XGBoost models have shown impressive performance, achieving an AUC of 0.808 in recent predictive studies [51]. XGBoost is particularly effective at capturing nonlinear patterns in structured clinical datasets but requires careful tuning and offers limited interpretability.
DL approaches, particularly those using CNNs, excel at processing unstructured data such as imaging and ECG signals. A recent study showed that a CNN-based model outperformed traditional CAD2 models across modalities, maintaining strong predictive accuracy while processing raw clinical data [52]. While DL models offer flexibility and scalability, they are data-intensive and often opaque, posing challenges for widespread clinical adoption.
The most promising advancements have emerged from multimodal fusion models, which combine clinical parameters, imaging, genetic data, and EHRs. These hybrid systems improve risk prediction through late fusion techniques and ensemble learning. For example, a multimodal AI model incorporating EHR, imaging, and genomics achieved significantly higher predictive accuracy than traditional scores, improving both AUC and AUCPR metrics [51].
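A minimal late-fusion sketch is shown below: modality-specific models are trained separately and their predicted probabilities are averaged. The two synthetic feature blocks merely stand in for EHR-derived and imaging-derived features.

```python
# Toy late-fusion example: average per-modality risk probabilities.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_ehr, X_img = X[:, :20], X[:, 20:]     # pretend split into two modalities

idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.3, random_state=0)

ehr_model = LogisticRegression(max_iter=1000).fit(X_ehr[idx_tr], y[idx_tr])
img_model = GradientBoostingClassifier().fit(X_img[idx_tr], y[idx_tr])

# Late fusion: combine the modality-specific risk estimates at the probability level.
p_fused = 0.5 * (ehr_model.predict_proba(X_ehr[idx_te])[:, 1] +
                 img_model.predict_proba(X_img[idx_te])[:, 1])
print("fused AUC:", round(roc_auc_score(y[idx_te], p_fused), 3))
```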
In summary, while traditional tools like FRS and GRACE have established benchmarks for cardiovascular risk prediction, AI-enabled models, particularly those integrating diverse data sources, demonstrate superior accuracy and personalized insights. However, successful clinical translation depends on improving model interpretability, addressing data quality and bias, and validating performance across diverse populations.
VHD encompasses a range of conditions characterized by dysfunction of one or more cardiac valves, including MR, AS, and tricuspid regurgitation. Left untreated, these conditions can lead to HF, arrhythmias, and other life-threatening complications. In AS, CNNs analyzing 12-lead ECGs can detect early abnormalities with diagnostic accuracy exceeding 88%. These models also predict post-treatment outcomes for interventions like transcatheter AVR (TAVR), integrating variables such as echocardiographic findings, biomarkers, and comorbidities to assess survival and risk of complications. Similarly, for MR, unsupervised ML techniques classify patients into phenotypic subgroups based on clinical and echocardiographic data, enabling tailored surgical strategies and improved post-surgical risk assessment [27].
AI’s predictive power is further amplified through multimodal data integration, combining imaging, hemodynamic, genetic, and proteomic information to refine risk stratification and guide personalized interventions. The integration of AI into VHD diagnostics has produced significant results [53]. For AS, deep-learning models achieved internal and external AUCs of 0.884 and 0.861, respectively, in detecting early abnormalities and predicting post-treatment outcomes for procedures like TAVR. Similarly, CNNs trained on over 70,000 ECGs achieved internal and external AUCs of 0.816 and 0.877 for diagnosing MR. For pulmonary hypertension, predictive models demonstrated external AUCs as high as 0.902, facilitating early intervention and improving risk assessments for post-surgical complications.
Echocardiography remains a cornerstone in diagnosing AS, but its operator dependency and variability pose challenges. AI and ML have transformed echocardiographic analysis by automating image interpretation. DL models trained on echocardiographic data can accurately diagnose AS by analyzing two-dimensional and Doppler features. In a cohort of 256 patients, AI closely matched human measurements of critical parameters [54]. These models not only enhance diagnostic accuracy but also reduce reliance on operator expertise, promising greater consistency in echocardiographic assessments. Although further validation is needed, AI-driven echocardiography shows potential for streamlining diagnostics and improving outcomes in patients with AS [54].
To improve the reliability of AI/ML models used in prognostic tasks such as predicting VHD progression, current research has emphasized the need for bias mitigation and standardization in clinical imaging datasets. As highlighted by Hasanzadeh et al. [55], bias often originates from inconsistencies in image acquisition, labeling protocols, and underrepresentation of diverse patient populations. These factors can lead to models that perform well in development settings but poorly in real-world deployment. Measurement bias, in particular, is a concern in image-based tasks, where variability in imaging hardware, staining techniques, or labeling standards (e.g., in echocardiographic or MRI data) can distort model predictions. To address this, ongoing efforts such as standardized acquisition protocols, stratified sampling, external validation across diverse populations, and model lifecycle surveillance are being implemented to enhance generalizability and fairness. Integrating these strategies into cardiovascular AI workflows is essential to ensure accurate and equitable prognostic tools for all patient groups.
AI-powered digital tools for screening and monitoring of VHDs have proven effective in many studies [56]. A meta-analysis of AI models reported a pooled accuracy of 81%, a pooled sensitivity of 83%, and a pooled specificity of 72% [57]. These results indicate that AI-driven ECG analysis offers high accuracy in VHD screening; however, a combined approach with clinical judgment remains necessary in primary care settings.
Beyond diagnostics, multimodal AI approaches enhance the understanding of patient heterogeneity by identifying subgroups with distinct prognoses and therapeutic responses. This capability significantly improves clinical decision-making and individualized patient management. These models also assess post-surgical risk, guiding closer monitoring for patients with a poor prognosis. Despite challenges such as data limitations and the interpretability of DL models, AI-driven tools are transforming VHD management by enhancing early detection, optimizing surgical decisions, and informing long-term care strategies. Continued research and validation are essential for further integrating these technologies into routine clinical practice [27].
AF is one of the most common cardiac arrhythmias, characterized by irregular and often rapid heart rhythms. It is associated with significant morbidity and mortality, primarily due to its strong correlation with ischemic stroke, HF, and systemic embolism. Accurate prognosis and early prediction of AF are critical for timely intervention and risk mitigation. ML models, particularly CNNs, have revolutionized the early prediction of AF. By analyzing sinus rhythm ECGs, these models detect precursors of AF, such as atrial enlargement or fibrosis, with exceptional accuracy. For example, a CNN trained on 649,931 annotated 12-lead ECGs achieved an AUC of 0.87, enabling the early identification of individuals at risk for AF [27]. These AI tools have demonstrated the ability to predict AF-related complications, such as ischemic strokes, even before clinical symptoms manifest. This predictive capability facilitates timely interventions, including anticoagulation therapy, reducing the risk of recurrent cerebrovascular events in high-risk populations [6].
While ECG analysis remains central to AF diagnostics, incorporating multimodal data enhances predictive power. Combining ECG signals with cardiac imaging data, such as echocardiographic measures of left atrial volume and fibrosis, significantly improves risk stratification. Additionally, integrating circulating biomarkers, such as NT-proBNP and C-reactive protein (CRP), with genomic data helps assess systemic inflammation and genetic predisposition to AF [6]. For instance, late fusion models combining genetic and electrophysiological data have shown a 2.1% improvement in AUC and a 9.1% enhancement in area under the precision-recall curve (AUCPR), highlighting the efficacy of integrating diverse data sources for AF risk assessment [6]. The clinical utility of AI-driven tools in AF extends beyond diagnosis and prediction to personalized management. Multimodal risk stratification allows for tailored interventions, such as intensified rhythm control strategies for high-risk subgroups. Furthermore, these models guide resource allocation by identifying patients who would benefit most from advanced therapies, including catheter ablation or left atrial appendage occlusion.
ML has also been instrumental in predicting complications related to AF, particularly HF, which is closely linked to AF and carries significant prognostic implications. A recent study utilized ML models to predict hospitalization due to HF in AF patients, leveraging a comprehensive dataset from the Fushimi AF Registry. The registry included detailed data from 4,394 patients, divided into derivation (2,383 patients) and validation (2,011 patients) cohorts. Data preprocessing involved the exclusion of variables with more than 30% missing data and imputation of missing values using mean values for continuous variables and mode values for categorical variables. Of the initial 168 variables, 66 were retained for model development after careful evaluation for clinical relevance.
The study employed six supervised ML algorithms, including RF, light gradient boosting machine, elastic net, linear support vector machine, neural network, and Naïve Bayes, with hyperparameter tuning performed using a grid search algorithm and 5-fold cross-validation. The models were evaluated based on sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC). Notably, the RF model demonstrated robust predictive performance with an AUC of 0.75, outperforming traditional models such as the Framingham HF risk model (AUC: 0.67) [58].
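For illustration, the snippet below reproduces the general recipe of a grid search with 5-fold cross-validation over a random forest, as described for the Fushimi registry model; the parameter grid and data are assumptions, not those of the cited study.

```python
# Hedged sketch: hyperparameter tuning of a random forest via 5-fold grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1500, n_features=25, random_state=0)

grid = {"n_estimators": [200, 500],
        "max_depth": [4, 8, None],
        "min_samples_leaf": [1, 5, 20]}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=grid, cv=5, scoring="roc_auc", n_jobs=-1)
search.fit(X, y)
print("best CV AUC:", round(search.best_score_, 3), "| params:", search.best_params_)
```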
For practical application, a final set of seven variables was selected based on their clinical validity, feasibility, and applicability. These variables included age, history of HF, creatinine clearance, cardiothoracic ratio, left ventricular EF, LV end-systolic diameter, and LV asynergy. Validation of the model was conducted using the validation cohort, with multiple imputations addressing missing data to enhance reliability. Kaplan-Meier analysis showed that the model effectively stratified patients into distinct risk groups, with high-risk individuals demonstrating a significantly greater likelihood of HF hospitalization compared to low-risk groups [58]. This study highlights the value of integrating diverse data sources, including imaging data and biomarkers, to enhance predictive accuracy. By focusing on clinically accessible variables, the model offers a practical tool for identifying high-risk AF patients, supporting timely interventions and resource allocation. These advancements highlight the potential of ML not only for prognosis but also for guiding personalized management strategies in patients with AF, bridging the gap between early diagnosis and the prevention of severe complications such as HF hospitalization [58]. While HF is a significant and well-studied complication of AF, other major outcomes—such as stroke and AF progression—have also been targeted in AI-based prognostic models. For example, Serhal et al. [59] highlight that AF is the leading cause of stroke-related death and morbidity, responsible for strokes in over 17 million people globally according to the World Stroke Organization. The review emphasizes that AI models leveraging ECG waveforms and wavelet-based features are being increasingly adopted to identify high-risk individuals for early intervention. These models can capture subtle variations in RR intervals or P-wave morphology that precede thromboembolic events, potentially aiding in stroke risk stratification.
In addition, the paper discusses how AI models are used to predict AF progression, such as transitions from paroxysmal to persistent or permanent AF. The ability to anticipate this clinical worsening enables tailored treatment plans and early therapeutic intervention. For instance, ML models using wavelet entropy or multiscale entropy applied to ECG data have been shown to effectively distinguish between stable and progressing AF cases.
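As a brief illustration of the Kaplan-Meier stratification used in the Fushimi registry analysis above, the sketch below produces survival estimates for model-defined risk groups with the lifelines package; the follow-up times and events are synthetic.

```python
# Illustrative Kaplan-Meier stratification; data are synthetic, not registry data.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
n = 300
risk_group = rng.integers(0, 2, n)                                        # 0 = low, 1 = high risk
time = rng.exponential(scale=np.where(risk_group == 1, 18, 36), size=n)   # months to event
event = rng.random(n) < 0.7                                                # True = HF hospitalization observed

kmf = KaplanMeierFitter()
for label, mask in [("low risk", risk_group == 0), ("high risk", risk_group == 1)]:
    kmf.fit(time[mask], event_observed=event[mask], label=label)
    print(label, "median time to HF hospitalization (months):",
          round(kmf.median_survival_time_, 1))
```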
Across all these conditions, XAI plays a pivotal role in building clinician trust and ensuring adoption. Techniques such as attention mapping and feature importance scoring highlight the specific regions of input data—such as ECG segments—that contribute to the model’s predictions. In conditions like HF, where diagnostic complexity is high, these methods provide clinicians with transparent and interpretable insights, aiding in critical decision-making and enhancing confidence in AI-guided diagnostics [17, 27].
A compelling demonstration of XAI in cardiovascular diagnostics is the use of probabilistic graphical models (PGMs) to model multimorbidity landscapes derived from large-scale EHR datasets. Unlike black-box neural networks, PGMs explicitly represent the conditional dependencies between clinical variables such as diagnoses, medications, and procedures, enabling clinicians to trace how each factor contributes to a specific outcome. In the study by Wesołowski et al. [60], these models were used to quantify individualized risk for complex cardiovascular events, including heart transplant and sinoatrial node dysfunction, while allowing users to pose flexible, interpretable queries on combinations of risk factors. For example, the model showed that milrinone use in a patient with cardiomyopathy resulted in a 407-fold increased risk for heart transplantation. These interpretable networks support both hypothesis generation and clinical validation, and their deployment as web-based, PHI-free applications illustrates the scalability and real-world potential of XAI tools in precision medicine.
In another example aimed at overcoming the interpretability challenges of black-box AI models, Jo et al. [23] designed a modular architecture combining feature-specific subnetworks and neural-backed ensemble trees (NBETs). Their model separately quantified rhythm irregularity and P-wave absence—two hallmark features of AF—before making a diagnostic decision. This modular setup enabled clinicians to trace the AI’s conclusions to physiologic ECG features, such as stating “AF was detected due to the absence of a P-wave”, thereby improving trust, transparency, and clinical adoption.
To address the black-box nature of ML models, El-Sofany et al. [45] implemented an XAI framework using SHAP (SHapley Additive exPlanations) to visualize the contribution of individual features to heart disease predictions. This interpretability allows healthcare providers to understand why predictions were made—such as elevated cholesterol or abnormal ST depression—enabling more informed clinical decisions. Their plans to incorporate individual conditional expectation (ICE) plots, Local Interpretable Model-agnostic Explanation (LIME), and rule-based explanations reflect a comprehensive strategy to ensure transparency and trust in AI-driven diagnostics.
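As an illustration of the SHAP workflow described, the sketch below explains an XGBoost model with TreeExplainer; the feature names and data are hypothetical stand-ins for the clinical variables used in the cited study.

```python
# Hedged SHAP sketch: per-feature contributions for a tree-based heart-disease model.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(35, 85, 500),
    "cholesterol": rng.normal(210, 35, 500),
    "st_depression": rng.exponential(1.0, 500),
    "max_heart_rate": rng.normal(150, 20, 500),
})
# Toy label loosely tied to cholesterol and ST depression (illustration only).
y = ((X["cholesterol"] > 230) & (X["st_depression"] > 1.2)).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss").fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)     # per-patient, per-feature contributions
for name, val in sorted(zip(X.columns, np.abs(shap_values).mean(axis=0)),
                        key=lambda t: -t[1]):
    print(f"{name:>15s}: mean |SHAP| = {val:.3f}")
# shap.summary_plot(shap_values, X) would render the usual beeswarm overview.
```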
While DL models have demonstrated success in cardiovascular diagnostics, the demand for explainability remains critical for clinical adoption. The framework proposed by Sarra et al. [61] exemplifies how diagnostic transparency can be improved through structured architectures like 1D-CNN and Bi-LSTM, trained on both real and GAN-augmented datasets. By incorporating dimensionality reduction techniques such as PCA, the model not only reduced computational complexity but also facilitated clearer interpretation of the most informative features contributing to heart disease predictions. Their results, achieving up to 99.3% accuracy and an AUC of 1.00, demonstrate the potential of explainable, data-augmented DL models to provide clinically actionable insights while avoiding overfitting and bias from limited datasets. This approach aligns with the broader goal of XAI: enhancing model trustworthiness by allowing clinicians to trace predictions back to relevant physiological signals (e.g., ECG, blood pressure). Such advances bridge the gap between black-box algorithms and interpretable, reliable clinical decision-making tools.
To address the challenge of interpretability, Otaki et al. [29] developed an XAI system integrated into clinical SPECT MPI workflows. By producing CAD attention maps and probability overlays, the model visually highlighted the myocardial segments influencing its predictions. This explainability empowered clinicians to validate AI recommendations against known perfusion defects, thereby improving diagnostic confidence and potential patient communication. The model’s rapid inference time (< 12 seconds) and compatibility with standard workstations further support its real-world applicability.
LLMs, such as the Generative Pre-trained Transformer (GPT), are advanced DL systems trained on extensive datasets such as EHRs, imaging reports, and clinical time series data to process and produce human-like language. Their integration into cardiovascular medicine is emerging as a transformative tool, offering novel applications in diagnostics, patient management, and clinical decision-making. The ability of LLMs to integrate multimodal data mirrors existing applications of recurrent neural networks and reinforcement learning (RL) in cardiovascular ICUs, which analyze temporal patterns to support diagnostic and prognostic decision-making [62].
LLMs are constructed using DL frameworks and rely on extensive training data, such as text from medical literature, clinical guidelines, and patient records. They are built on the transformer architecture, which allows models to analyze context by modeling the relationships between words in a sentence, making them capable of complex reasoning. LLMs also benefit from continual learning: with additional data and feedback, they refine their ability to handle specific medical queries, interpret diagnostic reports, and summarize patient information.
LLMs are increasingly utilized across various areas of cardiovascular medicine, significantly enhancing efficiency and accuracy in clinical processes. LLMs play a critical role in clinical decision support by integrating patient history, laboratory results, imaging studies, and clinical guidelines to assist physicians in diagnosing conditions such as ACS and HFpEF [1].
Facial image analysis has been evaluated in the emergency room and in CAD. Forte et al. [63] assessed whether a CNN, trained with neural transfer-based data augmentation on a dataset of simulated and augmented facial photographs reflecting acutely ill patients, could differentiate between healthy individuals and those infused with lipopolysaccharide to simulate acute illness. In the external validation set, the four individual feature models focusing on different parts of the face distinguished acutely ill patients with sensitivities ranging from 10.5% for the skin model to 89.4% for the nose model. Specificity ranged from 42.1% for the nose to 94.7% for the skin. A stacked model combining all four facial features achieved a C-index of 0.67, distinguishing acutely ill patients with a sensitivity of 100% but a specificity of only 42%.
Several studies have demonstrated the feasibility of detecting CAD and predicting outcomes based on a single facial photo with reasonable accuracy [64]. The algorithm examines hair structure and density, wrinkles on the forehead, around the eyes, and the chin, deriving a comprehensive analysis that correlates with clinical outcomes, such as major adverse cardiovascular events, drawn from large patient populations. However, in a Chinese study evaluating 5,796 patients across eight sites, the C-index remained modest at 0.73—higher than the widely used Diamond-Forrester model and the CAD consortium clinical score—indicating promise but highlighting the need for improved sensitivity and specificity for broader clinical utility.
Voice-based diagnostics, another emerging AI application, leverage speech analysis to detect subtle changes associated with CVDs. These methods analyze physiologic inputs, such as laryngeal nerve function, arterial blood supply, and respiratory dynamics, to identify conditions like arrhythmias. In the context of AF, one study recorded vowel sounds, such as “Ahh” and “Ohh”, alongside ECG tracings in 158 patients with AF. Following cardioversion to sinus rhythm, numerical values generated by the algorithm became markedly more homogeneous, demonstrating its ability to track AF episodes. The area under the receiver operating characteristic curve (AUC) exceeded 0.98 for “Ahh” and 0.89 for “Ohh”, illustrating the promise of AI-based voice recognition for AF detection. Although these results are promising, validation in larger, independent cohorts is needed to establish clinical utility [65].
The potential of LLMs in cardiovascular medicine is highlighted by impressive achievements. For instance, the Articulate Medical Intelligence Explorer (AMIE) demonstrated diagnostic accuracy equivalent to human experts, achieving high efficiency in processing patient data. Similarly, NLP algorithms powered by LLMs have been used to analyze EHRs, identifying undiagnosed HFpEF with a sensitivity of 75%, far surpassing manual clinical documentation. These examples highlight the transformative capabilities of LLMs in improving diagnostic precision and streamlining healthcare workflows [1].
The concept of a “digital twin”—a virtual replica of a patient’s heart—represents an innovative step toward precision medicine. Unlike traditional models that offer static snapshots, digital twins enable continuous updates, leveraging real-time data to reflect the evolving health of the patient and predict disease trajectories. Digital twins are constructed using real-time patient data, such as ECGs, imaging scans, and physiological measurements, to simulate individual responses to potential treatments. This approach allows clinicians to test various treatment scenarios virtually, offering a predictive tool to identify optimal therapies based on a patient’s unique health profile. Digital twin technology is particularly promising for managing complex conditions like HF, where it can simulate responses to treatments such as CRT [22]. Cardiovascular digital twins have the potential to revolutionize individual health monitoring by providing continuous feedback on the risk of adverse events such as myocardial infarction, HF, or stroke [66, 67]. These systems will also maintain comprehensive historical records of an individual’s cardiovascular health, analyzing physiological data collected over months or even years [67]. This capability is particularly valuable for physicians, enabling them to assess a patient’s current health status relative to their baseline with greater accuracy and precision. As a result, disease diagnosis and personalized treatment planning become more effective and targeted [68]. Figure 2 outlines the workflow for digital twin technology in cardiology.
Figure 2. The workflow of digital twin technology in cardiology. It begins with the collection of biometric data (Step 1) and its integration via cloud storage (Step 2). A virtual heart model is then created using simulation and mechanistic models (Step 3) and updated in real time with live data (Step 4). The twin enables predictive modeling and therapy testing (Step 5), while AI algorithms provide diagnostic insights and treatment recommendations (Step 6). Final outputs are delivered to clinicians via dashboards and patient records (Step 7). HF: heart failure; AI: artificial intelligence
Furthermore, cardiovascular digital twins empower medical practitioners to simulate potential outcomes, predict health trajectories, and make informed decisions about the most effective interventions based on a patient’s unique cardiovascular profile. By integrating personalized risk assessment, prevention, and treatment strategies, digital twins will significantly enhance the delivery of precision medicine in cardiovascular care.
One of the most significant advancements in digital twin technology is its ability to integrate multiscale mechanistic models with AI. Mechanistic models utilize established principles of electrophysiology and biomechanics to simulate cardiac function from cellular to system levels. For example, finite element models of the heart, combined with lumped-parameter models of systemic circulation, can simulate myocardial strain, ventricular pressures, and flow dynamics. AI complements these models by processing large, heterogeneous datasets, bridging gaps in the mechanistic framework, and automating processes like parameter estimation and sensor data integration [13].
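As a minimal, illustrative example of the lumped-parameter circulation models mentioned above, the snippet below integrates a two-element Windkessel model (arterial compliance and peripheral resistance) driven by a pulsatile inflow; all parameter values are generic textbook-order numbers, not patient-specific estimates.

```python
# Two-element Windkessel sketch: C*dP/dt = Q_in(t) - P/R (flow balance at the arterial node).
import numpy as np
from scipy.integrate import solve_ivp

R, C = 1.2, 1.5          # peripheral resistance (mmHg*s/mL), arterial compliance (mL/mmHg)
T = 60.0 / 75            # cardiac period at 75 beats/min (s)

def inflow(t):
    """Half-sine ejection during the first 35% of each cycle, zero flow in diastole."""
    phase = t % T
    return 300.0 * np.sin(np.pi * phase / (0.35 * T)) if phase < 0.35 * T else 0.0

def dp_dt(t, p):
    return [(inflow(t) - p[0] / R) / C]

t_eval = np.arange(0.0, 10 * T, 1e-3)
sol = solve_ivp(dp_dt, (0.0, 10 * T), y0=[80.0], t_eval=t_eval, max_step=1e-3)

last_cycle = sol.y[0][-int(T / 1e-3):]   # arterial pressure over the final simulated beat
print(f"systolic ~ {last_cycle.max():.0f} mmHg, diastolic ~ {last_cycle.min():.0f} mmHg")
```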
Recent advancements demonstrated the successful integration of wearable body-worn sensors, Bluetooth technology, and 5G networks, enabling real-time data acquisition of biosignals and their seamless integration into the digital twin interface. For example, edge computing innovations processed these biosignals to detect dysrhythmias in myocardial infarction patients with high sensitivity and specificity (over 90%). Another application utilized ML algorithms to detect stenoses and assess aneurysm severity with an accuracy exceeding 95% based on a physiologically realistic virtual patient database [69].
In one specific study, a digital twin model was constructed to simulate HF dynamics by employing high-fidelity finite element models coupled with real-world data streams from implantable devices. This digital twin was used to optimize implantable device settings, reducing intervention failure rates by 30% and achieving a 15% improvement in patient outcomes compared to traditional approaches [70]. Furthermore, by enabling iterative testing of various interventions in silico, the model significantly accelerated the preclinical development process, cutting costs by up to 25% and reducing time-to-market for therapeutic devices [70].
To achieve widespread clinical adoption, challenges such as continuous updating, integration of multimodal data, and standardization need to be addressed. Current efforts include utilizing wearable technology for continuous monitoring, incorporating AI to improve real-time updates, and enhancing interpretability to build trust among clinicians. As these issues are resolved, digital twins hold the potential to revolutionize cardiovascular healthcare by combining precision diagnostics with predictive, personalized treatment strategies [13].
While AI-enabled ECGs hold promise, several challenges must be addressed to facilitate widespread clinical integration (illustrated in Figure 3). One key issue is data bias; many ML models in cardiology are trained on limited, homogeneous datasets that may not represent the broader population. This limitation can lead to reduced generalizability and accuracy in diverse patient groups. Federated learning, a technique that allows models to be trained across multiple datasets without centralizing patient data, offers a potential solution by enabling diverse data access while maintaining privacy [6, 22].
Figure 3. Ethical challenges and opportunities for using AI in cardiovascular medicine. ECG: electrocardiogram; AI: artificial intelligence; LIME: Local Interpretable Model-agnostic Explanation
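A toy sketch of the federated-averaging idea mentioned above follows: each site fits a model on its own data, and only model parameters (never patient records) are pooled, weighted by sample counts. Real federated learning adds secure aggregation, repeated communication rounds, and privacy auditing; the setup below is purely illustrative.

```python
# Toy federated averaging: pool per-site model coefficients, not patient data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def local_fit(seed):
    """Each 'hospital' trains locally on data that never leaves the site."""
    X, y = make_classification(n_samples=400, n_features=12, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return model.coef_, model.intercept_, len(y)

site_updates = [local_fit(seed) for seed in range(3)]   # three participating sites

# Federated averaging: weight each site's parameters by its sample count.
total = sum(n for _, _, n in site_updates)
global_coef = sum(c * n for c, _, n in site_updates) / total
global_intercept = sum(b * n for _, b, n in site_updates) / total
print("aggregated coefficient vector shape:", global_coef.shape)
```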
Additionally, AI and ML algorithms must meet rigorous quality standards to ensure their safe integration into clinical practice. These standards include clearly defined intended use, high-quality and diverse data, sufficient sample sizes, robust external validation, openness of data and software, and the ability to adapt as patient demographics and treatments evolve [7]. Larger cohorts (e.g., > 10,000 participants) improve model reliability when data quality is maintained, and external validation on independent datasets—preferably from different institutions or countries—is essential before clinical application [71].
However, many AI-based cardiovascular tools lack these standards, with nearly two-thirds failing proper validation, which risks patient harm [72]. Only validated models should be used clinically, and efforts to assess evidence quality must improve, as few randomized controlled trials currently exist in this field, leaving observational studies prone to bias [73]. Regulatory frameworks like the EU Artificial Intelligence Act, which entered into force in 2024, require conformity assessments to ensure compliance before approval [73].
These frameworks emphasize transparency and explainability of algorithms, especially given the “black box” nature of many AI systems. XAI techniques, such as LIME, aim to break down decision-making processes and address concerns about bias, thus enhancing clinician and patient trust [74].
Beyond technical challenges, ethical considerations are critical. Issues such as privacy, consent, and bias in algorithm development require robust solutions. For instance, integrating diverse data sources is essential to reduce disparities, as current cardiovascular AI models often fail to adequately represent minority populations [74]. Furthermore, while AI has the potential to democratize healthcare through accessible tools like wearable ECG monitors, it also raises concerns about cybersecurity and the potential misuse of sensitive patient data [74].
The high computational demands of AI model development also present sustainability challenges. Training AI systems requires significant energy, contributing to environmental concerns. Addressing this issue will require the use of optimized algorithms and curated datasets to balance model performance with environmental impact [74].
To overcome these challenges and realize the full potential of AI-enabled ECGs, future efforts must focus on developing robust validation protocols, addressing ethical concerns, and ensuring regulatory compliance. Collaboration among stakeholders, including clinicians, developers, and policymakers, will be key to driving innovation while safeguarding patient welfare.
AI and ML are redefining the landscape of cardiovascular medicine, offering unprecedented capabilities in diagnosis, prognosis, and personalized treatment. AI-enabled tools, such as ECGs and multimodal models, have demonstrated remarkable accuracy in predicting conditions like AF and IHD, achieving metrics that significantly surpass traditional methods. Similarly, innovations like digital twins and LLMs are paving the way for more precise, data-driven patient care.
The integration of AI into clinical practice not only enhances early detection of life-threatening conditions but also optimizes resource allocation by prioritizing high-risk patients for advanced interventions. AI-driven approaches have proven especially valuable in overcoming limitations of operator dependency in diagnostics, as seen in the automation of echocardiographic analyses for valvular heart disease. Moreover, these advancements offer scalable solutions for under-resourced healthcare settings, democratizing access to high-quality cardiovascular care.
Despite its transformative potential, the widespread adoption of AI in cardiology faces challenges, including biases in training datasets, lack of generalizability, and limited external validation. Addressing these issues will require robust regulatory frameworks, such as the EU Artificial Intelligence Act, alongside collaborative efforts to ensure diverse and high-quality data collection.
In conclusion, AI and ML represent a paradigm shift in cardiovascular medicine, with the potential to significantly reduce morbidity and mortality associated with CVDs. Continued research, validation, and interdisciplinary collaboration are imperative to harness the full potential of these technologies and ensure their safe, equitable, and effective implementation in clinical practice.
AF: atrial fibrillation
AI: artificial intelligence
AS: aortic stenosis
AUC: area under the curve
CAD: coronary artery disease
CHF: congestive heart failure
CNN: convolutional neural network
CRT: cardiac resynchronization therapy
CVDs: cardiovascular diseases
DL: deep learning
ECG: electrocardiogram
EF: ejection fraction
EHRs: electronic health records
FRS: Framingham risk score
GRACE: Global Registry of Acute Coronary Events
HF: heart failure
HFpEF: heart failure with preserved ejection fraction
ICDs: implantable cardioverter-defibrillators
IHD: ischemic heart disease
LLMs: large language models
LR: logistic regression
ML: machine learning
MR: mitral regurgitation
NLP: natural language processing
RF: random forest
VHD: valvular heart diseases
XAI: explainable artificial intelligence
XGBoost: Extreme Gradient Boosting
AVB: Conceptualization, Investigation, Methodology, Writing—original draft, Writing—review & editing. YX: Supervision, Validation, Writing—review & editing. JW: Supervision, Validation. All authors have read and approved the submitted version.
The authors declare that they have no conflicts of interest.
© The Author(s) 2025.
Open Exploration maintains a neutral stance on jurisdictional claims in published institutional affiliations and maps. All opinions expressed in this article are the personal views of the author(s) and do not represent the stance of the editorial team or the publisher.
Copyright: © The Author(s) 2025. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.