Diagnostics, vol. 16, no. 9, 2026 (SCI-Expanded, Scopus)
Background/Objectives: Prostate adenocarcinoma exhibits substantial inter-patient heterogeneity, limiting the accuracy of current prognostic tools. Prostate-specific antigen (PSA)-based assessment remains insufficient for reliable survival prediction. There is a clear need for integrative, data-driven approaches that leverage multi-dimensional clinical and molecular data to improve outcome stratification. This study aimed to develop and evaluate an explainable machine learning framework for predicting overall survival in prostate adenocarcinoma.
Methods: A comprehensive machine learning pipeline was constructed using clinical and laboratory data from 494 patients in the TCGA PanCancer Atlas cohort. Following data curation, 16 clinically relevant features were selected through expert-guided filtering and feature selection techniques. Missing values were addressed using imputation strategies, and class imbalance was mitigated using SMOTE. Eight machine learning models were evaluated, including a novel hybrid ensemble model combining a gradient boosting machine and a random forest (GBM + RF). Model performance was assessed using stratified 10-fold cross-validation and quantified via accuracy, precision, recall, F1-score, and ROC-AUC. Model interpretability was examined using LIME, and prognostic relevance was validated through Cox proportional hazards regression.
Results: The hybrid GBM + RF model demonstrated superior performance, achieving 97% accuracy and a ROC-AUC of 0.95 under mode imputation with SMOTE balancing. Ensemble-based models consistently outperformed single classifiers, particularly in handling missing data and class imbalance. Key predictors of survival included progression-free survival, hypoxia-related scores, genomic instability markers, and immune-associated variables. Cox regression analysis confirmed the independent prognostic significance of these features, supporting the biological plausibility of the model.
Conclusions: An explainable ensemble machine learning approach enables accurate and clinically interpretable prediction of overall survival in prostate adenocarcinoma. The proposed framework provides a robust foundation for precision urology decision-support systems and warrants validation in independent cohorts.
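The SMOTE balancing step named in the Methods can be illustrated with a minimal, standard-library-only sketch. It shows the core idea of the technique: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The abstract does not state which SMOTE implementation or parameters were used, so the function name `smote`, the value of `k`, the seed, and the toy data below are all assumptions made for illustration, not the study's code or data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; assumes purely numeric features.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: gap in [0, 1) places the new point between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 4 minority samples to be balanced against 10 majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10
new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the range already spanned by that class. In a cross-validated setting, oversampling of this kind is normally applied to the training folds only, so that no synthetic sample derived from a test-fold patient leaks into training; the abstract does not say how this was handled.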