Sensors, vol. 26, no. 3, 2026 (SCI-Expanded, Scopus)
The non-destructive and chemical-free determination of anthocyanin content in single maize kernels is of great importance for plant-breeding programs. Previous studies have mainly relied on Near-Infrared Reflectance (NIR) spectroscopy and color-based approaches, often using conventional or arbitrarily chosen modeling techniques. In this study, an Automated Machine Learning (AutoML) framework was employed to predict anthocyanin content using spectral and digital image data obtained from individual maize kernels measured in two orientations (embryo-up and embryo-down). Forty colored maize genotypes representing diverse phenotypic characteristics were analyzed. Digital images were acquired in RGB, HSV, and LAB color spaces, together with NIR spectral data, from a total of 200 kernels. Reference anthocyanin content was determined using a colorimetric method. Ten datasets were constructed by combining different color-space and spectral features and were grouped according to kernel orientation. AutoML was used to evaluate nine machine learning algorithms, while Partial Least Squares Regression (PLSR) served as a classical benchmark method, resulting in the development of 1918 predictive models. Kernel orientation had a notable effect on model performance and outlier detection. The best predictions were obtained from the RGB dataset for embryo-up kernels and from the combined RGB+HSV+LAB+NIR dataset for embryo-down kernels. Overall, AutoML outperformed conventional modeling by automatically identifying optimal algorithms for specific data structures, demonstrating its potential as an efficient screening tool for anthocyanin content at the single-kernel level.
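The dataset construction described above (feature blocks from different color spaces and NIR spectra combined into named datasets such as RGB+HSV+LAB+NIR) might be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the helper names `rgb_to_hsv_features` and `combine_blocks` are hypothetical, the numeric values are made up, and the abstract does not state which ten combinations were actually used.

```python
import colorsys


def rgb_to_hsv_features(rgb):
    """Convert a kernel's mean 8-bit RGB values to HSV (each in 0..1)."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return list(colorsys.rgb_to_hsv(r, g, b))


def combine_blocks(blocks, names):
    """Concatenate the named feature blocks into one feature vector.

    Returns a (label, features) pair; the label joins the block names
    with '+', mirroring dataset names such as 'RGB+HSV+LAB+NIR'.
    """
    features = []
    for name in names:
        features.extend(blocks[name])
    return "+".join(names), features


# Hypothetical values for a single kernel (not data from the study).
mean_rgb = (120, 30, 60)
kernel_blocks = {
    "RGB": list(mean_rgb),
    "HSV": rgb_to_hsv_features(mean_rgb),
    "NIR": [0.41, 0.39, 0.35],  # placeholder reflectance values
}

label, features = combine_blocks(kernel_blocks, ["RGB", "HSV", "NIR"])
print(label, len(features))
```

In a sketch like this, one such feature vector would be built per kernel and per orientation (embryo-up, embryo-down), and each named dataset would then be passed to the candidate regression algorithms and to the PLSR benchmark.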