Opening Ceremony
[10:20 – 10:30] Opening the Workshop
Gabriela Czibula (Babeș-Bolyai University), Eugen Mihuleț (Romanian National Meteorological Administration)
Meteorological Research
(chaired by Istvan Czibula)
[10:30 – 10:55] ICON-LAM Model Configuration: Sensitivity Tests
Ioan Ștefan Gabrian (Romanian National Meteorological Administration)
In the transition from the COSMO model to the new ICON-LAM (icosahedral nonhydrostatic model in limited-area mode), a series of sensitivity tests were carried out to calibrate the operational configuration on a 2.8 km grid covering Romania. Three versions of the ICON model were compared: the current operational configuration (OPER) and two updated D2 configurations for summer (July 2024, dominated by convective systems) and winter (February 2025, dominated by synoptic-scale forcing), so the analyzed periods cover distinct seasonal regimes. All configurations use the same time step and the same initial and boundary conditions from the IFS model, with hourly outputs consistent with the existing operational schedule. The aim is to quantify possible improvements in critical surface parameters and to prepare the transition of the new configuration into operational use.
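Comparing configurations on critical surface parameters typically comes down to standard verification scores against station observations. As a minimal illustration (the function name and sample values are hypothetical, not from the study), mean bias and RMSE for one surface parameter could be computed as:

```python
import math

def verification_scores(forecast, observed):
    """Mean bias and RMSE of a forecast series against observations."""
    errors = [f - o for f, o in zip(forecast, observed)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

# Hypothetical 2 m temperatures (°C) from one configuration vs. station data
fcst = [21.3, 22.1, 23.0, 24.2]
obs = [21.0, 22.5, 23.4, 24.0]
bias, rmse = verification_scores(fcst, obs)
```

Scores computed this way per configuration and season allow the OPER and D2 setups to be ranked parameter by parameter.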
[10:55 – 11:20] MTG-I1 Lightning Imager NWC SAF Products: overview, algorithm and its applications
Raluca Gabrian (Romanian National Meteorological Administration)
In December 2022, the first satellite of the Meteosat Third Generation (MTG) series of geostationary satellites (MTG-I1) was launched. It carries two new instruments: the Flexible Combined Imager (FCI) and the Lightning Imager (LI). LI, the first instrument of its kind on a European satellite, detects lightning over Europe, Africa, the Middle East, and parts of South America, enabling storm monitoring from early instability to lightning discharges. The LIStack product accumulates lightning data over a user-configurable period of time; for users who want to compare or correlate LIStack products with ground-station data, a user-configurable parallax correction is available. The LIJump product was developed to detect small, short-duration discharges linked to the most rapid changes in flash rate, helping to identify storms that may produce severe weather (including tornadoes, hail, and straight-line winds) in the short term. LIJump is a prototype product that uses MTG-LI lightning flashes (LFL), builds storm "cells" directly from LI data, and detects rapid increases in lightning activity, which are often connected to severe weather within the next 0–60 minutes. Using data from this new instrument, the aim is to extend capabilities by developing a system that processes it and generates near-real-time products, improving both the understanding and the prediction of lightning-related events.
[11:20 – 11:45] GEM-Qt: Generating embeddings for meteorological reanalysis data using a quadtree-based approach
Andrei Mihai (Babeș-Bolyai University)
Data embeddings capture the essential features of raw high-dimensional data by transforming it into a lower-dimensional space while preserving its key characteristics. Our research focused on creating embeddings for meteorological reanalysis data, and we developed a new method, GEM-Qt, that uses a quadtree-based decomposition of the data to compress and encode relevant meteorological information. Reanalysis data embeddings are helpful for data clustering, classification, and retrieval tasks, as they allow for the grouping of similar items and efficient querying in large meteorological datasets. The quality of the proposed quadtree-based embeddings is validated using clustering-based performance metrics. A comparison to related work highlights that GEM-Qt outperforms other baseline models proposed in the literature for generating embeddings from meteorological images.
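To give an intuition for quadtree-based compression of a gridded field (this is a generic sketch, not the GEM-Qt algorithm itself; the splitting criterion and names are hypothetical), a quadrant is kept whole when it is nearly homogeneous and split into four otherwise, so smooth regions cost one embedding component while variable regions get finer resolution:

```python
def quadtree_embed(grid, threshold=1.0):
    """Recursively split a square 2-D field; a quadrant whose value range
    is at most `threshold` becomes one embedding component (its mean)."""
    values = [v for row in grid for v in row]
    if max(values) - min(values) <= threshold or len(grid) == 1:
        return [sum(values) / len(values)]
    h = len(grid) // 2
    quads = [
        [row[:h] for row in grid[:h]], [row[h:] for row in grid[:h]],
        [row[:h] for row in grid[h:]], [row[h:] for row in grid[h:]],
    ]
    emb = []
    for q in quads:
        emb.extend(quadtree_embed(q, threshold))
    return emb

# Hypothetical 4x4 field: three uniform quadrants, one variable quadrant
field = [[1, 1, 8, 9],
         [1, 1, 8, 8],
         [2, 2, 2, 2],
         [2, 2, 2, 2]]
```

The 16-value field collapses to a 4-component embedding, one mean per leaf quadrant.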
[11:45 – 12:10] Enhancing photovoltaic energy forecasting using centralized and distributed deep learning on remote sensing data
Alexandrescu Ștefan (Babeș-Bolyai University)
The proliferation of photovoltaic (PV)-based energy generation from various sources and systems over the last decades is of significant importance, driven by the need to integrate more renewable energy sources and to reduce reliance on polluting, fossil fuel-based power plants. In the context of the increasing deployment of PV farms, power systems are becoming highly dependent on several factors, such as weather parameters or equipment age, and fluctuations in energy generation can lead to several challenges. As solar energy also dominates the growth in renewable capacity, demand for accurate, robust and efficient forecasting methods has risen. Additionally, there are many situations that require scalability, per-site parallelization, efficient communication between different sites, or local data privacy. Therefore, we analyse several centralized deep-learning estimators for PV energy output from power plants, as well as distributed artificial intelligence (DAI)-based systems that efficiently handle multi-site energy prediction and heterogeneous data from geographically distant locations.
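A common aggregation step in distributed multi-site learning is federated averaging: each site trains locally and only model weights, not raw data, are shared, which addresses the privacy and communication concerns above. A minimal sketch of size-weighted aggregation (function name and weight vectors are illustrative, not the system described in the talk):

```python
def federated_average(site_weights, site_sizes):
    """Aggregate per-site model weight vectors into a global model,
    weighting each site by its local dataset size (FedAvg-style)."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * s for w, s in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical PV sites with locally trained weight vectors
site_a = [0.2, 0.4]   # trained on 100 local samples
site_b = [0.6, 0.0]   # trained on 300 local samples
global_w = federated_average([site_a, site_b], [100, 300])
```

The larger site dominates the average in proportion to its data, while both sites keep their measurements local.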
Applications of Deep Learning
(chaired by Iuliana Bocicor)
[12:30 – 12:55] An empirical analysis about the usage of static source code metrics for Software Defect Prediction
Oneț-Marian Zsuzsanna (Babeș-Bolyai University)
Software Defect Prediction (SDP) is defined as the automated identification of defective components within a software system. It is a supervised classification problem, and a large number of Machine Learning (ML) approaches have been proposed for it in the literature. Depending on where the train and test data come from, there are three settings: inner-version defect prediction, cross-version defect prediction and cross-project defect prediction. We first investigated the cross-version defect prediction setting from a novel perspective: since the train and test data are two versions of the same project, they are likely to contain identical or very similar instances, which might affect the reliability of evaluation for an ML approach. Our experiments on 28 pairs of successive software versions indicate that the presence of these instances inflates the measured performance. We extended our experiments to investigate the presence and effect of identical instances in all three SDP settings, considering a large set of different repositories proposed in the literature. As features, for every data set we considered only the static source code metrics, since they are very popular as features for SDP.
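The duplicate-instance check at the heart of this analysis can be sketched very simply: count how many test-version feature vectors also occur verbatim in the training version (a minimal illustration with hypothetical metric vectors, not the paper's exact pipeline):

```python
def overlap_report(train_rows, test_rows):
    """Count test instances whose feature vector also appears in the
    training set; a large overlap can inflate cross-version scores."""
    train_set = {tuple(r) for r in train_rows}
    shared = sum(1 for r in test_rows if tuple(r) in train_set)
    return shared, shared / len(test_rows)

# Hypothetical static-metric vectors (e.g. LOC, WMC, CBO) for two releases
v1 = [[120, 7, 3], [45, 2, 1], [300, 15, 9]]
v2 = [[120, 7, 3], [45, 2, 1], [88, 4, 2], [310, 16, 9]]
shared, ratio = overlap_report(v1, v2)
```

Here half the test instances are exact duplicates of training instances, exactly the situation in which a classifier's measured performance no longer reflects its ability to generalize to unseen code.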
[12:55 – 13:20] ET-SDP: Enhancing code embeddings with effort-related and test coverage metrics for improved software defect prediction
Chelaru Ioana-Gabriela (Babeș-Bolyai University)
Software defect prediction (SDP) aims to identify defect-prone code modules in order to optimize testing resources and improve software quality. While traditional approaches rely on software metrics derived from code structure, this paper proposes ET-SDP, an approach that enhances code embeddings using effort-related and test coverage metrics. We introduce three feature sets: the top-30 software metrics selected through clustering relevance ranking, 170 effort-related metrics capturing development process characteristics, and five test coverage metrics derived from unit tests. Our evaluation on 15 releases of Apache Calcite and 6 releases of Apache Ant-Ivy demonstrates that effort-related and test coverage metrics provide better defect prediction performance with far fewer features than traditional software metrics. In unsupervised clustering experiments, effort-based embeddings achieve better alignment with defect labels. In supervised classification, effort-related features achieve an AUC of 0.932 on Calcite and 0.883 on Ant-Ivy, outperforming the top-30 software metrics at comparable feature count (AUC of 0.632 and 0.587, respectively). Feature importance analysis reveals that test coverage metrics are the strongest predictors of defect proneness. These findings suggest that process-oriented metrics, particularly those related to code testing and development history, capture defect patterns more effectively per feature than static code structure metrics, offering practical guidance for software quality assurance.
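Since the comparison above rests on AUC, it is worth recalling what the metric measures: the probability that a randomly chosen defective module is ranked above a randomly chosen clean one. A minimal rank-based implementation (illustrative labels and scores, not the paper's data):

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation:
    fraction of (defective, clean) pairs ranked correctly, ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0]           # 1 = defective module
scores = [0.9, 0.3, 0.6, 0.6, 0.1]  # hypothetical classifier outputs
result = auc(labels, scores)
```

An AUC of 0.932 as reported for Calcite thus means a defective module outranks a clean one about 93% of the time.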
[13:20 – 13:45] X-cDiT: Explainable Latent Conditional Diffusion Inpainting with Transformers for Iris Occlusion Removal
Negoițescu Andreea (Babeș-Bolyai University)
This paper presents the development of X-cDiT, an explainable iris inpainting model that employs conditional diffusion with Transformers in a latent space created using a pre-trained variational autoencoder fine-tuned via Mean Squared Error. Conditioning is provided by ResNet-50 iris identity embeddings computed for each subject. The goal is to remove iris occlusions that may affect iris recognition processes, such as irregular light reflections, eyelashes and eyelids, and to restore the missing areas by artificially generating visually and biologically compliant iris texture portions. Following model training and evaluation on visible light spectrum (VIS) and near-infrared (NIR) datasets separately, X-cDiT was shown to outperform existing related studies on this topic. Moreover, VIS experiments produced better results than NIR ones. The PSNR and SSIM validation metrics computed on whole images after inpainting reach 40.239 and 0.982 on VIS, and 36.854 and 0.957 on NIR, respectively. Calculated exclusively on the inpainted areas, the obtained PSNR registers 27.164 on VIS and 25.619 on NIR. The occlusion mask ratio is also a significant factor influencing inpainting performance, as smaller masks lead to better results.
[13:45 – 14:10] BRAI-XNet: A Deep Learning Framework for Brain Tumor Classification from MRI
Boroș Patricia-Ioana (Babeș-Bolyai University)
Brain tumor classification from MRI is a central task in medical image analysis with increasing relevance. The BRAI-XNet framework addresses multi-class classification across glioma, meningioma, pituitary, and no-tumor categories. Transfer learning is applied using pretrained CNN backbones including MobileNetV2, EfficientNet-B0, and VGG16. Additional lightweight modules are introduced for mixed attention, residual feature propagation, and spatial boundary refinement. Training follows a two-stage strategy with frozen backbone warm-up and partial fine-tuning of higher layers. Evaluation is performed using accuracy and macro-F1 to ensure balanced performance across classes. Comparisons include standard backbones versus enhanced variants and individual models versus ensemble methods. Explainability is achieved through Grad-CAM and LIME visualizations highlighting relevant anatomical regions.
[14:10 – 14:35] Early prediction of endometriosis risk through explainable artificial intelligence
Măgui Anca-Elena (Babeș-Bolyai University)
Endometriosis is a gynecological condition frequently characterized by prolonged delays between the onset of symptoms and formal clinical diagnosis. To mitigate this latency, we propose a machine learning-driven approach for the early estimation of endometriosis risk using self-reported symptoms, emphasizing the critical integration of Explainable Artificial Intelligence. Following a rigorous feature selection pipeline designed to isolate the most predictive clinical indicators, advanced predictive models were developed and optimized utilizing a Nested Cross-Validation framework coupled with Optuna-based hyperparameter tuning to prevent data leakage. To make the predictive models interpretable for clinicians, we applied Shapley Additive Explanations (SHAP) values and surrogate decision trees. Furthermore, the clinical viability of the predictions was validated through calibration curves and decision curve analysis.
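The leakage-prevention property of nested cross-validation comes from building the inner (tuning) folds exclusively from each outer training portion. A minimal index-level sketch of that structure (generic, not the exact pipeline used in the study):

```python
def k_fold_indices(n, k):
    """Contiguous K-fold split of n sample indices."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nested_cv_splits(n, outer_k, inner_k):
    """Outer folds estimate generalization; inner folds, built only from
    the outer training portion, are reserved for hyperparameter tuning,
    so held-out test data never influences model selection."""
    splits = []
    for test in k_fold_indices(n, outer_k):
        train = [i for i in range(n) if i not in test]
        inner = k_fold_indices(len(train), inner_k)
        # inner folds index into `train`, never into `test`
        splits.append((train, test, inner))
    return splits

splits = nested_cv_splits(10, 5, 3)
```

In this scheme a tuner such as Optuna only ever sees the inner folds, and each outer test fold remains untouched until the final evaluation.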