Opening Ceremony
[10:20 – 10:30] Opening the Workshop
Gabriela Czibula (Babeș-Bolyai University), Eugen Mihuleț (Romanian National Meteorological Administration)
Meteorological Research
(chaired by Istvan Czibula)
[10:30 – 10:55] Weather-data driven models for regional energy forecasting: integrating photovoltaic production, consumption patterns, and climate change projections
Eugen Mihuleț (Romanian National Meteorological Administration)
The global push towards sustainable energy solutions has placed renewable energy sources, particularly photovoltaic (PV) systems, at the forefront of power generation strategies. Given the substantial fall in the cost of renewables and their growing share of total energy production, the increasing deployment of both wind and PV means that power systems are becoming highly dependent on the weather. This work investigates methods adapted to the Romanian context for forecasting both PV generation and energy consumption. Although these are two apparently different topics, we consider that in a future national energy system dominated by renewable energy sources it is worthwhile to address them together. This approach aims to provide a coherent, practical, and useful tool for stakeholders and policymakers to analyse the impact of weather under various climate scenarios. Such an approach would be useful for smart grid and demand response studies, regional energy planning and community energy systems, microgrid and distributed energy resource (DER) studies, and integrated energy systems for urban areas.
[10:55 – 11:20] The Accumulation of MTG LI Level 2 Products
Raluca Ștefan (Romanian National Meteorological Administration)
In December 2022, the first satellite of the Meteosat Third Generation (MTG) series of geostationary satellites, an MTG-Imager, was launched. MTG is the first European geostationary meteorological satellite system capable of detecting lightning over Europe, Africa, the Middle East, and parts of South America. For the first time, it will be possible to track the entire lifecycle of a storm from space, from the initial atmospheric instability to the actual lightning discharges. As a first step in processing this data, the C programming language was chosen for its superior processing speed. Among the advantages of C are its efficient memory management and the fact that compiled code executes directly on the computer’s processor architecture. Multi-threaded parallelisation is also more efficient in C, which allows multiple threads to run simultaneously, whereas in Python the GIL (Global Interpreter Lock) allows only one thread to execute Python bytecode at a time. The developed prototype generates a two-dimensional matrix variable containing the events accumulated over the period specified by the NWCSAF users. Subsequently, the list of saved variables can be updated. The generated results support the understanding and prediction of severe meteorological phenomena, with the accumulation process assisting in identifying areas at high.
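The accumulation step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the C prototype from the talk; the `Event` record, the grid indices, the time units, and the half-open time window are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical lightning event already mapped to a grid cell."""
    time_s: float  # seconds since an arbitrary reference time (assumed unit)
    row: int       # grid row index
    col: int       # grid column index


def accumulate_events(events, n_rows, n_cols, t_start, t_end):
    """Count events per grid cell within the half-open window [t_start, t_end)."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for ev in events:
        in_window = t_start <= ev.time_s < t_end
        in_grid = 0 <= ev.row < n_rows and 0 <= ev.col < n_cols
        if in_window and in_grid:
            grid[ev.row][ev.col] += 1
    return grid


# Toy input: four events, one of which falls outside the 300-second window.
events = [
    Event(10.0, 0, 1),
    Event(20.0, 0, 1),
    Event(400.0, 2, 2),
    Event(50.0, 1, 0),
]
grid = accumulate_events(events, n_rows=3, n_cols=3, t_start=0.0, t_end=300.0)
for row in grid:
    print(row)
```

Cells with high counts in the resulting matrix correspond to locations where many events occurred during the chosen period.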
[11:20 – 11:45] Adapting COALITION-4 Nowcasting ML model to the NMA’s operational context
Claudiu Adam (Romanian National Meteorological Administration)
Based on NMA’s operational needs for automating the severe weather warning issuance chain, we are adapting COALITION-4, developed by the team at MeteoSwiss. The proposed model is a data-driven encoder-decoder model for predicting heavy rain, hail, and lightning at 0-1 hour lead time. The ML model integrates weather radar data, satellite imagery (FCI), lightning detection sensors, NWP model output, and DEM data; radar products were found to be the most important data type for predicting all three hazard types. As in the Swiss case, most data products for Romania were processed into time series at 5-min intervals with the intent of generating training patches. All data files were converted to NetCDF format for better data handling and parallel processing, and all complex mathematical computations were moved from the CPU to the GPU.
[11:45 – 12:10] A study on using generative deep learning architectures for classification of ground-based cloud images
Alexandrescu Ștefan (Babeș-Bolyai University)
Clouds play an essential role in Earth’s climate, being the source of precipitation, controlling the amount of solar energy that reaches the surface of our planet, and affecting the overall Earth radiation budget. Cloud classification is therefore important, as it helps in weather forecasting and in monitoring climate change. Clouds take many forms, with variable composition and density and with different colours and heights, which makes their classification challenging. Two new Machine Learning approaches, a Diffusion-based and a Vision Transformer-based classifier, are introduced. Experiments are conducted on multiple ground-based cloud datasets of different sizes and characteristics, using various techniques and training methods. Their performance is compared with that of other deep learning architectures used in the literature.
Applications of Deep Learning
(chaired by Zsuzsanna Marian and Iuliana Bocicor)
[12:30 – 12:55] Learning Code Embeddings from Abstract Syntax Graphs via Graph Attention Networks
Sîrbu Alexandru-Gabriel (Babeș-Bolyai University)
Embeddings are dense vector representations of data that capture essential features, enabling models to better understand the structure and semantics of the information, which improves their overall performance. In the context of programming languages, models often treat code as natural language, causing them to learn both syntax and semantics implicitly. This work proposes a novel approach to code embedding generation based on Abstract Syntax Graphs (ASGs), a more compact and less redundant representation than traditional Abstract Syntax Trees, particularly at the leaf node level. A Graph Attention Network model is pre-trained on a next node and edge prediction task, effectively learning to reconstruct ASGs and internalize structural information. To evaluate the quality of the resulting embeddings, the embedded data is clustered using K-Means, DBSCAN, and Spectral Clustering with a cosine-similarity-based distance metric, and the silhouette score is used to assess the quality of the embeddings.
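As an illustration of the evaluation measure mentioned above, the following is a minimal sketch of the silhouette score computed with cosine distance (one minus cosine similarity). It uses only the Python standard library; the toy two-dimensional vectors and the cluster labels are invented for the example and are not data from the talk, and the clustering step itself is assumed to have been done already.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using cosine distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # By convention, a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(cosine_distance(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(cosine_distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy embeddings: two tight groups pointing in different directions.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
good = silhouette_score(embeddings, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(embeddings, [0, 1, 0, 1])   # labels cut across the groups
print(f"matching labels: {good:.3f}, mismatched labels: {bad:.3f}")
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.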
[12:55 – 13:20] FS-iSDP: A few-shot learning approach using siamese networks for interpretable software defect prediction
Chelaru Ioana-Gabriela (Babeș-Bolyai University)
Software defect prediction (SDP) plays a critical role in improving software quality by identifying defective components during development and maintenance. This paper introduces FS-iSDP, an interpretable few-shot learning approach based on Siamese Neural Networks, designed to address the challenges posed by highly imbalanced SDP datasets. The proposed model learns to measure the similarity between software application classes represented through software metric-based features, enabling effective defect classification with limited training examples. Experiments were conducted on the Apache Calcite open-source project using a cross-version setup that simulates realistic software evolution. The results show that FS-iSDP achieves high recall and strong AUC values across multiple software versions, even as the number of defects decreases and class imbalance intensifies. To improve performance and reduce computational costs, Univariate Feature Selection is applied, which leads to improved precision and critical success index, along with a significant reduction in training time. In addition, a comparative evaluation is performed against state-of-the-art SDP methods, showing that FS-iSDP achieves competitive performance in most metrics. Finally, LIME analysis is used to interpret the model predictions, offering insight into which features most influence defect classification decisions. This approach proves effective for realistic, evolving software systems, and provides both predictive capabilities and interpretability in scenarios with scarce defect labels.
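For readers unfamiliar with the evaluation measures named above, the sketch below computes recall, precision, and the critical success index (CSI, defined as TP / (TP + FP + FN)) from binary labels. The label lists are invented for illustration and are not results from the paper; the function name `defect_metrics` is likewise hypothetical.

```python
def defect_metrics(y_true, y_pred):
    """Return recall, precision, and CSI for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators, which occur when there are
    # no defective classes or no positive predictions.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    csi = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"recall": recall, "precision": precision, "csi": csi}


# Toy, imbalanced example: 3 defective classes out of 8.
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
metrics = defect_metrics(y_true, y_pred)
print(metrics)
```

Unlike accuracy, none of these three measures counts true negatives, which is why they remain informative when non-defective classes dominate the dataset.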
[13:20 – 13:45] A Deep Learning Approach for Medical Visual Question Answering via Cascaded Multitask Learning
Toader Teodora-Alexandra (Babeș-Bolyai University)
Medical Visual Question Answering (MVQA) is a multimodal task that combines visual and textual information to address medical inquiries, with promising applications in computer-aided diagnosis and medical education. While Deep Learning has shown strong potential in this area, its effectiveness is limited by the scarcity of high-quality, annotated medical data. To address this challenge, we introduce our model, CAMMA, a cascaded multitask architecture for MVQA, which achieves state-of-the-art performance on the OVQA dataset with 71.45% accuracy. By leveraging the advantages of multitask learning, CAMMA reduces overfitting and enhances data efficiency by capitalizing on the additional output information for each input sample. We further assess its generalisability on the VQA-Med 2019 dataset and explore the impact of task selection and weighting within the multitask framework.
[13:45 – 14:10] Reconstructing perceived visual stimuli from brain fMRI signals using diffusion models
Gabor Ioana (Babeș-Bolyai University)
Visual brain decoding is a field of study focused on using brain activity to understand and reconstruct visual experiences. Recent advances in generative AI, particularly diffusion models and OpenAI’s Contrastive Language-Image Pre-training (CLIP) embeddings, have led to more semantically accurate reconstructions. In this presentation, we explore the potential for improving the computational efficiency of state-of-the-art fMRI-to-image models. We hypothesize that good-quality images can be generated by predicting only a small fraction of the total number of CLIP embeddings typically used by existing methods. To evaluate this hypothesis, we conduct experiments on the Natural Scenes Dataset using both a simple ridge regression model and an adapted version of a more complex federated learning approach (BrainGuard).
[14:10 – 14:35] Money laundering detection using autoencoder graph neural networks
Grama Tudor-Ionuț (Babeș-Bolyai University)
In this work, we aim to detect money laundering operations in transaction data represented as graphs. Our main contribution is integrating autoencoder components into graph neural network architectures, in order to incorporate a reconstruction step into the traditional edge classification problem and to improve model quality by using reconstruction errors. We also discuss other adaptations needed to make graph neural network architectures capable of properly handling transaction data. Another novel part of our contribution is the use of SHAP values on financial transaction data, in order to potentially gain further insight into the most important features for distinguishing normal from fraudulent activities and into their individual impact.