Evaluating Deep Learning Models for Cross-Platform UI Component Detection: A Study Across Web, Desktop, and Mobile Interfaces (2025)

Abstract

User interfaces look different across web, desktop, and mobile platforms, not just in layout but in how buttons, icons, and text appear. This makes it hard for deep learning models trained on one platform to accurately detect UI components on another. In this paper, we evaluate the cross-domain generalization of three modern object detectors (YOLOv8, YOLOv9, and Faster R-CNN) trained on one or more GUI platforms using three datasets: GENGUI (web), UICVD (desktop), and VINS (mobile). We focus on three common UI classes, Text, Button, and Icon, and compare model performance across four scenarios: in-domain training, domain adaptation, fine-tuning, and combined training. Our results show that YOLOv9 consistently delivers the best cross-domain performance, especially when fine-tuned, achieving up to 95.5% mAP when adapted from desktop to web interfaces. We also find that Text is the most transferable class, while Button and Icon require adaptation to new visual styles. Fine-tuning emerges as the most effective strategy for improving generalization with limited data.
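To make the fine-tuning scenario concrete, the sketch below shows how such an experiment could be set up with the Ultralytics API, which supports both YOLOv8 and YOLOv9. The dataset YAML files, checkpoint paths, and hyperparameters here are illustrative assumptions, not the paper's actual configuration, since the paper does not publish its training code.

from ultralytics import YOLO

# Sketch of the desktop-to-web fine-tuning scenario. Dataset YAMLs
# (uicvd_desktop.yaml, gengui_web.yaml) and hyperparameters are
# illustrative assumptions, not the authors' published setup.

# 1. In-domain training on the source platform (desktop/UICVD).
model = YOLO("yolov9c.pt")  # pretrained YOLOv9 checkpoint
model.train(data="uicvd_desktop.yaml", epochs=100, imgsz=640)

# 2. Fine-tune the source-trained weights on limited
#    target-platform data (web/GENGUI), with a reduced learning rate.
model = YOLO("runs/detect/train/weights/best.pt")
model.train(data="gengui_web.yaml", epochs=30, imgsz=640, lr0=1e-4)

# 3. Evaluate cross-domain performance on the target test split.
metrics = model.val(data="gengui_web.yaml", split="test")
print(f"mAP@0.5: {metrics.box.map50:.3f}")

The same loop can be repeated for the other adaptation directions (e.g., mobile to desktop) by swapping the source and target dataset YAMLs.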

Citation

@inproceedings{Dicu2025EvaluatingDL,
  author    = {Madalina Dicu and Camelia Chira},
  booktitle = {Procedia Computer Science},
  title     = {Evaluating Deep Learning Models for Cross-Platform UI Component Detection: A Study Across Web, Desktop, and Mobile Interfaces},
  year      = {2025}
}
