Enhancing semantic segmentation for urban accessibility using high-fidelity synthetic data

Santiago Felipe Luna Romero
Renato Gouveia
Mauren Abreu Souza

Abstract

Semantic segmentation of urban scenes is essential for the development of smart cities; however, its effectiveness relies heavily on large, pixel-level annotated datasets, which are particularly scarce for mobility aids. This study aims to enhance semantic segmentation for urban accessibility applications by leveraging synthetic data. The proposed methodology integrates high-fidelity synthetic data generation using Unreal Engine 5.1, automated semantic mask processing, and the training of state-of-the-art segmentation models. A dataset of 5,036 images with pixel-perfect labels across 22 classes, including sidewalks, wheelchairs, and walking aids, was created to support this investigation. Two architectures were benchmarked: a baseline U-Net and DeepLabv3+ with ASPP. Pre-training with synthetic data increased global mIoU from 0.0626 to 0.84 (13.4x) and substantially improved precision, recall, and F1-score (by approximately 6.8x, 9.3x, and 10.4x, respectively). For accessibility-critical classes, motorized wheelchairs achieved an IoU of 0.94, and sidewalks attained a recall of 0.98. Overall, all 22 classes surpassed the deployment threshold (IoU ≥ 0.75). These findings demonstrate that synthetic data, combined with imbalance-aware training strategies, provides a viable pathway toward robust semantic segmentation solutions for urban accessibility applications.
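The evaluation metrics quoted in the abstract (per-class IoU and global mIoU) can be reproduced with a short NumPy sketch. The function names and toy label maps below are illustrative assumptions, not taken from the paper's codebase:

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Intersection-over-union for each class, given two integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            # Class absent from both prediction and ground truth: undefined.
            ious.append(np.nan)
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return ious

def mean_iou(pred, target, num_classes):
    """Global mIoU: average of per-class IoU over classes that occur."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))

# Toy example: 2x3 label maps with 3 classes.
pred = np.array([[0, 1, 1], [2, 2, 0]])
target = np.array([[0, 1, 2], [2, 2, 0]])
print(mean_iou(pred, target, 3))  # 13/18 ≈ 0.722
```

In practice, per-class intersections and unions are accumulated over the whole validation set before dividing, so that rare accessibility classes (e.g., walking aids) are not dominated by images in which they do not appear.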

Article Details

Section
Scientific Paper
Author Biographies

Santiago Felipe Luna Romero, Pontifícia Universidade Católica do Paraná

PhD candidate at the Pontifícia Universidade Católica do Paraná, in Artificial Intelligence (Electronic Engineering).

Renato Gouveia, Pontifícia Universidade Católica do Paraná

Mauren Abreu Souza, Pontifícia Universidade Católica do Paraná

PhD; her post-doctoral research involved 3D multimodality imaging and modelling.

References

[1] M. Ivanovs, K. Ozols, A. Dobrajs, and R. Kadikis, “Improving semantic segmentation of urban scenes for self-driving cars with synthetic images,” Sensors, vol. 22, no. 6, p. 2252, Mar. 2022. [Online]. Available: http://doi.org/10.3390/s22062252

[2] E. Mohamed, K. Sirlantzis, and G. Howells, “Indoor/outdoor semantic segmentation using deep learning for visually impaired wheelchair users,” IEEE Access, vol. 9, pp. 147914–147932, 2021. [Online]. Available: http://doi.org/10.1109/access.2021.3123952

[3] R. Azad, M. Heidary, K. Yilmaz, M. Hüttemann, S. Karimijafarbigloo, Y. Wu, A. Schmeink, and D. Merhof, “Loss functions in the era of semantic segmentation: A survey and outlook,” arXiv preprint, 2023. [Online]. Available: http://doi.org/10.48550/ARXIV.2312.05391

[4] J. L. Gómez, M. Silva, A. Seoane, A. Borrás, M. Noriega, G. Ros, J. A. Iglesias-Guitian, and A. M. López, “All for one, and one for all: UrbanSyn dataset, the third musketeer of synthetic driving scenes,” arXiv preprint, 2023. [Online]. Available: http://doi.org/10.48550/ARXIV.2312.12176

[5] J. Tian, N. Mithun, Z. Seymour, H.-P. Chiu, and Z. Kira, “Striking the right balance: Recall loss for semantic segmentation,” arXiv preprint, 2021. [Online]. Available: http://doi.org/10.48550/ARXIV.2106.14917

[6] Z. Song, Z. He, X. Li, Q. Ma, R. Ming, Z. Mao, H. Pei, L. Peng, J. Hu, D. Yao, and Y. Zhang, “Synthetic datasets for autonomous driving: A survey,” arXiv preprint, 2023. [Online]. Available: http://doi.org/10.48550/ARXIV.2304.12205

[7] R. Kamimura, “Information-theoretic enhancement learning and its application to visualization of self-organizing maps,” Neurocomputing, vol. 73, no. 13–15, pp. 2642–2664, Aug. 2010. [Online]. Available: http://doi.org/10.1016/j.neucom.2010.05.013

[8] Q. Wu and H. Liu, “Unsupervised domain adaptation for semantic segmentation using depth distribution,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 14374–14387. [Online]. Available: https://upsalesiana.ec/ing35ar9r1

[9] S. F. Luna-Romero, C. R. Stempniak, M. Abreu de Souza, and G. Reynoso-Meza, Urban Digital Twins for Synthetic Data of Individuals with Mobility Aids in Curitiba, Brazil, to Drive Highly Accurate AI Models for Inclusivity. Springer Nature Switzerland, 2024, pp. 116–125. [Online]. Available: http://doi.org/10.1007/978-3-031-52090-7_12

[10] Y. Yuan, Y. Du, Y. Ma, and H. Lv, “DSCNet: enhancing blind road semantic segmentation with visual sensor using a dual-branch Swin-CNN architecture,” Sensors, vol. 24, no. 18, p. 6075, Sep. 2024. [Online]. Available: http://doi.org/10.3390/s24186075

[11] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Álvarez, and P. Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” arXiv preprint, 2021. [Online]. Available: https://doi.org/10.48550/arXiv.2105.15203

[12] S. F. Luna Romero, C. R. Stempniak, M. Abreu de Souza, and G. Reynoso-Meza, “A transfer learning model proposal for country border security using aerial thermal images,” in Proceedings of the XXIV Congresso Brasileiro de Automática, ser. CBA2022. SBA Sociedade Brasileira de Automática, Oct. 2022. [Online]. Available: http://doi.org/10.20906/cba2022/3341

[13] S. F. L. Romero, M. A. d. Souza, and L. S. Andrade, “SynthUA-DT: A methodological framework for synthetic dataset generation and automatic annotation from digital twins in urban accessibility applications,” Technologies, vol. 13, no. 8, p. 359, Aug. 2025. [Online]. Available: http://doi.org/10.3390/technologies13080359

[14] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv preprint, 2015. [Online]. Available: http://doi.org/10.48550/ARXIV.1505.04597

[15] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” arXiv preprint, 2018. [Online]. Available: http://doi.org/10.48550/ARXIV.1802.02611

[16] S. F. Luna-Romero, M. Abreu de Souza, and L. Serpa Andrade, “Artificial vision systems for mobility impairment detection: Integrating synthetic data, ethical considerations, and real-world applications,” Technologies, vol. 13, no. 5, p. 198, May 2025. [Online]. Available: http://doi.org/10.3390/technologies13050198

[17] J. Tremblay, A. Prakash, D. Acuna, M. Brophy, V. Jampani, C. Anil, T. To, E. Cameracci, S. Boochoon, and S. Birchfield, “Training deep networks with synthetic data: Bridging the reality gap by domain randomization,” arXiv preprint, 2018. [Online]. Available: http://doi.org/10.48550/ARXIV.1804.06516

[18] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint, 2015. [Online]. Available: http://doi.org/10.48550/ARXIV.1502.03167

[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” arXiv preprint, 2017. [Online]. Available: http://doi.org/10.48550/ARXIV.1703.06870

[20] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” arXiv preprint, 2017. [Online]. Available: http://doi.org/10.48550/ARXIV.1708.02002

[21] J. Brewer, K. Rajagopal, A. Sadofyev, and W. van der Schee, “Evolution of the mean jet shape and dijet asymmetry distribution of an ensemble of holographic jets in strongly coupled plasma,” Journal of High Energy Physics, vol. 2018, no. 2, Feb. 2018. [Online]. Available: http://doi.org/10.1007/jhep02(2018)015

[22] R. Gouveia. (2025) PIBITI semantic segmentation. GitHub, Inc. [Online]. Available: https://upsalesiana.ec/ing35ar9r3