Semantic Segmentation of High-Resolution Airborne Images with Dual-Stream DeepLabV3+

AKÇAY, ÖZGÜN; KINACI, AHMET; AVŞAR, EMİN; AYDAR, UMUT

doi:10.3390/ijgi11010023

AKÇAY Ö., KINACI A. C., AVŞAR E. Ö., AYDAR U.

ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, vol.11, no.1, 2022 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 11 Issue: 1
Publication Date: 2022
DOI Number: 10.3390/ijgi11010023
Journal Name: ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Agricultural & Environmental Science Database, CAB Abstracts, INSPEC, Veterinary Science Database, Directory of Open Access Journals
Keywords: deep learning, semantic segmentation, photogrammetry, multi-spectral aerial imagery, digital surface model, vegetation index, land cover classification, CLASSIFICATION, BOUNDARY
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Çanakkale Onsekiz Mart University: Yes

Abstract

In geospatial applications such as urban planning and land use management, the automatic detection and classification of earth objects are essential tasks. Among the prominent semantic segmentation algorithms, DeepLabV3+ stands out as a state-of-the-art CNN. Although the DeepLabV3+ model can extract multi-scale contextual information, there is still a need for multi-stream architectures and alternative training strategies that can leverage multi-modal geographic datasets. In this study, a new end-to-end dual-stream architecture for geospatial imagery was developed based on the DeepLabV3+ architecture. The results show that spectral data other than RGB increased semantic segmentation accuracy when used as additional channels alongside height information. Furthermore, both the applied data augmentation and the Tversky loss function, which is sensitive to imbalanced data, improved overall accuracy. On the Potsdam and Vaihingen datasets, the new dual-stream architecture achieved overall semantic segmentation accuracies of 88.87% and 87.39%, respectively. Overall, enhancing established semantic segmentation networks shows great potential for higher model performance, and the contribution of geospatial data as a second stream alongside RGB was explicitly demonstrated.
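
The abstract credits the Tversky loss with handling imbalanced classes. The sketch below shows the commonly used multi-class form of that loss in NumPy, only to illustrate the idea: it is not taken from the paper. The function name `tversky_loss`, the weights `alpha=0.3` and `beta=0.7`, the smoothing term `eps`, and the flattened `(pixels, classes)` input layout are all assumptions made for this example.

```python
import numpy as np


def tversky_loss(probs, onehot, alpha=0.3, beta=0.7, eps=1e-7):
    """Illustrative multi-class Tversky loss (hypothetical helper).

    probs  : (N, C) predicted class probabilities for N pixels
    onehot : (N, C) one-hot reference labels
    alpha  : weight on false positives (assumed value, not from the paper)
    beta   : weight on false negatives (assumed value, not from the paper)
    """
    probs = np.asarray(probs, dtype=np.float64)
    onehot = np.asarray(onehot, dtype=np.float64)

    # Soft per-class counts, summed over all pixels.
    tp = (probs * onehot).sum(axis=0)
    fp = (probs * (1.0 - onehot)).sum(axis=0)
    fn = ((1.0 - probs) * onehot).sum(axis=0)

    # Tversky index per class; eps avoids division by zero for absent classes.
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)

    # Averaging over classes lets rare classes count as much as frequent ones.
    return float(1.0 - index.mean())


# Four pixels, two classes; the single pixel of the rare class is missed.
labels = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
all_class_zero = np.array([[1, 0], [1, 0], [1, 0], [1, 0]])
print(tversky_loss(all_class_zero, labels))
```

With `alpha = beta = 0.5` the expression reduces to the Dice loss. Choosing `beta > alpha` penalizes false negatives more heavily, which is the usual reason this loss is preferred when some classes cover few pixels.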