Assessment of Approaches for the Extraction of Building Footprints from Pléiades Images

The Marina area represents an official new gateway of entry to Egypt, and infrastructure development is proceeding rapidly in this region. The objective of this research is to obtain building data by automated extraction from Pléiades satellite images, motivated by the need for efficient mapping and updating of geodatabases for urban planning and touristic development. The study compares the performance of the random forest algorithm with that of other classifiers, namely maximum likelihood, support vector machines, and backpropagation neural networks, over the well-organized buildings that appear in the satellite images. Images were classified into two classes: buildings and non-buildings. In addition, basic morphological operations such as opening and closing were used to enhance the smoothness and connectedness of the classified imagery. The overall accuracies for random forest, maximum likelihood, support vector machines, and backpropagation were 97%, 95%, 93% and 92%, respectively. Random forest was the best option, followed by maximum likelihood, while the least effective was the backpropagation neural network. The completeness and correctness of the detected buildings were evaluated. Experiments confirmed that the four classification methods can effectively and accurately detect 100% of buildings from very high-resolution images. The use of machine learning algorithms is encouraged for object detection and extraction from very high-resolution images.


Introduction
The precise identification of the ratio of existing buildings through extraction of building footprints from satellite images is a significant task in many applications like urban planning, hazard assessments and disaster risk management, and map updating of built-up areas.
Dahiya et al. [1] used the split and merge technique to segment high-resolution satellite images with fair contrast. Afterward, several filters were applied to extract important features that were later transformed into a vector image. Finally, buildings were extracted from the area of the vector image. Liu and Prinet [2] utilized a discriminative feature-based technique, which described buildings effectively. Zhang et al. [3] created an approach for identifying buildings from high-resolution satellite images with usual contrast on a global scale. The developed methodology produced population density maps for the identification of buildings. Deep learning has also been applied to identify buildings from satellite images. Xu et al. [4] used deep learning to propose a framework with guided filters for detecting buildings from satellite imagery. Aamir et al. [5] proposed a model that can efficiently extract buildings from QuickBird images.
Pixel-based and object-based techniques are the main image classification techniques in the literature.
Pixel-based classification methods neglect spatial background and only use spectral information, such as spectral vectors, at each pixel position. The Maximum Likelihood (MLH) algorithm is a popularly used classification approach. The MLH approach assumes the data is distributed normally for each class. A given pixel in the MLH procedure has a probability of belonging to a certain class. As a result, each pixel's probability is determined, and each pixel is allocated to the highest probability class.
Object-based classification methods categorize pixels based on their spatial characteristics as well as their spectral values. They divide the image into objects that represent sets of pixels based on their spatial characteristics and other factors [6,7].
Diverse learning-based algorithms have been implemented as an alternative to pixel-based and object-based approaches to obtain more precise and reliable built-up information from satellite images. Several researchers [8][9][10][11] used SVM and RF for building extraction and detection, and Yuan [12] employed machine learning algorithms for counting buildings from satellite images.
Machine-learning (ML) techniques are non-parametric and data-driven because they do not make any assumptions about data distribution and learn the relationship between input and output data [13]. Random forest (RF), backpropagation (BP), and support vector machines (SVM) are the most commonly used machine learning-based algorithms. Machine learning techniques are often used to extract meaningful information from high-resolution images. Hsu et al. [14] as well as Salah et al. [15] used SVM to categorize land cover. Haykin [16] combined the findings of SVM, BP and RF for land cover classification from high-resolution imagery.
Previous researchers have also used machine learning approaches. For example, Turker and Koc-San [17] integrated several algorithms to extract buildings automatically from high-resolution images. After the building patches are recognized, the building borders are retrieved using the Hough transformation, sequential edge detection, and perceptual grouping.
Based on neural networks, Lari and Ebadi [18] proposed a technique for increasing the degree of automation in extracting building features with varied roofing from high-resolution multispectral satellite images in Middle Eastern nations.
Shoaib et al. [19] compared pixel-based and object-based classification techniques to extract buildings from WorldView-2 and GeoEye images. Then, based on shadow, context, form, and Digital Surface Model data, several refining processes were carried out. A comparison of the classification techniques revealed that the MLH classifier for pixel-based techniques and SVM for object-based techniques were the most effective.
To categorize image data, Xu et al. [20] developed an enhanced RF approach that combined a novel feature weighting method with a tree selection method. They conducted a number of studies with image databases. The results of the experiments showed that the RF created using this strategy reduced the generalization error and improved the classification accuracy on the test data.
Walton [21] used Cubist, RF, and support vector regression to estimate urban cover from Landsat-7 imagery utilizing a higher resolution cover map as training and reference data. The results showed that Cubist was the best when predicting impervious surface cover. Optimum implementations of RF and Support Vector Regression might improve performance significantly.
Zhou and Chang [22] used ML to automate the classification of building structures. Several ML algorithms were evaluated, and the Gradient Boosted Decision Tree gave the best accuracy of 91.7%.
Numerous techniques have been developed using ML-based methods to classify built-up areas. However, existing results could be further improved. To date, a limited number of ML algorithms have been assessed in the existing literature. Therefore, the primary objective of this research was to evaluate RF for its effectiveness and prospects for building extraction. A secondary objective was to assess the accuracy of RF compared to widely used classification techniques such as MLH, SVM and BP neural networks in order to derive the building footprints in a coastal city as a case study.
This research contributes to the existing theory in that it offers an initial effort toward the automated classification of buildings in the built environment domain. The novelty of this research is that our approach outperforms several peer methods for building footprints detection and extraction.

Field of Study
Marina is located on the northern coastal shore of Egypt (Fig. 1). It is the official new port of entry to Egypt. It is located between 30° 50′ 00″ N and 30° 50′ 59″ N and between 28° 57′ 44″ E and 28° 57′ 59″ E. We selected this study area because it is the new official gateway of entry to Egypt and infrastructure development is proceeding rapidly in this region.
The available data sources were color Pléiades images at 0.5-m resolution and digital large-scale maps at a scale of 1:2500.

Method
The processing chain for building footprint extraction was as follows (Fig. 2 depicts the flowchart):
1. Radiometric correction of the Pléiades images was performed.
2. Rectification of the Pléiades images was performed using map control points.
3. The image was classified using RF, and the performance of the RF algorithm was then compared to that of maximum likelihood, support vector machines, and BP.
4. The classification accuracy was assessed.
5. Morphological operations were applied.
6. Edges were detected using the Sobel edge detection algorithm and then converted into vectors.
7. The performance was assessed through completeness and correctness analyses.
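Step 6 of the chain can be illustrated with a minimal NumPy sketch of Sobel edge detection on a binary building mask. The function name `sobel_edges`, the threshold of 0.5, and the 7 × 7 toy mask are assumptions made for illustration; the study itself does not state which implementation was used, and the conversion of edges to vectors is not shown.

```python
import numpy as np

def sobel_edges(mask, threshold=0.5):
    """Return a boolean edge map of a 2D array using the 3x3 Sobel kernels."""
    img = np.asarray(mask, dtype=float)
    # Replicate border pixels so the output has the same shape as the input.
    padded = np.pad(img, 1, mode="edge")
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                         # vertical gradient
    rows, cols = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Accumulate the kernel response one kernel cell at a time (correlation).
    for dr in range(3):
        for dc in range(3):
            window = padded[dr:dr + rows, dc:dc + cols]
            gx += kx[dr, dc] * window
            gy += ky[dr, dc] * window
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold

# Hypothetical classified image: one 3x3 building on a non-building background.
building_mask = np.zeros((7, 7), dtype=int)
building_mask[2:5, 2:5] = 1
edges = sobel_edges(building_mask)
```

On this toy mask the building interior and the far background have zero gradient, so only pixels along the building outline are flagged.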

[Fig. 2. Flowchart of the processing chain: ROI development; classification of the Pléiades image (including SVM and backpropagation); classification accuracy assessment; application of morphological operations; Sobel edge detection; conversion of the edge image into vectors; quality analysis (completeness, correctness, and quality).]

Geometric Correction of the Pléiades Images
Registration of the Pléiades images was performed using a 2D second-order transformation. ERDAS Imagine 2014 was utilized for the geometric correction. The control points (CPs) and independent check points (ICPs) used in the investigation were collected from reference digital large-scale maps at a scale of 1:2500 using AutoCAD Map 2004. Sharp edges were selected to enable easy identification on both maps and images. The accuracy of the extracted points was ±0.25 m. Figure 3 depicts the distribution of 10 well-distributed CPs (yellow) and 10 well-distributed ICPs (green) over the study area. The total RMS error was 0.431 m.
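The registration step can be sketched as a least-squares fit, assuming that "2D second order" denotes a second-order polynomial mapping from image to map coordinates (six coefficients per axis). All coordinates below are synthetic and noise-free, the function names are hypothetical, and the total RMS error is assumed to be the root of the mean squared planimetric residual over the check points.

```python
import numpy as np

def design_matrix(xy):
    """Second-order polynomial terms: 1, x, y, xy, x^2, y^2."""
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_second_order(image_xy, map_xy):
    """Least-squares estimate of the 6 x 2 coefficient matrix from control points."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(image_xy), map_xy, rcond=None)
    return coeffs

def transform(coeffs, image_xy):
    """Map image coordinates to map coordinates with the fitted polynomial."""
    return design_matrix(image_xy) @ coeffs

def total_rmse(predicted, reference):
    """Root mean square of the planimetric (x and y combined) residuals."""
    residuals = predicted - reference
    return float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))

# Synthetic example: 10 control points and 10 check points in normalized
# image coordinates, mapped by a made-up second-order polynomial.
rng = np.random.default_rng(0)
true_coeffs = np.array([[500.0, 900.0], [1000.0, 5.0], [-5.0, -1000.0],
                        [0.3, 0.1], [0.5, -0.2], [-0.1, 0.4]])
control_img = rng.uniform(0.0, 1.0, size=(10, 2))
check_img = rng.uniform(0.0, 1.0, size=(10, 2))
control_map = transform(true_coeffs, control_img)
check_map = transform(true_coeffs, check_img)

coeffs = fit_second_order(control_img, control_map)
rmse_check = total_rmse(transform(coeffs, check_img), check_map)
```

Because the synthetic points carry no measurement noise, the check-point RMSE is close to zero here; with real control points it reflects both digitizing error and model misfit.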

Classification of Pléiades Images
In this research, four classifications (RF, MLH, BP neural network, and SVM) were performed. Regions of interest (ROIs) were collected and stored in a shapefile with two classes: buildings and non-buildings. Each record in the ROIs includes one target value (i.e. the class label) and several attributes. Using the ROIs, each algorithm trains a classifier and then uses the relationships identified in the training process to classify the remaining pixels. The aim of the classification was to generate a model (based on the training data) that predicts the target values of the test data given only the test data attributes.

Maximum Likelihood Classification
MLH is widely implemented in image processing software packages [24]. It assumes that the data are normally distributed and calculates, for each pixel, the probability of belonging to each class. Each pixel is assigned to the class with the highest probability [13].
A disadvantage of MLH classification is that it consumes considerable amounts of time and effort to prepare the training samples [25]. Also, MLH cannot effectively handle the mixed-pixel problem, and it produces a salt-and-pepper effect [26].
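ENVI implements MLH classification by computing the following discriminant function for each pixel:

$$g_i(x) = \ln p(\omega_i) - \frac{1}{2}\ln\lvert\Sigma_i\rvert - \frac{1}{2}(x - m_i)^{T}\,\Sigma_i^{-1}\,(x - m_i)$$

where $i$ is the class, $x$ is the n-dimensional data (where n is the number of bands), $p(\omega_i)$ is the probability that class $\omega_i$ occurs in the image and is assumed the same for all classes, $\lvert\Sigma_i\rvert$ is the determinant of the covariance matrix of the data in class $\omega_i$, $\Sigma_i^{-1}$ is its inverse matrix, and $m_i$ is the mean vector.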
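The MLH decision rule can be illustrated with a short NumPy sketch, assuming the usual Gaussian log-discriminant (log prior, minus half the log-determinant of the class covariance, minus half the Mahalanobis distance). This is not ENVI's code; the four-band synthetic samples, the equal priors, and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_class_stats(samples):
    """Estimate the mean vector and covariance matrix of one training class."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def discriminant(pixels, mean, cov, prior):
    """Gaussian log-discriminant of every pixel (row) for one class."""
    diff = pixels - mean
    inv_cov = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each pixel to the class mean.
    mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.log(prior) - 0.5 * logdet - 0.5 * mahalanobis

def classify(pixels, class_stats, priors):
    """Assign each pixel to the class with the largest discriminant value."""
    scores = np.column_stack([
        discriminant(pixels, mean, cov, prior)
        for (mean, cov), prior in zip(class_stats, priors)
    ])
    return scores.argmax(axis=1)

# Synthetic training samples for two classes (0 = non-building, 1 = building).
rng = np.random.default_rng(0)
background = rng.normal(loc=[60, 90, 70, 120], scale=10, size=(200, 4))
buildings = rng.normal(loc=[200, 180, 170, 150], scale=10, size=(200, 4))
stats = [fit_class_stats(background), fit_class_stats(buildings)]
priors = [0.5, 0.5]  # equal priors, as assumed in the text

test_pixels = np.array([[205, 175, 168, 155], [58, 92, 73, 118]], dtype=float)
labels = classify(test_pixels, stats, priors)
```

With equal priors the log-prior term is the same for every class, so the decision depends only on the covariance and distance terms.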

Support Vector Machines
SVM is an efficient classification method [14]. SVM with Gaussian radial basis function was utilized with Gamma = 0.167 and penalty parameter = 100. The penalty parameter permits a certain degree of misclassification. Radial basis function (RBF) was used because it has been confirmed as effective in remote sensing applications [15,16].
It is a non-parametric classifier that has the added benefit of being able to decrease empirical classification error while maximizing class separation using different hyperplane transformations. This enables SVM to better handle high-dimensional data and classes with a multimodal feature space, often resulting in better results than other algorithms [27].
Also, its capacity to generalize well, even with small training data, is its biggest advantage over other classifiers [28,29].
Other benefits include the fact that no earlier knowledge of the underlying data distribution is needed [13].
SVM has a number of drawbacks [30]:
-It is difficult to interpret unless the features are interpretable.
-It can be computationally expensive, so a good kernel function is required.
-There is a lack of transparency in the outcomes [30].
The radial basis function (RBF) kernel is given by:

$$K(x_i, x_j) = \exp\left(-g\,\lVert x_i - x_j\rVert^{2}\right), \quad g > 0$$

where $g$ is the gamma coefficient in the kernel.
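As a small illustration of the kernel, the sketch below evaluates the standard Gaussian RBF form, exp(-g * ||xi - xj||^2), with the gamma value reported in the text (0.167). The three-band pixel vectors are made up, and the function name `rbf_kernel` is hypothetical; this shows only the kernel, not the full SVM training with the penalty parameter.

```python
import numpy as np

GAMMA = 0.167  # gamma value reported for the SVM classifier

def rbf_kernel(a, b, gamma=GAMMA):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Hypothetical pixel vectors: the first two are identical, the third differs.
pixels = np.array([[0.2, 0.4, 0.1],
                   [0.2, 0.4, 0.1],
                   [0.9, 0.1, 0.8]])
K = rbf_kernel(pixels, pixels)
```

Identical pixels give a kernel value of 1, and the value decays toward 0 as the spectral distance grows; a larger gamma makes that decay faster.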

Backpropagation Neural Network Classification
A layered feed-forward classification technique was applied using standard BP. The network uses the input vector to generate its own output vector and compares it to the desired output vector. If there is no difference, no learning is required; otherwise, the weights are altered to decrease the difference [31].
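The weight-update idea described above can be sketched in NumPy. The sketch assumes one hidden layer with sigmoid activations, a mean-squared-error loss, and full-batch gradient descent; none of these settings is given in the text, the two-feature data are synthetic, and the function name `train_bp` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, y, hidden=4, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network with standard backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: compute the network's own output vector.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Difference between the actual and the desired output.
        error = out - y
        losses.append(float(np.mean(error**2)))
        # Backward pass: propagate the error through the sigmoid derivatives.
        # (The constant factor 2 of the MSE gradient is absorbed into lr.)
        delta_out = error * out * (1.0 - out)
        delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)
        # Alter the weights to decrease the difference.
        W2 -= lr * h.T @ delta_out / len(X)
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * X.T @ delta_hidden / len(X)
        b1 -= lr * delta_hidden.mean(axis=0)
    return (W1, b1, W2, b2), losses

# Synthetic two-feature samples: bright pixels (label 1) and dark pixels (label 0).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.8, 0.05, size=(50, 2)),
               rng.normal(0.2, 0.05, size=(50, 2))])
y = np.vstack([np.ones((50, 1)), np.zeros((50, 1))])
params, losses = train_bp(X, y)
```

The recorded `losses` list makes the learning behaviour visible: the error shrinks as the weights are repeatedly adjusted, which is the sense in which learning stops being required once the outputs match.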
The BP classifier provides a number of advantages over the maximum likelihood classifier, including the fact that it does not require data to have a normal distribution. Moreover, BP requires fewer training samples [32].
Furthermore, a BP neural network can combine data from many sensor data types and auxiliary information, which a normal parametric classifier cannot [33]. It has a massively parallel structure and is relatively noise robust [34].
However, BP has significant weaknesses, including slow learning rates, difficult convergence, complex network topology, and ambiguous network meaning, while its parameterization limits its use [34,35].

Random Forest Classification
RF is a non-parametric technique that can treat complex, high-dimensional data more appropriately than conventional ones [36][37][38].
It is an ensemble method that can be used for both classification and regression analysis. RF fits a number of tree predictors on different sub-samples of the dataset [33], and it has been used for land cover classification with imagery from different satellite sensors [40][41][42]. The main parameter is the number of trees, which can be set by the user [39]; in general, the larger the better. To evaluate the model, a random grid search with cross-validation was used to choose the parameters that generalized best to the data, namely the number of trees and the maximum number of features. Two values for the number of trees (100 and 1000) were tested in the RF classifier.
RF classification has a range of substantial advantages that help it yield a high degree of classification accuracy: it requires fewer parameters, needs minimal manual intervention, manages high-dimensional data, and is able to determine significant variables and predict missing values [40,42,28].
RF is well known for its notable computational efficiency. Its main drawback is that a large number of trees might make it too slow and ineffective for real-time predictions.
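The parameter selection described in this section can be sketched with scikit-learn, which is an assumption: the text does not say which library performed the tuning. The sketch uses an exhaustive `GridSearchCV` rather than a random search, small tree counts (10 and 100) instead of the 100 and 1000 tested in the study to keep it fast, and synthetic four-band samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic training pixels (1 = building, 0 = non-building), four bands each.
rng = np.random.default_rng(0)
buildings = rng.normal(loc=[200, 180, 170, 150], scale=15, size=(150, 4))
background = rng.normal(loc=[60, 90, 70, 120], scale=15, size=(150, 4))
X = np.vstack([buildings, background])
y = np.array([1] * 150 + [0] * 150)

# Candidate values for the number of trees and the maximum number of
# features considered at each split (illustrative values only).
param_grid = {"n_estimators": [10, 100], "max_features": [1, 2, 4]}

# 5-fold cross-validation selects the combination with the best mean accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

After fitting, `search.best_estimator_` holds a forest refitted on all training samples with the selected parameters, which could then be applied to the remaining pixels.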

Criteria for Selecting a Classifier
The choice of classifier depends on a number of factors, including the accuracy of the classification, the amount of human intervention, the required processing time, and the complexity of the classifier [25].

Quality Analysis
Three quality measures, namely completeness, correctness, and quality, were used to verify the performance of the developed technique [43,44].
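The extracted candidate building segments were classified into the following groups [28]:
-True positive (TP): a detected building that corresponds to a reference building.
-False positive (FP): a detected building that does not correspond to any reference building.
-False negative (FN): a reference building that was not detected.
After classification, the quality tests below were applied:

$$\text{Completeness} = \frac{TP}{TP + FN}$$

$$\text{Correctness} = \frac{TP}{TP + FP}$$

$$\text{Quality} = \frac{TP}{TP + FP + FN}$$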
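The three quality tests can be computed from counts of matched and unmatched buildings. The sketch below assumes the definitions commonly used for building extraction: completeness = TP / (TP + FN), correctness = TP / (TP + FP), and quality = TP / (TP + FP + FN), where TP, FP, and FN are true positives, false positives, and false negatives. The counts in the example are hypothetical and are not results from this study.

```python
def quality_measures(tp, fp, fn):
    """Return (completeness, correctness, quality) from detection counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("at least one reference and one detected building are required")
    completeness = tp / (tp + fn)   # share of reference buildings that were detected
    correctness = tp / (tp + fp)    # share of detected buildings that are real
    quality = tp / (tp + fp + fn)   # combined measure penalizing both error types
    return completeness, correctness, quality

# Hypothetical counts: 90 correct detections, 10 false alarms, 5 missed buildings.
completeness, correctness, quality = quality_measures(tp=90, fp=10, fn=5)
```

Quality can never exceed either of the other two measures, because its denominator includes both the false positives and the false negatives.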

Findings and Discussion
The classification was executed, and basic morphological operations such as opening and closing were applied to enhance the smoothness and connectedness of the segmented image. Figure 4 shows the two mapped classes (urban and non-urban) for MLH, BP neural network, SVM, and RF, respectively.
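The effect of the morphological operations can be shown on a toy mask with a NumPy-only sketch. It assumes a 3 × 3 square structuring element and applies closing followed by opening; the text specifies neither the element nor the order, and the helper names are hypothetical.

```python
import numpy as np

def _neighborhoods(mask, pad_value):
    """Stack the nine 3x3-shifted copies of a boolean mask."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    rows, cols = mask.shape
    return np.stack([padded[dr:dr + rows, dc:dc + cols]
                     for dr in range(3) for dc in range(3)])

def dilate(mask):
    """A pixel becomes foreground if any pixel in its 3x3 neighborhood is."""
    return _neighborhoods(np.asarray(mask, dtype=bool), False).any(axis=0)

def erode(mask):
    """A pixel stays foreground only if its whole 3x3 neighborhood is.
    Pixels outside the image are treated as foreground (a design choice)."""
    return _neighborhoods(np.asarray(mask, dtype=bool), True).all(axis=0)

def closing(mask):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(mask))

def opening(mask):
    """Erosion followed by dilation: removes small isolated detections."""
    return dilate(erode(mask))

# Toy classified image: a 5x5 building with a one-pixel hole and one noise pixel.
mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True
mask[4, 4] = False   # hole inside the building
mask[0, 8] = True    # isolated misclassified pixel

cleaned = opening(closing(mask))
```

Closing first fills the hole so that the subsequent erosion inside the opening does not wipe out the building; the opening then removes the isolated pixel, leaving a solid 5 × 5 block.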
The overall accuracies (Tab. 2) of RF, MLH, BP neural networks, and SVM were found to be 97%, 95%, 92%, and 93%, respectively. The classification was performed using SNAP and ENVI 5.1 software. The RF results obtained from the Pléiades data had the highest accuracy of the four classification methods (Tab. 2), which is attributed to the advantages of RF described above. The optimal number of trees was also investigated. Over several trials, the performance of random forest improved as the number of trees grew. After growing 500 trees, the RMSE value decreased very slowly, and it converged after 1000 trees [45]. Thus, the number of trees was set to 1000.
In remotely sensed data classification, RF has been examined at different spatial scales to estimate land use/land cover from satellite imagery [21] and laser-scanning data with an accuracy of 95% [46]. Decision tree, convolutional neural network, and RF classifiers were used for the automatic classification of buildings in [10], where RF had the highest accuracy. Tooke et al. [47] successfully predicted building age from LiDAR-derived data using RF. Guo et al. [46] used LiDAR data and aerial image data for urban mapping based on RF classification with a global accuracy of 95%. Bassier et al. [9] identified structural elements by processing the point cloud data of an entire building using RF classification with a precision of 85%. He et al. [48] recognized building group patterns in topographic maps based on RF and graph partitioning with a correctness of 90%.

Conclusions
This research addressed a binary classification problem, and four machine learning algorithms were assessed. The overall accuracies for RF, MLH, SVM, and BP neural network were found to be 97%, 95%, 93% and 92%, respectively. The results furnished a number of conclusions:
-First, the data labelling approach has an impact on the classification results.
-Second, our results outperformed other peer methods.
-Third, the completeness and correctness of our results for RF indicate that it can accurately classify 100% of buildings.
-Fourth, the best two classifiers were RF and maximum likelihood.
-Lastly, the RF classification approach was found to be very promising for the extraction of building footprints. The computational efficiency of RF was excellent, with only a few minutes of runtime necessary for training. The performance of RF improved as the number of trees, and hence the training time, increased.
Future work will include the comparison of numerous machine learning classifications using high and very high-resolution images.