Assessing the Shallow Water Habitat Mapping Extracted from High-Resolution Satellite Image with Multi Classification Algorithms

Remote sensing technology is reliable for identifying the distribution of seabed cover, yet retrieving data on shallow water habitats remains more challenging than for objects on land. Classification algorithms based on remote sensing technology have been developed to map benthic habitats, such as Maximum Likelihood, Minimum Distance, and Support Vector Machine. This study examines those three classification algorithms to retrieve information on the benthic habitat of Pari Island, Jakarta, using visual interpretation data for classification and field measurement data for accuracy testing. This study used five classes of benthic objects, namely sand, sand-seagrass, rubble, seagrass, and coral. The results show that the proposed approach provides a good overall classification of marine habitat, with accuracies of 63.89–81.95%. The Support Vector Machine algorithm produced the highest accuracy, about 81.95%. Applied to imagery with a very high spatial resolution, the Support Vector Machine algorithm is considered capable of identifying, monitoring, and rapidly assessing benthic habitat objects.


Introduction
Indonesia is the largest archipelagic country on Earth, with 17,500 islands. About 7.81 million km² of its total area (62%) is sea characterized by high biodiversity [1]. Ecologically, benthic habitats play an important role in marine ecosystems. One of the direct benefits of coral reef ecosystems is that they serve as reef fishing grounds, which provide a food source and increase the income of the surrounding community [2]. Spatial modeling of the fishing grounds around Nias Island also shows that many fishing areas are found around coral reef objects [3]. Other benefits of coral reefs include their role as marine tourism locations, beach protectors, and abrasion barriers [4,5]. In the last decade, marine ecosystems have experienced tremendous pressure caused by various phenomena, either natural or anthropogenic, such as destructive fishing [6] and the disposal of waste in the sea leading to ocean acidification [7]. Conservation efforts are needed to protect the biodiversity of marine creatures [8].
Marine biodiversity is critical to the sustainability of people's living environment. Currently, numerous parties continue to promote and support the marine conservation agenda, such as the Marine Strategy Framework Directive [9], the World Wildlife Fund [10], and The Nature Conservancy [11]. All these parties have developed various conservation management strategies and measurements by utilizing information on benthic habitat distribution. However, collecting data on benthic habitats still presents challenges. Measuring field data on the distribution of benthic habitats incurs relatively high costs, which contributes to the lack of information about shallow water habitats [9]. Given these limitations, remote sensing technology is a suitable approach to shallow water habitat mapping. With the rapid development of remote sensing technology, it is becoming easier to monitor coastal and marine resources, especially coral reef ecosystems or benthic habitats. The utilization of remote sensing technology can predict benthic habitat effectively without requiring expensive field operational costs [10,11]. Experts have relied on remote sensing to identify shallow water objects such as seagrass, coral, and large-scale macroalgae for decades [12][13][14][15][16].
Experts have developed a variety of classification algorithms based on remote sensing images that are reliable for mapping shallow water habitats. Some of the most commonly used are Maximum Likelihood (MLC) [12,14,16,17,19], Support Vector Machine (SVM) [12,14], and Minimum Distance (MD) [18,19]. Every algorithm has its own strengths and weaknesses in generating information comprehensively. Therefore, it is necessary to understand each algorithm before determining the appropriate one to use. The selection of the appropriate algorithm has a significant impact on producing information with a high level of accuracy. This study aims to analyze the three different algorithms for optimizing shallow water habitat mapping using PlanetScope images with a very high spatial resolution (resampled to 3 m).

Study Site
Pari Island is located in the north of Jakarta Province, at 5°50′20″-5°50′25″ S and 106°34′30″-106°38′20″ E (Fig. 1). Administratively, Pari Island is one of several islands in the Kepulauan Seribu region, Jakarta, with a total area of about 41.32 ha. The island is flat, with an altitude of 0-3 m above sea level, white sandy beaches, and mangrove habitat. Sukarno [20] stated that live coral cover on the reefs of Pari Island was 40-60% at a depth of 1-3 m. Furthermore, the Ministry of Marine Affairs and Fisheries of Indonesia [21] reported that reef flats were found at depths of up to 8 m, with several genera including Porites, Favia, Montipora, Echinopora, Fungia, Acropora, Goniastrea, Sandalolitha, Ctenactis, and Montastrea.

Research Data
A single PlanetScope image at the 3B level acquired on 4 July 2021 was processed in this study. The 3B level data is the PlanetScope ortho scene product, in which the data have been orthorectified and scaled to Top of Atmosphere radiance (at sensor).
PlanetScope images have a very high spatial resolution of 3.7-4.1 m (resampled to 3 m). This resolution enables benthic habitats containing a high variety of objects to be identified. The PlanetScope satellite has five bands covering a range of 455-860 nm (Tab. 1) [22].
The validation data consist of 240 data points: 163 in situ data points collected by field measurement and 77 additional data points obtained by visual interpretation. These sample points represent five different object classes in shallow water. To obtain a stable estimate of the classification performance of each algorithm, the data set was split into training and testing subsets. The proportion of data used for training was 70% of the total, while 30% was set aside for testing. For cross-validation purposes, a 70/30 train/test splitting ratio is recommended as a reasonable balance in the classification process [23].
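The 70/30 split described above can be illustrated with a minimal sketch. The study does not state how the points were drawn, so a simple seeded random split is assumed here; the function name `split_points`, the seed, and the integer point IDs are hypothetical placeholders.

```python
import random


def split_points(point_ids, train_fraction=0.7, seed=42):
    """Randomly split sample point IDs into training and testing subsets."""
    shuffled = list(point_ids)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 240 sample points, as in the study; the integer IDs are placeholders.
train_ids, test_ids = split_points(range(240))
print(len(train_ids), len(test_ids))  # 168 72
```

A stratified split per object class would be a reasonable alternative when class sizes are unbalanced, but the per-class counts are not given in the text.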

Object Classes
Five object classes with different spectral characteristics were recorded in field measurements, consisting of coral (K), seagrass (Lm), sand (P), sand-seagrass (PLm), and rubble (R). In theory, sand and rubble have high reflectance values, because most of the energy incident on a hard object is either reflected or absorbed and little is transmitted. Meanwhile, live coral has a lower reflectance value due to the presence of zooxanthellae, which absorb the incident energy. Seagrass has the lowest reflectance value because vegetation absorbs energy in the red and blue wavelengths [24].

Land masking
Land masking is one of the phases in satellite image processing; it restricts the analysis to the study area in order to facilitate observations during image processing. The land masking stage was conducted using the NIR band because this band distinguishes land and water clearly [25]. Based on the characteristics of electromagnetic waves, most of the radiation incident on water is not reflected but is transmitted or absorbed. Longer visible wavelengths and near-infrared radiation are absorbed more by water than shorter visible wavelengths. Therefore, water looks blue or blue-green due to stronger reflectance at these shorter wavelengths, and darker if viewed at red or near-infrared wavelengths [24].
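As an illustration of NIR-based land masking, the sketch below flags pixels as water when their NIR value falls below a threshold and blanks out the land pixels in another band. The threshold of 0.10, the toy pixel values, and the function names are assumptions for illustration only; the study does not report the threshold it used.

```python
def water_mask(nir, threshold):
    """Return a grid of booleans: True where the NIR value indicates water."""
    return [[value < threshold for value in row] for row in nir]


def apply_mask(band, mask, nodata=None):
    """Keep band values on water pixels and replace land pixels with nodata."""
    return [
        [value if is_water else nodata for value, is_water in zip(band_row, mask_row)]
        for band_row, mask_row in zip(band, mask)
    ]


# Toy 2 x 3 scene: water absorbs NIR strongly, so water pixels are dark.
nir = [[0.02, 0.03, 0.25],
       [0.04, 0.30, 0.28]]
blue = [[0.11, 0.12, 0.20],
        [0.10, 0.22, 0.21]]

mask = water_mask(nir, threshold=0.10)  # hypothetical threshold
masked_blue = apply_mask(blue, mask)
print(masked_blue)
```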

Water column correction
Water column correction is carried out to reduce the disturbance that the water column introduces into the signal from underwater habitat objects. When light penetrates water, its intensity decreases exponentially (attenuates) with increasing depth because of absorption and scattering. Different wavelengths have different levels of attenuation. In the visible spectrum, the red wavelength (600-700 nm) attenuates more rapidly than the shorter wavelengths [26].
There are various techniques to correct the influence of depth on bottom reflectance. Nevertheless, the removal of this distortion would require two main variables: a measurement of depth for every pixel and information on the attenuation characteristics of the water column (e.g. concentrations of dissolved organic matter) [15]. As these two variables are difficult to obtain in most areas, Lyzenga [27,28] proposed a simple image-based approach to compensate for the effect of variable depth (water column correction). This method was then developed by Mumby et al. [15].
The main idea of this water column correction method is that, instead of predicting the reflectance of the seabed, the method produces a depth-invariant bottom index from each pair of spectral bands [29]. Because reflectance attenuates exponentially with depth, the log-transformed reflectances of two bands are linearly related, and the index is given by the following equation:
$$\text{Depth-invariant index}_{ij} = \ln(L_i) - \frac{K_i}{K_j}\,\ln(L_j)$$
where: $L_i$ and $L_j$ - reflectance value in band $i$ and band $j$, $K_i/K_j$ - attenuation coefficient ratio in band $i$ and band $j$.
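The attenuation coefficient ratio ($K_i/K_j$) is determined from a biplot of the log-transformed reflectance values in the two bands ($L_i$ and $L_j$). The biplot data are taken from pixels of the same substrate type that differ only in depth, and the equation is:
$$\frac{K_i}{K_j} = a + \sqrt{a^2 + 1}, \qquad a = \frac{\sigma_{ii} - \sigma_{jj}}{2\,\sigma_{ij}}$$
where: $\sigma_{ii}$ and $\sigma_{jj}$ - variance of the log-transformed values in band $i$ and band $j$, $\sigma_{ij}$ - covariance of the log-transformed values between band $i$ and band $j$.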
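The following minimal sketch illustrates the idea of a depth-invariant index. It assumes the commonly used form of Lyzenga's method, in which the index is ln(L_i) - (K_i/K_j) ln(L_j) and the ratio K_i/K_j is estimated from the variances and covariance of log-transformed values over a single substrate at varying depth. The synthetic reflectances, the attenuation coefficients, and the function names are hypothetical, and the deep-water correction that is often applied beforehand is omitted.

```python
import math
from statistics import mean, variance


def attenuation_ratio(band_i, band_j):
    """Estimate K_i/K_j from same-substrate pixels sampled at varying depth."""
    x_i = [math.log(v) for v in band_i]
    x_j = [math.log(v) for v in band_j]
    m_i, m_j = mean(x_i), mean(x_j)
    # Sample covariance of the log-transformed values of the two bands.
    cov_ij = sum((a - m_i) * (b - m_j) for a, b in zip(x_i, x_j)) / (len(x_i) - 1)
    a = (variance(x_i) - variance(x_j)) / (2 * cov_ij)
    return a + math.sqrt(a * a + 1)


def depth_invariant_index(l_i, l_j, ratio):
    """Depth-invariant index for one pixel from a pair of band values."""
    return math.log(l_i) - ratio * math.log(l_j)


# Synthetic sand pixels: reflectance decays exponentially with depth (two-way path).
depths = [0.5, 1.0, 1.5, 2.0, 3.0]
k_blue, k_green = 0.05, 0.10  # hypothetical attenuation coefficients (1/m)
blue = [0.30 * math.exp(-2 * k_blue * z) for z in depths]
green = [0.35 * math.exp(-2 * k_green * z) for z in depths]

ratio = attenuation_ratio(blue, green)
indices = [depth_invariant_index(b, g, ratio) for b, g in zip(blue, green)]
print(ratio, indices)
```

In this noise-free example the estimated ratio recovers k_blue/k_green and the index is the same at every depth, which is the property the method relies on. With real imagery the index only reduces the depth effect.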

Classification
Image classification is essential in image processing for identifying the position of objects belonging to a defined object class in the image. The two main types of classification in remote sensing are unsupervised and supervised classification [30,31]. This study selected the supervised classification technique to categorize the defined object classes. In supervised classification, the classification criteria are determined from class signatures obtained by creating training areas. Different types of algorithms are then used to label each pixel in an image as representing a particular type, or class [32].
This study used three classification algorithms commonly used in benthic habitat mapping, namely Support Vector Machine (SVM) [12,14,33], Maximum Likelihood (MLC) [12,14,16,17,19], and Minimum Distance (MD) [18,19]. SVMs are trained by solving a constrained quadratic optimization problem. This algorithm performs the classification by finding the hyperplane that separates two classes while maximizing the margin between them [30]. The MLC algorithm is a conventional method in pixel-based classification. It applies a discriminant function to assign a pixel to the class with the highest likelihood. The class mean vector and covariance matrix are the key inputs to the function and can be estimated from the training pixels of a particular class [34]. The MD algorithm computes the mean vector of each class and then calculates the Euclidean distance from each unknown pixel to each class mean. It classifies patterns by using a distance measure that involves the class distribution. Each unknown pixel is assigned to the class whose center is closest to it [35,36].
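Of the three algorithms, Minimum Distance is the simplest to write out in full, and a minimal sketch is given here as an illustration rather than as the software implementation used in the study. The two-band training values, the class subset, and the function names are hypothetical.

```python
import math


def class_means(training):
    """Compute the mean vector of each class from its training pixels."""
    means = {}
    for label, pixels in training.items():
        n_bands = len(pixels[0])
        means[label] = [
            sum(pixel[b] for pixel in pixels) / len(pixels) for b in range(n_bands)
        ]
    return means


def classify_min_distance(pixel, means):
    """Assign the pixel to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


# Hypothetical two-band training pixels for three of the five classes.
training = {
    "sand": [[0.30, 0.35], [0.32, 0.36]],
    "seagrass": [[0.05, 0.10], [0.07, 0.12]],
    "coral": [[0.12, 0.15], [0.14, 0.17]],
}
means = class_means(training)
labels = [classify_min_distance(p, means) for p in ([0.29, 0.33], [0.08, 0.11])]
print(labels)  # ['sand', 'seagrass']
```

Because the rule uses only class means, two classes with similar means but different spreads cannot be told apart, which MLC addresses through the covariance matrix and SVM through margin maximization.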

Post-classification
The post-classification part consists of two components: editing the polygons of classified objects and accuracy assessment. The classification results, delivered in vector format, consist of polygons that are separate from one another. The polygons were edited by merging those with the same object class into one feature set. This provides a better visualization of the classification output and makes it easier to calculate the area of each object class. The other component was the accuracy assessment of the classification results of each algorithm. It aimed to assess whether the map generated from the classification process was suitable for use. The accuracy assessment employed an independent validation data set collected from field measurements and visual interpretation of the image to reduce general bias. The data set consisted of 240 data points: 163 in situ data points obtained from field measurements and 77 data points collected from visual interpretation. About 30% of the total data points were used to assess the overall accuracy, that is, the ratio of the number of correctly classified pixels to the total number of validation data points. The accuracy assessment method used a confusion matrix and the kappa coefficient.

Confusion matrix
The results of the remote sensing data classification were validated using an error matrix. This was done by comparing the classified map with the sample points. The accuracy test refers to Congalton [36]. The confusion matrix yields the Producer's Accuracy (PA), User's Accuracy (UA), and Overall Accuracy (OA). PA shows how often real features in the field are correctly displayed on the classified map, while UA shows how often classes on the map will be present in the field. OA essentially explains the overall correctness of the classified map [36].
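Confusion matrix calculations are shown in the following equations:
$$OA = \frac{\sum_{i=1}^{k} n_{ii}}{n}, \qquad PA_j = \frac{n_{jj}}{n_{+j}}, \qquad UA_i = \frac{n_{ii}}{n_{i+}}$$
where: $k$ - number of classes, $n$ - total number of sample pixels, $n_{ij}$ - number of sample pixels in cell $(i, j)$ of the error matrix, $n_{+j}$ - number of sample pixels in column (reference class) $j$ of the error matrix, $n_{i+}$ - number of sample pixels in row (map class) $i$ of the error matrix.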
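A minimal sketch of these accuracy measures is given below, with rows as map classes and columns as reference classes, following the definitions above. The 3 x 3 matrix is a toy example and does not reproduce the study's Tables 3-5; the function name is hypothetical.

```python
def accuracy_metrics(matrix):
    """Return overall, producer's, and user's accuracy from an error matrix.

    Rows are map classes (n_i+), columns are reference classes (n_+j).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = sum(matrix[i][i] for i in range(k)) / n
    # A class with no samples has an undefined accuracy, reported as NaN.
    producers = [
        matrix[j][j] / col_totals[j] if col_totals[j] else float("nan")
        for j in range(k)
    ]
    users = [
        matrix[i][i] / row_totals[i] if row_totals[i] else float("nan")
        for i in range(k)
    ]
    return overall, producers, users


# Toy error matrix for three classes (not data from the study).
toy_matrix = [
    [10, 2, 0],
    [1, 6, 1],
    [0, 2, 8],
]
overall, producers, users = accuracy_metrics(toy_matrix)
print(overall)  # 0.8
```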

Kappa coefficient
The kappa coefficient is derived from a discrete multivariate technique for evaluating the accuracy of a classification. This analysis is based on the difference between the actual agreement in the error matrix, represented by the main diagonal, and the chance agreement indicated by the row and column totals [36].
Kappa analysis calculations are shown in the following equation:
$$\hat{K} = \frac{n\sum_{i=1}^{k} n_{ii} - \sum_{i=1}^{k} n_{i+}\,n_{+i}}{n^{2} - \sum_{i=1}^{k} n_{i+}\,n_{+i}}$$
where $n$ is the total number of sample pixels and the remaining symbols are as defined for the confusion matrix. Then, the calculated value is matched with the level of agreement defined by Landis and Koch [37], as shown in Table 2.
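The kappa statistic can be computed from the same kind of error matrix. The sketch below assumes the standard form of the estimator based on the diagonal and the row and column totals; the toy matrix and function name are hypothetical and do not reproduce the study's tables.

```python
def kappa_coefficient(matrix):
    """Kappa estimate from an error matrix (rows: map, columns: reference)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(k))
    # Sum over classes of (row total) x (column total), the chance-agreement term.
    chance = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    )
    return (n * diagonal - chance) / (n * n - chance)


# Toy error matrix for three classes (not data from the study).
toy_matrix = [
    [10, 2, 0],
    [1, 6, 1],
    [0, 2, 8],
]
kappa = kappa_coefficient(toy_matrix)
print(round(kappa, 3))  # 0.699
```

For this matrix the overall accuracy is 0.8 while kappa is lower, because kappa discounts the agreement expected by chance.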

Pre-classification
To compensate for the influence of depth on shallow water habitats, this study derived depth-invariant indices by conducting the water column correction. Water column correction represents an additional complexity in extracting information from seabed substrates by means of remote sensing technology [38]. Lyzenga's water column method, as applied in this study, reduces the effect of water attenuation but does not remove it [39]. The depth-invariant index transformation used the combination of the blue and green bands (bands 1 and 2) for the water column correction. This band combination was used to calculate the attenuation coefficient ratio (K i /K j ). The corrected image is presented in Figure 2.

Classification
By feeding high-resolution PlanetScope satellite imagery with a tile size of 25 km² [22] into a supervised classification workflow with multiple algorithms, a benthic habitat map of Pari Island was created. This map is composed of five benthic habitat classes, namely coral, rubble, seagrass, sand, and sand-seagrass, which are all deemed to be ecologically meaningful for marine systems. The National Research and Innovation Agency of Indonesia and the Ministry of Marine Affairs and Fisheries underscore that the diversity of coastal ecosystems around the platform reef and the island itself is especially noteworthy. This area is characterized by a complex interplay of mangrove ecosystems, seagrass beds, and coral reefs, along with all the biota associated with these ecosystems [40].
Based on the distribution mapping of the five object classes on Pari Island (Fig. 2), coral cover, shown in red, is spread along the entire coastline of the island and is located at the edge of the platform reef. This type of coral is identified as fringing reefs in combination with an atoll in the middle [40]. Fringing reefs grow near the coastline around islands to a depth of about 3 to 8 m. Geomorphologically, fringing reefs can be categorized into three main zones: forereef, reef crest, and backreef [41]. Seagrass meadow and sand-seagrass are the densest and most expansive classes, being found throughout the shallow water areas of Pari Island. Seagrass is identified as being associated with coral and rubble. The rubble class is found on the lagoon side. The rubble on coral reefs can be defined as dead coral skeleton that has fractured [42]. Visually, rubble is nearly identical to sand because both are commonly white, so misclassification between sand and rubble is possible.

Post-classification
The classification with the three algorithms shows that the seagrass (Lm), sand (P), and sand-seagrass (PLm) habitats predominate throughout the shallow waters of Pari Island, with a total area of 880 ha (Fig. 3).
The accuracy assessment of the distribution of benthic habitats on PlanetScope images using the MLC, MD, and SVM algorithms is presented in Tables 3-5. Based on the accuracy assessment of the five selected classes, SVM delivers the highest overall accuracy of 81.95%, followed by MLC with 73.61% and MD with 63.89%. MD yields the lowest accuracy, which means that it performed worst at classifying the defined objects. MD produces an overestimated map for the rubble class compared to the other algorithms. The SVM algorithm, which achieved the highest accuracy, produced a representative spatial distribution of coral and sand, with PA reaching 100%. The main error of the MLC algorithm is the misclassification of seagrass as other benthic classes. This can be seen from the low PA obtained for the seagrass class (26.09%). Meanwhile, for the MD algorithm, overestimation occurs for the rubble class, leading a lot of sand cover to be identified as rubble (UA for rubble: 38.1%). The SVM algorithm was successful in separating three similar classes (sand, seagrass, and sand-seagrass), but had difficulty identifying rubble.
As Figure 5 shows, the pattern of PA and UA differs considerably between the algorithms. SVM, with the highest accuracy, delivers a consistent pattern between PA and UA across the object classes, with a small gap of 3 to 7%. On the other hand, MLC and MD show an irregular pattern, with a large gap between PA and UA across the objects. MD has a larger gap between PA and UA across the classified objects than MLC and SVM, which is consistent with MD having the lowest accuracy.
When UA is higher than PA, the classification results are underestimated; conversely, when PA is higher than UA, the results are overestimated [12].
The kappa result (Tab. 6) is categorized according to the value ranges given by Landis and Koch [37]: values ≤ 0 indicate no agreement, 0.01-0.20 none to slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement, and 0.81-1.00 almost perfect agreement. In this study, SVM has the highest kappa value, 0.7333, which indicates substantial agreement. MLC and MD have values of 0.6636 and 0.569, respectively.
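The mapping from a kappa value to an agreement level can be written as a small lookup, sketched here with the value ranges quoted above and applied to the three kappa values reported in this study. The function name is hypothetical.

```python
def agreement_level(kappa):
    """Map a kappa value to the agreement category for its value range."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported in the text for the three algorithms.
reported = {"SVM": 0.7333, "MLC": 0.6636, "MD": 0.569}
levels = {name: agreement_level(value) for name, value in reported.items()}
print(levels)  # {'SVM': 'substantial', 'MLC': 'substantial', 'MD': 'moderate'}
```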

Misclassification Issues
This study tested the performance of the classification results from the multiple classification algorithms in two ways: accuracy assessment and strength of agreement. The confusion matrix method was selected to carry out the accuracy assessment, with the OA value as the quantitative parameter. The kappa coefficient was selected to assess the agreement of the classification result as a quantitative parameter ranging from 0 to 1; the coefficient was then categorized to describe the agreement level. The quantitative performance results depend entirely on the classification process. Some misclassification is possible due to several factors, such as (1) water column disturbance; (2) spectral confusion and limitations of satellite sensor capabilities; (3) the observer's ability to identify objects.
Ad 1. The absorption of radiation by water in aquatic ecosystems plays a role in causing misinterpretation in the classification process. Spectral absorption and backscattering govern the reflectance of the ocean due to the presence of particles, phytoplankton, photosynthetic pigments, de-pigmented particles, and soluble material. A water column correction helps to reduce the influence of those constituents in the water. However, it cannot guarantee that the reflectance of shallow water objects will be extracted perfectly. As Zoffoli et al. [38] also note, even in the best conditions it is not possible to be completely depth independent, because uncertainties depend on the wavelength, bottom depth, and type of bottom.
Ad 2. Several benthic communities located in the study area have similar spectral signatures. For example, the high spectral signature of sand is similar to that of rubble, and the confusion between coral and seagrass near the coast is evident from the classifications made using the three algorithms (Fig. 2, Tab. 6). Kutser and Jupp [43] explain that large variability in coral spectral signatures occurs even within the same species, making it difficult to distinguish them. Moreover, multispectral sensors such as PlanetScope have a limited number of bands with broad wavebands, which makes habitat discrimination more difficult [44].
Ad 3. The observer's ability to identify objects can also be one of the main contributors to classification errors. This was mentioned by Hollnagel [45], who showed that problems in collecting valid data are a common source of human error in research. Errors in interpreting objects and disagreements on data measurements can be the main causes of inaccurate classification results [46]. Ideally, observers should have advanced training in how to carry out the field measurements and how to analyze and interpret the field data scientifically in the classification process.

Conclusion
This study has shown that it is possible to identify objects in shallow-water habitats using PlanetScope images. However, there were still misclassifications in the final results due to the difficulty of identifying objects in benthic habitats. Based on the final results of the classification using three different algorithms, the SVM algorithm generated the highest accuracy rate (81.95%) compared to the other algorithms selected. Its classification result also indicates agreement at a substantial level based on the kappa coefficient, which means that the level of agreement is acceptable and quite high. This study also described the differences among the algorithms, which can be used to describe the uncertainty in a spatial prediction. In addition, the areas where the classification results of the selected algorithms are similar can be considered valid data.

Author Contributions
Author 1: conceptualization, methodology, image processing, and data analysis. Author 2: methodology, writing - original draft preparation. Author 3: collecting field data, writing - review and editing, and translating. Author 4: collecting field data and writing.