Digital Cartographic Generalization – Study of Its Thresholds and Stages on the Example of a Cartographic Line

Abstract: Digital generalization of spatial data has been a research goal at many research centers around the world. This article presents the evolution of cartographic generalization, drawing the reader's attention to the change of its nature from analog to digital. Despite the passage of time and developing technologies, scientists have unfortunately yet to develop a uniform automatic generalization algorithm. One of the factors that hinders this process is the high complexity of the whole process. This article is an attempt to address this problem; it approaches the issue of digital cartographic generalization by proposing thresholds and stages of cartographic generalization that depend on the ratios of the numbers of points of generalized objects. The publication attempts to examine the possibility of applying an objective criterion of drawing recognition by examining digital generalization algorithms and setting their thresholds. The practical aim of the publication is to present generalization thresholds on the example of Chrobak's algorithm. The proposal to make the selection of generalization thresholds dependent on the percentage share of points is a solution that is as simple to use as it is to implement. The method of defining intervals based on the three-sigma rule guarantees that the obtained results will be characteristic of the probability density function of the normal distribution, which defines the individual intervals most objectively.


Introduction - Research Issues
Cartographic generalization is a process that consists of the selection and simplification of selected map content; it is an intentional and logical process that causes changes in visualized objects [1]. The overriding goal of generalization is the ability to create maps at various scales while maintaining spatial relationships between objects on the Earth's surface [2]. According to this rule, the map should present the surrounding space in the simplest possible way (or one that will have a utilitarian character). This means that, due to the carefully selected (simplified) content, the map will be a valuable source of information about a given phenomenon [3]. Content selection is strongly correlated with map scale; large-scale studies can be considered to be more objective due to the smallest possible percentages of omitted objects or their parts [4].
Initially, cartography was often treated as an art; the created studies had the character of unique works. Over time, cartography gained a new meaning; for example, through the need to register owned lands, which affected the amounts of taxes [5]. A more utilitarian character of cartography was revealed at the stage of the great geographical discoveries. Then, it turned out that an accurate map that showed the coastlines of continents, countries, or rivers was often worth its weight in gold.
Another milestone in the development of cartography was the acceleration of globalization and, thus, commercialization, which created the need to systematize the rules governing the objects presented on maps at different scales.
In addition to the method of mapping and presenting relief, the problem of generalization was already defined as an important scientific problem in the 19th century due to the need to change the approach to the method of creating maps. The subjective process of generalization that existed from the beginning (which was completely dependent on the knowledge and experience of cartographers) was replaced by attempts to systematize the principles of cartographic generalization and unify the method of generalization [6][7][8][9][10]. The advent of computers opened up new possibilities for cartographers to systematize and specify the rules of generalization in the form of mathematical formulas. The algorithmization of the generalization process led to clarifications of the generalization rules in the form of mathematical formulas that were intended to ensure its objectivity and repeatability. This new research problem, known as digital cartographic generalization, is highlighted in this publication on the example of line generalization.
The publication deals with the topic of the digital generalization of a line on the example of the Vistula River.
The choice of a specific example for the analysis was dictated by the desire to popularize the region and one of the largest European rivers. In addition, the Vistula is interesting in its middle course due to a lack of regulation, which is also why the river meanders there.

Literature
Digital cartographic generalization ostensibly provided objectivity in the search for unbiased and commonly used criteria [11]. The unification of the map-development method led to a situation where the experience, workshop, and knowledge of the person supervising the generalization process could seem irrelevant. Of course, there is a lot of truth in this if we only consider the technical process of carrying it out (i.e., reducing the number of objects or their components as a result of changing a map's scale). It is worth noting, however, that the digitization of maps resulted in the emergence of new algorithms for automatic cartographic generalization, the use of which gives different results depending on the type of object and the initially adopted conditions. In the 1960s, the first algorithms for automatic cartographic generalization using computers were developed by Perkal [12], Tobler [13], and Lang [14], among others. The first research focused on the generalization of linear objects [13][14][15]. Initially, digital methods of generalization provided support and were tools for improving the work of cartographers [16]. Nowadays, the process of generalization is related to the generalization of the contents of spatial databases, not maps as such [17,18].
Scientists agree on the need to automate the generalization process [18]. Research on the generalization process conducted by Li Zhilin [19] divided the simplification algorithms into those that take the scale of the ultimately created map into account and those that eliminate polyline vertices. The latter group is characterized by the selection of so-called critical points of the primary line and elimination that results from the target scale. Within this group, there is a classification that divides algorithms into independent unconditional processing procedures of a local nature and procedures of a global nature [8,20]. The group of local algorithms includes those created by Jenks, Reumann-Witkam, and Lang. Examples of global algorithms are the Douglas-Peucker and Chrobak algorithms [21].
Defining the method of generalization is also an important problem from the point of view of science. One of the most important models is that of Ratajski (1973) [22], in which the stages and thresholds of generalization can be distinguished. Ratajski defined the limit (limes) of generalization as the moment of reaching the limiting capacity of a map, beyond which further generalization would not be possible without renewing the map's capacity. Morrison's model [17] is based on the types of generalization distinguished by Robinson, arranging them in a series of transformations and distinguishing such elements as classification, simplification, symbolization, and induction. Another example of a more advanced model is that of Nickerson and Freeman [20]. This model defined the concept of an intermediate scale, which implied the creation of an output map by reducing the map to a scale of 1:(k·m) and the area to (w·h)/k², where w·h is the area of the symbol at the intermediate scale, k is the enlargement of the symbol at the intermediate scale, and m is the denominator of the map scale. Another advanced example of a model is that of Brassel and Weibel [23], which distinguished between so-called statistical generalization (which boiled down to content filtering) and cartographic generalization (which affected changes of a map's structure to improve its visual message).
Recent work by Professor Chrobak has shown successful attempts to create a unified methodology for feature geometry that remains legible at the generalization scale. The first of the introduced methods concerned the verification of the resolution of drawn lines in terms of the minimum dimensions of line recognition, line widths, and the scales of operational maps. The verification was based on patterns in the form of elementary triangles [24] that specified the number of vertices of the generalized line. The search for an algorithmization of the generation steps concerned the geometrical and structural properties of a road network. Among the problems of cartographic generalization, deep-learning algorithms based on neural networks are also used; for example, for winding mountain roads [25]. These are the algorithms that are used by scientists to generalize and smooth out winding roads [26]. Despite the errors in the results of the algorithms' work (which attracted the attention of the authors of this algorithm), artificial neural networks have attracted attention as a tool of the future. This was also confirmed by the works of Zang [27] and Jensen [28]. The issue of generalization also appears in the works of Weibel [29]. A contribution to the methodology of generalizing the geometry of features was also made by scientists from Poland [30], whose work made use of the characteristic features of metric spaces representing the restrictive conditions of Lipschitz and Cauchy, the measures of Salishchev's triangles, and Banach's theorem for the uniqueness of the generalization process. Thanks to this research, the process became transparent and made it possible to present features at every map scale.

Purpose, Scope, and Methodology - Research Problem
The scientific aim of the publication is to examine the possibility of applying an objective criterion of drawing recognition by examining digital generalization algorithms and determining their thresholds. The practical aim of the publication is to present generalization thresholds on the example of Chrobak's algorithm. The most commonly used generalization algorithms with their implementations in GIS software, along with Chrobak's algorithm (whose implementation was carried out as a Python script), were selected for comparison.
Jenks's algorithm eliminates a point if its distance from the line that connects its two adjacent points is less than a given threshold value. All of the points of the simplified line coincide with points of the original line.
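The elimination rule described above can be sketched in a few lines. This is a minimal illustration assuming a sliding-triple variant; the function names, the point-to-line helper, and the tolerance value are ours, not Jenks's published code:

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        return math.hypot(px - ax, py - ay)
    # Magnitude of the cross product divided by the segment length.
    return abs(dx * (py - ay) - dy * (px - ax)) / seg_len

def jenks_like_simplify(points, tolerance):
    """Drop a vertex when it lies closer than `tolerance` to the line
    joining its last retained neighbour and its next neighbour."""
    if len(points) < 3:
        return list(points)
    result = [points[0]]
    for i in range(1, len(points) - 1):
        if point_line_distance(points[i], result[-1], points[i + 1]) >= tolerance:
            result.append(points[i])
    result.append(points[-1])
    return result
```

Note that, as stated above, every retained point is an original vertex; the routine never creates new coordinates.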
The Reumann-Witkam algorithm belongs to the group of unconditional local processing procedures; it examines not only directly adjacent points but also evaluates groups of segments. In the mathematical sense, two straight lines are drawn on both sides of and parallel to the segment that connects the first two points of the polyline. Then, the algorithm looks for the next line segment that intersects one of these lines. All of the points between the first and last intersections of the segment and the straight line are removed from the result line. Then, the algorithm examines the next segment between the last retained point and its next neighbor.
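A minimal sketch of the corridor walk described above; the function names and the strict perpendicular-distance test are our assumptions, and published variants differ in how the corridor is restarted:

```python
import math

def _perp_dist(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / math.hypot(dx, dy)

def reumann_witkam(points, tolerance):
    """Corridor walk: the line through the current point and its successor
    defines a strip of half-width `tolerance`; vertices inside the strip
    are dropped, and the first vertex outside restarts the corridor."""
    if len(points) < 3:
        return list(points)
    keep = [points[0]]
    i = 0
    while i < len(points) - 2:
        a, b = points[i], points[i + 1]
        j = i + 2
        # Walk forward while vertices stay inside the corridor.
        while j < len(points) and _perp_dist(points[j], a, b) <= tolerance:
            j += 1
        if j < len(points):
            keep.append(points[j - 1])  # last vertex before leaving the strip
        i = j - 1
    if keep[-1] is not points[-1]:
        keep.append(points[-1])
    return keep
```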
Lang's algorithm belongs to the group of conditional extended local processing procedures. In this group, the algorithms require specifying the number of points to be grouped in a tested line in addition to their linear and angular tolerances. In the algorithm, the first point of the line is connected by a segment with the nth key point. The next step is to calculate all of the distances from this segment to the intermediate vertices. If a calculated distance is greater than the specified tolerance, the procedure is repeated for the segment between the first and (n - 1)th points until all of the distances between the line and the intermediate points are less than the tolerance. The whole procedure is then repeated, starting from the last accepted point and moving toward the end of the line.
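The look-ahead procedure can be sketched as follows. Only the linear tolerance is modeled here (the angular tolerance mentioned above is omitted), and the default window of four points is an arbitrary choice of ours:

```python
import math

def _dist_to_chord(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / math.hypot(dx, dy)

def lang_simplify(points, tolerance, look_ahead=4):
    """Lang look-ahead simplification: shrink the window until every
    intermediate vertex lies within `tolerance` of the window's chord,
    then restart from the accepted endpoint."""
    keep = [0]
    anchor = 0
    while anchor < len(points) - 1:
        end = min(anchor + look_ahead, len(points) - 1)
        # Shrink the window while some intermediate vertex is too far away.
        while end > anchor + 1:
            if all(_dist_to_chord(points[i], points[anchor], points[end]) <= tolerance
                   for i in range(anchor + 1, end)):
                break
            end -= 1
        keep.append(end)
        anchor = end
    return [points[i] for i in keep]
```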
The Douglas-Peucker algorithm is an example of a global algorithm that takes an entire line (or a specified fragment of it) into account, iteratively selecting extreme points based on their distances from the chord of the segment. The algorithm preserves the general character of the original line. All of the simplified points coincide with points of the original line [31].
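The chord-splitting recursion can be written compactly; the helper names below are ours, but the logic is the standard Douglas-Peucker procedure:

```python
import math

def _perpendicular_distance(p, a, b):
    """Distance from p to the line through a and b (or to a if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Keep the vertex farthest from the chord if it exceeds epsilon,
    then recursively simplify both halves; otherwise keep only the chord."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split vertex
```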
Chrobak's algorithm belongs to the group of global algorithms. Due to the application of the recognition standard, the algorithm enables the simplification process to be carried out in an automated manner without the participation of an operator; this guarantees an unambiguous test result [21,32] (Fig. 1).

Fig. 1. Elementary triangle
The operation of the algorithm boils down to the selection of intermediate vertices based on the determined elements of the primary line. The selection of extremes begins with a triangle that is formed on the examined polyline from the base points: the initial vertex, the final vertex, and the most distant vertex of the polyline. Verification against the recognition standard compares the lengths of the shorter legs of the triangle, ε01 and ε02, with the drawing-recognition minimum scaled by M, where ε01 and ε02 are the lengths of the shorter legs in the triangle and M is the denominator of the map scale.
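The triangle-verification step can be illustrated roughly as below. This is a sketch of the idea only, not Chrobak's published algorithm: the recognition minimum EPS_MM = 0.5 mm is an assumed value, and the full algorithm also handles vertex addition, which is not modeled here:

```python
import math

# Assumed drawing-recognition minimum in millimetres at map scale.
EPS_MM = 0.5

def _dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def farthest_vertex(points):
    """Index of the vertex farthest (perpendicularly) from the chord
    joining the first and last points."""
    a, b = points[0], points[-1]
    dx, dy = b[0] - a[0], b[1] - a[1]
    chord = math.hypot(dx, dy) or 1.0
    def d(p):
        return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / chord
    return max(range(1, len(points) - 1), key=lambda i: d(points[i]))

def triangle_recognizable(points, scale_denominator):
    """True when both legs of the elementary triangle (base endpoints
    plus the farthest vertex) meet the recognition minimum on the ground."""
    apex = points[farthest_vertex(points)]
    # Convert the map-scale minimum (mm) to ground metres for scale 1:M.
    eps_ground = EPS_MM / 1000.0 * scale_denominator
    leg1 = _dist(points[0], apex)
    leg2 = _dist(points[-1], apex)
    return min(leg1, leg2) >= eps_ground
```

For a 500 m deep meander (in ground coordinates), the triangle remains recognizable at 1:30,000 but fails at 1:2,000,000, which matches the intuition that finer scales retain more vertices.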
A fragment of the Vistula River in its middle course was selected as a test area for the individual algorithms. Both the left and right banks of the Vistula River were vectorized based on a high-resolution orthophotomap that was downloaded via WMS (Fig. 2). Vectorization was deliberately chosen as the way to obtain the data due to the need to verify the algorithms in the context of selecting the right key points involved in the generalization process. Both banks of the river were vectorized for an approximate length of about 5500 m. As a result of the vectorization, two polylines were obtained, with 138 points for the right bank of the river and 213 for the left bank, respectively. The subsequent figures (through Fig. 6) present the results of the generalization that was made with the tested methods. The figures confirmed that the selection of the boundary scales for the studied section of the Vistula was appropriate. This was evidenced by the high compliance of the fragment of the examined Vistula band across all of the generalization algorithms for the scale of 1:30,000, which confirmed the assumption of the similarity stage and the transition to the simplification stage. The algorithms behaved similarly for the next scale threshold (which can be seen in Figure 4). For this scale, the stage of simplification ends and the schematization begins. For the next scale, it was decided to present two drawings due to the fact that the tested algorithms were subject to symbolization. This was evidenced not only by Figures 5 and 6 but, above all, by the table that shows the percentage shares of the vertices of the generalized band (Table 1).
The generalization models that are commonly known from the literature try to address the issue in a comprehensive way, which makes them complicated; this requires a lot of knowledge from the operator, who must take into account the specific task that the resulting map is intended to serve. The solution that is proposed by the authors boils down to a partial modification of the Ratajski model [22] and its application to generalized linear objects. This manner of application as well as the simplicity of the rules of the defined solution make it possible to use it even by operators who do not have much experience in the field of cartography. The lack of complicated algorithms means that the applied solution will not cause problems with implementation. The proposal includes the creation of four generalization thresholds, making the thresholds dependent on the ratio of the number of points remaining after the generalization to the number of points before the generalization. In the opinion of the authors, defining the thresholds of the ranges should be supported by the probability density function of the normal distribution with mean μ and variance σ², i.e., the Gaussian function [33]:

f(x) = (1 / (σ√(2π))) · exp(-(x - μ)² / (2σ²))

The rationale for using this function is that the normal distribution is symmetric about the expected value (mean); it follows that the results fall on the left and right sides of the graph. According to the three-sigma rule, approximately 68% of the results fall within one standard deviation of the mean (between μ - σ and μ + σ). The situation is similar for two and three sigmas. This rule is used to analyze the distribution of scores and examine the outliers that drive the distribution toward a normal distribution. This means that, even if the original score distribution is not normal, the distribution of generalization scores can be approximated by a normal distribution. Taking into account that the value of the statistical variable lies within one standard deviation of the mean with a probability of about 68%, this also defines the first generalization threshold range. Using the classic measure of function variability that is most often used in statistics (the so-called three-sigma rule), a probability of approximately 95%, corresponding to two standard deviations, marks the next generalization threshold.
The proposal for examining the thresholds and stages of generalization boils down to defining the following ranges:
- similarity, defined at a level of ≥95%, understood as a form of mutual close relationship that can be indicated based on the objects' properties or features;
- simplification, within the range of 68% to 95%, resulting in a reduction in the complication and complexity of the graphics (which become easier to understand) without significantly affecting key points;
- schematization, within the range of 10% to 68%, which should be understood as creating an ordered structure that greatly facilitates interpretation or analysis, affecting key points;
- symbolization, for which the generalization threshold is less than 10%; lines are represented by symbols that are about to reach the limit of map capacity (further generalization will no longer be possible).
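Under the ranges listed above, assigning a generalization stage to a computed percentage share is a simple lookup. The function name and the choice of closed boundaries at 95%, 68%, and 10% are our assumptions:

```python
def generalization_stage(share_percent):
    """Map the percentage of retained vertices to the proposed stage."""
    if share_percent >= 95:
        return "similarity"
    if share_percent >= 68:
        return "simplification"
    if share_percent >= 10:
        return "schematization"
    return "symbolization"
```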
In the case of applying the three-sigma rule at a probability level of 99.7%, there is a risk of omitting important terrain details, such as the bridge next to the river (the representation of which would be limited to single points). Generalization at this stage should, therefore, be monitored very carefully.
The vertical axis of the graph (Fig. 7) shows the percentage value of the points that remain after the generalization, calculated previously for each of the output scales. The successive values of the denominators of the output scales are placed on the horizontal axis. The classic case of the solution can be used for each of the simplification algorithms, with a particular emphasis on Chrobak's algorithm (which is the only one of the examined algorithms that allows not only the reduction of vertices but also their addition). In the classic case, the general formula is as follows:

p = (n3 / n0) · 100%

where: n3 - number of points after generalization, n0 - number of points before generalization.
When points are added, this would have to take the following form:

p = ((n3 - n2) / n0) · 100%

where: n3 - number of points after generalization, n0 - number of points before generalization, n2 - number of added points.
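Both point-share variants can be written down directly; the subtraction of the added points n2 in the second function reflects our reading of the modified formula:

```python
def retained_share(n3, n0):
    """Classic case: percentage of original vertices retained."""
    return 100.0 * n3 / n0

def retained_share_with_added(n3, n0, n2):
    """Chrobak's case: n3 includes n2 newly created vertices, so they
    are subtracted before relating the count to the original n0
    (assumed reading of the modified formula)."""
    return 100.0 * (n3 - n2) / n0
```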
Chrobak's model would also require us to calculate the length of the side of an elementary triangle, which is obtained by scaling the drawing-recognition minimum by the denominator of the map scale, where M is the denominator of the scale and ε_j is the length of the side of the elementary triangle.
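Assuming the side length is obtained by scaling a drawing-recognition minimum (taken here as 0.5 mm, an assumed value) by the scale denominator M, the computation reduces to:

```python
def elementary_triangle_side(M, eps_mm=0.5):
    """Ground length (in metres) of the elementary-triangle side for a map
    at scale 1:M, given a recognition minimum of eps_mm millimetres on the
    map sheet (eps_mm = 0.5 is an assumed value, not the published one)."""
    return eps_mm / 1000.0 * M
```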
Information regarding the number of vertices was obtained on the basis of the .shp layer data during postprocessing using a vector-analysis tool. In the case of Chrobak's algorithm, information on the points before and after the generalization as well as the added points was generated in the form of a text-file report. The use of Chrobak's algorithm (including the addition of points) in the process of determining the generalized polyline was aimed at presenting the river banks on the map; this corresponds to the approach that has been presented in scientific publications [32].

Results
Figure 8 was constructed based on the test results presented in Table 1. The graph shows the percentage share of the remaining vertices in terms of the vertices of the original curve. Comparing the algorithms, it should be stated that their behaviors were similar (although a detailed analysis showed that Chrobak's algorithm stood out with a smaller number of rejected vertices as compared to the other algorithms). Regardless of the chosen algorithm, it is possible to specify four generalization thresholds depending on the percentages of the remaining points (Table 1). The method of conversion and graphical interpretation was presented based on Chrobak's algorithm, which was rated the best. This algorithm caused the smallest percentage and numerical losses of generalized vertices (Table 1, Fig. 8), which ensured a correct graphical representation of the points; also, its method of calculating the generalization thresholds differed from the other algorithms due to the possibility of creating new vertices (Table 2).
Regardless of the selected algorithm, the generalization thresholds can be defined for each of them; it can also be noticed that the selection of the thresholds depended not only on the type of algorithm that was used but also on the nature of the input object, its shape, and the number of vertices (Figs. 2, 8, Table 1).
As the results show, for the analyzed Chrobak's algorithm, the simplification interval was most effective in the range of medium scale denominators between 15,000 and 60,000 (i.e., scales from 1:15,000 to 1:60,000) (Figs. 9, 10).

Conclusions and Discussion
When analyzing the obtained results, it should be stated that the recognition standard that was introduced into the generalization process produced good results, causing the smallest possible changes in the geometry of the curve; this was also confirmed by research that was conducted by other scientists [34,35]. The research results were presented with the example of Chrobak's algorithm; the smallest number of vertices removed from the curve proved the most precise reflection of reality. The algorithms that were selected for analysis had a common feature, which was their rejection of vertices (the difference being the rate of rejection depending on the denominator of the target map scale). The proposal to make the selection of generalization thresholds dependent on the percentage share of points is a solution that is as simple to use as it is to implement. The method of defining intervals based on the three-sigma rule guarantees that the obtained results will be characteristic of the probability density function of the normal distribution (which defines the individual intervals most objectively). This interval is characterized by high correspondence with the input object while optimizing the number of vertices and reducing the complication and complexity of the graphics (making them easier to understand) without significantly affecting the key points. The introduced generalization proposal defines the framework of the intervals in a strict (mathematical) way, leaving no room for subjective interpretations; this should be considered its undoubted advantage.

Fig. 5. Result of generalization of fragment of Vistula River for 1:1,000,000 scale

Fig. 6. Result of generalization for entire section of Vistula under study

Fig. 7. Chart of thresholds and stages of generalization

Fig. 8. Graph showing percentages of vertices after simplification depending on denominator of map scale

Fig. 10. Presentation of generalization thresholds on example of left bank of Vistula River

Table 1. List of numbers of vertices resulting from simplification and their percentage shares

Table 2. Selection of generalization scales on example of Chrobak's algorithm for right bank of Vistula River

Fig. 9. Presentation of generalization thresholds on example of right bank of Vistula River