Semantic and Syntactic Interoperability Issues in the Context of SDI

Interoperability is one of the core concepts of Spatial Data Infrastructure (SDI) because the exchange of and access to spatial data are the foremost aims of any SDI. These issues are closely related to the concept of the application schema, which plays a significant role in interchanging spatial data and information across SDI. The application schema is the basis of a successful data interchange between two systems because it defines the possible content and structure of the data; it therefore covers both semantic and syntactic interoperability. These matters also arise in several other SDI topics, including the model-driven approach and data specifications.
Spatial data exchange through SDI involves UML and GML application schemas, which address semantic and syntactic interoperability, respectively. However, working out accurate and correct application schemas may be a challenge. Moreover, faulty or overly complex schemas may impair the ability to exchange data validly.
The principal aim of this paper is to present the concept of interoperability in SDI, especially semantic and syntactic interoperability, and to discuss the role of UML and GML application schemas in the interoperable exchange of spatial data over SDI. The discussion focuses on the European SDI and the National SDI in Poland.


Introduction
"Spatial Data Infrastructure (SDI) is a general term for the computerised environment for handling data (spatial data) that relates to a position on or near the surface of the earth. It may be defined in a range of ways, in different circumstances, from the local up to the global level" [1].
According to Longley et al. [2], over 150 SDI initiatives have been described in the literature. INSPIRE is the main SDI initiative in the European Union (EU), established at the supranational level to support environmental policies and policies or activities that may have a direct or indirect impact on the environment. The INSPIRE Directive [3] set up an infrastructure for spatial information in Europe that includes metadata, spatial data sets and spatial data services; network services and technologies; agreements on sharing, access and use; and coordination and monitoring mechanisms, processes and procedures, established, operated or made available in accordance with the Directive, that is, in an interoperable manner.
Interoperability, particularly semantic and syntactic interoperability, is one of the core concepts of SDI because the exchange of and access to spatial data are the main aims of any SDI. A significant role is played here by the application schema, which is the basis of a successful data interchange between two systems: it defines the possible content and structure of the data [4].
In the process of spatial data interchange through SDI, two types of application schema are involved: the first expressed in the UML (Unified Modelling Language) and the second in the GML (Geography Markup Language). In general, the UML application schema addresses semantic interoperability and the GML application schema covers syntactic interoperability. However, working out accurate and correct application schemas may be a challenging task. Many issues must be considered, such as the regulations appropriate to a given problem or topic, and production opportunities and limitations. Moreover, what if these structures are too complex? Does this affect the ability to exchange data validly? Examining the complexity and quality of these application schemas is therefore an important issue in the context of semantic and syntactic interoperability in SDI.
The principal aim of this paper is to present the concept of interoperability in SDI, using the European SDI and the National SDI in Poland as examples, and to discuss the role of the UML and GML application schemas that are commonly used for the interoperable exchange of spatial data within these SDIs.
This article briefly sets out the context for further research aimed at developing a general methodology for examining and evaluating the quality of UML and GML application schemas and, at a later stage, quality and complexity measures for data structures expressed in the UML and GML. The results of this research will primarily provide a basis for guidelines and recommendations that will allow the UML and GML application schemas currently in force in Poland to be optimised.

Interoperability in the Context of SDI
The concept of interoperability has been widely defined in diverse contexts related to information technology (e.g. [5][6][7]). The most fundamental and well-known definition comes from the Institute of Electrical and Electronics Engineers (IEEE) and reads as follows: the ability of two or more systems or components to exchange information and to use the information that has been exchanged [8]. According to the ISO/IEC 2382-1 [9], interoperability means the capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units.
In the geographic information (geoinformation/geomatics) domain, special attention should be given to the interoperability frameworks defined in the ISO 19100 series of International Standards and by the European Commission, which underpin the functioning of the European SDI.

Interoperability according to the ISO 19100 Series
The ISO 19100 series of geographic information standards establishes the conceptual framework for interoperability of geographic information and compares it "to an interpersonal communication process by which independent systems manipulate, exchange, and integrate information that are received from others automatically" [10] (Fig. 1).

Fig. 1. Levels of interoperability
Source: own elaboration on the basis of [10] and adapted from [11]
The reference model developed in the ISO 19101 [10] defines and describes interoperability of geographic information at the system, syntactic, structural and semantic levels. According to this model, interoperability in geographic information is broken down into six layers [11]: network protocols, file systems, remote procedure calls, search and access databases, geographic information systems, and semantic interoperability.
Network protocols form the lowest but also the most fundamental layer, because each layer in this decomposition depends on the layers beneath it. This level of interoperability describes basic communication between systems within a computer network consisting of hardware and software. Network protocols, which belong to the software, govern communication between applications and the transmission of signals over the network [10].
The next layer is file systems. At this level, interoperability means above all the ability to open and display files from another system in their native format [10].
Remote procedure calls, in turn, enable users to execute programs on a remote system, regardless of its operating system [10].
Next, the search and access databases layer is responsible for "the ability to query and manipulate data in a common database that is distributed over different platforms", that is, the ability to "seamlessly access databases despite the locations, the data structures, and the query languages of the database management systems" [10].
Interoperability between geographic information systems (GIS) provides transparent access to spatial and temporal data, the sharing of spatial databases, and other services independently of the platform. To achieve this kind of "interoperability, real world phenomena need to be abstracted and represented using a common mechanism, services shall follow a common specification model, and institutional issues solved in an information communities model" [10].
The highest level of interoperability in geographic information is semantic interoperability that concerns the proper exchange, interpretation, understanding and use of geographic information between systems [10].
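Because each layer depends on the layers beneath it, the decomposition can be read as a stack in which the effective level of interoperability between two systems is the highest layer reached without a gap. The Python sketch below models that reading. The layer names follow the list above; the function `effective_level` and the idea of passing a set of "satisfied" layers are illustrative assumptions, not constructs defined in the ISO 19101.

```python
from enum import IntEnum

class Layer(IntEnum):
    """The six interoperability layers listed above, lowest first."""
    NETWORK_PROTOCOLS = 1
    FILE_SYSTEMS = 2
    REMOTE_PROCEDURE_CALLS = 3
    SEARCH_AND_ACCESS_DATABASES = 4
    GEOGRAPHIC_INFORMATION_SYSTEMS = 5
    SEMANTIC_INTEROPERABILITY = 6

def effective_level(satisfied):
    """Return the highest layer reached without a gap, or None.

    Hypothetical helper: a layer only counts if every layer beneath it
    is also satisfied, mirroring the dependency on underlying layers.
    """
    reached = None
    for layer in Layer:  # members are iterated in ascending order
        if layer not in satisfied:
            break
        reached = layer
    return reached

# Agreement on semantics is of little use while the database layer is missing.
level = effective_level({Layer.NETWORK_PROTOCOLS, Layer.FILE_SYSTEMS,
                         Layer.REMOTE_PROCEDURE_CALLS,
                         Layer.SEMANTIC_INTEROPERABILITY})
print(level.name)  # REMOTE_PROCEDURE_CALLS
```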

Interoperability according to INSPIRE
Under the INSPIRE Directive [3], interoperability means the possibility for spatial data sets to be combined, and for services to interact, without repetitive manual intervention, in such a way that the result is coherent and the added value of the data sets and services is enhanced. This means that "users of the infrastructure are able to integrate spatial data from diverse sources and these retrieved datasets follow a common structure and shared semantics" [12]. Interoperability is one of the core concepts of the European SDI because this infrastructure is "built on the existing standards, information systems and infrastructures, professional and cultural practices of the 27 Member States of the EU in all the 23 official and possibly also the minority languages of the EU" [12]. Therefore, "in the context of SDI interoperability is the ability to exchange and manipulate geographic information across distributed systems without having to consider the heterogeneity of the information source, e.g. format and semantics" [10]. Furthermore, SDIs "are becoming more and more linked to and integrated with systems developed in the context of e-Government" [12], e-Government being a field that "consists of governance, information and communication technology (ICT), business process re-engineering, and citizens at all levels of government, e.g. city, state/province, national and international" [10]. A crucial element of e-government data is geographic information, together with its interoperable relationships with other types of information.
The European Interoperability Framework (EIF) "defines a set of recommendations and guidelines for e-government services so that public administrations, enterprises and citizens can interact across borders, in a pan-European context" [13]. For the purpose of the EIF, interoperability is the "ability of information and communication technology (ICT) systems and of the business processes they support to exchange data and to enable the sharing of information and knowledge" [14]. Moreover, the EIF defines an interoperability model (Fig. 2) that includes [15]: a background layer (interoperability governance), four principal layers of interoperability (legal, organisational, semantic and technical), and a component cutting across the four layers (integrated public service governance).
Interoperability governance "refers to decisions on interoperability frameworks, institutional arrangements, organisational structures, roles and responsibilities, policies, agreements and other aspects of ensuring and monitoring interoperability at national and EU levels" [15]. One of the significant parts of interoperability governance at the EU level is the EIF, while the INSPIRE Directive "is an important domain-specific illustration of an interoperability framework including legal interoperability, coordination structures and technical interoperability arrangements" [15]. An example of interoperability governance at the national level is the National Interoperability Framework [16] established in Poland, which lays down minimum requirements for public registers, electronic information exchange, and information and communication systems.
Legal interoperability "ensures that organisations operating under different legal frameworks, policies and strategies are able to work together" [15].
Organisational interoperability is about "documenting and integrating or aligning business processes and relevant information exchanged" [15]. This layer also refers to service identification, availability and access.
Semantic interoperability ensures "that the precise format and meaning of exchanged data and information is preserved and understood throughout exchanges between parties. In the EIF, semantic interoperability covers both semantic and syntactic aspects" [15]. The semantic aspect "refers to the meaning of data elements and the relationship between them. It includes developing vocabularies and schemas to describe data exchanges, and ensures that data elements are understood in the same way by all communicating parties" [15]. In turn, the syntactic aspect concerns "describing the exact format of the information to be exchanged in terms of grammar and format" [15].
Technical interoperability includes "the applications and infrastructures linking systems and services, by extension, interface specifications, interconnection services, data integration services, data presentation and exchange, and secure communication protocols" [15].
Integrated public service governance cuts across the legal, organisational, semantic and technical layers. This cross-cutting component refers to "ensuring interoperability during preparation of legal instruments, organisation business processes, information exchange, services and components that support public services" [15].

Semantic and Syntactic Interoperability
Semantic interoperability is common to the interoperability models introduced above. This layer also appears in other interoperability frameworks for information systems in various domains, such as healthcare, emergency management, or military and defence, which are widely discussed in the literature. Most of these models are based on the Levels of Conceptual Interoperability Model (LCIM) [17]. In the current version of the LCIM, Turnitsa [18] distinguishes seven levels: no interoperability (level 0), technical, syntactic, semantic, pragmatic, dynamic and conceptual interoperability (level 6).
Semantic interoperability is defined as the ability of two or more computer systems to automatically interpret the information exchanged meaningfully and accurately in order to produce useful results as defined by the end users of both systems [19]. A necessary precondition for semantic interoperability, and indeed for any higher level of interoperability, is syntactic interoperability. According to Krishnamurthy and St. Louis [19], two or more computer systems exhibit syntactic interoperability if they are capable of communicating and exchanging data. Agreed data formats are crucial here; for example, the XML (eXtensible Markup Language) standard provides this kind of interoperability and is the basis of the GML. Thus, in general terms and in the context of spatial data, semantics refers to the content and meaning of information (spatial objects and their attributes), while syntax refers to the structuring or ordering of data [20].
Both semantic and syntactic issues must be considered in order to achieve interoperability in SDI [22]. They are closely related to the concept of the application schema and play a key role in interchanging spatial data and information across SDI. These matters also arise in several other SDI topics, including the model-driven approach, spatial data interchange and data specifications.
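The difference between the two levels can be illustrated with a small XML fragment. In the Python sketch below, two hypothetical systems parse the same document without error, so they are syntactically interoperable; however, each decodes the `function` value with its own code list, so they reach different conclusions about the same object and semantic interoperability is not achieved. The element names and both code lists are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment describing one building.
DOCUMENT = """<Building>
  <id>B-17</id>
  <function>1</function>
</Building>"""

# Each hypothetical system has its own, independently defined code list.
CODES_SYSTEM_A = {"1": "residential", "2": "industrial"}
CODES_SYSTEM_B = {"1": "industrial", "2": "residential"}

def read_building(xml_text, code_list):
    """Parse the XML (syntax) and interpret the function code (semantics)."""
    root = ET.fromstring(xml_text)  # raises ParseError only for malformed XML
    code = root.findtext("function")
    return {"id": root.findtext("id"), "function": code_list[code]}

a = read_building(DOCUMENT, CODES_SYSTEM_A)
b = read_building(DOCUMENT, CODES_SYSTEM_B)
# Both parses succeed, yet the systems disagree about what the building is.
print(a["function"], b["function"])  # residential industrial
```

A shared, published code list (part of the agreed application schema) is what would close this gap; the XML syntax alone cannot.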

Data-Centric View on SDI
The CEN/TR 15449 series of Technical Reports [1][21][22][23][24], developed by the European Committee for Standardization (CEN), offers two different views of SDI: a data-centric view and a service-centric view. The data-centric view addresses, among other things, the concept of semantic interoperability [22] and concerns the data that are at the heart of SDI. This perspective includes application schemas and metadata [1].
One of the general considerations for achieving interoperability according to the reference model for SDIs defined in the CEN/TR 15449-1 [1] is the use of the model-driven approach. This approach to SDI development is also promoted by the ISO 19100 series of International Standards and follows the concepts of the model-driven architecture (MDA) defined by the Object Management Group (OMG) [25], which enables cross-platform interoperability.
In the model-driven approach, the starting point is the universe of discourse (a view of the real or hypothetical world that includes everything of interest [10]), expressed as a conceptual model that can be formally represented in one or more conceptual schemas. A conceptual schema, written in a conceptual schema language, defines how the universe of discourse is described as data [22]. The conceptual schema language is a formal language containing the linguistic constructs required to describe the conceptual model in the conceptual schema [10]. A conceptual schema for data required by one or more applications is called an application schema. The application schema provides not only "a description of the semantic structure of the spatial dataset but also identifies the spatial object types and reference systems required to provide a complete description of geographic (spatial) information in the dataset" [22].
The set of principles for such a model-driven approach is supplied by the ISO 19100 suite of geographic information standards (Fig. 3). In outline, the information is described by an application schema (a formal, implementation-independent description of semantics and logical data structures). Specifications and implementations for different technologies (e.g. relational databases, XML Schema for data transfer) and various implementation environments (e.g. J2EE, .Net) can then be derived from the schema more or less automatically [22].
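The idea of deriving platform-specific artefacts from a single implementation-independent schema can be sketched in a few lines of Python. The feature type `Building`, its attributes and the two simplified type mappings below are hypothetical; real model-driven tooling works from full UML models and standardised encoding rules rather than from small Python objects.

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    name: str
    type_name: str  # implementation-independent type, e.g. "CharacterString"

@dataclass
class FeatureType:
    name: str
    attributes: list[Attribute]

# Assumed, simplified mappings from conceptual types to two target platforms.
SQL_TYPES = {"CharacterString": "VARCHAR(255)", "Integer": "INTEGER",
             "Real": "DOUBLE PRECISION"}
XSD_TYPES = {"CharacterString": "xs:string", "Integer": "xs:integer",
             "Real": "xs:double"}

def to_sql(ft):
    """Derive a relational table definition from the feature type."""
    cols = ",\n".join(f"  {a.name} {SQL_TYPES[a.type_name]}" for a in ft.attributes)
    return f"CREATE TABLE {ft.name.lower()} (\n{cols}\n);"

def to_xsd(ft):
    """Derive an XML Schema complex type (a fragment) for data transfer."""
    elems = "\n".join(
        f'    <xs:element name="{a.name}" type="{XSD_TYPES[a.type_name]}"/>'
        for a in ft.attributes)
    return (f'<xs:complexType name="{ft.name}Type">\n  <xs:sequence>\n'
            f"{elems}\n  </xs:sequence>\n</xs:complexType>")

# One hypothetical application schema, two derived implementations.
building = FeatureType("Building", [Attribute("name", "CharacterString"),
                                    Attribute("storeys", "Integer"),
                                    Attribute("height", "Real")])
print(to_sql(building))
print(to_xsd(building))
```

The design point is that a change is made once, in the conceptual schema, and both the database structure and the transfer format are regenerated from it, which keeps them consistent with each other.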

Spatial Data Interchange
Access to and exchange of spatial data are the main goals of any SDI. In this context, semantic and syntactic issues become very important, particularly when spatial data are interchanged between different systems [22]. Applications (software) and users (people) should interpret data and information in the same manner, so that they are understood as intended by the creator of the data. According to the ISO 19100 series of standards, which support this level of interoperability, two fundamental issues must be resolved to achieve interoperability between heterogeneous systems [4]. The first is to define the semantics of the content and the logical structures of spatial data, which should be done in the application schema. The second is to define a system- and platform-independent data structure that can represent data according to the application schema [4].
An overview of interoperable data exchange is illustrated in Fig. 4. System A sends a dataset to system B, which must then be able to use the data from system A. To ensure a successful data transfer, both systems must agree on a common application schema I, the encoding rule R to apply, and the transfer protocol to use [4]. The application schema defines the possible content and structure of the interchanged spatial data, and thus underpins the interoperable data exchange. The encoding rule, by contrast, "defines the conversion rules for how to code the data into a system independent data structure" [4].
For the purpose of the data transfer, data are structured in accordance with the common application schema I and encoded/decoded in compliance with the principles defined in the ISO 19118 standard [4]. Data mappings (M_AI and M_IB) specify how the existing schema A can be converted to the application schema I, and how data conforming to the application schema I can be transformed to the existing schema B [22]. If the data structures of system A or B differ from that of I, such mappings may be difficult to accomplish; if the semantics of system A or B differ from those of I, the mappings may even be impossible [22]. Hence, semantics is a crucial issue.
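The exchange pattern can be sketched as a pipeline: a mapping from system A's schema to the common application schema I, an encoding rule R that writes the data to a system-independent XML structure, decoding on the receiving side, and a mapping from I to system B's schema. In the Python sketch below, all schemas, field names, codes and functions (`m_ai`, `encode_r`, `decode_r`, `m_ib`) are hypothetical, and the toy XML encoding merely stands in for a real ISO 19118 encoding rule such as the GML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical common application schema I: feature type -> ordered properties.
SCHEMA_I = {"Road": ["roadName", "surface"]}
SURFACE_CODES_I = {"paved", "unpaved"}  # code list agreed as part of schema I

def m_ai(record_a):
    """Mapping from system A's (hypothetical) schema to schema I."""
    # Assumed correspondence between A's surface codes and the code list of I;
    # an unknown code raises KeyError: a semantic gap, not a syntax error.
    surface = {"P": "paved", "U": "unpaved"}[record_a["surface_code"]]
    return {"roadName": record_a["street"], "surface": surface}

def encode_r(feature_type, record_i):
    """Toy encoding rule R: one XML element per property of schema I."""
    root = ET.Element(feature_type)
    for prop in SCHEMA_I[feature_type]:
        ET.SubElement(root, prop).text = record_i[prop]
    return ET.tostring(root, encoding="unicode")

def decode_r(xml_text):
    """Inverse of R on the receiving side: rebuild a schema-I record."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}

def m_ib(record_i):
    """Mapping from schema I to system B's (hypothetical) schema."""
    if record_i["surface"] not in SURFACE_CODES_I:
        raise ValueError(f"surface value {record_i['surface']!r} is not defined in schema I")
    return {"name": record_i["roadName"], "is_paved": record_i["surface"] == "paved"}

# System A -> mapping to I -> encode (R) -> transfer -> decode (R) -> mapping to B
record_a = {"street": "Main Street", "surface_code": "P"}
message = encode_r("Road", m_ai(record_a))
feature_type, record_i = decode_r(message)
print(message)         # <Road><roadName>Main Street</roadName><surface>paved</surface></Road>
print(m_ib(record_i))  # {'name': 'Main Street', 'is_paved': True}
```

Renaming fields is the easy, structural part of each mapping; the code-list lookup is where a difference in meaning would make the mapping impossible rather than merely awkward.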

Application Schemas
An application schema is a conceptual schema for applications with similar data requirements [4]. As mentioned above, the application schema is the basis of a successful data transfer as it defines the possible content and structure of the exchanged spatial data. Therefore, it covers both semantic and syntactic interoperability.
Beyond describing the features in the dataset, the application schema also identifies the spatial object types and reference systems, as well as data quality elements [10].
To ensure a successful result, the application schema should be accessible to both the sender and the receiver of the spatial data. The International Standards in the geographic information domain recommend that it be transferred before the data interchange takes place. Both ends of the transaction can then prepare their systems by implementing the appropriate mappings and data structures corresponding to the application schema [4].
Two types of application schema are commonly involved in interoperable spatial data exchange: the first expressed in the UML, the second in the GML.
In line with the ISO 19100 suite of standards, the application schema used in the spatial data interchange process should be expressed in the UML conceptual schema language, in compliance with the ISO 19103 [26] and the ISO 19109 [27]. These International Standards provide a set of rules for writing the application schema properly, including the use of standardised schemas to define feature types. The UML allows data models to be presented graphically (as UML diagrams), which gives people in particular a well-understood view of the spatial data. In addition, the models are machine-readable in the XMI (XML Metadata Interchange) format, which supports the transition to encoding schemas.
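This "schema first" sequence can be sketched as follows: the sender publishes a (hypothetical, greatly simplified) application schema, and the receiver uses it to prepare a check that is then applied to each incoming feature. The JSON schema format, the feature type `Parcel` and the function `conforms` are assumptions made for brevity; in practice the receiver would work from the UML model or the GML application schema itself.

```python
import json

# Step 1: the sender transfers the application schema before any data.
schema_message = json.dumps({
    "Parcel": {"parcelId": "string", "area": "number"}  # hypothetical content
})

# Step 2: the receiver prepares its system from the received schema.
schema = json.loads(schema_message)
PY_TYPES = {"string": str, "number": (int, float)}

def conforms(feature_type, properties):
    """Check a feature against the received schema; return a list of problems."""
    expected = schema.get(feature_type)
    if expected is None:
        return [f"unknown feature type {feature_type!r}"]
    problems = [f"missing property {p!r}" for p in expected if p not in properties]
    problems += [f"unexpected property {p!r}" for p in properties if p not in expected]
    problems += [f"{p!r} should be {t}" for p, t in expected.items()
                 if p in properties and not isinstance(properties[p], PY_TYPES[t])]
    return problems

# Step 3: data arrive and are checked against the agreed schema.
print(conforms("Parcel", {"parcelId": "0123", "area": 512.5}))  # []
print(conforms("Parcel", {"parcelId": 123}))
# ["missing property 'area'", "'parcelId' should be string"]
```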
The GML is an XML encoding based on principles specified in the ISO 19118 [4]. It was developed to provide a common XML encoding for spatial data, as well as "an open, vendor-neutral framework for the description of geospatial application schemas for the transport and storage of geographic information in the XML" [28].
The GML application schema is an application schema written in XML Schema in accordance with the rules specified in the ISO 19136 [28]. It must also import the GML schema, which comprises XML encodings of a number of the conceptual classes defined in the ISO 19100 series of International Standards.
In conclusion, the UML application schema generally addresses semantic interoperability and is mainly intended for humans, whereas the GML application schema covers syntactic interoperability and is intended for machines and software.
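As a rough illustration of the UML-to-GML direction, the Python sketch below writes the core of a GML 3.2 application schema for one feature type. It follows the usual pattern in which the GML schema is imported, the feature type becomes a global element in the substitution group of `gml:AbstractFeature`, and its content model is a complex type extending `gml:AbstractFeatureType`. The target namespace, the feature type, the function name and the small type table are hypothetical, and large parts of the ISO 19136 encoding rules (property types, associations, code lists, stereotypes, multiplicities) are deliberately left out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
GML = "http://www.opengis.net/gml/3.2"

# Assumed subset of the mapping from ISO 19103/19107 types to XML types.
TYPE_MAP = {
    "CharacterString": "xs:string",
    "Integer": "xs:integer",
    "Real": "xs:double",
    "GM_Point": "gml:PointPropertyType",
}

def gml_application_schema(target_ns, feature, attributes, prefix="app"):
    """Return a minimal GML application schema (as text) for one feature type."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema", {
        # Prefixes used only inside attribute values must be declared by hand,
        # because ElementTree declares only the prefixes used in element names.
        f"xmlns:{prefix}": target_ns,
        "xmlns:gml": GML,
        "targetNamespace": target_ns,
        "elementFormDefault": "qualified",
    })
    ET.SubElement(schema, f"{{{XS}}}import", {
        "namespace": GML,
        "schemaLocation": "http://schemas.opengis.net/gml/3.2.1/gml.xsd",
    })
    # Global element for the feature type, substitutable for gml:AbstractFeature.
    ET.SubElement(schema, f"{{{XS}}}element", {
        "name": feature,
        "type": f"{prefix}:{feature}Type",
        "substitutionGroup": "gml:AbstractFeature",
    })
    # Content model: extend gml:AbstractFeatureType with the UML attributes.
    ctype = ET.SubElement(schema, f"{{{XS}}}complexType", {"name": f"{feature}Type"})
    content = ET.SubElement(ctype, f"{{{XS}}}complexContent")
    ext = ET.SubElement(content, f"{{{XS}}}extension", {"base": "gml:AbstractFeatureType"})
    seq = ET.SubElement(ext, f"{{{XS}}}sequence")
    for name, uml_type in attributes:
        ET.SubElement(seq, f"{{{XS}}}element", {"name": name, "type": TYPE_MAP[uml_type]})
    return ET.tostring(schema, encoding="unicode")

xsd = gml_application_schema(
    "http://example.org/buildings",  # hypothetical target namespace
    "Building",
    [("name", "CharacterString"), ("storeys", "Integer"), ("position", "GM_Point")],
)
print(xsd)
```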
Both UML and GML application schemas are widely used in the European SDI as well as at the national level in Poland. They are an integral part of spatial data specifications and relevant regulations in the form of data models.

Data Specifications
Spatial data are at the centre of SDI; they represent the real world (the universe of discourse) in an abstracted form that can be structured in data models. The ISO 19100 series of geographic information standards provides a well-defined methodology, based on conceptual modelling, for developing such models. A spatial data model is a mathematical construct that formalises the perception of space [12]. A conceptual model includes the semantics (concepts) that place spatial objects within the scope of the description, while an application schema adds logical structure to these semantics.
Data models are encapsulated in data specifications, which, in addition to these models, contain other relevant requirements for the data, such as rules for data capture, encoding and delivery, as well as provisions on data quality and consistency, metadata, etc.
In a broader sense, the term data specification can refer to both the data product specification and the interoperability target specification in SDI. The former is a detailed description of a dataset or dataset series used for creating a specific data product [29]. The latter is used for transforming existing data so that they share common characteristics [12].
In the case of INSPIRE, such data specifications have been developed for the 34 themes in the three annexes of the Directive to achieve interoperability in the European SDI. The Member States of the EU can use these guidelines to create new datasets, or to transform existing datasets by mapping their existing models to the models described in the data specification documents. In this way semantic interoperability can be attained: various datasets can be used together and understood in the same way by different SDI users [22].
By way of illustration, a similar approach was used in Poland to implement the INSPIRE Directive and establish the National SDI. Passing the law on the infrastructure for spatial information in Poland, which transposes the INSPIRE Directive (i.e. adjusts the INSPIRE regulations to national law), involved introducing many acts and amendments to related laws, among others the law on geodesy and cartography. The Head Office of Geodesy and Cartography (HOGC), the main coordinator of the creation and functioning of SDI in Poland, decided to replace the existing (very often obsolete) instructions and guidelines with regulations of the Cabinet or the relevant minister, which on the one hand became annexes to the law on geodesy and cartography and on the other hand put some recommendations of the INSPIRE Directive into effect.
An integral part of these regulations are the UML and GML application schemas that define the information structures of the spatial databases corresponding to each regulation. In terms of the ISO 19131 [29], these regulations are data product specifications. The schemas were prepared according to the ISO 19100 series of International Standards in the geographic information domain, which should ensure the interoperability of spatial data sets and GIS applications in Poland. The regulations cover all legal and technical issues in the geodesy domain. This was a very ambitious undertaking, particularly because the methodology of conceptual modelling and the use of the UML and GML notations in conceptual schemas describing the information content of databases were being applied for the first time in Poland.

Interoperability Challenges in SDI
Two approaches are possible to achieve interoperability in SDI: transformation and harmonisation of spatial data. Transformation uses information and communications technologies and does not affect the original data structures, while data harmonisation is the process of modifying and fine-tuning semantics and data structure to enable compatibility with agreements (specifications, standards, or legal acts) across borders and/or user communities [12]. When technical arrangements are not sufficient to bridge the interoperability gap between the communicating systems in SDI, harmonisation is needed. In the opinion of Tóth et al. [12], a combination of these two approaches provides the best solution in SDI.
Therefore, countries participating in the creation of the European SDI "should transform or harmonise their existing datasets to match them with specifications as described and required by the INSPIRE Directive and its implementing rules" [22]. In practice, "these specifications can also be used to create new datasets or datasets series that match those requirements" [22].
In the case of Poland, the process of harmonisation required either working out new data structures or adjusting existing data structures of spatial databases to INSPIRE guidelines and recommendations.
As stated above, data structures are described with UML and GML application schemas. Nevertheless, working out accurate and correct application schemas is not an easy task. Many issues must be considered, for instance the recommendations of the ISO 19100 series of geographic information standards, the regulations appropriate to a given problem or topic, and production opportunities and limitations (e.g. software and tools).
In addition, the GML application schema is closely tied to the UML application schema; in other words, it should be its translation. Following the ISO 19136 [28], the GML application schema can be constructed in two different and alternative ways. The first is by adhering to the rules specified in the ISO 19109 for application schemas in the UML, and conforming both to the constraints on such schemas and to the rules for mapping them to GML application schemas (according to the ISO 19136). The second is by adhering to the rules for GML application schemas (specified in the ISO 19136) to create a GML application schema directly in XML Schema. The first approach is commonly used in practice.
However, not everything that can be expressed in the UML can be represented straightforwardly in the GML, and this can significantly affect the interoperability of spatial data sets and GIS, and thereby the ability to exchange data validly. Moreover, what should be done in the case of overly complex or even faulty application schemas, which are the basis of successful spatial data interchange and also determine the final structure of a database? Incorrect or overly complex data structures directly affect the ability to generate GML data sets with concrete data (objects) and can thereby cause various problems and anomalies at the data production stage. In Poland, such problems appeared not only during the adjustment of existing spatial data structures to INSPIRE guidelines and recommendations, particularly INSPIRE data specifications, but mainly after the publication of the regulations that define data structures for the relevant spatial databases.
These application schemas already had some technical problems concerning the UML to GML transformation during their creation. After the regulations were published, some contractors also reported faults, mistakes or anomalies in the notation of the application schemas. One reason may be the ambiguity of the UML to GML transformation [30]; another may be the excessive complexity of the application schemas as elaborated. Unfortunately, these problems can affect the ability to generate GML files with spatial data, as well as the ability of GIS software to process these files.
At the HOGC, work is currently underway to identify the most problematic issues related to the existing UML and GML application schemas and to propose improvements that optimise these schemas. At the European level, INSPIRE data specifications are revised regularly, and corrigenda or new versions of these documents are published on the INSPIRE website.
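Some of the UML constructs that do not translate straightforwardly into the GML can be detected mechanically before any data are produced. The Python sketch below checks a hypothetical, heavily simplified UML model for two illustrative problems: classes with more than one supertype (an XML Schema complex type can be derived from only one base type) and attribute types for which the mapping in use defines no GML encoding. The model format, the class names and the set `MAPPED_TYPES` are assumptions; a real check would operate on the XMI export of the UML model.

```python
from dataclasses import dataclass, field

@dataclass
class UmlClass:
    name: str
    supertypes: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)  # name -> type

# Types the (assumed) UML-to-GML mapping knows how to encode.
MAPPED_TYPES = {"CharacterString", "Integer", "Real", "Boolean",
                "GM_Point", "GM_Surface"}

def gml_mapping_issues(model):
    """List constructs in the model that will not translate directly to GML."""
    known_classes = {c.name for c in model}
    issues = []
    for cls in model:
        if len(cls.supertypes) > 1:
            issues.append(f"{cls.name}: multiple inheritance {cls.supertypes}")
        for attr, type_name in cls.attributes.items():
            if type_name not in MAPPED_TYPES and type_name not in known_classes:
                issues.append(f"{cls.name}.{attr}: no GML mapping for type {type_name}")
    return issues

# Hypothetical model with one class that triggers both checks.
model = [
    UmlClass("Feature"),
    UmlClass("Versioned"),
    UmlClass("Building", ["Feature", "Versioned"],
             {"name": "CharacterString", "footprint": "GM_Surface",
              "roofShape": "RoofShape3D"}),
]
for issue in gml_mapping_issues(model):
    print(issue)
# Building: multiple inheritance ['Feature', 'Versioned']
# Building.roofShape: no GML mapping for type RoofShape3D
```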

Conclusions and Summary
The deployment of SDIs facilitates the interoperability of geographic information. An SDI is understood as a networked environment (e.g. the Internet) that supports easy and coordinated access to geographic information and geographic information services [31]. It is an essential resource for specific activities; for example, it can enhance crisis and disaster management or environmental monitoring. Data play the key role in SDI because the exchange of and access to spatial data is the principal objective of any SDI. Standards and specifications are among the fundamentals of interoperability. In the case of the European SDI, examples of such standards are the UML and GML, which are also included in spatial data specifications in the form of application schemas. In the context of SDI, data specifications refer to the interoperability target specification.
Establishing the Infrastructure for Spatial Information in Europe requires, among other things, harmonising different spatial data sets and thereby ensuring their semantic and syntactic coherence. This process involves adjusting existing spatial data structures to INSPIRE guidelines and recommendations, especially INSPIRE data specifications. In these documents, data structures are described with UML and GML application schemas. Incorrect or overly complex data structures directly affect the ability to generate GML data sets with concrete data (objects) and can thereby cause various problems and anomalies at the data production stage.
According to the CEN/TR 15449-1 [1], among the general considerations for achieving interoperability are keeping things simple and checking quality. For these reasons, the ability to examine and assess the quality of UML and GML application schemas, including their complexity, is an important issue in the context of semantic and syntactic interoperability in SDIs.
As part of further research, it is proposed to develop a methodology for examining the quality of UML and GML application schemas, focusing on a number of selected application schemas prepared at the HOGC in Poland during the INSPIRE Directive implementation work, as well as application schemas from INSPIRE data specifications. The results of this work will first of all provide the foundations for guidelines and recommendations for optimising the existing UML and GML application schemas included in Polish regulations.
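As a possible starting point for such measures, simple size and structure metrics borrowed from object-oriented design (the number of classes and attributes, or the depth of the inheritance hierarchy) can be computed directly from a schema. The Python sketch below does this for a hypothetical model held as a dictionary; the model, the function names and the choice of metrics are illustrative assumptions and do not represent the methodology proposed in this paper.

```python
# Hypothetical schema: class name -> (supertype name or None, attribute names).
SCHEMA = {
    "AbstractFeature": (None, ["id"]),
    "Building": ("AbstractFeature", ["name", "storeys", "footprint"]),
    "PublicBuilding": ("Building", ["owner"]),
    "Parcel": ("AbstractFeature", ["area", "boundary"]),
}

def inheritance_depth(schema, name):
    """Number of ancestors of a class (0 for a root); assumes no cycles."""
    parent = schema[name][0]
    return 0 if parent is None else 1 + inheritance_depth(schema, parent)

def complexity_metrics(schema):
    """A few simple size and structure measures of an application schema."""
    total_attributes = sum(len(attrs) for _, attrs in schema.values())
    return {
        "classes": len(schema),
        "attributes": total_attributes,
        "mean_attributes_per_class": round(total_attributes / len(schema), 2),
        "max_inheritance_depth": max(inheritance_depth(schema, c) for c in schema),
    }

print(complexity_metrics(SCHEMA))
# {'classes': 4, 'attributes': 7, 'mean_attributes_per_class': 1.75, 'max_inheritance_depth': 2}
```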