A Python Library for the Jupyteo IDE Earth Observation Processing Tool Enabling Interoperability with the QGIS System for Use in Data Science

Abstract: This paper describes JupyQgis, a new Python library for the Jupyteo IDE that enables interoperability with the QGIS system. Jupyteo is an online integrated development environment for Earth observation data processing, available on a cloud platform. It is targeted at remote sensing experts, scientists and users who can develop Jupyter notebooks by reusing embedded open-source tools, WPS interfaces and existing notebooks. In recent years, data science methods have become increasingly popular and have become the focus of many organizations. Many scientific disciplines are undergoing a significant transformation due to data-driven solutions. This is especially true of geodesy, environmental sciences and Earth sciences, where large data sets, such as Earth observation satellite data (EO data) and GIS data, are used. Previous experience with Jupyteo, among both the users of this platform and its creators, indicates the need to supplement its functionality with GIS analytical tools. This study analyzed the most efficient way to combine the functionality of the QGIS system with that of the Jupyteo platform in one tool. The most suitable solution was found to be a custom library providing an API for collaboration between both environments. The resulting library makes the work much easier and simplifies the source code of the created Python scripts. The functionality of the developed solution is illustrated with a test use case.


Introduction
The functioning of the world is increasingly based on collecting and processing vast amounts of data. Tools and methods of data acquisition are constantly evolving and entering more and more spheres of our existence. Data are collected on everything, at every time and in every place. As a result, every area of life is gradually changing, and today many things are done differently than in the past. For example, in scientific research, model-driven approaches have been supplemented with data-driven approaches [1,2].
In recent years, data science methods have grown in popularity in many organizations. Data science is already widely used in business to design successful strategies and policies. The economic sector is undergoing a significant transformation as data-driven innovation penetrates the core of business. A similar transformation is underway within many scientific disciplines [3,4]. This is especially true of geodesy, environmental sciences and Earth sciences, disciplines that use large data sets such as Earth observation satellite data (EO data) and GIS data. The market for these data is broad and diverse. Data providers develop or acquire ever newer technologies and tools because the data processing techniques and tools used several years ago are no longer sufficient. In this field it is also difficult to manage without big data processing and storage techniques [5]. This situation was predicted earlier by the scientific community [6]. In response to the growing demand for processing ever larger data sets, the number and variety of new data science tools have increased significantly [7,8].
Geoinformation derived from Earth observation satellite data is used in many scientific, governmental and planning tasks. These include, among others, geoscience, atmospheric sciences, cartography, resource management, civil security, disaster relief, and planning and decision support [9,10]. Earth observation has irreversibly entered the big data era, driven among others by ESA's Sentinel satellites and by the rise of so-called NewSpace companies, which represent the market for private access to space and related technologies. This requires not only new technological approaches to managing and processing large amounts of data but also new analysis methods such as machine learning, artificial intelligence and cluster analysis [11][12][13][14].
In 2019, the open data produced by Landsat-7 and Landsat-8, MODIS (Terra and Aqua units) and the first three Sentinel missions (Sentinel-1, Sentinel-2 and Sentinel-3) alone amounted to around 5 PB [15]. Such big data sets often exceed the memory, storage and processing capacities of personal computers, imposing severe limits that lead users to exploit only a small portion of the available data for scientific research and operational applications [16,17]. The demand for new solutions is constantly increasing. Examples of platforms and tools created in recent years to store and process EO data [18] are Google Earth Engine (GEE), Sentinel Hub, Open Data Cube (ODC), the System for Earth Observation Data Access, Processing and Analysis for Land Monitoring (SEPAL), OpenEO, JEODPP, pipsCloud and Jupyteo IDE.
When writing about data science today, it is hard not to mention Python. It is the preferred programming language for scientific computing, data science and machine learning, mainly because it is relatively easy to learn. Its capabilities are very extensive, and it boosts both performance and productivity by combining access to low-level libraries with clean high-level APIs. Python is available on many open-source or free-access platforms, including Jupyter, Anaconda Individual Edition and Google Colab [19][20][21][22][23].

Jupyteo IDE
The variety of libraries available in Python, such as Scikit-learn, Pandas (Python Data Analysis Library), NumPy, TensorFlow, Matplotlib and PySpark, puts techniques such as machine learning or cluster analysis within reach of anyone who can program, is an expert in the given field and is open to new programming techniques. Although the functionality of existing libraries developed by others is often sufficient, specialized or custom-made tools are sometimes required.
Building new tools and platforms for data science very often consists of adapting and improving existing solutions. In such cases, Python and the solutions built on it are convenient because most of them are open source, so their source code can be modified. To build a new tool or platform, the functional requirements of the planned solution must first be formulated. Software components covering most of the specified requirements must then be identified among existing products; for example, Jupyter would be a good basis for a data science platform. Functionalities that cannot be achieved by adjusting ready-made components must be implemented from scratch. In the case of a web platform, the most convenient approach is to integrate everything into a single, scalable environment using Docker [24]. This is how the Jupyteo platform was created.
Jupyteo is an online integrated development environment (IDE) for Earth observation data processing, available on a cloud platform. It was created on the basis of an earlier project, JupyTEP IDE [25]. The current version, Jupyteo, is updated, rebuilt and more extensive than the original JupyTEP IDE. The main objective of building the Jupyter notebook IDE for EO data processing (Jupyteo) was to extend the Jupyter software ecosystem [26] and to customize the existing components for the needs of EO scientists and other professional and non-professional users closely related to the EO data community. The general approach was based on the configuration, customization, adaptation and, above all, integration of Jupyter, Docker, EO data cloud infrastructure and accessible libraries and EO data tools (application programming interfaces (APIs), the European Space Agency (ESA) Sentinel Application Platform (SNAP) [27], Orfeo Toolbox (OTB) [28], the Geospatial Data Abstraction Library (GDAL) [29], etc.). Jupyteo also contains a set of extended Docker Stacks based on predefined Docker images and designed for different processing environments and tasks, such as machine learning, advanced scientific data manipulation, and SAR or GIS data processing.
Jupyteo has a web-based user interface in the form of an extended and modified Jupyter user interface (UI) with a customized layout, an EO data processing engine and a set of predefined notebooks, widgets and tools (Fig. 1). The final IDE is targeted at remote sensing experts, scientists and users who can develop Jupyter notebooks by reusing embedded open-source tools, WPS interfaces and existing notebooks. A fully scalable Docker environment suits the demanding, resource-intensive needs of the EO data processing community, as well as automated tasks related to the processing and development of scripts and algorithms. Jupyteo is also equipped with a spatial data viewer, based on the Leaflet library for web browsers, which serves as a presentation layer. It is used to browse EO datasets and to display the results of processing run in Jupyteo on a map. The Jupyteo platform is available at https://www.jupyteo.com/. It was created and is maintained by WASAT sp. z o.o. It is made available to external users and is used by WASAT in ongoing work on tasks and cooperation within scientific projects. Tasks performed using Jupyteo concern the development of data processing algorithms (data science), mainly in the field of spatial data processing, including Earth observation (EO) data and statistical analyses. The platform is also used to validate new solutions and algorithms. As a demanding platform, Jupyteo is constantly updated and extended. Jupyteo and all its components run under Linux, encapsulated in Docker containers. All further considerations and examples in this paper therefore apply to software running under Linux.

Motivation
If one needs to analyze and edit spatial information or to compose and export graphical maps using an open-source GIS, QGIS is a good choice. It supports vector and raster layers in many formats. QGIS is also well integrated with other open-source GIS packages, including PostGIS, GRASS GIS and MapServer. Plugins written in Python or C++ extend QGIS's capabilities. For example, plugins can help with geocoding using the Google Geocoding API, perform geoprocessing similar to ArcGIS's standard tools, and provide interfaces to PostgreSQL/PostGIS, SpatiaLite and MySQL databases.
However, there is sometimes a need to perform statistical analysis, visualize data in a graph, train and use a machine learning model, or access cluster data sources using PySpark and Hadoop. Such tasks are best performed with a programming language, e.g. Python, and a dedicated online platform such as Jupyteo, with all libraries and APIs preconfigured and integrated onboard.
Some projects require both a GIS and a dedicated platform to perform several advanced tasks. In this case, data must be exchanged between both environments. However, this is a non-standard activity that often requires a case-by-case approach, especially when data are exchanged between different environments and formats. This raises a question that also defines the purpose of this study: what is the most convenient way to combine the functionality of a GIS (QGIS) with the functionality of the Jupyteo platform in one tool? In this article, the author shares his thoughts on this problem and discusses how best to solve it. The main goal of this work was therefore to develop a solution that conveniently combines the functionality of the QGIS system with that of the Jupyteo platform (Jupyter). This goal was achieved by creating a Python library that provides an API for Jupyter scripts (notebooks) making this connection possible. It should be emphasized that, when work began, no solution provided similar functionality for the Jupyteo and Jupyter platforms. Some attempts to solve the problem can be found in the literature, but none of the presented methods proved sufficient. They are discussed later in this article, in the section on current trends among solutions for Jupyter to enable cooperation with QGIS.
Thus, the assumption was to combine the functionality of QGIS with the Jupyteo service. Jupyteo, as a web-based tool, provides online access, works in the cloud, has access to repositories of spatial data, and supports parallel and distributed processing. Moreover, Jupyteo has many tools pre-installed and implemented, such as the already mentioned Scikit-learn, TensorFlow and Pandas. QGIS, on the other hand, offers advanced spatial analysis and map editing capabilities that Jupyteo lacks. It was therefore necessary to analyze possible scenarios and choose the most useful approach to combining both environments.
The common element that connects both environments is the Python interface. QGIS has its own Python interface, PyQGIS, which gives access to its functionality. In Jupyteo, in turn, Python is the primary programming language. Since Jupyteo is based on Jupyter, the search for a solution began with an analysis of existing solutions that allow Jupyter to interact with QGIS.
The desired solution should allow a notebook to be integrated with QGIS using as little source code as possible, ideally with a single library call. The same applies to individual functions such as reading data from layers, saving or analyzing: each of them should require only one or a few method calls. The resulting solution would then be easy to use in many projects without unnecessarily increasing the volume of source code in the scripts, making them more readable and easier to modify. The author's experience has so far shown that the code needed to work with QGIS in notebooks through the PyQGIS interface is often complex. This can be seen in the code snippet provided in section 2.2 of this paper, which prints the list of layers: instead of a single call (e.g. listLayers), the user needs to write a loop that iterates over the layers and collects the result. The solution discussed in this paper should "hide" this complexity and simplify the work. Several existing solutions were analyzed, three of which are described below.

Current Trends among Solutions for Jupyter to Enable Cooperation with QGIS
The search showed that there are relatively few existing solutions that enable QGIS to cooperate with Jupyter. Three of them were taken into consideration:
1) Simple import of the "qgis" library into a Python script.
2) Use of the extension for Jupyter, 3Liz nbextension.
3) Connection of a Jupyter / IPython notebook to the Python console in QGIS.
Jupyteo is a server-side, web-based system. Each modification of its components requires installing or uninstalling software on the backend on which the platform runs. Generally speaking, a Jupyteo instance based on Linux runs as a Docker container with all the necessary components and configurations. For this purpose, a separate Docker image is preconfigured, with all necessary components defined in a Dockerfile. The QGIS image extends the SciPy Jupyter Docker Stack, which serves as the entry point for defining the QGIS Docker image for Jupyteo. When the Jupyteo QGIS environment starts, a Docker container with the complete QGIS configuration boots as a highly usable, encapsulated system with all the necessary QGIS-related libraries. The last step of the start-up process is opening a Jupyter notebook and importing post-configuration scripts that set the paths to QGIS resources.
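The post-configuration scripts themselves are not listed here. A minimal sketch of what such a script could do, assuming a Debian/Ubuntu-style QGIS installation under /usr: the paths, the off-screen Qt platform and the function names configure_qgis_paths and start_qgis are assumptions for illustration, not details of the Jupyteo image. Only the first function runs without QGIS installed; the second uses the standard QgsApplication initialization calls.

```python
import os
import sys

# Assumed locations for a Debian/Ubuntu-style QGIS install inside the container;
# these are illustrative defaults, not paths taken from the Jupyteo image.
QGIS_PREFIX = "/usr"
QGIS_PYTHON_PATHS = [
    "/usr/share/qgis/python",
    "/usr/share/qgis/python/plugins",
]


def configure_qgis_paths(extra_paths=QGIS_PYTHON_PATHS):
    """Prepare the notebook process so that PyQGIS can be imported headlessly."""
    # A container has no X display, so Qt is told to render off-screen
    # (an existing setting is left untouched).
    os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")
    # Make the PyQGIS packages and bundled plugins importable, without duplicates.
    for path in extra_paths:
        if path not in sys.path:
            sys.path.append(path)
    return list(sys.path)


def start_qgis(prefix=QGIS_PREFIX):
    """Initialise a GUI-less QgsApplication; requires QGIS to be installed."""
    from qgis.core import QgsApplication  # imported lazily, after paths are set

    QgsApplication.setPrefixPath(prefix, True)
    app = QgsApplication([], False)  # False: no GUI
    app.initQgis()
    return app
```

In a notebook, configure_qgis_paths() would run once before start_qgis(); calling app.exitQgis() when the work is finished releases the QGIS resources.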

Simple Import of a "qgis" Library into a Python Script
When QGIS is installed in the operating system, its Python interface can be used in a notebook script by importing the associated libraries. An example of using the QGIS library is given below. Throughout this paper, "In [1]:" and "Out [1]:" denote IPython/Jupyter input and output cells, respectively.
1. Import QGIS:
In[1]: import qgis
2. Import the needed libraries, e.g.:
In[2]:
from qgis.gui import *
from qgis.core import *
from PyQt5.QtCore import *
from qgis.analysis import QgsNativeAlgorithms
The above example illustrates the most basic method. With it, PyQGIS can be used in much the same way as the built-in Python console in QGIS, apart from functions related directly to the QGIS GUI. However, creating a functional notebook script and performing more advanced operations requires a large amount of source code. This makes it more difficult, for example, to display the contents of GIS layers or processing results. Loading data into data science tools such as Pandas or PySpark is also problematic, because PyQGIS data structures are not directly compatible with them, nor with Jupyteo (Jupyter) notebooks. This does not mean that working with QGIS in this way is impossible, but it can be complicated.

Use of the Extension for Jupyter -3Liz nbextension
Jupyter (and thus Jupyteo) allows software components to be added in the form of so-called notebook extensions (nbextensions), a plugin-like mechanism. Extensions can be downloaded or created by users; a newly created extension must follow the template provided on the Jupyter project pages [30]. It can then be installed in the Jupyter environment, adding new, non-standard functionality.
This mechanism provides another way for Jupyter to cooperate with QGIS [31]. This approach is published on GitHub by 3Liz.com under the name qgis-nbextension. Once it is installed, the QGIS library does not need to be imported directly into the notebook script. QGIS functionality is accessed via so-called IPython magic commands, for example:

In[1]:
%load_ext qgis_ipython
%qgis --verbose
from qgis.core import Qgis, QgsProject, QgsMapSettings
This extension avoids the need to import the qgis library explicitly. However, individual classes must still be imported, depending on the operations to be performed with PyQGIS. Moreover, qgis-nbextension still does not facilitate the collaboration of QGIS with data science and Jupyteo-related tools.

Connection of a Jupyter / IPython Notebook to the Python Console in QGIS
It is also possible to reverse the procedure and connect a Jupyter notebook to QGIS [32]. In this case, notebooks can be run from the Python console in QGIS. However, this method requires Jupyter to be installed on the local machine alongside QGIS, so it is not appropriate for cooperation with a web-based system such as Jupyteo.

Clarification of the Objectives
After reviewing the possibilities offered by the existing solutions, their functionalities were compared with the requirements for the solution sought. The results of the comparison are summarized in Table 1. The comparison shows that none of the existing solutions offers all the required functionalities. On this basis, the main goal of this paper was formulated: to create a solution enabling cooperation between the Jupyteo / Jupyter notebook and QGIS that meets all the requirements listed in Table 1. It was assumed that this solution would be a Python library providing a properly constructed API.
To achieve the goal, the following objectives were formulated:
- Installation and configuration of QGIS in the Jupyteo platform system. This does not mean QGIS software with a graphical interface, but the ability to access PyQGIS from the operating system shell.
- Preparation of a test data set.
- Analysis of data access methods (read / write) and of analysis in QGIS projects using the PyQGIS interface.
- Development of a method for transferring map styling from the very extensive QGIS environment to the simplified Leaflet map viewer used in Jupyteo.
- Implementation of the developed functionalities, including those mentioned above: opening a QGIS project file in a notebook; reading, writing and viewing descriptive and graphical data; and cooperation with Pandas tables.
- Implementation of the test case, including:
  • reading data from a QGIS project using a Python notebook script,
  • processing and analysis in a Jupyteo notebook Python script, especially using tools unavailable in QGIS, such as Pandas and Scikit-learn,
  • saving the results back to the QGIS project,
  • reading and presenting the obtained results both in the QGIS UI and in Jupyteo.

Solution Overview, Design Methods and Tools
In response to the problem posed, the author decided to create a Python library called JupyQgis, which can be used at the script level in Jupyteo (Fig. 2). The JupyQgis library provides an API for communication with QGIS via PyQGIS and integrates the functionality of both QGIS and the Jupyteo platform. The library was integrated into the Jupyteo platform's API, so it is available to all users without having to be installed separately on individual virtual machines.
The need for the JupyQgis library emerged during research based on EO data using the Jupyteo platform. Our team's projects repeatedly required GIS functionality and analyses performed in an external environment, most often QGIS. The problems encountered made it possible to formulate requirements for Jupyteo's cooperation with QGIS, and these requirements in turn made it possible to identify the individual functionalities to be implemented to achieve the assumed goal. Given the specificity of the Jupyteo platform, which relies mainly on Python scripts, the most universal form of the solution seemed to be a Python library integrated with the Jupyteo platform environment and providing the appropriate API.
In general terms, the methodology for developing JupyQgis can be presented as follows:
- Requirement specification: identification of problems to be solved and activities that can be automated.
There are also methods for processing and analyzing EO and GIS data that are not available in QGIS, such as merging data from Pandas tables with QGIS tables, or performing basic statistical analyses (e.g. correlation analysis, linear regression or Tukey's test) between selected GIS attributes, together with an illustration in a graph. The JupyQgis library works with any project created in QGIS version 2 or 3. The QGIS project, together with its layer files, must be placed on the Jupyteo server by uploading the project files, or it must be accessible at another network location that allows remote reading and writing. To illustrate the developed solution, selected functionalities of the JupyQgis library are presented and discussed below.
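As an illustration of such basic statistics between two attributes, a Pearson correlation coefficient and a least-squares line can be computed with Pandas and NumPy. This is a sketch, not JupyQgis code: the function name correlate_and_fit and the column names are hypothetical, and Tukey's test (available, for example, in statsmodels) is not shown.

```python
import numpy as np
import pandas as pd


def correlate_and_fit(df, x_col, y_col):
    """Pearson correlation and least-squares line y = slope * x + intercept."""
    pair = df[[x_col, y_col]].dropna()  # use only rows where both values exist
    r = pair[x_col].corr(pair[y_col])  # Pearson is the pandas default
    slope, intercept = np.polyfit(pair[x_col], pair[y_col], deg=1)
    return r, slope, intercept


# Toy values, invented for illustration: B = 2 * A + 1 exactly
example = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [3.0, 5.0, 7.0, 9.0]})
r, slope, intercept = correlate_and_fit(example, "A", "B")
# r, slope and intercept are approximately 1.0, 2.0 and 1.0
```

With a layer table loaded as a DataFrame, the same function could be applied to two attribute columns, and the points and fitted line drawn with Matplotlib.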
Working with JupyQgis begins by establishing a connection with the QGIS project by creating a JpQgis instance. The path and filename of the QGIS project must be passed as a string parameter to the constructor. The instance then gives access to all fields and methods of the class:
In[]:
from jupyQgis import *
jpq = jpQgis(<path to qgs or qgz project file>)
JupyQgis offers two ways of accessing QGIS data: 1) direct access to the QGIS project, 2) access through methods of the JpQgis class.

Direct Access to the QGIS Project
JpQgis allows access to a QgsProject instance through the JpQgis.project field:
In[]: qgs_project_instance = jpq.project
This object gives full access to the opened QGIS project and can be used with the PyQGIS methods described in the QGIS documentation [33]. For the purposes of this article, this mode is called standard access.

Access to Descriptive Data through JpQgis Class Methods
This access differs from standard access because the JpQgis class methods enable integration with Jupyteo and provide additional functionality. One of the goals behind the creation of this library, along with the integration with Jupyteo, was to simplify the syntax for the implementation of individual functionalities related to PyQGIS. For example, to list all layer names available in the project, one can use the listLayers() method:

In[]: jpq.listLayers()
Out[]: ['layer1Name', 'layer2Name', 'layer3Name']
The above code returns a Python list containing the names of all layers in the project. To get the same result with PyQGIS, including opening the QGIS project, the following code would be needed:

In[]:
from qgis.core import QgsProject
# Open the QGIS project
prj = QgsProject()
prj.read(<path to QGIS project file>)
# Build a list of layer names
layersTmp = []
for layer in prj.mapLayers().values():
    layersTmp.append(layer.name())
print(layersTmp)
With JupyQgis, both operations (opening the project and listing the layer names) take only two lines of code: the constructor call and listLayers(). The same holds for the other functions implemented in JupyQgis. For example, to display the attribute field names (metadata) of a selected layer, the getLayerFieldNames() method can be used:

In[]: jpq.getLayerFieldNames(<layer name>)
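The JupyQgis source is not reproduced here. Purely as an illustration of how a wrapper can hide the PyQGIS loop shown above, the following hypothetical class (ProjectWrapper is an invented name; the method names mirror those quoted in the text, but the bodies are a guess) relies only on QgsProject.mapLayers(), QgsProject.mapLayersByName() and the layers' fields(), and takes an already-read project object.

```python
class ProjectWrapper:
    """Hypothetical thin wrapper around an already-read QgsProject.

    Method names mirror those quoted in the text; the bodies are an
    illustrative guess, not the JupyQgis source code.
    """

    def __init__(self, project):
        # `project` is expected to behave like qgis.core.QgsProject.
        self.project = project

    def listLayers(self):
        """Names of all layers registered in the project."""
        return [layer.name() for layer in self.project.mapLayers().values()]

    def getLayerFieldNames(self, layer_name):
        """Attribute (field) names of the first layer with the given name."""
        layers = self.project.mapLayersByName(layer_name)
        if not layers:
            raise KeyError(f"no layer named {layer_name!r}")
        return [field.name() for field in layers[0].fields()]
```

With QGIS available, it could be used as prj = QgsProject(); prj.read(path); ProjectWrapper(prj).listLayers(). Taking the project as a constructor argument also makes the wrapper easy to test with stand-in objects.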
Access to the data contained in a layer's table using JpQgis methods is realized via Pandas. This significantly expands the scope of data analysis and processing, because the Pandas library has extensive functionality in this area [34]. Moreover, Pandas is very popular and fast, and many other data science libraries, such as Scikit-learn, NumPy, Matplotlib and PySpark, are compatible with it, which greatly facilitates the integration and exchange of data between different environments.
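One plausible way to produce such a DataFrame from a vector layer is sketched below. The function name layer_to_dataframe is hypothetical, and this is not necessarily how JupyQgis implements its table access; it uses only QgsVectorLayer.fields(), QgsVectorLayer.getFeatures() and QgsFeature.attributes().

```python
import pandas as pd


def layer_to_dataframe(layer):
    """Copy the attribute table of a vector layer into a pandas DataFrame.

    `layer` is expected to behave like qgis.core.QgsVectorLayer: fields()
    yields objects with name(), and getFeatures() yields features whose
    attributes() returns the values in field order.
    """
    columns = [field.name() for field in layer.fields()]
    rows = [feature.attributes() for feature in layer.getFeatures()]
    # NULL attributes may come back as QVariant objects on some QGIS
    # versions; they would need converting to None/NaN before numeric work.
    return pd.DataFrame(rows, columns=columns)
```

Geometries are deliberately left out here; only the descriptive (attribute) data are copied.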
For example, let us assume that there is a QGIS layer named 'wojewWGS84'. To get its table data as a Pandas DataFrame, the getLayerTableData() method of the JpQgis class can be used:
In[]:
import pandas as pd
df = jpq.getLayerTableData('wojewWGS84')
This table can then be manipulated like any other Pandas DataFrame, for example by selecting particular records and attributes with standard indexing such as .loc and .iloc.
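Typical selections use .iloc for positions and .loc for labels and boolean conditions. A minimal sketch, using a small stand-in DataFrame whose column names and values are invented (the actual 'wojewWGS84' attributes are not listed here):

```python
import pandas as pd

# Stand-in for the DataFrame returned by getLayerTableData('wojewWGS84');
# the column names and values here are invented for illustration only.
df = pd.DataFrame(
    {
        "NAME": ["unit_a", "unit_b", "unit_c"],
        "CODE": ["02", "04", "06"],
        "AREA": [10.0, 20.0, 30.0],
    }
)

# One record by position, with all its attributes
first_record = df.iloc[0]

# Selected attributes for the records that satisfy a condition
subset = df.loc[df["AREA"] > 15.0, ["NAME", "CODE"]]
```

Because the result is an ordinary DataFrame, the same selections can feed directly into Scikit-learn, Matplotlib or other Pandas-compatible tools.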

Presentation of Graphic Data in Jupyteo Web-map Browser
The presentation of graphic data is one of the essential functions of GIS systems. The Jupyteo platform is designed to process spatial data and has therefore been equipped with a map viewer. As Jupyteo is a web-based solution, its map viewer uses the well-known and popular Leaflet library. However, Leaflet is not compatible with QGIS projects and, compared to QGIS, supports only a few formats. Apart from its internal vector format, these include GeoJSON, SVG, JPG, PNG and WMS. When working with QGIS, the user can choose among many more formats, and implementing support for all of them in Jupyteo could prove troublesome and not cost-effective. For this reason, a method was developed to automatically convert graphic data to a Leaflet-supported format. This functionality is currently in the testing phase and works only with vector layers. When a QGIS layer is displayed in the Jupyteo map viewer, it is automatically converted to the GeoJSON format and then passed to the map view (Fig. 4).
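A minimal sketch of such a conversion, assuming a layer object that behaves like QgsVectorLayer and using QgsGeometry.asJson() (QGIS 3) to serialize each geometry. The function name features_to_geojson is hypothetical; QGIS also offers QgsJsonExporter for the same purpose, and the JupyQgis converter may work differently.

```python
import json


def features_to_geojson(layer):
    """Build a GeoJSON FeatureCollection string from a vector layer.

    Assumes `layer` behaves like qgis.core.QgsVectorLayer and that each
    feature's geometry().asJson() returns a GeoJSON geometry string.
    Leaflet expects WGS84 (EPSG:4326) coordinates, so a layer in another
    CRS would need reprojecting first (not shown).
    """
    names = [field.name() for field in layer.fields()]
    features = []
    for feature in layer.getFeatures():
        geom_json = feature.geometry().asJson()
        features.append(
            {
                "type": "Feature",
                # An empty string or "null" is treated as a missing geometry
                "geometry": json.loads(geom_json) if geom_json else None,
                "properties": dict(zip(names, feature.attributes())),
            }
        )
    return json.dumps({"type": "FeatureCollection", "features": features})
```

The resulting string can be passed to Leaflet's L.geoJSON() on the browser side. Attribute values that are not JSON-serializable (for example Qt date objects) would need converting first.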
However, after this conversion, the GeoJSON data do not contain styling information. Additional information about the layer styles therefore has to be acquired during the conversion process. This information is stored differently depending on the styling method QGIS uses for a particular layer. If the layer has a single style for all features, the situation is quite simple: it is only necessary to read the color or line style information and save it along with the GeoJSON data. However, if a classification has been applied to a layer, e.g. by individual attribute values or by ranges of values, each feature may have a different style. Individual styling in QGIS is performed by algorithms, each adapted to the specific styling method chosen for the layer. For this purpose, QGIS uses a renderer class that is assigned individually to each layer according to its styling settings, which may change at any time while working with the program. This is a very flexible mechanism from the point of view of PyQGIS API users. However, styling information is not stored permanently with the graphic data, because layer styling is applied on the fly. For this reason, to obtain styling information for Jupyteo, the individual layer settings had to be read via the PyQGIS API and saved as an additional GeoJSON attribute column. This was necessary because the Leaflet library has no equivalent of the QGIS styling mechanism. As a result, the layer displayed in Jupyteo looks practically the same as in QGIS (Fig. 5).
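Once a renderer's classes have been read, the per-feature styling step can be sketched in plain Python. In PyQGIS, the categories of a QgsCategorizedSymbolRenderer are available through categories() (each with value() and symbol()), the classes of a QgsGraduatedSymbolRenderer through ranges() (each with lowerValue(), upperValue() and symbol()), and symbol().color().name() yields a hex color. The functions below are hypothetical and assume those values have already been extracted into a dictionary or a list of tuples; the extraction step and the exact JupyQgis behavior are not shown.

```python
def style_for_value(value, single=None, categories=None, ranges=None):
    """Return a color for one feature, mirroring three QGIS renderer types.

    single:     one color for every feature (single-symbol renderer)
    categories: {attribute value: color} (categorized renderer)
    ranges:     [(lower, upper, color), ...] (graduated renderer)
    """
    if categories is not None:
        return categories.get(value)
    if ranges is not None:
        if value is None:
            return None
        # Simplified boundary handling: a value on a shared boundary
        # falls into the first matching range.
        for lower, upper, color in ranges:
            if lower <= value <= upper:
                return color
        return None
    return single


def add_style_column(features, attribute, column="style", **renderer):
    """Add a color property to each GeoJSON feature dict, in place."""
    for feature in features:
        value = feature["properties"].get(attribute)
        feature["properties"][column] = style_for_value(value, **renderer)
    return features
```

On the Leaflet side, the added property could then be used in the style option of L.geoJSON, for example by returning {color: feature.properties.style} for each feature.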

Application of the JupyQgis Library in Data Processing
The JupyQgis library was created to add GIS functionality to the Jupyteo IDE platform. As a result, the same spatial data sets can be processed with both tools in one place. The user has the following functionalities at their disposal:
- downloading data from the QGIS project,
- processing the data on the Jupyteo side,
- calling tools for processing GIS spatial data in Jupyteo,
- saving processing results back to the QGIS project.
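Saving results back can be sketched with the standard QgsVectorLayer editing calls startEditing(), changeAttributeValue(), commitChanges() and rollBack(). The function write_back is hypothetical; it assumes the target field already exists in the layer (adding one, e.g. through the data provider, is not shown) and that results are keyed by an attribute such as KOD_TERYT.

```python
def write_back(layer, key_field, target_field, values):
    """Write values from a {key: value} mapping into an existing layer field.

    `layer` is expected to behave like qgis.core.QgsVectorLayer.
    Returns the number of features updated.
    """
    index = layer.fields().indexOf(target_field)
    if index < 0:
        raise KeyError(f"field {target_field!r} not found")
    layer.startEditing()
    updated = 0
    for feature in layer.getFeatures():
        key = feature[key_field]  # attribute lookup by field name
        if key in values:
            layer.changeAttributeValue(feature.id(), index, values[key])
            updated += 1
    if not layer.commitChanges():
        layer.rollBack()  # leave the layer out of edit mode on failure
        raise RuntimeError("QGIS rejected the edits")
    return updated
```

A result column in a DataFrame could be turned into the required mapping with dict(zip(df["KOD_TERYT"], df[column])). The edits are committed to the layer's data source; the project file itself only needs saving (QgsProject.write()) if project settings change.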
The described library is intended to make work easier on projects that require processing both EO data from repositories available in Jupyteo and GIS analytical data and tools. It is also worth considering how the JupyQgis library can be applied in spatial data processing. A sample test case requiring GIS vector data and Python data science tools was developed for this purpose.

Sample Test Case: Spatial Data Processing
The test case consisted of forecasting the average price per square meter of residential real estate for the following statistical year and presenting the results in a thematic map. Since the data at the author's disposal covered the years up to 2019, the forecasts covered the year 2020. The data were obtained from the current databases of the Polish Central Statistical Office [35] and from the author's own studies. The analysis is statistical and illustrative in nature and should not be treated as a method of property valuation, e.g. for market purposes. Its main purpose was to prepare a test case for the developed JupyQgis library.
Data sets constituting input data for the analysis included:
- own data sources in the form of the QGIS project:
Real estate appraisal is not the subject of this article, but the author decided to provide some details on this issue because of the test case described. The value of a property depends on many characteristics, whose effect on the price varies with the national economic and market conditions in which the property is located [36]. When valuing a residential property, the following factors are taken into account: access to roads and transport, distance from the city center, access to power, water and sewage networks, proximity to green and recreational areas, prices of similar properties and many others [37,38].
Since the publicly available statistical data covering the entire territory of Poland are not sufficiently detailed, several features were selected that are available for the whole country and broadly reflect the features customarily used in property valuation.
For the purposes of this paper, it was assumed that population density and the number of residential buildings are associated with the availability of other characteristic features of cities, such as a denser road network or an extensive water supply and sewage network. They can therefore be treated as a summary generalization of the influence of urban development features: where population density and the number of buildings are higher, the area can be treated as more industrialized, and vice versa. Analyses presented by financial institutions related to the real estate market also show a correlation between salaries and real estate values in Poland [39,40]. In recent years, real estate values have risen along with salaries and their level in individual regions. This feature is therefore also used in this article as a factor affecting the value of a property.
Time is another important factor: in a country such as Poland, real estate values have been observed to increase from year to year. Prices also differ between regions, being higher in some and lower in others. This is due, for example, to the level of industrialization (large cities) or the development of tourism (sea, lakes, forests or mountains). Price differentiation due to these factors is permanent and tied to a specific location. The location of each district is expressed by the code value in the KOD_TERYT attribute, which was therefore also included in the forecast.
The forecasting algorithm consisted of several stages (Fig. 6). Work started with the preparation and compilation of data. Data from both QGIS and the Central Statistical Office were loaded into one Jupyteo script, where they were standardized and processed. For the statistical data, it was necessary to select only the required columns, complete the province identifiers, standardize column names, change the decimal separator for numerical values and delete records with missing values.
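The cleaning steps listed above can be sketched with pandas. This is a minimal illustration, not the author's script: the raw column names (`Kod`, `Nazwa`, `Wartosc`, `Rok`), the rename map and the function name are hypothetical, and reading "complete the province identifiers" as zero-padding 4-digit district TERYT codes that lost their leading zero (e.g. `201` becoming `0201`) is an assumption.

```python
import pandas as pd


def clean_statistical_table(raw: pd.DataFrame, column_map: dict) -> pd.DataFrame:
    """Prepare one statistical table for merging (hypothetical helper).

    column_map maps raw column names to standardized ones and must map
    one raw column to "KOD_TERYT".
    """
    # 1. Keep only the required columns and standardize their names.
    df = raw[list(column_map)].rename(columns=column_map)

    # 2. Complete identifiers: restore the leading zero of 4-digit district
    #    TERYT codes (assumes codes were read as integers, e.g. 201 -> "0201").
    df["KOD_TERYT"] = df["KOD_TERYT"].astype(str).str.zfill(4)

    # 3. Replace the decimal comma with a dot in non-numeric columns and
    #    convert them to numbers; unparsable values become NaN.
    for col in df.columns:
        if col == "KOD_TERYT" or pd.api.types.is_numeric_dtype(df[col]):
            continue
        text = df[col].astype(str).str.replace(",", ".", regex=False)
        df[col] = pd.to_numeric(text, errors="coerce")

    # 4. Delete records with missing values.
    return df.dropna().reset_index(drop=True)


# Toy input in the assumed raw layout (not real statistical data).
raw = pd.DataFrame({
    "Kod": [201, 1261, 1262],
    "Nazwa": ["A", "B", "C"],
    "Wartosc": ["5412,5", "6100,0", None],
    "Rok": [2019, 2019, 2019],
})
clean = clean_statistical_table(
    raw, {"Kod": "KOD_TERYT", "Wartosc": "AVG_SALARY", "Rok": "YEAR"}
)
```

Converting with `errors="coerce"` means malformed numbers are treated like missing values and removed in the last step, which matches the "delete records with missing values" rule without a separate validation pass.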
The individual datasets were then combined into one table. The join used the TERYT administrative unit identifier, which is used in Poland and is common to all records. In the described datasets this value is stored in the field "KOD_TERYT" and identifies individual districts. The merge produced a dataset with the structure shown in Figure 7.
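The join can be sketched as a chain of pandas merges. Two assumptions go beyond the text: because the combined table is broken down by years, the sketch joins on both `KOD_TERYT` and `YEAR`, and it uses inner joins, so a district-year missing from any source is dropped. The helper name and the toy values are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_sources(frames, keys=("KOD_TERYT", "YEAR")):
    """Inner-join per-district, per-year tables on the shared keys (hypothetical helper)."""
    return reduce(
        lambda left, right: left.merge(right, on=list(keys), how="inner"), frames
    )


# Toy tables, one attribute each (illustrative values only).
salary = pd.DataFrame({"KOD_TERYT": ["0201", "0202"], "YEAR": [2019, 2019],
                       "AVG_SALARY": [5000.0, 4800.0]})
density = pd.DataFrame({"KOD_TERYT": ["0201", "0202"], "YEAR": [2019, 2019],
                        "POP_DEN_KM": [120.0, 85.0]})
prices = pd.DataFrame({"KOD_TERYT": ["0201"], "YEAR": [2019],
                       "PRICE_RESI": [4100.0]})

combined = merge_sources([salary, density, prices])
```

An inner join keeps only complete rows, which suits model training; a left join on the QGIS district layer would instead keep every district for mapping, with gaps shown as missing values.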
The resulting dataset contains all attributes selected for forecasting the value of a square meter of residential property, broken down by years in individual districts, including:
- AVG_SALARY: average salaries,
- BUILDINGS: number of apartment buildings per district,
- POP_DEN_KM: population density per square kilometer in a district,
- YEAR: the year for which a given record was compiled in the table,
- KOD_TERYT: territorial unit identifier (district); thanks to this attribute, the property location in the country was taken into account in the prediction,
- PRICE_RESI: price per square meter of property in the district.
The dataset was divided into training and testing parts. The model was trained with the Scikit-Learn library using a random forest algorithm. Model validation showed a mean absolute error (MAE) of PLN 366 and an accuracy of 0.96. The resulting model forecasts the price per square meter (PRICE_RESI) from the following attributes: AVG_SALARY, BUILDINGS, POP_DEN_KM, YEAR and KOD_TERYT. Because the input attributes were available only for 2019 and earlier, the model on its own could only produce PRICE_RESI values for that period. To obtain values for 2020, it was decided to supplement the individual input attributes with values predicted for 2020. For this task, linear regression models were built for the time series of individual attributes in each district. This operation concerned the attributes that change over time, i.e. AVG_SALARY, BUILDINGS and POP_DEN_KM.
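The training and validation step can be sketched with scikit-learn. Assumptions not stated in the text: an 80/20 random split, default hyperparameters, KOD_TERYT cast to a number as a crude location encoding, and "accuracy" read as `RandomForestRegressor.score`, i.e. the coefficient of determination R². The helper name and the synthetic data are hypothetical, so the metrics will not reproduce the values reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

FEATURES = ["AVG_SALARY", "BUILDINGS", "POP_DEN_KM", "YEAR", "KOD_TERYT"]
TARGET = "PRICE_RESI"


def train_price_model(data: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Fit a random forest; return (model, test MAE, test R^2)."""
    X = data[FEATURES].copy()
    # Numeric TERYT codes let the trees split on location (assumed encoding).
    X["KOD_TERYT"] = pd.to_numeric(X["KOD_TERYT"])
    X = X.astype(float)
    y = data[TARGET].astype(float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(random_state=seed)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    r2 = model.score(X_test, y_test)  # coefficient of determination
    return model, mae, r2


# Synthetic toy data: 30 districts x 5 years (not Central Statistical Office data).
rng = np.random.default_rng(0)
codes = [f"{i:04d}" for i in range(201, 231)]
toy = pd.DataFrame([(c, yr) for c in codes for yr in range(2015, 2020)],
                   columns=["KOD_TERYT", "YEAR"])
toy["AVG_SALARY"] = rng.uniform(4000, 7000, len(toy))
toy["BUILDINGS"] = rng.integers(5000, 50000, len(toy))
toy["POP_DEN_KM"] = rng.uniform(30, 3000, len(toy))
toy["PRICE_RESI"] = (0.8 * toy["AVG_SALARY"] + 0.5 * toy["POP_DEN_KM"]
                     + 50 * (toy["YEAR"] - 2015))

model, mae, r2 = train_price_model(toy)
```

A random split mixes years between training and test parts; holding out the latest year (training on 2015-2018, testing on 2019) would be a stricter check of how the model extrapolates to a future year.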
For example, the predicted value of AVG_SALARY for 2020 for the district "Powiat tarnowski" was PLN 5412 (Fig. 8); the time series used to determine it was built from data for 2015-2019. The values of the other input attributes were predicted automatically in the same way for each district. The data supplemented with these time-series forecasts were then used to predict PRICE_RESI values for 2020 with the random forest model described above. The resulting dataset was transferred to the QGIS project using the JupyQgis library, and a thematic map was prepared to illustrate the distribution of forecast values across districts, which was the ultimate goal of the test case.
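The per-district extrapolation can be sketched as an ordinary least-squares line fitted to each district's time series and evaluated at the target year. Here `np.polyfit` with degree 1 stands in for the linear regression models mentioned in the text (it fits the same model as scikit-learn's `LinearRegression` with one feature). The helper names and the toy history are hypothetical.

```python
import numpy as np
import pandas as pd

TREND_COLUMNS = ["AVG_SALARY", "BUILDINGS", "POP_DEN_KM"]


def extrapolate_attribute(data, column, target_year, key="KOD_TERYT"):
    """Fit value = slope * YEAR + intercept per district, evaluate at target_year."""
    rows = []
    for district, group in data.groupby(key):
        slope, intercept = np.polyfit(
            group["YEAR"].astype(float), group[column].astype(float), deg=1
        )
        rows.append({key: district, "YEAR": target_year,
                     column: slope * target_year + intercept})
    return pd.DataFrame(rows)


def build_forecast_inputs(data, target_year):
    """Combine the extrapolated time-varying attributes into one table."""
    result = None
    for column in TREND_COLUMNS:
        part = extrapolate_attribute(data, column, target_year)
        result = part if result is None else result.merge(part, on=["KOD_TERYT", "YEAR"])
    return result


# Toy history for two districts, 2015-2019 (illustrative values only).
history = pd.DataFrame({
    "KOD_TERYT": ["0201"] * 5 + ["0202"] * 5,
    "YEAR": list(range(2015, 2020)) * 2,
    "AVG_SALARY": [4000, 4200, 4400, 4600, 4800, 3000, 3100, 3200, 3300, 3400],
    "BUILDINGS": [100, 102, 104, 106, 108, 50, 50, 50, 50, 50],
    "POP_DEN_KM": [120, 121, 122, 123, 124, 80, 79, 78, 77, 76],
})
inputs_2020 = build_forecast_inputs(history, 2020)
```

The resulting table holds every feature except the target. A trained random forest's `predict` method expects the same feature columns in the same order as during training, so the columns would be reordered before predicting.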

Example of Data Processing with JupyQgis
To illustrate the functionality, use and role of the JupyQgis library in data processing, selected essential fragments of the Jupyteo/Python code implementing the described algorithm are presented below.

Discussion
The work carried out in this paper suggests that the most convenient way to combine the functionality of the QGIS system with that of the Jupyteo platform is to create a library providing a straightforward, easy-to-use API. The created API significantly facilitates cooperation between both environments, enabling a QGIS project to be used directly from a Jupyteo script. The interoperability is two-way: the user can read data from an existing QGIS project, process it with tools commonly used in data science and then save the results back to the QGIS project. Using the JupyQgis library shortens and simplifies the source code of Python scripts compared with using the original PyQGIS API. The resulting tool can easily be used and transferred as a software component of any Python- or IPython-based platform, extending its functionality. Combining the functionality of a GIS such as QGIS with an online platform such as Jupyteo creates additional possibilities for analyzing and processing GIS data, especially through:
- online cloud data processing,
- modern and very efficient tools for working with big data, such as Pandas, PySpark and others,
- machine learning tools such as Scikit-learn,
- access to data repositories offered by platforms such as Jupyteo (e.g., EO data),
- parallel processing (available in Jupyteo),
- easy integration of data from multiple sources,
- new solutions and tools that are not yet available, made possible by moving a large part of the processing into a constantly developing language such as Python.
Compared with existing solutions that could have been adapted, JupyQgis stands out for its functionality. Using the library involves calling the necessary objects and methods, which are fully integrated with Jupyteo. The existing projects did not meet expectations because they did not fully cooperate with the Jupyteo platform. Using them would have required extensive scripts that, apart from the main functionality, would also have to implement cooperation with QGIS.
One of the more difficult problems to solve was presenting spatial data as a map in the Jupyteo map viewer. The Leaflet library, the map component used in Jupyteo, uses only one vector format, GeoJSON, so all formats derived from QGIS must be converted to it to be displayed correctly. Additionally, graphic styles are incompatible between QGIS and Leaflet; this problem should be given special attention in future work. At present, the JupyQgis library has not yet been thoroughly tested. QGIS is a complex system that supports many data formats, whereas the design study and test case were based on vector spatial data. The next step will be to develop and test cooperation with raster datasets, where converting graphic data formats may also cause problems.
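The GeoJSON conversion described above can be illustrated with a small pure-Python sketch that joins forecast values to district geometries and serializes them as a GeoJSON FeatureCollection (RFC 7946), the kind of document Leaflet's `L.geoJSON` layer consumes. It is not the JupyQgis implementation: the function name and input shapes (district code to geometry dict, district code to value) are hypothetical, and it assumes the geometries were already exported from QGIS and reprojected to WGS 84 longitude/latitude, as RFC 7946 requires, whereas QGIS layers may use a projected CRS. Styling is not handled, which is the incompatibility noted above.

```python
import json


def to_feature_collection(geometries, values, key="KOD_TERYT", value_field="PRICE_RESI"):
    """Build a GeoJSON FeatureCollection string for a web map viewer.

    geometries: dict mapping a district code to a GeoJSON geometry dict
                (coordinates assumed to be WGS 84 longitude/latitude).
    values:     dict mapping a district code to the value to display.
    """
    features = []
    for code, geometry in geometries.items():
        if code not in values:
            continue  # districts without a forecast are left off the map
        features.append({
            "type": "Feature",
            "geometry": geometry,
            "properties": {key: code, value_field: float(values[code])},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


# Points keep the toy example short; real districts would be (Multi)Polygons.
geometries = {
    "0201": {"type": "Point", "coordinates": [16.0, 51.2]},
    "0202": {"type": "Point", "coordinates": [16.5, 50.8]},
}
geojson_text = to_feature_collection(geometries, {"0201": 4250.0})
```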

Conclusions
The primary purpose of this work was to determine the most convenient way to combine the functionality of the GIS (QGIS) system with the functionality of the Jupyteo platform in one tool.
During the work, it was found that the existing solutions aimed at QGIS interoperation with Jupyter did not meet most of the assumed requirements. This applies primarily to convenient data exchange with data science tools (such as Pandas or Scikit-learn), visualization in a web map viewer, and direct, two-way access to the QGIS project (reading and writing data). Eliminating these deficiencies also improves the readability of the created scripts and simplifies their source code, which considerably simplifies the work.
To achieve the assumed goal, a practical attempt was made to solve the problem of interoperability between QGIS and Jupyteo. The most suitable solution was found to be a proprietary library providing an API for collaboration between both environments. The created library meets expectations and enables efficient cooperation. This is supported by the practical example of data processing presented in this article, in which data from a QGIS project were imported into a Jupyteo script, easily combined with external data sources, and used to forecast real estate values with machine learning algorithms. The forecast results were then transferred from Jupyteo to QGIS, where a thematic map was created, and the map was displayed again in Jupyteo without problems. The described processing chain, combining QGIS and Jupyteo in one process, was thus completed end to end. This shows that the JupyQgis library fulfils its role and can become another tool used in data science. It makes it possible to include data from QGIS in analyses in Jupyteo in real time, which significantly extends the existing functionality of both environments.
The problem addressed in this paper has only been partially covered in the literature. Existing studies offered some solutions, but these were not satisfactory. The current study identified the problems that must be faced when combining the functionality of an extensive desktop QGIS system with an online platform such as Jupyteo.
The current study mainly considers loading and saving vector data, which narrows the range of problems examined; other problems may still arise. Future work should address the exchange of raster data and cooperation with QGIS analytical tools, paying particular attention to tools that save their results to the QGIS project in real time.
The JupyQgis library will be published under an open license, making the author's contribution to the development of GIS-based spatial data processing tools publicly available.