The Application of Regression Analysis for Estimating the Market Value of Commercial Real Estate

The transaction price of a land property with commercial buildings depends on both its quantitative and qualitative attributes. Quantitative attributes include surface areas of plots of land and usable floor spaces of premises and buildings with various intended purposes, as well as values of rents. Qualitative attributes are represented by the global attributes of these properties. In the analysis of the land property market with commercial buildings, all pairs that relate a transaction price to individual attributes are considered. The market value prediction is based on multiple regression analysis for a two-dimensional random variable, represented by the price and the predetermined attribute. The final market value of the property being valued is calculated as the weighted average of the market values predicted for each attribute. This research paper presents the procedure for determining the market value of land with commercial buildings, which falls within the method of statistical analysis of the market. The derived formulas and substantively justified algorithms may be the basis for market analysis and estimation of the market value of such land. This procedure has been thoroughly verified using two practical numerical examples.


Formulating a Research Problem
Land properties developed with building structures intended for commercial purposes, services or production, will be referred to further in this paper as land with commercial buildings. The usable floor space of such properties may include: surface areas of residential, office and commercial premises, surface areas of workshop premises and buildings, surface areas of production or warehouse buildings and facilities, as well as surface areas of ramps and storage yards. An important element of this real estate is also the surface area of their land. The main carrier of the market value of such real estate is the usable floor space of premises and commercial buildings.
All legal regulations, and the methods of valuing built-up properties recommended therein and discussed by Czaja and Parzych [1], always refer to unit transaction prices, which result from dividing the transaction price for the entire property by the usable floor space of the premises or the building of homogeneous purpose.
Such a unit transaction price, representing a single type of usable floor space, forms the basis for market analysis and property valuation.
It is not possible to determine one representative usable floor space for land developed with commercial buildings of various intended purposes, because each type of specific surface generates a different value of the unit market price, and thus, a different market value.
Real estate transactions involving the sale of land built-up with commercial buildings always present the total transaction price, which includes the price of all types of premises and buildings, as well as the price of a plot of land. Separation of the total transaction price into the prices of individual commercial premises and buildings, and the price of the land, is possible using a generalized market analysis based on the estimation of parametric models.
Research on the relationship between transaction prices of commercial real estate and their physical, location, rent and economic attributes has been the subject of numerous scientific papers, including those by d'Amato [2] and d'Amato and Amoruso [3].
Fehribach et al. [4] studied the volatility of industrial property prices depending on their 11 physical, location, rent and economic attributes, in order to establish price indices for them. Saderion et al. [5] demonstrated the use of hedonic models to estimate the value of commercial real estate, which are based on the characteristic attributes of these properties and net operating profit. In the final part of this research paper, its authors proposed that commercial real estate analysts should exhibit great inquisitiveness as far as the selection and estimation of unit prices and capitalization rates for various types of real estate are concerned. Parzych and Czaja [6] provided the principles for parameter estimation for various elements of commercial real estate, using the weighted least squares method (WLS). The estimated parameters are unit market values of surface areas of premises having various intended purposes as well as coefficients for distinguished attributes that create the market value.
The developed algorithms were verified on numerous practical examples of commercial property valuation. Epley [7] presented the principles of using the Automatic Valuation Model (AVM) to estimate the market value of a property. The author formulated the criteria which should be met for the Automatic Valuation Model to yield correct results of the analysis of market trends.
Market analysis and valuation of land with commercial buildings, based on total transaction prices, can also be performed using multiple two-dimensional regression models, for which it is necessary to define appropriate quantitative and qualitative attributes and to formulate the appropriate algorithm. As Bruneau and Cherfouh [8] and McCartney [9] claim in their papers, long and short-term analysis of both commercial real estate prices and rents is also an important element of theoretical considerations.
This research paper presents the principle of determining appropriate quantitative and qualitative attributes for particular types of premises and buildings as well as for a plot of land, which will be directly related to the transaction price of such real properties. For such attributes, market analysis and valuation algorithms for land with commercial buildings are presented, which refer directly to the total transaction prices of similar properties.

Development of Quantitative and Qualitative Attributes to Describe Volatility of Transaction Prices of Commercial Real Estate
A set of sale transactions of similar real estate that may include usable floor spaces for housing, commerce, services or production, is understood as the market of land with commercial buildings, subject to their location being in the same city or in neighboring communes. The sale of such real estate always consists of a total transaction price, including a plot of land and all buildings as components of this land.
The transaction price of a land property with commercial buildings depends on its quantitative as well as qualitative attributes. Quantitative attributes include surface areas of plots of land and usable floor spaces of premises and buildings with various intended purposes, and they may also comprise values of rents and capitalization rates. Qualitative attributes are represented by global attributes of these properties, such as their location, technical condition, or their standard.
If lands with commercial buildings are located in different districts of the city or in different neighboring communes, then the attribute "location" should be formulated for them, according to the following scale: 4 - very advantageous, 3 - advantageous, 2 - average, 1 - disadvantageous.
The market value of the components of a land property is, to a large extent, dependent on the technical condition of the buildings. If the considered land properties include components constructed in a different time period or using building materials of different durability, then the attribute "technical condition of the building" must be defined, according to the following scale: 4 - very good, 3 - good, 2 - satisfactory, 1 - unsatisfactory.
Transaction prices of the analyzed properties in the database for market analysis should be adjusted for the date of the analysis (valuation). In order to determine the change rate of transaction prices in time, the best solution is to take commercial premises with a homogeneous intended purpose or land properties earmarked for commercial development. Such groups of real estate allow for specifying unit transaction prices, which will constitute the basis for determining the unit price change rate within one month.
Adjustment of total transaction prices of land with commercial buildings to the analysis date should be performed according to the following formula:

$$ C_{TK} = C_T \left[ 1 + B \left( t_K - t_T \right) \right] $$

where: $C_{TK}$ - transaction price adjusted to the analysis date, $C_T$ - transaction price obtained in the sale, $B$ - unit price change rate within 1 month, $t_K$ - number of months from the first transaction to the adjustment (analysis) date, $t_T$ - number of months from the first transaction to the considered transaction.
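The adjustment can be illustrated with a short Python sketch. It assumes a linear trend of the form C_TK = C_T [1 + B (t_K - t_T)], which matches the symbols defined above; the function name `adjust_price` and the sample figures are hypothetical and serve only as an illustration.

```python
def adjust_price(c_t: float, b: float, t_k: float, t_t: float) -> float:
    """Adjust a transaction price to the analysis date.

    Assumed linear trend: C_TK = C_T * (1 + B * (t_K - t_T)),
    where b is the monthly unit price change rate and t_k, t_t are
    counted in months from the first transaction in the database.
    """
    return c_t * (1.0 + b * (t_k - t_t))


# Hypothetical figures: a price of PLN 1,000,000 recorded in month 2,
# analysis date in month 6, monthly change rate of 0.5 %.
adjusted = adjust_price(1_000_000.0, 0.005, t_k=6, t_t=2)
print(round(adjusted, 2))
```

A transaction concluded on the analysis date itself (t_T = t_K) is left unchanged by this formula.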
Having taken the above assumptions into account, the database of land with commercial buildings for market analysis should be presented in the appropriate table, the general form of which is illustrated in Table 1.
Based on the exemplary real estate database contained in Table 1, seven pairs can be created, each associating the transaction price with one of the seven attributes. The sizes of these pairs differ; therefore, the analyzed attributes have different representativeness and describe the volatility of transaction prices with varying probability. All attributes with a minimum size of 2 will be used for the market analysis. Table 1 demonstrates that the pairs price - land, price - warehouse and price - location are represented by six properties, while the pairs price - office and price - commerce are represented by five properties. The pairs price - flat and price - production are represented by three properties.
The market analysis of land with commercial buildings will consider all the pairs that associate transaction price with individual attributes. These pairs will represent a two-dimensional random variable, in which the dependent variable is the transaction price, and the independent variable determines the considered attribute. For each analyzed pair as a two-dimensional random variable, it is necessary to specify the characteristic parameters which include: the average value and standard deviation for both analyzed variables, as well as the total correlation coefficient. Due to the different size of the studied pairs, the probability of representativeness of individual attributes should be taken into account, the value of which will be determined as the ratio of the number of elements in the analyzed attribute and the number of all real properties in the database. The value of this probability should be taken into account when putting together the percentages of individual attributes contributing to the market value of the entire property.
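The characteristic parameters of a single pair can be computed as in the following Python sketch. The function name, the use of `None` for a property that lacks the attribute, the divisor n_j in the standard deviations and the sample data are all assumptions made for illustration; they are not taken from the tables of this paper.

```python
import math


def pair_statistics(attribute, prices):
    """Parameters of one (attribute, price) pair.

    `None` marks a property without the attribute; such rows are skipped,
    so the pair may be smaller than the whole database.
    """
    pairs = [(a, c) for a, c in zip(attribute, prices) if a is not None]
    n_j, n = len(pairs), len(prices)
    mean_a = sum(a for a, _ in pairs) / n_j
    mean_c = sum(c for _, c in pairs) / n_j
    # Standard deviations with divisor n_j (an assumption of this sketch).
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a, _ in pairs) / n_j)
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for _, c in pairs) / n_j)
    cov = sum((a - mean_a) * (c - mean_c) for a, c in pairs) / n_j
    return {
        "n_j": n_j,
        "mean_a": mean_a,
        "mean_c": mean_c,
        "sd_a": sd_a,
        "sd_c": sd_c,
        "r": cov / (sd_a * sd_c),  # total correlation coefficient
        "P": n_j / n,              # probability of representativeness
    }


# Hypothetical database of six properties (prices in million PLN, areas in m2).
prices = [2.0, 3.1, 4.2, 2.9, 5.0, 3.8]
office = [100.0, None, 250.0, 150.0, 300.0, 200.0]
stats = pair_statistics(office, prices)
print(stats["n_j"], round(stats["P"], 3), round(stats["r"], 3))
```

In this example the pair price - office is represented by five of the six properties, so its probability of representativeness is 5/6.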
Based on the performed market analysis, according to the principles set out above, the market value of any property can be predicted in each pair considered, i.e. in any two-dimensional random variable. The regression analysis for the two-dimensional random variable represented by the price and the determined attribute is used for this forecast. The ultimate market value of the analyzed property is a weighted average of the market values forecast for each attribute. The formulas for determining these weights are derived using the analysis of variance and probability theory.
To implement the above-mentioned principles of market analysis and real estate valuation, formulas are developed and substantively justified for the estimation of the parameters of a two-dimensional random variable and for the prediction of a dependent variable and its inaccuracy. All theoretical considerations constitute the subject of the next part of the research paper, and their practical illustration, in the form of numerical examples, is presented in the last part of this study.

Correlation and Regression of Two-dimensional Random Variable with Respect to Market Analysis and Property Valuation
In the statistical analysis of the market, the price is a dependent variable, whereas the specified property attribute represents an independent variable. Relationships between the price and the individual attributes of real estate can be described using independent models of a two-dimensional random variable.
In order to establish a regression model for two random variables, which are represented by the attribute a and by the transaction price C, appropriate scales must be determined for them. The graphic image of these two variables in the orthogonal coordinate system is called the correlation diagram. Based on the correlation diagram, it is possible to initially conclude what the relationship between the two variables is.
The parameter defining the mutual dependence of the random variables is the covariance between a variable a and a variable C, which is denoted by cov[a,C]. Depending on the units of the analyzed random variables, the quantity cov[a,C] may take different values. Therefore, the value of the covariance can be standardized using the values of the standard deviations of both random variables in their marginal distributions, i.e. σ[a] and σ[C].
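The standardized cov[a,C] is a measure of the linear interdependence of the random variables a and C, and it is called (Pearson's) total correlation coefficient, i.e.:

$$ r = \frac{\operatorname{cov}[a,C]}{\sigma[a]\,\sigma[C]} \tag{1} $$

The value of the total correlation coefficient can be determined based on results from the sample, i.e. the database of real estate for market analysis, according to the following alternative formulas:

$$ r = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(C_i - \bar{C})}{n\,\sigma[a]\,\sigma[C]} = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(C_i - \bar{C})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2 \, \sum_{i=1}^{n} (C_i - \bar{C})^2}} \tag{2} $$

where $\bar{a}$, $\bar{C}$ and $\sigma[a]$, $\sigma[C]$ denote the average values and standard deviations (computed with the divisor n) of the analyzed attribute and transaction price respectively, and n is the number of pairs in the sample.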
Based on the total correlation coefficient and other characteristic parameters of a two-dimensional random variable, it is possible to determine the functional relationship between the dependent variable and the independent variable, which is called the linear regression model. Determining the regression model for a two-dimensional random variable (a, C) involves the selection of a function:

$$ C = g(a) \tag{3} $$

which will represent (replace) the set of values of both variables. The parameters of this function are estimated according to the ordinary least squares (OLS) approach, i.e. according to the principle:

$$ \sum_{i=1}^{n} \left[ C_i - g(a_i) \right]^2 = \min \tag{4} $$

If the function g(a) is adopted in the following linear form:

$$ g(a) = A + B\,a \tag{5} $$

the condition (4) is expressed by the following relationship:

$$ \Phi(A, B) = \sum_{i=1}^{n} \left[ C_i - A - B\,a_i \right]^2 = \min \tag{6} $$

After raising the expression in square brackets to the power of two, a function is obtained that represents the following quadratic form:

$$ \Phi(A, B) = \sum C_i^2 - 2A \sum C_i - 2B \sum a_i C_i + nA^2 + 2AB \sum a_i + B^2 \sum a_i^2 \tag{7} $$

Function (7) will satisfy the minimum condition when its partial derivatives are equal to zero, i.e.:

$$ \frac{\partial \Phi}{\partial A} = 2nA + 2B \sum a_i - 2 \sum C_i = 0, \qquad \frac{\partial \Phi}{\partial B} = 2A \sum a_i + 2B \sum a_i^2 - 2 \sum a_i C_i = 0 \tag{8} $$

After solving the above system of equations, the final formulas for A and B are as follows:

$$ A = \bar{C} - r\,\frac{\sigma[C]}{\sigma[a]}\,\bar{a} \tag{9} $$

$$ B = r\,\frac{\sigma[C]}{\sigma[a]} \tag{10} $$

After substituting formulas (9) and (10) into the expression (5), we obtain an association of all characteristic parameters of a two-dimensional random variable (a, C), in the form of the following expression:

$$ C = \bar{C} + r\,\frac{\sigma[C]}{\sigma[a]}\,(a - \bar{a}) \tag{11} $$

The quantities occurring in this relationship are defined by formulas (1) to (2), which means that the linear regression model is defined by the average values and standard deviations in the marginal distributions, determined from the database of sold real estate, and by the total correlation coefficient of the variable a with respect to the variable C.
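The statement that the regression line is fully determined by the averages, the standard deviations and the correlation coefficient can be checked numerically. The sketch below is a minimal Python illustration; the function name `regression_line`, the neutral names `intercept` and `slope`, and the four sample points are assumptions made for this example only.

```python
import math


def regression_line(a, c):
    """Intercept and slope of C = intercept + slope * a, built from moments."""
    n = len(a)
    mean_a, mean_c = sum(a) / n, sum(c) / n
    # Standard deviations with divisor n; the divisor cancels in the slope.
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_c = math.sqrt(sum((y - mean_c) ** 2 for y in c) / n)
    cov = sum((x - mean_a) * (y - mean_c) for x, y in zip(a, c)) / n
    r = cov / (sd_a * sd_c)
    slope = r * sd_c / sd_a
    intercept = mean_c - slope * mean_a
    return intercept, slope


# Hypothetical sample: attribute values and transaction prices.
a = [1.0, 2.0, 3.0, 4.0]
c = [2.0, 4.0, 5.0, 8.0]
intercept, slope = regression_line(a, c)
forecast = intercept + slope * 5.0  # predicted price for attribute value 5
print(round(slope, 6), round(forecast, 6))
```

For this sample the slope equals the least-squares value obtained directly from the normal equations, namely the sum of cross-products (9.5) divided by the sum of squared attribute deviations (5).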
Based on the regression model in formula (11), with a predetermined value of one variable, it is possible to predict (calculate) the value of the second variable. In the case of a property value forecast, the value of the attribute ($a_P$) will be determined, and the predicted unit price of the property ($C_P$) will be calculated.
Using the analysis of variance for a two-dimensional random variable, it is possible to prove that the square of the total correlation coefficient ($r^2$) is closely related to the reliability of explaining transaction prices by the attribute under consideration. For the analysis of the variance of a two-dimensional random variable represented by the value of the attribute a and the value of the transaction price C, their average values will be used.
In the analysis of variance, the sum of squares of differences of a random variable in relation to its average value, and the sum of squares of differences of a random variable with respect to its predicted value from the regression line, are always considered. A measure of the total dispersion of the values of a random variable C in relation to its average value $\bar{C}$ is the sum of the squares of their differences, called the total sum of squares (TSS), i.e.:

$$ TSS = \sum_{i=1}^{n} (C_i - \bar{C})^2 \tag{12} $$

If the considered values of the random variable are described by the regression line, then the difference $C_i - \bar{C}$ can be replaced with two components:

$$ C_i - \bar{C} = (C_i - C_{Pi}) + (C_{Pi} - \bar{C}) \tag{13} $$

The first component of this sum relates to the part of the random variable $C_i$ which is not explained by the linear regression model, while the second component relates to the part of this variable explained by the linear regression prediction.
The explained sum of squares (ESS) of the variable $C_i$ in the regression model is defined based on the difference between the value $C_{Pi}$ predicted from the regression line and the average value $\bar{C}$, hence it is expressed by the following formula:

$$ ESS = \sum_{i=1}^{n} (C_{Pi} - \bar{C})^2 \tag{14} $$

The residual sum of squares (RSS) of the variable $C_i$ about the regression line is the difference between TSS and ESS, i.e.:

$$ RSS = TSS - ESS = \sum_{i=1}^{n} (C_i - C_{Pi})^2 \tag{15} $$

The aim of the presented analysis of variance is to show that the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS) is equal to the square of the total correlation coefficient ($r^2$) for the considered attribute relative to the property prices in the database, i.e.:

$$ \frac{ESS}{TSS} = r^2 \tag{16} $$

Substituting the values predicted from the regression model (11) into formula (14) gives:

$$ ESS = r^2\,\frac{\sigma^2[C]}{\sigma^2[a]} \sum_{i=1}^{n} (a_i - \bar{a})^2 \tag{17} $$

Having taken into account formula (17) and the definition for the variance of the variable C, the expression (16) can be written in the following form:

$$ \frac{ESS}{TSS} = r^2\,\frac{\sigma^2[C]}{\sigma^2[a]} \cdot \frac{\sum_{i=1}^{n} (a_i - \bar{a})^2}{\sum_{i=1}^{n} (C_i - \bar{C})^2} = r^2\,\frac{\sigma^2[C]}{\sigma^2[a]} \cdot \frac{\sigma^2[a]}{\sigma^2[C]} = r^2 \tag{18} $$

Dependence (18) proves that $r^2$ is the measure of fit of the regression line of formula (11) to the set of points representing a two-dimensional random variable, given by a predetermined attribute and the prices of individual properties. The quantity $r_j^2$ can be a measure of explanation of the transaction price by a linear regression model for a predetermined j-th attribute $a_{jP}$, hence its value can be taken as the weight of accuracy of the prediction $C_{jP}$ obtained from the regression line. In the analysis of the real estate market, it can be assumed that the value of $r_j^2$ is the contribution of the considered attribute to the explanation of the price volatility in the database. If these contributions are standardized, then these values will represent the weighting factors of the attributes. Determination of weighting factors for specific attributes is an essential stage of the market analysis and forms the basis for the valuation of each real property.
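The relationship between the sums of squares and the squared correlation coefficient can be verified numerically. The following Python sketch is only an illustration under stated assumptions: the function name `variance_decomposition` and the four sample points are hypothetical, and the regression line is fitted by ordinary least squares with an intercept.

```python
def variance_decomposition(a, c):
    """Return (TSS, ESS, RSS, r squared) for a least-squares line through (a, c)."""
    n = len(a)
    mean_a, mean_c = sum(a) / n, sum(c) / n
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_cc = sum((y - mean_c) ** 2 for y in c)
    s_ac = sum((x - mean_a) * (y - mean_c) for x, y in zip(a, c))
    slope = s_ac / s_aa
    predicted = [mean_c + slope * (x - mean_a) for x in a]
    tss = s_cc                                               # total
    ess = sum((p - mean_c) ** 2 for p in predicted)          # explained
    rss = sum((y - p) ** 2 for y, p in zip(c, predicted))    # residual
    r_squared = s_ac ** 2 / (s_aa * s_cc)
    return tss, ess, rss, r_squared


# Hypothetical sample: attribute values and transaction prices.
tss, ess, rss, r2 = variance_decomposition([1.0, 2.0, 3.0, 4.0],
                                           [2.0, 4.0, 5.0, 8.0])
print(round(tss, 4), round(ess, 4), round(rss, 4), round(r2, 4))
```

For this sample TSS = 18.75, ESS = 18.05 and RSS = 0.70, so the ratio ESS/TSS coincides with the squared correlation coefficient and TSS splits exactly into ESS and RSS.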

Algorithm for Estimating Market Value of Land with Commercial Buildings
In the first stage of the algorithm, each pair "attribute - transaction price" from the real estate database must be considered independently, i.e. the following parameters must be specified for a two-dimensional random variable: $\bar{a}_j$ - average value for the attribute, $\bar{C}_j$ - average value for the corresponding prices, $\sigma(a_j)$ - standard deviation for the attribute, $\sigma(C_j)$ - standard deviation for the corresponding prices, $r_j$ - total correlation coefficient, $P_j$ - probability of representativeness of the pair ($P_j = n_j/n$).
Based on the above parameters, for each specified j-th pair, the model (11) of linear two-dimensional regression should be determined, with the following denotations:

$$ C_j = \bar{C}_j + r_j\,\frac{\sigma(C_j)}{\sigma(a_j)}\,(a_j - \bar{a}_j) \tag{19} $$

Equation (19) is the basis for forecasting the market value of a similar property, for each analyzed attribute in the database for market analysis. For this purpose, it is necessary to determine the value of the considered attribute $a_{jP}$ and to calculate the price forecast for this value, which represents the predicted market value. Thus, for each j-th attribute, the representativeness of which is determined by the ratio of the number of its values to the number of all transaction prices, the predicted market value ($MV_j$) is determined:

$$ MV_j = \bar{C}_j + r_j\,\frac{\sigma(C_j)}{\sigma(a_j)}\,(a_{jP} - \bar{a}_j) \tag{20} $$
According to the considerations contained in (18), the reliability of the market value prediction according to the relationship (20) is determined using $r_j^2$, and therefore the weights of the forecasts must be determined for the average value from the predicted market values for each attribute. Having adopted the earlier denotations, the weight of reliability of the market value prediction from the j-th attribute should be determined according to the following formula:

$$ p_j = P_j\,r_j^2 \tag{21} $$

The estimated market value (EMV) of a property is calculated as the weighted average of the forecasts determined from individual attributes, i.e.:

$$ EMV = \frac{\sum_j p_j\,MV_j}{\sum_j p_j} \tag{22} $$

The compliance of the valuation model with the database of comparable real estate may be assessed according to the following procedure. For each real property in the database, formula (22) is implemented and, consequently, its market value ($EMV_i$) is calculated. The standard deviation ($\sigma_n$) of the differences between transaction prices ($C_i$) and estimated market values ($EMV_i$) of each real estate from the database is an average measure of inconsistency of the valuation model with the database of comparable real estate, i.e.:

$$ \sigma_n = \sqrt{\frac{\sum_{i=1}^{n} (C_i - EMV_i)^2}{n}} \tag{23} $$

The non-compliance ratio of the valuation model with the database of comparable real estate will be expressed by the coefficient of dispersion, which has the following form:

$$ \lambda = \frac{\sigma_n}{\bar{C}} \tag{24} $$

where $\bar{C}$ means the average price of real estate in the database.
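The whole algorithm can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the authors' implementation: the weight of each attribute is taken as p_j = P_j * r_j^2, the standard deviation of the differences uses the divisor n, attributes missing for a given property (marked `None`) are simply skipped in its weighted average, and all names and data are hypothetical.

```python
import math


def fit_pair(attr, prices):
    """Regression parameters and weight for one attribute - price pair."""
    pairs = [(a, c) for a, c in zip(attr, prices) if a is not None]
    n_j = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n_j
    mean_c = sum(c for _, c in pairs) / n_j
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_cc = sum((c - mean_c) ** 2 for _, c in pairs)
    s_ac = sum((a - mean_a) * (c - mean_c) for a, c in pairs)
    r_squared = s_ac ** 2 / (s_aa * s_cc)
    return {
        "mean_a": mean_a,
        "mean_c": mean_c,
        "slope": s_ac / s_aa,                        # equals r * sd_c / sd_a
        "weight": (n_j / len(prices)) * r_squared,   # assumed p_j = P_j * r_j^2
    }


def estimate_value(models, subject):
    """Weighted average of the per-attribute forecasts for one property."""
    num = den = 0.0
    for name, m in models.items():
        a_p = subject.get(name)
        if a_p is None:          # attribute absent: skipped (assumption)
            continue
        forecast = m["mean_c"] + m["slope"] * (a_p - m["mean_a"])
        num += m["weight"] * forecast
        den += m["weight"]
    return num / den


def dispersion(models, database, prices):
    """Coefficient of dispersion: sigma_n divided by the average price."""
    emv = [estimate_value(models, row) for row in database]
    sigma_n = math.sqrt(sum((c - e) ** 2 for c, e in zip(prices, emv)) / len(prices))
    return sigma_n / (sum(prices) / len(prices))


# Hypothetical database (areas in m2, prices in million PLN).
database = [
    {"land": 4200.0, "office": 150.0, "warehouse": 500.0},
    {"land": 5100.0, "office": None, "warehouse": 640.0},
    {"land": 6300.0, "office": 260.0, "warehouse": 800.0},
    {"land": 4800.0, "office": 180.0, "warehouse": None},
    {"land": 7000.0, "office": 300.0, "warehouse": 900.0},
    {"land": 5600.0, "office": 210.0, "warehouse": 700.0},
]
prices = [3.1, 3.6, 4.9, 3.4, 5.6, 4.3]
models = {name: fit_pair([row.get(name) for row in database], prices)
          for name in ("land", "office", "warehouse")}
subject = {"land": 5600.0, "office": 220.0, "warehouse": 720.0}
emv = estimate_value(models, subject)
lam = dispersion(models, database, prices)
print(round(emv, 3), round(lam, 3))
```

Each attribute is fitted independently, so a property that lacks, for example, office space still contributes to the land and warehouse pairs. This mirrors the way the procedure tolerates incomplete sets of quantitative attributes.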

Numerical Example of Market Analysis and Property Valuation
Based on six notarial deeds analyzed in a given commune, the following market information was obtained: The real estate described by the notarial deeds has the following identical attributes: city zone - outskirts; earmarked in the land use plan - areas for services and commerce; technical condition and standard of materials used - good.
Therefore, qualitative attributes will not be taken into account in the market analysis and property valuation.
The above information from notarial deeds forms the basis for comparing the values of individual quantitative attributes and transaction prices for the analyzed real estate, as illustrated in Table 2. Based on the database of sold real estate contained in Table 2, all statistical parameters for two-dimensional random variables, represented by the selected attribute and the transaction price, were calculated. The values of these parameters are summarized in Table 3. For each pair of attribute and transaction price, the parameters summarized in Table 3 form the basis for setting up the linear regression equation that will be used to predict the market value of the real estate being valued.
The valuation was carried out for the land with commercial buildings, with qualitative attributes identical to the ones included in the database, but with the following quantitative attributes: land surface area - 5,600 m², office space - 220 m², commercial space - 520 m², warehouse space - 720 m².
All the calculated parameters of the regression line and the predicted market value for real estate with the above-listed attributes are demonstrated in Table 4. In order to compare the obtained property valuation result with the estimated value according to the parametric model, appropriate matrices were defined using the values contained in Table 2. Having applied the matrix calculations, according to Subsection 4.6 of the research paper Parzych and Czaja (2015), estimation of the parameters of the Gauss-Markov model was performed, and then, on their basis, the market value of the analyzed property was calculated. As a result of this valuation, the market value of EMV = PLN 4,424,600 was obtained. When compared to the value contained in Table 5, it appears that the difference between the estimated market values according to the two analyzed models does not exceed 2%.
If the weights of the reliability of the attributes are standardized, i.e. $\hat{p}_j = p_j / \sum_j p_j$, and these values are multiplied by EMV = PLN 4,341,209 and then divided by the value of the analyzed attribute (surface area), then the unit market value for specific real estate components is obtained. All the calculations aimed at obtaining the unit market value have been carried out in Table 5. In order to assess the conformity of the valuation model with the database of comparable real estate, for each real estate in the database its market value ($EMV_i$) was calculated. The standard deviation ($\sigma_n$) of the differences between transaction prices ($C_i$) and estimated market values ($EMV_i$) is a measure of the inaccuracy of estimating the market value of the analyzed property. The non-compliance ratio of the valuation model with the database of comparable real estate was calculated according to formula (24) and summarized in Table 6. The value of the coefficient λ = 0.12 demonstrates that the market value estimated according to the multiple two-dimensional linear regression procedure may be determined with an inaccuracy of about 12%. The value of this coefficient falls within the interval $0.90 \ge (1 - \lambda) > 0.85$, which means that the compliance of the valuation model with the database of comparable real estate is at a fairly high level.
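The split of the estimated market value into unit values of the components can be sketched in Python as follows. The value of EMV and the surface areas are those quoted in this example, whereas the weights are hypothetical placeholders, because the values from Table 5 are not reproduced here; the function name `unit_values` is likewise an assumption.

```python
def unit_values(weights, areas, emv):
    """Unit market value of each component: standardized weight * EMV / area."""
    total = sum(weights.values())
    return {name: (w / total) * emv / areas[name] for name, w in weights.items()}


emv = 4_341_209.0  # PLN, estimated market value quoted in the text
areas = {"land": 5600.0, "office": 220.0, "commerce": 520.0, "warehouse": 720.0}  # m2
# Hypothetical reliability weights p_j (not taken from Table 5).
weights = {"land": 0.60, "office": 0.45, "commerce": 0.70, "warehouse": 0.55}

units = unit_values(weights, areas, emv)
for name, value in units.items():
    print(f"{name}: {value:.2f} PLN/m2")
```

Because the standardized weights sum to one, multiplying each unit value by its surface area and adding the results returns the estimated market value of the whole property.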

Numerical Example of the Valuation of a Land Property Developed with a Building Intended for Trade and Services
The land properties selected for the market analysis were those developed with buildings intended for trade and services, located in the area of the valued property and traded within the last six months. The analysis covered information on nine market transactions, in which the land properties were described using six price-setting attributes. The scales for optional attributes were established on the following four levels: 3 - very advantageous, 2 - advantageous, 1 - average, 0 - disadvantageous. Table 7 contains information on the attributes (features) of land properties developed with buildings intended for trade and services as well as on their rents and unit transaction prices. The subject of the valuation is a land property developed with buildings intended for trade and services, which has the attributes listed in the last row of Table 8. It is more beneficial to determine the unit transaction price for each analyzed property than to consider the total transaction price, because the attribute "usable floor space of premises" may then be omitted from further analysis.
After using the Excel application, all the statistical elements necessary for the implementation of formula (11) were determined and their values included in Table 8. The final formula for the unit market value has the following form: All the calculations presented above were performed with the Excel application.

Conclusions
The procedure presented for estimating market value of land developed with commercial buildings is based on the assumptions of statistical analysis of the market. The derived formulas and substantively justified algorithms may form the basis for market analysis and estimation of the market value of commercial real estate. The database for market analysis and valuation should contain information regarding at least six properties similar to the one being valued, which were subject to real estate transactions on the market.
The proposed procedure for estimating the market value of real estate is based on multiple models of two-dimensional regression, defined by the pairs attribute - transaction price, the parameters of which are estimated according to the method of least squares. The algorithms of this procedure allow us to take into account the lack of some components of quantitative attributes in the considered pairs. The result of estimating the market value of a property can be assessed in terms of its inaccuracy, and then the conformity of the valuation model with the database of comparable real estate can be assessed.
The analysis of the presented algorithms, made on the basis of the two numerical examples, demonstrated that the procedure proposed for estimating market value yields results that are consistent with the Gauss-Markov model; in the first example the two estimates differ by less than 2%. All the calculations occurring in this procedure may be performed with the widely available Excel software application.