ABSTRACT

Andrew J. Plantinga and Elena G. Irwin

Introduction

Empirical models of land use and land-use change are critical for testing theories of land use and informing policies aimed at managing land-use change. Empirical models have been used to identify the causes of a particular distribution of land use and the factors that drive land-use change. For example, these models can be used to test the extent to which net returns to alternative uses and physical characteristics of land (i.e., land quality) influence land-use decisions.
Empirical models of land-use change are based on the theoretical models reviewed in Chapter 6, which date back to von Thünen’s (1826) spatial model of land use from the early 19th century. Von Thünen’s key insight, that differences in transportation costs are capitalized into land values and generate a spatially heterogeneous pattern of land rents, underlies the classic urban bid-rent model developed by Alonso (1964) and Muth (1969). In that model, differences in urban land rents are explained by differences in households’ costs of commuting to a centrally located employment center. Because the costs of commuting to a central business district (CBD) are an important component of a household’s budget, a household’s net income decreases with distance from the CBD, and the rent the household is willing to pay is lower at a location farther from the CBD than at one nearer to the center. This generates the downward-sloping ‘bid rent gradient,’ in which urban land rents are hypothesized to decrease with distance from the CBD and which is a key component of empirical models of urban land use. Other theoretical models that build on the von Thünen approach and have provided a basis for empirical models include Barlowe (1958), Found (1971), and Capozza and Helsley (1989).
While empirical models of land use and land-use change have a common theoretical underpinning, the structure of the models varies according to the data used for estimation and the research question. For example, aggregate land-use data, e.g., at the county level, are often the only data available to a researcher, and the empirical model is therefore oriented to explaining land use and land-use change at a regional level rather than at the level of an individual parcel. In other cases, randomly sampled plot-level data may be necessary to explore plot-level determinants of land-use change, e.g., the importance of on-site physical characteristics such as soil type and slope as well as location features such as spillover effects from surrounding land uses. On the other hand, if the contiguous spatial pattern of land use is the primary research interest, then a complete dataset of the entire population of parcels within a geographic region, rather than just a random sample, would be important. These considerations suggest the following categorization of empirical models: (1) aggregate data models, (2) sample plot data models, and (3) parcel data models. In this chapter, we provide an overview of these models and a discussion of related econometric and modeling issues.

Aggregate Data Models

We begin the discussion of aggregate data models with a description of common sources of aggregate land-use data for the United States. We then present the basic aggregate land-use shares model that forms the centerpiece of econometric analysis with aggregate data. Next, we discuss several econometric and modeling issues that arise with the basic model, including how to measure net returns to alternative uses, the importance of controlling for spatial differences in land quality, the modeling of dynamic land-use decisions, and, finally, the estimation of disaggregated shares models.

Aggregate Data Sources

The majority of econometric land-use models have been estimated with aggregate data. One obvious appeal of aggregate data is their low cost. As discussed in Chapters 2-5, a variety of federal government agencies collect data on land use in the U.S. with both time-series and cross-sectional observations. The U.S. Department of Commerce conducts the Census of Agriculture, which provides county-level data on farmer-owned land.1 For instance, the Census of Agriculture reports the area of cropland (by crop type), pastureland, and woodland for each county approximately every five years. Note, however, that Census of Agriculture data on forest area are incomplete because the Census reports only farmer-owned woodland. Many agricultural states (Iowa, Wisconsin, etc.) collect these data on an annual basis through state Agricultural Reporting Services.
The Forest Service, an agency within the U.S. Department of Agriculture, collects data on all forestland in the U.S. through its Forest Inventory and Analysis (FIA) unit. FIA inventories are conducted on a state-by-state basis and on a cycle that varies by state but is typically in the five- to fifteen-year range.2 The inventories provide county-level estimates of forest area, disaggregated by owner, species, and additional forest characteristics. Because of its sampling design, the FIA does not collect detailed information on non-forest uses. Thus, in applications where non-forest uses are of interest, researchers often combine Census of Agriculture data on agricultural land uses with FIA data on forest land uses to yield an aggregate (county-level) data panel (e.g., Hardie and Parks 1997; Mauldin et al. 1999). The Bureau of the Census produces estimates of urbanized land area based on the population census. Prior to 1990, only state-level estimates are reported. The 1990 and 2000 censuses, however, provide estimates at the county level. These estimates are not based on observed or reported land use but on population density within a specified geographic area. Thus, these estimates are not consistent with Census of Agriculture and Forest Service data because agricultural and forested land may be found within an area classified as urban. As an alternative, researchers often compute the area of land in urban and other uses as the difference between the total land area of a county and the agricultural and forest land areas.

The Basic Shares Model

The most common aggregate data model involves estimating the relationship between shares of land in alternative uses and hypothesized determinants of land use (e.g., Lichtenberg 1989; Parks and Murray 1994; Parks and Kramer 1995; Wu and Brorsen 1995; Wu and Segerson 1995; Hardie and Parks 1997; Mauldin et al. 1999; Plantinga et al. 1999). If county-level data are employed, then the land-use shares are defined as the percentage of total county area devoted to given uses (e.g., the share of county land in agricultural use). The observed share for land use $k$ ($k = 1, \ldots, K$), in county $i$ ($i = 1, \ldots, I$), at time $t$ ($t = 1, \ldots, T$) can be expressed as $y_{ikt} = p_{ikt} + \varepsilon_{ikt}$, where $p_{ikt}$ is the expected share of land allocated to use $k$ and $\varepsilon_{ikt}$ is a random error term with mean zero. The expected share, $p_{ikt}$, represents the optimal land allocation given economic and other conditions prevailing at time $t$. The actual land allocation observed at time $t$, $y_{ikt}$, may differ from the optimal allocation due to random occurrences such as bad weather or unanticipated price changes. These random events are assumed to have a zero mean, implying $E[y_{ikt}] = p_{ikt}$. The expected shares are assumed to be a function of a vector of explanatory variables, $X_{it}$, and unobserved parameters to be estimated, $\beta_k$.
From the theory in Chapter 6, aggregate land-use shares depend on the net returns to alternative uses as well as the distribution of land quality in the county, the measurement of which is discussed below. Roughly speaking, the βs measure the effect of the explanatory variables on the expected shares. Researchers frequently use the following logistic specification of the expected share,

(1) $p_{ikt} = \dfrac{e^{\beta_k' X_{it}}}{\sum_{j=1}^{K} e^{\beta_j' X_{it}}}$,

for all i, k, and t. This specification confines the land-use shares to the unit interval. Another advantage is that the model of observed shares can be transformed to yield an estimating equation that is linear in the parameters (see, for example, Chapter 19 in Judge et al. 1988). Specifically, the natural logarithm of each observed share normalized on a common and arbitrarily chosen share (below, $y_{i1t}$) takes the form,
(2) $\ln(y_{ikt}/y_{i1t}) = \beta_k' X_{it} + \mu_{ikt}$,
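The logistic shares specification and its log-ratio transformation can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the data are synthetic, the function names `expected_shares` and `fit_log_share_ratios` are hypothetical, the parameters of the first land use are normalized to zero, and the observed shares are generated with multiplicative noise so that the log ratios are always defined. It is not the estimation procedure of any study cited in this chapter.

```python
import numpy as np

def expected_shares(X, betas):
    """Logistic expected shares p_k = exp(X b_k) / sum_j exp(X b_j).

    X has shape (n, m); betas has shape (K, m). Row 0 of betas is held at
    zero, which corresponds to normalizing on the first land use.
    """
    v = X @ betas.T                        # (n, K) linear indices
    v = v - v.max(axis=1, keepdims=True)   # guard against overflow in exp
    ev = np.exp(v)
    return ev / ev.sum(axis=1, keepdims=True)

def fit_log_share_ratios(X, Y):
    """OLS of ln(y_k / y_1) on X, one regression per use k = 2..K."""
    log_ratios = np.log(Y[:, 1:] / Y[:, [0]])               # (n, K-1)
    coef, *_ = np.linalg.lstsq(X, log_ratios, rcond=None)   # (m, K-1)
    return np.vstack([np.zeros(X.shape[1]), coef.T])        # (K, m)

# Synthetic "counties" (illustrative values only, not real data).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_betas = np.array([[0.0, 0.0, 0.0],
                       [0.5, 1.0, -0.3],
                       [-0.2, 0.4, 0.8]])

P = expected_shares(X, true_betas)
# Assumed noise process: perturb expected shares, then renormalize to sum to one.
Y = P * np.exp(rng.normal(scale=0.05, size=P.shape))
Y = Y / Y.sum(axis=1, keepdims=True)

est_betas = fit_log_share_ratios(X, Y)
print(np.round(est_betas, 2))
```

Because the renormalization cancels in the ratio of two shares, each log ratio is linear in the explanatory variables plus an error, which is why ordinary least squares can be applied equation by equation in this sketch.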

$\Pr(\mu_{ijt} - \mu_{ikt} \le \beta_k' X_{ikt} - \beta_j' X_{ijt})$ for all j. Modeling the difference in the error terms, $\mu_{ijt} - \mu_{ikt}$, with a logistic (normal) cumulative distribution function leads to a logit (probit) model of land-use choice. If a time series of sample plot data is available, then (6) can be modified in a straightforward way to model land-use transitions (see the discussion of parcel data models for more details). As an example, Kline and Alig (1999) use plot-level FIA data on broadly defined land classes to estimate the probability that land changes from either farm or forest use to developed use in western Oregon and Washington. Independent variables include forest and farm rents, sociodemographic characteristics of the county in which the plot is located, and variables indicating the presence of zoning restrictions. Zoning restrictions may prohibit certain land uses and, thus, must be accounted for in the empirical analysis. A similar modeling approach is used by

McMillen (1989) to estimate the probability that land parcels in McHenry County, Illinois, are allocated to farm use, residential use, or unimproved vacant lots. The data set consists of all parcels that were sold during the period 1979 to 1983. The independent variables include parcel characteristics such as size, neighborhood characteristics, and distances to important sites such as downtown Chicago. A number of recent applications make use of sample plot data from the NRI. Claassen and Tegene (1999) use NRI data for the Corn Belt region to estimate the probability that land is allocated to cropland, pasture, or the Conservation Reserve Program (CRP). Schatzki (1998) uses a similar approach to model CRP enrollment decisions in Georgia. Finally, Lubowski (2002) estimates a national-scale model of land use that examines transitions between cropland, forest, pasture, range, urban, and CRP lands.

Econometric and Modeling Issues

The structure of plot-level models is similar to that of aggregate data models and, indeed, the plot-level model is equivalent to the aggregate data model under certain restrictions.8 From an econometric and land-use modeling perspective, however, there are advantages to using plot-level data. First, to the extent they are available, variables measuring plot-level characteristics such as land quality can be included in the econometric model. In aggregate data models, these characteristics must be represented using less precise aggregate variables. Second, if plots are resampled over time, observations of land-use transitions are provided and, in principle, these can be modeled explicitly. Time-series data are best suited to explaining changes in land use as the result of changes in the land-use determinants of interest (e.g., commodity prices). At present, the NRI and FIA databases provide relatively few observations over time. As more time-series observations are recorded, however, the relative advantage of using plot-level data will increase.
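The logit form of the plot-level choice model discussed in this section can be sketched numerically. The example below assumes that the error attached to each land use is an independent type I extreme value (Gumbel) draw, a standard assumption under which differences in errors are logistic; the index values in `V_plot` and the function names are hypothetical and do not come from any of the studies cited above.

```python
import numpy as np

def logit_choice_probabilities(V):
    """Multinomial logit probabilities for index values V of shape (n, K)."""
    V = np.asarray(V, dtype=float)
    z = V - V.max(axis=1, keepdims=True)   # stabilize the exponentials
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

def simulate_choices(V, n_draws, rng):
    """Pick the use with the largest V_k + error, errors i.i.d. Gumbel."""
    errors = rng.gumbel(size=(n_draws, len(V)))
    return np.argmax(V + errors, axis=1)

# Hypothetical deterministic returns to three uses on one plot.
V_plot = np.array([1.0, 0.2, -0.5])

probs = logit_choice_probabilities(V_plot[None, :])[0]

rng = np.random.default_rng(1)
choices = simulate_choices(V_plot, 200_000, rng)
freqs = np.bincount(choices, minlength=3) / choices.size

print(np.round(probs, 3))   # closed-form logit probabilities
print(np.round(freqs, 3))   # simulated choice frequencies
```

The simulated frequencies should lie close to the closed-form probabilities, which is the sense in which a logistic distribution for error differences yields the logit model. A probit model would replace the Gumbel draws with normal errors and has no closed-form counterpart for more than two uses.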
Parcel Data Models

If the contiguous pattern of land use within a region, and in particular the underlying spatial processes that generate this pattern, is of interest, then it is desirable to have data on the full population of land parcels within a region as opposed to a sample of land plots (Bell and Irwin 2002). Note that aggregate data (e.g., county data) provide contiguous coverage of land use in a region but do not provide information on the spatial pattern of individual land uses. Thus, these data do not easily accommodate the exploration of spatial relationships between uses.

Data Sources

With the advent of Geographic Information Systems (GIS) to store and organize geographically referenced data, land-use data for an entire population of parcels within a specified geographic area have become more readily available. Increasingly, county tax auditors, state planning agencies, emergency service agencies, and other governmental entities are collecting and storing detailed data on parcel and building characteristics in an electronic format, which has made it possible for researchers to compile parcel databases for counties. Attribute information from local tax assessment databases typically includes market transaction price(s), assessed values, current land use, zoning, lot size, location, and structural characteristics of any house or building on the parcel. In addition to public sources, parcel data for metropolitan areas may also be purchased from several national real estate companies.
While these data make it possible to model land-use conversion at the level of the individual decision maker, acquiring and managing these data can be challenging. The availability of these data differs tremendously from state to state and, in many cases, from county to county. Often government agencies save only the most current information, so that changes over time in land use and other attributes are difficult to piece together. For example, local agencies do not always track a residential lot’s subdivision history, so that the researcher must discern which subdivided lots made up the original parcel. Because one county will typically contain tens, and sometimes hundreds, of thousands of parcels, management of these data requires a GIS to store and organize data and to generate spatial variables. Other geographically referenced data, including roads, cities and towns, recreational areas, soil quality, slope and elevation, school districts, etc., can also be acquired and overlaid with the parcel data, and GIS software can be used to generate a host of spatial variables to be included in econometric models. Again, availability of these data for a particular region varies greatly across states and sometimes across counties.
Some of these data are available from federal government sources (e.g., the U.S. Bureau of the Census maintains TIGER/Line files, from which roads, hydrology, Census tracts, and other geographic features can be extracted). Other federal government sources of GIS data include the U.S. Geological Survey, the Environmental Protection Agency, and the U.S. Department of Housing and Urban Development. Data can increasingly be downloaded from these agencies’ websites or from other online data sources.9

Model Specification

Parcel data models of land use include those that explain land use, land values, and land-use conversion. All three types of models begin from the assumption that land is a heterogeneous good, composed of a bundle of characteristics, and that land use, value, or change can be estimated as a function of the parcel’s characteristics. An advantage of using parcel data in modeling land-use change is that the data are at the same level of resolution as the economic agent who makes the land-use conversion decision. This avoids problems of aggregation and the need to assume a representative agent, and it allows for a much more detailed investigation of the land-use pattern and its change. In addition, because data are available for a contiguous area, models can be estimated that account for spatial processes of land-use change and spatial interactions among nearby parcels. An important econometric issue in the estimation of these models is the likely spatial autocorrelation of the error terms, which arises from measurement error or unobserved variation that is positively correlated over space. This issue is discussed further in a later section.

Discrete Choice Models of Land Use

It is generally assumed that land is in a ‘productive’ use, implying that positive returns are generated from the use of the land (e.g., through agriculture or commercial forestry uses). Following Nelson and Hellerstein (1997), the net present discounted returns from productive land at parcel i in use k in period t can be written as:

$h(t) = \lim_{\Delta \to 0} \dfrac{\Pr(t \le T < t + \Delta \mid T \ge t)}{\Delta}$
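As a minimal illustration of the hazard concept used in duration models of land conversion, the sketch below computes a discrete-time analogue: the share of parcels converted in each period among those still undeveloped at the start of that period. The data are invented for illustration, the function names are hypothetical, and parcels not converted by the end of their observation window are treated as right-censored; this is a descriptive calculation, not any of the parametric or proportional hazards estimators named in the text.

```python
import numpy as np

def empirical_hazard(durations, converted):
    """Per-period conversion hazard from observed durations.

    durations[i] is the last period parcel i is observed; converted[i] is
    True if the parcel was developed in that period and False if it was
    still undeveloped when observation ended (right-censored).
    """
    durations = np.asarray(durations)
    converted = np.asarray(converted, dtype=bool)
    hazard = {}
    for t in np.unique(durations):
        at_risk = np.sum(durations >= t)                # undeveloped entering t
        events = np.sum((durations == t) & converted)   # developed during t
        hazard[int(t)] = float(events / at_risk)
    return hazard

def survival_from_hazard(hazard):
    """Probability of remaining undeveloped through each period."""
    surv, s = {}, 1.0
    for t in sorted(hazard):
        s *= 1.0 - hazard[t]
        surv[t] = s
    return surv

# Eight hypothetical parcels (illustrative values only).
durations = [1, 2, 2, 3, 3, 3, 4, 4]
converted = [1, 1, 0, 1, 1, 0, 1, 0]

h = empirical_hazard(durations, converted)
S = survival_from_hazard(h)
print(h)
print(S)
```

In a regression setting, the per-period hazard would be modeled as a function of time and parcel characteristics rather than tabulated directly as it is here.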

In the land-use conversion case, the hazard rate is usually the function of interest. In this context, the hazard rate can be defined as the conditional probability that a parcel is developed in period t, given that it has remained in an undeveloped state until time t. The hazard rate is typically modeled as a function of time and explanatory variables, some of which may be time-variant. Different assumptions are possible regarding the distribution of durations. Fully parametric models, including the exponential, Weibull, log-normal, log-logistic, and complementary log-log models, can be specified. In addition, a semiparametric approach, commonly referred to as the proportional hazards model or Cox regression model, is possible. Irwin and Bell (Chapter 9) use this type of duration model to estimate a model of residential land conversion in which the influence of parcel-level characteristics and local growth management policies on the timing of a parcel’s development is estimated. Other examples from the literature include Hite, Sohngen, and Templeton (2002), who use a duration model to study the factors influencing the suburbanization of agricultural land in a rural-urban county of Ohio. They find that property taxes have varying effects on the timing of development for parcels of varying land quality. Irwin and Bockstael (2002) use a duration model to estimate the effects of neighborhood land use on the timing of conversion of undeveloped parcels to residential use in exurban areas. Because the neighborhood land-use variables vary over time as conversion occurs, a duration model is needed to capture the influence of these time-variant attributes on the conversion probability. Nickerson and Bockstael (2001a,b) model the landowner's decision to preserve land in a farmland preservation program that results in permanent protection, given that development represents a competing (and equally irreversible) land-use alternative. Duration modeling techniques are used to shed light on the factors that affect the timing of preservation and development decisions.

Spatial Econometric Issues

A major econometric issue that arises in estimating parcel data models is spatial dependence.11 Spatial dependence refers to the notion that values associated with locations that are close together are more correlated than values associated with locations that are farther apart. This condition may arise simply because neighboring sites tend to share many common features (e.g., they are both close to an urban center) or because of an underlying spatial process that causes neighboring sites to have similar values. For example, crime in one neighborhood may spill over into an adjacent neighborhood, causing both neighborhoods to experience high crime rates. Spatial dependence can also arise in aggregate and plot data models.
However, in these cases, the scale of the data or the geographic dispersion of observations tends to mitigate the effects. Depending on the type of spatial dependence, several different econometric problems arise. First, because spatial data are often measured according to boundaries that do not correspond to the geographic extent of the spatial dependence, measurement errors are frequently present. If so, the errors associated with neighboring locations will be correlated. This condition, referred to as spatial autocorrelation, can also arise due to spatially correlated omitted variables within an econometric model. In either case, ordinary least squares (OLS) is an unbiased, but inefficient, estimator. Spatial econometric techniques involve positing the form of the spatial autocorrelation and rewriting the model so that the error structure is independent and identically distributed (i.i.d.). A second form of spatial dependence arises when values at different locations in space are interdependent (Anselin 1988):
(10) $y_i = f(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_N)$,
where $y_i$ is the observed value at location $i$ and $i = 1, \ldots, N$. As specified, this system is unidentifiable since it results in $N^2 - N$ parameters with only $N$ observations. In this case, spatial econometric techniques are used to impose structure on the spatial process represented by $f$ so that only a limited number of parameters need to be estimated. Spatial dependence of this form, if left uncorrected, will lead to biased OLS estimates due to the correlation between the spatially lagged dependent variable and the error term. Established spatial econometric techniques are available for dealing with spatial autocorrelation and spatial lag structures for models with a continuous dependent variable. In both cases, a maintained hypothesis is made about the spatial structure (either of the errors or of the spatial lag) by means of an $N \times N$ spatial weights matrix. This matrix represents the researcher’s best guess of how the values (or errors) associated with different locations are related. Each element of the matrix, $w_{ij}$, represents the assumed spatial dependency between locations $i$ and $j$. A variety of different structures are possible. For example, in the case of a lag, the researcher may hypothesize that only nearest neighbors interact with each other, in which case a nonzero value is assigned to all $w_{ij}$ for which $i$ and $j$ are nearest neighbors and $w_{ij}$ equals zero otherwise. Alternatively, the dependence may be assumed to be a decreasing function of the distance between any two locations, in which case the weights can be assigned by means of an inverse distance function, $w_{ij} = f(1/d_{ij})$, where $d_{ij}$ is the distance between $i$ and $j$. Bell and Bockstael (2000) explore the consequences of spatial autocorrelation in a model of residential land values. The authors reason that this model is likely to suffer from an omitted variables problem that, in a spatial setting, will lead to spatial autocorrelation.
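The two weighting schemes just described, nearest neighbors and inverse distance, can be sketched as follows. The parcel coordinates are hypothetical, the function names are illustrative, the inverse distance function is taken to be simply 1/d, and the points are assumed to be distinct so that no distance is zero. Which structure is appropriate in a given application remains the researcher's maintained hypothesis.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def nearest_neighbor_weights(coords, k=1):
    """w_ij = 1 if j is among the k nearest neighbors of i, else 0."""
    d = pairwise_distances(coords)
    np.fill_diagonal(d, np.inf)                 # a site is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]          # k closest sites per row
    W = np.zeros_like(d)
    rows = np.arange(len(coords))[:, None]
    W[rows, idx] = 1.0
    return W

def inverse_distance_weights(coords):
    """w_ij = 1 / d_ij for i != j, with zeros on the diagonal."""
    d = pairwise_distances(coords)
    W = np.zeros_like(d)
    off_diag = ~np.eye(len(coords), dtype=bool)
    W[off_diag] = 1.0 / d[off_diag]
    return W

# Four hypothetical parcel centroids (illustrative values only).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [3.0, 4.0]])

W_nn = nearest_neighbor_weights(coords, k=1)
W_inv = inverse_distance_weights(coords)
print(W_nn)
print(np.round(W_inv, 3))
```

Note that the nearest-neighbor matrix need not be symmetric: the third parcel's nearest neighbor is the second, but the second parcel's nearest neighbor is the first.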
Assuming that the form of the spatial autocorrelation is a first-order spatial autoregressive structure, the model is rewritten as:

(11a) $y = X\beta + \varepsilon$
(11b) $\varepsilon = \rho W \varepsilon + \mu$,

where $y$ is a vector of residual residential land prices, $X$ is a matrix of parcel-level characteristics, $\varepsilon$ is an error vector with zero mean and non-spherical variance-covariance matrix $\sigma^2 (I - \rho W)^{-1} (I - \rho W')^{-1}$, $\rho$ is the spatial autoregressive coefficient, $W$ is the spatial weights matrix, and $\mu$ is i.i.d. with variance-covariance matrix $\sigma^2 I$. Several different specifications of $W$ are used in estimating the model with both Generalized Method of Moments and maximum likelihood techniques. In addition, to avoid the economic interpretation problems that arise with row standardization of a distance-decay spatial weights matrix, a series of higher-order contiguity matrices is used to represent the spatial dependencies with a more flexible form. Results from these estimations show that parameter estimates are sensitive to row standardization and the specification of $W$, and that significance levels of some of the coefficients change when the correction for spatial autocorrelation is applied. While it is straightforward to apply these methods to estimating hedonic models of land values in which the dependent variable is a continuous variable,

application of these methods to discrete choice land-use change models is much more challenging. In theory, the spatial error autocorrelation may take a similar form as in the continuous case illustrated in Equation (11). However, rather than a continuous dependent variable $y$, the discrete choice model contains a binary or categorical dependent variable indicating the discrete state of land use or land-use change associated with a parcel. Building on the model laid out in Equation (6), a discrete choice model with spatial error autocorrelation can be expressed as:
(12) $\Pr(y_{it} = 1) = \Pr(V_{ikt} + \mu_{ikt} > V_{ijt} + \mu_{ijt}) = \Pr(\varepsilon_{it} < V_{ikt} - V_{ijt})$,
where $\varepsilon_{it} = \mu_{ijt} - \mu_{ikt}$. The structure of the spatial error autocorrelation embedded in $\varepsilon_t$ may take on any number of different forms, e.g., a first-order spatial autoregressive structure could be expressed as in Equation (11b). The result is a correlated error structure among neighboring observations: $\mathrm{corr}(\varepsilon_{it}, \varepsilon_{ht}) > 0$, where parcels $i$ and $h$ are neighbors. Even though the underlying spatial error structure may be the same in continuous variable and discrete choice models, the consequences of the resulting spatially correlated error covariance structure are more severe in a discrete choice setting. The added complexity arises because of the heteroskedasticity induced by the spatially correlated covariance structure. While heteroskedastic errors in a continuous model do not result in inconsistent estimates, they do lead to inconsistency in discrete choice and duration models. As detailed by Fleming (2002), several approaches have been proposed for dealing with this problem in a discrete choice framework.12 Pinkse and Slade (1998) have proposed a Generalized Method of Moments estimator for the binary probit model that corrects for heteroskedasticity arising from a first-order autoregressive specification of spatial error autocorrelation.
However, this method only corrects the inconsistency problem; the resulting estimates are still inefficient. As a result, hypothesis testing is invalid. To obtain both consistency and efficiency, full spatial information must be incorporated into the estimation procedure. In this case, incorporating the non-zero covariance structure implies that the likelihood function must be expressed in terms of an N-dimensional integral. Evaluation of this N-dimensional integral is computationally difficult and often limited to datasets with a small number of observations (i.e., 500 or fewer). For a full discussion of these issues and of an alternative approach using a weighted non-linear least squares estimator, see Fleming (2002).
Less complicated strategies have been employed by others in the literature. For example, Nelson and Hellerstein (1997) estimate a multinomial discrete choice model of land use in which they eliminate suspected spatial error autocorrelation by using a spatial sampling routine that randomly selects a sub-sample of data points in which no two sites are considered neighbors. In addition, they construct a normalized measure of vegetative cover from the dependent variable of the original neighbors of each observation in the sample and include this as an explanatory variable. In doing so, they attempt to control for spatial dependence that they surmise is due to the spatial lag effects associated with the vegetative cover variable.
Irwin and Bockstael (2002) take an alternative approach to estimating a land-use conversion model with spatial lag effects. They hypothesize that land-use externalities among neighboring sites create interdependence among the conversion decisions of agents. Due to correlation between spatially correlated errors and the spatial lag variable, an identification problem arises that cannot be solved simply by assuming a spatial structure for the error terms or the interaction effects. An identification strategy based on bounding the spatial interaction term from above is used, so that the effect is identified only if the estimated interaction parameter is negative. Evidence of negative interactions among land parcels converted to residential use is found, which the authors argue leads to a ‘repelling’ effect among residential subdivisions and explains the scattered pattern of residential development in their study area.
A second type of spatial effect that arises in models with spatial data is spatial heterogeneity, i.e., non-constant error variances across space. Correction for this type of nonstationarity can be carried out with the usual methods of correcting for heteroskedastic errors. However, when both spatial heteroskedasticity and spatial dependence occur, standard tests for heteroskedasticity may be misleading. With a single cross-section equation, spatial dependence and spatial heteroskedasticity may be observationally equivalent (Anselin 2001).

Conclusions

In this chapter we have reviewed the major types of empirical economic models of land use and land-use conversion and the modeling and data issues that arise in each case. While the review is intended to be comprehensive of empirical economic models, it is not comprehensive of empirical land-use models in general. Many ‘non-economic’ empirical models of land-use change exist, some of which may be considered reduced form models that are motivated by assumptions regarding underlying economic processes.13 In addition, we have not reviewed simulation-based models of land use and land-use change. While some of these models are again outside the realm of economic models, others have been developed based on economic theories of land use and land-use change. Examples of the latter include agent-based economic models of land-use change, in which the land-use behavior of profit-maximizing landowners is spatially distributed across a simulated landscape (for a review of these models, see Parker et al. 2001). Such models are useful for understanding the evolution of aggregate-level patterns of land use as a function of individual-level behavior in which interdependence among landowners (e.g., due to spatial externalities) is an important element. These models are a natural complement to empirical models because they offer a means to predict changes in aggregate-level land-use patterns using estimated parameters from an empirical model. By comparing these simulated predictions with observed patterns, it is possible to draw conclusions regarding the extent to which the estimated individual effects generate changes in the land-use pattern at a regional scale. For example, Irwin and Bockstael (2002) use the estimated parameters from a duration model of residential land-use conversion to simulate the predicted pattern of development under two scenarios: one in which the estimated negative effects from neighboring development are included (along with the other estimated parameters) and another in which these effects are restricted to be zero. The results illustrate the extent to which the negative development externalities are predicted to lead to increased sprawl of residential development.
Empirical models of land use and land-use change are critical for testing theories of land use. The evidence from the empirical literature strongly supports the notion that private land-use decisions are determined by the financial net returns to different land uses. Land quality is also shown to consistently explain the aggregate distribution of land use. Less clear is the influence of private non-market benefits on land-use decisions. For example, a landowner may retain land in forest for recreational uses, even if it would be optimal to convert it to an alternative use based solely on market returns. Another issue deserving attention is the effect of uncertainty on land-use decisions. Given the conversion costs associated with switching land uses and uncertainty about future returns, one might expect there to be option values related to retaining land in its current use. Above, we cite several studies that have considered the influence of option values on land-use decisions, but this remains an open area for research.
Empirical land-use models are also useful for examining policies aimed at managing land-use change. For example, researchers can test whether land-use decisions are affected by existing land-use policies such as zoning restrictions (e.g., Kline and Alig 1999; Cho et al.
2001) or by variables that might be influenced by future policies. In the latter case, econometric land-use models have been used to simulate the effects of subsidies and taxes that modify the net returns to alternative land uses. Plantinga et al. (1999) and Stavins (1999) use econometric land-use models to simulate the effects of policies to promote carbon sequestration in forests. In a similar fashion, Plantinga and Ahn (2002) analyze the effects of hypothetical land-use conversion and retention subsidies. Econometric models are particularly suited to the analysis of land-use policies because they are based on historical data and, thus, have the potential to capture the actual decisions made by private landowners facing returns to alternative uses.

Notes

1 A website providing convenient access to Census of Agriculture, Bureau of Census, and other federal government data is https://govinfo.kerr.orst.edu.
2 For each state inventory, a report is published presenting basic statistics on the forest inventory, including forest area by county. For example, a recent inventory report for Wisconsin is Schmidt (1997). More detailed data can be obtained by accessing the raw inventory data. For eastern states, these data have been assembled in a consistent format referred to as the Eastwide Data Base.