Spatial data are central to environmental applications. A typical example is data collected from air or water quality sensors. Measurements collected this way are dependent, and the strength of the dependence between two measurements varies with the distance between the sensors. This dependence must be taken into account not only at the data analysis stage (there is a substantial statistical literature on spatial data analysis methods), but also at the design of experiments stage. One example of a design question: where should the sensors be located, and how many are needed?
Where does explain vs. predict come into the picture? An interesting 2006 article by Dale Zimmerman called “Optimal network design for spatial prediction, covariance parameter estimation, and empirical prediction” tells the following story:
“…criteria for network design that emphasize the utility of the network for prediction (kriging) of unobserved responses assuming known spatial covariance parameters are contrasted with criteria that emphasize the estimation of the covariance parameters themselves. It is shown, via a series of related examples, that these two main design objectives are largely antithetical and thus lead to quite different “optimal” designs”
(Here is the freely available technical report).
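To make the contrast in the quote concrete, here is a rough sketch in Python with NumPy. It is not taken from Zimmerman's article. Everything in it is an assumption chosen for illustration: the exponential covariance with known variance and zero mean, the range value of 0.1, the unit-square study region, the two hypothetical 16-sensor layouts, and the use of Fisher information for the range parameter as a stand-in for an estimation criterion. It compares an evenly spread layout with a layout made of four tight clusters on two criteria: the average simple-kriging variance (prediction with known parameters) and the information about the range parameter.

```python
import numpy as np

SIGMA2 = 1.0  # assumed process variance
PHI = 0.1     # assumed range parameter of the exponential covariance


def exp_cov(a, b, sigma2=SIGMA2, phi=PHI):
    """Return (covariance, distance) matrices for sigma2 * exp(-h / phi)."""
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-h / phi), h


def mean_kriging_variance(sites, targets, sigma2=SIGMA2, phi=PHI):
    """Average simple-kriging variance over targets, parameters treated as known."""
    K, _ = exp_cov(sites, sites, sigma2, phi)
    c, _ = exp_cov(sites, targets, sigma2, phi)  # shape (n_sites, n_targets)
    explained = np.sum(c * np.linalg.solve(K, c), axis=0)
    return float(np.mean(sigma2 - explained))


def range_information(sites, sigma2=SIGMA2, phi=PHI):
    """Fisher information for phi (zero mean): 0.5 * tr((K^-1 dK/dphi)^2)."""
    K, h = exp_cov(sites, sites, sigma2, phi)
    dK = K * h / phi**2  # derivative of exp(-h / phi) with respect to phi
    A = np.linalg.solve(K, dK)
    return float(0.5 * np.trace(A @ A))


# Hypothetical design 1: 16 sensors on a regular 4 x 4 grid.
g = np.array([0.125, 0.375, 0.625, 0.875])
regular = np.array([(x, y) for x in g for y in g])

# Hypothetical design 2: 16 sensors in four tight 2 x 2 clusters.
centers = np.array([(x, y) for x in (0.25, 0.75) for y in (0.25, 0.75)])
offsets = 0.01 * np.array([(-1, -1), (-1, 1), (1, -1), (1, 1)])
clustered = (centers[:, None, :] + offsets[None, :, :]).reshape(-1, 2)

# Prediction locations: a 40 x 40 grid of cell midpoints on the unit square.
t = (np.arange(40) + 0.5) / 40
targets = np.array([(x, y) for x in t for y in t])

kv_regular = mean_kriging_variance(regular, targets)
kv_clustered = mean_kriging_variance(clustered, targets)
fi_regular = range_information(regular)
fi_clustered = range_information(clustered)

print("mean kriging variance: regular", kv_regular, "clustered", kv_clustered)
print("information on phi:    regular", fi_regular, "clustered", fi_clustered)
```

Under these assumptions, the regular grid should give the lower average kriging variance, while the clustered layout should carry more information about the range parameter, because only the clusters contain pairs of sensors at short distances. This is only a toy version of the tension the abstract describes; the article itself treats the design criteria much more carefully.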