Spatial Data Foundations
Date Modified: May 6, 2024
Authors: Mitchell Manware , Kyle P Messier
Key Terms: Geospatial Data
Programming Language: R
Motivation
Environmental health research relies on various types of data to accurately measure, model, and predict exposures. Environmental data are often spatial (related to the surface of the Earth), temporal (related to specific time/period of time), or spatio-temporal (related to the surface of the Earth for a specific time/period of time). These data are at the core of environmental health research, but the steps between identifying a spatial data set or variable and using it to help answer a research question can be challenging.
The spatial data foundations vignettes are designed to introduce the necessary steps for conducting analyses with spatial data in R. They will introduce R packages that are equipped to handle spatial data, and will demonstrate how to access, import, and analyze three different types of spatial data. The vignettes will focus primarily on spatial data, but some aspects of temporal and spatio-temporal data will also be discussed.
Objectives
Users will learn about the following topics related to spatial data in R:
- Point, polygon, and raster data types
- Downloading data from a URL
- Importing data
- Checking data type, structure, and class
- Reclassifying data
- Computing summary and zonal statistics
- Plotting individual and multiple data sets
Data Types
These vignettes will cover how to access, import, and analyze point, polygon, and raster spatial data types. The details of what constitutes each unique spatial data type, however, will not be covered.
For detailed descriptions of each type of spatial data, please see Simple Features for R (Pebesma (2018)) for point and polygon data types, and Introduction to Raster Data (Introduction to Raster Data (2023)) for raster data.
Data Sources
The exploratory analyses utilize free and publicly available environmental data. The code chunks are designed to access each specific file used for the exploratory analyses, but a description of each data source and data set is available below.
Producer | Data | Data Type |
---|---|---|
Environmental Protection Agency (EPA) | PM2.5 Daily Observations | Point |
National Oceanic and Atmospheric Administration (NOAA) | Wildfire Smoke Plumes | Polygon |
United States Census Bureau | United States Cartographic Boundary | Polygon |
National Oceanic and Atmospheric Administration (NOAA) | Land Surface Temperature | Raster |
Packages
Various R packages can be used to create, import, analyze, and export spatial data. If you have not used these packages previously, they may not be installed on your machine. The following chunk of code installs and imports the packages required to conduct the exploratory analyses in this vignette.
Installing and importing new packages may required R to restart.
vignette_packages <- c(
"dplyr", "ggplot2", "ggpubr", "sf",
"terra", "tidyterra", "utils"
)
for (v in seq_along(vignette_packages)) {
if (vignette_packages[v] %in% installed.packages() == FALSE) {
install.packages(vignette_packages[v])
}
}
library(dplyr)
library(ggplot2)
library(ggpubr)
library(sf)
library(terra)
library(tidyterra)
library(utils)
ggplot2
and ggpubr
The ggplot2
and ggpubr
packages will be used throughout the vignette for creating publication quality plots. Please see ggplot2: Elegant Graphics for Data Analysis (3e) (Wickham (2016)) and ggpubr: ‘ggplot2’ Based Publication Ready Plots (Kassambara (2023)) for in depth descriptions of the syntax and functionality utilized by these packages.
The exploratory analyses performed in this vignette are designed for educational purposes only. The results of the following analyses are not peer-reviewed findings, nor are they based on any hypotheses.
Coordinate reference systems and projections
Coordinate reference systems (CRS) are important for spatial analyses as they define how spatial data align with the Earth’s surface (Lovelace, Nowosad, and Muenchow (2019)). Transforming (projecting) the data to a different CRS may be necessary when combining multiple datasets or creating visuals for particular areas of interest. It is important to note that transforming spatial data can cause distortions in it’s area, direction, distance, or shape (Lovelace, Nowosad, and Muenchow (2019)). The direction and magnitude of these distortions vary depending on the chosen CRS, area of interest, and type of data (Steinwand, Hutchinson, and Snyder (1995)). For guidance on selected the right coordinate reference system based on the data, area of interest, and analysis goals, see Choose the right projection (Choose the Right Projection (2023)).
For the following analyses which focus on the coterminous United States, the Albers Equal Area projection (EPSG Code: 5070) will be utilized.