Spatial Data Foundations

Date Modified: May 6, 2024

Authors: Mitchell Manware, Kyle P Messier

Key Terms: Geospatial Data

Programming Language: R

Motivation

Environmental health research relies on various types of data to accurately measure, model, and predict exposures. Environmental data are often spatial (related to the surface of the Earth), temporal (related to a specific time or period of time), or spatio-temporal (related to the surface of the Earth at a specific time or over a period of time). These data are at the core of environmental health research, but the steps between identifying a spatial data set or variable and using it to help answer a research question can be challenging.

The spatial data foundations vignettes are designed to introduce the necessary steps for conducting analyses with spatial data in R. They will introduce R packages that are equipped to handle spatial data, and will demonstrate how to access, import, and analyze three different types of spatial data. The vignettes will focus primarily on spatial data, but some aspects of temporal and spatio-temporal data will also be discussed.

Objectives

Users will learn about the following topics related to spatial data in R:

  • Point, polygon, and raster data types
  • Downloading data from a URL
  • Importing data
  • Checking data type, structure, and class
  • Reclassifying data
  • Computing summary and zonal statistics
  • Plotting individual and multiple data sets

Data Types

These vignettes will cover how to access, import, and analyze point, polygon, and raster spatial data types. The details of what constitutes each unique spatial data type, however, will not be covered.

For detailed descriptions of each type of spatial data, please see Simple Features for R (Pebesma (2018)) for point and polygon data types, and Introduction to Raster Data (Introduction to Raster Data (2023)) for raster data.
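
As a rough orientation before consulting those references, the sketch below builds one toy object of each type with sf and terra. The coordinates, extent, and cell values are hypothetical and are not drawn from the data sets used in these vignettes; the object names (pt, poly, grid) are illustrative only.

```r
library(sf)
library(terra)

# Point: one hypothetical location (longitude, latitude) in WGS 84
pt <- sf::st_sfc(sf::st_point(c(-78.9, 36.0)), crs = 4326)

# Polygon: a hypothetical 1-degree square around the point;
# the ring must end on the same vertex it starts with
ring <- rbind(
  c(-79.5, 35.5), c(-78.5, 35.5), c(-78.5, 36.5),
  c(-79.5, 36.5), c(-79.5, 35.5)
)
poly <- sf::st_sfc(sf::st_polygon(list(ring)), crs = 4326)

# Raster: a 10 x 10 grid of cells over the same extent,
# filled with placeholder values 1 to 100
grid <- terra::rast(
  nrows = 10, ncols = 10,
  xmin = -79.5, xmax = -78.5, ymin = 35.5, ymax = 36.5,
  crs = "EPSG:4326", vals = 1:100
)

# Inspect the class of each object
class(pt)
class(poly)
class(grid)
```

Point and polygon data are vector geometries handled here by sf, while raster data are gridded cells handled by terra. Real data sets would normally be imported from files rather than constructed by hand.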

Data Sources

The exploratory analyses utilize free and publicly available environmental data. The code chunks are designed to access each specific file used for the exploratory analyses, but a description of each data source and data set is available below.

Exploratory analyses data sources

Producer | Data | Data Type
Environmental Protection Agency (EPA) | PM2.5 Daily Observations | Point
National Oceanic and Atmospheric Administration (NOAA) | Wildfire Smoke Plumes | Polygon
United States Census Bureau | United States Cartographic Boundary | Polygon
National Oceanic and Atmospheric Administration (NOAA) | Land Surface Temperature | Raster

Packages

Various R packages can be used to create, import, analyze, and export spatial data. If you have not used these packages previously, they may not be installed on your machine. The following chunk of code installs and imports the packages required to conduct the exploratory analyses in this vignette.

Installing and importing new packages may require R to restart.

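# Packages required for the exploratory analyses
vignette_packages <- c(
  "dplyr", "ggplot2", "ggpubr", "sf",
  "terra", "tidyterra", "utils"
)

# Install each package only if it is not already installed;
# package names are the row names of the installed.packages() matrix
for (pkg in vignette_packages) {
  if (!(pkg %in% rownames(installed.packages()))) {
    install.packages(pkg)
  }
}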

library(dplyr)
library(ggplot2)
library(ggpubr)
library(sf)
library(terra)
library(tidyterra)
library(utils)

ggplot2 and ggpubr

The ggplot2 and ggpubr packages will be used throughout the vignette for creating publication-quality plots. Please see ggplot2: Elegant Graphics for Data Analysis (3e) (Wickham (2016)) and ggpubr: ‘ggplot2’ Based Publication Ready Plots (Kassambara (2023)) for in-depth descriptions of the syntax and functionality of these packages.

The exploratory analyses performed in this vignette are designed for educational purposes only. The results of the following analyses are not peer-reviewed findings, nor are they based on any hypotheses.

Coordinate reference systems and projections

Coordinate reference systems (CRS) are important for spatial analyses as they define how spatial data align with the Earth’s surface (Lovelace, Nowosad, and Muenchow (2019)). Transforming (projecting) the data to a different CRS may be necessary when combining multiple datasets or creating visuals for particular areas of interest. It is important to note that transforming spatial data can cause distortions in its area, direction, distance, or shape (Lovelace, Nowosad, and Muenchow (2019)). The direction and magnitude of these distortions vary depending on the chosen CRS, area of interest, and type of data (Steinwand, Hutchinson, and Snyder (1995)). For guidance on selecting the right coordinate reference system based on the data, area of interest, and analysis goals, see Choose the right projection (Choose the Right Projection (2023)).
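
To make the distortion point concrete, the following minimal sketch projects one hypothetical polygon into two different CRS and compares the resulting planar areas. The polygon coordinates are invented for illustration, and Web Mercator (EPSG:3857) is used here only as a contrasting example; it is not part of the vignette analyses.

```r
library(sf)

# Hypothetical 1-degree square in WGS 84 (longitude/latitude);
# the coordinates are illustrative, not taken from the vignette data
ring <- rbind(
  c(-79.5, 35.5), c(-78.5, 35.5), c(-78.5, 36.5),
  c(-79.5, 36.5), c(-79.5, 35.5)
)
square <- sf::st_sfc(sf::st_polygon(list(ring)), crs = 4326)

# Project the same polygon to an equal-area CRS and to Web Mercator
square_albers <- sf::st_transform(square, 5070)
square_mercator <- sf::st_transform(square, 3857)

# Planar areas in square metres; Web Mercator inflates area away from
# the equator, so the two values differ for the same polygon
area_albers <- as.numeric(sf::st_area(square_albers))
area_mercator <- as.numeric(sf::st_area(square_mercator))

# Ratio greater than 1 indicates the Web Mercator area is inflated
area_mercator / area_albers
```

The same geometry yields a noticeably larger area in Web Mercator than in the equal-area projection, which is why the choice of CRS matters for any analysis that depends on area or distance.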

For the following analyses, which focus on the coterminous United States, the Albers Equal Area projection (EPSG Code: 5070) will be used.
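
A minimal sketch of such a transformation is shown here, assuming small hypothetical objects rather than the vignette data. It uses sf::st_transform() for vector data and terra::project() for raster data; the object names and coordinates are illustrative only.

```r
library(sf)
library(terra)

# Hypothetical point in WGS 84 (EPSG:4326); not from the vignette data
pt <- sf::st_sfc(sf::st_point(c(-78.9, 36.0)), crs = 4326)
sf::st_is_longlat(pt)

# Transform the vector data to Albers Equal Area (EPSG:5070)
pt_albers <- sf::st_transform(pt, 5070)
sf::st_crs(pt_albers) == sf::st_crs(5070)

# Hypothetical raster with placeholder values 1 to 100
grid <- terra::rast(
  nrows = 10, ncols = 10,
  xmin = -79.5, xmax = -78.5, ymin = 35.5, ymax = 36.5,
  crs = "EPSG:4326", vals = 1:100
)

# terra::project() is the raster counterpart of sf::st_transform()
grid_albers <- terra::project(grid, "EPSG:5070")
terra::is.lonlat(grid_albers)
```

Note that projecting a raster resamples its cells onto a new grid, so cell values and dimensions may change, whereas transforming vector data only changes the coordinates of the vertices.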

Additional Resources

For additional resources pertaining to the packages used in this vignette, please see the following:

References

Choose the Right Projection. 2023. Esri. https://learn.arcgis.com/en/projects/choose-the-right-projection/.
Introduction to Raster Data. 2023. https://datacarpentry.org/organization-geospatial/01-intro-raster-data.
Kassambara, Alboukadel. 2023. ggpubr: ‘ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. “Coordinate Reference Systems.” In Geocomputation with R. Chapman and Hall/CRC.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
Steinwand, Daniel R, John A Hutchinson, and John P Snyder. 1995. “Map Projections for Global and Continental Data Sets and an Analysis of Pixel Distortion Caused by Reprojection.” Photogrammetric Engineering & Remote Sensing 61 (12): 1487–97.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.