Simplify Workflows of D6 Research Projects

tutorial rstats workflow data management

Learn how to use the {d6} package to follow the project workflow within the department “Ecological Dynamics” at the Leibniz Institute for Zoo and Wildlife Research. The package lets you set up a standardized folder structure and use templates for standardized reports, and it provides a corporate theme for ggplot2 and some helpful utility functions.

Cedric Scherer https://cedricscherer.com (IZW Berlin) https://ecodynizw.github.io/
2020-12-09

The {d6} package aims to simplify workflows of our D6 research projects by providing a standardized folder structure incl. version control, Rmarkdown templates, and other utilities.

There are five main functionalities:

  1. Create standardized project directories with new_project()
  2. Install a set of common packages with install_d6_packages()
  3. Create figures that match our lab identity via theme_d6()
  4. Provide custom Rmarkdown templates via File > New File > Rmarkdown... > From Template
  5. Render all your Rmarkdown documents to ./docs/reports with render_all_reports() or render_report()


The function simple_load() is a utility function that is currently in an experimental state. It allows you to install (if not yet installed) and load a set of packages, even a combination of CRAN and GitHub packages, in a single step.




Installation

The package is not on CRAN and needs to be installed from GitHub. To do so, open RStudio and run the following two lines in the console. If the {remotes} package is already installed, skip the first line.

install.packages("remotes")
remotes::install_github("EcoDynIZW/d6")

(Note: If you are asked whether you want to update other packages, either press “No” (option 3) and continue, or update the packages before running the install command again.)
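
The two steps above can also be folded into one guarded call. The following is a minimal sketch: the wrapper name install_d6() is hypothetical, and passing upgrade = "never" (an argument of remotes::install_github() that skips the update prompt for dependencies) is an assumption about what you want, not a recommendation made by the package authors.

```r
# Hypothetical convenience wrapper; not part of {d6} or {remotes}.
install_d6 <- function(upgrade = "never") {
  # Install {remotes} only if it is missing.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  # upgrade = "never" leaves already installed dependencies untouched,
  # so the interactive update question does not appear.
  remotes::install_github("EcoDynIZW/d6", upgrade = upgrade)
}
```

Calling install_d6() in the console then performs the installation.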




Create Project Directory

Run the function new_project() to create a new project. This will create a standardized directory with all the scaffolding we use for all projects in our department. It also adds several files needed for the documentation of your project.

To start a new project in the current working directory, simply run:

d6::new_project("unicornus_wl_sdm_smith_j")

Please give your project a unique and descriptive name: species_country_topic_name

For example, when John Smith is developing a species distribution model for unicorns in Wonderland, a descriptive title could be: unicornus_wl_sdm_smith_j. Please use underscores and the international two-letter country codes (ISO 3166-1 alpha-2).
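
If you want to build such names programmatically, the convention can be sketched in a few lines of base R. The helper make_project_name() below is hypothetical and not part of {d6}; it only checks that the country looks like a two-letter code, not that the code actually exists.

```r
# Hypothetical helper that assembles species_country_topic_name.
make_project_name <- function(species, country, topic, name) {
  # The country is expected as a two-letter (alpha-2) code.
  if (!grepl("^[A-Za-z]{2}$", country)) {
    stop("`country` must be a two-letter code, e.g. \"de\".")
  }
  parts <- c(species, country, topic, name)
  # Lower-case everything and turn runs of other characters into underscores.
  parts <- gsub("[^a-z0-9]+", "_", tolower(trimws(parts)))
  paste(parts, collapse = "_")
}

make_project_name("unicornus", "WL", "sdm", "Smith J")
# returns "unicornus_wl_sdm_smith_j"
```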

The main folders created in the root folder (here unicornus_wl_sdm_smith_j) are the following:

.
└── unicornus_wl_sdm_smith_j
    ├── data
    ├── docs
    ├── output
    ├── plots
    └── scripts

The full scaffolding structure including all sub directories and additional files looks like this:

. 
└── unicornus_wl_sdm_smith_j
    ├── .Rproj.user          —  Rproject files
    ├── data                 —  main folder data
    │    ├── geo             —  main folder spatial data (if `geo = TRUE`)
    │    │    ├── processed  —  processed spatial data files
    │    │    └── raw        —  raw spatial data files
    │    ├── processed       —  processed tabular data files
    │    └── raw             —  raw tabular data files
    ├── docs                 —  documents main folder
    │   ├── admin            —  administrative docs, e.g. permits 
    │   ├── literature       —  literature used for parametrization + manuscript
    │   ├── manuscript       —  manuscript drafts (main + supplement)
    │   ├── presentations    —  talks and poster presentations
    │   └── reports          —  rendered reports
    ├── output               —  everything that is computed (except plots)
    ├── plots                —  plot output
    ├── scripts              —  script files (e.g. .R, .Rmd, .Qmd, .py, .nlogo)
    │   ├── 00_start.R       —  first script to run after project setup
    │   └── zz_submit.R      —  final script to run before submission
    ├── .gitignore           —  contains which files to ignore for version control
    ├── .Rbuildignore        —  contains which files to ignore for package builds
    ├── DESCRIPTION          —  contains project details and package dependencies
    ├── NAMESPACE            —  contains context for R objects
    └── project.Rproj        —  Rproject file: use to start your project
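
To verify that an existing project still follows this layout, you could compare the folders on disk against the tree above. This is a minimal sketch in base R: the helper missing_dirs() is hypothetical, and the demonstration creates only part of the scaffold in a temporary folder rather than calling new_project().

```r
# Directories from the tree above, relative to the project root.
expected_dirs <- c(
  "data/geo/processed", "data/geo/raw", "data/processed", "data/raw",
  "docs/admin", "docs/literature", "docs/manuscript",
  "docs/presentations", "docs/reports",
  "output", "plots", "scripts"
)

# Hypothetical helper: returns the expected directories missing below `root`.
missing_dirs <- function(root, dirs = expected_dirs) {
  dirs[!dir.exists(file.path(root, dirs))]
}

# Demonstration in a temporary folder that contains only part of the scaffold.
root <- file.path(tempdir(), "unicornus_wl_sdm_smith_j")
dir.create(file.path(root, "data", "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "scripts"), showWarnings = FALSE)

missing_dirs(root)
```

An empty result means that all folders of the scaffold are present.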

Use A Custom Root Directory

You don’t need to change the working directory first—you can also specify a path to a custom root folder in which the new project folder is created:

## both work:
d6::new_project("unicornus_wl_sdm_smith_j", path = "absolute/path/to/the/root/folder")
## or:
d6::new_project("unicornus_wl_sdm_smith_j", path = "absolute/path/to/the/root/folder/")

The resulting final directory of your project would be absolute/path/to/the/root/folder/unicornus_wl_sdm_smith_j.
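
Why both spellings lead to the same folder can be illustrated with base R. The function project_dir() is a hypothetical stand-in for the path handling and is not taken from the {d6} source code.

```r
# Hypothetical helper: combine a root path and a project name.
project_dir <- function(name, path = ".") {
  # Drop trailing slashes so both spellings of `path` give the same result.
  file.path(sub("/+$", "", path), name)
}

a <- project_dir("unicornus_wl_sdm_smith_j", "absolute/path/to/the/root/folder")
b <- project_dir("unicornus_wl_sdm_smith_j", "absolute/path/to/the/root/folder/")

identical(a, b)
# returns TRUE
```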

Use Version Control

If you want to create a GitHub repository for the project at the same time, use instead:

d6::new_project("unicornus_wl_sdm_smith_j", github = TRUE)

By default, the visibility of the GitHub repository is set to “private” but you can also change that:

d6::new_project("unicornus_wl_sdm_smith_j", github = TRUE, private_repo = FALSE)

Note that to create a GitHub repo you will need to have configured your system as explained here.
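
Before using github = TRUE, a quick sanity check of your local git setup can save some trouble. The following is only a generic sketch under the assumption that git is used from the command line: the helper git_global() is hypothetical, and the check does not cover GitHub authentication or any other step of the configuration referred to above.

```r
# TRUE if a git executable is found on the PATH.
has_git <- nzchar(unname(Sys.which("git")))

# Hypothetical helper: read a global git setting, NA if unset or git is missing.
git_global <- function(key) {
  if (!has_git) return(NA_character_)
  out <- suppressWarnings(
    system2("git", c("config", "--global", "--get", key),
            stdout = TRUE, stderr = FALSE)
  )
  if (length(out) == 0) NA_character_ else out[1]
}

# Overview: is git available, and are user name and email configured?
c(
  git = has_git,
  user_name = !is.na(git_global("user.name")),
  user_email = !is.na(git_global("user.email"))
)
```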

Setup without Geo Directories

If your project does not (or will not) contain any spatial data, you can prevent the creation of the spatial data directories (data/geo/raw and data/geo/processed in the tree above) by setting geo to FALSE:

d6::new_project("unicornus_wl_sdm_smith_j", geo = FALSE)

Add Documentation to Your Project

After you have set up your project directory, open the file 00_start.R in the scripts folder. Add the details of your project, fill in the readme, add an MIT license (if needed), and add package dependencies.




Install Common Packages

You can install the packages that are most commonly used in our department via install_d6_packages().

Note that this function checks which packages are already installed and only installs those that are missing for your current R version.

Again, there is an argument geo so you can decide if you want to install common geodata packages as well (which is the default). If you are not intending to process geodata, set geo to FALSE:

d6::install_d6_packages(geo = FALSE)

The default packages that are going to be installed are:

tidyverse (tibble, dplyr, tidyr, ggplot2, readr, forcats, stringr, purrr), lubridate, here, vroom, patchwork, remotes

The following packages will be installed in case you specify geo = TRUE:

sf, terra, stars, tmap
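
To see in advance which of these packages are still missing on your machine, a check along the following lines can help. It is a sketch in base R, not the logic used inside install_d6_packages(); the helper not_installed() is hypothetical.

```r
default_pkgs <- c("tidyverse", "lubridate", "here", "vroom", "patchwork", "remotes")
geo_pkgs <- c("sf", "terra", "stars", "tmap")

# Hypothetical helper: which of `pkgs` are not installed in the current library?
not_installed <- function(pkgs) {
  # system.file() returns "" for packages that cannot be found.
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  unname(pkgs[!installed])
}

not_installed(c(default_pkgs, geo_pkgs))
```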




Corporate ggplot2 Theme

The package provides a ggplot2 theme with sensible defaults and additional utilities that make it easier to create good-looking, clean figures. Furthermore, we aim for a consistent look across all our figures shown in manuscripts, presentations, and posters.

The theme can be added to a ggplot object as usual:

library(ggplot2)
ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point() +
  d6::theme_d6()

Or set it as the new global theme, overwriting the current default.
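
In ggplot2, a global theme is set with theme_set(). The following is a minimal sketch, assuming this standard mechanism is what is meant here; the guard at the top is only there so that the snippet does nothing when {ggplot2} or {d6} is not installed.

```r
has_pkgs <- requireNamespace("ggplot2", quietly = TRUE) &&
  requireNamespace("d6", quietly = TRUE)

if (has_pkgs) {
  library(ggplot2)

  # theme_set() returns the previously active theme, so keep it around.
  old_theme <- theme_set(d6::theme_d6())

  # Plots created from now on use the D6 theme without adding it explicitly.
  p <- ggplot(mpg, aes(x = displ, y = cty)) +
    geom_point()

  # Restore the previous default when you are done.
  theme_set(old_theme)
}
```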

Typefaces

The D6 corporate theme uses the PT font super family and will prompt you to install the relevant font files if they are missing on your machine.

By default, the theme uses PT Sans. If you prefer serif fonts or such a typeface is required, you can set serif = TRUE inside the theme:

ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point() +
  d6::theme_d6(serif = TRUE)

Additional Utility Arguments

In addition to the common arguments to specify the base settings (base_family, base_size, base_line_size, and base_rect_size), we have added the following utility settings to simplify the modification of the theme:

ggplot(mpg, aes(x = class, y = hwy, color = factor(year))) + 
  geom_boxplot() +
  d6::theme_d6(
    grid = "y",
    legend = "top",
    mono = "yl",
    bg = "cornsilk"
  )




Use Custom Rmarkdown Templates

The package also provides several templates for your scripts. In RStudio, navigate to File > New File > Rmarkdown... > From Template and choose the template you want to use. All templates come with a pre-formatted YAML header and chunks for the setup.

The following templates are available for now:




Render Rmarkdown Files to Reports

The render_*() functions take care of knitting your Rmarkdown files into HTML reports. The functions assume that your .Rmd files are saved in the R directory or any subdirectory, and store the resulting .html files in the corresponding directory, namely ./docs/reports/.

You can render all .Rmd files that are placed in the R directory and its subdirectories in one step with render_all_reports().

You can also render single Rmarkdown documents via render_report():

d6::render_report("my-report.Rmd")
d6::render_report("notsurewhybutIhaveasubfolder/my-report.Rmd")
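
The general idea behind such a batch rendering step can be sketched with base R and rmarkdown::render(), which accepts an output_dir argument. Both helpers below are hypothetical and are not the implementation used by {d6}; the demonstration only lists files in a temporary folder and does not knit anything.

```r
# Hypothetical helper: find all .Rmd files below `dir`, including subdirectories.
find_rmd <- function(dir) {
  list.files(dir, pattern = "\\.Rmd$", recursive = TRUE, full.names = TRUE)
}

# Hypothetical helper: knit each file into `out_dir` (requires {rmarkdown}).
render_to_reports <- function(dir, out_dir = file.path("docs", "reports")) {
  files <- find_rmd(dir)
  for (f in files) {
    rmarkdown::render(f, output_dir = out_dir, quiet = TRUE)
  }
  invisible(files)
}

# Demonstration of the file search with two empty files in a temporary folder.
demo_dir <- file.path(tempdir(), "render_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "a.Rmd"), file.path(demo_dir, "sub", "b.Rmd"))

basename(find_rmd(demo_dir))
```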




Install and Load a Set of Packages

The simple_load() function takes a vector of packages, checks if they are installed already, installs them if needed, and loads them via library() afterward. You can provide both CRAN and GitHub packages at the same time. GitHub packages need to be specified as “owner/repository” without any spaces.

d6::simple_load(pcks = c("dplyr", "ggplot2", "EcoDynIZW/d6berlin"))

You can also force a re-installation of packages. CRAN and GitHub packages are controlled individually via update_cran and update_gh, respectively.

d6::simple_load(pcks = c("dplyr", "ggplot2", "EcoDynIZW/d6berlin"),
                update_cran = TRUE, update_gh = TRUE)
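
The “owner/repository” convention makes it easy to tell the two package sources apart. The following sketch in base R illustrates that idea; it is not taken from the simple_load() source code, and it assumes that the repository name equals the package name.

```r
pcks <- c("dplyr", "ggplot2", "EcoDynIZW/d6berlin")

# Entries that contain a slash are treated as GitHub repositories.
is_github <- grepl("/", pcks, fixed = TRUE)

cran_pkgs <- pcks[!is_github]
github_repos <- pcks[is_github]

# library() needs the package name, i.e. the part after the slash
# (assumption: repository name and package name are identical).
github_pkgs <- basename(github_repos)

cran_pkgs
# returns "dplyr" "ggplot2"
github_pkgs
# returns "d6berlin"
```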




Acknowledgements:

This package would not exist without the work of many great people!

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Citation

For attribution, please cite this work as

Scherer (2020, Dec. 9). Ecological Dynamics: Simplify Workflows of D6 Research Projects. Retrieved from https://ecodynizw.github.io/posts/d6package/

BibTeX citation

@misc{scherer2020simplify,
  author = {Scherer, Cedric},
  title = {Ecological Dynamics: Simplify Workflows of D6 Research Projects},
  url = {https://ecodynizw.github.io/posts/d6package/},
  year = {2020}
}