In-Depth Guide to Creating and Publishing an R Data Package Using Devtools

A step-by-step account of developing my “Richmondway” R Data package, featuring the Expletives Count by Roy Kent

Photo by Erda Estremera on Unsplash

When I was invited to speak at the 2023 Posit conference to present on storytelling with animation and interactivity, I spent months deliberating on the perfect dataset. It seemed every intriguing dataset had been exhausted, and I wasn’t inspired to base my presentation on a mundane one. Then one day, marathoning through episodes of Ted Lasso, an American sports comedy-drama television series, and chuckling at the eloquently placed expletives from Roy Kent sparked an idea. I rewatched the series (at 2x speed, mind you) and tallied each time Roy used or gestured the word “F**k”. That became my dataset! In this article, I will walk you through the detailed steps I took to turn this dataset into an R Data package, allowing you to easily create one yourself.

Welcome to the making of Richmondway, my very first R Data package, which lets you download the data and delve into the intricate details of Roy Kent’s lexicon, episode by episode, season by season. It finally answers, once and for all, the question that nobody asked: which season did Roy Kent say F**k the most?

Source: Posit Conference 2023 presentation by Deepsha Menghani

Why did I Create a Package?

I’ve been eager to learn how to create an R package. Starting with a simple data package seemed an excellent first endeavor.
Embedding data for function testing is essential. It familiarizes users with package functionalities, a step I’d need for any complex packages in the future.
This dataset was too entertaining to keep to myself. Packaging it ensured easy access for everyone post-conference.

So, whether you’re curious about R Data package creation or you’re a “Ted Lasso” aficionado, brew some tea and let’s dive in!

Detailed steps I used to create the R Data Package

Step 0: The Dataset and Package Name

Here’s what a snapshot of my dataset, named “richmondway”, looks like. It has 34 rows corresponding to every episode and 16 columns with various values that are described in the package we will be creating.

https://medium.com/media/d1fd8ed9261ef991952f3e4f3424abff/href

Naming your package is like naming a pet: it is extremely special. Choose a memorable name, but keep it simple, especially if you intend to make the package publicly available. I named mine Richmondway, a nod to AFC Richmond, the football club Roy Kent used to play for in Ted Lasso. It also starts with “R”, which is a lucky coincidence. Finally, I wanted the name to be a clear enough indicator of what is inside the package.
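Beyond being memorable, a package name also has to satisfy R’s own rules: only ASCII letters, digits, and dots, at least two characters, starting with a letter and not ending in a dot. Here is a minimal sketch of that check. The helper is_valid_package_name() is something I made up for this illustration; it is not part of devtools or usethis.

```r
# Hypothetical helper (not from any package): checks a candidate name
# against the naming rules described in "Writing R Extensions".
is_valid_package_name <- function(name) {
  # Starts with a letter, continues with letters/digits/dots,
  # and ends with a letter or digit (so at least two characters).
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_package_name("richmondway")   # passes the rules
is_valid_package_name("richmond_way")  # fails: underscores are not allowed
```

Running a check like this before creating the project can save a rename later.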

Step 1: Install the Toolkit

Install these R packages: devtools, usethis, and roxygen2. These make it extremely easy to structure and document your new package.

install.packages(c("devtools", "usethis", "roxygen2"))
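If some of these packages are already on your machine, you may prefer to install only the ones that are missing. The sketch below is one way to do that with base R. missing_packages() is a hypothetical helper I wrote for this example, and the install line is left commented out so nothing is downloaded by accident.

```r
# Hypothetical helper: returns the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  # system.file() returns "" when a package cannot be found.
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "usethis", "roxygen2"))

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```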

Step 2: Create a New Package as a Project

There are two ways you can create a package using devtools, which takes care of much of the initial package structure setup.

Method 1: Directly from the RStudio console

Screen cast of project creation for R package using devtools

Method 2: With usethis package

With the usethis::create_package() command, you can create a new package directly by providing the path where you want the package directory to live. For the rest of this article I will keep using usethis commands, which make many of the package creation and documentation steps simpler and faster.

usethis::create_package("path/richmondway")

You’ve just created a folder with the bare necessities for an R package. If you peek inside, you’ll find some mysterious files. No worries, we’ll get to know them one by one. Here are the files you will see get created as part of your project.

Initial home directory of project
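To confirm that a folder really contains the core pieces of a package, you can check for the DESCRIPTION file, the NAMESPACE file, and the R folder. This is a rough sketch: skeleton_status() is a hypothetical helper, and the demo below builds a throwaway directory that only mimics a fresh package rather than calling usethis.

```r
# Hypothetical helper: reports which core pieces of a package skeleton exist.
skeleton_status <- function(pkg_path) {
  core <- c("DESCRIPTION", "NAMESPACE", "R")
  present <- file.exists(file.path(pkg_path, core))
  stats::setNames(present, core)
}

# Demonstration on a throwaway directory that mimics a fresh package.
demo_path <- file.path(tempdir(), "demopkg")
dir.create(file.path(demo_path, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_path, c("DESCRIPTION", "NAMESPACE")))

skeleton_status(demo_path)
```

On a real project you would pass the path you gave to usethis::create_package() instead of demo_path.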

Step 3: Add the Dataset

I had saved the dataset in my local environment as “richmondway”. Running the command below adds a ‘/data’ directory to the root of your package and places a “.rda” file in it.

usethis::use_data(richmondway)

A single file named “richmondway.rda” within the “data” folder
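If you are curious what happens under the hood, the step is roughly a save() call into the data folder. The sketch below imitates it with base R in a temporary directory. It is an approximation, not the exact usethis implementation: the compression setting is my assumption, and the data frame is a three-row stand-in that only borrows column names from the real dataset.

```r
# Stand-in data frame: the column names come from the article,
# the three rows are placeholders, not the real dataset.
richmondway <- data.frame(
  Character = "Roy Kent",
  Episode_order = 1:3,
  Season = c(1L, 1L, 1L)
)

# Throwaway package directory so the sketch does not touch a real project.
pkg_path <- file.path(tempdir(), "richmondway-demo")
dir.create(file.path(pkg_path, "data"), recursive = TRUE, showWarnings = FALSE)

# Serialise the object under its own name into data/richmondway.rda.
rda_file <- file.path(pkg_path, "data", "richmondway.rda")
save(richmondway, file = rda_file, compress = "bzip2")

# Load it back into a clean environment to confirm the round trip.
restored <- new.env()
load(rda_file, envir = restored)
identical(restored$richmondway, richmondway)
```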

Step 4: Create the Data Dictionary `data.R`:

This is where you describe your dataset. Trust me, the better the description, the easier it is for others to unlock its potential. This will feed into your documentation in the later steps as well. You can create this file using the command below and then later update it with all the required details.

usethis::use_r("data")

This command will create a data.R file inside the R folder. Update the contents of this file to contain the format and columns available in your dataset. While there are many more columns in my dataset, for this example I am just showing three columns below. You should add the description of all columns as clearly as possible as it will appear in your dataset package documentation. The below format is used inside the data.R file to create the descriptions.

#' Data to showcase f**k count
#'
#' A dataset containing the number of times the word f**k was used in Ted Lasso by Roy Kent.
#'
#' @format A data frame with 34 rows and 16 columns.
#' \describe{
#'   \item{Character}{Single value – Roy Kent}
#'   \item{Episode_order}{The order of episodes from the 1st to the last}
#'   \item{Season}{The season 1, 2 or 3 associated with the count}
#' }
#' @source Created by Deepsha Menghani by watching the show and counting the number of F**ks used in sentences and as gestures
#'
#' @examples
#' data(richmondway)
"richmondway"

Let’s break this file down into its components:

Description Comments:

#' Data to showcase f**k count
#'
#' A dataset containing the number of times the word f**k was used in Ted Lasso by Roy Kent.

This is a short title and description of the dataset. Comments starting with #' are special: they are picked up by the roxygen2 package, which is used in R to produce documentation.

Format Comment:

#' @format A data frame with 34 rows and 16 columns.

This specifies the format of the data. In this case, the dataset is a data frame with 34 rows and 16 columns.

Description of Variables:

#' \describe{
#'   \item{Character}{Single value – Roy Kent}
#'   \item{Episode_order}{The order of episodes from the 1st to the last}
#'   \item{Season}{The season 1, 2 or 3 associated with the count}
#' }

This part provides a detailed description of some of the main variables/columns in the data frame. The \describe block lists the variables, and each one is represented by an \item tag that carries the name of the variable and its description.

Source Comment:

#' @source Created by Deepsha Menghani by watching the show and counting the number of F**ks used in sentences and as gestures

This provides information about the origin or source of the data. It’s important to credit the creator and provide context on how the data was gathered.

Examples Comment:

#' @examples
#' data(richmondway)

This provides an example of how users can access and use the dataset. In this case, it simply demonstrates how to load the data into R once they have installed your package.

Data Name:

"richmondway"

This is the name of the dataset. It’s in quotes because it indicates that this documentation is associated with the dataset of that name in the package.

When users install and load your package in R and then type ?richmondway, they’ll see this documentation presented in a structured format, helping them understand what the dataset is about, its structure, and how to use it.
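Since every column deserves a description, it can help to compare the \item{name}{description} entries against the actual column names. Below is a small sketch of that idea. documented_columns() is a hypothetical helper of my own, and the input lines and column names are a shortened stand-in for a real data.R file and dataset.

```r
# Hypothetical helper: pulls the names out of \item{name}{...} lines.
documented_columns <- function(roxygen_lines) {
  pattern <- "\\\\item[{]([^}]+)[}]"
  hits <- regmatches(roxygen_lines, regexpr(pattern, roxygen_lines))
  sub(pattern, "\\1", hits)
}

# Stand-in for readLines("R/data.R"); one column is deliberately left out.
roxygen_lines <- c(
  "#' \\describe{",
  "#'   \\item{Character}{Single value}",
  "#'   \\item{Season}{The season 1, 2 or 3 associated with the count}",
  "#' }"
)

# Stand-in for names(richmondway).
dataset_columns <- c("Character", "Episode_order", "Season")

# Columns that still need an \item entry in data.R.
setdiff(dataset_columns, documented_columns(roxygen_lines))
```

In a real project you would swap the two stand-ins for readLines("R/data.R") and names(richmondway).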

Step 5: Update the “DESCRIPTION” File

The DESCRIPTION file is the higher-level documentation of the package. Have a look at the DESCRIPTION file in your package home folder. It should come pre-populated with placeholder text that needs to be replaced with the correct details. The fields I updated in my DESCRIPTION file are below; I left the rest at their defaults.

Package: richmondway
Title: A dataset containing the number of times the word f**k was used in Ted Lasso by Roy Kent
Authors@R: person("Deepsha", "Menghani", email = "ab*@gm***.com", role = c("aut", "cre"))
Description: Downloads the dataset containing the number of times the word f**k was used in Ted Lasso by Roy Kent.
License: file LICENSE
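A quick way to confirm that the key fields are filled in is to read the file back with base R’s read.dcf(). The sketch below does this on a temporary file. missing_description_fields() is a hypothetical helper, the list of required fields is my own choice for this illustration, and the demo file deliberately omits License so there is something to report.

```r
# Hypothetical helper: lists required DESCRIPTION fields that are absent or empty.
missing_description_fields <- function(description_path,
                                       required = c("Package", "Title",
                                                    "Description", "License")) {
  fields <- read.dcf(description_path)
  present <- required %in% colnames(fields)
  filled <- present
  filled[present] <- nzchar(trimws(fields[1, required[present]]))
  required[!filled]
}

# Demonstration on a temporary file with a deliberately missing License field.
demo_description <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: richmondway",
  "Title: Placeholder title for this sketch",
  "Description: Placeholder description for this sketch."
), demo_description)

missing_description_fields(demo_description)
```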

Step 6: Create the “LICENSE” file

This file spells out the terms of use for your package. Note that in my DESCRIPTION file, I refer to a file called LICENSE. This file doesn’t exist yet, so we will create it now. It stores your license information, which tells users of the package what rights they have over the data made available through it. In my case, I used the CC0 license and added the standard description to my LICENSE file using the commands below.

license_text <- 'CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
For more information, please see
<http://creativecommons.org/publicdomain/zero/1.0/>'

writeLines(license_text, con = "packagepath/LICENSE")

Newly created LICENSE file is highlighted in the project home directory

Step 7: Load the Documentation

Now that all the files have been created and updated, we generate the documentation using the handy command below. This command builds the documentation from the data.R file where we added the description of the data. The documentation is placed inside a newly created folder called man, which stands for “manual”.

devtools::document()

Newly created “man” folder is highlighted in the project home directory

Once the documentation command is run, you can use the help command ?richmondway, and it should open up the package documentation. Make sure the documentation is clear and the necessary details from the data.R file are showing up as expected.

Step 8: Check

You can now test that the package has been put together correctly by running the command below. This command performs a wide array of checks to ensure the consistency and validity of your package.

devtools::check()

The output of devtools::check() gives you NOTES, WARNINGS, and ERRORS, each requiring varying degrees of attention:

ERRORS: Must be fixed immediately as they indicate major issues.
WARNINGS: Should be addressed to ensure functionality and CRAN compliance.
NOTES: Provide helpful advice and suggestions, and sometimes need to be addressed for CRAN submissions.
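If you want a quick tally instead of scrolling through the console output, you can count the findings in the result. This sketch assumes the object returned by devtools::check() carries character vectors named errors, warnings, and notes. summarise_check() is a hypothetical helper, and the mock result below stands in for a real check run.

```r
# Hypothetical helper: counts the findings in a check result.
# Assumes `result` has character vectors named errors, warnings, and notes.
summarise_check <- function(result) {
  c(
    errors = length(result$errors),
    warnings = length(result$warnings),
    notes = length(result$notes)
  )
}

# Stand-in result for illustration; in practice: result <- devtools::check()
mock_result <- list(
  errors = character(0),
  warnings = character(0),
  notes = "placeholder note text"
)

summarise_check(mock_result)
```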

Step 9: Install the package locally and test it

The command below from devtools installs the package locally. Then attach the package and use the same method you shared in the example within the data.R file to test access to the data.

devtools::install() # Install the package locally
library(richmondway) # Attach the installed package
data(richmondway) # Access the data through the package
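Once the data loads, a simple sanity check is to confirm it has the 34 rows and 16 columns promised in the documentation. Here is a minimal sketch. has_expected_shape() is a hypothetical helper, and the placeholder data frame only has the right dimensions, not the real content.

```r
# Hypothetical helper: confirms an object is a data frame of the expected size.
has_expected_shape <- function(df, expected_rows = 34L, expected_cols = 16L) {
  is.data.frame(df) && nrow(df) == expected_rows && ncol(df) == expected_cols
}

# Stand-in with the right dimensions but no real content.
placeholder <- as.data.frame(matrix(NA, nrow = 34, ncol = 16))
has_expected_shape(placeholder)

# With the package installed, the same check would read:
# data(richmondway, package = "richmondway")
# has_expected_shape(richmondway)
```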

Step 10: Publish your brand new package to GitHub

Now we need to initialize our Git repository and push the package to GitHub using some more handy usethis commands. Before running these commands, make sure you have a GitHub account and that you have set up SSH keys or personal access tokens (PAT) for use with GitHub.

usethis::use_git() # Git integration
usethis::use_github() # GitHub integration

What does usethis::use_git() do?
Initializes Git: This function initializes a new Git repository in your project.
First Commit: It makes an initial commit with the current state of the project.

What does usethis::use_github() do?
GitHub Repository Creation: This function helps in creating a new GitHub repository and connects your local Git repository to the remote GitHub repository.
Authentication: Helps in setting up authentication with GitHub. It checks if you are authenticated with GitHub. If not, it might prompt you to do so.
Push: Pushes your local git commits to the remote GitHub repository.

Step 11: And finally share your Package with the world

You can now share your package repository link. Anyone can install your package directly from GitHub using devtools::install_github("your_username/packagename") and access the data.

For example, the data from richmondway within my GitHub repository package can be accessed with the following command:

devtools::install_github("deepshamenghani/richmondway")
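With the data installed, answering the question of which season had the most expletives comes down to summing a count column per season. The sketch below shows the idea on placeholder rows. season_totals() is a hypothetical helper, example_count is a made-up column name (check ?richmondway for the real one), and the numbers are not the real counts.

```r
# Hypothetical helper: sums a count column per season.
# `count_col` is a placeholder; substitute the real column name from ?richmondway.
season_totals <- function(df, count_col) {
  tapply(df[[count_col]], df$Season, sum)
}

# Placeholder rows for illustration only; these are not the real counts.
toy <- data.frame(
  Season = c(1, 1, 2),
  example_count = c(2, 3, 4)
)

totals <- season_totals(toy, "example_count")

# Name of the season with the highest total.
names(totals)[which.max(totals)]
```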

You made it!

If you’ve reached this point, congratulations! You’ve just turned a fun binge-watching session into something both educational and delightful.

Feel free to play with this fun dataset and tag me in your analysis and visualizations. Or you can fork richmondway on GitHub and contribute to the Roy Kent lexicon. Keep on packaging!

Resources

Richmondway package repository
R Packages by Hadley Wickham and Jennifer Bryan
Roxygen2 Documentation
Usethis Package
Devtools Package

Happy coding! If you’d like, find me on LinkedIn.

In-Depth Guide to Creating and Publishing an R Data Package Using Devtools was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
