
Using R: Restart your R session

Martin Johnsson

2023-08-06

Don’t save your workspace

A few years ago I wrote this piece of advice about using R:

To everyone learning R: Don’t save your workspace.

When you exit an R session, you’re faced with the question of whether or not to save your workspace. You should almost never answer yes. Saving your workspace creates an image of your current variables and functions, and saves them to a file called “.RData”. When you re-open R from that working directory, the workspace will be loaded, and all these things will be available to you again. But you don’t want that, so don’t save your workspace.
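
If you want to make never-saving the default rather than a habit, here is a minimal sketch, assuming RStudio and the usethis package:

    # Tell RStudio to never save or restore the workspace ("blank slate" settings):
    usethis::use_blank_slate()

    # Outside RStudio, the same effect comes from starting R with
    #   R --no-save --no-restore-data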

Loading a saved workspace turns your R script from a program, where everything happens logically according to the plan that is the code, to something akin to a cardboard box taken down from the attic, full of assorted pages and notebooks that may or may not be what they seem to be. You end up having to put an inordinate trust in your old self. I don’t know about your old selves, dear reader, but if they are anything like mine, don’t save your workspace.

What should one do instead? One should source the script often, ideally from freshly minted R sessions, to make sure to always be working with a script that runs and does what it’s supposed to. Storing a data frame in the workspace can seem comforting, but what happens the day I overwrite it by mistake? Don’t save your workspace.
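
In practice, a fresh-session run might look something like this, assuming a script called analysis.R and, for the in-R variant, the callr package:

    # From a terminal, in a brand new R process:
    #   Rscript analysis.R

    # Or from inside R, letting callr start a clean R process for you:
    callr::rscript("analysis.R")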

Yes, I’m exaggerating. When using any modern computer system, we rely on saved information and saved state all the time. And yes, when a computation takes too much time to reproduce, one should write the result to a file and load it from there. But I think that should be a deliberate choice, worthy of its own save() and load() calls¹, and certainly not something one does with simple stuff that can be reproduced in the blink of an eye. Put more trust in your script than in your memory, and don’t save your workspace.
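
Taking the footnote’s advice, such a deliberate caching step might look something like this sketch, where the file name, my_data and fit_big_model() are made-up placeholders:

    cache_file <- "results/model_fit.rds"

    if (file.exists(cache_file)) {
      fit <- readRDS(cache_file)       # reuse the expensive result
    } else {
      fit <- fit_big_model(my_data)    # hypothetical slow computation
      saveRDS(fit, cache_file)         # saved on purpose, to a file with a name
    }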

Restart your R session

Here is a sequel to that advice: Restart your R session.

It is very tempting to just press “Source” or “Run All” again in RStudio to run the script from the top, but be kind to yourself and first restart your R session.
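
In RStudio that means Session > Restart R (Ctrl+Shift+F10, last I checked), or, if you would rather do it in code, something like:

    # assumes the rstudioapi package, and that you are running inside RStudio
    rstudioapi::restartSession()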

The logic is exactly the same as above. R invites interactive use while you are developing your scripts, which is great. You can experiment, poke around in the objects, and learn as you go. At the same time, your working environment tends to accumulate all kinds of baggage. Maybe that neat function you wrote accidentally relies on a global variable that you created before encapsulating the code, and that is still around in your environment. Get rid of that by frequently restarting your R session.
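
To make that failure mode concrete, here is a made-up example (not from any real analysis): the function only works because a leftover global happens to be lying around, and a restart exposes it.

    threshold <- 5                     # created while experimenting

    keep_tall <- function(heights) {
      heights[heights > threshold]     # silently depends on the global
    }

    keep_tall(c(3, 6, 9))              # works ... until a restart removes threshold

    # The fix the restart forces on you: make the dependency explicit.
    keep_tall <- function(heights, threshold = 5) {
      heights[heights > threshold]
    }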

Working close to the limit of your computer’s memory? Seeing that “vector memory exhausted (limit reached?)” error? Sure, you could manually rm() some of the big objects, but it’s easier to restart your R session.
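
For reference, the manual route looks something like this, with big_matrix standing in for whatever is hogging the memory:

    rm(big_matrix)   # drop the offending object
    gc()             # ask R to give the memory back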

Working on a package? Trying to keep the analysis code separate from functions that can be reused? Good idea. You have rebuilt the package, run the tests … it should work now. You won’t know until you restart your R session.
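
The loop I have in mind looks roughly like this, assuming devtools and a placeholder package name:

    devtools::document()   # regenerate the documentation
    devtools::test()       # run the test suite
    devtools::install()    # rebuild and install the package

    # ... restart R, then check that the installed version really works:
    library(yourpackage)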

Maybe you know perfectly in what order you ran those blocks of code in your RMarkdown, backtracking through any change and keeping track of the dependencies between chunks in your head like a wizard … or you’re like me and you better restart your R session.
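
One way out, if you want code rather than wizardry, is to render the whole document in a separate, clean R process; here is a sketch assuming the callr and rmarkdown packages and a placeholder file name:

    callr::r(function() rmarkdown::render("analysis.Rmd"))

As far as I know, the Knit button in RStudio does much the same thing, which is one reason a document that knits cleanly is worth more than one that merely ran in your session.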

Fire, fury, projects and packages

When writing about this on Mastodon, Ben Bolker pointed me to Jenny Bryan’s post on Project-oriented workflow, which includes this memorable quote, originally from one of her talks:

If the first line of your R script is

setwd("C:\Users\jenny\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE

The point of the post is that data analyses should be organised into project folders (and, if you’re an RStudio user, RStudio projects to support that).
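
The project-oriented alternative to that setwd() line is to build paths relative to the project root, for example with the here package (the file name is made up):

    library(here)
    measurements <- read.csv(here("data", "measurements.csv"))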

On Mastodon, @datamaps@social.linux.pizza also said:

I tend to go much further: > build your own package! << even if you need to store only one single dataset or function (remember that you do NOT need to upload it on CRAN)

This is more overhead; giving every data analysis its own package is a bit much, and I don’t think R packages are an amazing way to store datasets. But if you have a bunch of different data analysis projects and some functions that can be shared between them, the right way to share those functions is probably to make a package, rather than copy/pasting! Packages are not just for sharing code with others, but also for sharing code with the most important collaborator: yourself.
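
If you want to try it, a minimal sketch with usethis and devtools might look like this, with made-up names throughout:

    usethis::create_package("~/code/mytools")  # skeleton for a personal package
    # then, from within the new package project:
    usethis::use_r("helpers")                  # creates R/helpers.R for the shared functions
    devtools::install()                        # install it; library(mytools) then works in any project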


  1. Today, I would probably recommend using saveRDS and readRDS instead. They save and load one object at a time, and let you easily load something under a new name. This tends to be cleaner.