The R Formula Cheatsheet
R’s formula syntax is extremely powerful but can be confusing for beginners.1
This post is a quick reference covering all of the symbols that have a “special” meaning inside of an R formula: `~`, `+`, `.`, `-`, `1`, `:`, `*`, `^`, and `I()`.
You may never use some of these in practice, but it’s nice to know that they exist.
It was many years before I realized that I could simply type `y ~ x * z` instead of the lengthier `y ~ x + z + x:z`, for example.
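One way to convince yourself that the two spellings are equivalent is to ask R how it expands each formula. The snippet below is a minimal sketch using `terms()` from base R; the variable names `short` and `long` are only illustrative.

```r
# Expand each formula into its individual model terms.
short <- attr(terms(y ~ x * z), "term.labels")
long  <- attr(terms(y ~ x + z + x:z), "term.labels")

print(short)
print(long)

# TRUE if both formulas expand to exactly the same terms.
print(identical(short, long))
```

No data are needed here: `terms()` works on the formula alone as long as it does not contain `.`.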
While R formulas crop up in a variety of places, they are probably most familiar as the first argument of `lm()`.
For this reason, my verbal explanations assume a linear regression setting in which we hope to predict `y` using a number of regressors `x`, `z`, and `w`.
Symbol | Purpose | Example | In Words |
---|---|---|---|
`~` | separate LHS and RHS of formula | `y ~ x` | regress `y` on `x` |
`+` | add variable to a formula | `y ~ x + z` | regress `y` on `x` and `z` |
`.` | denotes “everything else” | `y ~ .` | regress `y` on all other variables in a data frame |
`-` | remove variable from a formula | `y ~ . - x` | regress `y` on all other variables except `x` |
`1` | denotes intercept | `y ~ x - 1` | regress `y` on `x` without an intercept |
`:` | construct interaction term | `y ~ x + z + x:z` | regress `y` on `x`, `z`, and the product `x` times `z` |
`*` | shorthand for levels plus interaction | `y ~ x * z` | regress `y` on `x`, `z`, and the product `x` times `z` |
`^` | higher order interactions | `y ~ (x + z + w)^3` | regress `y` on `x`, `z`, `w`, all two-way interactions, and the three-way interaction |
`I()` | “as-is”: override special meanings of other symbols from this table | `y ~ x + I(x^2)` | regress `y` on `x` and `x` squared |
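The rows of the table can be checked without fitting any models by expanding each formula with `terms()`. The following is a sketch under a few assumptions: the helper `term_labels()` and the toy data frame `dat` are hypothetical names introduced here, and the data values are irrelevant since only the column names matter.

```r
# Toy data frame: only the column names matter for expanding `.`.
dat <- data.frame(y = 1:4, x = c(2, 4, 1, 3), z = c(0, 1, 0, 1), w = c(5, 3, 8, 6))

# Hypothetical helper: list the terms on the right-hand side of a formula.
# `data` is only needed when the formula contains `.`.
term_labels <- function(f, data = NULL) {
  attr(terms(f, data = data), "term.labels")
}

everything <- term_labels(y ~ ., data = dat)      # all columns other than y
all_but_x  <- term_labels(y ~ . - x, data = dat)  # same, with x removed

# The "intercept" attribute is 1 when an intercept is included, 0 otherwise.
with_int    <- attr(terms(y ~ x), "intercept")
without_int <- attr(terms(y ~ x - 1), "intercept")

# Main effects plus all two-way and three-way interactions.
cubed <- term_labels(y ~ (x + z + w)^3)

# Inside I(), `^` means arithmetic power; outside, it means crossing,
# so x^2 on its own collapses to just x.
as_is  <- term_labels(y ~ x + I(x^2))
no_I   <- term_labels(y ~ x + x^2)

print(everything); print(all_but_x)
print(c(with_int, without_int))
print(cubed)
print(as_is); print(no_I)
```

The last pair is the one most likely to cause surprises: without `I()`, `y ~ x + x^2` silently drops the squared term rather than raising an error.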
Fun fact: R’s formula syntax originated in a 1973 paper by Wilkinson and Rogers.