
I saw the following command in a tutorial about regression modeling:

myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

What exactly does this command do, and what is the role of ~ (tilde) in the command?

  • Care to share the link to the tutorial? Sounds interesting. Commented Sep 1, 2013 at 13:27
  • @cheeesus, I was going through the ebook "data mining in R with case studies"; you can find many more interesting examples like this there. Commented Sep 3, 2013 at 17:02

3 Answers


The thing on the right of <- is a formula object. It is often used to denote a statistical model, where the thing on the left of the ~ is the response and the things on the right of the ~ are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".

The myFormula <- part of that line stores the formula in an object called myFormula so you can use it in other parts of your R code.
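A quick way to see that myFormula is an ordinary R object: creating it evaluates nothing, so none of the variables it mentions need to exist at that point. This is a minimal sketch using only base R (`class()` and `all.vars()`); the comments describe what these calls return for this particular formula.

```r
# Creating a formula does not look up Species, Sepal.Length, etc.;
# it only records the expression, so it can be stored and reused later.
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

class(myFormula)     # "formula"
all.vars(myFormula)  # "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```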


Other common uses of formula objects in R

The lattice package uses them to specify the variables to plot.
The ggplot2 package uses them to specify panels for plotting.
Older versions of the dplyr package used them for non-standard evaluation.
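As an illustration of the plotting uses listed above, here is a short sketch on the iris data. `xyplot()` (lattice) and `facet_wrap()` (ggplot2) are real functions of those packages, but the specific plots are just examples; ggplot2 is only used if it happens to be installed. The plot objects are created but not printed, so nothing is drawn until you call `print()`.

```r
library(lattice)  # recommended package, shipped with standard R installations

# lattice: "y ~ x | g" means plot y against x, with one panel per level of g
p1 <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris)

# ggplot2: a one-sided formula names the variable(s) used to split panels
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p2 <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
    geom_point() +
    facet_wrap(~ Species)
}
```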


2 Comments

For a slightly more expansive discussion: stackoverflow.com/questions/8055508/the-tilde-operator-in-r/…
The 'formulas' section of the lazyeval vignette gives a good introduction to what a formula is.

R defines a ~ (tilde) operator for use in formulas. Formulas have all sorts of uses, but perhaps the most common is for regression:

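library(datasets)  # provides the iris data frame (attached by default in most R sessions)
lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)  # numeric response
# Species is a factor, so lm(myFormula, data = iris) only warns and gives a meaningless fit;
# a classifier such as MASS::lda(myFormula, data = iris) suits myFormula better.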

help("~") or help("formula") will teach you more.
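Because Species is categorical, the formula from the question is more naturally passed to a classification function than to `lm()`. A minimal sketch, assuming linear discriminant analysis from the MASS package (a recommended package that ships with R) is an acceptable choice of model; the point is only that the same stored formula object plugs straight into any function with a formula interface.

```r
library(MASS)  # provides lda(), which has a formula method

myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

fit  <- lda(myFormula, data = iris)              # same formula object, different model
pred <- predict(fit, newdata = iris)$class       # predicted species (a factor)
mean(pred == iris$Species)                       # training-set accuracy
```

Training-set accuracy is optimistic; it is shown here only to confirm the fit ran.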

@Spacedman has covered the basics. Let's discuss how it works.

First, because ~ is an operator, it is essentially shorthand for a function call with two arguments:

> `~`(lhs,rhs)
lhs ~ rhs
> lhs ~ rhs
lhs ~ rhs

That can be helpful to know when working with formulas inside, e.g., apply-family functions.
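One practical consequence: like any function, `~` can be called inside `lapply()`, but it does not evaluate its arguments, so a loop variable inside a formula is taken literally. A minimal sketch of fitting one regression per predictor, swapping in `stats::reformulate()` to build each formula from strings; the choice of predictors is purely illustrative.

```r
# Illustrative only: one simple regression of Sepal.Length per predictor.
predictors <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# `~` quotes its arguments, so this refers to a variable literally named `p`,
# not to the string stored in p:
p <- "Sepal.Width"
quoted <- Sepal.Length ~ p
all.vars(quoted)  # "Sepal.Length" "p"

# reformulate() builds the formula from character strings instead.
forms <- lapply(predictors, function(x) reformulate(x, response = "Sepal.Length"))
fits  <- lapply(forms, lm, data = iris)   # each formula becomes lm()'s first argument
r2    <- sapply(fits, function(f) summary(f)$r.squared)
names(r2) <- predictors
r2
```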

Second, you can manipulate the formula as text:

oldform <- as.character(myFormula)  # components: "~", left-hand side, right-hand side
myFormula <- as.formula( paste( oldform[2], "Sepal.Length", sep="~" ) )  # Species ~ Sepal.Length
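If you would rather not round-trip through strings, base R's `update()` edits a formula directly, with `.` standing for "whatever was already on that side". A short sketch; the two modifications are just examples.

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

f_one  <- update(myFormula, . ~ Sepal.Length)     # replace the right-hand side
f_drop <- update(myFormula, . ~ . - Petal.Width)  # keep everything except Petal.Width

f_one   # Species ~ Sepal.Length
f_drop  # Species ~ Sepal.Length + Sepal.Width + Petal.Length
```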

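Third, you can take it apart like a list, since a formula is really a call to ~ (element 1 is the ~ itself, element 2 the left-hand side, element 3 the right-hand side):

myFormula[[2]]  # left-hand side
myFormula[[3]]  # right-hand side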
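To see that structure concretely, here is a small base-R sketch inspecting the original two-sided formula and a one-sided one; the comments state what each call returns.

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

length(myFormula)      # 3: the `~` symbol, the left-hand side, the right-hand side
myFormula[[1]]         # `~`
myFormula[[2]]         # Species
class(myFormula[[3]])  # "call": the unevaluated Sepal.Length + ... + Petal.Width

oneSided <- ~ Species  # a one-sided formula has no left-hand side
length(oneSided)       # 2
```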

Finally, there are some helpful tricks with formulae (see help("formula") for more):

myFormula <- Species ~ . 

For example, when fitted with data = iris, the version above is the same as the original version, since the dot means "all variables not otherwise used in the formula." The modeling function looks at the data.frame you pass in your eventual model call, sees which variables exist in the data.frame but aren't explicitly mentioned in your formula, and replaces the dot with those variables.
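You can watch the dot being expanded without fitting anything: `terms()` performs the same expansion a modeling function does once it has the data. A minimal base-R sketch; the smaller data frame is just an example showing that the expansion depends on which columns the data contains.

```r
# "." becomes every column of `data` that is not already in the formula
tt <- terms(Species ~ ., data = iris)
attr(tt, "term.labels")
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

# Same formula, narrower data frame: the dot expands to fewer terms
small <- iris[, c("Species", "Petal.Length", "Petal.Width")]
attr(terms(Species ~ ., data = small), "term.labels")
# "Petal.Length" "Petal.Width"
```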

3 Comments

Thanks for the answer, @Ari B. Friedman, but the last line, where you say 'dot means "all variables not yet used"', is a bit ambiguous. Could you illustrate it further?
@Ankita, "not yet used" in this context means not already mentioned in the formula. In Species ~ ., Species is the only variable mentioned, so it is modeled as depending on every other variable in the data.frame.
I don't understand myFormula <- Species ~ . When does the dot get substituted with variables from the data.frame? Could you provide an example?

In short, the tilde (~) separates the left side of a formula from the right side of the formula.

For example, in a linear model it separates the dependent variable from the independent variables and can be read as "as a function of." So, to model a person's wages (wages) as a function of their years of education (years_of_education), we write something like:

wages ~ years_of_education

Here,

 Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

It means that Species is a function of Sepal Length, Sepal Width, Petal Length and Petal Width.
