TL;DR
If you are using a tibble (commonly used in the tidyverse) you can safely do any of the following to select columns and you will get a tibble back:
library(tibble)
tb <- tibble(A = 1:2, B = 3:4)
# By index
tb[1]
tb[, 1]
tb[1:2]
tb[, 1:2]
# By name
tb["A"]
tb[, "A"]
tb[c("A", "B")]
tb[, c("A", "B")]
This is in addition to the answer given by @Sam Firke which uses the popular select() verb for column selection.
The same selection operators work on base R data frames, but the two-index forms (df[, 1] or df[, "A"]) return a plain vector when only one column is selected, unless you specify drop = FALSE.
There is already some discussion about tidyverse versus base R in other answers, but hopefully this adds something.
You can see from the documentation ?`[.data.frame` (and the answer from @Joshua Ulrich) that data frame columns can be selected in several ways. The difference between them comes down to the drop argument, which the documentation describes as:
If TRUE the result is coerced to the lowest possible dimension. The
default is to drop if only one column is left, but not to drop if only
one row is left.
If a single vector is given, then columns are indexed and selection behaves like list selection (the drop argument of [ is ignored). In this case, a data frame is always returned:
df <- data.frame(A = 1:2, B = 3:4)
str(df[1])
# 'data.frame': 2 obs. of 1 variable:
# $ A: int 1 2
str(df[1:2])
# 'data.frame': 2 obs. of 2 variables:
# $ A: int 1 2
# $ B: int 3 4
str(df[c("A", "B")])
# 'data.frame': 2 obs. of 2 variables:
# $ A: int 1 2
# $ B: int 3 4
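As a quick check of the list-style behavior, here is a minimal sketch using the same df. The variable names one_col and res are just for illustration. Passing drop with a single index does nothing except trigger a warning that it will be ignored, which is suppressed here.

```r
df <- data.frame(A = 1:2, B = 3:4)

# Single-index (list-style) selection keeps the data frame structure,
# even when only one column is selected
one_col <- df["A"]
is.data.frame(one_col)
# [1] TRUE

# drop has no effect with a single index; base R warns that it is ignored
res <- suppressWarnings(df["A", drop = TRUE])
is.data.frame(res)
# [1] TRUE
```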
However, if two indices are given ([row, column]), selection behaves more like matrix selection. Here drop defaults to TRUE, so the result is coerced to the lowest possible dimension, but only when a single column is left:
str(df[1, ]) # single row selection (does not reduce dimension)
# 'data.frame': 1 obs. of 2 variables:
# $ A: int 1
# $ B: int 3
str(df[, 1]) # single column selection (does reduce dimension)
# int [1:2] 1 2
Of course you can always change the default behavior by setting drop = FALSE:
str(df[, 1, drop = FALSE])
# 'data.frame': 2 obs. of 1 variable:
# $ A: int 1 2
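Where this tends to matter is code that selects columns chosen at run time, since the number of columns may happen to be one. A minimal sketch, assuming a hypothetical helper named pick_cols (not part of base R or any package):

```r
df <- data.frame(A = 1:2, B = 3:4)

# Hypothetical helper: always returns a data frame, however many
# columns are requested
pick_cols <- function(data, cols) {
  data[, cols, drop = FALSE]
}

one <- pick_cols(df, "A")
two <- pick_cols(df, c("A", "B"))

is.data.frame(one)
# [1] TRUE
ncol(two)
# [1] 2

# Without drop = FALSE, the one-column case collapses to a vector
is.data.frame(df[, "A"])
# [1] FALSE
```

Code downstream of pick_cols() can then rely on ncol(), names() and similar functions without a special case for a single column.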
In the tidyverse, tibbles are preferred. They are like data frames but differ in a few significant ways, one being column selection: selecting columns from a tibble with [ never reduces dimensionality, as the output below shows:
library(tibble)
tb <- as_tibble(df)
class(tb)
# [1] "tbl_df" "tbl" "data.frame"
str(tb[, 1])
# tibble [2 × 1] (S3: tbl_df/tbl/data.frame)
# $ A: int [1:2] 1 2
str(tb[1])
# tibble [2 × 1] (S3: tbl_df/tbl/data.frame)
# $ A: int [1:2] 1 2
The other forms of tibble column selection work the same way (the example above only selects by index, but selecting by name also returns a tibble).
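For completeness, a short sketch of selection by name on a tibble, assuming the tibble package is installed. The variable names are illustrative. If you actually want the underlying vector, [[ extracts it.

```r
library(tibble)

tb <- tibble(A = 1:2, B = 3:4)

# Both the list-style and matrix-style forms keep the tibble structure
by_name_list   <- tb["A"]
by_name_matrix <- tb[, "A"]

is_tibble(by_name_list)
# [1] TRUE
is_tibble(by_name_matrix)
# [1] TRUE

# To get the column itself as a vector, use [[ instead of [
col_a <- tb[["A"]]
is.integer(col_a)
# [1] TRUE
```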
select(df, c('A','B','C'))