I have a number of dataframes imported from a multi-tab Excel workbook that I want to combine into one big dataframe. But first, I would like to add a new column to each dataframe containing its sheet name (i.e. for dataframe A, a new column with value "A"; for dataframe B, a new column with value "B"). Is there a simple way to do this? I imagine a loop of some sort, but I haven't been able to find a solution online that shows how to extract the sheet names from an Excel file. I'd appreciate any tips on how to do this. Thanks!
2 Answers
Check out the readxl package from Hadley Wickham. Its excel_sheets() function returns the names of all sheets in a workbook, which you can then loop over.
Here is an example with an Excel workbook I made containing 4 tabs named "a", "b", "c", and "d". The result is a list holding one dataframe per tab, each with a sheetname column recording which sheet it came from.
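library(readxl)
# path to the workbook (use the same path for both calls below)
path <- "your/path/yourworkbook.xlsx"
# get the names of all sheets in the workbook
mysheetlist <- excel_sheets(path = path)
# initialize a list to hold one dataframe per sheet
mysheets_fromexcel <- vector("list", length(mysheetlist))
for (i in seq_along(mysheetlist)) {
  tempdf <- read_excel(path = path, sheet = mysheetlist[i])
  # add a column holding this sheet's name
  tempdf$sheetname <- mysheetlist[i]
  mysheets_fromexcel[[i]] <- tempdf
}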
mysheets_fromexcel
[[1]]
# A tibble: 3 x 2
revision sheetname
<dbl> <chr>
1 1 a
2 2 a
3 3 a
[[2]]
# A tibble: 3 x 2
revision sheetname
<dbl> <chr>
1 1 b
2 2 b
3 3 b
[[3]]
# A tibble: 3 x 2
revision sheetname
<dbl> <chr>
1 1 c
2 2 c
3 3 c
[[4]]
# A tibble: 3 x 2
revision sheetname
<dbl> <chr>
1 1 d
2 2 d
3 3 d
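The loop leaves you with a list of dataframes, while the question ultimately asks for one combined dataframe. A minimal sketch of that last step in base R, using a hypothetical stand-in list with the same shape as the output shown above: do.call(rbind, ...) stacks the dataframes and matches columns by name, so it assumes every sheet has the same columns. If they differ, dplyr::bind_rows() is more forgiving, since it fills missing columns with NA.

```r
# Hypothetical stand-in for the list built by the loop above: one dataframe
# per sheet, each already carrying a 'sheetname' column
mysheets_fromexcel <- lapply(c("a", "b", "c", "d"), function(s) {
  data.frame(revision = c(1, 2, 3), sheetname = s, stringsAsFactors = FALSE)
})

# Stack all per-sheet dataframes into one; rbind matches columns by name
combined <- do.call(rbind, mysheets_fromexcel)
combined
```

With the real workbook, replace the stand-in list with the mysheets_fromexcel produced by the loop.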
I based my solution on akaDrHouse's answer, but I can't comment on answers yet, so I'm posting it as a separate answer. I made small changes to the range of the for loop and to how the sheet is selected (by position instead of by name). Additionally, I store each sheet as a separate dataframe with the same name as the sheet. The line mysheets_fromexcel[[i]] <- tempdf didn't do what I wanted: it produces a list of the sheets as separate tibbles rather than standalone dataframes.
library(readxl)
xlsx_file <- "../path/to/excelfile.xlsx"
mysheetlist <- excel_sheets(xlsx_file)
for (i in seq_along(mysheetlist)) {
  # select the sheet by position rather than by name
  tempdf <- read_excel(path = xlsx_file, sheet = i)
  tempdf$sheetname <- mysheetlist[i]
  ## mysheets_fromexcel[[i]] <- tempdf  # would collect the tibbles in a list instead
  # create a standalone dataframe named after the sheet
  assign(mysheetlist[i], tempdf)
}
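As an alternative to assign(), which creates one object per sheet in the global environment (and gives awkward names when a sheet name contains spaces), you could keep the sheets in a named list. A sketch, assuming readxl is installed; read_sheets_with_names() is a hypothetical helper, and the demo uses readxl_example("datasets.xlsx"), an example workbook bundled with readxl.

```r
library(readxl)

# Hypothetical helper: read every sheet of a workbook into a named list of
# dataframes, tagging each one with the name of the sheet it came from
read_sheets_with_names <- function(path) {
  sheet_names <- excel_sheets(path)
  sheets <- lapply(sheet_names, function(s) {
    df <- read_excel(path, sheet = s)
    df$sheetname <- s  # record which sheet each row came from
    df
  })
  # name the list elements after the sheets, so sheets[["a"]] works much
  # like the separate objects that assign() would have created
  setNames(sheets, sheet_names)
}

# Demo on the example workbook that ships with readxl
xlsx_file <- readxl_example("datasets.xlsx")
sheets <- read_sheets_with_names(xlsx_file)
names(sheets)
```

Each sheet is then available as, e.g., sheets[["iris"]], and the list can be stacked with do.call(rbind, sheets) when all sheets share the same columns. If you still want separate objects, list2env(sheets, envir = .GlobalEnv) creates them in one call.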