6

I need to process data that is being provided in a password protected Excel (xlsx) workbook. For legal reasons, I cannot create an unprotected Excel file or a csv file etc and process from there. None of the Excel import packages can deal with password protected workbooks.

From this answer Import password-protected xlsx workbook into R I have managed to extract the data. However it is imported in a list of character lists format. The dput of my list looks like this:

list(list("ID", "ID1", "ID2"),
     list("V2", NULL, "text2"),
     list("Name", "John Smith", "Mary Brown"),
     list("Score", 1, 2),
     list("email", "[email protected]", "[email protected]"))

What I want is a dataframe with columns ID, V2 etc that looks like this:

   ID    V2     Name        Score  email
   ID1   NULL   John Smith  1      [email protected]
   ID2   text2  Mary Brown  2      [email protected]

There are empty cells in the original Excel workbook, so solutions with unlist will not work.

Using a combination of answers from R list to data frame and other similar questions, I have the following code (where listform is the name of the list):

matform <- as.matrix(sapply(listform, function(s) s)) # retains empty
df <- data.frame(matform[2:nrow(matform),])
names(df) = matform[1,]

This is close, but the dataframe has lists as columns. So str(df) yields:

'data.frame':   2 obs. of  5 variables:
 $ ID:List of 2
  ..$ : chr "ID1"
  ..$ : chr "ID2"
 $ V2:List of 2
  ..$ : NULL
  ..$ : chr "text2"
and so on
3
  • Can you share an example that has missing data? Commented Aug 20, 2015 at 11:47
  • Surprisingly tricky problem for a question where I initially thought the solution would be a single line. Commented Aug 20, 2015 at 12:06
  • Have revised the example data so there is a NULL value. This breaks all the answers with unlist. Commented Aug 20, 2015 at 12:17

3 Answers 3

3

"SetDT" from the "data.table" package seems to be very powerful:

> library(data.table)

> null2na <- function(x){ ifelse(is.null(x),NA,x)}

> f <- function(x){sapply(x,null2na)}

> L <- list(list("ID", "ID1", "ID2"),
+           list("V2", NULL, "text2"),
+           list("Name", "John Smith", "Mary Brown"),
+           list("S ..." ... [TRUNCATED] 

> L <- setDT(L)[, lapply(.SD, f)]

> setnames(L,colnames(L),unlist(L[1,]))

> L <- L[-1,]

> L
    ID    V2       Name Score        email
1: ID1    NA John Smith     1 [email protected]
2: ID2 text2 Mary Brown     2    [email protected]

> str(L)
Classes ‘data.table’ and 'data.frame':  2 obs. of  5 variables:
 $ ID   : chr  "ID1" "ID2"
 $ V2   : chr  NA "text2"
 $ Name : chr  "John Smith" "Mary Brown"
 $ Score: chr  "1" "2"
 $ email: chr  "[email protected]" "[email protected]"
 - attr(*, ".internal.selfref")=<externalptr> 
> 

(A data table is a better data frame.)

The function "f" does two jobs: It "unlist"s and turns NULL into NA.

Sign up to request clarification or add additional context in comments.

6 Comments

Not quite. Try str(L). All the columns are still lists.
Any idea how to fix it?
L <- setDT(L)[, lapply(.SD, unlist)][-1] maybe (while L is the original list)
unlist cannot handle empty cells in the underlying data
Now the lists are gone and NULL is NA, by brute force.
|
3

First step: extract the names:

names = lapply(listform, `[[`, 1)
data = setNames(lapply(listform, `[`, -1), names)

Second step: unlist data and bind columns

result = as.data.frame(sapply(data, unlist))

The magic here happens in sapply, which internally calls simplify2array. as.data.frame at the end is needed to make a data.frame out of the resulting matrix, and to assign proper data types to the individual columns.

The above code has got one problem though: the column types probably aren’t what you want. This can be fixed as follows:

col_classes = sapply(lapply(listform, `[[`, 2), typeof)
result = as.data.frame(sapply(data, unlist), stringsAsFactors = FALSE)
for (col in seq_len(ncol(result)))
    class(result[[col]]) = col_classes[col]

Now you’ll get the following result:

> str(result)
'data.frame':   2 obs. of  5 variables:
 $ ID   : chr  "ID1" "ID2"
 $ V2   : chr  "text1" "text2"
 $ Name : chr  "John Smith" "Mary Brown"
 $ Score: num  1 2
 $ email: chr  "[email protected]" "[email protected]"

And this, I think, is what you want.

6 Comments

Thanks, I just tried this and it doesn't seem to work. Am I supposed to be using my list (listform) where you have data or my datafram (df)? Either way I didn't seem to get a sensible dataframe out.
@JenB Yes, sorry. Use listform instead of data, I’ll amend the answer. This code definitely works with your example data.
This gets the correct structure but unfortunately has the problem that it can't deal with empty cells in the Excel workbook (which is a problem with unlist). I will have a go with the function(s) s in the sapply, whcih I think enters NULL for the empties.
@JenB function (s) s does nothing at all, actually. It’s the same as identity, and applying the identity isn’t a meaningful operation. You can just call simplify2array(listform) directly, instead of calling sapply for this effect. However, a better solution would be to fill the empty elements before applying my code. Incidentally, the as.matrix in your code is also a redundant null operation.
You are correct, made no difference :) So I will have to investigate adding some prior code to fill the lists with NA
|
2

Here's one way using data.table v1.9.5 (for the transpose() function):

require(data.table) # v1.9.5+
setDT(sapply(ll, function(x) setattr(transpose(x[-1L]), 'names', x[[1L]])))[]
#     ID    V2       Name Score        email
# 1: ID1    NA John Smith     1 [email protected]
# 2: ID2 text2 Mary Brown     2    [email protected]

Use setDF() instead of setDT() to return a data.frame instead.

2 Comments

Can not find the transpose function
Have you tried following the instructions in the answer? Or clicking on the link for example? Hint - "using data.table v1.9.5*"

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.