
I am attempting to split my data into three separate dataframes (train, test, validate) using a function but it is not returning the results I require.

This is my function:

    splitData <- function(type) {
      set.seed(1337)
      rowTrain <- createDataPartition(y = cleaned.data$CHURN, p = 0.7, list = FALSE)
      bufferDF <- cleaned.data[-rowTrain,]
      rowTest <- createDataPartition(y = cleaned.data$CHURN, p = 0.50, list = FALSE)
      if(type == "train") {cdTrain <- cleaned.data[rowTrain,]}
      if(type == "train") {cdTrain}
      if(type == "test") {cdTest <- cleaned.data[rowTest,]}
      if(type == "test") {cdTest}
      if(type == "validate") {cdValidate <- bufferDF[-rowTest,]}
      if(type == "validate") {cdValidate}
    }

Could you please shed some light on where I am going wrong?

Cheers

  • why do you have two if statements for each type? Why not just do, e.g., if(type=="train") {cdTrain<-cleaned.data[rowTrain,]; cdTrain}? Commented Mar 24, 2018 at 4:55
  • @doviod great point... still reasonably new to R, so I'm learning bit by bit; cheers for that tip. I'm running the function using the command cdTrain <- splitData(train) to no avail. I have also tried splitData("train") and splitData(type = "train"). Where am I going wrong? Commented Mar 24, 2018 at 5:01
  • what results do you get from running it? Commented Mar 24, 2018 at 5:03
  • Check my reply to your comment down below @doviod :D Commented Mar 24, 2018 at 5:09
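The call forms mentioned in the comments are not equivalent. splitData("train") and splitData(type = "train") both pass the string "train", whereas splitData(train) passes whatever object is named train in scope: that is an error if no such object exists, and if caret is attached it is caret's own train() function rather than a string. A minimal sketch using a hypothetical toy function, isTrainRequest(), which is not from the question:

```r
# Hypothetical toy function: TRUE only when given the string "train"
isTrainRequest <- function(type) identical(type, "train")

isTrainRequest("train")         # TRUE
isTrainRequest(type = "train")  # TRUE

# Unquoted `train` is looked up as an object. With no such object the call
# errors ("object 'train' not found"); with caret attached it is the
# caret::train function. Either way it is not the string "train".
unquoted <- tryCatch(isTrainRequest(train), error = function(e) FALSE)
unquoted                        # FALSE
```

So the quoted form is the right way to call the function; the remaining problem lies in what the function returns.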

1 Answer


The function missing() checks whether an argument was supplied in the call to the function it is used within. Passing something like train=="y" to it is meaningless, because train=="y" is an expression, not an argument of splitData; missing() only accepts the bare name of an argument. If you're trying to make sure that the various arguments were passed before you do something, it should be if(!missing(train)).
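A small illustration of the point above, using hypothetical toy functions hasTrain() and badCheck(): missing() is given the bare argument name, and handing it an expression such as train == "y" makes R signal an error instead of testing anything.

```r
# Hypothetical toy function: reports whether `train` was supplied
hasTrain <- function(train) {
  if (!missing(train)) "train supplied" else "train not supplied"
}
hasTrain()             # "train not supplied"
hasTrain(train = "y")  # "train supplied"

# missing() only accepts a bare argument name, so this call is an error
badCheck <- function(train) missing(train == "y")
badResult <- tryCatch(badCheck(), error = function(e) e)
inherits(badResult, "error")  # TRUE
```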

However, I'm not sure what your function hopes to achieve: it doesn't actually use any of the arguments it receives, other than to check whether they exist.

UPDATE:

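Try this. The key change is that each branch now calls return(), so the function exits with the requested data frame instead of carrying on to the later if statements: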

library(caret)  # provides createDataPartition()

splitData <- function(type) {
  set.seed(1337)
  rowTrain <- createDataPartition(y = cleaned.data$CHURN, p = 0.7, list = FALSE)
  bufferDF <- cleaned.data[-rowTrain,]
  rowTest <- createDataPartition(y = cleaned.data$CHURN, p = 0.50, list = FALSE)
  if (type == "train") {
    cdTrain <- cleaned.data[rowTrain,]
    return(cdTrain)
  }
  if (type == "test") {
    cdTest <- cleaned.data[rowTest,]
    return(cdTest)
  }
  if (type == "validate") {
    cdValidate <- bufferDF[-rowTest,]
    return(cdValidate)
  }
}
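Why the return() calls matter: an R function returns the value of the last expression it evaluates, and an if without an else whose condition is FALSE evaluates to an invisible NULL. In the question's version the last statement is the "validate" branch, so calling it with any other type yields NULL. A minimal sketch with two hypothetical toy functions, lastValue() and withReturn():

```r
# Without return(): the value of the first if is computed and then discarded
lastValue <- function(type) {
  if (type == "train") "train rows"
  if (type == "test") "test rows"  # last expression: NULL unless type == "test"
}

# With return(): the function exits as soon as a branch matches
withReturn <- function(type) {
  if (type == "train") return("train rows")
  if (type == "test") return("test rows")
}

lastValue("train")   # returns NULL (invisibly, so nothing prints)
withReturn("train")  # "train rows"
```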

Note that "validate" will not give you the rows you expect: rowTest holds row positions from the full data set, but -rowTest is applied to the shorter bufferDF, which only includes 30% of the data set. Likewise, "test" is sampled from the full data set, so it will overlap with the training rows. You might want to replace the line defining rowTest with something like:

rowTest <- createDataPartition(y = bufferDF$CHURN, p = 0.50, list = FALSE)

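This samples 50% of the held-out rows in bufferDF. Because rowTest then refers to rows of bufferDF, take the test set from bufferDF as well (cdTest <- bufferDF[rowTest,]), leaving bufferDF[-rowTest,] as the validation set.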
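Putting those fixes together, here is a sketch of the whole split in which both the test and validation rows come from bufferDF. To keep it runnable without caret, it uses a hypothetical base-R helper, stratifiedIndex(), as a stand-in for caret's createDataPartition(): both sample within each class of a factor outcome, but they will not necessarily pick the same rows for a given seed. It also assumes the data frame is passed in as an argument instead of reading the global cleaned.data, and uses switch() in place of the chain of if statements. The CHURN column name follows the question; the toy data is made up.

```r
# Hypothetical helper standing in for caret::createDataPartition():
# returns sorted row positions holding round(p * n) rows of each class of y.
stratifiedIndex <- function(y, p) {
  groups <- split(seq_along(y), factor(y))  # row positions per class
  picked <- lapply(groups, function(rows) {
    rows[sample.int(length(rows), round(p * length(rows)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# 70% train; the remaining 30% (bufferDF) is split 50/50 into test/validate
splitData <- function(data, type, seed = 1337) {
  set.seed(seed)
  rowTrain <- stratifiedIndex(data$CHURN, p = 0.7)
  bufferDF <- data[-rowTrain, ]
  # rowTest indexes bufferDF, so both test and validate are taken from it
  rowTest <- stratifiedIndex(bufferDF$CHURN, p = 0.5)
  switch(type,
         train    = data[rowTrain, ],
         test     = bufferDF[rowTest, ],
         validate = bufferDF[-rowTest, ],
         stop("type must be 'train', 'test' or 'validate'"))
}

# Toy data (made up): 60 churners and 140 non-churners
toy <- data.frame(id = 1:200, CHURN = rep(c("yes", "no"), times = c(60, 140)))
cdTrain    <- splitData(toy, "train")     # 140 rows
cdTest     <- splitData(toy, "test")      # 30 rows
cdValidate <- splitData(toy, "validate")  # 30 rows
```

Because set.seed() runs at the start of every call, the three calls draw the same rowTrain and rowTest, which is what keeps the three sets disjoint.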
