
I want to create multiple dataframes based on values in a column.

Sample data:

> df
Index  Product  ID  Amount
200    Prod1    01     100
201    Prod1    01     150
202    Prod1    01     123
203    Prod1    01     123
204    Prod1    02     110
205    Prod1    02     175
206    Prod1    02     190
207    Prod2    03     120
208    Prod2    03     135
209    Prod2    03     150

I would like to add a column, Base, for each ID. The value of Base is the first Amount value within each ID.

> df1
Index  Product  ID  Amount  Base
200    Prod1    01     100   100
201    Prod1    01     150   100
202    Prod1    01     123   100
203    Prod1    01     123   100
204    Prod1    02     110   110
205    Prod1    02     175   110
206    Prod1    02     190   110
207    Prod2    03     120   120
208    Prod2    03     135   120
209    Prod2    03     150   120

I am thinking of subsetting df by ID first. Is there any way to do this?
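To make the examples in this thread reproducible, here is a minimal sketch that builds the sample data as an R data frame. It assumes ID is stored as a number, since the answers print it as 1, 2, 3; the leading zeros shown above ("01") are therefore not kept.

```r
# Reconstruct the sample data from the question.
# Assumption: ID is numeric (as printed in the answers), not the string "01".
df <- data.frame(
  Index   = 200:209,
  Product = c(rep("Prod1", 7), rep("Prod2", 3)),
  ID      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Amount  = c(100, 150, 123, 123, 110, 175, 190, 120, 135, 150)
)
```

Some answers below call the same data `dat`; `dat <- df` makes those snippets run unchanged.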

2 Comments

  • If your df is sorted: do.call(rbind.data.frame, lapply(split(df, df$ID), function(sset) within(sset, Base <- Amount[1]))) Commented Jun 1, 2014 at 18:00
  • Do you need the first Amount value that appears, or the minimum Amount value, to be put in the Base column? Commented Jun 1, 2014 at 18:10
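The first comment's approach can also stop before the `rbind` step and keep the result as a list, which gives one data frame per ID as the question asks. A minimal base R sketch, assuming numeric IDs; the list names ("1", "2", "3") come from `split()`. Binding the pieces back together groups rows by ID, so the combined result matches the original row order only if df was already sorted by ID, which is the caveat in the comment.

```r
df <- data.frame(
  Index   = 200:209,
  Product = c(rep("Prod1", 7), rep("Prod2", 3)),
  ID      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Amount  = c(100, 150, 123, 123, 110, 175, 190, 120, 135, 150)
)

# split() returns a named list with one data frame per ID.
# Base is the first Amount of each piece, i.e. the first value in the
# original row order within that ID (no sorting needed).
by_id <- lapply(split(df, df$ID), function(d) {
  d$Base <- d$Amount[1]
  d
})

# Optionally stack the pieces back into a single data frame.
combined <- do.call(rbind, by_id)
```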

5 Answers

7

You could create a list of data frames and then place them into any environment you want using list2env.

# One data frame per ID, each with Base set to that ID's first Amount
SubData <- lapply(unique(df$ID), function(x) cbind(df[df$ID == x, ], Base = df$Amount[df$ID == x][1]))


# [[1]]
#   Index Product ID Amount Base
# 1   200   Prod1  1    100  100
# 2   201   Prod1  1    150  100
# 3   202   Prod1  1    123  100
# 4   203   Prod1  1    123  100
# 
# [[2]]
#   Index Product ID Amount Base
# 5   204   Prod1  2    110  110
# 6   205   Prod1  2    175  110
# 7   206   Prod1  2    190  110
# 
# [[3]]
#    Index Product ID Amount Base
# 8    207   Prod2  3    120  120
# 9    208   Prod2  3    135  120
# 10   209   Prod2  3    150  120

Now give your data frames whatever names you want and use list2env to create them in the global environment:

names(SubData) <- c("df1", "df2", "df3")
list2env(SubData, envir = .GlobalEnv)

Now you have these data sets in the global environment, e.g.

df1
##   Index Product ID Amount Base
## 1   200   Prod1  1    100  100
## 2   201   Prod1  1    150  100
## 3   202   Prod1  1    123  100
## 4   203   Prod1  1    123  100
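If the number of IDs is not known in advance, the names can be generated instead of typed out. A minimal sketch of the same answer, assuming the names should be df1, df2, ... in the order the IDs first appear; `paste0("df", unique(df$ID))` would instead tie each name to its ID value.

```r
df <- data.frame(
  Index   = 200:209,
  Product = c(rep("Prod1", 7), rep("Prod2", 3)),
  ID      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Amount  = c(100, 150, 123, 123, 110, 175, 190, 120, 135, 150)
)

# One data frame per ID, each with Base set to that ID's first Amount.
SubData <- lapply(unique(df$ID), function(x)
  cbind(df[df$ID == x, ], Base = df$Amount[df$ID == x][1]))

# Build the names "df1", "df2", ... from the number of pieces.
names(SubData) <- paste0("df", seq_along(SubData))

# Create df1, df2, df3 as separate objects in the global environment.
invisible(list2env(SubData, envir = .GlobalEnv))
```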

2 Comments

user3689870, see my edit regarding list2env; I think this is the proper way to do this.
user3689870, didn't you want multiple data frames?
4

Using ave:

# Base = minimum Amount within each ID
dat$Base <- ave(dat$Amount, dat$ID, FUN = min)

# Index Product ID Amount Base
# 1    200   Prod1  1    100  100
# 2    201   Prod1  1    150  100
# 3    202   Prod1  1    123  100
# 4    203   Prod1  1    123  100
# 5    204   Prod1  2    110  110
# 6    205   Prod1  2    175  110
# 7    206   Prod1  2    190  110
# 8    207   Prod2  3    120  120
# 9    208   Prod2  3    135  120
# 10   209   Prod2  3    150  120

EDIT

In case you want the first value rather than the minimum one:

dat$Base <- ave(dat$Amount, dat$ID, FUN = function(x) x[1])
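On the sample data, `min` and "first value" happen to agree, because each ID's first Amount is also its smallest. A small sketch on hypothetical data (not from the question) where they differ shows why the choice of `FUN` matters:

```r
# Hypothetical data: the first Amount in each ID is NOT the smallest.
x <- data.frame(
  ID     = c(1, 1, 1, 2, 2),
  Amount = c(150, 100, 120, 90, 80)
)

# Minimum Amount per ID
x$BaseMin <- ave(x$Amount, x$ID, FUN = min)

# First Amount per ID, in the original row order
x$BaseFirst <- ave(x$Amount, x$ID, FUN = function(v) v[1])
```

Here `BaseMin` is 100 and 80 for the two IDs, while `BaseFirst` is 150 and 90.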

2 Comments

I didn't downvote either of you, but I also wondered why (perhaps because in @agstudy's answer you use min rather than the first value of Amount?).
@beginneR Maybe. I edited my answer to add the first value.
3

Assuming your data.frame is called dat, here's a data.table solution:

library(data.table)
# setDT() converts dat to a data.table by reference; := adds Base in place
setDT(dat)[, Base := Amount[1L], by = ID]
#    Index Product ID Amount Base
#  1:   200   Prod1  1    100  100
#  2:   201   Prod1  1    150  100
#  3:   202   Prod1  1    123  100
#  4:   203   Prod1  1    123  100
#  5:   204   Prod1  2    110  110
#  6:   205   Prod1  2    175  110
#  7:   206   Prod1  2    190  110
#  8:   207   Prod2  3    120  120
#  9:   208   Prod2  3    135  120
# 10:   209   Prod2  3    150  120


2

You could use dplyr to create the Base column, but just to be clear, this does not yet create separate data frames (as your question asks for).

library(dplyr)

df <- df %>% group_by(ID) %>% mutate(Base = first(Amount))
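For comparison, the same Base column can be built in base R without dplyr, using `match()` as a lookup (a different technique from the grouped `mutate` above). A minimal sketch, assuming numeric IDs; `split()` then produces the separate data frames the question asks for.

```r
df <- data.frame(
  Index   = 200:209,
  Product = c(rep("Prod1", 7), rep("Prod2", 3)),
  ID      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Amount  = c(100, 150, 123, 123, 110, 175, 190, 120, 135, 150)
)

# match(df$ID, df$ID) gives, for every row, the position of the first row
# with the same ID; indexing Amount with it picks that ID's first Amount.
df$Base <- df$Amount[match(df$ID, df$ID)]

# One data frame per ID, as a named list.
pieces <- split(df, df$ID)
```

Because `match()` returns the first occurrence, this takes the first value in the original row order and does not require the data to be sorted.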


1

Or using dplyr:

library(dplyr)
df1 <- df %>% 
  arrange(ID, Amount) %>%  # sorting by Amount makes Amount[1] the per-ID minimum
  group_by(ID) %>% 
  mutate(Base = Amount[1])

4 Comments

Nice one! I'm not sure if the sorting/arranging is necessary, since in the question the OP says "The value of Base is the first amount value in each ID", not the minimum value.
@beginneR Oops, you are right. Well I'll leave it anyway to show how easy sorting is with dplyr. :)
Can't find where %>% is documented. Any tips?
%>% is documented here (and originally here).
