1

Using the example from the pandas reshaping docs as a quick starting point:

http://pandas.pydata.org/pandas-docs/stable/reshaping.html

In [1]: df
Out[1]: 
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
4  2000-01-04        B  1.212112
5  2000-01-05        B -0.173215
6  2000-01-03        C  0.119209
7  2000-01-04        C -1.044236
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804

Then isolating 'A' gives this:

In [2]: df[df['variable'] == 'A']
Out[2]: 
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059

Creating a new DataFrame from that would be:

dfA = df[df['variable'] == 'A'] 

And the one for B would be:

dfB = df[df['variable'] == 'B'] 

So, to isolate the DataFrames into dfA, dfB, dfC, ... I tried:

dfList  = list(set(df['variable']))
dfNames = ["df" + row for row in dfList]  

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfNames[i] = dfNew      

It runs, but when I try dfA I get the error that "dfA" is not defined.

4
  • You are writing dfNew into dfNames[i], not dfA. It's roughly equivalent to the difference between dfA and "dfA". I don't know if an exact solution to your question is possible in python due to the lack of macros. You maybe could do this with a context manager? But really, I would think about doing it another way. It might help if you could give some more context for the overall issue. Commented Aug 11, 2015 at 3:18
  • @JohnE Thanks, it's much harder than it looks. I am trying to create everything dynamically and segment out the smaller arrays so I can pickle them. In the above example, I am simply trying to find a way to break those four categories out into separate DataFrames or arrays. Thanks for actually reading the code. Commented Aug 11, 2015 at 11:32
  • @JohnE look at accepted ans. Commented Aug 11, 2015 at 20:24
  • yep, that is thorough Commented Aug 11, 2015 at 20:43

4 Answers

5

Use groupby and get_group, e.g.:

grouped = df.groupby('variable')

Then when you want to do something with each group, access it as such:

my_group = grouped.get_group('A')

Gives you:

        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059
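For readers who want to run this end to end, here is a minimal sketch. It assumes a small stand-in frame built from the first six rows of the question's data; the names group_a and keys are only illustrative.

```python
import pandas as pd

# Stand-in for the question's long-format frame (values copied from the question).
df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-05"] * 2,
    "variable": ["A"] * 3 + ["B"] * 3,
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215],
})

grouped = df.groupby("variable")

# get_group returns the rows whose 'variable' equals the given key,
# keeping their original index labels.
group_a = grouped.get_group("A")

# The available keys can be listed without knowing them ahead of time.
keys = list(grouped.groups)
```

Because the keys can be read from grouped.groups, nothing in the code has to spell out 'A', 'B', and so on in advance.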

2 Comments

This does not really answer the question, which is about turning a list into multiple DataFrames. I am trying not to use groupby and to solve the question as asked.
Any particular reason you want to create dynamically named variables and perform a linear scan of the original dataframe as many times as the number of unique values of variables + 1? @Merlin?
4

To answer your question literally, globals()['dfA'] = dfNew would define dfA in the global namespace:

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    globals()[dfName] = dfNew   

However, there is never a good reason to define dynamically-named variables.

  • If the names are not known until runtime -- that is, if the names are truly dynamic -- then you can't use the names in your code since your code has to be written before runtime. So what's the point of creating a variable named dfA if you can't refer to it in your code?

  • If, on the other hand, you know beforehand that you will have a variable named dfA, then your code isn't really dynamic. You have static variable names. The only reason to use the loop is to cut down on boilerplate code. However, even in this case, there is a better alternative: use a dict (see below) or a list [1].

  • Adding dynamically-named variables pollutes the global namespace.

  • It does not generalize well. If you had 100 dynamically named variables, how would you access them? How would you loop over them?

  • To "manage" dynamically named variables you would need to keep a list of their names as strings, e.g. ['dfA', 'dfB', 'dfC',...], and then access the newly minted global variables via the globals() dict, e.g. globals()['dfA']. That is awkward.

So the conclusion programmers reach through bitter experience is that dynamically-named variables are somewhere between awkward and useless, and that it is much more pleasant, powerful, and practical to store key/value pairs in a dict. The name of the variable becomes a key in the dict, and the value of the variable becomes the value associated with the key. So, instead of having a bare name dfA you would have a dict dfs and you would access the dfA DataFrame via dfs['dfA']:

dfs = dict()
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfs[dfName] = dfNew   

or, as Jianxun Li shows,

dfs = {k: g for k, g in df.groupby('variable')}
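As a self-contained sketch of the dict approach, the snippet below uses a tiny made-up frame and the illustrative names wanted, selected and sizes (none of which come from the question). It is meant to show that a key chosen at runtime is enough to reach a group.

```python
import pandas as pd

# Tiny stand-in frame; only the column names match the question.
df = pd.DataFrame({
    "variable": ["A", "A", "B", "C"],
    "value": [0.469112, -0.282863, -1.135632, 0.119209],
})

# One DataFrame per unique value of 'variable', keyed by that value.
dfs = {key: group for key, group in df.groupby("variable")}

# The key can come from anywhere at runtime (user input, a config file, ...);
# here it is a plain string held in the hypothetical variable `wanted`.
wanted = "B"
selected = dfs[wanted]

# Row counts per key, computed without naming any key in the code.
sizes = {key: len(group) for key, group in dfs.items()}
```

Anywhere the code would have referred to a variable dfB, it looks up dfs["B"] instead, and the full set of groups stays available through dfs.keys().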

This is why Jon Clements and Jianxun Li answered your question by showing alternatives to defining dynamically-named variables. It's because we all believe it is a terrible idea.


Using Jianxun Li's solution, to loop over a dict's key/value pairs you could then use:

dfs = {k: g for k, g in df.groupby('variable')}
for key, group in dfs.items():   # 'group' avoids shadowing the original df
    ...

or using Jon Clements' solution, to iterate through groups you could use:

grouped = df.groupby('variable')
for key, group in grouped:   # 'group' avoids shadowing the original df
    ...
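Since the stated goal in the comments is to pickle each piece, the loop body might look roughly like the sketch below. The temporary directory, the df<key>.pkl naming scheme, and the names written and restored_a are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Tiny stand-in frame; only the column names match the question.
df = pd.DataFrame({
    "variable": ["A", "A", "B"],
    "value": [0.469112, -0.282863, -1.135632],
})

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_dir = Path(tempfile.mkdtemp())

written = {}
for key, group in df.groupby("variable"):
    path = out_dir / f"df{key}.pkl"  # file naming scheme is an assumption
    group.to_pickle(path)
    written[key] = path

# Reading one file back gives the same rows that were written.
restored_a = pd.read_pickle(written["A"])
```

The dict written plays the same role as dfs: it maps each key to its artifact, so no per-group variable name is needed.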

[1] If the names are numbered or ordered you could use a list instead of a dict.
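A rough sketch of the list variant, assuming the position in the list is all that matters. The frame and the names groups and first are made up for illustration.

```python
import pandas as pd

# Tiny stand-in frame with keys deliberately out of order.
df = pd.DataFrame({
    "variable": ["B", "A", "B", "C"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# groupby sorts the group keys by default, so the list follows sorted key order.
groups = [group for _, group in df.groupby("variable")]

first = groups[0]  # rows where variable == "A"
```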

6 Comments

Using Jianxun Li's answer and your modification of it, how would I loop over the dict and isolate each 'k' into its own DataFrame? I am trying to generalize a solution, so I don't know the keys ahead of time. I can drop the "df" in 'dfA'.
To loop over a dict, dfs, use for key, values in dfs.items() (in Python3), or for key, values in dfs.iteritems() (in Python2). I don't understand what "isolate the 'k' into their own dataframe" means. If your question is a clarification of the current question, please add it to your question above. If it is a follow-up question, please consider asking it as a new question.
Isolate each key into its own dataframe.. So, there are 4 keys, then there would be 4 dataframes.
Then use the for key, value in dfs.items() loop as shown above.
Regarding your suggested edit: My answer was really an attempt to convince you to not use globals(). Therefore I do not want to add globals there since I am NOT advocating the use of globals(). Once you have the dict as in Jianxun Li's answer, there is no need for using globals()[key] = df. Anywhere you would need A, you would use dfs['A'] instead.
1

df.groupby('variable') returns a GroupBy object; iterating over it yields (key, sub-DataFrame) pairs. So to get a dict of subgroups:

result = {k: g for k, g in df.groupby('variable')}

from pprint import pprint
pprint(result)

{'A':          date variable   value
0  2000-01-03        A  0.4691
1  2000-01-04        A -0.2829
2  2000-01-05        A -1.5091,
 'B':          date variable   value
3  2000-01-03        B -1.1356
4  2000-01-04        B  1.2121
5  2000-01-05        B -0.1732,
 'C':          date variable   value
6  2000-01-03        C  0.1192
7  2000-01-04        C -1.0442
8  2000-01-05        C -0.8618,
 'D':           date variable   value
9   2000-01-03        D -2.1046
10  2000-01-04        D -0.4949
11  2000-01-05        D  1.0718}


result['A']

         date variable   value
0  2000-01-03        A  0.4691
1  2000-01-04        A -0.2829
2  2000-01-05        A -1.5091

Comments

0
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    vars()[dfNames[i]] = dfNew

3 Comments

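With no arguments, vars() returns the dictionary of the current namespace, so assigning to vars()[dfName] creates a variable with that name. This works at module level, where vars() is the same dictionary as globals(); inside a function it does not create a local variable.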
please, give an explanation to your answer
The question's code reports that 'dfA' is not defined because it stores each DataFrame in the list dfNames instead of binding it to a variable named dfA. To turn a string into a variable name we can use the vars() function. For example, when dfNames[i] is "dfA", the assignment vars()[dfNames[i]] = dfNew creates a variable named dfA.
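One caveat worth checking before relying on this: the vars() trick only takes effect at module level. The sketch below, with a hypothetical function name and a placeholder string standing in for a DataFrame, illustrates that the same assignment inside a function does not produce a usable variable.

```python
def make_variable_in_function():
    # Inside a function, vars() with no argument behaves like locals().
    # Writing to that dict does not create a real local variable.
    vars()["dfA"] = "placeholder for a DataFrame"
    try:
        return dfA  # compiled as a global lookup; no such global exists
    except NameError:
        return None


result = make_variable_in_function()
```

Here result ends up as None, because the name dfA is never actually bound. A dict of DataFrames behaves the same way in any scope.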
