The same task in Pandas can be easily done with

import pandas as pd
df = pd.DataFrame({"lists":[[i, i+1] for i in range(10)]})
df[['left','right']] = pd.DataFrame([x for x in df.lists])

But I can't figure out how to do something similar with a dask.dataframe.

Update

So far I found this workaround

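import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=2)
# meta declares the name and dtype of each result so dask need not infer them
ddf["left"] = ddf.apply(lambda x: x["lists"][0], axis=1, meta=("left", "int64"))
ddf["right"] = ddf.apply(lambda x: x["lists"][1], axis=1, meta=("right", "int64"))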

I'm wondering if there is another way to proceed.

  • Do I understand: you are trying to get around calling .assign() with two terms? Commented Jul 24, 2017 at 13:28
  • @mdurant I updated the question. I tried to use assign to do so but without success. Commented Jul 24, 2017 at 21:31

1 Answer

You could achieve this using assign:

ddf = ddf.assign(left=ddf.lists.map(lambda x: x[0]),
                 right=ddf.lists.map(lambda x: x[1]))

e.g.,

ddf.compute()


     lists  left  right
0   [0, 1]     0      1
1   [1, 2]     1      2
2   [2, 3]     2      3
3   [3, 4]     3      4
4   [4, 5]     4      5
5   [5, 6]     5      6
6   [6, 7]     6      7
7   [7, 8]     7      8
8   [8, 9]     8      9
9  [9, 10]     9     10

An alternative way of phrasing this (see comments, below) might be

ddf = ddf.assign(**{k: ddf.lists.map(lambda x, i=i: x[i]) 
                 for i, k in enumerate(['left', 'right'])})
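
The `i=i` default argument in the lambda above matters: Python closures look up a loop variable when the function is called, not when it is defined. A minimal plain-Python sketch of the difference follows (no dask involved; `rows`, `late`, and `bound` are illustrative names, not part of the answer):

```python
rows = [[0, 1], [1, 2], [2, 3]]

# Late binding: both lambdas read `i` when called, after the loop has finished,
# so both end up indexing with the final value of `i` (1).
late = {k: (lambda x: x[i]) for i, k in enumerate(["left", "right"])}

# Default argument: each lambda captures the value of `i` at definition time.
bound = {k: (lambda x, i=i: x[i]) for i, k in enumerate(["left", "right"])}

late_left = [late["left"](r) for r in rows]
bound_left = [bound["left"](r) for r in rows]

print(late_left)   # [1, 2, 3]  (second element, not the first)
print(bound_left)  # [0, 1, 2]
```

Since dask calls the mapped function lazily, at `compute()` time, a late-bound lambda would likely fill both `left` and `right` with the second list element, which is why the default-argument form is used.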

4 Comments

Thanks. I'm wondering if it is possible to use a loop for assign and/or apply. I mean, in pandas I can do something like this: {value: df.lists.map(lambda x: x[key]) for key, value in enumerate(["left","right"])}, but this doesn't work with dask.
Since assign takes optional keyword arguments, you could use **kwargs syntax to pass a dictionary comprehension.
I did so, as in my previous comment, but I got an error.
I think using .map(), which was built for pure mapping with the help of dictionaries, instead of .apply(), which was created for exactly this (applying functions when simple mapping is not enough), is 'slightly wrong'. I understand that you are trying to avoid the hassle of meta=..., but it was put there for a reason: that's what the cluster needs...
