I need some advice. Right now I do some computation with the pandas library. The program uses multiprocessing and df.apply. A simple example showing my idea:
import multiprocessing

import pandas as pd

def f2(row, item):
    # do some computation with item and the row's values and return something
    return 'something'

def f1(item):
    d1 = {'col1': [4, 5, 6], 'col2': [7, 8, 9]}
    df = pd.DataFrame(d1)
    df['col3'] = df.apply(f2, axis=1, args=(item,))

if __name__ == '__main__':
    l1 = [1, 2, 3]
    processes = []
    for item in l1:
        p = multiprocessing.Process(target=f1, args=(item,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
I have this PC and one more machine. That is why I am thinking about a "local cluster".
How can I run this code using dask distributed library?
What should I change in this code?
Does dask distributed work with multiprocessing?
Will dask distributed be faster than running on a single machine?
It computes on a small df - about 25000 rows.