
I'm trying to learn how to work with Dask for my machine learning project.

My data set is too big to handle with pandas, so I have to stay with lazy loading.

Here is a small sample to show how it is set up: [image: sample of the data set]

I tried pandas-style filtering, but it runs without ever finishing:

clean = ds[ds['ptype'] == 0]

This approach gives the same result:

ds_filtered = ds.where(ds['ptype'] != 0, drop=True)

Copilot showed me some other approaches, but they either give up lazy loading or simply do not work.

UPDATE:

Here is another approach I tried, but the output is the same as the initial data set: the time dimension is not shorter, as I expected it to be.

def remove_no_prcp(df):
  return df[df['ptype'] != 0]

resample = xr.apply_ufunc(remove_no_prcp, ds.chunk({'time': -1}), dask='parallelized', output_dtypes=[float])

1 Answer


I believe you are missing the gist of lazy loading. It simply means that nothing is executed until compute() is called, which then carries out the execution.

So you need to have all your logic in place first, and call compute() at the end.

If your data is too large, compute() merges everything internally and brings the entire result into the calling process, which may still kill your application due to OOM. It is better to use map_partitions together with compute() to write the output into separate files, or to push it to a database.


1 Comment

Thanks for your comment. I understand the lazy loading concept, but I struggle with how to put the logic in place and how to validate that the data is as expected. Can you give me a simple example of which command to use to filter out 0, and how I can test whether my data is as expected? I will use this example to build the other parts of my logic. All my attempts run into memory problems, so something is missing.
