Dask, how to drop row with specific value in a variable into lazy computing

Question

I'm trying to learn to walkaround with dask for my machine learning project.

My data set is too big to play with Pandas, so I must stay in lazy loading.

here a smal sample to show how it is set up:

I try style pandas, but it run with no end...

clean = ds[ds['ptype'] == 0]

this way is same result:

ds_filtered = ds.where(ds['ptype'] != 0, drop=True)

co-pilot show me somes other way but without lazy loading or juste not working solution

UP DATE:

a new way I try, but the ouput df is same as innitial, dimension time not shorter like expected.

def remove_no_prcp(df):
  return df[df['ptype'] != 0]

resample = xr.apply_ufunc(remove_no_prcp, ds.chunk({'time': -1}), dask='parallelized', output_dtypes=[float])

Aseem Ginoria · Accepted Answer · 2024-11-06 20:56:05Z

0

So I believe you are missing the gist here by mentioning it. Lazy loading simply means, nothing will be worked upon until compute() function is called, which will then take care of execution.!

So you need to have all your logic in place, at the end you can call compute function.

If your data is too large, compute internally merges everything and brings the entire data into the caller, which may still kill you application, die to OOM, so better to use map_partions with compute, to write it into separate files or push output to some database.!

answered Nov 6, 2024 at 20:56

Aseem Ginoria

562 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Jonathan Roy Over a year ago

thank for your comment, I understand the LAzy loading concept, my but I strugle on how to make the logic in place and how to validate if the datas are like expected. Can you give me a simple exemple on what command to filter 0 for exemple and how I can make test to see if my data are like expected? I will use this exemple to build the others parts of my logic. All my trys fall into memorie problem, so something is missing

Collectives™ on Stack Overflow

Dask, how to drop row with specific value in a variable into lazy computing

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related