
I've got some data in an S3 bucket that I want to work with.

I've imported it using:

import boto3
import dask.dataframe as dd

def import_df(key):
    s3 = boto3.client('s3')
    df = dd.read_csv('s3://.../' + key, encoding='latin1')
    return df

key = 'Churn/CLEANED_data/file.csv'
train = import_df(key)

I can see that the data has been imported correctly using:

train.head()

but when I try a simple operation (taken from the dask documentation):

train_churn = train[train['CON_CHURN_DECLARATION'] == 1]
train_churn.compute()

I get this error:

AttributeError                            Traceback (most recent call last)
in ()
      1 train_churn = train[train['CON_CHURN_DECLARATION'] == 1]
----> 2 train_churn.compute()

~/anaconda3/envs/python3/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
    152         dask.base.compute
    153         """
--> 154         (result,) = compute(self, traverse=False, **kwargs)
    155         return result
    156

AttributeError: 'DataFrame' object has no attribute '_getitem_array'

Full error here: Error Upload

Running into a similar error myself; I'll update with an answer if I'm able to troubleshoot it. Preliminarily, it looks like it might have something to do with different datatypes across the files being read by dd.read_csv. – Commented Nov 19, 2019 at 0:26
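One way to test this mixed-datatype hunch is to sample each file with pandas and compare the dtypes it infers. A rough sketch: the function name `dtype_conflicts` and the file names are hypothetical, and because only the first `nrows` rows of each file are sampled, a clean result is a hint rather than a guarantee.

```python
import os
import tempfile

import pandas as pd


def dtype_conflicts(paths, nrows=1000, encoding="latin1"):
    """Return {column: {path: dtype}} for columns whose inferred dtype
    differs between files (columns missing from a file are not flagged)."""
    seen = {}
    for path in paths:
        sample = pd.read_csv(path, nrows=nrows, encoding=encoding)
        for col, dtype in sample.dtypes.items():
            seen.setdefault(col, {})[path] = str(dtype)
    return {col: by_path for col, by_path in seen.items()
            if len(set(by_path.values())) > 1}


# Demo: the same column parses as integers in one file and as text in another.
tmpdir = tempfile.mkdtemp()
a = os.path.join(tmpdir, "part1.csv")
b = os.path.join(tmpdir, "part2.csv")
with open(a, "w") as f:
    f.write("CON_CHURN_DECLARATION\n0\n1\n")
with open(b, "w") as f:
    f.write("CON_CHURN_DECLARATION\n1\nunknown\n")

conflicts = dtype_conflicts([a, b])
print(conflicts)
```

If a column shows up here, passing an explicit `dtype=` to `dd.read_csv` makes every partition parse it the same way.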

4 Answers


I was facing a similar issue when trying to read files from S3. I ultimately solved it by updating dask to the most recent version (I think the version SageMaker instances start with by default is outdated).

Install/upgrade the packages and dependencies (from a notebook):

! python -m pip install --upgrade dask
! python -m pip install fsspec
! python -m pip install --upgrade s3fs

Hope this helps!
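After upgrading, it is worth confirming that the notebook kernel actually sees the new versions; modules that were already imported stay loaded until the kernel is restarted. A small sketch using the standard library's `importlib.metadata`. Note that this module needs Python 3.8+, whereas the traceback in the question shows Python 3.6, where printing `dask.__version__` is the simpler check. The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata  # Python 3.8+


def installed_versions(packages=("dask", "pandas", "fsspec", "s3fs")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```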


1 Comment

This post doesn't look like an attempt to answer this question. Every post here is expected to be an explicit attempt to answer this question; if you have a critique or need a clarification of the question or another answer, you can post a comment (like this one) directly below it. Please remove this answer and create either a comment or a new question. See: Ask questions, get answers, no distractions

If it's a row-wise selection on 'CON_CHURN_DECLARATION', you should be able to filter the dataframe with:

train_churn = train[train.CON_CHURN_DECLARATION == 1]
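Attribute access and bracket access build the same boolean mask, which is why switching between them does not change the outcome. A quick pandas check on made-up data; dask.dataframe mirrors these expressions (including `query`), so an error that appears with every spelling points at the installation rather than at the filter syntax. Attribute access only works when the column name is a valid Python identifier that does not clash with a DataFrame method.

```python
import pandas as pd

df = pd.DataFrame({"CON_CHURN_DECLARATION": [0, 1, 1, 0]})

by_attr = df[df.CON_CHURN_DECLARATION == 1]        # attribute access
by_key = df[df["CON_CHURN_DECLARATION"] == 1]      # bracket access
by_query = df.query("CON_CHURN_DECLARATION == 1")  # expression string

print(by_attr.equals(by_key) and by_key.equals(by_query))
```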

1 Comment

This is exactly the same as my code, and neither works (same error).

You potentially have an old version of dask. Installing version 2.13.0 fixed this issue for me.
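To make a notebook fail fast on an old install instead of hitting the AttributeError later, a simple guard can compare the installed version against the one reported to work here (2.13.0). A minimal sketch with hypothetical helper names; it compares numeric release parts rather than raw strings, because as text "2.9" sorts after "2.13". It ignores pre-release and local suffixes, so it is a rough check rather than a full PEP 440 comparison. The next answer reports the same error on 2.14.0, so a passing check does not by itself rule out a broken install.

```python
import re


def version_tuple(version):
    """Parse the leading numeric release, e.g. '2.13.0' -> (2, 13, 0).

    Anything after the numeric part (e.g. 'rc1' or '+12.g3f1a') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '2.13' compares equal to '2.13.0'.
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


# In the notebook: import dask; at_least(dask.__version__, "2.13.0")
print(at_least("2.9.1", "2.13.0"))  # False: 9 < 13 numerically
print(at_least("2.13.0", "2.13"))   # True
```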


I had the same issue with dask (version 2.14.0). Reinstalling dask solved my problem; I believe there must have been some problem with the previously installed version.

2 Comments

At least mentioning dask version numbers would make your answer more useful.
@suvy Thanks for the suggestion; I've added the dask version.
