
It's similar to this, but the solution doesn't solve my problem.

I use pandas's astype function to convert a string column to int32, but some outliers in the original data cause a ValueError exception.

I want to delete the rows that contain invalid data.

My code is:

df['DRIVEDIR'] = df['DRIVEDIR'].astype('int32')

The df (a small subset) is:

                                    DRIVEDIR 
PASSTIME                                                                       
2017-06-02 11:01:08.247000+08:00       3            
2017-06-02 11:00:55.710000+08:00       2            
2017-06-02 11:00:41.139000+08:00       鲁XXX              
2017-06-02 07:43:41.818000+08:00       2            
2017-06-02 11:04:21.317000+08:00       3            
2017-06-02 11:04:18.460000+08:00       2            
2017-06-02 11:04:13.159000+08:00       1  

I tried df['DRIVEDIR'] = df['DRIVEDIR'].astype('int32', errors='ignore'), but it does not change the dtype from object to int32, so I cannot work with the column afterwards. How can I delete the invalid rows from the DataFrame when astype raises a ValueError while converting from object to int32?

  • Use pd.to_numeric(df.DRIVEDIR, errors='coerce') Commented Oct 25, 2017 at 10:52
  • Great, to_numeric works well. Many thanks. Commented Oct 25, 2017 at 10:56

1 Answer


As mentioned in my comment, use pd.to_numeric. Invalid items are coerced to NaN. You can then just filter them out and cast to int after that.

pd.to_numeric(df.DRIVEDIR, errors='coerce').dropna().astype(int)

2017-06-02    3
2017-06-02    2
2017-06-02    2
2017-06-02    3
2017-06-02    2
2017-06-02    1
Name: DRIVEDIR, dtype: int64
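
A minimal sketch of the same pipeline on a standalone Series, in case it helps to see the intermediate step. The sample values mirror the DRIVEDIR column from the question; the default integer index and the variable names raw, numeric, and clean are assumptions for illustration. It casts to 'int32' as the question asked, whereas astype(int) above gave int64.

```python
import pandas as pd

# Sample values taken from the question; the index is simplified (assumption).
raw = pd.Series(["3", "2", "鲁XXX", "2", "3", "2", "1"], name="DRIVEDIR")

# Values that cannot be parsed become NaN, so the result is float64 for now.
numeric = pd.to_numeric(raw, errors="coerce")

# Drop the NaN rows, then cast the remaining values to int32.
clean = numeric.dropna().astype("int32")

print(clean)
```

The cast has to come after dropna(), because a plain int32 column cannot hold NaN.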

2 Comments

Because my data is in a DataFrame with many columns, I want to delete the rows where to_numeric produced NaN. I use the following code: df['DRIVEDIR'] = pd.to_numeric(df['DRIVEDIR'], errors='coerce'); df = df[df['DRIVEDIR'].notnull()]; df['DRIVEDIR'] = df['DRIVEDIR'].astype('int32'). It raises a warning, though. How can I get rid of it? SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
@hall Use df = df[df['DRIVEDIR'].notnull()].copy(). That said, this is an unrelated question. If your original problem is solved, please mark the answer as accepted. Thank you.
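
Putting the comment thread together, the whole-DataFrame version might look like the sketch below. The SPEED column and its values are hypothetical, added only to show that the other columns of a dropped row go with it. The explicit .copy() makes the filtered frame independent of the original, so the later column assignment is not treated as a write to a slice.

```python
import pandas as pd

# Hypothetical frame: DRIVEDIR values from the question plus an invented column.
df = pd.DataFrame({
    "DRIVEDIR": ["3", "2", "鲁XXX", "1"],
    "SPEED": [40, 55, 38, 61],  # hypothetical, for illustration only
})

# Unparseable strings become NaN.
df["DRIVEDIR"] = pd.to_numeric(df["DRIVEDIR"], errors="coerce")

# Keep only the rows that parsed; .copy() gives an independent DataFrame.
df = df[df["DRIVEDIR"].notnull()].copy()

# With no NaN left, the cast to int32 succeeds.
df["DRIVEDIR"] = df["DRIVEDIR"].astype("int32")

print(df.dtypes)
```

df.dropna(subset=["DRIVEDIR"]) should be an equivalent way to write the filtering step.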
