I have the following code:

data = pd.read_csv('audit_nor.csv')
d1 = pd.get_dummies(data)
header = d1.columns.values
print(header)
print(type(header))

The output looks like:

['ID' 'Age' 'Income' 'Deductions' 'Hours' 'Adjustment' 'Adjusted'
 'Employment_Consultant' 'Employment_PSFederal' 'Employment_PSLocal'
 'Employment_PSState' 'Employment_Private' 'Employment_SelfEmp'
 'Employment_Unemployed' 'Employment_Volunteer' 'Education_Associate'
 'Education_Bachelor' 'Education_College' 'Education_Doctorate'
 'Education_HSgrad' 'Education_Master' 'Education_Preschool'
 'Education_Professional' 'Education_Vocational' 'Education_Yr10'
 'Education_Yr11' 'Education_Yr12' 'Education_Yr5t6' 'Education_Yr7t8'
 'Education_Yr9' 'Marital_Absent' 'Marital_Divorced' 'Marital_Married'
 'Marital_Married-spouse-absent' 'Marital_Unmarried' 'Marital_Widowed'
 'Occupation_Cleaner' 'Occupation_Clerical' 'Occupation_Executive'
 'Occupation_Farming' 'Occupation_Machinist' 'Occupation_Professional'
 'Occupation_Repair' 'Occupation_Sales' 'Occupation_Service'
 'Occupation_Support' 'Occupation_Transport' 'Sex_Female' 'Sex_Male'
 'Accounts_Cuba' 'Accounts_England' 'Accounts_Germany' 'Accounts_India'
 'Accounts_Indonesia' 'Accounts_Iran' 'Accounts_Ireland' 'Accounts_Jamaica'
 'Accounts_Malaysia' 'Accounts_Mexico' 'Accounts_Philippines'
 'Accounts_Portugal' 'Accounts_UnitedStates' 'Accounts_Vietnam']
<type 'numpy.ndarray'>

I am trying to remove 'ID' from header so that I can remove the entire 'ID' column from the data frame. I tried:

columns = header.delete('ID')

but I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'delete'

I am wondering what the proper way to resolve this is. Thanks!

  • What is the exception (i.e., explain the errors)? Commented Mar 23, 2016 at 18:28
  • Error message updated above. Thanks! Commented Mar 23, 2016 at 18:41

1 Answer

ndarray has no delete method, but you can use the function numpy.delete together with numpy.where to find the index:

import numpy as np

print(np.where(header == 'ID'))
(array([0], dtype=int64),)

columns = np.delete(header, np.where(header=='ID'))
print(columns)
['Age' 'Income' 'Deductions' 'Hours' 'Adjustment' 'Adjusted'
 'Employment_Consultant' 'Employment_PSFederal' 'Employment_PSLocal'
 'Employment_PSState' 'Employment_Private' 'Employment_SelfEmp'
 'Employment_Unemployed' 'Employment_Volunteer' 'Education_Associate'
 'Education_Bachelor' 'Education_College' 'Education_Doctorate'
 'Education_HSgrad' 'Education_Master' 'Education_Preschool'
 'Education_Professional' 'Education_Vocational' 'Education_Yr10'
 'Education_Yr11' 'Education_Yr12' 'Education_Yr5t6' 'Education_Yr7t8'
 'Education_Yr9' 'Marital_Absent' 'Marital_Divorced' 'Marital_Married'
 'Marital_Married-spouse-absent' 'Marital_Unmarried' 'Marital_Widowed'
 'Occupation_Cleaner' 'Occupation_Clerical' 'Occupation_Executive'
 'Occupation_Farming' 'Occupation_Machinist' 'Occupation_Professional'
 'Occupation_Repair' 'Occupation_Sales' 'Occupation_Service'
 'Occupation_Support' 'Occupation_Transport' 'Sex_Female' 'Sex_Male'
 'Accounts_Cuba' 'Accounts_England' 'Accounts_Germany' 'Accounts_India'
 'Accounts_Indonesia' 'Accounts_Iran' 'Accounts_Ireland' 'Accounts_Jamaica'
 'Accounts_Malaysia' 'Accounts_Mexico' 'Accounts_Philippines'
 'Accounts_Portugal' 'Accounts_UnitedStates' 'Accounts_Vietnam']
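As a self-contained illustration of the same idea, here is a minimal sketch on a shortened, hypothetical header array (the five names are stand-ins, not the full column list). It is only meant to show that np.delete is a module-level function that returns a new array and leaves the input unchanged:

```python
import numpy as np

# Hypothetical, shortened stand-in for the real header array.
header = np.array(['ID', 'Age', 'Income', 'Sex_Female', 'Sex_Male'])

# np.where returns a tuple with one index array per dimension;
# [0] picks the index array for the only dimension.
idx = np.where(header == 'ID')[0]

# np.delete returns a new array; `header` itself is not modified.
columns = np.delete(header, idx)

print(columns)
print(header)
```

Because the result is a new array, it has to be assigned to a name (here columns) to be of any use.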

Or you can use a list comprehension to remove ID:

columns = [x for x in header if x != 'ID']
print(columns)
['Age', 'Income', 'Deductions', 'Hours', 'Adjustment', 'Adjusted', 'Employment_Consultant', 'Employment_PSFederal', 'Employment_PSLocal', 'Employment_PSState', 'Employment_Private', 'Employment_SelfEmp', 'Employment_Unemployed', 'Employment_Volunteer', 'Education_Associate', 'Education_Bachelor', 'Education_College', 'Education_Doctorate', 'Education_HSgrad', 'Education_Master', 'Education_Preschool', 'Education_Professional', 'Education_Vocational', 'Education_Yr10', 'Education_Yr11', 'Education_Yr12', 'Education_Yr5t6', 'Education_Yr7t8', 'Education_Yr9', 'Marital_Absent', 'Marital_Divorced', 'Marital_Married', 'Marital_Married-spouse-absent', 'Marital_Unmarried', 'Marital_Widowed', 'Occupation_Cleaner', 'Occupation_Clerical', 'Occupation_Executive', 'Occupation_Farming', 'Occupation_Machinist', 'Occupation_Professional', 'Occupation_Repair', 'Occupation_Sales', 'Occupation_Service', 'Occupation_Support', 'Occupation_Transport', 'Sex_Female', 'Sex_Male', 'Accounts_Cuba', 'Accounts_England', 'Accounts_Germany', 'Accounts_India', 'Accounts_Indonesia', 'Accounts_Iran', 'Accounts_Ireland', 'Accounts_Jamaica', 'Accounts_Malaysia', 'Accounts_Mexico', 'Accounts_Philippines', 'Accounts_Portugal', 'Accounts_UnitedStates', 'Accounts_Vietnam']
# if you need to filter df by these columns
df = df[columns]
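To make the filtering step concrete, the following is a small sketch on a toy DataFrame. The frame and its values are invented for illustration and only stand in for the one-hot encoded data from the question:

```python
import pandas as pd

# Hypothetical toy frame standing in for the one-hot encoded data.
df = pd.DataFrame({'ID': [1, 2], 'Age': [38, 35], 'Sex_Male': [0, 1]})

header = df.columns.values
columns = [x for x in header if x != 'ID']

# Indexing with a list of labels returns a new DataFrame
# containing only those columns, in the order given.
filtered = df[columns]

print(filtered.columns.tolist())
```

Here the result is stored in filtered rather than assigned back to df, so the original frame with the ID column stays available.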

Or slice off the first element (this works only if ID is the first element of header):

columns = header[1:]
print(columns)
['Age' 'Income' 'Deductions' 'Hours' 'Adjustment' 'Adjusted'
 'Employment_Consultant' 'Employment_PSFederal' 'Employment_PSLocal'
 'Employment_PSState' 'Employment_Private' 'Employment_SelfEmp'
 'Employment_Unemployed' 'Employment_Volunteer' 'Education_Associate'
 'Education_Bachelor' 'Education_College' 'Education_Doctorate'
 'Education_HSgrad' 'Education_Master' 'Education_Preschool'
 'Education_Professional' 'Education_Vocational' 'Education_Yr10'
 'Education_Yr11' 'Education_Yr12' 'Education_Yr5t6' 'Education_Yr7t8'
 'Education_Yr9' 'Marital_Absent' 'Marital_Divorced' 'Marital_Married'
 'Marital_Married-spouse-absent' 'Marital_Unmarried' 'Marital_Widowed'
 'Occupation_Cleaner' 'Occupation_Clerical' 'Occupation_Executive'
 'Occupation_Farming' 'Occupation_Machinist' 'Occupation_Professional'
 'Occupation_Repair' 'Occupation_Sales' 'Occupation_Service'
 'Occupation_Support' 'Occupation_Transport' 'Sex_Female' 'Sex_Male'
 'Accounts_Cuba' 'Accounts_England' 'Accounts_Germany' 'Accounts_India'
 'Accounts_Indonesia' 'Accounts_Iran' 'Accounts_Ireland' 'Accounts_Jamaica'
 'Accounts_Malaysia' 'Accounts_Mexico' 'Accounts_Philippines'
 'Accounts_Portugal' 'Accounts_UnitedStates' 'Accounts_Vietnam']

# if you need to filter df by these columns
df = df[columns]

But if you only need to remove the ID column from the DataFrame, use drop:

df = df.drop('ID', axis=1)
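
A short sketch of drop on a toy DataFrame follows; the frame is hypothetical. The columns= keyword assumes pandas 0.21 or later, and errors='ignore' is shown only as an optional safeguard for the case where the column might already be gone:

```python
import pandas as pd

# Hypothetical toy frame standing in for the one-hot encoded data.
df = pd.DataFrame({'ID': [1, 2], 'Age': [38, 35], 'Sex_Male': [0, 1]})

# drop returns a new DataFrame by default; df is not modified.
dropped = df.drop('ID', axis=1)

# Equivalent spelling with the columns= keyword (pandas 0.21+).
dropped_kw = df.drop(columns='ID')

# errors='ignore' suppresses the KeyError that drop raises
# when the label is not present.
safe = dropped.drop(columns='ID', errors='ignore')

print(dropped.columns.tolist())
```

Since drop does not change the frame in place by default, the result needs to be assigned, as in the df = df.drop(...) line above.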