I have a question. I have these lines:

import codecs
import numpy as np

s = codecs.open('file.csv', encoding="utf-8").read()
array1 = np.asarray(s.splitlines())

print(array1)

and I get this result for the array:

['39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K'
 '50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K'
 '38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K'
 ...
 '36, Private, 146311, 9th, 5, Married-civ-spouse, Machine-op-inspct, Husband, White, Male, 0, 0, 40, United-States, <=50K'
 '47, Self-emp-not-inc, 159869, Doctorate, 16, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 50, United-States, <=50K'
 '21, Private, 204641, Some-college, 10, Never-married,']

What I want is to transform it into:

[['39', 'State-gov', '77516', 'Bachelors', '13', ..., '<=50K'], ['50', ...]]

So right now it is an array with one row and many columns, and each column holds one long string. I want to turn each of those strings into its own row, with one column per comma-separated field.

I have no idea how to do it. I tried to split it, but I couldn't.

Could somebody help me?

Thanks!

1 Answer

Method 1: Generating desired array from file

If you are starting from a CSV file, you might as well just use np.genfromtxt.

If filename.csv looks like:

39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K

Then:

new_arr = np.genfromtxt('filename.csv', dtype='str')

>>> new_arr
array([['39,', 'State-gov,', '77516,', 'Bachelors,', '13,',
        'Never-married,', 'Adm-clerical,', 'Not-in-family,', 'White,',
        'Male,', '2174,', '0,', '40,', 'United-States,', '<=50K'],
       ['50,', 'Self-emp-not-inc,', '83311,', 'Bachelors,', '13,',
        'Married-civ-spouse,', 'Exec-managerial,', 'Husband,', 'White,',
        'Male,', '0,', '0,', '13,', 'United-States,', '<=50K']],
      dtype='<U19')
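
Note that the output above keeps the trailing commas, because np.genfromtxt splits on whitespace by default. Here is a minimal sketch of a variant that splits on the commas instead, using the delimiter and autostrip parameters. The io.StringIO object and the csv_text name are assumptions made only so the example runs without a file on disk; they stand in for filename.csv with the same two rows.

```python
import io
import numpy as np

# Stand-in for filename.csv (assumption: the same two sample rows as above).
csv_text = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

# delimiter=',' splits on commas; autostrip=True removes the space after each comma.
new_arr = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=str, autostrip=True)

print(new_arr.shape)  # (2, 15)
print(new_arr[0, 1])  # State-gov
```

With a real file, passing 'filename.csv' in place of the io.StringIO object should behave the same way.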

Method 2: Fixing your array

Otherwise, if you already have the array:

>>> arr
array(['39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K',
       '50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K'],
      dtype='<U133')

You can iterate through it and split each string on whitespace (note that each field keeps its trailing comma):

new_arr = np.array([i.split() for i in arr])

>>> new_arr
array([['39,', 'State-gov,', '77516,', 'Bachelors,', '13,',
        'Never-married,', 'Adm-clerical,', 'Not-in-family,', 'White,',
        'Male,', '2174,', '0,', '40,', 'United-States,', '<=50K'],
       ['50,', 'Self-emp-not-inc,', '83311,', 'Bachelors,', '13,',
        'Married-civ-spouse,', 'Exec-managerial,', 'Husband,', 'White,',
        'Male,', '0,', '0,', '13,', 'United-States,', '<=50K']],
      dtype='<U19')
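
If the trailing commas are not wanted, as in the output requested in the question, one option is to split on the comma and strip the leftover whitespace. This is a minimal sketch assuming the same two-element arr shown above; the names rows and field are only illustrative.

```python
import numpy as np

# Same two strings as the arr shown above.
arr = np.array([
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K",
])

# Split each line on the comma, then strip the space that follows each comma.
rows = [[field.strip() for field in line.split(",")] for line in arr]
new_arr = np.array(rows)

print(new_arr.shape)   # (2, 15)
print(new_arr[0, :3])  # ['39' 'State-gov' '77516']
```

This only yields a 2-D array when every row has the same number of fields. A truncated line, like the last one in the question's output, would need to be dropped or repaired first.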

2 Comments

Thanks, method 1 was my solution!!
Glad to help :)
