I am trying to replace the string values in an array with the array's median, but I get an error when I try to build a boolean mask. My array contains three string values, and the code I am running is:

arr2 = np.array([1,2,3,1,5,2,3,4,2,
                 4,1,3,4,1,2,5,3,2,
                 1,"?",1,"n",3,2,5,
                 1,2,"Nan",3,2,2,4,3])

flag_good = [element.isdigit() for element in arr2]
flag_bad = ~flag_good

but I get an error when running the line:

flag_bad = ~flag_good

How would I go about replacing the string values with the array's median?

  • I apologize for the formatting; it did not retain its original format. Commented Mar 20, 2020 at 16:06
  • Edit the question to show the full error message as properly formatted text. Commented Mar 20, 2020 at 16:08

2 Answers

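The tilde (~) is Python's unary invert operator. NumPy arrays implement it as a shortcut for numpy.invert, which on a boolean array is an elementwise logical NOT.

Because flag_good is built with a list comprehension, it is a plain Python list, and lists don't support this operation.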
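
The difference is easy to reproduce. The following is a minimal sketch; the names flags, list_supports_invert and inverted are illustrative only and do not come from the question.

```python
import numpy as np

flags = [True, False, True]  # plain Python list, like the result of a list comprehension

try:
    ~flags  # lists do not define the unary ~ operator
    list_supports_invert = True
except TypeError:
    list_supports_invert = False

inverted = ~np.array(flags)  # boolean arrays do: elementwise logical NOT

print(list_supports_invert)
print(inverted)
```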

One fix is to convert the list flag_good to a NumPy array and then apply the ~ operator:

>>> flag_bad = ~np.array(flag_good)
>>> flag_bad
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True, False,  True, False, False, False, False, False,
        True, False, False, False, False, False])

Alternatively, use np.vectorize in place of the list comprehension. It calls the function elementwise and returns a NumPy array directly:

>>> flag_good = np.vectorize(lambda x: x.isdigit())(arr2)
>>> flag_bad = ~flag_good
>>> flag_bad
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True, False,  True, False, False, False, False, False,
        True, False, False, False, False, False])
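
To go from the mask to the actual replacement, keep in mind that NumPy stores this mixed input as an array of strings, so the valid entries need to be cast to a numeric type before np.median can reduce them. Below is a minimal sketch, assuming the arr2 from the question. np.char.isdigit is swapped in here for the list comprehension, and the names median and cleaned are illustrative.

```python
import numpy as np

arr2 = np.array([1, 2, 3, 1, 5, 2, 3, 4, 2,
                 4, 1, 3, 4, 1, 2, 5, 3, 2,
                 1, "?", 1, "n", 3, 2, 5,
                 1, 2, "Nan", 3, 2, 2, 4, 3])

# Elementwise str.isdigit over the string array; returns a boolean array
flag_good = np.char.isdigit(arr2)
flag_bad = ~flag_good

# Cast only the valid entries to int before taking the median
median = np.median(arr2[flag_good].astype(int))

# Build a numeric result instead of writing back into the string array
cleaned = np.empty(arr2.shape, dtype=float)
cleaned[flag_good] = arr2[flag_good].astype(float)
cleaned[flag_bad] = median

print(median)
print(cleaned)
```

Building a separate float array keeps the result numeric; arr2 itself remains an array of strings.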

5 Comments

Thank you. The boolean is now working as you showed, but I get an error when I run: arr2[FlagBad] = np.median(arr2[FlagGood]). Does the function behave differently on an array containing both string and digit values than on an array containing only digits? I am trying to adapt a function I wrote that replaces outliers with a mean, but replacing string values with a median is proving to be quite different.
@GregSullivan Which code exactly doesn't work? And what's the error message?
File "/Users/gregorysullivan/opt/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py", line 3561, in median return mean(part[indexer], axis=axis, out=out) File "<__array_function__ internals>", line 6, in mean File "/Users/gregorysullivan/opt/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3257, in mean out=out, **kwargs) File "/Users/gregorysullivan/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py", line 151, in _mean ret = umr_sum(arr, axis, dtype, out, keepdims) TypeError: cannot perform reduce with flexible type
@GregSullivan And which code did you execute? In the code listed above there is no mean function or anything like it.
The code I executed was: arr2[FlagBad2] = np.median(arr2[FlagGood2]) but after seeing the error and working on it I entered: arr2[FlagBad2] = np.median(arr2[FlagGood2].astype(int)) and it worked! Thank you for your help!

I think you could solve the problem at the source by editing your list comprehension:

flag_bad = [not element.isdigit() for element in arr2]

To answer your question, though, I'd do this:

import numpy as np
input_list = [1,2,3,1,5,2,3,4,2,
                  4,1,3,4,1,2,5,3,2,
                  1,"?",1,"n",3,2,5,
                  1,2,"Nan",3,2,2,4,3]

# calculate the median
median = int(np.median([elt for elt in input_list if type(elt) is int])) 

# replace elements of the list only if you have a non-int 
output_array = np.array([elt if type(elt) is int else median for elt in input_list])
print(output_array)

Output:

[1 2 3 1 5 2 3 4 2 4 1 3 4 1 2 5 3 2 1 2 1 2 3 2 5 1 2 2 3 2 2 4 3]

2 Comments

Thank you. I don't think Python will calculate the median when there are both string and int values in the array, and that is what I was struggling to figure out.
@GregSullivan Yes. That's why I only use the integers from the list to calculate the median. The code above should work on any platform without any errors.
