I am trying to compute the union of two NumPy arrays like this:

np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  )

The output reads:

array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
    0.46,  0.47,  0.48,  0.49,  0.5 ,  0.5 ,  0.51,  0.52,  0.53,
    0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.6 ,  0.61,
    0.62,  0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,
    0.7 ,  0.71,  0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,
    0.79,  0.8 ,  0.8 ,  0.9 ])

In the output of this union, the number 0.5 appears twice (as do 0.6, 0.7 and 0.8). Even applying np.unique does not remove the duplicate:

np.unique( np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  ) )

also gives the same output. What am I doing wrong? How can I correct this and get the desired output, i.e. only one occurrence of the number 0.5 in my array?
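To see what np.unique is up against, you can list the adjacent elements that are numerically close but not bit-for-bit equal, and print both their repr and their exact stored value via decimal.Decimal. This is a minimal diagnostic sketch; treating "close" as np.isclose with its default tolerances is an assumption about what should count as the same number here.

```python
import numpy as np
from decimal import Decimal

a = np.union1d(np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01))

# True where a[i + 1] is within np.isclose's default tolerance of a[i]
close = np.isclose(a[1:], a[:-1])
# ...but the two values are not exactly equal (so union1d/unique keep both)
near_dupes = np.flatnonzero(close & (a[1:] != a[:-1]))

for i in near_dupes:
    x, y = float(a[i]), float(a[i + 1])
    # repr() gives the shortest round-tripping string; Decimal() the exact binary value
    print(i, repr(x), repr(y))
    print("    ", Decimal(x))
    print("    ", Decimal(y))
```

np.union1d and np.unique only merge values that are exactly equal, so every pair this loop prints survives both calls.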

  • It's probably an issue with floating point comparison/precision. Is it possible to use integers for the arange and union operations and then do a division later on to get normalized arrays? Commented Mar 8, 2017 at 8:04
  • The array I intend to use has unequal spacing: a spacing of 0.1 from 0.1 to 0.9, plus a spacing of 0.01 from 0.4 to 0.8. Commented Mar 8, 2017 at 8:10
  • You should still be able to achieve that using integers. See my answer for example. Commented Mar 8, 2017 at 8:13
  • Try looking at them separately: a[13] gives 0.5, but a[14] gives 0.50000000000000011 Commented Mar 8, 2017 at 9:42
  • Did either of the posted solutions work for you? Commented Mar 11, 2017 at 16:54

4 Answers

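Since the input array is sorted, you can use the same approach as in this post: keep the first element, then keep each following element only if it is not np.isclose to its predecessor.

a[np.r_[True, ~np.isclose(a[1:], a[:-1])]]

Sample run: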

In [20]: a = np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  )

In [21]: a
Out[21]: 
array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.6 ,  0.61,
        0.62,  0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,
        0.7 ,  0.71,  0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,
        0.79,  0.8 ,  0.8 ,  0.9 ])

In [22]: a[np.r_[True,~np.isclose(a[1:] , a[:-1])]]
Out[22]: 
array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,  0.54,
        0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,  0.63,
        0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,  0.72,
        0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,  0.9 ])
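The one-liner can be wrapped in a small reusable function. The name dedupe_sorted_close is hypothetical; rtol and atol are simply forwarded to np.isclose, and the defaults shown are np.isclose's own defaults. This is a sketch that assumes a sorted 1-D input, such as the output of np.union1d.

```python
import numpy as np

def dedupe_sorted_close(a, rtol=1e-05, atol=1e-08):
    """Drop elements of a sorted 1-D array that are np.isclose to their predecessor.

    Hypothetical helper wrapping the np.r_ / np.isclose one-liner; rtol and atol
    are passed straight through to np.isclose.
    """
    a = np.asarray(a)
    if a.size == 0:
        return a
    # keep[0] is always True; keep[i] is True when a[i] is not close to a[i - 1]
    keep = np.r_[True, ~np.isclose(a[1:], a[:-1], rtol=rtol, atol=atol)]
    return a[keep]

a = np.union1d(np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01))
b = dedupe_sorted_close(a)
print(len(a), len(b))
```

Because each element is compared with its immediate predecessor rather than with the last element kept, a long run of values that each differ from their neighbour by less than the tolerance would collapse to a single element. With a 0.01 grid and the default tolerances that cannot happen, but it matters for much finer spacing.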

As stated by @ImNt in the comments, this is due to floating point comparison/precision: one of the two values is probably not exactly 0.5 in memory but something like 0.5000000000000001.

You can work around it, though. Since your values have at most two decimal places, you can first np.round the array to two decimals and then apply np.unique.

import numpy as np

x = np.union1d( np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01)  )
x = np.round(x, 2)  # round to 2 decimal places
x = np.unique(x)    # the near-duplicates are now exact duplicates and get merged

Output:

array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,  0.54,
        0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,  0.63,
        0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,  0.72,
        0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,  0.9 ])
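A variation on the same idea is to round each input before the union, so that np.union1d's own exact de-duplication does the work and no separate np.unique call is needed. A sketch, assuming the values are meant to have at most `decimals` decimal places; union_rounded is a hypothetical name.

```python
import numpy as np

def union_rounded(a, b, decimals=2):
    # Hypothetical helper: round both inputs first so values meant to be equal
    # become the same double, then let union1d sort and merge exact duplicates.
    return np.union1d(np.round(a, decimals), np.round(b, decimals))

x = union_rounded(np.arange(0.1, 0.91, 0.1), np.arange(0.4, 0.81, 0.01))
print(x)
```

Rounding fixes the precision up front: if the real data can carry more decimal places than `decimals`, distinct values will be merged as well.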

As I wrote in my comment, this is an issue of floating point precision and comparison. If it applies to your particular case, I would suggest working with integers and normalizing afterwards.

For example:

import numpy as np

x = np.union1d( np.arange(10, 91, 10), np.arange(40, 81, 1)  )  # exact integer union
x = x/100.0  # normalize only at the end

Output:

[ 0.1   0.2   0.3   0.4   0.41  0.42  0.43  0.44  0.45  0.46  0.47  0.48
  0.49  0.5   0.51  0.52  0.53  0.54  0.55  0.56  0.57  0.58  0.59  0.6
  0.61  0.62  0.63  0.64  0.65  0.66  0.67  0.68  0.69  0.7   0.71  0.72
  0.73  0.74  0.75  0.76  0.77  0.78  0.79  0.8   0.9 ]
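The same integer-grid idea can be generalized to any number of ranges written in the question's float notation. In this sketch, union_on_grid is a hypothetical helper, and it assumes every start, stop and step is an exact multiple of 1/scale (here hundredths), so converting them to integer grid units with round() loses nothing.

```python
import numpy as np

def union_on_grid(ranges, scale=100):
    """Union of several (start, stop, step) float ranges, computed on an integer grid.

    Hypothetical helper: assumes every value is an exact multiple of 1/scale.
    stop is exclusive, as in np.arange.
    """
    result = np.array([], dtype=np.int64)
    for start, stop, step in ranges:
        # round() absorbs small representation errors in v * scale
        i_start, i_stop, i_step = (round(v * scale) for v in (start, stop, step))
        result = np.union1d(result, np.arange(i_start, i_stop, i_step))
    # The integer union is exact; divide only at the end
    return result / scale

x = union_on_grid([(0.1, 0.91, 0.1), (0.4, 0.81, 0.01)])
print(x)
```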

Or you could use fractions.Fraction, which represents these values exactly:

>>> import numpy as np
>>> from fractions import Fraction
>>> np.union1d( np.arange(Fraction(1,10), Fraction(91,100), Fraction(1,10)), np.arange(Fraction(4,10), Fraction(81,100),Fraction(1,100)))
array([Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5),
       Fraction(41, 100), Fraction(21, 50), Fraction(43, 100),
       Fraction(11, 25), Fraction(9, 20), Fraction(23, 50),
       Fraction(47, 100), Fraction(12, 25), Fraction(49, 100),
       Fraction(1, 2), Fraction(51, 100), Fraction(13, 25),
       Fraction(53, 100), Fraction(27, 50), Fraction(11, 20),
       Fraction(14, 25), Fraction(57, 100), Fraction(29, 50),
       Fraction(59, 100), Fraction(3, 5), Fraction(61, 100),
       Fraction(31, 50), Fraction(63, 100), Fraction(16, 25),
       Fraction(13, 20), Fraction(33, 50), Fraction(67, 100),
       Fraction(17, 25), Fraction(69, 100), Fraction(7, 10),
       Fraction(71, 100), Fraction(18, 25), Fraction(73, 100),
       Fraction(37, 50), Fraction(3, 4), Fraction(19, 25),
       Fraction(77, 100), Fraction(39, 50), Fraction(79, 100),
       Fraction(4, 5), Fraction(9, 10)], dtype=object)
>>> _.astype(float)
array([ 0.1 ,  0.2 ,  0.3 ,  0.4 ,  0.41,  0.42,  0.43,  0.44,  0.45,
        0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,  0.54,
        0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,  0.63,
        0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,  0.72,
        0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,  0.9 ])
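If you prefer not to call np.arange on Fraction objects, the exact grids can also be built with Python's range and merged as sets, converting to float only at the end. A minimal sketch for the two grids from the question:

```python
from fractions import Fraction
import numpy as np

# Build the two grids exactly: k/10 for k = 1..9 and k/100 for k = 40..80
coarse = {Fraction(k, 10) for k in range(1, 10)}
fine = {Fraction(k, 100) for k in range(40, 81)}

# Fractions are normalized (Fraction(50, 100) == Fraction(1, 2)), so the set
# union removes the overlap exactly; convert to float only after sorting.
x = np.array([float(f) for f in sorted(coarse | fine)])
print(x)
```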
