I have a big array, part of which is shown below. In each list, the first number is the start and the second number is the end (so each list is a range). What I want to do is:

1: Filter out those lists (ranges) which are shorter than 300 (e.g. the 18th list in the following array must be removed).

2: Get smaller ranges (lists) in this way: (start+100) to (start+200). E.g. the first list would become [569, 669].

I tried different split functions in numpy, but none of them gives what I am looking for.

array([[ 469, 1300],
       [ 171, 1440],
       [ 187, 1564],
       [ 204, 1740],
       [  40, 1363],
       [  56, 1457],
       [ 132,  606],
       [1175, 2096],
       [ 484, 2839],
       [ 132, 4572],
       [ 166, 1693],
       [  69, 3300],
       [ 142, 1003],
       [2118, 2118],
       [ 715, 1687],
       [ 301, 1006],
       [  48, 2142],
       [  63,  330],
       [ 479, 2411]], dtype=uint32)

Do you know how to do that in Python?

Thanks

    Filter should be easy... Python has an explicit filter function Commented Jul 30, 2016 at 14:10

4 Answers

2

Assuming your array is called A, then:

import numpy as np

# Filter out differences not wanted
gt300 = A[(np.diff(A) >= 300).flatten()]

# Set new value of first column
gt300[:,0] += 100

# Set value of second column
gt300[:,1] = gt300[:,0] + 100

Or maybe something like:

B = A[:,0][(np.diff(A) >= 300).flatten()]
C = np.repeat(B, 2).reshape((len(B), 2)) + [100, 200]
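
As a quick check, here is a minimal self-contained sketch of the first approach, run on a few rows from the question. The reduced sample and the names `keep` and `new_ranges` are only for illustration.

```python
import numpy as np

# A few rows taken from the question's array (illustrative subset)
A = np.array([[469, 1300],
              [2118, 2118],
              [63, 330],
              [479, 2411]], dtype=np.uint32)

# np.diff along the last axis gives end - start with shape (n, 1);
# flatten it into a 1-D boolean mask of the rows to keep
keep = (np.diff(A) >= 300).flatten()

# Boolean indexing returns a copy, so A itself is not modified below
new_ranges = A[keep]
new_ranges[:, 0] += 100                     # start + 100
new_ranges[:, 1] = new_ranges[:, 0] + 100   # (start + 100) + 100 = start + 200

print(new_ranges)
```

Because the mask indexing copies the data, the original `A` stays available if the unmodified ranges are still needed.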

2 Comments

Hi Jon, when I run the last line it gives this error: "TypeError: list indices must be integers, not tuple"
@user3925736 which line exactly?
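
That error is what Python raises when a plain list is indexed with something like `[:, 0]`, so it suggests that `A` was not a NumPy array at that point. A minimal sketch, assuming `A` started out as a list of lists; converting it once with `np.asarray` lets the indexing from the answer work:

```python
import numpy as np

# Assumption: A is a plain list of lists rather than an ndarray
A = [[469, 1300], [63, 330], [479, 2411]]

try:
    A[:, 0]                  # tuple index on a list is not supported
except TypeError as exc:
    print("TypeError:", exc)

# Convert once, after which NumPy-style indexing works
A = np.asarray(A, dtype=np.uint32)
B = A[:, 0][(np.diff(A) >= 300).flatten()]
C = np.repeat(B, 2).reshape((len(B), 2)) + [100, 200]
print(C)
```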
0

A general note first: you should use tuples to represent such ranges, not lists. Tuples are immutable, and the order of their items carries meaning.

As for 1, it is pretty easy to filter in Python:

filter(lambda single_range: single_range[1] - single_range[0] >= 300, ranges)

A clearer way (in my opinion) to do this is with a list comprehension:

[(start, end) for start, end in ranges if end - start >= 300]

As for 2, I don't fully understand what you mean, but if you want to create a new list of ranges in which each range is changed by the same function, that is a map (or, my preferred way, a list comprehension, which is equivalent but more descriptive):

[(start + 100, start + 200) for start, end in ranges]
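
If both steps are wanted together, the filter and the mapping can be merged into one comprehension. A small sketch, assuming `ranges` is a list of `(start, end)` tuples and that ranges of exactly 300 are kept (the question only asks to drop those smaller than 300):

```python
# Illustrative sample; `ranges` is assumed to be a list of (start, end) tuples
ranges = [(469, 1300), (2118, 2118), (63, 330), (479, 2411)]

# Keep ranges at least 300 long, then emit (start + 100, start + 200)
new_ranges = [(start + 100, start + 200)
              for start, end in ranges
              if end - start >= 300]

print(new_ranges)  # [(569, 669), (579, 679)]
```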

1 Comment

Hi Ehud, this is a numpy array, and I would like to get a numpy array back for further handling.
0
data = [[ 469, 1300],
        # ...
        [  63,  330],
        [ 479, 2411]]

print(
    # list() is needed in Python 3, where filter() returns a lazy iterator
    list(filter(lambda v: v[1] - v[0] >= 300, data))
)

print(
    [[v[0] + 100, v[0] + 200] for v in data]
)

Explanation:

The first command uses the builtin filter function to keep only the elements for which the lambda expression is true.

The second iterates over the list and generates a new one while doing so.

If the input and output should be numpy arrays, try the following. Note: there is no way to filter a numpy array without creating a new one.

from numpy import array, uint32

data = array([
    ( 469, 1300),
    ( 171, 1440),
    # ...
    (  63,  330),
    ( 479, 2411)], dtype=uint32)

# Each v is one row of the array; list() is needed because
# filter() returns a lazy iterator in Python 3
print(
    array(list(filter(lambda v: v[1] - v[0] >= 300, data)), dtype=uint32)
)

print(
    array([[v[0] + 100, v[0] + 200] for v in data], dtype=uint32)
)

1 Comment

Hi Simon, great. It works perfectly for a matrix (list of lists), but my data is a numpy array and I have to get a numpy array back. Do you know how to manipulate the input array to get a numpy array like the input, but with the mentioned changes?
0

We can find which rows have the small difference with:

In [745]: mask=(x[:,1]-x[:,0])<300
In [746]: mask
Out[746]: 
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False,  True, False], dtype=bool)

We can use that mask to select those rows, or to deselect them:

In [747]: x[mask,:]
Out[747]: 
array([[2118, 2118],
       [  63,  330]], dtype=uint32)
In [748]: x[~mask,:]
Out[748]: 
array([[ 469, 1300],
       [ 171, 1440],
       [ 187, 1564],
       [ 204, 1740],
       ...
       [ 479, 2411]], dtype=uint32)

To make a new set of ranges, get the first column. Here I am using [0] (a list index rather than a plain 0) so the selection remains a column array:

In [750]: x[:,[0]]
Out[750]: 
array([[ 469],
       [ 171],
       [ 187],
        ...
       [  48],
       [  63],
       [ 479]], dtype=uint32)

Add to that the desired offsets. This takes advantage of broadcasting.

In [751]: x[:,[0]]+[100,200]
Out[751]: 
array([[ 569,  669],
       [ 271,  371],
       [ 287,  387],
       [ 304,  404],
       [ 140,  240],
       [ 156,  256],
      ...
       [ 401,  501],
       [ 148,  248],
       [ 163,  263],
       [ 579,  679]], dtype=int64)
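
Note that the result above is `int64` rather than `uint32`: the Python list `[100, 200]` becomes a signed integer array, and mixing it with `uint32` promotes the sum. If the original dtype matters, one option (a sketch, not the only way) is to give the offsets the same dtype as `x`; the name `offsets` and the reduced sample are illustrative:

```python
import numpy as np

# Illustrative subset of the question's array
x = np.array([[469, 1300], [171, 1440], [63, 330]], dtype=np.uint32)

# Offsets with the same dtype as x, so the broadcast sum stays uint32
offsets = np.array([100, 200], dtype=x.dtype)

# x[:, [0]] has shape (n, 1); adding shape (2,) broadcasts to (n, 2)
new_ranges = x[:, [0]] + offsets
print(new_ranges.dtype)
print(new_ranges)
```

Calling `.astype(np.uint32)` on the `int64` result would be another way to get back to the original dtype.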

There are other ways of constructing such an array:

np.column_stack([x[:,0]+100,x[:,0]+200])
np.array([x[:,0]+100, x[:,0]+200]).T   # or vstack

Other answers have suggested the Python list filter. I'm partial to list comprehensions in this kind of use, for example:

In [756]: np.array([i for i in x if (i[1]-i[0])<300])
Out[756]: 
array([[2118, 2118],
       [  63,  330]], dtype=uint32)

For small lists of lists, the pure Python approach tends to be faster. But if the object is already a numpy array, it is faster to use the numpy operations that work on the whole array at once (i.e. do the iteration in compiled code). Hence my suggestion to use the boolean mask.
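
Putting the mask and the broadcasting step together, the whole task can be sketched as one small function. The helper name `shrink_ranges` and its parameters are hypothetical, chosen only for this illustration. The lengths are computed in a signed type because subtracting unsigned integers wraps around if an end is ever smaller than its start.

```python
import numpy as np

def shrink_ranges(x, min_len=300, lo=100, hi=200):
    """Hypothetical helper: drop rows shorter than min_len, then
    return (start + lo, start + hi) for the remaining rows."""
    x = np.asarray(x)
    # Signed arithmetic avoids unsigned wraparound when end < start
    lengths = x[:, 1].astype(np.int64) - x[:, 0].astype(np.int64)
    kept = x[lengths >= min_len]               # boolean mask selects rows
    offsets = np.array([lo, hi], dtype=x.dtype)
    return kept[:, [0]] + offsets              # (n, 1) + (2,) -> (n, 2)

# Illustrative subset of the question's array
x = np.array([[469, 1300],
              [2118, 2118],
              [63, 330],
              [479, 2411]], dtype=np.uint32)

result = shrink_ranges(x)
print(result)
```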
