split elements in array using python

Question

I have a big array, part of which is shown below. In each list, the first number is the start and the second number is the end (so each list is a range). What I want to do is:

1: filter out the lists (ranges) that are smaller than 300 (e.g. the 18th list in the following array must be removed)

2: get smaller ranges (lists) in this way: (start+100) to (start+200), e.g. the first list would be [569, 669].

I tried different split functions in numpy, but none of them gives what I am looking for.

array([[ 469, 1300],
       [ 171, 1440],
       [ 187, 1564],
       [ 204, 1740],
       [  40, 1363],
       [  56, 1457],
       [ 132,  606],
       [1175, 2096],
       [ 484, 2839],
       [ 132, 4572],
       [ 166, 1693],
       [  69, 3300],
       [ 142, 1003],
       [2118, 2118],
       [ 715, 1687],
       [ 301, 1006],
       [  48, 2142],
       [  63,  330],
       [ 479, 2411]], dtype=uint32)

do you guys know how to do that in python?

thanks

Filter should be easy... Python has an explicit filter function
– OneCricketeer, Commented Jul 30, 2016 at 14:10

Jon Clements · Accepted Answer · 2016-07-31 03:47:36Z

2

Assuming your array is called A, then:

import numpy as np

# Filter out differences not wanted
gt300 = A[(np.diff(A) >= 300).flatten()]

# Set new value of first column
gt300[:,0] += 100

# Set value of second column
gt300[:,1] = gt300[:,0] + 100

Or maybe something like:

B = A[:,0][(np.diff(A) >= 300).flatten()]
C = np.repeat(B, 2).reshape((len(B), 2)) + [100, 200]
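
For anyone who wants to run the first variant end to end, here is a minimal sketch. It assumes A is a 2-D NumPy array of dtype uint32 with end >= start in every row, and it uses a shortened sample of the question's data rather than the full array.

```python
import numpy as np

# Shortened sample from the question; the name A follows the answer above.
A = np.array([[469, 1300],
              [132, 606],
              [2118, 2118],
              [63, 330],
              [479, 2411]], dtype=np.uint32)

# Step 1: keep rows whose width (end - start) is at least 300.
keep = (np.diff(A, axis=1) >= 300).ravel()
gt300 = A[keep]          # boolean indexing returns a copy, so A is untouched

# Step 2: rewrite each kept row as (start + 100, start + 200).
gt300[:, 0] += 100
gt300[:, 1] = gt300[:, 0] + 100

print(gt300)
```

Because gt300 is a copy, the in-place updates do not modify A, which is convenient if the original ranges are still needed later.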

edited Jul 31, 2016 at 3:47

answered Jul 30, 2016 at 14:40

Jon Clements


2 Comments

user3925736 Over a year ago

Hi Jon, when I run the last line it gives this error: "TypeError: list indices must be integers, not tuple"

Jon Clements Over a year ago

@user3925736 which line exactly?
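
The error quoted in the comment above is what Python raises when a plain list is indexed with NumPy-style `[:, 0]` syntax, so one plausible cause (an assumption, since the thread does not confirm it) is that A was still a list of lists at that point. A small sketch with a hypothetical variable named ranges:

```python
import numpy as np

# Hypothetical input: the same kind of ranges stored as a plain list of lists.
ranges = [[469, 1300], [63, 330], [479, 2411]]

try:
    ranges[:, 0]              # a tuple index is not valid on a list
except TypeError as exc:
    error_message = str(exc)  # starts with "list indices must be integers"

# Converting first makes NumPy-style indexing available.
A = np.asarray(ranges, dtype=np.uint32)
first_column = A[:, 0]
```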

Ehud Halamish · Accepted Answer · 2016-07-30 14:28:32Z

0

A general note first: you should use tuples to represent such ranges, not lists. Tuples are immutable, and the order of their items carries meaning.

As for 1, it is pretty easy to filter in python:

# >= 300 keeps ranges of exactly 300; list() is needed on Python 3, where filter() is lazy
list(filter(lambda single_range: single_range[1] - single_range[0] >= 300, ranges))

A clearer way (in my opinion) to do this is with a list comprehension:

[(start, end) for start, end in ranges if end - start >= 300]

As for 2, I don't fully understand what you mean, but if you mean creating a new list of ranges, where each range is changed by the same function, that is a map (or, my preferred way, a list comprehension, which is equivalent but more descriptive):

[(start + 100, start + 200) for start, end in ranges]
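
If both steps are wanted at once, the two comprehensions can probably be merged. The sketch below assumes ranges is a list of (start, end) tuples, keeps widths of at least 300 as the question asks, and converts back to a NumPy array at the end. The names new_ranges and result are only illustrative.

```python
import numpy as np

ranges = [(469, 1300), (2118, 2118), (63, 330), (479, 2411)]

# Filter and transform in one pass: keep widths of at least 300,
# then emit (start + 100, start + 200) for each kept range.
new_ranges = [(start + 100, start + 200)
              for start, end in ranges
              if end - start >= 300]

# Convert back if a NumPy array is needed downstream.
result = np.array(new_ranges, dtype=np.uint32)
```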

answered Jul 30, 2016 at 14:28

Ehud Halamish


1 Comment

user3925736 Over a year ago

Hi Ehud, this is a numpy array, and I would like to get the same kind of array back for further handling.

Simon Kirsten · Accepted Answer · 2016-07-30 14:45:24Z

0

data = [[ 469, 1300],
        # ...
        [  63,  330],
        [ 479, 2411]]

print(
    # list() is needed on Python 3, where filter() returns a lazy iterator
    list(filter(lambda v: v[1] - v[0] >= 300, data))
)

print(
    [[v[0] + 100, v[0] + 200] for v in data]
)

Explanation:

The first command uses the built-in filter function to keep only the elements for which the lambda expression is true.

The second iterates over the list and generates a new one while doing so.

If the input and output should be NumPy arrays, try the following. Note: there is no way to filter a NumPy array without creating a new one.

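import numpy as np

data = np.array([
    ( 469, 1300),
    ( 171, 1440),
    # ...
    (  63,  330),
    ( 479, 2411)], dtype=np.uint32)

# Keep rows at least 300 wide; list() is needed on Python 3,
# where filter() returns a lazy iterator.
print(
    np.array(list(filter(lambda v: v[1] - v[0] >= 300, data)), dtype=np.uint32)
)

# Build the new (start + 100, start + 200) ranges.
print(
    np.array([[v[0] + 100, v[0] + 200] for v in data], dtype=np.uint32)
)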
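
To illustrate the note about filtering always creating a new array, here is a small sketch using a boolean mask instead of filter. It assumes end >= start in every row; the names mask, filtered and shares are only illustrative.

```python
import numpy as np

data = np.array([(469, 1300), (2118, 2118), (63, 330), (479, 2411)],
                dtype=np.uint32)

mask = (data[:, 1] - data[:, 0]) >= 300   # assumes end >= start in every row
filtered = data[mask]

# The filtered array has its own memory; editing it leaves data unchanged.
filtered[0, 0] = 0
shares = np.shares_memory(data, filtered)
```

Here shares comes out false and data[0, 0] is still 469, so the original array can be kept around safely.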

edited Jul 30, 2016 at 14:45

answered Jul 30, 2016 at 14:15

Simon Kirsten


1 Comment

user3925736 Over a year ago

Hi Simon, great. It works perfectly for a matrix (list of lists), but my data is a numpy array and I have to get a numpy array back. Do you know how to manipulate the input array to get a numpy array like the input but with the mentioned changes?

hpaulj · Accepted Answer · 2016-07-30 17:05:57Z

We can find which rows have the small difference with:

In [745]: mask=(x[:,1]-x[:,0])<300
In [746]: mask
Out[746]: 
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False,  True, False], dtype=bool)
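
One caveat with this mask: the array is uint32, so the subtraction wraps around if any row has end < start. That does not happen in the question's sample, but as a precaution the columns can be cast to a signed type first. A sketch with a hypothetical second row added to show the difference:

```python
import numpy as np

# Hypothetical rows; the second one has end < start.
x = np.array([[469, 1300],
              [500, 100]], dtype=np.uint32)

wrapped = x[:, 1] - x[:, 0]          # uint32 arithmetic wraps around
signed = x[:, 1].astype(np.int64) - x[:, 0].astype(np.int64)

mask_wrapped = wrapped < 300         # second row is NOT flagged as small
mask_signed = signed < 300           # second row is flagged as small
```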

We can use that mask to select those rows, or to deselect them:

In [747]: x[mask,:]
Out[747]: 
array([[2118, 2118],
       [  63,  330]], dtype=uint32)
In [748]: x[~mask,:]
Out[748]: 
array([[ 469, 1300],
       [ 171, 1440],
       [ 187, 1564],
       [ 204, 1740],
       ...
       [ 479, 2411]], dtype=uint32)

To make a new set of ranges, get the first column. Here I am indexing with the list [0] rather than the scalar 0, so the selection remains a column array:

In [750]: x[:,[0]]
Out[750]: 
array([[ 469],
       [ 171],
       [ 187],
        ...
       [  48],
       [  63],
       [ 479]], dtype=uint32)

Add to that the desired offsets. This takes advantage of broadcasting.

In [751]: x[:,[0]]+[100,200]
Out[751]: 
array([[ 569,  669],
       [ 271,  371],
       [ 287,  387],
       [ 304,  404],
       [ 140,  240],
       [ 156,  256],
      ...
       [ 401,  501],
       [ 148,  248],
       [ 163,  263],
       [ 579,  679]], dtype=int64)
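
The output above is int64 because the uint32 column is added to a list of Python integers. If the uint32 dtype should be preserved, one option is to give the offsets the same dtype. A minimal sketch on three rows of the sample; the names starts and offsets are only illustrative:

```python
import numpy as np

x = np.array([[469, 1300], [171, 1440], [187, 1564]], dtype=np.uint32)

starts = x[:, [0]]                                # shape (3, 1)
offsets = np.array([100, 200], dtype=np.uint32)   # shape (2,)

# (3, 1) + (2,) broadcasts to (3, 2); matching dtypes keep the result uint32.
new_ranges = starts + offsets
```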

There are other ways of constructing such an array:

np.column_stack([x[:,0]+100,x[:,0]+200])
np.array([x[:,0]+100, x[:,0]+200]).T   # or vstack

Other answers have suggested the Python list filter. I'm partial to list comprehensions in this kind of use, for example:

In [756]: np.array([i for i in x if (i[1]-i[0])<300])
Out[756]: 
array([[2118, 2118],
       [  63,  330]], dtype=uint32)

For small lists of lists, the pure Python approach tends to be faster. But if the object is already a numpy array, it is faster to use the numpy operations that work on the whole array at once (i.e. do the iteration in compiled code). Hence my suggestion to use the boolean mask.
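
Putting both steps together, here is a sketch of the mask-plus-broadcast approach next to a list-comprehension equivalent, on a shortened sample. It assumes end >= start in every row and makes no timing claims; timeit could be used to compare the two on real data.

```python
import numpy as np

x = np.array([[469, 1300], [132, 606], [2118, 2118],
              [63, 330], [479, 2411]], dtype=np.uint32)

# Vectorized: mask out narrow ranges, then broadcast the offsets.
keep = (x[:, 1] - x[:, 0]) >= 300          # assumes end >= start
vectorized = x[keep][:, [0]] + np.array([100, 200], dtype=np.uint32)

# Pure-Python equivalent, for comparison.
looped = np.array([[s + 100, s + 200] for s, e in x.tolist() if e - s >= 300],
                  dtype=np.uint32)

same = np.array_equal(vectorized, looped)
```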
