
I have an array of functions of shape (n,) and a numpy matrix of shape (m, n). I want to apply each function to its corresponding column of the matrix, i.e.

matrix[:, i] = funcs[i](matrix[:, i])

I could do this with a for loop (see the example below), but for loops are generally discouraged in numpy. What is the quickest (and preferably most elegant) way to do this?

A working example

import numpy as np

# Example of functions to apply to each column
funcs  = np.array([np.vectorize(lambda x: x+1),
                   np.vectorize(lambda x: x-2),
                   np.vectorize(lambda x: x+3)])
# Initialise dummy matrix
matrix = np.random.rand(50, 3)

# Apply each function to each column
for i in range(funcs.shape[0]):
    matrix[:, i] = funcs[i](matrix[:, i])
  • Could you share the actual functions that you are working with? Commented Sep 4, 2018 at 13:11
  • Loops are discouraged when you're performing the same operation over and over, but as you're looping over different Python functions here, it may not be possible to make it any faster. Commented Sep 4, 2018 at 13:11
  • funcs = np.array([np.vectorize(lambda x: x+1),... looks like you've gone off course. It might be better to take a step back and show what problem you're actually trying to solve. Commented Sep 4, 2018 at 13:13
  • @Divakar, the functions I am working with are based on sklearn.cluster.dbscan, but vary slightly for a specific task. I think it would become too complex to explain that for the question I have here. Commented Sep 4, 2018 at 13:13
  • @roganjosh Care to explain? This is pretty close to what it would be in pure Python, i.e. a list of functions. Commented Sep 4, 2018 at 13:15

2 Answers

1

For an array that has many rows and only a few columns, a simple iteration over the columns should be time-efficient:

In [783]: funcs = [lambda x: x+1, lambda x: x+2, lambda x: x+3]
In [784]: arr = np.arange(12).reshape(4,3)
In [785]: for i in range(3):
     ...:     arr[:,i] = funcs[i](arr[:,i])
     ...:     
In [786]: arr
Out[786]: 
array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11],
       [10, 12, 14]])

If the functions work with 1d array inputs, there's no need for np.vectorize (np.vectorize is generally slower than plain iteration anyway). Also, for iteration like this there's no need to wrap the list of functions in an array; it's faster to iterate over lists.
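To check the np.vectorize point on your own machine, here is a minimal timing sketch. The function `add_one` and the column size are assumptions made up for illustration; the actual numbers will depend on your functions and hardware, so no timings are quoted here.

```python
import timeit
import numpy as np

def add_one(x):
    # Works on a whole 1d array at once, no np.vectorize needed
    return x + 1

# The same function wrapped in np.vectorize (called once per element)
vec_add_one = np.vectorize(add_one)

col = np.random.rand(10_000)

# Both versions should give the same values
same = np.allclose(add_one(col), vec_add_one(col))

# Time 10 calls of each version on the same column
t_plain = timeit.timeit(lambda: add_one(col), number=10)
t_vec = timeit.timeit(lambda: vec_add_one(col), number=10)
print(f"plain: {t_plain:.5f}s  np.vectorize: {t_vec:.5f}s")
```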

A variation on the indexed iteration:

In [787]: for f, col in zip(funcs, arr.T):
     ...:     col[:] = f(col)
     ...:     
In [788]: arr
Out[788]: 
array([[ 2,  5,  8],
       [ 5,  8, 11],
       [ 8, 11, 14],
       [11, 14, 17]])

I use arr.T here so the iteration is on the columns of arr, not the rows.
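One caveat with the in-place loops above: assigning into arr casts each result to arr's dtype, so a function that returns floats gets truncated when arr holds integers. If that matters, a sketch of a non-in-place variant could look like the following. The helper name `apply_columns` is hypothetical, and it assumes each function accepts and returns a 1d array of the same length.

```python
import numpy as np

def apply_columns(funcs, arr):
    """Return a new array whose i-th column is funcs[i](arr[:, i])."""
    if len(funcs) != arr.shape[1]:
        raise ValueError("need exactly one function per column")
    # arr.T iterates over the columns of arr; column_stack puts the
    # results back side by side and picks a dtype that fits all of them
    return np.column_stack([f(col) for f, col in zip(funcs, arr.T)])

funcs = [lambda x: x + 1, lambda x: x / 2, lambda x: x + 3]
arr = np.arange(12).reshape(4, 3)   # integer array
out = apply_columns(funcs, arr)     # float array, arr itself is unchanged
```

This still loops over the columns in Python, so it should cost about the same as the loops above plus one extra copy.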

A general observation: a few iterations on a complex task is perfectly good numpy style. Many iterations on simple tasks are slow and should be performed in compiled code where possible.


1

A loop is efficient here since the job in the loop is heavy.

A readable solution is just:

np.vectorize(apply)(funcs,matrix)

1 Comment

Thanks for the solution, do you know if there is an equivalent to apply in python 3.x?
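On the Python 3 question: the builtin apply was removed in Python 3, so the one-liner above raises a NameError there. A minimal sketch of the same idea, assuming a small hypothetical helper `call` (not part of numpy or of the answer) that takes a function and a single value:

```python
import numpy as np

def call(f, x):
    # Stand-in for the removed builtin: apply function f to one value x
    return f(x)

funcs = [lambda x: x + 1, lambda x: x - 2, lambda x: x + 3]
matrix = np.arange(12, dtype=float).reshape(4, 3)

# funcs has shape (3,) and broadcasts against matrix of shape (4, 3),
# so each element in column i is passed to funcs[i]
result = np.vectorize(call)(funcs, matrix)
```

Note that this calls a Python function once per element, so for large matrices it is likely slower than looping over the columns.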
