I am implementing a proximal spectral gradient method for research purposes. The code is nearly finalized, but its runtime is slower than I would like, so I am asking for a review focused on efficiency.
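For reference, the smooth part of the objective that project_f below implements is (in LaTeX notation)

f(v) = \frac{\beta_1}{2} v^\top \Sigma v - \theta\, \mu^\top v + \frac{\gamma}{2}(\mathbf{1}^\top v - 1)^2 + \lambda_k (\mathbf{1}^\top v - 1) + \frac{\beta_2}{2} \lVert v \rVert_2^2

where \Sigma is the sample covariance and \mu the sample mean; proximal then applies a hard-thresholding step that zeroes every entry of v whose magnitude falls below \sqrt{2\,\mathrm{spars}/(L + 1/L)}.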
Below is the code for my algorithm:
from numpy import ones_like, empty_like, array, diag, trace, identity, negative, random, cov, mean
from math import sqrt
from numpy.linalg import norm, inv
from numba import njit, jit
@njit(fastmath={'nsz'})
def project_f(v, mu, covar, lam_k, beta_1, theta, beta_2, gam):
    # smooth part of the augmented Lagrangian: quadratic risk, linear return,
    # budget penalty/multiplier terms, and an l2 regulariser
    func = (0.5*beta_1*v.T@covar@v - theta*mu.T@v + gam/2*(v.sum()-1.)**2
            + lam_k*(v.sum()-1.) + beta_2/2*v@v)
    return func

@njit(fastmath={'nsz'})
def project_df(v, mu, covar, lam_k, beta_1, theta, beta_2, gam):
    # gradient of project_f with respect to v
    par_func = (beta_1*covar@v - theta*mu + gam*(v.sum()-1.)*ones_like(v)
                + lam_k*ones_like(v) + beta_2*v)
    return par_func
@njit(fastmath={'nsz'})
def aug_lag(lam, gam, v):
    # multiplier update for the budget constraint v.sum() == 1
    lam += gam*(v.sum()-1.)
    return lam

@njit(fastmath={'nsz'})
def lipschitz(beta_1, beta_2, gam, v, covar):
    # upper bound (via Frobenius norms) on the Hessian norm of project_f;
    # returns the corresponding step size, capped at 1
    lips = beta_1*sqrt(trace(covar@covar)) + beta_2*sqrt(len(v)) + gam*len(v)
    return min(1., 1./lips)
@njit(fastmath={'nsz'})
def spectral_grad(v, v_1, g, g_1, prev_B):
    # diagonal spectral (quasi-Newton) approximation of the Hessian;
    # keeps the previous approximation if the curvature test fails
    s = v_1 - v
    y = g_1 - g
    if s@s > s@y:
        w_k = (s@s - s@y) / (s**4).sum()
        B_k = 1./(1. + w_k*s**2)
        return diag(B_k)
    return prev_B
    # alternative: return identity(len(s))  # rescaling
@njit(fastmath={'nsz'})
def proximal(vec, sparsity, b1, b2, gam, covar, grad):
    # hard-thresholding proximal step: entries below the threshold are
    # zeroed, the rest take a scaled gradient step
    L = lipschitz(b1, b2, gam, vec, covar)
    threshold = (2*sparsity/(L + 1/L))**0.5
    v = empty_like(vec)
    for k in range(vec.shape[0]):
        if abs(vec[k]) < threshold:
            v[k] = 0.
        else:
            v[k] = vec[k] - grad[k]/(L + 1/L)
    return v
@njit(fastmath={'nsz'})
def return_and_risk(vec, mu, covar):
    # expected return and risk (note: vec@covar@vec is a variance,
    # not a standard deviation)
    return vec@mu, vec@covar@vec
@njit(fastmath=True)
def negative_count(vec):
result = 0
for i in vec:
if i < 0:
result += 1
return result
@njit(fastmath=True)
def zero_count(vec):
result = 0
for i in vec:
if i == 0:
result += 1
return result
@jit(forceobj=True)
def spectral_gradient1(df, v, covar, mu, b1, thet, b2, gam, tol, lam, spars, MAX_ITER=2000):
    vec = v
    B = identity(len(vec))
    aug_lam = lam
    vec_sum, grad_norms = [], []
    alpha = lipschitz(b1, b2, gam, vec, covar)
    for i in range(MAX_ITER):
        gradient = df(vec, mu, covar, aug_lam, b1, thet, b2, gam)
        # quasi-Newton direction; B is the diagonal spectral approximation
        direction = negative(inv(B) @ gradient)
        profit, risk = return_and_risk(vec, mu, covar)
        vec_1 = vec + alpha * direction
        gradient_1 = df(vec_1, mu, covar, aug_lam, b1, thet, b2, gam)
        B = spectral_grad(vec, vec_1, gradient, gradient_1, B)
        aug_lam = aug_lag(aug_lam, gam, vec_1)
        vec = proximal(vec_1, spars, b1, b2, gam, covar, gradient_1)
        vec_sum.append(vec.sum())
        grad_norms.append(norm(gradient, 2))
        # stopping criterion: gradient norm below tolerance; diagnostics
        # such as negative_count(vec), zero_count(vec) and vec.sum() can
        # be printed here or after the loop on non-convergence
        if norm(gradient, 2) <= tol:
            break
    return vec, array(vec_sum), array(grad_norms), i+1, profit, risk

# example run on simulated data
tol, spars, ini_lam = 1.e-4, 1.e-3, 1000.
beta_1, theta, beta_2, gam = 1., .5, 1., 1.
sim_data = -1 + 2*random.rand(1000, 5)   # uniform samples on [-1, 1)
sim_mean = mean(sim_data, axis=0)
sim_covar = cov(sim_data, rowvar=False)
v = ones_like(sim_mean) / len(sim_mean)  # equal-weight starting point
vec2000, vsum, grad, _, _, _ = spectral_gradient1(project_df, v, sim_covar, sim_mean, beta_1, theta, beta_2, gam, tol, ini_lam, spars)
I have timed the code once to check its speed, and it clearly has room for further optimization:
%timeit spectral_gradient1(project_df,v,sim_covar,sim_mean,beta_1,theta,beta_2,gam,tol,ini_lam,spars)
3.15 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
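One candidate hot spot I have spotted but not yet tackled: spectral_grad returns a dense diagonal matrix built with diag, and every iteration then computes direction = negative(inv(B) @ gradient), i.e. a full matrix inverse of something that is diagonal by construction. Below is an untested sketch of a variant I am considering (my own rewrite, not part of the algorithm above) that keeps only the diagonal as a 1-D array, so the direction becomes an elementwise division:

@njit(fastmath={'nsz'})
def spectral_diag(v, v_1, g, g_1, prev_d):
    # same update rule as spectral_grad, but returns the diagonal of B
    # as a 1-D array instead of a dense matrix
    s = v_1 - v
    y = g_1 - g
    if s@s > s@y:
        w_k = (s@s - s@y) / (s**4).sum()
        return 1./(1. + w_k*s**2)
    return prev_d

# in spectral_gradient1, B = identity(len(vec)) would become d = ones_like(vec),
# and the search direction would become an elementwise division:
# direction = -gradient / d

I also wonder whether @jit(forceobj=True) on the driver buys much, since object mode largely falls back to the interpreter; pointers on that would be welcome as well.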