
I know this is not an ideal place for questions of this scope, but I'm not sure where else to ask this or how to break it down. For the past couple of weeks I've been working on a function that runs, but for it to be feasible for my purposes I need to speed it up 200-300x.

I have an image array, where all pixels of similar color have been averaged and set to that average value. Then I have a 2D array of the same height and width, which labels each unique and non-contiguous feature of the image.

Using these I need to assess the size of each feature and its level of contrast to each of its neighbors. These values are used in an equation and if the output of that equation is below a certain threshold, that feature is merged with its most similar neighbor.
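The contrast test described above can be sketched roughly like this. This is my own illustrative reading of the description, not the actual code: the function and variable names are made up, and the constant simply matches the MAXIMUMCONTRAST value used in the code below (approximately 256·√3, an upper bound on the Euclidean distance between two RGB colors).

```python
import numpy as np

# Illustrative sketch only; names are hypothetical, not from the real code.
MAXIMUM_CONTRAST = 443.405  # approx. 256 * sqrt(3)

def should_merge(region_mean, neighbor_mean, region_size, max_size, ratio=1.0):
    # Euclidean distance between the two regions' mean colors
    diff = np.asarray(region_mean, float) - np.asarray(neighbor_mean, float)
    contrast = np.sqrt(np.sum(diff ** 2))
    # larger regions get a lower threshold, so they merge less eagerly
    threshold = (ratio - region_size / max_size) * MAXIMUM_CONTRAST
    return bool(contrast < threshold)
```

With identical means the contrast is 0, which is below any positive threshold, so the regions merge; black against white is near the maximum possible contrast and fails the test.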

I've uploaded the image and the feature label array (written out with numpy.savetxt()) to OneDrive and attached links.

code:

import numpy
import time

def textureRemover(pix, labeledPix, ratio = 1.0):
    numElements = numpy.amax(labeledPix)
    maxSize = numpy.count_nonzero(labeledPix)
    MAXIMUMCONTRAST = 443.405

    for regionID in range(1, numElements + 1):
        start = time.perf_counter()
        if regionID not in labeledPix:
            continue

        #print(regionID)
        #print((regionID / numElements) * 100, '%')

        neighborIDs = getNeighbors(labeledPix, regionID)
        if 0 in neighborIDs:
            neighborIDs.remove(0) #remove white value
        regionMask = labeledPix == regionID

        region = pix[regionMask]

        size = numpy.count_nonzero(regionMask)
        contrastMin = (ratio - (size / maxSize)) * MAXIMUMCONTRAST 
        regionMean = region.mean(axis = 0)

        if len(neighborIDs) > 200:
            contrast = numpy.zeros(labeledPix.shape)
            contrast[labeledPix!=0] = numpy.sqrt(numpy.sum((regionMean - pix[labeledPix!=0])**2, axis = -1))

            significantMask = (contrast < contrastMin)
            significantContrasts = list(numpy.unique(contrast[significantMask]))

            significantNeighbors = {}
            for significantContrast in significantContrasts:
                minContrast = min(significantContrasts)
                if labeledPix[contrast == minContrast][0] in neighborIDs:
                    significantNeighbors[minContrast] = labeledPix[contrast == minContrast][0]
                else:
                    significantContrasts.pop(significantContrasts.index(minContrast))

        else:
            significantNeighbors = {}
            for neighborID in neighborIDs:
                neighborMask = labeledPix == neighborID
                neighbor = pix[neighborMask]
                neighborMean = neighbor.mean(axis = 0)
                contrast = numpy.sqrt(numpy.sum((regionMean - neighborMean)**2, axis = -1))
                if contrast < contrastMin:
                    significantNeighbors[contrast] = neighborID

        if significantNeighbors:
            contrasts = significantNeighbors.keys()            
            minContrast = min(contrasts)

            minNeighbor = significantNeighbors[minContrast]
            neighborMask = labeledPix == minNeighbor
            neighborSize = numpy.count_nonzero(neighborMask)

            if neighborSize <= size:
                labeledPix[neighborMask] = regionID
                pix[neighborMask] = regionMean

            else:
                labeledPix[regionMask] = minNeighbor
                pix[regionMask] = pix[neighborMask].mean(axis = 0)

        print(time.perf_counter() - start)
    return pix
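One cost worth calling out in the loop above: `regionID not in labeledPix` rescans the entire label array on every iteration, so the checks alone are O(number of regions × number of pixels). A hedged sketch of precomputing the set of labels actually present (the array contents here are made up for illustration):

```python
import numpy as np

# Toy label array, invented for this example
labeledPix = np.array([[0, 1, 1],
                       [2, 2, 5]])

# One O(n) pass over the array instead of one full scan per region
presentIDs = set(np.unique(labeledPix))

# Each subsequent membership test is now O(1)
checks = [regionID in presentIDs for regionID in (1, 3, 5)]
```

Here `checks` comes out as `[True, False, True]`, since labels 1 and 5 occur in the toy array and 3 does not.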

Links: pix (the image array), labeledPix (the feature label array)

I know I'm asking for a lot of help, but I've been stuck on this for a few weeks and am unsure what else I can do. Any help will be greatly appreciated!

  • Have you profiled your code? Are there certain bottlenecks? What about parallelization? Is your algorithm splittable into N independent jobs? Then you could perhaps use AWS or another cloud computing service to access 200-300 nodes (and thus reach the desired speedup). Commented Jan 3, 2018 at 21:15
  • Can't access the pictures. This sounds like a linear algebra problem related to compressing images using a type of averaging. Take a look at this: aix1.uottawa.ca/~jkhoury/haar.htm Commented Jan 3, 2018 at 21:22
  • I have profiled it at low resolution (using time.clock() to time different blocks) and taken care of any bottlenecks I could detect. My understanding is that this is a CPU-bound task and that using a Python module for parallelization would actually reduce performance. I'll start looking into cloud computation right now, thanks for the suggestion! Commented Jan 3, 2018 at 21:22
  • @Katianie This is interesting and very helpful for some other parts of my code! Here, though, the averaging is already done, and the function above assesses each already-averaged zone. Commented Jan 3, 2018 at 21:27
  • Where to post Commented Jan 3, 2018 at 21:40

1 Answer


Here is an optimized version of most of your logic (I underestimated the amount of work that would be...). I skipped the >200 branch and used fake data because I couldn't access your links. With your >200 branch switched off, your code and mine appear to give the same result, but mine is quite a bit faster on the fake example.

Sample output:

original
26.056154000000003
optimized
0.763613000000003
equal
True

Code:

import time

import numpy as np
from numpy.lib.stride_tricks import as_strided

def mockdata(m, n, k):
    colors = np.random.random((m, n, 3))
    i, j = np.ogrid[:m, :n]
    labels = np.round(k*k * (np.sin(0.05 * i) + np.sin(0.05 * j)**2)).astype(int) % k
    return colors, labels

DIAG_NEIGHBORS = True
MAXIMUMCONTRAST = 443.405

def textureRemover2(pix, labeledPix, ratio=1.0):
    start = time.perf_counter()
    pix, labeledPix = pix.copy(), labeledPix.copy()
    pixf, labeledPixf = pix.reshape(-1, 3), labeledPix.ravel()
    m, n = labeledPix.shape
    s, t = labeledPix.strides
    # find all sizes in O(n)
    sizes = np.bincount(labeledPixf)
    n_ids = len(sizes)
    # make index for quick access to labeled areas
    lblidx = np.split(np.argsort(labeledPixf), np.cumsum(sizes[:-1]))
    lblidx[0] = None
    # find all mean colors in O(n)
    regionMeans = np.transpose([np.bincount(labeledPixf, px)
                                / np.maximum(sizes, 1)
                                for px in pixf.T])
    # find all neighbors in O(n)
    horz = set(frozenset(p) for bl in as_strided(labeledPix, (m,n-1,2), (s,t,t))
               for p in bl)
    vert = set(frozenset(p) for bl in as_strided(labeledPix, (m-1,n,2), (s,t,s))
               for p in bl)
    nb = horz|vert
    if DIAG_NEIGHBORS:
        dwnrgt = set(frozenset(p) for bl in as_strided(
            labeledPix, (m-1,n-1,2), (s,t,s+t)) for p in bl)
        dwnlft = set(frozenset(p) for bl in as_strided(
            labeledPix[::-1], (m-1,n-1,2), (-s,t,t-s)) for p in bl)
        nb = nb|dwnrgt|dwnlft
    nb = {p for p in nb if len(p) == 2 and 0 not in p}
    nb_dict = {}
    for a, b in nb:
        nb_dict.setdefault(a, set()).add(b)
        nb_dict.setdefault(b, set()).add(a)

    maxSize = labeledPix.size - sizes[0]

    for id_ in range(1, n_ids):
        nbs = list(nb_dict.get(id_, set()))
        if not nbs:
            continue
        d = regionMeans[id_] - regionMeans[nbs]
        d = np.einsum('ij,ij->i', d, d)
        mnd = np.argmin(d)
        if d[mnd] < ((ratio - sizes[id_]/maxSize) * MAXIMUMCONTRAST)**2:
            mn = nbs[mnd]
            lrg, sml = (id_, mn) if sizes[id_] >= sizes[mn] else (mn, id_)
            sizes[lrg], sizes[sml] = sizes[lrg] + sizes[sml], 0
            for nbr in nb_dict[sml]:
                nb_dict[nbr].remove(sml)
                nb_dict[nbr].add(lrg)
            nb_dict[lrg].update(nb_dict[sml])
            nb_dict[lrg].remove(lrg)
            nb_dict[sml] = set()
            pixf[lblidx[sml]] = regionMeans[lrg]
            labeledPixf[lblidx[sml]] = lrg
            lblidx[lrg], lblidx[sml] = np.r_[lblidx[lrg],lblidx[sml]], None
    print(time.perf_counter() - start)
    return pix

from scipy.ndimage import binary_dilation
import time

STRUCTEL = np.ones((3,3), int) if DIAG_NEIGHBORS else np.array([[0,1,0],[1,1,1],[0,1,0]], int)

def getNeighbors(labeledPix, regionID):
    nb = set(labeledPix[binary_dilation(labeledPix == regionID, structure=STRUCTEL)])
    nb.remove(regionID)
    return sorted(nb)

numpy = np  # alias so the original code below runs unchanged

def textureRemover(pix, labeledPix, ratio = 1.0):
    pix, labeledPix = pix.copy(), labeledPix.copy()
    numElements = numpy.amax(labeledPix)
    maxSize = numpy.count_nonzero(labeledPix)
    MAXIMUMCONTRAST = 443.405

    start = time.perf_counter()
    for regionID in range(1, numElements + 1):
        if regionID not in labeledPix:
            continue

        #print(regionID)
        #print((regionID / numElements) * 100, '%')

        neighborIDs = getNeighbors(labeledPix, regionID)
        if 0 in neighborIDs:
            neighborIDs.remove(0) #remove white value
        regionMask = labeledPix == regionID

        region = pix[regionMask]

        size = numpy.count_nonzero(regionMask)
        contrastMin = (ratio - (size / maxSize)) * MAXIMUMCONTRAST 
        regionMean = region.mean(axis = 0)

        if len(neighborIDs) > 20000:  # raised from 200 to switch this branch off
            contrast = numpy.zeros(labeledPix.shape)
            contrast[labeledPix!=0] = numpy.sqrt(numpy.sum((regionMean - pix[labeledPix!=0])**2, axis = -1))

            significantMask = (contrast < contrastMin)
            significantContrasts = list(numpy.unique(contrast[significantMask]))

            significantNeighbors = {}
            for significantContrast in significantContrasts:
                minContrast = min(significantContrasts)
                if labeledPix[contrast == minContrast][0] in neighborIDs:
                    significantNeighbors[minContrast] = labeledPix[contrast == minContrast][0]
                else:
                    significantContrasts.pop(significantContrasts.index(minContrast))

        else:
            significantNeighbors = {}
            for neighborID in neighborIDs:
                neighborMask = labeledPix == neighborID
                neighbor = pix[neighborMask]
                neighborMean = neighbor.mean(axis = 0)
                contrast = numpy.sqrt(numpy.sum((regionMean - neighborMean)**2, axis = -1))
                if contrast < contrastMin:
                    significantNeighbors[contrast] = neighborID

        if significantNeighbors:
            contrasts = significantNeighbors.keys()            
            minContrast = min(contrasts)

            minNeighbor = significantNeighbors[minContrast]
            neighborMask = labeledPix == minNeighbor
            neighborSize = numpy.count_nonzero(neighborMask)

            if neighborSize <= size:
                labeledPix[neighborMask] = regionID
                pix[neighborMask] = regionMean

            else:
                labeledPix[regionMask] = minNeighbor
                pix[regionMask] = pix[neighborMask].mean(axis = 0)

    print(time.perf_counter() - start)
    return pix

data = mockdata(200, 200, 1000)
print('original')
res0 = textureRemover(*data)
print('optimized')
res2 = textureRemover2(*data)
print('equal')
print(np.allclose(res0, res2))

1 Comment

Your output looks like a huge improvement! Thank you! I'm trying to work my way through it now and test it against my data. I've changed the links to be available on Google Drive if you're interested.
