23

I had a look at several dicussions on several sites and none of them gave me a solution. This piece of code takes more than 5 seconds to run :

for i in xrange(100000000):
  pass

I'm working on an integer optimization problem and I have to use an O(n log n) algorithm edit : an O(n²/4) algorithm, where n stands for all the matrix' items, ie, in the following code, n * m = 10000. So, for a matrix 100 * 100 with 10000 elements, it will result in nearly 25000000 iterations .... Its code can be summed up like this :

m = 100
n = 100
for i in xrange(m):
  for j in xrange(n):
    for i2 in xrange(i + 1, m):
      for j2 in xrange(j + 1, n):
        if myarray[i][j] == myarray[i2][j2] and myarray[i2][j] == myarray[i][j2]:
          return [i, j], [i2, j2]

Should I give up with Python and go back to Java or to C ?

I work with Python 2.7 and Psyco isn't available. PyPy doesn't support Tkinter out of the box and I'm using Tkinter.

So, would they improve the looping speed ? Are there other solutions ?

8
  • did you try to implement this with numpy arrays if that is possible ? Commented Jan 7, 2012 at 15:38
  • 1
    PyPy certainly speeds such loops up. Commented Jan 7, 2012 at 15:39
  • 3
    You can speed up your loops all you want, the code above is not O(n log n). Commented Jan 7, 2012 at 15:44
  • 1
    And pypy would speed things up, but by a factor of 4-5. Such a loop should take less than 0.5 sec on a decent computer when written in c. Commented Jan 7, 2012 at 16:42
  • It looks like this algorithm is n^2*m^2, and there's not a lot of optimization you can do to speed it up in a particular language. You would benefit far more from reexamining your algorithm and seeing if there are any unnecessary iterations. Commented Jan 7, 2012 at 18:27

5 Answers 5

18

Not the prettiest coding style, but desperate times call for desperate coding. Try turning your nested nested loops into one big generator expression:

try:
    i,j,i2,j2 = ((i,j,i2,j2)
        for i in xrange(m)
          for j in xrange(n)
            for i2 in xrange(i + 1, m)
              for j2 in xrange(j + 1, n)
                if myarray[i][j] == myarray[i2][j2] and 
                    myarray[i2][j] == myarray[i][j2]).next()
    return [i,j],[i2,j2]
except StopIteration:
    return None

Updated to use builtins next and product, and Py3 range instead of xrange:

from itertools import product
match = next(((i,j,i2,j2)
    for i, j in product(range(m), range(n))
        for i2, j2 in product(range(i+1, m), range(j+1, n))
            if myarray[i][j] == myarray[i2][j2] and 
                myarray[i2][j] == myarray[i][j2]), None)
if match is not None:
    i,j,i2,j2 = match
    return [i,j],[i2,j2]
else:
    return None
Sign up to request clarification or add additional context in comments.

2 Comments

It didn't improve the speed but nice expression anyway.
Can't upvote at the moment, need 5 more reputation points : /
14

You can still use the python notation, and have the speed of C, using the Cython project. A first step would be to convert the loop in C loop: it's automatically done by typing all the variables used in the loop:

cdef int m = 100
cdef int n = 100
cdef int i, j, i2, j2
for i in xrange(m):
  for j in xrange(n):
    for i2 in xrange(i + 1, m):
      for j2 in xrange(j + 1, n):

For The next part, it would be even better if that myarray would be pure C array, then no rich python comparaison nor array access. With your python array, you can remove the rich comparaison by doing native comparaison (if you have double in your array):

        cdef double a, b, c, d
        a = myarray[i][j]
        b = myarray[i2][j2]
        c = myarray[i2][j]
        d = myarray[i][j2]

        if a == b and c == d:
          return [i, j], [i2, j2]

You can see how optimization are going by running cython -a yourfile.pyx, then open the yourfile.html generate. You'll see how Cython have optimized your code, and removed Python overhead.

4 Comments

Ok, thanks, I will try cython cause I need a big boost of speed.
This page detail the benefits from cython with a problem similar to the one I'm dealing with : korokithakis.net/posts/optimizing-python-with-cython
I started using cython and numpy for the arrays. Loops are really faster. But I got a problem : I have function calls that slow down the loops, so I've got a lot of work to do with cython.
function call in python are expensive. Learn Cython more, you'll see how awesome it is :)
2

Sorry to tell you, but it's not a matter of optimization. No matter which language or implementation you choose, this algorithm is not O(n*log n) in the worst- and average case. You might want to read up on how the Big-O notation works and develop a better algorithm.

1 Comment

Ok it is O(n²/4), not O(n log n), but there is no other algorithms and this is an optimisation problem (see edit in the main post for details).
1

Sorry the generator expr didn't work. Here is a different scheme that first tallies all like values, and then looks for rectangular groups:

from collections import defaultdict
from itertools import permutations
def f3(myarray):
    tally = defaultdict(list)
    for i,row in enumerate(myarray):
        for j,n in enumerate(row):
            tally[n].append((i,j))

    # look for rectangular quads in each list
    for k,v in tally.items():
        for quad in permutations(v,4):
            # sort quad so that we can get rectangle corners 
            quad = sorted(quad)
            (i1,j1),(i2,j2) = quad[0], quad[-1]

            # slice out opposite corners to see if we have a rectangle
            others = quad[1:3]

            if [(i1,j2),(i2,j1)] == others:
                return [i1,j1],[i2,j2]

2 Comments

thanks, I tryed it and learned some new things about Python syntax, but it didn't run faster than the other things.
If you have a sparse array, and you only want matching values that are nonzero, then you can skip all of the permutation testing of the zero cells by preceding the for quad in permutations(v,4): statement with if k == 0: continue.
1

I looked several times across your code and if i get it right, your marking your rectangles with two different corner-markers. I'm sorry if my answer is more a comment to clarify your position then a really answer.

Answer-Part:: If your looking for an appropriate algorithm, consider a scan-line algorithm. I found an example here @SO for a "biggest rectangle solution".

My Question to you is, what are you really looking for?

  1. best way to solve the for-loop dilemma
  2. best language for your algorithm
  3. a faster algorithm to find rectangles

I must also point out, that Paul's and your solution produce different results, because Pauls assumes corners are marked by same values and you assume, that corners are marked through two different values!

I took the time and liberty to illustrate it with a ugly c&p script: It compares both algorithm by creating a 2D-field and fills it with string.lowercase chars and turns the corners to uppercase-chars.

from random import choice
from string import lowercase
from collections import defaultdict
from itertools import permutations
from copy import deepcopy

m,n = 20, 20
myarray = [[choice(lowercase) for x in xrange(m)] for y in xrange(n)]

def markcorners(i,j,i2,j2):
    myarray[i][j] = myarray[i][j].upper()
    myarray[i2][j] = myarray[i2][j].upper()
    myarray[i2][j2] = myarray[i2][j2].upper()
    myarray[i][j2] = myarray[i][j2].upper()

def printrect():
    for row in myarray:
        print ''.join(row)
    print '*'*m

def paul_algo():
    tally = defaultdict(list)
    for i,row in enumerate(myarray):
        for j,n in enumerate(row):
            tally[n].append((i,j))

    # look for rectangular quads in each list
    for k,v in tally.items():
        for quad in permutations(v,4):
            # sort quad so that we can get rectangle corners 
            quad = sorted(quad)
            (i1,j1),(i2,j2) = quad[0], quad[-1]

            # slice out opposite corners to see if we have a rectangle
            others = quad[1:3]

            if [(i1,j2),(i2,j1)] == others:
                markcorners(i1,j1,i2,j2)

def xavier_algo():   
    for i in xrange(m):
        for j in xrange(n):
            for i2 in xrange(i + 1, m):
                for j2 in xrange(j + 1, n):
                    if myarray[i][j] == myarray[i2][j2] and myarray[i2][j] == myarray[i][j2]:
                        markcorners(i,j,i2,j2)

savearray = deepcopy(myarray)
printrect()

xavier_algo()
printrect()

myarray = deepcopy(savearray)
paul_algo()
printrect()

2 Comments

Hi. Thanks for your answer. I need the opposite corners values to be different. A better algorithm would be nice. I will have a look at the one you point out tommorow. Got to sleep now cause it's kinda late here.
I had a look at the algo you suggested. Well thought, but it doesn't fit my needs which are particular. I don't need to find the biggest rectangle but I have to find all of them with opposite corners that got the same value. It is used in a neighbourhood function in a tabu search for a discrete optimization problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.