"for" iteration vs. direct indexing performance

Question

Just got a strange results that I am trying to understand. I have a dataset about 325k rows (lists) with about 90 items each (strings, floats etc - it doesn't really matter). Say, if I want to do some processing for all item then I can iterate over them using 2 "for"s:

for eachRow in rows:
    for eachItem in eachRow:
        # do something

In my system this code executed for 41 sec. But if I replace nested loop with series of index acess ( eachRow[0], eachRowm[1] and so far up to eachRow[89] ), the execution time drops to 25 sec.

for eachRow in rows:
    eachRow[0]  # do something with this item
    eachRow[1]  # do something with this item
    ..
    eachRow[89] # do something with this item

Of course, writing code like that is not a good idea - I was just looking for a way to impove data processing performance and accidentally found this strange approach. Any comments?

index access is very fast in python ... thats what lists are setup for inserts/deletes are computationally expensive/ lookups are fast — Joran Beasley
– Joran Beasley, Commented Sep 10, 2012 at 16:13
but there is still overhead on it greater than on a lookup ... not much but spread over 300k rows maybe a significant ammount — Joran Beasley
– Joran Beasley, Commented Sep 10, 2012 at 16:14
Could you show a reproducible example that we could try and analyze? — David Robinson
– David Robinson, Commented Sep 10, 2012 at 16:15
In order to get better performances without "loop unrolling by hand", you should try to employ the functional paradigm - i.e. feed the "map" function with the transformation you want to perform and the complete array you're are working on. — ziu
– ziu, Commented Sep 10, 2012 at 16:18
My guess is that there is something else that was changed other than the indexing. As David Robinson suggested, post two completely working functions that do the same thing with the two methods that we can try to run an compare as well. — Mark Hildreth
– Mark Hildreth, Commented Sep 10, 2012 at 16:21

DSM · Accepted Answer · 2012-09-10 16:53:24Z

There does seem to be a slight performance advantage to doing the unrolling, but it's negligible, and so unless your do_something function really does almost nothing, you shouldn't see the difference. I have a tough time believing equivalent behaviour with the different approach could amount to a factor of 60%, although I'm always willing to be surprised by some implementation detail I'd never thought about.

tl;dr summary, using 32500 instead of 325000 because I'm impatient:

do_nothing easy 3.44702410698
do_nothing indexed 3.99766016006
do_nothing mapped 4.36127090454
do_nothing unrolled 3.33416581154
do_something easy 5.4152610302
do_something indexed 5.95649385452
do_something mapped 6.20316290855
do_something unrolled 5.2877831459
do_more easy 16.6573209763
do_more indexed 16.8381450176
do_more mapped 17.6184959412
do_more unrolled 16.0713188648

CPython 2.7.3, code:

from timeit import Timer

nrows = 32500
ncols = 90
a = [[1.0*i for i in range(ncols)] for j in range(nrows)]

def do_nothing(x):
    pass

def do_something(x):
    z = x+3
    return z

def do_more(x):
    z = x**3+x**0.5+4
    return z

def easy(rows, action):
    for eachRow in rows:
        for eachItem in eachRow:
            action(eachItem)

def mapped(rows, action):
    for eachRow in rows:
        map(action, eachRow)

def indexed(rows, action):
    for eachRow in rows:
        for i in xrange(len(eachRow)):
            action(eachRow[i])

def unrolled(rows, action):
    for eachRow in rows:
        action(eachRow[0])
        action(eachRow[1])
        action(eachRow[2])
        action(eachRow[3])
        action(eachRow[4])
        action(eachRow[5])
        action(eachRow[6])
        action(eachRow[7])
        action(eachRow[8])
        action(eachRow[9])
        action(eachRow[10])
        action(eachRow[11])
        action(eachRow[12])
        action(eachRow[13])
        action(eachRow[14])
        action(eachRow[15])
        action(eachRow[16])
        action(eachRow[17])
        action(eachRow[18])
        action(eachRow[19])
        action(eachRow[20])
        action(eachRow[21])
        action(eachRow[22])
        action(eachRow[23])
        action(eachRow[24])
        action(eachRow[25])
        action(eachRow[26])
        action(eachRow[27])
        action(eachRow[28])
        action(eachRow[29])
        action(eachRow[30])
        action(eachRow[31])
        action(eachRow[32])
        action(eachRow[33])
        action(eachRow[34])
        action(eachRow[35])
        action(eachRow[36])
        action(eachRow[37])
        action(eachRow[38])
        action(eachRow[39])
        action(eachRow[40])
        action(eachRow[41])
        action(eachRow[42])
        action(eachRow[43])
        action(eachRow[44])
        action(eachRow[45])
        action(eachRow[46])
        action(eachRow[47])
        action(eachRow[48])
        action(eachRow[49])
        action(eachRow[50])
        action(eachRow[51])
        action(eachRow[52])
        action(eachRow[53])
        action(eachRow[54])
        action(eachRow[55])
        action(eachRow[56])
        action(eachRow[57])
        action(eachRow[58])
        action(eachRow[59])
        action(eachRow[60])
        action(eachRow[61])
        action(eachRow[62])
        action(eachRow[63])
        action(eachRow[64])
        action(eachRow[65])
        action(eachRow[66])
        action(eachRow[67])
        action(eachRow[68])
        action(eachRow[69])
        action(eachRow[70])
        action(eachRow[71])
        action(eachRow[72])
        action(eachRow[73])
        action(eachRow[74])
        action(eachRow[75])
        action(eachRow[76])
        action(eachRow[77])
        action(eachRow[78])
        action(eachRow[79])
        action(eachRow[80])
        action(eachRow[81])
        action(eachRow[82])
        action(eachRow[83])
        action(eachRow[84])
        action(eachRow[85])
        action(eachRow[86])
        action(eachRow[87])
        action(eachRow[88])
        action(eachRow[89])


def timestuff():
    for action in 'do_nothing do_something do_more'.split():
        for name in 'easy indexed mapped unrolled'.split():
            t = Timer(setup="""
from __main__ import {} as fn
from __main__ import {} as action
from __main__ import a
""".format(name, action),
                      stmt="fn(a, action)").timeit(10)
            print action, name, t

if __name__ == '__main__':
    timestuff()

(Note that I didn't bother making the comparisons exactly fair, because I was only trying to gauge the likely scale of the variations, i.e. changes of order unity or not.)

Vladimir Ignatov · Accepted Answer · 2012-09-10 16:51:46Z

0

Sorry guys, it was my fault. It was something wrong with my system (this is not a standalone Python interpreter but a built-in in big system). After restarting the whole system I've got right results - about 2.8 sec for both variants. I feel stupid. Looking for a way to delete my question because of irrelevance.

answered Sep 10, 2012 at 16:51

Vladimir Ignatov

2,8882 gold badges18 silver badges12 bronze badges

1 Comment

DSM Over a year ago

No need to feel stupid! -- we've all been there -- and your instinct that this was a strange result was right. I would definitely recommend isolating the strangeness into a short example that people can copy and paste before asking on SO, though. Most of the time you'll realize what's going on yourself, and if you can't quite get it, then there's a group of about twenty to twenty-five people in the Python community who I'd trust to get the right answer inside of five minutes, ten at the outset, if they can paste code into their interpreter and start playing immediately.

chucksmash · Accepted Answer · 2012-09-10 17:58:05Z

Unlike the other responder who timed this, I saw a quite a large difference in timings. First, my code:

import random
import string
import timeit

r = 1000
outer1 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer2 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer3 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]

def x1(L):
    for outer in L:
        for inner in L:
            inner = inner[:-1]

def x2(L):
    for outer in L:
        for y in range(len(outer)):
            outer[y] = outer[y][:-1]

def x3(L):
    for x in range(len(L)):
        for y in range(len(L[x])):
            L[x][y] = L[x][y][:-1]

print "x1 =",timeit.Timer('x1(outer1)', "from __main__ import x1,outer1").timeit(10)
print "x2 =",timeit.Timer('x2(outer2)', "from __main__ import x2,outer2").timeit(10)
print "x3 =",timeit.Timer('x3(outer3)', "from __main__ import x3,outer3").timeit(10)

Note I'm running each of these 10 times. Each list is being populated with 3000 items which each contain 90 items which are each a random string of ten letters.

Representative results:

x1 = 8.0179214353
x2 = 0.118051644801
x3 = 0.150409681521

The function that uses no indexing (x1) takes 66 times longer to execute than does the one which uses indexing only for the inner loop (x2). Oddly enough, the function which only uses the indexing for the inner loop (x2) performs better than the one which uses indexing for both the outer loop and the inner loop (x3).

Collectives™ on Stack Overflow

"for" iteration vs. direct indexing performance

3 Answers 3

Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related