python iterating multiple lists in multiple loops

Question

I need to iterate through multiple lists, and do some calculations for the matching records:

for (a,b,c,d) in list1:
   for (a2,b2,e) in list2:
       if (a==a2) and (b==b2):
           mylist.append((a, b, c, d, e, d*e))

Is there a more efficient way of doing the above calculation? Thanks a lot.

@ColBeseder: I'm guessing that the intent was if (a==a2) and (b==b2), but I agree that OP should clarify.
– happydave, Commented Dec 24, 2013 at 20:32
List may look like list1 = (name, last_name, gender, job_class, salary) list2 = (name, last_name, increase) and assume one more list, list3 = (job_class, bonus) so need to find all matching records, so that
– hercules.cosmos, Commented Dec 24, 2013 at 20:34
You seem to be using the wrong data structures for the job. Please show some actual code and some actual data to work with.
– Tim Pietzcker, Commented Dec 24, 2013 at 20:40

Eric · Accepted Answer · 2013-12-24 21:04:09Z

5

Build some dictionaries for fast lookup:

data1 = {(a, b): (c, d) for a, b, c, d in list1}
data2 = {(a, b): e for a, b, e in list2}

result = []
for a, b in set(data1) & set(data2):
    c, d = data1[a, b]
    e = data2[a, b]
    result.append((a, b, c, d, e, e*d))
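
A minimal, self-contained run of the approach above may make it easier to try out. The sample records are invented for illustration (they are not from the question), and like the code above this sketch assumes each (a, b) pair appears at most once per list:

```python
# Hypothetical sample data in the shapes used by the question:
# list1 holds (a, b, c, d) records, list2 holds (a, b, e) records.
list1 = [("ann", "lee", "F", 100.0),
         ("bob", "ray", "M", 200.0),
         ("cy", "orr", "M", 50.0)]
list2 = [("ann", "lee", 1.5),
         ("bob", "ray", 2.0),
         ("dee", "fox", 3.0)]

data1 = {(a, b): (c, d) for a, b, c, d in list1}
data2 = {(a, b): e for a, b, e in list2}

result = []
# Dict key views support set intersection directly.
for key in data1.keys() & data2.keys():
    c, d = data1[key]
    e = data2[key]
    result.append((*key, c, d, e, d * e))

result.sort()  # set iteration order is arbitrary, so sort for stable output
print(result)
```

Only the keys present in both lists survive, so the unmatched "cy" and "dee" records are dropped.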

edited Dec 24, 2013 at 21:04

answered Dec 24, 2013 at 20:58

Eric


4 Comments

Adam Smith Over a year ago

Worth pointing out, this is slower than the implementation in the question for datasets in the thousands.

Eric Over a year ago

@adsmith: Really? Shouldn't this be O(N) vs O(N^2)?

DSM Over a year ago

@adsmith: I find that very hard to believe. I could believe it was slower for lists of very small size, but for even moderate size I think the algorithmic complexity here has to win. @Eric: Note that this assumes that everything is unique in (a,b); not unreasonable, but not clear to me from the question. (Not sure how the OP is handling two people with the same name, e.g.)

Adam Smith Over a year ago

I generated dummy lists of 2,000 entries for list1 and 1,000 entries for list2. There WERE many duplicate entries, so it's possible that's caused the flip in efficiency. With actual data, a dictionary may provide faster lookup, but it's harder to test without some real data and I'm not about to generate dummy data 2,000 items large to run timeit on!
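
The disagreement in these comments can be checked empirically. The following is a rough timing sketch, not a rigorous benchmark: the helper names make_data, nested_join and dict_join, the generated records, and the choice of unique (a, b) keys are all assumptions made for illustration. The sizes match the 2,000 and 1,000 entries mentioned above, and actual timings will vary by machine.

```python
import random
import timeit

def make_data(n1, n2, seed=0):
    """Build hypothetical records with unique (a, b) keys."""
    rng = random.Random(seed)
    keys = [(i, i + 1) for i in range(n1)]
    list1 = [(a, b, "c", rng.random()) for a, b in keys]
    list2 = [(a, b, rng.random()) for a, b in rng.sample(keys, n2)]
    return list1, list2

def nested_join(list1, list2):
    """The nested-loop approach from the question: O(len1 * len2)."""
    out = []
    for a, b, c, d in list1:
        for a2, b2, e in list2:
            if a == a2 and b == b2:
                out.append((a, b, c, d, e, d * e))
    return out

def dict_join(list1, list2):
    """The dictionary approach from the answer: roughly O(len1 + len2)."""
    data1 = {(a, b): (c, d) for a, b, c, d in list1}
    data2 = {(a, b): e for a, b, e in list2}
    out = []
    for a, b in data1.keys() & data2.keys():
        c, d = data1[a, b]
        e = data2[a, b]
        out.append((a, b, c, d, e, d * e))
    return out

list1, list2 = make_data(2000, 1000)
# With unique keys, both approaches should produce the same records.
assert sorted(nested_join(list1, list2)) == sorted(dict_join(list1, list2))

for fn in (nested_join, dict_join):
    seconds = timeit.timeit(lambda: fn(list1, list2), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

Note that with duplicate (a, b) keys the two functions stop being equivalent, because each dict keeps only the last record per key. That may be part of why measurements on data with many duplicates are hard to compare.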

SimonT · Accepted Answer · 2013-12-24 21:00:13Z

With the new information in mind, that:

list1's elements are of the form (name, last_name, gender, job_class, salary),
list2 contains elements of the form (name, last_name, increase) (presumably a raise for a person),
list3 has elements like (job_class, bonus),

... you may benefit in both performance and code clarity by using a dict.

Using a tuple of the form (first,last) to reference each person in your program, you can do something like this (in a basic example with input to get information):

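people = dict()
# num_ppl, num_raises and num_bonuses are the record counts, assumed to be known
for i in range(num_ppl):
    name = tuple(input().split())  # input is something like "Bob Smith"
    # getPeopleInfo() is a placeholder: it should return a list such as
    # [gender, job_class, salary], with salary already converted to float
    people[name] = getPeopleInfo()
for i in range(num_raises):
    first, last, increase = input().split()
    people[(first, last)][-1] *= float(increase)  # salary is the last item
for i in range(num_bonuses):
    job_class, bonus = input().split()
    for name in people:  # iterating through a dict gives the keys (similar to indices of a list, but can be immutable types such as tuples)
        if people[name][1] == job_class:  # index 1 is job_class, assuming [gender, job_class, salary]
            people[name][-1] += float(bonus)  # input() returns str, so convert before adding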
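
The same idea can also be sketched without interactive input, working directly on the three lists described in the comments. This is only an illustration under stated assumptions: the sample records are invented, each person appears once in list1 and at most once in list2, each job class appears at most once in list3, and the final value follows the salary*increase + bonus formula from the asker's comment.

```python
# Hypothetical sample records in the shapes described by the asker.
list1 = [("Bob", "Smith", "M", "clerk", 1000.0),
         ("Amy", "Jones", "F", "manager", 2000.0)]
list2 = [("Bob", "Smith", 1.5), ("Amy", "Jones", 1.25)]
list3 = [("clerk", 50.0), ("manager", 100.0)]

# Index the two smaller lists once, so each lookup is a dict access.
raises = {(first, last): increase for first, last, increase in list2}
bonuses = dict(list3)  # job_class -> bonus

final_list = []
for first, last, gender, job_class, salary in list1:
    key = (first, last)
    # Keep only people who have both a raise and a bonus for their job class.
    if key in raises and job_class in bonuses:
        final_list.append((first, last, salary * raises[key] + bonuses[job_class]))

print(final_list)
```

Each list is traversed once, instead of nesting three loops.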

Any immutable type such as str, int and tuple can be used as a key in a dict, similar to the 0-based integers used for a list. Note that a list can change (e.g. using list.append) and is "mutable"; therefore a list cannot be a key. For more information about dict you can read up on the documentation.
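
A short sketch of the key restriction described above, using an invented record. Strictly speaking the requirement is that the key be hashable, which the built-in immutable types are and list is not:

```python
people = {}
people[("Bob", "Smith")] = ["M", "clerk", 1000.0]  # tuple key: accepted

error_message = None
try:
    people[["Bob", "Smith"]] = ["M", "clerk", 1000.0]  # list key: rejected
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # the exact wording depends on the Python version
print(people[("Bob", "Smith")])
```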

SimonT · Accepted Answer · 2013-12-24 20:40:22Z

1

In terms of time and memory efficiency, the current code seems mostly optimal. You have to check all of the elements of list1 and list2 against each other for your comparisons.
One addition to eliminate some repeated "wrong" cases is to add between the two for loop lines:

if a != b: # none of the items in list2 will satisfy a==b2 and b==b2
    continue

You can also use if a == b == b2 in Python instead of having to tie the statements together with and.
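
As a small illustration of the chained form (the function names here are made up), the two spellings below behave the same way. Note that chaining only helps when one value sits in the middle of both comparisons; a test on two independent pairs, such as a == a2 and b == b2, still needs and.

```python
def all_equal_chained(a, b, b2):
    # Python evaluates this as (a == b) and (b == b2), with b evaluated once.
    return a == b == b2

def all_equal_explicit(a, b, b2):
    return (a == b) and (b == b2)

for values in [(3, 3, 3), (3, 3, 4), (3, 4, 4)]:
    print(values, all_equal_chained(*values), all_equal_explicit(*values))
```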

Depending on how your records are stored and accessed, you may benefit from using dicts over lists. A dict can tell you whether a key is present in roughly constant time on average, without scanning every element. An example of its implementation might be:

lookup = dict()

# when adding an item to what would be list2
if b2 not in lookup:
    lookup[b2] = []
lookup[b2].append((a2,e))
# ...

for (a,b,c,d) in list1:
    if a == b and a in lookup:
        for (a2,e) in lookup[a]:
            mylist.append((a, b, c, d, e, d*e))
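
Given the correction in the comments (the intended test is a == a2 and b == b2), the lookup idea can be sketched with the pair (a2, b2) as the key. This is a variation on the code above, not the same code: collections.defaultdict is swapped in for the manual membership check, and the sample data is invented. Because each key maps to a list, repeated keys in list2 are all kept.

```python
from collections import defaultdict

# Hypothetical sample data; note that ("x", 1) appears twice in list2.
list1 = [("x", 1, "c1", 10.0), ("y", 2, "c2", 20.0), ("z", 3, "c3", 30.0)]
list2 = [("x", 1, 2.0), ("x", 1, 3.0), ("y", 2, 0.5)]

# Group the e values of list2 by their (a2, b2) pair.
lookup = defaultdict(list)
for a2, b2, e in list2:
    lookup[(a2, b2)].append(e)

mylist = []
for a, b, c, d in list1:
    # .get() avoids creating empty entries for keys missing from list2.
    for e in lookup.get((a, b), []):
        mylist.append((a, b, c, d, e, d * e))

print(mylist)
```

This produces one output record per matching pair of input records, as the nested loops in the question would.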

answered Dec 24, 2013 at 20:40

SimonT


2 Comments

Tim Pietzcker Over a year ago

Careful, it should have been if a==a2 and b==b2. There is a typo in the question, but the comments make it clear.

hercules.cosmos Over a year ago

The lists may look like list1 = (name, last_name, gender, job_class, salary) and list2 = (name, last_name, increase), and assume one more list, list3 = (job_class, bonus). I need to find all matching records, so that:

for (name, last_name, gender, job_class, salary) in list1:
    for (name2, last_name2, increase) in list2:
        if (name==name2) and (last_name==last_name2):
            for (job_class2, bonus) in list3:
                if (job_class==job_class2):
                    final_list.append((name, last_name, salary*increase + bonus))
