2

I need to iterate through multiple lists, and do some calculations for the matching records:

for (a,b,c,d) in list1:
   for (a2,b2,e) in list2:
       if (a==a2) and (b==b2):
           mylist.append((a, b, c, d, e, d*e))

Is there a more efficient way of doing the above calculation? Thanks a lot.

6
  • Give us an example of how the lists look? Commented Dec 24, 2013 at 20:28
  • For a start, in the outer loop: if a != b: continue Commented Dec 24, 2013 at 20:31
  • 1
    @ColBeseder: I'm guessing that the intent was if (a==a2) and (b==b2), but I agree that OP should clarify. Commented Dec 24, 2013 at 20:32
  • List may look like list1 = (name, last_name, gender, job_class, salary) list2 = (name, last_name, increase) and assume one more list, list3 = (job_class, bonus) so need to find all matching records, so that Commented Dec 24, 2013 at 20:34
  • 2
    You seem to be using the wrong data structures for the job. Please show some actual code and some actual data to work with. Commented Dec 24, 2013 at 20:40

3 Answers

5

Build some dictionaries for fast lookup:

data1 = {(a, b): (c, d) for a, b, c, d in list1}
data2 = {(a, b): e for a, b, e in list2}

result = []
for a, b in set(data1) & set(data2):
    c, d = data1[a, b]
    e = data2[a, b]
    result.append((a, b, c, d, e, e*d))
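To see the lookup in action, here is a minimal sketch that wraps the same idea in a function and runs it on made-up records. The function name `join_on_first_two` and the sample rows are assumptions for illustration, and the sketch assumes each (a, b) pair appears at most once per list.

```python
def join_on_first_two(list1, list2):
    """Join 4-tuples from list1 with 3-tuples from list2 on their first two fields."""
    data1 = {(a, b): (c, d) for a, b, c, d in list1}
    data2 = {(a, b): e for a, b, e in list2}
    result = []
    # dict key views support set operations, so & yields the keys present in both
    for key in data1.keys() & data2.keys():
        c, d = data1[key]
        e = data2[key]
        result.append((*key, c, d, e, d * e))
    return result


# Made-up sample data: (a, b, c, d) records and (a, b, e) records
list1 = [("Ann", "Lee", "clerk", 100), ("Bob", "Ray", "admin", 200)]
list2 = [("Ann", "Lee", 2), ("Cat", "Fox", 3)]

matches = join_on_first_two(list1, list2)
print(matches)
```

Building the two dictionaries takes one pass over each list, and each lookup afterwards is constant time on average, so the total work grows roughly linearly with the input size rather than with the product of the two list lengths.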

4 Comments

Worth pointing out, this is slower than the implementation in the question for datasets in the thousands.
@adsmith: Really? Shouldn't this be O(N) vs O(N^2)?
@adsmith: I find that very hard to believe. I could believe it was slower for lists of very small size, but for even moderate size I think the algorithmic complexity here has to win. @Eric: Note that this assumes that everything is unique in (a,b); not unreasonable, but not clear to me from the question. (Not sure how the OP is handling two people with the same name, e.g.)
I generated dummy lists of 2,000 entries for list1 and 1,000 entries for list2. There WERE many duplicate entries, so it's possible that's caused the flip in efficiency. With actual data, a dictionary may provide faster lookup, but it's harder to test without some real data and I'm not about to generate dummy data 2,000 items large to run timeit on!
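One rough way to check this is to time both versions on generated data. The sketch below is an illustrative benchmark, not a measurement from this thread: the helper names, the integer keys, and the fixed seed are assumptions, and the list sizes (2,000 and 1,000) follow the comment above. Keys are generated without repetition so that both versions return the same matches.

```python
import random
import timeit


def nested_loops(list1, list2):
    """The approach from the question: compare every pair of records."""
    out = []
    for (a, b, c, d) in list1:
        for (a2, b2, e) in list2:
            if a == a2 and b == b2:
                out.append((a, b, c, d, e, d * e))
    return out


def dict_lookup(list1, list2):
    """The approach from the answer: index both lists by (a, b)."""
    data1 = {(a, b): (c, d) for a, b, c, d in list1}
    data2 = {(a, b): e for a, b, e in list2}
    out = []
    for a, b in data1.keys() & data2.keys():
        c, d = data1[a, b]
        e = data2[a, b]
        out.append((a, b, c, d, e, d * e))
    return out


rng = random.Random(0)                      # fixed seed so runs are repeatable
keys = [(i, i % 7) for i in range(3000)]    # unique (a, b) pairs
rng.shuffle(keys)
list1 = [(a, b, "x", rng.randint(1, 100)) for a, b in keys[:2000]]
list2 = [(a, b, rng.randint(1, 5)) for a, b in keys[1500:2500]]

slow = timeit.timeit(lambda: nested_loops(list1, list2), number=3)
fast = timeit.timeit(lambda: dict_lookup(list1, list2), number=3)
print(f"nested loops: {slow:.3f}s, dict lookup: {fast:.3f}s")
```

With duplicate (a, b) keys the two versions stop being equivalent, because each dictionary keeps only the last record stored under a key, so duplicates have to be accounted for before the timings can be compared fairly.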
2

With the new information in mind, that:

  1. list1's elements are of the form (name, last_name, gender, job_class, salary),
  2. list2 contains elements of the form (name, last_name, increase) (presumably a raise for a person),
  3. list3 has elements like (job_class, bonus),

... you may benefit in both performance and code clarity using a dict.

Using a tuple of the form (first,last) to reference each person in your program, you can do something like this (in a basic example with input to get information):

people = dict()
for i in range(num_ppl):  # num_ppl, num_raises and num_bonuses are counts you already know
    name = tuple(input().split()) # input is something like "Bob Smith"
    people[name] = getPeopleInfo() # hypothetical helper: returns the list [gender, job_class, salary], with salary as a number
for i in range(num_raises):
    first, last, increase = input().split()
    people[(first,last)][-1] *= float(increase) # the salary is the last item
for i in range(num_bonuses):
    job_class, bonus = input().split()
    for name in people: # iterating through a dict gives the keys (similar to indices of a list, but can be immutable types such as tuples)
        if people[name][1] == job_class: # index 1 holds job_class in [gender, job_class, salary]
            people[name][-1] += float(bonus) # convert the input string before adding

Any hashable type, such as str, int, or a tuple of such values, can be used as a key in a dict, much as 0-based integers index a list. A list is mutable (it can change, e.g. through list.append) and is therefore not hashable, so a list cannot be a key. For more information about dict, see the Python documentation.
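A short sketch of that rule, using made-up names and values: a tuple of strings works as a key, while a list raises TypeError.

```python
salaries = {}
salaries[("Bob", "Smith")] = 50000      # a tuple of strings is hashable, so this works
print(salaries[("Bob", "Smith")])

try:
    salaries[["Bob", "Smith"]] = 50000  # a list is mutable and unhashable
    list_key_ok = True
except TypeError as exc:
    list_key_ok = False
    print("list as key failed:", exc)
```

Note that a tuple is only usable as a key when everything inside it is hashable too, so a tuple that contains a list fails in the same way.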


1

As long as both inputs stay plain, unsorted lists, the nested loops have to check every element of list1 against every element of list2, so the current code is close to the best you can do without changing the data structure.
One addition that skips some hopeless cases (taking the condition to be a==b2 and b==b2) is to add the following between the two for loop lines:

if a != b: # none of the items in list2 will satisfy a==b2 and b==b2
    continue

You can also use if a == b == b2 in Python instead of having to tie the statements together with and.

Depending on how your records are stored and accessed, you may benefit from using dicts over lists. A dict can tell whether a key is present without scanning every element. An example of its implementation might be:

lookup = dict()

# when adding an item to what would be list2
if b2 not in lookup:
    lookup[b2] = []
lookup[b2].append((a2,e))
# ...

for (a,b,c,d) in list1:
    if a == b and a in lookup:
        for (a2,e) in lookup[a]:
            mylist.append((a,b,c,d,e,d*e))
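If the intended condition is a == a2 and b == b2, the same idea carries over by keying the lookup on the pair (a2, b2). The following is a sketch under that assumption, with made-up sample data. Keeping a list of values per key means repeated (a2, b2) pairs in list2 are all preserved.

```python
from collections import defaultdict

# Made-up sample data in the shapes used by the question
list1 = [("x", "y", 1, 10), ("p", "q", 2, 20)]
list2 = [("x", "y", 3), ("x", "y", 4), ("m", "n", 5)]

# Group the e values of list2 by their (a2, b2) key
lookup = defaultdict(list)
for a2, b2, e in list2:
    lookup[(a2, b2)].append(e)

mylist = []
for a, b, c, d in list1:
    # .get avoids inserting empty lists into the defaultdict for missing keys
    for e in lookup.get((a, b), []):
        mylist.append((a, b, c, d, e, d * e))

print(mylist)
```

Each record of list1 now costs one dictionary lookup plus one step per matching record, instead of a full scan of list2.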

2 Comments

Careful, it should have been if a==a2 and b==b2: there is a typo in the question, but the comments make it clear.
List may look like list1 = (name, last_name, gender, job_class, salary), list2 = (name, last_name, increase), and assume one more list, list3 = (job_class, bonus), so I need to find all matching records, so that:

for (name, last_name, gender, job_class, salary) in list1:
    for (name2, last_name2, increase) in list2:
        if (name==name2) and (last_name==last_name2):
            for (job_class2, bonus) in list3:
                if (job_class==job_class2):
                    final_list.append((name, last_name, salary*increase + bonus))
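The nested loops in this comment can be replaced by two dictionary lookups. This is a sketch only: the sample records are invented, `increase` is treated as a multiplier as in the comment's formula, and it assumes each person appears once in list2 and each job_class once in list3.

```python
# Invented sample data in the shapes described above
list1 = [("Bob", "Smith", "M", "clerk", 1000.0),
         ("Amy", "Jones", "F", "admin", 2000.0),
         ("Joe", "Bloggs", "M", "clerk", 1500.0)]
list2 = [("Bob", "Smith", 1.5), ("Amy", "Jones", 2.0)]
list3 = [("clerk", 100.0), ("admin", 50.0)]

# Index list2 by person and list3 by job class for constant-time lookups
increases = {(name, last_name): increase for name, last_name, increase in list2}
bonuses = {job_class: bonus for job_class, bonus in list3}

final_list = []
for name, last_name, gender, job_class, salary in list1:
    key = (name, last_name)
    # Keep only records that match in both list2 and list3, as the nested loops do
    if key in increases and job_class in bonuses:
        final_list.append((name, last_name, salary * increases[key] + bonuses[job_class]))

print(final_list)
```

Joe Bloggs is dropped because he has no entry in list2, which mirrors the behaviour of the nested loops.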
