13

I have a bunch of sorted lists of objects, and a comparison function

class Obj :
    def __init__(p) :
        self.points = p
def cmp(a, b) :
    return a.points < b.points

a = [Obj(1), Obj(3), Obj(8), ...]
b = [Obj(1), Obj(2), Obj(3), ...]
c = [Obj(100), Obj(300), Obj(800), ...]

result = magic(a, b, c)
assert result == [Obj(1), Obj(1), Obj(2), Obj(3), Obj(3), Obj(8), ...]

what does magic look like? My current implementation is

def magic(*args) :
    r = []
    for a in args : r += a
    return sorted(r, cmp)

but that is quite inefficient. Better answers?

2
  • 1
    If they are: stackoverflow.com/questions/464342/… Commented Jul 21, 2009 at 9:40
  • How big are those lists? How much time is spent sorting them? Measure before (and after) you optimize. Commented Jul 21, 2009 at 23:26

10 Answers 10

13

Python standard library offers a method for it: heapq.merge.
As the documentation says, it is very similar to using itertools (but with more limitations); if you cannot live with those limitations (or if you do not use Python 2.6) you can do something like this:

sorted(itertools.chain(args), cmp)

However, I think it has the same complexity as your own solution, although using iterators should give some quite good optimization and speed increase.

Sign up to request clarification or add additional context in comments.

6 Comments

Using key instead of cmp should be prefered (and shoudl be faster). Python3 does not have cmp parameter anyway.
Actually, I was just using the same format as OP, but you are absolutely right and key should be preferred over cmp.
Well, and the OP's cmp function is wrong and doesn't work. If you're using heapq, you'll have to provide lt etc. methods on your class or use a tuple of (sorting key, object) in your heap instead.
I think you mean itertools.chain(*args). What you wrote is equivalent to sorted(iter(args), cmp), which makes little sense.
If I understand correctly that, sorting concatenated lists is Θ(n.log n) of complexity (proposed solution by code example), but merging (of sorted) lists is Θ(n). Not so small difference.
|
3

I like Roberto Liffredo's answer. I didn't know about heapq.merge(). Hmmmph.

Here's what the complete solution looks like using Roberto's lead:

class Obj(object):
    def __init__(self, p) :
        self.points = p
    def __cmp__(self, b) :
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points

a = [Obj(1), Obj(3), Obj(8)]
b = [Obj(1), Obj(2), Obj(3)]
c = [Obj(100), Obj(300), Obj(800)]

import heapq

sorted = [item for item in heapq.merge(a,b,c)]
for item in sorted:
    print item

Or:

for item in heapq.merge(a,b,c):
    print item

Comments

2

Use the bisect module. From the documentation: "This module provides support for maintaining a list in sorted order without having to sort the list after each insertion."

import bisect

def magic(*args):
    r = []
    for a in args:
        for i in a:
            bisect.insort(r, i)
    return r

Comments

2

Instead of using a list, you can use a [heap](http://en.wikipedia.org/wiki/Heap_(data_structure).

The insertion is O(log(n)), so merging a, b and c will be O(n log(n))

In Python, you can use the heapq module.

3 Comments

+1: Sorting a list in inherently inefficient: prevent the sort by using a smarter structure.
@OrganicPanda: Did you read the answer? It says that heapq amortizes the sorting cost. That's a smarter structure. Consider this, too. Accumulating three separate collections seems silly. Why not accumulate one hash of mutable objects; this can get updated by objects from the other sources. Now the "comparison" is moot because the objects have all be properly associated with each other without any sorting.
@S.Lott My apologies - I thought you were suggesting a smarter answer of your own but not explaining it .. my bad
0

I don't know whether it would be any quicker, but you could simplify it with:

def GetObjKey(a):
    return a.points

return sorted(a + b + c, key=GetObjKey)

You could also, of course, use cmp rather than key if you prefer.

Comments

0

One line solution using sorted:

def magic(*args):
  return sorted(sum(args,[]), key: lambda x: x.points)

IMO this solution is very readable.

Using heapq module, it could be more efficient, but I have not tested it. You cannot specify cmp/key function in heapq, so you have to implement Obj to be implicitly sorted.

import heapq
def magic(*args):
  h = []
  for a in args:
    heapq.heappush(h,a)
  return [i for i in heapq.heappop(h)

3 Comments

Your heapq method is a mess. You're pushing entire lists instead of their items, and you're ignoring the key. The one liner is cool, though.
Yeah you are right, I have used heapq just few times and I did not paste that to console to test it. My fault, sorry. Although now I see that Obj object must be defined "sortable" for heapq to work, because you cannot specify cmp/key function in heapq.
This code is all around a mess. Both snippets have syntax errors, and using sum for concatenating lists is very inefficient. Not to mention that there's operator.attrgetter to replace the lambda.
0

I asked a similar question and got some excellent answers:

The best solutions from that question are variants of the merge algorithm, which you can read about here:

Comments

0

Below is an example of a function that runs in O(n) comparisons.

You could make this faster by making a and b iterators and incrementing them.

I have simply called the function twice to merge 3 lists:

def zip_sorted(a, b):
    '''
    zips two iterables, assuming they are already sorted
    '''
    i = 0
    j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    if i < len(a):
        result.extend(a[i:])
    else:
        result.extend(b[j:])
    return result

def genSortedList(num,seed):
    result = [] 
    for i in range(num):
        result.append(i*seed)
    return result

if __name__ == '__main__':
    a = genSortedList(10000,2.0)
    b = genSortedList(6666,3.0)
    c = genSortedList(5000,4.0)
    d = zip_sorted(zip_sorted(a,b),c)
    print d

However, heapq.merge uses a mix of this method and heaping the current elements of all lists, so should perform much better

Comments

0

Here you go: a fully functioning merge sort for lists (adapted from my sort here):

def merge(*args):
    import copy
    def merge_lists(left, right):
        result = []
        while left and right:
            which_list = (left if left[0] <= right[0] else right)
            result.append(which_list.pop(0))
        return result + left + right
    lists = list(args)
    while len(lists) > 1:
        left, right = copy.copy(lists.pop(0)), copy.copy(lists.pop(0))
        result = merge_lists(left, right)
        lists.append(result)
    return lists.pop(0)

Call it like this:

merged_list = merge(a, b, c)
for item in merged_list:
    print item

For good measure, I'll throw in a couple of changes to your Obj class:

class Obj(object):
    def __init__(self, p) :
        self.points = p
    def __cmp__(self, b) :
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points
  • Derive from object
  • Pass self to __init__()
  • Make __cmp__ a member function
  • Add a str() member function to present Obj as string

Comments

0

Here is your magic function, in O(n0+n1+n2+...).

It tracks a pointer for each iterable and yields the smallest item at any pointer. i.e. Same idea as the 'merge' part of mergesort.

def merge(*iterables, key=lambda x:x):
    L = len(iterables)
    ls = [len(l) for l in iterables]
    ps = [0] * len(iterables)
    while sum([ls[i] - ps[i] for i in range(0, L)]) > 0:
        min, mink = None, None
        mini = -1
        for i in range(0, L):  # loop over pointers
            if ps[i] == ls[i]: continue
            keyed = key(iterables[i][ps[i]])
            if mink is None or keyed < mink:
                min = iterables[i][ps[i]]
                mink = keyed
                mini = i
        yield min
        ps[mini] += 1

Usage is:

merge([1, 2, 3], [3, 4, 5], [2, 4, 10]) == [1, 2, 2, 3, 3, 4, 4, 5, 10]

Note: It takes a key function, but the input iterables must also have been sorted according to the same key otherwise the inputs are not properly sorted.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.