3

Sort all the strings lexicographically but if a string is present completely as a prefix in another string, then string with longer length should come first.

e.g. 1 test, testtube are 2 strings and the string test is present as a prefix in testtube

sorted- testtube, test.

e.g. 2 bank, ant, testtube, test

sorted- ant, bank, testtube, test

How can we do this in python ? Tried alot but have'nt got any solution, need help.

1
  • how about creating a prefix tree first? Commented Mar 20, 2020 at 11:45

5 Answers 5

6

Perhaps append an "impossibly large" character at the end of each string?

def sort(a):
   return sorted(a, key=lambda s: s + chr(0x10FFFF))

Demo:

>>> sort(['test', 'testtube'])
['testtube', 'test']

>>> sort(['bank', 'ant', 'testtube', 'test'])
['ant', 'bank', 'testtube', 'test']

>>> sort(['test', 'testbb', 'testa'])
['testa', 'testbb', 'test']

It's the largest code point (chr even gives a ValueError for something larger) and actually a "noncharacter" and shouldn't occur naturally but we're free to use it for this:

Noncharacters are code points that are permanently reserved in the Unicode Standard for internal use. They are not recommended for use in open interchange of Unicode text data. [...] Applications are free to use any of these noncharacter code points internally.

Later in that section, the standard even suggests this usage (emphasis mine):

[...] U+10FFFF is associated with the largest legal UTF-32 32-bit code unit value, 10FFFF16. This attribute renders these two noncharacter code points useful for internal purposes as sentinels. For example, they might be used to indicate the end of a list, to represent a value in an index guaranteed to be higher than any valid character value, and so on.

Sign up to request clarification or add additional context in comments.

3 Comments

while i like the idea a lot it does not work for words = ('test', 'testbb', 'testa') (i did not downvote).
@hiroprotagonist It returns ['testa', 'testbb', 'test'] for that. Is that not correct?
@hiroprotagonist is the comparison of 2 words based on letters in them getting compared one by one or whole word is getting compared at a time ?
3

Python comes with a sort built in, accessible through list.sort or the sorted function, and the key argument lets you configure the order to sort by. The key function will be called on each element of the input, and the key function's return values will be compared instead of the original input elements to determine ordering.

By default, string comparison considers "end of string" to be lower than an actual character. With the order you want, "end of string" is considered higher than an actual character. We can represent this by making a list of characters from a string, and adding a special "end of string" marker to the end of the list. Our "end of string" marker implements comparison such that "end of string" compares equal to another "end of string", but greater than any character:

from functools import total_ordering

@total_ordering
class EndMarker(object):
    def __eq__(self, other):
        # Only equal to another end-marker...
        if not isinstance(other, EndMarker):
            return NotImplemented
        return True
    def __lt__(self, other):
        # and not less than anything.
        return False

endmarker = EndMarker()

def funky_sort(strings):
    return sorted(strings, key=lambda string: list(string) + [endmarker])

Alternatively, we can rely on the limited range of Unicode code points, by converting each character in the strings to its numerical code point, and making the end marker an integer greater than any possible code point. Or, we could make the end marker a floating-point infinity:

endmarker = float('inf')

def funky_sort(strings):
    return sorted(strings, key=lambda string: [ord(char) for char in string] + [endmarker])

1 Comment

Bit shorter version: [*map(ord, string), endmarker].
0
from functools import cmp_to_key

def custom_cmp(x, y):
    if x.startswith(y) or y.startswith(x):
        return len(y) - len(x)
    if x < y:
        return -1
    return 1

custom_key = cmp_to_key(custom_cmp)

a1 = ['test', 'testtube']
a1.sort(key=custom_key)
print(a1)
a2 = ['testtube', 'test']
a2.sort(key=custom_key)
print(a2)
a3 = ['bank', 'ant', 'testtube','test']
a3.sort(key=custom_key)
print(a3)

Result:

['testtube', 'test']
['testtube', 'test']
['ant', 'bank', 'testtube', 'test']

The idea is to pass custom key function, which is created by converting a comparator (because comparators are deprecated in Python 3).

Comments

0

One potential approach that seems intuitive: Sort everything normally first, and then make groups or chains where the items are a perfect prefix from left to right. Using groupby, we can tackle this as follows.

from itertools import groupby
from operator import itemgetter

def weird_sorting(list_):
    """designed to sort lexically, except when
    a string is a complete prefix of another, in which case
    the order is reversed
    """
    # sort everything lexically
    temp = sorted(list_)
    # get a grouper that indicates 
    # True if current string starts with the previous string
    grouper = [False] + [r.startswith(l) for l, r in zip(temp, temp[1:])]

    output = []
    # Group items in the grouper key, 
    # and reverse all items that have a True group, making sure to 
    # handle the "very first" string that is starting a True chain
    for k, group in groupby(zip(grouper, temp), key=itemgetter(0)):
        items = [v for k, v in group]
        if k:
            prev_item = output.pop()
            reversed_list = [prev_item, *items][::-1]
            output.extend(reversed_list)
        else:
            output.extend(items)
    return output


test1 = ['test', 'testtube']
print(weird_sorting(test1))
test2 = ['bank', 'ant', 'testtube', 'test']
print(weird_sorting(test2))
test3 = ['bank', 'ant', 'testtube', 'test', 'testtubebabies', 'zeta1', 'zeta11', 'zz']
print(weird_sorting(test3))

Outputs:

['testtube', 'test']
['ant', 'bank', 'testtube', 'test']
['ant', 'bank', 'testtubebabies', 'testtube', 'test', 'zeta11', 'zeta1', 'zz']

Comments

-2
strList = ['tube','testtube']

print(strList)
a = strList.sort()
print(strList)

strList = ['bank', 'ant', 'testtube','tube']

print(strList)
a = strList.sort()
print(strList)

Result:

['tube','testtube']

['testtube', 'tube']

['bank', 'ant', 'testtube', 'tube']

['ant', 'bank', 'testtube', 'tube']

2 Comments

this will not work for 'test', 'testbb', 'testa'.
You didn't include a single test case where one string is a prefix of another string.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.