0

I was doing 368B on CodeForces with Python 3, which basically asks you to print the numbers of unique elements in a series of "suffixes" of a given array. Here's my solution (with some additional redirection code for testing):

import sys

if __name__ == "__main__":
    f_in = open('b.in', 'r')
    original_stdin = sys.stdin
    sys.stdin = f_in

    n, m = [int(i) for i in sys.stdin.readline().rstrip().split(' ')]
    a = [int(i) for i in sys.stdin.readline().rstrip().split(' ')]
    l = [None] * m
    for i in range(m):
        l[i] = int(sys.stdin.readline().rstrip())

    l_sorted = sorted(l)
    l_order = sorted(range(m), key=lambda k: l[k])

    # the ranks of elements in l
    l_rank = sorted(range(m), key=lambda k: l_order[k])

    # unique_elem[i] = non-duplicated elements between l_sorted[i] and l_sorted[i+1]
    unique_elem = [None] * m

    for i in range(m):
        unique_elem[i] = set(a[(l_sorted[i] - 1): (l_sorted[i + 1] - 1)]) if i < m - 1 else set(a[(l_sorted[i] - 1): n])

    # unique_elem_cumulative[i] = non-duplicated elements between l_sorted[i] and a's end
    unique_elem_cumulative = unique_elem[-1]

    # unique_elem_cumulative_count[i] = #unique_elem_cumulative[i]
    unique_elem_cumulative_count = [None] * m
    unique_elem_cumulative_count[-1] = len(unique_elem[-1])

    for i in range(m - 1):
        i_rev = m - i - 2
        unique_elem_cumulative = unique_elem[i_rev] | unique_elem_cumulative
        unique_elem_cumulative_count[i_rev] = len(unique_elem_cumulative)

    with open('b.out', 'w') as f_out:
        for i in range(m):
            idx = l_rank[i]
            f_out.write('%d\n' % unique_elem_cumulative_count[idx])

    sys.stdin = original_stdin
    f_in.close()

The code shows correct results except for the possibly last big test, with n = 81220 and m = 48576 (a simulated input file is here, and an expected output created by a naive solution is here). The time limit is 1 sec, within which I can't solve the problem. So is it possible to solve it within 1 sec with Python 3? Thank you.

UPDATE: an "expected" output file is added, which is created by the following code:

import sys

if __name__ == "__main__":
    f_in = open('b.in', 'r')
    original_stdin = sys.stdin
    sys.stdin = f_in

    n, m = [int(i) for i in sys.stdin.readline().rstrip().split(' ')]
    a = [int(i) for i in sys.stdin.readline().rstrip().split(' ')]
    with open('b_naive.out', 'w') as f_out:
        for i in range(m):
            l_i = int(sys.stdin.readline().rstrip())
            f_out.write('%d\n' % len(set(a[l_i - 1:])))

    sys.stdin = original_stdin
    f_in.close()
2
  • What is the expected output for the large input? Commented Nov 27, 2013 at 10:03
  • @thefourtheye I've attached the link of the "expected" output created by a naive solution. Commented Nov 27, 2013 at 10:29

1 Answer 1

1

You'll be cutting it close, I think. On my admittedly rather old machine, the I/O alone takes 0.9 seconds per run.

An efficient algorithm, I think, will be to iterate backwards through the array, keeping track of which distinct elements you've found. When you find a new element, add its index to a list. This will therefore be a descending sorted list.

Then for each li, the index of li in this list will be the answer.

For the small sample dataset

10 10
1 2 3 4 1 2 3 4 100000 99999
1
2
3
4
5
6
7
8
9
10

The list would contain [10, 9, 8, 7, 6, 5] since when reading from the right, the first distinct value occurs at index 10, the second at index 9, and so on.

So then if li = 5, it has index 6 in the generated list, so 6 distinct values are found at indices >= li. Answer is 6

If li = 8, it has index 3 in the generated list, so 3 distinct values are found at indices >= li. Answer is 3

It's a little fiddly that the excercise numbers 1-indexed and python counts 0-indexed. And to find this index quickly using existing library functions, I've reversed the list and then use bisect.

import timeit
from bisect import bisect_left

def doit():
    f_in = open('b.in', 'r')
    n, m = [int(i) for i in f_in.readline().rstrip().split(' ')]
    a = [int(i) for i in f_in.readline().rstrip().split(' ')]
    found = {}
    indices = []
    for i in range(n - 1, 0, -1):
        if not a[i] in found:
            indices.append(i+1)
            found[a[i]] = True

    indices.reverse()
    length = len(indices)
    for i in range(m):
        l = int(f_in.readline().rstrip())
        index = bisect_left(indices, l)
        print length - index

if __name__ == "__main__":
    print (timeit.timeit('doit()', setup="from bisect import bisect_left;from __main__ import doit", number=10))

On my machine outputs 12 seconds for 10 runs. Still too slow.

Sign up to request clarification or add additional context in comments.

1 Comment

Although I made my code accepted by using |= in the set union operation, yours is more efficient according to my test.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.