I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure inequality of revenue in a list.

The formula is basically Shannon's entropy, so it deals with log. My problem is that I have a few revenues at 0 in my list, and log(0) makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work as log(tinyFloat) = -inf, and that would mess my index up.

[EDIT] Here's a snippet (taken from another, much cleaner and freely available implementation):

    from math import exp, log

    def error_if_not_in_range01(value):
        if (value <= 0) or (value > 1):
            raise Exception(str(value) + ' is not in (0,1]!')

    def H(x):
        n = len(x)
        entropy = 0.0
        sum = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)
            sum += x_i
            group_negentropy = x_i*log(x_i)
            entropy += group_negentropy
        error_if_not_1(sum) # helper not shown in this snippet
        return -entropy

    def T(x):
        print(x)
        n = len(x)
        maximum_entropy = log(n)
        actual_entropy = H(x)
        redundancy = maximum_entropy - actual_entropy
        inequality = 1 - exp(-redundancy)
        return redundancy,inequality

Is there any way out of this problem?

  • Would you mind showing the Python snippet that implements your calculation? Commented Nov 29, 2013 at 7:18

1 Answer

If I understand you correctly, the formula you are trying to implement is the following:

$$T = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\bar{X}} \ln\frac{X_i}{\bar{X}}, \qquad \bar{X} = \operatorname{mean}(X)$$

In this case, your problem is calculating the natural logarithm of Xi / mean(X), when Xi = 0.

However, the logarithm is multiplied by Xi / mean(X), which is 0 when Xi == 0. Since x * ln(x) tends to 0 as x approaches 0 from above, the standard convention is to treat the term for that entry as zero, so you can skip calculating the logarithm entirely.
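
As a minimal sketch of that idea, the function below computes the index straight from raw revenues and skips zero entries. The name `theil_t` and the error handling are assumptions made for illustration; they are not part of the quoted snippet.

```python
from math import log

def theil_t(revenues):
    """Theil T index of a list of non-negative revenues (illustrative sketch)."""
    n = len(revenues)
    if n == 0:
        raise ValueError("revenues must not be empty")
    mean = sum(revenues) / float(n)
    if mean == 0:
        raise ValueError("mean revenue is zero; the index is undefined")
    total = 0.0
    for x in revenues:
        if x < 0:
            raise ValueError("revenues must be non-negative")
        if x == 0:
            continue  # 0 * ln(0) is taken as 0 by convention, so skip the log
        ratio = x / mean
        total += ratio * log(ratio)
    return total / n

print(theil_t([10, 10, 10, 10]))  # equal revenues: 0.0
print(theil_t([0, 0, 0, 40]))     # a single earner: ln(4), about 1.386
```

With one earner out of n, the index reaches its maximum of ln(n), which is why the zero entries have to be accepted rather than rejected.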

In the case that you are implementing Shannon's formula directly, the same holds:

$$H = -\sum_{i} P_i \log P_i$$

In both forms, the logarithm does not need to be calculated when Pi == 0 (or Xi == 0), because that term contributes zero to the sum.
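
The two forms are connected, which may explain why the quoted code computes log(n) - H(x). Assuming the shares are defined as Pi = Xi / sum(X) (the snippet only checks that its inputs sum to 1, so this normalization is an assumption), a short derivation gives:

```latex
\begin{aligned}
T &= \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\bar{X}} \ln\frac{X_i}{\bar{X}}
   = \sum_{i=1}^{N} P_i \ln(N P_i)
   && \text{since } \frac{X_i}{\bar{X}} = N P_i \\
  &= \ln N \sum_{i=1}^{N} P_i + \sum_{i=1}^{N} P_i \ln P_i
   = \ln N - H
   && \text{since } \sum_{i=1}^{N} P_i = 1
\end{aligned}
```

Every term with Pi = 0 drops out of both sums, so the zero-handling rule is the same in either form.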

UPDATE:

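Given the code you quoted, you can replace x_i*log(x_i) with a function as follows. Note that the range check must also accept 0 (value < 0 instead of value <= 0); otherwise it raises before the new function is ever reached:

    def error_if_not_in_range01(value):
        # Relaxed from (value <= 0) so that zero entries are accepted
        if (value < 0) or (value > 1):
            raise Exception(str(value) + ' is not in [0,1]!')

    def Group_negentropy(x_i):
        # 0*log(0) is taken as 0, so skip the logarithm for zero entries
        if x_i == 0:
            return 0.0
        else:
            return x_i*log(x_i)

    def H(x):
        n = len(x)
        entropy = 0.0
        sum = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)
            sum += x_i
            group_negentropy = Group_negentropy(x_i)
            entropy += group_negentropy
        error_if_not_1(sum)
        return -entropy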
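
For a version that runs on its own, here is a hedged sketch that starts from raw revenues, converts them to shares, and then follows the same redundancy and inequality steps as the quoted T(x). The helper names `shares`, `entropy` and `theil` are hypothetical, and normalizing by the total revenue is an assumption about how the inputs are meant to be prepared.

```python
from math import exp, log

def shares(revenues):
    """Convert non-negative revenues to shares that sum to 1."""
    if any(r < 0 for r in revenues):
        raise ValueError("revenues must be non-negative")
    total = float(sum(revenues))
    if total == 0:
        raise ValueError("total revenue is zero; shares are undefined")
    return [r / total for r in revenues]

def entropy(p):
    """Shannon entropy in nats; zero shares are skipped (0*log(0) taken as 0)."""
    return -sum(p_i * log(p_i) for p_i in p if p_i > 0)

def theil(revenues):
    """Return (redundancy, inequality), mirroring the quoted T(x)."""
    p = shares(revenues)
    redundancy = log(len(p)) - entropy(p)
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality

redundancy, inequality = theil([0, 0, 50, 50])
print(redundancy, inequality)  # roughly ln(2) and 0.5
```

Filtering with `p_i > 0` inside the generator keeps log() from ever seeing a zero, so no separate range check is needed in this sketch.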

3 Comments

Oh, that sounds like a very good idea, I will try it tomorrow morning.
What's the reason for checking that x_i is in the range (0,1)?
I'm not sure. This appears to be the source of the snippet, so there might be more information in the referenced book: poorcity.richcity.org/oei
