5

I am looking to implement a very simple function which finds the median of an unsorted array by counting the number of smaller elements and number of larger elements if they are equal in number then the original is considered as median.

I know several algorithm like minHeap and Quick Select, but I am trying to keep things simple as a human would do with a naked eye to simply count larger and smaller numbers. So far I have implemented below function but problem arise when I have duplicate entries in array and also with even and odd array length.

I am new to C programming and need to understand what is going wrong. Below is the code, I have written a function to return random array of variable length to test this function from.

int med(int count, int *array)
{
int i, j, median = -1, smaller = 0, larger = 0;

for(i = 0; i < count; i++)
{
    for(j = 0; j < count; j++)
    {
        //larger++

        if(array[i] < array[j] && i!=j)
        {
            larger++;
        }
        //Smaller++
        if(array[i] >= array[j] && i!=j)
        {
            smaller++;
        }
    }
    printf("\nFor pivot: %d", array[i]);
    if(larger == smaller)
    {
        printf("\n Smaller: %d", smaller);
        printf(" Larger: %d", larger);
        median = array[i];
        break;
    }
    else
    {
        printf("\n Smaller: %d", smaller);
        printf(" Larger: %d", larger);

        larger = 0;
        smaller = 0;
    }
}
return median;
}

In some cases like {3,5,0,2,3} my function returns -1 but the actual result should be 3.

EDIT Initially I started with strictly greater or lesser but this condition (larger == smaller) never gets hit when I have duplicate entries thus I considered equal elements as smaller. I am having difficulty handling the equalities

8
  • Is there any reason why you want to avoid sorting it? Commented Apr 20, 2019 at 2:04
  • It's starting to look like an XY-problem but you made me curious. What's the purpose of this new sorting algorithm? Commented Apr 20, 2019 at 2:12
  • Its surely not a xy problem! I do understand there are various ways to do it as I mentioned 2 of them and you gave one in the answer. I am just thinking it from how a human would do it given 4,5 numbers to find median (count for every element if number of smaller are equal to number of larger) and you have it! Also the new algorithm is something I thought and couldn't find on internet so why not try it! Commented Apr 20, 2019 at 2:16
  • 1
    But what is the purpose of creating a sorting algorithm utilizing a median function that is O(n²)? That sorting algorithm would be O(n²log n) if you implement the sorting well, but this is worse than even the simplest sortings like bubble sort and selection sort. And if you use a naive implementation of the sorting, it will be O(n³) Commented Apr 20, 2019 at 2:22
  • I understand that, and I can always i,prove upon the way to find median. but to start with, this was my approach and I am stuck. There are plenty of workarounds but why opt one without even trying to overcome this? Commented Apr 20, 2019 at 2:25

2 Answers 2

5

B. Shefter found the bug for you. However, I still want to address the question.

I am looking to implement a very simple function which finds the median of an unsorted array by counting the number of smaller elements and number of larger elements if they are equal in number then the original is considered as median.

Only do this, if you can do it faster than O(nlog n), because that's the time complexity of qsort. I would recommend trying the median of medians algorithm. You can read about it here and here is the code from that site, but with comments removed:

int select(int *a, int s, int e, int k){
    if(e-s+1 <= 5){
        sort(a+s, a+e);
        return s+k-1;
    }
    
    for(int i=0; i<(e+1)/5; i++){
        int left = 5*i;
        int right = left + 4;
        if(right > e) right = e;
        int median = select(a, 5*i, 5*i+4, 3);
        swap(a[median], a[i]);
    }
    
    return select(a, 0, (e+1)/5, (e+1)/10);
}

I know several algorithm like using minHeap and Quick Select but I am trying to keep things simple as a human would do with a naked eye to simply count larger and smaller numbers.

While it is a good thing to keep things simple, make sure that that's what's your doing. The C standard library has a built in quick sort. If you use that one, the code can look like this:

int int_cmp(const void *a, const void *b) 
{ 
    const int ia = *(const int *)a; 
    const int ib = *(const int *)b;

    return ia-ib;
}

int med(int count, int *array)
{
    int tmp[count]; // You might want to use malloc instead

    memcpy(tmp, array, count * sizeof(*array));

    qsort(tmp, count, sizeof(tmp[0]), int_cmp);

    return tmp[count/2];
}

It is both faster and easier to read. Your code is O(n²) while this is O(nlog n).

You mentioned in a comment that you want to use this for a new sorting method. Then I want to mention that the median of sets with an odd number of elements typically is not a member of the set, so you need to alter your definition of median to suit your needs.

Here is an example of how you can achieve what you want in a pretty readable way, while still maintaining your idea. I start by adding a subproblem, which instead of "what is the median in the array" is "is x the median of the array". And then we ask that question for each element in the array until we have found the median.

int is_median(int x, int *array, int count) {
    int l=0, h=0;

    for(int i=0; i<count; i++) {
        if(array[i] < x) l++;
        else if(array[i] > x) h++;
    }
    
    if(h == l) return 1; // This is always a sufficient condition

    // Here you need to decide what to do. Just the above is not enough
    // for your purposes.
    else if(<condition>) return 1; 

    else return 0;
}

int med(int count, int *array) {
    for(int i = 0; i < count; i++) {
        if(is_median(array[i], array, count)) return array[i];
    }
    return 0; // This line should never be executed. It't only here
              // to suppress a warning.
}
    
Sign up to request clarification or add additional context in comments.

Comments

3

The -1 comes from the following: Your codes initialize median to -1, and it never changes unless larger == smaller. In cases where that never happens after iterating through the entire array, the code returns -1.

I think the conceptual bug is that you have arbitrarily decided to increment smaller when two numbers are equal. If you walk through your code, you'll see why you're getting -1 in the example you show: you end up with larger=1 (the 5) and smaller=3 (the 0, 2, and 3). Thus, since larger isn't equal to smaller, median isn't set to 3 and remains -1.

That's what's going wrong. How to handle equalities to fix the conceptual bug is up to you!

10 Comments

Yes I understand why -1 is coming. I kept the = intentionally because I want to consider them as smaller numbers and that's where the problem comes, I cannot arbitrarily choose to increment smaller as in some cases it doesn't work as shown above. Also if I choose to remove that and consider {4,1,0,1,4} case median should be 1 but again the condition never hits and the median is never updated. That's where I am stuck "How to handle this?"
@CCPP Don't count array elements that are equal to the pivot. if (abs(larger - smaller) <= 1) then you've found the median.
@user3386109 So you mean keeping the conditions strictly greater and strictly smaller plus this abs(larger - smaller) <= 1 would suffice?
@CCPP Yes, I believe so. If the number of elements in the array (not counting the pivot) is an odd number, then larger and smaller will differ by 1.
@user3386109 Tried it with {1,5,1,5,1} doesn't work!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.