10

This is the code I have written for searching a node in a tree as a part of my assignment:

typedef struct NODE
{
    int key; 
    struct NODE *left, *right; 
} Node;

Node *search(Node *head, int key)
{
    if (!head)
        return 0; 
    if (head->key == key)
        return head; 
    
    return or(search(head->left, key), search(head->right, key)); 
}

Node *or(void *ptr1, void *ptr2)
{
    Node *temp = (Node *)((long)ptr1 | (long)ptr2);
    return temp;
}

I wrote a recursive function because it's the 1st thing that comes to mind when you want to scan a tree. This was supposed to be simple, but it quickly became a dance with pointers (had to do bitwise OR on the Node* pointers).

The function analyzes the left and right sub-trees of a node, returning NULL if node is not found, or the Node address if the key gets matched. In the end, we do a bitwise OR on the two pointers returned which enables us to return the non-zero node address if the key gets matched.

Although this isn't too much code (and it works), I think it is more complicated than necessary. Can anyone recommend a simpler technique? (Recursive preferred)

edit:

Node *left = search(head->left,key) ;
if(left)
return left ; 
else 
return search(head->right,key) ;

This was simply what I could have done without involving myself with bitwise operations on pointers.

9
  • 1
    Binary trees are very often kept organized by the key, so that a comparison of the desired key with the current node’s key tells you whether to search the left subtree or the right subtree. Then you do not need to search both, and you can use a simple loop. If your tree is not ordered by the key and you must search the entire tree, you need additional data for that, such as the local state of recursive function calls, an additional data structure listing subtrees yet to be searched, or additional links in the tree giving a path through it. Commented Aug 15 at 17:46
  • 8
    Doing a bitwise OR on pointers converted to integers is an ugly kludge and should be avoided. Even if your tree is not ordered by the key, it is wasteful to always search the left and right subtrees. You should at most search one first and then decide: If the search is successful, return what it found. Otherwise, search the other. Commented Aug 15 at 17:46
  • 1
    As you say your code works, what is the problem? This looks like asking for a code review. Commented Aug 15 at 17:51
  • @EricPostpischil the assignment just mentioned a normal binary tree, that's why .. and yes I felt the code getting dodgy when I had to typecast the pointer to an integral type to use the | operator. I realized a major problem was not being able to scan the sub-trees separately. For e.g. if I write return search(head->left,key) ;, it does not wait for the right sub-tree to be scanned and just returns with the NULL value if it cannot find the key in the left sub-tree. Commented Aug 15 at 18:16
  • 1
    In or, doing ptr1 | ptr2 just produces a garbage value. Not like return ptr1 ? ptr1 : ptr2 which is closer to your intent. And, in search, there is no short circuit evaluation in the call to or. Both args must be resolved, so search always has to be called for both left and right subtrees (i.e. wasteful). Your final edit is the correct/best way to do things (glad you found it). Because, IMO, a single return is easier to debug (one breakpoint in gdb), I would do: Node *ret = search(head->left,key); if (ret == NULL) ret = search(head->right,key); return ret; Commented Aug 15 at 22:17

3 Answers 3

15

Your original approach only works if you can make these assumptions:

  • type long is wide enough for the conversion from Node* to long and back to yield the original pointer. This is true on 32-bit systems and 64-bit Unix systems, but not on 64-bit Windows where long is only 32-bit wide and pointers require 64 bits.
  • there is at most one Node* with the key value in the tree. Otherwise or-ing the bits of the matching pointers will produce a meaningless result that will invoke undefined behavior.
  • performance is not needed as this implementation walks the whole tree for every search with a linear complexity instead of the expected Log(N) complexity.

I wrote original because I have never seen anyone else do it this way! Sadly, it is not a good approach because it is inefficient, non portable and risky, but creativity is a good skill to hone in programming.

Assuming your binary tree is actually constructed as a binary search tree, the classic approach for a look up is to compare the key with the node value and recurse only on the left or right branch as needed:

typedef struct NODE {
    int key; 
    struct NODE *left, *right; 
} Node;

Node *search(Node *head, int key) {
    if (!head)
        return NULL; 
    if (key == head->key)
        return head; 
    if (key < head->key)
        return search(head->left, key);
    else
        return search(head->right, key);
}

This can be simplified a bit with a single terminal recursion:

Node *search(Node *head, int key) {
    if (!head || key == head->key)
        return head;
    else 
        return search(key < head->key ? head->left : head->right, key);
}

Note that an iterative version might perform better:

Node *search(Node *head, int key) {
    while (head && key != head->key) {
        head = key < head->key ? head->left : head->right;
    }
    return head; 
}

Finally, here is a cryptic alternative with fewer branches(*):

Node *search(Node *head, int key) {
    while (head && key != head->key) {
        head = (key > head->key)[&head->left];
    }
    return head; 
}

If the tree is not sorted (just an ordinary binary tree), you may have to explore the whole tree this way:

Node *search(Node *head, int key) {
    if (!head || key == head->key)
        return head; 
    Node *n = search(head->left, key);
    return n ? n : search(head->right, key);
}

(*) Please do not use this kind of code in production, this is just a cryptic example for educational purpose. Here is an explanation: in C a[b] is the same as *(a+b) which is commutative so you can swap the pointer and the index as well and write b[a] with the same effect. (key > head->key) has the value 0 or 1 depending on the comparison, &head->left is the address of the left structure member, adding 1 to this address gives the address of the right member (a tad abusive but should work). So this expression selects the left or right member based on the comparison without an actual branch on modern processors.

Sign up to request clarification or add additional context in comments.

13 Comments

@ikegami: good point, I commented this possibility and posted alternative code for this misleading case.
yes, what you wrote at the very end is what I would have written if I hadn't got confused. There is a part of answer I don't understand: how does this work ? head = (key > head->key)[&head->left];
I thought this oddity would trigger your curiosity (and get me a downvote)... in C a[b] is the same as *(a+b) which is commutative so you can swap the pointer and the index as well and write b[a] with the same effect. (key > head->key) has the value 0 or 1 depending on the comparison, &head->left is the address of the left structure member, adding 1 to this address gives the address of the right member (a tad abusive but should work). So this expression selects the left or right member based on the comparison.
I think it ishead->left that holds the address of the left structure member. head->left is already a pointer, so putting an & before it makes a double pointer. This thing is really cool, btw.
No, actually I'm sorry again. The & is necessary. *( (key>head->key) + &(head->left)). Say, 1st expression evaluates to 1, then it becomes *( address of head->right) i.e. head->right.
There's something else I would like to ask you, maybe unrelated to the question at hand. Is there any convention of declaring functions as int *func_name() and not int* func_name() in C ? I do get the convention of declaring pointer variables like int *var and not int* var, but I don't get the reason of putting the * before the function name. Putting it after the return type should make more sense because you know you are returning a pointer to that type, but I keep seeing people putting it before the function name.
@M.B.: it is a side effect of the C syntax rules: attaching the * to the typename leads to classic mistakes in declarators that declare more than one item: Node* left, right; declares left as a pointer and right as a Node structure. To underscore this counter-intuitive precedence, Node *left is preferred and makes it more natural for *right to follow. This convention was started by K&R in the early days of the language and applies to function declarations too for stylistic consistency. It is used in the C bible The C Programming Language and in the C Standards too.
@M.B.: note that the placement of operators in the declarators is consistent with the use of these operators in expressions: char buf[10] declares an array of 10 char, and buf[0] is the first char in the array. char *strcpy(char *dest, const char *src) declares the function as returning a pointer to char: *strcpy(buf, "") evaluates to the value of the char at the beginning of the buf array. It would be weird to add a space between the * and the function name in this case, and the function definition is consistent with the usage in expressions (however rare in this case)
|
4

Doing a bitwise OR on pointers is not a good idea. You should instead compare the current value against the key and use that to see if you should search the left or right subtree.

Node *search(Node *head, int key)
{
    if (!head)
        return 0; 
    if (head->key == key) {
        return head; 
    } else if (head->key > key) {
        return search(head->left, key)
    } else {
        return search(head->right, key);
    }
}

This assumes the tree is sorted (which it should be). If not, search one side, check the result, and if needed check the other side.

Node *search(Node *head, int key)
{
    if (!head)
        return 0; 
    if (head->key == key)
        return head; 
    
    Node *result = search(head->left, key);
    return result ? result : search(head->right, key);
}

2 Comments

This assumes a sorted tree, but the OP didn't say it was a sorted tree. That said, why would one use a binary tree that's not sorted.
@dbush: Some C compilers have an extension for this very purpose: return search(head->left, key) ?: search(head->right, key);
1

If you write your expression as below, both searches will be made, and you don't need to search if you return as soon as you find your node. This is only useful if you want to find every matching record.

If your tree has equal probability of having to explore any of the children of a node to search for a node, then you will have to select some based on some heuristic, and if the heuristic fails, only then you will need to search the other branch.

return or(search(head->left, key), search(head->right, key));

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.