BST and Trees

Random Binary Search Trees and Treeps

RANDOM BINARY SEARCH TREES

All basic operations on a binary search tree run in O(h) time, where h is the height (or depth) of the tree.

A randomly built binary search tree on n distinct keys is a binary search tree that arises from inserting the keys in random order into an initially empty tree, where each of the n! permutations of the input keys is equally likely. Therefore, the construction of the actual binary search tree is done by the standard "insertion" method. The only difference is that the input is entered at random. This works only if all keys are initially available and no additional insertions/deletions take place on the tree.

What is your intuition about the expected maximum depth (~height of root) of a randomly constructed binary search tree?

The depth of a particular node is defined to be the length of the path from the root to the node in question. Therefore, it is obvious from this definition, that there exists a relation between the depth of a node and time. Also, the height of a particular node is the maximal distance from that node to any of the leaves in its subtree. The height of a tree is the height of the root of the tree, and it is equal to the maximum depth of any node in the tree. This proves that there is a connection between height and depth. Now, if D_n is the depth of the n^th inserted node, the following shows that the expected value of D_n is not more than 2+2log(n) where log is the natural logarithm.

Let us start by constructing a binary search tree with the following, randomly chosen, list of integers:

3 2 14 15 12 5 6 11 9 10 8.
Using this input sequence, we want to determine which keys (integers) are on the path from the root to the last node. In this example, we arbitrarily chose the number 8 to be the last node. We will denote the last node (8) as x_n.

First, observe that every node to the left of x_n must be less than x _n and every node to the right of x_n must be greater than x_n. This property must hold because it is the definition of a binary search tree. Therefore, let's divide the random list into two sublists:

LEFT OF X_n -- all numbers in list that are less than x_n.
RIGHT OF X_n -- all numbers in list that are greater than x_n.

Although this may seem obvious, it is important to note that a subset of a random permutation is, in itself, a random permutation. Table 1 (below) shows the results of the divisions.

RANDOM LIST	3	2	14	15	12	5	6	11	9	10	8
LEFT OF X_n	3	2	-	-	-	5	6	-	-	-	-
RIGHT OF _n	-	-	14	15	12	-	-	11	9	10	-

Table 1. Separation of keys into two sublists about the key x_n.

Now, let's go through the list one node at a time.

The first node is the number 3. This becomes the root of our binary search tree.
Next is the number 2. This number is not part of the path from the root to the node because the node in question (8) is larger than 3 and is therefore to the right of 3. 2 on the other hand is to the left of 3, so it does not affect our depth analysis.
Next is the number 14. 14 is to the right of 3 and is therefore on the path to x_n.
Next is the number 15. This number is not on the path because it is to the right of 14 and the node in question (8) is to the left of 14. ...

This process is continued until we get to the end of the list. Therefore, the above sequence of random numbers corresponds to the following binary search tree:

Nodes on the path that are to the left of x_n make up the up-records to x_n and those that are on the path to the right of x_n make up the down-records to x_n.

In this case, it is easy to find the depth of x_n by just looking at the sublists in Table 1 (above). Analyze the left sublist first. Go through this list and circle all keys that are on x_n's path. These keys are 3, 5, and 6. Do the same with the right sublist. These keys are 14, 12, 11, and 9. We now have the following:

If we count the number of circled keys, this will tell us the depth of our node (the node with the number 8). In this case, we have 7 circled keys. Therefore, the depth of our node, D ₈, is 7.

Mathematically, the expected number of up-records in a random permutation is equal to (count probabilities for each one of the numbers):

Prob ( X₁ is a record ) + Prob ( X₂ is a record ) +. . . + Prob ( X_n is a record )
= 1 + 1/2 + 1/3 + 1/4 + . . . . . + 1/n
* Notice that this series is a harmonic series ! Why?

Suppose we have a list of rainfall figures for a hundred years. How many record-breaking falls of rain do you expect have taken place over that period? We assume that the rainfall figures are random, in the sense that the amount of rain in any one year has no influence on the rainfall in any subsequent year.

The first year was undoubtedly a record year. In the second year, the rain could equally likely have been more than, or less than, the rainfall of the first year. So there is a probability of 1/2 that the second year was a record year. The expected number of record years in the first two years of record-keeping is therefore 1 + 1/2. Go on to the third year. The probability is 1/3 that the third observation is higher than the first two, so the expected number of record rainfalls in three years is 1 + 1/2 + 1/3. Continuing this line of reasoning leads to the conclusion that the expected number of records in the list of observations is

= 1 + 1/2 + 1/3 + 1/4 + . . . . . + 1/n

< 1 + log(n) ( This is the natural Log)

Therefore, the number of up-records in the set "LEFT OF X_n" is not more than 1+ log (n). Similarily, the number of down-records in the set "RIGHT OF X_n" is not more than 1+ log (n) also.
This implies that the expected value of D_n is less than 2 ( 1 + log(n) ) .
Thus, Expected Value of D_n < ( 2 + 2log(n) )

*The 1+log(n) bound (above) was obtained by a method of summation called "comparison with an integral".

Assume all keys are not initially available and insertions/deletions may occur throughout the lifetime of the BST. Is it possible to still use a randomized BST and take advantage of its height balancing properties?

Treeps

A node to be inserted in a Treep is assigned a random priority. Nodes are inserted into the treep using both the bst property on keys and the priority of the node as in a min heap - thus the name.

So think of a treep in the following way. If nodes, { x1, x2, .., xn}, are inserted, then the resulting treep is a normal binary search tree with the node order given by the randomly chosen priorities. Just as with AVL and Red-Black trees, additional information is maintained for height balancing. See example trace from lecture

Modifications 2/23/06 D. Byrnes

Copyright © 1997, Gaetano Zullo, Gabriel Kevorkian, Chris Foote. All rights reserved. Reproduction of all or part of this work is permitted for educational or research use provided that this copyright notice is included in in any copy.

Random Binary Search Trees and Treeps

RANDOM BINARY SEARCH TREES

Table 1. Separation of keys into two sublists about the key xn.

Treeps

Table 1. Separation of keys into two sublists about the key x_n.