Mitchell, Ch. 8
-
Instance-based learning does not store all training examples for testing purposes.
False
-
The most computationally intensive part of instance-based learning is testing.
True
-
K-means clustering delivers k clusters or k prototypes for a given dataset.
True
-
Give an advantage of k-nearest neighbors learning.
It has zero training time; all the work is done during classification.
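A minimal sketch of what this means in code, on hypothetical 2-D data (the points, labels, and k below are illustrative, not from Mitchell): "training" is just storing the data, and every distance computation happens at classification time.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # "Training" was only storing (X_train, y_train); all work happens here.
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

X = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y = np.array(["A", "A", "B", "B"])
print(knn_predict(X, y, np.array([2.0, 1.5]), k=3))  # -> A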
-
Give a disadvantage of k-nearest neighbors learning.
Storage is expensive: all the training data must be stored to use this method, so it cannot easily be applied to big data. It also works better for data with fewer than 20 attributes, because it uses distance to compute similarity between data points. A third problem: attributes must be normalized or scaled to the same range, otherwise attributes with a large range will overpower attributes with a smaller range.
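A small illustration of the scaling problem, with made-up numbers; min-max scaling to [0, 1] is one common fix.

import numpy as np

# Two attributes on very different scales: income in [0, 100000], age in [0, 100].
a = np.array([50000.0, 30.0])
b = np.array([51000.0, 80.0])

# Raw Euclidean distance is dominated by income; the 50-year age gap barely registers.
print(np.linalg.norm(a - b))                    # ~1001.2

# After min-max scaling by the (assumed) attribute ranges, age contributes its share.
ranges = np.array([100000.0, 100.0])
print(np.linalg.norm(a / ranges - b / ranges))  # ~0.5001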
-
What is meant by the curse of dimensionality?
In high-dimensional space the data becomes very sparse, and distance-based techniques become less meaningful, especially in classification.
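A quick empirical check of this claim (sample size and dimensions chosen arbitrarily): as the dimension grows, the nearest and farthest neighbors of a random query become nearly equidistant, so "nearest" carries less information.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 20, 200, 2000):
    X = rng.random((1000, d))   # 1000 uniform random points in [0, 1]^d
    q = rng.random(d)           # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    # The relative gap between farthest and nearest shrinks toward 0 as d grows.
    print(d, (dists.max() - dists.min()) / dists.min())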
-
Know how to explain and apply the k-NN classifier and k-means clustering.
-
Draw the Voronoi diagram for 1-Nearest Neighbor with Euclidean distance for the data D below. (Hint: a picture similar to the right-hand picture in Fig. 8.1, p. 232.) D = [1 1 Class1; 5 1 Class2; 5 5 Class3]
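Worked reasoning: the Voronoi edges are perpendicular bisectors between pairs of training points. Between (1,1) and (5,1) the bisector is the vertical line x = 3; between (5,1) and (5,5) it is the horizontal line y = 3; between (1,1) and (5,5) it is the line y = 6 - x. The diagram is therefore three rays meeting at (3,3): x = 3 for y <= 3 (Class1/Class2 border), y = 3 for x >= 3 (Class2/Class3 border), and y = 6 - x for x <= 3 (Class1/Class3 border). A small sketch to check the regions by direct 1-NN classification (probe points chosen arbitrarily, one per expected cell):

import numpy as np

D = np.array([[1.0, 1.0], [5.0, 1.0], [5.0, 5.0]])
labels = ["Class1", "Class2", "Class3"]

def nn1(point):
    # 1-NN: return the class of the single nearest training point.
    return labels[np.argmin(np.linalg.norm(D - point, axis=1))]

for p in ([2.0, 2.0], [4.0, 2.0], [4.0, 4.5]):
    print(p, "->", nn1(np.array(p)))   # Class1, Class2, Class3
# (3, 3) is equidistant from all three training points: all boundaries meet there.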
-
Perform k-means (k=3) clustering on the dataset below. Assume the three initial random prototypes/centroids are the top-left points.
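The dataset referred to here is not reproduced in these notes, so the sketch below runs plain k-means (Lloyd's algorithm) on hypothetical 2-D points, taking the three "top-left" points as the initial centroids:

import numpy as np

def kmeans(X, centroids, n_iters=10):
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        centroids = np.array([X[assign == j].mean(axis=0)
                              for j in range(len(centroids))])
    return centroids, assign

# Hypothetical data: three visible groups (top-left, bottom-left, top-right).
X = np.array([[1, 8], [2, 8], [1, 9],                 # top-left group
              [1, 1], [1, 2], [2, 1],                 # bottom-left group
              [8, 8], [8, 9], [9, 8]], dtype=float)   # top-right group
init = X[:3].copy()    # the three top-left points, per the exercise setup
centroids, assign = kmeans(X, init)
print(centroids)   # roughly (1.3, 1.3), (8.3, 8.3), (1.3, 8.3)
print(assign)      # one cluster per group: [2 2 2 0 0 0 1 1 1]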