K-means clustering ~ Research Mining

Saturday, 19 October 2013

K-means clustering

October 19, 2013 No comments

K-means Is one of the simplest unsupervised learning algorithms that solve the well known clustering problem. The k-means algorithm takes the inputs parameter, k, and partitions a set of n objects into k cluster so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured in regard to the mean value of the objects in a cluster, which can be viewed as the cluster’s centroid or center of gravity.

Given K, the k-means algorithm is implemented in four steps:

Partition objects into k nonempty subsets
Compute seed points as the centroids of the clusters of the current partition(the centroid is the center, i.e., mean point, of the cluster)
Assign each object to the cluster with the nearest seed point
Go back to step 2, stop when no more new assignment

Finally this algorithm aims at minimizing an objective function, in this case a mean squared error function is calculated as:
\[E=\sum\limits_{i=0}^{k}{{{\sum }_{p\in {{c}_{i}}}}\left| p-{{m}_{i}} \right|}\]

where E is the sum of the square error for all objects in the data set; p is the point in space representing a given object; and mi is the mean of cluster Ci (both p and mi are multidimensional).

Algorithm: k-means.

The k-means algorithm for partitioning, where each cluster’s center is represented by the mean value of the objects in the cluster.

Input:

K: the number of clusters,

D: a data set containing n objects.

Output: A set of k cluster

Advantages: with a large number of variables, k-means may be computationally faster than hierarchical clustering (if k is small). K-means may produce tighter cluster than hierarchical clustering, especially if the cluster are globular.

Disadvantages:

· Difficult in comparing the quality of the clusters produced.

· Applicable only when mean is defined.

· Need to specify k, the number of clusters, in advance.

· Unable to handle noisy data and outliers.

· Not suitable to discover clusters with non-convex shapes.

Research Mining

Saturday, 19 October 2013

K-means clustering

0 comments:

Post a Comment

Comment

Recent

BTemplates.com

Search This Blog

Blog Archive

Labels

Translate

Report Abuse

About Me

Featured post

Mahalanobis Distance using R code

Weekly

Labels

Blog Archive

Labels

Blogroll

About