Rachit Kumar Saxena, Manager-Editorial
What is Cluster Analysis?
Cluster analysis is a method of reasoning in which samples are placed into similar “groups” or “clusters” so that conclusions can be drawn about them. This grouping is called clustering. Various criteria can be used to form clusters, and clustering is an integral part of data management in statistical analysis.
In a population of n individuals, studying the characteristics of every individual separately is often not feasible. If the n individuals are divided into groups, their characteristics become easier to understand.
A good cluster plot shows high internal homogeneity (points within the same cluster are similar) and high external heterogeneity (points in different clusters are dissimilar).
In the accompanying cluster plot, the colours represent different sets of data, and the relatedness of two sets can be judged from how close their clusters lie. The pink and blue clusters are closer to each other, and hence more closely related, than either is to the green cluster.
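To make these ideas concrete, the sketch below measures both properties with Euclidean distance. Using Euclidean distance is an assumption, since other distance measures are also used. The coordinates for the pink, blue and green clusters are hypothetical values chosen to match the description above. They are not data read from the plot. A cluster's spread is the mean distance of its points from its centroid, and it measures internal homogeneity. The distance between two centroids measures how related two clusters are.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Mean of the points, coordinate by coordinate."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


# Hypothetical coordinates (not read from the figure): pink and blue close together, green far away
clusters = {
    "pink": [(2.0, 2.0), (2.5, 1.8), (2.2, 2.4)],
    "blue": [(3.5, 2.5), (3.8, 2.9), (3.3, 2.7)],
    "green": [(9.0, 8.0), (9.4, 8.3), (8.8, 8.6)],
}
centroids = {name: centroid(pts) for name, pts in clusters.items()}

# Internal homogeneity: average distance of a cluster's points from its own centroid
spread = {
    name: sum(dist(p, centroids[name]) for p in pts) / len(pts)
    for name, pts in clusters.items()
}

# Relatedness between clusters: distance between their centroids (smaller = more related)
gaps = {(a, b): dist(centroids[a], centroids[b]) for a, b in combinations(clusters, 2)}

print({name: round(s, 2) for name, s in spread.items()})
print({pair: round(g, 2) for pair, g in gaps.items()})
```

Each cluster's spread here is about 0.3. The pink and blue centroids are about 1.4 apart, while the green centroid is more than 7 units from both. This is the pattern described for the plot.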
Types of Cluster Analysis
1. Hierarchical Cluster Analysis
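Here, clusters are built up step by step. In agglomerative clustering, each data point starts as its own cluster. The two most similar clusters are then merged repeatedly, so that smaller clusters join into one large agglomerated cluster. The opposite approach is divisive clustering. It starts with a single cluster containing all the data and splits it repeatedly.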
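The agglomerative procedure can be sketched as follows. This version assumes one-dimensional data and single linkage, where the distance between two clusters is the distance between their closest members. Other linkage rules, such as complete or average linkage, are equally common. The function names and the data are hypothetical.

```python
from itertools import combinations


def single_linkage(a, b):
    """Distance between two clusters of 1-D points: the closest pair of members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(values):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [[v] for v in values]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest single-linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j leaves index i unchanged
        clusters[i] = merged
    return merges


values = [1.0, 1.5, 5.0, 5.2, 12.0]  # hypothetical 1-D measurements
for step, (left, right) in enumerate(agglomerate(values), start=1):
    print(f"step {step}: merge {left} with {right}")
```

The merges run from the most similar pair (5.0 and 5.2) up to one cluster containing everything. For real work, `scipy.cluster.hierarchy.linkage` offers a full implementation with a choice of linkage methods.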
2. Centroid-Based Clustering
Here, each cluster is built around one central point (the centroid), and each data point is assigned to the cluster whose centroid is nearest. The K-means method is often used for this type of clustering.
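The K-means idea can be sketched with Lloyd's algorithm. Each point is assigned to its nearest centroid, each centroid is moved to the mean of its points, and the two steps are repeated until nothing changes. The sketch below assumes Euclidean distance, K = 2, and hand-picked starting centroids. Practical implementations usually choose the starting centroids randomly or with k-means++. The data and function name are hypothetical. Because a centroid is a mean, it need not coincide with any actual data point.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means: alternate assignment and update steps until centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[k]
            for k, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data and hand-picked starting centroids (K = 2)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

scikit-learn's `sklearn.cluster.KMeans` is a widely used library version of the same method.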
3. Distribution-Based Clustering
Objects that appear to come from the same probability distribution (for example, the same normal distribution) are placed in a single cluster. This type of clustering can help analyse correlation and dependence between attributes.
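A common distribution-based method fits a Gaussian mixture by expectation-maximisation (EM), and the sketch below uses that approach. It assumes each cluster follows a one-dimensional normal distribution. Each point is then placed in the component most likely to have generated it. The data, starting values and function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def gmm_1d(xs, mus, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture by expectation-maximisation (EM) and label each point."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k = len(mus)
    for _ in range(n_iter):
        # E-step: probability ("responsibility") that each component generated each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component's weight, mean and variance
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a zero variance
    # Hard clustering: assign each point to its most probable component
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mus, variances, weights, labels


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.8, 5.1, 4.9]  # hypothetical measurements
mus, variances, weights, labels = gmm_1d(xs, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 2) for m in mus], labels)
```

The fitted means land near 1.0 and 5.0, and the points split into the two groups. In practice, `sklearn.mixture.GaussianMixture` handles multiple dimensions and full covariance matrices.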
4. Density-Based Clustering
Here, clusters are defined as areas of higher density than the rest of the data set. Objects in sparse areas are treated as noise or border points, which helps eliminate out-of-range data.
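DBSCAN is one widely used density-based algorithm. The sketch below is a simplified version of it, with parameter names and data that are hypothetical. A point with at least `min_pts` neighbours within distance `eps` (counting itself) is a core point. Clusters grow outward from core points. A non-core point reached from a cluster becomes a border point, and anything left over is labelled noise (-1).

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        # All points within distance eps of point i, including i itself
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:  # not dense enough: mark as noise for now
            labels[i] = -1
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:  # noise reachable from a core point is a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


# Hypothetical 2-D data: two dense groups and one isolated point
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
          (5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8),
          (9.0, 1.0)]
print(dbscan(points, eps=0.6, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point (9.0, 1.0) is labelled -1, which is how this approach separates out-of-range data. scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`) is a full implementation.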
Weightage of Clustering
This topic is mainly meant to make Class 12 students aware of this method of grouping data. It carries indirect weightage in exams: few practical questions are asked, but definitions may be. Expect about 1 or 2 marks from this part, although the concept also appears in analytics questions in national entrance exams. Later on, it is applied in biostatistics and research.
Illustrated Examples on Clustering
1. Four sets of data, A, B, C and D, are represented in a cluster plot. Cluster B lies inside cluster D, and the other two clusters are far apart. What can you conclude?
Solution.
- Clusters B and D can be agglomerated (merged), which shows that they have similar properties.
- There are effectively three distinct clusters in the cluster plot, not four.
- The difference between clusters B and D is that B has all the properties of D, but the points of D that lie outside B have different properties.
2. Bacterial strain ‘a’ is resistant to the antibiotics Amp and Tet. Strain ‘b’ is resistant only to Amp, while strain ‘c’ is resistant to Tet and Azi. How would you represent this in a cluster plot?
Solution.
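The strain ‘b’ cluster can be drawn within, or very close to, the strain ‘a’ cluster, because b's only resistance (Amp) is also one of a's. The strain ‘c’ cluster will overlap a little with the strain ‘a’ cluster, because the two share only Tet resistance.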
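The closeness of the strains can be quantified with Jaccard similarity, which is an assumed choice among several set-similarity measures. It divides the number of shared resistances by the total number of distinct resistances of the pair. The sketch also checks whether one profile is contained in the other. The function name is hypothetical.

```python
from itertools import combinations

# Resistance profiles from the example, as sets of antibiotics
strains = {
    "a": {"Amp", "Tet"},
    "b": {"Amp"},
    "c": {"Tet", "Azi"},
}


def jaccard(s, t):
    """Shared resistances / all resistances of the pair: 1 = identical, 0 = nothing in common."""
    return len(s & t) / len(s | t)


for x, y in combinations(strains, 2):
    s, t = strains[x], strains[y]
    nested = s <= t or t <= s  # one profile contained in the other
    print(f"{x}-{y}: similarity {jaccard(s, t):.2f}, nested: {nested}")
```

Strains a and b score 0.5 and are nested, so b sits within a. Strains a and c score about 0.33, which gives a small overlap. Strains b and c score 0, so they do not overlap at all.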
3. Data points i, ii, iii, iv and v were plotted. The first four formed a distinct cluster, whereas v was found far away. What could be the reasons?
Solution.
Data point v is likely an outlier or a border value, or it could be an error in recording.
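One common numerical check for a point like v is the modified z-score. It measures each value's distance from the median in units of the median absolute deviation (MAD) and flags scores above 3.5, which is a rule-of-thumb cut-off. The measurements below are hypothetical. The check only flags v as unusual. Deciding whether v is a genuine outlier or a recording error still requires going back to the source of the data.

```python
from statistics import median


def modified_z_outliers(values, threshold=3.5):
    """Flag names whose modified z-score (based on the median absolute deviation) exceeds threshold."""
    med = median(values.values())
    mad = median(abs(x - med) for x in values.values())
    if mad == 0:
        return []  # MAD is zero (many values equal the median), so the score is undefined
    return [name for name, x in values.items() if 0.6745 * abs(x - med) / mad > threshold]


data = {"i": 2.1, "ii": 2.4, "iii": 1.9, "iv": 2.2, "v": 9.5}  # hypothetical measurements
print(modified_z_outliers(data))  # ['v']
```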
FAQs on Cluster Analysis
Q: Define Cluster Analysis
Q: What does a Cluster Plot of similar attributes look like?
Q: What are outliers?
Q: What type of clustering method is K-means? Is the centroid value a part of the data?
Q: What is Cluster Analysis used for?