Histogram Explained with Examples, Types, and More
What is a Histogram?
A histogram is a chart that groups numerical data into ranges and draws one vertical bar per range, with the bars placed side by side. The height of each bar shows how many values (the frequency or count) fall within that range. It is a visual summary of how the values in a dataset are distributed.
Components of a histogram include:
- Bins or Intervals - Bins in a histogram represent the intervals or ranges into which the data is divided. Each bar in the histogram shows the frequency of data points within a particular bin.
- Bars - The bars are the vertical columns representing the frequency or count of data points falling within each bin. The height of each bar corresponds to the frequency of data points in that bin.
- X-axis and Y-axis - The x-axis represents the intervals or bins, while the y-axis represents the frequency or count of data points within each interval.
- Title and Labels - A histogram should have a title that describes the data being represented. Additionally, it should have labels for the x-axis and y-axis to provide context and clarity.
Basic Histogram Example
Given this basic histogram definition, let’s look at one example.
Let's say you have a list of exam scores for a class of students as a dataset.
70, 85, 92, 78, 89, 92, 95, 78, 85, 90
Each number (70, 85, 92, etc.) is a value. When constructing a histogram for these exam scores, you might group them into intervals like 70-79, 80-89, and 90-99. The histogram would then show how many students scored in each of these score ranges.
So, the histogram helps you visualise the distribution of scores and how frequently different score ranges occur in the dataset.
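The grouping described above can be sketched in a few lines of Python. This is only an illustration, not part of the spreadsheet walkthrough: the names `bin_counts` and `BIN_WIDTH` are made up for this example, and a bin width of 10 is assumed so that the ranges match 70-79, 80-89, and 90-99.

```python
from collections import Counter

scores = [70, 85, 92, 78, 89, 92, 95, 78, 85, 90]
BIN_WIDTH = 10  # assumed width, giving the 70-79, 80-89 and 90-99 ranges


def bin_counts(values, width):
    """Count how many values fall into each fixed-width bin.

    Each bin is identified by its lower edge, e.g. 70 for the 70-79 bin.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


counts = bin_counts(scores, BIN_WIDTH)

# Print one row per bin; each '#' stands for one student.
for start, count in counts.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count} ({count})")
```

Each printed row plays the role of one bar, so the 90-99 row is the longest because four students scored in that range.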
How to Create a Histogram Step-by-Step
How would you make this histogram of exam scores for a class of students?
Using the example above, let's create a histogram in Google Sheets. The same steps work in MS Excel.
Step 1: Open Google Sheets
Open Google Sheets and create a new sheet or use an existing one.
Step 2: Enter Data
In a column, enter the exam scores. Let's assume you enter them in column A starting from A1.
Exam Scores |
70 |
85 |
92 |
78 |
89 |
92 |
95 |
78 |
85 |
90 |
Step 3: Create Bins
In another column, create bins (score ranges). For example, you can use the following bins.
Score Ranges |
70-79 |
80-89 |
90-99 |
Step 4: Count Scores in Bins
In the adjacent column, use the COUNTIFS function to count how many scores fall into each bin. This assumes the scores are in column A and the bins are in column B, so the counts go in column C.
Type or paste the following formulas, one per row, next to the matching bin.
=COUNTIFS(A:A,">=70",A:A,"<=79")
=COUNTIFS(A:A,">=80",A:A,"<=89")
=COUNTIFS(A:A,">=90",A:A,"<=99")
Results will show this way, corresponding to each row.
3 |
3 |
4 |
You could additionally look into our guide to COUNTIF in Excel
Step 5: Create a Bar Chart
Select the data in the bins and the corresponding counts. Then, go to Insert > Chart. Choose 'Chart type' as 'Column chart' to get vertical bars (in Google Sheets, the 'Bar chart' type draws the bars horizontally).
Adjust the chart settings if needed, such as giving it a title, labelling axes, etc.
Now, you've successfully created a histogram in Google Sheets showing how many students scored in each score range.
As you can see, the y-axis shows the frequency or count of the exam scores that fall into each bar/interval/bin. The score ranges on the x-axis are the type of data you want to measure, where 70 to 79 is one bin, and 80 to 89 is another.
Histogram in Statistics for Analysis
A histogram in statistics is a visual representation of the frequency distribution of a dataset. It is particularly useful for displaying the central tendency and the spread of continuous data. The horizontal axis of the histogram represents the continuous data values grouped into the specified bins. The vertical axis indicates the frequency of occurrence for each bin.
By examining the shape and pattern of the histogram, one can easily identify the central tendency of the dataset, observe its distribution, and discern any patterns or trends present.
Now, let’s focus on learning the key areas mentioned with regard to statistics (and the same example of exam scores in a class from above).
Frequency Distribution
Frequency distribution refers to how often each value, or range of values, appears in the dataset.
The frequency distribution of the exam scores in this histogram shows that:
- 3 students scored between 70 and 79
- 3 students scored between 80 and 89
- 4 students scored between 90 and 99
The histogram displays the distribution of exam scores, showing how many students achieved scores within specific ranges (bins or intervals). It indicates that more students fall into certain score ranges than others.
To illustrate the concept further, the main components of a frequency distribution are:
Data Points: These are the individual pieces of information in a dataset. For instance, the data points are 70, 85, 92, 78, 89, 92, 95, 78, 85, 90. In the context of frequency distribution, these values are grouped or categorised.
Frequency: This is the count of how many times each value or range of values occurs in the dataset. It represents how often each category appears. For instance, the score 78 appears twice, and the score 85 appears twice.
Bins or Intervals: In the context of continuous data, values are often grouped into intervals or bins. The frequency distribution then shows how many values fall into each interval. For instance, the bins could be defined as 70-79, 80-89, and 90-99.
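As a rough sketch of the frequency idea for individual values, Python's `collections.Counter` can tally how often each score occurs. The variable names `value_counts` and `share` are illustrative only.

```python
from collections import Counter

scores = [70, 85, 92, 78, 89, 92, 95, 78, 85, 90]

# Frequency of each individual score, listed in ascending order.
value_counts = dict(sorted(Counter(scores).items()))

for score, count in value_counts.items():
    share = count / len(scores)  # relative frequency of this score
    print(f"{score}: {count} ({share:.0%})")
```

The counts always add up to the number of data points, which is a quick way to check that no value was missed.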
If you want to learn more about it, you could take the free Frequency Distribution course from Great Learning. This short course covers the basics of the topic within an hour.
Central Tendency
A measure of central tendency is a statistic that represents the central or typical value of a dataset.
The main measures of central tendency are the mean, median, and mode.
Let's use the example dataset of exam scores along with its frequency distribution:
Example Dataset: 70,85,92,78,89,92,95,78,85,90
Mean (Average)
The mean is calculated by adding up all the values and dividing by the total number of values.
Example: Applying the formula for the mean to the ten scores,
(70+85+92+78+89+92+95+78+85+90)/10 = 854/10 = 85.4
Go check out our blog on the Mean Formula too!
Median
The median is the middle value when the data is arranged in ascending or descending order.
If there's an even number of values, the median is the average of the two middle values.
Example: Arranging the scores in ascending order gives
70, 78, 78, 85, 85, 89, 90, 92, 92, 95. With ten values, the two middle values are 85 and 89, so the median is (85 + 89) / 2 = 87.
Mode
The mode is the value that appears most frequently in the dataset.
Example: This dataset has three modes, 78, 85, and 92. Each appears twice, more than any other score, so the dataset is multimodal.
Central tendency measures are often used in statistical models. For example, the mean is a key parameter in many statistical models, and understanding its value helps in making predictions based on the model.
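The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and `statistics.multimode` (available from Python 3.8) is used because it returns every value tied for the highest count rather than just one.

```python
import statistics

scores = [70, 85, 92, 78, 89, 92, 95, 78, 85, 90]

# Mean: sum of the values divided by how many there are.
mean_score = statistics.mean(scores)

# Median: with an even number of values, the average of the two middle ones.
median_score = statistics.median(scores)

# Modes: all values that share the highest frequency, sorted for readability.
modes = sorted(statistics.multimode(scores))

print("mean:", mean_score)
print("median:", median_score)
print("modes:", modes)
```

Using `multimode` instead of `statistics.mode` is a deliberate choice here, since `mode` reports only a single value even when several scores are tied.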
You may check courses like Measures of Central Tendency, another useful and free course from Great Learning.
Types of Histograms
Moving on, let's look at the types of histograms.
The shape and characteristics of histograms can vary based on the distribution of the data. Different types of distributions result in different types of histograms.
Normal Distribution
When the data is symmetrically distributed around the mean, the histogram will have a bell-shaped curve with the highest frequency at the mean and symmetrical tails on both sides.
Skewed Distribution
In a skewed distribution, the data is not symmetric and tends to have a longer tail on one side. A positively skewed distribution has a tail on the right side, while a negatively skewed distribution has a tail on the left side.
Bimodal Distribution
Bimodal distributions have two distinct peaks, indicating the presence of two different groups or conditions within the data.
Uniform Distribution
In a uniform distribution, all values have approximately the same frequency, resulting in a rectangular-shaped histogram.
Exponential Distribution
An exponential distribution often results in a histogram with a rapidly decreasing frequency as the values increase, creating a skewed, right-tailed shape.
Understanding the type of distribution in the dataset is essential for selecting the appropriate analysis techniques and drawing meaningful conclusions from the histogram. Each type of distribution provides valuable information about the underlying data and can offer insights into the nature of the variables being studied.
When interpreting histograms, it's important to consider the specific characteristics of the distribution and how they impact the shape and appearance of the histogram. This understanding allows for a more comprehensive analysis of the dataset.
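To get a feel for how some of these shapes differ, the sketch below draws random samples from a normal, a uniform, and an exponential distribution and compares the mean with the median. Everything here is an assumption made for illustration: the sample size, the distribution parameters, and the helper `skew_hint`, which is only a rough indicator (mean minus median, divided by the standard deviation) and not a formal skewness statistic. It does not detect bimodal data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
N = 10_000      # assumed sample size

# Illustrative parameters, chosen only to produce the three shapes.
samples = {
    "normal": [random.gauss(50, 10) for _ in range(N)],
    "uniform": [random.uniform(0, 100) for _ in range(N)],
    "exponential": [random.expovariate(1 / 20) for _ in range(N)],
}


def skew_hint(values):
    """Return (mean - median) / standard deviation as a rough skew indicator.

    Values near 0 suggest a symmetric shape; clearly positive values
    suggest a longer tail on the right.
    """
    spread = statistics.pstdev(values)
    return (statistics.mean(values) - statistics.median(values)) / spread


hints = {name: skew_hint(values) for name, values in samples.items()}

for name, hint in hints.items():
    print(f"{name:12s} skew hint: {hint:+.2f}")
```

The normal and uniform samples should give a hint close to zero because they are symmetric, while the exponential sample should give a clearly positive value, matching its right-tailed histogram.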
Parting Thoughts
We hope this helped you uncover the secrets hidden in your data with histograms. You now have an introduction to the core concepts of this powerful tool, from what it is to how to make one in easy steps.
Aquib is a seasoned wordsmith, having penned countless blogs for Indian and international brands. These days, he's all about digital marketing and core management subjects - not to mention his unwavering commitment ...