Next up in our Deep Dive into Data Visualization series comes histograms! As mentioned in our other blog posts on scatter plots and on box plots, data visualization is an instrumental part of many roles, including data scientists, data analysts, machine learning engineers, business analysts, marketers, product analysts and so on.

I'll be the first to admit it: raw data is not the most fun thing to look at, and worse yet, it is almost impossible to draw conclusions or make recommendations from raw data alone. Humans are largely visual beings who process and remember images much better and faster than text; hence, data visualization allows raw data to come to life and communicate with us in our language of choice: via pretty pictures, basically.

So without further ado, let's get into histograms: what they are, how to read them, when and how to use them, how to make them in Python, and finally, the limitations of histograms.

Histograms are visualizations that allow us to see how the values of our data points are distributed. They show us which ranges contain a lot of data and which are more sparse.

Histograms normally consist of an x-axis and a y-axis, and are made up of a series of bars, also called bins. The y-axis of a histogram always shows a measure of frequency. This is either an absolute count (literally a count of how many times a value appears), or a relative count (a count taken relative to how many data points there are in the data set).

In this case, I've generated some data to show a (very basic) possible distribution of daily steps taken for a set of students at university.

Let's take a look at these bins in more detail. We'll first give an abstract definition, and then we'll consider the above example.

Each bin has a starting point, a width (or a size), and an associated count that's represented by the height of the bin. The bin's count (its value on the y-axis) is the number of values that fall within the region that the bin covers. That region starts at the bin's starting point and goes up to, but does not include, the starting point plus the width of the bin.

Also note that the bins in a histogram will generally all have the same width (size); however, this may not always be true.

Alright, so let's use the above graph to give a practical example of each of the above terms.
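To make the idea of a bin concrete in code, here is a minimal sketch in plain Python. It assumes equal-width bins, and it treats each bin as covering its starting point up to, but not including, the starting point plus the width. The function names `bin_counts` and `relative_counts` and the step values are made up for illustration; this is not the code or the data behind the graph in this post.

```python
def bin_counts(values, start, width, n_bins):
    """Return the absolute count for each of n_bins equal-width bins.

    Bin i covers the half-open range
    [start + i * width, start + (i + 1) * width).
    Values that fall outside every bin are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        # Floor division finds which bin the value lands in.
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def relative_counts(counts):
    """Turn absolute counts into fractions of the total number counted."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Hypothetical daily step counts, invented for this example.
steps = [1200, 2500, 3100, 4000, 4999, 5000, 7300, 9999]

# Four bins of width 2500 starting at 0:
# [0, 2500), [2500, 5000), [5000, 7500), [7500, 10000)
absolute = bin_counts(steps, start=0, width=2500, n_bins=4)
relative = relative_counts(absolute)

print(absolute)  # [1, 4, 2, 1]
print(relative)  # [0.125, 0.5, 0.25, 0.125]
```

Notice that 2500 is counted in the second bin and 5000 in the third: a value sitting exactly on a boundary belongs to the bin that starts there, not the one that ends there. Plotting libraries may handle the very last edge differently, so it is worth checking the documentation of whichever one you use.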