Statistical data types classify individual data points so that statisticians can apply measurements properly and make sound assumptions about the data. Alternatively, you can think of data types as ways of categorizing the different kinds of variables.
Knowledge of these data types is essential for Exploratory Data Analysis (EDA), which is one of the important aspects of machine learning. Mixing up these data types can lead to flawed analysis and, eventually, wasted time and effort.
For those unaware, data scientists use EDA to study the main characteristics of data sets, often with data visualization methods. It helps them look for patterns and abnormalities, test hypotheses, and verify assumptions.
Once you gain a sound knowledge of these data types, you can recognize them for what they are (qualitative or quantitative) and measure them properly.
This article will give you a deep dive into how to approach these data types and what lies behind them.
Data Types in Statistics
There are two broad classes of data in statistics: quantitative data and qualitative data. This highest level of classification comes from the fact that data are either measured or recorded as an observed feature of interest.
Qualitative data, also referred to as categorical data, describe an observed characteristic that cannot be measured with numbers. Examples: race, age group, gender, origin, and so on. Even when such data are coded as numbers (for instance, 1 for male and 0 for female), the numbers are only labels and carry no numerical meaning.
Quantitative data, on the other hand, tell us about quantities, that is, things we can count or measure, and so they are expressed as numbers. They are also known as numerical data and lend themselves to arithmetic and statistical analysis. Examples: height, volume of water, distance, and so on.
We can further subdivide qualitative and quantitative data into four subtypes: nominal data, ordinal data, interval data, and ratio data.
Qualitative (Categorical) Data Types
Qualitative data can be subdivided into nominal and ordinal data types. While both these types of data can be classified, ordinal data can be ordered as well.
Nominal Data
Nominal data represent discrete categories that have no inherent order and cannot be measured. They are used to label variables without providing any quantitative value, and they have no meaningful zero.
Some examples of nominal data include:
- Gender (Male, Female)
- Hair color (Black, Brown, Gray, etc.)
- Nationality (Indian, American, Chinese, etc.)
Data scientists use one-hot encoding to transform nominal data into numeric features.
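As a minimal sketch of one-hot encoding, the snippet below turns each hair-color label into a vector of zeros with a single 1. It uses only the Python standard library. The helper name `one_hot` and the sample values are illustrative assumptions, and data science libraries ship their own encoders.

```python
def one_hot(values):
    """Map each label to a 0/1 vector containing a single 1 (hypothetical helper)."""
    # Sorting only fixes the column order; it does not imply any ranking.
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        encoded.append(row)
    return categories, encoded


hair_colors = ["Black", "Brown", "Gray", "Brown"]  # made-up sample
categories, encoded = one_hot(hair_colors)

print(categories)
for label, row in zip(hair_colors, encoded):
    print(label, row)
```

Each category gets its own column, so no category ends up looking "larger" than another, which is what a single integer code would wrongly suggest for nominal data.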
The only logical operations you can apply to nominal data are equality and inequality, which you can also use to group them. The descriptive statistics available for nominal data include frequencies, proportions, percentages, and the mode, which is the only meaningful measure of central tendency here. To visualize nominal data, you can use a pie chart or a bar chart.
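The descriptive statistics for nominal data can be sketched with `collections.Counter` from the Python standard library. The sample list of nationalities is made up for illustration.

```python
from collections import Counter

# Hypothetical sample of a nominal variable.
nationalities = ["Indian", "American", "Chinese", "Indian", "Indian", "American"]

# Frequencies: grouping relies only on equality between labels.
counts = Counter(nationalities)
total = sum(counts.values())

# Proportions and percentages follow directly from the frequencies.
proportions = {label: count / total for label, count in counts.items()}
percentages = {label: 100 * share for label, share in proportions.items()}

# The most frequent category is the only sensible "central" value.
most_common_label, most_common_count = counts.most_common(1)[0]

print(counts)
print(percentages)
print(most_common_label, most_common_count)
```

Nothing in this snippet sorts or averages the labels, because neither operation is defined for nominal data.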
Ordinal Data
Ordinal values represent discrete as well as ordered units. Unlike nominal data, here the order matters. However, the distance between adjacent categories is not known or consistent. And, similar to nominal data, ordinal data have no meaningful zero.
Examples of ordinal data
- Opinion (agree, mostly agree, neutral, mostly disagree, disagree)
- Socioeconomic status (low income, middle income, high income)
Data scientists use label encoding to transform ordinal data into a numeric feature.
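A minimal sketch of label encoding for an ordinal variable is shown below. The category order is written out explicitly, since sorting the labels alphabetically would not respect their ranking. The names `INCOME_LEVELS`, `INCOME_RANK`, and `label_encode` are illustrative assumptions.

```python
# Categories listed from lowest to highest; this order is the key information.
INCOME_LEVELS = ["low income", "middle income", "high income"]
INCOME_RANK = {level: rank for rank, level in enumerate(INCOME_LEVELS)}


def label_encode(values):
    """Replace each ordinal label with its integer rank (hypothetical helper)."""
    return [INCOME_RANK[value] for value in values]


responses = ["middle income", "low income", "high income", "middle income"]
codes = label_encode(responses)
print(codes)  # [1, 0, 2, 1]

# The codes preserve order, so sorting by rank is meaningful.
print(sorted(responses, key=INCOME_RANK.get))
```

The integer codes carry order only. The gap between 0 and 1 is not guaranteed to mean the same as the gap between 1 and 2.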
The descriptive statistics available for ordinal data include frequencies, proportions, percentages, the mode, the median, percentiles, and the interquartile range. The visualization methods that can be used are the same as for nominal data.
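As a rough illustration, the median and mode of an ordinal variable can be computed from the ranks of its categories. The sketch below uses the standard `statistics` module and made-up survey responses. It uses `median_low` so that the result is always one of the observed categories rather than an interpolated value.

```python
import statistics

# Opinion scale from lowest to highest agreement.
OPINION_SCALE = ["disagree", "mostly disagree", "neutral", "mostly agree", "agree"]
OPINION_RANK = {label: rank for rank, label in enumerate(OPINION_SCALE)}

# Hypothetical survey responses.
responses = [
    "agree", "neutral", "mostly agree", "disagree",
    "mostly agree", "agree", "mostly agree",
]

ranks = [OPINION_RANK[response] for response in responses]

# The median is found on the ranks, then mapped back to a label.
median_label = OPINION_SCALE[statistics.median_low(ranks)]

# The mode needs no ordering at all, so it works on the labels directly.
mode_label = statistics.mode(responses)

print(median_label)  # mostly agree
print(mode_label)    # mostly agree
```

A mean of the ranks is deliberately left out, because it would assume equal distances between categories.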
Quantitative (Numerical) Data Types
Two types of quantitative data are discrete data and continuous data. Discrete data take distinct, separate values with nothing in between, so all counted data are discrete data. Some examples of discrete data include shoe sizes, the number of students in a class, and the number of languages an individual speaks.
Continuous data, on the other hand, can take any of an endless number of values within a given range. They are measured rather than counted and can always be divided into finer parts. Examples of continuous data include temperature, height, and weight.
Continuous data can be visualized by histogram or box plot while bar graphs or stem plots can be used for discrete data.
Quantitative data can also be classified by measurement scale into two types: interval data and ratio data.
Interval Data
Interval data are ordered data measured along a numerical scale with equal distances between adjacent units. These equal distances are also referred to as intervals. So a variable contains interval data if it has ordered numeric values with the exact differences known between them.
Interval data can be continuous or discrete.
Examples of Interval data
- IQ test’s intelligence scale
- Time if measured using a 12-hour clock
You can compare interval values and add or subtract them, but you cannot meaningfully multiply or divide them because the scale has no meaningful zero. The descriptive statistics you can apply to interval data include central point, range, and spread.
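To see why ratios are not meaningful on an interval scale, consider temperature in Celsius and Fahrenheit. This example is not from the list above, but both are standard interval scales with an arbitrary zero. The sketch below shows that differences stay consistent across the two scales, while the ratio depends on which scale you happen to use.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Illustrative readings.
morning_c, noon_c = 10.0, 20.0
morning_f = celsius_to_fahrenheit(morning_c)  # 50.0
noon_f = celsius_to_fahrenheit(noon_c)        # 68.0

# Differences agree once the unit size (9/5) is accounted for.
diff_c = noon_c - morning_c  # 10.0
diff_f = noon_f - morning_f  # 18.0, which is 10.0 * 9 / 5

# Ratios disagree, because each scale puts its zero in a different place.
ratio_c = noon_c / morning_c  # 2.0
ratio_f = noon_f / morning_f  # 1.36

print(diff_c, diff_f)
print(ratio_c, ratio_f)
```

Saying that noon was "twice as hot" as the morning is therefore an artifact of the Celsius zero, not a property of the temperatures themselves.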
Ratio Data
Like interval data, ratio data are ordered, with equal differences between adjacent units. However, they also have a meaningful zero, so they cannot take negative values.
Examples of ratio data
- Temperature on the Kelvin scale (0 K is absolute zero, the lowest possible temperature)
- Height (zero is the starting point)
Because ratio data have a true zero point, we can multiply and divide the values as well as sort, add, and subtract them. The descriptive statistics available for ratio data are the same as for interval data and include central point, range, and spread.
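A short sketch with made-up heights shows what the true zero buys you: a ratio between two values stays the same whatever unit you measure in. It also computes the central point, range, and spread with the standard `statistics` module.

```python
import statistics

# Hypothetical heights in centimetres, converted to inches (1 in = 2.54 cm).
heights_cm = [150.0, 160.0, 180.0]
heights_in = [height / 2.54 for height in heights_cm]

# "1.2 times as tall" holds in both units because zero means "no height" in each.
ratio_cm = heights_cm[2] / heights_cm[0]
ratio_in = heights_in[2] / heights_in[0]

# Central point, range, and spread.
mean_cm = statistics.mean(heights_cm)
range_cm = max(heights_cm) - min(heights_cm)
stdev_cm = statistics.stdev(heights_cm)  # sample standard deviation

print(ratio_cm, ratio_in)
print(mean_cm, range_cm, stdev_cm)
```

Changing the unit rescales every value by the same factor, so the ratio survives the conversion.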
Overall, ratio data and interval data both have equal spacing between adjoining values, but only ratio data have a meaningful zero. That is why ratio data support multiplication and division on top of addition and subtraction, while interval data do not. Conversely, interval data can take negative values, whereas ratio data cannot.
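One way to summarize the four scales is as a cumulative ladder, where each scale allows everything the previous one does plus one more kind of operation. The sketch below encodes that idea. The names `SCALES`, `NEW_OPERATIONS`, and `allowed_operations` are illustrative assumptions, not a standard API.

```python
# Measurement scales from least to most informative.
SCALES = ("nominal", "ordinal", "interval", "ratio")

# The operation each scale adds on top of the scales before it.
NEW_OPERATIONS = {
    "nominal": "equality",
    "ordinal": "ordering",
    "interval": "addition/subtraction",
    "ratio": "multiplication/division",
}


def allowed_operations(scale):
    """Return every operation that is meaningful for the given scale."""
    position = SCALES.index(scale)
    return {NEW_OPERATIONS[name] for name in SCALES[: position + 1]}


for scale in SCALES:
    print(scale, sorted(allowed_operations(scale)))
```

For example, `allowed_operations("interval")` contains addition/subtraction but not multiplication/division, which matches the comparison above.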
Conclusion
This blog covered the various statistical data types and their characteristics. You also learned the difference between quantitative and qualitative data, the two broad classes of data types.
You should now be able to tell categorical data from numerical data and know which descriptive statistics, visualizations, and plots suit each. You also know how categorical variables can be converted into numeric features.
These data types hold a vital place in statistics and data science in general. Once you know how to work with data types, you can make accurate data-driven decisions that will eventually steer your exploratory data analysis efforts in the right direction.