Types of Data Every Aspiring Data Scientist Must Know About

Rashmi Karan
Manager - Content
Updated on Sep 29, 2021 12:47 IST

Data is the fuel that data-driven businesses need to get actionable insights and make business decisions. Data science processes make extensive use of both structured and unstructured data. Before you start working with data, it is crucial to understand the different categories and types of data used in data science. Let’s get started!

What is Data?

Data is the representation of a variable, which can be either quantitative or qualitative. It is a form in which information is translated and represented through a sequence of symbols, numbers, or letters. A single piece of data cannot demonstrate anything on its own; data needs to be evaluated as a whole to draw conclusions.

What is a Data Type?

A data type is the property of a value that determines its domain, that is, the set of values it can take. The data type is an important attribute that tells a computer or a system how to interpret the value of a given piece of data. Understanding data types helps ensure that the data has been collected in the preferred format, that event properties are properly defined, and that the values are as expected. This article covers both the categories of data and the types of data used in data science. This segmentation will help you use data efficiently.

Common Categories of Data in Data Science

Categorical Data

Categorical data is defined as a collection of grouped information. Measuring on a categorical scale consists of observing the result of an experiment and assigning it to one class or category from a finite number of possible classes. The analysis of categorical data generally involves the use of tables of counts. For example, if a company divides its employee population according to employee profiles, the resulting data is categorical: the employees are grouped by department, team, educational qualification, gender, place of residence, etc.

Types of Categorical Data

Nominal Data

Nominal data is also regarded as “labeled” or “named” data. Such data is used to name variables and can be divided into groups that do not overlap. Nominal data makes data collection and sorting easier for data analysts and researchers, since each value is assigned to exactly one of several groups with no common elements. Arithmetic operators are not meaningful for nominal data, but it can still be analyzed with statistical tools and methods such as counts and the mode.

Example of Nominal data

A company conducts an annual employee satisfaction survey in which one open-ended question stands out:

Q – “How happy are you working with your manager and your team?”

Ans. – ____________________

Responses collected through open-ended survey questions like this are descriptive labels with no inherent order, so they are treated as nominal data. Because respondents can write anything, nominal data sometimes poses the issue of dealing with irrelevant answers.
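As a rough illustration of how nominal data is usually summarized, the sketch below counts labels and finds the most frequent one. The department labels are hypothetical, loosely based on the company example earlier in the article.

```python
from collections import Counter

# Hypothetical department labels for six employees (nominal: no inherent order).
departments = ["Sales", "HR", "Sales", "Engineering", "HR", "Sales"]

# Counting labels and finding the mode are meaningful for nominal data;
# adding or averaging the labels is not.
counts = Counter(departments)
most_common_label, most_common_count = counts.most_common(1)[0]

print(counts["Sales"])     # 3
print(most_common_label)   # Sales
```

Note that only frequencies are computed here; there is no sensible "average department".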

Ordinal Data

Ordinal data is a statistical type of data with a set order or scale. Its variables fall into categories that have a natural order. The distance between any two categories is not defined, since there is no standard scale to measure it. Ordinal data has characteristics of both categorical and numeric data, but the categorical character is dominant, which is why it is classed as categorical.

The main difference between nominal and ordinal data is that ordinal data have a category order while nominal data do not. Ordinal data are presented in a tabular format that makes analysis easy for the researcher. Tile charts are also used to establish the relationship between nominal and ordinal data.

Example of Ordinal data – Take the question above again, this time with a fixed rating scale for the answer.

Q – “How happy are you working with your manager and your team?”

Ans. –

Extremely happy – 5

Happy – 4

Neutral – 3

Unhappy – 2

Extremely unhappy – 1

Because these replies are fixed and ranked, the results of such surveys correspond to ordinal data.
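The ranked scale above can be sketched in Python as a mapping from label to score. The responses below are invented for illustration. Since ordinal data has an order but no defined distance between categories, the sketch reports the median rather than the mean.

```python
from statistics import median

# Score mapping taken from the survey scale above.
SCALE = {
    "Extremely unhappy": 1,
    "Unhappy": 2,
    "Neutral": 3,
    "Happy": 4,
    "Extremely happy": 5,
}

# Hypothetical responses, invented for illustration.
responses = ["Happy", "Neutral", "Extremely happy", "Happy", "Unhappy"]

# Ordering is meaningful for ordinal data, so sorting and the median make sense.
scores = sorted(SCALE[r] for r in responses)
median_score = median(scores)

# Translate the median rank back into its label.
label_for = {score: label for label, score in SCALE.items()}
print(scores)                   # [2, 3, 4, 4, 5]
print(label_for[median_score])  # Happy
```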

Numeric Data

Anything that can be measured or counted can be quantified. The term “quantitative data” refers to precisely that: tangible information obtained through some research method. Numeric or quantitative types of data in statistics quantify things using numerical values. Quantitative data forms the basis of statistical analysis and can be measured and verified. Such data gives us information about quantities, that is, information that can be measured and written in numbers.

Examples of Numeric Data

  • Height of a person (you can measure height in meters, centimeters, or even millimeters; that is, the data is continuous)
  • Age (you can state an age in years, months, or even days)

Numeric Data Types

Discrete – Data that can take only distinct, countable values, typically whole numbers or integers, fall under the discrete data category.

Examples of discrete data:

  • The number of members in a team
  • The number of players in a cricket team
  • The number of students in a class
  • The number of exam questions you answered correctly

Continuous – Data that can take any value within a range, including fractional values, fall under the continuous data category.

Examples of continuous data:

  • Time required to complete a project
  • Height of toddlers
  • Speed of vehicles
  • Square footage of a 3BHK apartment
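As a small sketch of the distinction, discrete counts map naturally onto Python's int type and continuous measurements onto float. The values below are invented for illustration.

```python
# Hypothetical values, invented for illustration.
team_size = 11             # discrete: a count, stored as an int
toddler_height_cm = 86.4   # continuous: a measurement, stored as a float

print(isinstance(team_size, int))            # True
print(isinstance(toddler_height_cm, float))  # True

# A continuous quantity can always be expressed on a finer scale (cm to mm),
# whereas "11.5 team members" would not be a valid count.
toddler_height_mm = toddler_height_cm * 10
```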

Common Types of Data in Statistics, Data Science and Programming

In statistics, data science, and programming, a data type is declared for, or inferred from, a variable and determines the kind and size of data the variable can hold. Every value appearing in a program has a type. The types of data commonly used in data science and programming are –

Integer (int)

An integer is the most common numeric data type. The type int allows you to represent whole numbers.

An int stores numbers without any fractional component. e.g. …, -3 , -2 , -1 , 0 , 1 , 2 , 3 , …
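A minimal Python sketch of integer behavior follows. Python is used here only as an illustration, and other languages may differ, for example in how division is handled.

```python
count = 7

type_name = type(count).__name__
floor_result = 7 // 2        # floor division keeps the result an int
true_result = 7 / 2          # true division returns a float
parsed = int("42") + 1       # text must be converted before doing arithmetic

print(type_name)     # int
print(floor_result)  # 3
print(true_result)   # 3.5
print(parsed)        # 43
```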

Floating Point (float)

The name float comes from the term floating point, which is how computer systems represent real numbers internally. Most real numbers cannot be represented exactly in binary floating point. For example, the computer stores the decimal number 0.9 as an approximation, roughly 0.90000000000000002. As a result, operations on float values can introduce small rounding errors.

For example, 1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 evaluates to 0.9999999999999998 rather than exactly 1.
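Because of this rounding, floats are usually compared with a tolerance rather than with exact equality. The sketch below uses Python's standard math.isclose function; the variable names are illustrative.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3.
total = 0.1 + 0.2
exactly_equal = total == 0.3
close_enough = math.isclose(total, 0.3)

# The same idea applied to the sum of seven sevenths.
sevenths = sum([1 / 7] * 7)
sevenths_close = math.isclose(sevenths, 1.0)

print(exactly_equal)   # False
print(close_enough)    # True
print(sevenths_close)  # True
```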

Character (char)

A char, or character, is a single display unit of information: a letter, symbol, digit, punctuation mark, or even a blank space.

String (str or text)

A string denotes a sequence of characters. A string can also include numbers or symbols, but they are always treated as text. The string is one of the most popular data types for storing text.
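The sketch below shows, in Python, that digits inside a string behave as text until they are explicitly converted. Python has no separate char type, so a single character is simply a string of length one; ord and chr convert between a character and its code point.

```python
digits = "123"

joined = digits + "4"        # concatenation, not addition
added = int(digits) + 4      # explicit conversion is needed for arithmetic
length = len("data science") # spaces count as characters too

print(joined)                # 1234
print(added)                 # 127
print(length)                # 12
print(ord("A"), chr(97))     # 65 a
```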

Boolean (bool)

Boolean data represent true and false values. In Boolean data, values are also represented as 0 for false and 1 for true.
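In Python, for instance, the correspondence between Booleans and 0/1 is literal: bool is a subclass of int. The sketch below relies on that to count True values; the survey flags are hypothetical.

```python
print(True == 1, False == 0)   # True True
print(isinstance(True, int))   # True

# Hypothetical flags: did each respondent answer the survey question?
answered = [True, False, True, True]

# Summing Booleans counts the True values.
answered_count = sum(answered)
print(answered_count)          # 3
```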

Enumerated type (enum)

Enumerated type data has a set of predefined unique values. These values are often regarded as elements or enumerators, which can be assigned to a variable of enumerated data type.
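As one possible illustration, Python's standard enum module can model the five-point satisfaction scale from the ordinal example. The class name Satisfaction is an assumption made for this sketch.

```python
from enum import Enum

class Satisfaction(Enum):
    """Hypothetical enumeration of the survey scale used earlier."""
    EXTREMELY_UNHAPPY = 1
    UNHAPPY = 2
    NEUTRAL = 3
    HAPPY = 4
    EXTREMELY_HAPPY = 5

rating = Satisfaction(4)               # look up a member by value
print(rating.name)                     # HAPPY
print(Satisfaction["NEUTRAL"].value)   # 3
print(len(Satisfaction))               # 5
```

A value outside the predefined set, such as Satisfaction(9), raises a ValueError, which is the main benefit of an enumerated type over a plain integer.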

Array

Arrays are also known as lists in some languages. This data type stores elements in a specific order, typically all of the same type.
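In Python, the built-in list is ordered but may hold mixed types, while the standard array module enforces a single element type. The sketch below contrasts the two, using invented values.

```python
from array import array

# A list keeps insertion order and is accessed by position.
heights_cm = [86.4, 91.0, 88.2]   # hypothetical measurements
heights_cm.append(90.5)
print(heights_cm[0])              # 86.4
print(len(heights_cm))            # 4

# An array with type code "i" accepts only integers.
class_sizes = array("i", [11, 15, 30])
try:
    class_sizes.append(2.5)       # a float is rejected
    rejected = False
except TypeError:
    rejected = True
print(rejected)                   # True
```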

Date

Dates are commonly represented in the YYYY-MM-DD format defined by ISO 8601.

Time

A time is commonly represented in the hh:mm:ss format.

Datetime

A datetime holds both a date and a time together, commonly represented in the YYYY-MM-DD hh:mm:ss format.
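The three formats above can be reproduced with Python's standard datetime module. This is a minimal sketch; the sample date and time are taken from this article's "Updated on" line.

```python
from datetime import date, time, datetime

d = date(2021, 9, 29)
t = time(12, 47, 0)
dt = datetime.combine(d, t)   # date and time together

print(d.isoformat())                        # 2021-09-29
print(t.isoformat())                        # 12:47:00
print(dt.strftime("%Y-%m-%d %H:%M:%S"))     # 2021-09-29 12:47:00

# Parsing the ISO 8601 text gives back an equal date object.
parsed = date.fromisoformat("2021-09-29")
print(parsed == d)                          # True
```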

Timestamp

A timestamp, in the Unix format, represents the number of seconds elapsed since the Unix epoch, which is midnight (00:00:00 UTC) on 1 January 1970. It is used to log the precise time and date of an event.
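A short sketch of converting between Unix timestamps and calendar dates with Python's standard datetime module follows. The time zone is set explicitly to UTC so the result does not depend on the local machine.

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch itself.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())        # 1970-01-01T00:00:00+00:00

# One day after the epoch is 86,400 seconds (24 * 60 * 60).
next_day = datetime(1970, 1, 2, tzinfo=timezone.utc)
seconds = next_day.timestamp()
print(seconds)                  # 86400.0
```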

Data Sources

Wondering where all this data comes from? Here are some of the common sources from which raw data is obtained for further business use.

Web and social networks

Social networks such as Facebook, Twitter, and LinkedIn offer tons of user data that is used to understand and optimize web usage.

Big Transaction Data

Transactional data mainly includes billing records and, in telecommunications, call detail records (CDRs), etc. This transactional data is available in both semi-structured and unstructured formats.

Machine-to-Machine (M2M)

M2M refers to the technologies that allow devices to connect to one another. M2M uses devices such as sensors or meters that capture a particular event and transmit it through wired, wireless, or hybrid networks to other applications, which translate these events into meaningful information.

Biometric

Biometric information includes fingerprints, retinal scans, facial recognition, genetics, etc. In the area of security and intelligence, biometric data is a crucial source of information for investigative agencies.

User-Generated Content (UGC)

People generate many kinds of data, such as the information that a call center records during a phone call, voice notes, emails, electronic documents, medical studies, etc.

Conclusion

Understanding data is important for driving data-dependent decision-making, and you should know how to apply that understanding. A basic knowledge of data types will help you to –

  • Handle data correctly
  • Know what you can calculate with the data
  • Choose the type of data you need to get the desired results
  • Present or visualize the data

About the Author
Rashmi Karan
Manager - Content

Rashmi is a postgraduate in Biotechnology with a flair for research-oriented work and has over 13 years of experience in content creation and social media handling. She has a diversified writing portfolio and aim...