IBM Data Engineering Professional Certificate
- Offered by Coursera
IBM Data Engineering Professional Certificate at Coursera Overview
Duration | 12 months |
Start from | Start Now |
Mode of learning | Online |
Schedule type | Self paced |
Difficulty level | Beginner |
Credential | Certificate |
IBM Data Engineering Professional Certificate at Coursera Highlights
- Earn a certificate of completion from IBM
- Gain expertise in widely used skills such as NoSQL and Big Data, Apache Spark, SQL, Data Science, and Databases (DBMS)
IBM Data Engineering Professional Certificate at Coursera Course details
- RDBMS fundamentals including Design & Creation of Databases, Schemas, Tables; DB Administration, Security & working with MySQL, PostgreSQL & IBM Db2.
- SQL query language, SELECT, INSERT, UPDATE, DELETE statements, database functions, stored procs, working with multiple tables, JOINs, & transactions.
- NoSQL & Big Data concepts including practice with MongoDB, Cassandra, IBM Cloudant, Apache Hadoop, Apache Spark, SparkSQL, SparkML, Spark Streaming.
- ETL, Data Pipelines using Python, Shell Scripts, Apache Airflow and Apache Kafka; Building & Populating Data Warehouses, and Querying with BI tools.
- This Professional Certificate is for anyone who wants to develop job-ready skills, tools, and a portfolio for an entry-level data engineer position. Throughout the self-paced online courses, you will immerse yourself in the role of a data engineer and acquire the essential skills you need to work with a range of tools and databases to design, deploy, and manage structured and unstructured data. By the end of this Professional Certificate, you will be able to explain and perform the key tasks required in a data engineering role. You will use the Python programming language and Linux/UNIX shell scripts to extract, transform and load (ETL) data. You will work with Relational Databases (RDBMS) and query data using SQL statements. You will use NoSQL databases and unstructured data. You will be introduced to Big Data and work with Big Data engines like Hadoop and Spark. You will gain experience with creating Data Warehouses and utilize Business Intelligence tools to analyze and extract insights.
- Each course includes numerous hands-on labs and projects to apply the concepts and skills you learn. The program culminates in a Capstone Project where you will bring together all of these skills to develop and implement an entire data platform, with various data repositories and pipelines, to address a real-world inspired data analytics problem. This program does not require any prior data engineering or programming experience.
- Applied Learning Project
- Throughout this Professional Certificate, you will complete hands-on labs and projects to help you gain practical experience with Python, SQL, Relational Databases, NoSQL Databases, Apache Spark, building a data pipeline, managing a database and working with data in a data warehouse. In the final course in this Professional Certificate, you will complete a Capstone Project that applies what you have learned to a real-world inspired scenario that requires you to design, deploy and manage an end-to-end data engineering platform consisting of various Relational (Transactional Data Warehousing), NoSQL & Big Data repositories as well as data pipelines to connect them.
IBM Data Engineering Professional Certificate at Coursera Curriculum
COURSE 1 - Introduction to Data Engineering
This course introduces you to the core concepts, processes, and tools you need to know in order to get a foundational knowledge of data engineering. You will gain an understanding of the modern data ecosystem and the role Data Engineers, Data Scientists, and Data Analysts play in this ecosystem.
The Data Engineering Ecosystem includes several different components. It includes disparate data types, formats, and sources of data. Data Pipelines gather data from multiple sources, transform it into analytics-ready data, and make it available to data consumers for analytics and decision-making. Data repositories, such as relational and non-relational databases, data warehouses, data marts, data lakes, and big data stores process and store this data. Data Integration Platforms combine disparate data into a unified view for the data consumers.
COURSE 2 - Python for Data Science, AI & Development
Kickstart your learning of Python for data science, as well as programming in general, with this beginner-friendly introduction to Python. Python is one of the world's most popular programming languages, and there has never been greater demand for professionals with the ability to apply Python fundamentals to drive business solutions across industries.
This course will take you from zero to programming in Python in a matter of hours, with no prior programming experience necessary! You will learn Python fundamentals, including data structures and data analysis, complete hands-on exercises throughout the course modules, and create a final project to demonstrate your new skills.
By the end of this course, you'll feel comfortable creating basic programs, working with data, and solving real-world problems in Python. You'll gain a strong foundation for more advanced learning in the field, and develop skills to help advance your career.
COURSE 3 - Python Project for Data Engineering
This mini-course is intended to apply foundational Python skills by implementing different techniques to collect and work with data. Assume the role of a Data Engineer and extract data from multiple file formats, transform it into specific datatypes, and then load it into a single source for analysis. Continue with the course and test your knowledge by implementing web scraping and extracting data with APIs, all with the help of multiple hands-on labs. After completing this course, you will have the confidence to begin collecting large datasets from multiple sources and transforming them into one primary source, or to begin web scraping to gain valuable business insights, all with the use of Python.
PRE-REQUISITE: The Python for Data Science, AI & Development course from IBM is a pre-requisite for this project course. Before taking this course, please ensure that you have either completed that course or have equivalent proficiency in working with Python and data.
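The extract, transform, and load steps described for this course can be sketched in a few lines of Python. This is only an illustration, not material from the course: the sample data, the field names (`name`, `height_cm`, `height_m`), and the function names are all hypothetical, and in-memory strings stand in for real files.

```python
import csv
import io
import json

# Hypothetical sample inputs; in practice these would be files on disk.
CSV_TEXT = "name,height_cm\nAda,170\nLin,165\n"
JSON_TEXT = '[{"name": "Sam", "height_cm": "180"}]'


def extract_csv(text):
    """Read CSV text into a list of dicts (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read a JSON array of objects into a list of dicts."""
    return json.loads(text)


def transform(rows):
    """Convert the height from centimetre strings to float metres."""
    return [
        {"name": row["name"], "height_m": round(float(row["height_cm"]) / 100, 2)}
        for row in rows
    ]


def load(rows):
    """Write all rows to one CSV string, standing in for the target file."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "height_m"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Extract from both formats, then transform and load into a single target.
combined = extract_csv(CSV_TEXT) + extract_json(JSON_TEXT)
transformed = transform(combined)
result = load(transformed)
print(result)
```

Keeping each stage in its own function makes it straightforward to add another source format later without touching the transform or load steps.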
COURSE 4 - Introduction to Relational Databases (RDBMS)
In this course, you will learn the essential concepts behind relational databases and Relational Database Management Systems (RDBMS). You'll study relational data models, discover how they are created and what benefits they bring, and see how you can apply them to your own data. You'll be introduced to several industry-standard relational databases, including IBM Db2, MySQL, and PostgreSQL.
This course incorporates hands-on, practical exercises to help you demonstrate your learning. You will work with real databases and explore real-world datasets. You will create database instances and populate them with tables.
No prior knowledge of databases or programming is required. Anyone can audit this course at no charge. If you choose to take this course and earn the Coursera course certificate, you can also earn an IBM digital badge upon successful completion of the course.
COURSE 5 - Databases and SQL for Data Science with Python
Much of the world's data resides in databases. SQL (or Structured Query Language) is a powerful language which is used for communicating with and extracting data from databases. A working knowledge of databases and SQL is a must if you want to become a data scientist.
The purpose of this course is to introduce relational database concepts and help you learn and apply foundational knowledge of the SQL language. It is also intended to get you started with performing SQL access in a data science environment.
The emphasis in this course is on hands-on and practical learning. As such, you will work with real databases, real data science tools, and real-world datasets. You will create a database instance in the cloud. Through a series of hands-on labs you will practice building and running SQL queries. You will also learn how to access databases from Jupyter notebooks using SQL and Python.
No prior knowledge of databases, SQL, Python, or programming is required.
Anyone can audit this course at no charge. If you choose to take this course and earn the Coursera course certificate, you can also earn an IBM digital badge upon successful completion of the course.
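As a rough idea of what accessing a database with SQL from Python looks like, here is a minimal sketch. It assumes Python's built-in `sqlite3` module and an in-memory database rather than the cloud database instance used in the course, and the table and column names are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a cloud database instance.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create two related tables (hypothetical schema).
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES departments(id))"
)

# Insert rows using parameterized statements.
cur.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "Data"), (2, "Sales")])
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", 1), (2, "Lin", 1), (3, "Sam", 2)],
)
conn.commit()

# Query across both tables with a JOIN and an aggregate.
cur.execute(
    "SELECT d.name, COUNT(*) "
    "FROM employees e JOIN departments d ON e.dept_id = d.id "
    "GROUP BY d.name ORDER BY d.name"
)
rows = cur.fetchall()
print(rows)

conn.close()
```

The same pattern of connecting, executing a statement, and fetching the rows carries over to other relational databases, although the driver module and connection details differ.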
COURSE 6 - Introduction to NoSQL Databases
This course will provide you with technical hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained a lot of relevance in the database landscape. Their main advantage is the ability to effectively handle scalability and flexibility issues raised by modern applications.
You will start by learning the history and the basics of NoSQL databases and discover their key characteristics and benefits. You will learn about the four categories of NoSQL databases and how they differ from each other. You will explore the architecture and features of several different implementations of NoSQL databases, namely MongoDB, Cassandra, and IBM Cloudant. You will then get hands-on experience using those NoSQL databases to perform standard database management tasks, such as creating and replicating databases, loading and querying data, modifying database permissions, indexing and aggregating data, and sharding (or partitioning) data.
COURSE 7 - Introduction to Big Data with Spark and Hadoop
Bernard Marr defines Big Data as the digital trace that we are generating in this digital era. In this course, you will learn about the characteristics of Big Data and its application in Big Data Analytics. You will gain an understanding of the features, benefits, limitations, and applications of some of the Big Data processing tools. You'll explore how Hadoop and Hive help leverage the benefits of Big Data while overcoming some of the challenges it poses.
Apache Spark is an open-source processing engine, built around speed, ease of use, and analytics, that provides users with new ways to store and make use of big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the different components that make up Apache Spark.
In this course, you will also learn about Resilient Distributed Datasets, or RDDs, that enable parallel processing across the nodes of a Spark cluster.
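The idea of splitting a dataset into partitions and processing them in parallel can be illustrated without a cluster. The sketch below is a plain-Python analogy, not Spark code and not the RDD API: it uses the standard library's `ThreadPoolExecutor` in place of cluster nodes, and the helper names `partition` and `sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split data into num_partitions contiguous, roughly equal chunks."""
    size, extra = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done independently on each partition (the 'map' side)."""
    return sum(x * x for x in chunk)


numbers = list(range(1, 11))
partitions = partition(numbers, 3)

# Each partition is processed by its own worker, standing in for a node.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(sum_of_squares, partitions))

# Combine the per-partition results (the 'reduce' side).
total = sum(partial_sums)
print(partitions, partial_sums, total)
```

Because each partition is processed independently, a failed partition could be recomputed on its own, which is loosely the property that the "resilient" in RDD refers to.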
COURSE 8 - Data Engineering and Machine Learning using Spark
In this short course you'll gain practical skills when you learn how to work with Apache Spark for Data Engineering and Machine Learning (ML) applications. You will work hands-on with Spark MLlib, Spark Structured Streaming, and more to perform extract, transform and load (ETL) tasks as well as Regression, Classification, and Clustering. The course culminates in a project where you will apply your Spark skills to an ETL for ML workflow use-case.
COURSE 9 - Hands-on Introduction to Linux Commands and Shell Scripting
This mini-course provides a practical introduction to commonly used Linux / UNIX shell commands and teaches you the basics of Bash shell scripting to automate a variety of tasks. The course includes both video-based lectures and hands-on labs to practice and apply what you learn. You will have no-charge access to a virtual Linux server that you can access through your web browser, so you don't need to download and install anything to perform the labs.
In this course you will work with general purpose commands like id, date, uname, ps, top, echo, man; directory management commands such as pwd, cd, mkdir, rmdir, find, df; file management commands like cat, wget, more, head, tail, cp, mv, touch, tar, zip, unzip; the access control command chmod; text processing commands such as wc, grep, tr; as well as networking commands such as hostname, ping, ifconfig, and curl.
You will create simple to more advanced shell scripts that involve Metacharacters, Quoting, Variables, Command substitution, I/O Redirection, Pipes & Filters, and Command line arguments. You will also schedule cron jobs using crontab.
COURSE 10 - Relational Database Administration (DBA)
Ongoing and proactive management is critical to the security and performance of database management systems. Database administration is the function of managing the operational aspects of database systems and maintaining them. Database administrators work to ensure that applications make the most efficient use of databases and that physical resources are used adequately and efficiently. In this course, you will discover some of the activities, techniques, and best practices for managing a database. You will learn about configuring and upgrading database server software and related products. You will also learn about database security; how to implement user authentication, assign roles, and assign object-level permissions. You will also gain an understanding of how to perform backup and restore procedures in case of system failures.
You will learn about how to optimize databases for performance, monitor databases, collect diagnostic data, and access error information to help you resolve issues that may occur. Many of these tasks are repetitive, so you will learn how to schedule maintenance activities and regular diagnostic tests and send automated messages of success or failure of a task.
COURSE 11 - ETL and Data Pipelines with Shell, Airflow and Kafka
After taking this course, you will be able to describe two different approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process. The other contrasting approach is the Extract, Load, and Transform (ELT) process. ETL processes apply to data warehouses and data marts. ELT processes apply to data lakes, where the data is transformed on demand by the requesting/calling application.
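The difference between the two approaches is mainly where the transformation happens. The following sketch contrasts them under simplifying assumptions: an in-memory SQLite database (via Python's `sqlite3` module) stands in for both the warehouse and the lake, and the table names, view name, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw source rows: amounts arrive as untidy text.
RAW_ROWS = [("2024-01-05", " 19.5 "), ("2024-01-06", "5.25")]

conn = sqlite3.connect(":memory:")

# ETL: transform in application code first, then load only clean rows.
conn.execute("CREATE TABLE warehouse_sales (sale_date TEXT, amount REAL)")
clean_rows = [(day, float(amount.strip())) for day, amount in RAW_ROWS]
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", clean_rows)

# ELT: load the raw rows unchanged, then transform on demand at query time.
conn.execute("CREATE TABLE lake_sales_raw (sale_date TEXT, amount_text TEXT)")
conn.executemany("INSERT INTO lake_sales_raw VALUES (?, ?)", RAW_ROWS)
conn.execute(
    "CREATE VIEW lake_sales AS "
    "SELECT sale_date, CAST(TRIM(amount_text) AS REAL) AS amount "
    "FROM lake_sales_raw"
)

etl_total = conn.execute("SELECT SUM(amount) FROM warehouse_sales").fetchone()[0]
elt_total = conn.execute("SELECT SUM(amount) FROM lake_sales").fetchone()[0]
print(etl_total, elt_total)

conn.close()
```

Both paths give the same total here. The trade-off is that the ETL table stores only the cleaned data, while the ELT table keeps the raw text so that a different transformation can be applied later.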
You will identify methods and tools used for extracting the data, merging extracted data either logically or physically, and for importing data into data repositories. You will also define transformations to apply to source data to make the data credible, contextual, and accessible to data users. You will be able to outline some of the multiple methods for loading data into the destination system, verifying data quality, monitoring load failures, and the use of recovery mechanisms in case of failure.
Finally, you will complete a shareable final project that enables you to demonstrate the skills you acquired in each module.
COURSE 12 - Getting Started with Data Warehousing and BI Analytics
Data is one of an organization's most valuable commodities. But how can organizations best use their data? And how does the organization determine which data is the most recent, accurate, and useful for business decision making at the highest level?
After taking this course, you will be able to describe different kinds of repositories including data marts, data lakes, and data reservoirs, and explain their functions and uses.
You will also be able to describe how data warehouses serve as a single source of truth for an organization's current and historical data. Organizations create data value using analytics and business intelligence applications. Now that you have experienced the ELT process, you will gain hands-on analytics and business intelligence experience using IBM Cognos and its reporting and dashboard features, including visualization capabilities. Finally, you will complete a shareable final project that enables you to demonstrate the skills you acquired in each module.
COURSE 13 - Data Engineering Capstone Project
In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate. You will assume the role of a Junior Data Engineer who has recently joined the organization and be presented with a real-world use case that requires a data engineering solution.