AWS Data Engineer Master’s Program

DASVM’s AWS Data Engineer Master Program is geared towards people who want to enhance their AWS skills to help organizations design and migrate their architecture to the cloud. The course is designed to equip learners with the essential skills and knowledge required to build, manage, and scale data pipelines, data lakes, and analytical solutions on the Amazon Web Services (AWS) platform. It is ideal for individuals who aim to become proficient in leveraging AWS for data engineering tasks such as data ingestion, storage, processing, analytics, and security. If you want to build your career in the AWS data engineering domain, this online course will teach you to implement data engineering concepts on the AWS platform. In addition, you will learn to build data engineering pipelines with the help of Lambda, Athena, Glue, EMR, and other services.

 
Course Objectives:
 

In this course, you will learn:

 
  • Data engineering concepts and AWS services
  • AWS essentials such as S3, IAM, and EC2
  • Managing AWS IAM users, groups, roles, and policies for Role-Based Access Control (RBAC)
  • Engineering Batch Data Pipelines using AWS Glue Jobs
  • Running Queries using AWS Athena, a serverless query engine service
  • Using AWS Elastic MapReduce (EMR) Clusters for reports and dashboards
  • Data Ingestion using AWS Lambda Functions
  • Engineering Streaming Pipelines using AWS Kinesis
  • Streaming Web Server logs using AWS Kinesis Firehose
  • Running AWS Athena queries and commands using the CLI
  • Creating an AWS Redshift Cluster, creating tables, and performing CRUD Operations
   

Course content

 

1. Python

 

Introduction to Python (8hrs)
  • Basics of Python
  • Data Structures in Python
  • Control Structures
  • Functions in Python
  • OOPS in Python
Programming in Python (3hrs)
  • Basic Coding
  • Lists
  • Strings
  • Other Data Structures
Python Practice Questions (2hrs)

 

2. Database Management System

 

Database Design (3hrs)
  • What is a Data Warehouse?
  • Structure of a Data Warehouse
  • Star Schema
  • OLAP vs OLTP
  • ETL
  • Entity Constraints
  • Referential Constraints
  • Semantic Constraints
  • ERDs
  • Star Schema: A Demonstration
Database Creation (2hrs)
  • Introduction to DDL and DML
  • DDL
  • DML
  • Modifying Columns
Querying in MySQL (2hrs)
  • Introduction
  • SQL Statements and Operators
  • Aggregate Functions
  • Ordering and Having Clause
  • Views
Joins (3hrs)
  • Types of Joins
  • Self Joins
  • Cross Joins
  • Set Operations
Advanced SQL (4hrs)
  • Window Functions
  • Case Statements
  • Stored Routines and Cursors
  • Query Optimisation techniques
Problem Solving using SQL (2hrs)
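
As a small preview of the window-function topic listed above, here is a hedged sketch that ranks orders per customer. It uses Python's built-in sqlite3 module (rather than MySQL) purely so the snippet is self-contained; the table and data are made up, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with a toy orders table (illustrative data only)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 1, 120.0), ('alice', 2, 80.0),
        ('bob',   3, 50.0),  ('bob',   4, 200.0);
""")

# ROW_NUMBER() ranks each customer's orders by amount, highest first
rows = conn.execute("""
    SELECT customer, order_id, amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer, rnk
""").fetchall()

for customer, order_id, amount, rnk in rows:
    print(customer, order_id, amount, rnk)
```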

 

3. Python for Data Science

 

NumPy (2hrs)
  • Basics of NumPy
  • Operations on 1-D Arrays
  • Multidimensional Arrays
  • Computation Times in NumPy vs Python Lists
Pandas (2hrs)
  • Basics of Pandas
  • Pandas – Rows and Columns
  • Describing Data
  • Indexing and Slicing
  • Operations on Dataframes
  • Group by Aggregate Functions
  • Merging Data Frames
  • Pivot Tables
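
To give a feel for the Pandas topics listed above, here is a minimal sketch of a group-by aggregation, a merge, and a pivot table. The sales and target figures are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["north", "north", "south", "south"],
    "product": ["a", "b", "a", "b"],
    "revenue": [100, 150, 90, 210],
})
targets = pd.DataFrame({"region": ["north", "south"], "target": [300, 250]})

# Group-by aggregate: total revenue per region
totals = sales.groupby("region", as_index=False)["revenue"].sum()

# Merge the aggregate with another DataFrame on a common key
report = totals.merge(targets, on="region")

# Pivot table: revenue by region and product
pivot = sales.pivot_table(index="region", columns="product", values="revenue", aggfunc="sum")

print(report)
print(pivot)
```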

 

4. Data Visualisation in Python

 

Data Visualisation in Python (2hrs)
  • Industry Level Case Study
  • Matplotlib
  • Seaborn

 

5. Exploratory Data Analysis

 

EDA (3hrs)
  • Data Sourcing
  • Data Cleaning
  • Univariate Analysis
  • Bivariate and Multivariate Analysis
Industry Level Data Analysis (2hrs)

 

6. Data Management and Relational Modelling

 

Data Management and Relational Modelling (3hrs)
  • Data Management
  • E-R Models
  • Relational Models
  • Data Normalisation

 

7. Introduction to Cloud Computing & AWS Setup

 

Cloud Computing (3hrs)
  • Introduction to Cloud Computing
  • Benefits of Cloud Computing
  • Cloud-based Architecture & Deployment Models
  • Types of Cloud Services
AWS (3hrs)
  • Introduction to AWS
  • Virtual Machine on Cloud – EC2
  • EC2 – Login, File Transfer & Instance Termination
AWS EMR (3hrs)
  • Setting up an Amazon EMR cluster
  • EMR – Login & File Transfer
  • Practising Linux Commands
  • EMR – Instance Termination
AWS Services (6hrs)
  • AWS Glue
  • AWS Lambda
  • AWS S3
Virtual Machines (1hr)
  • Introduction to Virtualisation

 

8. Introduction to Hadoop and MapReduce Programming

 

Introduction to Hadoop (2hrs)
  • Introduction to Distributed Systems
  • Introduction to GFS and MapReduce
  • Introduction to Hadoop
  • Hadoop 2.x and YARN
  • Task Processing in Hadoop
  • Tools for Hadoop
Introduction to HDFS (2hrs)
  • File Storage in HDFS
  • Basic Commands in HDFS
  • Write Operation in HDFS
  • Rack Awareness in Hadoop
  • Read Operation in HDFS
  • Features and Limitations of HDFS
MapReduce Programming (3hrs)
  • Introduction to MapReduce Framework
  • Basic Implementation of MapReduce using Python
  • Hadoop Streaming
  • The Combiner
  • The Partitions
  • Job Scheduling and Fault Tolerance
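
The word-count sketch below illustrates the Hadoop Streaming pattern covered in this module: a mapper and a reducer that read from stdin and emit tab-separated key/value pairs. In a real job the two functions would live in separate scripts passed to the hadoop-streaming jar with -mapper and -reducer; this single file only shows the shape of the logic.

```python
import sys
from itertools import groupby


def mapper(lines):
    # Emit (word, 1) for every word in the input
    for line in lines:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer(lines):
    # Hadoop sorts mapper output by key, so equal words arrive grouped together
    pairs = (line.strip().split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")


if __name__ == "__main__":
    # Run as: script.py map    < input.txt
    #     or: script.py reduce < sorted_mapper_output.txt
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```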

 

9. NoSQL Databases and Apache HBase

 

Introduction to NoSQL Databases and Apache HBase (2hrs)
  • Introduction
  • Why NoSQL Databases?
  • How Are NoSQL Databases Designed?
  • Types of NoSQL Databases and Use Cases
  • Introduction to HBase
  • Data Model of HBase
  • Setting up an EMR instance for HBase
  • HBase Shell Commands
Programming in HBase (2hrs)
  • Introduction
  • HappyBase – HBase Python API
  • HappyBase – Use case
How HBase Works (3hrs)
  • Introduction
  • HBase Architecture
  • Read Operation in HBase
  • Write Operation in HBase
  • HBase Schema Design
  • HBase Use Cases
  • HBase Advantages and Disadvantages
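
As a taste of the HappyBase API mentioned above, here is a hedged sketch that writes and reads a single HBase row. It assumes an HBase Thrift server is reachable on localhost, and the table and column-family names are placeholders.

```python
import happybase

connection = happybase.Connection("localhost")  # HBase Thrift server host (assumed)
table = connection.table("user_events")         # illustrative table name

# Write a row: keys and values are bytes, columns are "family:qualifier"
table.put(b"user#1001", {b"events:last_login": b"2024-01-15"})

# Read it back by row key
row = table.row(b"user#1001")
print(row.get(b"events:last_login"))

connection.close()
```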

 

10. Data Ingestion with Apache Sqoop and Apache Flume

 

Introduction to Data Ingestion (2hrs)
  • Introduction
  • Session Overview
  • Data Ingestion
  • Challenges in Data Ingestion
  • Key Steps of Data Ingestion
  • Tools for Data Ingestion
  • Types of Data and File Formats
Apache Sqoop – I (2hrs)
  • Session Overview
  • Introduction to Sqoop and its Architecture
  • Case Studies for Apache Sqoop
  • Apache Sqoop Set-Up and Database Set-Up
  • Exporting Data – Sqoop Export
  • Importing Data – Sqoop Import
  • Importing Data – Importing All Tables Using Sqoop
  • Importing Data – Handling NULL Values
  • Importing Data – Handling Mappers for a Sqoop Job
  • Importing Data – Importing in Various File Formats
  • Importing Data – Compression using Sqoop
  • Extra Coding Questions – Sqoop – I
Apache Sqoop – II (2hrs)
  • Session Overview
  • Importing Data – Importing Specific Rows in Sqoop
  • Importing Data – SQL Queries in Sqoop Import
  • Importing Data – Using Incremental Import in Sqoop
  • Sqoop Jobs
  • Tuning Sqoop
  • Extra Coding Questions – Sqoop – II
Apache Flume (2hrs)
  • Session Overview
  • Introduction to Apache Flume
  • Components of Flume
  • Characteristics and Use Cases of Flume
  • Case Study – Log Collection
  • Installation of Flume on Amazon EMR Instance
  • Flume Configuration Files
  • Flume Flows
  • Log Collection using Flume
  • Tuning Flume
  • Sqoop vs. Flume
  • Flume Practice Questions

 

11. Hive and Querying

 

Introduction to Hive (3hrs)
  • Module Mind Map
  • Session Overview
  • Introduction to Hive
  • Hive at Ola & Pinterest
  • Key Features of Hive
  • Use Cases of Hive
  • Architecture of Hive
  • Hive vs Relational Databases
  • Hive Data Models
  • Data Types in Hive
Basic Hive Queries (2hrs)
  • Session Overview
  • EMR and Hue Setup
  • Database Creation
  • Internal and External Tables I
  • Internal and External Tables II
  • Operations on Tables
  • Order By and Sort By
  • Distribute By and Cluster By
  • Indexing I
  • Indexing II
  • User-Defined Functions
  • Practice Question
Advanced Hive Queries (2hrs)
  • Introduction
  • Joins in Hive
  • Static Partitioning
  • Dynamic Partitioning and Dropping the Partitions
  • Bucketing
  • Practice Questions
Data Analysis using Hive (1hr)
  • Introduction
  • Load Amazon Review Data Set
  • External Table Creation
  • Data Analysis Without Partition
  • Data Analysis Using Partition
  • HBase-Hive Integration
  • Practice Questions

 

12. Amazon Redshift

 

Traditional Warehouse Vs. Amazon Redshift (1hr)
  • Module Introduction
  • Session Introduction
  • Recap: Data Warehousing
  • On-Premise vs Cloud Data Warehouses
  • Why Amazon Redshift?
  • Industrial Use Cases of Amazon Redshift
Redshift: Introduction and Architecture (2hrs)
  • Session Introduction
  • Introduction to Amazon Redshift
  • Redshift Architecture
  • Key Performance Features of Redshift
  • SORT Key I
  • SORT Key II & ZONE Maps
  • Data Distribution: DIST Key
Redshift Administration (2hrs)
  • Session Introduction
  • Creating a Redshift Cluster
  • Redshift Cluster: Node Types & Maintenance
  • Workload Management
  • Fault Tolerance and Security
Redshift Development (2hrs)
  • Session Introduction
  • Getting Started With Redshift Queries
  • Best Practices for Redshift Tables
  • Loading Data Into Redshift Tables
  • Data Analysis With Redshift
  • Custom Queries With Redshift
  • Query Optimisation in Redshift
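
A minimal sketch of the Redshift development topics above: connecting with psycopg2, creating a table with a DISTKEY and SORTKEY, and bulk-loading it with COPY from S3. The cluster endpoint, credentials, bucket, and IAM role ARN are placeholders, not values from the course.

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales (
            sale_id   INT,
            region    VARCHAR(32),
            amount    DECIMAL(10, 2)
        )
        DISTKEY (region)    -- distribution key, as discussed above
        SORTKEY (sale_id);  -- sort key used by zone maps
    """)
    # COPY is the recommended bulk-load path into Redshift tables
    cur.execute("""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS CSV;
    """)

conn.close()
```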

 

13. Introduction to Apache Spark

 

Getting Started with Apache Spark (2hrs)
  • Module Introduction
  • Session Overview
  • Spark Overview
  • Spark vs. MapReduce
  • Spark Ecosystem
  • Spark Architecture
  • Spark APIs
Programming with Spark RDD (2hrs)
  • Session Overview
  • Spark Installation
  • Introduction to Spark RDDs
  • Creating RDDs
  • Operations on RDDs
  • Transformation Operations
  • Action Operations
  • Lazy Evaluation in Spark
Spark Structured APIs (5hrs)
  • Session Overview
  • Introduction to Structured APIs
  • DataFrames and Datasets
  • Catalyst Optimizer
  • Getting Started with DataFrame APIs
  • From Pandas Dataframe
  • DataFrame Operations
  • Spark SQL
ETL Project (3hrs)
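
A compact PySpark sketch of the extract-transform-load flow this module builds up to, using both the DataFrame API and Spark SQL. The input path, column names, and output location are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw CSV data (placeholder path)
orders = spark.read.csv("s3://my-bucket/raw/orders/", header=True, inferSchema=True)

# Transform: filter bad rows and aggregate revenue per day
daily = (
    orders.filter(F.col("amount") > 0)
          .withColumn("order_date", F.to_date("order_ts"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"))
)

# The same transformation can be expressed in Spark SQL
orders.createOrReplaceTempView("orders")
spark.sql("SELECT to_date(order_ts) AS order_date, SUM(amount) AS revenue "
          "FROM orders WHERE amount > 0 GROUP BY to_date(order_ts)")

# Load: write the result as partitioned Parquet
daily.write.mode("overwrite").partitionBy("order_date").parquet("s3://my-bucket/curated/daily_revenue/")
```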

 

14. Optimising Spark for Large Scale Data Processing

 

Optimising Disk IO for Spark (2hrs)
  • Course Introduction
  • Module Introduction
  • Session Overview
  • Spinning Up a Spark EMR Cluster
  • Spark jobs – Can We Do Better?
  • Analysing a Spark job
  • Why Optimise a Spark job?
  • Understanding Disk IO in Spark
  • Using Various File Formats in Spark
  • Serialization and Deserialization in Spark
  • Spark Memory Management Parameters
  • Practice Coding Questions
Optimising Network IO for Spark (2hrs)
  • Session Overview
  • Understanding Network IO
  • Understanding Shuffles
  • Optimising Joins in Spark
  • Understanding Data Partitioning in Spark
  • Practice Coding Questions
Optimising the Spark Clusters (2hrs)
  • Session Overview
  • Why Optimise Cluster Utilisation for Spark?
  • Job Deployment Modes in Spark
  • Tuning Spark Memory and CPU Parameters
  • Cost and Performance Trade-Offs
  • Apache Spark in the Production Environment
  • Best Practices While Working with Apache Spark
  • The Optimised Spark Job!
  • Practice Coding Questions
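
The sketch below illustrates two of the optimisations discussed above: reading columnar Parquet to cut disk IO, and broadcasting a small dimension table so the join avoids a shuffle. Paths and column names are invented.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("optimisation-sketch").getOrCreate()

# Parquet is columnar and compressed, so scans read far less data than CSV
events = spark.read.parquet("s3://my-bucket/events/")
countries = spark.read.parquet("s3://my-bucket/dim_country/")  # small dimension table

# Broadcasting the small table ships it to every executor, so the large
# table is joined locally instead of being shuffled across the network.
joined = events.join(broadcast(countries), on="country_code")

# Repartitioning before a wide write controls the number of output files
joined.repartition(64, "country_code").write.mode("overwrite").parquet("s3://my-bucket/joined/")
```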

 

15. Real-Time Data Streaming with Apache Kafka

 

Introduction to Kafka (2hrs)
  • Module Introduction
  • Session Introduction
  • Batch and Real-Time Processing
  • Traditional Messaging System
  • Kafka – Introduction and Features
  • Use-Cases of Kafka
  • Kafka Architecture
Kafka Internals (2hrs)
  • Session Introduction
  • Topics and Partitions
  • Producers and Consumers
  • Consumer Groups
  • Rebalancing
  • Topic Replication
Producer and Consumer Demo (2hrs)
  • Session Introduction
  • Starting Kafka Servers
  • Creating a Topic
  • Using CLI to Start Producers and Consumers
  • Python Code For Producers
  • Python Code For Consumers
Kafka Connect and Kafka Streams (2hrs)
  • Session Introduction
  • Introduction: Kafka Connect API
  • Intricacies of Kafka Connect
  • Demo: Kafka Connect – Fetching Tweets
  • Introduction: Kafka Streams
  • Stream Processing Topology
  • Kafka Streams: Word Count Application
  • Running Word Count Demo Application
  • Practice Problem – Kafka Connect/Streams
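
A minimal kafka-python sketch in the spirit of the producer/consumer demo above (the module also shows the same flow from the CLI). The broker address, topic name, and consumer group are assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: serialise dictionaries to JSON and send them to a topic
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user": "alice", "page": "/home"})
producer.flush()

# Consumer: join a consumer group and read messages from the same topic
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    group_id="clickstream-readers",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.partition, message.offset, message.value)
    break  # stop after one message in this sketch
```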

 

16. Real-Time Data Processing using Spark Streaming

 

Introduction To Spark Streaming (3hrs)
  • Module Introduction
  • Session Overview
  • What Is Streaming?
  • Differences Between Streaming And Micro-Batching
  • What Is Spark Streaming?
Getting Started With Structured Streaming (2hrs)
  • Session Overview
  • What Is Structured Streaming?
  • First Spark Structured Streaming Application
  • Triggers And Output Modes
  • Implementing Triggers And Output Modes
  • Using Transformations And Aggregations
  • Joins With Streams
  • Implementing Joins In Structured Streaming
Advanced Structured Streaming (2hrs)
  • Session Overview
  • Windows
  • Implementing Windows
  • Late-Arriving Data and Watermarks
Spark Integration – Apache Kafka (2hrs)
  • Session Overview
  • Kafka Integration
  • Session Summary
  • Module Summary
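
A hedged sketch of a Structured Streaming job that reads from Kafka and applies a window with a watermark, tying together the streaming and Kafka-integration topics above. The broker, topic, and schema are placeholders, and the spark-sql-kafka connector package must be available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Read a stream of Kafka records (key/value arrive as binary columns)
raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "clickstream")
            .load())

events = raw.select(
    F.col("value").cast("string").alias("page"),
    F.col("timestamp"),
)

# 5-minute tumbling window, tolerating data that arrives up to 10 minutes late
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"), "page")
          .count())

# Write each micro-batch update to the console
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```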

 

17. Automating Data Pipelines using Apache Airflow

 

Introduction to Apache Airflow (2hrs)
  • Module Introduction
  • Session Introduction
  • Understanding Data Pipelines
  • Data Pipeline Use Case: Uber
  • How to Automate a Data Pipeline?
  • Introduction to Apache Airflow
  • DAGs: Data Pipelines in Airflow
  • Airflow Architecture
Hands-On with Apache Airflow (2hrs)
  • Session Introduction
  • Airflow Installation on EMR instance
  • Operators
  • Bash Operator
  • Python Operator
  • Sqoop Operator
  • Hive Operator
  • Spark Operator
Real-World Use Case of Airflow (2hrs)
  • Session Overview
  • Problem Statement
  • Coding Demonstration
  • DAG Construction
  • Spark Applications
  • Setting Task Dependencies
  • Running our DAG
  • Airflow Best Practices
  • Advantages and Limitations Of Airflow
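
A small Airflow DAG sketch showing the building blocks named above: a BashOperator, a PythonOperator, and a task dependency. It assumes Airflow 2.x; the DAG id, schedule, and callables are illustrative.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def transform():
    print("transforming the extracted data")


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extracting data'")
    load = PythonOperator(task_id="transform_and_load", python_callable=transform)

    extract >> load  # task dependency: extract runs before transform_and_load
```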

 

18. Analytics Using PySpark

 

Basic EDA Using Spark ML Library (2hrs)
  • Module Introduction
  • Session Introduction
  • MLLib Overview
  • Impute
  • Feature Transformer: Vector Assembler
  • Pipeline
Analysis using Spark (2hrs)
Capstone Project (3hrs)
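
A minimal sketch of the MLlib pieces listed above: an Imputer to fill missing values, a VectorAssembler to build the feature vector, and a Pipeline chaining the two. The toy data and column names are made up.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Imputer, VectorAssembler

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, 10.0, None), (2, None, 3.0), (3, 7.0, 4.0)],
    ["id", "f1", "f2"],
)

# Fill missing numeric values with the column mean
imputer = Imputer(inputCols=["f1", "f2"], outputCols=["f1_i", "f2_i"])

# Combine the imputed columns into a single feature vector
assembler = VectorAssembler(inputCols=["f1_i", "f2_i"], outputCol="features")

pipeline = Pipeline(stages=[imputer, assembler])
features = pipeline.fit(df).transform(df)
features.select("id", "features").show()
```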

 

19. Other AWS Services

 

Amazon Simple Storage Service (Amazon S3)
  • Introduction to S3
  • AWS Management Console, AWS CLI, Boto3
  • Putting Objects, Bucket Properties
  • S3 Multipart Upload, Storage Classes
  • S3 Security and Encryption
  • Amazon CloudFront, Edge Locations and Route 53
  • Demonstration and hands-on labs: creating S3 buckets, static website hosting, CloudFront
  • Database Engine Types
  • Relational Database Service (RDS)
  • Serverless Options
  • Lab: RDS Instances and Engines
  • AWS Elastic MapReduce (EMR)
  • Use Cases & Hands-on
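
A hedged boto3 sketch for the S3 topics above: creating a bucket, putting an object, and using the managed (multipart-capable) upload. The bucket and key names are placeholders, and credentials are assumed to come from the environment or an attached IAM role.

```python
import boto3

s3 = boto3.client("s3")

# Create a bucket (outside us-east-1 a LocationConstraint is also required)
s3.create_bucket(Bucket="my-demo-data-lake-bucket")

# Put a small object directly
s3.put_object(Bucket="my-demo-data-lake-bucket", Key="raw/hello.txt", Body=b"hello, s3")

# upload_file uses managed (multipart) transfer for larger files
s3.upload_file("local_data.csv", "my-demo-data-lake-bucket", "raw/local_data.csv")

# List what landed under the prefix
for obj in s3.list_objects_v2(Bucket="my-demo-data-lake-bucket", Prefix="raw/").get("Contents", []):
    print(obj["Key"], obj["Size"])
```
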
AWS Security using IAM – Managing AWS Users, Roles and Policies
  • Creating AWS IAM Users with Programmatic and Web Console Access
  • Logging into AWS Management Console using AWS IAM User
  • Validate Programmatic Access to AWS IAM User via AWS CLI
  • Getting Started with AWS IAM Identity-based Policies
  • Managing AWS IAM User Groups
  • Managing AWS IAM Roles for Service Level Access
  • Overview of AWS Custom Policies to grant permissions to Users, Groups, and Roles
  • Managing AWS IAM Groups, Users, and Roles using AWS CLI
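
The boto3 sketch below mirrors the IAM tasks above: create a user, add it to a group, and attach an AWS managed policy to that group. All names are placeholders.

```python
import boto3

iam = boto3.client("iam")

iam.create_user(UserName="data-engineer-01")
iam.create_group(GroupName="data-engineers")
iam.add_user_to_group(GroupName="data-engineers", UserName="data-engineer-01")

# Grant read-only S3 access to everyone in the group via a managed policy
iam.attach_group_policy(
    GroupName="data-engineers",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
```
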
AWS Lambda
  • Introduction to AWS Lambda
  • Introduction to Data Collection and Getting Data Into AWS
  • Direct Connect, Snowball, Snowball Edge, Snowmobile
  • Database Migration Service
  • Data Pipeline
  • Lambda, API Gateway, and CloudFront
  • Features
  • Use Cases
  • Limitation
  • Hands on Labs
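
A minimal Lambda handler sketch for the ingestion pattern above: the function is assumed to be triggered by an S3 event notification and simply logs each object that arrives. The handler signature is the standard Lambda one; the rest is illustrative.

```python
import json


def lambda_handler(event, context):
    # S3 event notifications carry a list of records, one per object
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"new object: s3://{bucket}/{key}")

    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```
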
AWS Glue
  • What is AWS Glue
  • ETL With Glue
  • Working in Python Shell
  • Working in Spark Shell
  • Checking Logs
  • AWS Glue Data Catalog
  • AWS Glue Jobs
  • Glue Job Demo
  • Job Bookmarks
  • AWS Glue Crawlers
  • ETL Project
  • Use Cases & Pricing
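
A hedged sketch of a PySpark Glue job in the shape this module teaches: read a crawled table from the Glue Data Catalog as a DynamicFrame, transform it as a DataFrame, and write curated Parquet back to S3. The database, table, and path names are assumptions, and the awsglue library is only available inside a Glue job run.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a crawled Data Catalog table as a DynamicFrame (placeholder names)
orders = glue_context.create_dynamic_frame.from_catalog(database="sales_db", table_name="raw_orders")

# Convert to a Spark DataFrame for transformations, then write curated Parquet
df = orders.toDF().filter("amount > 0")
df.write.mode("overwrite").parquet("s3://my-bucket/curated/orders/")

job.commit()
```
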
AWS Kinesis
  • Building Streaming Pipeline using Kinesis
  • Rotating Logs
  • Setup Kinesis Firehose Agent
  • Create Kinesis Firehose Delivery Stream
  • Planning the Pipeline
  • Create IAM Group and User
  • Granting Permissions to IAM User using Policy
  • Configure Kinesis Firehose Agent
  • Start and Validate Agent
  • Conclusion – Building a Simple Streaming Pipeline
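
A small boto3 sketch that pushes a record into a Kinesis Data Firehose delivery stream, similar to what the Firehose agent does with rotated web-server log lines. The stream name and record contents are placeholders.

```python
import json
import boto3

firehose = boto3.client("firehose")

log_line = {"ip": "10.0.0.1", "path": "/index.html", "status": 200}

# Firehose buffers records and delivers them to the configured destination (e.g. S3)
firehose.put_record(
    DeliveryStreamName="web-logs-stream",
    Record={"Data": (json.dumps(log_line) + "\n").encode("utf-8")},
)
```
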
AWS Athena
  • What is Athena
  • Features
  • Use Cases
  • Creating Athena Tables
  • Using Glue Crawlers
  • Querying Athena Tables
  • When To Use Athena
  • Visualizations and Dashboards
  • Security and Authentication
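
A hedged boto3 sketch for querying an Athena table from Python: start the query, poll until it finishes, then read the results. The database, table, and S3 output location are placeholders.

```python
import time
import boto3

athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page",
    QueryExecutionContext={"Database": "logs_db"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query leaves the QUEUED/RUNNING states
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```
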
AWS Integrations and Use Cases
  • Amazon MQ
  • Amazon SNS
  • Amazon SQS
  • Amazon SWF
  • AWS Step Functions
  • Interview Preparations
  • Mock Interviews
  • Implementing End to End Project
  • Assessments
Introduction to Aurora
  • DB Clusters
  • Connection Management
  • Storage and Reliability
  • Security
  • High Availability and Global Databases for Aurora
  • Replication with Aurora
  • Setting Environment
  • Amazon RDS Aurora Architecture
  • Aurora Metrics, Logging & Events
  • Aurora Scaling and High Availability
  • Configuring Security

 


Course Prerequisites

 
  • Experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions
  • Python & SQL knowledge
  • A fundamental level understanding of big data and Hadoop concepts

Who can attend

 
  • Existing AWS power users
  • Beginner Python developers curious about data science
  • Data engineers
  • BI/ETL developers
  • Data scientists and analysts
  • Anyone from a technical background who wants to learn data engineering on AWS
  • Professionals who wish to learn advanced ways of using AWS and building a data warehouse
  • Beginners can also learn from scratch, but will have to go through the extensive lectures

Number of Hours: 150

Certification

AWS Certified Data Engineer – Associate (DEA-C01)

Key features

  • One to One Training
  • Online Training
  • Fastrack & Normal Track
  • Resume Modification
  • Mock Interviews
  • Video Tutorials
  • Materials
  • Real Time Projects
  • Virtual Live Experience
  • Preparing for Certification

FAQs

DASVM Technologies offers 300+ IT training courses delivered by expert trainers with 10+ years of experience.

  • One to One Training
  • Online Training
  • Fastrack & Normal Track
  • Resume Modification
  • Mock Interviews
  • Video Tutorials
  • Materials
  • Real Time Projects
  • Preparing for Certification

Call now: +91-99003 49889 to know about the exciting offers available for you!

We work and coordinate with companies exclusively to help our students get placed. We have a placement cell focusing on training and placements in Bangalore, which helps more than 600 students per year.

Learn from experts active in their field, not out-of-touch trainers: leading practitioners who bring current best practices and case studies to sessions that fit into your work schedule. We have a pool of highly skilled and experienced trainers who support you in specific tasks and provide professional help, along with 24x7 learning support from mentors and a community of like-minded peers to resolve any conceptual doubts. Our trainers have contributed to the growth of our clients as well as individual professionals.

All of our highly qualified trainers are industry experts with at least 10-12 years of relevant teaching experience. Each of them has gone through a rigorous selection process which includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating continue to train for us.

No worries. DASVM Technologies assures that no one misses a single lecture topic. We will reschedule classes at your convenience within the stipulated course duration wherever possible. If required, you can even attend that topic with another batch.

DASVM Technologies provides many suitable modes of training, such as:

  • Classroom training
  • One to One training
  • Fast track training
  • Live instructor-led online training
  • Customized training

Yes, access to the course material will be available for a lifetime once you have enrolled in the course.

You will receive a DASVM Technologies recognized course completion certificate, and our training will help you crack the global certification.

Yes, DASVM Technologies provides corporate training with course customization, learning analytics, cloud labs, certifications, and real-time projects, backed by 24x7 support.

Yes, DASVM Technologies provides group discounts for its training programs. Depending on the group size, we offer discounts as per the terms and conditions.

We accept all major payment options: cash, card (MasterCard, Visa, Maestro, etc.), wallets, net banking, cheques, and more.

DASVM Technologies has a no-refund policy; fees once paid will not be refunded. If a candidate is unable to attend a training batch, he/she can reschedule to a future batch. Any balance due should be cleared by the given date. If the trainer cancels or is unavailable to provide training, DASVM will arrange sessions with a backup trainer.

Your access to the support team is for a lifetime and is available 24/7. The team will help you resolve queries during and after the course.

Please contact our course advisor at +91-99003 49889, or share your queries through info@dasvmtechnologies.com.
