DASVM’s AWS Data Engineer course is geared toward people who want to enhance their AWS skills to help organizations design and migrate their architecture to the cloud. It is the natural next step after obtaining your AWS Cloud Practitioner certification, and it develops the skills you learned in that course in further detail. We cover a broad range of topics, with a specific concentration on high availability, EC2, VPC, storage, and overall management of the AWS Console.
What you'll learn
- Data engineering concepts and AWS services
- AWS Essentials such as S3, IAM, EC2, etc.
- Managing AWS IAM users, groups, roles and policies for RBAC (Role Based Access Control)
- Engineering Batch Data Pipelines using AWS Glue Jobs
- Running Queries using AWS Athena, a serverless query engine service
- Using AWS Elastic MapReduce (EMR) Clusters for reports and dashboards
- Data Ingestion using AWS Lambda Functions
- Engineering Streaming Pipelines using AWS Kinesis
- Streaming Web Server logs using AWS Kinesis Firehose
- Running AWS Athena queries or commands using CLI
- Creating an AWS Redshift Cluster, creating tables, and performing CRUD operations
Course content
SQL Basics
- Introduction to SQL
- Software Installation
- Manipulating Data
- SQL Operators
- Working with SQL: Join, Tables, and Variables
- Deep Dive into SQL Functions
- Working with Subqueries
- SQL Views, Functions, and Stored Procedures
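For a quick taste of these topics, here is a minimal, self-contained sketch using Python's built-in sqlite3 module; the table names and sample rows are illustrative, not part of the course environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL + DML: create tables and insert rows
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept_id INTEGER)")
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Data"), (2, "Ops")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "alice", 1), (2, "bob", 2)])

# A join plus a subquery, as covered in the topics above
cur.execute("""
    SELECT e.name, d.name
    FROM emp e JOIN dept d ON e.dept_id = d.id
    WHERE e.dept_id IN (SELECT id FROM dept WHERE name = 'Data')
""")
print(cur.fetchall())  # [('alice', 'Data')]
```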
Python Basics
- Overview of Python
- Methods
- String Manipulation
- Working With Files
- DataFrames & DataFrame Methods
- Values, Types, Variables
- Operands and Expressions, Conditional Statements
- Loops, Command Line Arguments
- Python File I/O Functions
- Numbers
- Strings, Tuples, Dictionaries, Sets Functions & Parameters
- Global Variables, Lambda Functions
- Standard Libraries, Import Statements
- NumPy, Pandas
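The snippet below previews several of these topics in one place: file I/O, a lambda function, and a Pandas DataFrame. It is a minimal sketch; the file name data.csv is illustrative.

```python
import pandas as pd

# File I/O: write a small CSV, then read it back
with open("data.csv", "w") as f:
    f.write("name,score\nalice,90\nbob,85\n")

df = pd.read_csv("data.csv")                         # Pandas DataFrame
df["passed"] = df["score"].apply(lambda s: s >= 88)  # lambda function
print(df.head())                                     # DataFrame method
```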
Introduction to Spark
- Big Data, Hadoop Ecosystem and HDFS, Core Components
- YARN and its Advantages
- Hadoop Cluster and its Architecture
- Big Data Analytics with Batch & Real-Time Processing
- Linux Commands
- MapReduce
- Hive and Sqoop
- Introduction to Spark
- Components & Architecture, Deployment Modes
- Spark Shell
- Writing a Spark Job (SBT) & Submitting
- Web UI, Data Ingestion using Sqoop
- What is an RDD, its Operations, Transformations & Actions, Data Loading & Saving Through RDDs
- Key-Value Pair RDDs
- RDD Lineage, Persistence
- WordCount
- Passing Functions to Spark
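Since this section closes with WordCount and passing functions to Spark, here is a minimal PySpark RDD sketch of both; the input path input.txt is an assumption.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

def to_pair(word):
    # A named function passed to Spark, alongside the lambdas below
    return (word, 1)

counts = (sc.textFile("input.txt")                # Data loading into an RDD
            .flatMap(lambda line: line.split())   # Transformation
            .map(to_pair)                         # Key-value pair RDD
            .reduceByKey(lambda a, b: a + b))     # Aggregation per key
print(counts.take(10))                            # Action triggers execution
spark.stop()
```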
PySpark, Processing Multiple Batches & Streaming
- Spark SQL Architecture, SQL Context in Spark SQL
- Functions, Data Frames & Datasets
- Interoperating with RDDs, Loading Data through Different Sources
- Spark – Hive Integration
- Introduction to PySpark Shell
- Submitting PySpark Job
- Writing your first PySpark Job
- Introduction to MLlib, MLlib Tools
- ML algorithms, Linear Regression, Logistic Regression, Decision Tree, Random Forest
- K-Means Clustering
- Spark Streaming: Features & Workflow
- Streaming Context & DStreams Transformations
- Slice, Window and ReduceByWindow Operators
- Stateful Operators
- Streaming: Data Sources
- Apache Flume and Apache Kafka Data Sources
- Example: Using a Kafka Direct Data Source
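A minimal sketch of the Spark SQL and DataFrame ideas above: building a DataFrame, registering it as a view, and querying it with SQL. The view name scores and the sample rows are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

# Build a DataFrame from in-memory rows
df = spark.createDataFrame([("alice", 90), ("bob", 85)], ["name", "score"])
df.createOrReplaceTempView("scores")  # register for SQL queries

# Spark SQL over the registered view
spark.sql("SELECT name FROM scores WHERE score > 88").show()
spark.stop()
```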
Apache Airflow
- Introduction to Apache Airflow
- Airflow DAGs, Running Tasks and Navigating
- Operators, Scheduling & Troubleshooting
- SLA
- Building a Production Pipeline
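For illustration, here is a minimal Airflow DAG touching the topics above (DAGs, operators, scheduling, and an SLA). It is a sketch assuming Airflow 2.x; the dag_id and bash commands are placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal DAG with two daily tasks
with DAG(
    dag_id="demo_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(
        task_id="load",
        bash_command="echo loading",
        sla=timedelta(minutes=30),  # flag the task if it runs past its SLA window
    )
    extract >> load  # extract must finish before load starts
```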
Tableau
- Introduction to Tableau
- Calculations: working with Strings, Dates and Conversion functions
- Creating a Dashboard
- Combining Views
- Add Actions for Interactivity
- Using Sets
- Advanced Calculations
- Filters and Expressions
- Advanced Mapping
- Comparing Measures against a Goal
- Showing Statistics and Forecasting
- Advanced Dashboards
Introduction to AWS Data Engineering
- Introduction to AWS and its importance in Data Engineering
- Create AWS S3 Bucket using AWS Web Console
- Create AWS IAM Group and User using AWS Web Console
- Overview of AWS IAM Roles to grant permissions between AWS Services
- Create and Attach AWS IAM Custom Policy using AWS Web Console
- Configure and Validate AWS Command Line Interface to run AWS Commands
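One way to validate that the CLI credentials you just configured also work programmatically is shown below; a minimal sketch assuming boto3 is installed and credentials were set via aws configure.

```python
import boto3

# Ask STS who we are: a quick credential sanity check
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])

# Equivalent spot check: list the S3 buckets visible to this IAM user
s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
```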
Amazon Simple Storage Service (Amazon S3)
- Amazon CloudFront, Edge Locations and Route 53
- Demonstrations and hands-on labs on creating S3 buckets, hosting static websites, and CloudFront
- Putting Objects, Bucket Properties
- Introduction to S3
- AWS Management Console, AWS CLI, Boto3
- S3 Multipart Upload, Storage Classes
- S3 Security and Encryption
- Database Engine Types
- Relational Database Service (RDS)
- Serverless Options
- Lab: RDS Instances and Engines
- AWS Elastic MapReduce (EMR)
- Use Cases & Hands on
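As a preview of the S3 topics above (bucket creation, multipart upload, encryption), here is a hedged boto3 sketch; the region, bucket name, and file names are assumptions, and bucket names must be globally unique.

```python
import boto3

region = "ap-south-1"            # assumed region
bucket = "my-demo-bucket-12345"  # illustrative, must be globally unique
s3 = boto3.client("s3", region_name=region)

# Create a bucket (outside us-east-1 a LocationConstraint is required)
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": region},
)

# upload_file transparently switches to multipart upload for large files
s3.upload_file("local_data.csv", bucket, "raw/data.csv")

# Put a small object directly, requesting server-side encryption
s3.put_object(
    Bucket=bucket,
    Key="raw/hello.txt",
    Body=b"hello s3",
    ServerSideEncryption="AES256",
)
```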
AWS Security using IAM – Managing AWS Users, Roles and Policies
- Creating AWS IAM Users with Programmatic and Web Console Access
- Logging into AWS Management Console using AWS IAM User
- Validate Programmatic Access to AWS IAM User via AWS CLI
- Getting Started with AWS IAM Identity-based Policies
- Managing AWS IAM User Groups
- Managing AWS IAM Roles for Service Level Access
- Overview of AWS Custom Policies to grant permissions to Users, Groups, and Roles
- Managing AWS IAM Groups, Users, and Roles using AWS CLI
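The sketch below shows the same user/group/policy workflow programmatically with boto3; the group name, user name, and attached managed policy are illustrative choices, not course requirements.

```python
import boto3

iam = boto3.client("iam")

# Create a group and a user, then add the user to the group
iam.create_group(GroupName="data-engineers")
iam.create_user(UserName="demo-user")
iam.add_user_to_group(GroupName="data-engineers", UserName="demo-user")

# Attach an AWS managed policy granting read-only S3 access to the group
iam.attach_group_policy(
    GroupName="data-engineers",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# Create access keys for programmatic (CLI/boto3) access
keys = iam.create_access_key(UserName="demo-user")
print(keys["AccessKey"]["AccessKeyId"])
```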
AWS Lambda
- Introduction to AWS Lambda
- Introduction to Data Collection and Getting Data Into AWS
- Direct Connect, Snowball, Snowball Edge, Snowmobile
- Database Migration Service
- Data Pipeline
- Lambda, API Gateway, and CloudFront
- Features
- Use Cases
- Limitations
- Hands on Labs
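To make the Lambda topics concrete, here is a minimal handler for an S3 event trigger, the pattern used later in the course project. It is a sketch; the function does nothing but log the objects that triggered it.

```python
import json

def lambda_handler(event, context):
    """Minimal handler: log each object that triggered the S3 event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("ok")}
```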
AWS Glue
- What is AWS Glue
- ETL With Glue
- Working in Python Shell
- Working in Spark Shell
- Checking Logs
- AWS Glue Data Catalog
- AWS Glue Jobs
- Glue Job Demo
- Job Bookmarks
- AWS Glue Crawlers
- ETL Project
- Use Cases & Pricing
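Below is a hedged skeleton of a Glue ETL job tying together the Data Catalog, DynamicFrames, and S3 output. It only runs inside a Glue job (not locally), and the database, table, and S3 path are illustrative.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve arguments and initialize the job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (names are illustrative)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="demo_db", table_name="raw_orders")

# Interoperate with Spark DataFrames for a simple transformation
df = dyf.toDF().dropna()

# Write the cleaned data back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "clean_orders"),
    connection_type="s3",
    connection_options={"path": "s3://my-demo-bucket/clean/"},
    format="parquet",
)
job.commit()
```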
Getting started with EMR
- Planning of EMR Cluster
- Create EC2 Key Pair
- Setup EMR Cluster with Spark
- Understanding Summary of AWS EMR Cluster
- Review EMR Cluster Application User Interfaces
- Review EMR Cluster Monitoring
- Review EMR Cluster Hardware and Cluster Scaling Policy
- Review EMR Cluster Configurations
- Review EMR Cluster Events
- Review EMR Cluster Steps
- Review EMR Cluster Bootstrap Actions
- Connecting to EMR Master Node using SSH
- Disabling Termination Protection and Terminating the Cluster
- Clone and Create New Cluster
- Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster
- Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster
- Managing Files in AWS S3 using HDFS CLI on EMR Cluster
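For reference, cluster setup can also be scripted. Here is a hedged boto3 sketch of launching a small Spark cluster, matching the planning steps above; the cluster name, key pair, log bucket, and instance sizes are assumptions (the roles shown are the AWS default EMR roles).

```python
import boto3

emr = boto3.client("emr", region_name="ap-south-1")

# Launch a small Spark cluster (names and sizes are illustrative)
response = emr.run_job_flow(
    Name="demo-spark-cluster",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hadoop"}],
    LogUri="s3://my-demo-bucket/emr-logs/",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "my-ec2-key",          # EC2 key pair created earlier
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```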
AWS Kinesis
- Building Streaming Pipeline using Kinesis
- Rotating Logs
- Setup Kinesis Firehose Agent
- Create Kinesis Firehose Delivery Stream
- Planning the Pipeline
- Create IAM Group and User
- Granting Permissions to IAM User using Policy
- Configure Kinesis Firehose Agent
- Start and Validate Agent
- Conclusion – Building a Simple Streaming Pipeline
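Besides the Firehose agent, records can be pushed into a delivery stream directly from code. A minimal boto3 sketch, assuming a delivery stream named web-logs-stream already exists:

```python
import json

import boto3

firehose = boto3.client("firehose")

# Push one log line into the delivery stream; Firehose buffers records
# and delivers them to the configured destination (e.g. S3)
record = {"ip": "10.0.0.1", "status": 200, "path": "/index.html"}
firehose.put_record(
    DeliveryStreamName="web-logs-stream",
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```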
AWS Athena
- What is Athena
- Features
- Use Cases
- Creating Athena Tables
- Using Glue Crawlers
- Querying Athena Tables
- When To Use Athena
- Visualizations and Dashboards
- Security and Authentication
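Athena queries can be run from code as well as the console. A hedged boto3 sketch of submitting a query, polling for completion, and reading the results; the database, table, and output location are illustrative.

```python
import time

import boto3

athena = boto3.client("athena")

# Submit a query; results land in the S3 output location
qid = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "demo_db"},
    ResultConfiguration={"OutputLocation": "s3://my-demo-bucket/athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(
        QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Fetch the result set row by row
if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```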
AWS Redshift
- Introduction To Redshift
- Features
- Redshift Architecture
- DDL, DML in Redshift
- Core Structure & MPP
- Redshift Spectrum
- Loading & Unloading Data From Redshift
- Pricing & Optimization
- Redshift in the AWS Service Ecosystem
- Redshift Use Cases
- Redshift Table Design
- Querying Data from Multiple Redshift Spectrum Tables
- Launching a Redshift Cluster
- Resizing a Redshift Cluster
- Utilizing Vacuum and Deep Copy
- Backup and Restore
- Monitoring
- Lab: Manually Migrating Data
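As a preview of the DDL and loading topics above, here is a hedged sketch using the Redshift Data API via boto3; the cluster identifier, database, user, S3 path, and IAM role ARN are all illustrative placeholders.

```python
import boto3

rsd = boto3.client("redshift-data")

# Shared connection settings (all illustrative)
common = dict(ClusterIdentifier="demo-cluster", Database="dev", DbUser="awsuser")

# DDL: create a table
rsd.execute_statement(
    Sql="CREATE TABLE IF NOT EXISTS orders (id INT, amount DECIMAL(10,2))",
    **common,
)

# Bulk-load Parquet files from S3; the IAM role must allow S3 read access
rsd.execute_statement(
    Sql="""
        COPY orders FROM 's3://my-demo-bucket/clean/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET
    """,
    **common,
)
```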
AWS Integrations and Use Cases
- Amazon MQ
- Amazon SNS
- Amazon SQS (see the queue sketch after this list)
- Amazon SWF (Simple Workflow Service)
- AWS Step Functions
- Interview Preparations
- Mock Interviews
- Implementing End to End Project
- Assessments
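To preview the messaging services above, here is a minimal boto3 sketch of the SQS send/receive/delete cycle; the queue name and message body are illustrative.

```python
import boto3

sqs = boto3.client("sqs")

# Create a queue, send a message, then receive and delete it
queue_url = sqs.create_queue(QueueName="demo-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="order:42 created")

msgs = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5)
for m in msgs.get("Messages", []):
    print(m["Body"])
    # Delete after processing so the message is not redelivered
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```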
Introduction to Aurora
- DB Clusters
- Connection Management
- Storage and Reliability
- Security
- High Availability and Global Databases for Aurora
- Replication with Aurora
- Setting Environment
- Amazon RDS Aurora Architecture
- Aurora Metrics, Logging & Events
- Aurora Scaling and High Availability
- Configuring Security
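As a small taste of Aurora connection management, this boto3 sketch lists existing clusters with their writer and reader endpoints (assuming at least one Aurora cluster exists in the account).

```python
import boto3

rds = boto3.client("rds")

# List Aurora clusters and their endpoints
for cluster in rds.describe_db_clusters()["DBClusters"]:
    print(cluster["DBClusterIdentifier"],
          cluster["Endpoint"],            # writer endpoint
          cluster.get("ReaderEndpoint"))  # load-balanced reader endpoint
```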
Other AWS Services
- CloudWatch
- CloudTrail
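For example, publishing a custom CloudWatch metric from a pipeline takes one call; the namespace and metric name below are illustrative.

```python
import boto3

cw = boto3.client("cloudwatch")

# Publish a custom metric data point
cw.put_metric_data(
    Namespace="DemoPipeline",
    MetricData=[{"MetricName": "RecordsProcessed", "Value": 1250.0, "Unit": "Count"}],
)
```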
AWS Data Engineering Project
- Triggering Lambda with S3 Event
- Reading S3 File from AWS Glue
- Doing Transformation in AWS Glue
- Converting Data into Parquet & Loading into Redshift
- Troubleshooting & Monitoring
- Validating & Analyzing Data on Redshift
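The glue between the first two project steps can be as small as the sketch below: a Lambda handler that starts a Glue job for each uploaded S3 object. The job name s3-to-redshift-etl and the --source_path argument are illustrative.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """On each S3 upload, start the Glue job that transforms the new
    file to Parquet and loads it into Redshift."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        run = glue.start_job_run(
            JobName="s3-to-redshift-etl",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print("Started Glue run:", run["JobRunId"])
```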
Case Studies Details
Data Ingestion Pipeline:
- This project involves building a data ingestion and processing pipeline on AWS with both real-time streaming and batch loads. The tools and services used will broadly follow the course curriculum; however, you are free to use other competitive solutions according to your own knowledge and framework design.
- The minimal project architecture makes use of a data warehouse (S3 and an AWS data lake), a data streaming/reading pipeline, RDS/DynamoDB to store metadata, event-based notifications and an app using Lambda, data processing through PySpark on an EMR cluster, and dashboarding using Tableau.
- The complete solution built as part of this project will be capable of handling the data ingestion process for an independent requirement in a limited capacity, and may scale to production-sized data.
To see the full course content, download now.
Course Prerequisites
- At least 2 years of hands-on experience working on AWS
- Experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions
- Python / SQL knowledge
Who can attend
- Students should have sufficient AWS knowledge and two or more years of experience with AWS
- Beginners can also learn from scratch but will have to go through extensive lectures
- Existing AWS power users
- Beginner Python developers curious about data science
- Data engineers
- BI/ ETL Developers
- Data Scientists / Analysts
- Anyone from technical background who wants to learn Data engineering in AWS
- Professionals who wish to learn advanced ways to use AWS and build a data warehouse
Number of Hours: 40hrs
Certification
Key features
- One to One Training
- Online Training
- Fastrack & Normal Track
- Resume Modification
- Mock Interviews
- Video Tutorials
- Materials
- Real Time Projects
- Virtual Live Experience
- Preparing for Certification
FAQs
DASVM Technologies offers 300+ IT training courses, delivered by expert-level trainers with 10+ years of experience.
- One to One Training
- Online Training
- Fastrack & Normal Track
- Resume Modification
- Mock Interviews
- Video Tutorials
- Materials
- Real Time Projects
- Preparing for Certification
Call now: +91-99003 49889 and know the exciting offers available for you!
We work and coordinate with companies exclusively to get our students placed. We have a placement cell focusing on training and placements in Bangalore. Our placement cell helps more than 600 students per year.
Learn from experts active in their field, not out-of-touch trainers. Leading practitioners bring current best practices and case studies to sessions that fit into your work schedule. Our pool of trainers is highly skilled and experienced in supporting you with specific tasks and providing professional support. You get 24x7 learning support from mentors and a community of like-minded peers to resolve any conceptual doubts. Our trainers have contributed to the growth of our clients as well as of individual professionals.
All of our highly qualified trainers are industry experts with at least 10-12 years of relevant teaching experience. Each of them has gone through a rigorous selection process which includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating continue to train for us.
No worries. DASVM Technologies ensures that no one misses a single lecture topic. We will reschedule classes at your convenience within the stipulated course duration wherever possible. If required, you can even attend that topic with another batch.
DASVM Technologies provides many suitable modes of training to the students like:
- Classroom training
- One to One training
- Fast track training
- Live Instructor LED Online training
- Customized training
Yes, access to the course material will be available for a lifetime once you have enrolled in the course.
You will receive a DASVM Technologies recognized course completion certificate, and we will help you crack the global certification with our training.
Yes, DASVM Technologies provides corporate trainings with Course Customization, Learning Analytics, Cloud Labs, Certifications, Real time Projects with 24x7 Support.
Yes, DASVM Technologies provides group discounts for its training programs. Depending on the group size, we offer discounts as per the terms and conditions.
We accept all major payment options: cash, card (MasterCard, Visa, Maestro, etc.), wallets, net banking, and cheques.
DASVM Technologies has a no-refund policy; fees once paid will not be refunded. If a candidate is not able to attend a training batch, he/she may reschedule for a future batch. The balance due should be cleared by the given date. If a trainer cancels or is unavailable to provide training, DASVM will arrange training sessions with a backup trainer.
Your access to the Support Team is for lifetime and will be available 24/7. The team will help you in resolving queries, during and after the course.
Please contact our course advisor at +91-99003 49889, or share your queries through info@dasvmtechnologies.com.