DASVM’s AWS Data Engineer course is geared toward people who want to enhance their AWS skills to help organizations design and migrate their architecture to the cloud. It is the natural next step after obtaining your AWS Cloud Practitioner certification, and it develops the skills you learned in that course in further detail. We cover a broad range of topics, with a specific concentration on high availability, EC2, VPC, storage, and overall management of the AWS Console.
What you'll learn
- Data engineering concepts and AWS services
- AWS Essentials such as S3, IAM, EC2, etc.
- Managing AWS IAM users, groups, roles and policies for RBAC (Role Based Access Control)
- Engineering Batch Data Pipelines using AWS Glue Jobs
- Running Queries using AWS Athena, a serverless query engine service
- Using AWS Elastic MapReduce (EMR) Clusters for reports and dashboards
- Data Ingestion using AWS Lambda Functions
- Engineering Streaming Pipelines using AWS Kinesis
- Streaming Web Server logs using AWS Kinesis Firehose
- Running AWS Athena queries or commands using CLI
- Creating an AWS Redshift Cluster, creating tables, and performing CRUD operations
Course content
SQL Basics
- Introduction to SQL
- Software Installation
- Manipulating Data
- SQL Operators
- Working with SQL: Join, Tables, and Variables
- Deep Dive into SQL Functions
- Working with Subqueries
- SQL Views, Functions, and Stored Procedures
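For a quick taste of these topics, here is a minimal, self-contained sketch using Python's built-in sqlite3 module; the table names and sample rows are illustrative, not part of the course environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL + DML: create tables and insert rows
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept_id INTEGER)")
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Data"), (2, "Ops")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "alice", 1), (2, "bob", 2)])

# A join plus a subquery, as covered in the topics above
cur.execute("""
    SELECT e.name, d.name
    FROM emp e JOIN dept d ON e.dept_id = d.id
    WHERE e.dept_id IN (SELECT id FROM dept WHERE name = 'Data')
""")
print(cur.fetchall())  # [('alice', 'Data')]
```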
Python Basics
- Overview of Python
- Methods
- String Manipulation
- Working With Files
- DataFrames & DataFrame Methods
- Values, Types, Variables
- Operands and Expressions, Conditional Statements
- Loops, Command Line Arguments
- Python File I/O Functions
- Numbers
- Strings, Tuples, Dictionaries, Sets Functions & Parameters
- Global Variables, Lambda Functions
- Standard Libraries, Import Statements
- NumPy, Pandas
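The snippet below previews several of these topics in one place: file I/O, a lambda function, and a Pandas DataFrame. It is a minimal sketch; the file name data.csv is illustrative.

```python
import pandas as pd

# File I/O: write a small CSV, then read it back
with open("data.csv", "w") as f:
    f.write("name,score\nalice,90\nbob,85\n")

df = pd.read_csv("data.csv")                         # Pandas DataFrame
df["passed"] = df["score"].apply(lambda s: s >= 88)  # lambda function
print(df.head())                                     # DataFrame method
```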
Introduction to Spark
- Big Data, Hadoop Ecosystem and HDFS, Core Components
- YARN and its Advantages
- Hadoop Cluster and its Architecture
- Big Data Analytics with Batch & Real-Time Processing
- Linux Commands
- MapReduce
- Hive and Sqoop
- Introduction to Spark
- Components & Architecture, Deployment Modes
- Spark Shell
- Writing a Spark Job (SBT) & Submitting
- Web UI, Data Ingestion using Sqoop
- What is an RDD, its Operations, Transformations & Actions, Data Loading & Saving Through RDDs
- Key-Value Pair RDDs
- RDD Lineage, Persistence
- WordCount
- Passing Functions to Spark
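Since this section closes with WordCount and passing functions to Spark, here is a minimal PySpark RDD sketch of both; the input path input.txt is an assumption.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

def to_pair(word):
    # A named function passed to Spark, alongside the lambdas below
    return (word, 1)

counts = (sc.textFile("input.txt")                # Data loading into an RDD
            .flatMap(lambda line: line.split())   # Transformation
            .map(to_pair)                         # Key-value pair RDD
            .reduceByKey(lambda a, b: a + b))     # Aggregation per key
print(counts.take(10))                            # Action triggers execution
spark.stop()
```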
PySpark, Processing Multiple Batches & Streaming
- Spark SQL Architecture, SQL Context in Spark SQL
- Functions, Data Frames & Datasets
- Interoperating with RDDs, Loading Data through Different Sources
- Spark – Hive Integration
- Introduction to PySpark Shell
- Submitting PySpark Job
- Writing your first PySpark Job
- Introduction to MLlib, MLlib Tools
- ML algorithms, Linear Regression, Logistic Regression, Decision Tree, Random Forest
- K-Means Clustering
- Spark Streaming: Features & Workflow
- Streaming Context & DStreams Transformations
- Slice, Window and ReduceByWindow Operators
- Stateful Operators
- Streaming: Data Sources
- Apache Flume and Apache Kafka Data Sources
- Example: Using a Kafka Direct Data Source
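A minimal sketch of the Spark SQL and DataFrame ideas above: building a DataFrame, registering it as a view, and querying it with SQL. The view name scores and the sample rows are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

# Build a DataFrame from in-memory rows
df = spark.createDataFrame([("alice", 90), ("bob", 85)], ["name", "score"])
df.createOrReplaceTempView("scores")  # register for SQL queries

# Spark SQL over the registered view
spark.sql("SELECT name FROM scores WHERE score > 88").show()
spark.stop()
```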
Apache Airflow
- Introduction to Apache Airflow
- Airflow DAGs, Running Tasks and Navigating
- Operators, Scheduling & Troubleshooting
- SLA
- Building a Production Pipeline
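For illustration, here is a minimal Airflow DAG touching the topics above (DAGs, operators, scheduling, and an SLA). It is a sketch assuming Airflow 2.x; the dag_id and bash commands are placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal DAG with two daily tasks
with DAG(
    dag_id="demo_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(
        task_id="load",
        bash_command="echo loading",
        sla=timedelta(minutes=30),  # flag the task if it runs past its SLA window
    )
    extract >> load  # extract must finish before load starts
```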
Tableau
- Introduction to Tableau
- Calculations: working with Strings, Dates and Conversion functions
- Creating a Dashboard
- Combining Views
- Add Actions for Interactivity
- Using Sets
- Advanced Calculations
- Filters and Expressions
- Advanced Mapping
- Comparing Measures against a Goal
- Showing Statistics and Forecasting
- Advanced Dashboards
Introduction to AWS Data Engineering
- Introduction to AWS and its importance in Data Engineering
- Create AWS S3 Bucket using AWS Web Console
- Create AWS IAM Group and User using AWS Web Console
- Overview of AWS IAM Roles to grant permissions between AWS Services
- Create and Attach AWS IAM Custom Policy using AWS Web Console
- Configure and Validate AWS Command Line Interface to run AWS Commands
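One way to validate that the CLI credentials you just configured also work programmatically is shown below; a minimal sketch assuming boto3 is installed and credentials were set via aws configure.

```python
import boto3

# Ask STS who we are: a quick credential sanity check
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])

# Equivalent spot check: list the S3 buckets visible to this IAM user
s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
```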
Amazon Simple Storage Service (Amazon S3)
- Amazon CloudFront, Edge Locations and Route 53
- Demonstrations and hands-on labs on creating S3 buckets, hosting static websites, and CloudFront
- Putting Objects, Bucket Properties
- Introduction to S3
- AWS Management Console, AWS CLI, Boto3
- S3 Multipart Upload, Storage Classes
- S3 Security and Encryption
- Database Engine Types
- Relational Database Service (RDS)
- Serverless Options
- Lab: RDS Instances and Engines
- AWS Elastic MapReduce (EMR)
- Use Cases & Hands on
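As a preview of the S3 topics above (bucket creation, multipart upload, encryption), here is a hedged boto3 sketch; the region, bucket name, and file names are assumptions, and bucket names must be globally unique.

```python
import boto3

region = "ap-south-1"            # assumed region
bucket = "my-demo-bucket-12345"  # illustrative, must be globally unique
s3 = boto3.client("s3", region_name=region)

# Create a bucket (outside us-east-1 a LocationConstraint is required)
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": region},
)

# upload_file transparently switches to multipart upload for large files
s3.upload_file("local_data.csv", bucket, "raw/data.csv")

# Put a small object directly, requesting server-side encryption
s3.put_object(
    Bucket=bucket,
    Key="raw/hello.txt",
    Body=b"hello s3",
    ServerSideEncryption="AES256",
)
```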
AWS Security using IAM – Managing AWS Users, Roles and Policies
- Creating AWS IAM Users with Programmatic and Web Console Access
- Logging into AWS Management Console using AWS IAM User
- Validate Programmatic Access to AWS IAM User via AWS CLI
- Getting Started with AWS IAM Identity-based Policies
- Managing AWS IAM User Groups
- Managing AWS IAM Roles for Service Level Access
- Overview of AWS Custom Policies to grant permissions to Users, Groups, and Roles
- Managing AWS IAM Groups, Users, and Roles using AWS CLI
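The sketch below shows the same user/group/policy workflow programmatically with boto3; the group name, user name, and attached managed policy are illustrative choices, not course requirements.

```python
import boto3

iam = boto3.client("iam")

# Create a group and a user, then add the user to the group
iam.create_group(GroupName="data-engineers")
iam.create_user(UserName="demo-user")
iam.add_user_to_group(GroupName="data-engineers", UserName="demo-user")

# Attach an AWS managed policy granting read-only S3 access to the group
iam.attach_group_policy(
    GroupName="data-engineers",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# Create access keys for programmatic (CLI/boto3) access
keys = iam.create_access_key(UserName="demo-user")
print(keys["AccessKey"]["AccessKeyId"])
```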
AWS Lambda
- Introduction to AWS Lambda
- Introduction to Data Collection and Getting Data Into AWS
- Direct Connect, Snowball, Snowball Edge, Snowmobile
- Database Migration Service
- Data Pipeline
- Lambda, API Gateway, and CloudFront
- Features
- Use Cases
- Limitations
- Hands on Labs
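To make the Lambda topics concrete, here is a minimal handler for an S3 event trigger, the pattern used later in the course project. It is a sketch; the function does nothing but log the objects that triggered it.

```python
import json

def lambda_handler(event, context):
    """Minimal handler: log each object that triggered the S3 event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("ok")}
```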
AWS Glue
- What is AWS Glue
- ETL With Glue
- Working in Python Shell
- Working in Spark Shell
- Checking Logs
- AWS Glue Data Catalog
- AWS Glue Jobs
- Glue Job Demo
- Job Bookmarks
- AWS Glue Crawlers
- ETL Project
- Use Cases & Pricing
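Below is a hedged skeleton of a Glue ETL job tying together the Data Catalog, DynamicFrames, and S3 output. It only runs inside a Glue job (not locally), and the database, table, and S3 path are illustrative.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve arguments and initialize the job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (names are illustrative)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="demo_db", table_name="raw_orders")

# Interoperate with Spark DataFrames for a simple transformation
df = dyf.toDF().dropna()

# Write the cleaned data back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(df, glue_context, "clean_orders"),
    connection_type="s3",
    connection_options={"path": "s3://my-demo-bucket/clean/"},
    format="parquet",
)
job.commit()
```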
Getting started with EMR
- Planning of EMR Cluster
- Create EC2 Key Pair
- Setup EMR Cluster with Spark
- Understanding Summary of AWS EMR Cluster
- Review EMR Cluster Application User Interfaces
- Review EMR Cluster Monitoring
- Review EMR Cluster Hardware and Cluster Scaling Policy
- Review EMR Cluster Configurations
- Review EMR Cluster Events
- Review EMR Cluster Steps
- Review EMR Cluster Bootstrap Actions
- Connecting to EMR Master Node using SSH
- Disabling Termination Protection and Terminating the Cluster
- Clone and Create New Cluster
- Listing AWS S3 Buckets and Objects using AWS CLI on EMR Cluster
- Listing AWS S3 Buckets and Objects using HDFS CLI on EMR Cluster
- Managing Files in AWS S3 using HDFS CLI on EMR Cluster
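For reference, cluster setup can also be scripted. Here is a hedged boto3 sketch of launching a small Spark cluster, matching the planning steps above; the cluster name, key pair, log bucket, and instance sizes are assumptions (the roles shown are the AWS default EMR roles).

```python
import boto3

emr = boto3.client("emr", region_name="ap-south-1")

# Launch a small Spark cluster (names and sizes are illustrative)
response = emr.run_job_flow(
    Name="demo-spark-cluster",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hadoop"}],
    LogUri="s3://my-demo-bucket/emr-logs/",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "my-ec2-key",          # EC2 key pair created earlier
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```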
AWS Kinesis
- Building Streaming Pipeline using Kinesis
- Rotating Logs
- Setup Kinesis Firehose Agent
- Create Kinesis Firehose Delivery Stream
- Planning the Pipeline
- Create IAM Group and User
- Granting Permissions to IAM User using Policy
- Configure Kinesis Firehose Agent
- Start and Validate Agent
- Conclusion – Building a Simple Streaming Pipeline
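Besides the Firehose agent, records can be pushed into a delivery stream directly from code. A minimal boto3 sketch, assuming a delivery stream named web-logs-stream already exists:

```python
import json

import boto3

firehose = boto3.client("firehose")

# Push one log line into the delivery stream; Firehose buffers records
# and delivers them to the configured destination (e.g. S3)
record = {"ip": "10.0.0.1", "status": 200, "path": "/index.html"}
firehose.put_record(
    DeliveryStreamName="web-logs-stream",
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```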
AWS Athena
- What is Athena
- Features
- Use Cases
- Creating Athena Tables
- Using Glue Crawlers
- Querying Athena Tables
- When To Use Athena
- Visualizations and Dashboards
- Security and Authentication
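Athena queries can be run from code as well as the console. A hedged boto3 sketch of submitting a query, polling for completion, and reading the results; the database, table, and output location are illustrative.

```python
import time

import boto3

athena = boto3.client("athena")

# Submit a query; results land in the S3 output location
qid = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "demo_db"},
    ResultConfiguration={"OutputLocation": "s3://my-demo-bucket/athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(
        QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Fetch the result set row by row
if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```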
AWS Redshift
- Introduction To Redshift
- Features
- Redshift Architecture
- DDL, DML in Redshift
- Core Structure & MPP
- Redshift Spectrum
- Loading & Unloading Data From Redshift
- Pricing & Optimization
- Redshift in the AWS Service Ecosystem
- Redshift Use Cases
- Redshift Table Design
- Querying Data from Multiple Redshift Spectrum Tables
- Launching a Redshift Cluster
- Resizing a Redshift Cluster
- Utilizing Vacuum and Deep Copy
- Backup and Restore
- Monitoring
- Lab: Manually Migrating Data
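As a preview of the DDL and loading topics above, here is a hedged sketch using the Redshift Data API via boto3; the cluster identifier, database, user, S3 path, and IAM role ARN are all illustrative placeholders.

```python
import boto3

rsd = boto3.client("redshift-data")

# Shared connection settings (all illustrative)
common = dict(ClusterIdentifier="demo-cluster", Database="dev", DbUser="awsuser")

# DDL: create a table
rsd.execute_statement(
    Sql="CREATE TABLE IF NOT EXISTS orders (id INT, amount DECIMAL(10,2))",
    **common,
)

# Bulk-load Parquet files from S3; the IAM role must allow S3 read access
rsd.execute_statement(
    Sql="""
        COPY orders FROM 's3://my-demo-bucket/clean/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET
    """,
    **common,
)
```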
AWS Integrations and Use Cases
- Amazon MQ
- Amazon SNS
- Amazon SQS (see the queue sketch after this list)
- Amazon SWF (Simple Workflow Service)
- AWS Step Functions
- Interview Preparations
- Mock Interviews
- Implementing End to End Project
- Assessments
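To preview the messaging services above, here is a minimal boto3 sketch of the SQS send/receive/delete cycle; the queue name and message body are illustrative.

```python
import boto3

sqs = boto3.client("sqs")

# Create a queue, send a message, then receive and delete it
queue_url = sqs.create_queue(QueueName="demo-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="order:42 created")

msgs = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5)
for m in msgs.get("Messages", []):
    print(m["Body"])
    # Delete after processing so the message is not redelivered
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```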
Introduction to Aurora
- DB Clusters
- Connection Management
- Storage and Reliability
- Security
- High Availability and Global Databases for Aurora
- Replication with Aurora
- Setting Environment
- Amazon RDS Aurora Architecture
- Aurora Metrics, Logging & Events
- Aurora Scaling and High Availability
- Configuring Security
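As a small taste of Aurora connection management, this boto3 sketch lists existing clusters with their writer and reader endpoints (assuming at least one Aurora cluster exists in the account).

```python
import boto3

rds = boto3.client("rds")

# List Aurora clusters and their endpoints
for cluster in rds.describe_db_clusters()["DBClusters"]:
    print(cluster["DBClusterIdentifier"],
          cluster["Endpoint"],            # writer endpoint
          cluster.get("ReaderEndpoint"))  # load-balanced reader endpoint
```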
Other AWS Services
- CloudWatch
- CloudTrail
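For example, publishing a custom CloudWatch metric from a pipeline takes one call; the namespace and metric name below are illustrative.

```python
import boto3

cw = boto3.client("cloudwatch")

# Publish a custom metric data point
cw.put_metric_data(
    Namespace="DemoPipeline",
    MetricData=[{"MetricName": "RecordsProcessed", "Value": 1250.0, "Unit": "Count"}],
)
```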
AWS Data Engineering Project
- Triggering Lambda with S3 Event
- Reading S3 File from AWS Glue
- Doing Transformation in AWS Glue
- Converting Data into Parquet & Loading into Redshift
- Troubleshooting & Monitoring
- Validating & Analyzing Data on Redshift
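The glue between the first two project steps can be as small as the sketch below: a Lambda handler that starts a Glue job for each uploaded S3 object. The job name s3-to-redshift-etl and the --source_path argument are illustrative.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """On each S3 upload, start the Glue job that transforms the new
    file to Parquet and loads it into Redshift."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        run = glue.start_job_run(
            JobName="s3-to-redshift-etl",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print("Started Glue run:", run["JobRunId"])
```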
Case Studies Details
Data Ingestion Pipeline:
- This project involves building a data ingestion and processing pipeline on AWS with both real-time streaming and batch loads. The tools and services used will broadly follow the course curriculum; however, you are free to use other competitive solutions according to your own knowledge and framework design.
- The minimal project architecture makes use of a data warehouse (S3 and an AWS data lake), a data streaming/reading pipeline, RDS/DynamoDB to store metadata, event-based notifications and an app using Lambda, data processing through PySpark on an EMR cluster, and dashboarding using Tableau.
- The complete solution built as part of this project will be capable of handling the data ingestion process for an independent requirement in a limited capacity, and may scale to production-sized data.
To see the full course content, download now.
Course Prerequisites
- At least 2 years of hands-on experience working on AWS
- Experience and expertise working with AWS services to design, build, secure, and maintain analytics solutions
- Python / SQL knowledge
Who can attend
- Students should have sufficient AWS knowledge and two or more years of experience with AWS
- Beginners can also learn from scratch but will have to go through extensive lectures
- Existing AWS power users
- Beginner Python developers curious about data science
- Data engineers
- BI/ ETL Developers
- Data Scientists / Analysts
- Anyone from technical background who wants to learn Data engineering in AWS
- Professionals who wish to learn advanced ways to use AWS and build a data warehouse
Number of Hours: 40hrs
Certification
Key features
- One to One Training
- Online Training
- Fastrack & Normal Track
- Resume Modification
- Mock Interviews
- Video Tutorials
- Materials
- Real Time Projects
- Virtual Live Experience
- Preparing for Certification
FAQs
DASVM Technologies offers 300+ IT training courses, delivered by expert-level trainers with 10+ years of experience.
- One to One Training
- Online Training
- Fastrack & Normal Track
- Resume Modification
- Mock Interviews
- Video Tutorials
- Materials
- Real Time Projects
- Preparing for Certification
Call now: +91-99003 49889 and know the exciting offers available for you!
We work and coordinate with companies exclusively to get our students placed. We have a placement cell focusing on training and placements in Bangalore. Our placement cell helps more than 600 students per year.
Learn from experts active in their field, not out-of-touch trainers. Leading practitioners bring current best practices and case studies to sessions that fit into your work schedule. Our pool of trainers is highly skilled and experienced in supporting you with specific tasks and providing professional support. You get 24x7 learning support from mentors and a community of like-minded peers to resolve any conceptual doubts. Our trainers have contributed to the growth of our clients as well as of individual professionals.
All of our highly qualified trainers are industry experts with at least 10-12 years of relevant teaching experience. Each of them has gone through a rigorous selection process which includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating continue to train for us.
No worries. DASVM Technologies ensures that no one misses a single lecture topic. We will reschedule classes at your convenience within the stipulated course duration wherever possible. If required, you can even attend that topic with another batch.
DASVM Technologies provides many suitable modes of training to the students like:
- Classroom training
- One to One training
- Fast track training
- Live Instructor LED Online training
- Customized training
Yes, access to the course material will be available for a lifetime once you have enrolled in the course.
You will receive a DASVM Technologies recognized course completion certificate, and we will help you crack the global certification with our training.
Yes, DASVM Technologies provides corporate trainings with Course Customization, Learning Analytics, Cloud Labs, Certifications, Real time Projects with 24x7 Support.
Yes, DASVM Technologies provides group discounts for its training programs. Depending on the group size, we offer discounts as per the terms and conditions.
We accept all major payment options: cash, card (MasterCard, Visa, Maestro, etc.), wallets, net banking, and cheques.
DASVM Technologies has a no-refund policy; fees once paid will not be refunded. If a candidate is not able to attend a training batch, he/she may reschedule for a future batch. The balance due should be cleared by the given date. If a trainer cancels or is unavailable to provide training, DASVM will arrange training sessions with a backup trainer.
Your access to the Support Team is for lifetime and will be available 24/7. The team will help you in resolving queries, during and after the course.
Please contact our course advisor at +91-99003 49889, or share your queries through info@dasvmtechnologies.com.