This Soda data quality testing course is designed to help data professionals build and maintain reliable pipelines. Its primary objectives are mastering the Soda Checks Language (SodaCL) and integrating automated validation into the modern data stack. SodaCL is a YAML-based, human-readable language for writing tests that both engineers and stakeholders can understand.
Course Objectives:
In this course, you will learn to:
- Establish a Foundation in Data Quality Principles: Understand and apply the five key data quality dimensions—accuracy, completeness, consistency, validity, and freshness—within a production environment.
- Master SodaCL for Declarative Testing: Learn to write human-readable SodaCL checks for standard metrics such as row_count, duplicate_count, and missing_count.
- Implement Advanced Logic & Custom Metrics: Gain the ability to define complex validation logic using Custom SQL checks and User-Defined Metrics when built-in metrics are insufficient.
- Automate Testing in Data Pipelines: Integrate Soda Core into orchestration tools like Airflow or CI/CD workflows (e.g., GitHub Actions) to catch "bad data" before it reaches production.
- Deploy Observability and Alerts: Configure Soda Cloud to visualize health trends, detect anomalies via machine learning, and set up real-time notifications for tools like Slack or Jira.
- Execute Data Governance & Compliance: Use Data Contracts to align data producers and consumers on explicit quality standards, ensuring regulatory compliance (e.g., BCBS 239).
Course content
Data Quality Testing with Soda Tool
Introduction to Soda & Data Quality
- Overview of the Soda Data Quality Platform and its components (Core vs. Cloud).
- Key Data Quality Dimensions: Accuracy, Completeness, Consistency, Validity, and Freshness.
Installation & Configuration
- Setting up a Python Virtual Environment for Soda.
- Installing Soda Core via CLI and connecting to data sources (e.g., Snowflake, BigQuery, or Databricks).
- Setting up the configuration file with API keys and data source credentials.
Defining Checks with SodaCL
- Writing Soda Checks Language (SodaCL) in checks.yml files.
- Standard Checks: row_count, missing_count, duplicate_count, and invalid_count.
- Dynamic Thresholds: Setting pass/fail/warn conditions.
- Custom SQL Checks: Writing bespoke validations using fail query.
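To make the standard metrics and thresholds above more concrete, here is a minimal Python sketch that computes row, missing, and duplicate counts against an in-memory SQLite table and maps a metric value to a pass/warn/fail outcome. It does not use Soda itself. The `customers` table, its columns, the helper names, and the threshold values are all hypothetical, and the SQL is only one plausible reading of what each metric measures, not Soda's actual implementation.

```python
import sqlite3


def build_demo_table() -> sqlite3.Connection:
    """Create a small hypothetical table with one missing email and one repeated id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [
            (1, "a@example.com"),
            (2, None),                # missing email
            (3, "c@example.com"),
            (3, "c@example.com"),     # repeated id
        ],
    )
    return conn


# The helpers interpolate identifiers directly, so they are only
# suitable for trusted, hard-coded table and column names.
def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def missing_count(conn, table, column):
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def duplicate_count(conn, table, column):
    # Counts distinct non-null values that occur more than once (an assumption
    # about the metric's meaning, made for illustration).
    sql = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(sql).fetchone()[0]


def failed_rows(conn, fail_query):
    """Custom SQL check: any row returned by the query counts as a failure."""
    return conn.execute(fail_query).fetchall()


def evaluate(value, warn_above, fail_above):
    """Map a metric value to pass/warn/fail using two hypothetical thresholds."""
    if value > fail_above:
        return "fail"
    if value > warn_above:
        return "warn"
    return "pass"
```

For example, `evaluate(missing_count(conn, "customers", "email"), warn_above=0, fail_above=5)` yields a warning for the demo table, because one email is missing but the count stays below the fail threshold.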
Running Scans & Analyzing Results
- Executing the Soda Scan from the command line.
- Reviewing results in the CLI and investigating failed records via the Soda Cloud Dashboard.
Pipeline Integration & Orchestration
- Integrating Soda into CI/CD (GitHub Actions) or Airflow DAGs.
- Setting up Data Contracts to prevent bad data from reaching production.
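The idea of stopping bad data before production can be sketched as a simple gate that a CI job or DAG task might call after a scan. This is an illustrative Python sketch only: the `should_block` function and the outcome labels are hypothetical and do not represent Soda's API.

```python
from collections import Counter

# Hypothetical outcome labels for individual checks.
VALID_OUTCOMES = {"pass", "warn", "fail"}


def should_block(outcomes, block_on_warn=False):
    """Return True when downstream pipeline steps should not run.

    outcomes: iterable of per-check results ("pass", "warn", or "fail").
    block_on_warn: treat warnings as blocking, e.g. for a strict release branch.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown check outcomes: {sorted(unknown)}")
    if counts["fail"] > 0:
        return True
    return block_on_warn and counts["warn"] > 0
```

A pipeline step could raise an error or exit with a non-zero status when `should_block(...)` returns `True`, which causes most orchestrators and CI systems to halt the remaining steps.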
Advanced Observability (Soda Cloud)
- Anomaly Detection: Using AI to detect statistical outliers.
- Schema Evolution: Monitoring for unexpected column additions or deletions.
- Incidents & Alerts: Configuring Slack or MS Teams notifications for failing tests.
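As a rough illustration of the two monitoring ideas above, the Python sketch below compares an expected column list with the actual one and flags a metric value that sits far from its recent history. The z-score rule is a deliberately simple stand-in, not the machine learning approach Soda Cloud uses, and the function names and the default threshold are assumptions.

```python
from statistics import mean, stdev


def schema_changes(expected_columns, actual_columns):
    """Report columns that appeared or disappeared relative to the expected schema."""
    expected, actual = set(expected_columns), set(actual_columns)
    return {
        "added": sorted(actual - expected),
        "removed": sorted(expected - actual),
    }


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` when it lies more than `z_threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any deviation from a constant history is unusual
    return abs(latest - mu) / sigma > z_threshold
```

For instance, with a daily row-count history of `[100, 102, 98, 101, 99]`, a new value of 500 would be flagged while a value of 100 would not.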
Course Prerequisites
- Basic SQL: Understanding how to query tables is essential for Custom SQL checks.
- Basic Python/CLI: Comfort with the terminal is needed to install Soda Core and run scans.
- YAML Familiarity: Since SodaCL is YAML-based, knowing how to indent properly will save you a lot of debugging time.
Who can attend
- Data Engineers: To automate schema evolution checks and row-level validations.
- Analytics Engineers: To ensure the models built in tools like dbt are fed by clean raw data.
- Data Quality Analysts: To define Data Contracts that ensure producers and consumers are aligned.
- Data Governance Officers: Those who use the Soda Cloud Dashboard to track high-level health scores and ensure the company meets regulatory standards like GDPR or BCBS 239.
- Data Scientists: Users who need to verify that the features feeding their Machine Learning models haven't drifted or become corrupted.
- Product Managers (Data): Stakeholders who want to define Data Contracts to ensure the data they receive from engineering meets specific business requirements.
Number of Hours: 15hrs
Certification
Key features
- One to One Training
- Online Training
- Fast Track & Normal Track
- Resume Modification
- Mock Interviews
- Video Tutorials
- Materials
- Real Time Projects
- Virtual Live Experience
- Preparing for Certification
FAQs
DASVM Technologies offers 300+ IT training courses taught by expert trainers with 10+ years of experience.
- One to One Training
- Online Training
- Fast Track & Normal Track
- Resume Modification
- Mock Interviews
- Video Tutorials
- Materials
- Real Time Projects
- Preparing for Certification
Call now at +91-99003 49889 to learn about the exciting offers available to you!
We work and coordinate exclusively with companies to get our students placed. Our placement cell focuses on training and placements in Bangalore and helps more than 600 students per year.
Learn from experts active in their field, not out-of-touch trainers. Our trainers are leading practitioners who bring current best practices and case studies to sessions that fit into your work schedule. Our pool of highly skilled and experienced trainers supports you in specific tasks and provides professional guidance. You also get 24x7 learning support from mentors and a community of like-minded peers to resolve any conceptual doubts. Our trainers have contributed to the growth of our clients as well as individual professionals.
All of our highly qualified trainers are industry experts with at least 10-12 years of relevant teaching experience. Each of them has gone through a rigorous selection process which includes profile screening, technical evaluation, and a training demo before they are certified to train for us. We also ensure that only those trainers with a high alumni rating continue to train for us.
No worries. DASVM Technologies ensures that no one misses a single lecture or topic. We will reschedule classes at your convenience within the stipulated course duration. If required, you can also attend that topic with another batch.
DASVM Technologies offers several modes of training for students, including:
- Classroom training
- One to One training
- Fast track training
- Live instructor-led online training
- Customized training
Yes, you will have lifetime access to the course material once you have enrolled in the course.
You will receive a DASVM Technologies-recognized course completion certificate, and we will help you prepare for global certification through our training.
Yes, DASVM Technologies provides corporate training with course customization, learning analytics, cloud labs, certifications, and real-time projects, backed by 24x7 support.
Yes, DASVM Technologies provides group discounts for its training programs. Depending on the group size, we offer discounts as per the terms and conditions.
We accept all major payment options: cash, cards (Mastercard, Visa, Maestro, etc.), wallets, net banking, and cheques.
DASVM Technologies has a no-refund policy. Fees once paid will not be refunded. If a candidate is unable to attend a training batch, he/she must reschedule to a future batch. Any balance due must be cleared by the date given. If a trainer cancels or is unavailable, DASVM will arrange training sessions with a backup trainer.
You have lifetime access to the support team, which is available 24/7. The team will help you resolve queries during and after the course.
Please contact our course advisor at +91-99003 49889, or send your queries to info@dasvmtechnologies.com
