Data Quality Engineer, Enterprise Data Platform - 10+ Years (Data Quality & Data Governance), AWS, SQL & Python, Cloud, Power BI




About the Role

We are seeking a skilled and forward-thinking Data Quality Engineer to advance the data trust, governance, and certification framework for our enterprise Data Lakehouse platform built on Databricks, Apache Iceberg, AWS (Glue, Glue Catalog, SageMaker Studio), Dremio, Atlan, and Power BI.

This role is critical in ensuring that data across Bronze (raw), Silver (curated), and Gold (business-ready) layers is certified, discoverable, and AI/BI-ready. You will design data quality pipelines, semantic layers, and governance workflows, enabling both Power BI dashboards and Conversational Analytics leveraging LLMs (Large Language Models).

Your work will ensure that all 9 dimensions of data quality (accuracy, completeness, consistency, timeliness, validity, uniqueness, integrity, conformity, reliability) are continuously met, so both humans and AI systems can trust and use the data effectively.

ESSENTIAL DUTIES AND RESPONSIBILITIES

Data Quality & Reliability

  • Build and maintain automated validation frameworks across Bronze → Silver → Gold pipelines.
  • Develop tests for schema drift, anomalies, reconciliation, timeliness, and referential integrity.
  • Integrate validation into Databricks (Delta Lake, Delta Live Tables, Unity Catalog) and Iceberg-based pipelines.
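
For illustration, a validation step of this kind might be sketched in PySpark as below; the table names, columns, and 24-hour freshness threshold are hypothetical, and in practice such checks would typically be expressed through a framework like Great Expectations, Deequ, or Soda.

# Minimal sketch of Silver-layer promotion checks (hypothetical tables/columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

silver = spark.table("silver.orders")
customers = spark.table("silver.customers")

checks = {
    # Completeness: the primary key must never be null
    "null_order_ids": silver.filter(F.col("order_id").isNull()).count(),
    # Uniqueness: no duplicate primary keys
    "duplicate_order_ids": silver.count() - silver.select("order_id").distinct().count(),
    # Referential integrity: every order must point at a known customer
    "orphan_customer_refs": silver.join(customers, "customer_id", "left_anti").count(),
    # Timeliness: flag data older than an (assumed) 24-hour SLA
    "stale_rows": silver.filter(
        F.col("ingested_at") < F.current_timestamp() - F.expr("INTERVAL 24 HOURS")
    ).count(),
}

failed = {name: count for name, count in checks.items() if count > 0}
if failed:
    # Block promotion to Gold and surface the failures to orchestration/alerting
    raise ValueError(f"Data quality checks failed: {failed}")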

Data Certification & Governance

  • Define data certification workflows ensuring only trusted data is promoted for BI/AI consumption.
  • Leverage Atlan and AWS Glue Catalog for metadata management, lineage, glossary, and access control.
  • Utilize Iceberg’s schema evolution & time travel to ensure reproducibility and auditability.
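
As a sketch of how Iceberg time travel supports reproducibility and audit, a certification run could record the snapshot it validated and later re-read exactly that snapshot; the catalog, table, and snapshot id below are hypothetical.

# Reproduce a certified read using Iceberg snapshots (hypothetical names/ids).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Iceberg keeps a snapshot history per table in a metadata table
spark.sql("SELECT snapshot_id, committed_at FROM lakehouse.gold.revenue.snapshots").show()

# Re-read the exact snapshot that was certified, so audits are reproducible
certified_snapshot = 123456789012345678  # recorded at certification time (hypothetical)
df_as_certified = (
    spark.read
    .option("snapshot-id", certified_snapshot)
    .table("lakehouse.gold.revenue")
)

# Equivalent SQL-style time travel (Spark 3.3+ with the Iceberg runtime)
spark.sql(
    f"SELECT count(*) FROM lakehouse.gold.revenue VERSION AS OF {certified_snapshot}"
).show()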

Semantic Layer & Business Consumption

  • Build a governed semantic layer on gold data to support BI and AI-driven consumption.
  • Enable Power BI dashboards and self-service reporting with certified KPIs and metrics.
  • Partner with data stewards to align semantic models with business glossaries in Atlan.
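
One way such a governed semantic layer can be realized is as certified Gold views whose KPI logic is defined once and then consumed unchanged by Power BI and conversational tools; the view, columns, and business rules below are purely illustrative.

# Hypothetical certified KPI view on Gold data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE OR REPLACE VIEW gold.kpi_monthly_revenue AS
    SELECT
        date_trunc('month', order_date)   AS month,
        region,
        SUM(net_amount)                   AS revenue,           -- certified revenue definition
        COUNT(DISTINCT customer_id)       AS active_customers
    FROM gold.orders
    WHERE order_status = 'COMPLETED'
    GROUP BY date_trunc('month', order_date), region
""")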

Conversational Analytics & LLM Enablement

  • Prepare and certify datasets that fuel conversational analytics experiences.
  • Collaborate with AI/ML teams to integrate LLM-based query interfaces (e.g., natural language to SQL) with Dremio, Databricks SQL, and Power BI.
  • Ensure LLM responses are grounded in high-quality, certified datasets, reducing hallucinations and maintaining trust.
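
A simple illustration of this grounding: expose only certified Gold tables to the natural-language-to-SQL layer and reject generated queries that stray outside that allow-list. The table names and schemas below are hypothetical, and the catalog lookup and the LLM call itself are out of scope for this sketch.

# Allow-list of certified tables for NL-to-SQL grounding (hypothetical names/schemas).
import re

CERTIFIED_TABLES = {
    # In practice, built from Atlan / Glue Catalog metadata tagged as "certified"
    "gold.revenue_by_region": "region STRING, fiscal_quarter STRING, revenue DECIMAL(18,2)",
    "gold.customer_churn": "customer_id STRING, churn_flag BOOLEAN, snapshot_date DATE",
}

def build_prompt(question: str) -> str:
    # Grounding: the model only ever sees certified schemas
    schema_context = "\n".join(f"{name}: {cols}" for name, cols in CERTIFIED_TABLES.items())
    return (
        "Write SQL using ONLY these certified tables:\n"
        f"{schema_context}\n\nQuestion: {question}\nSQL:"
    )

def validate_generated_sql(sql: str) -> None:
    # Reject any generated query that references an uncertified table
    referenced = set(re.findall(r"(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE))
    uncertified = referenced - set(CERTIFIED_TABLES)
    if uncertified:
        raise ValueError(f"SQL references uncertified tables: {sorted(uncertified)}")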

ML Readiness & SageMaker Studio

  • Provide certified, feature-ready datasets for ML training and inference in SageMaker Studio.
  • Collaborate with ML engineers to ensure input data meets all 9 quality dimensions.
  • Establish monitoring for data drift and model reliability.
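
Drift monitoring can be as simple as comparing serving-time feature distributions against the training baseline; the sketch below uses a Population Stability Index with an assumed 0.2 alert threshold, and the feature data is synthetic for illustration.

# PSI-based drift check (synthetic data; threshold is an assumed convention).
import numpy as np

def population_stability_index(expected, actual, bins=10):
    # Bin edges come from the baseline's quantiles, with open-ended outer bins
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, bins=cuts)[0] / len(expected)
    a_pct = np.histogram(actual, bins=cuts)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) and divide-by-zero
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

baseline = np.random.normal(0.0, 1.0, 10_000)   # stand-in for training-time feature values
serving = np.random.normal(0.3, 1.0, 10_000)    # stand-in for current inference inputs
if population_stability_index(baseline, serving) > 0.2:
    print("Feature drift detected - review before the next training/certification run")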

Holistic Data Quality Dimensions

  • Continuously enforce all 9 dimensions of data quality: Accuracy, Completeness, Consistency, Timeliness, Validity, Uniqueness, Integrity, Conformity, Reliability.

Required

  • 5–10 years of experience in data engineering, data quality, or data governance roles.
  • Strong skills in Python, PySpark, and SQL.
  • Hands-on with Databricks (Delta Lake, Unity Catalog, Delta Live Tables) and Apache Iceberg.
  • Expertise in AWS data stack (S3, Glue ETL, Glue Catalog, Athena, EMR, Redshift, SageMaker Studio).
  • Experience with Power BI semantic modeling, DAX, and dataset certification.
  • Familiarity with Dremio or similar query engines (Trino, Presto).
  • Knowledge of Atlan or equivalent catalog/governance tools.
  • Experience with data quality testing frameworks (Great Expectations, Deequ, Soda).

Preferred

  • Exposure to Conversational Analytics platforms or LLM-powered BI (e.g., natural language query over Lakehouse/Power BI).
  • Experience integrating LLM pipelines (LangChain, OpenAI, AWS Bedrock, etc.) with enterprise data.
  • Familiarity with data observability tools (Monte Carlo, Bigeye, Datadog, Grafana).
  • Knowledge of data compliance frameworks (GDPR, CCPA, HIPAA).
  • Cloud certifications: AWS Data Analytics Specialty, Databricks Certified Data Engineer.
