About the Role
We are seeking an experienced Data Modeler to design and evolve the data architecture and semantic foundation for our enterprise Data Lakehouse platform, spanning Databricks, Apache Iceberg, AWS (Glue, Glue Catalog, SageMaker Studio), Dremio, Atlan, and Power BI.
In this role, you will translate business requirements into conceptual, logical, and physical models, ensuring that data is standardized, canonical, and business-ready as it flows across the Bronze (raw), Silver (canonical), and Gold (curated/consumption/semantic) layers.
You will play a central role in implementing canonical data models at the Silver layer, enabling consistency across systems and simplifying downstream consumption in Power BI dashboards, ML models in SageMaker Studio, and conversational analytics with LLMs.
Key Responsibilities
Data Modeling & Architecture
- Design conceptual, logical, and physical models to support ingestion, curation, and consumption in a modern lakehouse.
- Define canonical data models at the Silver layer to harmonize data across domains and source systems.
- Apply established modeling patterns (dimensional star/snowflake schemas, data vault, semantic modeling) in line with enterprise standards.
- Optimize physical models for Databricks Delta Lake and Apache Iceberg tables, with attention to schema evolution, partitioning strategy, and query performance.
Semantic Layer & Business Enablement
- Build and maintain the semantic layer in Dremio and Power BI, ensuring certified KPIs, hierarchies, and measures align with canonical models.
- Provide certified and reusable datasets to BI teams for dashboards and reporting.
- Collaborate with business stakeholders to align semantic definitions with canonical data models.
AI/ML & Conversational Analytics Enablement
- Prepare feature-ready canonical datasets for ML engineers and data scientists in SageMaker Studio.
- Design models that LLM-powered conversational analytics can consume reliably (natural-language-to-SQL queries over curated data).
- Ensure AI/BI queries map back to trusted canonical models, reducing ambiguity and duplication.
Governance, Certification & Metadata
- Document canonical and semantic models in Atlan and AWS Glue Catalog, ensuring discoverability and lineage.
- Collaborate with Data Quality Engineers to embed validation and certification rules into canonical models.
- Align canonical modeling with business glossaries, standards, and compliance requirements.
Collaboration & Best Practices
- Translate business requirements into canonical and semantic modeling patterns.
- Partner with architects, engineers, and analysts to define standards for canonical modeling in the Silver layer.
- Mentor junior team members on canonical modeling and semantic design principles.
Qualifications
Required
- 5–10 years of experience in data modeling, data architecture, or BI data design.
- Strong knowledge of conceptual, logical, physical, and canonical data modeling.
- Experience with dimensional modeling (Kimball), data vault, and semantic modeling.
- Hands-on with Databricks (Delta Lake, Unity Catalog) and Apache Iceberg.
- Familiarity with AWS services (S3, Glue, Glue Catalog, Redshift, SageMaker Studio).
- Experience with Dremio or similar query/semantic engines.
- Proficiency with Power BI modeling, DAX, and dataset certification.
- Experience with Atlan (or equivalent catalog/governance tools).
Preferred
- Experience implementing canonical data models at enterprise scale.
- Familiarity with LLM-driven conversational analytics and preparing canonical/semantic data for AI consumption.
- Knowledge of data quality principles and certification workflows.
- Familiarity with data observability platforms.
- Cloud certifications (e.g., AWS Certified Data Analytics – Specialty, Databricks Certified Data Engineer/Architect).