Data Engineer

2 Minutes ago • 6 Years + • Data Analysis • $180,000 PA - $225,000 PA

Job Summary

Job Description

Scale AI is at the forefront of the AI revolution, building the Generative AI Data Engine that powers advanced LLMs and generative models globally. This engine leverages world-class RLHF, human data generation, model evaluation, safety, and alignment, producing crucial data for human-AI interaction. The Data Analytics team at Scale is central to this, responsible for centralized data, experimentation, and reporting. They build critical data pipelines and platforms to support data-driven decisions, financial reporting, and AI-enabled insights, collaborating closely with various teams to build flexible systems.
Must have:
  • Provide critical input in the Data Engineering team’s roadmap and technical direction.
  • Continually improve ongoing data pipelines and simplify self-service support for business stakeholders.
  • Perform regular system audits, and create data quality tests to ensure complete and accurate reporting of data/metrics.
  • Design, implement, and deploy data engineering frameworks.
  • Manage and optimize data pipelines, warehouses and costs.
  • Deliver at a high velocity and level of quality to engage our customers.
  • Work across the entire product lifecycle from conceptualization through production.
  • Be able, and willing, to multi-task and learn new technologies quickly.
  • Work closely with cross-functional partners such as finance, product, software engineering, and operations to identify opportunities for business impact and to understand, refine, and prioritize requirements for data engineering.
  • 6+ years of relevant work experience in a role requiring application of data modeling, warehouse optimization and automation skills.
  • Ability to create extensible and scalable data schema and pipelines that lay the foundation for downstream analysis using SQL and Python.
  • Experience building a reliable transformation layer and pipelines from ambiguous business processes, using tools such as dbt, to create a foundation for data insights.
  • Experience partnering with engineering and business stakeholders to automate manual data workflows.
  • Experience in best practices for query and cost optimization in Snowflake.
  • Strong written and verbal communication skills.
  • Strong problem-solving skills and the ability to work independently or as part of a team.
Good to have:
  • Strong knowledge of software engineering best practices and CI/CD tooling (CircleCI)
  • Experience developing and deploying data engineering tooling
  • Excitement to work with AI technologies
Perks:
  • Comprehensive health, dental and vision coverage
  • Retirement benefits
  • Learning and development stipend
  • Generous PTO
  • Commuter stipend

Job Details

About Scale

Software is eating the world, but AI is eating software. We live in unprecedented times – AI has the potential to exponentially augment human intelligence. Every person will have a personal tutor, coach, assistant, personal shopper, travel guide, and therapist throughout life. As the world adjusts to this new reality, leading platform companies are scrambling to build LLMs at billion scale, while large enterprises figure out how to add them to their products. To make them safe, aligned, and actually useful, these models need human evaluation and reinforcement learning from human feedback (RLHF) during pre-training, fine-tuning, and production evaluations. This is the main innovation that has enabled ChatGPT to get such a large head start over the competition.

About Data Engine

At Scale, our Generative AI Data Engine powers the most advanced LLMs and generative models in the world through world-class RLHF, human data generation, model evaluation, safety, and alignment. The data we are producing is some of the most important work for how humanity will interact with AI.

About our Analytics Team

The Data Analytics team is responsible for centralized data, experimentation, and reporting across all areas of Scale. We are building out the critical data pipelines, platforms, and reporting to support data-driven decision making and strategy for the company, including support for financial reporting, experimentation, and AI-enabled insights. The team members are strong relationship builders and work in close collaboration with delivery, operations, finance, and engineering. You’ll be deeply involved in building flexible new systems to support experimentation across the company, and we are looking for engineers who are relentlessly curious and thrive on building systems from ambiguity.

Responsibilities:

  • Provide critical input in the Data Engineering team’s roadmap and technical direction
  • Continually improve ongoing data pipelines and simplify self-service support for business stakeholders
  • Perform regular system audits, and create data quality tests to ensure complete and accurate reporting of data/metrics
  • Design, implement, and deploy data engineering frameworks
  • Manage and optimize data pipelines, warehouses and costs
  • Deliver at a high velocity and level of quality to engage our customers.
  • Work across the entire product lifecycle from conceptualization through production
  • Be able, and willing, to multi-task and learn new technologies quickly
  • Work closely with cross-functional partners such as finance, product, software engineering, and operations to identify opportunities for business impact and to understand, refine, and prioritize requirements for data engineering.

Requirements:

  • 6+ years of relevant work experience in a role requiring application of data modeling, warehouse optimization and automation skills.
  • Ability to create extensible and scalable data schema and pipelines that lay the foundation for downstream analysis using SQL and Python
  • Experience building a reliable transformation layer and pipelines from ambiguous business processes, using tools such as dbt, to create a foundation for data insights.
  • Experience partnering with engineering and business stakeholders to automate manual data workflows
  • Experience in best practices for query and cost optimization in Snowflake.
  • Strong written and verbal communication skills
  • Strong problem-solving skills and the ability to work independently or as part of a team.

Nice to haves:

  • Strong knowledge of software engineering best practices and CI/CD tooling (CircleCI).
  • Experience developing and deploying data engineering tooling
  • Excitement to work with AI technologies.
