Software is eating the world, but AI is eating software. We live in unprecedented times: AI has the potential to exponentially augment human intelligence. Every person will have a personal tutor, coach, assistant, personal shopper, travel guide, and therapist throughout life. As the world adjusts to this new reality, leading platform companies are scrambling to build billion-parameter LLMs, while large enterprises figure out how to add AI to their products. To make these models safe, aligned, and actually useful, they need human evaluation and reinforcement learning from human feedback (RLHF) during pre-training, fine-tuning, and production evaluation. This is the main innovation that enabled ChatGPT to get such a large head start over the competition.
At Scale, our Generative AI Data Engine powers the most advanced LLMs and generative models in the world through world-class RLHF, human data generation, model evaluation, safety, and alignment. The data we produce underpins some of the most important work shaping how humanity will interact with AI.
The Data Analytics team is responsible for centralized data, experimentation, and reporting across all areas of Scale. We are building out the critical data pipelines, platforms, and reporting to support data-driven decision making and strategy for the company, including support for financial reporting, experimentation, and AI-enabled insights. The team is made up of strong relationship builders who work in close collaboration with delivery, operations, finance, and engineering. You’ll be deeply involved in building flexible new systems to support experimentation across the company, and we are looking for engineers who are relentlessly curious and thrive on building systems from ambiguity.
- Provide critical input into the Data Engineering team’s roadmap and technical direction
- Continually improve ongoing data pipelines and simplify self-service support for business stakeholders
- Perform regular system audits and create data quality tests to ensure complete and accurate reporting of data and metrics (a minimal sketch of such a test follows this list)
- Design, implement, and deploy data engineering frameworks
- Manage and optimize data pipelines, warehouses and costs
- Deliver at high velocity and with a high level of quality to engage our customers
- Work across the entire product lifecycle from conceptualization through production
- Be able and willing to multi-task and learn new technologies quickly
- Work closely with cross-functional partners in finance, product, engineering, and operations to identify opportunities for business impact and to understand, refine, and prioritize data engineering requirements
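
For illustration, here is a minimal sketch of the kind of data quality test described above, assuming a Snowflake warehouse queried via snowflake-connector-python. The table ANALYTICS.FINANCE.DAILY_REVENUE, its columns, and the checks themselves are hypothetical placeholders, not Scale’s actual schema.

```python
# data_quality_check.py -- a minimal, hypothetical data quality test.
# Assumes Snowflake credentials in environment variables and an
# illustrative table ANALYTICS.FINANCE.DAILY_REVENUE.
import os

import snowflake.connector


def scalar(cursor, sql: str):
    """Run a query and return the single value it produces."""
    cursor.execute(sql)
    return cursor.fetchone()[0]


def main() -> None:
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
    )
    cur = conn.cursor()
    failures = []

    # Completeness: yesterday's partition should not be empty.
    row_count = scalar(cur, """
        SELECT COUNT(*) FROM ANALYTICS.FINANCE.DAILY_REVENUE
        WHERE revenue_date = CURRENT_DATE - 1
    """)
    if row_count == 0:
        failures.append("no rows loaded for yesterday")

    # Accuracy: the primary key should be unique.
    duplicates = scalar(cur, """
        SELECT COUNT(*) - COUNT(DISTINCT order_id)
        FROM ANALYTICS.FINANCE.DAILY_REVENUE
    """)
    if duplicates > 0:
        failures.append(f"{duplicates} duplicate order_id values")

    conn.close()
    if failures:
        raise SystemExit("data quality check failed: " + "; ".join(failures))


if __name__ == "__main__":
    main()
```

In practice, checks like these run on a schedule or inside CI so that broken loads are caught before they reach financial reporting.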
- 6+ years of relevant work experience in a role requiring application of data modeling, warehouse optimization and automation skills.
- Ability to create extensible and scalable data schemas and pipelines that lay the foundation for downstream analysis using SQL and Python
- Experience building a reliable transformation layer and pipelines from ambiguous business processes using tools such as dbt to create a foundation for data insights (see the sketch after this list).
- Experience partnering with engineering and business stakeholders to automate manual data workflows
- Experience in best practices for query and cost optimization in Snowflake.
- Strong written and verbal communication skills
- Strong problem-solving skills and the ability to work independently or as part of a team.
- Strong knowledge of software engineering best practices and CI/CD tooling (CircleCI).
- Experience developing and deploying data engineering tooling
- Excitement to work with AI technologies.
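
To make the transformation-layer expectation concrete, below is a minimal sketch of a dbt Python model (dbt-snowflake supports Python models alongside SQL ones) that rolls raw events up into a clean reporting table. The model and column names (stg_task_events, fct_daily_task_metrics, and so on) are illustrative assumptions, not a real Scale pipeline.

```python
# models/marts/fct_daily_task_metrics.py -- hypothetical dbt Python model.
# In dbt-snowflake, dbt.ref() returns a Snowpark DataFrame, and the
# DataFrame this function returns is materialized as the model's table.
import snowflake.snowpark.functions as F


def model(dbt, session):
    dbt.config(materialized="table")

    # Upstream staging model (assumed): one row per completed task event.
    events = dbt.ref("stg_task_events")

    # Roll events up to the grain downstream reporting actually queries:
    # one row per day per project.
    return (
        events.group_by("event_date", "project_id")
        .agg(
            F.count("task_id").alias("tasks_completed"),
            F.avg("handle_time_seconds").alias("avg_handle_time_s"),
        )
    )
```

Paired with schema tests and documented sources, models like this turn manual spreadsheet workflows into versioned, testable pipelines.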