We are seeking a Senior Software Engineer to lead our data acquisition and management systems, which are critical to our AI research. In this role, you will architect and maintain efficient pipelines for sourcing, processing, and organizing the large datasets that train our generative AI models. Your work will directly affect the quality and capabilities of our technology.
Responsibilities
- Partner with research teams to understand and address model performance gaps by identifying and leveraging novel data sources.
- Develop and implement robust data pipelines for acquisition, deduplication, filtering, and pre-training dataset preparation.
- Collaborate with annotation operations teams to design innovative data filtering strategies and enhance dataset quality.
- Apply and integrate advanced methodologies such as self-supervised active learning to scale data systems.
- Lead research projects to improve data quality and drive advancements in video generation models.
Qualifications
- Education: Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
- Experience: 3+ years of experience managing and curating large-scale datasets, particularly in fields such as computer vision, NLP, robotics, or self-driving technologies.
Key Skills
- Strong proficiency in Python and familiarity with deep learning frameworks such as PyTorch.
- Experience with large-scale data processing tools, such as SQL or Spark.
- Hands-on expertise in designing and working with distributed systems.
- Proven ability to thrive in a fast-paced, research-focused environment and deliver projects end to end.
Note: This position is not intended for recent graduates.
Compensation
The salary range for this role in California is $175,000–$250,000 per year. Actual base pay may vary based on factors such as job-related expertise, skills, experience, and candidate location. Additionally, we provide competitive equity packages through stock options and a comprehensive benefits plan.