Sr. Data Engineer

Yahoo

Job Summary

Yahoo Mail is a leading consumer inbox with hundreds of millions of users, offering an organized and fast email experience. The Mail Analytics Engineering team builds mission-critical data systems, pipelines, warehouses, analytics, and ML/AI programs for the Communications business, including Yahoo Mail. This role involves working on data engineering infrastructures, pipelines, and next-generation Machine Learning- and AI-based data infrastructure. You will support new functionalities, mine data for insights, and address technical challenges in efficient query processing, large-scale stream processing, machine learning, and complex business rules within a petabyte-scale data environment.

Must Have

  • Develop new or improve existing data infrastructures for machine learning and deep learning
  • Implement algorithms and systems efficiently with other engineers
  • Take end-to-end ownership of Machine Learning-based distributed data systems
  • Develop complex queries, large volume data pipelines, and analytics applications
  • Develop software programs to solve analytics and data mining problems
  • Interact with stakeholders to understand requirements and deliver data solutions
  • Prototype new metrics or data systems
  • Lead data investigations to troubleshoot data issues
  • Maintain and improve released systems
  • Provide engineering consulting on large and complex warehouse data
  • BS/MS/PhD in Computer Science/Electrical Engineering or related disciplines
  • 6+ years of hands-on experience in data engineering
  • Strong fundamentals in algorithms, distributed computing, data structures, and databases
  • Fluency with Python, Java, and SQL
  • Self-driven, detail-oriented, teamwork spirit, excellent communication skills
  • Ability to multitask and manage expectations

Good to Have

  • Experience in Hadoop technologies (Map/Reduce, Pig, Hive, HBase, Storm, Spark, Kafka, Oozie)
  • Experience with Google Cloud Platform (BigQuery, Dataproc, Dataflow)
  • Experience with machine learning algorithms, NLP, and/or statistical methods
  • Experience in machine learning, analytics, data mining, or data mart and warehouse
  • Experience with Deep Learning platforms (TensorFlow/Keras/Spark MLlib) and SQL/Unix/Shell

Perks & Benefits

  • Flexible hybrid work options
  • Healthcare
  • 401k
  • Backup childcare
  • Education stipends

Job Description

Yahoo Mail is the ultimate consumer inbox with hundreds of millions of users. It’s the best way to access your email and stay organized from a computer, phone, or tablet. With its beautiful design and lightning-fast speed, Yahoo Mail makes reading, organizing, and sending emails easier than ever.

A Little About Us

Yahoo makes the world’s daily habits inspiring and entertaining. By creating highly personalized experiences for our users, we keep people connected to what matters most to them, across devices and around the world. Yahoo’s businesses span Search, Communications, Media, and many other verticals.

Yahoo generates terabytes of data every day and it is critical to collect, manage and process data at petabyte scale to provide timely and accurate insights to executives, sales, product managers and product developers on all aspects of user interaction.

The Mail Analytics Engineering team at Yahoo is responsible for building mission-critical data systems, pipelines, warehouses, analytics systems, and Machine Learning/AI/data mining programs for the Communications business, which includes Yahoo Mail, with 200M monthly active users. We are constantly pushing the envelope of data platforms due to the insane amount of data we need to harness.

A Lot About You

As part of the Mail Analytics Engineering team, you will be working on data engineering infrastructures, pipelines, and next-generation Machine Learning- and AI-based data infrastructure, supporting new functionalities on existing platforms, and mining data for analytics insights and product features.

Our Big Data footprint is among the largest in the world, at double-digit petabyte scale. Developing this infrastructure presents many technical challenges in efficient query processing, large-scale stream processing, machine learning and modeling, and in satisfying complex business rules.

If you are passionate about harnessing data at insane scale and enjoy working with new technologies, setting up petabyte-scale data infrastructure, and implementing new machine learning solutions and metrics systems, we want to hear from you!

Your Day

  • Develop new or improve existing data infrastructures for data processing, machine learning, and deep learning using your core expertise
  • Work with other engineers to implement algorithms and systems in an efficient way
  • Take end-to-end ownership of Machine Learning-based distributed data systems, from data and training pipelines to real-time data serving engines
  • Develop complex queries, very large volume data pipelines, and analytics applications
  • Develop software programs to solve analytics and data mining problems
  • Interact with data analysts, data scientists, product managers, and software engineers to understand business problems and technical requirements, and deliver data solutions
  • Prototype new metrics or data systems
  • Lead data investigations to troubleshoot data issues that arise along the data pipelines
  • Maintain and improve released systems
  • Provide engineering consulting on large and complex warehouse data

You Must Have

  • BS/MS/PhD in Computer Science/Electrical Engineering, or related engineering disciplines, ideally with specialization in Data Engineering or Machine Learning
  • 6+ years of hands-on experience in relevant fields, including data engineering
  • Strong fundamentals in algorithms, distributed computing, data structures, and databases
  • Fluency with Python, Java, and SQL
  • Self-driven, challenge-loving, and detail-oriented, with teamwork spirit, excellent communication skills, and the ability to multitask and manage expectations

Preferred

  • Experience in Hadoop technologies (Map/Reduce, Pig, Hive, HBase, Storm, Spark, Kafka, Oozie)
  • Experience with Google Cloud Platform (BigQuery, Dataproc, Dataflow, etc.) a big plus
  • Experience with machine learning algorithms, NLP, and/or statistical methods a big plus
  • Experience in any of: machine learning, analytics, data mining, or data mart and warehouse
  • Experience with Deep Learning platforms (TensorFlow/Keras/Spark MLlib) and SQL/Unix/Shell

Skills Required For This Role

Team Management, Communication, Data Analytics, HBase, Unix, Hadoop, Spark, Google Cloud Platform, Deep Learning, Python, Keras, Shell, Algorithms, SQL, TensorFlow, Java, Machine Learning