Sr. Data Engineer

Yahoo

Job Summary

Yahoo Mail is a leading consumer inbox with hundreds of millions of users, offering an organized and fast email experience. The Mail Analytics Engineering team builds mission-critical data systems, pipelines, warehouses, analytics, and ML/AI programs for the Communications business, including Yahoo Mail. This role involves working on data engineering infrastructures, pipelines, and next-generation Machine Learning- and AI-based data infrastructure. You will support new functionalities, mine data for insights, and address technical challenges in efficient query processing, large-scale stream processing, machine learning, and complex business rules within a petabyte-scale data environment.

Must Have

  • Develop new or improve existing data infrastructures for machine learning and deep learning
  • Implement algorithms and systems efficiently with other engineers
  • Take end-to-end ownership of Machine Learning-based distributed data systems
  • Develop complex queries, large volume data pipelines, and analytics applications
  • Develop software programs to solve analytics and data mining problems
  • Interact with stakeholders to understand requirements and deliver data solutions
  • Prototype new metrics or data systems
  • Lead data investigations to troubleshoot data issues
  • Maintain and improve released systems
  • Provide engineering consulting on large and complex warehouse data
  • BS/MS/PhD in Computer Science/Electrical Engineering or related disciplines
  • 6+ years of hands-on experience in data engineering
  • Strong fundamentals in algorithms, distributed computing, data structures, and databases
  • Fluency with Python, Java, and SQL
  • Self-driven, detail-oriented, teamwork spirit, excellent communication skills
  • Ability to multitask and manage expectations

Good to Have

  • Experience in Hadoop technologies (Map/Reduce, Pig, Hive, HBase, Storm, Spark, Kafka, Oozie)
  • Experience with Google Cloud Platform (BigQuery, Dataproc, Dataflow)
  • Experience with machine learning algorithms, NLP, and/or statistical methods
  • Experience in machine learning, analytics, data mining, or data mart and warehouse
  • Experience with Deep Learning platforms (TensorFlow/Keras/Spark MLlib) and SQL/Unix/Shell

Perks & Benefits

  • Flexible hybrid work options
  • Healthcare
  • 401k
  • Backup childcare
  • Education stipends

Job Description

Yahoo Mail is the ultimate consumer inbox with hundreds of millions of users. It’s the best way to access your email and stay organized from a computer, phone, or tablet. With its beautiful design and lightning-fast speed, Yahoo Mail makes reading, organizing, and sending emails easier than ever.

A Little About Us

Yahoo makes the world’s daily habits inspiring and entertaining. By creating highly personalized experiences for our users, we keep people connected to what matters most to them, across devices and around the world. Yahoo’s businesses span Search, Communications, Media, and many other verticals.

Yahoo generates terabytes of data every day and it is critical to collect, manage and process data at petabyte scale to provide timely and accurate insights to executives, sales, product managers and product developers on all aspects of user interaction.

The Mail Analytics Engineering team at Yahoo is responsible for building mission-critical data systems, pipelines, warehouses, analytics systems, and Machine Learning/AI/data mining programs for the Communications business, which includes Yahoo Mail, with 200M monthly active users. We are constantly pushing the envelope of data platforms due to the insane amount of data we need to harness.

A Lot About You

As part of the Mail Analytics Engineering team, you will be working on data engineering infrastructures, pipelines, and next-generation Machine Learning- and AI-based data infrastructure, supporting new functionalities on existing platforms, and mining data for analytics insights and product features.

Our Big Data footprint is among the largest in the world, at double-digit petabyte scale. Developing this infrastructure presents many technical challenges in efficient query processing, large-scale stream processing, machine learning and modeling, and in satisfying complex business rules.

If you are passionate about harnessing data at insane scale and enjoy working with new technologies, setting up petabyte-scale data infrastructure, and implementing new machine learning solutions and metrics systems, we want to hear from you!

Your Day

  • Develop new or improve existing data infrastructures for data processing, machine learning, and deep learning using your core expertise
  • Work with other engineers to implement algorithms and systems in an efficient way
  • Take end-to-end ownership of Machine Learning-based distributed data systems, from data and training pipelines to real-time data serving engines
  • Develop complex queries, very large volume data pipelines, and analytics applications
  • Develop software programs to solve analytics and data mining problems
  • Interact with data analysts, data scientists, product managers, and software engineers to understand business problems and technical requirements, and deliver data solutions
  • Prototype new metrics or data systems
  • Lead data investigations to troubleshoot data issues that arise along the data pipelines
  • Maintain and improve released systems
  • Provide engineering consulting on large and complex warehouse data

You Must Have

  • BS/MS/PhD in Computer Science/Electrical Engineering, or related engineering disciplines, ideally with specialization in Data Engineering or Machine Learning
  • 6+ years of hands-on experience in relevant fields, including data engineering
  • Strong fundamentals in algorithms, distributed computing, data structures, and databases
  • Fluency with Python, Java, and SQL
  • Self-driven, challenge-loving, and detail-oriented, with teamwork spirit, excellent communication skills, and the ability to multitask and manage expectations

Preferred

  • Experience in Hadoop technologies (Map/Reduce, Pig, Hive, HBase, Storm, Spark, Kafka, Oozie)
  • Experience with Google Cloud Platform (BigQuery, Dataproc, Dataflow, etc.) a big plus
  • Experience with machine learning algorithms, NLP, and/or statistical methods a big plus
  • Experience in any of: machine learning, analytics, data mining, or data mart and warehouse
  • Experience with Deep Learning platforms (TensorFlow/Keras/Spark MLlib) and SQL/Unix/Shell

Skills Required For This Role

Team Management, Communication, Data Analytics, HBase, Unix, Hadoop, Spark, Google Cloud Platform, Deep Learning, Python, Keras, Shell, Algorithms, SQL, TensorFlow, Java, Machine Learning