Member of Technical Staff – Voice & Vision

Microsoft

6-8 Years | Mountain View, California, United States (Hybrid) | Full Time | 9 months ago

Job Summary

Microsoft seeks a highly skilled Member of Technical Staff – Voice & Vision to develop voice and vision capabilities for its Copilot product. Responsibilities include working on cutting-edge technologies like super-resolution and real-time video streaming; collaborating on innovative solutions; applying expertise in audio/video technologies to new AI contexts (noise suppression, echo cancellation); developing advanced techniques for video/image manipulation (upscaling, super-resolution); leading vision capability integration; driving development of voice/video features in generative AI; collaborating with stakeholders to define technical requirements; designing, developing, and optimizing voice recognition algorithms; implementing NLP techniques; developing computer vision algorithms; and ensuring system scalability, performance, and reliability. The role requires strong engineering skills with a focus on voice recognition, natural language processing, and computer vision.

Must Have

6+ years of experience
Proficiency in C, C++, C#, Java, JavaScript, or Python
Experience in voice recognition, NLP, computer vision
Strong engineering skills
Collaboration and communication skills

Good to Have

Experience with generative AI
Experience with audio/video manipulation
Experience with mobile platforms

Job Description

Overview:

As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad — to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It’s also inclusive: we aim to make AI accessible to all — consumers, businesses, developers — so that everyone can realize its benefits.

We are seeking a highly skilled and experienced Member of Technical Staff – Voice & Vision to join our team and drive the development of voice and vision capabilities for our Copilot product. The ideal candidate will have a strong background in engineering, with a focus on voice recognition, natural language processing, and computer vision technologies. As a Member of Technical Staff – Voice & Vision, you will be a founding engineer in a team that is at the forefront of generative AI, making significant strides in audio while aggressively pushing into the nascent field of video.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

This position is based in Mountain View, CA. Applicants must be local to the San Francisco area and work in the office 3 days a week.

Key Responsibilities:

Work on cutting-edge technologies with a focus on voice and vision, including super resolution and real-time video streaming.
Collaborate with a dynamic team to deliver innovative solutions that enhance the user experience across various platforms.
Apply expertise in audio and video technologies to new AI contexts, focusing on traditional methods such as noise suppression and echo cancellation to enhance voice quality across mobile platforms.
Develop and implement advanced techniques for video and image manipulation, including upscaling and super resolution.
Lead the integration and rollout of vision capabilities, ensuring seamless collaboration between the vision and voice teams to deliver next-level improvements and expand functionality across various contexts.
Drive hands-on development of voice-heavy and video-heavy features, pushing the boundaries of generative AI in both audio and video domains.
Collaborate with product managers, designers, and other stakeholders to define technical requirements and deliverables.
Design, develop, and optimize voice recognition algorithms, including acoustic modeling, language modeling, and speech-to-text conversion.
Implement and enhance natural language processing (NLP) techniques, such as named entity recognition, sentiment analysis, and intent detection.
Develop and refine computer vision algorithms for tasks such as image classification, object detection, and facial recognition.
Ensure the scalability, performance, and reliability of voice and vision systems through rigorous testing and validation.

Required Qualifications

Bachelor's Degree in Computer Science or related technical discipline AND 6 years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.

Preferred Qualifications

Bachelor's Degree in Computer Science or related technical discipline AND 8 years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.

Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications and process offers for these roles on an ongoing basis.

#MicrosoftAI #Copilot