Member of Technical Staff – Voice & Vision
Microsoft
Job Summary
Microsoft seeks a highly skilled Member of Technical Staff – Voice & Vision to develop voice and vision capabilities for its Copilot product. The work covers cutting-edge technologies such as super-resolution and real-time video streaming, collaboration on innovative solutions, and the application of audio/video expertise (noise suppression, echo cancellation) to new AI contexts. It also includes developing advanced techniques for video/image manipulation (upscaling, super-resolution), leading vision capability integration, driving development of voice/video features in generative AI, and working with stakeholders to define technical requirements. Core engineering tasks are designing, developing, and optimizing voice recognition algorithms; implementing NLP techniques; developing computer vision algorithms; and ensuring system scalability, performance, and reliability. The role requires strong engineering skills with a focus on voice recognition, natural language processing, and computer vision.
Must Have
- 6+ years of experience
- Proficiency in C, C++, C#, Java, JavaScript, or Python
- Experience in voice recognition, NLP, computer vision
- Strong engineering skills
- Collaboration and communication skills
Good to Have
- Experience with generative AI
- Experience with audio/video manipulation
- Experience with mobile platforms
Job Description
- Work on cutting-edge technologies with a focus on voice and vision, including super resolution and real-time video streaming.
- Collaborate with a dynamic team to deliver innovative solutions that enhance the user experience across various platforms.
- Apply expertise in audio and video technologies to new AI contexts, focusing on traditional methods such as noise suppression and echo cancellation to enhance voice quality across mobile platforms.
- Develop and implement advanced techniques for video and image manipulation, including upscaling and super resolution.
- Lead the integration and rollout of vision capabilities, ensuring seamless collaboration between the vision and voice teams to deliver next-level improvements and expand functionality across various contexts.
- Drive hands-on development of voice-heavy and video-heavy features, pushing the boundaries of generative AI in both audio and video domains.
- Collaborate with product managers, designers, and other stakeholders to define technical requirements and deliverables.
- Design, develop, and optimize voice recognition algorithms, including acoustic modeling, language modeling, and speech-to-text conversion.
- Implement and enhance natural language processing (NLP) techniques, such as named entity recognition, sentiment analysis, and intent detection.
- Develop and refine computer vision algorithms for tasks such as image classification, object detection, and facial recognition.
- Ensure the scalability, performance, and reliability of voice and vision systems through rigorous testing and validation.
Qualifications
- Bachelor's Degree in Computer Science or related technical discipline AND 6 years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- Bachelor's Degree in Computer Science or related technical discipline AND 8 years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.