Member of Technical Staff, Vision (Enterprise)

48 Minutes ago • All levels • $180,000 PA - $440,000 PA
Research Development

Job Description

This role at xAI involves working directly with enterprise customers to strategize and execute Vision or Video Understanding integrations. You will act as a specialized AI startup CTO, leading high-stakes projects and delivering measurable impact. Responsibilities include designing and building end-to-end AI solutions, benchmarking vision models, improving model performance through prompt tuning and fine-tuning VLMs, and analyzing vision request logs. The role emphasizes deep technical expertise in vision/video models, adaptability, and strong communication skills to drive projects to completion.


About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

You will work directly with our enterprise customers, owning the strategy and execution of Vision or Video Understanding integrations. You’ll act as a specialized AI startup CTO, focusing on vision-driven technologies, leading high-stakes projects, and delivering measurable impact. If you excel at combining deep technical expertise with customer-focused innovation, particularly in the Vision domain, we’d love to hear from you. Your day-to-day work may include:

  • Designing and building end-to-end AI solutions, from understanding customer pain points to scoping product specs and deploying VLM-powered vision interfaces.
  • Benchmarking vision models, writing evaluations, or analyzing performance to identify weaknesses in image recognition, object detection, or visual understanding.
  • Improving model performance through system prompt tuning and fine-tuning VLMs.
  • Working with multimodal teams to generate data for development efforts.
  • Generating synthetic data or kicking off campaigns to generate human data with the help of our AI tutors.
  • Analyzing vision request logs, image data, or video inputs to enhance system accuracy and user experience.
  • Building internal tools to automate VLM workflows, such as image processing pipelines or real-time visual analysis.

Focus

  • Deep expertise in working with vision or video understanding models, delivering robust and scalable solutions.
  • Ability to handle ambiguity, adapt to evolving requirements, and prioritize effectively in a fast-paced startup setting.
  • Exceptional communication skills to clarify specific requirements with customers and drive projects to successful completion.
  • Emphasis on designing, implementing, and maintaining efficient architectures, including image recognition, object detection, and real-time visual processing.
  • Proficiency in managing complex codebases and optimizing vision data pipelines for high-throughput, low-latency performance.
  • Define critical benchmarks for Vision or Video Understanding performance: Establish key performance benchmarks tailored to enterprise vision use cases, such as image classification accuracy, object detection precision, and real-time latency, reflecting customer data distributions.
  • Initiate human data collection: Design and manage campaigns to acquire high-quality image and video data from diverse enterprise contexts, supporting model training and validation.
  • Drive Vision model integration with enterprise partners: Collaborate with cross-functional teams to integrate Vision capabilities into enterprise workflows, enabling seamless adoption in areas like automated quality control, surveillance, and augmented reality.

Requirements

An ideal candidate meets at least the following:

  • Strong engineering background.
  • Experience interfacing between technical and customer-facing teams.
  • Excellent verbal and written communication skills in English.
  • Ability to translate business and vision-specific product needs into technical solutions.
  • Proven experience implementing VLM or machine learning products, including APIs, back-end, and front-end vision interfaces.
  • Strong proficiency in Python and/or TypeScript.
  • Solid understanding of HTTP and real-time communication protocols (e.g., WebRTC for video streaming).

Standout Experiences

Candidates may distinguish themselves with:

  • Building evaluations for Vision capabilities, such as image recognition accuracy or robustness of object detection.
  • Demonstrating expertise in machine learning fundamentals, including vision model evaluation, training, or fine-tuning.
  • Deploying Vision models to production, optimizing for low-latency and high-reliability environments.
  • Writing developer documentation or creating vision-specific SDKs.
  • Working with large-scale image or video datasets, optimizing vision processing pipelines, or scaling systems for enterprise-grade workloads.
  • Using infrastructure tools like Pulumi or Terraform for deploying Vision systems.

Interview Process

After submitting your application, our team reviews your CV and Statement of Exceptional Work. If selected, you’ll be invited to a 15-minute technical phone interview where we’ll discuss your background in VLMs/LLMs. Successful candidates proceed to the main process:

  • 15 min Technical Screen
  • 2x 45 min Coding Interview (focused on Vision models or related challenges)

The Statement of Exceptional Work is a critical factor in our evaluation.

We aim to complete the main process within one week. All applications are reviewed by our technical team, not recruiters. Interviews are conducted via Google Meet or in person.

Annual Salary Range

$180,000 - $440,000 USD

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer.
