Member of Technical Staff, Multimedia (Vision)

Software Development & Engineering

Job Description

Fireworks AI is seeking a Member of Technical Staff specializing in vision-language modeling to advance its generative AI platform. The role involves leading research and development in multimodal models, from data preparation to deployment, and building production-quality systems. Responsibilities include designing and implementing scalable machine learning systems for tasks such as image captioning and visual question answering, training large-scale VLMs using techniques such as LoRA/QLoRA and distributed training, writing production-ready Python code, and analyzing model performance. Collaboration with engineering, product, and design teams, along with direct customer interaction, is central to translating VLM capabilities into real-world applications and contributing to the platform roadmap.

Good To Have:

Master's or PhD in a relevant field
Research experience in VLM or multimodal modeling
Experience with multimodal training/fine-tuning
Familiarity with LLMs and visual encoders
Open-source contributions or top-tier publications

Must Have:

3 years of ML experience
Focus on computer vision, NLP, or multimodal systems
Proficiency in Python and deep learning frameworks
Experience training/deploying large models
Ability to write production-quality code
Customer interaction experience

Perks:

Solve hard problems at the forefront of AI infrastructure
Build cutting-edge technology impacting AI adoption globally
Ownership and impact in a fast-growing team
Collaborate with world-class engineers and researchers

About Us:

Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked to have the fastest LLM inference and have been getting great traction with innovative research projects, like our own function calling and multimodal models. Fireworks is funded by top investors, like Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.

The Role:

We are looking for a highly motivated Member of Technical Staff with expertise in vision-language modeling to join our research and engineering team. This role will drive advancements in our multimodal models and applications that combine visual understanding with natural language. You’ll be responsible for conducting cutting-edge research and building production-quality systems that bring state-of-the-art VLM capabilities into real-world products.

Key Responsibilities:

Lead research and development efforts in vision-language models, including data preparation, model training, evaluation, and deployment.
Collaborate with teams in engineering, product, and design, and work directly with customers to understand their needs and translate VLM capabilities into real-world applications.
Design and implement scalable machine learning systems for tasks such as image captioning, visual question answering, retrieval, grounding, and multimodal reasoning.
Train large-scale VLMs using techniques such as parameter-efficient fine-tuning (LoRA/QLoRA), reinforcement learning approaches, dataset curation and preparation, distributed training (DDP/FSDP), and hyperparameter optimization.
Build robust, maintainable code in Python for both experimentation and production use.
Analyze model performance, conduct rigorous evaluations, and iterate on experiments based on empirical insights.
Contribute to the platform roadmap by providing technical insights into quality improvements, integrating the latest multimodal research, and identifying and proposing new platform capabilities with significant commercial potential.

Minimum Qualifications:

Bachelor’s degree in Computer Science, Electrical Engineering, or a related field.
3 years of experience in machine learning, with a focus on computer vision, NLP, or multimodal systems.
Strong proficiency in Python and deep learning frameworks such as PyTorch or TensorFlow.
Experience training and deploying large-scale models and working with distributed computing environments.
Demonstrated ability to write production-quality code and collaborate across teams.
Experience working directly with customers, partners, or external stakeholders to define use cases or requirements.

Preferred Qualifications:

Master’s or PhD in a relevant technical field with research experience in vision-language or multimodal modeling.
Experience with multimodal training/fine-tuning and downstream tasks like VQA, captioning, or retrieval.
Familiarity with large language models (LLMs) and their integration with visual encoders.
Contributions to open-source projects or publications in top-tier ML/AI conferences (e.g., CVPR, ICCV, NeurIPS, ICML, ACL).
Comfort working in fast-paced, cross-disciplinary environments and shipping research into production.

Why Fireworks AI?

Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.
