LLMOps Engineer - R01554714

Brillio

6-12 Years | Tampa, FL, United States (Hybrid) | Full Time | 4 months ago

Job Summary

Brillio is a fast-growing digital technology service provider. This LLMOps Engineer role involves operationalizing large language models, implementing scalable solutions, and driving innovation in AI/ML deployment. Responsibilities include designing and maintaining LLM pipelines, building optimized infrastructure, deploying LLMs to production with monitoring, and developing RESTful APIs for inference. The role requires a passion for new technologies and the ability to make informed technical decisions.

Must Have

Operationalize large language models.
Implement scalable AI/ML deployment solutions.
Design and maintain LLM training and deployment pipelines.
Build and optimize scalable infrastructure for LLM operations.
Deploy LLMs to production with monitoring and optimization.
Develop and maintain RESTful APIs for LLM inference.
Implement comprehensive monitoring for model performance.
Research and evaluate emerging LLMOps techniques.
Establish LLM operations best practices.
Optimize data preprocessing and feature engineering.
Implement model governance and secure deployment.
Maintain LLM development and production environments.
6-12 years software engineering experience.
5+ years Python programming.
2+ years hands-on LLMOps experience.
1+ year ML operations and model deployment.
Proficiency with AWS or GCP ML services.
Experience with Docker, Kubernetes, CI/CD, DevOps.
Strong SQL programming skills.
Understanding of LLM architectures and fine-tuning.
Knowledge of ML pipeline design and monitoring.
Understanding of distributed systems.

Good to Have

Experience with HuggingFace Transformers, PyTorch, TensorFlow, or similar frameworks
Knowledge of prompt optimization, RAG (Retrieval-Augmented Generation) architectures
Experience with vector search

Job Description

About Brillio:

Brillio is one of the fastest-growing digital technology service providers and a partner of choice for many Fortune 1000 companies seeking to turn disruption into a competitive advantage through innovative digital adoption. Brillio is renowned for its world-class professionals, known as "Brillians", who combine cutting-edge digital and design-thinking skills with an unwavering dedication to client satisfaction.

Brillio takes pride in its status as an employer of choice, consistently attracting exceptional and talented individuals through its emphasis on contemporary, groundbreaking technologies and exclusive digital projects. Brillio's commitment to providing an exceptional experience to its Brillians and nurturing their full potential has earned it the Great Place to Work® certification year after year.

Role: LLMOps Engineer

Responsibilities

The candidate will be responsible for operationalizing large language models, implementing scalable solutions, and driving innovation in AI/ML deployment practices
This role requires someone who is passionate about learning new technologies, investigating cutting-edge techniques, and making informed technical decisions
Design, implement, and maintain end-to-end pipelines for LLM training, fine-tuning, validation, and deployment
Build and optimize scalable infrastructure for large language model operations
Deploy LLMs to production environments with prompt management, observability, serverless deployment, proper monitoring, scaling, and performance optimization
Design, develop, and maintain RESTful API endpoints for LLM inference and model interactions
Ensure API reliability, performance optimization, rate limiting, authentication, and comprehensive documentation
Implement comprehensive monitoring solutions for model performance, drift detection, and system health metrics
Research and evaluate emerging LLMOps techniques, tools, and methodologies
Provide informed recommendations on technology choices, architecture decisions, and implementation strategies
Establish and document best practices for LLM operations, deployment patterns, and governance frameworks
Develop prototypes and POCs to validate new approaches and technologies
Work closely with data scientists, ML engineers, DevOps teams, and product managers
Create comprehensive documentation for systems, processes, and architectural decisions
Mentor team members and share expertise through technical presentations and training sessions
Optimize data preprocessing and feature engineering pipelines for LLM training and inference
Implement data validation, quality checks, and lineage tracking for model training datasets
Design efficient data storage and retrieval systems for large-scale model artifacts and training data
Implement model governance frameworks including audit trails, compliance monitoring, and approval workflows
Ensure secure model deployment practices, access controls, and data privacy measures
Identify and mitigate risks associated with LLM deployment and operations
Maintain development, staging, and production environments for LLM workflows

Qualifications

Bachelor’s degree in Computer Science, Statistics, Engineering or a related field (exceptional candidates without advanced degrees will be considered).
LLMOps Engineer with software engineering experience

Education:

B.E./B.Tech./M.Tech. in Computer Science, a related technical degree, or equivalent

Experience:

6-12 years of experience building production-quality software (at least 5 years in Python), plus 2 years in LLMOps
6+ years of software development experience with strong programming skills in Python and SQL
2+ years of hands-on LLMOps experience
1+ year of experience with machine learning operations, model deployment, and lifecycle management
Proficiency with at least one major cloud provider (AWS or GCP) and their ML services
Experience with Docker, Kubernetes, and container orchestration for ML workloads
Strong experience in designing, building, and maintaining production-grade APIs for ML services
Proficiency with Git, CI/CD pipelines, and DevOps practices
Understanding of LLM architectures, training methodologies, and fine-tuning techniques
Knowledge of ML pipeline design, model monitoring, and deployment strategies
Understanding of distributed systems, scalability patterns, and microservices architecture

Good-to-Have Technical Skills

Experience with HuggingFace Transformers, PyTorch, TensorFlow, or similar frameworks
Knowledge of prompt optimization, RAG (Retrieval-Augmented Generation) architectures
Experience with vector search

Know what it’s like to work and grow at Brillio: https://www.brillio.com/join-us/

Equal Employment Opportunity Declaration

Brillio is an equal opportunity employer to all, regardless of age, ancestry, colour, disability (mental and physical), exercising the right to family care and medical leave, gender, gender expression, gender identity, genetic information, marital status, medical condition, military or veteran status, national origin, political affiliation, race, religious creed, sex (includes pregnancy, childbirth, breastfeeding, and related medical conditions), and sexual orientation.

16 Skills Required For This Role

GitHub, Game Texts, User Experience (UX), Prototyping, AWS, Model Deployment, PyTorch, CI/CD, Docker, Microservices, Kubernetes, Git, Python, SQL, TensorFlow, Machine Learning