About Brillio:
Brillio is one of the fastest-growing digital technology service providers and a partner of choice for many Fortune 1000 companies seeking to turn disruption into a competitive advantage through innovative digital adoption. Brillio is renowned for its world-class professionals, known as "Brillians", and distinguishes itself through their ability to combine cutting-edge digital and design thinking skills with an unwavering dedication to client satisfaction.
Brillio takes pride in its status as an employer of choice, consistently attracting exceptional talent through its emphasis on contemporary, groundbreaking technologies and exclusive digital projects. Brillio's commitment to providing an exceptional experience to its Brillians and nurturing their full potential has earned it the Great Place to Work® certification year after year.
Role: LLMOps Engineer
Responsibilities
- The candidate will be responsible for operationalizing large language models, implementing scalable solutions, and driving innovation in AI/ML deployment practices
- This role requires someone who is passionate about learning new technologies, investigating cutting-edge techniques, and providing informed technical decisions
- Design, implement, and maintain end-to-end pipelines for LLM training, fine-tuning, validation, and deployment
- Build and optimize scalable infrastructure for large language model operations
- Deploy LLMs to production environments with prompt management, observability, serverless deployment, proper monitoring, scaling, and performance optimization
- Design, develop, and maintain RESTful API endpoints for LLM inference and model interactions
- Ensure API reliability, performance optimization, rate limiting, authentication, and comprehensive documentation
- Implement comprehensive monitoring solutions for model performance, drift detection, and system health metrics
- Research and evaluate emerging LLMOps techniques, tools, and methodologies
- Provide informed recommendations on technology choices, architecture decisions, and implementation strategies
- Establish and document best practices for LLM operations, deployment patterns, and governance frameworks
- Develop prototypes and POCs to validate new approaches and technologies
- Work closely with data scientists, ML engineers, DevOps teams, and product managers
- Create comprehensive documentation for systems, processes, and architectural decisions
- Mentor team members and share expertise through technical presentations and training sessions
- Optimize data preprocessing and feature engineering pipelines for LLM training and inference
- Implement data validation, quality checks, and lineage tracking for model training datasets
- Design efficient data storage and retrieval systems for large-scale model artifacts and training data
- Implement model governance frameworks including audit trails, compliance monitoring, and approval workflows
- Ensure secure model deployment practices, access controls, and data privacy measures
- Identify and mitigate risks associated with LLM deployment and operations
- Maintain development, staging, and production environments for LLM workflows
Qualifications
- Bachelor’s degree in Computer Science, Statistics, Engineering or a related field (exceptional candidates without advanced degrees will be considered).
- LLMOps Engineer with software engineering experience
Education:
B.E./B.Tech./M.Tech. in Computer Science or a related technical field, or equivalent
Experience:
- 6-12 years of experience building production-quality software (at least 5 years in Python), plus 2 years in LLMOps
- 6+ years of software development experience with strong programming skills in Python and SQL
- 2+ years of hands-on experience with LLMOps
- 1+ years of experience with machine learning operations, model deployment, and lifecycle management
- Proficiency with at least one major cloud provider (AWS or GCP) and their ML services
- Experience with Docker, Kubernetes, and container orchestration for ML workloads
- Strong experience in designing, building, and maintaining production-grade APIs for ML services
- Proficiency with Git, CI/CD pipelines, and DevOps practices
- Understanding of LLM architectures, training methodologies, and fine-tuning techniques
- Knowledge of ML pipeline design, model monitoring, and deployment strategies
- Understanding of distributed systems, scalability patterns, and microservices architecture
Good-to-Have Technical Skills
- Experience with HuggingFace Transformers, PyTorch, TensorFlow, or similar frameworks
- Knowledge of prompt optimization and RAG (Retrieval-Augmented Generation) architectures
- Experience with vector search
Know what it’s like to work and grow at Brillio: https://www.brillio.com/join-us/
Equal Employment Opportunity Declaration
Brillio is an equal opportunity employer to all, regardless of age, ancestry, colour, disability (mental and physical), exercising the right to family care and medical leave, gender, gender expression, gender identity, genetic information, marital status, medical condition, military or veteran status, national origin, political affiliation, race, religious creed, sex (includes pregnancy, childbirth, breastfeeding, and related medical conditions), and sexual orientation.