Senior Software Engineer, Tanzu Intelligent Assist
Broadcom
Job Summary
Broadcom is seeking a Staff Engineer to lead the design, development, and scaling of end-to-end machine learning systems for Tanzu Intelligent Assist. This role involves architecting robust backend services, deploying and monitoring highly available LLM-powered systems in production, and translating state-of-the-art research into product features. The engineer will work across the stack, ensuring performance, scalability, security, and maintainability, while driving technical initiatives and mentoring teammates.
Must Have
- Architect, build, and maintain highly scalable backend services and APIs in Java and Python to support ML and LLM workloads.
- Lead the deployment and integration of LLM and ML capabilities into core products.
- Take ownership of the production environment, including containerization, CI/CD pipelines, and Kubernetes deployments.
- Implement comprehensive observability through logging, monitoring, and metrics.
- Drive technical initiatives and provide mentorship to teammates.
- Ensure security and privacy are considered at every stage of the development and operations lifecycle.
Good to Have
- Familiarity with modern cloud infrastructure (AWS, GCP, or Azure).
- Experience with infrastructure-as-code tools (Terraform).
- Familiarity with Kubernetes tooling (Helm).
- Experience with observability stacks (Prometheus and Grafana).
- Experience with distributed systems at scale.
- Familiarity with other programming languages like Golang or C++.
- Demonstrated ability to lead technical initiatives and influence a cross-functional team.
- Knowledge of compliance frameworks (GDPR, SOC 2, etc.).
Perks & Benefits
- Medical plans
- Dental plans
- Vision plans
- 401(k) participation, including company matching
- Employee Stock Purchase Program (ESPP)
- Employee Assistance Program (EAP)
- Company-paid holidays
- Paid sick leave
- Vacation time
- Paid Family Leave
Job Description
About the Role
We are seeking a Staff Engineer to lead the design, development, and scaling of our end-to-end machine learning systems. In this hands-on role, you will take ownership of the entire ML lifecycle, from architecting robust backend services to deploying and monitoring highly available LLM-powered systems in production.
This is a senior position on a small, high-impact team. You will be instrumental in translating state-of-the-art research into tangible product features that deliver new value to our users. You'll work across the stack, ensuring that our systems are not only performant and scalable but also secure, observable, and built for long-term maintainability.
Responsibilities
- Architect, build, and maintain highly scalable backend services and APIs in Java and Python to support ML and LLM workloads.
- Lead the deployment and integration of LLM and ML capabilities into our core products, collaborating closely with product managers, data scientists, and other engineering teams.
- Take ownership of the production environment, including containerization, CI/CD pipelines (GitHub Actions), and Kubernetes deployments to ensure system reliability and performance at scale.
- Implement comprehensive observability through logging, monitoring, and metrics, anticipating potential failure modes and building in robust safeguards.
- Drive technical initiatives and provide mentorship to teammates, establishing and championing engineering best practices and technical standards.
- Ensure security and privacy are considered at every stage of the development and operations lifecycle, adhering to industry best practices.
Qualifications
##### Required
- Bachelor of Science degree in Computer Science or a related field with 5+ years of relevant experience, OR a Master's degree with 4+ years of relevant experience.
- Extensive experience designing, developing, and deploying end-to-end machine learning systems into a production environment.
- Hands-on expertise with LLM systems and the modern LLM ecosystem.
- Proven expertise in backend engineering with Java and Python.
- Deep production experience with containerization (Docker) and Kubernetes.
- Strong understanding of CI/CD principles with practical experience building and managing pipelines.
- Solid experience with observability and operational readiness, including monitoring and logging tools.
- A strong grasp of security and privacy fundamentals in software development.
- Must have legal authorization to work in the U.S.
##### Preferred
- Familiarity with modern cloud infrastructure, including public cloud platforms (AWS, GCP, or Azure), infrastructure-as-code tools (Terraform), Kubernetes tooling (Helm), and observability stacks (Prometheus and Grafana).
- Experience with distributed systems at scale.
- Familiarity with other programming languages like Golang or C++.
- Demonstrated ability to lead technical initiatives and influence a cross-functional team.
- Knowledge of compliance frameworks (GDPR, SOC 2, etc.).