This role involves developing a bare-metal provisioning system and a unified VM/container orchestration platform, designing high-performance networking with RDMA, and establishing distributed storage using Rook-Ceph. The engineer will also build infrastructure automation with IaC/GitOps and create a developer self-service platform. Key qualifications include a Master's degree in CS/EE or a related field, and experience with Kubernetes, CI/CD, monitoring systems, and scripting in Python/Bash.
Responsibilities and Opportunities
Bare Metal Provisioning and Unified Orchestration: Develop a Metal3 and KubeVirt-based automated bare-metal provisioning system and a unified VM/Container orchestration platform
High Performance Networking: Design and implement an RDMA (RoCEv2, InfiniBand) multi-network architecture and configure SR-IOV and Multus CNI
Distributed Storage Systems: Establish Rook-Ceph-based distributed storage clusters and develop a dynamic volume provisioning system
Infrastructure Automation and CI/CD: Build infrastructure automation and CI/CD pipelines based on IaC (Terraform, Ansible) and GitOps (ArgoCD)
Developer Self-Service Platform: Build a developer self-service platform, including automation tools and a portal for resource provisioning and task execution, that provides a Service Catalog, implements RBAC (Role-Based Access Control) for permission management, offers Golden Path templates, and applies guardrail policies
Key Qualifications
Master's or higher degree in Computer Science, Electrical Engineering, or a related field
Experience in building and operating cloud environments using Kubernetes
Experience in building and operating CI/CD pipelines and workflows
Experience in designing and implementing monitoring systems
Excellent problem-solving skills and scripting abilities (Python, Bash, etc.)
Effective communication and collaboration skills
Ideal Qualifications
Experience in operating private cloud environments as well as public clouds such as AWS, Azure, and GCP
Experience in automating infrastructure using IaC tools like Terraform and Ansible
Experience in providing GPU/NPU-based computing resources for model serving or training
Experience with Python or C/C++ build and packaging systems (PyPI, CMake, Conan, etc.)
The application process may vary by job and may change depending on schedule and circumstances.
Applicants will be notified individually of the application schedule and results at the email address provided during application.
Notes
This announcement may close early if recruitment is completed.
If the application contains false information, acceptance may be canceled.
Employment may be restricted if legal qualifications required for employment and job performance are not met.
Being a veteran or a person with a disability will not disadvantage a candidate in the hiring process.
The scope of duties may change based on the candidate's overall career and experience. If such changes are necessary, they will be communicated to the candidate at an appropriate time before the final acceptance notification.
For inquiries regarding recruitment, please contact the email address below.
recruit@rebellions.ai