Introducing Our Team (Project)
[Deep Learning Division Vision]
KRAFTON's Deep Learning Division works with internal and external partners across many fields to provide AI solutions for a wide range of problems, and develops its own services through in-house deep learning research. Our direction has four broad pillars:
- Production Cost Down: By applying deep learning technology to many processes involved in game production, we shorten the production process and innovate the work experience of game creators.
- New Way to Create: We expand creators' creativity across game production using various deep learning technologies, including generative AI.
- Virtual Friends: We develop deep learning-based Virtual Friends and apply them to various in-game and out-of-game applications.
- Unique, Endless Gameplay: Through deep learning technology, we provide users with endlessly enjoyable game content by offering different experiences every time.
[R&D]
The Deep Learning Division researches and develops the deep learning technologies needed to realize the vision above, including language models, voice synthesis, vision and animation, reinforcement learning, and data-centric AI.
We also research hyper-scale and lightweight models that can effectively span these areas, as well as multimodal models that freely combine the individual technologies. We integrate these technologies into real game production environments to improve the production work experience and expand creativity.
[Culture Fit]
Members of the Deep Learning Division work and collaborate with colleagues from many fields across diverse projects, and are encouraged to propose creative ideas for the problems they encounter. We foster an atmosphere in which everyone can express opinions freely, regardless of age or position.
The team is made up of people from diverse cultural backgrounds, and we actively provide interpretation and translation support so that language barriers do not get in the way of communication.
[Team Introduction]
KRAFTON's MLSys & Ops Team designs, builds, and operates GPU infrastructure and ML platforms for model development within the division.
We also support serving and model optimization for ML models deployed in game services, and on-device (edge) deployment when necessary.
Introducing the Mission You'll Undertake with Our Team
This position is responsible for operating and advancing our infrastructure and platforms.
Responsibilities (Infra/Platform-centric)
- Design, build, and operate Kubernetes-based ML/GPU clusters
  - Scheduling/isolation/security, upgrades/expansion, multi-tenancy/resource efficiency
- GPU platform advancement
  - Drivers/runtimes/device plugins, GPU Operator operation, DCGM-based observability, MIG/MPS utilization, capacity/cost/performance optimization
- ML platform component operation
  - Experiment/training workspaces, job/pipeline orchestration (e.g., Argo Workflows), artifacts/registries/storage
- Model serving infrastructure operation
  - Operating serving stacks based on KServe/Triton/ONNX Runtime/Ray Serve, managing SLOs (latency/throughput/availability), and automating deployments (canary/rollout)
- Data path/storage/network design
  - Object/block/file storage (Ceph/MinIO, etc.), high-bandwidth/low-latency data paths, and, where necessary, IB/RoCE for training networking
- Observability/reliability/security systems
  - Logs/metrics/tracing (OTel), alerts/dashboards, image signing (cosign), policies (OPA Gatekeeper/Kyverno), runtime protection
- Standardization/automation
  - Operating reproducible platform templates and change management (rollback/audit) based on IaC/GitOps (Terraform/Argo CD, etc.)
We want to grow with someone who has these experiences! (Required Qualifications)
- Production Kubernetes operation experience
  - Troubleshooting and upgrades, plus some hands-on experience with multi-node/multi-cluster environments
- Deep understanding of Linux system/resource management
  - cgroups/NUMA/IO/networking, container runtimes (containerd/CRI-O)
- GPU workload operation experience
  - Some hands-on experience with NVIDIA GPU Operator, the Kubernetes device plugin, DCGM, MIG/MPS, etc.
- Observability and operation automation
  - Some experience with Prometheus/Grafana/ELK/Loki/OpenTelemetry and IaC/GitOps (Terraform/Argo CD, etc.)
- Documentation and collaboration skills
  - Defining operational standards/SLOs, and technical communication such as change management and release notes
- No restrictions that would prevent overseas business travel
If you have these experiences, you are the one we are looking for! (Preferred Qualifications)
- Experience leading cluster/platform architecture
  - Experience leading large-scale expansion/migration/replatforming, multi-tenancy isolation, or cost-performance optimization
- Serving/platform operation experience
  - Hands-on operation of at least one model serving stack, such as KServe with Triton or ONNX Runtime, or Ray Serve
- SLO/cost/capacity planning governance
  - Setting and managing targets for key metrics such as GPU efficiency, latency, and throughput; leading on-call and post-incident analysis
- Building on-premises GPU clusters from scratch, and in-depth operation of CNI (Cilium/Calico) and service mesh (Istio/Envoy)
- Distributed training infrastructure experience
  - NCCL/GPUDirect, IB/RoCE networking, training job orchestration (Ray/Horovod/DeepSpeed, etc.)
- Storage/data path optimization
  - Design, operation, and performance tuning of Ceph/Rook, MinIO, and parallel/distributed file systems
- Pipeline/platform engineering
  - Experience operating job/pipeline orchestration such as Argo Workflows, Kubeflow, or Airflow, as well as MLflow (Model Registry) and Feast (Feature Store)
---
To join KRAFTON, candidates go through the following selection process.
- Document Screening > Phone Interview > Pre-Test > Technical Fit Interview > Culture Fit Interview > Offer & Onboarding
- This position is open on a rolling basis and may close early once a suitable candidate is hired.
- Candidates who pass each stage will be notified individually via the email address or phone number provided in their application.
- The phone interview is an optional stage held only when needed. Details will be provided individually.
- Additional interviews may be required; details will be provided individually.
Please check the required documents!
- Application form (free format), academic transcript, self-introduction, career description, portfolio (mandatory)
- New graduates should focus on the self-introduction; experienced candidates should focus on the career description.
- When attaching a portfolio, please check the information below.
Work Location
Employment Type
---
Please check the information below!
- Applicants eligible for employment protection, such as persons with disabilities and national merit recipients, will be given preference in accordance with the relevant laws.
- If your application contains false statements, your offer may be revoked.
- A 5-month probationary period applies. Based on evaluation results during this period, the company may decline final employment; it may also end the probationary period early and decline final employment based on interim evaluation results. Employment type and salary do not change during the probationary period.
- For questions during the recruitment process, please refer to the KRAFTON Recruitment FAQ.
BE BOLD, LEARN AND WIN! Would you like to learn about KRAFTON's growth and challenge story?