Hyperconnect AI Lab identifies and solves problems in services that connect people: problems that are difficult to approach with existing technologies but can be resolved through machine learning. In doing so, we innovate the user experience. To achieve this, we develop numerous models across various domains, including video, audio, natural language, and recommendations. Our goal is to contribute to the growth of real services by reliably serving these models on mobile devices and cloud servers and by resolving any challenges encountered along the way. With this objective, Hyperconnect AI Lab has been advancing machine learning technology for several years, contributing to Hyperconnect's products, including Azar.
The ML Platform Team, part of the AI Lab, automates and stabilizes the entire ML production process to ensure that AI technology quickly translates into business impact. Our aim is to maximize the research and development productivity of the entire organization through a sustainable platform.
Currently, we address complex technical challenges arising from operating over 50 models in production. To successfully accomplish this mission, we are responsible for the following core tasks:
We develop and operate MLOps components to establish an automated virtuous cycle (AI Flywheel) that uses product data to retrain, evaluate, and deploy models, thereby continuously improving our products.
Furthermore, we provide developer portals, SDKs, and CLI tools to control and leverage these MLOps components and platforms, making it easy to build continuous learning pipelines. We also run Proofs of Concept (PoCs) for rapidly evolving MLOps technologies and apply them to production when warranted, continuously improving the system.
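As a rough illustration only, the skeleton below sketches one turn of such a continuous-learning loop; every function, name, and threshold in it is hypothetical and not part of Hyperconnect's actual SDK.

```python
# Hypothetical sketch of one AI Flywheel step: retrain, evaluate, and
# deploy a model from fresh product data. All names are illustrative.
from dataclasses import dataclass


@dataclass
class EvalResult:
    auc: float


def fetch_training_data(since_days: int) -> list:
    # Hypothetical: pull the last N days of product data.
    return []


def train_model(dataset: list) -> object:
    # Hypothetical: fit a candidate model on the dataset.
    return object()


def evaluate(model: object, holdout: list) -> EvalResult:
    # Hypothetical: score the candidate on a holdout set.
    return EvalResult(auc=0.92)


def deploy(model: object, stage: str) -> None:
    # Hypothetical: promote the model to serving.
    print(f"deploying to {stage}")


def flywheel_step(min_auc: float = 0.90) -> None:
    """One retrain -> evaluate -> deploy iteration, gated on quality."""
    data = fetch_training_data(since_days=7)
    model = train_model(data)
    result = evaluate(model, holdout=data)
    # Only promote models that clear an offline quality bar.
    if result.auc >= min_auc:
        deploy(model, stage="production")
```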
To support seamless ML research and large-scale model training, we design and build a Slurm-based HPC (High-Performance Computing) GPU cluster optimized for business requirements. This includes the latest GPU resources such as A100/H100, as well as high-speed interconnects like InfiniBand (EDR/HDR/NDR) to minimize bottlenecks between nodes.
We meticulously tune scheduling policies to cost-effectively share limited computing resources within the research organization. We segregate partitions based on workload characteristics and manage job priorities. Additionally, we monitor key metrics by integrating Prometheus and Grafana with Slurm's accounting data, continuously optimizing resource allocation.
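As one hedged example of this monitoring, the sketch below exports a per-partition running-job count from Slurm's sacct accounting output as a Prometheus gauge; the metric name, label set, and port are assumptions, not our actual configuration.

```python
# Minimal sketch: expose Slurm accounting data to Prometheus.
import subprocess
import time
from collections import Counter

from prometheus_client import Gauge, start_http_server

# Assumed metric: count of RUNNING jobs per partition.
RUNNING_JOBS = Gauge(
    "slurm_running_jobs", "Running Slurm jobs per partition", ["partition"]
)


def scrape_sacct() -> Counter:
    """Count RUNNING jobs per partition from sacct's parsable output."""
    out = subprocess.run(
        ["sacct", "--allocations", "--noheader", "--parsable2",
         "--state=RUNNING", "--format=JobID,Partition"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts: Counter = Counter()
    for line in out.splitlines():
        _, partition = line.split("|")
        counts[partition] += 1
    return counts


if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrape port (assumed).
    while True:
        for partition, n in scrape_sacct().items():
            RUNNING_JOBS.labels(partition=partition).set(n)
        time.sleep(30)
```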
To ensure cluster stability and reproducibility, we manage various configurations using IaC (Infrastructure as Code) tools such as Ansible and Terraform. We also integrate parallel/network file systems like Lustre and NFS for large-capacity training data.
We develop and operate automation tools for cluster management, monitoring, disaster recovery, and handling user requests.
For large-scale model training, we adopt cutting-edge distributed training technologies such as FSDP (Fully Sharded Data Parallel) and DeepSpeed to accelerate training. For serving, we apply model compilation with NVIDIA TensorRT and ONNX Runtime to meet business requirements (e.g., latency vs. throughput trade-offs), and we apply compression techniques such as INT8/FP16 quantization to reduce response times.
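For illustration, a minimal FSDP training step might look like the sketch below; the model, hyperparameters, and dummy objective are placeholders, and a real job would be launched across multiple GPUs with torchrun.

```python
# Minimal sketch of sharded data-parallel training with PyTorch FSDP.
# Launch with: torchrun --nproc_per_node=<gpus> train.py
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main() -> None:
    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    model = torch.nn.Sequential(          # stand-in for a large model
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 4096),
    ).cuda()
    # FSDP shards parameters, gradients, and optimizer state across
    # ranks, so per-GPU memory shrinks as world size grows.
    model = FSDP(model)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).square().mean()       # dummy objective
    loss.backward()
    opt.step()
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```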
We maximize throughput with dynamic batching using Triton Inference Server and significantly reduce cost per query by leveraging high-efficiency computing resources like AWS Inferentia. Through performance profiling, we monitor key metrics such as resource utilization, P99 Latency, and RPS (Requests Per Second), and continuously improve cost-effectiveness by implementing efficient auto-scaling policies using KEDA (Kubernetes Event-driven Autoscaling).
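As a sketch of the serving path, the client call below sends a single request to a Triton Inference Server; dynamic batching itself is enabled server-side in the model's config.pbtxt, so concurrent requests like this one are batched automatically. The model name, tensor names, and shapes are illustrative.

```python
# Minimal sketch of a Triton Inference Server client request.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Illustrative tensor names/shapes; real values come from the model config.
inp = httpclient.InferInput("INPUT__0", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
out = httpclient.InferRequestedOutput("OUTPUT__0")

result = client.infer(model_name="example_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT__0").shape)
```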
We research and develop an inference engine SDK that enables Hyperconnect's on-device AI models to operate stably and efficiently in mobile environments using various frameworks such as TFLite and PyTorch Mobile. Beyond simple model conversion, we apply the latest techniques such as quantization, pruning, SIMD optimization, and GPU/NNAPI acceleration to minimize latency and optimize battery and memory usage.
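As one hedged example of the quantization step mentioned above, the sketch below applies post-training INT8 quantization with the TFLite converter; the SavedModel path, input shape, and calibration data are placeholders.

```python
# Minimal sketch of post-training INT8 quantization with TFLite.
import numpy as np
import tensorflow as tf


def representative_dataset():
    # Calibration samples drive INT8 range estimation; in practice these
    # would be drawn from production-like inputs, not random noise.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]


converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer kernels for maximum latency/memory savings on-device.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```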
We also build mobile model build-and-deployment pipelines, automated testing environments, and profiling and debugging workflows to ensure consistent performance across diverse device environments such as iOS and Android. This lets us deliver a commercial-grade mobile AI platform that brings models developed at the research stage to a large user base.
In this process, we go beyond pure engineering: we collaborate with research teams to explore optimization strategies suited to each model's structure and make balanced trade-offs between model performance and user experience. As a result, the mobile inference engine we develop delivers smooth, fast responsiveness even in resource-constrained environments, bringing AI-based user experience innovation to global users.
We identify and automate away inefficiencies across the entire ML model lifecycle, from data collection and preprocessing to model deployment and monitoring. Beyond providing platforms and tools, we quantitatively measure the development experience of ML Engineers: we define and monitor key productivity metrics such as time to first experiment and model deployment lead time. By thoroughly analyzing and fixing the bottlenecks and root causes we find, we foster a research and development environment where ML Engineers can focus on solving core business problems instead of spending time on infrastructure setup or debugging.
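For illustration, a metric such as model deployment lead time can be computed from lifecycle event timestamps roughly as below; the event schema and values are hypothetical.

```python
# Hypothetical sketch: compute model deployment lead time from events.
from datetime import datetime
from statistics import median

# Hypothetical events: (model_id, training_started, deployed_to_prod).
events = [
    ("ranker-v12", datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 2, 14, 0)),
    ("asr-v3", datetime(2024, 5, 3, 10, 0), datetime(2024, 5, 3, 18, 30)),
]

lead_times_h = [
    (deployed - started).total_seconds() / 3600
    for _, started, deployed in events
]
print(f"median deployment lead time: {median(lead_times_h):.1f}h")
```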
If false information is found in submitted materials, or if there are grounds for disqualification from employment under relevant laws, the hiring may be canceled. If necessary, additional screening and document verification may be conducted beyond the recruitment process announced in advance.
Persons of national merit receive preferential treatment in accordance with relevant laws; if this applies to you, please let us know when applying and submit supporting documents upon hiring.
When applying for a position at Hyperconnect, this privacy policy applies to the processing of personal information: https://career.hyperconnect.com/privacy
#HPCNT