LLM Applied Data Scientist (RAG/NLP)

5 Minutes ago • All levels

Research Development

Job Description

Binance is seeking a highly skilled Research Scientist/Engineer to advance the reasoning and planning capabilities of large foundation models. This role involves enhancing model performance across the entire development lifecycle, including data acquisition, supervised fine-tuning (SFT), reward modelling, and reinforcement learning. You will synthesize large-scale, high-quality datasets, solve complex tasks using System 2 thinking and advanced decoding strategies, design robust evaluation methodologies, and build agents capable of addressing sophisticated real-world problems.

Good To Have:

Publications in top-tier conferences/journals (NeurIPS, ICML, ACL, CVPR, SIGIR, KDD, WWW).
Awards in ACM/ICPC or similar competitions.

Must Have:

Design, develop, and optimize data processing and retrieval pipelines for enterprise-level generative tasks and model training applications.
Research and evaluate advanced AI-native retrieval algorithms to strengthen LLM/VLM/Agentic AI capabilities.
Collaborate with infrastructure and application teams to integrate RAG pipelines into production systems.
Develop and optimize retrieval and ranking pipelines to improve user experience.
Participate in LLM training and RAG system development, staying current with techniques such as pre-training, SFT, and reinforcement learning.
Apply NLP, CV, and multimodal methods to analyze user-generated content.
Master’s in Information Retrieval, NLP, Machine Learning, Computer Vision, Multimodal Learning, or related fields.
Proficient in PyTorch with strong coding skills in Python or C++.
Solid theoretical foundation in information retrieval, NLP, and deep learning.
Hands-on experience with RAG, vector databases, multimodal/graph retrieval, or large-scale AI systems.
Strong engineering ability to translate research into scalable, production-level systems.
Self-driven, able to own projects end-to-end (design → implementation → deployment).

Perks:

Shape the future with the world’s leading blockchain ecosystem.
Collaborate with world-class talent in a user-centric global organization with a flat structure.
Tackle unique, fast-paced projects with autonomy in an innovative environment.
Thrive in a results-driven workplace with opportunities for career growth and continuous learning.
Competitive salary and company benefits.
Work-from-home arrangement.

About the Role

We are seeking a highly skilled Research Scientist/Engineer to advance the reasoning and planning capabilities of large foundation models. In this role, you will enhance model performance across the entire development lifecycle—including data acquisition, supervised fine-tuning (SFT), reward modelling, and reinforcement learning—while driving innovations in reasoning and decision-making. You will synthesise large-scale, high-quality datasets through rewriting, augmentation, and generation techniques to strengthen foundation models during pretraining, SFT, and RL stages. A key part of the role involves solving complex tasks using System 2 thinking and applying advanced decoding strategies such as MCTS and A*. You will design and implement robust evaluation methodologies, teach models to interact with external tools, APIs, and code interpreters, and build agents and multi-agent systems capable of addressing sophisticated real-world problems.

Responsibilities

Design, develop, and optimize data processing and retrieval pipelines for enterprise-level generative tasks and model training applications (Customer Service, Token Report, Web3 Domain Models). This includes embedding, reranking, context engineering, and query rewriting models.
Research and evaluate advanced AI-native retrieval algorithms (e.g., low-latency, multimodal retrieval, hierarchical retrieval, GraphRAG) to strengthen large-scale LLM/VLM/Agentic AI capabilities in Binance products.
Collaborate with infrastructure and application teams to integrate RAG pipelines into production systems, ensuring scalability, reliability, and measurable business impact.
Develop and optimize retrieval and ranking pipelines (indexing, vector search, retrieval scoring, reranking) to improve user experience.
Participate in LLM training and RAG system development; stay current with techniques such as pre-training, SFT, and reinforcement learning, and apply them to retrieval and generation tasks.
Apply NLP, CV, and multimodal methods to analyze user-generated content (classification, quality evaluation, trend detection, comment analysis).

Requirements

Master’s in Information Retrieval, NLP, Machine Learning, Computer Vision, Multimodal Learning, or related fields.
Proficient in PyTorch with strong coding skills in Python or C++.
Strong communication skills, intellectual curiosity, and passion for lifelong learning. Able to identify opportunities and drive cutting-edge retrieval & RAG technologies into real-world applications.
Solid theoretical foundation in information retrieval, NLP, and deep learning (experience with embeddings, reranking, query understanding preferred).
Hands-on experience with RAG, vector databases, multimodal/graph retrieval, or large-scale AI systems.
Strong engineering ability to translate research into scalable, production-level systems.
Self-driven, able to own projects end-to-end (design → implementation → deployment).
Publications in top-tier conferences/journals (NeurIPS, ICML, ACL, CVPR, SIGIR, KDD, WWW) are a plus; awards in ACM/ICPC or similar competitions preferred.

Why Binance

Shape the future with the world’s leading blockchain ecosystem
Collaborate with world-class talent in a user-centric global organization with a flat structure
Tackle unique, fast-paced projects with autonomy in an innovative environment
Thrive in a results-driven workplace with opportunities for career growth and continuous learning
Competitive salary and company benefits
Work-from-home arrangement (the arrangement may vary depending on the work nature of the business team)

Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.

By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.
