LLM Data Annotation Intern - Data Cleaning - 2024 Start

2 hours ago • Up to 1 year

About the job

Summary by Outscal

Join ByteDance's LLM Global Data Team as an intern and contribute to data cleaning and preparation for large language model training. You will identify, source, and clean data from public datasets, develop data cleaning strategies, and collaborate with product managers and engineers.
Responsibilities

ByteDance will be prioritizing applicants who have a current right to work in Singapore, and do not require ByteDance's sponsorship of a visa.

About ByteDance

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join Us

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible. Together, we inspire creativity and enrich life - a mission we aim towards achieving every day. To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve. Join us.

About the team

As one of the first interns on our LLM Global Data Team, you'll be at the heart of our operations. Gain first-hand experience in understanding the intricacies of training Large Language Models (LLMs) with diverse data sets.

You will:
1. Identify, source, and clean data from public data sets.
2. Develop and implement data cleaning strategies and techniques to improve data quality and consistency.
3. Define data formats and use data cleaning tools or scripts to clean data automatically.
4. Collaborate with product managers and engineers to understand what data is most effective in improving our LLM performance.

We are looking for talented individuals to join us for an internship in 2024. Internships at ByteDance aim to offer students industry exposure and hands-on experience. Watch your ambitions become reality as your inspiration brings infinite opportunities at ByteDance.

Successful candidates must be able to commit to one of the following internship cycles:
1. Off cycle/Credit bearing - Starting Q1 2024

We will prioritize candidates who are able to commit to a minimum 3-month internship period. Please state your availability clearly in your resume (start date, end date).

Candidates can apply to a maximum of two positions and will be considered for jobs in the order in which they apply. The application limit is applicable to ByteDance and its affiliates' jobs globally. Applications will be reviewed on a rolling basis - we encourage you to apply early.

Qualifications

Minimum Qualifications:
1. Professional proficiency in English.
2. Proficiency in SQL and one programming language, such as Python or Java.
3. Experience in data cleaning.

Preferred Qualifications:
1. Multilingual preferred.
2. Familiarity with Hugging Face, GitHub, Kaggle, and other sources of public datasets for instruction tuning.

Note: This role requires a paper test prior to interviews.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://jobs.bytedance.com/en/legal/privacy.

If you have any questions, please reach out to us at apac-earlycareers@bytedance.com
