Summary (by Outscal)
Join ByteDance's LLM Global Data Team as an intern and contribute to data cleaning and preparation for large language model training. You will identify, source, and clean data from public datasets, develop data cleaning strategies, and collaborate with product managers and engineers.
Responsibilities
ByteDance will prioritize applicants who currently have the right to work in Singapore and do not require visa sponsorship from ByteDance.
About ByteDance
Founded in 2012, ByteDance has a mission to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
Why Join Us
Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.
Together, we inspire creativity and enrich life, a mission we work towards every day.
To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At ByteDance, we create together and grow together. That's how we drive impact - for ourselves, our company, and the users we serve.
Join us.
About the team
As one of the first interns on our LLM Global Data Team, you'll be at the heart of our operations. You'll gain first-hand experience with the intricacies of training Large Language Models (LLMs) on diverse datasets.
You will:
1. Identify, source, and clean data from public datasets.
2. Develop and implement data cleaning strategies and techniques to improve data quality and consistency.
3. Define data formats and use data cleaning tools or scripts to clean data automatically.
4. Collaborate with product managers and engineers to understand what data is most effective in improving our LLM performance.
We are looking for talented individuals to join us for an internship in 2024. Internships at ByteDance aim to offer students industry exposure and hands-on experience. Watch your ambitions become reality as your inspiration brings infinite opportunities at ByteDance.
Successful candidates must be able to commit to one of the following internship cycles:
1. Off-cycle/credit-bearing: starting Q1 2024
We will prioritize candidates who are able to commit to a minimum 3-month internship period. Please state your availability clearly in your resume (Start date, End date).
Candidates can apply for a maximum of two positions and will be considered for them in the order in which they apply. The application limit applies to jobs at ByteDance and its affiliates globally. Applications will be reviewed on a rolling basis, so we encourage you to apply early.
Qualifications
Minimum Qualifications:
1. Professional proficiency in English.
2. Proficiency in SQL and at least one programming language, such as Python or Java.
3. Experience in data cleaning.
Preferred Qualifications:
1. Proficiency in multiple languages.
2. Familiarity with public datasets for instruction tuning from sources such as Hugging Face, GitHub, and Kaggle.
Note: This role requires a paper test prior to interviews.
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://jobs.bytedance.com/en/legal/privacy.
If you have any questions, please reach out to us at apac-earlycareers@bytedance.com