The Role:
You will work with a cross-functional team of Data Scientists, MLOps Engineers, Solution Architects, Software Engineers, and Product Managers to help build an automated solution for data collection. You will deploy and operationalize models as scalable, robust services; the work spans requirement understanding, model development, productionizing models, model serving, API/library/CLI development, data visualization tooling, code refactoring, unit testing, and support. As a Data Scientist, you will be a leading contributor to the implementation of Artificial Intelligence (AI) within Data Collections software applications, APIs, and other data products. This role requires significant interaction with both upstream and downstream stakeholders across Technology, Data, Products, Sales/Service, and Research for:
Data Collection and Cleaning: Gathering data from various sources and ensuring its quality by cleaning and organizing it.
Data Analysis: Using statistical techniques and machine learning algorithms to analyze data and uncover patterns, trends, and insights.
Model Building: Creating predictive models and algorithms to solve business problems and improve decision-making.
Data Visualization: Presenting data insights through visualizations and reports to help stakeholders understand the findings.
Collaboration: Working closely with business stakeholders to understand their goals and determine how data can be used to achieve them.
Key Requirements:
Must have 4-7 years of relevant experience as a Data Scientist.
Experience extracting data and information from complex semi-structured and unstructured documents using NLP and parsing.
Ability to analyze business problems and cut through data challenges.
Ability to work through a raw corpus and develop data/ML models that deliver business analytics (not just EDA), machine-learning-based document processing, and information retrieval.
Quick to develop POCs and transform them into production-ready code that runs at scale.
Must Have Skills:
NLP, scraping, and parsing, including libraries such as NLTK, Gensim, spaCy, Scrapy, Beautiful Soup, regex, etc.
Deep Learning, including Keras, TensorFlow / PyTorch, and neural network architectures such as CNN, RNN, LSTM, GRU, GAN, Residual Networks, etc.
Generative AI, Transfer Learning, Transformers, Embeddings, LLMs, Prompt Engineering, Encoders, Decoders etc.
Supervised, unsupervised, semi-supervised, and few-shot / zero-shot learning, including EDA, training, modelling, hyper-parameter tuning, API creation, etc., for regression and binary/multiclass classification with algorithms such as Decision Trees, SVM, XGBoost, etc.
Python data structures (list, tuple, dictionary, collections, iterators) and libraries such as Pandas, NumPy, Scikit-learn, imblearn, SciPy, etc.
Database and SQL knowledge (e.g., Postgres, SQL Server, MySQL).
Desired Skills:
AWS services such as EC2, Elastic Beanstalk, and Lambda, plus containerization (Docker images, etc.).
Object-oriented programming (OOP) and REST APIs
CI/CD/CT, MLOps
Morningstar is an equal opportunity employer.
What is it like to work with the Data Collection AI team at Morningstar?
You get to work on:
1. Research work coupled with business value
2. Machine learning development lifecycle, i.e. end-to-end project development (not just POCs)
3. Exposure to an advanced workspace in a cloud environment
4. Encouragement for innovation and ideation
Legal Entity: Morningstar India Private Ltd. (Delhi)
Morningstar’s hybrid work environment gives you the opportunity to work remotely and collaborate in-person each week. We’ve found that we’re at our best when we’re purposely together on a regular basis, at least three days each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you’ll have tools and resources to engage meaningfully with your global colleagues.