About the job
Summary by Outscal
Join a team building next-generation security analytics products. Leverage your expertise in data mining, statistical modeling, and machine learning to extract insights from massive datasets. Contribute to a SaaS-based platform processing over 100 million transactions daily. You must have strong Python/R programming skills, SQL experience, and familiarity with machine learning techniques.
Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!
We are seeking a Data Scientist to help build next-generation Security Analytics products. Working with a team of engineers and architects, you will be responsible for prototyping, designing, developing, and supporting a highly scalable SaaS-based Security Analytics product.
This is a great opportunity to be an integral part of a team building Qualys’ next-generation microservices-based technology platform, which processes over 100 million transactions and terabytes of data per day, to leverage open-source technologies, and to work on challenging, business-impacting projects.
We are looking for a Data Scientist who will support our Research and Development team with insights gained from analyzing security data.
The ideal candidate has a background in a quantitative or technical field and is adept at using large data sets to find opportunities for product and process optimization and at using models to test the effectiveness of different courses of action.
They must have strong experience using a variety of data mining and data analysis methods, using a variety of data tools, building and implementing models, using and creating algorithms, and creating and running simulations. They are focused on results, a self-starter, and have demonstrated success in using analytics to drive the understanding, growth, and success of a product.
Responsibilities:
- Extract insights from given data using data mining and exploratory data analysis methods.
- Develop custom statistical data models and algorithms to apply to data sets.
- Work with Product Management and other stakeholders to develop hypotheses based on the data and perform statistical tests.
- Develop compelling and insightful Data Visualizations and presentations, technical reports, etc.
- Assess the effectiveness and accuracy of new data sources and data gathering techniques.
- Develop processes and tools to monitor and analyze model performance and data accuracy.
- Collaborate with data and subject matter experts throughout the organization to identify opportunities for leveraging data to drive business solutions.
- Understand distributed ecosystems and cloud computing services, and deploy and monitor ML models on them.
- Design and deploy machine learning algorithms, both shallow learning models and deep learning models, including Large Language Models.
Qualifications:
- 3-5 years of work experience with a BS, MS, or Ph.D. in Computer Science, Information Technology, Data Science, Artificial Intelligence/Machine Learning, or an equivalent field. Specialization in data science/machine learning is preferred.
- Understanding of, and demonstrated project work in, object-oriented programming concepts (Python, R, Java, Scala, etc.).
- Proven experience with SQL, Pandas and PySpark in analysing large and complex datasets.
- Experience in visualizing/presenting data for stakeholders using: Matplotlib, Seaborn, ggplot or any data visualization tool.
- Experience with data cleansing, data engineering, data quality assessment, and using analytics for data assessment on structured, semi-structured, and unstructured (text) data.
- Solid theoretical and practical understanding of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, Natural Language Processing, etc.) and their real-world advantages/drawbacks.
- Experience with advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and their proper usage, etc.) and their applications.
- Experience developing use cases related to cybersecurity is desirable.
- Familiarity with distributed data/computing tools: MapReduce, Hadoop, Hive, Flink, Spark, Cassandra, Elasticsearch/OpenSearch, data lakehouse, etc.
- Publication of original research in the field of AI/ML in reputable, peer-reviewed journals or conferences is highly desirable.
- Hands-on experience with data science libraries: scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch, JAX, etc.
- Generative AI: Familiarity with building and fine-tuning Large Language Models on multi-GPU setups with quantization techniques, using the Hugging Face Transformers and PyTorch libraries (nice to have).
- Good communication skills and a team player.