Job Title: Applied Data Scientist
Role Overview
We are looking for a pragmatic and creative Applied Data Scientist to join our cybersecurity team. This role focuses on building ML models that detect phishing threats across email and web channels at scale. You will work at the intersection of ML, adversarial content analysis, and real-time detection, collaborating closely with researchers and engineers to develop robust and explainable defenses against evolving social engineering and phishing attacks.
Location: This is a hybrid role. Candidates should be based in or near the Raleigh, North Carolina area.
Key Responsibilities
● Develop and refine NLP models to detect malicious intent, impersonation, and social engineering patterns across email and web content.
● Collaborate with phishing researchers to label real and synthetic phishing samples and validate model behavior.
● Design end-to-end ML/AI pipelines using both traditional NLP methods (TF-IDF, topic modeling) and modern transformer-based models (BERT, RoBERTa, GPT, LLaMA).
● Simulate adversarial phishing attacks using generative AI (GenAI) to stress-test models and improve their resilience to evasion techniques.
● Monitor model drift and adversarial adaptation, tuning for generalization across varied customer environments and content types.
● Integrate models into production systems with scalable, interpretable outputs that support real-time decisions.
Requirements
● 5+ years of experience developing NLP models, ideally in cybersecurity, fraud, or related risk-focused domains.
● Proven track record with NLP techniques including text classification, intent detection, and keyword/topic extraction.
● Strong Python programming skills with hands-on experience in ML/NLP libraries such as scikit-learn, spaCy, Hugging Face Transformers, TensorFlow, or PyTorch.
● Experience working with unstructured, adversarial, or noisy datasets (e.g., phishing emails, social engineering content, suspicious sites).
● Ability to work cross-functionally with threat researchers, analysts, and engineering teams to translate findings into detection logic.
Nice to Have
● Familiarity with phishing-specific datasets or with email indicators such as SMTP headers and MIME structures.
● Experience deploying ML models in production environments with a focus on latency, scalability, and interpretability.
● Background in computer vision techniques (e.g., OpenCV, Tesseract OCR, Vision Transformers), especially for analyzing screenshots or other visual phishing artifacts.