Data Scientist
AppZen
Job Summary
AppZen is a leader in autonomous spend-to-pay software, using patented artificial intelligence to process data from thousands of sources so enterprises can understand spend better and make smarter business decisions. The company is seeking experienced Data Scientists with strong Python expertise to join its AI/ML team and work on cutting-edge NLP, document understanding, and enterprise automation. Responsibilities include designing, building, and evaluating NLP models; developing ML pipelines; productionizing models with Docker/Kubernetes; analyzing model behavior; translating prototypes into scalable ML services; contributing to monitoring; and collaborating with product managers to create ML-driven features. Staying current with LLMs and generative AI is also key.
Must Have
- 2-5 years of professional Python experience
- Strong debugging, profiling, and performance optimization skills
- Solid understanding of Python data structures, algorithms, and ML best practices
- Hands-on experience with NLP and ML frameworks (PyTorch, TensorFlow, Hugging Face)
- Applied experience with transformer models, LLMs, generative AI
- Experience with model evaluation and production optimization
- Ability to manage multiple priorities in a fast-paced environment
- B.E./B.Tech or higher in Computer Science or related field
Good to Have
- Experience building/deploying containerized ML services with Docker and CI/CD
- Skilled in designing/consuming RESTful Python APIs (FastAPI, Flask)
- Experience with cloud services (AWS S3, SQS)
- Familiarity with databases (PostgreSQL, Redis)
- Strong grasp of classical ML algorithms (Logistic Regression, Random Forests, XGBoost)
- Ability to choose pragmatically between heuristic, rule-based, and model-driven solutions
Job Description
About the role:
- We are looking for experienced Data Scientists with strong Python expertise to join our growing AI/ML team. You’ll collaborate with a world-class group of machine learning engineers and scientists working on cutting-edge NLP, document understanding, and enterprise automation use cases.
Key Responsibilities:
- Design, build, and evaluate models for NLP, document extraction, classification, and generative tasks.
- Develop end-to-end ML pipelines from data pre-processing to model inference and monitoring.
- Work on productionizing models, including model packaging, API integration, and deployment using Docker/Kubernetes (a brief sketch follows this list).
- Analyze model behavior, debug Python code, and optimize performance in large-scale environments.
- Translate prototypes into scalable, production-grade ML services, with a focus on reliability and performance.
- Contribute to model and system monitoring, logging, and performance optimization.
- Collaborate with product managers and engineering teams to turn business requirements into ML-driven product features.
- Stay current with research and advancements in transformer-based architectures (e.g., BERT), LLMs (e.g., GPT), and generative AI techniques.
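
For illustration only, a minimal sketch of what "productionizing a model with API integration" can look like, assuming a Hugging Face text-classification pipeline served via FastAPI; the model checkpoint, route, and payload schema are assumptions, not AppZen specifics:

```python
# Minimal sketch (assumptions noted above): a Hugging Face classifier behind a FastAPI endpoint.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup; the checkpoint below is a stand-in, not AppZen's model.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

class Document(BaseModel):
    text: str

@app.post("/classify")
def classify(doc: Document):
    # Run inference and return the top label with its confidence score.
    result = classifier(doc.text, truncation=True)[0]
    return {"label": result["label"], "score": float(result["score"])}
```

A service like this would typically be run under uvicorn, packaged into a Docker image, and deployed to Kubernetes, in line with the responsibilities above.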
Must-Have Qualifications:
- 2–5 years of professional experience in Python, with strong debugging, profiling, and performance optimization skills.
- Solid understanding of Python data structures, algorithms, and software engineering best practices in ML development.
- Hands-on experience with NLP and modern ML frameworks like PyTorch, TensorFlow, or Hugging Face Transformers.
- Applied experience with transformer models, LLMs, or generative AI in real-world scenarios.
- Experience with model evaluation, including designing meaningful metrics, tracking model drift, and optimizing performance in production (a brief sketch follows this list).
- Ability to manage multiple priorities in a fast-paced and collaborative environment.
- B.E./B.Tech or higher in Computer Science, Engineering, or a related technical field.
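
As a minimal, generic sketch of the evaluation side (the labels and scores below are made-up placeholders, and the metric choice is an assumption rather than AppZen's actual practice):

```python
# Minimal sketch: offline evaluation of a binary classifier with scikit-learn.
# y_true / y_score are placeholder values; in practice they come from a held-out set.
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_score = [0.20, 0.80, 0.60, 0.30, 0.90, 0.40, 0.55, 0.70]  # model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]            # thresholded predictions

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
print("roc_auc:  ", roc_auc_score(y_true, y_score))
```

Drift tracking then amounts to recomputing metrics like these (or comparing input distributions) over rolling production windows and alerting when they move.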
Nice-to-Haves:
- Experience building and deploying containerized ML services with Docker and CI/CD pipelines.
- Skilled in designing and consuming RESTful Python APIs (e.g., FastAPI, Flask).
- Experience with cloud services, particularly AWS (S3, SQS, etc.).
- Familiarity with databases such as PostgreSQL and Redis.
- Strong grasp of classical ML algorithms such as Logistic Regression, Random Forests, and XGBoost.
- Ability to choose between heuristic, rule-based, and model-driven solutions pragmatically (e.g., regex vs ML).
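
To illustrate the heuristic-versus-model trade-off in the last point (the invoice-number pattern is invented for the example):

```python
# Minimal sketch: when the target follows a rigid pattern, a regex heuristic is often the
# pragmatic choice over an ML model. The INV-###### format below is a hypothetical example.
import re

INVOICE_RE = re.compile(r"\bINV-\d{6}\b")

def extract_invoice_number(text: str) -> str | None:
    """Rule-based extraction: cheap, transparent, and easy to debug."""
    match = INVOICE_RE.search(text)
    return match.group(0) if match else None

print(extract_invoice_number("Payment due for INV-004217 by 30 June"))  # INV-004217
```

A learned extractor (e.g., a token-classification model) becomes the better choice only when document formats vary too much for rules like this to stay maintainable.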