Senior AI/NLP Engineer

4-5 Years • Research Development

Job Details

Project description

We are looking for a skilled Document AI / NLP Engineer to develop intelligent systems that extract meaningful data from documents such as PDFs, scanned images, and forms. In this role, you will build document processing pipelines using OCR and NLP technologies, fine-tune ML models for tasks like entity extraction and classification, and integrate those solutions into scalable cloud-based applications.

You will collaborate with cross-functional teams to deliver high-performance, production-ready pipelines and stay up to date with advancements in the document understanding and machine learning space.

Responsibilities

  • Design, build, and optimize document parsing pipelines using tools like Amazon Textract, Azure Form Recognizer, or Google Document AI.
  • Perform data preprocessing, labeling, and annotation for training machine learning and NLP models.
  • Fine-tune or train models for tasks such as Named Entity Recognition (NER), text classification, and layout understanding using PyTorch, TensorFlow, or Hugging Face Transformers.
  • Integrate document intelligence capabilities into larger workflows and applications using REST APIs, microservices, and cloud components (e.g., AWS Lambda, S3, SageMaker).
  • Evaluate model and OCR accuracy, applying post-processing techniques or heuristics to improve precision and recall.
  • Collaborate with data engineers, DevOps, and product teams to ensure solutions are robust and scalable and meet business KPIs.
  • Monitor, debug, and continuously enhance deployed document AI solutions.
  • Maintain up-to-date knowledge of industry trends in OCR, Document AI, NLP, and machine learning.

Skills

Must have

  • 4-5 years of hands-on experience in machine learning, document AI, or NLP-focused roles.
  • Strong expertise in OCR and document understanding tools, especially Amazon Textract, Azure Form Recognizer, or Google Document AI, or open-source alternatives such as Tesseract and PaddleOCR (OCR engines) or LayoutLM (a layout-aware document model).
  • Solid programming skills in Python and familiarity with ML/NLP libraries such as scikit-learn, spaCy, Hugging Face Transformers, PyTorch, and TensorFlow.
  • Experience working with structured and unstructured data formats, including PDF, images, JSON, and XML.
  • Hands-on experience with REST APIs, microservices, and integrating ML models into production pipelines.
  • Working knowledge of cloud platforms, especially AWS (S3, Lambda, SageMaker) or their equivalents.
  • Understanding of NLP techniques such as NER, text classification, and language modeling.
  • Strong debugging, problem-solving, and analytical skills.
  • Clear verbal and written communication skills for technical and cross-functional collaboration.

Other

  • Language: English (B2, Upper Intermediate)
  • Seniority: Senior
  • Location: Bengaluru, Karnataka, India

About The Company

Empower your future with Luxoft: Innovate, thrive and grow in a software-defined world.

Tampa, Florida, United States (On-Site)

Guadalajara, Jalisco, Mexico (On-Site)

Ukraine (Remote)

Gurugram, India (On-Site)

Chennai, Tamil Nadu, India (On-Site)

Bengaluru, Karnataka, India (On-Site)

Gdańsk, Pomeranian Voivodeship, Poland (On-Site)

Sofia, Sofia City Province, Bulgaria (On-Site)

Pune, Maharashtra, India (On-Site)

Warsaw, Masovian Voivodeship, Poland (On-Site)