Case study · Apr 15, 2026
AI Parsing & ETL Microservices
Python microservices that parse resumes, extract structured candidate data, and run ETL into the platform's search and data stores.
Sanitized professional case study based on enterprise recruitment/search platform experience. Client names, internal data, screenshots, and exact metrics are intentionally omitted; this describes the architecture and my role, not proprietary implementation details.
One-line summary
The parsing and ETL layer that turns messy resumes — local files, job-portal exports, and PDFs — into clean, structured candidate data the platform can search and match on.
Problem
Resumes are unstructured and inconsistent. Without reliable extraction, every downstream feature — search, matching, reporting — inherits the noise. Parsing also has to keep up as volume grows, without blocking the recruiter workflow.
Solution
A set of Python microservices that:
- Parse resumes from multiple sources (local files, Naukri-style portal exports, PDFs) into normalized candidate records.
- Extract skills, roles, education, and contact fields, using AI-assisted extraction where rules alone fall short.
- Load the structured output through ETL into the relational store and the OpenSearch index that powers candidate search.
The services are designed to run asynchronously, so parsing scales independently of the user-facing platform.
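To make the asynchronous shape concrete, here is a minimal sketch, not the production code. It assumes an in-process `asyncio.Queue` standing in for whatever queue or broker the real services use, a toy rule-based extractor, and an in-memory list standing in for the relational store and the OpenSearch index. Every name in it (`CandidateRecord`, `extract_with_rules`, `run_pipeline`, `KNOWN_SKILLS`) is hypothetical.

```python
import asyncio
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Illustrative rules only; a real extractor would be far more thorough.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
KNOWN_SKILLS = {"python", "sql", "aws", "opensearch"}  # hypothetical vocabulary


@dataclass
class CandidateRecord:
    source: str
    email: Optional[str] = None
    skills: List[str] = field(default_factory=list)
    needs_ai_review: bool = False  # set when rules alone fall short


def extract_with_rules(source: str, text: str) -> CandidateRecord:
    """Rule-based pass that flags the record when it finds too little."""
    match = EMAIL_RE.search(text)
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    skills = sorted(tokens & KNOWN_SKILLS)
    return CandidateRecord(
        source=source,
        email=match.group(0) if match else None,
        skills=skills,
        needs_ai_review=match is None or not skills,
    )


async def worker(queue: asyncio.Queue, sink: List[CandidateRecord]) -> None:
    """Pull (source, text) pairs off the queue and append records to the sink."""
    while True:
        source, text = await queue.get()
        try:
            sink.append(extract_with_rules(source, text))
        finally:
            queue.task_done()


async def run_pipeline(
    resumes: List[Tuple[str, str]], workers: int = 2
) -> List[CandidateRecord]:
    queue: asyncio.Queue = asyncio.Queue()
    sink: List[CandidateRecord] = []  # stand-in for the data store and search index
    tasks = [asyncio.create_task(worker(queue, sink)) for _ in range(workers)]
    for item in resumes:
        queue.put_nowait(item)
    await queue.join()  # wait until every queued resume has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return sink


if __name__ == "__main__":
    records = asyncio.run(
        run_pipeline([("pdf", "Contact: jane@example.com. Skills: Python, SQL")])
    )
    print(records)
```

In this sketch the producer only enqueues work and never waits on extraction, which is the property that lets parsing scale separately from the recruiter-facing side. Records flagged with `needs_ai_review` are the ones an AI-assisted pass would pick up.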
My role
Design and guide the parsing/extraction/ETL workflows, the data contracts between services, and how the output feeds the search and matching layers.
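As an illustration of what a data contract between services can look like, the sketch below uses a frozen dataclass with basic validation and a helper that shapes a record for indexing. The field names, the `schema_version` tag, and the `to_search_document` helper are assumptions made for the example, not the platform's actual schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

SCHEMA_VERSION = 1  # hypothetical version tag shared by producer and consumer


@dataclass(frozen=True)
class CandidateContract:
    candidate_id: str
    full_name: str
    email: Optional[str]
    skills: Tuple[str, ...]
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        # Reject records that downstream services could not use.
        if not self.candidate_id:
            raise ValueError("candidate_id is required")
        if self.email is not None and "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")


def to_search_document(record: CandidateContract) -> dict:
    """Shape a validated record as a plain dict, with skills normalized."""
    doc = asdict(record)
    doc["skills"] = sorted({s.strip().lower() for s in record.skills if s.strip()})
    return doc
```

Validating at the boundary means a malformed record fails in the parsing service, where it can be retried or flagged, rather than surfacing later as a bad search result.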
Tools
Python microservices, AI extraction, ETL pipelines, PostgreSQL, OpenSearch, and AWS for storage and compute.