AI Parsing & ETL Microservices

This is a sanitized professional case study based on my experience with an enterprise recruitment and search platform. Client names, internal data, screenshots, and exact metrics are intentionally omitted; it describes the architecture and my role, not proprietary implementation details.

One-line summary

The parsing and ETL layer that turns messy resumes — local files, job-portal exports, and PDFs — into clean, structured candidate data the platform can search and match on.

Problem

Resumes are unstructured and inconsistent. Without reliable extraction, every downstream feature — search, matching, reporting — inherits the noise. Parsing also has to keep up as volume grows, without blocking the recruiter workflow.

Solution

A set of Python microservices that:

Parse resumes from multiple sources (local files, Naukri-style portal exports, PDFs) into normalized candidate records.
Extract skills, roles, education, and contact fields, using AI-assisted extraction where rules alone fall short.
Load the structured output through ETL into the relational store and the OpenSearch index that powers candidate search.
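
To make the "rules first, AI where rules fall short" idea concrete, here is a minimal sketch, not the production code. Everything in it is an assumption for illustration: the `CandidateRecord` fields, the `KNOWN_SKILLS` vocabulary, the email regex, and the `ai_fallback` callable, which stands in for whatever model-backed extractor a real service would call.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical skill vocabulary; a real system would load a maintained taxonomy.
KNOWN_SKILLS = {"python", "sql", "aws", "opensearch", "postgresql"}

# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class CandidateRecord:
    """Hypothetical normalized record; field names are illustrative."""
    source: str
    email: Optional[str] = None
    skills: list[str] = field(default_factory=list)


def extract_with_rules(text: str, source: str) -> CandidateRecord:
    """Cheap deterministic pass: regex for email, vocabulary lookup for skills."""
    match = EMAIL_RE.search(text)
    tokens = set(re.findall(r"[a-z+#]+", text.lower()))
    return CandidateRecord(
        source=source,
        email=match.group(0).lower() if match else None,
        skills=sorted(tokens & KNOWN_SKILLS),
    )


def extract(
    text: str,
    source: str,
    ai_fallback: Optional[Callable[[str, CandidateRecord], CandidateRecord]] = None,
) -> CandidateRecord:
    """Run the rules; hand off to the fallback only when key fields are missing."""
    record = extract_with_rules(text, source)
    incomplete = record.email is None or not record.skills
    if incomplete and ai_fallback is not None:
        # The fallback receives the partial record so it can keep what the rules found.
        return ai_fallback(text, record)
    return record
```

Calling `extract(resume_text, "pdf")` with no fallback returns whatever the rules found. Passing a fallback means the more expensive path runs only for resumes the rules could not fully handle, which keeps cost and latency tied to the hard cases.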

The services are designed to run asynchronously, so parsing scales independently of the user-facing platform.
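
The asynchronous pattern can be sketched with a queue and a pool of workers. This is an illustration under stated assumptions rather than the actual service: an in-process `asyncio.Queue` stands in for whatever durable queue a deployment would use, and `parse_resume`, `worker`, and `run_batch` are hypothetical names.

```python
import asyncio


async def parse_resume(raw: str) -> dict:
    """Stand-in for the real parse step; here it only counts words."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return {"words": len(raw.split())}


async def worker(queue: asyncio.Queue, results: list) -> None:
    """Pull jobs until cancelled; one bad resume must not stop the worker."""
    while True:
        job_id, raw = await queue.get()
        try:
            results.append((job_id, await parse_resume(raw)))
        except Exception as exc:  # record the failure and keep consuming
            results.append((job_id, {"error": str(exc)}))
        finally:
            queue.task_done()


async def run_batch(jobs: list[tuple[str, str]], concurrency: int = 4) -> list:
    """Enqueue all jobs, process them with `concurrency` workers, return results."""
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    for job in jobs:
        queue.put_nowait(job)
    await queue.join()  # wait until every job has been marked done
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

The producer only enqueues work and never waits on a parse, so the submitting side stays responsive. Throughput is tuned by changing `concurrency` (or, in a distributed setup, the number of worker processes) without touching the caller.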

My role

I design and guide the parsing, extraction, and ETL workflows, the data contracts between services, and how the output feeds the search and matching layers.
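
As an example of what a data contract between services can look like, the sketch below validates an incoming payload against a versioned schema and produces a document ready for indexing. It is illustrative only: `CandidateMessage`, its fields, and `SCHEMA_VERSION` are assumptions, not the real contract.

```python
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 1  # hypothetical; bumped whenever the contract changes shape


@dataclass(frozen=True)
class CandidateMessage:
    """Hypothetical message passed from the parsing service to the ETL loader."""
    schema_version: int
    candidate_id: str
    full_name: str
    skills: tuple[str, ...]

    @classmethod
    def from_dict(cls, payload: dict) -> "CandidateMessage":
        """Reject unknown versions and missing fields before any data is loaded."""
        version = payload.get("schema_version")
        if version != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema_version: {version!r}")
        missing = [key for key in ("candidate_id", "full_name") if not payload.get(key)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        # Normalize skills: trim, lowercase, de-duplicate, and sort for stable output.
        skills = tuple(
            sorted({s.strip().lower() for s in payload.get("skills", []) if s.strip()})
        )
        return cls(
            schema_version=SCHEMA_VERSION,
            candidate_id=str(payload["candidate_id"]),
            full_name=payload["full_name"].strip(),
            skills=skills,
        )

    def to_search_doc(self) -> dict:
        """Plain-dict form suitable for a search index or a database row."""
        doc = asdict(self)
        doc["skills"] = list(self.skills)
        return doc
```

Validating at the service boundary means a malformed or outdated payload fails loudly in one place instead of surfacing later as bad search results.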

Tools

Python microservices, AI extraction, ETL pipelines, PostgreSQL, OpenSearch, and AWS for storage and compute.