Financial Unstructured PDF to CSV Conversion

  • Tech Stack: Python, pdfplumber, Microsoft Table Transformers, OCR, FastAPI, Pandas
  • Domain: Data Forensics / Financial Analysis

Engineered a deterministic PDF extraction engine to convert unstructured, multi-format bank statements into analysis-ready CSVs for forensic investigations at Prosoft e-Solutions.

Architected a tri-modal pipeline in which a geometric router directs each input to the most suitable of three extraction paths (spatial heuristics, Microsoft Table Transformers, or OCR), enabling high-accuracy parsing across diverse PDF formats.
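The routing idea can be sketched as follows. This is a minimal, hypothetical illustration, not the production router. It assumes routing happens per page, and the `PageGeometry` fields, the `route_page` function, and every threshold are invented here to show how geometric signals might select among the three paths. In a real pipeline, such signals could plausibly be derived from pdfplumber's `page.chars`, `page.lines`, and `page.images` collections.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """The three extraction paths named in the pipeline description."""
    SPATIAL_HEURISTICS = "spatial_heuristics"
    TABLE_TRANSFORMER = "table_transformer"
    OCR = "ocr"


@dataclass
class PageGeometry:
    """Hypothetical per-page measurements a geometric router might inspect."""
    text_char_count: int     # characters in the PDF's embedded text layer
    ruling_line_count: int   # horizontal/vertical lines drawn on the page
    image_area_ratio: float  # fraction of page area covered by raster images (0..1)


# Hypothetical thresholds, chosen for illustration only.
MIN_TEXT_CHARS = 50
MIN_RULING_LINES = 4
SCANNED_IMAGE_RATIO = 0.8


def route_page(geom: PageGeometry) -> Route:
    """Pick an extraction path from simple geometric signals (assumed per page)."""
    # Little or no text layer plus a page-sized image: treat it as a scan.
    if geom.text_char_count < MIN_TEXT_CHARS and geom.image_area_ratio >= SCANNED_IMAGE_RATIO:
        return Route.OCR
    # Ruled grid lines present: cell boundaries can be recovered from geometry.
    if geom.ruling_line_count >= MIN_RULING_LINES:
        return Route.SPATIAL_HEURISTICS
    # Otherwise (whitespace-aligned or hybrid layout): defer to a
    # learned table-structure model.
    return Route.TABLE_TRANSFORMER


if __name__ == "__main__":
    print(route_page(PageGeometry(0, 0, 0.95)))     # scanned page
    print(route_page(PageGeometry(1200, 30, 0.0)))  # digital page with ruled table
    print(route_page(PageGeometry(800, 0, 0.0)))    # digital page, no ruling lines
```

Checking the cheap, deterministic signals first keeps the heavier model-based and OCR paths for pages that actually need them.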

The system handles mixed-format documents, including scanned images, digitally generated tables, and hybrid layouts, with robust error handling and fallback mechanisms.
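A fallback chain of the kind described might look like the sketch below. It assumes, hypothetically, that each extraction path is wrapped as a callable returning a list of rows, and that an exception or an empty result means the next path should be tried. `extract_with_fallback`, `ExtractionError`, and the stub extractors are illustrative names, not part of the original system.

```python
from typing import Any, Callable, List, Sequence, Tuple

Row = List[str]
Extractor = Callable[[Any], List[Row]]  # hypothetical: page in, table rows out


class ExtractionError(Exception):
    """Raised when every extraction path fails for a page."""


def extract_with_fallback(
    page: Any, extractors: Sequence[Tuple[str, Extractor]]
) -> Tuple[str, List[Row]]:
    """Try each named extractor in order; return the first non-empty result."""
    errors: List[str] = []
    for name, extractor in extractors:
        try:
            rows = extractor(page)
        except Exception as exc:  # record the failure and try the next path
            errors.append(f"{name}: {exc}")
            continue
        if rows:  # an empty table counts as a miss, so fall through
            return name, rows
        errors.append(f"{name}: no rows")
    # Every path failed: surface all collected reasons together.
    raise ExtractionError("; ".join(errors))


if __name__ == "__main__":
    # Dummy stand-ins for the real extraction paths.
    def heuristics_stub(page: Any) -> List[Row]:
        raise ValueError("no ruling lines found")

    def transformer_stub(page: Any) -> List[Row]:
        return []

    def ocr_stub(page: Any) -> List[Row]:
        return [["DATE", "DESCRIPTION", "AMOUNT"]]

    chain = [("heuristics", heuristics_stub),
             ("table_transformer", transformer_stub),
             ("ocr", ocr_stub)]
    print(extract_with_fallback(None, chain))
```

Collecting every failure reason, rather than only the last one, makes it easier to audit why a page ended up on a given path, which matters for forensic work.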