Unstructured Financial PDF to CSV Conversion
- Tech Stack: Python, pdfplumber, Microsoft Table Transformer, OCR, FastAPI, Pandas
- Domain: Data Forensics / Financial Analysis
Engineered a deterministic PDF extraction engine to convert unstructured, multi-format bank statements into analysis-ready CSVs for forensic investigations at Prosoft e-Solutions.
Architected a tri-modal pipeline with a geometric router, combining spatial heuristics, Microsoft Table Transformer, and OCR to parse diverse PDF formats accurately.
Built error handling and fallback mechanisms so the system processes mixed-format documents, including scanned images, digitally generated tables, and hybrid layouts.
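
The routing and fallback ideas described above can be illustrated with a minimal sketch. This is not the project's actual code: the `PageStats` fields, the threshold values, the routing rules, and the `extract_page` helper are all hypothetical, chosen only to show how a geometry-based router might pick one of three extraction modes and fall back to the others when the first choice fails or returns nothing. In practice, statistics like these could be derived from pdfplumber's `page.chars`, `page.lines`, and `page.images`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

Row = List[str]


class Mode(Enum):
    SPATIAL = "spatial_heuristics"
    TABLE_MODEL = "table_transformer"
    OCR = "ocr"


@dataclass
class PageStats:
    """Hypothetical per-page geometry summary (illustrative field names)."""
    char_count: int        # characters found in the embedded text layer
    ruling_lines: int      # drawn horizontal/vertical lines on the page
    image_coverage: float  # fraction of page area covered by images, 0..1


def route(stats: PageStats, min_chars: int = 50, min_rulings: int = 4,
          max_image_coverage: float = 0.8) -> Mode:
    """Pick a first-choice mode from page geometry (assumed thresholds)."""
    # Almost no text layer, or a page that is mostly image: treat as a scan.
    if stats.char_count < min_chars or stats.image_coverage > max_image_coverage:
        return Mode.OCR
    # Digital text with drawn ruling lines: assume spatial heuristics suffice.
    if stats.ruling_lines >= min_rulings:
        return Mode.SPATIAL
    # Digital text without rulings (borderless table): assume a model is needed.
    return Mode.TABLE_MODEL


def extract_page(stats: PageStats,
                 extractors: Dict[Mode, Callable[[], List[Row]]]
                 ) -> Tuple[Optional[Mode], List[Row]]:
    """Try the routed mode first, then the remaining modes as fallbacks."""
    first = route(stats)
    order = [first] + [m for m in Mode if m is not first]
    for mode in order:
        try:
            rows = extractors[mode]()
        except Exception:
            # A failing or missing extractor should not abort the page.
            continue
        if rows:
            return mode, rows
    return None, []
```

Each extractor is passed in as a zero-argument callable, so the routing and fallback logic stays independent of whichever PDF, model, or OCR library performs the actual extraction.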