
Data Engineering

Procurement Intelligence: How AI Is Reshaping Enterprise Purchasing

Enterprise procurement is a $13 trillion annual spend category globally, and most of it is managed with spreadsheets, email threads, and institutional knowledge locked in the heads of senior buyers. Over the past 18 months, Harbor Software has built procurement intelligence systems for three enterprise

Margin Analysis Automation: From Spreadsheets to Systems

The Spreadsheet Problem: Every finance team we’ve worked with has the same origin story. Margin analysis started in a single Excel workbook. Someone built a clever set of formulas. It worked for a while. Then the business grew, and that workbook became a monster —

ETL vs ELT: Making the Right Choice for Your Data Stack

The ETL versus ELT debate has been running for over a decade, and the conventional wisdom has shifted dramatically during that time. Five years ago, the default recommendation was almost always ETL: extract data from sources, transform it in a dedicated processing layer (typically Spark

Building a Procurement Research Framework with AI

Procurement teams spend an extraordinary amount of time on research before they can make sourcing decisions. Before issuing an RFP, they need to understand the supplier landscape for the category. Before negotiating a contract renewal, they need current market rates and competitive alternatives. Before approving

The Hidden Complexity of PDF Processing

PDF is the cockroach of file formats. It was designed in 1993 by Adobe to faithfully reproduce printed documents on screen, and it has survived every attempt to replace it. Every business uses PDFs. Every developer eventually has to process them. And every developer who

Document Parsing with AI: Extracting Structure from Chaos

Businesses run on documents. Invoices, contracts, purchase orders, spec sheets, compliance filings — the operational backbone of most companies is a sprawling mess of PDFs, Word files, scanned images, and emails with attachments. The information locked inside these documents is critical for operations, compliance, and

Building Data Pipelines That Don’t Break at 3 AM

Every data engineer has the story. Your phone buzzes at 3:14 AM. The nightly ETL job failed. Upstream changed a column name, or the API started returning 429s, or a single malformed record caused the entire batch to abort. You open your laptop in bed,